WeRide Technology Co. Ltd.
- Shanghai
Starred repositories
Instant neural graphics primitives: lightning fast NeRF and more
How to optimize some algorithms in CUDA.
Deformable ConvNets V2 (DCNv2) in PyTorch
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
Neighborhood Attention Extension. Bringing attention to a neighborhood near you!
Deforming kernels to adapt towards object deformation. In ICLR 2020.
A high performance CUDA implementation of Scan Matching via the Iterative Closest Point Algorithm
Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR 2020)
A GPU/CUDA implementation of the Hungarian algorithm
CUDA implementation of parallel radix sort using Blelloch scan
Development of a customized op in TensorFlow for convolution with a sparse kernel
CUDA implementation of exclusive prefix sum via Blelloch's algorithm
Parallel Prefix Sum (Scan) with CUDA.
CUDA implementation of "A Fast Hybrid Approach for Stream Compaction on GPUs" by Rego, Sang and Yu