stevexiaofei
  • WeRide Technology Co. Ltd.
  • Shanghai

Starred repositories

22 stars written in Cuda

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 16,134 1,941 Updated Nov 7, 2024

How to optimize some algorithms in CUDA.

Cuda 1,761 146 Updated Dec 22, 2024

Deformable ConvNets V2 (DCNv2) in PyTorch

Cuda 1,457 230 Updated Nov 18, 2022

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda 1,243 142 Updated Nov 12, 2024

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 723 37 Updated Dec 24, 2024
Cuda 703 53 Updated Oct 20, 2023

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

Cuda 500 96 Updated Sep 25, 2024

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

Cuda 387 31 Updated Dec 2, 2024

Deforming kernels to adapt towards object deformation. In ICLR 2020.

Cuda 199 28 Updated Feb 12, 2020

A high performance CUDA implementation of Scan Matching via the Iterative Closest Point Algorithm

Cuda 157 21 Updated Oct 4, 2022

High-performance programming notes

Cuda 149 35 Updated May 20, 2022

Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020)

Cuda 126 14 Updated Jan 17, 2022

Introduction to CUDA programming

Cuda 113 28 Updated May 19, 2017

A GPU/CUDA implementation of the Hungarian algorithm

Cuda 108 19 Updated Apr 12, 2019

CUDA implementation of parallel radix sort using Blelloch scan

Cuda 61 14 Updated Feb 29, 2024

Efficient CUDA Stream Compaction Library

Cuda 33 6 Updated Jun 9, 2023

Development of a customized op in TensorFlow for convolution with a sparse kernel

Cuda 28 7 Updated Jun 27, 2019

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

Cuda 26 11 Updated Jul 19, 2017

MIT-licensed stand-alone CUDA utility functions.

Cuda 16 2 Updated Jul 3, 2020

Parallel Prefix Sum (Scan) with CUDA.

Cuda 15 2 Updated Jul 17, 2020

CUDA implementation of "A Fast Hybrid Approach for Stream Compaction on GPUs" by Rego, Sang and Yu

Cuda 1 Updated Mar 17, 2019