Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda · 723 stars · 37 forks · Updated Dec 24, 2024

princeton-vl / lietorch

Cuda · 703 stars · 53 forks · Updated Oct 20, 2023

DavidDiazGuerra / gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration

Cuda · 500 stars · 96 forks · Updated Sep 25, 2024

SHI-Labs / NATTEN

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

Cuda · 387 stars · 31 forks · Updated Dec 2, 2024

hangg7 / deformable-kernels

Deforming kernels to adapt towards object deformation. In ICLR 2020.

Cuda · 199 stars · 28 forks · Updated Feb 12, 2020

botforge / CUDA-ScanMatcher-ICP

A high performance CUDA implementation of Scan Matching via the Iterative Closest Point Algorithm

Cuda · 157 stars · 21 forks · Updated Oct 4, 2022

littlebearsama / CUDA-notes

High-performance programming notes

Cuda · 149 stars · 35 forks · Updated May 20, 2022

thomasverelst / dynconv

Code for Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference (CVPR2020)

Cuda · 126 stars · 14 forks · Updated Jan 17, 2022

csc-training / CUDA

Introduction to CUDA programming

Cuda · 113 stars · 28 forks · Updated May 19, 2017

paclopes / HungarianGPU

A GPU/CUDA implementation of the Hungarian algorithm

Cuda · 108 stars · 19 forks · Updated Apr 12, 2019

mark-poscablo / gpu-radix-sort

CUDA implementation of parallel radix sort using Blelloch scan

Cuda · 61 stars · 14 forks · Updated Feb 29, 2024

knotman90 / cuStreamComp

Efficient CUDA Stream Compaction Library

Cuda · 33 stars · 6 forks · Updated Jun 9, 2023

Connor323 / Convolution-with-sparse-kernel-in-TF

Development of a custom TensorFlow op for convolution with a sparse kernel

Cuda · 28 stars · 7 forks · Updated Jun 27, 2019

mark-poscablo / gpu-prefix-sum

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

Cuda · 26 stars · 11 forks · Updated Jul 19, 2017

cdyk / ComputeStuff

MIT-licensed stand-alone CUDA utility functions.

Cuda · 16 stars · 2 forks · Updated Jul 3, 2020

TVycas / CUDA-Parallel-Prefix-Sum

Parallel Prefix Sum (Scan) with CUDA.

Cuda · 15 stars · 2 forks · Updated Jul 17, 2020

hcyang99 / cuda-stream-compaction

Cuda · 1 star · Updated Feb 13, 2022

1danielcoelho / hybrid-stream-compaction

CUDA implementation of "A Fast Hybrid Approach for Stream Compaction on GPUs" by Rego, Sang and Yu

Cuda · 1 star · Updated Mar 17, 2019

Edmond stevexiaofei

Lists (7)

high perfermance compute

learn infra

lib

lidar related model

model

slam

utility tool

Starred repositories

biomedical-engineering

fullstack-development

pointcloud-segmentation

Point cloud

Bootstrap

Algorithm

semantic-slam