
Focusing
A new 🐤 who wants to become a great person. (Member of OpenCV China; Master's student at SUSTech)
- OpenCV China
- SUSTech
Stars: 4 repositories written in CUDA
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
- 📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉)
- Quantized attention that achieves speedups of 2.1–3.1x and 2.7–5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models
- SpargeAttention: training-free sparse attention that can accelerate inference for any model