
Focusing
A new 🐤 who wants to become a great person. (Member of OpenCV China; Master's student at SUSTech)
- OpenCV China
- SUSTech
Stars: 4 repositories written in CUDA
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
- 📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉)
- Quantized attention that achieves speedups of 2.1–3.1x and 2.7–5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end accuracy across various models
- SpargeAttention: training-free sparse attention that can accelerate inference for any model