WanliZhong
:octocat: Focusing
  • OpenCV China
  • SUSTech
  • UTC +08:00

Highlights

  • Pro

Organizations

@opencv @SUSTown

4 stars written in Cuda

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 4,877 stars · 478 forks · Updated Mar 10, 2025
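The key idea behind fine-grained scaling is that each small block of a tensor gets its own scale factor before being cast to FP8, so a single outlier cannot flatten the precision of every other value. A minimal NumPy sketch of that bookkeeping (the block size, function names, and rounding-free form are illustrative assumptions, not DeepGEMM's API):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_blockwise(x, block=2):
    """Give each block of `x` its own scale so every block fits the FP8 range."""
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    # per-block scale: map the block's max magnitude onto FP8_E4M3_MAX
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP8_E4M3_MAX
    scales[scales == 0] = 1.0  # all-zero blocks need no scaling
    return blocks / scales, scales

def dequantize_blockwise(q, scales, n):
    """Undo the per-block scaling and restore the original shape."""
    return (q * scales).ravel()[:n]

x = np.array([1e-3, -2e-3, 5.0, 100.0, 0.25, -0.5])
q, s = quantize_blockwise(x)
```

Real FP8 GEMM kernels would additionally round each scaled value to E4M3 and carry the per-block scales into the matmul epilogue; the sketch only shows the scaling step itself.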

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda · 2,774 stars · 286 forks · Updated Mar 4, 2025

Quantized attention that achieves speedups of 2.1-3.1x over FlashAttention2 and 2.7-5.1x over xformers, respectively, without losing end-to-end metrics across various models.

Cuda · 1,100 stars · 65 forks · Updated Feb 28, 2025
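The core trick in quantized attention is computing the QK^T score matrix in low precision (e.g. INT8, accumulated in INT32) and dequantizing before the softmax, so the expensive matmul runs on fast integer units. A NumPy sketch of the idea; the function names and the per-tensor (rather than per-block) scaling are simplifying assumptions, not the repository's actual kernels:

```python
import numpy as np

def int8_quantize(x):
    """Symmetric per-tensor INT8 quantization: int8 values plus one scale."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

def softmax(s):
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Full-precision reference attention."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def quantized_attention(Q, K, V):
    """Q and K in INT8; scores accumulated in INT32, then dequantized."""
    qQ, sQ = int8_quantize(Q)
    qK, sK = int8_quantize(K)
    # integer matmul, then rescale by the product of the two quant scales
    scores = (qQ.astype(np.int32) @ qK.astype(np.int32).T) * (sQ * sK)
    return softmax(scores / np.sqrt(Q.shape[-1])) @ V  # V stays full precision

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = quantized_attention(Q, K, V)
ref = attention(Q, K, V)
```

Because softmax is insensitive to small logit perturbations, the low-precision result tracks the full-precision reference closely on this random input.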

SpargeAttention: a training-free sparse attention method that can accelerate inference for any model.

Cuda · 252 stars · 8 forks · Updated Mar 7, 2025
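Training-free sparse attention exploits the fact that softmax weights are dominated by a few large logits, so low-scoring query-key pairs can be dropped without any retraining. A simplified NumPy sketch using per-row top-k masking (the `keep` ratio and function name are illustrative; real sparse-attention kernels select sparse blocks and avoid materializing the full score matrix, which this sketch does not):

```python
import numpy as np

def sparse_attention(Q, K, V, keep=0.5):
    """Keep only the top `keep` fraction of keys per query; mask the rest."""
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    k = max(1, int(keep * K.shape[0]))
    # per-row threshold: the k-th largest logit in each row
    thresh = np.sort(s, axis=-1)[:, -k][:, None]
    s = np.where(s >= thresh, s, -np.inf)  # masked logits get zero softmax weight
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = sparse_attention(Q, K, V, keep=0.5)
```

With `keep=1.0` no logits are masked, so the function reduces exactly to dense attention, which makes the sparsification easy to sanity-check.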