rGitcy
🎯
Focusing
  • Fudan University
  • Pudong New Area, Shanghai, China

4 stars written in Cuda

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda · 2,724 stars · 283 forks · Updated Mar 4, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 936 stars · 149 forks · Updated Jul 29, 2023

From zero to hero CUDA for accelerating maths and machine learning on GPU.

Cuda · 179 stars · 5 forks · Updated Jul 23, 2024

A llama model inference framework implemented in CUDA C++

Cuda · 48 stars · 5 forks · Updated Nov 8, 2024