stulai's starred repositories

9 stars written in Cuda

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda | 4,804 stars | 465 forks | Updated Mar 5, 2025

📚200+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, and ⚡️hgemm with WMMA, MMA and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).

Cuda | 2,733 stars | 283 forks | Updated Mar 4, 2025

Sample code for my CUDA programming book

Cuda | 1,658 stars | 338 forks | Updated Feb 15, 2025

This is a series of GPU optimization topics. Here we introduce how to optimize CUDA kernels in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda | 936 stars | 149 forks | Updated Jul 29, 2023

Optimizing SGEMM kernel functions on NVIDIA GPUs to close-to-cuBLAS performance.

Cuda | 326 stars | 49 forks | Updated Jan 2, 2025

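Code implementations for CUDA C 编程权威指南 (the Chinese edition of "Professional CUDA C Programming"). It includes most of the code from Chapters 2 through 8 of the book, plus the author's notes, all implemented by hand by the author. Mistakes are inevitable, so please use it with caution; corrections are very welcome. If it helps you, please give it a Star; it means a lot to the author. Thank you!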

Cuda | 320 stars | 22 forks | Updated Oct 20, 2022

Yinghan's Code Sample

Cuda | 312 stars | 55 forks | Updated Jul 25, 2022

A CUDA learning journey based on 《cuda编程-基础与实践》 ("CUDA Programming: Fundamentals and Practice", by 樊哲勇 / Fan Zheyong).

Cuda | 289 stars | 63 forks | Updated Jan 15, 2024

GEMM- and Winograd-based convolutions using CUTLASS

Cuda | 26 stars | 3 forks | Updated Jul 15, 2020