Starred repositories

7 stars written in Cuda
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,096 222 Updated Jan 23, 2025

Sample codes for my CUDA programming book

Cuda 1,625 333 Updated Jul 27, 2023

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 891 141 Updated Jul 29, 2023

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 313 47 Updated Jan 2, 2025

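Code implementations for CUDA C 编程权威指南 (Professional CUDA C Programming). Covers most of the code from chapters 2 through 8 of the book, along with the author's notes. Everything was implemented by hand by the author, so mistakes are inevitable; please use it with caution. Corrections are very welcome. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!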

Cuda 307 22 Updated Oct 20, 2022

A CUDA learning journey based on the book 《cuda编程-基础与实践》 (CUDA Programming: Basics and Practice) by 樊哲勇 (Fan Zheyong).

Cuda 268 57 Updated Jan 15, 2024

GEMM and Winograd based convolutions using CUTLASS

Cuda 26 3 Updated Jul 15, 2020