Starred repositories

A self-learning tutorial for CUDA High Performance Programming.

JavaScript 312 35 Updated Dec 17, 2024
SCSS 1 Updated Nov 24, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,433 5,105 Updated Jan 9, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,736 220 Updated Jan 9, 2025

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 10,746 837 Updated Aug 20, 2024

The book "Large Language Models" (《大语言模型》), by Zhao Xin, Li Junyi, Zhou Kun, Tang Tianyi, and Wen Jirong.

2,686 176 Updated Apr 22, 2024

📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

3,147 211 Updated Jan 8, 2025

The fastest feature-rich C++11/14/17/20/23 single-header testing framework

C++ 6,048 648 Updated Dec 1, 2024

MLX: An array framework for Apple silicon

C++ 18,227 1,048 Updated Jan 9, 2025

Material for gpu-mode lectures

Jupyter Notebook 3,431 347 Updated Jan 6, 2025

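A summary of interview experiences and standard interview questions for C++ backend server development. (It offers both depth and breadth, unlike summary articles that cover only concepts.)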

1,575 226 Updated Sep 9, 2024

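📚 Classic computer programming books and e-books, including classic titles on computer fundamentals, C/C++, Java, Python, interview questions, architecture design, and algorithms.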

3,263 412 Updated Jan 9, 2024

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 304 47 Updated Jan 2, 2025

Xiao's CUDA Optimization Guide [Actively Adding New Content]

253 17 Updated Nov 8, 2022

Kernel Tuner

Python 301 50 Updated Dec 17, 2024

C++ Tip Of The Week

Python 1,583 73 Updated Nov 25, 2024

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

HTML 12,486 847 Updated Jan 7, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 883 140 Updated Jul 29, 2023

📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 1,926 204 Updated Jan 8, 2025

Transformer related optimization, including BERT, GPT

C++ 5,976 895 Updated Mar 27, 2024

PyTorch Chinese documentation

Shell 4,150 1,002 Updated Dec 5, 2024

This project converts the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to PyTorch.

Jupyter Notebook 18,538 5,415 Updated Oct 14, 2021

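PyTorch Practical Tutorial (《Pytorch实用教程》), 2nd edition. Whether you are starting from zero, applying PyTorch to CV, NLP, or LLM projects, or moving on to engineering and deployment, it is all covered here. We believe that with the help of this book, readers will be able to master PyTorch with ease and become excellent deep learning engineers.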

Jupyter Notebook 2,820 312 Updated Jan 4, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 85,677 23,076 Updated Jan 9, 2025

📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/

C++ 24,352 3,021 Updated Aug 17, 2024

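A Chinese-language teaching guide to C++ templates. Unlike the well-known book C++ Templates, this tutorial series teaches C++ templates as a Turing-complete language, with the aim of helping readers gain a thorough grasp of metaprogramming. (Work in progress.)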

C++ 9,880 1,586 Updated Aug 20, 2024

A CUDA learning journey based on the book CUDA Programming: Basics and Practice (《cuda编程-基础与实践》) by Fan Zheyong (樊哲勇).

Cuda 263 55 Updated Jan 15, 2024

Sample codes for my CUDA programming book

Cuda 1,615 333 Updated Jul 27, 2023

📚 Chinese GitBook of "Data Processing with Python" (《利用Python进行数据处理》), 2nd edition, for personal study.

TeX 40 15 Updated Oct 21, 2019

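They say C and Linux go well together. Contents include C basics, C++ object-oriented programming, basic data structures, Linux system programming, and some related operating system knowledge.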

C 521 169 Updated Apr 19, 2023