Starred repositories
A self-learning tutorial for CUDA High Performance Programming.
A high-throughput and memory-efficient inference and serving engine for LLMs
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
The official GitHub page for the survey paper "A Survey of Large Language Models".
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
The fastest feature-rich C++11/14/17/20/23 single-header testing framework
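📚 Classic computer programming books, the "big black books", and programming e-books, covering computer fundamentals, C/C++, Java, Python, interview questions, architecture design, algorithm series, and other classic e-books.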
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
Xiao's CUDA Optimization Guide [Actively Adding New Content]
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Transformer related optimization, including BERT, GPT
This project converts the MXNet implementations in the original book "Dive into Deep Learning" to PyTorch implementations.
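"PyTorch Practical Tutorial" (2nd edition): whether you are starting from scratch, applying PyTorch to CV, NLP, or LLM projects, or moving on to advanced engineering and deployment, it is all covered here. With the help of this book, readers should be able to master PyTorch with ease and become excellent deep learning engineers.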
Tensors and Dynamic neural networks in Python with strong GPU acceleration
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
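A Chinese-language teaching guide to C++ templates. Unlike the well-known book C++ Templates, this tutorial series teaches C++ templates as a Turing-complete language, aiming to give readers a thorough grasp of metaprogramming. (Work in progress)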
Sample codes for my CUDA programming book
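They say C and Linux go better together~ Contents include: C basics, C++ object-oriented programming, basic data structures, Linux system programming, and some related operating-system knowledge.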