The Chinese University of Hong Kong
Hong Kong SAR
https://txxx926.github.io/
Stars
Minimalistic 4D-parallelism distributed training framework for educational purposes
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
Triton-based implementation of Sparse Mixture of Experts.
Textbook on reinforcement learning from human feedback
A pedagogical implementation of Autograd
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
📚150+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Collection of benchmarks to measure basic GPU capabilities
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Efficient Triton Kernels for LLM Training
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Official implementation for the paper Lancet: Accelerating Mixture-of-Experts Training via Whole Graph Computation-Communication Overlapping, published in MLSys'24.
OneDiff: An out-of-the-box acceleration library for diffusion models.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
FlashInfer: Kernel Library for LLM Serving
ShortcutsBench: A Large-Scale Real-World Benchmark for API-Based Agents
An easy-to-understand TensorOp Matmul tutorial
Odysseus: Playground of LLM Sequence Parallelism
FlagGems is an operator library for large language models implemented in Triton Language.
Ring attention implementation with flash attention
A collection of memory efficient attention operators implemented in the Triton language.
Collection of kernels written in Triton language
Analyze the inference of Large Language Models (LLMs), covering computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.