Stars
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
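As a rough intuition for what "fine-grained scaling" buys over a single per-tensor scale, here is a toy NumPy sketch (not DeepGEMM's kernel code): each 128-element block gets its own scale before a crude FP8-style rounding, so one outlier no longer crushes the precision of the whole matrix. The block size, the E4M3 max constant, and the integer-rounding stand-in are assumptions for illustration.

```python
# Toy illustration of fine-grained (per-block) scaling vs. per-tensor scaling.
# This is NOT DeepGEMM's kernel code; constants and rounding are stand-ins.
import numpy as np

FP8_E4M3_MAX = 448.0   # largest finite value in the FP8 E4M3 format
BLOCK = 128            # per-block scaling granularity (an assumption here)

def fake_fp8(x):
    # Crude stand-in for FP8 rounding: clamp to the representable range and
    # round so the limited precision shows up in the result.
    return np.clip(np.round(x), -FP8_E4M3_MAX, FP8_E4M3_MAX)

def quant_dequant_per_block(x):
    # Fine-grained scaling: every BLOCK-wide slice of the last dim gets its own scale.
    m, k = x.shape
    blocks = x.reshape(m, k // BLOCK, BLOCK)
    scale = np.abs(blocks).max(axis=-1, keepdims=True) / FP8_E4M3_MAX + 1e-12
    return (fake_fp8(blocks / scale) * scale).reshape(m, k)

def quant_dequant_per_tensor(x):
    # Coarse scaling for comparison: one scale for the whole matrix.
    scale = np.abs(x).max() / FP8_E4M3_MAX + 1e-12
    return fake_fp8(x / scale) * scale

rng = np.random.default_rng(0)
a = rng.normal(size=(256, 1024))
a[0, 0] = 1e4                    # a single outlier ruins a per-tensor scale
b = rng.normal(size=(1024, 512))

ref = a @ b
err_block = np.abs(quant_dequant_per_block(a) @ b - ref).mean()
err_tensor = np.abs(quant_dequant_per_tensor(a) @ b - ref).mean()
print(f"mean |error|: per-block {err_block:.4f}  per-tensor {err_tensor:.4f}")
```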
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
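For a taste of the kind of material such a book covers, here is a compact value-iteration sketch on a made-up two-state MDP; the transition and reward numbers are invented purely for illustration.

```python
# Value iteration on a hypothetical 2-state, 2-action MDP, illustrating the
# Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) [r(s,a) + gamma V(s')].
# All numbers below are invented for illustration only.
import numpy as np

gamma = 0.9
P = np.array([  # P[s, a, s'] transition probabilities
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([  # R[s, a] expected immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])

V = np.zeros(2)
for _ in range(1000):
    Q = R + gamma * (P @ V)        # Q[s, a] = R[s, a] + gamma * sum_s' P[s,a,s'] V[s']
    V_new = Q.max(axis=1)          # greedy Bellman optimality backup
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

print("optimal state values:", V, "greedy policy:", Q.argmax(axis=1))
```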
Tensors and Dynamic neural networks in Python with strong GPU acceleration
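A minimal sketch of what that one-line description promises: tensors, a dynamically built (define-by-run) network, autograd, and GPU placement when available. The tiny regression model and shapes are illustrative only.

```python
# Minimal PyTorch example: tensors, a dynamic network, autograd, optional GPU.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(64, 16, device=device)           # a batch of random inputs
y = torch.randn(64, 1, device=device)            # random regression targets

for step in range(100):
    loss = nn.functional.mse_loss(model(x), y)   # graph is built dynamically each step
    opt.zero_grad()
    loss.backward()                              # autograd computes all gradients
    opt.step()

print(f"final loss: {loss.item():.4f} on {device}")
```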
A minimal GPU design in Verilog to learn how GPUs work from the ground up
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Modeling, training, eval, and inference code for OLMo
Development repository for the Triton language and compiler
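To show what "the Triton language" looks like in practice, here is a minimal element-wise add kernel in the style of the official tutorials; it assumes a CUDA GPU and the triton package are available.

```python
# Minimal Triton kernel: element-wise vector add, launched from Python.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # which block this program handles
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

a = torch.randn(10_000, device="cuda")
b = torch.randn(10_000, device="cuda")
assert torch.allclose(add(a, b), a + b)
```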
High-speed Large Language Model Serving for Local Deployment
Data preparation code for CrystalCoder 7B LLM
Pre-training code for CrystalCoder 7B LLM
Data processing for and with foundation models!
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Training and serving large-scale neural networks with auto parallelization.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
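A small usage sketch of the library's pipeline API; the checkpoint is whatever the library selects by default for the task and is downloaded on first use.

```python
# Quick Transformers usage: a ready-made sentiment-analysis pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("Starred repositories make a surprisingly good reading list."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```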
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
DLRover: An Automatic Distributed Deep Learning System
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted in incubation by the LF AI & Data Foundation.