Stars
Dynamic resource changes for multi-dimensional parallelism training
Fully open reproduction of DeepSeek-R1
Golang bindings for NVIDIA Data Center GPU Manager (DCGM)
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Recipes to scale inference-time compute of open models
Machine Learning Interviews from FAANG, Snapchat, LinkedIn. I have offers from Snapchat, Coupang, Stitch Fix, etc. Blog: mlengineer.io.
A low-latency & high-throughput serving engine for LLMs
g1: Using Llama-3.1 70B on Groq to create o1-like reasoning chains
[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)
Minimal, single page, smooth-scrolling theme for Hugo static site generator.
A bibliography and survey of the papers surrounding o1
Official inference library for Mistral models
📺 Discover the latest machine learning / AI courses on YouTube.
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
PyTorch implementation for "Parallel Sampling of Diffusion Models", NeurIPS 2023 Spotlight
Official inference repo for FLUX.1 models
🦜🔗 Build context-aware reasoning applications
nnScaler: Compiling DNN models for Parallel Training
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
An open source implementation of CLIP.
Generative Models by Stability AI
Universal LLM Deployment Engine with ML Compilation
Reference implementations of MLPerf™ inference benchmarks
Minimalistic large language model 3D-parallelism training
Python package for dataset imports from UCI ML Repository