Stars
SGLang is a fast serving framework for large language models and vision language models.
A minimal GPU design in Verilog to learn how GPUs work from the ground up
A generative world for general-purpose robotics & embodied AI learning.
Official implementation for SIGGRAPH 2023 paper "Learning Physically Simulated Tennis Skills from Broadcast Videos"
Fast and memory-efficient exact attention
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Ongoing research training transformer models at scale
FlagScale is a large model toolkit based on open-sourced projects.
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
A good project for campus recruiting (autumn and spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, supporting inference for models such as llama2, Unet, Yolov5, and Resnet.
📖 C++11/14/17/20 Concurrency Demystified: From Core Principles to Thread-Safe Code
A fast single-producer, single-consumer lock-free queue for C++
An open-source C++ library developed and used at Facebook.
A debugging and profiling tool that can trace and visualize Python code execution
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
High-Resolution Image Synthesis with Latent Diffusion Models
Code for the paper "Language Models are Unsupervised Multitask Learners"
Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Using Low-rank adaptation to quickly fine-tune diffusion models.
FlagPerf is an open-source software platform for benchmarking AI chips.