Stars
🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Zero Bubble Pipeline Parallelism
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashInfer: Kernel Library for LLM Serving
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
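To make the Gradio entry concrete, a minimal sketch of the workflow the tagline describes: wrap a plain Python function in an Interface and launch it locally. The greet function is a hypothetical example, not part of the library.

    import gradio as gr

    def greet(name):
        # Hypothetical demo function; Gradio maps its inputs/outputs to UI widgets.
        return f"Hello, {name}!"

    # Interface wires the function to auto-generated text components;
    # launch() starts a local web server hosting the app.
    gr.Interface(fn=greet, inputs="text", outputs="text").launch()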
Ola: Pushing the Frontiers of Omni-Modal Language Model
verl: Volcano Engine Reinforcement Learning for LLMs
A synthetic data generator for text recognition
SGLang is a fast serving framework for large language models and vision language models.
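A hedged sketch of a common SGLang usage pattern: the server is launched separately (e.g. python -m sglang.launch_server --model-path <hf-model-id> --port 30000) and exposes an OpenAI-compatible endpoint, so the standard openai client can query it. The port and the "default" model name follow SGLang's documented examples and are assumptions here.

    # Assumes an SGLang server is already running on localhost:30000.
    import openai

    client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="default",  # SGLang serves the loaded model under this name
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)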
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
Official PyTorch implementation of the paper "DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training".
Some useful custom Hive UDF functions, especially array, JSON, math, and string functions.
Official PyTorch implementation of "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
The official implementation of "Autoregressive Image Generation using Residual Quantization" (CVPR 2022)
Using tabular and deep reinforcement learning methods to infer optimal market-making strategies
Ongoing research training transformer models at scale
Fast and memory-efficient exact attention
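For context on flash-attn's core entry point, a minimal sketch assuming a CUDA device and half-precision tensors (which the fused kernels require); shapes follow the documented (batch, seqlen, nheads, headdim) layout.

    import torch
    from flash_attn import flash_attn_func

    # Random q/k/v in the (batch, seqlen, nheads, headdim) layout;
    # the kernel requires fp16/bf16 tensors on a CUDA device.
    q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Exact attention computed without materializing the full
    # seqlen x seqlen score matrix; causal=True applies an autoregressive mask.
    out = flash_attn_func(q, k, v, causal=True)  # same shape as q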
Pretrain and fine-tune ANY AI model of ANY size on multiple GPUs and TPUs with zero code changes.
A high-throughput and memory-efficient inference and serving engine for LLMs
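To make the vLLM entry concrete, a minimal offline-inference sketch; the model id is an arbitrary small example chosen for illustration, not a recommendation.

    from vllm import LLM, SamplingParams

    # Model id is an illustrative assumption; any supported Hugging Face model works.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() batches prompts and returns one RequestOutput per prompt.
    for out in llm.generate(["The capital of France is"], params):
        print(out.outputs[0].text)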
Implementation of plug-and-play attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"