Stars
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
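For orientation, a minimal sketch of wrapping a PyTorch model with DeepSpeed; the config values below are illustrative assumptions, not recommendations, and a real run is launched under the `deepspeed` launcher.

```python
# Minimal DeepSpeed sketch (assumed config values; run via the deepspeed launcher).
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real network

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # ZeRO stage 2: partition optimizer state + gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns an engine that handles mixed precision,
# the optimizer, and ZeRO partitioning behind the usual training loop.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```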
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support.
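A minimal sketch of the typical Accelerate pattern, with a toy model and dataset standing in for real ones:

```python
# Minimal Accelerate training-loop sketch; model and data are placeholders.
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up device / distributed config automatically

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randn(64, 1)),
    batch_size=8,
)

# prepare() moves everything to the right device(s) and wraps them for DDP/FSDP.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # replaces loss.backward() so mixed precision works
    optimizer.step()
```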
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
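For a sense of the API, a minimal LoRA sketch with PEFT; the GPT-2 base model and hyperparameters are just examples:

```python
# Minimal LoRA sketch with PEFT; base model and hyperparameters are examples.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # example base model

config = LoraConfig(
    r=8,                        # LoRA rank
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # fused attention projection in GPT-2
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the small adapter weights are trained
```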
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
SGLang is a fast serving framework for large language models and vision language models.
A lightweight reproduction of DeepSeek-R1-Zero with an in-depth analysis of self-reflection behavior.
A high-throughput and memory-efficient inference and serving engine for LLMs
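A minimal offline-inference sketch with vLLM; the model id is an arbitrary example:

```python
# Minimal vLLM offline-inference sketch; the model id is only an example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face causal-LM repo id works

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["The capital of France is"], params)

for out in outputs:
    print(out.outputs[0].text)
```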
A reinforcement learning tutorial in Chinese (the "Mushroom Book" 🍄); read it online at https://datawhalechina.github.io/easy-rl/
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
A collection of awesome work about R1!
PyTorch implementations of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3, and more.
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
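A minimal QLoRA-style loading sketch with Unsloth; the checkpoint name and hyperparameters are examples:

```python
# Minimal Unsloth sketch for QLoRA fine-tuning; model name and
# hyperparameters are examples, not the library's defaults.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # example 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)
```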
Code for "Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction"
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
JerryWu-code / TinyZero
Forked from Jiayi-Pan/TinyZero. Our own tiny reproduction of DeepSeek-R1-Zero on two A100s.
Video Generation Foundation Models: https://saiyan-world.github.io/goku/
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Train transformer language models with reinforcement learning.
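A minimal supervised fine-tuning sketch with TRL's SFTTrainer, its simplest entry point (the library also ships RL trainers such as PPO, DPO, and GRPO); the dataset and model ids are examples, and exact arguments vary across TRL versions:

```python
# Minimal TRL SFTTrainer sketch; dataset/model ids are examples, and
# exact arguments vary by TRL version.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example chat dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # recent TRL accepts a HF repo id directly
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out"),
)
trainer.train()
```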
This repo contains the official authors' implementation of the MeshSplats paper.
A journey to a real multimodal R1! We are running large-scale experiments.