Stars
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Janus-Series: Unified Multimodal Understanding and Generation Models
Fully open reproduction of DeepSeek-R1
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Genesis Reinforcement Learning Environments
Distributed Robot Interaction Dataset.
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Stay on top of trending topics on social media and the web with AI
World's First Large-scale High-quality Robotic Manipulation Benchmark
Using large AI models to automatically provide commentary and edit videos with a single click.
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory
Make websites accessible for AI agents
A generative world for general-purpose robotics & embodied AI learning.
FastVideo is a lightweight framework for accelerating large video diffusion models.
A series of technical reports on Slow Thinking with LLMs
Implementation of "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation"
Unofficial implementation of Meta's MovieGen models
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
VideoSys: An easy and efficient system for video generation
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Unifying 3D Mesh Generation with Language Models
[ICLR 2025] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
A trainable PyTorch reproduction of AlphaFold 3.