Stars
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
A generative world for general-purpose robotics & embodied AI learning.
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
A curated list of image inpainting and video inpainting papers and resources
A collection of awesome image inpainting studies.
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
A minimal and universal controller for FLUX.1.
A simple screen parsing tool towards a pure vision-based GUI agent
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Official Implementation and Dataset of "PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency", CVPR 2021
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) …
Semantic Propositional Image Caption Evaluation
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis