Stars
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
[Arxiv 2024] Edicho: Consistent Image Editing in the Wild
FastVideo is a lightweight framework for accelerating large video diffusion models.
Official implementation for paper - LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
A generative world for general-purpose robotics & embodied AI learning.
Memory-optimized training scripts for video models based on Diffusers
Official Implementations for Paper - AniDoc: Animation Creation Made Easier
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
Official implementations for paper: AnyDoor: Zero-shot Object-level Image Customization
A minimal and universal controller for FLUX.1.
Lumina-T2X is a unified framework for Text to Any Modality Generation
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
[NeurIPS'23] Emergent Correspondence from Image Diffusion
Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.