Stars
R1-onevision, a visual language model capable of deep CoT reasoning.
Unofficial implementation of "SODA: Bottleneck Diffusion Models for Representation Learning"
Janus-Series: Unified Multimodal Understanding and Generation Models
PyTorch code for hierarchical k-means -- a data curation method for self-supervised learning
Official implementation of "DepthLab: From Partial to Complete"
[CVPR'25] Official Implementations for Paper - AniDoc: Animation Creation Made Easier
[CVPR'25] Official implementation for paper - LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
[ICLR 2025] Animate-X: Universal Character Image Animation with Enhanced Motion Representation
[CVPR 2025] Assessing and Learning Alignment of Unimodal Vision and Language Models
Official PyTorch implementation of Lumos: Learning Visual Generative Priors without Text
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Official implementations for paper: Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
LLM2CLIP makes SOTA pretrained CLIP models even more SOTA.
[ICLR'25] Official PyTorch implementation of "Framer: Interactive Frame Interpolation".
[ICLR 2025] Animate-X - PyTorch Implementation
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
CVPR2023 | MVImgNet: A Large-scale Dataset of Multi-view Images
The official repo for the paper "VeCLIP: Improving CLIP Training via Visual-enriched Captions"
Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
[ECCV 2024] Official PyTorch implementation of GANdance: Exploring Guided Sampling of Conditional GANs
[ECCV 2024] Official Repository for DiffiT: Diffusion Vision Transformers for Image Generation
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models