Stars
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Janus-Series: Unified Multimodal Understanding and Generation Models
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
LPIPS metric. pip install lpips
Ongoing research training transformer models at scale
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
Official inference repo for FLUX.1 models
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".
Example models using DeepSpeed
Latte: Latent Diffusion Transformer for Video Generation.
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
VideoSys: An easy and efficient system for video generation
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
SEED-Voken: A Series of Powerful Visual Tokenizers
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Lumina-T2X is a unified framework for Text to Any Modality Generation
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Official PyTorch implementation of TATS: A Long Video Generation Framework with Time-Agnostic VQGAN and Time-Sensitive Transformer (ECCV 2022)
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…