Stars
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Official implementation of the paper "AnyText: Multilingual Visual Text Generation And Editing"
This project aims to reproduce Sora (OpenAI's T2V model); we welcome the open-source community to contribute.
We introduce a novel approach to parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of network parameters.
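To make the idea concrete, here is a minimal, hypothetical sketch: treat flattened checkpoints of one fixed architecture as data points and train an epsilon-prediction diffusion model over them. The real p-diff first compresses parameters with an autoencoder and diffuses in that latent space; the toy architecture, noise schedule, and MLP denoiser below are illustrative assumptions only.

```python
import torch
import torch.nn as nn

def flatten_params(model: nn.Module) -> torch.Tensor:
    """Concatenate a model's parameters into one vector (one 'data point')."""
    return torch.cat([p.detach().flatten() for p in model.parameters()])

# Stand-in for a collection of trained checkpoints of the same tiny net.
def make_net() -> nn.Module:
    return nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 2))

checkpoints = torch.stack([flatten_params(make_net()) for _ in range(64)])
param_dim = checkpoints.shape[1]

# Simple MLP that predicts the noise added to a parameter vector.
denoiser = nn.Sequential(
    nn.Linear(param_dim + 1, 512), nn.SiLU(),
    nn.Linear(512, 512), nn.SiLU(),
    nn.Linear(512, param_dim),
)
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

for step in range(200):
    x0 = checkpoints[torch.randint(0, len(checkpoints), (32,))]
    t = torch.randint(0, T, (32,))
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    xt = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward process
    t_feat = (t.float() / T).unsqueeze(-1)               # crude timestep embed
    loss = nn.functional.mse_loss(
        denoiser(torch.cat([xt, t_feat], dim=-1)), noise  # epsilon-prediction
    )
    opt.zero_grad(); loss.backward(); opt.step()
```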
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
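For reference, CogVideoX can also be run through the Hugging Face diffusers integration rather than the official repo; the hub id, frame count, and sampler settings below are assumptions based on the public 2B release.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

frames = pipe(
    prompt="a panda playing guitar in a bamboo forest",
    num_inference_steps=50,
    num_frames=49,       # assumed default clip length for this model
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "panda.mp4", fps=8)
```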
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Code for "Pyramidal Flow Matching for Efficient Video Generative Modeling"
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Official inference repo for FLUX.1 models
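A hedged, minimal way to try FLUX.1 [dev] is via the diffusers integration rather than this official repo; the model id and sampler settings below are assumptions.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # reduces VRAM use at some speed cost

image = pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=50,
    guidance_scale=3.5,
).images[0]
image.save("fox.png")
```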
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
SUPIR aims to develop Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
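A short sketch of what few-step LCM inference looks like through diffusers; the checkpoint id (a public LCM distillation of Dreamshaper) and step count are assumptions.

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float16
).to("cuda")

# LCMs trade a small quality loss for 4-8 sampling steps instead of 25-50.
image = pipe(
    "a watercolor painting of a lighthouse at dawn",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lighthouse.png")
```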
Open-Sora: Democratizing Efficient Video Production for All
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
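The core of the paper's DiT block is adaLN-Zero conditioning: the timestep/class embedding regresses per-block shift, scale, and gate parameters, with gates initialized to zero so each block starts as the identity. A simplified, self-contained sketch (not the official code):

```python
import torch
import torch.nn as nn

class DiTBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # adaLN-Zero: conditioning regresses shift/scale/gate per sub-layer;
        # zero init makes every gate 0, so the block starts as the identity.
        self.ada = nn.Linear(dim, 6 * dim)
        nn.init.zeros_(self.ada.weight); nn.init.zeros_(self.ada.bias)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # c: per-sample conditioning (timestep + class embedding), shape (B, dim)
        s1, b1, g1, s2, b2, g2 = self.ada(c).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + s1.unsqueeze(1)) + b1.unsqueeze(1)
        x = x + g1.unsqueeze(1) * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + s2.unsqueeze(1)) + b2.unsqueeze(1)
        return x + g2.unsqueeze(1) * self.mlp(h)

tokens = torch.randn(2, 64, 384)  # (batch, patch tokens, hidden dim)
cond = torch.randn(2, 384)        # timestep/class conditioning vector
print(DiTBlock(384, 6)(tokens, cond).shape)  # torch.Size([2, 64, 384])
```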
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
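Image inference follows a predictor pattern roughly like the sketch below; the checkpoint filename, config name, and click coordinates are assumptions for a local setup.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2_hiera_large.pt"  # downloaded separately
model_cfg = "sam2_hiera_l.yaml"                   # assumed config name
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.zeros((1024, 1024, 3), dtype=np.uint8)  # stand-in HxWx3 RGB image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One positive click at pixel (500, 375); label 1 marks foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```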
Utilities intended for use with Llama models.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment, and Generate Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
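Open-set detection is driven by a text caption of period-separated phrases; a sketch using the repo's inference utilities, with config/weight paths and thresholds as assumptions:

```python
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # assumed local paths
    "weights/groundingdino_swint_ogc.pth",
)
image_source, image = load_image("demo.jpg")

# The caption lists the open-set categories to ground, separated by periods.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="chair . person . dog .",
    box_threshold=0.35,
    text_threshold=0.25,
)
annotated = annotate(
    image_source=image_source, boxes=boxes, logits=logits, phrases=phrases
)
cv2.imwrite("annotated.jpg", annotated)
```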
Collection of AWESOME vision-language models for vision tasks
A collection of resources and papers on Diffusion Models
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
EVA Series: Visual Representation Fantasies from BAAI
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
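A hedged example of chatting with it through transformers; the hub id and trust_remote_code requirement follow the public model card as I understand it.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM3-4B"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Summarize diffusion models in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```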
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization