Lists (28)
3D
clip
CoT
datasets
DETR
Diffusion
🔮 Future ideas
GAN
GPT
latex
Linear attention
Lora
MAE
mamba
Mixup/Cutmix
MLP
Moe
Network
NLP
Optimizers
OVSS+OVD
RNN
SAM
Semantic Segmentation
Uncertainty
Wait
work
Writing
Starred repositories
Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in captur…
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2025] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
A generative world for general-purpose robotics & embodied AI learning.
Official Implementation of Rectified Flow (ICLR 2023 Spotlight)
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
[CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
Code for Fast Training of Diffusion Models with Masked Transformers
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Efficient vision foundation models for high-resolution generation and perception.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Codebase for the paper "Elucidating the design space of language models for image generation"
Scaling Diffusion Transformers with Mixture of Experts
[ICLR 2025] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
This repo contains the code for 1D tokenizer and generator
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
[CVPR 2025] Number it: Temporal Grounding Videos like Flipping Manga
Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
Official repository for our work on micro-budget training of large-scale diffusion models.
Official PyTorch implementation of ECCV 2024 Paper: ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.
[ICLR 2025] Diffusion Feedback Helps CLIP See Better
[CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers