Stars
[NeurIPS 2024] Official PyTorch Implementation of "FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner"
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Code release for "LLMs can see and hear without any training"
Frontier Multimodal Foundation Models for Image and Video Understanding
Investigating CoT Reasoning in Autoregressive Image Generation
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
Official repository for our work on micro-budget training of large-scale diffusion models.
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.
Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).
The official implementation of Diffusion-KTO: Aligning Diffusion Models by Optimizing Human Utility
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Simple, unified interface to multiple Generative AI providers
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating NIM microservices, Milvus, and Llama-3.2 models.
ICLR 2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.