Stars
COYO-700M: Large-scale Image-Text Pair Dataset
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
🔥 Aurora Series: A more efficient multimodal large language model series for video.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and Flax.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
MINT-1T: A one trillion token multimodal interleaved dataset.
Fréchet inception distance (FID) evaluation in JAX
[ECCV 2024] Beyond MOT: Semantic Multi-Object Tracking
VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops)
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) with Stable Diffusion
[CVPR 2024 Oral] MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation.
Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[arXiv:2406.07548] Image and Video Tokenization with Binary Spherical Quantization
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
[CVPR 2024] Official implementation of "VRP-SAM: SAM with Visual Reference Prompt"
[ECCV 2024] Tokenize Anything via Prompting
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.