Stars
Code release for "Understanding Bias in Large-Scale Visual Datasets"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
llama3 implementation one matrix multiplication at a time
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
Improving Token-Based World Models with Parallel Observation Prediction (ICML 2024)
Computational photography pipeline that performs multiple inferences from any image or video.
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
Official code of paper Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
[NeurIPS 2023] FreeMask: Synthetic Images with Dense Annotations Make Stronger Segmentation Models
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Official PyTorch implementation of Video Probabilistic Diffusion Models in Projected Latent Space (CVPR 2023).
An environment based on XLA for deep learning compiler optimization research.
Stable Diffusion web UI
COYO-700M: Large-scale Image-Text Pair Dataset
High-Resolution Image Synthesis with Latent Diffusion Models
Taming Transformers for High-Resolution Image Synthesis
A comprehensive list of papers using large language/multi-modal models for robotics/RL, with links to code and related websites
Scenic: A Jax Library for Computer Vision Research and Beyond
Implementation of Imagen, Google's Text-to-Image Neural Network, in PyTorch