Stars
🌟 A curated list of DUSt3R-related papers and resources, tracking recent advancements using this geometric foundation model.
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[ICLR 2025] GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
🐍 Geometric Computer Vision Library for Spatial AI
Official implementation of "DepthMaster: Taming Diffusion Models for Monocular Depth Estimation".
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
[CVPR 2025] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
📚 Collection of awesome generation acceleration resources.
ChronoDepth: Learning Temporally Consistent Video Depth from Video Diffusion Priors
A generative world for general-purpose robotics & embodied AI learning.
[CVPR 2023] Multi-frame depth estimation in dynamic scenes. Li, Rui, et al., "Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes".
Stable Video Diffusion Training Code and Extensions.
We estimate dense, flicker-free, geometrically consistent depth from monocular video, for example hand-held cell phone video.
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
[CVPR 2021] Self-supervised depth estimation from short sequences
[CVPR 2024] SinSR: Diffusion-Based Image Super-Resolution in a Single Step
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25).
Amodal Depth Anything: Amodal Depth Estimation in the Wild
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
Code release for https://kovenyu.com/WonderWorld/
[CVPR 2025] Video Depth without Video Models
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
High-resolution models for human tasks.
Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Official inference repo for FLUX.1 models