This repository is the official implementation of the paper "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".
Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler
We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.
2023-12-08: Added - try it out with your images for free!
2023-12-05: Added - dive deeper into our inference pipeline!
2023-12-04: Added paper and inference code (this repository).
We offer a number of ways to interact with Marigold:
- A free online interactive demo is available here: (kudos to the HF team for the GPU grant)
- Run the demo locally (requires a GPU and nvidia-docker2, see Installation Guide):
  docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/toshas-marigold:latest python app.py
- If you just want to see the examples, visit our gallery:
- Finally, local development instructions are given below.
This code has been tested on:
- Python 3.10.12, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 3090
- Python 3.10.4, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 4090
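To compare a local setup against the tested configurations above, a small check along the following lines may help. This is only a sketch and not part of the repository: the helper name describe_environment is hypothetical, and it assumes PyTorch, if installed, is importable as torch.

```python
import importlib.util
import platform


def describe_environment():
    """Collect the version details that the tested configurations list."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None, "gpu": None}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter; leave the rest as None.
        return info
    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Versions that differ from the ones listed are not necessarily a problem; the list only states what the code has been tested on.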
git clone https://github.com/prs-eth/Marigold.git
cd Marigold
python -m venv venv/marigold
source venv/marigold/bin/activate
pip install -r requirements.txt
bash script/download_sample_data.sh
The inference script (run.py) will automatically download the checkpoint:
python run.py \
--input_rgb_dir data/in-the-wild_example \
--output_dir output/in-the-wild_example
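The command above can also be scripted when several image folders need processing. The sketch below is not part of the repository: build_run_command and run_all are hypothetical helpers, and the only flags they pass are the --input_rgb_dir and --output_dir options shown above. It assumes run.py is in the current working directory.

```python
import subprocess
import sys
from pathlib import Path


def build_run_command(input_dir, output_dir, script="run.py"):
    """Return the argv list for one run.py invocation."""
    return [
        sys.executable, script,
        "--input_rgb_dir", str(input_dir),
        "--output_dir", str(output_dir),
    ]


def run_all(input_root, output_root, dry_run=True):
    """Build (and optionally execute) one command per sub-folder of input_root."""
    commands = []
    for folder in sorted(Path(input_root).iterdir()):
        if not folder.is_dir():
            continue  # skip loose files next to the image folders
        cmd = build_run_command(folder, Path(output_root) / folder.name)
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if run.py exits with an error
            subprocess.run(cmd, check=True)
    return commands
```

With dry_run=True (the default here) the function only returns the commands, which makes it easy to inspect them before starting a long batch.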
- The inference script by default will resize the input images and resize the predictions back to the original resolution.
  - --resize_to_max_res: The maximum edge length of the resized input image. Default: 768.
  - --not_resize_input: If given, will not resize the input image.
  - --not_resize_output: If given, will not resize the output image back to the original resolution. Only valid without the --not_resize_input option.
- Trade-offs between accuracy and speed (for both options, larger values give more accurate results at the cost of slower inference):
  - --n_infer: Number of inference passes to be ensembled. Default: 10.
  - --denoise_steps: Number of diffusion denoising steps of each inference pass. Default: 10.
- --seed: Random seed, can be set to ensure reproducibility. Default: None (using current time as random seed).
- --depth_cmap: Colormap used to colorize the depth prediction. Default: Spectral.
- The model cache directory can be controlled by the environment variable HF_HOME, for example:
  export HF_HOME=$(pwd)/checkpoint
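How the options above interact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in run.py: the InferenceOptions class and its methods are hypothetical, the resize is assumed to preserve aspect ratio with rounded dimensions, and the cost estimate assumes runtime grows with the number of passes times the steps per pass.

```python
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InferenceOptions:
    """Hypothetical container mirroring the documented defaults."""
    resize_to_max_res: int = 768
    not_resize_input: bool = False
    not_resize_output: bool = False
    n_infer: int = 10
    denoise_steps: int = 10
    seed: Optional[int] = None

    def input_size(self, width: int, height: int) -> Tuple[int, int]:
        """Size fed to the model: the longer edge becomes resize_to_max_res."""
        if self.not_resize_input:
            return width, height
        scale = self.resize_to_max_res / max(width, height)
        # Assumption: aspect ratio is kept and dimensions are rounded.
        return max(1, round(width * scale)), max(1, round(height * scale))

    def output_size(self, width: int, height: int) -> Tuple[int, int]:
        """Size of the saved prediction for an image of the given size."""
        if self.not_resize_input or not self.not_resize_output:
            return width, height  # original resolution
        return self.input_size(width, height)

    def resolve_seed(self) -> int:
        """Use the given seed, or fall back to the current time if it is None."""
        return int(time.time()) if self.seed is None else self.seed

    def total_denoising_steps(self) -> int:
        """Rough cost proxy (assumption): passes times steps per pass."""
        return self.n_infer * self.denoise_steps


if __name__ == "__main__":
    opts = InferenceOptions()
    print(opts.input_size(1536, 1024))   # (768, 512)
    print(opts.output_size(1536, 1024))  # (1536, 1024)
    print(opts.total_denoising_steps())  # 100
```

Under these assumptions, halving either --n_infer or --denoise_steps roughly halves the work per image, which is a convenient starting point when inference is too slow.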
# Download checkpoint
bash script/download_weights.sh
python run.py \
--checkpoint checkpoint/Marigold_v1_merged \
--input_rgb_dir data/in-the-wild_example \
--output_dir output/in-the-wild_example
@misc{ke2023repurposing,
title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
year={2023},
eprint={2312.02145},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.