Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

This repository represents the official implementation of the paper titled "Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation".

Bingxin Ke, Anton Obukhov, Shengyu Huang, Nando Metzger, Rodrigo Caye Daudt, Konrad Schindler

We present Marigold, a diffusion model and associated fine-tuning protocol for monocular depth estimation. Its core principle is to leverage the rich visual knowledge stored in modern generative image models. Our model, derived from Stable Diffusion and fine-tuned with synthetic data, can zero-shot transfer to unseen data, offering state-of-the-art monocular depth estimation results.

📢 News

2023-12-08: Added a Hugging Face Space demo - try it out with your images for free!
2023-12-05: Added a Google Colab notebook - dive deeper into our inference pipeline!
2023-12-04: Added paper and inference code (this repository).

Usage

We offer a number of ways to interact with Marigold:

  1. A free online interactive demo is available as a Hugging Face Space (kudos to the HF team for the GPU grant).

  2. Run the demo locally (requires a GPU and nvidia-docker2, see the Installation Guide): docker run -it -p 7860:7860 --platform=linux/amd64 --gpus all registry.hf.space/toshas-marigold:latest python app.py

  3. An extended demo is available on Google Colab.

  4. If you just want to see the examples, visit our gallery.

  5. Finally, local development instructions are given below.

🛠️ Setup

This code has been tested on:

  • Python 3.10.12, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 3090
  • Python 3.10.4, PyTorch 2.0.1, CUDA 11.7, GeForce RTX 4090

📦 Repository

git clone https://github.com/prs-eth/Marigold.git
cd Marigold

💻 Dependencies

python -m venv venv/marigold
source venv/marigold/bin/activate
pip install -r requirements.txt
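Once the dependencies are installed, it can help to compare your environment with the tested configurations listed above. The following is a minimal sketch, not part of the repository: `describe_environment` is a hypothetical helper name, and it only reports what it finds without enforcing any version.

```python
import importlib.util
import platform


def describe_environment() -> dict:
    """Collect the Python version and, if PyTorch is installed, its build details."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None, "gpu": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment
    import torch

    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the reported versions differ from the tested ones, the code may still work, but that has not been verified here.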

🚀 Inference on in-the-wild images

📷 Sample images

bash script/download_sample_data.sh

🎮 Inference

The inference script below automatically downloads the checkpoint.

python run.py \
    --input_rgb_dir data/in-the-wild_example \
    --output_dir output/in-the-wild_example

⚙️ Inference settings

  • By default, the inference script resizes the input images before inference and resizes the predictions back to the original resolution.

    • --resize_to_max_res: The maximum edge length of the resized input image. Default: 768.
    • --not_resize_input: If given, the input image is not resized.
    • --not_resize_output: If given, the output image is not resized back to the original resolution. Only valid without the --not_resize_input option.
  • Trade-offs between accuracy and speed (for both options, a larger value gives more accurate results at the cost of slower inference):

    • --n_infer: Number of inference passes to be ensembled. Default: 10.
    • --denoise_steps: Number of diffusion denoising steps of each inference pass. Default: 10.
  • --seed: Random seed; set it to ensure reproducibility. Default: None (the current time is used as the random seed).

  • --depth_cmap: Colormap used to colorize the depth prediction. Default: Spectral.

  • The model cache directory can be controlled with the environment variable HF_HOME, for example:

    export HF_HOME=$(pwd)/checkpoint
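To make the options above concrete, here is a minimal sketch of how they could be declared and what two of them imply numerically. It is an illustration and not the actual `run.py`: the function names `build_parser`, `resized_dimensions` and `total_denoising_steps` are hypothetical, and the resize rule (scale so the longer edge equals the limit, rounding to the nearest pixel) is an assumption about what "maximum edge length" means. The real script may round differently or treat small images differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Illustrative Marigold-style CLI")
    parser.add_argument("--input_rgb_dir", type=str, required=True)
    parser.add_argument("--output_dir", type=str, required=True)
    parser.add_argument("--resize_to_max_res", type=int, default=768)
    parser.add_argument("--not_resize_input", action="store_true")
    parser.add_argument("--not_resize_output", action="store_true")
    parser.add_argument("--n_infer", type=int, default=10)
    parser.add_argument("--denoise_steps", type=int, default=10)
    parser.add_argument("--seed", type=int, default=None)
    parser.add_argument("--depth_cmap", type=str, default="Spectral")
    return parser


def resized_dimensions(width: int, height: int, max_res: int = 768) -> tuple:
    """Scale (width, height) so the longer edge equals max_res, keeping the aspect ratio.

    Assumption: images smaller than max_res are scaled up as well.
    """
    if width <= 0 or height <= 0 or max_res <= 0:
        raise ValueError("width, height and max_res must be positive")
    scale = max_res / max(width, height)
    # Round to the nearest pixel and never let an edge collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


def total_denoising_steps(args: argparse.Namespace) -> int:
    """Each of the n_infer passes runs denoise_steps steps, so cost grows with the product."""
    return args.n_infer * args.denoise_steps


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--input_rgb_dir", "data/in-the-wild_example",
         "--output_dir", "output/in-the-wild_example"]
    )
    print(resized_dimensions(1920, 1080, args.resize_to_max_res))  # (768, 432)
    print(total_denoising_steps(args))  # 100
```

With the documented defaults, one image costs 10 × 10 = 100 denoising steps, which is why lowering either --n_infer or --denoise_steps speeds up inference.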

⬇ Using local checkpoint

# Download the checkpoint
bash script/download_weights.sh

# Run inference with the local checkpoint
python run.py \
    --checkpoint checkpoint/Marigold_v1_merged \
    --input_rgb_dir data/in-the-wild_example \
    --output_dir output/in-the-wild_example
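When scripting several runs, the command above could be assembled programmatically. This is a sketch only: `build_command` is a hypothetical helper, and it assumes you want to pass `--checkpoint` only when the local checkpoint directory exists, otherwise leaving the script to download the checkpoint as described earlier.

```python
from pathlib import Path
from typing import List, Optional


def build_command(input_dir: str, output_dir: str,
                  checkpoint: Optional[str] = None) -> List[str]:
    """Assemble the run.py invocation; add --checkpoint only if that directory exists."""
    cmd = ["python", "run.py"]
    if checkpoint is not None and Path(checkpoint).is_dir():
        cmd += ["--checkpoint", checkpoint]
    cmd += ["--input_rgb_dir", input_dir, "--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "data/in-the-wild_example",
        "output/in-the-wild_example",
        checkpoint="checkpoint/Marigold_v1_merged",
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.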

🎓 Citation

@misc{ke2023repurposing,
      title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation}, 
      author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
      year={2023},
      eprint={2312.02145},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
