A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans
Annotator Repo · Starter Data · >> [Tooling+Training Repo] << · Reference Code · Project Website
This repository contains tools and code from our paper: Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021). It specifically contains utilities such as a dataloader to efficiently load the data generated by the Omnidata annotator, pretrained models, and code to train state-of-the-art models for tasks such as depth and surface normal estimation, including the first publicly available implementation of the MiDaS training code. It also contains an implementation of the 3D image refocusing augmentation introduced in the paper.
- Installation
- Pretrained models for depth and surface normal estimation
- Run our pretrained models
- MiDaS loss and training code implementation
- 3D Image refocusing augmentation
- Train state-of-the-art models on Omnidata
- Citing
You can see the complete list of required packages in requirements.txt. We recommend using a virtual environment (e.g., conda) for the installation:
```bash
conda create -n testenv -y python=3.8
source activate testenv
pip install -r requirements.txt
```
We provide our pretrained models, which achieve state-of-the-art performance on depth and surface normal estimation.
The surface normal network is based on the UNet architecture (6 down/6 up). It is trained with both angular and L1 losses on input resolutions between 256 and 512.
The depth networks have DPT-based architectures (similar to MiDaS v3.0) and are trained with the scale- and shift-invariant loss and the scale-invariant gradient matching term introduced in MiDaS, as well as a virtual normal loss. You can see a public implementation of the MiDaS loss here. We provide two pretrained depth models, DPT-hybrid and DPT-large, both with input resolution 384.
```bash
sh ./tools/download_depth_models.sh
sh ./tools/download_surface_normal_models.sh
```
These will download the pretrained models for depth and normals to a folder called `./pretrained_models`.
After downloading the pretrained models, you can run them on your own image with the following command:
```bash
python demo.py --task $TASK --img_path $PATH_TO_IMAGE_OR_FOLDER --output_path $PATH_TO_SAVE_OUTPUT
```
The `--task` flag should be either `normal` or `depth`. To run the script for a `normal` target on an example image:
```bash
python demo.py --task normal --img_path assets/test1.png --output_path assets/
```
We provide an implementation of the MiDaS loss, specifically the `ssimae` (scale- and shift-invariant MAE) loss and the scale-invariant gradient matching term, in `losses/midas_loss.py`. The MiDaS loss is useful for training depth estimation models on mixed datasets with different depth ranges and scales, similar to our dataset. An example usage is shown below:
```python
from losses.midas_loss import MidasLoss

midas_loss = MidasLoss(alpha=0.1)
# Returns the total loss plus its two components.
total_loss, ssi_mae_loss, reg_loss = midas_loss(depth_prediction, depth_gt, mask)
```
`alpha` specifies the weight of the gradient matching term in the total loss, and `mask` indicates the valid pixels of the image.
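For intuition, here is a minimal sketch of the scale- and shift-invariant MAE at the core of this loss: a per-image closed-form least-squares alignment of the prediction to the ground truth, followed by a masked MAE. This is an illustrative simplification, not the exact code in `losses/midas_loss.py` (which also includes the multi-scale gradient matching term):

```python
import torch

def align_scale_shift(pred, gt, mask):
    # Solve, per image, for scale s and shift t minimizing
    # sum_i m_i * (s * pred_i + t - gt_i)^2 via the 2x2 normal equations.
    pred, gt, m = pred.flatten(1), gt.flatten(1), mask.flatten(1).float()
    a00 = (m * pred * pred).sum(dim=1)
    a01 = (m * pred).sum(dim=1)
    a11 = m.sum(dim=1)
    b0 = (m * pred * gt).sum(dim=1)
    b1 = (m * gt).sum(dim=1)
    det = (a00 * a11 - a01 * a01).clamp(min=1e-6)
    s = (a11 * b0 - a01 * b1) / det
    t = (a00 * b1 - a01 * b0) / det
    return s, t

def ssi_mae(pred, gt, mask):
    # Mean absolute error after per-image scale/shift alignment, over valid pixels.
    s, t = align_scale_shift(pred, gt, mask)
    aligned = s.unsqueeze(1) * pred.flatten(1) + t.unsqueeze(1)
    m = mask.flatten(1).float()
    return ((aligned - gt.flatten(1)).abs() * m).sum() / m.sum().clamp(min=1.0)
```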
Mid-level cues can be used for data augmentation in addition to serving as training targets. The availability of full scene geometry in our dataset makes it possible to perform image refocusing as a 3D data augmentation. You can find an implementation of this augmentation in `data/refocus_augmentation.py`, and you can run it on some sample images from our dataset with the following command:
```bash
python demo_refocus.py --input_path assets/demo_refocus/ --output_path assets/demo_refocus
```
This will refocus the RGB images by blurring them according to `depth_euclidean` for each image. You can control the augmentation with the following flags: `--num_quantiles` (number of quantiles to use in the blur stack), `--min_aperture` (smallest aperture to use), and `--max_aperture` (largest aperture to use). The aperture size is sampled log-uniformly from the range between the min and max apertures.
(Example outputs: shallow focus, mid focus, and far focus.)
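For intuition, the mechanism can be sketched as follows: quantize the depth map into quantile bins, blur each bin in proportion to its defocus relative to a randomly chosen focal plane, and composite the results. This is a simplified illustration assuming positive euclidean depth; the `blur_scale` constant and the Gaussian kernel are made up here, and the exact logic lives in `data/refocus_augmentation.py`:

```python
import math, random
import torch
import torch.nn.functional as F

def gaussian_blur(img, sigma):
    # Separable Gaussian blur; img: (1, C, H, W).
    if sigma < 1e-3:
        return img
    k = int(2 * math.ceil(3 * sigma) + 1)
    x = torch.arange(k, dtype=img.dtype, device=img.device) - k // 2
    g = torch.exp(-0.5 * (x / sigma) ** 2)
    g = g / g.sum()
    c = img.shape[1]
    img = F.conv2d(img, g.view(1, 1, 1, k).expand(c, 1, 1, k), padding=(0, k // 2), groups=c)
    img = F.conv2d(img, g.view(1, 1, k, 1).expand(c, 1, k, 1), padding=(k // 2, 0), groups=c)
    return img

def refocus(rgb, depth, num_quantiles=8, min_aperture=0.01, max_aperture=1.0, blur_scale=50.0):
    # rgb: (1, 3, H, W); depth: (1, 1, H, W) euclidean depth, assumed > 0.
    depth = depth.clamp(min=1e-3)
    # Aperture sampled log-uniformly between min and max, as described above.
    aperture = math.exp(random.uniform(math.log(min_aperture), math.log(max_aperture)))
    focus = depth.flatten()[random.randrange(depth.numel())]  # random focal plane
    edges = torch.quantile(depth.flatten(),
                           torch.linspace(0, 1, num_quantiles + 1, device=depth.device))
    out = torch.zeros_like(rgb)
    for i in range(num_quantiles):
        in_bin = (depth >= edges[i]) & (depth <= edges[i + 1])
        mid = 0.5 * (edges[i] + edges[i + 1])
        # Blur grows with aperture and with defocus |1/focus - 1/mid|.
        sigma = aperture * blur_scale * abs(1.0 / focus.item() - 1.0 / mid.item())
        out = torch.where(in_bin, gaussian_blur(rgb, sigma), out)
    return out
```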
Omnidata makes it possible to train state-of-the-art models for different vision tasks. Here, we provide the code for training our depth and surface normal estimation models. You can train the models with the following commands:
We train DPT-based models on Omnidata using three different losses: the scale- and shift-invariant loss and the scale-invariant gradient matching term introduced in MiDaS, and a virtual normal loss introduced here.
```bash
python train_depth.py --config_file config/depth.yml --experiment_name rgb2depth --val_check_interval 3000 --limit_val_batches 100 --max_epochs 10
```
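For reference, here is a minimal sketch of the virtual normal idea: unproject depths to 3D points with known pinhole intrinsics (`fx, fy, cx, cy` are assumed inputs here), sample random pixel triplets to form "virtual" planes, and penalize the disagreement between predicted and ground-truth plane normals. The published method additionally filters near-colinear and invalid triplets, which this sketch omits:

```python
import torch
import torch.nn.functional as F

def unproject(depth, fx, fy, cx, cy):
    # depth: (B, 1, H, W) -> 3D points: (B, H*W, 3), pinhole camera model.
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h, device=depth.device),
                          torch.arange(w, device=depth.device), indexing="ij")
    z = depth.view(b, -1)
    x = (u.flatten().float() - cx) / fx * z
    y = (v.flatten().float() - cy) / fy * z
    return torch.stack([x, y, z], dim=-1)

def virtual_normal_loss(pred_depth, gt_depth, fx, fy, cx, cy, num_triplets=2000):
    p_pred = unproject(pred_depth, fx, fy, cx, cy)
    p_gt = unproject(gt_depth, fx, fy, cx, cy)
    b, n, _ = p_gt.shape
    # Random pixel triplets define virtual planes in 3D.
    idx = torch.randint(n, (b, num_triplets, 3), device=p_gt.device)

    def plane_normals(pts):
        a = torch.gather(pts, 1, idx[..., 0:1].expand(-1, -1, 3))
        b_ = torch.gather(pts, 1, idx[..., 1:2].expand(-1, -1, 3))
        c = torch.gather(pts, 1, idx[..., 2:3].expand(-1, -1, 3))
        return F.normalize(torch.cross(b_ - a, c - a, dim=-1), dim=-1, eps=1e-6)

    # Penalize disagreement between predicted and ground-truth plane normals.
    return (plane_normals(p_pred) - plane_normals(p_gt)).abs().mean()
```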
We train a UNet architecture (6 down/6 up) for surface normal estimation using an L1 loss and a cosine angular loss.
```bash
python train_normal.py --config_file config/normal.yml --experiment_name rgb2normal --val_check_interval 3000 --limit_val_batches 100 --max_epochs 10
```
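For reference, a minimal sketch of such a combined L1 + angular objective, assuming unit-normalized normal maps and a validity mask (the exact weighting used in training may differ):

```python
import torch
import torch.nn.functional as F

def normal_loss(pred, gt, mask, angular_weight=1.0):
    # pred, gt: (B, 3, H, W) unit-normalized normal maps; mask: (B, 1, H, W).
    m = mask.float()
    # Masked per-channel L1 term.
    l1 = ((pred - gt).abs() * m).sum() / (3 * m.sum().clamp(min=1.0))
    # Angular term: 1 - cos(theta) between predicted and ground-truth normals.
    cos = F.cosine_similarity(pred, gt, dim=1, eps=1e-6).unsqueeze(1)
    angular = ((1.0 - cos) * m).sum() / m.sum().clamp(min=1.0)
    return l1 + angular_weight * angular
```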