DirectVoxGO (Direct Voxel Grid Optimization, see our paper) reconstructs a scene representation from a set of calibrated images capturing the scene.
- NeRF-comparable quality for synthesizing novel views from our scene representation.
- Super-fast convergence: our `15 mins/scene` vs. NeRF's `10~20+ hrs/scene`.
- No cross-scene pre-training required: we optimize each scene from scratch.
- Better rendering speed: our `<1 sec` vs. NeRF's `29 secs` to synthesize an `800x800` image.
The run-times (mm:ss) of the optimization progress shown below are measured on a machine with a single RTX 2080 Ti GPU.
(Teaser video: `github_teaser.mp4`.)
- 2021.11.23: Support CO3D dataset.
- 2021.11.23: Initial release.
```
git clone git@github.com:sunset1995/DirectVoxGO.git
cd DirectVoxGO
pip install -r requirements.txt
```
PyTorch installation is machine dependent; please install the correct version for your machine. The tested version is PyTorch 1.8.1 with Python 3.7.4.
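As a quick sanity check of the installation (a minimal sketch, not part of this repo), you can confirm the PyTorch version and that a CUDA GPU is visible:

```python
# Minimal sanity check (not part of the repo): confirm the PyTorch version
# and that a CUDA-capable GPU is visible to this environment.
import torch

print('PyTorch version :', torch.__version__)        # tested version: 1.8.1
print('CUDA available  :', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU             :', torch.cuda.get_device_name(0))
```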
Dependencies (click to expand)
- `PyTorch`, `numpy`: main computation.
- `scipy`, `lpips`: SSIM and LPIPS evaluation.
- `tqdm`: progress bar.
- `mmcv`: config system.
- `opencv-python`: image processing.
- `imageio`, `imageio-ffmpeg`: images and videos I/O.
Directory structure for the datasets (click to expand; only list used files)
```
data
├── nerf_synthetic     # Link: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1
│   └── [chair|drums|ficus|hotdog|lego|materials|mic|ship]
│       ├── [train|val|test]
│       │   └── r_*.png
│       └── transforms_[train|val|test].json
│
├── Synthetic_NSVF     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/Synthetic_NSVF.zip
│   └── [Bike|Lifestyle|Palace|Robot|Spaceship|Steamtrain|Toad|Wineholder]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0_train|1_val|2_test]_*.png
│       └── pose
│           └── [0_train|1_val|2_test]_*.txt
│
├── BlendedMVS         # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/BlendedMVS.zip
│   └── [Character|Fountain|Jade|Statues]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── TanksAndTemple     # Link: https://dl.fbaipublicfiles.com/nsvf/dataset/TanksAndTemple.zip
│   └── [Barn|Caterpillar|Family|Ignatius|Truck]
│       ├── intrinsics.txt
│       ├── rgb
│       │   └── [0|1|2]_*.png
│       └── pose
│           └── [0|1|2]_*.txt
│
├── deepvoxels         # Link: https://drive.google.com/drive/folders/1ScsRlnzy9Bd_n-xw83SP-0t548v63mPH
│   └── [train|validation|test]
│       └── [armchair|cube|greek|vase]
│           ├── intrinsics.txt
│           ├── rgb/*.png
│           └── pose/*.txt
│
└── co3d               # Link: https://github.com/facebookresearch/co3d
    └── [donut|teddybear|umbrella|...]
        ├── frame_annotations.jgz
        ├── set_lists.json
        └── [129_14950_29917|189_20376_35616|...]
            ├── images
            │   └── frame*.jpg
            └── masks
                └── frame*.png
```
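Once the datasets (download links below) are in place, you can sanity-check the layout with a script like the following hypothetical helper (not shipped with the repo), which simply counts the image files per scene under `data/`:

```python
# Hypothetical helper (not part of the repo): list each dataset/scene folder
# under ./data and count the image files it contains.
import os
from glob import glob

DATA_ROOT = './data'

for dataset in sorted(os.listdir(DATA_ROOT)):
    dataset_dir = os.path.join(DATA_ROOT, dataset)
    if not os.path.isdir(dataset_dir):
        continue
    for scene in sorted(os.listdir(dataset_dir)):
        scene_dir = os.path.join(dataset_dir, scene)
        if not os.path.isdir(scene_dir):
            continue
        n_images = sum(
            len(glob(os.path.join(scene_dir, '**', ext), recursive=True))
            for ext in ('*.png', '*.jpg'))
        print(f'{dataset}/{scene}: {n_images} images')
```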
We use the datasets organized by NeRF, NSVF, and DeepVoxels. Download links:
- Synthetic-NeRF dataset: manually extract `nerf_synthetic.zip` to `data/`.
- Synthetic-NSVF dataset: manually extract `Synthetic_NSVF.zip` to `data/`.
- BlendedMVS dataset: manually extract `BlendedMVS.zip` to `data/`.
- Tanks&Temples dataset: manually extract `TanksAndTemple.zip` to `data/`.
- DeepVoxels dataset: manually extract `synthetic_scenes.zip` to `data/deepvoxels/`.
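If you prefer to script the extraction, a sketch like the one below (assuming the archives have already been downloaded into the working directory) unpacks each archive to the location expected above:

```python
# Sketch (assumes the zip files are already downloaded to the working directory):
# extract each archive into the directory the configs expect.
import zipfile

archives = {
    'nerf_synthetic.zip':   'data/',
    'Synthetic_NSVF.zip':   'data/',
    'BlendedMVS.zip':       'data/',
    'TanksAndTemple.zip':   'data/',
    'synthetic_scenes.zip': 'data/deepvoxels/',
}

for zip_name, target_dir in archives.items():
    with zipfile.ZipFile(zip_name) as zf:
        zf.extractall(target_dir)
    print(f'extracted {zip_name} -> {target_dir}')
```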
Download all our trained models and rendered test views at this link to our logs.
We also support the recent Common Objects In 3D dataset. Note that our method only performs per-scene reconstruction; it does not generalize across scenes.
To train the `lego` scene and evaluate test-set PSNR at the end of training, run:
```
$ python run.py --config configs/nerf/lego.py --render_test
```
Use `--i_print` and `--i_weights` to change the log interval.
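To train all Synthetic-NeRF scenes in one go, a small driver script such as the following sketch (not provided by the repo) simply invokes `run.py` once per config file:

```python
# Sketch (not part of the repo): train and test-render every Synthetic-NeRF
# scene by calling run.py once per config file.
import glob
import subprocess

for config in sorted(glob.glob('configs/nerf/*.py')):
    print(f'=== {config} ===')
    subprocess.run(
        ['python', 'run.py', '--config', config, '--render_test'],
        check=True)
```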
To only evaluate the test-set PSNR, SSIM, and LPIPS of the trained `lego` without re-training, run:
```
$ python run.py --config configs/nerf/lego.py --render_only --render_test \
    --eval_ssim --eval_lpips_vgg
```
Use `--eval_lpips_alex` to evaluate LPIPS with a pre-trained AlexNet instead of the VGG net.
All config files to reproduce our results:
```
$ ls configs/*
configs/blendedmvs:
Character.py  Fountain.py  Jade.py  Statues.py

configs/nerf:
chair.py  drums.py  ficus.py  hotdog.py  lego.py  materials.py  mic.py  ship.py

configs/nsvf:
Bike.py  Lifestyle.py  Palace.py  Robot.py  Spaceship.py  Steamtrain.py  Toad.py  Wineholder.py

configs/tankstemple:
Barn.py  Caterpillar.py  Family.py  Ignatius.py  Truck.py

configs/deepvoxels:
armchair.py  cube.py  greek.py  vase.py
```
Check the comments in `configs/default.py` for the configurable settings. The default values reproduce our main setup reported in our paper.
We use `mmcv`'s config system. To create a new config, please inherit `configs/default.py` first and then update the fields you want. Below is an example from `configs/blendedmvs/Character.py`:
```python
_base_ = '../default.py'

expname = 'dvgo_Character'
basedir = './logs/blended_mvs'

data = dict(
    datadir='./data/BlendedMVS/Character/',
    dataset_type='blendedmvs',
    inverse_y=True,
    white_bkgd=True,
)
```
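For your own capture, a new config can follow the same pattern. The sketch below uses hypothetical names and paths (`dvgo_my_scene`, `./data/my_dataset/my_scene/`); pick the `dataset_type` and coordinate-convention flags that match your data:

```python
# Hypothetical config sketch: inherit the defaults and override only what differs.
_base_ = '../default.py'

expname = 'dvgo_my_scene'           # hypothetical experiment name
basedir = './logs/my_dataset'       # where logs, checkpoints, and renders go

data = dict(
    datadir='./data/my_dataset/my_scene/',  # hypothetical path to your scene
    dataset_type='blendedmvs',              # choose the loader matching your data format
    inverse_y=True,                         # camera-convention flag; adjust for your data
    white_bkgd=True,                        # composite onto a white background
)
```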
We recommend adjusting the data-related config fields to fit your camera coordinate system before implementing a new one. We provide two visualization tools for debugging.
- Inspect the cameras and the allocated BBox.
  - Export via `--export_bbox_and_cams_only {filename}.npz`:
    `python run.py --config configs/nerf/mic.py --export_bbox_and_cams_only cam_mic.npz`
  - Visualize the result:
    `python tools/vis_train.py cam_mic.npz`
- Inspect the learned geometry after coarse optimization.
  - Export via `--export_coarse_only {filename}.npz` (assuming `coarse_last.tar` is available in the train log):
    `python run.py --config configs/nerf/mic.py --export_coarse_only coarse_mic.npz`
  - Visualize the result:
    `python tools/vis_volume.py coarse_mic.npz 0.001 --cam cam_mic.npz`
(Example visualizations: inspecting the cameras & BBox | inspecting the learned coarse volume.)
We have reported some ablation experiments in our paper's supplementary material. Setting `N_iters`, `N_rand`, `num_voxels`, `rgbnet_depth`, or `rgbnet_width` to larger values, or setting `stepsize` to a smaller value, typically leads to better quality but needs more computation. Only `stepsize` is tunable at test time; all the other fields should remain the same as in training.
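As a rough illustration of this trade-off, a quality-oriented override could look like the sketch below. The grouping of fields into `fine_train` and `fine_model_and_render` is an assumption on our part, so check `configs/default.py` for the authoritative nesting and default values:

```python
# Sketch of a "higher quality, slower" config override. The field grouping below
# (fine_train / fine_model_and_render) is assumed; verify against configs/default.py.
_base_ = './default.py'

fine_train = dict(
    N_iters=40000,        # more optimization iterations
    N_rand=8192,          # more rays per batch (needs more GPU memory)
)

fine_model_and_render = dict(
    num_voxels=200**3,    # denser voxel grid
    rgbnet_depth=4,       # deeper tiny MLP for view-dependent color
    rgbnet_width=192,     # wider tiny MLP
    stepsize=0.3,         # smaller ray-marching step, i.e. more samples per ray
)
```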
Plenoxels also directly optimizes voxel grids and achieves super-fast convergence. It uses sparse voxel grids, which require a custom CUDA implementation, and models view-dependent RGB with spherical harmonics instead of MLPs. Some of its components could be adapted to our code in future extensions:
- Total variation (TV) and Cauchy sparsity regularizers.
- Using NDC to extend to forward-facing data.
- Using MSI to extend to unbounded inward-facing 360° data.
- Replacing the current local-feature-conditioned tiny MLP with spherical harmonic coefficients.
VaxNeRF uses the visual hull to speed up NeRF training; only about 30 lines of modification to an existing NeRF code base are required.
The code base originated from an awesome nerf-pytorch implementation, but it has since become very different.