forked from autonomousvision/sdfstudio
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
9415c2d
commit 3c1ca94
Showing 2 changed files with 296 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
# Data | ||
|
||
This is a short guide to the sdfstudio data format. It is organized as follows: | ||
|
||
- [Dataset format](#Dataset-format) | ||
- [Existing dataset](#Existing-dataset) | ||
- [Custom dataset](#Custom-dataset) | ||
|
||
# Dataset format | ||
We use scan65 of the DTU dataset to show how the data are organized. It looks like the following: | ||
```bash | ||
└── scan65 | ||
    ├── meta_data.json | ||
    ├── pairs.txt | ||
    ├── 000000_rgb.png | ||
    ├── 000000_normal.npy | ||
    ├── 000000_depth.npy | ||
    └── ..... | ||
``` | ||
The JSON file (meta_data.json) stores the metadata of the scene. It has the following format: | ||
```yaml | ||
{ | ||
"camera_model": "OPENCV", # camera model, currently only opencv is supported | ||
"height": 384, # height of the images | ||
"width": 384, # width of the images | ||
"has_mono_prior": true, # contains monocular prior or not | ||
"pairs": "pairs.txt", # pairs file used for multi-view photometric consistency loss | ||
"worldtogt": [[ 1, 0, 0, 0], # world to gt transformation, useful for evaluation | ||
[ 0, 1, 0, 0], | ||
[ 0, 0, 1, 0], | ||
[ 0, 0, 0, 1]], | ||
"scene_box": { | ||
"aabb": [[-1, -1, -1], # aabb for the bbox | ||
[1, 1, 1]], | ||
"near": 0.5, # near plane for each image | ||
"far": 4.5, # far plane for each image | ||
"radius": 1.0, # radius of ROI region in scene | ||
"collider_type": "near_far" | ||
# collider_type can be "near_far", "box", "sphere", | ||
# it indicates how we determine the near and far values for each ray | ||
# 1. near_far means we use the same near and far value for each ray | ||
# 2. box means we compute the intersection with bbox | ||
# 3. sphere means we compute the intersection with sphere | ||
}, | ||
"frames": [ # this contains information for each image | ||
{ | ||
# note all paths are relative paths | ||
# path of rgb image | ||
"rgb_path": "000000_rgb.png", | ||
# camera to world transform | ||
"camtoworld": [[0.9702627062797546, -0.014742869883775711, -0.2416049987077713, 0.6601868867874146], | ||
[0.007479910273104906, 0.9994929432868958, -0.03095100075006485, 0.07803472131490707], | ||
[0.2419387847185135, 0.028223417699337006, 0.9698809385299683, -2.6397712230682373], | ||
[0.0, 0.0, 0.0, 1.0 ]], | ||
# intrinsics of the current image | ||
"intrinsics": [[925.5457763671875, -7.8512319305446e-05, 199.4256591796875, 0.0], | ||
[0.0, 922.6160278320312, 198.10269165039062, 0.0 ], | ||
[0.0, 0.0, 1.0, 0.0 ], | ||
[0.0, 0.0, 0.0, 1.0 ]], | ||
# path of monocular depth prior | ||
"mono_depth_path": "000000_depth.npy", | ||
# path of monocular normal prior | ||
"mono_normal_path": "000000_normal.npy" | ||
}, | ||
... | ||
] | ||
} | ||
``` | ||
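As a rough illustration of the layout above, the following sketch loads a small metadata file and runs a few consistency checks. It assumes the `#` comments in the listing are annotations only and are not present in the real file. The shortened `META_JSON` content (identity poses, rounded intrinsics) and the helpers `is_4x4` and `check_meta` are hypothetical and not part of sdfstudio.

```python
import json

# Hypothetical, shortened meta_data.json content (identity poses, rounded intrinsics).
META_JSON = """
{
  "camera_model": "OPENCV",
  "height": 384,
  "width": 384,
  "has_mono_prior": true,
  "pairs": "pairs.txt",
  "worldtogt": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
  "scene_box": {
    "aabb": [[-1, -1, -1], [1, 1, 1]],
    "near": 0.5,
    "far": 4.5,
    "radius": 1.0,
    "collider_type": "near_far"
  },
  "frames": [
    {
      "rgb_path": "000000_rgb.png",
      "camtoworld": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
      "intrinsics": [[925.5, 0, 199.4, 0], [0, 922.6, 198.1, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
      "mono_depth_path": "000000_depth.npy",
      "mono_normal_path": "000000_normal.npy"
    }
  ]
}
"""

VALID_COLLIDERS = ("near_far", "box", "sphere")


def is_4x4(matrix):
    """Return True if matrix is a list of 4 rows with 4 entries each."""
    return len(matrix) == 4 and all(len(row) == 4 for row in matrix)


def check_meta(meta):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    collider = meta["scene_box"]["collider_type"]
    if collider not in VALID_COLLIDERS:
        problems.append(f"unknown collider_type: {collider}")
    if not is_4x4(meta["worldtogt"]):
        problems.append("worldtogt is not 4x4")
    for index, frame in enumerate(meta["frames"]):
        for key in ("camtoworld", "intrinsics"):
            if not is_4x4(frame[key]):
                problems.append(f"frame {index}: {key} is not 4x4")
        # Mono prior paths are only expected when has_mono_prior is set.
        if meta["has_mono_prior"]:
            for key in ("mono_depth_path", "mono_normal_path"):
                if key not in frame:
                    problems.append(f"frame {index}: missing {key}")
    return problems


meta = json.loads(META_JSON)
print(check_meta(meta))
```

For the sample above, the list of problems is empty. A check like this can be useful after converting a custom dataset, before starting a training run.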
|
||
The pairs.txt file is used for the multi-view photometric consistency loss. It has the following format: | ||
```bash | ||
# ref image, source image 1, source image 2, ..., source image N | ||
000000.png 000032.png 000023.png 000028.png 000031.png 000029.png 000030.png 000024.png 000002.png 000015.png 000025.png ... | ||
000001.png 000033.png 000003.png 000022.png 000016.png 000027.png 000023.png 000007.png 000011.png 000026.png 000024.png ... | ||
... | ||
``` | ||
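The pairs format can be read with a few lines of Python. This is a minimal sketch, assuming one reference image per line followed by whitespace-separated source images; `parse_pairs` and the shortened `SAMPLE` text are hypothetical and not part of sdfstudio.

```python
def parse_pairs(text):
    """Map each reference image name to its list of source image names."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and comment lines.
        if not line or line.startswith("#"):
            continue
        names = line.split()
        pairs[names[0]] = names[1:]
    return pairs


# Shortened sample in the format shown above.
SAMPLE = """\
# ref image, source image 1, source image 2, ..., source image N
000000.png 000032.png 000023.png 000028.png
000001.png 000033.png 000003.png 000022.png
"""

pairs = parse_pairs(SAMPLE)
print(pairs["000000.png"])
```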
# Existing dataset | ||
|
||
We adapted the datasets used in MonoSDF to the sdfstudio format. They can be downloaded with: | ||
``` | ||
ns-download-data sdfstudio --dataset-name DATASET_NAME | ||
``` | ||
`DATASET_NAME` can be one of `sdfstudio-demo-data, dtu, replica, scannet, tanks-and-temple, tanks-and-temple-highres, all`. Use `all` if you want to download all the datasets. | ||
|
||
Note that for the DTU dataset, you should use `--pipeline.model.sdf-field.inside-outside False`, and for the indoor datasets you should use `--pipeline.model.sdf-field.inside-outside True` during training. | ||
|
||
We also provide the preprocessed heritage data from NeuralReconW, which can be downloaded with: | ||
``` | ||
ns-download-data sdfstudio --dataset-name heritage | ||
``` | ||
|
||
# Custom dataset | ||
|
||
You could implement your own data parser to use a custom dataset, or convert your dataset to the sdfstudio data format shown above. Here we provide an example that converts a ScanNet scene to the sdfstudio data format. Please change the paths accordingly. | ||
```bash | ||
python scripts/datasets/process_scannet_to_sdfstudio.py --input_path /home/yuzh/Projects/datasets/scannet/scene0050_00 --output_path data/custom/scannet_scene0050_00 | ||
``` | ||
|
||
Then, you can extract monocular depths and normals (please install [omnidata model](https://github.com/EPFL-VILAB/omnidata) before running the command): | ||
```bash | ||
python scripts/datasets/extract_monocular_cues.py --task normal --img_path data/custom/scannet_scene0050_00/ --output_path data/custom/scannet_scene0050_00 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS | ||
python scripts/datasets/extract_monocular_cues.py --task depth --img_path data/custom/scannet_scene0050_00/ --output_path data/custom/scannet_scene0050_00 --omnidata_path YOUR_OMNIDATA_PATH --pretrained_models PRETRAINED_MODELS | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
# Documentation | ||
|
||
This is a short guide to sdfstudio. It is organized as follows: | ||
|
||
- [Methods](#Methods) | ||
- [Representations](#Representations) | ||
- [Supervisions](#Supervisions) | ||
|
||
# Methods | ||
|
||
We have implemented multiple neural implicit surface reconstruction methods. The three basic methods are UniSurf, VolSDF, and NeuS. The main differences between these methods are how points along the ray are sampled and how the SDF is used for volume rendering. For details of these methods, please check the corresponding papers. Below we briefly explain each method and give examples of how to use it: | ||
|
||
## UniSurf | ||
|
||
UniSurf first finds the intersection with the surface and samples points around the surface. The sampling range starts large and decreases during training. When no surface is found for a ray, it samples uniformly between the near and far values of the ray. To train a UniSurf model, you could run: | ||
``` | ||
ns-train unisurf --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/sdfstudio-demo-data/dtu-scan65 | ||
``` | ||
|
||
## VolSDF | ||
VolSDF uses an error-bounded sampler [see the paper for details], converts the SDF value to density, and then uses standard volume rendering. To train a VolSDF model, you could run: | ||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/sdfstudio-demo-data/dtu-scan65 | ||
``` | ||
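The SDF-to-density conversion mentioned above can be sketched as follows. This is a minimal illustration of the Laplace-CDF formulation described in the VolSDF paper, assuming `alpha = 1 / beta`. It is not the sdfstudio implementation, where `beta` is typically a learned parameter, and `laplace_density` is a hypothetical helper name.

```python
import math


def laplace_density(sdf, beta):
    """Density = alpha * Psi_beta(-sdf), with Psi_beta the CDF of a zero-mean Laplace distribution."""
    alpha = 1.0 / beta  # assumption: alpha tied to beta
    if sdf >= 0:
        # Outside the surface: density decays exponentially with distance.
        return alpha * 0.5 * math.exp(-sdf / beta)
    # Inside the surface: density saturates towards alpha.
    return alpha * (1.0 - 0.5 * math.exp(sdf / beta))


for sdf in (-0.5, 0.0, 0.5):
    print(sdf, laplace_density(sdf, beta=0.1))
```

A smaller `beta` concentrates the density more sharply around the zero level set of the SDF.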
|
||
## NeuS | ||
NeuS uses hierarchical sampling with multiple steps and converts the SDF value to an alpha value based on a sigmoid function [see the paper for details]. To train a NeuS model, you could run: | ||
|
||
``` | ||
ns-train neus --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/sdfstudio-demo-data/dtu-scan65 | ||
``` | ||
|
||
## MonoSDF | ||
MonoSDF is built on top of VolSDF and proposes to use monocular priors as additional supervision. | ||
|
||
To train a MonoSDF model on an indoor scene, you could run: | ||
``` | ||
ns-train monosdf --pipeline.model.sdf-field.inside-outside True sdfstudio-data --data data/sdfstudio-demo-data/replica-room0 --include-mono-prior True | ||
``` | ||
|
||
## Mono-unisurf | ||
Similar to MonoSDF, Mono-unisurf uses monocular priors as additional supervision for UniSurf. It can be trained as: | ||
``` | ||
ns-train mono-unisurf --pipeline.model.sdf-field.inside-outside True sdfstudio-data --data data/sdfstudio-demo-data/replica-room0 --include-mono-prior True | ||
``` | ||
|
||
## Mono-neus | ||
Similar to MonoSDF, Mono-neus uses monocular priors as additional supervision for NeuS. It can be trained as: | ||
``` | ||
ns-train mono-neus --pipeline.model.sdf-field.inside-outside True sdfstudio-data --data data/sdfstudio-demo-data/replica-room0 --include-mono-prior True | ||
``` | ||
|
||
## Geo-NeuS | ||
|
||
``` | ||
ns-train geo-neus --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/dtu/scan24 --load-pairs True | ||
``` | ||
|
||
## Geo-unisurf | ||
The idea of Geo-NeuS can also be applied to UniSurf, which we call Geo-unisurf. It can be run as: | ||
``` | ||
ns-train geo-unisurf --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/dtu/scan24 --load-pairs True | ||
``` | ||
|
||
## Geo-VolSDF | ||
Similarly, we apply the idea of Geo-NeuS to VolSDF. It can be run as: | ||
``` | ||
ns-train geo-volsdf --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/dtu/scan24 --load-pairs True | ||
``` | ||
|
||
## NeuS-acc | ||
NeuS-acc maintains an occupancy grid for empty-space skipping when sampling points along the ray. It significantly reduces the number of samples used in training and thus speeds up training. It can be trained as: | ||
``` | ||
ns-train neus-acc --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/dtu/scan65 | ||
``` | ||
|
||
## NeuS-facto | ||
NeuS-facto is inspired by nerfacto in nerfstudio, where a proposal network, as proposed in mip-NeRF 360, is used for sampling points along the ray. We apply this idea to NeuS to speed up the sampling process and reduce the number of samples per ray. It can be trained as: | ||
``` | ||
ns-train neus-facto --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data data/dtu/scan65 | ||
``` | ||
|
||
## NeuralReconW | ||
|
||
NeuralReconW is specifically designed for heritage scenes and hence can only be applied to these scenes. It uses a sparse point cloud to create a coarse occupancy grid. For each ray, it first finds the intersection with the occupancy grid to determine the near and far values for the ray, then samples points uniformly within that range. It also uses surface-guided sampling, where it first finds the intersection with the surface. To speed up the sampling, it uses a more fine-grained SDF cache so that it does not need to query the network to find the intersection. The SDF cache is updated during training (e.g. every 5K iterations). To train a NeuralReconW model, you could run the following: | ||
|
||
``` | ||
ns-train neusW --pipeline.model.sdf-field.inside-outside False heritage-data --data data/heritage/brandenburg_gate | ||
``` | ||
|
||
# Representations | ||
|
||
The neural representation contains two parts: a geometric network and a color network. The geometric network takes a 3D position as input and outputs an SDF value, a normal vector, and a geometric feature vector. The color network takes a 3D position and view direction, together with the normal vector and the geometric feature vector from the geometric network, as inputs and outputs an RGB color vector. | ||
|
||
We support three representations for the geometric network: MLPs, multi-res feature grids, and tri-planes. We explain the details and how to use them below: | ||
|
||
## MLPs | ||
|
||
The 3D position is encoded with positional encoding as in NeRF and passed to a multi-layer perceptron (MLP) network to predict the SDF, normal, and geometric feature. For example, to use an MLP with 8 layers and 512 hidden dimensions, you could run: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.use-grid-feature False sdfstudio-data --data YOUR_DATA | ||
``` | ||
|
||
## Multi-res feature grids | ||
|
||
The 3D position is first mapped to multi-resolution feature grids, and tri-linear interpolation is used to retrieve the corresponding feature vector. This feature vector is then used as input to an MLP to predict the SDF, normal, and geometric feature. For example, to use multi-res feature grids with 2 layers and 256 hidden dimensions, you could run: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.use-grid-feature True --pipeline.model.sdf-field.encoding-type hash sdfstudio-data --data YOUR_DATA | ||
``` | ||
|
||
## Tri-plane | ||
|
||
The 3D position is first mapped onto three orthogonal planes, and linear interpolation is used to retrieve a feature vector for each plane. The feature vectors are then concatenated and used as input to the MLP. To use tri-plane, you could configure it as: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.use-grid-feature True --pipeline.model.sdf-field.encoding-type tri-plane sdfstudio-data --data YOUR_DATA | ||
``` | ||
|
||
## Geometric initialization | ||
|
||
Good initialization is important to get good results, so we usually initialize the SDF as a sphere. For example, for the DTU dataset, we usually initialize with: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.geometric-init True --pipeline.model.sdf-field.bias 0.5 --pipeline.model.sdf-field.inside-outside False | ||
``` | ||
|
||
And for indoor scenes we use: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.geometric-init True --pipeline.model.sdf-field.bias 0.8 --pipeline.model.sdf-field.inside-outside True | ||
``` | ||
|
||
Note that in indoor scenes, the cameras are inside the sphere, so we set inside-outside to True such that points inside the sphere have a positive SDF value and points outside the sphere have a negative value. | ||
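The sign convention can be illustrated with an analytic sphere. This is a minimal sketch, assuming the `bias` option plays the role of the initial sphere radius. `sphere_sdf` is a hypothetical helper and only mimics the target of the initialization, not the network itself.

```python
import math


def sphere_sdf(point, radius, inside_outside):
    """Signed distance to a sphere centered at the origin.

    inside_outside=False: negative inside, positive outside (object-centric scenes).
    inside_outside=True:  positive inside, negative outside (cameras inside the sphere).
    """
    distance = math.sqrt(sum(c * c for c in point))
    sdf = distance - radius
    return -sdf if inside_outside else sdf


center = (0.0, 0.0, 0.0)
far_point = (2.0, 0.0, 0.0)
print(sphere_sdf(center, 0.5, inside_outside=False), sphere_sdf(far_point, 0.5, inside_outside=False))
print(sphere_sdf(center, 0.8, inside_outside=True), sphere_sdf(far_point, 0.8, inside_outside=True))
```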
|
||
## Color network | ||
|
||
The color network is an MLP, similar to the geometric network. It can be configured as: | ||
|
||
``` | ||
ns-train volsdf --pipeline.model.sdf-field.num-layers-color 2 --pipeline.model.sdf-field.hidden-dim-color 512 | ||
``` | ||
|
||
# Supervisions | ||
|
||
## RGB Loss | ||
|
||
We use L1 loss for the RGB loss to supervise the rendered color for each ray. It is always used for all models. | ||
|
||
## Mask Loss | ||
|
||
The mask loss is usually helpful to separate the foreground object from the background. However, it needs additional inputs. For example, in NeuralReconW, a segmentation network is used to predict the sky region, and the sky segmentation is used as the label for the mask loss. It is used by default if masks are provided in the dataset. You could change the weight of the mask loss with: | ||
``` | ||
--pipeline.model.fg-mask-loss-mult 0.001 | ||
``` | ||
|
||
## Eikonal loss | ||
|
||
Eikonal loss is used in all SDF-based methods to regularize the SDF field. The exception is UniSurf, which uses an occupancy field. You could change the weight of the eikonal loss with: | ||
``` | ||
--pipeline.model.eikonal-loss-mult 0.01 | ||
``` | ||
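To make the regularizer concrete, the sketch below evaluates the usual eikonal penalty, the mean of `(||grad f|| - 1)^2`, on analytic functions. It assumes finite-difference gradients for simplicity, whereas a training pipeline would typically use automatic differentiation. All function names here are hypothetical.

```python
import math


def sdf_gradient(sdf_fn, point, eps=1e-4):
    """Central finite-difference gradient of sdf_fn at a 3D point."""
    gradient = []
    for axis in range(3):
        hi = list(point)
        lo = list(point)
        hi[axis] += eps
        lo[axis] -= eps
        gradient.append((sdf_fn(hi) - sdf_fn(lo)) / (2.0 * eps))
    return gradient


def eikonal_loss(sdf_fn, points):
    """Mean squared deviation of the gradient norm from 1."""
    total = 0.0
    for point in points:
        gradient = sdf_gradient(sdf_fn, point)
        norm = math.sqrt(sum(g * g for g in gradient))
        total += (norm - 1.0) ** 2
    return total / len(points)


def sphere(point):
    """A true SDF: distance to a sphere of radius 0.5."""
    return math.sqrt(sum(c * c for c in point)) - 0.5


def scaled_sphere(point):
    """Not a true SDF: gradient norm is 2 everywhere away from the origin."""
    return 2.0 * sphere(point)


points = [(0.3, 0.1, -0.2), (1.0, 0.5, 0.2), (-0.7, 0.4, 0.9)]
print(eikonal_loss(sphere, points))         # close to 0
print(eikonal_loss(scaled_sphere, points))  # close to 1
```

A true signed distance function has unit gradient norm almost everywhere, so its penalty is near zero, while the scaled field is penalized.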
|
||
## Smoothness Loss | ||
|
||
The smoothness loss encourages a smooth surface. It is used in UniSurf and can be configured as: | ||
``` | ||
--pipeline.model.smooth-loss-multi 0.01 | ||
``` | ||
|
||
## Monocular depth consistency | ||
|
||
The monocular depth consistency loss is proposed in MonoSDF, which uses a pretrained monocular depth network to provide priors during training. It is very useful in sparse-view cases and in indoor scenes. | ||
``` | ||
--pipeline.model.mono-depth-loss-mult 0.1 | ||
``` | ||
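Monocular depth predictions are generally only defined up to an unknown scale and shift. The sketch below, following the scale-and-shift alignment described in the MonoSDF paper, first solves a least-squares fit `w * rendered + q ≈ mono` and then measures the remaining squared error. It is an illustration under that assumption rather than the sdfstudio implementation, and both function names are hypothetical.

```python
def align_scale_shift(rendered, mono):
    """Closed-form least-squares scale w and shift q such that w * rendered + q ≈ mono."""
    n = len(rendered)
    sum_x = sum(rendered)
    sum_y = sum(mono)
    sum_xx = sum(x * x for x in rendered)
    sum_xy = sum(x * y for x, y in zip(rendered, mono))
    denominator = n * sum_xx - sum_x * sum_x
    w = (n * sum_xy - sum_x * sum_y) / denominator
    q = (sum_y - w * sum_x) / n
    return w, q


def mono_depth_loss(rendered, mono):
    """Mean squared error after aligning rendered depth to the monocular prior."""
    w, q = align_scale_shift(rendered, mono)
    return sum((w * x + q - y) ** 2 for x, y in zip(rendered, mono)) / len(rendered)


rendered = [1.0, 2.0, 3.0, 4.0]
# A prior that matches the rendered depth up to scale 0.5 and shift 0.1.
consistent_prior = [0.5 * d + 0.1 for d in rendered]
# A prior with a different depth ordering.
inconsistent_prior = [0.6, 0.2, 0.9, 0.4]

print(mono_depth_loss(rendered, consistent_prior))
print(mono_depth_loss(rendered, inconsistent_prior))
```

The first loss is close to zero because the prior differs from the rendered depth only by scale and shift; the second stays clearly positive.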
|
||
## Monocular normal consistency | ||
The monocular normal consistency loss is proposed in MonoSDF, which uses a pretrained monocular normal network to provide priors during training. It is very useful in sparse-view cases and in indoor scenes. | ||
``` | ||
--pipeline.model.mono-normal-loss-mult 0.05 | ||
``` | ||
|
||
## Multi-view photometric consistency | ||
|
||
Multi-view photometric consistency is proposed in Geo-NeuS. For each ray, it finds the intersection with the surface, uses a homography to warp patches from nearby views to the target view, and uses a normalized cross-correlation (NCC) loss for supervision. It can be configured as: | ||
``` | ||
ns-train volsdf --pipeline.model.patch-size 11 --pipeline.model.patch-warp-loss-mult 0.1 --pipeline.model.topk 4 | ||
``` | ||
where topk is the number of nearby views with the smallest NCC loss that are used for supervision. It acts as an approximate form of occlusion handling.
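The top-k selection can be sketched on flattened patches. This is a minimal illustration, assuming the per-view loss is `1 - NCC` and that the patches have already been warped into the reference view; the homography warping itself is not shown. `ncc` and `topk_ncc_loss` are hypothetical helper names.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two flattened patches, in [-1, 1]."""
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    centered_a = [a - mean_a for a in patch_a]
    centered_b = [b - mean_b for b in patch_b]
    numerator = sum(a * b for a, b in zip(centered_a, centered_b))
    denominator = math.sqrt(sum(a * a for a in centered_a) * sum(b * b for b in centered_b))
    # Constant patches carry no texture; treat them as uncorrelated.
    return numerator / denominator if denominator > 0 else 0.0


def topk_ncc_loss(ref_patch, warped_patches, topk):
    """Average the topk smallest (1 - NCC) losses over the source views."""
    losses = sorted(1.0 - ncc(ref_patch, patch) for patch in warped_patches)
    kept = losses[:topk]
    return sum(kept) / len(kept)


ref_patch = [1.0, 2.0, 3.0, 4.0]
matching_view = [2.0, 4.0, 6.0, 8.0]   # same pattern, different brightness
occluded_view = [4.0, 3.0, 2.0, 1.0]   # unrelated pattern, e.g. due to occlusion

print(topk_ncc_loss(ref_patch, [matching_view, occluded_view], topk=1))
print(topk_ncc_loss(ref_patch, [matching_view, occluded_view], topk=2))
```

With `topk=1` the occluded view is ignored and the loss is close to zero, whereas averaging over both views lets the occluded view dominate the loss.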