Check out our new work presented at CoRL 2024, titled "Toward General Object-level Mapping from Sparse Views with 3D Diffusion Priors."
We propose an object-level mapping framework that fuses multi-view observations with a pre-trained diffusion shape prior model. It generalizes to objects from multiple categories and to observations from multiple modalities without fine-tuning.
Below are the original notes.
We propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system.
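To make the 9-DoF pose mentioned above concrete, here is a minimal sketch of how 3 translation, 3 rotation, and 3 scale parameters can map a point from the object frame to the world frame. The axis-angle rotation and the scale-then-rotate-then-translate order are assumptions chosen for illustration; the function names are hypothetical and the codebase may use a different convention.

```python
import math


def rotate_axis_angle(v, rotvec):
    """Rotate 3-vector v by an axis-angle vector (Rodrigues' rotation formula)."""
    theta = math.sqrt(sum(c * c for c in rotvec))
    if theta < 1e-12:
        return tuple(v)  # no rotation
    k = tuple(c / theta for c in rotvec)  # unit rotation axis
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    )


def object_to_world(point, translation, rotvec, scale):
    """Apply a 9-DoF pose (assumed order): per-axis scale, rotation, translation."""
    scaled = tuple(s * p for s, p in zip(scale, point))
    rotated = rotate_axis_angle(scaled, rotvec)
    return tuple(r + t for r, t in zip(rotated, translation))


if __name__ == "__main__":
    # Illustrative values only: a 90-degree yaw with anisotropic scale.
    p_world = object_to_world(
        point=(1.0, 1.0, 1.0),
        translation=(1.0, 0.0, 0.5),
        rotvec=(0.0, 0.0, math.pi / 2),
        scale=(0.5, 0.6, 1.0),
    )
    print(p_world)
```

The three scale parameters let each axis of a canonical shape stretch independently, which is what distinguishes a 9-DoF pose from a 7-DoF similarity transform with a single scale.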
Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors, Ziwei Liao*, Jun Yang*, Jingxing Qian*, Angela P Schoellig, Steven L Waslander, ICRA'24.
If you find our work useful, please consider citing our paper:
@article{liao2024uncertainty,
title={Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors},
author={Liao, Ziwei and Yang, Jun and Qian, Jingxing and Schoellig, Angela P and Waslander, Steven L},
journal={2024 IEEE International Conference on Robotics and Automation (ICRA)},
year={2024}
}
Please configure the environment and download the datasets and models.
Create a conda environment
conda create -n objslam python==3.9
conda activate objslam
Install pytorch and pytorch3d
Please install PyTorch, torchvision, and PyTorch3D following their official installation instructions, for example:
conda install pytorch=1.13.0 torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
conda install pytorch3d -c pytorch3d
Install python packages
pip install numpy open3d trimesh scikit-image plyfile h5py wandb opencv-python numpy-quaternion matplotlib addict tqdm
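After installation, a quick check can confirm that every package is visible to the active environment. The sketch below is not part of the repository; it only maps the package names from the commands above to their import names (for example, opencv-python imports as cv2 and numpy-quaternion as quaternion) and reports any that cannot be located.

```python
import importlib.util

# Package name as installed -> module name used in `import`.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "pytorch3d": "pytorch3d",
    "numpy": "numpy",
    "open3d": "open3d",
    "trimesh": "trimesh",
    "scikit-image": "skimage",
    "plyfile": "plyfile",
    "h5py": "h5py",
    "wandb": "wandb",
    "opencv-python": "cv2",
    "numpy-quaternion": "quaternion",
    "matplotlib": "matplotlib",
    "addict": "addict",
    "tqdm": "tqdm",
}


def find_missing(modules):
    """Return the package names whose import module cannot be located."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated objslam environment. It only checks that modules can be located, not that CUDA or PyTorch3D kernels work.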
Please download the pretrained generative model weights for three categories: Chairs, Tables, and Cars. Put the weights for each category in its own directory; e.g., for chairs:
./data/weights/chairs_64
ScanNet & Scan2CAD
Please download a mini subset of ScanNet and Scan2CAD for a quick run, including masks from Mask2Former. Put them under:
./data/dataset/scannet_mini
For the complete experiments on ScanNet, please download the complete ScanNet, Scan2CAD, and ShapeNet datasets following their official instructions.
Here is a demo on the scene scene0568_00 of ScanNet with 4 chair instances. The demo will run for 200 iterations, from 10 views of observations.
python run_system.py --config configs/config_scannet.json \
--sequence_dir ./data/dataset/scannet_mini \
--save_root ./output/test \
--scene_name scene0568_00
The following results of each instance are stored in the save_root:
- inputs/: RGB, Depth, Mask images;
- optimization-ins.gif: a visualization of the optimization process with shape and pose;
- visualization_3d.png: final result with uncertainty.
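To verify that a run produced the outputs listed above, a small helper can report which ones are absent. This is a sketch, not part of the repository: it assumes the three entries sit directly inside the directory you pass in, and how instances are laid out beneath save_root is not specified here, so point it at whichever folder holds one instance's results.

```python
from pathlib import Path

# Output names taken from the list above.
EXPECTED_OUTPUTS = ("inputs", "optimization-ins.gif", "visualization_3d.png")


def missing_outputs(result_dir):
    """Return the expected output names that do not exist in result_dir."""
    result_dir = Path(result_dir)
    return [name for name in EXPECTED_OUTPUTS if not (result_dir / name).exists()]


if __name__ == "__main__":
    # Hypothetical location: adjust to the folder of one instance's results.
    missing = missing_outputs("./output/test")
    print("Missing outputs:", missing if missing else "none")
```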
Our optimization formulation contains several parameters that can be adjusted according to the application, using --args X:
- Change lr, num_iterations, sample_num to tune the system to trade off between time and accuracy.
- Change weight_norm to balance the shape prior and observations.
- Change init_sigma_pose, init_sigma_scale for initialized pose uncertainty, and init_sigma for shape uncertainty.
- Change view_num from 1 to 10 to use single or multi-view results.
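When comparing several parameter settings, it can help to generate the command lines programmatically. The sketch below assumes that each parameter above is exposed as a flag of the same name (e.g. `--num_iterations 100`), which is an interpretation of the `--args X` pattern rather than something confirmed here; the override values are illustrative only.

```python
import shlex

# The demo command from this README, split into arguments.
BASE_COMMAND = [
    "python", "run_system.py",
    "--config", "configs/config_scannet.json",
    "--sequence_dir", "./data/dataset/scannet_mini",
    "--save_root", "./output/test",
    "--scene_name", "scene0568_00",
]


def build_command(overrides):
    """Append `--name value` pairs to the demo command (assumed flag naming)."""
    command = list(BASE_COMMAND)
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    # Illustrative settings: a faster single-view run and a multi-view run.
    for overrides in ({"num_iterations": 100, "view_num": 1}, {"view_num": 10}):
        print(shlex.join(build_command(overrides)))
```

Printing the commands rather than executing them keeps the sketch safe to run; check the accepted flags with the script's argument parser before launching a sweep.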
Other categories. We use DeepSDF as our generative shape prior model. You can train a model for your own category following the DeepSDF training instructions.
Computation. Optimizing one object for 200 iterations takes 40-60 seconds. Runtime can be reduced, at some cost in accuracy, by lowering parameters such as num_iterations and sample_num.
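For planning purposes, the figure of 40-60 seconds per object for 200 iterations can be turned into a rough estimate. The sketch below assumes that runtime scales linearly with the number of iterations and that objects are processed one after another; both are simplifying assumptions, not measured behaviour.

```python
def estimate_seconds(num_objects, num_iterations,
                     ref_seconds=(40.0, 60.0), ref_iterations=200):
    """Return a (low, high) runtime estimate in seconds under linear scaling."""
    scale = num_objects * num_iterations / ref_iterations
    return ref_seconds[0] * scale, ref_seconds[1] * scale


if __name__ == "__main__":
    # The demo scene: 4 chair instances at 200 iterations each.
    low, high = estimate_seconds(num_objects=4, num_iterations=200)
    print(f"Estimated runtime: {low:.0f}-{high:.0f} seconds")
```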
TODO. We are preparing the following for release: detailed guidance for configuring the complete ScanNet and KITTI datasets; demos on the Table and Car categories.
Our codebase is based on DSP-SLAM and DeepSDF. We appreciate their brilliant work, which made this project possible.