Zhening Huang · Xiaoyang Wu · Xi Chen · Hengshuang Zhao · Lei Zhu · Joan Lasenby
Paper | Video | Project Page
TL;DR: OpenIns3D proposes a "mask-snap-lookup" scheme to achieve 2D-input-free 3D open-world scene understanding, which attains SOTA performance across datasets, even with fewer input prerequisites. 🚀✨
- 6 Jan, 2024: We have released a major revision, incorporating S3DIS and ScanNet benchmark code. Try out the latest version here 🔥🔥.
- 31 Dec, 2023: We release the batch inference code on ScanNet.
- 31 Dec, 2023: We release the zero-shot inference code. Test it on your own data!
- Sep, 2023: OpenIns3D is released on arXiv, along with an explanatory video and a project page. We will release the code at the end of this year.
- Installation
- Zero-Shot Scene Understanding
- Benchmarking on ScanNetv2 and S3DIS
- Citation
- Acknowledgement
- CUDA: 11.6
- PyTorch: 1.13.1
- Hardware: one 24G memory GPU or better
(Note: several scenes in S3DIS are very large and may run out of memory on a 24GB GPU.)
Install dependencies by running:
conda create -n openins3d python=3.9
conda activate openins3d
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pytorch3d -c pytorch3d
conda install lightning -c conda-forge
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
conda install nltk
cd third_party/pointnet2
python setup.py install
cd ../
# install MinkowskiEngine for MPM
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine" # clone the repo to third_party
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas
cd ../../
# install ODISE as 2D detectors
git clone https://github.com/NVlabs/ODISE.git
cd ODISE
pip install -e .
cd ..
pip install torch_scatter gdown loguru open3d plyfile pyviz3d python-dotenv omegaconf==2.1.1 iopath==0.1.8
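After running the installation commands above, a quick check like the following can confirm that the main packages are visible to the active environment. This is a minimal sketch rather than part of the repository: the module list is an assumption derived from the install commands, and the helper name find_missing is hypothetical. It only locates the modules and does not import them, so it says nothing about whether the CUDA builds work.

```python
import importlib.util

# Import names assumed from the install commands above; adjust to your setup.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "pytorch3d",
    "detectron2",
    "MinkowskiEngine",
    "nltk",
    "open3d",
    "omegaconf",
    "plyfile",
    "torch_scatter",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be located for import."""
    missing = []
    for name in modules:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If a module is reported missing, re-running the corresponding install line inside the openins3d conda environment is the first thing to try.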
To achieve zero-shot scene understanding with OpenIns3D, follow these two steps:
- Download the checkpoint for the Mask Proposal Module: we recommend downloading scannet200_val.ckpt here and placing it under checkpoints/.
- Run python zero_shot.py, specifying (a) pcd_path: the path to the colored point cloud, and (b) vocab: the list of vocabulary terms to search for. ODISE is the 2D detector, so vocab follows the ODISE format.
We provide several sample scenes from Replica, Matterport3D, S3DIS, and ScanNet for quick testing. Run the following command to download the demo data:
cd demo; python download_demo_scenes.py
Example of testing:
# replica demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/replica/replica_scene3.ply' \
--vocab "lamp; blinds; chair; table; door; bowl; window; switch; bottle; indoor-plant; pillow; vase; handrail; basket; bin; shelf; tv-screen; sofa; blanket; bike; sink; bed; stair; refrigerator" \
--dataset replica
# scannet demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/scannet_scene1.ply' \
--vocab "cabinet; bed; chair; sofa; table; door; window; bookshelf; picture; counter; desk; curtain; refrigerator; showercurtain; toilet; sink; bathtub" \
--dataset scannet
# mattarport3d demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/mattarport3d/mp3d_scene1.ply' \
--vocab "chair; window; ceiling; picture; floor; lighting; table; cabinet; curtain; plant; shelving; sink; mirror; stairs; counter; stool; bed; sofa; shower; toilet; TV; clothes; bathtub; blinds; board" \
--dataset mattarport3d
# s3dis demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/s3dis/s3dis_scene3.npy' \
--vocab "floor; wall; beam; column; window; door; table; chair; sofa; bookcase; board" \
--dataset s3dis
# customized data
python zero_shot.py \
--pcd_path 'path/to/your/own/3dscene' \
--vocab "vocabulary list to be used"
The dataset flag only adjusts how the different .ply files are loaded. For a custom dataset, use 'scannet' as the default. Let us know if you encounter any issues! 📣
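When running several custom scenes, it can be convenient to assemble the command from a Python list of class names instead of typing the vocabulary string by hand. The sketch below is not part of the repository: the helpers format_vocab and build_zero_shot_command are hypothetical, and the only things taken from the examples above are the three flags (--pcd_path, --vocab, --dataset) and the semicolon-separated vocabulary format.

```python
import shlex
import sys


def format_vocab(classes):
    """Join class names into the semicolon-separated string used in the examples."""
    cleaned = [c.strip() for c in classes if c.strip()]
    return "; ".join(cleaned)


def build_zero_shot_command(pcd_path, classes, dataset="scannet"):
    """Build the argument list for one zero_shot.py run (nothing is executed here)."""
    return [
        sys.executable, "zero_shot.py",
        "--pcd_path", pcd_path,
        "--vocab", format_vocab(classes),
        "--dataset", dataset,
    ]


if __name__ == "__main__":
    cmd = build_zero_shot_command(
        "path/to/your/own/3dscene", ["chair", "table", "sofa"]
    )
    # Print the shell-quoted command so it can be inspected before running.
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with the spaces and semicolons in the vocabulary string.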
You can check out the detection results, as well as the Snap images, Class_Lookup_Dict, and final results, under demo_saved.
When using your customized dataset:
- feel free to change the three parameters [lift_cam, zoomout, remove_lip] under adjust_camera to optimise the snap images for better detection.
Here we provide instructions to reproduce the results on ScanNetv2 and S3DIS.
(Note: the first run will take a while 🕙 because the checkpoint of the 2D detector ODISE is downloaded automatically)
- Download ScanNetv2. (Note: there is no need to download the .sens files, as 2D images are not used.)
- Pre-process the ScanNetv2 dataset using the same code as Mask3D, as follows:
python -m openins3d.mask3d.datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="input_data/processed/scannet" \
--git_repo="PATH_TO_SCANNET_GIT_REPO" \
--scannet200=false
- Download the pre-trained Mask Proposal weights from here and place them under checkpoints.
- Double-check the three paths in scannet_benchmark.sh: SCANNET_PROCESSED_DIR, SCAN_PATH, and MPM_CHECKPOINT. Change them accordingly, then run the bash file. It first generates class-agnostic mask proposals for the 312 scenes, with each mask stored as a sparse tensor. The Snap and Lookup modules are then run in inference_openins3d.py. Finally, evaluate.py can be called to evaluate the performance by calculating the AP values of the mask detections.
sh scannet_benchmark.sh
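To make the AP values reported by the evaluation step easier to interpret, here is a minimal sketch of average precision for a single class at a single IoU threshold. It is an illustration only and not the repository's evaluate.py: masks are assumed to be sets of point indices, the function names are hypothetical, and the benchmark's own script may differ in how it matches predictions and integrates the precision-recall curve.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def average_precision(predictions, gt_masks, iou_threshold=0.5):
    """Non-interpolated AP for one class at one IoU threshold.

    predictions: list of (confidence, mask) pairs.
    gt_masks: list of ground-truth masks.
    Each ground-truth mask can be matched by at most one prediction.
    """
    if not gt_masks:
        return 0.0
    matched = set()
    hits = []
    # Visit predictions from most to least confident.
    for _, mask in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_masks):
            if idx in matched:
                continue
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)
    # Sum the precision at every rank where a true positive occurs,
    # then divide by the number of ground-truth masks.
    ap, tp = 0.0, 0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            tp += 1
            ap += tp / rank
    return ap / len(gt_masks)
```

For example, with two ground-truth masks and three predictions ranked hit, miss, hit, the precisions at the two hits are 1/1 and 2/3, giving an AP of 5/6.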
- Download S3DIS data by filling out this Google form. Download the Stanford3dDataset_v1.2.zip file and unzip it.
- Preprocess the dataset with the following code:
python -m openins3d.mask3d.datasets.preprocessing.s3dis_preprocessing preprocess \
--data_dir="PATH_TO_Stanford3dDataset_v1.2" \
--save_dir="input_data/processed/s3dis"
If you encounter issues in preprocessing due to bugs in the S3DIS dataset file, please refer to this issue in the Mask3D repo to fix it.
- Download the pre-trained Mask Proposal weights from here and place them under checkpoints.
- Double-check the two file paths in s3dis_benchmark.sh: S3DIS_PROCESSED_DIR and MPM_CHECKPOINT. Change them accordingly and then run:
sh s3dis_benchmark.sh
(Note that several scenes in S3DIS are very large and may cause out-of-memory errors on a 24GB GPU. A GPU with more VRAM is recommended.)
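Because the benchmark scripts depend on a few paths being set correctly, a small pre-flight check can catch typos before a long run starts. The following is a sketch under stated assumptions: the helper missing_paths is hypothetical, and the path values shown are placeholders (the checkpoint file name in particular is invented for illustration) that should be replaced with the values from your copy of s3dis_benchmark.sh or scannet_benchmark.sh.

```python
from pathlib import Path


def missing_paths(named_paths):
    """Return the names whose paths do not exist on disk."""
    return [name for name, path in named_paths.items() if not Path(path).exists()]


if __name__ == "__main__":
    # Placeholder values; use the ones set in your benchmark script.
    paths = {
        "S3DIS_PROCESSED_DIR": "input_data/processed/s3dis",
        "MPM_CHECKPOINT": "checkpoints/your_checkpoint.ckpt",  # hypothetical file name
    }
    for name in missing_paths(paths):
        print(f"{name} does not exist: {paths[name]}")
```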
- Release the batch inference code on STPLS3D
- Release checkpoints for limited supervision on S3DIS, ScanNetV2
- Release Evaluation Script for 3D Open-world Object Detection
If you find OpenIns3D useful for your research, please cite our work as a form of encouragement. 😊
@article{huang2023openins3d,
title={OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation},
author={Zhening Huang and Xiaoyang Wu and Xi Chen and Hengshuang Zhao and Lei Zhu and Joan Lasenby},
journal={arXiv preprint},
year={2023}
}
The mask proposal model is modified from Mask3D, and we heavily used the easy setup version of it for MPM. Thanks again for the great work! 🙌 We also drew inspiration from LAR and ContrastiveSceneContexts when developing the code. 🚀