OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

Zhening Huang · Xiaoyang Wu · Xi Chen · Hengshuang Zhao · Lei Zhu · Joan Lasenby


TL;DR: OpenIns3D proposes a "mask-snap-lookup" scheme to achieve 2D-input-free 3D open-world scene understanding, which attains SOTA performance across datasets, even with fewer input prerequisites. 🚀✨

Teaser queries: "device to watch BBC news", "furniture that is capable of producing music", "Ma Long's domain of excellence", "most comfortable area to sit in the room", "penciling down ideas during brainstorming", "furniture offers recreational enjoyment with friends".

OpenIns3D pipeline


Installation

Requirements

  • CUDA: 11.6
  • PyTorch: 1.13
  • Hardware: one GPU with 24 GB of memory or better

(Note: several scenes in S3DIS are very large and may exhaust memory on a 24 GB GPU.)
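To verify the host environment before installing, a quick check (assuming the CUDA toolkit and NVIDIA driver are on your PATH; nvcc ships with the toolkit):

nvcc --version   # should report CUDA 11.6
nvidia-smi       # confirms the GPU is visible and has >= 24 GB of memory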

Setup

Install dependencies by running:

conda create -n openins3d python=3.9
conda activate openins3d

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia
conda install pytorch3d -c pytorch3d
conda install lightning -c conda-forge
conda install -c "nvidia/label/cuda-11.6.1" libcusolver-dev
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
conda install nltk

cd third_party/pointnet2
python setup.py install
cd ../

# install MinkowskiEngine for MPM
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine" # clone the repo to third_party
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas
cd ../../

# install ODISE as 2D detectors
git clone https://github.com/NVlabs/ODISE.git
cd ODISE
pip install -e .
cd ..

pip install torch_scatter gdown==v4.6.3 loguru open3d plyfile pyviz3d python-dotenv omegaconf==2.1.1 iopath==0.1.8
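As an optional sanity check (exact version strings may differ slightly depending on your build), the core dependencies should all import cleanly inside the openins3d environment:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import MinkowskiEngine as ME; print('MinkowskiEngine', ME.__version__)"
python -c "import detectron2, pytorch3d; print('detectron2 + pytorch3d OK')"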

Zero-Shot Scene Understanding

To achieve zero-shot scene understanding with OpenIns3D, follow these two steps:

  1. Download the checkpoint for the Mask Proposal Module: we recommend downloading scannet200_val.ckpt here and placing it under checkpoints/.

  2. Run python zero_shot.py, specifying: a) pcd_path: the path to the colored point cloud; b) vocab: the vocabulary list to search for. ODISE is used as the 2D detector, so the format of vocab follows ODISE.

We provide several sample scenes from Replica, Matterport3D, S3DIS, and ScanNet for quick testing. Run the following to download the demo data:

pip install gdown==v4.6.3
cd demo; python download_demo_scenes.py

(If you experience issues downloading the demo scene files, make sure you have the correct version of gdown.)
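To confirm which version is installed:

python -m pip show gdown   # the Version field should read 4.6.3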

Example commands:

# replica demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/replica/replica_scene3.ply' \
--vocab "lamp; blinds; chair; table; door; bowl; window; switch; bottle; indoor-plant; pillow; vase; handrail; basket; bin; shelf; tv-screen; sofa; blanket; bike; sink; bed; stair; refrigerator" \
--dataset replica

# scannet demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/scannet_scene1.ply' \
--vocab "cabinet; bed; chair; sofa; table; door; window; bookshelf; picture; counter; desk; curtain; refrigerator; showercurtain; toilet; sink; bathtub" \
--dataset scannet

# mattarport3d demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/mattarport3d/mp3d_scene1.ply' \
--vocab "chair; window; ceiling; picture; floor; lighting; table; cabinet; curtain; plant; shelving; sink; mirror; stairs;  counter; stool; bed; sofa; shower; toilet; TV; clothes; bathtub; blinds; board" \
--dataset mattarport3d

# s3dis demo
python zero_shot.py \
--pcd_path 'demo/demo_scene/s3dis/s3dis_scene3.npy' \
--vocab "floor; wall; beam; column; window; door; table; chair; sofa; bookcase; board" \
--dataset s3dis

# customized data
python zero_shot.py \
--pcd_path 'path/to/your/own/3dscene' \
--vocab "vocabulary list to be used"

The dataset flag only adjusts how different .ply files are loaded. For customized datasets, use 'scannet' as the default. Let us know if you encounter any issues! 📣

Visualize the results

You can check out the detection results as well as the Snap images, Class_Lookup_Dict, and final results under demo_saved.

When using your own customized dataset:

  • feel free to change the three parameters [lift_cam, zoomout, remove_lip] under adjust_camera to optimise the snap images for better detection.

Benchmarking on ScanNetv2 and S3DIS

Here we provide instructions to reproduce the results on ScanNetv2 and S3DIS.

(Note: the first run will take a while 🕙 as the checkpoint of the 2D detector ODISE is downloaded automatically.)

ScanNetv2:

  1. Download ScanNetv2. (Note: there is no need to download the .sens files, as 2D images are not used.)
  2. Pre-process the ScanNetv2 dataset following the same code as Mask3D:
python -m openins3d.mask3d.datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="input_data/processed/scannet" \
--git_repo="PATH_TO_SCANNET_GIT_REPO" \
--scannet200=false
  3. Download the pre-trained Mask Proposal Module weights from here and place them under checkpoints.

  4. Double-check three paths in scannet_benchmark.sh: SCANNET_PROCESSED_DIR, SCAN_PATH, and MPM_CHECKPOINT, and change them accordingly (see the example variable settings after the command below). Then run the bash file. It first generates class-agnostic mask proposals for the 312 scenes, each mask stored as a sparse tensor; the Snap and Lookup modules are then run by inference_openins3d.py; finally, evaluate.py evaluates performance by computing AP values for the mask detections.

sh scannet_benchmark.sh
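For reference, the three variables in scannet_benchmark.sh might look like the following (the exact syntax depends on the script; the paths below are placeholders for your own setup, with the processed directory matching the save_dir used in preprocessing):

SCANNET_PROCESSED_DIR="input_data/processed/scannet"   # save_dir from the preprocessing step
SCAN_PATH="PATH_TO_RAW_SCANNET_DATASET"                # raw ScanNetv2 download location
MPM_CHECKPOINT="checkpoints/your_mpm_weights.ckpt"     # placeholder for the Mask Proposal weights from step 3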

S3DIS

  1. Download S3DIS data by filling out this Google form. Download the Stanford3dDataset_v1.2.zip file and unzip it.

  2. Preprocess the dataset with the following code:

python -m openins3d.mask3d.datasets.preprocessing.s3dis_preprocessing preprocess \
--data_dir="PATH_TO_Stanford3dDataset_v1.2" \
--save_dir="input_data/processed/s3dis"

If you encounter preprocessing issues caused by bugs in the S3DIS dataset files, please refer to this issue in the Mask3D repo for a fix.

  3. Download the pre-trained Mask Proposal Module weights from here and place them under checkpoints.

  4. Double-check two paths in s3dis_benchmark.sh: S3DIS_PROCESSED_DIR and MPM_CHECKPOINT. Change them accordingly (see the placeholder example after the command) and then run:

sh s3dis_benchmark.sh
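As with the ScanNetv2 script, the two variables in s3dis_benchmark.sh might be set along these lines (placeholders only):

S3DIS_PROCESSED_DIR="input_data/processed/s3dis"    # save_dir from the preprocessing step
MPM_CHECKPOINT="checkpoints/your_mpm_weights.ckpt"  # placeholder for the Mask Proposal weights from step 3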

(Note: several scenes in S3DIS are very large and may exhaust memory on a 24 GB GPU; a GPU with more VRAM is recommended.)

To do

  • Release the batch inference code on STPLS3D
  • Release checkpoints for limited supervision on S3DIS, ScanNetV2
  • Release Evaluation Script for 3D Open-world Object Detection

Citation

If you find OpenIns3D useful for your research, please cite our work as a form of encouragement. 😊

@article{huang2023openins3d,
  title={OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation},
  author={Zhening Huang and Xiaoyang Wu and Xi Chen and Hengshuang Zhao and Lei Zhu and Joan Lasenby},
  journal={arXiv preprint},
  year={2023}
}

Acknowledgement

The mask proposal model is modified from Mask3D, and we heavily used its easy-setup version for the MPM. Thanks again for the great work! 🙌 We also drew inspiration from LAR and ContrastiveSceneContexts when developing the code. 🚀
