GenPose: Generative Category-level Object Pose Estimation via Diffusion Models

The official PyTorch implementation of the NeurIPS 2023 paper, GenPose.

Overview

Pipeline

(I) A score-based diffusion model and an energy-based diffusion model are trained via denoising score matching. (II) (a) We first generate pose candidates from the score-based model, then (b) compute the pose energy of each candidate with the energy-based model. (c) Finally, we rank the candidates by energy and filter out the low-ranking ones. The remaining candidates are aggregated into the final output by mean-pooling.
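Step (II)(c) can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the function name `aggregate_candidates`, the `keep_ratio` parameter and its default value are hypothetical, and the sketch assumes that a larger energy marks a better candidate and that a pose is a flat vector that can be averaged element-wise. Averaging rotations properly needs more care than this.

```python
from typing import List, Sequence


def aggregate_candidates(
    candidates: Sequence[Sequence[float]],
    energies: Sequence[float],
    keep_ratio: float = 0.6,  # hypothetical default, not from the repository
) -> List[float]:
    """Rank candidates by energy, drop the low-ranking ones, mean-pool the rest.

    Assumption: a larger energy value means a better candidate.
    """
    if len(candidates) != len(energies) or not candidates:
        raise ValueError("need one energy per candidate and at least one candidate")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")

    # Rank candidate indices from highest to lowest energy.
    order = sorted(range(len(candidates)), key=lambda i: energies[i], reverse=True)

    # Keep the top fraction, but never fewer than one candidate.
    n_keep = max(1, int(len(order) * keep_ratio))
    kept = [candidates[i] for i in order[:n_keep]]

    # Mean-pool the surviving candidates dimension by dimension.
    dim = len(kept[0])
    return [sum(pose[d] for pose in kept) / n_keep for d in range(dim)]


# Toy usage: four 3-D "poses"; the last one is an outlier with low energy.
poses = [[1.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.5, 0.0, 0.0], [9.0, 9.0, 9.0]]
scores = [0.9, 0.8, 0.7, 0.1]
pooled = aggregate_candidates(poses, scores, keep_ratio=0.75)
print(pooled)
```

In the toy usage the outlier is filtered out before pooling, so it does not pull the average away from the three consistent candidates.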

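Contents of this repo are as follows:

  • Overview
  • Requirements
  • Installation
  • Download dataset and models
  • Training
  • Evaluation
  • Citation
  • Contact
  • License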

Requirements

  • Ubuntu 20.04
  • Python 3.8.15
  • PyTorch 1.12.0
  • PyTorch3D 0.7.2
  • CUDA 11.3
  • 1 × NVIDIA RTX 3090
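
After installation, the Python package versions above can be compared against the environment with a short standard-library script. This is a sketch only: the distribution names `torch` and `pytorch3d` are assumptions about how the packages register themselves, and `installed_version` and `check_environment` are hypothetical helpers, not part of this repository.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions listed in the Requirements section. The distribution names are
# assumptions about how the packages are registered with pip.
EXPECTED = {"torch": "1.12.0", "pytorch3d": "0.7.2"}


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected: Dict[str, str]) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            report[name] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local tags such as +cu113
            report[name] = "ok"
        else:
            report[name] = f"found {found}"
    return report


for package, status in check_environment(EXPECTED).items():
    print(f"{package}: {status}")
```

The script reports only what is installed; it does not check the CUDA toolkit or the GPU.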

Installation

  • Install PyTorch

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

  • Install PyTorch3D from a local clone

git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
git checkout -f v0.7.2
pip install -e .

  • Install from requirements.txt (run from the GenPose repository root, not from the pytorch3d clone)

pip install -r requirements.txt

  • Compile pointnet2

cd networks/pts_encoder/pointnet2_utils/pointnet2
python setup.py install

Download dataset and models

  • Download camera_train, camera_val, real_train, real_test, ground-truth annotations and mesh models provided by NOCS. Unzip and organize these files in $ROOT/data as follows:
data
├── CAMERA
│   ├── train
│   └── val
├── Real
│   ├── train
│   └── test
├── gts
│   ├── val
│   └── real_test
└── obj_models
    ├── train
    ├── val
    ├── real_train
    └── real_test
  • Preprocess NOCS files following SPD.
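
Before preprocessing, it can help to confirm that the unzipped files match the directory tree above. The following is a minimal sketch using only the standard library; `missing_dirs` is a hypothetical helper, and the relative path `data` assumes the script is run from $ROOT.

```python
from pathlib import Path
from typing import List

# Sub-directories shown in the tree above, relative to $ROOT/data.
EXPECTED_DATA_DIRS = [
    "CAMERA/train", "CAMERA/val",
    "Real/train", "Real/test",
    "gts/val", "gts/real_test",
    "obj_models/train", "obj_models/val",
    "obj_models/real_train", "obj_models/real_test",
]


def missing_dirs(data_root: Path, expected: List[str]) -> List[str]:
    """Return the expected sub-directories that do not exist under data_root."""
    return [rel for rel in expected if not (data_root / rel).is_dir()]


# Assumes the current working directory is $ROOT.
missing = missing_dirs(Path("data"), EXPECTED_DATA_DIRS)
if missing:
    print("Missing directories:")
    for rel in missing:
        print(f"  data/{rel}")
else:
    print("Data layout looks complete.")
```

The check covers directory names only; it does not validate the contents of each directory.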

We provide the preprocessed testing data (REAL275) and checkpoints here for a quick evaluation. Download and organize the files in $ROOT/results as follows:

results
├── ckpts
│   ├── EnergyNet
│   │   └── ckpt_genpose.pth
│   └── ScoreNet
│       └── ckpt_genpose.pth
├── evaluation_results
│   ├── segmentation_logs_real_test.txt
│   └── segmentation_results_real_test.pkl
└── mrcnn_results
    ├── real_test
    └── val

ckpts contains the trained GenPose models.

evaluation_results contains the preprocessed testing data: the Mask R-CNN segmentation results, the segmented point clouds of objects, and the ground-truth poses.

mrcnn_results contains the segmentation results provided by SPD.

Note: To evaluate on the CAMERA dataset, you must first preprocess the dataset as described above.
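
A quick way to confirm that the downloaded files landed where the results tree above expects them is sketched below. `check_results` is a hypothetical helper, the relative path `results` assumes the script is run from $ROOT, and the sketch tests only for presence, not for file integrity.

```python
from pathlib import Path
from typing import Dict

# Entries from the results tree above, relative to $ROOT/results.
EXPECTED_RESULTS = {
    "ckpts/EnergyNet/ckpt_genpose.pth": "file",
    "ckpts/ScoreNet/ckpt_genpose.pth": "file",
    "evaluation_results/segmentation_logs_real_test.txt": "file",
    "evaluation_results/segmentation_results_real_test.pkl": "file",
    "mrcnn_results/real_test": "dir",
    "mrcnn_results/val": "dir",
}


def check_results(results_root: Path) -> Dict[str, bool]:
    """Report, per expected entry, whether it exists with the expected type."""
    status = {}
    for rel, kind in EXPECTED_RESULTS.items():
        path = results_root / rel
        status[rel] = path.is_file() if kind == "file" else path.is_dir()
    return status


# Assumes the current working directory is $ROOT.
for rel, present in check_results(Path("results")).items():
    print(f"{'ok     ' if present else 'MISSING'}  results/{rel}")
```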

Training

Set the --data_path parameter in scripts/train_score.sh and scripts/train_energy.sh to the path of your NOCS dataset.

  • Score network

Train the score network to generate the pose candidates.

bash scripts/train_score.sh

  • Energy network

Train the energy network to aggregate the pose candidates.

bash scripts/train_energy.sh

Evaluation

Set the --data_path parameter in scripts/eval_single.sh to the path of your NOCS dataset.

  • Evaluate on the REAL275 dataset.

Set the --test_source parameter in scripts/eval_single.sh to 'real_test' and run:

bash scripts/eval_single.sh

  • Evaluate on the CAMERA dataset.

Set the --test_source parameter in scripts/eval_single.sh to 'val' and run:

bash scripts/eval_single.sh

Citation

If you find our work useful in your research, please consider citing:

@inproceedings{zhang2023genpose,
  title={GenPose: Generative Category-level Object Pose Estimation via Diffusion Models},
  author={Jiyao Zhang and Mingdong Wu and Hao Dong},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023},
  url={https://openreview.net/forum?id=l6ypbj6Nv5}
}

Contact

If you have any questions, please feel free to contact us:

Jiyao Zhang: [email protected]

Mingdong Wu: [email protected]

Hao Dong: [email protected]

License

This project is released under the MIT license. See LICENSE for additional details.
