Jihyun Lee1, Shunsuke Saito2, Giljoo Nam2, Minhyuk Sung1, Tae-Kyun (T-K) Kim1,3
1 KAIST, 2 Codec Avatars Lab, Meta, 3 Imperial College London
[Project Page] [Paper] [Supplementary Video]
We present 🤝InterHandGen, a novel framework that learns the generative prior of two-hand interaction. Sampling from our model yields plausible and diverse two-hand shapes in close interaction, with or without an object. Our prior can be incorporated into any optimization or learning method to reduce ambiguity in an ill-posed setup. Our key observation is that directly modeling the joint distribution of multiple instances imposes high learning complexity due to its combinatorial nature. Thus, we propose to decompose the modeling of the joint distribution into the modeling of factored unconditional and conditional single-instance distributions. In particular, we introduce a diffusion model that learns the single-hand distribution both unconditionally and conditioned on the other hand via conditioning dropout. For sampling, we combine anti-penetration and classifier-free guidance to enable plausible generation. Furthermore, we establish a rigorous evaluation protocol for two-hand synthesis, on which our method significantly outperforms baseline generative models in terms of plausibility and diversity. We also demonstrate that our diffusion prior can boost the performance of two-hand reconstruction from monocular in-the-wild images, achieving new state-of-the-art accuracy.
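For intuition, the sketch below shows how classifier-free guidance and anti-penetration guidance can be combined when denoising the conditionally generated hand. It is a minimal illustration under stated assumptions, not the released implementation: `eps_model`, `pen_fn`, and the guidance weights are placeholder names.

```python
import torch

def guided_eps(eps_model, x_t, t, cond, pen_fn, w_cfg=2.0, w_pen=1.0):
    """Illustrative noise prediction for the hand generated conditioned on `cond`.

    Mixes the conditional and unconditional predictions of a model trained with
    conditioning dropout (classifier-free guidance), then adds the gradient of a
    penetration penalty so the reverse step moves away from interpenetration.
    All names and weights here are assumptions for illustration only.
    """
    # Classifier-free guidance: (1 + w) * eps(x, cond) - w * eps(x, null).
    eps_c = eps_model(x_t, t, cond)   # conditioned on the already-sampled hand
    eps_u = eps_model(x_t, t, None)   # conditioning dropped (null condition)
    eps = (1.0 + w_cfg) * eps_c - w_cfg * eps_u

    # Anti-penetration guidance: gradient of a user-supplied penalty pen_fn
    # (e.g. based on signed distances between the two hand surfaces).
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        grad = torch.autograd.grad(pen_fn(x_in, cond).sum(), x_in)[0]

    return eps + w_pen * grad  # plugged into the usual DDPM/DDIM update for x_{t-1}
```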
[Apr 14th 2024] Fixed a bug in the anti-penetration loss guidance weighting. I apologize for the inconvenience.
- Clone this repository and install the dependencies specified in `requirements.txt`.
$ git clone https://github.com/jyunlee/InterHandGen.git
$ cd InterHandGen
$ pip install -r requirements.txt
- Install ChamferDistancePytorch.
$ cd utils
$ git clone https://github.com/ThibaultGROUEIX/ChamferDistancePytorch.git
$ cd ChamferDistancePytorch/chamfer3D
$ python setup.py install
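To check the build, a small sanity test along the following lines should work; it assumes the `chamfer_3DDist` interface of ChamferDistancePytorch, a CUDA-capable GPU, and that the `chamfer3D` folder is importable from your working directory.

```python
import torch
from chamfer3D import dist_chamfer_3D  # adjust the import path to your setup

# The CUDA extension expects GPU tensors of shape (batch, num_points, 3).
a = torch.rand(1, 1024, 3).cuda()
b = torch.rand(1, 1024, 3).cuda()

cham = dist_chamfer_3D.chamfer_3DDist()
d_ab, d_ba, idx_ab, idx_ba = cham(a, b)  # squared nearest-neighbor distances both ways
print(d_ab.mean().item(), d_ba.mean().item())
```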
- Download the InterHand2.6M dataset from its official website.
- Follow the data pre-processing steps of IntagHand (`dataset/interhand.py`). Note that you only need the shape annotation files (`anno/*.pkl`); you can skip the image pre-processing parts.
- Download the MANO model from its official website. Place the downloaded `mano_v1_2` folder under the `misc` directory. A quick layout check is sketched below.
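As referenced above, an optional way to verify the layout before training; the annotation path is an assumption based on the steps above, so adjust it to wherever your pre-processed files live.

```python
import glob
import os

# Paths are assumptions based on the data-preparation steps above; adjust as needed.
anno_files = glob.glob(os.path.join('anno', '*.pkl'))            # IntagHand shape annotations
mano_present = os.path.isdir(os.path.join('misc', 'mano_v1_2'))  # MANO model folder

print(f'Found {len(anno_files)} annotation file(s); MANO model present: {mano_present}')
```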
Train your own two-hand interaction diffusion model using the following command. Note that the pre-trained weights can be downloaded from this Google Drive link.
$ CUDA_VISIBLE_DEVICES={gpu_num} python interhandgen.py --train
Sample two-hand interactions from the trained model. The number of samples can be controlled by `vis_epoch` (number of sampling iterations) and `vis_batch` (number of samples per iteration) in the config file (`configs/default.yml`). For a full evaluation, set `vis_epoch = 4` and `vis_batch = 2500` to generate 4 * 2500 = 10000 samples.
$ CUDA_VISIBLE_DEVICES={gpu_num} python interhandgen.py --model_path {trained_model_path}
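For reference, the relevant fields of `configs/default.yml` for a full evaluation would look roughly like the fragment below; the key names follow the description above, while the surrounding structure of the file is assumed.

```yaml
# Sampling settings (fragment): 4 iterations x 2500 samples = 10000 samples in total.
vis_epoch: 4
vis_batch: 2500
```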
Compute the evaluation metrics using the sampled two-hand interactions.
$ cd eval
$ CUDA_VISIBLE_DEVICES={gpu_num} python evaluate.py --sample_num {number_of_samples} --doc {trained_model_dir}
If you find this work useful, please consider citing our paper.
@inproceedings{lee2024interhandgen,
title = {InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion},
author = {Lee, Jihyun and Saito, Shunsuke and Nam, Giljoo and Sung, Minhyuk and Kim, Tae-Kyun},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
- Parts of our code are based on DiffPose (forward/reverse diffusion process), Pointnet_Pointnet2_pytorch (feature extraction network for evaluation), MoCapDeform (anti-penetration guidance), and motion-diffusion-model (evaluation metrics). We thank the authors for releasing their code.