HOIDiffusion

Official implementation of HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data.

[CVPR'24] | πŸ“ Arxiv | πŸ—’οΈ Project Page | πŸ“½οΈ Video |✨ Models

Easy to Install and Run Demo

CUDA

Ensure that your CUDA version is 11.7. Add the following lines to your ~/.bashrc:

export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:$LD_LIBRARY_PATH
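
You can verify that the correct toolkit is on your PATH:

nvcc --version    # should report release 11.7
nvidia-smi        # the installed driver must support CUDA 11.7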

MANO

Download the MANO model and edit the --rhm-path in this file accordingly.
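
For reference, a typical layout after downloading the official MANO release looks like the sketch below; the exact folder names depend on the archive you download, so treat the paths as illustrative only.

mano/
  MANO_RIGHT.pkl    # right-hand model; --rhm-path usually points to the folder containing it
  MANO_LEFT.pkl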

1. Clone

git clone https://github.com/JunukCha/HOIDiffusion.git
cd HOIDiffusion

2. Install

source scripts/install.sh

3. Download GRAB Trained Networks

GRAB

Sign in and find Trained Networks in the download menu. Download models.zip into the third_party/test folder.

4. Extract and Move GRAB Trained Networks

source scripts/extract_mv_grabnet_pth.sh

5. Download GRAB object meshes

GRAB image

Download GRAB Objects and unzip it.

Put the contact_meshes folder in /data/GRAB/tools/object_meshes, or put it in another folder and edit the --obj-path in this file accordingly.

6. Download midas_models

source scripts/download_midas_models.sh

7. Create Test Data

source scripts/create_test_data.sh

You can adjust the --obj-path and --rhm-path according to your needs.
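
As a rough illustration only (the real entry point and default values are defined inside scripts/create_test_data.sh, so check that script), overriding the two flags looks like this:

# illustrative command with placeholder names and paths
python <test_data_script>.py \
    --obj-path data/GRAB/tools/object_meshes/contact_meshes \
    --rhm-path <path_to_mano_models>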

8. Download HOI network

Models

Download it into the HOIDiffusion root folder.

9. Run Demo

source scripts/demo.sh

teaser

Dependencies and Installation

  1. For main model (stable diffusion + condition model) training and testing:
  • Python >= 3.8
    conda create --name <env_name> python==3.8
    conda activate <env_name>
    pip install -r requirements.txt
  2. The normal map is estimated from the MiDaS depth model and a specified threshold (see the sketch after this list). Please create a new folder named midas_models and download the checkpoint into it.
  • dpt_hybrid-midas-501f0c75.pt: download it from this link.
  • The directory should look like this:
    configs
    ldm
    midas_models/
      dpt_hybrid-midas-501f0c75.pt
  3. To set up the environment for hand-object synthetic image generation and rendering, please follow this repo: GrabNet

  4. The pipeline leverages LLaVA to generate detailed prompts for training images. Please follow this repo to set it up if you also need this feature: LLaVA
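
Regarding item 2, the normal map is derived from the MiDaS depth prediction inside the repository's own scripts. The snippet below is only a minimal sketch of the idea (the function name and details are ours, not the repo's API): normals are built from depth gradients, and pixels below a background threshold such as --bg_th are zeroed out.

import torch
import torch.nn.functional as F

def depth_to_normal(depth: torch.Tensor, bg_th: float = 0.1) -> torch.Tensor:
    """Illustrative sketch: turn an (H, W) depth map into a (3, H, W) normal map."""
    depth = depth.float()
    # Normalize depth to [0, 1] so the background threshold is scale independent.
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    # Finite-difference gradients along x and y, padded back to (H, W).
    dz_dx = F.pad(depth[:, 2:] - depth[:, :-2], (1, 1, 0, 0))
    dz_dy = F.pad(depth[2:, :] - depth[:-2, :], (0, 0, 1, 1))
    # Stack into a vector field and normalize each pixel's normal to unit length.
    normal = torch.stack([-dz_dx, -dz_dy, torch.ones_like(depth)], dim=0)
    normal = normal / normal.norm(dim=0, keepdim=True).clamp(min=1e-8)
    # Zero out background pixels selected by the depth threshold.
    return normal * (depth > bg_th).unsqueeze(0)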

Data Preparation

Training Image Preprocess

Experiments were conducted on three large datasets. The download links are listed below:

The model can be trained on any one of them or on a combination. Other or self-collected datasets can also be used. After preprocessing, every RGB image should be aligned with a hand-object mask, a projected hand-skeleton image, and, in some cases, a binary segmentation image. Example preprocessing code for HOI4D is provided in third_party/data. A .csv file is also generated to store the image correspondences; it should have the following structure.

image,skeleton,top,bottom,left,right,sentence,seg,mask
<RGB_path>/001.jpg,<Skeleton_path>/001.jpg,309,1079,683,1614,A hand is grasping a bucket,<Seg_path>/001.jpg,<Mask_path>/001.jpg
<RGB_path>/002.jpg,<Skeleton_path>/002.jpg,328,1079,725,1624,A hand is grasping a bottle,<Seg_path>/002.jpg,<Mask_path>/002.jpg

Top, bottom, left, and right are the pixel coordinates of the hand-object segmentation borders. They are used to crop the HOI region out so that the region of interest does not appear too small in the whole image.
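
A minimal sketch of how such a .csv can be consumed (column names follow the header above; pandas, the file name, and the crop logic are our own illustration, not the repository's loader):

import pandas as pd
from PIL import Image

df = pd.read_csv("train_data.csv")   # placeholder file name
row = df.iloc[0]

rgb = Image.open(row["image"])
skeleton = Image.open(row["skeleton"])

# Crop the hand-object region; PIL's crop box is (left, upper, right, lower).
box = (row["left"], row["top"], row["right"], row["bottom"])
rgb_crop = rgb.crop(box)
skeleton_crop = skeleton.crop(box)

print(row["sentence"])               # e.g. "A hand is grasping a bucket"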

Prompt Generation

We provide the scripts in third_party/prompt to generate prompts with more detailed background and foreground description over templates. Please refer to README.md for more details.

Regularization Data

We use pretrained text-to-image stable diffusion models to synthesize 512×512 scenery images as regularization data. The .csv file is structured similarly to the training HOI data:

image,sentence
<Reg_path>/0001.jpg,"A bustling, neon-lit cyberpunk street scene."
<Reg_path>/0002.jpg,"A dramatic, fiery lava flow in a volcano."

A similar method can be used to construct your own regularization data.
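
For example, one way to build such regularization data is with the Hugging Face diffusers library; this is only a sketch with placeholder paths and prompts, not the exact pipeline used for the paper:

import csv
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompts = [
    "A bustling, neon-lit cyberpunk street scene.",
    "A dramatic, fiery lava flow in a volcano.",
]

os.makedirs("reg_images", exist_ok=True)
with open("reg_data.csv", "w", newline="") as f:     # placeholder output paths
    writer = csv.writer(f)
    writer.writerow(["image", "sentence"])
    for i, prompt in enumerate(prompts, start=1):
        image = pipe(prompt, height=512, width=512).images[0]  # 512x512 scenery image
        path = f"reg_images/{i:04d}.jpg"
        image.save(path)
        writer.writerow([path, prompt])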

Testing Data Generation

Instead of splitting the original HOI datasets into train/test sets, we generate new data conditions with seen/unseen objects and hand poses. To do this, GrabNet is used to synthesize a grasping hand given an object model, and spherical interpolation is used to generate the fetching trajectory. Please refer to third_party/test for more details.
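
As a minimal sketch of the interpolation step (our own illustration; the actual generation code lives in third_party/test), spherical linear interpolation between two viewpoints on a sphere around the object can be written as:

import numpy as np

def slerp(p0: np.ndarray, p1: np.ndarray, t: float) -> np.ndarray:
    """Spherically interpolate between unit vectors p0 and p1 at fraction t in [0, 1]."""
    p0, p1 = p0 / np.linalg.norm(p0), p1 / np.linalg.norm(p1)
    omega = np.arccos(np.clip(np.dot(p0, p1), -1.0, 1.0))  # angle between the two directions
    if np.isclose(omega, 0.0):
        return p0                                          # directions (almost) coincide
    return (np.sin((1 - t) * omega) * p0 + np.sin(t * omega) * p1) / np.sin(omega)

# Example: 10 hand positions on a sphere of radius 0.5 around the object,
# sweeping from a start direction to an end direction (a "fetching" trajectory).
start, end = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
trajectory = [0.5 * slerp(start, end, t) for t in np.linspace(0.0, 1.0, 10)]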

Training

With all the data prepared, we can train a new model. Download the Stable Diffusion 1.4 base model and put it under the ./models directory. Use this link to download it. In our experiments, we adopt sd-v1-4.ckpt.

Then run the following command to start training.

python -m torch.distributed.launch \
       --nproc_per_node=<n_gpu> --master_port 47771 train_dex.py \
       --data <train_data_path> --reg_data <reg_data_path> \
       --bsize 8 --bg_th 0.1 --epochs 5 --num_workers 4 \
       --ckpt ./models/sd-v1-4.ckpt --config configs/train_dex.yaml \
       --name train_dex --lr 1e-5 --auto_resume \
       --save_freq 5000 --reg_prob 0.1

If you'd like to train only the condition model and keep the pretrained stable diffusion backbone locked, please add --sd_lock. This reduces GPU memory usage; however, with the backbone locked, longer training is required to adapt to the HOI distribution. --reg_prob sets the regularization training strength. If background control is not important and you want the generated images to be more realistic, it can be set to 0.
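
For example, a locked-backbone run with regularization disabled only changes those two options relative to the command above (all other flags and placeholder paths stay the same):

python -m torch.distributed.launch \
       --nproc_per_node=<n_gpu> --master_port 47771 train_dex.py \
       --data <train_data_path> --reg_data <reg_data_path> \
       --bsize 8 --bg_th 0.1 --epochs 5 --num_workers 4 \
       --ckpt ./models/sd-v1-4.ckpt --config configs/train_dex.yaml \
       --name train_dex --lr 1e-5 --auto_resume \
       --save_freq 5000 --reg_prob 0 --sd_lock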

If the training data quality is unsatisfactory or the testing images are too far out of distribution, performance may degrade.

Testing

We provide the script below for testing:

python -m torch.distributed.launch \
       --nproc_per_node=<n_gpu> --master_port 47771 test_dex.py \
       --which_cond dex --bs 2 --cond_weight 1 --sd_ckpt <sd_backbone_model_path> \
       --cond_tau 1 --adapter_ckpt <condition_model_path> --cond_inp_type image \
       --input <test_image_folder> --file <data_structure_file_name> \
       --outdir <output_folder>

Acknowledgements

HOIDiffusion leverages the following open-source repositories; we thank all the authors for their amazing work:

Citation

@article{zhang2024hoidiffusion,
  title={HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data},
  author={Zhang, Mengqi and Fu, Yang and Ding, Zheng and Liu, Sifei and Tu, Zhuowen and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2403.12011},
  year={2024}
}
