The official PyTorch implementation of the paper "Human Motion Diffusion Model".
Please visit our webpage for more details.
If you find this code useful in your research, please cite:
@article{tevet2022human,
title={Human Motion Diffusion Model},
author={Tevet, Guy and Raab, Sigal and Gordon, Brian and Shafir, Yonatan and Bermano, Amit H and Cohen-Or, Daniel},
journal={arXiv preprint arXiv:2209.14916},
year={2022}
}
This code was tested on Ubuntu 18.04.5 LTS and requires:
- Python 3.7
- conda3 or miniconda3
- CUDA-capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
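Before creating the conda environment, it can help to confirm that the prerequisites listed above are visible from Python. The following is a minimal sketch using only the standard library; the helper names `tool_on_path` and `python_matches` are hypothetical and not part of this repository.

```python
import shutil
import sys


def tool_on_path(name: str) -> bool:
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None


def python_matches(major: int, minor: int) -> bool:
    """Return True if the running interpreter is exactly major.minor."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    # ffmpeg and Python 3.7 are the requirements stated in this README.
    print("ffmpeg found:", tool_on_path("ffmpeg"))
    print("Python 3.7:", python_matches(3, 7))
```

If `ffmpeg found` prints `False`, run the install commands above and check again.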
Setup conda env:
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
There are two paths to get the data:
(a) Take the easy way if you only want to generate text-to-motion (this excludes editing, which does require motion capture data).
(b) Get full data to train and evaluate the model.
HumanML3D - Clone HumanML3D, then copy the data dir to our repository:
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
cd motion-diffusion-model
HumanML3D - Follow the instructions in HumanML3D, then copy the result dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
KIT - Download from HumanML3D (no processing needed this time) and place the result in ./dataset/KIT-ML
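After copying the data, a quick check that the dataset directories ended up where the commands above put them can save a failed run later. This is a sketch only: the helper `missing_dirs` is hypothetical, and the two directory names are taken from the copy instructions above (KIT-ML is only needed if you use KIT).

```python
from pathlib import Path
from typing import Iterable, List


def missing_dirs(repo_root: str, relative_dirs: Iterable[str]) -> List[str]:
    """Return the entries of relative_dirs that are not directories under repo_root."""
    root = Path(repo_root)
    return [d for d in relative_dirs if not (root / d).is_dir()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if both directories exist.
    expected = ["dataset/HumanML3D", "dataset/KIT-ML"]
    for d in missing_dirs(".", expected):
        print("missing:", d)
```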
Download the model(s) you wish to use, then unzip and place them in ./save/. For text-to-motion, you need only the first one.
HumanML3D
humanml-encoder-512 (best model)
KIT
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
You can also define:
- --device to set the GPU id.
- --seed to sample different prompts.
- --motion_length to set the motion length in seconds (maximum is 9.8 sec).
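If you launch sampling from a script, the flags above can be assembled programmatically. The sketch below is one possible way to do that; `build_sample_command` is a hypothetical helper, it only uses flags that appear in this README, and the range check on the motion length is this helper's own guard based on the 9.8 second maximum stated above.

```python
import shlex
from typing import List, Optional

MAX_MOTION_LENGTH_SEC = 9.8  # maximum stated in this README


def build_sample_command(
    model_path: str,
    text_prompt: Optional[str] = None,
    input_text: Optional[str] = None,
    motion_length: Optional[float] = None,
    seed: Optional[int] = None,
    device: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for `python -m sample`; omitted options are left out."""
    if motion_length is not None and not 0 < motion_length <= MAX_MOTION_LENGTH_SEC:
        raise ValueError("motion_length must be in (0, 9.8] seconds")
    cmd = ["python", "-m", "sample", "--model_path", model_path]
    if text_prompt is not None:
        cmd += ["--text_prompt", text_prompt]
    if input_text is not None:
        cmd += ["--input_text", input_text]
    if motion_length is not None:
        cmd += ["--motion_length", str(motion_length)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if device is not None:
        cmd += ["--device", str(device)]
    return cmd


if __name__ == "__main__":
    cmd = build_sample_command(
        "./save/humanml_trans_enc_512/model000200000.pt",
        text_prompt="the person walked forward and is picking up his toolbox.",
        motion_length=6.0,
        seed=10,
    )
    # Print a shell-safe version of the command (quoting works on Python 3.7).
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.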
Running those will get you:
- results.npy - a file with text prompts and xyz positions of the generated animation.
- sample##_rep##.mp4 - a stick figure animation for each generated motion.
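To look inside results.npy from Python, something along the following lines may work. It is a sketch under an assumption: that the file was written with `np.save` from a Python dict, which requires `allow_pickle=True` when loading. The key names and array shapes are not documented here, so the helper simply lists whatever it finds; `load_results` and `describe` are hypothetical names. Requires NumPy.

```python
import numpy as np


def load_results(path):
    """Load a results.npy file and return its contents as a plain dict."""
    data = np.load(path, allow_pickle=True)
    # np.save wraps a Python dict in a 0-d object array; .item() unwraps it.
    if isinstance(data, np.ndarray) and data.dtype == object and data.ndim == 0:
        data = data.item()
    if not isinstance(data, dict):
        raise TypeError("expected a dict, got {}".format(type(data).__name__))
    return data


def describe(results):
    """Return one summary line per entry: array shape, sequence length, or value."""
    lines = []
    for key, value in results.items():
        if isinstance(value, np.ndarray):
            lines.append("{}: array {}".format(key, value.shape))
        elif isinstance(value, (list, tuple)):
            lines.append("{}: {} of length {}".format(key, type(value).__name__, len(value)))
        else:
            lines.append("{}: {!r}".format(key, value))
    return lines

# Example (path is a placeholder):
# print("\n".join(describe(load_results("/path/to/results.npy"))))
```

Only load pickled files you generated yourself or trust, since unpickling can execute arbitrary code.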
You can stop here, or render the SMPL mesh using the following script.
To create SMPL mesh per frame run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
- sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces)
- sample##_rep##_obj - Mesh per frame in .obj format.
Notes:
- The .obj files can be imported into Blender/Maya/3DS-MAX and rendered there.
- This script runs SMPLify and also needs a GPU (it can be specified with the --device flag).
- Important - Do not change the original .mp4 path before running the script.
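The output names listed above follow the name of the input .mp4. The sketch below derives them with `pathlib`, assuming (this is not stated in this README) that both outputs are written next to the input file. `expected_mesh_outputs` is a hypothetical helper for locating the results afterwards, not part of the rendering script.

```python
from pathlib import Path
from typing import Dict


def expected_mesh_outputs(mp4_path: str) -> Dict[str, Path]:
    """Map a stick-figure .mp4 path to the output paths named in this README.

    Assumption: outputs are siblings of the input file.
    """
    mp4 = Path(mp4_path)
    if mp4.suffix != ".mp4":
        raise ValueError("expected a path ending in .mp4")
    stem = mp4.stem  # e.g. sample00_rep00
    return {
        "smpl_params": mp4.with_name(stem + "_smpl_params.npy"),
        "obj_dir": mp4.with_name(stem + "_obj"),
    }


if __name__ == "__main__":
    # The file name below is only an illustration of the sample##_rep## pattern.
    for kind, path in expected_mesh_outputs("out/sample00_rep00.mp4").items():
        print(kind, "->", path)
```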
Notes for 3d makers:
- You have two ways to animate the sequence:
  - Use the SMPL add-on and the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
  - A more straightforward way is using the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe vertex locations. Since the OBJs are not preserving vertices order, we also save this data to the sample##_rep##_smpl_params.npy file for your convenience.
ETA - Nov 22
HumanML3D
python -m train.train_mdm --save_dir save/my_humanml_trans_enc_512 --dataset humanml
KIT
python -m train.train_mdm --save_dir save/my_kit_trans_enc_512 --dataset kit
- Use --device to define GPU id.
- Use --arch to choose one of the architectures reported in the paper {trans_enc, trans_dec, gru} (trans_enc is default).
- Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.
- Add --eval_during_training to run a short (90 minutes) evaluation for each saved checkpoint. This will slow down training but will give you better monitoring.
- Takes about 20 hours (on a single GPU)
- The output of this script is provided in the checkpoints zip file.
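For scripted training runs, the training flags described above can be validated and assembled before launching. The following is a sketch with a hypothetical helper, `build_train_command`; the allowed values are copied from this README, and options left as `None` are omitted so the command-line defaults apply.

```python
from typing import List, Optional

DATASETS = ("humanml", "kit")
ARCHS = ("trans_enc", "trans_dec", "gru")
PLATFORMS = ("ClearmlPlatform", "TensorboardPlatform")


def build_train_command(
    save_dir: str,
    dataset: str,
    arch: Optional[str] = None,
    device: Optional[int] = None,
    train_platform_type: Optional[str] = None,
    eval_during_training: bool = False,
) -> List[str]:
    """Assemble the argument list for `python -m train.train_mdm`."""
    if dataset not in DATASETS:
        raise ValueError("dataset must be one of {}".format(DATASETS))
    if arch is not None and arch not in ARCHS:
        raise ValueError("arch must be one of {}".format(ARCHS))
    if train_platform_type is not None and train_platform_type not in PLATFORMS:
        raise ValueError("train_platform_type must be one of {}".format(PLATFORMS))
    cmd = ["python", "-m", "train.train_mdm", "--save_dir", save_dir, "--dataset", dataset]
    if arch is not None:
        cmd += ["--arch", arch]
    if device is not None:
        cmd += ["--device", str(device)]
    if train_platform_type is not None:
        cmd += ["--train_platform_type", train_platform_type]
    if eval_during_training:
        cmd.append("--eval_during_training")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_train_command(
        "save/my_humanml_trans_enc_512", "humanml",
        train_platform_type="TensorboardPlatform",
    )))
```

Checking the choices up front catches a mistyped architecture name before any data is loaded.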
HumanML3D
python -m eval.eval_humanml --model_path ./save/humanml_trans_enc_512/model000475000.pt
KIT
python -m eval.eval_humanml --model_path ./save/kit_trans_enc_512/model000400000.pt
This code is standing on the shoulders of giants. We want to thank the following contributors, whose work our code is based on:
guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.