The official PyTorch implementation of the paper "Human Motion Diffusion as a Generative Prior" (ArXiv).
Please visit our webpage for more details.
| | Training | Generation | Evaluation |
|---|---|---|---|
| DoubleTake (long motion) | ✅ | ✅ | ✅ |
| ComMDM (two-person) | ✅ | ✅ | ✅ |
| Fine-tuned motion control | ✅ | ✅ | ✅ |
📢 29/Apr/2023 - Released the long-motion evaluation scripts, covering both datasets (BABEL & HumanML3D) - please check the updated README.
📢 25/Apr/2023 - Full release of the fine-tuned motion control scripts.
📢 14/Apr/2023 - First release - DoubleTake/ComMDM - Training and generation with pre-trained models are available.
This code was tested on Ubuntu 18.04.5 LTS
and requires:
- Python 3.8
- conda3 or miniconda3
- CUDA capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
For Windows, use this instead.
Setup conda env:
conda env create -f environment.yml
conda activate PriorMDM
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
PriorMDM shares most of its dependencies with the original MDM. If you already have MDM installed from the official repo, you can save time by linking its dependencies instead of downloading them from scratch.
If you already have MDM installed
Link from installed MDM
Before running the following bash script, edit it so that the path points to the full path of your installed MDM:
bash prepare/link_mdm.sh
First-time user
Download dependencies:
bash prepare/download_smpl_files.sh
bash prepare/download_glove.sh
bash prepare/download_t2m_evaluators.sh
Get the HumanML3D dataset (required for all applications):
Follow the instructions in HumanML3D, then copy the resulting dataset to our repository:
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
DoubleTake (Long sequences)
BABEL dataset
Download the processed version here, and place it at ./dataset/babel
For evaluation, also download the following here, and place it at ./dataset/babel
SMPLH dependencies
Download here, and place it at ./bodymodels
ComMDM (two-person)
3DPW dataset
For ComMDM, we cleaned 3DPW and converted it to HumanML3D format.
Download the processed version here, and place it at ./dataset/3dpw
Fine-tuned motion control - No extra dependencies.
Download the model(s) you wish to use, then unzip and place them in ./save/.
DoubleTake (long motions)
- my_humanml-encoder-512 (a reproduction of the best MDM model, without any changes)
- Babel_TrasnEmb_GeoLoss
ComMDM (two-person)
- pw3d_text (for text-to-motion)
- pw3d_prefix (for prefix completion)
Fine-tuned motion control
- root_horizontal_control (the base model fine-tuned for 80,000 steps on the horizontal part of the root control objective)
- left_wrist_control (the base model fine-tuned for 80,000 steps on the left wrist control objective)
- right_foot_control (the base model fine-tuned for 80,000 steps on the right foot control objective)
DoubleTake (long motions)
Reproduce random text prompts:
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_samples 4 --handshake_size 20 --blend_len 10
Reproduce from a text file:
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_text_example.txt
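The input text file is expected to hold one prompt per line (./assets/dt_text_example.txt is the example shipped with the repo); an illustrative file with hypothetical prompts might look like:

```
a person walks forward and then turns around
a person sits down and crosses their legs
a person jumps in place
```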
Reproduce from a CSV file (which can set the length of each sequence):
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --input_text ./assets/dt_csv_example.csv
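The actual column layout is defined by ./assets/dt_csv_example.csv; conceptually, each row pairs a prompt with its desired length, roughly along these lines (hypothetical column names and values):

```
text,length
a person walks forward,120
a person sits down on a chair,80
```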
It will look something like this:
ComMDM (two-person)
Text-to-Motion
Reproduce paper text prompts:
python -m sample.two_person_text2motion --model_path ./save/pw3d_text/model000100000.pt --input_text ./assets/two_person_text_prompts.txt
It will look something like this:
Prefix completion
Complete unseen motion prefixes:
python -m sample.two_person_prefix_completion --model_path ./save/pw3d_prefix/model000050000.pt
It will look something like this:
Blue frames are the input prefix and orange frames are the generated completion.
Visualize dataset
Unfortunately, the 3DPW dataset is not clean, even after our preprocessing. To get samples of it, run:
python -m sample.two_person_text2motion --model_path ./save/humanml_trans_enc_512/model000200000.pt --sample_gt
Fine-tuned motion control
Horizontal Root Control
Sample the horizontal part of the root trajectory from the test set of HumanML3D, and generate a motion with the given trajectory (note that the vertical part of the trajectory is predicted by the model). To make the generation unconditioned on text, we add --guidance_param 0.
python -m sample.finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --guidance_param 0
It will look something like this:
Use --show_input if you wish to also plot the motion from which the control features were taken.
Add a text condition with --text_condition. Note that by default, we use classifier-free guidance with a scale of 2.5.
python -m sample.finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --text_condition "a person is raising hands"
Left Wrist Control
Sample the relative trajectory of the left wrist w.r.t. the root from the test set of HumanML3D, and generate a motion with the given left wrist relative trajectory. To make the generation unconditioned on text, we add --guidance_param 0.
python -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt --guidance_param 0
It will look something like this:
Add a text condition with --text_condition. Note that by default, we use classifier-free guidance with a scale of 2.5.
python -m sample.finetuned_motion_control --model_path save/left_wrist_finetuned/model000280000.pt --text_condition "a person is walking in a circle"
You may also define:
- --device id.
- --seed to sample different prompts.
- --motion_length (text-to-motion only) in seconds (maximum is 9.8 [sec]).
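For example, the DoubleTake command from above with an explicit GPU and seed (the flag values here are arbitrary):
python -m sample.double_take --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --handshake_size 20 --blend_len 10 --device 0 --seed 10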
Running those will get you:
- results.npy - a file with the text prompts and xyz positions of the generated animations.
- sample##_rep##.mp4 - a stick figure animation for each generated motion.
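For downstream use, results.npy can be loaded with NumPy; a minimal sketch, assuming the MDM-style pickled dictionary (the key names below are assumptions and may differ in your version):

```python
import numpy as np

# results.npy stores a pickled dict (a 0-d object array), hence allow_pickle + .item()
results = np.load('results.npy', allow_pickle=True).item()
print(results.keys())            # inspect what was actually saved

motions = results['motion']      # assumed key: xyz joint positions of the generated motions
texts = results['text']          # assumed key: the corresponding text prompts
print(motions.shape, texts[:2])
```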
To create a SMPL mesh per frame, run:
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
This script outputs:
- sample##_rep##_smpl_params.npy - SMPL parameters (thetas, root translations, vertices and faces).
- sample##_rep##_obj - a mesh per frame in .obj format.
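A minimal sketch for inspecting the saved SMPL parameters, assuming the file is a pickled dictionary holding the thetas, root translations, vertices and faces listed above (exact key names may vary):

```python
import numpy as np

# Hypothetical file name - replace with one of your own sample##_rep##_smpl_params.npy files
params = np.load('sample00_rep00_smpl_params.npy', allow_pickle=True).item()
for key, value in params.items():
    # print the array shape where available, otherwise the value type
    print(key, getattr(value, 'shape', type(value)))
```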
Notes:
- The .obj files can be imported into Blender/Maya/3DS-MAX and rendered there.
- This script runs SMPLify and also needs a GPU (which can be specified with the --device flag).
- Important - Do not change the original .mp4 path before running the script.
Notes for 3D makers:
- You have two ways to animate the sequence:
  - Use the SMPL add-on and the theta parameters saved to sample##_rep##_smpl_params.npy (we always use beta=0 and the gender-neutral model).
  - A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe vertex locations. Since the OBJs do not preserve vertex order, we also save this data to the sample##_rep##_smpl_params.npy file for your convenience.
DoubleTake (long motions)
HumanML3D best model
Retraining on HumanML3D is not needed, as we use the original trained model from MDM. Yet, for completeness, this repository supports this training as well:
python -m train.train_mdm --save_dir save/my_humanML_bestmodel --dataset humanml
Babel best model
python -m train.train_mdm --save_dir ./save/my_Babel_TrasnEmb_GeoLoss --dataset babel --latent_dim 512 --batch_size 64 --diffusion_steps 1000 --num_steps 10000000 --min_seq_len 45 --max_seq_len 250 --lambda_rcxyz 1.0 --lambda_fc 1.0 --lambda_vel 1.0
ComMDM (two-person)
Text-to-Motion
Download the pretrained model for text-to-motion training from here and place it in ./save/. Then train with:
python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512/model000200000.pt --multi_train_mode text --multi_train_splits train,validation --save_dir ./save/my_pw3d_text
Prefix Completion
Download the pretrained model for prefix training from here and place it in ./save/. Then train with:
python -m train.train_mdm_multi --pretrained_path ./save/humanml_trans_enc_512_prefix_finetune/model000330000.pt --multi_train_mode prefix --save_dir ./save/my_pw3d_prefix --save_interval 10000
Finetuned Motion Control
Train a model for left wrist control from scratch on the HumanML3D dataset:
python -m train.train_mdm_motion_control --save_dir save/left_wrist_finetuned --dataset humanml --inpainting_mask left_wrist
Finetune a base model for left wrist control on the HumanML3D dataset. We advise setting --save_interval to 10,000 so that checkpoints are saved more frequently, since this is a finetune rather than training from scratch:
python -m train.train_mdm_motion_control --save_dir save/left_wrist_finetuned --dataset humanml --inpainting_mask left_wrist --resume_checkpoint save/humanml_trans_enc_512/model000200000.pt --save_interval 10_000
- Use --device to define the GPU id.
- Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.
- Add --eval_during_training to run a short evaluation for each saved checkpoint. This will slow down training but will give you better monitoring.
DoubleTake (long motions)
To reproduce the HumanML3D evaluation over the motion, run:
python -m eval.eval_multi --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_unfoldings 2 --handshake_size 20 --transition_margins 40 --eval_on motion --blend_len 10
To reproduce the HumanML3D evaluation over the transition, run:
python -m eval.eval_multi --model_path ./save/my_humanml_trans_enc_512/model000200000.pt --num_unfoldings 2 --handshake_size 20 --transition_margins 40 --eval_on transition --blend_len 10
To reproduce the BABEL evaluation over the motion, run:
python -m eval.eval_multi --model_path ./save/Babel_TrasnEmb_GeoLoss/model001250000.pt --num_unfoldings 2 --cropping_sampler --handshake_size 30 --transition_margins 40 --eval_on motion --blend_len 10
To reproduce the BABEL evaluation over the transition, run:
python -m eval.eval_multi --model_path ./save/Babel_TrasnEmb_GeoLoss/model001250000.pt --num_unfoldings 2 --cropping_sampler --handshake_size 30 --transition_margins 40 --eval_on transition --blend_len 10
ComMDM (two-person)
The reported evaluation for prefix completion is in ./save/pw3d_prefix/eval_prefix_pw3d_paper_results_000240000_wo_mm_1000samples.log.
To reproduce evaluation run:
python -m eval.eval_multi --model_path ./save/pw3d_prefix/model000240000.pt
Fine-tuned motion control
Evaluate the motion control models on the horizontal part of trajectories sampled from the test set of the HumanML3D dataset:
python -m eval.eval_finetuned_motion_control --model_path save/root_horizontal_finetuned/model000280000.pt --replication_times 10
This code should produce a file named eval_humanml_root_horizontal_finetuned_000280000_gscale2.5_mask_root_horizontal_wo_mm.log, or generally:
eval_humanml\_<model_name>\_gscale<guidance_free_scale>\_mask\_<name_of_control_features>_<evaluation_mode>.log
This code stands on the shoulders of giants. We want to thank the following projects that our code is based on:
MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, TEACH.
This code is distributed under an MIT LICENSE.
Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.