# MDM: Human Motion Diffusion Model

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanact12)](https://paperswithcode.com/sota/motion-synthesis-on-humanact12?p=human-motion-diffusion-model)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=human-motion-diffusion-model)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14916-b31b1b.svg)](https://arxiv.org/abs/2209.14916)

The official PyTorch implementation of the paper [**"Human Motion Diffusion Model"**](https://arxiv.org/abs/2209.14916).

Please visit our [**webpage**](https://guytevet.github.io/mdm-page/) for more details.

![teaser](https://github.com/GuyTevet/mdm-page/raw/main/static/figures/github.gif)

#### Bibtex
If you find this code useful in your research, please cite:

```
@article{tevet2022human,
  title={Human Motion Diffusion Model},
  author={Tevet, Guy and Raab, Sigal and Gordon, Brian and Shafir, Yonatan and Bermano, Amit H and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2209.14916},
  year={2022}
}
```

## Getting started

This code was tested on `Ubuntu 18.04.5 LTS` and requires:

* Python 3.7
* conda3 or miniconda3
* A CUDA-capable GPU (one is enough)

### 1. Setup environment

Install ffmpeg (if not already installed):

```shell
sudo apt update
sudo apt install ffmpeg
```

For Windows, use [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) instead.

Set up the conda environment:

```shell
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
```

Download the SMPL body model by running this script:

```bash
bash prepare/download_smpl_files.sh
```

This will download the SMPL neutral model from this [**GitHub repo**](https://github.com/classner/up/blob/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl) and additional files.
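
As a quick sanity check, you can list the downloaded files; this assumes the script places them under `./body_models` (consistent with the placeholder README for that directory added in this commit):

```shell
ls ./body_models
```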

### 2. Get data

There are two ways to get the data:

(a) **Take the easy way if** you just want to generate text-to-motion (this excludes editing, which does require motion capture data).

(b) **Get the full data** to train and evaluate the model.

#### a. The easy way (text only)

**HumanML3D** - Clone HumanML3D, then copy the data dir to our repository:

```shell
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
cd motion-diffusion-model
```

#### b. Full data (text + motion capture)

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),
then copy the resulting dataset to our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```

**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./dataset/KIT-ML`.

### 3. Download the pretrained models

Download the model(s) you wish to use, then unzip and place them in `./save/` (see the example below). **For text-to-motion, you only need the first one.**

**HumanML3D**

[humanml-encoder-512](https://drive.google.com/file/d/1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821/view?usp=sharing) (best model)

[humanml-decoder-512](https://drive.google.com/file/d/1q3soLadvVh7kJuJPd2cegMNY2xVuVudj/view?usp=sharing)

[humanml-decoder-with-emb-512](https://drive.google.com/file/d/1GnsW0K3UjuOkNkAWmjrGIUmeDDZrmPE5/view?usp=sharing)

**KIT**

[kit-encoder-512](https://drive.google.com/file/d/1SHCRcE0es31vkJMLGf9dyLe7YsWj7pNL/view?usp=sharing)
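
For example, assuming the downloaded archive expands to the `humanml_trans_enc_512` folder used by the commands below (the archive name itself is hypothetical):

```shell
mkdir -p save
unzip humanml-encoder-512.zip -d save/
ls save/humanml_trans_enc_512/  # should contain model000200000.pt
```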

## Generate text-to-motion

### Generate from test set prompts

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
```

### Generate from your text file

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
```

### Generate a single prompt

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
```

**You can also define** (see the combined example below):
* `--device` id.
* `--seed` to sample different prompts.
* `--motion_length` in seconds (maximum is 9.8 seconds).
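
A sketch combining these flags; the prompt, device id, seed, and length are arbitrary illustrative values:

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt \
    --text_prompt "a person waves with their right hand." \
    --device 0 --seed 42 --motion_length 6.0
```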

**Running those will get you:**

* `results.npy` - a file with the text prompts and xyz positions of the generated animations (see the loading sketch below).
* `sample##_rep##.mp4` - a stick-figure animation for each generated motion.
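
If you want to post-process the results programmatically, `results.npy` can be loaded with NumPy. A minimal sketch, assuming the file stores a pickled Python object (print the keys to discover the actual layout):

```python
import numpy as np

# allow_pickle=True because the file holds a Python object, not a plain array.
results = np.load('results.npy', allow_pickle=True).item()
print(results.keys())  # e.g. the text prompts and xyz positions described above
```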

It will look something like this:

![example](assets/example_stick_fig.gif)

You can stop here, or render the SMPL mesh using the following script.

### Render SMPL mesh

To create a SMPL mesh per frame, run:

```shell
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
```

**This script outputs:**
* `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces).
* `sample##_rep##_obj` - a mesh per frame in `.obj` format.

**Notes:**
* The `.obj` files can be imported into Blender/Maya/3DS-MAX and rendered there.
* This script runs [SMPLify](https://smplify.is.tue.mpg.de/) and also requires a GPU (which can be specified with the `--device` flag).
* **Important** - do not change the original `.mp4` path before running the script.

**Notes for 3D artists:**
* You have two ways to animate the sequence (see the loading sketch below):
  1. Use the [SMPL add-on](https://smpl.is.tue.mpg.de/index.html) and the theta parameters saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).
  2. A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe the vertex locations. Since the OBJs do not preserve vertex order, we also save this data to the `sample##_rep##_smpl_params.npy` file for your convenience.
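
A minimal loading sketch for option 2; the key names are assumptions based on the parameter list above, so print the keys to verify:

```python
import numpy as np

# 'sample00_rep00' instantiates the sample##_rep## naming pattern above.
params = np.load('sample00_rep00_smpl_params.npy', allow_pickle=True).item()
print(params.keys())

# Hypothetical keys, per the description above:
# vertices = params['vertices']  # per-frame vertices in a consistent order for keyframing
# faces = params['faces']        # shared SMPL topology
```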

### Editing

ETA - Nov 22

## Train your own MDM

ETA - end of Oct 22

## Evaluate

ETA - Nov 22

## Acknowledgments

This code stands on the shoulders of giants. We want to thank the following works that our code is based on:

[guided-diffusion](https://github.com/openai/guided-diffusion), [MotionCLIP](https://github.com/GuyTevet/MotionCLIP), [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [actor](https://github.com/Mathux/ACTOR), [joints2smpl](https://github.com/wangsen1312/joints2smpl).

## License
This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.

---

**New file** - a binary asset that cannot be rendered in this view (plausibly the `assets/example_stick_fig.gif` referenced above).

---

**New file** - `assets/example_text_prompts.txt`, the example prompts referenced above:

```
person got down and is crawling across the floor.
a person walks forward with wide steps.
a person drops their hands then brings them together in front of their face clasped.
a person lifts their right arm and slaps something, then repeats the motion again.
a person walks forward and stops.
a person marches forward, turns around, and then marches back.
a person is stretching their arms.
person is making attention gesture
```

---

**New file** - a placeholder README for the body models directory:

## Body models

Put SMPL models here (full instructions in the main README).

---

**New file** - a dataset-loading module (its imports suggest it lives in the `data_loaders` package):

```python
from torch.utils.data import DataLoader

from data_loaders.tensors import collate as all_collate
from data_loaders.tensors import t2m_collate


def get_dataset_class(name):
    """Map a dataset name to its class, importing lazily so that
    unused dataset dependencies are never loaded."""
    if name == "amass":
        from .amass import AMASS
        return AMASS
    elif name == "uestc":
        from .uestc import UESTC
        return UESTC
    elif name == "humanact12":
        from .humanact12poses import HumanAct12Poses
        return HumanAct12Poses
    elif name == "humanml":
        from data_loaders.humanml.data.dataset import HumanML3D
        return HumanML3D
    elif name == "kit":
        from data_loaders.humanml.data.dataset import KIT
        return KIT
    else:
        raise ValueError(f'Unsupported dataset name [{name}]')


def get_collate_fn(name, hml_mode='train'):
    # Ground-truth evaluation mode uses the text-to-motion evaluation collate;
    # HumanML3D/KIT training uses t2m_collate; everything else uses the default.
    if hml_mode == 'gt':
        from data_loaders.humanml.data.dataset import collate_fn as t2m_eval_collate
        return t2m_eval_collate
    if name in ["humanml", "kit"]:
        return t2m_collate
    else:
        return all_collate


def get_dataset(name, num_frames, split='train', hml_mode='train'):
    DATA = get_dataset_class(name)
    if name in ["humanml", "kit"]:
        dataset = DATA(split=split, num_frames=num_frames, mode=hml_mode)
    else:
        dataset = DATA(split=split, num_frames=num_frames)
    return dataset


def get_dataset_loader(name, batch_size, num_frames, split='train', hml_mode='train'):
    dataset = get_dataset(name, num_frames, split, hml_mode)
    collate = get_collate_fn(name, hml_mode)

    loader = DataLoader(
        dataset, batch_size=batch_size, shuffle=True,  # (split == 'train')
        num_workers=8, drop_last=True, collate_fn=collate
    )

    return loader
```
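
A usage sketch for the module above. The import path is inferred from the module's own imports, the batch size and frame count are arbitrary, and the batch structure is whatever the collate function in `data_loaders/tensors.py` produces:

```python
from data_loaders.get_data import get_dataset_loader  # path is an assumption

# Requires the HumanML3D files in ./dataset/HumanML3D, as described in the README above.
loader = get_dataset_loader(name='humanml', batch_size=32, num_frames=60)

batch = next(iter(loader))  # structure is defined by t2m_collate
```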

---

**New file** - an attribution note:

This code is based on https://github.com/EricGuo5513/text-to-motion.git