First commit - model + sampling
GuyTevet committed Oct 6, 2022
1 parent 9aad53c commit 14d0f79
Showing 53 changed files with 8,170 additions and 3 deletions.
195 changes: 192 additions & 3 deletions README.md
@@ -1,4 +1,193 @@
# motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
# MDM: Human Motion Diffusion Model

## Coming soon...
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanact12)](https://paperswithcode.com/sota/motion-synthesis-on-humanact12?p=human-motion-diffusion-model)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=human-motion-diffusion-model)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14916-b31b1b.svg)](https://arxiv.org/abs/2209.14916)

The official PyTorch implementation of the paper [**"Human Motion Diffusion Model"**](https://arxiv.org/abs/2209.14916).

Please visit our [**webpage**](https://guytevet.github.io/mdm-page/) for more details.

![teaser](https://github.com/GuyTevet/mdm-page/raw/main/static/figures/github.gif)

#### Bibtex
If you find this code useful in your research, please cite:

```
@article{tevet2022human,
  title={Human Motion Diffusion Model},
  author={Tevet, Guy and Raab, Sigal and Gordon, Brian and Shafir, Yonatan and Bermano, Amit H and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2209.14916},
  year={2022}
}
```

## Getting started

This code was tested on `Ubuntu 18.04.5 LTS` and requires:

* Python 3.7
* Anaconda3 or Miniconda3
* A CUDA-capable GPU (one is enough)

### 1. Setup environment

Install ffmpeg (if not already installed):

```shell
sudo apt update
sudo apt install ffmpeg
```
For Windows, use [this](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) instead.

Set up the conda environment:
```shell
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
```
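
To sanity-check the environment (a quick, optional test that only uses the packages installed above):

```shell
python -c "import torch, clip, spacy; spacy.load('en_core_web_sm'); print('CUDA available:', torch.cuda.is_available())"
```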

Download the SMPL body model by running this script:

```bash
bash prepare/download_smpl_files.sh
```
This downloads the SMPL neutral model from this [**GitHub repo**](https://github.com/classner/up/blob/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl) along with additional files.
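
You can confirm the files landed under `./body_models/` (the exact sub-directory layout is determined by the script):

```shell
ls -R body_models/
```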



### 2. Get data

There are two paths to get the data:

(a) **Go the easy way** if you just want to generate text-to-motion (this excludes editing, which does require motion capture data).

(b) **Get the full data** to train and evaluate the model.


#### a. The easy way (text only)

**HumanML3D** - Clone HumanML3D, then copy the data dir to our repository:

```shell
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
cd motion-diffusion-model
```
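
To verify the copy, the text annotations should now be in place (a quick check):

```shell
ls dataset/HumanML3D/texts | wc -l
```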


#### b. Full data (text + motion capture)

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),
then copy the resulting dataset into our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```

**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./dataset/KIT-ML`.
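
Either way, the datasets are expected directly under `./dataset/` (a rough sketch; the files inside each folder depend on the processing above):

```
dataset/
├── HumanML3D/
└── KIT-ML/
```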


### 3. Download the pretrained models

Download the model(s) you wish to use, then unzip and place them in `./save/`. **For text-to-motion, you need only the first one.**

**HumanML3D**

[humanml-encoder-512](https://drive.google.com/file/d/1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821/view?usp=sharing) (best model)

[humanml-decoder-512](https://drive.google.com/file/d/1q3soLadvVh7kJuJPd2cegMNY2xVuVudj/view?usp=sharing)

[humanml-decoder-with-emb-512](https://drive.google.com/file/d/1GnsW0K3UjuOkNkAWmjrGIUmeDDZrmPE5/view?usp=sharing)

**KIT**

[kit-encoder-512](https://drive.google.com/file/d/1SHCRcE0es31vkJMLGf9dyLe7YsWj7pNL/view?usp=sharing)
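
After unzipping into `./save/`, make sure the checkpoint path matches the one used in the commands below; for example, the generation commands assume the HumanML3D model can be found with:

```shell
ls ./save/humanml_trans_enc_512/model000200000.pt
```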

## Generate text-to-motion

### Generate from test set prompts

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
```

### Generate from your text file

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
```

### Generate a single prompt

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
```

**You can also define:**
* `--device` - GPU device id.
* `--seed` - to sample different prompts.
* `--motion_length` - motion length in seconds (maximum is 9.8 seconds).
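
For example, combining these flags (the values here are only illustrative):

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "a person jumps in place." --device 0 --seed 10 --motion_length 6
```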

**Running any of the above will produce:**

* `results.npy` - a file with the text prompts and the xyz positions of the generated animations.
* `sample##_rep##.mp4` - a stick-figure animation for each generated motion.

It will look something like this:

![example](assets/example_stick_fig.gif)
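
A minimal sketch for inspecting `results.npy` from Python (the file is assumed to hold a pickled dictionary, hence `allow_pickle`; the exact keys may differ):

```python
import numpy as np

# results.npy is assumed to store a dict with the prompts and xyz joint positions
results = np.load('results.npy', allow_pickle=True).item()
print(results.keys())
```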

You can stop here, or render the SMPL mesh using the following script.

### Render SMPL mesh

To create a SMPL mesh per frame, run:

```shell
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
```

**This script outputs:**
* `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces)
* `sample##_rep##_obj` - Mesh per frame in `.obj` format.

**Notes:**
* The `.obj` files can be imported into Blender/Maya/3DS-MAX and rendered there.
* This script runs [SMPLify](https://smplify.is.tue.mpg.de/) and also requires a GPU (which can be specified with the `--device` flag).
* **Important** - Do not change the original `.mp4` path before running the script.

**Notes for 3d makers:**
* You have two ways to animate the sequence:
1. Use the [SMPL add-on](https://smpl.is.tue.mpg.de/index.html) and the theta parameters saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).
1. A more straightforward way is to use the mesh data itself. All meshes share the same topology (SMPL), so you just need to keyframe vertex locations.
Since the OBJ files do not preserve vertex order, we also save this data to the `sample##_rep##_smpl_params.npy` file for your convenience (see the sketch below).
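
A minimal sketch for reading those saved parameters (the key names below are assumptions; print the keys to confirm what your file contains):

```python
import numpy as np

# 'sample00_rep00' is just an example file name; the key names are assumptions
params = np.load('sample00_rep00_smpl_params.npy', allow_pickle=True).item()
print(params.keys())            # expected: thetas, root translations, vertices, faces
verts = params.get('vertices')  # per-frame vertex positions, if present
```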

### Editing

ETA - Nov 22

## Train your own MDM

ETA - end of Oct 22

## Evaluate

ETA - Nov 22



## Acknowledgments

This code is standing on the shoulders of giants. We thank the following works on which our code is based:

[guided-diffusion](https://github.com/openai/guided-diffusion), [MotionCLIP](https://github.com/GuyTevet/MotionCLIP), [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [actor](https://github.com/Mathux/ACTOR), [joints2smpl](https://github.com/wangsen1312/joints2smpl).

## License
This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries (including CLIP, SMPL, SMPL-X, and PyTorch3D) and uses datasets, each of which has its own license that must also be followed.
Binary file added assets/example_stick_fig.gif
8 changes: 8 additions & 0 deletions assets/example_text_prompts.txt
@@ -0,0 +1,8 @@
person got down and is crawling across the floor.
a person walks forward with wide steps.
a person drops their hands then brings them together in front of their face clasped.
a person lifts their right arm and slaps something, then repeats the motion again.
a person walks forward and stops.
a person marches forward, turns around, and then marches back.
a person is stretching their arms.
person is making attention gesture
3 changes: 3 additions & 0 deletions body_models/README.md
@@ -0,0 +1,3 @@
## Body models

Put SMPL models here (full instructions in the main README).
52 changes: 52 additions & 0 deletions data_loaders/get_data.py
@@ -0,0 +1,52 @@
from torch.utils.data import DataLoader
from data_loaders.tensors import collate as all_collate
from data_loaders.tensors import t2m_collate

def get_dataset_class(name):
    """Map a dataset name to its dataset class."""
    if name == "amass":
        from .amass import AMASS
        return AMASS
    elif name == "uestc":
        from .uestc import UESTC
        return UESTC
    elif name == "humanact12":
        from .humanact12poses import HumanAct12Poses
        return HumanAct12Poses
    elif name == "humanml":
        from data_loaders.humanml.data.dataset import HumanML3D
        return HumanML3D
    elif name == "kit":
        from data_loaders.humanml.data.dataset import KIT
        return KIT
    else:
        raise ValueError(f'Unsupported dataset name [{name}]')

def get_collate_fn(name, hml_mode='train'):
    """Pick the collate function matching the dataset name and HumanML mode."""
    if hml_mode == 'gt':
        from data_loaders.humanml.data.dataset import collate_fn as t2m_eval_collate
        return t2m_eval_collate
    if name in ["humanml", "kit"]:
        return t2m_collate
    else:
        return all_collate


def get_dataset(name, num_frames, split='train', hml_mode='train'):
    """Instantiate the dataset; HumanML3D/KIT also take the hml_mode argument."""
    DATA = get_dataset_class(name)
    if name in ["humanml", "kit"]:
        dataset = DATA(split=split, num_frames=num_frames, mode=hml_mode)
    else:
        dataset = DATA(split=split, num_frames=num_frames)
    return dataset


def get_dataset_loader(name, batch_size, num_frames, split='train', hml_mode='train'):
    """Build a DataLoader with the matching dataset and collate function."""
    dataset = get_dataset(name, num_frames, split, hml_mode)
    collate = get_collate_fn(name, hml_mode)

    loader = DataLoader(
        dataset, batch_size=batch_size, shuffle=True,  # (split == 'train'),
        num_workers=8, drop_last=True, collate_fn=collate
    )

    return loader
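
# Example usage (illustrative; assumes the HumanML3D data is already under ./dataset/HumanML3D
# and that this module is imported from the repository root):
#
#   from data_loaders.get_data import get_dataset_loader
#   loader = get_dataset_loader(name='humanml', batch_size=64, num_frames=196)
#   batch = next(iter(loader))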
1 change: 1 addition & 0 deletions data_loaders/humanml/README.md
@@ -0,0 +1 @@
This code is based on https://github.com/EricGuo5513/text-to-motion.git