# MDM: Human Motion Diffusion Model

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanact12)](https://paperswithcode.com/sota/motion-synthesis-on-humanact12?p=human-motion-diffusion-model)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/human-motion-diffusion-model/motion-synthesis-on-humanml3d)](https://paperswithcode.com/sota/motion-synthesis-on-humanml3d?p=human-motion-diffusion-model)
[![arXiv](https://img.shields.io/badge/arXiv-2209.14916-b31b1b.svg)](https://arxiv.org/abs/2209.14916)

The official PyTorch implementation of the paper [**"Human Motion Diffusion Model"**](https://arxiv.org/abs/2209.14916).

Please visit our [**webpage**](https://guytevet.github.io/mdm-page/) for more details.

![teaser](https://github.com/GuyTevet/mdm-page/raw/main/static/figures/github.gif)

#### Bibtex
If you find this code useful in your research, please cite:

```
@article{tevet2022human,
  title={Human Motion Diffusion Model},
  author={Tevet, Guy and Raab, Sigal and Gordon, Brian and Shafir, Yonatan and Bermano, Amit H and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2209.14916},
  year={2022}
}
```

## Getting started

This code was tested on `Ubuntu 18.04.5 LTS` and requires:

* Python 3.7
* conda3 or miniconda3
* A CUDA-capable GPU (one is enough)

### 1. Setup environment

Install ffmpeg (if not already installed):

```shell
sudo apt update
sudo apt install ffmpeg
```

For Windows, use [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) instead.

Set up the conda environment:

```shell
conda env create -f environment.yml
conda activate mdm
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
```

Download the SMPL body model by running this script:

```bash
bash prepare/download_smpl_files.sh
```

This will download the SMPL neutral model from this [**GitHub repo**](https://github.com/classner/up/blob/master/models/3D/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl) and additional files.
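
As a quick sanity check, you can list the downloaded files; this assumes the script places them under `./body_models` (consistent with the placeholder README for that directory added in this commit):

```shell
ls ./body_models
```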

### 2. Get data

There are two ways to get the data:

(a) **Take the easy way if** you just want to generate text-to-motion (this excludes editing, which does require motion capture data).

(b) **Get the full data** to train and evaluate the model.

#### a. The easy way (text only)

**HumanML3D** - Clone HumanML3D, then copy the data dir to our repository:

```shell
cd ..
git clone https://github.com/EricGuo5513/HumanML3D.git
unzip ./HumanML3D/HumanML3D/texts.zip -d ./HumanML3D/HumanML3D/
cp -r HumanML3D/HumanML3D motion-diffusion-model/dataset/HumanML3D
cd motion-diffusion-model
```

#### b. Full data (text + motion capture)

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git),
then copy the resulting dataset to our repository:

```shell
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```

**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git) (no processing needed this time) and place the result in `./dataset/KIT-ML`.

### 3. Download the pretrained models

Download the model(s) you wish to use, then unzip and place them in `./save/` (see the example below). **For text-to-motion, you only need the first one.**

**HumanML3D**

[humanml-encoder-512](https://drive.google.com/file/d/1PE0PK8e5a5j-7-Xhs5YET5U5pGh0c821/view?usp=sharing) (best model)

[humanml-decoder-512](https://drive.google.com/file/d/1q3soLadvVh7kJuJPd2cegMNY2xVuVudj/view?usp=sharing)

[humanml-decoder-with-emb-512](https://drive.google.com/file/d/1GnsW0K3UjuOkNkAWmjrGIUmeDDZrmPE5/view?usp=sharing)

**KIT**

[kit-encoder-512](https://drive.google.com/file/d/1SHCRcE0es31vkJMLGf9dyLe7YsWj7pNL/view?usp=sharing)
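
For example, assuming the downloaded archive expands to the `humanml_trans_enc_512` folder used by the commands below (the archive name itself is hypothetical):

```shell
mkdir -p save
unzip humanml-encoder-512.zip -d save/
ls save/humanml_trans_enc_512/  # should contain model000200000.pt
```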

## Generate text-to-motion

### Generate from test set prompts

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --num_samples 10 --num_repetitions 3
```

### Generate from your text file

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --input_text ./assets/example_text_prompts.txt
```

### Generate a single prompt

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt --text_prompt "the person walked forward and is picking up his toolbox."
```

**You can also define** (see the combined example below):
* `--device` id.
* `--seed` to sample different prompts.
* `--motion_length` in seconds (maximum is 9.8 seconds).
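
A sketch combining these flags; the prompt, device id, seed, and length are arbitrary illustrative values:

```shell
python -m sample --model_path ./save/humanml_trans_enc_512/model000200000.pt \
    --text_prompt "a person waves with their right hand." \
    --device 0 --seed 42 --motion_length 6.0
```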

**Running those will get you:**

* `results.npy` - a file with the text prompts and xyz positions of the generated animations (see the loading sketch below).
* `sample##_rep##.mp4` - a stick-figure animation for each generated motion.
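
If you want to post-process the results programmatically, `results.npy` can be loaded with NumPy. A minimal sketch, assuming the file stores a pickled Python object (print the keys to discover the actual layout):

```python
import numpy as np

# allow_pickle=True because the file holds a Python object, not a plain array.
results = np.load('results.npy', allow_pickle=True).item()
print(results.keys())  # e.g. the text prompts and xyz positions described above
```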

It will look something like this:

![example](assets/example_stick_fig.gif)

You can stop here, or render the SMPL mesh using the following script.

### Render SMPL mesh

To create a SMPL mesh per frame, run:

```shell
python -m visualize.render_mesh --input_path /path/to/mp4/stick/figure/file
```

**This script outputs:**
* `sample##_rep##_smpl_params.npy` - SMPL parameters (thetas, root translations, vertices and faces).
* `sample##_rep##_obj` - a mesh per frame in `.obj` format.

**Notes:**
* The `.obj` files can be imported into Blender/Maya/3DS-MAX and rendered there.
* This script runs [SMPLify](https://smplify.is.tue.mpg.de/) and also requires a GPU (which can be specified with the `--device` flag).
* **Important** - do not change the original `.mp4` path before running the script.

**Notes for 3D artists:**
* You have two ways to animate the sequence (see the loading sketch below):
  1. Use the [SMPL add-on](https://smpl.is.tue.mpg.de/index.html) and the theta parameters saved to `sample##_rep##_smpl_params.npy` (we always use beta=0 and the gender-neutral model).
  2. A more straightforward way is to use the mesh data itself. All meshes have the same topology (SMPL), so you just need to keyframe the vertex locations. Since the OBJs do not preserve vertex order, we also save this data to the `sample##_rep##_smpl_params.npy` file for your convenience.
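
A minimal loading sketch for option 2; the key names are assumptions based on the parameter list above, so print the keys to verify:

```python
import numpy as np

# 'sample00_rep00' instantiates the sample##_rep## naming pattern above.
params = np.load('sample00_rep00_smpl_params.npy', allow_pickle=True).item()
print(params.keys())

# Hypothetical keys, per the description above:
# vertices = params['vertices']  # per-frame vertices in a consistent order for keyframing
# faces = params['faces']        # shared SMPL topology
```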

### Editing

ETA - Nov 22

## Train your own MDM

ETA - end of Oct 22

## Evaluate

ETA - Nov 22

## Acknowledgments

This code stands on the shoulders of giants. We want to thank the following works that our code is based on:

[guided-diffusion](https://github.com/openai/guided-diffusion), [MotionCLIP](https://github.com/GuyTevet/MotionCLIP), [text-to-motion](https://github.com/EricGuo5513/text-to-motion), [actor](https://github.com/Mathux/ACTOR), [joints2smpl](https://github.com/wangsen1312/joints2smpl).

## License
This code is distributed under an [MIT LICENSE](LICENSE).

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, and PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.

---

**New file** - a binary asset that cannot be rendered in this view (plausibly the `assets/example_stick_fig.gif` referenced above).

---

**New file** - `assets/example_text_prompts.txt`, the example prompts referenced above:

```
person got down and is crawling across the floor.
a person walks forward with wide steps.
a person drops their hands then brings them together in front of their face clasped.
a person lifts their right arm and slaps something, then repeats the motion again.
a person walks forward and stops.
a person marches forward, turns around, and then marches back.
a person is stretching their arms.
person is making attention gesture
```

---

**New file** - a placeholder README for the body models directory:

## Body models

Put SMPL models here (full instructions in the main README).

---

**New file** - a dataset-loading module (its imports suggest it lives in the `data_loaders` package):

```python
from torch.utils.data import DataLoader

from data_loaders.tensors import collate as all_collate
from data_loaders.tensors import t2m_collate


def get_dataset_class(name):
    """Map a dataset name to its class, importing lazily so that
    unused dataset dependencies are never loaded."""
    if name == "amass":
        from .amass import AMASS
        return AMASS
    elif name == "uestc":
        from .uestc import UESTC
        return UESTC
    elif name == "humanact12":
        from .humanact12poses import HumanAct12Poses
        return HumanAct12Poses
    elif name == "humanml":
        from data_loaders.humanml.data.dataset import HumanML3D
        return HumanML3D
    elif name == "kit":
        from data_loaders.humanml.data.dataset import KIT
        return KIT
    else:
        raise ValueError(f'Unsupported dataset name [{name}]')


def get_collate_fn(name, hml_mode='train'):
    # Ground-truth evaluation mode uses the text-to-motion evaluation collate;
    # HumanML3D/KIT training uses t2m_collate; everything else uses the default.
    if hml_mode == 'gt':
        from data_loaders.humanml.data.dataset import collate_fn as t2m_eval_collate
        return t2m_eval_collate
    if name in ["humanml", "kit"]:
        return t2m_collate
    else:
        return all_collate


def get_dataset(name, num_frames, split='train', hml_mode='train'):
    DATA = get_dataset_class(name)
    if name in ["humanml", "kit"]:
        dataset = DATA(split=split, num_frames=num_frames, mode=hml_mode)
    else:
        dataset = DATA(split=split, num_frames=num_frames)
    return dataset


def get_dataset_loader(name, batch_size, num_frames, split='train', hml_mode='train'):
    dataset = get_dataset(name, num_frames, split, hml_mode)
    collate = get_collate_fn(name, hml_mode)

    loader = DataLoader(
        dataset, batch_size=batch_size, shuffle=True,  # (split == 'train')
        num_workers=8, drop_last=True, collate_fn=collate
    )

    return loader
```
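
A usage sketch for the module above. The import path is inferred from the module's own imports, the batch size and frame count are arbitrary, and the batch structure is whatever the collate function in `data_loaders/tensors.py` produces:

```python
from data_loaders.get_data import get_dataset_loader  # path is an assumption

# Requires the HumanML3D files in ./dataset/HumanML3D, as described in the README above.
loader = get_dataset_loader(name='humanml', batch_size=32, num_frames=60)

batch = next(iter(loader))  # structure is defined by t2m_collate
```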

---

**New file** - an attribution note:

This code is based on https://github.com/EricGuo5513/text-to-motion.git