Commit 0a8d780

Merge pull request adobe-research#5 from yzhou359/main
Merge updates from Yang's forked repo
2 parents: f2eaeea + 1e2041a (commit 0a8d780)

File tree (92 files changed: +6815, -1300 lines)

.idea/.gitignore (+2)

.idea/MakeItTalk.iml (+14)

.idea/deployment.xml (+22)

.idea/inspectionProfiles/profiles_settings.xml (+6)

.idea/misc.xml (+7)

.idea/modules.xml (+8)

.idea/remote-mappings.xml (+16)

.idea/vcs.xml (+6)

README.md (+124, -44)
@@ -1,61 +1,141 @@
-# MakeItTalk
+# MakeItTalk: Speaker-Aware Talking-Head Animation

+This is the code repository implementing the paper:

-## Required packages
-- ffmpeg and ffmpeg-python (version >= 0.2.0)
-- pynormalize
-- pytorch
+> **MakeItTalk: Speaker-Aware Talking-Head Animation**
+>
+> [Yang Zhou](https://people.umass.edu/~yangzhou),
+> [Xintong Han](http://users.umiacs.umd.edu/~xintong/),
+> [Eli Shechtman](https://research.adobe.com/person/eli-shechtman),
+> [Jose Echevarria](http://www.jiechevarria.com),
+> [Evangelos Kalogerakis](https://people.cs.umass.edu/~kalo/),
+> [Dingzeyu Li](https://dingzeyu.li)
+>
+> SIGGRAPH Asia 2020
+>
+> **Abstract** We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures.
+In addition, our method generalizes well to faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.
+>
+> [[Project page]](https://people.umass.edu/~yangzhou/MakeItTalk/)
+> [[Paper]](https://people.umass.edu/~yangzhou/MakeItTalk/MakeItTalk_SIGGRAPH_Asia_Final_round-5.pdf)
+> [[Video]](https://www.youtube.com/watch?v=OU6Ctzhpc6s)
+> [[Arxiv]](https://arxiv.org/abs/2004.12992)
+> [[Colab Demo]](quick_demo.ipynb)
+> [[Colab Demo TL;DR]](quick_demo_tdlr.ipynb)

+![img](doc/teaser.png)

-## How to use
+Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right).
+Neither the speech signal nor the input face image is observed during model training.
+Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

-### Step 1. Git clone
+## Updates

-### Step 2. Create root directory
-- create root directory ```ROOT_DIR```
-- add sub folders ```ckpt```, ```dump```, ```nn_result```, ```puppets```, ```raw_wav```, ```test_wav_files``` to it.
-
-### Step 3. Import pre-trained model and demo Wilk files
-- put pre-trained face expression model under ```ROOT_DIR/ckpt/BEST_CONTENT_MODEL/ckpt_best_model.pth```
-- put pre-trained face pose model under ```ROOT_DIR/ckpt/BEST_POSE_MODEL/ckpt_last_epoch.pth```
-- put wilk demo files ```wilk_face_close_mouth.txt``` and ```wilk_face_open_mouth.txt``` under ```puppets```
+- [x] Pre-trained models
+- [x] Google colab quick demo for natural faces [[detail]](quick_demo.ipynb) [[TL;DR]](quick_demo_tdlr.ipynb)
+- [ ] Training code for each module
+- [ ] Customized puppet creation tool

-### Step 4. Import your test audio wav file
-- put your test audio file like ```example.wav``` under ```test_wav_files``` folder
-
-### Step 5. Run Talking Toon model
-- change the ```ROOT_DIR``` in ```main_sneak_demo.py``` line 10 to your own ```ROOT_DIR```
-- run
+## Requirements
+- Python 3.6 environment
```
-python main_sneak_demo.py
+conda create -n makeittalk_env python=3.6
+conda activate makeittalk_env
```
-- its process has 3 steps in detail:
-  - create input data for network from your test audio file
-  - run Talking Toon neural network to get the predicted facial landmarks
-  - post process output files into real image scale for later image morphing
-
-- its outputs are under ```ROOT_DIR/nn_result/sneak_demo```
-  - raw facial landmark prediction visualization mp4 file, i.e. ```*_pos_EVAL_av.mp4```
-  - a folder with your test audio name, containing 3 required files for later image morphing
-    - ```reference_points.txt```
-    - ```triangulation.txt```
-    - ```warped_points.txt```
-
-### Step 6. Image morphing (through Jakub's code)
-- rebuild Jakub's code with my updated ```dingwarp.cpp```
-- copy 3 required files to Jakub's code directory ```dingwarp/test/```
-- run ```test_win.bat``` or do with normal cmd commands.
-- run ``final_ffmpeg_combine.bat`` like this
+- ffmpeg (https://ffmpeg.org/download.html)
```
->> final_ffmpeg_combine.bat [YOUR_TEST_AUDIO_FILE_DIR] [OUTPUT_VIDEO_NAME]
+sudo apt-get install ffmpeg
```
-for example
+- python packages
```
->> final_ffmpeg_combine.bat E:\TalkingToon\test_wav_files\example.wav output.mp4
+pip install -r requirements.txt
```

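Taken together, the Requirements above amount to the following setup, shown here as one consolidated sketch assembled from the commands in this section (the `apt-get` line assumes a Debian/Ubuntu system; other platforms can install ffmpeg from the link above):

```
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
sudo apt-get install ffmpeg       # Linux; see https://ffmpeg.org/download.html otherwise
pip install -r requirements.txt   # run from the repository root
```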
+## Pre-trained Models
+
+Download the following pre-trained models to the `examples/ckpt` folder for testing your own animation.
+
+| Model | Link to the model |
+| :-------------: | :---------------: |
+| Voice Conversion | [Link](https://drive.google.com/file/d/1ZiwPp_h62LtjU0DwpelLUoodKPR85K7x/view?usp=sharing) |
+| Speech Content Module | [Link](https://drive.google.com/file/d/1r3bfEvTVl6pCNw5xwUhEglwDHjWtAqQp/view?usp=sharing) |
+| Speaker-aware Module | [Link](https://drive.google.com/file/d/1rV0jkyDqPW-aDJcj7xSO6Zt1zSXqn1mu/view?usp=sharing) |
+| Image2Image Translation Module | [Link](https://drive.google.com/file/d/1i2LJXKp-yWKIEEgJ7C6cE3_2NirfY_0a/view?usp=sharing) |
+| Non-photorealistic Warping (.exe) | [Link](https://drive.google.com/file/d/1rlj0PAUMdX8TLuywsn6ds_G6L63nAu0P/view?usp=sharing) |
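If you prefer to script these downloads, one option is the `gdown` package (an assumption, not part of this repo; `pip install gdown` first). The file id below is taken from the Voice Conversion link in the table, and the output filename is a hypothetical placeholder:

```
mkdir -p examples/ckpt
# Voice Conversion checkpoint; repeat with the other file ids from the table above
gdown -O examples/ckpt/voice_conversion.pth "https://drive.google.com/uc?id=1ZiwPp_h62LtjU0DwpelLUoodKPR85K7x"
```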
+## Animate Your Portraits!
+
+- Download the pre-trained embedding [[here]](https://drive.google.com/file/d/18-0CYl5E6ungS3H4rRSHjfYvvm-WwjTI/view?usp=sharing) and save it to the `examples/dump` folder.
+
+### _Natural Human Faces / Paintings_
+
+- crop your portrait image to size `256x256` and put it under the `examples` folder in `.jpg` format.
+Make sure the head is roughly centered (check the existing examples for reference; see the cropping sketch after this section).
+
+- put test audio files under the `examples` folder as well, in `.wav` format.
+
+- animate!
+
+```
+python main_end2end.py --jpg <portrait_file>
+```
+
+- use additional args `--amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos>`
+to amplify lip motion (in the x/y-axis directions) and head motion displacements; default values are `<x>=2., <y>=2., <pos>=.5`
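For concreteness, here is a minimal sketch of the natural-face workflow described above. `my_photo.jpg` and `my_speech.wav` are hypothetical filenames, it is assumed the script picks up `<portrait_file>` from the `examples` folder as the steps suggest, and the amplification values are just the documented defaults spelled out:

```
# center-crop to a square, then scale to 256x256 (ffmpeg is already a requirement)
ffmpeg -i my_photo.jpg -vf "crop=min(iw\,ih):min(iw\,ih),scale=256:256" examples/portrait.jpg
cp my_speech.wav examples/

# animate with the default amplification values made explicit
python main_end2end.py --jpg portrait.jpg --amp_lip_x 2. --amp_lip_y 2. --amp_pos .5
```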
+### _Cartoon Faces_
+
+- put test audio files under the `examples` folder as well, in `.wav` format.
+
+- animate one of the existing puppets (see the usage sketch after this section)
+
+| Puppet Name | wilk | roy | sketch | color | cartoonM | danbooru1 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| Image | ![img](examples_cartoon/wilk_fullbody.jpg) | ![img](examples_cartoon/roy_full.png) | ![img](examples_cartoon/sketch.png) | ![img](examples_cartoon/color.jpg) | ![img](examples_cartoon/cartoonM.png) | ![img](examples_cartoon/danbooru1.jpg) |
+
+```
+python main_end2end_cartoon.py --jpg <cartoon_puppet_name>
+```
+
+- create your own puppets (ToDo...)
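As a usage sketch, animating one of the bundled puppets might look like the following. It is an assumption here that the bare puppet name from the table is what `--jpg` expects; check the script's argument parsing if it wants an image filename instead:

```
# test audio already placed under examples/ as described above
python main_end2end_cartoon.py --jpg wilk
```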
+## Train
+
+### Train Voice Conversion Module
+Todo...
+
+### Train Content Branch
+- Create dataset root directory `<root_dir>`
+
+- Dataset: Download the preprocessed dataset [[here]](https://drive.google.com/drive/folders/1EwuAy3j1b9Zc1MsidUfxG_pJGc_cV60O?usp=sharing) and put it under `<root_dir>/dump`.
+
+- Train script: Run the script below. Models will be saved in `<root_dir>/ckpt/<train_instance_name>`.
+
+```shell
+python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>
+```
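A concrete invocation might look like this (a sketch; `/data/makeittalk` and `my_run` are hypothetical placeholders for `<root_dir>` and `<train_instance_name>`):

```
# expects the preprocessed dataset under /data/makeittalk/dump;
# checkpoints are written to /data/makeittalk/ckpt/my_run
python main_train_content.py --train --write --root_dir /data/makeittalk --name my_run
```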
+### Train Speaker-Aware Branch
+Todo...
+
+### Train Image-to-Image Translation
+
+Todo...
+
+## [License](LICENSE.md)
+
+## Acknowledgement

-# [License](LICENSE.md)
+We would like to thank Timothy Langlois for the narration, and
+[Kaizhi Qian](https://scholar.google.com/citations?user=uEpr4C4AAAAJ&hl=en)
+for the help with the [voice conversion module](https://auspicious3000.github.io/icassp-2020-demo/). We
+thank Daichi Ito for sharing the caricature image and Dave Werner
+for Wilk, the gruff but ultimately lovable puppet.

+This research is partially funded by NSF (EAGER-1942069)
+and a gift from Adobe. Our experiments were performed in the
+UMass GPU cluster obtained under the Collaborative Fund managed
+by the MassTech Collaborative.

cartoon_vis.py (-104, file deleted)

doc/__init__.py (whitespace-only changes)

doc/teaser.png (898 KB)

examples_cartoon/danbooru1.jpg (733 KB)

examples_cartoon/danbooru1_anno.jpg (698 KB)