Commit b70112f: Update README.md (1 parent a76e266). 1 file changed: README.md, +1 / -153 lines. After this commit the README keeps only the title and a `python demo.py` snippet; the content removed by the commit follows.

# MakeItTalk: Speaker-Aware Talking-Head Animation

This is the code repository implementing the paper:

> **MakeItTalk: Speaker-Aware Talking-Head Animation**
>
> [Yang Zhou](https://people.umass.edu/~yangzhou),
> [Xintong Han](http://users.umiacs.umd.edu/~xintong/),
> [Eli Shechtman](https://research.adobe.com/person/eli-shechtman),
> [Jose Echevarria](http://www.jiechevarria.com),
> [Evangelos Kalogerakis](https://people.cs.umass.edu/~kalo/),
> [Dingzeyu Li](https://dingzeyu.li)
>
> SIGGRAPH Asia 2020
>
> **Abstract** We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures.
> In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.
>
> [[Project page]](https://people.umass.edu/~yangzhou/MakeItTalk/)
> [[Paper]](https://people.umass.edu/~yangzhou/MakeItTalk/MakeItTalk_SIGGRAPH_Asia_Final_round-5.pdf)
> [[Video]](https://www.youtube.com/watch?v=OU6Ctzhpc6s)
> [[Arxiv]](https://arxiv.org/abs/2004.12992)
> [[Colab Demo]](quick_demo.ipynb)
> [[Colab Demo TDLR]](quick_demo_tdlr.ipynb)

![img](doc/teaser.png)

Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right).
Neither the speech signal nor the input face image is observed during model training.
Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

## Updates

- [x] facewarp source code and compile instructions
- [x] Pre-trained models
- [x] Google Colab quick demo for natural faces [[detail]](quick_demo.ipynb) [[TDLR]](quick_demo_tdlr.ipynb)
- [ ] Training code for each module
- [ ] Customized puppet creation tool

## Requirements
- Python environment 3.6
```
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
```
- ffmpeg (https://ffmpeg.org/download.html)
```
sudo apt-get install ffmpeg
```
- Python packages
```
pip install -r requirements.txt
```
- `winehq-stable` for cartoon face warping on Ubuntu (https://wiki.winehq.org/Ubuntu). Tested on Ubuntu 16.04, wine==5.0.3.
```
sudo dpkg --add-architecture i386
wget -nc https://dl.winehq.org/wine-builds/winehq.key
sudo apt-key add winehq.key
sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
sudo apt update
sudo apt install --install-recommends winehq-stable
```

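A quick, optional sanity check that the environment matches what the repository reports testing with:

```
conda activate makeittalk_env
python --version            # expect Python 3.6.x
ffmpeg -version | head -n 1
wine --version              # expect wine-5.0.3 (only needed for cartoon face warping)
```
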
## Pre-trained Models

Download the following pre-trained models to the `examples/ckpt` folder for testing your own animation.

| Model | Link to the model |
| :-------------: | :---------------: |
| Voice Conversion | [Link](https://drive.google.com/file/d/1ZiwPp_h62LtjU0DwpelLUoodKPR85K7x/view?usp=sharing) |
| Speech Content Module | [Link](https://drive.google.com/file/d/1r3bfEvTVl6pCNw5xwUhEglwDHjWtAqQp/view?usp=sharing) |
| Speaker-aware Module | [Link](https://drive.google.com/file/d/1rV0jkyDqPW-aDJcj7xSO6Zt1zSXqn1mu/view?usp=sharing) |
| Image2Image Translation Module | [Link](https://drive.google.com/file/d/1i2LJXKp-yWKIEEgJ7C6cE3_2NirfY_0a/view?usp=sharing) |
| Non-photorealistic Warping (.exe) | [Link](https://drive.google.com/file/d/1rlj0PAUMdX8TLuywsn6ds_G6L63nAu0P/view?usp=sharing) |

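If you prefer the command line, one option is the third-party `gdown` tool (not part of this repo); the sketch below reuses the Google Drive file IDs from the table above and keeps each file's original name:

```
pip install gdown
mkdir -p examples/ckpt && cd examples/ckpt
gdown 'https://drive.google.com/uc?id=1ZiwPp_h62LtjU0DwpelLUoodKPR85K7x'   # Voice Conversion
gdown 'https://drive.google.com/uc?id=1r3bfEvTVl6pCNw5xwUhEglwDHjWtAqQp'   # Speech Content Module
gdown 'https://drive.google.com/uc?id=1rV0jkyDqPW-aDJcj7xSO6Zt1zSXqn1mu'   # Speaker-aware Module
gdown 'https://drive.google.com/uc?id=1i2LJXKp-yWKIEEgJ7C6cE3_2NirfY_0a'   # Image2Image Translation Module
gdown 'https://drive.google.com/uc?id=1rlj0PAUMdX8TLuywsn6ds_G6L63nAu0P'   # Non-photorealistic Warping (.exe)
cd ../..
```
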
## Animate Your Portraits!

- Download the pre-trained embedding [[here]](https://drive.google.com/file/d/18-0CYl5E6ungS3H4rRSHjfYvvm-WwjTI/view?usp=sharing) and save it to the `examples/dump` folder.

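Assuming `gdown` again, this step could be scripted as:

```
mkdir -p examples/dump && cd examples/dump
gdown 'https://drive.google.com/uc?id=18-0CYl5E6ungS3H4rRSHjfYvvm-WwjTI'   # pre-trained embedding
cd ../..
```
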
### _Natural Human Faces / Paintings_

- crop your portrait image to size `256x256` and put it under the `examples` folder in `.jpg` format.
Make sure the head is roughly centered (check the existing examples for reference).

- put test audio files under the `examples` folder as well, in `.wav` format.

- animate! (a full end-to-end sketch follows this list)

```
python main_end2end.py --jpg <portrait_file>
```

- use the additional args `--amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos>`
to amplify lip motion (in the x/y-axis directions) and head motion displacements; default values are `<x>=2., <y>=2., <pos>=.5`.

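Putting the steps above together, a possible end-to-end run looks like this; `my_photo.jpg` and `my_speech.wav` are placeholder file names, and the ffmpeg call is just one way to get a centered `256x256` crop:

```
# center-crop and resize your photo to 256x256, saved under examples/
ffmpeg -i my_photo.jpg -vf "crop='min(iw,ih)':'min(iw,ih)',scale=256:256" examples/portrait.jpg

# place your speech audio under examples/ in .wav format
cp my_speech.wav examples/

# animate, with slightly stronger lip motion than the defaults
python main_end2end.py --jpg portrait.jpg --amp_lip_x 2.5 --amp_lip_y 2.5 --amp_pos 0.5
```
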
### _Cartoon Faces_

- put test audio files under the `examples` folder as well, in `.wav` format.

- animate one of the existing puppets (a concrete example follows this list)

| Puppet Name | wilk | roy | sketch | color | cartoonM | danbooru1 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Image | ![img](examples_cartoon/wilk_fullbody.jpg) | ![img](examples_cartoon/roy_full.png) | ![img](examples_cartoon/sketch.png) | ![img](examples_cartoon/color.jpg) | ![img](examples_cartoon/cartoonM.png) | ![img](examples_cartoon/danbooru1.jpg) |

```
python main_end2end_cartoon.py --jpg <cartoon_puppet_name_with_extension> --jpg_bg <puppet_background_with_extension>
```

- `--jpg_bg` takes a same-size image used as the background of the animation, e.g. the puppet's body or an overall fixed backdrop. If you want to use a background, make sure the puppet face image (i.e. the `--jpg` image) is in `png` format and transparent outside the face area. If you don't need a background, supply a same-size placeholder image (e.g. a pure white image) to fill the argument.

- use the additional args `--amp_lip_x <x> --amp_lip_y <y> --amp_pos <pos>`
to amplify lip motion (in the x/y-axis directions) and head motion displacements; default values are `<x>=2., <y>=2., <pos>=.5`.

- create your own puppets (ToDo...)

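As a concrete instance, animating the bundled `wilk` puppet might look like the following; `wilk_face.png` is a hypothetical name for the transparent face image (check `examples_cartoon/` for the actual file names), while `wilk_fullbody.jpg` is the background shown in the table above:

```
# hypothetical face image name -- substitute the actual transparent .png from examples_cartoon/
python main_end2end_cartoon.py --jpg wilk_face.png --jpg_bg wilk_fullbody.jpg \
    --amp_lip_x 2. --amp_lip_y 2. --amp_pos 1.
```
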
## Train

### Train Voice Conversion Module
Todo...

### Train Content Branch
- Create a dataset root directory `<root_dir>`.

- Dataset: Download the preprocessed dataset [[here]](https://drive.google.com/drive/folders/1EwuAy3j1b9Zc1MsidUfxG_pJGc_cV60O?usp=sharing), and put it under `<root_dir>/dump`.

- Train script: run the script below. Models will be saved in `<root_dir>/ckpt/<train_instance_name>`.

```shell script
python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>
```

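For example, with an arbitrary dataset root and run name (both placeholders):

```shell script
python main_train_content.py --train --write --root_dir /data/makeittalk_root --name content_branch_run1
```
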
### Train Speaker-Aware Branch
Todo...

### Train Image-to-Image Translation

Todo...

## [License](LICENSE.md)

## Acknowledgement

We would like to thank Timothy Langlois for the narration, and
[Kaizhi Qian](https://scholar.google.com/citations?user=uEpr4C4AAAAJ&hl=en)
for the help with the [voice conversion module](https://auspicious3000.github.io/icassp-2020-demo/).
We thank [Jakub Fiser](https://research.adobe.com/person/jakub-fiser/) for implementing the real-time GPU version of the triangle morphing algorithm.
We thank Daichi Ito for sharing the caricature image and Dave Werner
for Wilk, the gruff but ultimately lovable puppet.

This research is partially funded by NSF (EAGER-1942069)
and a gift from Adobe. Our experiments were performed on the
UMass GPU cluster obtained under the Collaborative Fund managed
by the MassTech Collaborative.