The source code for the paper:
FaceFormer: Speech-Driven 3D Facial Animation with Transformers. CVPR 2022
## Environment

- Ubuntu 18.04.1
- Python 3.7
- PyTorch 1.9.0
## Dependencies

Check the required packages in `requirements.txt`; a quick sanity check of the audio stack follows the list.

- transformers
- librosa
- trimesh
- opencv-python
- pyrender
- MPI-IS/mesh
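To confirm that transformers and librosa are installed correctly, the snippet below loads a wav file and pushes it through a pretrained wav2vec 2.0 model. This is a minimal sketch, not part of the repository; `facebook/wav2vec2-base-960h` is the standard public checkpoint, and the repository may pin a different one.

```python
import librosa
from transformers import Wav2Vec2Model, Wav2Vec2Processor

# Load a speech clip and resample to 16 kHz, the rate wav2vec 2.0 expects.
speech, sr = librosa.load("test.wav", sr=16000)

# Standard public checkpoint (an assumption; the repo may use another one).
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
features = model(inputs.input_values).last_hidden_state
print(features.shape)  # (1, num_audio_frames, 768) for the base model
```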
## VOCASET

### Data

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files `data_verts.npy`, `raw_audio_fixed.pkl`, `templates.pkl`, and `subj_seq_to_idx.pkl` in `vocaset/VOCASET/`. Download "FLAME_sample.ply" from voca and put it in `VOCASET/templates`.
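To verify the download, the files can be inspected as below. This sketch assumes the VOCA layout: `data_verts.npy` holds all tracked frames in one array, and `subj_seq_to_idx.pkl` maps each subject and sentence to its frame indices; the pickles were written under Python 2, hence the `latin1` encoding.

```python
import pickle
import numpy as np

# Memory-map the large vertex array instead of loading it all into RAM.
data_verts = np.load("vocaset/VOCASET/data_verts.npy", mmap_mode="r")
print("vertex data:", data_verts.shape)  # expected (num_frames, 5023, 3) for FLAME topology

with open("vocaset/VOCASET/subj_seq_to_idx.pkl", "rb") as f:
    subj_seq_to_idx = pickle.load(f, encoding="latin1")

# Nested mapping: subject -> sentence -> frame indices into data_verts.
subject = next(iter(subj_seq_to_idx))
sentence = next(iter(subj_seq_to_idx[subject]))
print(subject, sentence, "->", len(subj_seq_to_idx[subject][sentence]), "frames")
```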
- Read the vertices/audio data and convert them to .npy/.wav files stored in `vocaset/VOCASET/vertices_npy` and `vocaset/VOCASET/wav` (the conversion is sketched schematically after the command):

  ```
  cd vocaset
  python process_voca_data.py
  ```
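Conceptually, the conversion slices `data_verts.npy` per sentence and dumps the matching raw audio to wav files. The sketch below illustrates the idea under the same layout assumptions as above; it is not the repository script, and the `{subject}_{sentence}` file naming is only illustrative.

```python
import os
import pickle
import numpy as np
from scipy.io import wavfile

os.makedirs("vocaset/VOCASET/vertices_npy", exist_ok=True)
os.makedirs("vocaset/VOCASET/wav", exist_ok=True)

data_verts = np.load("vocaset/VOCASET/data_verts.npy", mmap_mode="r")
with open("vocaset/VOCASET/subj_seq_to_idx.pkl", "rb") as f:
    subj_seq_to_idx = pickle.load(f, encoding="latin1")
with open("vocaset/VOCASET/raw_audio_fixed.pkl", "rb") as f:
    raw_audio = pickle.load(f, encoding="latin1")

for subject, sentences in subj_seq_to_idx.items():
    for sentence, frame_map in sentences.items():
        # Gather this sentence's frames in order and store them as one array.
        idx = [frame_map[k] for k in sorted(frame_map)]
        verts = np.asarray(data_verts[idx], dtype=np.float32)
        np.save(f"vocaset/VOCASET/vertices_npy/{subject}_{sentence}.npy", verts)

        # Assumed structure: raw_audio[subject][sentence] holds 'audio' and 'sample_rate'.
        clip = raw_audio[subject][sentence]
        wavfile.write(f"vocaset/VOCASET/wav/{subject}_{sentence}.wav",
                      clip["sample_rate"], clip["audio"])
```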
### Demo

- To animate a mesh given an audio signal, download the pretrained model, put it in the folder `vocaset/VOCASET`, and run the command below (a sketch for preparing your own audio follows it):

  ```
  cd vocaset
  python demo.py --wav_path "VOCASET/demo/wav/test.wav"
  ```
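The demo takes any wav file via `--wav_path`. If your recording is in another format or at another sampling rate, a small conversion step like the one below prepares it; this assumes the model consumes 16 kHz mono audio (the wav2vec 2.0 rate), and the input/output file names are placeholders.

```python
import librosa
import numpy as np
from scipy.io import wavfile

# Decode any format librosa can read, downmix to mono, resample to 16 kHz.
speech, sr = librosa.load("my_recording.mp3", sr=16000, mono=True)

# Write 16-bit PCM so demo.py can be pointed at it via --wav_path.
wavfile.write("VOCASET/demo/wav/my_recording.wav",
              16000, (speech * 32767).astype(np.int16))
```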
### Training and Testing

- To train the model and obtain the results on the testing set, run:

  ```
  cd vocaset
  python main.py
  ```

  The results will be available in the `vocaset/VOCASET/result` folder, and the models will be stored in the `vocaset/VOCASET/save` folder.
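For orientation on what the model does at test time: FaceFormer generates the animation autoregressively, feeding each predicted frame of vertex offsets over the subject's template back to the decoder for the next step, conditioned on the wav2vec 2.0 features. The loop below is a schematic sketch of that decoding scheme; the `model` call signature is illustrative, not the repository's actual API.

```python
import torch

@torch.no_grad()
def autoregressive_decode(model, audio_features, template, num_frames):
    """Schematic FaceFormer-style inference.

    audio_features: (1, T, D) wav2vec 2.0 features for the whole clip
    template:       (1, V * 3) neutral-face vertices of the target subject
    """
    # Start from an empty motion sequence; offsets are relative to the template.
    motion = torch.zeros(1, 0, template.shape[-1])
    for _ in range(num_frames):
        # Cross-attention over the audio, causal self-attention over past frames.
        next_offset = model(audio_features, motion)[:, -1:, :]
        motion = torch.cat([motion, next_offset], dim=1)
    # Add the template back to get absolute vertex positions per frame.
    return motion + template.unsqueeze(1)
```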
### Visualization

- To visualize the results, run:

  ```
  cd vocaset
  python render.py
  ```

  The rendered videos will be available in the `vocaset/VOCASET/output` folder.
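Rendering a predicted sequence boils down to drawing each frame's mesh offscreen and writing the images to a video. The sketch below shows one way to do that with trimesh, pyrender, and opencv-python; the prediction file name, camera placement, and 25 fps rate are assumptions, and pyrender's offscreen renderer needs a working OpenGL context (e.g. `PYOPENGL_PLATFORM=osmesa` on a headless machine).

```python
import cv2
import numpy as np
import pyrender
import trimesh

# Assumed inputs: the FLAME template for topology and a (num_frames, 5023, 3) prediction.
template = trimesh.load("VOCASET/templates/FLAME_sample.ply", process=False)
vertices = np.load("prediction.npy").reshape(-1, 5023, 3)  # hypothetical output file

renderer = pyrender.OffscreenRenderer(viewport_width=800, viewport_height=800)
writer = cv2.VideoWriter("output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (800, 800))

for frame in vertices:
    mesh = trimesh.Trimesh(vertices=frame, faces=template.faces, process=False)
    scene = pyrender.Scene(ambient_light=[0.3, 0.3, 0.3])
    scene.add(pyrender.Mesh.from_trimesh(mesh))
    # Simple frontal camera pulled back along +z; the face sits near the origin.
    pose = np.eye(4)
    pose[2, 3] = 1.0
    scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 12.0), pose=pose)
    scene.add(pyrender.DirectionalLight(intensity=3.0), pose=pose)
    color, _ = renderer.render(scene)
    writer.write(cv2.cvtColor(color, cv2.COLOR_RGB2BGR))

writer.release()
renderer.delete()
```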
## BIWI

### Data

Request the dataset from the Biwi 3D Audiovisual Corpus of Affective Communication. The dataset contains the following subfolders:

- 'faces' contains the binary (.vl) files for the tracked facial geometries.
- 'rigid_scans' contains the templates stored as .obj files.
- 'audio' contains audio signals stored as .wav files.

Place the folders 'faces' and 'rigid_scans' in `BIWI_data`, and place the wav files in `BIWI_data/wav`.
### Demo

- To animate a mesh given an audio signal, download the pretrained model, put it in the folder `biwi/BIWI_data/`, and run:

  ```
  cd biwi
  python demo.py --wav_path "BIWI_data/demo/wav/test.wav"
  ```
### Training and Testing

- (to do) Read the geometry data and convert it to .npy files stored in `biwi/BIWI_data/vertices_npy` (a reader sketch for the binary .vl files is given below).
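One commonly used reading of the corpus's binary .vl files is a little-endian vertex count followed by packed float32 coordinates. The reader below is a sketch under that assumption and should be verified against the corpus documentation; the example paths are hypothetical.

```python
import numpy as np

def read_vl(path):
    """Read a BIWI .vl vertex-list file.

    Assumed layout (verify against the corpus documentation): a little-endian
    uint32 vertex count, then count * 3 float32 values (x, y, z).
    """
    with open(path, "rb") as f:
        num_verts = np.fromfile(f, dtype="<u4", count=1)[0]
        verts = np.fromfile(f, dtype="<f4", count=num_verts * 3)
    return verts.reshape(num_verts, 3)

# Example: convert one tracked frame to .npy (paths are hypothetical).
# verts = read_vl("BIWI_data/faces/F1_01/frame_001.vl")
# np.save("BIWI_data/vertices_npy/F1_01_frame_001.npy", verts)
```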
- To train the model and obtain the results on the testing set, run:

  ```
  cd biwi
  python main.py
  ```

  The results will be available in the `biwi/BIWI_data/result` folder, and the models will be stored in the `biwi/BIWI_data/save` folder.
### Visualization

- To visualize the results, run:

  ```
  cd biwi
  python render.py
  ```

  The rendered videos will be available in the `biwi/BIWI_data/output` folder.
## Citation

If you find this code useful for your work, please consider citing:

```
@inproceedings{faceformer2022,
  title={FaceFormer: Speech-Driven 3D Facial Animation with Transformers},
  author={Fan, Yingruo and Lin, Zhaojiang and Saito, Jun and Wang, Wenping and Komura, Taku},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}
```
## Acknowledgement

We gratefully acknowledge ETHZ-CVL for providing the B3D(AC)^2 database and MPI-IS for releasing the VOCASET dataset. The implementation of wav2vec2 is built upon huggingface-transformers, and the temporal bias is modified from ALiBi. We use MPI-IS/mesh for mesh processing and VOCA/rendering for rendering. We thank the authors for their great work.
## License

Any third-party packages and data are owned by their respective authors and must be used under their respective licenses.