AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations
Huawei Wei, Zejun Yang, Zhisheng Wang
Tencent Games Zhiji, Tencent
We recommend Python >= 3.10 and CUDA 11.7. Then build the environment as follows:
pip install -r requirements.txt
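If you want to confirm the setup before going further, here is a minimal sanity check (it assumes PyTorch was installed via requirements.txt):
import sys
import torch

# Python >= 3.10 is recommended for this repo.
assert sys.version_info >= (3, 10), "Python >= 3.10 is recommended"
# CUDA 11.7 is the recommended toolkit; torch.version.cuda reports the version
# PyTorch was built against, and torch.cuda.is_available() checks for a usable GPU.
print("torch CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())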
We will upload the weights to Hugging Face soon! All the weights should be placed under the ./pretrained_weights directory. You can download them manually as follows:
- Download our trained weights, which include: denoising_unet.pth, reference_unet.pth, pose_guider.pth, motion_module.pth, audio2mesh.pt, and audio2pose.pt.
- Download the pretrained weights of the base models and other components: stable-diffusion-v1-5, sd-vae-ft-mse, image_encoder, and wav2vec2-base-960h (see the sketch just after this list for one way to fetch them).
- Download the DWPose weights (dw-ll_ucoco_384.onnx and yolox_l.onnx) following this.
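If you prefer to fetch the base components programmatically, the following is a rough sketch using huggingface_hub. The repo ids are assumptions based on the component names; adjust them if you use different mirrors or already have local copies:
from huggingface_hub import snapshot_download

# NOTE: the repo ids below are assumed from the component names in the layout
# further down; swap them out if they are unavailable in your region.
snapshot_download("runwayml/stable-diffusion-v1-5",
                  local_dir="./pretrained_weights/stable-diffusion-v1-5",
                  allow_patterns=["feature_extractor/*", "unet/*",
                                  "model_index.json", "v1-inference.yaml"])
snapshot_download("stabilityai/sd-vae-ft-mse",
                  local_dir="./pretrained_weights/sd-vae-ft-mse")
snapshot_download("facebook/wav2vec2-base-960h",
                  local_dir="./pretrained_weights/wav2vec2-base-960h")
# The image encoder is assumed to come from sd-image-variations-diffusers;
# only its image_encoder/ subfolder is needed.
snapshot_download("lambdalabs/sd-image-variations-diffusers",
                  local_dir="./pretrained_weights",
                  allow_patterns=["image_encoder/*"])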
Finally, these weights should be organized as follows:
./pretrained_weights/
|-- DWPose
| |-- dw-ll_ucoco_384.onnx
| `-- yolox_l.onnx
|-- image_encoder
| |-- config.json
| `-- pytorch_model.bin
|-- audio2mesh.pt
|-- audio2pose.pt
|-- denoising_unet.pth
|-- motion_module.pth
|-- pose_guider.pth
|-- reference_unet.pth
|-- sd-vae-ft-mse
| |-- config.json
| |-- diffusion_pytorch_model.bin
| `-- diffusion_pytorch_model.safetensors
|-- stable-diffusion-v1-5
| |-- feature_extractor
| | `-- preprocessor_config.json
| |-- model_index.json
| |-- unet
| | |-- config.json
| | `-- diffusion_pytorch_model.bin
| `-- v1-inference.yaml
`-- wav2vec2-base-960h
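A quick way to confirm everything landed in the right place is to check the layout above, for example:
from pathlib import Path

# Paths taken directly from the layout above.
root = Path("./pretrained_weights")
expected = [
    "DWPose/dw-ll_ucoco_384.onnx", "DWPose/yolox_l.onnx",
    "image_encoder/config.json", "image_encoder/pytorch_model.bin",
    "audio2mesh.pt", "audio2pose.pt",
    "denoising_unet.pth", "motion_module.pth",
    "pose_guider.pth", "reference_unet.pth",
    "sd-vae-ft-mse/config.json",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
    "wav2vec2-base-960h",
]
missing = [p for p in expected if not (root / p).exists()]
print("All expected weights found." if not missing else f"Missing: {missing}")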
Note: If you have already downloaded some of the pretrained models, such as Stable Diffusion V1.5, you can specify their paths in the config file (e.g. ./configs/prompts/animation.yaml).
Here is the CLI command for running the pose-driven inference script; -W and -H set the output resolution and -L the number of frames to generate:
python -m scripts.pose2vid --config ./configs/prompts/animation.yaml -W 512 -H 512 -L 64
You can refer to the format of animation.yaml to add your own reference images or pose videos. To convert a raw video into a pose video (a keypoint sequence), run the following command:
python -m scripts.vid2pose --video_path pose_video_path.mp4
For face reenactment (video-to-video), run:
python -m scripts.vid2vid --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -L 64
Add the source face videos and reference images in animation_facereenac.yaml.
For audio-driven animation, run:
python -m scripts.audio2vid --config ./configs/prompts/animation_audio.yaml -W 512 -H 512 -L 64
Add the audio files and reference images in animation_audio.yaml.
Coming soon!
@misc{wei2024aniportrait,
title={AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animations},
author={Huawei Wei and Zejun Yang and Zhisheng Wang},
year={2024},
eprint={*},
archivePrefix={arXiv},
primaryClass={cs.CV}
}