[ICCV-2023] The official repo for the paper "LivelySpeaker: Towards Semantic-aware Co-Speech Gesture Generation"
conda create -n livelyspeaker python=3.7
conda activate livelyspeaker
pip install -r requirements.txt
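As a quick sanity check (assuming the PyTorch dependency pinned in requirements.txt, since the codebase builds on MDM and MotionCLIP), you can verify the environment before training:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"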
Prepare TED following TriModal and link it to ./datasets/ted_dataset:
ln -s path_to_ted ./datasets/ted_dataset
Prepare BEAT following BEAT and modify the data path in ./scripts_beat/configs/beat.yaml.
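As a minimal sketch of that edit (the key name data_path is an assumption, not confirmed by the config file), locate the dataset-root field and point it at your local BEAT copy:
grep -n "path" ./scripts_beat/configs/beat.yaml   # find the dataset-root field, e.g. a line like `data_path: /path/to/BEAT`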
Train RAG
python scripts/train_RAG.py --exp RAG -b 512
Test RAG
python scripts/test_RAG_ted.py --model_path ckpts/TED/RAG.pt
Test LivelySpeaker
python scripts/test_LivelySpeaker_ted.py --model_path ckpts/TED/RAG.pt
Train RAG
python train_RAG.py -c ./configs/beat.yaml --exp beat --epochs 1501
Test RAG
python test_RAG_beat.py --model_path ckpts/BEAT/RAG.pt -c configs/beat.yaml
Test LivelySpeaker
python test_LivelySpeaker_beat.py --model_path ckpts/BEAT/RAG.pt -c configs/beat.yaml
We provide all checkpoints here. Download them and link to ./ckpts.
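Based on the paths used in the commands above, the layout after linking should look roughly like this (a sketch; the release may contain additional files):
ckpts/
├── TED/RAG.pt
└── BEAT/RAG.pt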
To run the FGD evaluation on TED, first download the encoder weights from TriModal.
We build our codebase upon MotionCLIP, MDM, TriModal, and BEAT.
@InProceedings{Zhi_2023_ICCV,
author = {Zhi, Yihao and Cun, Xiaodong and Chen, Xuelin and Shen, Xi and Guo, Wen and Huang, Shaoli and Gao, Shenghua},
title = {LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {20807-20817}
}