This is the official PyTorch implementation of the CVPR 2020 paper "G3AN: Disentangling Appearance and Motion for Video Generation".
- Python 3.6
- CUDA 9.2
- cuDNN 7.1
- PyTorch 1.4+
- scikit-video
- tensorboard
- moviepy
- PyAV
You can download the original UvA-NEMO dataset from https://www.uva-nemo.org/ and use https://github.com/1adrianb/face-alignment to crop the face regions. We also provide our preprocessed version here.
Download the G3AN model pretrained on UvA-NEMO from here.
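The cropping step can be pictured as turning facial landmarks into a bounding box and cutting that box out of every frame. The sketch below is a hypothetical helper, not the preprocessing used to build the provided dataset: it assumes landmarks are already available as a (K, 2) array of (x, y) pixel coordinates (for example from the face-alignment library), and the square box, the 20% margin and reusing the first frame's box for the whole clip are all assumptions.

```python
import numpy as np


def crop_box_from_landmarks(landmarks, image_height, image_width, margin=0.2):
    """Hypothetical helper: square crop box around 2D facial landmarks.

    landmarks: array of shape (K, 2) holding (x, y) pixel coordinates.
    margin: fraction of the box side added on each side (assumed value).
    Returns (top, bottom, left, right), clipped to the image bounds.
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    x_min, y_min = landmarks.min(axis=0)
    x_max, y_max = landmarks.max(axis=0)
    # Square box centred on the landmarks so the face keeps its aspect ratio.
    side = max(x_max - x_min, y_max - y_min) * (1.0 + 2.0 * margin)
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    left = int(max(0, round(cx - side / 2.0)))
    right = int(min(image_width, round(cx + side / 2.0)))
    top = int(max(0, round(cy - side / 2.0)))
    bottom = int(min(image_height, round(cy + side / 2.0)))
    return top, bottom, left, right


def crop_video(frames, landmarks_first_frame, margin=0.2):
    """Crop every frame of a (T, H, W, C) video with one box from the first frame."""
    _, height, width, _ = frames.shape
    top, bottom, left, right = crop_box_from_landmarks(
        landmarks_first_frame, height, width, margin
    )
    return frames[:, top:bottom, left:right, :]
```

Using a single box per clip keeps the face from jittering between frames; a per-frame box would track head motion more tightly at the cost of temporal stability.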
- For sampling NUM videos and saving them under ./demos/EXP_NAME
python demo_random.py --model_path $MODEL_PATH --n $NUM --demo_name $EXP_NAME
- For sampling N appearances with M motions and saving them under ./demos/EXP_NAME
python demo_nxm.py --model_path $MODEL_PATH --n_za_test $N --n_zm_test $M --demo_name $EXP_NAME
- For sampling N appearances with 9 different video lengths and saving them under ./demos/EXP_NAME
python demo_multilength.py --model_path $MODEL_PATH --n_za_test $N --demo_name $EXP_NAME
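Because G3AN separates an appearance code from a motion code, an N x M demo amounts to pairing every sampled appearance with every sampled motion. The following is a minimal sketch of that pairing only, not the code in demo_nxm.py; the function name nxm_latent_grid and the latent sizes dim_za and dim_zm are hypothetical placeholders, not values taken from the repository.

```python
import numpy as np


def nxm_latent_grid(n_appearance, m_motion, dim_za=128, dim_zm=10, seed=0):
    """Hypothetical helper: pair each of N appearance codes with each of M motion codes.

    dim_za, dim_zm: assumed latent sizes, not read from the repository.
    Returns (za_grid, zm_grid) of shapes (N*M, dim_za) and (N*M, dim_zm),
    ordered row-major: row i*M + j holds appearance i combined with motion j.
    """
    rng = np.random.default_rng(seed)
    za = rng.standard_normal((n_appearance, dim_za))
    zm = rng.standard_normal((m_motion, dim_zm))
    # Repeat each appearance M times, and tile the full set of motions N times.
    za_grid = np.repeat(za, m_motion, axis=0)
    zm_grid = np.tile(zm, (n_appearance, 1))
    return za_grid, zm_grid
```

Feeding the two grids to a generator batch-wise would then yield N*M videos in which each row of the resulting grid should share one appearance and each column one motion, if the disentanglement holds.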
python train.py --data_path $DATASET --exp_name $EXP_NAME
- Generate 5,000 videos for evaluation and save them in $GEN_PATH
python generate_videos.py --gen_path $GEN_PATH
- Move into the evaluation folder
cd evaluation
Download the feature extractor resnext-101-kinetics.pth from here into the current folder. Pre-computed UvA-NEMO dataset statistics can be found in stats/uva.npz. If you would like to compute them yourself, save all the training videos in $UVA_PATH and run
python precalc_stats.py --data_path $UVA_PATH
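The statistics used by FID are the mean vector and covariance matrix of the features extracted from the real videos. The sketch below shows that reduction for an already-extracted (num_videos, feature_dim) array; it is not precalc_stats.py itself, and the key names "mu" and "sigma" in the saved .npz file are an assumption borrowed from the common TTUR-style convention.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of a (num_videos, feature_dim) array."""
    features = np.asarray(features, dtype=np.float64)
    mu = features.mean(axis=0)
    # rowvar=False: each row is one video, each column one feature dimension.
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def save_stats(path, features):
    """Save the statistics to an .npz file (the key names are an assumption)."""
    mu, sigma = feature_stats(features)
    np.savez(path, mu=mu, sigma=sigma)
```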
To compute FID:
python fid.py $GEN_PATH stats/uva.npz
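FID is the Fréchet distance between two Gaussians fitted to real and generated features, $\lVert \mu_1 - \mu_2 \rVert^2 + \operatorname{Tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2}\big)$. The NumPy-only sketch below evaluates that standard formula for two (mu, sigma) pairs; it is an illustration, not the repository's fid.py (which runs on the GPU), and it uses the identity $\operatorname{Tr}(\Sigma_1\Sigma_2)^{1/2} = \operatorname{Tr}(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}$ so that only symmetric eigendecompositions are needed.

```python
import numpy as np


def _sqrt_psd(matrix):
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # drop tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    diff = mu1 - mu2
    # S1^{1/2} S2 S1^{1/2} is symmetric PSD and has the same eigenvalues as S1 S2.
    root1 = _sqrt_psd(sigma1)
    covmean_trace = np.trace(_sqrt_psd(root1 @ sigma2 @ root1))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * covmean_trace)
```

Lower is better; identical statistics give a distance of zero.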
By evaluating the provided model, you can obtain an FID of around 80 to 83, which is better than the number reported in the paper. In this implementation, I improved the original video discriminator by using (2+1)D ConvNets instead of 3D ConvNets.
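A (2+1)D convolution factorizes a t x k x k 3D kernel into a 1 x k x k spatial convolution followed by a t x 1 x 1 temporal convolution, with an extra nonlinearity in between. The pure-Python parameter count below illustrates that trade-off; the intermediate width follows the rule from the R(2+1)D paper (Tran et al., CVPR 2018), and the README does not state which width or layer layout the G3AN discriminator actually uses, so treat these numbers as an assumption-based example.

```python
def conv3d_params(c_in, c_out, t, k, bias=True):
    """Weights (+ biases) of a single t x k x k 3D convolution."""
    return c_in * c_out * t * k * k + (c_out if bias else 0)


def r2plus1d_mid_channels(c_in, c_out, t, k):
    """Intermediate width from the R(2+1)D paper, chosen to match the 3D weight count."""
    return (t * k * k * c_in * c_out) // (k * k * c_in + t * c_out)


def conv2plus1d_params(c_in, c_out, t, k, mid=None, bias=True):
    """Spatial 1 x k x k conv (c_in -> mid) followed by temporal t x 1 x 1 conv (mid -> c_out)."""
    if mid is None:
        mid = r2plus1d_mid_channels(c_in, c_out, t, k)
    spatial = c_in * mid * k * k + (mid if bias else 0)
    temporal = mid * c_out * t + (c_out if bias else 0)
    return spatial + temporal


# Example: 64 -> 64 channels with a 3 x 3 x 3 kernel.
print(conv3d_params(64, 64, 3, 3, bias=False))       # 110592
print(r2plus1d_mid_channels(64, 64, 3, 3))           # 144
print(conv2plus1d_params(64, 64, 3, 3, bias=False))  # 110592
```

With this width choice the factorized block has essentially the same number of weights as the 3D one, so any gain comes from the separate spatial and temporal filtering and the added nonlinearity rather than from extra capacity.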
- Unconditional Generation
- Evaluation
- Conditional Generation
If you find this code useful for your research, please consider citing our paper:
@InProceedings{Wang_2020_CVPR,
author = {Wang, Yaohui and Bilinski, Piotr and Bremond, Francois and Dantcheva, Antitza},
title = {{G3AN}: Disentangling Appearance and Motion for Video Generation},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
Part of the evaluation code is adapted from evan. I have moved most of the operations from the CPU to the GPU to accelerate the computation. We thank the authors for their contribution to the community.