Figure 1. Illustration of the proposed CADDY model for playable video generation.
Playable Video Generation
Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci
Paper: arXiv (coming soon)
Website
Live Demo
Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim to allow a user to control the generated video by selecting a discrete action at every time step, as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as a bottleneck. The network is constrained to learn a rich action space using, as the main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with a wide variety of environments.
Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels, as if they were playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A, which predicts the action label corresponding to the action currently performed in the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.
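To make the data flow concrete, here is a minimal PyTorch-style sketch of one training step. This is not the repository implementation: the module sizes, the 64x64 frame resolution, the Gumbel-Softmax action bottleneck and the plain MSE reconstruction loss are simplifying assumptions chosen only to illustrate how the encoder E, action network A, dynamics network R and decoder D fit together.

# Illustrative sketch only; internals differ from the actual CADDY code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CADDYSketch(nn.Module):
    def __init__(self, feat_dim=128, num_actions=7):
        super().__init__()
        # E: frame encoder (a tiny CNN here; the real model is deeper)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # A: action network, predicts a discrete action from consecutive states
        self.action_net = nn.Linear(2 * feat_dim, num_actions)
        # R: recurrent dynamics network, predicts the next state from the
        # current state and the discrete action label
        self.dynamics = nn.GRUCell(feat_dim + num_actions, feat_dim)
        # D: decoder, reconstructs a frame from a state
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):
        # frames: (batch, time, 3, 64, 64)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)  # per-frame states
        state = feats[:, 0]
        recons = []
        for i in range(1, t):
            # discrete action inferred from the observed transition (the bottleneck)
            logits = self.action_net(torch.cat([feats[:, i - 1], feats[:, i]], dim=-1))
            action = F.gumbel_softmax(logits, hard=True)   # one-hot, yet differentiable
            state = self.dynamics(torch.cat([state, action], dim=-1), state)
            recons.append(self.decoder(state))
        recon = torch.stack(recons, dim=1)                 # (batch, time-1, 3, 64, 64)
        loss = F.mse_loss(recon, frames[:, 1:])            # reconstruction driving loss
        return loss

At inference time, the inferred action is simply replaced by a one-hot action chosen by the user at every step, which is what makes the generated video playable.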
The complete environment for execution can be installed with:
conda env create -f env.yml
conda activate video-generation
Build the Docker image:
docker build -t video-generation:1.0 .
Run the Docker image, mounting the root directory of the project to /video-generation inside the container:
docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash
Please create the following directories in the root of the project:
results
checkpoints
data
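On Linux or macOS, the three directories can be created in one step with a standard shell command (equivalent to creating them manually):
mkdir -p results checkpoints data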
Datasets can be downloaded at the following link: Google Drive
- Breakout: Coming soon
- BAIR: bair_256_ours.tar.gz
- Tennis: Coming soon
Please extract them under the data folder.
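For example, assuming the BAIR archive was downloaded into the project root, it can be extracted with a standard tar invocation (the Breakout and Tennis archives can be handled the same way once available):
tar -xzf bair_256_ours.tar.gz -C data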
Pretrained models can be downloaded at the following link: Google Drive
Please place the directories under the checkpoints folder.
After downloading the checkpoints, the models can be played with the following commands:
- BAIR:
python play.py --config configs/01_bair.yaml
- Breakout:
python play.py --config configs/breakout/02_breakout.yaml
- Tennis:
python play.py --config configs/03_tennis.yaml
The models can be trained with the following commands:
python train.py --config configs/<config_file>
Multi-GPU support is active by default. Runs can be logged through Weights and Biases by running the following before launching the training command:
wandb init
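Since multi-GPU support is active by default, training uses all visible GPUs. If you want to restrict it to specific devices, the standard CUDA environment variable can be used; this is generic CUDA/PyTorch behaviour rather than a flag of this codebase, e.g.:
CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/<config_file>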
Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on this dataset. To build the evaluation dataset, issue:
python build_evaluation_dataset.py --config configs/<config_file>
To run the evaluation, issue:
python evaluate_dataset.py --config configs/evaluation/configs/<config_file>
Evaluation results are saved under the evaluation_results directory.