
Disentangled Imputed Video autoEncoder (DIVE)

Code for the NeurIPS 2020 paper Learning Disentangled Representations of Video with Missing Data, by Armand Comas, Chi Zhang, Zlatan Feric, Octavia Camps and Rose Yu.

Missing data poses significant challenges for learning representations of video sequences. We present DIVE, a deep generative model that imputes and predicts future video frames in the presence of missing data. Specifically, DIVE contributes by:

  • Introducing a missingness latent variable,
  • Disentangling the hidden video representation into static and dynamic appearance, pose, and missingness factors for each object, and
  • Imputing each object's trajectory where data is missing. This is done end-to-end, using only self-supervision, by leveraging a VAE framework (a sketch of the per-object composition follows this list).
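To make the decomposition concrete, below is a minimal, illustrative sketch of how per-object latents could be rendered and composed into a frame. All names, shapes, and the exact gating of missing objects are our assumptions for illustration, not the repository's API:

import torch
import torch.nn.functional as F

def place_object(patch, pose, size=64):
    # pose = (scale, tx, ty): an affine spatial-transformer parameterization
    s, tx, ty = pose
    theta = torch.tensor([[s, 0.0, tx], [0.0, s, ty]]).view(1, 2, 3)
    grid = F.affine_grid(theta, (1, 1, size, size), align_corners=False)
    return F.grid_sample(patch, grid, align_corners=False)

def compose_frame(patches, poses, missingness, size=64):
    # Sum per-object renderings; the missingness value m in [0, 1] gates out
    # objects that are absent (occluded or out of scene) in this frame.
    frame = torch.zeros(1, 1, size, size)
    for patch, pose, m in zip(patches, poses, missingness):
        frame = frame + (1.0 - m) * place_object(patch, pose, size)
    return frame.clamp(0.0, 1.0)

# Two hypothetical decoded object patches, poses, and missingness values
patches = [torch.rand(1, 1, 28, 28) for _ in range(2)]
poses = [(2.0, 0.1, -0.2), (2.0, -0.4, 0.3)]
frame = compose_frame(patches, poses, missingness=[0.0, 1.0])  # 2nd object missing

In the model, the appearance latent is further split into static and dynamic components, and missing steps are imputed in each object's latent trajectory rather than in pixel space.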

Poster, Slides and video

The poster is the file named "NeurIPS_DIVE_Poster_v3.pdf". Slides and video will be available soon.

Demo

Below we provide details for running both datasets presented in our paper. If you encounter a problem, please feel free to contact the authors.

Deformed and Missing-Data Moving MNIST

We prepare the code to train and test Scenario 3 of the Moving MNIST experiments, which includes:

  • Out-of-scene (fully occluded) digits for one time step
  • Varying appearance following an elastic transformation, whose severity decreases linearly in time from very severe (alpha = 100) to nonexistent (see the sketch after this list).

All arguments are set by default for this scenario, but they can be changed at the user's convenience. To better understand them, use the help flag "-h" or refer to the "config.py" file. For this experiment, missing labels are set to be soft.
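As an illustration of this deformation schedule, the following is a minimal sketch using the classic random-displacement-field elastic distortion; the helper name, the smoothing parameter sigma, and the use of scipy are our assumptions, not the repository's exact implementation:

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_transform(image, alpha, sigma=4.0, rng=None):
    # Smooth a random displacement field and warp the image with it;
    # alpha scales the displacement magnitude (deformation severity).
    rng = rng or np.random.default_rng()
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    return map_coordinates(image, [ys + dy, xs + dx], order=1, mode='reflect')

# Severity decays linearly over the sequence: alpha = 100 at t = 0,
# down to 0 (no deformation) at the last frame.
n_frames = 20
alphas = np.linspace(100.0, 0.0, n_frames)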

A qualitative example of the expected results after 600 epochs (about 100k iterations) is shown below.

[Figure: qualitative results on the deformed, missing-data Moving MNIST]

Expected quantitative results can be found in the paper. For Scenario 1, set the argument crop_size to [64, 32] and the flag use_crop_size to True in the config.py file. For both Scenarios 1 and 2, set the flag with_var_appearance to False and init_et_alpha to 0.
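For instance, the Scenario 1 overrides could look like the excerpt below. The names and values come from the text above; the plain-assignment syntax in config.py is our assumption (the file may define these as argparse defaults instead):

# Scenario 1: crop the input
crop_size = [64, 32]
use_crop_size = True

# Scenarios 1 and 2: no appearance variation
with_var_appearance = False
init_et_alpha = 0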

After setting up the environment, we can train and test the code with:

cd DIVE/
python3 train.py

and

python3 eval.py

To view the results (while training in this case), execute the following commands:

cd tensorboard/ckpt/moving_mnist/dive/
tensorboard --logdir train_log --port 6006

Installation

This codebase is trained and tested with Python 3.6+, PyTorch 1.2.0+ and CUDA 10.0+. We use tensorboardX 2.0 for visualization and Pyro 0.2 as our framework for probabilistic programming. To better understand our model, we encourage the reader to browse Pyro's examples and tutorials.
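As a sketch, an environment matching these stated versions could be set up as below; we have not verified that Pyro 0.2's dependency constraints accept PyTorch 1.2, so the exact pins may need adjusting for your CUDA setup:

pip install torch==1.2.0 tensorboardX==2.0 pyro-ppl==0.2.1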

Moving MNIST

Download the MNIST dataset by running:

cd DIVE
mkdir moving_mnist
cd moving_mnist
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz

For this demo we do not provide our fixed test set; instead, the test set is generated on the fly.
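For reference, the sketch below shows one standard way such bouncing-digit sequences are generated on the fly; it mirrors the common Moving MNIST construction, with hypothetical function names and parameters, and is not necessarily the repository's exact generator:

import gzip
import numpy as np

def load_mnist_images(path='moving_mnist/train-images-idx3-ubyte.gz'):
    # The IDX3 header is 16 bytes: magic, count, rows, cols.
    with gzip.open(path, 'rb') as f:
        data = np.frombuffer(f.read(), dtype=np.uint8, offset=16)
    return data.reshape(-1, 28, 28)

def make_sequence(digits, n_frames=20, size=64, rng=None):
    # Bounce each 28x28 digit inside a size x size canvas.
    rng = rng or np.random.default_rng()
    frames = np.zeros((n_frames, size, size), dtype=np.float32)
    for digit in digits:
        x, y = rng.integers(0, size - 28, size=2).astype(float)
        vx, vy = rng.uniform(-3, 3, size=2)
        for t in range(n_frames):
            if not 0 <= x + vx <= size - 28:  # reflect off left/right borders
                vx = -vx
            if not 0 <= y + vy <= size - 28:  # reflect off top/bottom borders
                vy = -vy
            x, y = x + vx, y + vy
            xi, yi = int(x), int(y)
            frames[t, yi:yi + 28, xi:xi + 28] = np.maximum(
                frames[t, yi:yi + 28, xi:xi + 28], digit / 255.0)
    return frames

sequence = make_sequence(load_mnist_images()[:2])  # two digits, 20 frames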

MOTSChallenge Pedestrian

Download the MOTSChallenge pre-processed dataset here. Place it in any directory you find convenient, under a folder named after dset_name. Since the code is configured for the Moving MNIST experiments, some values in config.py must be changed for this experiment; the settings we use are collected in the excerpt below.
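As a config-style excerpt (the names and values are the ones we use for this experiment; the plain-assignment syntax and the placeholder path are illustrative):

image_size = [256, 256]
crop_size = [256, 256]
n_frames_output = 5
dset_dir = '/path/to/data'   # your data directory
dset_name = 'pedestrian'
num_missing = 1
num_objects = 3
n_components = 3
hidden_size = 96
stn_scale_prior = 3.5
batch_size = 32
gamma_switch_step = 5e3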

Our implementation is built on Pyro and uses the publicly available implementation of the Decompositional Disentangled Predictive Auto-Encoder (DDPAE) as its backbone, reusing many of its functions.

Citation

If you find this repository useful in your research, please cite our paper:

@article{Comas2020Dive,
  title={Learning Disentangled Representations of Video with Missing Data},
  author={Comas, Armand and Zhang, Chi and Feric, Zlatan and Camps, Octavia and Yu, Rose},
  journal={Advances in Neural Information Processing Systems},
  year={2020},
}
