Video future frame prediction based on Transformers. Accepted at ICPR 2022: http://arxiv.org/abs/2203.15836
The overall framework for video prediction.
Fully autoregressive (left) and non-autoregressive VPTR (right).
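As a rough illustration of the difference between the two modes (the function and module names below are placeholders, not the repository's exact interface): the fully autoregressive (FAR) model predicts one future frame at a time and feeds each prediction back as input, while the non-autoregressive (NAR) model predicts all future frames in a single forward pass from future queries.

```python
# Hypothetical sketch of the two inference schemes on encoded frame features.
import torch

def far_predict(transformer, past_feats, num_future):
    """Fully autoregressive: predict one future feature map at a time,
    appending each prediction to the input sequence for the next step."""
    seq = past_feats                              # (B, T_past, C, H, W)
    preds = []
    for _ in range(num_future):
        next_feat = transformer(seq)[:, -1:]      # newest predicted frame
        preds.append(next_feat)
        seq = torch.cat([seq, next_feat], dim=1)  # feed prediction back
    return torch.cat(preds, dim=1)

def nar_predict(transformer, past_feats, future_queries):
    """Non-autoregressive: all future feature maps are predicted in one
    forward pass from learned future queries."""
    return transformer(past_feats, future_queries)  # (B, T_future, C, H, W)
```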
Download the checkpoints from here: https://polymtlca0-my.sharepoint.com/:f:/g/personal/xi_ye_polymtl_ca/EuxjSddJ7wNIsiSTOfB-u7AB7qQhP5H0iX2a5mbaowiSZw?e=IEj1bd
See Test_AutoEncoder.ipynb and Test_VPTR.ipynb for the detailed test functions.
Train the autoencoder first and save the checkpoint, then load it for stage 2 (see the sketch after this list).
train_FAR.py: Fully autoregressive model
train_FAR_mp.py: multi-GPU training (single machine)
train_NAR.py: Non-autoregressive model
train_NAR_mp.py: multi-GPU training (single machine)
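A minimal stage-2 sketch in PyTorch, assuming the stage-1 checkpoint stores the encoder and decoder state dicts under the hypothetical keys 'encoder' and 'decoder' (see the training scripts above for the actual arguments and checkpoint format):

```python
import torch

def prepare_stage2(encoder, decoder, transformer,
                   ckpt_path='autoencoder_ckpt.tar'):
    """Load the stage-1 autoencoder checkpoint, freeze it, and return an
    optimizer that updates only the Transformer (stage 2)."""
    ckpt = torch.load(ckpt_path, map_location='cpu')
    encoder.load_state_dict(ckpt['encoder'])   # key names are assumptions
    decoder.load_state_dict(ckpt['decoder'])
    for p in list(encoder.parameters()) + list(decoder.parameters()):
        p.requires_grad = False                # autoencoder stays fixed
    return torch.optim.AdamW(transformer.parameters(), lr=1e-4)
```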
/MovingMNIST
moving-mnist-train.npz
moving-mnist-test.npz
moving-mnist-val.npz
/KTH
boxing/
person01_boxing_d1/
image_0001.png
image_0002.png
...
person01_boxing_d2/
image_0001.png
image_0002.png
...
handclapping/
...
handwaving/
...
jogging_no_empty/
...
running_no_empty/
...
walking_no_empty/
...
/BAIR
test/
example_0/
0000.png
0001.png
...
example_1/
0000.png
0001.png
...
example_...
train/
example_0/
0000.png
0001.png
...
example_...
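A quick, optional sanity check of the dataset layout (the npz key names and exact paths are assumptions; inspect `data.files` if unsure):

```python
import numpy as np
from pathlib import Path

# MovingMNIST: list the arrays stored in the archive and their shapes.
data = np.load('MovingMNIST/moving-mnist-train.npz')
print(data.files)                  # names of the stored arrays
print(data[data.files[0]].shape)   # e.g. (num_sequences, seq_len, H, W)

# KTH: count the extracted frames of one clip.
kth_clip = Path('KTH/boxing/person01_boxing_d1')
print(len(list(kth_clip.glob('image_*.png'))), 'frames in', kth_clip)
```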
Please cite the paper if you find our work helpful.
@article{ye_2022,
  title={VPTR: Efficient Transformers for Video Prediction},
  author={Xi Ye and Guillaume-Alexandre Bilodeau},
  journal={arXiv preprint arXiv:2203.15836},
  year={2022}
}
Recently, we found a mistake in our ICPR paper. For the BAIR experiments, previous papers predict 28 future frames rather than 10, but the results reported in "TABLE II: Results on BAIR" were computed for 10 future frames. The updated results for 28 predicted frames are given in the corrected table below.
We apologize for the mistake; the correction does not affect our conclusions.