This is the code for the paper Deep RNN Framework for Visual Sequential Applications by Bo Pang, Kaiwen Zha, Hanwen Cao, Chen Shi, and Cewu Lu.
Please follow the instructions to run the code.
The Deep RNN Framework is an RNN framework for high-dimensional sequential tasks; this repository focuses on visual tasks. The deep RNN framework achieves more than 11% relative improvement over shallow RNN models on Kinetics, UCF-101, and HMDB-51 for video classification. For auxiliary annotation, replacing the shallow RNN part of Polygon-RNN with our 15-layer deep RBM improves performance by 14.7%. For video future prediction, our deep RNN improves the state-of-the-art shallow model's PSNR and SSIM by 2.4%.
Results on backbone-supported models (accuracy, %):
| Model | UCF-101 Recognition | UCF-101 Anticipation | HMDB-51 Recognition | HMDB-51 Anticipation |
|---|---|---|---|---|
| 1-layer LSTM | 71.1 | 30.6 | 36.0 | 18.8 |
| 15-layer ConvLSTM | 68.9 | 49.6 | 34.2 | 27.6 |
| 1-layer RBM | 65.3 | 28.4 | 34.3 | 16.9 |
| 15-layer RBM | 79.8 | 57.7 | 40.2 | 32.1 |
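For example, on UCF-101 recognition the 15-layer RBM improves over the 1-layer LSTM from 71.1 to 79.8, a relative gain of (79.8 - 71.1) / 71.1 ≈ 12%.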
Action recognition results for standalone RNN models (accuracy, %):
| Architecture | Kinetics | UCF-101 | HMDB-51 |
|---|---|---|---|
| Shallow LSTM with Backbone | 53.9 | 86.8 | 49.7 |
| C3D | 56.1 | 79.9 | 49.4 |
| Two-Stream | 62.8 | 93.8 | 64.3 |
| 3D-Fused | 62.3 | 91.5 | 66.5 |
| Deep RBM without Backbone | 60.2 | 91.9 | 61.7 |
- Dependencies:
  - Python 2.7
  - PyTorch 0.4
  - torchvision
  - NumPy
  - Pillow
  - tqdm
- Download UCF-101 and HMDB-51 and organize the frame images extracted from the videos as follows (a frame-extraction sketch is given after the tree):
```
Dataset
├── train
│   ├── action0
│   │   ├── video0
│   │   │   ├── frame0
│   │   │   ├── frame1
│   │   │   ├── ...
│   │   ├── video1
│   │   │   ├── frame0
│   │   │   ├── frame1
│   │   │   ├── ...
│   │   ├── ...
│   ├── action1
│   ├── ...
├── test
│   ├── action0
│   │   ├── video0
│   │   │   ├── frame0
│   │   │   ├── ...
│   │   ├── ...
│   ├── ...
```
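If you are starting from the raw videos, the frame images can be extracted with ffmpeg. The snippet below is a minimal sketch, not code shipped with this repository; the `Videos/` source layout, the JPEG quality setting, and the frame naming scheme are all assumptions you may need to adapt.

```python
# Hypothetical helper (not part of this repo): extract frames from raw
# UCF-101 / HMDB-51 videos into the Dataset/<split>/<action>/<video>/ layout above.
# Assumes the raw videos live in Videos/<split>/<action>/<video>.avi and that
# ffmpeg is available on the PATH.
import os
import subprocess

videos_root = 'Videos'    # assumed location of the raw videos
dataset_root = 'Dataset'  # target root expected by train.py / test.py

for split in ('train', 'test'):
    for action in sorted(os.listdir(os.path.join(videos_root, split))):
        action_dir = os.path.join(videos_root, split, action)
        for video in sorted(os.listdir(action_dir)):
            out_dir = os.path.join(dataset_root, split, action, os.path.splitext(video)[0])
            if not os.path.exists(out_dir):
                os.makedirs(out_dir)
            # Decode every frame to a JPEG: frame000001.jpg, frame000002.jpg, ...
            subprocess.check_call(['ffmpeg', '-i', os.path.join(action_dir, video),
                                   '-q:v', '2', os.path.join(out_dir, 'frame%06d.jpg')])
```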
- Run train.py and test.py for training and evaluation, respectively. By default the code runs action recognition; pass "--anticipation" for action anticipation:
```bash
# for action recognition
python train.py
python test.py

# for action anticipation
python train.py --anticipation
python test.py --anticipation
```
- Get our pre-trained models:
  - Action Recognition on UCF-101: Google Drive, Baidu Pan
  - Action Anticipation on UCF-101: Google Drive, Baidu Pan
  - Action Recognition on HMDB-51: Google Drive, Baidu Pan
  - Action Anticipation on HMDB-51: Google Drive, Baidu Pan
- Dependencies:
  - Python 2.7
  - PyTorch 0.4
  - torchvision
  - NumPy
  - Pillow
  - tqdm
- Download Kinetics-400 from the official website or from the copy provided by facebookresearch/video-nonlocal-net, and organize the frame images (extracted from the videos) in the same way as UCF-101 and HMDB-51 (a quick layout check is sketched after the tree):
```
Dataset
├── train_frames
│   ├── action0
│   │   ├── video0
│   │   │   ├── frame0
│   │   │   ├── ...
├── test_frames
```
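Since Kinetics-400 is large, it can be worth sanity-checking the extracted layout before launching training. A minimal sketch follows; the train_frames/test_frames names come from the tree above, everything else is an assumption.

```python
# Quick sanity check of the extracted Kinetics-400 frame layout (not part of this repo).
import os

dataset_root = 'Dataset'  # adjust to wherever the frames were extracted

for split in ('train_frames', 'test_frames'):
    split_dir = os.path.join(dataset_root, split)
    actions = sorted(d for d in os.listdir(split_dir)
                     if os.path.isdir(os.path.join(split_dir, d)))
    n_videos = sum(len(os.listdir(os.path.join(split_dir, a))) for a in actions)
    # Kinetics-400 should yield 400 action classes per split.
    print('%s: %d classes, %d videos' % (split, len(actions), n_videos))
```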
- Run main.py for training and evaluation. For this standalone model, only the action recognition task is provided.
- Run the following command to train:
```bash
# start from scratch
python main.py --train

# start from our pre-trained model
python main.py --model_path [path_to_model] --model_name [model's name] --resume --train
```
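For reference, resuming from a checkpoint in PyTorch 0.4 typically looks like the sketch below. This is a generic illustration only; the helper name and the checkpoint keys are made up and may differ from what main.py actually does.

```python
# Generic PyTorch 0.4-style resume logic (illustration only; the checkpoint keys
# 'state_dict', 'optimizer', and 'epoch' are assumptions, not this repo's format).
import os
import torch

def load_checkpoint(model, optimizer, model_path, model_name):
    checkpoint = torch.load(os.path.join(model_path, model_name), map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint.get('epoch', 0)  # epoch to resume training from
```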
- Run the following command to test:
```bash
python main.py --test
```
- Get our pre-trained models:
  - Action Recognition on Kinetics: Google Drive, Baidu Pan
Results on the Cityscapes dataset:
| Model | IoU (%) |
|---|---|
| Original Polygon-RNN | 61.4 |
| Residual Polygon-RNN | 62.2 |
| Residual Polygon-RNN + attention + RL | 67.2 |
| Residual Polygon-RNN + attention + RL + EN | 70.2 |
| Polygon-RNN++ | 71.4 |

| Model | # Layers | # Params of RNN | IoU (%) |
|---|---|---|---|
| Polyg-LSTM | 2 | 0.47M | 61.4 |
| Polyg-LSTM | 5 | 2.94M | 63.0 |
| Polyg-LSTM | 10 | 7.07M | 59.3 |
| Polyg-LSTM | 15 | 15.71M | 46.7 |
| Polyg-RBM | 2 | 0.20M | 59.9 |
| Polyg-RBM | 5 | 1.13M | 63.1 |
| Polyg-RBM | 10 | 2.68M | 67.1 |
| Polyg-RBM | 15 | 5.85M | 70.4 |
- Dependencies:
  - Python 2.7
  - PyTorch 0.4
  - torchvision
  - NumPy
  - Pillow
- Download the data from Cityscapes and organize the image files and annotation JSON files as follows:
```
img
├── train
│   ├── cityname1
│   │   ├── pic.png
│   │   ├── ...
│   ├── cityname2
│   │   ├── pic.png
│   │   ├── ...
├── val
│   ├── cityname
│   │   ├── pic.png
│   │   ├── ...
├── test
│   ├── cityname
│   │   ├── pic.png
│   │   ├── ...
label
├── train
│   ├── cityname1
│   │   ├── annotation.json
│   │   ├── ...
│   ├── cityname2
│   │   ├── annotation.json
│   │   ├── ...
├── val
│   ├── cityname
│   │   ├── annotation.json
│   │   ├── ...
├── test
│   ├── cityname
│   │   ├── annotation.json
│   │   ├── ...
```
Each PNG file and its JSON annotation should share the same base name.
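A quick way to confirm that correspondence before generating data is to compare the base names per city. The following is a minimal sketch; the img/ and label/ roots follow the tree above, the rest is an assumption.

```python
# Verify that every pic.png has a same-named annotation.json (illustration only).
import os

img_root, label_root = 'img', 'label'

for split in ('train', 'val', 'test'):
    for city in sorted(os.listdir(os.path.join(img_root, split))):
        pics = set(os.path.splitext(f)[0]
                   for f in os.listdir(os.path.join(img_root, split, city))
                   if f.endswith('.png'))
        anns = set(os.path.splitext(f)[0]
                   for f in os.listdir(os.path.join(label_root, split, city))
                   if f.endswith('.json'))
        missing = pics - anns
        if missing:
            print('%s/%s: %d images without annotations' % (split, city, len(missing)))
```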
- Execute the following commands to create the directories for the generated data and the saved models:
```bash
mkdir -p new_img/{train,val,test}
mkdir -p new_label/{train,val,test}
mkdir save
```
- Run the following command to generate data for train/validation/test:
```bash
python generate_data.py --data train/val/test
```
- Run the following command to train:
```bash
python train.py --gpu_id 0 --batch_size 1 --lr 0.0001 --pretrained False
```
- Run the following command to test:
```bash
python test.py --gpu_id 0 --batch_size 128 --model [model_path]
```
- Quantitative results on KTH:
| Method | Metric | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM | PSNR | 33.8 | 30.6 | 28.8 | 27.6 | 26.9 | 26.3 | 26.0 | 25.7 | 25.3 | 25.0 | 24.8 | 24.5 | 24.2 | 23.7 | 23.2 | 22.7 | 22.1 | 21.8 | 21.7 | 21.6 | 25.3 |
| ConvLSTM | SSIM | 0.947 | 0.906 | 0.871 | 0.844 | 0.824 | 0.807 | 0.795 | 0.787 | 0.773 | 0.757 | 0.747 | 0.738 | 0.732 | 0.721 | 0.708 | 0.691 | 0.674 | 0.663 | 0.659 | 0.656 | 0.765 |
| MCnet | PSNR | 33.8 | 31.0 | 29.4 | 28.4 | 27.6 | 27.1 | 26.7 | 26.3 | 25.9 | 25.6 | 25.1 | 24.7 | 24.2 | 23.9 | 23.6 | 23.4 | 23.2 | 23.1 | 23.0 | 22.9 | 25.9 |
| MCnet | SSIM | 0.947 | 0.917 | 0.889 | 0.869 | 0.854 | 0.840 | 0.828 | 0.817 | 0.808 | 0.797 | 0.788 | 0.799 | 0.770 | 0.760 | 0.752 | 0.744 | 0.736 | 0.730 | 0.726 | 0.723 | 0.804 |
| Ours | PSNR | 34.3 | 31.8 | 30.2 | 29.0 | 28.2 | 27.6 | 27.14 | 26.7 | 26.3 | 25.8 | 25.5 | 25.1 | 24.8 | 24.5 | 24.2 | 24.0 | 23.8 | 23.7 | 23.6 | 23.5 | 26.5 |
| Ours | SSIM | 0.951 | 0.923 | 0.905 | 0.885 | 0.871 | 0.856 | 0.843 | 0.833 | 0.824 | 0.814 | 0.805 | 0.796 | 0.790 | 0.783 | 0.779 | 0.775 | 0.770 | 0.765 | 0.761 | 0.757 | 0.824 |
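PSNR and SSIM above are per-frame similarity scores between predicted and ground-truth frames, averaged over the test set at each future timestep T. For reference, a single frame pair can be scored with scikit-image (listed in the dependencies below); a minimal sketch with placeholder arrays:

```python
# Per-frame PSNR / SSIM scoring (illustration only; uses the Python 2.7-era
# scikit-image API, and the frames here are random placeholders).
import numpy as np
from skimage.measure import compare_psnr, compare_ssim

gt = np.random.randint(0, 256, (128, 128)).astype(np.uint8)    # ground-truth frame
pred = np.random.randint(0, 256, (128, 128)).astype(np.uint8)  # predicted frame

print('PSNR: %.2f dB' % compare_psnr(gt, pred, data_range=255))
print('SSIM: %.3f' % compare_ssim(gt, pred, data_range=255))
```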
- Qualitative results on KTH (the generated GIFs can be found under ./results/images/KTH after running the test script).
- Dependencies:
  - Python 2.7
  - TensorFlow 1.1.0 (pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl)
  - Packages: scipy, imageio, pyssim, joblib, Pillow, scikit-image, opencv-python (install via pip or conda)
  - FFmpeg: conda install -c menpo ffmpeg=3.1.3
- Download the KTH dataset:
```bash
./data/KTH/download.sh
```
- Training (balanced multi-GPU training is enabled):
```bash
python train_kth_multigpu.py --gpu 0 1 2 3 4 5 6 7 --batch_size 8 --lr 0.0001
```
- Testing:
```bash
python test_kth.py --gpu 0 --prefix [checkpoint_folder] --p [checkpoint_index]
```
- Obtain quantitative and qualitative results. The generated GIFs are saved under ./results/images/KTH, and the quantitative results under ./results/quantitative/KTH. The per-video quantitative results are stored as dictionaries; the mean over all test videos at every timestep can be displayed as follows:
```python
import numpy as np

# <results_file_name> is one of the files saved under ./results/quantitative/KTH
results = np.load('<results_file_name>')
print(results['psnr'].mean(axis=0))  # mean PSNR per future timestep
print(results['ssim'].mean(axis=0))  # mean SSIM per future timestep
```
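If you also want to plot the per-timestep curves, a minimal sketch follows; matplotlib is not in the dependency list above, so treat this as an optional extra.

```python
# Optional: plot the mean per-timestep PSNR curve (matplotlib assumed to be installed).
import numpy as np
import matplotlib.pyplot as plt

results = np.load('<results_file_name>')
plt.plot(results['psnr'].mean(axis=0), marker='o')
plt.xlabel('future timestep')
plt.ylabel('PSNR (dB)')
plt.savefig('psnr_per_timestep.png')
```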
The Deep RNN Framework is authored by Bo Pang, Kaiwen Zha, Hanwen Cao, Chen Shi, and Cewu Lu. Cewu Lu is the corresponding author.
Special thanks to the authors of MCnet for releasing the source code of their ICLR 2017 paper, Decomposing Motion and Content for Natural Video Sequence Prediction.
Please cite this paper in your publications if it helps your research:
```bibtex
@article{pang2018deeprnn,
  title   = {Deep RNN Framework for Visual Sequential Applications},
  author  = {Pang, Bo and Zha, Kaiwen and Cao, Hanwen and Shi, Chen and Lu, Cewu},
  journal = {arXiv preprint},
  year    = {2018}
}
```