forked from yysijie/st-gcn

Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

Spatial Temporal Graph Convolutional Networks (ST-GCN)

A graph convolutional network for skeleton-based action recognition.

News & Updates

  • June 5, 2018 - A demo for feature visualization and skeleton-based action recognition was released.
  • June 1, 2018 - We updated our codebase and completed the PyTorch 0.4.0 migration.

Introduction

This repository holds the codebase, dataset and models for the paper:

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. Sijie Yan, Yuanjun Xiong and Dahua Lin, AAAI 2018.

[Arxiv Preprint]

Visualization of ST-GCN in Action

[Figure: ST-GCN response visualizations. Row 1: Touch head, Sitting down, Take off a shoe, Eat meal/snack, Kick other person. Row 2: Hammer throw, Clean and jerk, Pull ups, Tai chi, Juggling ball.]

ST-GCN is able to exploit local patterns and correlations in human skeletons. The figures above show the neural response magnitude of each node in the last layer of ST-GCN.

The first row of results is from the NTU-RGB+D dataset, and the second row is from Kinetics-skeleton.

Prerequisites

Our codebase is based on Python 3.6 and needs only a few dependencies to run. The major libraries we depend on are:

  • Python (>=3.6)
  • PyTorch (Release version 0.4.0)
  • OpenPose (Optional: for demo only)
  • Other Python libraries can be installed by pip install -r requirements.txt
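
Before running anything, it can help to confirm that the interpreter and the main library are in place. The following is a minimal sketch, not part of the repository: the helper name check_prerequisites is hypothetical, and it only assumes that PyTorch is importable as torch and exposes a __version__ attribute.

```python
import importlib
import sys


def check_prerequisites(min_python=(3, 6), packages=("torch",)):
    """Map each requirement to its version string, or None if it is unmet.

    Hypothetical helper for illustration; it is not shipped with the repository.
    """
    report = {}
    # Compare the running interpreter against the minimum supported version.
    current = sys.version_info[:2]
    report["python"] = "{}.{}".format(*current) if current >= tuple(min_python) else None
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            report[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown" if not.
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

Calling check_prerequisites() returns a dictionary such as {"python": ..., "torch": ...}; a None value marks a requirement that is missing or too old. The reported PyTorch version still has to be compared against 0.4.0 by eye.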

Get pretrained models

We provide the pretrained model weights of ST-GCN. They can be downloaded by running the script:

bash tools/get_models.sh

The downloaded models will be stored under ./models.

Demo

Our graph convolutional network represents a human skeleton sequence as a spatial temporal graph, which preserves the spatial structure of the skeleton during network propagation. To visualize how ST-GCN exploits local correlations and patterns, we compute the feature vector magnitude of each node in the final spatial temporal graph and overlay it on the original video. OpenPose must be installed, because it extracts the human skeletons from the video that serve as the input to our model.
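
The per-node magnitude described above can be sketched as follows. This is an illustration under stated assumptions rather than the demo's actual code: it assumes the final feature map is indexed as features[channel][frame][joint], it takes the L2 norm over channels as the magnitude, and it uses plain Python lists instead of tensors to stay dependency-free. The function name node_response_magnitude is hypothetical.

```python
import math


def node_response_magnitude(features):
    """Return magnitude[frame][joint], the L2 norm of each node's feature vector.

    `features` is indexed as features[channel][frame][joint] (assumed layout).
    """
    num_channels = len(features)
    num_frames = len(features[0])
    num_joints = len(features[0][0])
    magnitude = []
    for t in range(num_frames):
        row = []
        for v in range(num_joints):
            # Sum of squares over the channel axis for node (t, v).
            squared = sum(features[c][t][v] ** 2 for c in range(num_channels))
            row.append(math.sqrt(squared))
        magnitude.append(row)
    return magnitude


# Toy input: 2 channels, 1 frame, 3 joints.
toy_features = [
    [[3.0, 0.0, 1.0]],
    [[4.0, 0.0, 1.0]],
]
toy_magnitude = node_response_magnitude(toy_features)
print(toy_magnitude[0][0])  # 5.0, since sqrt(3^2 + 4^2) = 5
```

A larger value marks a joint that the last layer responds to more strongly, which is what the overlay on the video highlights.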

Run the demo with this command:

python main.py demo --openpose <path to openpose build directory> [--video <path to your video>]

Data Preparation

Kinetics-skeleton

Kinetics is a video-based dataset for action recognition that provides only raw video clips, without skeleton data. To obtain the joint locations, we first resized all videos to a resolution of 340x256 and converted the frame rate to 30 fps. We then extracted skeletons from each frame in Kinetics with OpenPose. The extracted skeleton data, which we call Kinetics-skeleton (7.5 GB), can be downloaded directly from GoogleDrive or BaiduYun.

After uncompressing, rebuild the database with this command:

python tools/kinetics_gendata.py --data_path <path to kinetics-skeleton>
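
If you want to apply the same resizing and frame-rate conversion to your own clips, the step could look like the sketch below. The tool used by the authors is not stated here, so ffmpeg is an assumption, and the helper name and file names are hypothetical. The sketch only assembles the argument list; the clip is not processed until you pass the list to subprocess.run.

```python
def build_ffmpeg_command(src, dst, width=340, height=256, fps=30):
    """Assemble an ffmpeg argument list that rescales a clip and resamples its frame rate.

    Assumes ffmpeg is installed; the authors' actual preprocessing tool is not documented here.
    """
    return [
        "ffmpeg",
        "-i", src,                                    # input clip
        "-vf", "scale={}:{}".format(width, height),   # resize to width x height
        "-r", str(fps),                               # output frame rate
        dst,                                          # output clip
    ]


# Hypothetical file names, for illustration only.
command = build_ffmpeg_command("raw_clip.mp4", "resized_clip.mp4")
print(" ".join(command))
```

To actually run the conversion, pass the list to subprocess.run(command, check=True).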

Testing Pretrained Models

To evaluate the ST-GCN model pretrained on Kinetics-skeleton, run

python main.py recognition -c config/st_gcn/kinetics-skeleton/test.yaml

To speed up evaluation with multi-GPU inference, or to change the batch size to reduce memory cost, set --test-batch-size and --device, for example:

python main.py recognition -c <config file> --test-batch-size <batch size> --device <gpu0> <gpu1> ...
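
When evaluation is launched from a script, the command above can be assembled programmatically. This is a small sketch that uses only the flags shown in this section; the helper name build_test_command, the batch size and the GPU indices are illustrative assumptions.

```python
import shlex


def build_test_command(config, batch_size=None, devices=None):
    """Build the evaluation command line as an argument list (hypothetical helper)."""
    command = ["python", "main.py", "recognition", "-c", config]
    if batch_size is not None:
        command += ["--test-batch-size", str(batch_size)]
    if devices:
        # --device takes one or more GPU indices separated by spaces.
        command += ["--device"] + [str(d) for d in devices]
    return command


command = build_test_command(
    "config/st_gcn/kinetics-skeleton/test.yaml",
    batch_size=32,    # illustrative value
    devices=[0, 1],   # illustrative GPU indices
)
print(" ".join(shlex.quote(part) for part in command))
```

Options left as None are omitted, so the values from the configuration file apply.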

Results

The expected Top-1 accuracies of the provided models are:

Model            Kinetics-skeleton (%)    NTU RGB+D Cross View (%)    NTU RGB+D Cross Subject (%)
Baseline [1]     20.3                     83.1                        74.3
ST-GCN (Ours)    30.6                     88.9                        80.7

[1] Kim, T. S., and Reiter, A. 2017. Interpretable 3d human action analysis with temporal convolutional networks. In BNMW CVPRW.
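
For readers unfamiliar with the metric, Top-1 accuracy is the percentage of samples whose highest-scoring class matches the ground-truth label. The sketch below is a generic illustration on made-up scores, not the repository's evaluation code; the function name top1_accuracy is hypothetical.

```python
def top1_accuracy(scores, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Made-up scores for 4 samples over 3 classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.4, 0.5, 0.1],
]
toy_labels = [1, 0, 1, 1]
print(top1_accuracy(toy_scores, toy_labels))  # 75.0 (3 of 4 predictions are correct)
```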

Training

To train a new ST-GCN model, run

python main.py recognition -c config/st_gcn/<dataset>/train.yaml [--work-dir <work folder>]

where <dataset> must be nturgbd-cross-view, nturgbd-cross-subject or kinetics-skeleton, depending on the dataset you want to use. The training results, including model weights, configurations and log files, are saved under ./work_dir by default, or under <work folder> if you specify one.

You can modify training parameters such as work-dir, batch-size, step, base_lr and device on the command line or in the configuration files. The order of priority is: command line > config file > default parameter. For more information, run python main.py -h.
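
The priority order above can be pictured as a layered lookup. The sketch below shows the idea only and is not how main.py is implemented: the function name resolve_parameters is hypothetical, and all parameter values are made up for the example.

```python
from collections import ChainMap


def resolve_parameters(defaults, config_file, command_line):
    """Merge three parameter layers: command line > config file > defaults."""
    # Options the user did not pass (None) must not mask the lower layers.
    given = {key: value for key, value in command_line.items() if value is not None}
    # ChainMap returns the value from the first mapping that contains the key.
    return dict(ChainMap(given, config_file, defaults))


# Made-up values, for illustration only.
defaults = {"work_dir": "./work_dir", "batch_size": 256, "base_lr": 0.01}
config_file = {"batch_size": 64, "base_lr": 0.1}
command_line = {"batch_size": 32, "base_lr": None}

resolved = resolve_parameters(defaults, config_file, command_line)
print(resolved["batch_size"])  # 32, taken from the command line
print(resolved["base_lr"])     # 0.1, taken from the config file
print(resolved["work_dir"])    # ./work_dir, taken from the defaults
```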

Finally, to evaluate a model you trained yourself, use the same evaluation command as above and pass your weights file:

python main.py recognition -c config/st_gcn/<dataset>/test.yaml --weights <path to model weights>

Citation

Please cite the following paper if you use this repository in your research.

@inproceedings{stgcn2018aaai,
  title     = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition},
  author    = {Sijie Yan and Yuanjun Xiong and Dahua Lin},
  booktitle = {AAAI},
  year      = {2018},
}

Contact

For any questions, feel free to contact:

Sijie Yan     : [email protected]
Yuanjun Xiong : [email protected]
