A graph convolutional network for skeleton-based action recognition.
<img src="resource/info/pipeline.png">
This repository holds the codebase, dataset and models for the paper:
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition, Sijie Yan, Yuanjun Xiong and Dahua Lin, AAAI 2018.
- Aug. 6, 2019 - An end-to-end action recognizer with the OpenPose Python API.
- Aug. 5, 2019 - We completed the PyTorch 1.0 migration.
- July 10, 2019 - We provide processed data for NTU RGB+D and Kinetics-skeleton.
- Feb. 21, 2019 - We provide pretrained models and training scripts for the NTU RGB+D and Kinetics-skeleton datasets, so that you can reproduce the performance reported in the paper.
- June 5, 2018 - A demo for feature visualization and skeleton-based action recognition was released.
- June 1, 2018 - We updated our codebase and completed the PyTorch 0.4.0 migration.
Our demo for skeleton-based action recognition:
<img src="resource/info/demo_video.gif", width="1200">
ST-GCN is able to exploit local patterns and correlations in human skeletons. The figures below show the neural response magnitude of each node in the last layer of our ST-GCN.
<td><img width="150px" src="resource/info/S001C001P001R001A044_w.gif"></td>
<td><img width="150px" src="resource/info/S003C001P008R001A008_w.gif"></td>
<td><img width="150px" src="resource/info/S002C001P010R001A017_w.gif"></td>
<td><img width="150px" src="resource/info/S003C001P008R001A002_w.gif"></td>
<td><img width="150px" src="resource/info/S001C001P001R001A051_w.gif"></td>
<td><font size="1">Touch head<font></td>
<td><font size="1">Sitting down<font></td>
<td><font size="1">Take off a shoe<font></td>
<td><font size="1">Eat meal/snack<font></td>
<td><font size="1">Kick other person<font></td>
<td><img width="150px" src="resource/info/hammer_throw_w.gif"></td>
<td><img width="150px" src="resource/info/clean_and_jerk_w.gif"></td>
<td><img width="150px" src="resource/info/pull_ups_w.gif"></td>
<td><img width="150px" src="resource/info/tai_chi_w.gif"></td>
<td><img width="150px" src="resource/info/juggling_balls_w.gif"></td>
<td><font size="1">Hammer throw<font></td>
<td><font size="1">Clean and jerk<font></td>
<td><font size="1">Pull ups<font></td>
<td><font size="1">Tai chi<font></td>
<td><font size="1">Juggling ball<font></td>
The first row of the above results is from the NTU RGB+D dataset, and the second row is from Kinetics-skeleton.
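For intuition about what each layer computes, here is a minimal sketch of one spatial-temporal graph convolution block, assuming skeleton input of shape (batch, channels, frames, joints). The module layout, the identity placeholder for the adjacency matrix, and the omission of normalization and residual connections are simplifications for illustration, not the repository's exact implementation.

```python
# Minimal sketch of a spatial-temporal graph convolution block (illustrative only).
import torch
import torch.nn as nn

class STGCNBlockSketch(nn.Module):
    def __init__(self, in_channels, out_channels, temporal_kernel=9):
        super().__init__()
        # Spatial step: mix channels per joint with a 1x1 convolution,
        # then aggregate over the skeleton graph with an adjacency matrix.
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal step: convolve each joint's feature sequence along time.
        pad = (temporal_kernel - 1) // 2
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(temporal_kernel, 1),
                                  padding=(pad, 0))
        self.relu = nn.ReLU()

    def forward(self, x, A):
        # x: (batch, channels, frames, joints); A: (joints, joints) adjacency
        x = self.spatial(x)
        x = torch.einsum('nctv,vw->nctw', x, A)  # propagate along skeleton edges
        return self.relu(self.temporal(x))

# Toy input: 2D joint coordinates plus confidence for 18 OpenPose joints.
x = torch.randn(4, 3, 300, 18)        # 4 clips, 3 channels, 300 frames
A = torch.eye(18)                     # placeholder adjacency (identity)
out = STGCNBlockSketch(3, 64)(x, A)   # -> shape (4, 64, 300, 18)
print(out.shape)
```

The spatial step propagates features along skeleton edges via the adjacency matrix, while the temporal step convolves each joint's feature sequence over time.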
Our codebase is based on Python 3 (>= 3.5). There are a few dependencies required to run the code. The major libraries we depend on are:
- PyTorch
- OpenPose with Python API (optional; required only for the demo)
- Other Python libraries, which can be installed with
pip install -r requirements.txt
Then install the torchlight package:
cd torchlight
python setup.py install
cd ..
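After installation, an optional sanity check (not part of this repository) can confirm that PyTorch and a GPU are visible:

```python
# Optional environment check; useful before training on GPU.
import torch

print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))
```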
We provide the pretrained model weights of our ST-GCN. The model weights can be downloaded by running the script
bash tools/get_models.sh
You can also obtain the models from GoogleDrive or BaiduYun, and manually put them into `./models`.
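If you want to verify a downloaded checkpoint, a short snippet like the one below can help; the file name under `./models` is an assumption and may differ from what `get_models.sh` actually fetches.

```python
# Hedged sketch: inspect a downloaded checkpoint (file name is an assumption).
import torch

# Assumes the file stores a plain state dict of tensors.
state_dict = torch.load('models/st_gcn.ntu-xsub.pt', map_location='cpu')
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
print('total parameters:', sum(t.numel() for t in state_dict.values()))
```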
The OpenPose Python API is required for this demo. You can use the following commands to run it.
# with offline pose estimation
python main.py demo_offline [--video ${PATH_TO_VIDEO}] [--openpose ${PATH_TO_OPENPOSE}]
# with realtime pose estimation
python main.py demo [--video ${PATH_TO_VIDEO}] [--openpose ${PATH_TO_OPENPOSE}]
Optional arguments:
- `PATH_TO_OPENPOSE`: required if the system `PYTHONPATH` does not contain the OpenPose Python API.
- `PATH_TO_VIDEO`: filename of the input video.
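If the demo fails to locate OpenPose, a small check like the following may help diagnose the issue; the build path is hypothetical and the Python module name depends on how OpenPose was built.

```python
# Hedged check that the OpenPose Python bindings are importable.
import sys

OPENPOSE_PYTHON_DIR = '/path/to/openpose/build/python'  # hypothetical path
sys.path.append(OPENPOSE_PYTHON_DIR)

try:
    import openpose  # module name depends on the OpenPose build
    print('OpenPose Python API found.')
except ImportError:
    print('Not found: pass --openpose ${PATH_TO_OPENPOSE} or extend PYTHONPATH.')
```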
We experimented on two skeleton-based action recognition datasets: Kinetics-skeleton and NTU RGB+D. Before training and testing, for fast data loading, the datasets should be converted to the proper file structure. You can download the pre-processed data from GoogleDrive and extract the files with
cd st-gcn
unzip <path to st-gcn-processed-data.zip>
Otherwise, to process the raw data yourself, please follow the guidelines below.
Kinetics is a video-based dataset for action recognition which only provides raw video clips without skeleton data; it includes around 300,000 video clips in 400 classes. To obtain the joint locations, we first resized all videos to a resolution of 340x256 and converted the frame rate to 30 fps. Then we extracted skeletons from each frame with OpenPose. The extracted skeleton data, which we call Kinetics-skeleton (7.5 GB), can be directly downloaded from GoogleDrive or BaiduYun.
After uncompressing it, rebuild the database with this command:
python tools/kinetics_gendata.py --data_path <path to kinetics-skeleton>
NTU RGB+D can be downloaded from their website. Only the 3D skeletons (5.8 GB) modality is required for our experiments. After downloading, use this command to build the database for training or evaluation:
python tools/ntu_gendata.py --data_path <path to nturgbd+d_skeletons>
where `<path to nturgbd+d_skeletons>` points to the 3D skeletons modality of the NTU RGB+D dataset you downloaded.
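To sanity-check the generated database, you can load it directly with NumPy; the directory layout, file names, and array shape below are assumptions based on the generation scripts and should be verified against your own output.

```python
# Hedged sketch: inspect the processed skeleton data (paths are assumptions).
import pickle
import numpy as np

data = np.load('data/NTU-RGB-D/xsub/val_data.npy')
with open('data/NTU-RGB-D/xsub/val_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)  # assumed (names, labels) tuple

# Expected layout: (clips, channels, frames, joints, persons).
print(data.shape, len(labels))
```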
To evaluate the ST-GCN model pretrained on Kinetics-skeleton, run
python main.py recognition -c config/st_gcn/kinetics-skeleton/test.yaml
For cross-view evaluation in NTU RGB+D, run
python main.py recognition -c config/st_gcn/ntu-xview/test.yaml
For cross-subject evaluation in NTU RGB+D, run
python main.py recognition -c config/st_gcn/ntu-xsub/test.yaml
To speed up evaluation with multi-GPU inference, or to change the batch size to reduce memory cost, set `--test_batch_size` and `--device` like this:
python main.py recognition -c <config file> --test_batch_size <batch size> --device <gpu0> <gpu1> ...
The expected top-1 accuracies of the provided models are shown here:
Model | Kinetics-skeleton (%) | NTU RGB+D Cross View (%) | NTU RGB+D Cross Subject (%) |
---|---|---|---|
Baseline [1] | 20.3 | 83.1 | 74.3 |
ST-GCN (Ours) | 31.6 | 88.8 | 81.6 |
[1] Kim, T. S., and Reiter, A. 2017. Interpretable 3D human action analysis with temporal convolutional networks. In BNMW CVPRW.
To train a new ST-GCN model, run
python main.py recognition -c config/st_gcn/<dataset>/train.yaml [--work_dir <work folder>]
where `<dataset>` must be `ntu-xsub`, `ntu-xview` or `kinetics-skeleton`, depending on the dataset you want to use. The training results, including model weights, configurations and logging files, will be saved under `./work_dir` by default, or under `<work folder>` if you specify one.
You can modify training parameters such as `work_dir`, `batch_size`, `step`, `base_lr` and `device` on the command line or in the configuration files. The order of priority is: command line > config file > default parameter. For more information, use `main.py -h`.
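As an illustration of that priority, here is a minimal argparse sketch; it is not necessarily how `main.py` is implemented, and the parameter values are made up.

```python
# Minimal sketch of the stated priority: command line > config file > default.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--work_dir', default='./work_dir')     # built-in default
parser.add_argument('--base_lr', type=float, default=0.01)

# Values parsed from a train.yaml would replace the built-in defaults ...
config_values = {'work_dir': './work_dir/ntu-xsub', 'base_lr': 0.1}
parser.set_defaults(**config_values)

# ... but anything given explicitly on the command line wins.
args = parser.parse_args(['--base_lr', '0.05'])
print(args.work_dir, args.base_lr)   # -> ./work_dir/ntu-xsub 0.05
```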
Finally, custom model evaluation can be performed with this command, as mentioned above:
python main.py recognition -c config/st_gcn/<dataset>/test.yaml --weights <path to model weights>
Please cite the following paper if you use this repository in your research.
@inproceedings{stgcn2018aaai,
  title     = {Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition},
  author    = {Sijie Yan and Yuanjun Xiong and Dahua Lin},
  booktitle = {AAAI},
  year      = {2018},
}
For any questions, feel free to contact:
- Sijie Yan: ys016@ie.cuhk.edu.hk
- Yuanjun Xiong: bitxiong@gmail.com