Latest commit

de39028 · Mar 29, 2023


41 lines (29 loc) · 3.44 KB



We currently provide an offline GPU demo for skeleton action recognition and an online CPU demo for gesture recognition. Details are provided below.


  • Before running the skeleton action recognition demo, make sure you have installed mmcv-full, mmpose and mmdet. We recommend directly using the provided conda environment, which includes all necessary dependencies:
# Following commands assume you are in the root directory of pyskl (indicated as `$PYSKL`)
# This command works with conda 22.9.0. If you are running an older conda version and encounter errors, try updating conda first
conda env create -f pyskl.yaml  # Create the conda environment (named `pyskl`) for this project; run this only if you haven't created it yet.
conda activate pyskl  # Activate the `pyskl` environment
pip install -e .  # Install this project
  • Before running the gesture recognition demo, install mediapipe first. This can be done simply by running pip install mediapipe.
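
To confirm the prerequisites are in place before launching either demo, a quick check like the following can help. This is a minimal sketch and not a script shipped with pyskl. It assumes the import names mmcv (for mmcv-full), mmpose, mmdet and mediapipe. It only checks that each module can be found, not which version is installed.

```python
import importlib.util

# Assumed import names for each requirement (mmcv-full is imported as `mmcv`).
SKELETON_DEMO_DEPS = ["mmcv", "mmpose", "mmdet"]
GESTURE_DEMO_DEPS = ["mediapipe"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for demo, deps in [("skeleton action recognition", SKELETON_DEMO_DEPS),
                       ("gesture recognition", GESTURE_DEMO_DEPS)]:
        missing = missing_modules(deps)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{demo} demo: {status}")
```
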

Skeleton Action Recognition Demo (GPU, offline)

The provided skeleton action recognition demo is offline: it takes a video clip as input and returns the recognized action. The demo runs on GPU. By default, it recognizes the 120 action categories defined in NTURGB+D 120.

For human skeleton extraction, we use Faster-RCNN (R50 backbone) for human detection and HRNet_w32 for human pose estimation. Both are based on OpenMMLab implementations.
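
Combining a human detector with a pose estimator this way gives a top-down pipeline. People are detected first, and one pose is then estimated inside each detected box. The sketch below illustrates that two-stage flow, with the detector and pose estimator passed in as plain callables. The function name, the box and pose tuple layouts, and the 0.5 score threshold are hypothetical and do not reflect the demo's actual code or defaults.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # hypothetical layout: x1, y1, x2, y2, score
Pose = List[Tuple[float, float, float]]          # hypothetical layout: x, y, confidence per keypoint


def extract_skeletons(
    frames: Sequence[object],
    detect_humans: Callable[[object], List[Box]],
    estimate_pose: Callable[[object, Box], Pose],
    score_thr: float = 0.5,
) -> List[List[Pose]]:
    """Top-down extraction: detect people in each frame, then estimate one pose per kept box."""
    results = []
    for frame in frames:
        # Stage 1: human detection (Faster-RCNN in the demo); keep confident boxes only.
        boxes = [box for box in detect_humans(frame) if box[4] >= score_thr]
        # Stage 2: pose estimation (HRNet_w32 in the demo) inside each detected box.
        results.append([estimate_pose(frame, box) for box in boxes])
    return results
```

The output is a list of poses per frame with no notion of identity yet. Linking these per-frame results into per-person sequences is the tracker's job.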

# Running the demo with PoseC3D trained on NTURGB+D 120 (Joint Modality), which is the default option. The input file is demo/ntu_sample.avi, the output file is demo/demo.mp4
python demo/ demo/ntu_sample.avi demo/demo.mp4
# Running the demo with STGCN++ trained on NTURGB+D 120 (Joint Modality). The input file is demo/ntu_sample.avi, the output file is demo/demo.mp4
python demo/ demo/ntu_sample.avi demo/demo.mp4 --config configs/stgcn++/stgcn++_ntu120_xsub_hrnet/ --checkpoint

Note that to run the demo on an arbitrary input video, you need a tracker that links the per-frame pose estimation results into multiple skeleton sequences, one per person. Currently we use a naive tracker based on inter-frame pose similarities. You can also write your own tracker.
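
As an illustration of what such a tracker might look like, here is a minimal greedy sketch. It is not the tracker used in the demo. It assumes each pose is a list of (x, y) keypoints and measures similarity as the mean Euclidean distance between corresponding keypoints. It also uses a hypothetical max_dist of 50 pixels to decide when a pose should start a new track instead.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # assumed: (x, y) per keypoint


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding keypoints (math.dist needs Python 3.8+)."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def naive_track(frames: List[List[Pose]], max_dist: float = 50.0) -> List[List[Tuple[int, Pose]]]:
    """Greedily link per-frame poses into tracks; each track is a list of (frame_index, pose)."""
    tracks: List[List[Tuple[int, Pose]]] = []
    for t, poses in enumerate(frames):
        # Score every (track, pose) pair against the track's most recent pose.
        candidates = sorted(
            (pose_distance(track[-1][1], pose), ti, pi)
            for ti, track in enumerate(tracks)
            for pi, pose in enumerate(poses)
        )
        used_tracks, used_poses = set(), set()
        for dist, ti, pi in candidates:
            if dist > max_dist:
                break  # all remaining pairs are even farther apart
            if ti in used_tracks or pi in used_poses:
                continue  # each track and each pose is matched at most once per frame
            tracks[ti].append((t, poses[pi]))
            used_tracks.add(ti)
            used_poses.add(pi)
        # Poses left unmatched start new tracks.
        for pi, pose in enumerate(poses):
            if pi not in used_poses:
                tracks.append([(t, pose)])
    return tracks
```

The sketch matches greedily in order of increasing distance to keep it short. A Hungarian assignment (e.g. scipy.optimize.linear_sum_assignment) would instead give the optimal one-to-one matching per frame.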

Gesture Recognition Demo (CPU, Real-time)

We provide an online gesture recognition demo that runs in real time on CPU. The demo takes a video stream as input and predicts the gesture currently being performed (only the single-hand scenario is supported for now). By default, this demo recognizes the 15 gestures defined in HaGRID: Call, Dislike, Fist, Four, Like, Mute, OK, One, Palm, Peace, Rock, Stop, Three [Middle 3 Fingers], Three [Left 3 Fingers], Two Up.
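
Because the demo is online, each prediction must come from the recent history of the stream rather than from a complete clip. One common way to do this is a sliding window over the most recent frames, sketched below. The class name, the window length, the clear-on-missing-hand behaviour and the classify callable are all hypothetical and are not taken from the demo script.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Keypoints = Sequence[Sequence[float]]  # assumed: one hand, (x, y) per landmark


class OnlineGestureRecognizer:
    """Keep the last `window` frames of hand keypoints and classify them on every new frame."""

    def __init__(self, classify: Callable[[List[Keypoints]], str], window: int = 10):
        self.classify = classify
        # A bounded deque drops the oldest frame automatically once it is full.
        self.buffer: Deque[Keypoints] = deque(maxlen=window)

    def update(self, keypoints: Optional[Keypoints]) -> Optional[str]:
        if keypoints is None:  # no hand detected: reset the history (a hypothetical choice)
            self.buffer.clear()
            return None
        self.buffer.append(keypoints)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough history yet
        return self.classify(list(self.buffer))
```

Each call to update costs one classification over a fixed-size window, so the per-frame latency stays constant regardless of how long the stream runs.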

For hand keypoint extraction, we use the open-source solution mediapipe. For skeleton-based gesture recognition, we currently adopt a lightweight variant of the ST-GCN++ model trained on the HaGRID gesture recognition dataset.
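
MediaPipe Hands reports 21 landmarks per hand, with x and y normalized to [0, 1] by the image width and height. If pixel coordinates are needed, for example when assembling skeleton sequences, they can be recovered as shown below. This sketch assumes the legacy mp.solutions.hands interface, where the first hand's landmarks are available as results.multi_hand_landmarks[0].landmark. The demo may preprocess keypoints differently, and the helper name is hypothetical.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple

NUM_HAND_LANDMARKS = 21  # MediaPipe Hands returns 21 landmarks per detected hand


def landmarks_to_pixels(landmarks: Iterable, width: int, height: int) -> List[Tuple[float, float]]:
    """Convert MediaPipe-style normalized landmarks (x, y in [0, 1]) to pixel coordinates."""
    points = [(lm.x * width, lm.y * height) for lm in landmarks]
    if len(points) != NUM_HAND_LANDMARKS:
        raise ValueError(f"expected {NUM_HAND_LANDMARKS} landmarks, got {len(points)}")
    return points


if __name__ == "__main__":
    # Stand-in for results.multi_hand_landmarks[0].landmark from MediaPipe.
    fake = [SimpleNamespace(x=0.5, y=0.25, z=0.0)] * NUM_HAND_LANDMARKS
    print(landmarks_to_pixels(fake, width=640, height=480)[0])  # (320.0, 120.0)
```
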

# Run the real-time skeleton-based gesture recognition demo
python demo/