This project is an RGB-based action recognition system focused on real-time applications. The model is trained on the KARD dataset. You can watch the full demo in this YouTube link.
- Keras
- TensorFlow 1.14 (required for CuDNNLSTM)
- CUDA 10.0
You can download the dataset from here. The skeleton joint and depth data are not used in this project; only the RGB part is needed. To prepare the dataset, arrange your folders in the following structure:
KARD-split
├── a01
│   ├── a01_s01_e01.mp4
│   ├── a01_s01_e02.mp4
│   ├── ...
│   └── a01_s10_e03.mp4
├── a02
│   ├── a02_s01_e01.mp4
│   ├── a02_s01_e02.mp4
│   ├── ...
│   └── a02_s10_e03.mp4
├── ...
└── a18
    ├── a18_s01_e01.mp4
    ├── a18_s01_e02.mp4
    ├── ...
    └── a18_s10_e03.mp4
You can see more details in dataset_list/trainlist.txt and dataset_list/testlist.txt.
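If you need to regenerate those list files from your own copy of the dataset, a minimal sketch is shown below. The subject-based split (s01-s08 for training, s09-s10 for testing) and the line format are assumptions, not the project's actual rule; the authoritative split is the one shipped in dataset_list/.

```python
import os

# Sketch: scan KARD-split and write train/test lists.
# Assumption: subjects s01-s08 go to training, s09-s10 to testing.
root = 'KARD-split'
train_lines, test_lines = [], []

for action in sorted(os.listdir(root)):            # a01 ... a18
    action_dir = os.path.join(root, action)
    if not os.path.isdir(action_dir):
        continue
    for name in sorted(os.listdir(action_dir)):     # e.g. a01_s01_e01.mp4
        if not name.endswith('.mp4'):
            continue
        subject = int(name.split('_')[1][1:])       # 's01' -> 1
        line = os.path.join(action, name)
        (train_lines if subject <= 8 else test_lines).append(line)

with open('dataset_list/trainlist.txt', 'w') as f:
    f.write('\n'.join(train_lines))
with open('dataset_list/testlist.txt', 'w') as f:
    f.write('\n'.join(test_lines))
```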
A webcam or any other camera is required for the demo. You can run the demo with the following command:
python webcam.py
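Conceptually, the demo keeps a sliding window of camera frames, converts it into the model's input clip, and classifies it on every new frame. The sketch below shows this general idea with OpenCV; the checkpoint name, frame size, normalization, and prediction overlay are assumptions, so check webcam.py for the real settings.

```python
import cv2
import numpy as np
from collections import deque
from tensorflow.keras.models import load_model

n_sequence, size = 8, (224, 224)                 # 8 frames; input size is an assumption
model = load_model('pretrained_model.h5')        # hypothetical checkpoint name
buffer = deque(maxlen=n_sequence + 1)            # keep one extra frame to form 8 differences

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, size)
    buffer.append(frame.astype(np.float32) / 255.0)

    if len(buffer) == n_sequence + 1:
        # RGB difference between consecutive frames (see the notes below)
        clip = np.stack([buffer[i + 1] - buffer[i] for i in range(n_sequence)])
        probs = model.predict(clip[np.newaxis])[0]          # (18,) class probabilities
        cv2.putText(frame, 'class %d' % probs.argmax(), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

    cv2.imshow('demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
```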
If you want to see the accuracy and confusion matrix of the pretrained model, run the evaluation with the command below.
python evaluate_model.py
You can train the model by running this command.
python train.py
If you want to change any parameters, you can find them in train.py.
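Training follows the standard Keras pattern of fitting from a data generator (see the data-generator reference at the end of this README). Below is a rough sketch of that pattern only; KARDGenerator, load_clip, and the batch size are assumptions and not the project's actual code, which is configured through the hyperparameters at the top of train.py.

```python
import numpy as np
from tensorflow.keras.utils import Sequence

class KARDGenerator(Sequence):
    """Yields batches of shape (n_batch, n_sequence, H, W, 3) with one-hot labels."""
    def __init__(self, video_paths, labels, load_clip, n_batch=16, n_classes=18):
        self.paths, self.labels = video_paths, labels
        self.load_clip = load_clip      # placeholder: samples n_sequence frames from one video
        self.n_batch, self.n_classes = n_batch, n_classes

    def __len__(self):
        return len(self.paths) // self.n_batch

    def __getitem__(self, idx):
        sl = slice(idx * self.n_batch, (idx + 1) * self.n_batch)
        x = np.stack([self.load_clip(p) for p in self.paths[sl]])
        y = np.eye(self.n_classes)[np.asarray(self.labels[sl])]
        return x, y

# model.fit_generator(KARDGenerator(train_paths, train_labels, load_clip), epochs=30)
```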
Accuracy: around 87-89% (depending on which frames are randomly sampled from the test set)
Input: 8 RGB frames
Output: 18 action classes
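A minimal Keras sketch consistent with this spec: MobileNetV2 as a per-frame feature extractor wrapped in TimeDistributed, followed by an LSTM and a softmax over the 18 classes. The layer sizes, pooling, and weights here are assumptions; the real architecture is in model_ML.py.

```python
from tensorflow.keras.layers import (Input, TimeDistributed, GlobalAveragePooling2D,
                                     Dense, CuDNNLSTM)
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Model

n_sequence, n_classes = 8, 18

# Per-frame feature extractor (ImageNet weights and pooling choice are assumptions)
backbone = MobileNetV2(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

inputs = Input(shape=(n_sequence, 224, 224, 3))        # 8 RGB(-difference) frames
x = TimeDistributed(backbone)(inputs)                  # (batch, 8, 7, 7, 1280)
x = TimeDistributed(GlobalAveragePooling2D())(x)       # (batch, 8, 1280)
x = CuDNNLSTM(64)(x)                                   # temporal modelling (64 units is a guess)
outputs = Dense(n_classes, activation='softmax')(x)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```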
- This project uses the RGB difference between consecutive frames as input. The idea comes from this paper and this project; a minimal sketch of the computation is included after these notes.
- This project deliberately uses a simple model: an LSTM as the core and MobileNetV2 as the feature extractor. You can see the details of the architecture in model_ML.py.
- During testing and evaluation, n_sequence frames are randomly sampled from each video file; one set of n_sequence frames is one "sample". Drawing only one sample per video makes the measured accuracy unstable, so several samples are drawn per file. For example, in evaluate_model.py I set n_mul_test to 2, meaning two samples are drawn per video file. You can set n_mul_test to any value: a higher value gives a more stable accuracy but needs more testing time. See the evaluation sketch after these notes.
- The hyperparameters of train.py, evaluate_model.py, and webcam.py are located at the top of each file. You can adjust them there.
- If you run out of memory while evaluating or training, you can decrease n_batch and n_sequence to reduce memory consumption. However, I suggest not using n_batch = 1, because the accuracy will swing wildly and the model will not converge.
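As mentioned in the first note, the input is the RGB difference between consecutive frames rather than the raw frames. Below is a minimal sketch of sampling one such input from a video file. Whether the project samples 8 or 9 raw frames, and how it normalizes them, are assumptions here; the project's actual loading code is the reference.

```python
import cv2
import numpy as np

def sample_rgb_difference(path, n_sequence=8, size=(224, 224)):
    """Randomly pick n_sequence + 1 frames from a video and return the
    n_sequence consecutive RGB differences (one 'sample')."""
    cap = cv2.VideoCapture(path)
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    # random but time-ordered indices, so differences still follow the motion
    idx = np.sort(np.random.choice(n_frames, n_sequence + 1, replace=False))
    frames = []
    for i in idx:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
        _, frame = cap.read()
        frames.append(cv2.resize(frame, size).astype(np.float32) / 255.0)
    cap.release()

    frames = np.stack(frames)            # (n_sequence + 1, H, W, 3)
    return frames[1:] - frames[:-1]      # (n_sequence, H, W, 3)
```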
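For the repeated-sampling evaluation described in the notes, the idea is simply to draw n_mul_test samples per video and count each as a separate prediction. A sketch is below; the function name and loop are hypothetical (the real loop lives in evaluate_model.py), and it assumes a frame-sampling helper like the sample_rgb_difference sketch in these notes.

```python
import numpy as np

def evaluate(model, video_paths, labels, n_mul_test=2):
    """Accuracy when each video contributes n_mul_test random samples."""
    correct, total = 0, 0
    for path, label in zip(video_paths, labels):
        for _ in range(n_mul_test):
            clip = sample_rgb_difference(path)                 # one random sample
            pred = model.predict(clip[np.newaxis])[0].argmax()
            correct += int(pred == label)
            total += 1
    return correct / total
```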
My supervisor: Assoc. Prof. Dr. Hui Gao
My mentor: ChaiXinXin
Co-supervisor: Prof. Dr. Xin Su
- Data generator in Keras: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
- Sampling idea: https://github.com/eriklindernoren/Action-Recognition
- RGB difference example: https://github.com/AhmedGamal1496/online-action-recognition#Introduction
- Temporal Segment Networks for Action Recognition in Videos, Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool, TPAMI, 2018 (arXiv preprint).