Region-based Non-local operation for Video Classification [arXiv]
Please [★star] this repo and [cite] the following arXiv paper if you think our RNL is useful for you:
@article{huang2020region,
title={Region-based Non-local Operation for Video Classification},
author={Huang, Guoxi and Bors, Adrian G},
journal={arXiv preprint arXiv:2007.09033},
year={2020}
}
- PyTorch 1.3 or higher
Please refer to TSM repo for the details of data preparation.
The accuracy might be a bit different from the paper, as we did some modification to our models. For example, instead of using SE module reported in the paper, we use the Channel-gate module form GCNet to model the channel attention.
method | n-frame | Kinetics Acc. | checkpoint |
---|---|---|---|
NL I3D-ResNet50 | 32 * 10clips | 74.9% | - |
RNL TSM-ResNet50 | 8 * 10clips | 75.6% | link |
RNL TSM-ResNet50 | 16 * 10clips | 77.2% | link |
RNL TSM-ResNet50 | (16+8) * 10clips | 77.4% | - |
On Kinetics, RNL TSM models achieve better performance than NL I3D model with less computation (shorter video length).
method | n-frame | Something-V1 Acc. | checkpoint |
---|---|---|---|
RNL TSM-ResNet50 | 8 * 2clips | 49.5% | link |
RNL TSM-ResNet50 | 16 * 2clips | 51.0% | link |
RNL TSM-ResNet50 | (8+16) * 2clips | 52.7% | - |
RNL TSM-ResNet101 | 8 * 2clips | 50.8% | link |
RNL 101 + RNL 50 | (8+16) * 2clips | 54.1% | - |
We provided several examples to train RNL network with this repo:
- To train on Kinetics from ImageNet pretrained models, you can run the script bellow:
python main.py --dataset kinetics --dense_sample --dist-url 'tcp://localhost:6666' \
--dist-backend 'nccl' --multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 \
--rank 0 --gd 20 --shift --shift_div=8 --shift_place=blockres --npb --lr 0.02 --wd 2e-4 \
--dropout 0.5 --num_segments 8 --batch_size 16 --batch_multiplier 4 --use_warmup --warmup_epochs 5 \
--lr_type cos --epochs 100 --non_local --suffix 1
- To train on Something-Something V1 from ImageNet pretrained models, you can run the script bellow:
python main.py --dist-url 'tcp://localhost:6666' --dist-backend 'nccl' \
--multiprocessing-distributed --available_gpus 0,1,2,3 --world-size 1 --rank 0 \
--dataset something --gd 20 --shift --shift_div=8 --shift_place=blockres --npb \
--lr 0.02 --wd 1e-3 --dropout 0.8 --num_segments 8 --batch_size 16 --batch_multiplier 4\
--use_warmup --warmup_epochs 1 --lr_type cos --epochs 50 --non_local --suffix 1
# Notice that the total batch size is equal to batch_size x batch_multiplier x world_size, and
# you should scale up the learning rate with batch size. For example, if you use
# a batch size of 128 you should set learning rate to 0.04.
For example, to test the downloaded pretrained models, you can run the scripts below. The scripts test RNL on 8-frame setting by running:
# test on kinetics
python test_models.py kinetics \
--weights=pretrained/TSM_kinetics_RGB_resnet50_shift8_blockres_avg_segment8_e100_cos_dense_nl_lr0.02_wd2.0e-04.pth.tar \
--test_segments=8 --batch_size=16 -j 25 --test_crops=3 --dense_sample --full_res
# test on Something
python test_models.py something \
--weights=pretrained/TSM_something_RGB_resnet50_shift8_blockres_avg_segment8_e50_cos_nl_h_8e-4.pth.tar \
--test_segments=8 --batch_size=2 -j 25 --test_crops=3 --twice_sample --full_res
This repository is built upon the following baseline implementations.
For any questions, please feel free to open an issue or contact:
Guoxi Huang: [email protected]