We use the ActionFormer detection pipeline as our baseline method and replace its I3D features with features extracted by VideoMAE V2-g.
Dataset | Backbone | Head | mAP | Features |
---|---|---|---|---|
THUMOS14 | VideoMAE V2-g | ActionFormer | 69.6 | th14_mae_g_16_4.tar.gz |
FineAction | VideoMAE V2-g | ActionFormer | 18.2 | fineaction_mae_g.tar.gz |
Use extract_tad_feature.py to extract features for each dataset. For example, to extract the THUMOS14 features, run the following command:
python extract_tad_feature.py \
--data_set THUMOS14 \
--data_path YOUR_PATH/thumos14_videos \
--save_path YOUR_PATH/th14_vit_g_16_4 \
--model vit_giant_patch14_224 \
--ckpt_path YOUR_PATH/vit_g_hyrbid_pt_1200e_k710_ft.pth
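The `16_4` suffix in the THUMOS14 feature names suggests 16-frame clips taken every 4 frames. This reading is an assumption based on the naming only; check extract_tad_feature.py for the actual sampling. Under that assumption, a minimal sketch of how the clip windows would cover a video is shown below (`clip_starts` is a hypothetical helper, not part of the repository):

```python
def clip_starts(num_frames: int, clip_len: int = 16, stride: int = 4) -> list[int]:
    """Start indices of every full clip_len-frame window, stepping by stride.

    clip_len=16 and stride=4 are assumptions inferred from the "16_4" suffix.
    """
    if num_frames < clip_len:
        # Video is too short to hold even one full clip.
        return []
    return list(range(0, num_frames - clip_len + 1, stride))


# A 100-frame video gives windows starting at 0, 4, 8, ..., 84.
starts = clip_starts(100)
print(len(starts), starts[:3], starts[-1])
```

Each window would yield one feature vector, so the number of rows in a video's feature array would equal the number of start indices.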
# Extract only the 413 THUMOS videos that VideoMAEv2/ActionFormer use.
# Use this to reproduce the results reported in the papers.
python extract_tad_feature.py --use_actionformer_subset
# Extract all videos from THUMOS'14 (i.e. all of UCF101, plus the 1010 val and 1574 test videos)
python extract_tad_feature.py
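After a long extraction run it can help to confirm that every video produced an output file. The sketch below assumes one feature file per video, named after the video's file stem with a `.npy` extension; that layout is an assumption, as are the helper name `missing_features` and the list of video extensions, so adjust them to match what the script actually writes.

```python
from pathlib import Path


def missing_features(video_dir: str, feature_dir: str,
                     video_exts=(".mp4", ".avi", ".mkv"),
                     feature_ext: str = ".npy") -> list[str]:
    """Return the sorted stems of videos that have no matching feature file.

    Assumes (hypothetically) that features are saved as <video_stem>.npy.
    """
    videos = {p.stem for p in Path(video_dir).iterdir()
              if p.suffix.lower() in video_exts}
    features = {p.stem for p in Path(feature_dir).iterdir()
                if p.suffix.lower() == feature_ext}
    return sorted(videos - features)


# Example call with the placeholder paths from the command above:
# print(missing_features("YOUR_PATH/thumos14_videos", "YOUR_PATH/th14_vit_g_16_4"))
```

An empty list means every video found in the input directory has a corresponding output under the assumed naming scheme.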