[arXiv] [project page]
This repository contains the official code for our paper *Rethinking matching-based few-shot action recognition*.
This code is based on the TSL [1] and TRX [2] repositories. It requires Python >= 3.8.
The installation script is given below:
```bash
ROOT_REPO_DIR=<path_to_the_root_folder>
cd ${ROOT_REPO_DIR}
git clone https://github.com/xianyongqin/few-shot-video-classification.git
git clone https://github.com/tobyperrett/few-shot-action-recognition.git
git clone https://github.com/jbertrand89/temporal_matching.git
cd temporal_matching
python -m venv ENV
source ENV/bin/activate
pip install torch torchvision==0.12.0
pip install tensorboard
pip install einops
pip install ffmpeg
pip install pandas
```
or use:

```bash
pip install -r requirements.txt
```
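To quickly check that the environment is functional (an optional sanity check, not part of the official setup), you can verify that the core dependencies import correctly:

```python
# Optional sanity check: confirm the core dependencies import correctly.
import torch
import torchvision
import einops
import pandas

print(torch.__version__, torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```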
For more details on the datasets, please refer to DATA_PREPARATION.
The scripts and the pretrained models evaluated in the paper are saved in MODEL_ZOO.
The following sections detail each step.
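As background for the steps below: a matching-based few-shot classifier scores a query video against the support videos of each class with a matching function and predicts the best-scoring class. The sketch below illustrates the general idea with a simple symmetric Chamfer score between two clip-feature sequences. It is an illustration only, not the repository implementation; in particular, Chamfer++ additionally matches tuples of clips (`--clip_tuple_length`) and performs joint video-to-class matching (`--video_to_class_matching joint`):

```python
import torch
import torch.nn.functional as F

def chamfer_similarity(query, support):
    """Illustrative Chamfer score between two videos.

    query: (n_q, d) clip features of the query video.
    support: (n_s, d) clip features of a support video.
    """
    query = F.normalize(query, dim=-1)
    support = F.normalize(support, dim=-1)
    sim = query @ support.T                # (n_q, n_s) cosine similarities
    q_to_s = sim.max(dim=1).values.mean()  # each query clip -> best support clip
    s_to_q = sim.max(dim=0).values.mean()  # each support clip -> best query clip
    return 0.5 * (q_to_s + s_to_q)         # symmetric Chamfer score
```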
To reproduce the paper numbers, you first need to download the test episodes and the pretrained models.
To run inference for a given matching function on pre-saved episodes, you need to specify:
- ROOT_TEST_EPISODE_DIR (as defined in Download test episodes)
- CHECKPOINT_DIR (as defined in Model ZOO)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- SHOT (number of examples per class, 1 or 5)
- DATASET (one of ssv2/kinetics/ucf101)
Then run the evaluation-on-saved-episodes script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
ROOT_TEST_EPISODE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
for SEED in 1 5 10
do
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
python run_matching.py \
--num_gpus 1 \
--num_workers 1 \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3 \
--shot ${SHOT} \
--way 5 \
-c ${CHECKPOINT_DIR} \
-r -m ${MODEL_NAME} \
--load_test_episodes \
--test_episode_dir ${ROOT_TEST_EPISODE_DIR} \
--dataset_name ${DATASET}
done
python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
```
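The final command aggregates the results of the three models (one per training seed). Conceptually it amounts to averaging the per-seed accuracies; a hypothetical illustration with made-up numbers, not the repository script:

```python
import numpy as np

# Hypothetical per-seed test accuracies read from the result files.
per_seed_accuracy = np.array([61.8, 62.3, 62.0])

mean = per_seed_accuracy.mean()
# Sample standard deviation across seeds, as a spread estimate.
std = per_seed_accuracy.std(ddof=1)
print(f"{mean:.2f} +/- {std:.2f} over {len(per_seed_accuracy)} seeds")
```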
You may want to run inference on a new set of episodes. We provide a script that uses the R(2+1)D feature loader.
You first need to download the pre-saved features and the pretrained models.
To run inference for a given matching function, you need to specify:
- ROOT_FEATURE_DIR (as defined in Download pre-saved features)
- CHECKPOINT_DIR (as defined in Model ZOO)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- SHOT (number of examples per class, 1 or 5)
- DATASET (one of ssv2/kinetics/ucf101)
- TEST_SEED (any number you like)
Then run the general-case evaluation script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
ROOT_FEATURE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2
TEST_SEED=1
TEST_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
for SEED in 1 5 10
do
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
python run_matching.py \
--num_gpus 1 \
--num_workers 1 \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3 \
--shot ${SHOT} \
--way 5 \
-c ${CHECKPOINT_DIR} \
-r -m ${MODEL_NAME} \
--split_dirs ${TEST_DIR} \
--split_names test \
--split_seeds ${TEST_SEED} \
--dataset_name ${DATASET}
done
python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
```
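In this setting the episodes are sampled on the fly from the feature loader, with TEST_SEED controlling the draw. For intuition, a 5-way SHOT-shot episode is just a seeded random choice of classes, support examples, and query examples; a hypothetical sketch, not the repository's sampler:

```python
import random

def sample_episode(videos_by_class, way=5, shot=1, queries_per_class=1, seed=1):
    # videos_by_class: hypothetical dict mapping class name -> list of videos.
    rng = random.Random(seed)  # seed plays the role of TEST_SEED
    classes = rng.sample(sorted(videos_by_class), way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = rng.sample(videos_by_class[cls], shot + queries_per_class)
        support += [(video, label) for video in picks[:shot]]
        query += [(video, label) for video in picks[shot:]]
    return support, query
```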
To fairly compare classifier-based and matching-based approaches, we start from frozen R(2+1)D features.
To train a model for a given matching function, you need to specify:
- CHECKPOINT_DIR (can be different from the one defined in Model ZOO)
- ROOT_FEATURE_DIR (as defined in Download pre-saved features)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- DATASET (one of ssv2/kinetics/ucf101)
- SHOT (number of examples per class, 1 or 5)
- SEED (any number you like; we used 1, 5 and 10)
The following hyper-parameters were tuned with Optuna; we provide the optimal value found for each method (a hypothetical search sketch is shown after the list):
- LR (usually one of 0.01, 0.001 or 0.0001)
- GLOBAL_TEMPERATURE
- TEMPERATURE_WEIGHT
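For reference, a search like this is typically set up as follows with Optuna. This is a minimal, hypothetical sketch, not the search code used for the paper; `train_and_validate` is a hypothetical helper that trains a model with the sampled values and returns validation accuracy:

```python
import optuna

def objective(trial):
    # Sample values in ranges consistent with the optima listed above.
    lr = trial.suggest_categorical("lr", [0.01, 0.001, 0.0001])
    global_temperature = trial.suggest_float("global_temperature", 1.0, 100.0, log=True)
    temperature_weight = trial.suggest_float("temperature_weight", 0.01, 1.0, log=True)
    # train_and_validate is a hypothetical helper: train with these values and
    # return the validation accuracy over few-shot episodes.
    return train_and_validate(lr, global_temperature, temperature_weight)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```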
Then run the training script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_FEATURE_DIR=<your_path>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
DATASET=ssv2
SHOT=1
SEED=1
LR=0.001 # hyper-parameter tuned with Optuna
GLOBAL_TEMPERATURE=100 # hyper-parameter tuned with Optuna
TEMPERATURE_WEIGHT=0.1 # hyper-parameter tuned with Optuna
TRAIN_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/train
VAL_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/val
TEST_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}
CHECKPOINT_DIR_TRAIN=${CHECKPOINT_DIR}/${MODEL_NAME}
rm -r ${CHECKPOINT_DIR_TRAIN} # remove any previous checkpoints saved under this model name
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
python run_matching.py \
--dataset_name ${DATASET} \
--tasks_per_batch 1 \
--num_gpus 1 \
--num_workers 1 \
--shot ${SHOT} \
--way 5 \
--query_per_class 1 \
--num_test_tasks 10000 \
--num_val_tasks 10000 \
-c ${CHECKPOINT_DIR_TRAIN} \
--train_split_dir ${TRAIN_FEATURE_DIR} \
--val_split_dir ${VAL_FEATURE_DIR} \
--test_split_dir ${TEST_FEATURE_DIR} \
--train_seed ${SEED} \
--val_seed ${SEED} \
--test_seed 1 \
--seed ${SEED} \
-lr ${LR} \
--matching_global_temperature ${GLOBAL_TEMPERATURE} \
--matching_global_temperature_fixed \
--matching_temperature_weight ${TEMPERATURE_WEIGHT} \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3
```
The following table recaps the scripts for evaluating and training the following models:
- our method: Chamfer++
- prior work:
  - TSL [1]
  - TRX [2]
  - OTAM [3]
  - ViSiL [4]
- useful baselines:
  - mean
  - max
  - diagonal
  - linear
| Matching method | Evaluation on saved episodes | Evaluation, general case | Training |
|---|---|---|---|
| tsl | from_episodes | N/A | N/A |
| mean | from_episodes | from_loader | train |
| max | from_episodes | from_loader | train |
| chamfer++ | from_episodes | from_loader | train |
| diagonal | from_episodes | from_loader | train |
| linear | from_episodes | from_loader | train |
| otam | from_episodes | from_loader | train |
| trx | from_episodes | from_loader | train |
| visil | from_episodes | from_loader | train |
Coming soon.
[1] Xian et al., Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation.
[2] Perrett et al., Temporal-Relational CrossTransformers for Few-Shot Action Recognition.
[3] Cao et al., Few-Shot Video Classification via Temporal Alignment.
[4] Kordopatis-Zilos et al., ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning.