[arXiv] [project page]
This repository contains the official code for our paper *Rethinking matching-based few-shot action recognition*.
This code is based on the TSL [1] and TRX [2] repositories. It requires Python >= 3.8.
The installation script is given below:
```bash
ROOT_REPO_DIR=<path_to_the_root_folder>
cd ${ROOT_REPO_DIR}
git clone https://github.com/xianyongqin/few-shot-video-classification.git
git clone https://github.com/tobyperrett/few-shot-action-recognition.git
git clone https://github.com/jbertrand89/temporal_matching.git
cd temporal_matching
python -m venv ENV
source ENV/bin/activate
pip install torch torchvision==0.12.0
pip install tensorboard
pip install einops
pip install ffmpeg
pip install pandas
```
or use:

```bash
pip install -r requirements.txt
```
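To quickly check that the environment is functional (an optional sanity check, not part of the official setup), you can verify that the core dependencies import correctly:

```python
# Optional sanity check: confirm the core dependencies import correctly.
import torch
import torchvision
import einops
import pandas

print(torch.__version__, torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```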
For more details on the datasets, please refer to DATA_PREPARATION.
The scripts and the pretrained models evaluated in the paper are saved in MODEL_ZOO.
The following sections detail each step.
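As background for the steps below: a matching-based few-shot classifier scores a query video against the support videos of each class with a matching function and predicts the best-scoring class. The sketch below illustrates the general idea with a simple symmetric Chamfer score between two clip-feature sequences. It is an illustration only, not the repository implementation; in particular, Chamfer++ additionally matches tuples of clips (`--clip_tuple_length`) and performs joint video-to-class matching (`--video_to_class_matching joint`):

```python
import torch
import torch.nn.functional as F

def chamfer_similarity(query, support):
    """Illustrative Chamfer score between two videos.

    query: (n_q, d) clip features of the query video.
    support: (n_s, d) clip features of a support video.
    """
    query = F.normalize(query, dim=-1)
    support = F.normalize(support, dim=-1)
    sim = query @ support.T                # (n_q, n_s) cosine similarities
    q_to_s = sim.max(dim=1).values.mean()  # each query clip -> best support clip
    s_to_q = sim.max(dim=0).values.mean()  # each support clip -> best query clip
    return 0.5 * (q_to_s + s_to_q)         # symmetric Chamfer score
```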
To reproduce the paper numbers, you first need to download the test episodes and the pretrained models.
To run inference for a given matching function on pre-saved episodes, you need to specify:
- ROOT_TEST_EPISODE_DIR (as defined in Download test episodes)
- CHECKPOINT_DIR (as defined in Model ZOO)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- SHOT (number of examples per class, 1 or 5)
- DATASET (one of ssv2/kinetics/ucf101)
Then run the evaluation-on-saved-episodes script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
ROOT_TEST_EPISODE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
for SEED in 1 5 10
do
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
python run_matching.py \
--num_gpus 1 \
--num_workers 1 \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3 \
--shot ${SHOT} \
--way 5 \
-c ${CHECKPOINT_DIR} \
-r -m ${MODEL_NAME} \
--load_test_episodes \
--test_episode_dir ${ROOT_TEST_EPISODE_DIR} \
--dataset_name ${DATASET}
done
python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
```
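The final command aggregates the results of the three models (one per training seed). Conceptually it amounts to averaging the per-seed accuracies; a hypothetical illustration with made-up numbers, not the repository script:

```python
import numpy as np

# Hypothetical per-seed test accuracies read from the result files.
per_seed_accuracy = np.array([61.8, 62.3, 62.0])

mean = per_seed_accuracy.mean()
# Sample standard deviation across seeds, as a spread estimate.
std = per_seed_accuracy.std(ddof=1)
print(f"{mean:.2f} +/- {std:.2f} over {len(per_seed_accuracy)} seeds")
```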
You may want to run inference on a new set of episodes. We provide a script that uses the R(2+1)D feature loader.
You first need to download the pre-saved features and the pretrained models.
To run inference for a given matching function, you need to specify:
- ROOT_FEATURE_DIR (as defined in Download pre-saved features)
- CHECKPOINT_DIR (as defined in Model ZOO)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- SHOT (number of examples per class, 1 or 5)
- DATASET (one of ssv2/kinetics/ucf101)
- TEST_SEED (any number you like)
Then run the general-case evaluation script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
ROOT_FEATURE_DIR=<your_path>
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
SHOT=1
DATASET=ssv2
TEST_SEED=1
TEST_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
for SEED in 1 5 10
do
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}.pt
python run_matching.py \
--num_gpus 1 \
--num_workers 1 \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3 \
--shot ${SHOT} \
--way 5 \
-c ${CHECKPOINT_DIR} \
-r -m ${MODEL_NAME} \
--split_dirs ${TEST_DIR} \
--split_names test \
--split_seeds ${TEST_SEED} \
--dataset_name ${DATASET}
done
python average_multi_seeds.py --result_dir ${CHECKPOINT_DIR} --result_template ${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed --seeds 1 5 10
```
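In this setting the episodes are sampled on the fly from the feature loader, with TEST_SEED controlling the draw. For intuition, a 5-way SHOT-shot episode is just a seeded random choice of classes, support examples, and query examples; a hypothetical sketch, not the repository's sampler:

```python
import random

def sample_episode(videos_by_class, way=5, shot=1, queries_per_class=1, seed=1):
    # videos_by_class: hypothetical dict mapping class name -> list of videos.
    rng = random.Random(seed)  # seed plays the role of TEST_SEED
    classes = rng.sample(sorted(videos_by_class), way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picks = rng.sample(videos_by_class[cls], shot + queries_per_class)
        support += [(video, label) for video in picks[:shot]]
        query += [(video, label) for video in picks[shot:]]
    return support, query
```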
To fairly compare classifier-based and matching-based approaches, we start from frozen R(2+1)D features.
To train a model for a given matching function, you need to specify:
- CHECKPOINT_DIR (can be different from the one defined in Model ZOO)
- ROOT_FEATURE_DIR (as defined in Download pre-saved features)
- ROOT_REPO_DIR (as defined in Installation)
- MATCHING_NAME (one of diag/mean/max/linear/otam/chamfer++/trx/visil)
- DATASET (one of ssv2/kinetics/ucf101)
- SHOT (number of examples per class, 1 or 5)
- SEED (any number you like; we used 1, 5 and 10)
The following hyper-parameters were tuned with Optuna; we provide the optimal value found for each method (a hypothetical search sketch is shown after the list):
- LR (usually one of 0.01, 0.001 or 0.0001)
- GLOBAL_TEMPERATURE
- TEMPERATURE_WEIGHT
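For reference, a search like this is typically set up as follows with Optuna. This is a minimal, hypothetical sketch, not the search code used for the paper; `train_and_validate` is a hypothetical helper that trains a model with the sampled values and returns validation accuracy:

```python
import optuna

def objective(trial):
    # Sample values in ranges consistent with the optima listed above.
    lr = trial.suggest_categorical("lr", [0.01, 0.001, 0.0001])
    global_temperature = trial.suggest_float("global_temperature", 1.0, 100.0, log=True)
    temperature_weight = trial.suggest_float("temperature_weight", 0.01, 1.0, log=True)
    # train_and_validate is a hypothetical helper: train with these values and
    # return the validation accuracy over few-shot episodes.
    return train_and_validate(lr, global_temperature, temperature_weight)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```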
Then run the training script. The script differs per matching function, so please refer to the scripts summary to find the one you need. For example, for Chamfer++ matching, run
```bash
CHECKPOINT_DIR=<your_checkpoint_dir>
ROOT_FEATURE_DIR=<your_path>
ROOT_REPO_DIR=<your_repo_dir>
MATCHING_NAME=chamfer++
DATASET=ssv2
SHOT=1
SEED=1
LR=0.001 # hyper-parameter tuned with Optuna
GLOBAL_TEMPERATURE=100 # hyper-parameter tuned with Optuna
TEMPERATURE_WEIGHT=0.1 # hyper-parameter tuned with Optuna
TRAIN_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/train
VAL_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/val
TEST_FEATURE_DIR=${ROOT_FEATURE_DIR}/${DATASET}/test
MODEL_NAME=${DATASET}_${MATCHING_NAME}_5way_${SHOT}shots_seed${SEED}
CHECKPOINT_DIR_TRAIN=${CHECKPOINT_DIR}/${MODEL_NAME}
rm -r ${CHECKPOINT_DIR_TRAIN} # remove any previous checkpoints saved under this model name
TEMPORAL_MATCHING_REPO_DIR=${ROOT_REPO_DIR}/temporal_matching
cd ${TEMPORAL_MATCHING_REPO_DIR}
source ENV/bin/activate # ENV is the name of the environment
python run_matching.py \
--dataset_name ${DATASET} \
--tasks_per_batch 1 \
--num_gpus 1 \
--num_workers 1 \
--shot ${SHOT} \
--way 5 \
--query_per_class 1 \
--num_test_tasks 10000 \
--num_val_tasks 10000 \
-c ${CHECKPOINT_DIR_TRAIN} \
--train_split_dir ${TRAIN_FEATURE_DIR} \
--val_split_dir ${VAL_FEATURE_DIR} \
--test_split_dir ${TEST_FEATURE_DIR} \
--train_seed ${SEED} \
--val_seed ${SEED} \
--test_seed 1 \
--seed ${SEED} \
-lr ${LR} \
--matching_global_temperature ${GLOBAL_TEMPERATURE} \
--matching_global_temperature_fixed \
--matching_temperature_weight ${TEMPERATURE_WEIGHT} \
--backbone r2+1d_fc \
--feature_projection_dimension 1152 \
--method matching-based \
--matching_function chamfer \
--video_to_class_matching joint \
--clip_tuple_length 3
```
The following table recaps the scripts for evaluating and training the following models:
- our method: Chamfer++
- prior work:
  - TSL [1]
  - TRX [2]
  - OTAM [3]
  - ViSiL [4]
- useful baselines:
  - mean
  - max
  - diagonal
  - linear
| Matching method | Evaluation on saved episodes | Evaluation, general case | Training |
|---|---|---|---|
| tsl | from_episodes | N/A | N/A |
| mean | from_episodes | from_loader | train |
| max | from_episodes | from_loader | train |
| chamfer++ | from_episodes | from_loader | train |
| diagonal | from_episodes | from_loader | train |
| linear | from_episodes | from_loader | train |
| otam | from_episodes | from_loader | train |
| trx | from_episodes | from_loader | train |
| visil | from_episodes | from_loader | train |
Coming soon.
[1] Xian et al., Generalized Few-Shot Video Classification with Video Retrieval and Feature Generation.
[2] Perrett et al., Temporal-Relational CrossTransformers for Few-Shot Action Recognition.
[3] Cao et al., Few-Shot Video Classification via Temporal Alignment.
[4] Kordopatis-Zilos et al., ViSiL: Fine-grained Spatio-Temporal Video Similarity Learning.