GitHub - fanyunfeng-bit/Cross-modal-Representation-Flattening-for-MMDG

Code

The code was tested with Python 3.10.4 and torch 1.11.0+cu113 on an NVIDIA GeForce RTX 3090.

Environments:

mmcv-full 1.2.7
mmaction2 0.13.0
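
Before training, it can help to confirm that the installed packages match the versions listed above. The following is a minimal sketch, not part of the repository: the function name `check_versions` is hypothetical, and the distribution names (`torch`, `mmcv-full`, `mmaction2`) are assumed to be the names the packages were installed under.

```python
from importlib import metadata

# Versions listed in this README. The distribution names are an assumption.
EXPECTED = {"torch": "1.11.0", "mmcv-full": "1.2.7", "mmaction2": "0.13.0"}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed_or_None, matches)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Drop a local build tag such as "+cu113" before comparing.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {have} [{status}]")
```

A mismatch is only a warning sign; other versions may work but were not the ones the code was tested with.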

EPIC-Kitchens Dataset

Prepare

Download Pretrained Weights

Download the audio model (link), rename it to vggsound_avgpool.pth.tar, and place it in the EPIC-rgb-flow-audio/pretrained_models directory.
Download the SlowFast model for the RGB modality (link) and place it in the EPIC-rgb-flow-audio/pretrained_models directory.
Download the SlowOnly model for the Flow modality (link) and place it in the EPIC-rgb-flow-audio/pretrained_models directory.
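
A quick way to confirm the weights landed in the right place is sketched below. It assumes only what is stated above: the audio checkpoint must be named vggsound_avgpool.pth.tar. The SlowFast and SlowOnly file names are not given here, so the sketch just lists whatever else is in the directory. The helper name `check_pretrained` is hypothetical.

```python
from pathlib import Path


def check_pretrained(root="EPIC-rgb-flow-audio/pretrained_models"):
    """Return (audio_checkpoint_present, names_of_other_files)."""
    root = Path(root)
    if not root.is_dir():
        return False, []
    audio = root / "vggsound_avgpool.pth.tar"  # name required by this README
    # The SlowFast and SlowOnly checkpoints keep their downloaded names,
    # which are not stated here, so they are only listed, not validated.
    others = sorted(p.name for p in root.iterdir() if p.is_file() and p != audio)
    return audio.is_file(), others


if __name__ == "__main__":
    audio_ok, others = check_pretrained()
    print("audio checkpoint found:", audio_ok)
    print("other files:", others)
```

Two other files are expected in the listing, one for each video backbone.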

Download EPIC-Kitchens Dataset

bash download_script.sh

Download the audio files (EPIC-KITCHENS-audio.zip).

Unzip all files and arrange the directory structure to match:

├── MM-SADA_Domain_Adaptation_Splits
├── rgb
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01.wav
|   |   |   ├── P08_01
|   |   |   |     ├── frame_0000000000.jpg
|   |   |   |     ├── ...
|   |   |   ├── P08_02.wav
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3


├── flow
|   ├── train
|   |   ├── D1
|   |   |   ├── P08_01 
|   |   |   |   ├── u
|   |   |   |   |   ├── frame_0000000000.jpg
|   |   |   |   |   ├── ...
|   |   |   |   ├── v
|   |   |   ├── P08_02
|   |   |   ├── ...
|   |   ├── D2
|   |   ├── D3
|   ├── test
|   |   ├── D1
|   |   ├── D2
|   |   ├── D3
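
The layout above can be checked before launching a long training run. The sketch below is an illustration under the assumption that the tree shown is complete at the domain level; the function names `missing_dirs` and `clips_without_audio` are hypothetical and not part of the repository.

```python
from pathlib import Path

DOMAINS = ("D1", "D2", "D3")
SPLITS = ("train", "test")


def missing_dirs(datapath):
    """List expected directories (relative paths) absent under `datapath`."""
    root = Path(datapath)
    expected = [root / "MM-SADA_Domain_Adaptation_Splits"]
    expected += [root / modality / split / domain
                 for modality in ("rgb", "flow")
                 for split in SPLITS
                 for domain in DOMAINS]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_dir()]


def clips_without_audio(datapath, split="train", domain="D1"):
    """Names of RGB frame folders that lack a sibling <name>.wav file."""
    folder = Path(datapath) / "rgb" / split / domain
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir()
                  if p.is_dir() and not (folder / f"{p.name}.wav").is_file())


if __name__ == "__main__":
    path = "/path/to/EPIC-KITCHENS/"
    print("missing directories:", missing_dirs(path))
    print("clips without audio:", clips_without_audio(path))
```

Both functions return empty lists when the checked part of the layout matches the tree.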

Video and Audio

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Video and Flow

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_ours2.py --use_video --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Flow and Audio

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_ours2.py --use_audio --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_audio --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_audio --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Video and Flow and Audio

cd EPIC-rgb-flow-audio

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio --use_flow -s D2 D3 -t D1 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio --use_flow -s D1 D3 -t D2 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_EPIC_ours2.py --use_video --use_audio --use_flow -s D1 D2 -t D3 --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/EPIC-KITCHENS/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0
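
The EPIC-Kitchens commands above follow one pattern: for each modality combination, each domain is held out as the target once while the other two serve as sources. A minimal sketch that regenerates the same twelve commands is shown below; the helper name `epic_commands` is hypothetical, and all flags are copied from the commands above rather than taken from the script's argument parser.

```python
from itertools import combinations

DOMAINS = ["D1", "D2", "D3"]
MODALITIES = ["video", "audio", "flow"]
# Hyperparameter flags copied verbatim from the commands above.
COMMON = ("--lr 1e-4 --bsz 16 --nepochs 15 --datapath {datapath} "
          "--vanilla_learning --DG_algorithm naive --SMA --sma_start_step 400 "
          "--CM_mixup --mix_alpha 0.1 --contrast --distill "
          "--distill_coef 3.0 --mix_coef 2.0")


def epic_commands(datapath="/path/to/EPIC-KITCHENS/"):
    """Yield one training command per modality set and held-out domain."""
    for size in (2, 3):
        for mods in combinations(MODALITIES, size):
            flags = " ".join(f"--use_{m}" for m in mods)
            for target in DOMAINS:
                sources = " ".join(d for d in DOMAINS if d != target)
                yield (f"python train_video_flow_audio_EPIC_ours2.py {flags} "
                       f"-s {sources} -t {target} "
                       + COMMON.format(datapath=datapath))


if __name__ == "__main__":
    for command in epic_commands():
        print(command)
```

The printed lines can be pasted into a shell from inside the EPIC-rgb-flow-audio directory, or written to a file and run one after another.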

HAC Dataset

This dataset can be downloaded at link.

Download the pretrained weights as described for the EPIC-Kitchens Dataset and place them in the HAC-rgb-flow-audio/pretrained_models directory.

Video and Audio

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio -s animal cartoon -t human --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio -s human cartoon -t animal --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio -s human animal -t cartoon --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Video and Flow

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_ours2.py --use_video --use_flow -s animal cartoon -t human --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_flow -s human cartoon -t animal --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_flow -s human animal -t cartoon --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Flow and Audio

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_ours2.py --use_audio --use_flow -s animal cartoon -t human --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_audio --use_flow -s human cartoon -t animal --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_audio --use_flow -s human animal -t cartoon --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Video and Flow and Audio

cd HAC-rgb-flow-audio

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio --use_flow -s animal cartoon -t human --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio --use_flow -s human cartoon -t animal --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

python train_video_flow_audio_HAC_ours2.py --use_video --use_audio --use_flow -s human animal -t cartoon --lr 1e-4 --bsz 16 --nepochs 15 --datapath /path/to/HAC/ --vanilla_learning --DG_algorithm naive --SMA --sma_start_step 100 --CM_mixup --mix_alpha 0.1 --contrast --distill --distill_coef 3.0 --mix_coef 2.0

Acknowledgement

Many thanks to the excellent open-source project SimMMDG.