PyTorch implementation of "Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning"
- Create a conda environment with the dependencies: conda env create -f environment.yml -n <env_name>
- Run data_prep.py to prepare the h5py files
- Run coco_caption/get_stanford_models.sh to download the Stanford models required for computing the evaluation metrics.
- Set the desired parameters in settings/settings.yaml
- Run training: python train.py -n exp_name
- For reinforcement-learning finetuning, set the parameters in the rl block of settings/settings.yaml
- Run: python finetune_rl.py -n exp_name
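Taken together, the steps above form the following pipeline. This is a sketch using only the commands documented here; the environment name dtd_aac and the experiment name my_exp are placeholders, not names fixed by the repository.

```shell
# One-time setup: environment, data, and evaluation models
conda env create -f environment.yml -n dtd_aac   # dtd_aac is a placeholder name
conda activate dtd_aac
python data_prep.py                              # prepares the h5py files
bash coco_caption/get_stanford_models.sh         # models for metric evaluation

# Cross-entropy training (reads settings/settings.yaml)
python train.py -n my_exp

# Reinforcement-learning finetuning (reads the rl block in settings/settings.yaml)
python finetune_rl.py -n my_exp
```

Both training scripts take the same -n experiment name, so the finetuning stage can pick up where cross-entropy training left off.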
@inproceedings{sun2023dual,
  title={Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning},
  author={Jianyuan Sun and Xubo Liu and Xinhao Mei and Volkan Kılıç and Mark D. Plumbley and Wenwu Wang},
  booktitle={INTERSPEECH 2023},
  year={2023}
}