Dual Transformer for Audio Captioning

PyTorch implementation of "Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning" (INTERSPEECH 2023).

Set up environment

  • Create a conda environment with the required dependencies: conda env create -f environment.yml -n <env_name>

Set up dataset

  • Run data_prep.py to prepare the HDF5 (h5py) feature files; a quick way to inspect them is sketched below.
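
As a quick sanity check, the prepared files can be opened with h5py. This is only a minimal sketch: the file path and dataset keys used here are illustrative assumptions, not necessarily the names that data_prep.py writes.

    import h5py

    # Open one of the prepared HDF5 files (path is a hypothetical example).
    with h5py.File("data/hdf5s/train.h5", "r") as hf:
        print(list(hf.keys()))            # show which datasets were written
        feats = hf["audio_features"][0]   # hypothetical key: one audio feature matrix
        caption = hf["captions"][0]       # hypothetical key: one reference caption
        print(feats.shape, caption)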

Prepare evaluation tool

  • Run coco_caption/get_stanford_models.sh to download the libraries needed to evaluate the captioning metrics (see the scoring sketch below).
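
Once the Stanford models are downloaded, the COCO caption tools can score generated captions against references. The snippet below is a minimal sketch using the standard pycocoevalcap interface; the exact import path depends on where the coco_caption package sits in this repository, and the captions are placeholders.

    # Import paths follow the usual pycocoevalcap layout; adjust them to match
    # the coco_caption directory in this repository.
    from pycocoevalcap.bleu.bleu import Bleu
    from pycocoevalcap.cider.cider import Cider

    # Keys are item ids; values are lists of caption strings.
    gts = {0: ["a dog is barking in the distance", "a dog barks repeatedly"]}  # references
    res = {0: ["a dog is barking"]}                                            # model output

    cider, _ = Cider().compute_score(gts, res)
    bleu, _ = Bleu(4).compute_score(gts, res)
    print("CIDEr:", cider, "BLEU-1..4:", bleu)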

Run experiments

  • Set the desired parameters in settings/settings.yaml (a sketch of reading the settings file follows this list)
  • Run an experiment: python train.py -n exp_name
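
For reference, the YAML settings can be loaded programmatically, e.g. to check what train.py will see. The nested keys printed below are illustrative assumptions, not the repository's actual parameter names.

    import yaml

    # Load the experiment configuration used by train.py.
    with open("settings/settings.yaml", "r") as f:
        cfg = yaml.safe_load(f)

    # Hypothetical example keys, for illustration only.
    print(cfg.get("training", {}).get("epochs"))
    print(cfg.get("training", {}).get("batch_size"))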

Reinforcement learning training

  • Set the parameters in the rl block of settings/settings.yaml
  • Run: python finetune_rl.py -n exp_name (an illustrative RL objective is sketched below)
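
Reinforcement-learning fine-tuning of captioning models is commonly implemented as self-critical sequence training (SCST), where the reward of a sampled caption (typically its CIDEr score) is baselined by a greedily decoded caption. The sketch below only illustrates that general idea; the model interface and the cider_reward helper are hypothetical placeholders and do not reflect the actual code in finetune_rl.py.

    import torch

    def scst_step(model, audio_feats, references, optimizer, cider_reward):
        """One illustrative SCST update. `model.sample_captions`,
        `model.greedy_captions` and `cider_reward` are hypothetical placeholders."""
        # Sample captions and keep per-token log-probabilities.
        sampled, log_probs = model.sample_captions(audio_feats)
        with torch.no_grad():
            baseline = model.greedy_captions(audio_feats)

        # Reward: CIDEr of the sample minus CIDEr of the greedy baseline.
        reward = cider_reward(sampled, references) - cider_reward(baseline, references)

        # Policy-gradient loss: reward-weighted negative log-likelihood.
        loss = -(reward.unsqueeze(1) * log_probs).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()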

Citation

@inproceedings{sun2023dual,
  title={Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning},
  author={Jianyuan Sun and Xubo Liu and Xinhao Mei and Volkan Kılıç and Mark D. Plumbley and Wenwu Wang},
  booktitle={INTERSPEECH 2023},
  year={2023}
}
