This repository contains code for various sound event classification systems implemented on two datasets: (a) DCASE 2019 Task 5 and (b) a subset of Audioset.
This repository includes the code from the External-Attention-pytorch repository as a submodule. To clone only this repository:

```
git clone xx
```

To clone this repository and its submodules at the same time:

```
git clone --recurse-submodules -j8 xx
```
`-j8` is an optional performance optimization, available since Git version 2.8, that fetches up to 8 submodules in parallel. See this StackOverflow answer for more information.
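If you cloned without `--recurse-submodules`, you can fetch the submodule afterwards with the standard Git command:

```
git submodule update --init --recursive
```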
Each dataset has its own folder, which contains the following subfolders:
- `augmentation`: package containing code for various data augmentations
- `data`: contains the raw audio files and generated features
- `split`: contains the metadata required to parse the dataset
- `models`: contains saved model weights
In addition to these subfolders, each dataset folder contains a script, `statistics.py`, which computes the channel-wise mean and standard deviation of the various features.
The DCASE data is the Urban Sound Tagging dataset from DCASE 2019 Task 5. It contains one training split and one validation split. The Audioset data is a subset of Audioset containing 10 classes; it is split into 5 folds for 5-fold cross validation.
To reproduce the results, first clone this repository. Then, follow the steps below.
Generate the required type of feature using:

```
python compute_<feature_type>.py <input_path> <output_path>
```

Replace `<feature_type>` with one of `logmelspec`, `cqt`, or `gammatone`. The output path must be `./<dataset>/data/<feature_type>`, where `<dataset>` is either `dcase` or `audioset`.
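For intuition, log-mel feature extraction looks roughly like the sketch below, which uses librosa. The sample rate, mel-bin count, hop length, and file names are assumptions; the actual `compute_logmelspec.py` may differ.

```python
# A hedged sketch of log-mel extraction with librosa; the sample rate,
# mel-bin count, and hop length are assumed values, not the repo's settings.
import librosa
import numpy as np

def compute_logmelspec(audio_path, sr=44100, n_mels=128, hop_length=512):
    y, _ = librosa.load(audio_path, sr=sr)          # load and resample audio
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=n_mels, hop_length=hop_length
    )
    return librosa.power_to_db(mel)                 # (n_mels, frames), log scale

feature = compute_logmelspec("example.wav")         # hypothetical input file
np.save("./dcase/data/logmelspec/example.npy", feature)
```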
Compute the channel-wise mean and standard deviation of the features using `statistics.py`. Besides the workspace path, only two arguments are required: the feature type and the number of frequency bins.

```
python statistics.py -w $WORKSPACE -f logmelspec -n 128
```

For Audioset, an additional parameter is required to specify the training, validation, and testing folds. For training on folds 0, 1, and 2, validating on fold 3, and testing on fold 4, run:

```
python statistics.py -w $WORKSPACE -f logmelspec -n 128 -p 0 1 2 3 4
```
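For reference, channel-wise statistics over a folder of saved features could be computed along these lines. This is a sketch assuming features are stored as `.npy` arrays of shape `(channels, bins, frames)`; the actual `statistics.py` may differ.

```python
# A sketch of channel-wise mean/std over saved features; assumes .npy
# arrays of shape (channels, bins, frames), which may not match the repo.
import glob
import numpy as np

def channel_stats(feature_dir):
    total, sq_total, count = 0.0, 0.0, 0
    for path in glob.glob(f"{feature_dir}/*.npy"):
        x = np.load(path)
        total += x.sum(axis=(1, 2))            # per-channel sum
        sq_total += (x ** 2).sum(axis=(1, 2))  # per-channel sum of squares
        count += x.shape[1] * x.shape[2]       # values seen per channel
    mean = total / count
    std = np.sqrt(sq_total / count - mean ** 2)
    return mean, std

mean, std = channel_stats("./dcase/data/logmelspec")
```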
To parse the DCASE data, run `parse_dcase.py`. No such step is required for Audioset.
The `train.py` script for DCASE takes three arguments: the feature type, the number of time frames, and a random seed. To train on logmelspec features, run:

```
python train.py -w $WORKSPACE -f logmelspec -n 636 --seed 42
```
In addition to the three arguments above, the `train.py` script for Audioset takes an extra argument specifying the training, validation, and testing folds. For training on folds 0, 1, and 2, validating on fold 3, and testing on fold 4, run:

```
python train.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
```
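The seed makes runs repeatable by seeding torch and numpy (see the `--seed` flag below). A minimal sketch of this common pattern, not necessarily the script's exact code:

```python
# A minimal sketch of reproducible seeding for torch and numpy, as the
# --seed flag description suggests; train.py's exact behaviour may differ.
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable

set_seed(42)
```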
The flags you can use with `python train.py` are:

- `-w`, `--workspace`: The path of the active workspace.
- `-f`, `--feature_type`: Which feature to use for training. Default = `logmelspec`.
- `-ma`, `--model_arch`: Which model architecture to use when training. Default = `mobilenetv2`. Options: `[mobilenetv2, pann_cnn10, pann_cnn14]`.
- `-cp10`, `--pann_cnn10_encoder_ckpt_path`: Path of the pretrained model weights for the PANN architecture. Only required when `model_arch=pann_cnn10`.
- `-cp14`, `--pann_cnn14_encoder_ckpt_path`: Path of the pretrained model weights for the PANN architecture. Only required when `model_arch=pann_cnn14`.
- `-n`, `--num_frames`: Number of frames in the generated logmelspecs.
- `-p`, `--permutation`: Specifies the data splits used for training and validation. Default = `0 1 2 3 4`.
- `-s`, `--seed`: Integer seed value for torch and numpy. Helpful for replicating results.
- `-rt`, `--resume_training`: Whether to resume training from the last epoch it stopped at; automatically loads the previous model weights. Default = `'yes'`.
- `-bs`, `--balanced_sampler`: Whether to use a balanced sampler when loading data. Default = `False`.
- `-ga`, `--grad_acc_steps`: Number of steps to accumulate gradients over before each optimizer update (see the sketch after this list). Default = `2`.
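As referenced in the `-ga` item above, gradient accumulation applies the optimizer step only every `grad_acc_steps` batches, simulating a larger effective batch size. A toy PyTorch sketch of the pattern; the model, data, and loss below are stand-ins, not the repository's training loop:

```python
# Toy sketch of gradient accumulation in PyTorch; the model, batches,
# and loss are placeholders, not the repository's actual training code.
import torch
from torch import nn

model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
grad_acc_steps = 2  # corresponds to the -ga/--grad_acc_steps flag

optimizer.zero_grad()
for step in range(8):
    inputs = torch.randn(4, 8)               # toy batch
    targets = torch.randint(0, 2, (4,))      # toy labels
    loss = criterion(model(inputs), targets)
    (loss / grad_acc_steps).backward()       # scale so gradients average out
    if (step + 1) % grad_acc_steps == 0:
        optimizer.step()                     # update every grad_acc_steps batches
        optimizer.zero_grad()
```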
For validation, run `evaluate.py` with the same arguments as above, but without the random seed argument.
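For the DCASE training example above, that corresponds to:

```
python evaluate.py -w $WORKSPACE -f logmelspec -n 636
```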
To perform feature fusion, refer to section 1 to generate the `logmelspec`, `cqt`, and `gammatone` features, and then train their respective models. Next, to generate the weights of each feature, run:

```
python generate_weights.py -w $WORKSPACE -p 0 1 2 3 4
```

Finally, run:

```
python feature_fusion.py -w $WORKSPACE -p 0 1 2 3 4
```
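Conceptually, the fusion step combines the per-feature model predictions using per-feature weights. A sketch of weighted late fusion with hypothetical arrays; the actual `feature_fusion.py` may implement this differently:

```python
# A sketch of weighted late fusion over per-feature predictions. The
# weights and prediction arrays are hypothetical stand-ins for the outputs
# of generate_weights.py and the trained per-feature models.
import numpy as np

preds = {
    "logmelspec": np.random.rand(100, 10),  # (clips, classes) scores
    "cqt": np.random.rand(100, 10),
    "gammatone": np.random.rand(100, 10),
}
weights = {"logmelspec": 0.5, "cqt": 0.3, "gammatone": 0.2}

fused = sum(weights[k] * preds[k] for k in preds)  # weighted average of scores
labels = fused.argmax(axis=1)                      # final class per clip
```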