Sound Event Classification

This repository contains code for various sound event classification systems implemented on two datasets - (a) DCASE 2019 Task 5 and (b) a subset of Audioset.

Getting Started

This repository uses the code in the External-Attention-pytorch repository as a submodule.

To clone only this repository (without the submodule), run:

git clone https://github.com/sohamtiwari3120/sound-event-classification.git

To clone this repository and its submodules at the same time, run:

git clone --recurse-submodules -j8 https://github.com/sohamtiwari3120/sound-event-classification.git

The -j8 flag is an optional performance optimization, available since Git 2.8, that fetches up to eight submodules in parallel.

See this Stack Overflow answer on cloning repositories with submodules for more information.
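
If you have already cloned the repository without the submodule, you can fetch it afterwards with:

git submodule update --init --recursive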

Organization

Each dataset has its own folder, which contains the following subfolders:

  1. augmentation: package containing code for various data augmentations
  2. data: contains raw audio files and generated features
  3. split: contains metadata required to parse the dataset
  4. models: contains saved model weights

Other than these subfolders, each dataset folder contains a script, statistics.py, which computes the channel-wise mean and standard deviation of the various features.
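
For reference, a dataset folder is expected to look roughly like this (dcase shown; audioset follows the same layout):

dcase/
├── augmentation/    # data augmentation package
├── data/            # raw audio and generated features, e.g. data/logmelspec
├── split/           # metadata for parsing the dataset
├── models/          # saved model weights
└── statistics.py    # channel-wise mean and std of the features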

Datasets

A. DCASE

This is the Urban Sound Tagging dataset from DCASE 2019 Task 5. It contains one training split and one validation split.

B. Audioset

This is a subset of Audioset containing 10 classes. It is split into 5 different folds for 5-fold cross validation.

Reproducing the results

To reproduce the results, first clone this repository. Then, follow the steps below.

1. Generating the features

Generate the required type of feature using the following command:
python compute_<feature_type>.py <input_path> <output_path>
Replace <feature_type> with one of logmelspec, cqt, or gammatone. The output path must be ./<dataset>/data/<feature_type>, where <dataset> is either dcase or audioset.
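
For example, to generate log-mel spectrograms for DCASE (the input path here is an assumption and depends on where the raw audio was extracted):

python compute_logmelspec.py ./dcase/data/audio ./dcase/data/logmelspec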

2. Evaluating channel-wise mean and standard deviation

Evaluate the channel-wise mean and standard deviation of the features using statistics.py.

DCASE

Besides the workspace path, only two arguments are required: the feature type and the number of frequency bins.
python statistics.py -w $WORKSPACE -f logmelspec -n 128

Audioset

An additional parameter is required to specify the training, validation and testing folds. For training on folds 0, 1 and 2, validating on 3 and testing on 4, run
python statistics.py -w $WORKSPACE -f logmelspec -n 128 -p 0 1 2 3 4
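
Conceptually, the statistics are computed per frequency bin over the whole training set and are later used to normalise the features. Below is a minimal sketch of the idea, not the actual statistics.py; the file paths and .npy format are assumptions:

import glob
import numpy as np

# Assumed location and format of the features generated in step 1.
paths = glob.glob('./dcase/data/logmelspec/*.npy')
feats = [np.load(p) for p in paths]            # each assumed shape: (n_bins, n_frames)
stacked = np.concatenate(feats, axis=1)        # (n_bins, total_frames)
mean = stacked.mean(axis=1, keepdims=True)     # one mean per frequency bin
std = stacked.std(axis=1, keepdims=True)       # one std per frequency bin
normalised = (feats[0] - mean) / (std + 1e-8)  # z-score a single example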

3. Parsing the data

To parse the DCASE data, run parse_dcase.py. No such step is required for Audioset.
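
Assuming the script takes no additional arguments:

python parse_dcase.py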

4. Training

DCASE

The train.py file for DCASE takes three arguments: the feature type, the number of time frames, and the random seed. To train on log-mel features, run
python train.py -w $WORKSPACE -f logmelspec -n 636 --seed 42

Audioset

In addition to the three arguments above, the train.py file for Audioset takes an extra argument specifying the training, validation and testing folds. For training on folds 0, 1 and 2, validating on 3 and testing on 4, run
python train.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
The flags accepted by train.py are listed below; a combined example follows the list.

  • -w, --workspace: Path of the active workspace.
  • -f, --feature_type: Which feature type to use for training. Default = logmelspec.
  • -ma, --model_arch: Which model architecture to use when training. Default = mobilenetv2. Options: [mobilenetv2, pann_cnn10, pann_cnn14]
  • -cp10, --pann_cnn10_encoder_ckpt_path: Path of the pretrained weights for the PANN CNN10 encoder. Only required when model_arch=pann_cnn10.
  • -cp14, --pann_cnn14_encoder_ckpt_path: Path of the pretrained weights for the PANN CNN14 encoder. Only required when model_arch=pann_cnn14.
  • -n, --num_frames: Number of frames in the generated log-mel spectrograms.
  • -p, --permutation: Specifies the folds used for training, validation and testing. Default = 0 1 2 3 4.
  • -s, --seed: Integer seed for torch and numpy. Helpful for reproducing results.
  • -rt, --resume_training: Whether to resume training from the last epoch it stopped at; automatically loads the previous model weights. Default = 'yes'.
  • -bs, --balanced_sampler: Whether to use a balanced sampler when loading data. Default = False.
  • -ga, --grad_acc_steps: Number of steps over which to accumulate gradients. Default = 2.
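
Putting some of these flags together, a run that fine-tunes the PANN CNN10 encoder on the default folds might look like the following (the checkpoint path is a placeholder):

python train.py -w $WORKSPACE -f logmelspec -ma pann_cnn10 -cp10 <path_to_cnn10_checkpoint> -n 636 -p 0 1 2 3 4 -s 42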

5. Validating

For validation, run evaluate.py with the same arguments as above but without the random seed argument.
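
For example, mirroring the training commands above (DCASE first, then Audioset):

python evaluate.py -w $WORKSPACE -f logmelspec -n 636
python evaluate.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4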

6. Feature Fusion

To perform feature fusion, first follow step 1 to generate logmelspec, cqt and gammatone features, and then train their respective models (step 4). Next, to generate the weights of each feature, run
python generate_weights.py -w $WORKSPACE -p 0 1 2 3 4

Finally, run
python feature_fusion.py -w $WORKSPACE -p 0 1 2 3 4
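
Putting the whole fusion pipeline together for the default folds (assuming all three feature types have already been generated as in step 1, and that -n 636 applies to each feature type):

python train.py -w $WORKSPACE -f logmelspec -n 636 -p 0 1 2 3 4
python train.py -w $WORKSPACE -f cqt -n 636 -p 0 1 2 3 4
python train.py -w $WORKSPACE -f gammatone -n 636 -p 0 1 2 3 4
python generate_weights.py -w $WORKSPACE -p 0 1 2 3 4
python feature_fusion.py -w $WORKSPACE -p 0 1 2 3 4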

7. Audioset subset results

[Figure: results on the Audioset subset]
