Oversampling, Augmentation and Curriculum Learning for Speaking Assessment with Limited Training Data
This project is a refactored version of the l2-speech-scoring-tools developed by Aalto-speech. It explores methods including data augmentation, oversampling, and curriculum learning to alleviate the challenges of training wav2vec2-based Automatic Speaking Assessment models on small and imbalanced datasets.
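As an illustration of the oversampling idea (a sketch only, not the repo's actual implementation), the snippet below balances an imbalanced set of proficiency-score labels with PyTorch's `WeightedRandomSampler`; the label values and variable names are invented for the example.

```python
# Illustrative sketch: class-balanced oversampling with PyTorch.
# The toy labels below stand in for real proficiency scores.
from collections import Counter

from torch.utils.data import WeightedRandomSampler

labels = [1, 1, 1, 1, 2, 2, 3]  # toy scores; most samples sit in one class
counts = Counter(labels)

# Draw each sample with probability inversely proportional to its class size,
# so rare score levels appear about as often as common ones during training.
weights = [1.0 / counts[y] for y in labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
```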
The datasets can be downloaded from https://www.kielipankki.fi/corpora/digitala/. The pre-trained Swedish model `KBLab/wav2vec2-large-voxrex-swedish` is publicly available on HuggingFace. The pre-trained Finnish model is unfortunately not publicly available.
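For reference, here is a minimal sketch of loading the public Swedish checkpoint with the `transformers` library; treating assessment as single-score regression is an assumption of this example, and the actual fine-tuning setup lives in `run_finetune.py`.

```python
# Sketch: load the public Swedish wav2vec2 checkpoint from HuggingFace.
# Using a single-output regression head is an assumption of this example.
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

name = "KBLab/wav2vec2-large-voxrex-swedish"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2ForSequenceClassification.from_pretrained(
    name,
    num_labels=1,               # one continuous proficiency score
    problem_type="regression",  # MSE loss instead of cross-entropy
)
```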
- `config.yml` contains all the model, data, and training parameters (see the loading sketch after this list).
- `environment.yml` defines the conda environment the code runs in.
- `run_finetune.py` fine-tunes the wav2vec2 model pre-trained on native Finnish/Finland Swedish speech.
- `run_predict.py` uses the fine-tuned models for prediction.
- `run_finetune.sh` runs `run_finetune.py` on Triton.
- `run_predict.sh` runs `run_predict.py` on Triton.
- The `augmentations` folder contains everything to do with data augmentation.
- The `helper` folder contains all the functions that are not directly run in `main()`.
- The `others` folder contains reference files that can be removed later.
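Since the pipeline is driven by `config.yml`, here is a hypothetical sketch of reading it; the key names are invented for illustration and are not the repo's actual schema.

```python
# Hypothetical sketch: read config.yml with PyYAML. Key names are invented.
import yaml

with open("config.yml") as f:
    cfg = yaml.safe_load(f)

# cfg might then expose entries such as:
# cfg["model"]["pretrained_path"], cfg["data"]["train_csv"], cfg["training"]["lr"]
```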
- Clone this repo and `cd` into it.
- Create the conda environment:

  ```
  conda env create --file environment.yml
  ```

- Install WavAugment (a short usage sketch follows this list):

  ```
  git clone [email protected]:facebookresearch/WavAugment.git && cd WavAugment && python setup.py develop
  ```

- Run the code on Triton:
  - Check `config.yml` to see if all parameters look good.
  - Check `run.sh` and set `--lang` to the desired language (either `fi` or `sv`).
  - Then run:

    ```
    sbatch run.sh
    ```
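To sanity-check the WavAugment install, the sketch below applies a randomized pitch shift with its `EffectChain` API (as shown in the WavAugment README); the effect choice and parameter ranges here are illustrative only.

```python
# Sketch: randomized pitch shift via WavAugment's EffectChain.
# pitch() is followed by rate() so the output keeps the original
# sampling rate, as the WavAugment docs recommend.
import numpy as np
import torch
import augment

x = torch.zeros(1, 16000)  # one second of dummy 16 kHz audio
random_pitch = lambda: np.random.randint(-300, 300)  # shift in cents, redrawn per call

chain = augment.EffectChain().pitch(random_pitch).rate(16000)
y = chain.apply(x, src_info={"rate": 16000}, target_info={"rate": 16000})
```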