This repository contains legacy (under development) code for an SSL-based speech enhancement framework.
You can install the required packages using the following command:
pip install -r requirements.txt
Download the 16 kHz version of the VCTK-DEMAND dataset and set the dataset directory.
The dataset is expected to be in audio/, with separate subdirectories containing the clean test set, the noisy test set, the clean training set, and the noisy training set.
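Because the code expects 16 kHz audio, it can help to confirm the sample rate of the downloaded files before training. The following is a minimal sketch, not part of the repository: the helper name `find_wrong_rate_wavs` is hypothetical, and it assumes the files are PCM WAV, which the standard-library `wave` module can read.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16000  # the dataset is expected at 16 kHz


def find_wrong_rate_wavs(root, expected_rate=EXPECTED_RATE):
    """Return (path, rate) pairs for every .wav under root with a different rate.

    Hypothetical helper for illustration. wave.open only handles PCM WAV files
    and raises wave.Error for other encodings.
    """
    mismatches = []
    for wav_path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as handle:
            rate = handle.getframerate()
        if rate != expected_rate:
            mismatches.append((wav_path, rate))
    return mismatches
```

Calling `find_wrong_rate_wavs("audio/")` returns an empty list when every file under audio/ is at 16 kHz.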
You can update the dataset path in the configs/mpnet_weights.json file. The configuration file also contains other parameters you can change for an experiment (e.g., compression, learnable sigmoid).
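If you prefer to change the configuration from a script rather than by hand, a small helper like the one below may be enough. This is only a sketch: the function name `set_config_value` is hypothetical, and the key name used in the usage note is an assumption, so check the actual keys in configs/mpnet_weights.json first.

```python
import json
from pathlib import Path


def set_config_value(config_path, key, value):
    """Load a JSON config, set one top-level key, and write the file back.

    Hypothetical helper for illustration; it only handles top-level keys.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config[key] = value
    path.write_text(json.dumps(config, indent=4) + "\n")
    return config
```

For example, `set_config_value("configs/mpnet_weights.json", "dataset_dir", "audio/")` would point the config at audio/, assuming the dataset path is stored under a top-level key named `dataset_dir` (an illustrative name, not confirmed by the repository).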
To apply PCS to the audio, use apply_pcs; update the audio paths in it before running.
The best model uses a conformer as the head. Remember to use waveform_loss (weighted_sdr_loss) in conjunction with consistency_loss (L1) and unconsistency_loss (L1).
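For readers unfamiliar with the weighted SDR loss, the sketch below shows one common formulation on a single waveform. It is an assumption that the repository's weighted_sdr_loss follows this definition; the actual implementation operates on batched tensors and may differ in details. Plain Python lists are used here only so the example runs without dependencies.

```python
import math


def _neg_cosine(a, b, eps=1e-8):
    """Negative cosine similarity between two equal-length signals."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return -dot / (norm + eps)


def weighted_sdr_loss(noisy, clean, enhanced, eps=1e-8):
    """Illustrative weighted SDR loss for one waveform (lower is better).

    The noise and its estimate are obtained by subtracting the clean and
    enhanced signals from the noisy input. The speech and noise terms are
    weighted by their share of the total energy.
    """
    noise = [x - y for x, y in zip(noisy, clean)]
    noise_est = [x - y for x, y in zip(noisy, enhanced)]
    clean_energy = sum(y * y for y in clean)
    noise_energy = sum(n * n for n in noise)
    alpha = clean_energy / (clean_energy + noise_energy + eps)
    speech_term = _neg_cosine(clean, enhanced, eps)
    noise_term = _neg_cosine(noise, noise_est, eps)
    return alpha * speech_term + (1.0 - alpha) * noise_term
```

With this definition the loss approaches -1 when the enhanced signal equals the clean signal, and it is higher when the model returns the noisy input unchanged.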
To train the model, use the following command:
--experiment_config configs/mpnet_weights.json \
--num_epochs 50 \
--batch_size 4 \
--cuda \
--model_tag microsoft/wavlm-large \
--checkpoint_dir mag_only/wavlm_best/ \
--reconstructed_audio_folder reconstructed_audio/mag_only/wavlm_best \
--compute_metrics_interval 1 \
--magnitude_head conformer \
--experiment_name mag_only/wavlm_best
To evaluate the model you can use the following command:
--experiment_config configs/mpnet_weights.json \
--model_checkpoint /home/salman/SE_Self-Supervise_Learning-/mag_only/wavlm_best/ \
--cuda \
--reconstructed_audio_folder reconstructed_audio/mag_only/wavlm_best \
--model_tag microsoft/wavlm-large \
--magnitude_head conformer > results/mag_only/wavlm_best.txt
After running the evaluation script, check the results/mag_only/wavlm_best.txt file for the PESQ and STOI results.
You can set the command line arguments according to your needs. For example, you can change the model tag to use a different pre-trained model. You can also change the magnitude head to use a different architecture (e.g., lstm or transformer). You should also check the CUDA_VISIBLE_DEVICES variable to make sure that you are using the correct GPU. The code has only been tested on a single GPU so far.
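If you launch runs from a Python script, one way to pin a run to a single GPU is to set CUDA_VISIBLE_DEVICES in the child process environment. The helper below is a hypothetical sketch, not part of the repository.

```python
import os


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES pinned.

    Hypothetical helper for illustration. The base environment is copied,
    so the caller's environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, together with your own training or evaluation command, so that only the chosen GPU is visible to that process.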
You can also use the pre-trained best checkpoint file we provide in mag_only/wavlm_best/. Generated WAV files are saved in the reconstructed_audio_folder. To compute other metrics such as CBAK and COVL, run the corresponding script after updating the paths to the enhanced and clean waveforms.
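For context, the composite measures are commonly computed as linear combinations of PESQ, LLR, WSS, and segmental SNR, clipped to the range 1 to 5. The sketch below uses the widely cited coefficients for these measures; it assumes the four input scores have already been computed elsewhere, and the function name `composite_scores` is hypothetical. The repository's own script may differ.

```python
def _clip(score, low=1.0, high=5.0):
    """Clamp a score to the 1 to 5 rating scale."""
    return max(low, min(high, score))


def composite_scores(pesq, llr, wss, seg_snr):
    """Combine objective scores into CSIG, CBAK, and COVL (illustrative).

    Inputs are assumed to be precomputed per utterance: PESQ, log-likelihood
    ratio (LLR), weighted spectral slope (WSS), and segmental SNR in dB.
    """
    csig = 3.093 - 1.029 * llr + 0.603 * pesq - 0.009 * wss
    cbak = 1.634 + 0.478 * pesq - 0.007 * wss + 0.063 * seg_snr
    covl = 1.594 + 0.805 * pesq - 0.512 * llr - 0.007 * wss
    return {"CSIG": _clip(csig), "CBAK": _clip(cbak), "COVL": _clip(covl)}
```

These scores are usually computed per utterance and then averaged over the test set.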