Before any of the following steps, install the dependencies listed in requirements.txt (e.g. with pip install -r requirements.txt).
The following datasets are used:
- DEMAND: noise dataset used for data augmentation. It can be downloaded here.
- HifiTTS: high-resolution, multi-speaker English dataset, used here as the baseline. It can be downloaded here.
Audio samples from both the training dataset and the noise dataset should be decoded and written to an output directory using the following command:
python src/data/preprocessing/ -i <input_directory> -o <output_directory> -sr <sample_rate>
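Both datasets go through the same preprocessing command, so the two runs can be scripted together. The sketch below assumes hypothetical raw and decoded directory names and a placeholder sample rate. None of these values come from this repository. It builds one argument list per dataset for the command above and only executes them when dry_run is False.

```python
import subprocess
import sys
from typing import List

# Hypothetical raw -> decoded directory pairs; replace with your own paths.
DATASETS = {
    "hifitts": ("data/raw/hifitts", "data/decoded/hifitts"),
    "demand": ("data/raw/demand", "data/decoded/demand"),
}
# Placeholder value. One rate is used for both datasets, assuming the noise
# is mixed into the training audio and the two rates should therefore match.
SAMPLE_RATE = 44100


def preprocessing_command(input_dir: str, output_dir: str, sample_rate: int) -> List[str]:
    """Build the preprocessing command as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "src/data/preprocessing/",
        "-i", input_dir,
        "-o", output_dir,
        "-sr", str(sample_rate),
    ]


def preprocess_all(dry_run: bool = True) -> List[List[str]]:
    """Build one preprocessing command per dataset and run them unless dry_run is set."""
    commands = [
        preprocessing_command(src, dst, SAMPLE_RATE) for src, dst in DATASETS.values()
    ]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if the script exits with an error
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    preprocess_all(dry_run=True)
```

Passing a list to subprocess.run avoids shell quoting issues with paths that contain spaces. sys.executable runs the script with the same interpreter (and virtual environment) as the helper.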
The resulting decoded audio directory paths should then be set in the configuration file, as noise_dir for the noise dataset and input_data_dirs for the training dataset(s).
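Before launching training, it can help to check that the directories set as noise_dir and input_data_dirs actually contain decoded audio. The helper below is hypothetical. It assumes the preprocessing step writes .wav files, which this README does not state, so adjust the suffixes if needed.

```python
from pathlib import Path
from typing import Dict, Iterable, Sequence


def count_audio_files(dirs: Iterable[str], suffixes: Sequence[str] = (".wav",)) -> Dict[str, int]:
    """Count audio files (searched recursively, case-insensitive suffix) in each directory.

    Raises FileNotFoundError if one of the directories does not exist.
    """
    wanted = {s.lower() for s in suffixes}
    counts: Dict[str, int] = {}
    for d in dirs:
        path = Path(d)
        if not path.is_dir():
            raise FileNotFoundError(f"Not a directory: {path}")
        counts[d] = sum(
            1 for f in path.rglob("*") if f.is_file() and f.suffix.lower() in wanted
        )
    return counts


if __name__ == "__main__":
    # Hypothetical values mirroring the config keys noise_dir and input_data_dirs.
    noise_dir = "data/decoded/demand"
    input_data_dirs = ["data/decoded/hifitts"]
    try:
        counts = count_audio_files([noise_dir, *input_data_dirs])
    except FileNotFoundError as err:
        print(f"Check the paths in the configuration file: {err}")
    else:
        for directory, n in counts.items():
            print(f"{directory}: {n} audio files")
```

A count of zero usually means the path points at the raw dataset or the wrong output directory instead of the decoded output.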
Training can be launched using the following command:
python src/train/ --config-name=hifitts +trainer.devices=<list_of_gpu_ids>
The configuration name must refer to a Hydra config (YAML file) in the configs/backbone folder; for example, --config-name=hifitts refers to configs/backbone/hifitts.yaml.
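<list_of_gpu_ids> is written as a Hydra list override such as [0,1]. Square brackets are glob characters in some shells (zsh, for example), so the override usually needs quoting when typed by hand. The sketch below builds the arguments in Python instead. The helper name is hypothetical, and it assumes trainer.devices accepts a list of GPU indices, as the placeholder suggests. shlex.join shows the correctly quoted shell form.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence


def training_command(config_name: str, gpu_ids: Sequence[int]) -> List[str]:
    """Build the Hydra training command; gpu_ids become a Hydra list such as [0,1]."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    devices = "[" + ",".join(str(int(i)) for i in gpu_ids) + "]"
    return [
        sys.executable, "src/train/",
        f"--config-name={config_name}",
        f"+trainer.devices={devices}",
    ]


def launch(cmd: List[str], dry_run: bool = True) -> None:
    """Print the shell-quoted command and run it unless dry_run is set."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    launch(training_command("hifitts", [0, 1]), dry_run=True)
```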
The following command downloads a checkpoint we trained with this repository for 200k training steps and places it in the right directory, so that the inference and app steps below work smoothly:
python src/utilities/
An inferencer class is provided in the source code and can be called from the command line as follows:
python src/inference/ \
<experiment_directory> \
<checkpoint_filename> \
<device> \
<source_audio_path> \
<target_audio_path>
For example:
python src/inference/ \
"static/runs/runs_backbone/hifitts/2023-09-29_16-22-28" \
"opt-steps=step=400000.ckpt" \
"cuda:0" \
"static/samples/vctk/p225_001.wav" \
"static/samples/vctk/p226_002.wav" \
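The example above points to a timestamped experiment directory and a checkpoint whose filename embeds the step count. The hypothetical helpers below pick the most recent run and its highest-step checkpoint. They assume run directories follow the YYYY-MM-DD_HH-MM-SS pattern seen in the example and that checkpoint filenames contain step=<N>. The checkpoint search is recursive because the exact checkpoint subfolder is not documented here.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional, Tuple

RUN_FORMAT = "%Y-%m-%d_%H-%M-%S"  # e.g. 2023-09-29_16-22-28
STEP_PATTERN = re.compile(r"step=(\d+)")  # matches "step=400000" in "opt-steps=step=400000.ckpt"


def latest_run(runs_dir: str) -> Optional[Path]:
    """Return the subdirectory whose name is the most recent timestamp, or None."""
    runs = []
    for child in Path(runs_dir).iterdir():
        if not child.is_dir():
            continue
        try:
            runs.append((datetime.strptime(child.name, RUN_FORMAT), child))
        except ValueError:
            continue  # skip directories that are not timestamped runs
    return max(runs, key=lambda r: r[0])[1] if runs else None


def best_checkpoint(experiment_dir: Path) -> Optional[str]:
    """Return the filename of the checkpoint with the highest step=<N>, or None."""
    best: Optional[Tuple[int, str]] = None
    for ckpt in experiment_dir.rglob("*.ckpt"):
        match = STEP_PATTERN.search(ckpt.name)
        if match and (best is None or int(match.group(1)) > best[0]):
            best = (int(match.group(1)), ckpt.name)
    return best[1] if best else None


if __name__ == "__main__":
    runs_root = "static/runs/runs_backbone/hifitts"
    if Path(runs_root).is_dir():
        run = latest_run(runs_root)
        if run is not None:
            print(run, best_checkpoint(run))
```

The two returned values map to the <experiment_directory> and <checkpoint_filename> arguments. Checkpoints without a step=<N> tag (for instance a "last" checkpoint) are ignored.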
The app can be launched with Streamlit as follows:
streamlit run app/ --server.port <port_number>
During training, logs can be visualized with TensorBoard using the following command:
tensorboard --logdir=static/runs/runs_backbone --bind_all --port <port_number>
Here is a screenshot of our TensorBoard at the end of a 200k-step training run launched with this repository following the guidelines above; the results of this run are presented in a later section:
Observations and key R&D results are detailed here.
Results from checkpoints trained with this repo are showcased on this Notion page.