HiFi-GAN : Fully-Convolutional Non-AR GAN vocoder

Clone of the official HiFi-GAN implementation.
Audio samples are available on the official demo page.

Pre-requisites

1. Python >= 3.6
2. Clone this repository.
3. Install the Python requirements listed in requirements.txt.
4. Download and extract the LJ Speech dataset, then move all wav files to LJSpeech-1.1/wavs.
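Before starting a run, it can help to confirm that the dataset is laid out as the last step describes. The snippet below is a minimal sketch that is not part of this repository; the function name `count_ljspeech_wavs` is hypothetical, and the only thing it assumes is the `LJSpeech-1.1/wavs` layout from the list above.

```python
from pathlib import Path


def count_ljspeech_wavs(root: str = "LJSpeech-1.1") -> int:
    """Count the .wav files directly inside <root>/wavs.

    Hypothetical helper (not in this repo). Returns 0 when the wavs
    directory does not exist, so 0 means "dataset not prepared yet".
    """
    wav_dir = Path(root) / "wavs"
    if not wav_dir.is_dir():
        return 0
    # Only top-level files with a .wav suffix are counted; subfolders are ignored.
    return sum(
        1 for p in wav_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
```

Run from the repository root, `count_ljspeech_wavs()` should report 13,100 for the full LJ Speech release; 0 means the wav files were not moved into `LJSpeech-1.1/wavs`.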

Training

```
python train.py --config config_v1.json
```

To train the V2 or V3 generator, replace config_v1.json with config_v2.json or config_v3.json.
Checkpoints and a copy of the configuration file are saved in the cp_hifigan directory by default.
You can change the path by adding the --checkpoint_path option.

[Figure: validation loss during training with the V1 generator (validation_loss.png)]
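If you launch training from a script, the command can be assembled from the two documented flags, `--config` and `--checkpoint_path`. This is a minimal sketch, not code from this repository; `build_train_command` is a hypothetical helper, and it uses no options beyond those described in this section.

```python
import shlex
from typing import List, Optional


def build_train_command(version: int = 1,
                        checkpoint_path: Optional[str] = None) -> List[str]:
    """Assemble the train.py call described in the Training section.

    Hypothetical helper (not in this repo). Uses only --config and,
    optionally, --checkpoint_path.
    """
    if version not in (1, 2, 3):
        raise ValueError("version must be 1, 2 or 3 (config_v1/v2/v3.json)")
    cmd = ["python", "train.py", "--config", f"config_v{version}.json"]
    if checkpoint_path is not None:
        # Without this flag, checkpoints go to cp_hifigan by default.
        cmd += ["--checkpoint_path", checkpoint_path]
    return cmd


# prints: python train.py --config config_v2.json --checkpoint_path cp_hifigan_v2
print(" ".join(shlex.quote(arg) for arg in build_train_command(2, "cp_hifigan_v2")))
```

Keeping the command as a list means it can be passed straight to `subprocess.run(cmd, check=True)` without shell quoting, which matters if a checkpoint path contains spaces.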

Pretrained Model

You can also use the pretrained models we provide.
Download pretrained models
Details of each folder are as follows:

Folder Name	Generator	Dataset	Fine-Tuned
LJ_V1	V1	LJSpeech	No
LJ_V2	V2	LJSpeech	No
LJ_V3	V3	LJSpeech	No
LJ_FT_T2_V1	V1	LJSpeech	Yes (Tacotron2)
LJ_FT_T2_V2	V2	LJSpeech	Yes (Tacotron2)
LJ_FT_T2_V3	V3	LJSpeech	Yes (Tacotron2)
VCTK_V1	V1	VCTK	No
VCTK_V2	V2	VCTK	No
VCTK_V3	V3	VCTK	No
UNIVERSAL_V1	V1	Universal	No

We provide the universal model with discriminator weights that can be used as a base for transfer learning to other datasets.
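When choosing a checkpoint, the table above can be treated as a small lookup. The sketch below transcribes the table rows into a dictionary and filters them; `Pretrained`, `PRETRAINED`, and `find_pretrained` are hypothetical names, and the code does not download anything.

```python
from typing import Dict, List, NamedTuple, Optional


class Pretrained(NamedTuple):
    generator: str              # "V1", "V2" or "V3"
    dataset: str                # "LJSpeech", "VCTK" or "Universal"
    fine_tuned: Optional[str]   # e.g. "Tacotron2", or None if not fine-tuned


# Hypothetical transcription of the pretrained-model table above.
PRETRAINED: Dict[str, Pretrained] = {
    "LJ_V1": Pretrained("V1", "LJSpeech", None),
    "LJ_V2": Pretrained("V2", "LJSpeech", None),
    "LJ_V3": Pretrained("V3", "LJSpeech", None),
    "LJ_FT_T2_V1": Pretrained("V1", "LJSpeech", "Tacotron2"),
    "LJ_FT_T2_V2": Pretrained("V2", "LJSpeech", "Tacotron2"),
    "LJ_FT_T2_V3": Pretrained("V3", "LJSpeech", "Tacotron2"),
    "VCTK_V1": Pretrained("V1", "VCTK", None),
    "VCTK_V2": Pretrained("V2", "VCTK", None),
    "VCTK_V3": Pretrained("V3", "VCTK", None),
    "UNIVERSAL_V1": Pretrained("V1", "Universal", None),
}


def find_pretrained(generator: Optional[str] = None,
                    dataset: Optional[str] = None,
                    fine_tuned: Optional[bool] = None) -> List[str]:
    """Return folder names matching every criterion that is not None."""
    matches = []
    for folder, info in PRETRAINED.items():
        if generator is not None and info.generator != generator:
            continue
        if dataset is not None and info.dataset != dataset:
            continue
        # fine_tuned=True keeps fine-tuned folders, False keeps the others.
        if fine_tuned is not None and (info.fine_tuned is not None) != fine_tuned:
            continue
        matches.append(folder)
    return matches
```

For example, if your mel-spectrograms come from Tacotron2 trained on LJ Speech, `find_pretrained(dataset="LJSpeech", fine_tuned=True)` narrows the choice to the LJ_FT_T2_* folders.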

Fine-Tuning

Generate mel-spectrograms in numpy format using Tacotron2 with teacher-forcing.
The file name of each generated mel-spectrogram should match its audio file, and the extension should be .npy.
Example:
```
Audio File : LJ001-0001.wav
Mel-Spectrogram File : LJ001-0001.npy
```
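The naming rule can be checked before training by comparing file stems. This is a minimal sketch that only enforces the "same name, .npy extension" rule stated above; `mel_name_for` and `unmatched` are hypothetical helpers, and generating the mel-spectrograms with Tacotron2 is outside its scope.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple


def mel_name_for(audio_name: str) -> str:
    """Hypothetical helper: LJ001-0001.wav -> LJ001-0001.npy (same stem)."""
    return PurePath(audio_name).stem + ".npy"


def unmatched(audio_names: Iterable[str],
              mel_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper returning (wavs without a mel, mels without a wav)."""
    audio = {PurePath(n).stem: n for n in audio_names if n.endswith(".wav")}
    mels = {PurePath(n).stem: n for n in mel_names if n.endswith(".npy")}
    missing_mels = sorted(audio[s] for s in audio.keys() - mels.keys())
    orphan_mels = sorted(mels[s] for s in mels.keys() - audio.keys())
    return missing_mels, orphan_mels
```

In practice you would pass `os.listdir("LJSpeech-1.1/wavs")` and the listing of your mel output directory; two empty lists mean every clip has a correctly named mel-spectrogram.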
Create an ft_dataset folder and copy the generated mel-spectrogram files into it.
Run the following command.
```
python train.py --fine_tuning True --config config_v1.json
```
For other command line options, please refer to the training section.
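The folder-preparation step can also be scripted. This is a minimal sketch, assuming your Tacotron2 mel-spectrograms were written to some source directory of your choosing (`src_dir` is a placeholder); `prepare_ft_dataset` is a hypothetical helper. It copies rather than moves, so the originals stay where they are.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_ft_dataset(src_dir: str, dst_dir: str = "ft_dataset") -> List[str]:
    """Hypothetical helper: copy every .npy file from src_dir into dst_dir.

    dst_dir is created if needed (relative paths resolve against the
    current working directory). Files with the same name in dst_dir are
    overwritten. Returns the sorted list of copied file names.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for mel in sorted(Path(src_dir).glob("*.npy")):
        shutil.copy2(mel, dst / mel.name)  # copy2 also preserves timestamps
        copied.append(mel.name)
    return copied
```

After copying, run the fine-tuning command from the same directory that contains ft_dataset.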

Inference from wav file

Create a test_files directory and copy the wav files into it.

Run the following command.

```
python inference.py --checkpoint_file [generator checkpoint file path]
```

Generated wav files are saved in generated_files by default.
You can change the path by adding the --output_dir option.
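Both inference scripts take the same two documented options, so a single builder can cover them. This is a minimal sketch, not part of the repository; `build_inference_command` and the checkpoint path in the usage line are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_inference_command(checkpoint_file: str,
                            output_dir: Optional[str] = None,
                            from_mel: bool = False) -> List[str]:
    """Hypothetical helper assembling an inference.py / inference_e2e.py call.

    from_mel=False -> inference.py (wav input, copied into test_files)
    from_mel=True  -> inference_e2e.py (.npy input, copied into test_mel_files)
    Only --checkpoint_file and --output_dir are used.
    """
    script = "inference_e2e.py" if from_mel else "inference.py"
    cmd = ["python", script, "--checkpoint_file", checkpoint_file]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    return cmd


# prints: python inference.py --checkpoint_file path/to/generator_checkpoint --output_dir my_outputs
print(" ".join(shlex.quote(a) for a in
               build_inference_command("path/to/generator_checkpoint", "my_outputs")))
```

As with training, the returned list can be handed to `subprocess.run(cmd, check=True)` from the repository root.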

Inference for end-to-end speech synthesis

Create a test_mel_files directory and copy the generated mel-spectrogram files into it.
You can generate mel-spectrograms using Tacotron2, Glow-TTS, or similar models.

Run the following command.

```
python inference_e2e.py --checkpoint_file [generator checkpoint file path]
```

Generated wav files are saved in generated_files_from_mel by default.
You can change the path by adding the --output_dir option.
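Mel-spectrograms from a different front end can fail or sound wrong if their shape does not match what the vocoder was configured for. The sketch below is a pre-flight check under stated assumptions: it assumes the config JSON has a top-level `num_mels` key, and it accepts either `(num_mels, frames)` or `(1, num_mels, frames)` because the layout inference_e2e.py expects is not stated here. `load_num_mels`, `mel_problem`, and `check_mel_dir` are hypothetical helpers. It only checks shape and finiteness; it cannot detect mismatched analysis settings such as hop size or normalization.

```python
import json
from pathlib import Path
from typing import List, Optional

import numpy as np


def load_num_mels(config_path: str = "config_v1.json") -> int:
    """Read the mel-bin count; assumes a top-level "num_mels" key (adjust if needed)."""
    with open(config_path) as f:
        return int(json.load(f)["num_mels"])


def mel_problem(mel: np.ndarray, num_mels: int) -> Optional[str]:
    """Hypothetical check: describe what is wrong with `mel`, or return None."""
    if mel.ndim == 3 and mel.shape[0] == 1:
        mel = mel[0]  # drop a leading batch axis of size 1
    if mel.ndim != 2:
        return f"expected 2-D (mels, frames), got shape {mel.shape}"
    if mel.shape[0] != num_mels:
        return f"expected {num_mels} mel bins, got {mel.shape[0]}"
    if mel.shape[1] == 0:
        return "no frames"
    if not np.isfinite(mel).all():
        return "contains NaN or inf"
    return None


def check_mel_dir(mel_dir: str, num_mels: int) -> List[str]:
    """List '<file>: <problem>' for every .npy file in mel_dir that fails the check."""
    report = []
    for path in sorted(Path(mel_dir).glob("*.npy")):
        problem = mel_problem(np.load(path), num_mels)
        if problem is not None:
            report.append(f"{path.name}: {problem}")
    return report
```

A typical call would be `check_mel_dir("test_mel_files", load_num_mels("config_v1.json"))`; an empty list means every file passed.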

Acknowledgements

We referred to WaveGlow, MelGAN and Tacotron2 to implement this.

Repository: tarepan/HiFiGAN-official