AdaSpeech: Adaptive Text to Speech for Custom Voice [WIP]

Unofficial PyTorch implementation of AdaSpeech.

Note:

I am not considering the multi-speaker use case; my focus is on a single speaker only.
I will use only the utterance-level encoder and the phoneme-level encoder, not conditional layer normalization (which is the soul of the AdaSpeech paper). This definitely restricts the adaptive nature of AdaSpeech, but my focus is on improving FastSpeech 2 acoustic generalization rather than adaptation.

Citations

@misc{chen2021adaspeech,
      title={AdaSpeech: Adaptive Text to Speech for Custom Voice}, 
      author={Mingjian Chen and Xu Tan and Bohan Li and Yanqing Liu and Tao Qin and Sheng Zhao and Tie-Yan Liu},
      year={2021},
      eprint={2103.00993},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Requirements :

All code is written in Python 3.6.2.

Install PyTorch

Before installing PyTorch, check your CUDA version by running the following command: nvcc --version

pip install torch torchvision

This repo uses PyTorch 1.6.0 for the torch.bucketize function, which is not available in earlier versions of PyTorch.
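One way to confirm that an installed version meets this minimum is to compare version strings. The sketch below is a hypothetical helper, not part of this repo: it parses a PyTorch-style version string such as `1.6.0+cu101` and compares it with the 1.6.0 minimum stated above. In practice you would pass it `torch.__version__`.

```python
import re

# Minimum version taken from the note above (first release with torch.bucketize).
MIN_VERSION = (1, 6, 0)


def parse_version(version):
    """Turn a string like '1.6.0+cu101' or '1.7.0a0' into a tuple such as (1, 6, 0)."""
    release = version.split("+")[0]  # drop local build tags like '+cu101'
    parts = []
    for piece in release.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad short versions such as '1.6'
        parts.append(0)
    return tuple(parts)


def has_bucketize(version):
    """Return True if the given version string is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION
```

Tuples are compared element by element, so `1.10.0` is correctly treated as newer than `1.6.0`, which a plain string comparison would get wrong.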

Installing other requirements :

pip install -r requirements.txt

To use TensorBoard, install tensorboard version 1.14.0 separately, along with a supported TensorFlow version (1.14.0).

For Preprocessing :

The filelists folder contains LJSpeech dataset files processed with MFA (Montreal Forced Aligner), so you don't need to align text with audio (to extract durations) for the LJSpeech dataset. For other datasets, follow the instructions here. For the rest of the preprocessing, run the following command:

python nvidia_preprocessing.py -d path_of_wavs

To find the min and max of F0 and energy:

python compute_statistics.py
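The internals of compute_statistics.py are not shown here, so the following is only a minimal sketch of the kind of computation involved. It assumes each utterance's F0 or energy contour is available as a list of per-frame floats. The function name `running_min_max` is hypothetical.

```python
def running_min_max(utterances):
    """Return the global (min, max) over all frames of all utterances.

    `utterances` is an iterable of per-frame value lists, for example one
    F0 contour per audio file (an assumed layout, not the repo's format).
    """
    lo, hi = float("inf"), float("-inf")
    seen_any = False
    for frames in utterances:
        for value in frames:
            seen_any = True
            lo = min(lo, value)
            hi = max(hi, value)
    if not seen_any:
        raise ValueError("no frames found; cannot compute min and max")
    return lo, hi
```

Keeping a running minimum and maximum avoids holding the whole dataset in memory. Whether unvoiced frames (often stored as F0 = 0) should be excluded is a design choice this sketch does not make.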

Update the following values in hparams.py with the min and max of F0 and energy:

p_min = Min F0/pitch
p_max = Max F0
e_min = Min energy
e_max = Max energy
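To illustrate why these bounds matter, here is a sketch of how a min/max pair can define quantization bins for pitch or energy, in the style of FastSpeech 2. It is pure Python and uses `bisect_left` as a stand-in for `torch.bucketize` with its default `right=False`, which, as far as I understand, follows the same boundary rule. The function names and the evenly spaced bin layout are assumptions, not this repo's code.

```python
from bisect import bisect_left


def linear_boundaries(v_min, v_max, n_bins):
    """Return n_bins - 1 evenly spaced boundaries from v_min to v_max inclusive."""
    if n_bins < 3:
        raise ValueError("n_bins must be at least 3")
    step = (v_max - v_min) / (n_bins - 2)
    return [v_min + i * step for i in range(n_bins - 1)]


def quantize(values, boundaries):
    """Map each value to a bin index in the range 0 .. len(boundaries).

    A value v gets index i such that boundaries[i-1] < v <= boundaries[i];
    values above the last boundary get index len(boundaries).
    """
    return [bisect_left(boundaries, v) for v in values]


# Example with made-up bounds: p_min = 0.0, p_max = 300.0, 5 bins.
boundaries = linear_boundaries(0.0, 300.0, 5)
bin_ids = quantize([-10.0, 50.0, 100.0, 150.0, 300.0, 400.0], boundaries)
```

Values outside the configured range fall into the first or last bin, so bounds that are too narrow collapse many frames into the edge bins.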

For training

 python train_fastspeech.py --outdir etc -c configs/default.yaml -n "name"
