Based on BERT, NaturalSpeech, and VITS:
1. Hidden prosody embedding from BERT
2. Inference loss from NaturalSpeech
3. Framework of VITS
pip install -r requirements.txt
cd monotonic_align
python setup.py build_ext --inplace
Download the model files from BaiduYun: https://pan.baidu.com/s/1Cj4MnwFyZ0XZmTR6EpygbQ?pwd=yn60
Or download them from the release page.
Put prosody_model.pt at ./bert/prosody_model.pt
Put vits_bert.pth at ./vits_bert.pth
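Before running inference, it can help to confirm that both model files are where the steps above place them. The following is a minimal sketch, assuming it is run from the repository root; the helper name `missing_checkpoints` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Paths taken from the placement steps above, relative to the repository root.
EXPECTED_FILES = ["./bert/prosody_model.pt", "./vits_bert.pth"]


def missing_checkpoints(root=".", expected=EXPECTED_FILES):
    """Return the expected model paths that do not exist under root.

    Hypothetical helper for illustration only; not part of the repository.
    """
    root_path = Path(root)
    return [p for p in expected if not (root_path / p).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files found.")
```

An empty result means both files are in place and `python vits_infer.py` can be tried.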
python vits_infer.py
The inferred waveforms are written to ./vits_infer_out
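To inspect the results, the output directory can be scanned with the Python standard library. This is a rough sketch under two assumptions not stated above: that the outputs are files ending in `.wav`, and that they are PCM-encoded so the `wave` module can read them. The function name `wave_durations` is hypothetical.

```python
import wave
from pathlib import Path


def wave_durations(out_dir="./vits_infer_out"):
    """Map each .wav file name in out_dir to its duration in seconds.

    Assumes PCM-encoded .wav files; other encodings make wave.open raise
    wave.Error. Hypothetical helper, not part of the repository.
    """
    durations = {}
    for path in sorted(Path(out_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            # Duration = number of frames / frames per second.
            durations[path.name] = wf.getnframes() / wf.getframerate()
    return durations


if __name__ == "__main__":
    for name, seconds in wave_durations().items():
        print(f"{name}: {seconds:.2f} s")
```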
going