This is a TensorFlow + PyTorch implementation, adapted from the Real-Time-Voice-Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.
- Python 3.8
- Install PyTorch (>=1.0.1).
- Install the NVIDIA version of TensorFlow 1.15.
- Install ffmpeg.
- Install Kaldi.
- Install PyKaldi.
- Run `pip install -r requirements.txt` to install the remaining necessary packages (a quick environment sanity check is sketched below).
- Download the pretrained TDNN-F model, extract it, and set `PRETRAIN_ROOT` in `kaldi_scripts/extract_features_kaldi.sh` to the pretrained model directory.
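Before moving on to the pretrained models, it can save debugging time to confirm that the mixed TensorFlow 1.15 / PyTorch environment actually imports. This is a minimal sanity check, not part of the repo; it only assumes the packages listed above are installed:

```
python -c "import tensorflow as tf; print('TensorFlow', tf.__version__)"  # expect 1.15.x
python -c "import torch; print('PyTorch', torch.__version__)"             # expect >= 1.0.1
python -c "import kaldi; print('PyKaldi OK')"                             # PyKaldi wraps your Kaldi build
ffmpeg -version | head -n 1                                               # confirms ffmpeg is on PATH
```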
- Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here. You also need to set `KALDI_ROOT` and `PRETRAIN_ROOT` in `kaldi_scripts/extract_features_kaldi.sh` accordingly (see the example after this list).
- Speaker Encoder: LibriSpeech; see here for the detailed training process.
- Synthesizer (i.e., the seq2seq model): ARCTIC and L2-ARCTIC. Please see here for a merged version.
- Vocoder: LibriSpeech; see here for the detailed training process.
All the pretrained models are available here.
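For reference, after these steps the two variables in `kaldi_scripts/extract_features_kaldi.sh` would end up looking something like the sketch below; both paths are placeholders for your own install locations, not values shipped with the repo:

```
# Example values only -- point these at your Kaldi checkout and the
# directory where you extracted the pretrained TDNN-F model.
KALDI_ROOT=/path/to/kaldi
PRETRAIN_ROOT=/path/to/pretrained-tdnnf-model
```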
- Use Kaldi to extract bottleneck features (BNFs) for the reference L1 speaker:
```
./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/BDL
```
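The same script can be run once per speaker directory if you need BNFs for several reference speakers. A small sketch, assuming additional ARCTIC speaker directories live under the same root (the speaker list here is only an example):

```
# Hypothetical batch run -- adjust the speaker list to your data.
for spk in BDL CLB RMS SLT; do
    ./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/"$spk"
done
```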
- Preprocessing:
```
python synthesizer_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python synthesizer_preprocess_embeds.py your_preprocess_output_dir
```
- Training:
```
python synthesizer_train.py Accetron_train your_preprocess_output_dir
```
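Putting it all together, an end-to-end run for a single reference speaker is just the four commands above in sequence; `set -e` aborts the pipeline if an earlier stage fails:

```
set -e
./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/BDL
python synthesizer_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python synthesizer_preprocess_embeds.py your_preprocess_output_dir
python synthesizer_train.py Accetron_train your_preprocess_output_dir
```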