Stars
An unofficial PyTorch implementation of the audio LM VALL-E
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
PyTorch implementation of the paper MultiSpeech: Multi-Speaker Text to Speech with Transformer
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
A VITS-based TTS model that controls the emotion of the output speech through natural language and the speaker identity through reference audio.
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
ouor / vits
Forked from CjangCjengh/vits: VITS implementation of Japanese, Chinese, Korean, Sanskrit and Thai
Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform with Multilingual Cleaners
PyTorch Implementation of Non-autoregressive Expressive (emotional, conversational) TTS based on FastSpeech2, supporting English, Korean, and your own languages.
Multi-speaker Speech Synthesis Using VITS (KO, JA, EN, ZH)
Implementation of Korean FastSpeech2
The official implementation of EmoSphere-TTS
The Official Implementation of “Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis”
A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian, and Tibetan (so far).