Audio/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
🔊 Text-Prompted Generative Audio Model
Implementation of NaturalSpeech 2, a zero-shot speech and singing synthesizer, in PyTorch
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Foundational Models for State-of-the-Art Speech and Text Translation
VITS2 backbone with multilingual BERT
The official implementation of HierSpeech++
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
As little as 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)
Foundational model for human-like, expressive TTS
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Inference and training library for high-quality TTS models.
A generative speech model for daily dialogue.
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Instant voice cloning by MIT and MyShell. Audio foundation model.
TTS models for Arabic (Tacotron2, FastPitch)
Deep learning for Arabic (AR) text vocalization: automatic diacritization of Arabic text
A book about Text-to-Speech (TTS) in Chinese.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"