Stars
Just 1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
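🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real-time
Use ChatGPT to summarize arXiv papers. Accelerate the full research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses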
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Manipulate audio with a simple and easy high level interface
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Production First and Production Ready End-to-End Speech Recognition Toolkit
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles and it does not require an API key nor a headles…
An unofficial PyTorch implementation of the audio LM VALL-E
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
A Python package to analyze and compare voices with deep learning
DeepMind's Tacotron-2 TensorFlow implementation
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
Command line utility for forced alignment using Kaldi
🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
In defence of metric learning for speaker recognition
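An implementation of FastSpeech based on PyTorch.
End-to-end Chinese speech recognition built on PaddlePaddle, from beginner to hands-on practice, with very simple starter examples and practical enterprise projects. Supports the currently most popular DeepSpeech2, Conformer, and Squeezeformer models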
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)
Chinese text normalization for speech processing