Stars
Robust Speech Recognition via Large-Scale Weak Supervision
GUI for a Vocal Remover that uses Deep Neural Networks.
The official Python API for ElevenLabs Text to Speech.
🔊 Text-Prompted Generative Audio Model
Easily train a good VC model with voice data <= 10 mins!
Realtime Voice Changer
Official Implementation of FreeDrag (CVPR 2024)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Singing Voice Conversion via diffusion model
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
The official gpt4free repository | a varied collection of powerful language models
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
SoftVC VITS Singing Voice Conversion
Official PyTorch Code and Models of "RePaint: Inpainting using Denoising Diffusion Probabilistic Models", CVPR 2022
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Voice Conversion Based on Learnable Similarity-Guided Masked Autoencoder
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion (Interspeech 2022)
To provide the stego community with C/C++ implementations of selected feature extractors mainly targeted at H.264 steganography.
A Deep-Learning-Based Chinese Speech Recognition System
A PyTorch Toy Implementation of 'Dynamic Region-Aware Convolution (ECCV2020)'
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
🚀 AI Voice Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
Calculation of MCD (dB) between two speech waveforms
Any-to-any voice conversion using synthetic specific-speaker speeches as intermedium features
A fast Text-to-Speech (TTS) model. Works well for English, Mandarin/Chinese, Japanese, Korean, Russian and Tibetan (tested so far).