Stars
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.
Retrieval and Retrieval-augmented LLMs
A VITS fine-tuning pipeline for fast speaker-adaptation TTS and many-to-many voice conversion
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
An unofficial PyTorch implementation of the audio LM VALL-E
Self-Supervised Speech Pre-training and Representation Learning Toolkit
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
CCKS Baidu entity linking: first-place solution
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech
VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs
A Non-Autoregressive Transformer-based Text-to-Speech, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, …
Official implementation of Meta-StyleSpeech and StyleSpeech
[WIP] VoiceSmith makes training text-to-speech models easy.
PyTorch Implementation of Meta-StyleSpeech: Multi-Speaker Adaptive Text-to-Speech Generation
PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Official repository of https://doi.org/10.1109/TASLP.2022.3167258. More up-to-date code is in "refactor" branch.
Singing Voice Synthesis based on VITS, different from VISinger
Official implementation of "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder" (AAAI 2023)
A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ulti…