Stars
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
Dataset for lightly supervised training using the LibriVox audiobook recordings: https://librivox.org/.
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
A PyTorch implementation of Finite Scalar Quantization
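The FSQ idea is compact enough to sketch inline: bound each latent dimension, then round it to one of a small number of fixed levels. This dependency-free toy function (names and shapes are illustrative, not the repo's API) shows the per-dimension quantization:

```python
import math

def fsq_quantize(z, levels):
    """Finite Scalar Quantization, per dimension (illustrative sketch).

    Each dimension i is squashed into a bounded range with tanh and then
    rounded to one of levels[i] evenly spaced values in [-1, 1].
    Assumes every entry of `levels` is >= 2.
    """
    out = []
    for v, L in zip(z, levels):
        half = (L - 1) / 2
        bounded = math.tanh(v) * half      # value now lies in (-half, half)
        out.append(round(bounded) / half)  # snap to the nearest of L levels
    return out
```

With levels like `[8, 5, 5, 5]`, the implicit codebook has 8 * 5 * 5 * 5 = 1000 entries, and unlike learned-codebook VQ there is no codebook to collapse.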
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation.
Speech, Language, Audio, and Music Processing with Large Language Models
An unofficial PyTorch implementation of the audio LM VALL-E
Multilingual large voice-generation model, providing full-stack inference, training, and deployment capabilities.
Multilingual Voice Understanding Model
A toolkit to calculate speech audio quality. Not affiliated with the original authors.
Implementation / replication of DALL-E, OpenAI's Text-to-Image Transformer, in PyTorch
Vector (and Scalar) Quantization, in PyTorch
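At inference time, the core vector-quantization step is just a nearest-codebook lookup. A minimal dependency-free sketch (function and variable names are illustrative, not this library's API):

```python
def vq_lookup(x, codebook):
    """Return (index, code) of the codebook entry nearest to x in L2 distance.

    x: a vector as a list of floats; codebook: a list of such vectors.
    """
    def dist2(a, b):
        # squared Euclidean distance (monotonic in L2, so argmin is the same)
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    idx = min(range(len(codebook)), key=lambda i: dist2(x, codebook[i]))
    return idx, codebook[idx]
```

During training, libraries like this one additionally pass gradients through the non-differentiable argmin (e.g. with a straight-through estimator) and update the codebook, but the lookup itself is the operation above.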
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🔊 Text-Prompted Generative Audio Model
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
A generative speech model for daily dialogue.
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
🐍 mecab-python. The original version can be found here: http://taku910.github.io/mecab/
Systems submitted to IWSLT 2021 by the MT-UPC group.
Faster Whisper transcription with CTranslate2
Whisper command-line client, compatible with the original OpenAI client, based on CTranslate2.