Stars
Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"
First base model for full-duplex conversational audio
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
Codebase for Aria - an Open Multimodal Native MoE
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Entropy Based Sampling and Parallel CoT Decoding
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
An Open-Sourced LLM-empowered Foundation TTS System
LlamaVoice is a llama-based large voice generation model, providing inference and training ability.
My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
All generative models in one for a better TTS model
DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability
Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer
Training code for FAcodec presented in NaturalSpeech3
SpeechFlow neural network implementation
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Lumina-T2X is a unified framework for Text to Any Modality Generation