Stars
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Multilingual Voice Understanding Model
MARS5 speech model (TTS) from CAMB.AI
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Open source real-time translation app for Android that runs locally
Foundational model for human-like, expressive TTS
A generative speech model for daily dialogue.
llama3 implementation one matrix multiplication at a time
Inference and training library for high-quality TTS models.
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Official repo for WavCraft, an AI agent for audio creation and editing
Awesome speech/audio LLMs, representation learning, and codec models
Generate high-definition short videos with one click using large AI models.
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
A lightweight library for Fréchet Audio Distance calculation.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
My ComfyUI workflows collection
Open-Sora: Democratizing Efficient Video Production for All
AI powered speech denoising and enhancement
VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open source community will contribute to it.