Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python 88 2 Updated Jan 7, 2025

Versatile Evaluation of Speech and Audio

Python 140 11 Updated Dec 31, 2024

A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.

217 5 Updated Dec 3, 2024

A Survey of Spoken Dialogue Models (60 pages)

241 14 Updated Nov 28, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python 56 3 Updated Nov 9, 2024

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python 246 15 Updated Jan 2, 2025
Python 211 25 Updated Dec 14, 2024

Multilingual Voice Understanding Model

Python 3,955 350 Updated Jan 8, 2025

MARS5 speech model (TTS) from CAMB.AI

Jupyter Notebook 2,586 213 Updated Aug 1, 2024

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 6,741 659 Updated Dec 26, 2024

Open source real-time translation app for Android that runs locally

C++ 7,014 534 Updated Nov 23, 2024

Foundational model for human-like, expressive TTS

Python 3,964 668 Updated Jul 30, 2024

A generative speech model for daily dialogue.

Python 33,485 3,635 Updated Jan 7, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 13,994 1,138 Updated May 23, 2024

Inference and training library for high-quality TTS models.

Python 4,877 504 Updated Dec 10, 2024

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

Python 323 21 Updated Sep 3, 2024

Official repo for WavCraft, an AI agent for audio creation and editing

Python 655 96 Updated Sep 13, 2024

Awesome speech/audio LLMs, representation learning, and codec models

808 51 Updated Jan 6, 2025

Generate high-definition short videos with one click using large AI models.

Python 19,319 2,953 Updated Dec 12, 2024

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

Python 4,765 591 Updated Jul 2, 2024

A lightweight library for Frechet Audio Distance calculation.

Python 242 24 Updated Sep 4, 2024

Trying to reproduce Suno v3

25 1 Updated Mar 24, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,000 773 Updated Jun 24, 2024

SOTA Open Source TTS

Python 18,188 1,360 Updated Jan 4, 2025

My ComfyUI workflows collection

5,573 528 Updated Dec 20, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 23,010 2,264 Updated Dec 27, 2024

AI powered speech denoising and enhancement

Python 1,568 169 Updated Dec 3, 2024

VoicePAT is a modular and efficient toolkit for voice privacy research, with main focus on speaker anonymization.

Shell 47 4 Updated May 14, 2024

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 66,643 8,093 Updated Dec 26, 2024

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Python 11,854 1,040 Updated Dec 31, 2024