jd3655


Code for the paper "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"

Python · 26 stars · 3 forks · Updated Dec 2, 2024

First base model for full-duplex conversational audio

Python · 1,649 stars · 106 forks · Updated Nov 12, 2024

Interface for OuteTTS models.

Python · 762 stars · 59 forks · Updated Dec 14, 2024

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

Python · 40 stars · 2 forks · Updated Nov 1, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python · 2,449 stars · 198 forks · Updated Dec 5, 2024

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python · 1,673 stars · 199 forks · Updated Nov 6, 2024

An easy-to-understand framework for LLM samplers that rewind and revise generated tokens

Python · 114 stars · 8 forks · Updated Oct 29, 2024

Codebase for Aria: an Open Multimodal Native MoE

Jupyter Notebook · 901 stars · 74 forks · Updated Dec 12, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python · 7,981 stars · 1,009 forks · Updated Dec 14, 2024

Entropy Based Sampling and Parallel CoT Decoding

Python · 3,165 stars · 319 forks · Updated Nov 13, 2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

162 stars · 10 forks · Updated Sep 27, 2024

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python · 102 stars · 10 forks · Updated Nov 1, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Python · 49 stars · 3 forks · Updated Nov 9, 2024

An Open-Sourced LLM-empowered Foundation TTS System

Python · 492 stars · 35 forks · Updated Oct 17, 2024

LlamaVoice is a llama-based large voice generation model, providing inference and training ability.

Python · 224 stars · 12 forks · Updated Aug 26, 2024

My hybrid TTS network that combines VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one

Jupyter Notebook · 27 stars · 2 forks · Updated Aug 5, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch

Python · 387 stars · 36 forks · Updated Dec 3, 2024

All generative models in one for a better TTS model

Python · 65 stars · 8 forks · Updated Sep 8, 2024

DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability

Python · 95 stars · 8 forks · Updated Nov 1, 2024

HTML · 25 stars · 1 fork · Updated Aug 2, 2024

VALL-E 2 reproduction

Jupyter Notebook · 102 stars · 14 forks · Updated Jul 14, 2024

Official Demo Page for DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

HTML · 31 stars · 1 fork · Updated Aug 21, 2024

A fast multimodal LLM for real-time voice

Python · 1,587 stars · 107 forks · Updated Dec 12, 2024

Training code for FAcodec presented in NaturalSpeech3

Python · 183 stars · 18 forks · Updated Aug 26, 2024

SpeechFlow neural network implementation

Jupyter Notebook · 18 stars · Updated Aug 8, 2024

Supervoice diffusion enhance

Jupyter Notebook · 25 stars · Updated Jul 15, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Python · 101 stars · 5 forks · Updated Sep 20, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python · 2,108 stars · 88 forks · Updated Aug 6, 2024