Alpaca0904

Follow

Alpaca0904

Follow

6 followers · 7 following

Stars

jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python 917 51 Updated Dec 9, 2024

QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 11,026 676 Updated Dec 4, 2024

facebookresearch / libri-light

dataset for lightly supervised training using the librivox audio book recordings. https://librivox.org/.

Python 482 78 Updated Jul 11, 2023

gpt-omni / mini-omni

open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,209 286 Updated Nov 5, 2024

duchenzhuang / FSQ-pytorch

A Pytorch Implementation of Finite Scalar Quantization

Python 97 4 Updated Nov 29, 2023

jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python 6,972 1,271 Updated Dec 6, 2023

open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,938 597 Updated Nov 30, 2024

X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Python 617 56 Updated Dec 19, 2024

enhuiz / vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

Python 2,972 419 Updated May 10, 2023

FunAudioLLM / CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 8,417 818 Updated Dec 18, 2024

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 3,758 334 Updated Nov 29, 2024

yangdongchao / UniAudio

The Open Source Code of UniAudio

Python 533 32 Updated Jul 22, 2024

fishaudio / fish-speech

SOTA Open Source TTS

Python 17,402 1,303 Updated Dec 20, 2024

fakerybakery / utmos

A toolkit to calculate speech audio quality. Not affiliated with the original authors

Python 44 4 Updated Aug 13, 2024

lucidrains / DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python 5,591 642 Updated Feb 17, 2024

XiaoyuBIE1994 / DVAE

Official implementation of Dynamical VAEs

Python 216 38 Updated Apr 5, 2023

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in Pytorch

Python 2,742 223 Updated Dec 3, 2024

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 36,225 4,442 Updated Aug 16, 2024

suno-ai / bark

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 36,453 4,287 Updated Aug 19, 2024

Plachtaa / VALL-E-X

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

Python 7,731 768 Updated Feb 11, 2024

gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 843 97 Updated Aug 7, 2024

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 33,074 3,590 Updated Dec 3, 2024

RVC-Boss / GPT-SoVITS

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 37,140 4,230 Updated Dec 19, 2024

jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Python 668 151 Updated Jul 12, 2022

mt-upc / SHAS

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Python 37 4 Updated Feb 9, 2023

SamuraiT / mecab-python3

🐍 mecab-python. you can find original version here:http://taku910.github.io/mecab/

C++ 543 52 Updated Nov 2, 2024

mt-upc / iwslt-2021

Systems submitted to IWSLT 2021 by the MT-UPC group.

Python 14 4 Updated Feb 23, 2023

SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Python 13,048 1,092 Updated Dec 12, 2024

Softcatala / whisper-ctranslate2

Whisper command line client compatible with original OpenAI client based on CTranslate2.

Python 940 83 Updated Dec 19, 2024

dropreg / R-Drop

Python 870 107 Updated May 24, 2024