Alpaca0904

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python · 917 stars · 51 forks · Updated Dec 9, 2024

Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.

Shell · 11,026 stars · 676 forks · Updated Dec 4, 2024

Dataset for lightly supervised training using the LibriVox audiobook recordings (https://librivox.org/).

Python · 482 stars · 78 forks · Updated Jul 11, 2023

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python · 3,209 stars · 286 forks · Updated Nov 5, 2024

A PyTorch implementation of Finite Scalar Quantization

Python · 97 stars · 4 forks · Updated Nov 29, 2023

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Python · 6,972 stars · 1,271 forks · Updated Dec 6, 2023

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook · 7,938 stars · 597 forks · Updated Nov 30, 2024

Speech, Language, Audio, and Music Processing with Large Language Models

Python · 617 stars · 56 forks · Updated Dec 19, 2024

An unofficial PyTorch implementation of the audio LM VALL-E

Python · 2,972 stars · 419 forks · Updated May 10, 2023

Multilingual large voice generation model providing full-stack inference, training, and deployment capabilities.

Python · 8,417 stars · 818 forks · Updated Dec 18, 2024

Multilingual Voice Understanding Model

Python · 3,758 stars · 334 forks · Updated Nov 29, 2024

The open-source code of UniAudio

Python · 533 stars · 32 forks · Updated Jul 22, 2024

SOTA Open Source TTS

Python · 17,402 stars · 1,303 forks · Updated Dec 20, 2024

A toolkit for measuring speech audio quality. Not affiliated with the original authors.

Python · 44 stars · 4 forks · Updated Aug 13, 2024

Implementation/replication of DALL-E, OpenAI's text-to-image Transformer, in PyTorch

Python · 5,591 stars · 642 forks · Updated Feb 17, 2024

Official implementation of Dynamical VAEs

Python · 216 stars · 38 forks · Updated Apr 5, 2023

Vector (and Scalar) Quantization, in PyTorch

Python · 2,742 stars · 223 forks · Updated Dec 3, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python · 36,225 stars · 4,442 forks · Updated Aug 16, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook · 36,453 stars · 4,287 forks · Updated Aug 19, 2024

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python · 7,731 stars · 768 forks · Updated Feb 11, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python · 843 stars · 97 forks · Updated Aug 7, 2024

A generative speech model for daily dialogue.

Python · 33,074 stars · 3,590 forks · Updated Dec 3, 2024

Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)

Python · 37,140 stars · 4,230 forks · Updated Dec 19, 2024

A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Python · 668 stars · 151 forks · Updated Jul 12, 2022

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

Python · 37 stars · 4 forks · Updated Feb 9, 2023

🐍 mecab-python. You can find the original version here: http://taku910.github.io/mecab/

C++ · 543 stars · 52 forks · Updated Nov 2, 2024

Systems submitted to IWSLT 2021 by the MT-UPC group.

Python · 14 stars · 4 forks · Updated Feb 23, 2023

Faster Whisper transcription with CTranslate2

Python · 13,048 stars · 1,092 forks · Updated Dec 12, 2024

Whisper command-line client, compatible with the original OpenAI client, based on CTranslate2.

Python · 940 stars · 83 forks · Updated Dec 19, 2024

Python · 870 stars · 107 forks · Updated May 24, 2024