Stars
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion
g2p ID: Indonesian Grapheme-to-Phoneme Converter
Integrate the DeepSeek API into popular software
DeepEP: an efficient expert-parallel communication library
CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages
OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.
A list of papers and projects on cutting-edge Speech Synthesis, Text-to-Speech (TTS), Singing Voice Synthesis (SVS), Voice Conversion (VC), Singing Voice Conversion (SVC), and related interesting…
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
DeepSeek LLM: Let there be answers
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
Welcome to AudioCIL, the toolbox for audio class-incremental learning with the most implemented methods.
A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks.
Dataset and code of GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
PyTorch Implementation of StyleSVC: Singing Voice Conversion with Multi-scale Style Transfer
Instant voice cloning by MIT and MyShell. Audio foundation model.
Metadata, scripts and baselines for the MTG-Jamendo dataset
Audio-to-score alignment with human-labeled repeats