- College Park
- http://cs.umd.edu/~zhy
Stars
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
This is the code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
The official implementation of HierSpeech++
Differentiable audio signal processors in PyTorch
Steerable discovery of neural audio effects
TorchCFM: a Conditional Flow Matching library
A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
speech self-supervised representations
Acoustic impulse response generation using diffusion models
Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
NOMAD: Non-Matching Audio Distance (ICASSP 2024)
Code for the paper Semi-Conditional Normalizing Flows for Semi-Supervised Learning
A high-resolution implementation of the HiFi-GAN vocoder for voice conversion.
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Official implementation of the source-filter HiFiGAN vocoder
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
Repository for Accent Recognition (Hackathon @SLT2022)
SoftVC VITS Singing Voice Conversion
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
🎲 Iterable dataset resampling in PyTorch