- Beijing, China
- https://andong-li-speech.github.io
Stars
[NeurIPS 2023] UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS@2023 Spotlight, TPAMI@2024)
AI-powered speech denoising and enhancement
PyTorch implementation of the Perceptual Evaluation of Speech Quality for wideband audio
A summary of related work on flow matching and stochastic interpolants
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
[Interspeech 2024] Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement
[Official Implementation] Acoustic Autoregressive Modeling 🔥
The official implementation of PeriodWave and PeriodWave-Turbo
Unofficial implementation of the WaveNeXt vocoder
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Implementation of Voicebox, the new SOTA text-to-speech network from Meta AI, in PyTorch
The official implementation of GTCRN, an ultra-lite speech enhancement model.
Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
This is the official repository of a new SOTA diffusion-model-based method for speech enhancement
Generation scripts for EARS-WHAM and EARS-Reverb
Official data preparation scripts for the URGENT 2024 Challenge
[InterSpeech 24] FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Source code and demo for INTERSPEECH 2024 paper: Noise-robust Speech Separation with Fast Generative Correction
Expressive Anechoic Recordings of Speech (EARS)
Model configurations for scaling SE models in the paper "Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement"
PyTorch implementation of [1412.6553] and [1511.06530] tensor decomposition methods for convolutional layers.
Real-time binaural target sound extraction model.