- Humelo Inc.
- Seoul
- deeesp.github.io
Stars
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
A list of demo websites for automatic music generation research
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
A benchmarking suite for disentanglement algorithms, suited for evaluating robustness to correlated factors. Codebase for the paper "Disentanglement of Correlated Factors via Hausdorff Factorized S…
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs
Demucs Lightning: A PyTorch Lightning version of Demucs with Hydra and TensorBoard features
A self-supervised learning framework for audio-visual speech
PyTorch Implementation of ProDiff (ACM-MM'22) with an extremely fast diffusion speech synthesis pipeline
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Korean TTS, Tacotron2, WaveNet
Unofficial PyTorch implementation of Google AI's VoiceFilter system
Avocodo: Generative Adversarial Network for Artifact-free Vocoder
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
PyTorch Implementation of FastDiff (IJCAI'22)
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Reading self-supervised learning backwards, Part 1
This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with PyTorch