Stars
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
The official PyTorch implementation of "FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement".
A transformer-based network model for pitch detection
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Generative Models by Stability AI
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in PyTorch
eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
🔊 Text-Prompted Generative Audio Model