Starred repositories
VoiceLDM: Text-to-Speech with Environmental Context
Making large AI models cheaper, faster and more accessible
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
Stable Diffusion and Flux in pure C/C++
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
An official implementation of "UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data"
SoftVC VITS Singing Voice Conversion
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Voice activity detection (VAD) papers and code (from 198*~) and their classification.
Barkify: an unofficial training implementation of Bark TTS by suno-ai
FluentTTS: Text-dependent Fine-grained Style Control for Multi-style TTS
PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.
🦜🔗 Build context-aware reasoning applications
🔊 Text-Prompted Generative Audio Model
serp-ai / bark-with-voice-clone
Forked from suno-ai/bark
🔊 Text-prompted Generative Audio Model - With the ability to clone voices
A family of diffusion models for text-to-audio generation.
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An unofficial PyTorch implementation of Mix-Phoneme-Bert
Keep track of big models in audio domain, including speech, singing, music etc.
PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
A fully working PyTorch implementation of NaturalSpeech (Tan et al., 2022)
AudioLDM: Generate speech, sound effects, music and beyond, with text.
Objective metrics used in several text-to-speech (TTS) papers.
phoneme tokenizer and grapheme-to-phoneme model for 8k languages
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
An unofficial PyTorch implementation of the audio LM VALL-E