Stars
A fast Python library for aligning similar audio snippets passed in as NumPy arrays
Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
Foundational Models for State-of-the-Art Speech and Text Translation
Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" accepted by INTERSPEECH 2023.
babua / audiotools
Forked from descriptinc/audiotools
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
mallorbc / lit-gpt
Forked from Lightning-AI/litgpt
Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.…
A guidance language for controlling large language models.
SpeechGPT Series: Speech Large Language Models
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in PyTorch
💬 ASR FastAPI server using faster-whisper and Multi-Scale Auto-Tuning Spectral Clustering for diarization.
Experiments w/ ChatGPT, LangChain, local LLMs
A high-quality, varied ~30hr voice dataset suitable for training a TTS model
Free Auto GPT with NO paids API is a repository that offers a simple version of Auto GPT, an autonomous AI agent capable of performing tasks independently. Unlike other versions, our implementation…
Easily train a good VC model with voice data <= 10 mins!
🔊 Text-Prompted Generative Audio Model
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions
OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database). Upload your own custom knowledge base files (PDF, txt, epub, etc) using a simple React frontend.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
VQ-VAE for Acoustic Unit Discovery and Voice Conversion
Official implementation of DualCycleGAN for nonparallel audio super resolution
Benchmark popular audio i/o packages
🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in PyTorch
Lightweight, super fast C/C++ (& Python) library for sequence alignment using edit (Levenshtein) distance.
Unsupervised domain adaptation for conversational speech enhancement using RemixIT