Stars
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Robust Speech Recognition via Large-Scale Weak Supervision
One minute of voice data can be enough to train a good TTS model! (few-shot voice cloning)
TensorFlow code and pre-trained models for BERT
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
GUI for a Vocal Remover that uses Deep Neural Networks.
Faster Whisper transcription with CTranslate2
Python Implementation of Reinforcement Learning: An Introduction
ChatGLM3 series: Open Bilingual Chat LLMs
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Simple reinforcement learning tutorials from 莫烦Python (Morvan Python), AI teaching in Chinese
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
CodeGeeX2: A More Powerful Multilingual Code Generation Model
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Chinese version of GPT2 training code, using BERT tokenizer.
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Production First and Production Ready End-to-End Speech Recognition Toolkit
😝 TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
PyTorch tutorial for beginners
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Evolutionary algorithm toolbox and framework with high performance for Python
Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
faster_whisper GUI with PySide6