Stars
TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…
This is the code and dataset repo for the Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamics"
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
A novel human-interaction method for real-time speech extraction on headphones.
Simple and efficient Python implementation of a series of adaptive filters, including time-domain adaptive filters (LMS, NLMS, RLS, AP, Kalman), nonlinear adaptive filters (Volterra filter, functional link a…
Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
jingxuan9862 / PaddleSpeech
Forked from PaddlePaddle/PaddleSpeech
An Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
An Industrial Grade Federated Learning Framework
A high performance and generic framework for distributed DNN training
This library provides common speech features for ASR including MFCCs and filterbank energies.
SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016
Deezer source separation library including pretrained models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
Chinese text normalization for speech processing
A tutorial for Speech Enhancement researchers and practitioners. The purpose of this repo is to organize the world’s resources for speech enhancement and make them universally accessible and useful.
The PyTorch-based audio source separation toolkit for researchers
A Unified Speech Enhancement Front-End for Online Dereverberation, Acoustic Echo Cancellation, and Source Separation