Stars
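Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.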
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Deezer source separation library including pretrained models.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Code for the paper Hybrid Spectrogram and Waveform Source Separation
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
An Industrial Grade Federated Learning Framework
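Companion code for the book 《21个项目玩转深度学习：基于TensorFlow的实践详解》 (21 Projects for Mastering Deep Learning: A Detailed Practical Guide Based on TensorFlow)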
TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…
A high performance and generic framework for distributed DNN training
An optimizer that trains as fast as Adam and as well as SGD.
Papers and implementations of UNet-related models.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
This library provides common speech features for ASR including MFCCs and filterbank energies.
[IEEE TMI] Official Implementation for UNet++
The PyTorch-based audio source separation toolkit for researchers
Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
Implementation of different kinds of Unet Models for Image Segmentation - Unet, RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
In defence of metric learning for speaker recognition