A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 12,555 stars · 2,576 forks · Updated Dec 29, 2024

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python · 11,325 stars · 1,866 forks · Updated Dec 27, 2024

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python · 9,108 stars · 1,413 forks · Updated Dec 28, 2024

espnet / espnet

End-to-End Speech Processing Toolkit

Python · 8,627 stars · 2,200 forks · Updated Dec 28, 2024

facebookresearch / demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Python · 8,517 stars · 1,089 forks · Updated Apr 24, 2024

kyutai-labs / moshi

Python · 7,059 stars · 552 forks · Updated Dec 20, 2024

tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Python · 5,936 stars · 1,202 forks · Updated Mar 31, 2024

FederatedAI / FATE

An Industrial Grade Federated Learning Framework

Python · 5,767 stars · 1,558 forks · Updated Nov 19, 2024

hzy46 / Deep-Learning-21-Examples

Companion code for the book "21 Projects to Master Deep Learning: A Detailed Practical Guide Based on TensorFlow" (《21个项目玩转深度学习》)

Python · 4,532 stars · 1,756 forks · Updated Mar 18, 2019

TEN-framework / TEN-Agent

TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compa…

Python · 3,784 stars · 372 forks · Updated Dec 28, 2024

bytedance / byteps

A high performance and generic framework for distributed DNN training

Python · 3,641 stars · 491 forks · Updated Oct 3, 2023

magenta / ddsp

DDSP: Differentiable Digital Signal Processing

Python · 2,931 stars · 345 forks · Updated Sep 23, 2024

Luolc / AdaBound

An optimizer that trains as fast as Adam and as good as SGD.

Python · 2,908 stars · 330 forks · Updated Jul 23, 2023

ShawnBIT / UNet-family

Papers and implementations of UNet-related models.

Python · 2,521 stars · 504 forks · Updated May 21, 2020

webdataset / webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python · 2,398 stars · 193 forks · Updated Dec 11, 2024

jameslyons / python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.

Python · 2,381 stars · 617 forks · Updated Oct 20, 2021

MrGiovanni / UNetPlusPlus

[IEEE TMI] Official Implementation for UNet++

Python · 2,345 stars · 543 forks · Updated Nov 15, 2023

asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

Python · 2,300 stars · 424 forks · Updated Jul 19, 2024

abisee / pointer-generator

Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"

Python · 2,180 stars · 811 forks · Updated Jun 16, 2022

bigmb / Unet-Segmentation-Pytorch-Nest-of-Unets

Implementations of different kinds of UNet models for image segmentation: UNet, RCNN-UNet, Attention UNet, RCNN-Attention UNet, and Nested UNet

Python · 1,962 stars · 349 forks · Updated Nov 28, 2022

google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

Python · 1,566 stars · 320 forks · Updated Sep 25, 2024

LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Python · 1,489 stars · 434 forks · Updated Dec 8, 2024

magenta / mt3

MT3: Multi-Task Multitrack Music Transcription

Python · 1,457 stars · 195 forks · Updated Dec 11, 2024

qiuqiangkong / audioset_tagging_cnn

Python · 1,379 stars · 258 forks · Updated Jul 25, 2024

microsoft / DNS-Challenge

This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.

Python · 1,141 stars · 415 forks · Updated Jul 25, 2024

clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition

Python · 1,074 stars · 274 forks · Updated Mar 26, 2024

jingxuan9862

Stars

d2l-ai / d2l-zh

CorentinJ / Real-Time-Voice-Cloning

deezer / spleeter

microsoft / unilm

NVIDIA / NeMo