A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python · 13,259 stars · 2,721 forks · Updated Mar 7, 2025

PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…

Python · 11,592 stars · 1,892 forks · Updated Mar 5, 2025

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python · 11,565 stars · 1,150 forks · Updated Mar 7, 2025

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Python · 9,459 stars · 1,446 forks · Updated Mar 6, 2025

espnet / espnet

End-to-End Speech Processing Toolkit

Python · 8,844 stars · 2,224 forks · Updated Mar 3, 2025

Plachtaa / VALL-E-X

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python · 7,813 stars · 776 forks · Updated Feb 11, 2024

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 7,672 stars · 616 forks · Updated Mar 6, 2025

tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

Python · 5,997 stars · 1,209 forks · Updated Mar 31, 2024

FederatedAI / FATE

An Industrial Grade Federated Learning Framework

Python · 5,831 stars · 1,561 forks · Updated Nov 19, 2024

TEN-framework / TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaki…

Python · 4,950 stars · 566 forks · Updated Mar 6, 2025

hzy46 / Deep-Learning-21-Examples

Companion code for the book "21 Projects to Master Deep Learning: A Detailed Practical Guide Based on TensorFlow"

Python · 4,550 stars · 1,757 forks · Updated Mar 18, 2019

bytedance / byteps

A high performance and generic framework for distributed DNN training

Python · 3,662 stars · 494 forks · Updated Oct 3, 2023

magenta / ddsp

DDSP: Differentiable Digital Signal Processing

Python · 2,973 stars · 347 forks · Updated Sep 23, 2024

Luolc / AdaBound

An optimizer that trains as fast as Adam and as well as SGD.

Python · 2,908 stars · 334 forks · Updated Jul 23, 2023

ShawnBIT / UNet-family

Papers and implementations of UNet-related models.

Python · 2,533 stars · 506 forks · Updated May 21, 2020

webdataset / webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Python · 2,490 stars · 201 forks · Updated Feb 12, 2025

MrGiovanni / UNetPlusPlus

[IEEE TMI] Official Implementation for UNet++

Python · 2,391 stars · 546 forks · Updated Jan 11, 2025

jameslyons / python_speech_features

This library provides common speech features for ASR including MFCCs and filterbank energies.

Python · 2,390 stars · 616 forks · Updated Oct 20, 2021

asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

Python · 2,337 stars · 429 forks · Updated Jan 11, 2025

abisee / pointer-generator

Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"

Python · 2,186 stars · 808 forks · Updated Jun 16, 2022

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python · 2,135 stars · 164 forks · Updated Feb 13, 2025

bigmb / Unet-Segmentation-Pytorch-Nest-of-Unets

Implementations of different kinds of Unet models for image segmentation: Unet, RCNN-Unet, Attention Unet, RCNN-Attention Unet, and Nested Unet

Python · 1,992 stars · 355 forks · Updated Nov 28, 2022

google / uis-rnn

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper "Fully Supervised Speaker Diarization".

Python · 1,570 stars · 320 forks · Updated Sep 25, 2024

jingxuan9862

Stars

d2l-ai / d2l-zh

CorentinJ / Real-Time-Voice-Cloning

hiyouga / LLaMA-Factory

hpcaitech / ColossalAI

deezer / spleeter

microsoft / unilm

fishaudio / fish-speech

NVIDIA / NeMo