Stars
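Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.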
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Making large AI models cheaper, faster and more accessible
Deezer source separation library including pretrained models.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
An Industrial Grade Federated Learning Framework
TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaki…
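Companion code for the book 《21个项目玩转深度学习：基于TensorFlow的实践详解》 (21 Projects to Master Deep Learning: A Detailed Practical Guide Based on TensorFlow)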
A high performance and generic framework for distributed DNN training
An optimizer that trains as fast as Adam and performs as well as SGD.
Papers and implementations of UNet-related models.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
[IEEE TMI] Official Implementation for UNet++
This library provides common speech features for ASR including MFCCs and filterbank energies.
The PyTorch-based audio source separation toolkit for researchers
Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Implementation of different kinds of Unet models for image segmentation: Unet, RCNN-Unet, Attention Unet, RCNN-Attention Unet, Nested Unet
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.