Stars
R Bioinformatics Cookbook, published by Packt
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
A generative speech model for daily dialogue.
Whisper realtime streaming for long speech-to-text transcription and translation
Python bindings for symphonia/opus: read various audio formats from Python and write Opus files
Emrys365 / fairseq
Forked from facebookresearch/fairseq. Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.
🚀 Power Your World with AI - Explore, Extend, Empower.
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
iOS CallKit blocking of NPA-NXX number prefix spam
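Automated task tool for Bilibili (B站), supporting multiple deployment methods such as Docker, Qinglong, and k8s. Gentle enough even for sensitive skin.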
🔮 ChatGPT Desktop Application (Mac, Windows and Linux)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Open-source, accurate, and easy-to-use video speech recognition and clipping tool, with LLM-based AI clipping integrated.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
🔊 Text-Prompted Generative Audio Model
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer