- University of Science and Technology of China
- http://home.ustc.edu.cn/~wangruoyu/
Stars
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Multi-level network clustering based on the Map Equation
Speech Security and Privacy Compendium - Mini
The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.
🔖 Curated list of video object segmentation (VOS) papers, datasets, and projects.
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
Baseline Recipe for VoicePrivacy Challenge 2024: anonymization systems and evaluation software
NOTSOFAR-1 Challenge: Distant Diarization and ASR
MetaBCI: China’s first open-source platform for non-invasive brain-computer interfaces. The MetaBCI project is led by Prof. Minpeng Xu of Tianjin University, China.
Optimize the audio quality of your loudspeakers
INTERSPEECH 23 - Repurposing Whisper to recognize new tasks with adapters!
Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.
Collection of papers on state-space models
Structured state space sequence models
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
MU-LLaMA: Music Understanding Large Language Model
A collection of resources and papers on Diffusion Models
AudioLDM training, finetuning, evaluation and inference.
Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"
A comprehensive collection of papers on speaker diarization
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
You can find the speech algorithms you want here
A PyTorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"
Robust Speech Recognition via Large-Scale Weak Supervision
A natural language interface for computers
Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.