Highlights

  • Pro

Organizations

@SPRATeam-USTC

FAIR Sequence Modeling Toolkit 2

Python 863 102 Updated Mar 6, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,841 194 Updated Nov 14, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,733 222 Updated Dec 5, 2024

Multi-level network clustering based on the Map Equation

C++ 447 89 Updated Jan 15, 2025

Speech Security and Privacy Compendium - Mini

Python 9 Updated Jun 18, 2024

The Song Describer dataset is an evaluation dataset made of ~1.1k captions for 706 permissively licensed music recordings.

Jupyter Notebook 148 5 Updated Dec 22, 2023

🔖 Curated list of video object segmentation (VOS) papers, datasets, and projects.

273 9 Updated Mar 4, 2025

Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"

Python 40 1 Updated Jan 27, 2025

Baseline Recipe for VoicePrivacy Challenge 2024: anonymization systems and evaluation software

Python 51 6 Updated Jan 30, 2025

NOTSOFAR-1 Challenge: Distant Diarization and ASR

Python 50 12 Updated Feb 12, 2025

MetaBCI: China’s first open-source platform for non-invasive brain computer interface. The project of MetaBCI is led by Prof. Minpeng Xu from Tianjin University, China.

Python 393 164 Updated Dec 28, 2024

Optimize the audio quality of your loudspeakers

Python 997 30 Updated Nov 29, 2023

INTERSPEECH 23 - Refunction Whisper to recognize new tasks with adapters!

Python 36 2 Updated Sep 11, 2023

Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.

Python 21 3 Updated Feb 25, 2025

Collection of papers on state-space models

581 20 Updated Mar 2, 2025

Structured state space sequence models

Jupyter Notebook 2,571 313 Updated Jul 17, 2024

Mamba SSM architecture

Python 14,175 1,236 Updated Jan 18, 2025

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Python 8,644 671 Updated Mar 3, 2025

MU-LLaMA: Music Understanding Large Language Model

Python 268 20 Updated Mar 25, 2024

A collection of resources and papers on Diffusion Models

HTML 11,496 967 Updated Aug 1, 2024

AudioLDM training, finetuning, evaluation and inference.

Python 237 48 Updated Dec 13, 2024

Official PyTorch implementation of "AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks"

Python 196 46 Updated Jun 25, 2023

Some comprehensive papers about speaker diarization

260 6 Updated Feb 24, 2025

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction

Jupyter Notebook 249 16 Updated May 19, 2024

You can find the speech algorithms you want here

C 789 247 Updated Jan 1, 2025

A PyTorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"

Shell 56 2 Updated Sep 19, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 77,749 9,320 Updated Jan 4, 2025

A natural language interface for computers

Python 58,620 5,001 Updated Jan 24, 2025

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Python 26,398 3,353 Updated Dec 30, 2024

This repo is meant to serve as a guide for Machine Learning/AI technical interviews.

Jupyter Notebook 5,664 993 Updated Feb 24, 2025