  • Tencent Meeting, Tencent
  • Shenzhen, China


LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement

Python 18 1 Updated Mar 10, 2025

An open source dataset for source separation

Python 408 69 Updated Feb 9, 2024
Jupyter Notebook 114 19 Updated Oct 25, 2021

SpEx+(tied) source code

Python 79 17 Updated Jul 6, 2023

Spark-TTS Inference Code

Python 2,867 286 Updated Mar 5, 2025

SlamKit is an open source tool kit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day"

Python 172 8 Updated Mar 8, 2025

Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit

Python 846 127 Updated Feb 26, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,714 198 Updated Mar 4, 2025

OSUM: Open Speech Understanding Model, open-sourced by ASLP@NPU.

Python 324 18 Updated Mar 6, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 408 25 Updated Mar 7, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 719 48 Updated Mar 5, 2025

🧑‍🚀 Summary of the world's best LLM resources (data processing, model training, model deployment, o1 models, small language models, vision-language models).

3,977 420 Updated Mar 10, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,073 1,411 Updated Feb 1, 2025

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Python 1,003 183 Updated Dec 22, 2023

An unofficial implementation of the Personal VAD speaker-conditioned voice activity detection method. Bachelor's thesis project.

Python 65 13 Updated Sep 22, 2022

Awesome speech/audio LLMs, representation learning, and codec models

923 58 Updated Mar 4, 2025
Python 10 2 Updated Jul 16, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,989 6,178 Updated Mar 10, 2025

Official repository for Mamba-based Segmentation Model for Speaker Diarization

Python 33 3 Updated Oct 10, 2024

wenet_LLM_from_ASLP

Python 8 Updated Nov 26, 2024

The official Meta Llama 3 GitHub site

Python 28,478 3,308 Updated Jan 26, 2025

Inference code for Llama models

Python 57,829 9,713 Updated Jan 26, 2025

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,844 194 Updated Nov 14, 2024

Real-time face swap and one-click video deepfake with only a single image

Python 44,546 6,564 Updated Mar 6, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,145 164 Updated Feb 13, 2025

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 17,691 1,776 Updated Mar 7, 2025

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 11,463 720 Updated Dec 17, 2024

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation

Python 878 210 Updated Mar 10, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 7,728 621 Updated Mar 10, 2025