Le-Xiaohuai-speech
  • Nanjing University; RTC Lab ByteDance
  • Nanjing China

Jupyter Notebook 22 Updated Nov 20, 2024

This is the PyTorch implementation of "Universal Source Separation with Weakly Labelled Data".

Python 339 18 Updated Sep 1, 2023

[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training

Python 184 7 Updated Nov 7, 2024

This is the code and dataset repo for Interspeech 2024 paper "Target conversation extraction: Source separation using turn-taking dynamics"

Python 40 4 Updated Oct 4, 2024

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,025 1,080 Updated Nov 14, 2024

AlphaFold 3 inference pipeline.

Python 5,631 660 Updated Dec 20, 2024

Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment, CVPR, 2024

Python 86 8 Updated Jun 11, 2024

Official Implementation of DPLM (ICML'24) - Diffusion Language Models Are Versatile Protein Learners

C++ 91 9 Updated Nov 19, 2024

Mods for Stardew Valley using SMAPI.

C# 716 379 Updated Dec 17, 2024

This repo is for the SPL paper "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap"

Python 114 15 Updated Apr 8, 2022
Python 208 25 Updated Dec 14, 2024

Updates ASR papers every day

Python 87 5 Updated Dec 21, 2024

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 101 2 Updated Dec 13, 2024

zero-shot voice conversion & singing voice conversion, with real-time support

Python 790 97 Updated Dec 16, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,943 598 Updated Nov 30, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,488 198 Updated Dec 5, 2024

Implementation of the proposed minGRU in PyTorch

Python 264 20 Updated Dec 18, 2024

Google Research

Jupyter Notebook 34,538 7,957 Updated Dec 13, 2024

Awesome Deep Graph Clustering is a collection of SOTA, novel deep graph clustering methods (papers, codes, and datasets).

Python 860 147 Updated Oct 24, 2024

Learning audio concepts from natural language supervision

Python 505 38 Updated Sep 18, 2024
Python 7,008 548 Updated Dec 20, 2024

WavJourney: Compositional Audio Creation with LLMs

Python 525 44 Updated Sep 28, 2023

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

5,738 314 Updated Dec 21, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 20,445 2,572 Updated Dec 15, 2024

Text-to-Music Generation with Rectified Flow Transformers

Python 1,634 125 Updated Dec 10, 2024

Computes the Mel-Cepstral Distance of two WAV files based on the paper "Mel-Cepstral Distance Measure for Objective Speech Quality Assessment" by Robert F. Kubichek.

Python 50 10 Updated Dec 11, 2024

An Audio Language model for Audio Tasks

Python 296 16 Updated Apr 19, 2024

Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

Python 1,571 128 Updated Jun 17, 2024

This is the official repository for M2UGen

Jupyter Notebook 455 38 Updated Dec 18, 2024