  • Northwestern Polytechnical University
  • Beijing


TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 74 11 Updated Dec 20, 2024

Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI subtitle team for video localization.

Python 8,616 834 Updated Dec 20, 2024

Official implementation of the NeurIPS 2024 paper on statistical flow matching (SFM) for discrete generation.

Jupyter Notebook 16 Updated Nov 7, 2024

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 10,394 1,180 Updated Dec 1, 2024

Official inference repo for FLUX.1 models

Python 18,546 1,311 Updated Nov 21, 2024

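《Hello 算法》 (Hello Algo): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; the English version is ongoing.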

Java 103,923 13,031 Updated Dec 20, 2024

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

C 136 9 Updated Mar 6, 2024

A Survey of Spoken Dialogue Models (60 pages)

214 12 Updated Nov 28, 2024

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch

Python 270 6 Updated Aug 24, 2024

PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA

Python 132 10 Updated Aug 3, 2021

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,066 323 Updated Nov 14, 2023

First base model for full-duplex conversational audio

Python 1,656 107 Updated Nov 12, 2024

🍦 Speech-AI-Forge is a project built around TTS generation models, implementing an API server and a Gradio-based WebUI.

Python 917 121 Updated Nov 27, 2024

A 6-Million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 101 2 Updated Dec 13, 2024

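🚀 Next Generation AI One-Stop Internationalization Solution. 🚀 A next-generation, one-stop AI solution for business and consumer use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …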

TypeScript 7,575 981 Updated Dec 8, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 73,214 8,740 Updated Dec 1, 2024
Python 119 11 Updated Feb 27, 2024

SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python 174 4 Updated Dec 5, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 21,189 2,183 Updated Nov 11, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,550 800 Updated Dec 13, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,488 198 Updated Dec 5, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,713 2,230 Updated Dec 20, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 5,571 466 Updated Dec 15, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,337 224 Updated Dec 12, 2024

Implementation of Autoregressive Diffusion in PyTorch

Python 324 9 Updated Nov 3, 2024

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,025 1,079 Updated Nov 14, 2024

Next-Token Prediction is All You Need

Python 1,915 76 Updated Oct 24, 2024

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,612 259 Updated Dec 21, 2024

Official GitHub page of the Oceanship Dataset

Python 18 2 Updated Jun 11, 2024

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 506 45 Updated Jun 9, 2024