JingRH

My interests include speech synthesis, speech-to-speech translation and underwater acoustic target detection.

Northwestern Polytechnical University
Beijing

Stars

ScottishFold007 / TTSAudioNormalizer

TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.

Python 74 11 Updated Dec 20, 2024

Huanshere / VideoLingo

Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team.

Python 8,616 834 Updated Dec 20, 2024

ccr-cheng / statistical-flow-matching

Official implementation of the NeurIPS 2024 paper on statistical flow matching (SFM) for discrete generation.

Jupyter Notebook 16 Updated Nov 7, 2024

google / sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 10,394 1,180 Updated Dec 1, 2024

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 18,546 1,311 Updated Nov 21, 2024

krahets / hello-algo

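Hello Algo (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart code. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.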

Java 103,923 13,031 Updated Dec 20, 2024

nii-yamagishilab / ZMM-TTS

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

C 136 9 Updated Mar 6, 2024

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

214 12 Updated Nov 28, 2024

lucidrains / mmdit

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch

Python 270 6 Updated Aug 24, 2024

keonlee9420 / Soft-DTW-Loss

PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA

Python 132 10 Updated Aug 3, 2021

lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python 2,066 323 Updated Nov 14, 2023

Standard-Intelligence / hertz-dev

First base model for full-duplex conversational audio

Python 1,656 107 Updated Nov 12, 2024

lenML / Speech-AI-Forge

🍦 Speech-AI-Forge is a project developed around TTS generation models, implementing an API server and a Gradio-based WebUI.

Python 917 121 Updated Nov 27, 2024

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 101 2 Updated Dec 13, 2024

coaidev / coai

🚀 Next-generation AI one-stop internationalization solution for B2B and B2C use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …

TypeScript 7,575 981 Updated Dec 8, 2024

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 73,214 8,740 Updated Dec 1, 2024

minyoungg / vqtorch

Python 119 11 Updated Feb 27, 2024

youngsheen / SimVQ

SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python 174 4 Updated Dec 5, 2024

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 21,189 2,183 Updated Nov 11, 2024

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,550 800 Updated Dec 13, 2024

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,488 198 Updated Dec 5, 2024

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 22,713 2,230 Updated Dec 20, 2024

THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 5,571 466 Updated Dec 15, 2024

facebookresearch / lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,337 224 Updated Dec 12, 2024

lucidrains / autoregressive-diffusion-pytorch

Implementation of Autoregressive Diffusion in PyTorch

Python 324 9 Updated Nov 3, 2024

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook 11,025 1,079 Updated Nov 14, 2024

baaivision / Emu3

Next-Token Prediction is All You Need

Python 1,915 76 Updated Oct 24, 2024

jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Python 2,612 259 Updated Dec 21, 2024

lizeyujack / oceanship

Official GitHub page of the Oceanship dataset

Python 18 2 Updated Jun 11, 2024

ZhangXInFD / SpeechTokenizer

This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on

Python 506 45 Updated Jun 9, 2024
