Northwestern Polytechnical University
Beijing
Stars
TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loudness normalization operations.
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team
Official implementation of the NeurIPS 2024 paper on statistical flow matching (SFM) for discrete generation.
Unsupervised text tokenizer for Neural Network-based text generation.
Official inference repo for FLUX.1 models
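Hello Algo (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Code is provided in Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing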
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in PyTorch
PyTorch implementation of Soft-DTW: a Differentiable Loss Function for Time-Series in CUDA
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
First base model for full-duplex conversational audio
🍦 Speech-AI-Forge is a project developed around TTS generation models, implementing an API Server and a Gradio-based WebUI.
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
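🚀 Next Generation AI One-Stop Internationalization Solution. 🚀 A next-generation one-stop AI solution for B2B/B2C use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …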
Robust Speech Recognition via Large-Scale Weak Supervision
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Open-Sora: Democratizing Efficient Video Production for All
GLM-4 series: Open Multilingual Multimodal Chat LMs
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Implementation of Autoregressive Diffusion in PyTorch
Foundational Models for State-of-the-Art Speech and Text Translation
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on