Starred repositories
🤖 Assemble, configure, and deploy autonomous AI Agents in your browser.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
Real-time interactive streaming digital human
🔥 2D and 3D face alignment library built using PyTorch
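A minimal usage sketch for this library (published as `face-alignment` on PyPI), assuming its documented `FaceAlignment`/`get_landmarks` API; the image path is hypothetical, and the `LandmarksType` enum member was renamed across releases, so check your installed version.

```python
# Minimal sketch of the face-alignment API; "face.jpg" is a hypothetical input.
# Note: older releases spell the enum face_alignment.LandmarksType._2D.
import face_alignment
from skimage import io

fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cpu")
image = io.imread("face.jpg")
landmarks = fa.get_landmarks(image)  # list of (68, 2) landmark arrays, one per detected face
print(landmarks[0] if landmarks else "no face detected")
```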
Industry-leading face manipulation platform
As little as 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Using a modified BiSeNet for face parsing in PyTorch
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project.
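Since Rhubarb is a command-line tool, here is a minimal sketch of driving it from Python; the `-f json` and `-o` flags follow the project's documented usage, while the binary location and file names are assumptions.

```python
# Minimal sketch: run the Rhubarb CLI and read back its JSON mouth cues.
# Assumes "rhubarb" is on PATH; "recording.wav" and the output name are hypothetical.
import json
import subprocess

subprocess.run(
    ["rhubarb", "-f", "json", "-o", "mouth_cues.json", "recording.wav"],
    check=True,
)
with open("mouth_cues.json") as f:
    cues = json.load(f)["mouthCues"]  # each cue has "start", "end", and a mouth-shape "value"
print(cues[:3])
```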
[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Multilingual Voice Understanding Model
High-performance In-browser LLM Inference Engine
🤖 Components Library for Quickly Building LLM Chat Interfaces.
ChatGLM2-6B: An Open-Source Bilingual Chat LLM
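A minimal inference sketch following the project's documented Hugging Face usage; `trust_remote_code` and the `model.chat()` helper come from the ChatGLM2-6B README, and a CUDA GPU with enough memory is assumed.

```python
# Minimal sketch of ChatGLM2-6B inference via Hugging Face Transformers.
# Assumes a CUDA GPU; the prompt is arbitrary.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).half().cuda()
model = model.eval()
response, history = model.chat(tokenizer, "Hello", history=[])  # chat() threads dialogue history
print(response)
```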
A Fundamental End-to-End Speech Recognition Toolkit and Open-Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
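A minimal transcription sketch assuming FunASR's `AutoModel` interface as shown in its README; the model names follow the project's examples, and the audio path is hypothetical.

```python
# Minimal sketch of FunASR's AutoModel pipeline: ASR + VAD + punctuation.
# Model names follow the README's examples; "speech.wav" is hypothetical.
from funasr import AutoModel

model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")
result = model.generate(input="speech.wav")
print(result[0]["text"])  # recognized transcript with punctuation restored
```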
Production First and Production Ready End-to-End Speech Recognition Toolkit
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
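A minimal offline-transcription sketch using Vosk's Python API; `Model`, `KaldiRecognizer`, and `FinalResult` are the library's documented entry points, while the model directory and WAV file are assumptions.

```python
# Minimal sketch of offline transcription with Vosk.
# Assumes a downloaded model in ./model and a 16-bit mono PCM WAV file.
import json
import wave

from vosk import KaldiRecognizer, Model

wf = wave.open("speech.wav", "rb")
rec = KaldiRecognizer(Model("model"), wf.getframerate())
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    rec.AcceptWaveform(data)  # feed audio chunks to the recognizer
print(json.loads(rec.FinalResult())["text"])
```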
Faster Whisper transcription with CTranslate2
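A minimal sketch of the faster-whisper API; `WhisperModel` and `transcribe` are the library's documented entry points, while the model size, compute type, and audio path are assumptions.

```python
# Minimal sketch of transcription with faster-whisper (CTranslate2 backend).
# Model size, device, and "speech.wav" are assumptions; adjust to your setup.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("speech.wav", beam_size=5)
print(f"Detected language: {info.language}")
for segment in segments:  # transcribe() returns a lazy generator of segments
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```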
Janus-Series: Unified Multimodal Understanding and Generation Models