- Korea Advanced Institute of Science and Technology (KAIST)
- Daejeon, Korea
- https://choijeongsoo.github.io
Stars
SoftVC VITS Singing Voice Conversion
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
An open-source multimodal large language model that can hear and talk while it thinks, featuring real-time, end-to-end speech input and streaming audio output for conversation.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
Zero-shot voice conversion & singing voice conversion, with real-time support
Out of time: automated lip sync in the wild
An Open-Source LLM-empowered Foundation TTS System
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate (see the usage sketch after this list)
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Implementation of Autoregressive Diffusion in PyTorch
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Real-time Speech-Text Foundation Model Toolkit (work in progress)
[CVPR 2023] Official code for the paper "Learning to Dub Movies via Hierarchical Prosody Models".
Implementation of Zorro, Masked Multimodal Transformer, in PyTorch
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
[ACL 2024] PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Official implementation of the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading"
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs (see the BPE sketch after this list)
Audio-visual corruption modeling from our CVPR 2023 paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring"
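
For the SNAC entry above, a minimal encode/decode sketch in Python, following the usage pattern shown in the project README; the checkpoint name "hubertsiuzdak/snac_32khz" and the exact method signatures are assumptions taken from that README and may differ across versions.

import torch
from snac import SNAC

# Load a pretrained multi-scale codec (checkpoint name assumed from the README).
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval()

# One second of dummy mono audio at 32 kHz, shaped (batch, channels, samples).
audio = torch.randn(1, 1, 32000)

with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per temporal scale
    audio_hat = model.decode(codes)  # waveform reconstructed from the discrete codes

# Coarser scales carry fewer codes per second, which is what keeps the bitrate low.
for i, c in enumerate(codes):
    print(f"scale {i}: {tuple(c.shape)}")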
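
For the Acoustic BPE entry, a hypothetical sketch of the underlying idea: render each discrete codec token id as a unicode character, then train an ordinary BPE tokenizer over those strings so that frequent token patterns merge into longer units. The offset, toy corpus, and vocabulary size are illustrative assumptions, not this repository's code; only the `tokenizers` library calls are real.

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

OFFSET = 0x4E00  # map token ids onto printable CJK codepoints (arbitrary choice)

def ids_to_text(ids):
    """Render a sequence of discrete acoustic token ids as a string."""
    return "".join(chr(OFFSET + i) for i in ids)

# Toy corpus; in practice these sequences come from a neural audio codec.
corpus = [ids_to_text(seq) for seq in [[3, 3, 7, 7, 7, 1], [3, 3, 7, 1, 1]]]

# No pre-tokenizer: each sequence is one "word", so BPE merges adjacent codes.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(vocab_size=64, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

print(tokenizer.encode(corpus[0]).tokens)  # frequent code runs appear as merged units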