- Korea Advanced Institute of Science and Technology (KAIST)
- Daejeon, Korea
- https://choijeongsoo.github.io
Stars
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
This repository collects papers related to speech tokenizers.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
An Open-Source LLM-empowered Foundation TTS System
Real-time Speech-Text Foundation Model Toolkit (wip)
Zero-shot voice conversion & singing voice conversion, with real-time support
An open-source multimodal large language model that can hear and talk while it thinks, featuring real-time end-to-end speech input and streaming audio output for conversation.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs (a toy sketch of the pair-merging idea follows this list)
SoftVC VITS Singing Voice Conversion
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge (ICCV 2023)
[CVPR 2023] Official code for the paper "Learning to Dub Movies via Hierarchical Prosody Models".
[ACL 2024] PyTorch code for the paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
Out of time: automated lip sync in the wild
Implementation of Autoregressive Diffusion in PyTorch
Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023) and "Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition" (Interspeech 2022)
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
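
The Acoustic BPE entry above names a technique its one-line description cannot unpack: byte-pair-style merges are learned over the discrete token IDs a neural audio codec emits, so frequent adjacent pairs collapse into single units and sequences shorten before language modeling. Below is a minimal, self-contained sketch of that core loop in plain Python; the function names and toy data are illustrative assumptions, not the API of the repository above, and a real RVQ codec would first flatten or interleave its parallel codebook streams into the single token stream assumed here.

```python
# Toy sketch of "Acoustic BPE": greedily learn merges over the discrete
# token IDs a neural audio codec emits. Generic illustration only; not
# the API of any repository listed above.
from collections import Counter

def most_frequent_pair(seqs):
    """Count adjacent token pairs across all sequences; return the top one."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0] if counts else None

def merge_pair(seq, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def train_acoustic_bpe(seqs, base_vocab, num_merges):
    """Greedily learn up to `num_merges` merges over codec-token sequences."""
    merges, next_id = {}, base_vocab
    for _ in range(num_merges):
        top = most_frequent_pair(seqs)
        if top is None or top[1] < 2:  # stop when no pair repeats
            break
        pair = top[0]
        merges[pair] = next_id
        seqs = [merge_pair(s, pair, next_id) for s in seqs]
        next_id += 1
    return merges, seqs

# Example: pretend a codec quantized two clips with a 1024-entry codebook.
clips = [[7, 7, 12, 7, 7, 12, 99], [7, 7, 12, 3]]
merges, shorter = train_acoustic_bpe(clips, base_vocab=1024, num_merges=4)
print(merges)   # e.g. {(7, 7): 1024, (1024, 12): 1025}
print(shorter)  # sequences shrink as frequent pairs are merged
```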