- Korea Advanced Institute of Science and Technology (KAIST)
- Daejeon, Korea
- https://choijeongsoo.github.io
Stars
SoftVC VITS Singing Voice Conversion
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
An open-source multimodal large language model that can hear and talk while it thinks, featuring real-time, end-to-end speech input and streaming audio output for conversation.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Official PyTorch implementation of BigVGAN (ICLR 2023)
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
Zero-shot voice conversion & singing voice conversion, with real-time support
Out of time: automated lip sync in the wild
An Open-Source LLM-empowered Foundation TTS System
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate (see the usage sketch after this list)
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Implementation of Autoregressive Diffusion in PyTorch
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Real-time Speech-Text Foundation Model Toolkit (work in progress)
[CVPR 2023] Official code for the paper "Learning to Dub Movies via Hierarchical Prosody Models".
Implementation of Zorro, Masked Multimodal Transformer, in PyTorch
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
[ACL 2024] PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Official implementation of the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading"
Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs (see the BPE sketch after this list)
Audio-visual corruption modeling from our CVPR 2023 paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring"
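
For the SNAC entry above, a minimal encode/decode sketch in Python, following the usage pattern shown in the project README; the checkpoint name "hubertsiuzdak/snac_32khz" and the exact method signatures are assumptions taken from that README and may differ across versions.

import torch
from snac import SNAC

# Load a pretrained multi-scale codec (checkpoint name assumed from the README).
model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval()

# One second of dummy mono audio at 32 kHz, shaped (batch, channels, samples).
audio = torch.randn(1, 1, 32000)

with torch.inference_mode():
    codes = model.encode(audio)      # list of code tensors, one per temporal scale
    audio_hat = model.decode(codes)  # waveform reconstructed from the discrete codes

# Coarser scales carry fewer codes per second, which is what keeps the bitrate low.
for i, c in enumerate(codes):
    print(f"scale {i}: {tuple(c.shape)}")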
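
For the Acoustic BPE entry, a hypothetical sketch of the underlying idea: render each discrete codec token id as a unicode character, then train an ordinary BPE tokenizer over those strings so that frequent token patterns merge into longer units. The offset, toy corpus, and vocabulary size are illustrative assumptions, not this repository's code; only the `tokenizers` library calls are real.

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

OFFSET = 0x4E00  # map token ids onto printable CJK codepoints (arbitrary choice)

def ids_to_text(ids):
    """Render a sequence of discrete acoustic token ids as a string."""
    return "".join(chr(OFFSET + i) for i in ids)

# Toy corpus; in practice these sequences come from a neural audio codec.
corpus = [ids_to_text(seq) for seq in [[3, 3, 7, 7, 7, 1], [3, 3, 7, 1, 1]]]

# No pre-tokenizer: each sequence is one "word", so BPE merges adjacent codes.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
trainer = BpeTrainer(vocab_size=64, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer)

print(tokenizer.encode(corpus[0]).tokens)  # frequent code runs appear as merged units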