choijeongsoo


Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python · 65 stars · 1 fork · Updated Dec 3, 2024

Code for BLT research paper

Python · 797 stars · 44 forks · Updated Dec 12, 2024

Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models

Python · 63 stars · 2 forks · Updated Dec 18, 2024

✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Python · 211 stars · 11 forks · Updated Dec 18, 2024

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

349 stars · 19 forks · Updated Dec 16, 2024

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python · 848 stars · 55 forks · Updated Oct 28, 2024

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python · 1,687 stars · 200 forks · Updated Nov 6, 2024

This repository collects papers related to Speech Tokenizer.

15 stars · 1 fork · Updated Oct 16, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python · 8,073 stars · 1,027 forks · Updated Dec 18, 2024

Next-Token Prediction is All You Need

Python · 1,911 stars · 76 forks · Updated Oct 24, 2024

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python · 3,265 stars · 379 forks · Updated Dec 10, 2024

An Open-Sourced LLM-empowered Foundation TTS System

Python · 495 stars · 35 forks · Updated Oct 17, 2024

Real-time Speech-Text Foundation Model Toolkit (wip)

Python · 126 stars · 11 forks · Updated Oct 14, 2024

Zero-shot voice conversion & singing voice conversion, with real-time support

Python · 778 stars · 96 forks · Updated Dec 16, 2024

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python · 3,203 stars · 286 forks · Updated Nov 5, 2024

Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate

Python · 454 stars · 26 forks · Updated Nov 19, 2024

Implementation of Acoustic BPE (Shen et al., 2024), extended for RVQ-based Neural Audio Codecs

Python · 48 stars · 6 forks · Updated Sep 26, 2024

SoftVC VITS Singing Voice Conversion

Python · 26,128 stars · 4,863 forks · Updated Nov 11, 2023

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge (ICCV 2023)

Python · 4 stars · Updated Sep 3, 2024

[CVPR 2023] Official code for paper: Learning to Dub Movies via Hierarchical Prosody Models.

Python · 102 stars · 8 forks · Updated Jun 21, 2024

[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"

Python · 52 stars · 3 forks · Updated Nov 14, 2024

An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python · 7,724 stars · 767 forks · Updated Feb 11, 2024

Out of time: automated lip sync in the wild

Python · 691 stars · 153 forks · Updated Jan 23, 2024

Efficient synchronization from sparse cues

Python · 35 stars · 4 forks · Updated Apr 25, 2024

Implementation of Autoregressive Diffusion in PyTorch

Python · 322 stars · 9 forks · Updated Nov 3, 2024

Official code for the CVPR 2024 paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Jupyter Notebook · 65 stars · 10 forks · Updated Jun 12, 2024

[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jupyter Notebook · 94 stars · 5 forks · Updated Nov 19, 2024

PyTorch implementation of "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring" (CVPR 2023) and "Visual Context-driven Audio Feature Enhan…

Python · 14 stars · Updated Apr 3, 2024

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Python · 135 stars · 8 forks · Updated Jul 6, 2024