Stars
AlignNet: A Unifying Approach to Audio-Visual Alignment (WACV 2020)
State-of-the-art 2D and 3D Face Analysis Project
[PyTorch] Minimal codebase for MusicGen models
A neovim plugin for interactively running code with the jupyter kernel. Fork of magma-nvim with improvements in image rendering, performance, and more
Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"
Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"
Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)
[ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition"
Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.
Repo associated with the DESED dataset: download and creation of data
First base model for full-duplex conversational audio
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
This repository contains the Python implementation of a Sound Event Detection system working in real time.
A simple PyTorch library for Fréchet Audio Distance (FAD) calculation