Python 36 3 Updated Oct 9, 2024

Perceptual Quality Estimator for speech and audio

C++ 723 128 Updated Aug 2, 2024

AlignNet: A Unifying Approach to Audio-Visual Alignment (WACV 2020)

Python 32 4 Updated Jan 10, 2021

State-of-the-art 2D and 3D Face Analysis Project

Python 24,052 5,478 Updated Dec 5, 2024

[PyTorch] Minimal codebase for MusicGen models

Python 46 Updated Jan 7, 2025

A neovim plugin for interactively running code with the jupyter kernel. Fork of magma-nvim with improvements in image rendering, performance, and more

Python 687 36 Updated Jan 15, 2025

Pythonic bindings for FFmpeg's libraries.

Cython 2,615 375 Updated Jan 16, 2025

Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"

Python 30 Updated Dec 13, 2024

Implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Python 349 64 Updated Jul 21, 2024

Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

Python 39 5 Updated Apr 25, 2024

[ECCV 2024] Official PyTorch implementation of TC-CLIP "Leveraging Temporal Contextualization for Video Action Recognition"

Python 44 7 Updated Sep 26, 2024

Stable-V2A: Synthesis of Synchronized Sound Effect with Temporal and Semantic Controls

10 Updated Dec 20, 2024

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".

Jupyter Notebook 1,199 221 Updated May 21, 2023

[arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 969 103 Updated Jan 14, 2025

SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Python 191 5 Updated Dec 29, 2024

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 7,472 581 Updated Jan 17, 2025

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 111 2 Updated Dec 13, 2024

Versatile audio super resolution (any -> 48kHz) with AudioSR.

Python 1,258 130 Updated Jan 9, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 9,688 938 Updated Jan 15, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 8,991 1,194 Updated Jan 15, 2025

The Audio Set Ontology aims to provide a comprehensive set of categories to describe sound events.

655 152 Updated May 21, 2018

Repo associated to the DESED dataset, download and creation of data

Python 131 15 Updated Jul 16, 2024

Pretty fancy and modern terminal file manager

Go 8,721 200 Updated Jan 13, 2025

first base model for full-duplex conversational audio

Python 1,685 113 Updated Jan 5, 2025
Jupyter Notebook 19 Updated Dec 24, 2024

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Python 148 17 Updated Apr 23, 2024

This repository contains the Python implementation of a Sound Event Detection system working in real time.

Python 53 8 Updated Oct 10, 2022
Python 1 Updated Aug 18, 2024

A simple pytorch library for Fréchet Audio Distance (FAD) calculation

Python 4 1 Updated Dec 5, 2024