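AIGC-interview/CV-interview/LLMs-interview: a collection of interview questions and answers, which also includes new ideas, new questions, new resources, and new projects from work and research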

1,968 187 Updated Jan 13, 2025

The official repository for paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"

Python 1,058 46 Updated Jan 6, 2025

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Python 802 52 Updated Jan 22, 2025

Generative models for conditional audio generation

Python 2,864 281 Updated Jan 10, 2025

Efficient face emotion recognition in photos and videos

Jupyter Notebook 728 131 Updated Dec 18, 2024

The official repo for Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

19 Updated Jan 24, 2025

A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.

Python 143 18 Updated May 29, 2024

ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis

Python 120 7 Updated Sep 20, 2024

This project aims to collect the latest "call for reviewers" links from various top CS/ML/AI conferences/journals

727 21 Updated Jan 31, 2025

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,107 270 Updated Nov 5, 2024

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 146 9 Updated Jan 9, 2025

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

Python 288 17 Updated Jan 17, 2025

Grounded Language-Image Pre-training

Python 2,308 197 Updated Jan 24, 2024

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Python 732 52 Updated Mar 20, 2024

DeepEar: Sound Localization with Binaural Microphones

Python 8 4 Updated Mar 10, 2024

End-to-End binaural sound localization

Python 14 2 Updated Feb 27, 2020

Official code for SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Python 121 9 Updated Nov 9, 2024

A python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization” [TASLP 2021]

Python 23 Updated Feb 11, 2023

The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization [INTERSPEECH2023 & TASLP2024]

Python 102 11 Updated Dec 9, 2024

Real time emotion recognition

Python 1,113 367 Updated Aug 30, 2024

The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024

Python 31 1 Updated Dec 4, 2024

Code and data of We-Math

Python 125 9 Updated Jan 9, 2025

[ECCV 2024] The official code of paper "Open-Vocabulary SAM".

Python 924 31 Updated Jul 31, 2024
Python 6 Updated Oct 25, 2023

open soundstream-ish VAE codecs for downstream neural audio synthesis

Python 116 10 Updated Jun 12, 2023

Text-to-Audio/Music Generation

Python 2,364 183 Updated Sep 29, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 38,954 6,331 Updated Dec 9, 2024

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 21,214 2,743 Updated Aug 15, 2024

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,402 2,221 Updated Jan 15, 2025

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,580 313 Updated Jan 4, 2024