A generative world for general-purpose robotics & embodied AI learning.

Python 21,433 1,690 Updated Jan 3, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 121 5 Updated Dec 10, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 854 97 Updated Aug 7, 2024

open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.

Python 3,242 293 Updated Nov 5, 2024

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,036 61 Updated Jul 20, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

301 13 Updated Dec 23, 2024

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 43 2 Updated Dec 23, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 499 42 Updated Jul 26, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 272 32 Updated Aug 15, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,954 2,259 Updated Dec 27, 2024

The open source code for LLM-Codec

Python 118 5 Updated Aug 18, 2024

Community interface for generative AI

TypeScript 8,878 891 Updated Apr 30, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 90 5 Updated Jun 2, 2024

Implementation of Google's USM speech model in Pytorch

Python 27 4 Updated Nov 11, 2024

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 203 14 Updated Dec 23, 2024

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 374 65 Updated Aug 16, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,248 117 Updated Jul 11, 2024

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,556 311 Updated Jan 4, 2024

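MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.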

3,622 250 Updated Dec 17, 2024

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 170 19 Updated May 29, 2024

A purer Tokenizer with a higher compression rate

Python 465 23 Updated Nov 27, 2024
35 1 Updated Jan 28, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,269 823 Updated Jul 18, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 36,564 4,303 Updated Aug 19, 2024

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,371 344 Updated Nov 3, 2024

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

Jupyter Notebook 166 13 Updated Mar 25, 2024

PhotoMaker [CVPR 2024]

Jupyter Notebook 9,692 771 Updated Oct 31, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,349 847 Updated Jan 2, 2025