A generative world for general-purpose robotics & embodied AI learning.

Python 20,852 1,605 Updated Dec 31, 2024

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 121 5 Updated Dec 10, 2024

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 852 97 Updated Aug 7, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,236 291 Updated Nov 5, 2024

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,035 61 Updated Jul 20, 2024

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

298 13 Updated Dec 23, 2024

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 42 2 Updated Dec 23, 2024

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 497 42 Updated Jul 26, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 271 32 Updated Aug 15, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,927 2,256 Updated Dec 27, 2024

The open source code for LLM-Codec

Python 118 5 Updated Aug 18, 2024

Community interface for generative AI

TypeScript 8,879 890 Updated Apr 30, 2024

Official Implementation of EnCLAP (ICASSP 2024)

Python 90 5 Updated Jun 2, 2024

Implementation of Google's USM speech model in Pytorch

Python 27 4 Updated Nov 11, 2024

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 202 14 Updated Dec 23, 2024

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 374 65 Updated Aug 16, 2024

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,245 117 Updated Jul 11, 2024

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,553 309 Updated Jan 4, 2024

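MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.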

3,621 250 Updated Dec 17, 2024

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models

Python 170 19 Updated May 29, 2024

A purer tokenizer with a higher compression rate

Python 464 23 Updated Nov 27, 2024
35 1 Updated Jan 28, 2024

InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

Python 11,270 822 Updated Jul 18, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 36,551 4,300 Updated Aug 19, 2024

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,366 343 Updated Nov 3, 2024

Official code and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"

Jupyter Notebook 166 13 Updated Mar 25, 2024

PhotoMaker [CVPR 2024]

Jupyter Notebook 9,684 771 Updated Oct 31, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,320 844 Updated Dec 26, 2024