auzxb

😌

I may be slow to respond.

Interested in Machine Learning and Deep Learning. Focus on Speech Synthesis and NLP.

28 followers · 60 following

Shenzhen

Lists (1)

✨ Inspiration

1 repository

Stars

deepseek-ai / DeepSeek-V3

Python 14,633 1,027 Updated Jan 3, 2025

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 21,433 1,690 Updated Jan 3, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 121 5 Updated Dec 10, 2024

gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 854 97 Updated Aug 7, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,242 293 Updated Nov 5, 2024

gnobitab / RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Python 1,036 61 Updated Jul 20, 2024

showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

301 13 Updated Dec 23, 2024

sallymmx / m2clip

[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition

Python 43 2 Updated Dec 23, 2024

open-mmlab / FoleyCrafter

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝

Python 499 42 Updated Jul 26, 2024

showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 272 32 Updated Aug 15, 2024

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Python 22,954 2,259 Updated Dec 27, 2024

yangdongchao / LLM-Codec

The open source code for LLM-Codec

Python 118 5 Updated Aug 18, 2024

Stability-AI / StableStudio

Community interface for generative AI

TypeScript 8,878 891 Updated Apr 30, 2024

jaeyeonkim99 / EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)

Python 90 5 Updated Jun 2, 2024

kyegomez / USM

Implementation of Google's USM speech model in PyTorch

Python 27 4 Updated Nov 11, 2024

qiuqiangkong / audioset_tagging_cnn

Python 1,385 258 Updated Jul 25, 2024

sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 203 14 Updated Dec 23, 2024

RetroCirce / HTS-Audio-Transformer

The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Python 374 65 Updated Aug 16, 2024

descriptinc / descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1,248 117 Updated Jul 11, 2024

facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python 3,556 311 Updated Jan 4, 2024

esbatmop / MNBVC

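MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" internet slang. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki pages, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.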

3,622 250 Updated Dec 17, 2024