AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

zhenye234/xcodec

X-Codec

Unified Semantic and Acoustic Codec for Audio Language Model.

Paper

Title: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo*, Wei Xue*

Overview

Experiments on VALL-E

[Figure: VALL-E experiment results]

Available models

Each 🤗 links to the corresponding entry on the Hugging Face model hub.

| Model name | Hugging Face | Config | Semantic Model | Domain | Training Data |
| --- | --- | --- | --- | --- | --- |
| xcodec_hubert_librispeech | 🤗 | 🤗 | 🤗 Hubert-base | Speech | Librispeech |
| xcodec_wavlm_mls (not mentioned in paper) | 🤗 | 🤗 | 🤗 Wavlm-base-plus | Speech | MLS English |
| xcodec_wavlm_more_data (not mentioned in paper) | 🤗 | 🤗 | 🤗 Wavlm-base-plus | Speech | MLS English + Internal data |
| xcodec_hubert_general_audio | 🤗 | 🤗 | 🤗 Hubert-base-general-audio | General audio | 200k hours internal data |
| xcodec_hubert_general_audio_more_data (not mentioned in paper) | Coming Soon | 🤗 | 🤗 | General audio | More balanced data |

Inference

python inference.py

Training

torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py
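
The command above starts eight worker processes on a single node, which usually means one process per GPU. On a machine with a different GPU count, `--nproc-per-node` has to change to match. The sketch below builds the equivalent command for an arbitrary worker count. `build_torchrun_cmd` is a hypothetical helper, not part of this repository, and it assumes the training script needs no further command-line arguments, as in the command above.

```python
import shlex


def build_torchrun_cmd(num_gpus, script="main_launch_vqdp.py", nnodes=1):
    """Return the torchrun argument list for `num_gpus` worker processes per node.

    Hypothetical helper for illustration; the script name is taken from the
    training command in this README.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc-per-node={num_gpus}",
        script,
    ]


if __name__ == "__main__":
    # Print the command for an assumed 4-GPU machine; paste it into a shell to run.
    print(shlex.join(build_torchrun_cmd(4)))
```

The list form can also be passed directly to `subprocess.run` if the launch should happen from Python rather than from a shell.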

Acknowledgement

Special thanks to the authors of UniAudio and DAC; our codebase is largely borrowed from both projects.

Citation

If you find this repo helpful, please consider citing the paper:

@article{ye2024codecdoesmatterexploring,
      title={Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model}, 
      author={Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue},
      journal={arXiv preprint arXiv:2408.17175},
      year={2024},
}