Unified Semantic and Acoustic Codec for Audio Language Model.
Title: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo*, Wei Xue*
Each 🤗 icon links to the Hugging Face model hub.
| Model name | Hugging Face | Config | Semantic Model | Domain | Training Data |
|---|---|---|---|---|---|
| xcodec_hubert_librispeech | 🤗 | 🤗 | 🤗 Hubert-base | Speech | LibriSpeech |
| xcodec_wavlm_mls (not mentioned in paper) | 🤗 | 🤗 | 🤗 Wavlm-base-plus | Speech | MLS English |
| xcodec_wavlm_more_data (not mentioned in paper) | 🤗 | 🤗 | 🤗 Wavlm-base-plus | Speech | MLS English + Internal data |
| xcodec_hubert_general_audio | 🤗 | 🤗 | 🤗 Hubert-base-general-audio | General audio | 200k hours internal data |
| xcodec_hubert_general_audio_more_data (not mentioned in paper) | Coming Soon | 🤗 | 🤗 | General audio | More balanced data |
Inference:
python inference.py
Training (single node, 8 processes):
torchrun --nnodes=1 --nproc-per-node=8 main_launch_vqdp.py
Special thanks to the authors of UniAudio and DAC; our code base borrows heavily from both projects.
If you find this repo helpful, please consider citing it as follows:
@article{ye2024codecdoesmatterexploring,
title={Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model},
author={Zhen Ye and Peiwen Sun and Jiahe Lei and Hongzhan Lin and Xu Tan and Zheqi Dai and Qiuqiang Kong and Jianyi Chen and Jiahao Pan and Qifeng Liu and Yike Guo and Wei Xue},
journal={arXiv preprint arXiv:2408.17175},
year={2024},
}