This repository collects papers related to speech tokenizers.
- Moshi: a speech-text foundation model for real-time dialogue [arXiv] [code]
- BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec [arXiv] [code]
- X-Codec: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model [arXiv] [code]
- Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer [Interspeech]
- SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis [arXiv] [code]
- FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec [arXiv] [code]
- vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders [arXiv] [demo] [code]
- Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference [arXiv]
- NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization [arXiv]
- WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling [arXiv] [code]
- Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models [arXiv] [code]
- Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation [arXiv] [demo]
- Personalized neural speech codec [ICASSP]
- PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders [arXiv]
- HILCodec: High-Fidelity and Lightweight Neural Audio Codec [arXiv] [code]
- APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding [TASLP] [code]
- FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 [arXiv] [code]
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound [arXiv] [code]
- DM-Codec: Distilling Multimodal Representations for Speech Tokenization [openreview] [code]
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models [arXiv] [code]
- HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec [arXiv] [code]
- DAC: High-Fidelity Audio Compression with Improved RVQGAN [NIPS] [code]
- Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis [arXiv] [code]
- TiCodec: Fewer-token Neural Speech Codec with Time-invariant Codes [ICASSP] [code]
- AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec [ICASSP] [code]
- RepCodec: A Speech Representation Codec for Speech Tokenization [arXiv] [code]
- SoundStream: An End-to-End Neural Audio Codec [TASLP] [arXiv] [demo]
- Variable-rate discrete representation learning [arXiv]
- Vector-Quantized Autoregressive Predictive Coding [arXiv] [code]
- Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning [arXiv] [demo]
- vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations [arXiv]
- AcademiCodec: An Open Source Audio Codec Model for Academic Research [arXiv] [code]