This repository collects papers on speech tokenizers.

WWWWxp/Speech-Tokenizer-Papers

Speech-Tokenizer-Papers


Papers

Search by Chronological Order

2024

  • Moshi: a speech-text foundation model for real-time dialogue [arXiv] [code]

  • BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec [arXiv] [code]

  • X-Codec: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model [arXiv] [code]

  • Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer [Interspeech]

  • SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis [arXiv] [code]

  • FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec [arXiv] [code]

  • vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders [arXiv] [demo] [code]

  • Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference [arXiv]

  • NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization [arXiv]

  • WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling [arXiv] [code]

  • Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models [arXiv] [code]

  • Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation [arXiv] [demo]

  • Personalized neural speech codec [ICASSP]

  • PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders [arXiv]

  • HILCodec: High-Fidelity and Lightweight Neural Audio Codec [arXiv] [code]

  • APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding [TASLP] [code]

  • FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 [arXiv] [code]

  • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound [arXiv] [code]

  • DM-Codec: Distilling Multimodal Representations for Speech Tokenization [openreview] [code]

2023

  • SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models [arXiv] [code]

  • HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec [arXiv] [code]

  • DAC: High-Fidelity Audio Compression with Improved RVQGAN [NeurIPS] [code]

  • Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis [arXiv] [code]

  • TiCodec: Fewer-token Neural Speech Codec with Time-invariant Codes [ICASSP] [code]

  • AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec [ICASSP] [code]

  • RepCodec: A Speech Representation Codec for Speech Tokenization [arXiv] [code]

2022

  • EnCodec: High Fidelity Neural Audio Compression [arXiv] [code]

2021

  • SoundStream: An End-to-End Neural Audio Codec [TASLP] [arXiv] [demo]
  • Variable-rate discrete representation learning [arXiv]
  • Vector-Quantized Autoregressive Predictive Coding [arXiv] [code]

2019

  • Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning [arXiv] [demo]
  • vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations [arXiv]

Search by Method Category

Speed/Compression

  • BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec [arXiv] [code]
  • Low Bitrate High-Quality RVQGAN-based Discrete Speech Tokenizer [Interspeech]
  • Low Frame-rate Speech Codec: a Codec Designed for Fast High-quality Speech LLM Training and Inference [arXiv]
  • WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling [arXiv] [code]
  • Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models [arXiv] [code]
  • Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation [arXiv] [demo]
  • RepCodec: A Speech Representation Codec for Speech Tokenization [arXiv] [code]
  • HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec [arXiv] [code]
  • AcademiCodec: An Open Source Audio Codec Model for Academic Research [arXiv] [code]

Quality

  • APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding [TASLP] [code]
  • PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders [arXiv]
  • DAC: High-Fidelity Audio Compression with Improved RVQGAN [NeurIPS] [code]
  • Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis [arXiv] [code]
  • HILCodec: High-Fidelity and Lightweight Neural Audio Codec [arXiv] [code]
  • Moshi: a speech-text foundation model for real-time dialogue [arXiv] [code]
  • AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec [ICASSP] [code]
  • NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization [arXiv]
  • vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders [arXiv] [demo] [code]

Integrating Semantic Information

  • SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models [arXiv] [code]
  • X-Codec: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model [arXiv] [code]
  • RepCodec: A Speech Representation Codec for Speech Tokenization [arXiv] [code]
  • SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound [arXiv] [code]
  • FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec [arXiv] [code]
  • SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis [arXiv] [code]

Feature Disentanglement

  • FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3 [arXiv] [code]
  • SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models [arXiv] [code]
  • TiCodec: Fewer-token Neural Speech Codec with Time-invariant Codes [ICASSP] [code]
  • DM-Codec: Distilling Multimodal Representations for Speech Tokenization [openreview] [code]
