textlesslib

Textless NLP is an active area of research that aims to extend NLP techniques (and tools!) to work directly on spoken language. By using discrete speech representations learnt in a self-supervised way, the area promises to unlock interesting NLP applications for languages without a written form and for facets of spoken language that are inaccessible to text-based approaches, e.g. prosody. To learn more, please check some of the papers.

textlesslib is a library that aims to facilitate research in Textless NLP. The goal of the library is to speed up the research cycle and lower the learning curve for those who want to start. We provide highly configurable, off-the-shelf tools to encode speech as sequences of discrete values and tools to decode such streams back into the audio domain. A high-level description of the library can also be found in our paper [arxiv].

import torchaudio
from textless.data.speech_encoder import SpeechEncoder

dense_model_name = "hubert-base-ls960"
quantizer_name, vocab_size = "kmeans", 100
input_file = "input.wav"

# now let's load an audio example
waveform, sample_rate = torchaudio.load(input_file)

# We can build a speech encoder module using names of pre-trained
# dense and quantizer models.  The call below will download
# appropriate checkpoints as needed behind the scenes. We can
# also construct an encoder by directly passing model instances
encoder = SpeechEncoder.by_name(
    dense_model_name=dense_model_name,
    quantizer_model_name=quantizer_name,
    vocab_size=vocab_size,
    deduplicate=True,
).cuda()


# now convert the waveform into a stream of deduplicated units (as in GSLM)
encoded = encoder(waveform.cuda())
# encoded is a dict with keys ('dense', 'units', 'durations').
# It can also contain 'f0' if SpeechEncoder was initialized
# with need_f0=True flag.
units = encoded["units"]  # tensor([71, 12, 57, ...], ...)
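To make the effect of deduplicate=True more concrete, here is a minimal plain-Python sketch of run-length deduplication. It is an illustration only, not the library's implementation: the function name `deduplicate`, the example unit ids, and the reading of 'durations' as the number of consecutive frames per unit are all assumptions made for this sketch.

```python
from itertools import groupby


def deduplicate(units):
    """Collapse runs of repeated units; return (units, durations).

    Hypothetical helper for illustration, not part of textlesslib.
    """
    deduped, durations = [], []
    for unit, run in groupby(units):
        deduped.append(unit)
        # length of the run of identical consecutive units
        durations.append(sum(1 for _ in run))
    return deduped, durations


# made-up frame-level unit ids, one per encoder frame
frame_units = [71, 71, 71, 12, 12, 57, 71]
dedup_units, dedup_durations = deduplicate(frame_units)
print(dedup_units)      # [71, 12, 57, 71]
print(dedup_durations)  # [3, 2, 1, 1]
```

Note that only consecutive repeats are merged: the unit 71 appears again at the end because it is separated from the first run.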

Now the units can be converted back into the audio domain:

from textless.vocoders.tacotron2.vocoder import TacotronVocoder

output_file = "output.wav"  # illustrative output path

# as with the encoder, we can set up the vocoder by passing checkpoints
# directly or by specifying the expected format by the names
# of dense and quantizer models (these models themselves
# won't be loaded)
vocoder = TacotronVocoder.by_name(
    dense_model_name,
    quantizer_name,
    vocab_size,
).cuda()

# now we turn those units back into audio
audio = vocoder(units)

# save the audio
torchaudio.save(output_file, audio.cpu().float().unsqueeze(0), vocoder.output_sample_rate)

Dataset helpers

Below is an example of using a textless view of the LibriSpeech dataset:

from textless.data.quantized_datasets import QuantizedLibriSpeech

encoder = SpeechEncoder.by_name(
    dense_model_name=dense_model_name,
    quantizer_model_name=quantizer_name,
    vocab_size=vocab_size,
    deduplicate=True,
).cuda()

# existing_root and url must be defined beforehand: the dataset
# root directory and the LibriSpeech subset to load
quantized_dataset = QuantizedLibriSpeech(
    root=existing_root, speech_encoder=encoder, url=url)

datum = quantized_dataset[0]
sample_rate, utterance, speaker_id, chapter_id, utterance_id = datum['rest']
# datum['units'] = tensor([71, 12, 63, ...])

In the probing example we illustrate how such a dataset can be used with a standard PyTorch dataloader in a scalable manner.
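Because unit sequences differ in length from utterance to utterance, batching them usually requires padding. The sketch below shows that idea in plain Python. It is not taken from the probing example: the name `pad_collate` and the padding value -1 are assumptions chosen for illustration.

```python
def pad_collate(batch, pad_value=-1):
    """Pad variable-length unit sequences to the longest one in the batch.

    Hypothetical helper for illustration, not part of textlesslib.
    Returns the padded sequences and the original lengths.
    """
    lengths = [len(units) for units in batch]
    max_len = max(lengths)
    padded = [
        list(units) + [pad_value] * (max_len - len(units))
        for units in batch
    ]
    return padded, lengths


# made-up unit sequences of different lengths
batch = [[71, 12, 63], [5, 9], [40]]
padded, lengths = pad_collate(batch)
print(padded)   # [[71, 12, 63], [5, 9, -1], [40, -1, -1]]
print(lengths)  # [3, 2, 1]
```

When working with tensors, torch.nn.utils.rnn.pad_sequence performs the same kind of padding, and a function like this can be adapted and passed as collate_fn to a DataLoader.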

Data preprocessing

We also provide a multi-GPU/multi-node preprocessing tool for the cases where on-the-fly processing of audio should be avoided.
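As a rough illustration of how offline preprocessing can be spread over several workers, the sketch below assigns every world_size-th file to each worker. This is an assumption about one common partitioning scheme, not a description of how the provided tool splits its input; `shard`, `rank`, `world_size`, and the file names are hypothetical.

```python
def shard(items, rank, world_size):
    """Return the subset of `items` that worker `rank` should process.

    Hypothetical helper for illustration, not part of textlesslib.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # strided slicing gives disjoint shards that together cover all items
    return items[rank::world_size]


# made-up file names
files = [f"utt_{i}.wav" for i in range(7)]
for rank in range(3):
    print(rank, shard(files, rank, world_size=3))
```

Each file lands in exactly one shard, so workers can run independently and their outputs can be concatenated afterwards.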

Provided models

We provide implementations and pre-trained checkpoints for the following models:

Dense representations: HuBERT-base (trained on LibriSpeech 960h) and CPC (trained on a 6Kh subset of LibriLight);
Quantizers: k-means quantizers with vocabulary sizes of 50, 100, and 200 for both dense models (trained on LibriSpeech 960h);
Decoders: Tacotron2 models for all (dense model x quantizer) combinations (trained on LJSpeech).

Finally, the pitch extraction is done via YAAPT.

Testing

We use pytest (pip install pytest pytest-xdist). Our unit tests are located in the tests directory:

cd tests && pytest -n 8

Citing textless-lib

If you find textless-lib useful in your research, please consider citing our work:

@article{Kharitonov2022,
      title={textless-lib: a Library for Textless Spoken Language Processing}, 
      author={Eugene Kharitonov and Jade Copet and Kushal Lakhotia and Tu Anh Nguyen and Paden Tomasello and Ann Lee and Ali Elkahky and Wei-Ning Hsu and Abdelrahman Mohamed and Emmanuel Dupoux and Yossi Adi},
      year={2022},
      eprint={2202.07359},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Licence

textlesslib is licensed under the MIT license; the text of the license can be found in the LICENSE file. Internally, it uses:

WaveGlow - licensed under BSD-3-Clause license;
tacotron implementation - licensed under MIT license;
tacotron2 implementation - licensed under BSD-3-Clause license;
STFT implementation - licensed under BSD-3-Clause license.
