This directory contains a simple torch-to-gguf conversion script for the Parler TTS Mini or Parler TTS Large model.
Please note that the model encoding pattern used here is extremely naive and subject to further development (especially in order to align its pattern with the gguf patterns used in llama.cpp and whisper.cpp).
In order to run the installation and conversion script you will need python3 and pip3 installed locally.
All requirements can be installed via pip:
pip3 install -r requirements.txt
The gguf conversion script can be run locally via the convert_parler_tts_to_gguf file like so:
python3 ./convert_parler_tts_to_gguf --save-path ./parler-tts-large.gguf --voice-prompt "female voice" --large-model
The command accepts --save-path, which describes where to save the gguf model file; the flag --large-model, which when passed encodes Parler-TTS Large (rather than Mini); and --voice-prompt, which is a sentence or statement describing how the model's voice should sound at generation time.
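As a rough sketch, the converter's command-line interface described above could be built with argparse as follows. The flag names match the README, but the default voice prompt, help strings, and overall structure are illustrative assumptions, not the script's actual implementation:

```python
import argparse

def build_arg_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the converter's CLI; flag names are
    # taken from the README, everything else is assumed.
    parser = argparse.ArgumentParser(
        description="Convert a Parler TTS torch checkpoint to gguf")
    parser.add_argument("--save-path", type=str, required=True,
                        help="where to write the gguf model file")
    parser.add_argument("--voice-prompt", type=str, default="female voice",
                        help="sentence describing how the voice should sound")
    parser.add_argument("--large-model", action="store_true",
                        help="encode Parler-TTS Large rather than Mini")
    return parser

if __name__ == "__main__":
    # Parse the example invocation from this README.
    args = build_arg_parser().parse_args(
        ["--save-path", "./parler-tts-large.gguf",
         "--voice-prompt", "female voice", "--large-model"])
    print(args.save_path, args.large_model)
```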
The Parler TTS model is trained to alter how it generates audio tokens by cross-attending against a text prompt encoded via google/flan-t5-large,
a T5 encoder model. In order to avoid this encoding step on the ggml side, this converter generates the prompt's associated hidden states ahead of time and encodes them directly into the gguf model file.
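Conceptually, that precomputation step looks like the sketch below. The toy tokenizer and random embedding stand in for google/flan-t5-large and the real converter's internals; only the hidden size (1024, flan-t5-large's d_model) is taken from the real model, and all other names and shapes are illustrative assumptions:

```python
import numpy as np

HIDDEN_DIM = 1024  # flan-t5-large's hidden size; everything else here is a toy value

def toy_tokenize(prompt: str) -> list:
    # Stand-in for the real T5 tokenizer: one "token" id per word.
    vocab = {}
    return [vocab.setdefault(w, len(vocab)) for w in prompt.split()]

def precompute_hidden_states(prompt: str) -> np.ndarray:
    # Stand-in for running the T5 encoder: embed each token id into a
    # fixed hidden vector. The real converter would run the encoder and
    # keep its last hidden state, of shape (n_tokens, hidden_dim).
    rng = np.random.default_rng(0)
    embedding = rng.standard_normal((32, HIDDEN_DIM)).astype(np.float32)
    ids = toy_tokenize(prompt)
    return embedding[ids]

# The resulting array is what gets baked into the gguf file as a tensor,
# so no T5 encoding is needed at generation time.
hidden = precompute_hidden_states("female voice")
print(hidden.shape)  # (2, 1024)
```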
If you would like to alter the voice prompt used with Parler TTS on the fly, you will need to prepare the text encoder model (a T5 encoder) in the gguf format. This can be accomplished by running convert_t5_encoder_to_gguf
from this directory:
python3 ./convert_t5_encoder_to_gguf --save-path ./t5-encoder-large.gguf --large-model
To use this model alongside the Parler TTS model, see the CLI README for information on conditional generation.