Note
All models are from the repository: snakers4/silero-models
Language | Model | Speakers |
---|---|---|
Russian | v4_ru | 5: aidar, baya, kseniya, xenia, eugene |
Ukrainian | v4_ua | 1: mykyta |
Uzbek | v4_uz | 1: dilnavoz |
English | v3_en | 118: en_0, en_1, ..., en_117 |
Spanish | v3_es | 3: es_0, es_1, es_2 |
French | v3_fr | 6: fr_0, fr_1, fr_2, fr_3, fr_4, fr_5 |
German | v3_de | 5: bernd_ungerer, eva_k, friedrich, hokuspokus, karlsson |
Tatar | v3_tt | 1: dilyara |
Mongolian | v3_xal | 2: erdni, delghir |
All languages support the following sample rates: 8000, 24000 and 48000 Hz
Important
This requires Docker to be installed and the Docker daemon to be running
docker run --rm -p 8000:8000 twirapp/silero-tts-api-server
Build and run from local repository
Clone the repository:
git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server
Build docker image:
docker build -f docker/Dockerfile -t silero-tts-api-server .
Run the container:
docker run --rm -p 8000:8000 silero-tts-api-server
Or use docker compose:
docker-compose -f docker/compose.yml up
Important
Python 3.9 or later is required
This project uses rye for dependency management and assumes you have it installed
- Clone the repository:
  git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server
- Install dependencies. This automatically creates a virtual environment in the .venv directory and installs the required dependencies:
  rye sync
  (Not recommended) Alternative install via pip. Create and activate a virtual environment:
  python3 -m venv .venv && source .venv/bin/activate
  Remove line 10 (-e file:.) from the requirements.lock file, then run:
  pip3 install -r requirements.lock
- Download the Silero TTS models:
  bash ./install_models.sh
- Run the server:
  litestar run
Note
By default the server listens on localhost:8000
The server also provides automatically generated documentation based on OpenAPI.
- GET /generate - Generate audio in WAV format from text. Parameters: text, speaker, sample_rate
- GET /speakers - Get the list of speakers
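The endpoints above can be called from any HTTP client. The following is a minimal sketch in Python using only the standard library. It assumes the parameters are passed as query-string parameters and that the server is running on the default localhost:8000. The helper names `build_generate_url` and `synthesize_to_file`, and the default sample rate of 48000 used here, are illustrative choices and not part of the project.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address of a locally running server
SUPPORTED_SAMPLE_RATES = (8000, 24000, 48000)  # sample rates listed in this README


def build_generate_url(text, speaker, sample_rate=48000, base_url=BASE_URL):
    """Build a GET /generate URL (assumes query-string parameters)."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    query = urlencode({"text": text, "speaker": speaker, "sample_rate": sample_rate})
    return f"{base_url}/generate?{query}"


def synthesize_to_file(text, speaker, path, sample_rate=48000):
    """Request audio from a running server and write the response bytes to path."""
    with urlopen(build_generate_url(text, speaker, sample_rate)) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return path
```

With the server running, `synthesize_to_file("Hello world", "en_0", "hello.wav")` would save the generated audio to `hello.wav`. Checking the sample rate on the client side avoids a round trip for a value the server is not expected to accept.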
- TEXT_LENGTH_LIMIT - Maximum length of the text to be processed. Default is 930 characters.
- MKL_NUM_THREADS - Number of threads to use for generating audio. Default number of threads: number of CPU cores.
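Because of the text length limit, a client that needs to synthesize longer text has to send it in several requests. The sketch below shows one possible way to split text on whitespace into chunks that stay within the default limit of 930 characters. The function `split_text` is hypothetical and not part of the project, and it assumes the server counts characters the same way Python's `len` does.

```python
DEFAULT_TEXT_LENGTH_LIMIT = 930  # default limit stated in this README


def split_text(text, limit=DEFAULT_TEXT_LENGTH_LIMIT):
    """Split text into chunks of at most `limit` characters, breaking on whitespace."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size slices.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the text parameter of a separate request. Note that this sketch collapses runs of whitespace, including newlines, into single spaces.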
This repository is dedicated to twir.app and is designed to meet its requirements.
TwirApp needs to generate audio using the CPU. If support for other devices such as cuda or mps is needed, please open an issue.