Vocalab is a real-time Text-to-Speech (TTS) API built using FastAPI, XTTSv2, and WaveGlow. This service allows you to convert text input into high-quality speech audio.
- Real-time Text-to-Speech (TTS) conversion
- API built with FastAPI
- Uses XTTSv2 for text-to-mel-spectrogram conversion
- Uses WaveGlow for mel-spectrogram-to-audio synthesis
- Applies a denoiser to improve audio quality
- Python 3.7+
- pip: Python package installer
Install the required Python packages:
pip install fastapi uvicorn torch torchvision torchaudio pydantic soundfile librosa numpy
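To confirm that the installation succeeded before starting the server, a short check along the following lines may help. It is a minimal sketch that assumes each package's import name matches the name passed to pip above; the helper `missing_packages` is illustrative and not part of Vocalab.

```python
import importlib.util

# Packages from the pip command above. Assumption: import name == pip name.
REQUIRED = [
    "fastapi", "uvicorn", "torch", "torchvision", "torchaudio",
    "pydantic", "soundfile", "librosa", "numpy",
]


def missing_packages(names):
    """Return the names that cannot be found on the current Python path."""
    # find_spec locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

An empty result only shows that the packages can be located, not that their versions are compatible with each other.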
- Clone the repository:
git clone https://github.com/yourusername/vocalab.git
cd vocalab
- Save the provided script as app.py.
- Run the FastAPI server using uvicorn:
uvicorn app:app --host 0.0.0.0 --port 8000
Your API will be available at http://127.0.0.1:8000.
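Loading the models can take a while, so the port may not accept connections immediately after the command is issued. One way to wait for it from a script is sketched below, using only the standard library. The helper names `is_port_open` and `wait_for_server` are illustrative, and the host and port are taken from the uvicorn command above.

```python
import socket
import time


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_server(host="127.0.0.1", port=8000, attempts=30, delay=1.0):
    """Poll the port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if is_port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

A successful connection shows only that something is listening on the port. It does not prove that the models have finished loading.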
This endpoint accepts a JSON payload containing the text to be synthesized and returns a JSON response whose audio field holds the synthesized audio bytes.
- Request Body:
{
"text": "Hello, world!"
}
- Response:
{
"audio": "AUDIO_BYTES"
}
Use test.py to make a request to the API using Python.
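The contents of test.py are not reproduced here, so the following is only a sketch of what such a client could look like, using the standard library. Two things in it are assumptions rather than facts about Vocalab: the route `/synthesize` is a hypothetical placeholder for whatever path app.py actually defines, and the `audio` field is assumed to be base64-encoded, since raw bytes cannot be carried directly in JSON.

```python
import base64
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"
ENDPOINT_PATH = "/synthesize"  # hypothetical: replace with the route in app.py


def build_request(text, base_url=BASE_URL, path=ENDPOINT_PATH):
    """Build a POST request carrying the documented {"text": ...} payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_audio(response_body):
    """Extract audio bytes from the documented {"audio": ...} response."""
    payload = json.loads(response_body)
    # Assumption: the server base64-encodes the audio before returning it.
    return base64.b64decode(payload["audio"])


def synthesize(text, timeout=60.0):
    """Send the text to the API and return the decoded audio bytes."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return decode_audio(resp.read())
```

With the server running, calling `synthesize("Hello, world!")` and writing the returned bytes to a file opened in binary mode would save the audio. The file format depends on what the server produces.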
Contributions are welcome! Please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License. See the LICENSE file for details.