Showing 3 changed files with 54 additions and 1 deletion.
# viXTTS Demo

viXTTS is a text-to-speech tool that offers voice cloning in Vietnamese and other languages. The model is fine-tuned from [XTTS-v2.0.3](https://huggingface.co/coqui/XTTS-v2) on the [viVoice](https://huggingface.co/datasets/capleaf/viVoice) dataset. This repository is primarily intended for inference.

The model is available at: [viXTTS on Hugging Face](https://huggingface.co/capleaf/viXTTS)

## Online Usage (Recommended)

For a quick demonstration, open [this notebook](./viXTTS_Demo.ipynb) on Google Colab.

![viXTTS Colab Demo](assets/vixtts_colab.png)

## Local Usage

This code is designed to run on Ubuntu or WSL2. macOS and Windows are not supported (support may be added later).

![viXTTS Gradio Demo](assets/vixtts_gradio_ui.png)

### Hardware Recommendations

- At least 10GB of free disk space
- At least 16GB of RAM
- **Nvidia GPU** with at least 4GB of VRAM
- By default, the model uses the GPU. Without a GPU, it falls back to the CPU and runs much slower.

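The recommendations above can be checked before installing anything. The following is a minimal sketch, not part of this repository: the helper names `kib_to_gib` and `check_min` are hypothetical, and it assumes a Linux or WSL2 shell where `df`, `awk`, and `/proc/meminfo` are available. The GPU check only runs if `nvidia-smi` is on the `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert KiB to GiB, rounded to the nearest integer.
kib_to_gib() {
  echo $(( ($1 + 524288) / 1048576 ))
}

# Hypothetical helper: compare a measured value against a recommended minimum.
# Usage: check_min LABEL ACTUAL RECOMMENDED
check_min() {
  case $2 in
    ''|*[!0-9]*) echo "SKIP $1: could not read a value"; return 0 ;;
  esac
  if [ "$2" -ge "$3" ]; then
    echo "OK $1: $2 (recommended >= $3)"
  else
    echo "WARN $1: $2 (recommended >= $3)"
  fi
}

# Free disk space in the current directory (4th column of POSIX df output).
disk_kib=$(df -Pk . | awk 'NR==2 {print $4}')
check_min "free disk GiB" "$(kib_to_gib "$disk_kib")" 10

# Total RAM, read from /proc/meminfo when it exists (Linux and WSL2).
if [ -r /proc/meminfo ]; then
  ram_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  check_min "RAM GiB" "$(kib_to_gib "$ram_kib")" 16
fi

# VRAM of the first Nvidia GPU in MiB, if the driver tools are installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  vram_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | head -n 1 | tr -d '[:space:]')
  check_min "GPU VRAM MiB" "$vram_mib" 4096
else
  echo "No nvidia-smi found: inference would run on the CPU and be much slower."
fi
```

The script only prints `OK`, `WARN`, or `SKIP` lines and never aborts, since these are recommendations rather than hard requirements.
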
### Required Software

- Git
- Python 3.9 to 3.11 (>=3.9 and <=3.11). The default is 3.11, but you can change the Python version in the `run.sh` file.

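A quick way to confirm both requirements is sketched below. It is an illustration only: `python_version_supported` is a hypothetical helper, and the script assumes the interpreter of interest is the `python3` found on the `PATH`, which may differ from the one `run.sh` actually selects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a "major.minor[.patch]" version string
# falls within the supported 3.9 to 3.11 range.
python_version_supported() {
  local major rest minor
  major=${1%%.*}
  rest=${1#*.}
  minor=${rest%%.*}
  [ "$major" -eq 3 ] && [ "$minor" -ge 9 ] && [ "$minor" -le 11 ]
}

# Report on Git without aborting the shell.
if command -v git >/dev/null 2>&1; then
  echo "git: found"
else
  echo "git: missing"
fi

# Report on the python3 found on PATH (an assumption, see above).
if command -v python3 >/dev/null 2>&1; then
  version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  if python_version_supported "$version"; then
    echo "python3 $version: supported"
  else
    echo "python3 $version: outside the 3.9 to 3.11 range"
  fi
else
  echo "python3: missing"
fi
```
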
### Usage

```bash
git clone https://github.com/thinhlpg/vixtts-demo
cd vixtts-demo
./run.sh
```

1. Run `run.sh` (dependencies are installed automatically on the first run).
2. Open the Gradio demo link.
3. Load the model and wait for it to finish loading.
4. Run inference and enjoy 🤗
5. The result is saved in `output/`.

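To locate the most recent result after step 5, a small helper along these lines may be handy. This is a sketch: `latest_output` is a hypothetical function, not part of the repository, and it makes no assumption about the names or format of the generated files beyond their living in `output/`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified entry in a directory.
# Defaults to the output/ directory mentioned in the steps above.
latest_output() {
  local dir=${1:-output}
  if [ ! -d "$dir" ]; then
    echo "no such directory: $dir" >&2
    return 1
  fi
  # ls -t sorts by modification time, newest first.
  ls -t "$dir" | head -n 1
}
```

Calling `latest_output` from the repository root prints the newest file name in `output/`. Passing a path, as in `latest_output some/other/dir`, inspects that directory instead.
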
## Acknowledgements

We would like to thank all the libraries and resources that played a role in the development of this demo, especially:

- [Coqui TTS](https://github.com/coqui-ai/TTS) for the XTTS foundation model and inference code
- [Vinorm](https://github.com/v-nhandt21/Vinorm) and [Underthesea](https://github.com/undertheseanlp/underthesea) for Vietnamese text normalization
- [DeepSpeed](https://github.com/microsoft/DeepSpeed) for fast inference
- [Hugging Face Hub](https://huggingface.co/) for hosting the model
- [Gradio](https://www.gradio.app/) for the web UI

## Contact

- You can message me directly on Facebook: <https://fb.com/thinhlpg/> (preferred 🤗)
- GitHub: <https://github.com/thinhlpg>
- Email: <[email protected]> or <[email protected]>