High Quality Sinhala dataset for Text to speech algorithm training - specially designed for deep learning algorithms

DanojaDias/sinhala-tts-dataset

 
 

Path Nirvana Sinhala TTS Dataset

A high-quality Sinhala dataset for training text-to-speech (TTS) models, designed specifically for deep learning algorithms.

Publicly available TTS datasets of sufficient length are currently lacking for the Sinhala language. This dataset, which has 3300 sentences with 7.5 hours of recordings, will help mitigate this problem. These single-voice recordings were made during the first quarter of 2021. Some effort has been taken to capture most of the rarely used syllables in the Sinhala language, especially those with Sanskrit and Pali origins. There could be some errors (a recording not matching its prompt). These should be rare, but if you find any, feel free to contribute to this repository.

How to use

  • The split-flac folder contains the original recordings in lossless FLAC compression format
  • The final folder contains the same files in WAV format, which is the format most frequently used in machine learning pipelines
  • The file-mappings.json file contains the prompt, the file name of the recording and the length of the recording for all 3300 recordings
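
As a rough illustration of how the mapping file might be read for training, here is a minimal Python sketch. The exact schema of file-mappings.json is not described here, so the key names `fileName`, `prompt` and `length`, the assumption that lengths are in seconds, and the helper names `load_mappings`, `total_hours` and `prompt_pairs` are all hypothetical. Check the real file and adjust them before use.

```python
import json


def load_mappings(path):
    """Load the mapping file and return a list of record dicts.

    Assumption: the file is either a JSON array of objects, or a JSON
    object whose values are such objects. Verify against the real file.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def total_hours(records, length_key="length"):
    """Sum the recording lengths (assumed to be seconds) and return hours."""
    return sum(float(r[length_key]) for r in records) / 3600.0


def prompt_pairs(records, name_key="fileName", prompt_key="prompt"):
    """Return (file name, prompt) tuples, the usual input for TTS training."""
    return [(r[name_key], r[prompt_key]) for r in records]
```

If the assumptions hold, calling `total_hours(load_mappings("file-mappings.json"))` should give a figure close to the 7.5 hours stated above, which is a quick sanity check before training.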

See the LICENSE.txt file for license information (GPL).

If you have any questions or want to share your TTS training results, contact me.

Copyright 2021 Path Nirvana Foundation ([email protected])
