DeepSpeech is an open-source Speech-To-Text engine that uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.
Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.
For the Quran workflow, dataset, and model release, see the data/quran folder.
Thanks to Omer Asif, a Jupyter notebook (.ipynb) is available on Google Colab. Feel free to tune it, reproduce our work, and share it.
As the workflow describes, the engine is built in two steps:
- Step 1: Imam-only dataset
  WER: 0.056551, CER: 0.039540, loss: 24.844383
- Step 2: Imam + filtered users dataset
  WER: 0.099118, CER: 0.065586, loss: 39.312599
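WER (word error rate) and CER (character error rate) are edit-distance metrics: the minimum number of substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the reference length in words or characters. The sketch below shows how such scores are commonly computed, assuming plain Levenshtein distance. The function names are illustrative, and DeepSpeech's own evaluation code may normalize text or aggregate over a test set differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] = distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # delete r from the reference
                          curr[j - 1] + 1,      # insert h from the hypothesis
                          prev[j - 1] + cost)   # substitution (or match if cost == 0)
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    print(round(wer(ref, hyp), 3))  # 0.333 (1 substitution + 1 deletion over 6 words)
    print(round(cer(ref, hyp), 3))  # 0.227 (5 character edits over 22 characters)
```

If the reported scores follow this definition, the Step 1 WER of 0.056551 corresponds to roughly 5.7 word errors per 100 reference words, and the Step 2 WER of 0.099118 to roughly 9.9.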