-
This repository is part of my participation in the Hugging Face fine-tuning week for XLSR Wav2Vec2, using the Common Voice Corpus 4 Arabic dataset.
-
The mini_arabic.ipynb notebook contains all data preprocessing and training steps.
-
The evaluation.ipynb notebook contains testing steps.
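
As a rough illustration of what such testing usually measures, here is a minimal sketch of word error rate (WER), assuming the evaluation compares predicted transcriptions against reference transcriptions word by word. The function name `word_error_rate` is hypothetical and is not taken from the notebook, which may use a library metric instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)
```

A lower value is better; 0.0 means the prediction matches the reference exactly.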
-
Download the model from the Hugging Face model hub
https://huggingface.co/anas/wav2vec2-large-xlsr-arabic
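
The model can also be loaded directly in code. The sketch below assumes the `transformers`, `torch`, and `torchaudio` packages are installed and that the checkpoint expects 16 kHz audio, as is usual for Wav2Vec2 models. The `transcribe` helper and its `path` argument are hypothetical names used only for illustration.

```python
MODEL_ID = "anas/wav2vec2-large-xlsr-arabic"


def transcribe(path: str) -> str:
    """Transcribe one audio file with the fine-tuned model (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies until it is actually called.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

    # Load the audio and resample to 16 kHz (assumed input rate).
    waveform, sample_rate = torchaudio.load(path)
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

    # Use the first channel only; the model takes mono input.
    inputs = processor(
        waveform[0].numpy(),
        sampling_rate=16_000,
        return_tensors="pt",
        padding=True,
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: pick the most likely token at each frame.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` on a local recording (the file name is a placeholder) downloads the checkpoint on first use and returns the predicted text.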
-
Download the Common Voice dataset
https://commonvoice.mozilla.org/en/datasets
-
Sprint announcement
https://discuss.huggingface.co/t/open-to-the-community-xlsr-wav2vec2-fine-tuning-week-for-low-resource-languages/4467
-
Additional info about the event
https://github.com/huggingface/transformers/blob/master/examples/research_projects/wav2vec2/FINE_TUNE_XLSR_WAV2VEC2.md
-
Preprocessing the transcriptions
https://github.com/saobou/arabic-text-preprocessing/blob/master/Preprocess.ipynb
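
For a sense of what transcription preprocessing for Arabic can involve, here is a minimal sketch that strips diacritics, tatweel, and common punctuation. This is an assumption about typical cleanup steps, not a reproduction of the linked notebook; the name `normalize_transcription` and the exact character sets are illustrative.

```python
import re

# Arabic short-vowel marks (tashkeel) plus the superscript alef.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida) is a purely decorative elongation character.
TATWEEL = "\u0640"
# Arabic comma, semicolon, question mark, and common Latin punctuation.
PUNCTUATION = re.compile(r"[\u060C\u061B\u061F.,!?;:\"'()\-]")


def normalize_transcription(text: str) -> str:
    """Return text without diacritics, tatweel, or punctuation."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    text = PUNCTUATION.sub(" ", text)
    # Collapse the whitespace left behind by removed punctuation.
    return " ".join(text.split())
```

Applying the same normalization to both the training labels and the evaluation references keeps the model's vocabulary small and the comparison consistent.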