Repository for Sonny George and Chris Tam's COSI136 (Brandeis University) project: the assembly of a 22+ hour Papiamentu ASR corpus and the subsequent fine-tuning of whisper-tiny.
- Corpus writeup: ASR_Papiamentu_Corpus_Report.pdf
- Fine-tuning writeup: ASR_Papiamentu_Model_Report.pdf
- Fine-tuning script: train_whisper.ipynb

resource | note |
---|---|
TeleCuraçao | Many videos of reporters speaking; some may come with a transcription, or contain passages read aloud from text that could be transcribed |
Telearuba | Same as above |
Telenoticia Telearuba | Same as above |
Papiamentu.ai | Website for a team of researchers working on Papiamentu NLP |
mms-tts-pap | Papiamentu (pap) language text-to-speech (TTS) model checkpoint from Facebook |
opus-mt-pap-es | Machine translation model from Papiamentu to Spanish; includes a .txt file with Papiamentu-Spanish translations; Apache 2.0 license |
Towards a Language Database of Papiamentu |