sonnygeorge/papiamentu-asr-corpus


papiamentu-asr-corpus

Repository for Sonny George and Chris Tam's COSI136 (Brandeis University) project: the assembly of a 22+ hour Papiamentu ASR corpus and the subsequent fine-tuning of whisper-tiny.

Pertinent Files

  • Corpus writeup: ASR_Papiamentu_Corpus_Report.pdf
  • Fine-tuning writeup: ASR_Papiamentu_Model_Report.pdf
  • Fine-tuning script: train_whisper.ipynb
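
For a quick baseline before opening the notebook, a minimal sketch of transcribing a single clip with the un-fine-tuned model might look like the following. It assumes the Hugging Face `transformers` library and the public `openai/whisper-tiny` checkpoint; neither is confirmed by this README, and the checkpoint and settings actually used in train_whisper.ipynb may differ. The helper names `build_transcriber` and `transcribe` are hypothetical.

```python
# Assumption: the public Hugging Face checkpoint for whisper-tiny.
# The checkpoint used in train_whisper.ipynb may be different.
BASE_MODEL_ID = "openai/whisper-tiny"


def build_transcriber(model_id: str = BASE_MODEL_ID):
    """Return a Hugging Face speech-recognition pipeline for a checkpoint."""
    # Imported lazily so the helpers below can be used (and tested)
    # without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=model_id)


def transcribe(audio_path: str, transcriber=None) -> str:
    """Transcribe one audio file and return the text without outer whitespace."""
    if transcriber is None:
        transcriber = build_transcriber()
    # The pipeline returns a dict whose "text" key holds the transcription.
    return transcriber(audio_path)["text"].strip()
```

Calling `transcribe("sample.wav")` (a hypothetical file name) downloads the checkpoint on first use; decoding an audio file from a path also requires ffmpeg to be installed. Passing a fine-tuned checkpoint path as `model_id` to `build_transcriber` would allow a side-by-side comparison with the baseline.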

Adjacent Resources

| resource | note |
| --- | --- |
| TeleCuraçao | Many videos of reporters talking; some may have a transcription or contain directly read passages that could be transcribed |
| Telearuba | Same as above |
| Telenoticia Telearuba | Same as above |
| Papiamentu.ai | Website for a team of researchers working on Papiamentu NLP |
| mms-tts-pap | Papiamentu (pap) text-to-speech (TTS) model checkpoint from Facebook |
| opus-mt-pap-es | Machine translation model from Papiamentu to Spanish; includes .txt files with Papiamentu-Spanish translations; Apache 2.0 license |
| Towards a Language Database of Papiamentu | |

About

22.4 hours of Papiamentu utterances
