Neural Machine Translation (NMT) has shown significant improvements over traditional phrase-based machine translation in recent years. Nonetheless, NMT models are data-hungry and underperform when training data is limited, as is the case for low-resource languages. South African languages are under-resourced and consequently achieve poor machine translation quality. To address this issue, we compare different data augmentation techniques on two Nguni languages, IsiXhosa and IsiZulu, against English-to-IsiXhosa and English-to-IsiZulu baseline models. The first technique uses target-side monolingual data to augment the parallel data via backtranslation (translating the target-side language back into the source-side language); the second trains a multilingual model on a joint set of bilingual corpora containing both IsiXhosa and IsiZulu.
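As a concrete illustration of the backtranslation step, the sketch below translates monolingual IsiXhosa back into English with a reverse-direction model and appends the synthetic pairs to the genuine parallel corpus. This is a minimal sketch only: the file names (mono.xh, xh2en.pt, data-bin/xh-en, train.*) are hypothetical, and it assumes a fairseq-style toolkit, which this repository may or may not use.

```bash
# Hypothetical sketch of backtranslation-based augmentation.
# File names and the fairseq toolkit are assumptions, not this repo's setup.

# 1. Translate target-side (IsiXhosa) monolingual text back into English
#    with a reverse-direction (xh -> en) model; keep only the hypotheses
#    (fairseq prints them on tab-separated lines starting with "H-").
fairseq-interactive data-bin/xh-en \
    --path checkpoints/xh2en.pt \
    --source-lang xh --target-lang en \
    --beam 5 < mono.xh \
    | grep '^H-' | cut -f3 > synthetic.en

# 2. Append the synthetic (English, IsiXhosa) pairs to the genuine parallel
#    corpus, so the en -> xh model trains on both.
cat train.en synthetic.en > train-aug.en
cat train.xh mono.xh      > train-aug.xh
```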
- Download monolingual and bilingual data by running the download scripts (a run-all example follows this list):
  - run download_zu_bilingual.sh to download bilingual IsiZulu data
  - run download_zu_monolingual.sh to download monolingual IsiZulu data
  - run download_xh_bilingual.sh to download bilingual IsiXhosa data
  - run download_xh_monolingual.sh to download monolingual IsiXhosa data
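Assuming the scripts sit in the repository root, all four downloads can be run back to back:

```bash
# Fetch all four corpora in one go (paths relative to the repo root).
bash download_zu_bilingual.sh    # bilingual IsiZulu
bash download_zu_monolingual.sh  # monolingual IsiZulu
bash download_xh_bilingual.sh    # bilingual IsiXhosa
bash download_xh_monolingual.sh  # monolingual IsiXhosa
```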
- Prepare data and train the models by running the training scripts:
  - run train-en2xh-baseline.sh to prepare data and train the English to IsiXhosa baseline model
  - run train-en2zu-baseline.sh to prepare data and train the English to IsiZulu baseline model
  - run train-en2xhzu-multilingual.sh to prepare data and train the English to IsiXhosa and IsiZulu multilingual model (one way to build the joint corpus is sketched after this list)
  - run train-xh2en-backtranslation.sh to prepare data and train the IsiXhosa to English backtranslation model on the augmented parallel data
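The multilingual script trains one model on the concatenated English-IsiXhosa and English-IsiZulu corpora. A common way to build such a joint corpus is to prepend a target-language token to each English source sentence so the model knows which language to emit. The sketch below shows that preprocessing with hypothetical file names; the actual script may prepare the joint data differently.

```bash
# Hypothetical file names; the repo's multilingual script may differ.
# Tag each English source sentence with its target language...
sed 's/^/<2xh> /' train.en-xh.en >  train.multi.en
sed 's/^/<2zu> /' train.en-zu.en >> train.multi.en

# ...and concatenate the corresponding target sides in the same order,
# so source and target lines stay aligned.
cat train.en-xh.xh train.en-zu.zu > train.multi.trg
```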