Commit

add G2P conversion task of schwa deletion to Hindi (sebastianruder#478)
* add G2P conversion task of schwa deletion to Hindi

* More complete description and benchmarks for schwa deletion in Hindi
aryamanarora authored Jan 6, 2021
1 parent c05c403 commit d1c15bd
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions hindi/hindi.md
@@ -37,3 +37,18 @@ The IIT Bombay English-Hindi Parallel Corpus used by Kunchukuttan et al. (2018)
| Philip et al. (2018) | 21.57 | [CVIT-MT Systems for WAT-2018](https://www.aclweb.org/anthology/Y18-3010/) | |
| Philip et al. (2020) | 21.20 | Revisiting Low Resource Status of Indian Languages in MT | [ilmulti](https://github.com/jerinphilip/ilmulti) |
| Saini et al. (2018) | 18.215 | [Neural Machine Translation for English to Hindi](https://www.researchgate.net/publication/327717152_Neural_Machine_Translation_for_English_to_Hindi) | |

## G2P Conversion

### Schwa Deletion

Due to diachronic processes, the inherent vowel of Hindi (the *schwa*, implied on any consonant that carries neither a vowel diacritic nor the vowel-killer diacritic, the virama) is sometimes dropped in pronunciation despite being present in the orthography. This process is known as schwa deletion. No known set of linguistic rules consistently and accurately predicts whether the inherent vowel is pronounced, so this remains an open problem in the field.
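
To make the orthographic side of the task concrete, here is a minimal Python sketch that finds the consonants in a Devanagari word that carry an inherent schwa. The function names (`orthographic_schwas`, `naive_predict`) are hypothetical and come from none of the papers listed here. The deletion rule is a deliberately naive baseline that drops only a word-final schwa; it is an illustrative assumption, not a faithful model of Hindi, and it fails on word-medial deletions and on words whose final schwa is pronounced.

```python
VIRAMA = "\u094D"  # the "vowel-killer" diacritic
NUKTA = "\u093C"   # dot that modifies a consonant; may precede a vowel sign


def is_consonant(ch):
    # Devanagari consonants, plus the precomposed nukta consonants.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_vowel_sign(ch):
    # Dependent vowel signs (matras) used in Hindi.
    return "\u093E" <= ch <= "\u094C" or ch in ("\u0962", "\u0963")


def orthographic_schwas(word):
    """Return the indices of consonants that carry an inherent schwa."""
    positions = []
    for i, ch in enumerate(word):
        if not is_consonant(ch):
            continue
        j = i + 1
        if j < len(word) and word[j] == NUKTA:
            j += 1  # skip the nukta to look at what follows the consonant
        nxt = word[j] if j < len(word) else ""
        if nxt != VIRAMA and not is_vowel_sign(nxt):
            positions.append(i)
    return positions


def naive_predict(word):
    """Map each schwa index to True (pronounced) or False (deleted).

    Assumption for illustration only: delete the schwa exactly when its
    consonant (plus an optional nukta) ends the word.
    """
    result = {}
    for i in orthographic_schwas(word):
        ends_word = i == len(word) - 1 or (
            i == len(word) - 2 and word[-1] == NUKTA
        )
        result[i] = not ends_word
    return result


print(naive_predict("कमल"))  # {0: True, 1: True, 2: False}
```

For कमल, all three consonants carry an orthographic schwa, and the sketch keeps the first two and drops the last, matching the pronunciation *kamal*. The hard cases are the medial schwas, which this baseline never deletes.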

Each paper below used a different dataset, so the reported accuracies are not directly comparable. The dataset for Arora et al. (2020), extracted from the Oxford Hindi-English Dictionary, is the largest, and future work should ideally compare against it.

| Model | Schwa-level accuracy | Word-level accuracy | Paper / Source | Code |
| ----- | :------------------: | :-----------------: | -------------- | ---- |
| Arora et al. (2020) | 98.00 | 97.78 | [Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi](https://www.aclweb.org/anthology/2020.acl-main.696.pdf) | [schwa-deletion](https://github.com/aryamanarora/schwa-deletion) |
| Tyson and Nagar (2009) | | 95.00 | [Prosodic rules for schwa-deletion in Hindi text-to-speech synthesis](http://www.academia.edu/download/38321628/tyson_nagar_2009.pdf) | |
| Narasimhan et al. (2004) | | 88.97 | [Schwa-Deletion in Hindi Text-to-Speech Synthesis](https://pure.mpg.de/rest/items/item_59025/component/file_59026/content) | |
| Choudhury et al. (2004) | | 99.89 | [A Diachronic Approach for Schwa Deletion in Indo Aryan Languages](https://www.aclweb.org/anthology/W04-0103.pdf) | |
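
The two accuracy columns can be read as follows, assuming the usual interpretation of the headers: schwa-level accuracy is the percentage of individual orthographic schwas whose fate (pronounced or deleted) is predicted correctly, and word-level accuracy is the percentage of words in which every schwa is predicted correctly. The sketch below is a hypothetical helper written for this page, not the evaluation code of any listed paper.

```python
def schwa_and_word_accuracy(gold, pred):
    """Compute (schwa-level, word-level) accuracy as percentages.

    gold, pred: one list per word, holding one boolean per orthographic
    schwa (True = pronounced, False = deleted).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same words")

    total_schwas = correct_schwas = correct_words = 0
    for g, p in zip(gold, pred):
        if len(g) != len(p):
            raise ValueError("each word needs one prediction per schwa")
        matches = sum(a == b for a, b in zip(g, p))
        total_schwas += len(g)
        correct_schwas += matches
        # A word counts as correct only if all of its schwas match.
        correct_words += matches == len(g)

    schwa_acc = 100.0 * correct_schwas / total_schwas if total_schwas else 0.0
    word_acc = 100.0 * correct_words / len(gold) if gold else 0.0
    return schwa_acc, word_acc


gold = [[True, True, False], [False], [True, False]]
pred = [[True, True, False], [False], [False, False]]
schwa_acc, word_acc = schwa_and_word_accuracy(gold, pred)
print(f"{schwa_acc:.2f} {word_acc:.2f}")  # 83.33 66.67
```

Under this reading, word-level accuracy can never exceed schwa-level accuracy on the same data when every word contains at least one schwa, because a single wrong schwa makes the whole word wrong.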
