This directory contains language-specific data files. Most importantly, you will find here:
- A list of unique characters for the target language (e.g. English) in data/alphabet.txt
- A binary n-gram language model compiled by kenlm in data/lm/lm.binary
- A trie model compiled by generate_trie in data/lm/trie
For more information on how to build these resources from scratch, see data/lm/README.md.
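
Before running anything that depends on these resources, it can help to confirm they are all present. The following is a minimal sketch, not part of the project itself: the function name `missing_resources` and the `EXPECTED_FILES` mapping are hypothetical, and it assumes the paths listed above are relative to the repository root.

```python
from pathlib import Path

# Paths taken from the list above; assumed to be relative to the repository root.
EXPECTED_FILES = {
    "alphabet": Path("data/alphabet.txt"),
    "language model": Path("data/lm/lm.binary"),
    "trie": Path("data/lm/trie"),
}


def missing_resources(root="."):
    """Return the names of expected resources that do not exist under root."""
    root = Path(root)
    return [
        name
        for name, relative_path in EXPECTED_FILES.items()
        if not (root / relative_path).exists()
    ]


if __name__ == "__main__":
    missing = missing_resources()
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All language resources found.")
```

If any resource is reported missing, data/lm/README.md describes how to rebuild it.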