Code and resources for training a bidirectional LSTM model to support correct and sound writing for ESL learners.
Please cite this work if you find this code and datasets useful: https://arxiv.org/abs/1901.02490
Choosing the Right Word: Using Bidirectional LSTM Tagger for Writing Support Systems by: Victor Makarenkov, Lior Rokach, Bracha Shapira
To test the scientific-domain model, initialize the vocabulary dictionaries with the values from w2i.txt. This file already contains the dictionary and the word serial numbers, so the full corpus is not needed for vocabulary initialization.
You also need the model file itself, adam_batch_bilstm_bigmodel2.txt, which you can download from here: https://drive.google.com/file/d/0B5iAITPoL9L2ald0VzNxZWtxdGM/view?usp=sharing
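As a rough illustration of initializing the vocabulary from w2i.txt, here is a minimal sketch. It is not taken from this repository's code. It assumes each line of the file holds a word and its serial number separated by whitespace; the actual layout of w2i.txt may differ, so check the file and adjust the parsing. The function name `load_vocab` is hypothetical.

```python
from typing import Dict, Tuple


def load_vocab(path: str = "w2i.txt") -> Tuple[Dict[str, int], Dict[int, str]]:
    """Read a word-to-index mapping and build its inverse.

    Assumed format (not confirmed by the README): one "word index" pair
    per line, separated by whitespace.
    """
    w2i: Dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            word, idx = parts
            w2i[word] = int(idx)
    # Inverse mapping, useful for turning predicted indices back into words.
    i2w = {i: w for w, i in w2i.items()}
    return w2i, i2w
```

Calling `load_vocab("w2i.txt")` would then give both dictionaries without reading the training corpus.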
Otherwise, for any other domain or corpus, replace the long30k.txt filename in the code with your own training text file. You should also uncomment the code that reads the file and comment out the vocabulary initialization.
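For a custom corpus, the vocabulary has to be built from the training text itself. The sketch below shows one simple way to do that. It assumes whitespace tokenization and assigns serial numbers in order of first appearance; the repository's own reading code may tokenize or number words differently. The function name `build_vocab` is hypothetical.

```python
from typing import Dict


def build_vocab(path: str) -> Dict[str, int]:
    """Assign each distinct token a serial number in order of first appearance.

    Assumes plain text split on whitespace; this is an illustration,
    not the repository's actual preprocessing.
    """
    w2i: Dict[str, int] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            for token in line.split():
                if token not in w2i:
                    w2i[token] = len(w2i)
    return w2i
```

Here `path` would point to your own training file in place of long30k.txt.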