Error-repair Dependency Parsing for Ungrammatical Texts (ACL 2017)

Last updated: June, 2017

Instructions

N.B. Due to license restrictions, we do not provide the original Penn Treebank (PTB) in this repository.

Download the Penn Treebank into the data directory.
Convert the PTB into CoNLL format (e.g., with Penn2Malt).
Save the CoNLL-format files as ./data/[train|dev|test].E00 (E00 means an error rate of 0%).

Add noise by running errgent. See the readme file in the errgent directory.

 cd ./errgent
 sh ./generate_train_dev_test.sh (for generating all the files needed)
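
Once the script has finished, a quick check like the following can confirm that the data files are in place. This is a minimal sketch, not part of the repository: the helper names `expected_data_files` and `missing_data_files` are hypothetical, and the check assumes the naming scheme ./data/[train|dev|test].[E00|E05|E10|E15|E20] used throughout these instructions.

```python
from pathlib import Path

# Naming scheme assumed from these instructions: ./data/<split>.<error rate>
SPLITS = ("train", "dev", "test")
ERROR_RATES = ("E00", "E05", "E10", "E15", "E20")


def expected_data_files(data_dir="./data"):
    """Return the 15 paths <data_dir>/<split>.<rate> that the instructions assume."""
    root = Path(data_dir)
    return [root / f"{split}.{rate}" for split in SPLITS for rate in ERROR_RATES]


def missing_data_files(data_dir="./data"):
    """Return the expected paths that do not exist as regular files."""
    return [path for path in expected_data_files(data_dir) if not path.is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected data files are present.")
```

Run it from the repository root so that the relative ./data path resolves correctly.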

We assume that the files are named ./data/[train|dev|test].[E00|E05|E10|E15|E20]. Each file should look like the following.

     1       Ms.     B-NP    NNP     _       _       2       TITLE   _       _
     2       Haag    I-NP    NNP     _       _       3       SBJ     _       _
     3       plays   B-VP    VBZ     _       _       0       ROOT    _       _
     4       Elianti B-NP    NNP     _       _       3       OBJ     _       _
     5       .       O       .       _       _       3       P       _       _
     
     1       The     B-NP    DT      _       _       4       NMOD    _       _
     2       luxury  I-NP    NN      _       _       4       NMOD    _       _
     3       auto    I-NP    NN      _       _       4       NMOD    _       _
     4       maker   I-NP    NN      _       _       7       SBJ     _       _
     5       last    B-NP    JJ      _       _       6       NMOD    _       _
     6       year    I-NP    NN      _       _       7       TMP     _       _
     7       sold    B-VP    VBD     _       _       0       ROOT    _       _
     8       1,214   B-NP    CD      _       _       9       NMOD    _       _
     9       cars    I-NP    NNS     _       _       7       OBJ     _       _
     10      in      B-PP    IN      _       _       7       LOC     _       _
     11      the     B-NP    DT      _       _       12      NMOD    _       _
     12      U.S.    I-NP    NNP     _       _       10      PMOD    _       _
     
     ...
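
To sanity-check a converted file against the sample above, a small reader such as the following may help. It is only a sketch under stated assumptions: the `Token` type and `read_sentences` function are hypothetical names, the column meanings are inferred from the sample (column 3 appears to hold a chunk tag such as B-NP), and sentences are assumed to be separated by blank lines with ten whitespace-separated fields per token.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    index: int   # column 1: 1-based position in the sentence
    form: str    # column 2: word form
    chunk: str   # column 3: appears to be a chunk tag (e.g., B-NP) in the sample
    pos: str     # column 4: part-of-speech tag
    head: int    # column 7: index of the head token, 0 for the root
    deprel: str  # column 8: dependency label


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per blank-line-separated sentence."""
    sentence: List[Token] = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        if len(fields) != 10:
            raise ValueError(
                f"line {line_number}: expected 10 fields, found {len(fields)}"
            )
        sentence.append(
            Token(int(fields[0]), fields[1], fields[2], fields[3],
                  int(fields[6]), fields[7])
        )
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence
```

For example, passing an open file handle for ./data/dev.E00 to `read_sentences` yields one token list per sentence; a `ValueError` points to the first line that does not have ten fields.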

Training a model

 (e.g.,) sh sample_train.sh E05 (training a model with 5% error-injected corpus)

Parsing sentences with the trained model

 (e.g.,) sh sample_parse.sh dev E05 E10 (parse 10% error-injected dev set with a model trained on 5% error corpus)
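
To parse a split with every combination of training and test error rates, the command lines can be generated programmatically. The sketch below is an illustration only: `parse_commands` is a hypothetical helper, the argument order (split, model error rate, data error rate) is inferred from the example above, and it assumes that a model has already been trained for each error rate.

```python
from itertools import product

ERROR_RATES = ("E00", "E05", "E10", "E15", "E20")


def parse_commands(split="dev", rates=ERROR_RATES):
    """Build one sample_parse.sh command per (model rate, data rate) pair."""
    return [
        ["sh", "sample_parse.sh", split, model_rate, data_rate]
        for model_rate, data_rate in product(rates, repeat=2)
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be reviewed first.
    for command in parse_commands():
        print(" ".join(command))
```

Each command is returned as an argument list, so it could be passed to `subprocess.run` unchanged once the printed commands look right.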

Evaluation on parsing performance

 cd ./eval
 wget https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/srleval/source-archive.zip -O srleval.zip
 unzip srleval.zip
 cd ./eval/srleval/trunk/align
 make
 
 Modify line 231 in ./eval/srleval/trunk/eval.py:
 (from) for item in alignment.align(ref_words, hyp_words, command=os.path.dirname(__file__) + "/align/align"):
 (to)   for item in alignment.align(ref_words, hyp_words):
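
The manual edit above can also be applied with a short script. This is a sketch, not part of the repository: `patch_align_call` is a hypothetical helper, and it assumes that the old call appears exactly once in eval.py, raising an error otherwise so that an unexpected version of the file is not silently modified.

```python
# The two strings are taken verbatim from the (from)/(to) lines above.
OLD_CALL = (
    'alignment.align(ref_words, hyp_words, '
    'command=os.path.dirname(__file__) + "/align/align")'
)
NEW_CALL = "alignment.align(ref_words, hyp_words)"


def patch_align_call(source: str) -> str:
    """Return source with the old align call replaced by the new one."""
    count = source.count(OLD_CALL)
    if count != 1:
        # Assumption: the call occurs exactly once; anything else needs a manual look.
        raise ValueError(f"expected exactly one occurrence of the old call, found {count}")
    return source.replace(OLD_CALL, NEW_CALL)
```

To use it, read ./eval/srleval/trunk/eval.py into a string, pass it through `patch_align_call`, and write the result back to the same file.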
 
 Run the evaluation script:
 cd ./eval
 (e.g.,) sh evaluate.sh dev E05 E10 (evaluate 10% error-injected dev set with a model trained on 5% error corpus)

Evaluation on grammaticality improvement

See Predicting Grammaticality on an Ordinal Scale

Questions

Please e-mail Keisuke Sakaguchi (keisuke[at]cs.jhu.edu).
