Error-repair Dependency Parsing for Ungrammatical Texts (ACL 2017)

phu-pmh/error-repair-parsing

 
 

Last updated: June, 2017



Instructions

  • N.B. Due to license restrictions, the original Penn Treebank (PTB) is not included in this repository.
  1. Download the Penn Treebank into the ./data directory.

  2. Convert the PTB into CoNLL format (e.g., with Penn2Malt).

  3. Save the CoNLL-format files as ./data/[train|dev|test].E00 (E00 means an error rate of 0%).

  4. Add noise by running errgent. See the README file in the ./errgent directory.

     cd ./errgent
     sh ./generate_train_dev_test.sh  # generates all the files needed
    

    The following steps assume the files are named ./data/[train|dev|test].[E00|E05|E10|E15|E20]. Each file should look like this:

         1       Ms.     B-NP    NNP     _       _       2       TITLE   _       _
         2       Haag    I-NP    NNP     _       _       3       SBJ     _       _
         3       plays   B-VP    VBZ     _       _       0       ROOT    _       _
         4       Elianti B-NP    NNP     _       _       3       OBJ     _       _
         5       .       O       .       _       _       3       P       _       _
         
         1       The     B-NP    DT      _       _       4       NMOD    _       _
         2       luxury  I-NP    NN      _       _       4       NMOD    _       _
         3       auto    I-NP    NN      _       _       4       NMOD    _       _
         4       maker   I-NP    NN      _       _       7       SBJ     _       _
         5       last    B-NP    JJ      _       _       6       NMOD    _       _
         6       year    I-NP    NN      _       _       7       TMP     _       _
         7       sold    B-VP    VBD     _       _       0       ROOT    _       _
         8       1,214   B-NP    CD      _       _       9       NMOD    _       _
         9       cars    I-NP    NNS     _       _       7       OBJ     _       _
         10      in      B-PP    IN      _       _       7       LOC     _       _
         11      the     B-NP    DT      _       _       12      NMOD    _       _
         12      U.S.    I-NP    NNP     _       _       10      PMOD    _       _
         
         ...
    
  5. Training a model

     sh sample_train.sh E05  # e.g., train a model on the 5% error-injected corpus
    
  6. Parsing sentences with the trained model

     sh sample_parse.sh dev E05 E10  # e.g., parse the 10% error-injected dev set with a model trained on the 5% error corpus
    
  7. Evaluation on parsing performance

     cd ./eval
     wget https://storage.googleapis.com/google-code-archive-source/v2/code.google.com/srleval/source-archive.zip -O srleval.zip
     unzip srleval.zip
     cd ./srleval/trunk/align  # i.e., ./eval/srleval/trunk/align from the repository root
     make
     
     Then modify line 231 of ./eval/srleval/trunk/eval.py:
     (from) for item in alignment.align(ref_words, hyp_words, command=os.path.dirname(__file__) + "/align/align"):
     (to)   for item in alignment.align(ref_words, hyp_words):
     
     Finally, run the evaluation script from ./eval:
     cd ../../..  # back to ./eval
     sh evaluate.sh dev E05 E10  # e.g., evaluate the 10% error-injected dev set with a model trained on the 5% error corpus
    
  8. Evaluation on grammaticality improvement
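
As a quick sanity check before training (step 5), the ten-column layout shown in step 4 can be read with a few lines of Python. The following is a minimal sketch and not part of this repository: the names Token, read_sentences, and check_heads are hypothetical, and the column interpretation (third column as a chunk tag, seventh as the head index, eighth as the dependency label) is inferred from the sample in step 4. Error-injected files may follow additional conventions that this check does not cover.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    index: int   # column 1: 1-based position in the sentence
    form: str    # column 2: word form
    chunk: str   # column 3: looks like a chunk tag in the sample (assumption)
    pos: str     # column 4: part-of-speech tag
    head: int    # column 7: index of the head token, 0 for the root
    deprel: str  # column 8: dependency label


def read_sentences(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per blank-line-separated sentence."""
    sentence: List[Token] = []
    for line in lines:
        fields = line.split()  # columns are whitespace-separated
        if not fields:         # a blank line closes the current sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
        sentence.append(Token(int(fields[0]), fields[1], fields[2],
                              fields[3], int(fields[6]), fields[7]))
    if sentence:               # the last sentence may lack a trailing blank line
        yield sentence


def check_heads(sentence: List[Token]) -> None:
    """Raise ValueError if token indices are not 1..n or a head is out of range."""
    for position, token in enumerate(sentence, start=1):
        if token.index != position:
            raise ValueError(f"token index {token.index} at position {position}")
        if not 0 <= token.head <= len(sentence):
            raise ValueError(f"head {token.head} out of range for token {token.index}")


# First sentence of the sample from step 4, used here as a small fixture.
SAMPLE = """\
1 Ms. B-NP NNP _ _ 2 TITLE _ _
2 Haag I-NP NNP _ _ 3 SBJ _ _
3 plays B-VP VBZ _ _ 0 ROOT _ _
4 Elianti B-NP NNP _ _ 3 OBJ _ _
5 . O . _ _ 3 P _ _
"""

if __name__ == "__main__":
    for sent in read_sentences(SAMPLE.splitlines()):
        check_heads(sent)
        print(" ".join(token.form for token in sent))
```

To check a whole file, pass an open file object instead of the fixture, for example `read_sentences(open("./data/dev.E00"))`, and call `check_heads` on each sentence it yields.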

Questions

  • Please e-mail Keisuke Sakaguchi (keisuke[at]cs.jhu.edu).
