
Initial commit of EVALB code from http://nlp.cs.nyu.edu/evalb/
mitchellstern committed Nov 3, 2017
1 parent 0640660 commit 7b1da8f
Showing 15 changed files with 2,291 additions and 0 deletions.
66 changes: 66 additions & 0 deletions EVALB/COLLINS.prm
@@ -0,0 +1,66 @@
##------------------------------------------##
## Debug mode ##
## 0: No debugging ##
## 1: print data for individual sentence ##
##------------------------------------------##
DEBUG 0

##------------------------------------------##
## MAX error ##
## Number of errors at which to stop ##
## the process. This is useful if ##
## there could be tokenization errors. ##
## The process will stop when this ##
## number of errors has accumulated. ##
##------------------------------------------##
MAX_ERROR 10

##------------------------------------------##
## Cut-off length for statistics ##
## At the end of evaluation, the ##
## statistics for the sentences of length ##
## less than or equal to this number will ##
## be shown, on top of the statistics ##
## for all the sentences ##
##------------------------------------------##
CUTOFF_LEN 40

##------------------------------------------##
## unlabeled or labeled bracketing ##
## 0: unlabeled bracketing ##
## 1: labeled bracketing ##
##------------------------------------------##
LABELED 1

##------------------------------------------##
## Delete labels ##
## list of labels to be ignored. ##
## If it is a pre-terminal label, delete ##
## the word along with the brackets. ##
## If it is a non-terminal label, just ##
## delete the brackets (don't delete ##
## its children). ##
##------------------------------------------##
DELETE_LABEL TOP
DELETE_LABEL -NONE-
DELETE_LABEL ,
DELETE_LABEL :
DELETE_LABEL ``
DELETE_LABEL ''
DELETE_LABEL .

##------------------------------------------##
## Delete labels for length calculation ##
## list of labels to be ignored for ##
## length calculation purpose ##
##------------------------------------------##
DELETE_LABEL_FOR_LENGTH -NONE-

##------------------------------------------##
## Equivalent labels, words ##
## the pairs are considered equivalent ##
## This is non-directional. ##
##------------------------------------------##
EQ_LABEL ADVP PRT

# EQ_WORD Example example
24 changes: 24 additions & 0 deletions EVALB/LICENSE
@@ -0,0 +1,24 @@
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or
distribute this software, either in source code form or as a compiled
binary, for any purpose, commercial or non-commercial, and by any
means.

In jurisdictions that recognize copyright laws, the author or authors
of this software dedicate any and all copyright interest in the
software to the public domain. We make this dedication for the benefit
of the public at large and to the detriment of our heirs and
successors. We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to <http://unlicense.org/>
4 changes: 4 additions & 0 deletions EVALB/Makefile
@@ -0,0 +1,4 @@
all: evalb

evalb: evalb.c
gcc -Wall -g -o evalb evalb.c
300 changes: 300 additions & 0 deletions EVALB/README
@@ -0,0 +1,300 @@
#################################################################
# #
# Bug fix and additional functionality for evalb #
# #
# This updated version of evalb fixes a bug in which sentences #
# were incorrectly categorized as "length mismatch" when the parse #
# output had certain mislabeled parts-of-speech. #
# #
# The bug was the result of evalb treating one of the tags (in #
# gold or test) as a label to be deleted (see sections [6],[7] #
# for details), but not the corresponding tag in the other. #
# This most often occurs with punctuation. See the subdir #
# "bug" for an example gld and tst file demonstating the bug, #
# as well as output of evalb with and without the bug fix. #
# #
# In the present version, in case of a length mismatch, the nodes #
# causing the imbalance are reinserted to resolve the miscount. #
# If the lengths of gold and test truly differ, the error is #
# still reported. The parameter file "new.prm" (derived from #
# COLLINS.prm) shows how to add new potential mislabelings for #
# quotes (",``,',`). #
# #
# I have preserved DJB's revision for modern compilers except #
# for the declaration of "exit", which is provided by stdlib. #
# #
# Other changes: #
# #
# * output of F-Measure in addition to precision and recall #
# (I did not update the documentation in section [4] for this) #
# #
# * more comprehensive DEBUG output that includes bracketing #
# information as evalb is processing each sentence #
# (useful in working through this, and perhaps other bugs). #
# Use either the "-D" run-time switch or set DEBUG to 2 in #
# the parameter file. #
# #
# * added DELETE_LABEL lines in new.prm for S1 nodes produced #
# by the Charniak parser and "?", "!" punctuation produced by #
# the Bikel parser. #
# #
# #
# David Ellis (Brown) #
# #
# January.2006 #
#################################################################

#################################################################
# #
# Update of evalb for modern compilers #
# #
# This is an updated version of evalb, for use with modern C #
# compilers. There are a few updates, each marked in the code: #
# #
# /* DJB: explanation of comment */ #
# #
# The updates are purely to help compilation with recent #
# versions of GCC (and other C compilers). There are *NO* other #
# changes to the algorithm itself. #
# #
# I have made these changes following recommendations from #
# users of the Corpora Mailing List, especially Peet Morris and #
# Ramon Ziai. #
# #
# David Brooks (Birmingham) #
# #
# September.2005 #
#################################################################

#################################################################
# #
# README file for evalb #
# #
# Satoshi Sekine (NYU) #
# Mike Collins (UPenn) #
# #
# October.1997 #
#################################################################

Contents of this README:

[0] COPYRIGHT
[1] INTRODUCTION
[2] INSTALLATION AND RUN
[3] OPTIONS
[4] OUTPUT FORMAT FROM THE SCORER
[5] HOW TO CREATE A GOLDFILE FROM THE TREEBANK
[6] THE PARAMETER FILE
[7] MORE DETAILS ABOUT THE SCORING ALGORITHM


[0] COPYRIGHT

The authors abandon the copyright of this program. Everyone is
permitted to copy and distribute the program, or a portion of it,
free of charge and without restriction, provided it is not used to
harm anyone.

However, the authors would appreciate proper use of the program and
reports of any bugs or problems.

This software is provided "AS IS", and the authors make no warranties,
express or implied.

To legally enforce the abandonment of copyright, this package is released
under the Unlicense (see LICENSE).

[1] INTRODUCTION

Evaluation of bracketing looks simple, but in fact, there are minor
differences from system to system. This is a program to parameterize
such minor differences and to give an informative result.

"evalb" evaluates bracketing accuracy in a test-file against a gold-file.
It returns recall, precision, and tagging accuracy. It uses an
algorithm identical to that used in (Collins ACL97).


[2] Installation and Run

To compile the scorer, type

> make


To run the scorer:

> evalb -p Parameter_file Gold_file Test_file


For example, to use the sample files:

> evalb -p sample.prm sample.gld sample.tst



[3] OPTIONS

You can specify system parameters in the command-line options.
Other options concerning the evaluation metrics should be specified
in the parameter file, described later.

-p param_file parameter file
-d debug mode
-e n number of errors before stopping (default=10)
-h help
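
For example, to score a test file against a gold file with a higher
error tolerance (the file names here are illustrative):

> evalb -p COLLINS.prm -e 50 sec23.gold sec23.tst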



[4] OUTPUT FORMAT FROM THE SCORER

The scorer gives individual scores for each sentence, for
example:

  Sent.                        Matched  Bracket   Cross        Correct Tag
 ID  Len.  Stat. Recal  Prec.  Bracket gold test Bracket Words  Tags Accracy
============================================================================
    1    8    0  100.00 100.00      5      5      5       0      6      5  83.33

At the end of the output the === Summary === section gives statistics
for all sentences, and for sentences <=40 words in length. The summary
contains the following information:

i) Number of sentences -- total number of sentences.

ii) Number of Error/Skip sentences -- should both be 0 if there is no
problem with the parsed/gold files.

iii) Number of valid sentences = Number of sentences - Number of Error/Skip
sentences

iv) Bracketing recall =    (number of correct constituents)
                        ----------------------------------------
                        (number of constituents in the goldfile)

v) Bracketing precision =  (number of correct constituents)
                        ------------------------------------------
                        (number of constituents in the parsed file)

vi) Complete match = percentage of sentences where recall and precision are
both 100%.

vii) Average crossing = (number of constituents crossing a goldfile constituent)
                        --------------------------------------------------------
                        (number of sentences)

viii) No crossing = percentage of sentences which have 0 crossing brackets.

ix) 2 or less crossing = percentage of sentences which have <=2 crossing brackets.

x) Tagging accuracy = percentage of correct POS tags (but see [7].3 for exact
details of what is counted).
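
As a worked example (with illustrative numbers): if the goldfile
contains 80 constituents, the parsed file contains 75, and 70 of them
match, then bracketing recall = 70/80 = 87.50% and bracketing precision
= 70/75 = 93.33%. The updated scorer also reports the F-measure, the
harmonic mean of the two: 2*P*R/(P+R) = 2*93.33*87.50/(93.33+87.50)
= 90.32.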



[5] HOW TO CREATE A GOLDFILE FROM THE PENN TREEBANK


The gold and parsed files are in a format similar to this:

(TOP (S (INTJ (RB No)) (, ,) (NP (PRP it)) (VP (VBD was) (RB n't) (NP (NNP Black) (NNP Monday))) (. .)))

To create a gold file from the treebank:

tgrep -wn '/.*/' | tgrep_process.prl

will produce a goldfile in the required format. ("tgrep -wn '/.*/'" prints
parse trees, "tgrep_process.prl" just skips blank lines).

For example, to produce a goldfile for section 23 of the treebank:

tgrep -wn '/.*/' | tail +90895 | tgrep_process.prl | sed 2416q > sec23.gold

("tail +90895" starts output at the first tree of section 23, and
"sed 2416q" keeps the 2,416 trees that make up that section.)



[6] THE PARAMETER (.prm) FILE


The .prm file sets options regarding the scoring method. COLLINS.prm gives
the same scoring behaviour as the scorer used in (Collins 97). The options
chosen were:

1) LABELED 1

to give labelled precision/recall figures, i.e. a constituent must have the
same span *and* label as a constituent in the goldfile.
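
(For example, a test constituent labelled NP spanning words 3-7 matches
a gold NP spanning words 3-7, but not a gold S over the same span.)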

2) DELETE_LABEL TOP

Don't count the "TOP" label (which is always given in the output of tgrep)
when scoring.

3) DELETE_LABEL -NONE-

Remove traces (and all constituents which dominate nothing but traces) when
scoring. For example

.... (VP (VBD reported) (SBAR (-NONE- 0) (S (-NONE- *T*-1)))) (. .)))

would be processed to give

.... (VP (VBD reported)) (. .)))


4)
DELETE_LABEL , -- for the purposes of scoring remove punctuation
DELETE_LABEL :
DELETE_LABEL ``
DELETE_LABEL ''
DELETE_LABEL .

5) DELETE_LABEL_FOR_LENGTH -NONE- -- don't include traces when calculating
the length of a sentence (important
when classifying a sentence as <=40
words or >40 words)

6) EQ_LABEL ADVP PRT

Count ADVP and PRT as being the same label when scoring.
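
For example, a gold constituent (ADVP (RB back)) and a test constituent
(PRT (RB back)) over the same span would then count as a match.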




[7] MORE DETAILS ABOUT THE SCORING ALGORITHM


1) The scorer initially processes the files to remove all nodes specified
by DELETE_LABEL in the .prm file. It also recursively removes nodes which
dominate nothing due to all their children being removed. For example, if
-NONE- is specified as a label to be deleted,

.... (VP (VBD reported) (SBAR (-NONE- 0) (S (-NONE- *T*-1)))) (. .)))

would be processed to give

.... (VP (VBD reported)) (. .)))

2) The scorer also removes all functional tags attached to non-terminals
(functional tags are prefixed with "-" or "=" in the treebank). For example,
"NP-SBJ" is processed to give "NP", and "NP=2" is changed to "NP".


3) Tagging accuracy counts tags for all words *except* any tags which are
deleted by a DELETE_LABEL specification in the .prm file. (For example, for
COLLINS.prm, punctuation tagged as "," ":" etc. would not be included).
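
(In the sample output in section [4], 5 of the 6 counted words carry
the correct tag, giving a tagging accuracy of 83.33%.)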

4) When calculating the length of a sentence, all words with POS tags not
included in the "DELETE_LABEL_FOR_LENGTH" list in the .prm file are
counted. (For COLLINS.prm, only "-NONE-" is specified in this list, so
traces are removed before calculating the length of the sentence).
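
For example, a sentence of 42 words that includes 3 traces tagged
-NONE- has a calculated length of 39, and is therefore included in the
<=40 statistics.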

5) There are some subtleties in scoring when either the goldfile or parsed
file contains multiple constituents for the same span which have the same
non-terminal label, e.g. (NP (NP the man)). If the goldfile contains n
constituents for the same span, and the parsed file contains m constituents
with that nonterminal, the scorer works as follows:

i) If m>n, then the precision is n/m, recall is 100%

ii) If n>m, then the precision is 100%, recall is m/n.

iii) If n==m, recall and precision are both 100%.
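
For example, if the goldfile has (NP (NP the man)) -- two NP
constituents over the same span, so n=2 -- while the parsed file has
just (NP the man), so m=1, then by case (ii) the precision for that
span is 100% and the recall is m/n = 1/2 = 50%.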