@@ -1,4 +1,4 @@
-Stanford NER - v3.5.2 - 2015-04-17
+Stanford NER - v3.5.2 - 2015-04-20
 ----------------------------------------------
 
 This package provides a high-performance machine learning based named
@@ -68,7 +68,10 @@ The nodistsim versions of the same models may be available on the
 Stanford NER webpage.
 
 Finally, we have models for other languages, including two German models,
-a Chinese model, and a Spanish model.
+a Chinese model, and a Spanish model. The files for these models can be
+found at:
+
+http://nlp.stanford.edu/software/CRF-NER.shtml
 
 
 QUICKSTART INSTRUCTIONS
@@ -108,6 +111,74 @@ automatically started, and you will also be given the option (under the
 
 java -mx1000m -jar stanford-ner.jar
 
+USING FULL STANFORD CORENLP NER FUNCTIONALITY
+
+This standalone distribution also allows access to the full NER
+capabilities of the Stanford CoreNLP pipeline. These capabilities
+can be accessed via the NERClassifierCombiner class.
+
+NERClassifierCombiner allows multiple CRFs to be layered together,
+and has options for recognizing numeric sequence patterns and time
+patterns with Stanford CoreNLP's SUTime.
+
+Suppose one combines three CRFs, CRF-1, CRF-2, and CRF-3, with the
+NERClassifierCombiner. When the NERClassifierCombiner runs, it will
+first apply the NER tags of CRF-1 to the text, then it will apply
+CRF-2's NER tags to any tokens not tagged by CRF-1, and so on. If
+the option ner.combinationMode is set to NORMAL (the default), any label
+applied by CRF-1 cannot be applied by subsequent CRFs. For instance,
+if CRF-1 applies the LOCATION tag, no other CRF's LOCATION tag will be
+used. If ner.combinationMode is set to HIGH_RECALL, this restriction
+is lifted.
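The layering rule just described can be modeled in a few lines of Java. This is a simplified, token-level sketch for illustration only. It is not NERClassifierCombiner's actual implementation, which works on CoreNLP data structures and may merge whole entity spans or judge labels by each CRF's full label set rather than by what it emitted. CombinationSketch, Mode, and combine are hypothetical names. Each CRF's output is assumed to be an array of labels over the same tokens, with "O" marking an untagged token.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of how layered CRF outputs are merged. */
final class CombinationSketch {

  enum Mode { NORMAL, HIGH_RECALL }

  /** Label meaning "not tagged" in this sketch. */
  static final String BACKGROUND = "O";

  /**
   * Merges per-token label sequences in priority order (index 0 plays CRF-1).
   * All sequences are assumed to cover the same tokens. A later CRF may only
   * fill tokens that are still BACKGROUND; in NORMAL mode it also may not use
   * a label type that an earlier CRF emitted.
   */
  static String[] combine(Mode mode, List<String[]> crfOutputs) {
    String[] result = crfOutputs.get(0).clone();
    Set<String> claimed = labelsIn(result);
    for (int c = 1; c < crfOutputs.size(); c++) {
      String[] labels = crfOutputs.get(c);
      for (int i = 0; i < result.length; i++) {
        String label = labels[i];
        boolean untagged = result[i].equals(BACKGROUND);
        boolean blocked = mode == Mode.NORMAL && claimed.contains(label);
        if (untagged && !label.equals(BACKGROUND) && !blocked) {
          result[i] = label;
        }
      }
      // Whatever this CRF emitted is off limits to later CRFs in NORMAL mode.
      claimed.addAll(labelsIn(labels));
    }
    return result;
  }

  /** Non-background label types that occur in a sequence. */
  private static Set<String> labelsIn(String[] labels) {
    Set<String> seen = new HashSet<>(Arrays.asList(labels));
    seen.remove(BACKGROUND);
    return seen;
  }
}
```

With CRF-1 = [PERSON, O, LOCATION, O] and CRF-2 = [O, LOCATION, O, DATE], NORMAL mode yields [PERSON, O, LOCATION, DATE], because LOCATION is already claimed by CRF-1. HIGH_RECALL yields [PERSON, LOCATION, LOCATION, DATE].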
+
+To use NERClassifierCombiner from the command line, the jars in lib
+and stanford-ner.jar must be on the CLASSPATH. Here is an example command:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
+-ner.useSUTime false -textFile sample-w-time.txt
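To launch the same command from a Java program instead of a shell, the arguments can be handed to java.lang.ProcessBuilder. The sketch below rests on some assumptions. The working directory is assumed to be the unpacked distribution, so stanford-ner.jar and lib sit side by side. The classpath is passed with -cp rather than through the CLASSPATH variable. NccCommand.build is a hypothetical helper. The flags are exactly those in the example command.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that rebuilds the example command as an argument list. */
final class NccCommand {

  static List<String> build(List<String> models, boolean useSUTime, String textFile) {
    // The java launcher expands the "lib/*" wildcard itself, so no shell is needed.
    String classpath = String.join(File.pathSeparator, "stanford-ner.jar", "lib/*");
    List<String> cmd = new ArrayList<>();
    cmd.add("java");
    cmd.add("-mx2g");
    cmd.add("-cp");
    cmd.add(classpath);
    cmd.add("edu.stanford.nlp.ie.NERClassifierCombiner");
    cmd.add("-ner.model");
    cmd.add(String.join(",", models)); // one argument, comma-separated, no spaces
    cmd.add("-ner.useSUTime");
    cmd.add(Boolean.toString(useSUTime));
    cmd.add("-textFile");
    cmd.add(textFile);
    return cmd;
  }
}
```

Running it would look like new ProcessBuilder(NccCommand.build(models, false, "sample-w-time.txt")).inheritIO().start().waitFor(). Passing each argument separately avoids shell quoting problems, and the model list stays a single argument.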
+
+Let's break this down a bit. The flag "-ner.model" should be followed by a
+comma-separated list of CRFs to be combined by the NERClassifierCombiner. Some
+serialized CRFs are provided in the classifiers directory. In this example, the
+CRFs trained on the CoNLL 4-class data and the MUC 7-class data are combined.
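Because the CRFs are applied in priority order, the order of the "-ner.model" list matters. The description of CRF-1, CRF-2, and CRF-3 suggests that the first path listed acts as CRF-1. On that assumption, a helper that builds the list should preserve the caller's order rather than sort it. The hypothetical NerModelArg.join below keeps the caller's order. It also skips files that do not end in ".crf.ser.gz", the suffix used by the CRFs in the classifiers directory.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for building the value passed to -ner.model. */
final class NerModelArg {

  /**
   * Joins CRF paths with commas, keeping the caller's order (assumed to be
   * the priority order: the first entry plays the role of CRF-1). Entries
   * that do not look like serialized CRFs are skipped.
   */
  static String join(String dir, List<String> fileNames) {
    return fileNames.stream()
        .filter(name -> name.endsWith(".crf.ser.gz"))
        .map(name -> dir + "/" + name)
        .collect(Collectors.joining(","));
  }
}
```

Called with the classifiers directory and the CoNLL and MUC file names, in that order, it reproduces the "-ner.model" value used in the example command.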
+
+When the flag "-ner.useSUTime" is followed by "false", SUTime is turned off.
+Note that if you omit the "false", the text "4 days ago" in the sample file
+is tagged as DATE. These are the kinds of phrases SUTime can identify.
+
+NERClassifierCombiner can also be run on other types of input. Here is an
+example that runs it on CoNLL-style input:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
+-map word=0,answer=1 -testFile sample-conll-file.txt
+
+It is crucial to include "-map word=0,answer=1", which specifies that
+the input test file has the words in the first column and the answer labels
+in the second column.
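To make the column mapping concrete, the following Java sketch interprets a map string such as "word=0,answer=1" and uses it to pull the word and answer label out of each line. It only illustrates what the flag means and is not Stanford NER's own reader. It assumes one token per line, columns separated by tabs or spaces, and blank lines between sentences. ColumnMapSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader illustrating what "-map word=0,answer=1" describes. */
final class ColumnMapSketch {

  /** Parses a spec like "word=0,answer=1" into {word=0, answer=1}. */
  static Map<String, Integer> parseMap(String spec) {
    Map<String, Integer> columns = new HashMap<>();
    for (String entry : spec.split(",")) {
      String[] kv = entry.split("=");
      if (kv.length != 2) {
        throw new IllegalArgumentException("bad map entry: " + entry);
      }
      columns.put(kv[0].trim(), Integer.parseInt(kv[1].trim()));
    }
    return columns;
  }

  /** Returns {word, answer} pairs, skipping blank sentence-separator lines. */
  static List<String[]> read(List<String> lines, Map<String, Integer> columns) {
    int wordCol = columns.get("word");     // column index 0 is the first column
    int answerCol = columns.get("answer");
    List<String[]> tokens = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      String[] fields = trimmed.split("\\s+"); // tabs or spaces
      tokens.add(new String[] {fields[wordCol], fields[answerCol]});
    }
    return tokens;
  }
}
```

Swapping the indices (for example "word=1,answer=0") would read the same file with the columns reversed. Without such a mapping, a reader has no way to tell which column holds the gold labels.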
+
+It is also possible to serialize and load an NERClassifierCombiner.
+
+The following command loads three of the sample CRFs with ner.combinationMode
+set to HIGH_RECALL and SUTime turned off, and saves the result to a file named
+test.serialized.ncc.ncc.ser.gz:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz,\
+classifiers/english.all.3class.distsim.crf.ser.gz -ner.useSUTime false \
+-ner.combinationMode HIGH_RECALL -serializeTo test.serialized.ncc.ncc.ser.gz
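The .ser.gz suffix conventionally denotes a Java serialization stream compressed with gzip. The sketch below shows that general technique on a hypothetical NccSettings class holding the options from the command above (model list, combination mode, SUTime flag). It is not the real NERClassifierCombiner file format. That format must also carry the CRFs themselves, since the loading command in this section needs no "-ner.model" flag. The method names are likewise hypothetical.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the options baked into a serialized combiner. */
final class NccSettings implements Serializable {
  private static final long serialVersionUID = 1L;

  final List<String> models;     // priority order, as passed to -ner.model
  final String combinationMode;  // "NORMAL" or "HIGH_RECALL"
  final boolean useSUTime;

  NccSettings(List<String> models, String combinationMode, boolean useSUTime) {
    this.models = new ArrayList<>(models); // ArrayList is Serializable
    this.combinationMode = combinationMode;
    this.useSUTime = useSUTime;
  }

  /** Writes this object as a gzip-compressed Java serialization stream. */
  void serializeTo(File file) throws IOException {
    try (ObjectOutputStream out =
        new ObjectOutputStream(new GZIPOutputStream(new FileOutputStream(file)))) {
      out.writeObject(this);
    }
  }

  /** Reads back an object written by serializeTo. */
  static NccSettings load(File file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
        new ObjectInputStream(new GZIPInputStream(new FileInputStream(file)))) {
      return (NccSettings) in.readObject();
    }
  }
}
```

Bundling the settings with the models in one file is what lets a single "-loadClassifier" argument restore the whole configuration.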
+
+An example serialized NERClassifierCombiner with these settings is supplied in
+the classifiers directory. Here is an example of loading that classifier and
+running it on the sample CoNLL data:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier \
+classifiers/example.serialized.ncc.ncc.ser.gz -map word=0,answer=1 \
+-testFile sample-conll-file.txt
+
+For a more complete description of NERClassifierCombiner, see
+http://nlp.stanford.edu/software/ncc-faq.shtml
 
 PROGRAMMATIC USE
 
@@ -165,19 +236,7 @@ PERSON ORGANIZATION LOCATION
 CHANGES
 --------------------
 
-2015-04-17 3.5.2 trial ner
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
+2015-04-20 3.5.2 synch standalone and CoreNLP functionality
 
 2015-01-29 3.5.1 Substantial accuracy improvements
 
|