Commit 50151b9

Merge branch 'master' into gm-character-fixed

Grace Muzny authored and Stanford NLP committed
1 parent 8ae714b commit 50151b9

104 files changed: +102208 -26947 lines changed

data/edu/stanford/nlp/ud/feature_map.txt

+71 -27
@@ -6,27 +6,25 @@
 * VBD VerbForm=Fin|Mood=Ind|Tense=Past
 * VBN Tense=Past|VerbForm=Part
 * VBP VerbForm=Fin|Mood=Ind|Tense=Pres
+* MD VerbForm=Fin
 * JJ Degree=Pos
 * JJR Degree=Cmp
 * JJS Degree=Sup
-* RB Degree=Pos
-* RBR Degree=Cmp
-* RBS Degree=Sup
 * CD NumType=Card
 am VBP VerbForm=Fin|Mood=Ind|Tense=Pres|Person=1|Number=Sing
 was VBD VerbForm=Fin|Mood=Ind|Tense=Past|Number=Sing
-i PRP Number=Sing|Person=1|PronType=Prs
+i PRP Number=Sing|Person=1|PronType=Prs|Case=Nom
 you PRP Person=2|PronType=Prs
-he PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs
-she PRP Number=Sing|Person=3|Gender=Fem|PronType=Prs
+he PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs|Case=Nom
+she PRP Number=Sing|Person=3|Gender=Fem|PronType=Prs|Case=Nom
 it PRP Number=Sing|Person=3|Gender=Neut|PronType=Prs
-we PRP Number=Plur|Person=1|PronType=Prs
-they PRP Number=Plur|Person=3|PronType=Prs
-me PRP Number=Sing|Person=1|PronType=Prs
-him PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs
-her PRP Number=Sing|Person=3|Gender=Fem|PronType=Prs
-us PRP Number=Plur|Person=1|PronType=Prs
-them PRP Number=Plur|Person=3|PronType=Prs
+we PRP Number=Plur|Person=1|PronType=Prs|Case=Nom
+they PRP Number=Plur|Person=3|PronType=Prs|Case=Nom
+me PRP Number=Sing|Person=1|PronType=Prs|Case=Acc
+him PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs|Case=Acc
+her PRP Number=Sing|Person=3|Gender=Fem|PronType=Prs|Case=Acc
+us PRP Number=Plur|Person=1|PronType=Prs|Case=Acc
+them PRP Number=Plur|Person=3|PronType=Prs|Case=Acc
 my PRP$ Number=Sing|Person=1|Poss=Yes|PronType=Prs
 mine PRP$ Number=Sing|Person=1|Poss=Yes|PronType=Prs
 your PRP$ Person=2|Poss=Yes|PronType=Prs
@@ -39,24 +37,70 @@ our PRP$ Number=Plur|Person=1|Poss=Yes|PronType=Prs
 ours PRP$ Number=Plur|Person=1|Poss=Yes|PronType=Prs
 their PRP$ Number=Plur|Person=3|Poss=Yes|PronType=Prs
 theirs PRP$ Number=Plur|Person=3|Poss=Yes|PronType=Prs
-myself PRP Number=Sing|Person=1|Reflex=Yes|PronType=Prs
-yourself PRP Person=2|Reflex=Yes|PronType=Prs
-himself PRP Number=Sing|Person=3|Reflex=Yes|Gender=Masc|PronType=Prs
-herself PRP Number=Sing|Person=3|Reflex=Yes|Gender=Fem|PronType=Prs
-itself PRP Number=Sing|Person=3|Reflex=Yes|Gender=Neut|PronType=Prs
-ourselves PRP Number=Plur|Person=1|Reflex=Yes|PronType=Prs
-themselves PRP Number=Plur|Person=3|Reflex=Yes|PronType=Prs
+myself PRP Number=Sing|Person=1|PronType=Prs
+yourself PRP Number=SingPerson=2|PronType=Prs
+himself PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs
+herself PRP Number=Sing|Person=3|Gender=Fem|PronType=Prs
+itself PRP Number=Sing|Person=3|Gender=Neut|PronType=Prs
+ourselves PRP Number=Plur|Person=1|PronType=Prs
+yourselves PRP Number=Plur|Person=2|PronType=Prs
+themselves PRP Number=Plur|Person=3|PronType=Prs
 the DT Definite=Def|PronType=Art
 a DT Definite=Ind|PronType=Art
 an DT Definite=Ind|PronType=Art
-some DT Definite=Ind|PronType=Art
-any DT Definite=Ind|PronType=Art
 this DT PronType=Dem|Number=Sing
 that DT PronType=Dem|Number=Sing
 these DT PronType=Dem|Number=Plur
 those DT PronType=Dem|Number=Plur
-
-
-
-
-
+here RB PronType=Dem
+there RB PronType=Dem
+then RB PronType=Dem
+whose WP$ Poss=Yes
+hard RB Degree=Pos
+fast RB Degree=Pos
+late RB Degree=Pos
+long RB Degree=Pos
+high RB Degree=Pos
+easy RB Degree=Pos
+early RB Degree=Pos
+far RB Degree=Pos
+soon RB Degree=Pos
+low RB Degree=Pos
+close RB Degree=Pos
+well RB Degree=Pos
+badly RB Degree=Pos
+little RB Degree=Pos
+harder RBR Degree=Cmp
+faster RBR Degree=Cmp
+later RBR Degree=Cmp
+longer RBR Degree=Cmp
+higher RBR Degree=Cmp
+easier RBR Degree=Cmp
+quicker RBR Degree=Cmp
+earlier RBR Degree=Cmp
+further RBR Degree=Cmp
+farther RBR Degree=Cmp
+sooner RBR Degree=Cmp
+slower RBR Degree=Cmp
+lower RBR Degree=Cmp
+closer RBR Degree=Cmp
+better RBR Degree=Cmp
+worse RBR Degree=Cmp
+less RBR Degree=Cmp
+hardest RBS Degree=Sup
+fastest RBS Degree=Sup
+latest RBS Degree=Sup
+longest RBS Degree=Sup
+highest RBS Degree=Sup
+easiest RBS Degree=Sup
+quickest RBS Degree=Sup
+earliest RBS Degree=Sup
+furthest RBS Degree=Sup
+farthest RBS Degree=Sup
+soonest RBS Degree=Sup
+slowest RBS Degree=Sup
+lowest RBS Degree=Sup
+closest RBS Degree=Sup
+best RBS Degree=Sup
+worst RBS Degree=Sup
+least RBS Degree=Sup
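Reviewer note: the entries in feature_map.txt appear to follow a three-column layout (word form or `*`, POS tag, `|`-separated `Feature=Value` pairs). The Python sketch below is not CoreNLP code; it only illustrates one way such lines could be parsed and sanity-checked. The helper names (`parse_feature_line`, `lookup`, `malformed`) are hypothetical, and the rule that a word-specific entry takes precedence over the `*` wildcard is an assumption about how the map is consumed.

```python
def parse_feature_line(line):
    """Split one 'word TAG Feat=Val|Feat=Val' line into (word, tag, features)."""
    word, tag, feats = line.split()
    pairs = dict(item.split("=", 1) for item in feats.split("|"))
    return word, tag, pairs


def lookup(table, word, tag):
    """Assumed precedence: word-specific entry first, then the '*' wildcard."""
    return table.get((word.lower(), tag)) or table.get(("*", tag), {})


def malformed(features):
    """Return feature names whose value still contains '=' (likely a missing '|')."""
    return [name for name, value in features.items() if "=" in value]


# A few lines copied from the diff above.
SAMPLE = """\
* VBD VerbForm=Fin|Mood=Ind|Tense=Past
was VBD VerbForm=Fin|Mood=Ind|Tense=Past|Number=Sing
he PRP Number=Sing|Person=3|Gender=Masc|PronType=Prs|Case=Nom
yourself PRP Number=SingPerson=2|PronType=Prs
"""

table = {}
for raw in SAMPLE.splitlines():
    word, tag, feats = parse_feature_line(raw)
    table[(word, tag)] = feats

print(lookup(table, "was", "VBD"))            # word-specific entry
print(lookup(table, "ran", "VBD"))            # falls back to '* VBD'
print(malformed(table[("yourself", "PRP")]))  # flags the 'Number' value
```

A check along the lines of `malformed` flags the added `yourself` entry, where a `|` between `Number=Sing` and `Person=2` seems to be missing.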

data/edu/stanford/nlp/upos/ENUniversalPOS.tsurgeon

+19 -10
@@ -23,12 +23,12 @@
 % Context-sensitive mappings
 %
 % TO -> PART (in CONJP phrases)
-CONJP < TO=target < VB
+@CONJP < TO=target < VB

 relabel target PART

 % TO -> PART
-VP < VP < (/^TO$/=target <... {/.*/})
+@VP < @VP < (/^TO$/=target <... {/.*/})

 relabel target PART

@@ -37,13 +37,22 @@ TO=target <... {/.*/}

 relabel target ADP

-% delete the next two rules, third one should also cover them
-%
-% VB -> AUX (passive, case 1)
-%VP < (/^VB/=target < /^(?i:am|is|are|r|be|being|'s|'re|'m|was|were|been|s|ai|m|art|ar|wase|get|got|getting|gets|gotten)$/ ) < (VP|ADJP [ < VBN|VBD | < (VP|ADJP < VBN|VBD) < CC ] )
-%
-%relabel target AUX
-%
+
+% VB.* -> AUX (for passives where main verb is part of an ADJP)
+@VP < (/^VB/=target < /^(?i:am|is|are|r|be|being|'s|'re|'m|was|were|been|s|ai|m|art|ar|wase|get|got|getting|gets|gotten)$/ ) < (@ADJP [ < VBN|VBD | < (@VP|ADJP < VBN|VBD) < CC ] )
+
+relabel target AUX
+
+% VB.* -> AUX (for cases with fronted main VPs)
+@SINV < (@VP < (/^VB/=target < /^(?i:am|is|are|r|be|being|'s|'re|'m|was|were|been|s|ai|m|art|ar|wase)$/ ) $-- (@VP < VBD|VBN))
+
+relabel target AUX
+
+% VB.* -> AUX (another, rarer case of fronted VPs)
+@SINV < (@VP=target < (@VP < (/^VB/=target < /^(?i:am|is|are|r|be|being|'s|'re|'m|was|were|been|s|ai|m|art|ar|wase)$/ )) $-- (@VP < VBD|VBN))
+
+relabel target AUX
+
 % VB.* -> AUX (passive, case 2)
 %SQ|SINV < (/^VB/=target < /^(?i:am|is|are|r|be|being|'s|'re|'m|was|were|been|s|ai|m|art|ar|wase)$/ $++ (VP < VBD|VBN))
 %
@@ -55,7 +64,7 @@ VP < VP < (/^VB.*$/=target <... {/.*/})
 relabel target AUX

 % VB -> AUX (active, case 2)
-SQ|SINV < (/^VB/=target $++ /^(?:VP|ADJP)/ <... {/.*/})
+@SQ|SINV < (/^VB/=target $++ /^(?:VP|ADJP)/ <... {/.*/})

 relabel target AUX

doc/classify/README.txt

+2 -4
@@ -1,4 +1,4 @@
-Stanford Classifier v3.5.2 - 2015-04-18
+Stanford Classifier v3.5.2 - 2015-04-20
 -------------------------------------------------

 Copyright (c) 2003-2012 The Board of Trustees of
@@ -76,9 +76,7 @@ LICENSE
 CHANGES
 -------------------------

-2015-04-18 3.5.2 trial classifier dist
-
-2015-04-17 3.5.2 classifier trial
+2015-04-20 3.5.2 Update for compatibility

 2015-01-29 3.5.1 New input/output options, support for GloVe
 word vectors

doc/corenlp/README.txt

+2 -12
@@ -42,18 +42,8 @@ LICENSE
 CHANGES
 ---------------------------------

-2015-04-18 3.5.2 trial core
-
-2015-04-17 3.5.2 trial core
-
-2015-04-17 3.5.2 trial core
-
-2015-04-17 3.5.2 trial core
-
-2015-04-17 3.5.2 trial core
-
-2015-04-20 3.5.2 Switch to Universal dependencies, add Chinese
-coreference system
+2015-04-20 3.5.2 Switch to Universal dependencies, add Chinese
+coreference systemCore NLP

 2015-01-29 3.5.1 NER, dependency parser, SPIED improvements;
 general bugfixes

doc/corenlp/corenlp.sh

+4 -2
@@ -4,12 +4,14 @@
 # Simple uses for xml and plain text output to files are:
 # ./corenlp.sh -file filename
 # ./corenlp.sh -file filename -outputFormat text
+# Split into sentences, run POS tagger and NER, write CoNLL-style TSV file:
+# ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,ner -outputFormat conll -file input.txt
 # You can also start a simple shell where you can enter sentences to be processed:
 # ./corenlp.sh

 OS=`uname`
-# Macs (BSD) don't support readlink -e
-if [ "$OS" == "Darwin" ]; then
+# Some machines (older OS X, BSD, Windows environments) don't support readlink -e
+if hash readlink 2>/dev/null; then
 scriptdir=`dirname $0`
 else
 scriptpath=$(readlink -e "$0") || scriptpath=$0

doc/corenlp/pom-full.xml

+2 -2
@@ -14,8 +14,8 @@
 </license>
 </licenses>
 <scm>
-<url>http://nlp.stanford.edu/software/stanford-corenlp-2015-04-18.zip</url>
-<connection>http://nlp.stanford.edu/software/stanford-corenlp-2015-04-18.zip</connection>
+<url>http://nlp.stanford.edu/software/stanford-corenlp-2015-04-21.zip</url>
+<connection>http://nlp.stanford.edu/software/stanford-corenlp-2015-04-21.zip</connection>
 </scm>
 <developers>
 <developer>

doc/lexparser/README.txt

+2 -6
@@ -1,4 +1,4 @@
-Stanford Lexicalized Parser v3.5.2 - 2015-04-18
+Stanford Lexicalized Parser v3.5.2 - 2015-04-20
 -----------------------------------------------

 Copyright (c) 2002-2015 The Board of Trustees of The Leland Stanford Junior
@@ -224,11 +224,7 @@ LICENSE
 CHANGES
 ---------------------------------

-2015-04-18 3.5.2 trial lex
-
-2015-04-17 3.5.2 trial lex
-
-2015-04-20 3.5.2 Switch to universal dependencies
+2015-04-20 3.5.2 Switch to universal dependencies

 2015-01-29 3.5.1 Dependency parser improvements; general
 bugfixes

doc/lexparser/pom.xml

+2 -2
@@ -14,8 +14,8 @@
 </license>
 </licenses>
 <scm>
-<url>http://nlp.stanford.edu/software/stanford-parser-2015-04-18.zip</url>
-<connection>http://nlp.stanford.edu/software/stanford-parser-2015-04-18.zip</connection>
+<url>http://nlp.stanford.edu/software/stanford-parser-2015-04-20.zip</url>
+<connection>http://nlp.stanford.edu/software/stanford-parser-2015-04-20.zip</connection>
 </scm>
 <developers>
 <developer>

doc/ner/README.txt

+74 -15
@@ -1,4 +1,4 @@
-Stanford NER - v3.5.2 - 2015-04-17
+Stanford NER - v3.5.2 - 2015-04-20
 ----------------------------------------------

 This package provides a high-performance machine learning based named
@@ -68,7 +68,10 @@ The nodistsim versions of the same models may be available on the
 Stanford NER webpage.

 Finally, we have models for other languages, include two German models,
-a Chinese model, and a Spanish model.
+a Chinese model, and a Spanish model. The files for these models can be
+found at:
+
+http://nlp.stanford.edu/software/CRF-NER.shtml


 QUICKSTART INSTRUCTIONS
@@ -108,6 +111,74 @@ automatically started, and you will also be given the option (under the

 java -mx1000m -jar stanford-ner.jar

+USING FULL STANFORD CORENLP NER FUNCTIONALITY
+
+This standalone distribution also allows access to the full NER
+capabilities of the Stanford CoreNLP pipeline. These capabilities
+can be accessed via the NERClassifierCombiner class.
+
+NERClassifierCombiner allows for multiple CRF's to be layered together,
+and has options for recognizing numeric sequence patterns and time
+patterns with Stanford CoreNLP's SUTime.
+
+Suppose one combines three CRF's CRF-1,CRF-2, and CRF-3 with the
+NERClassifierCombiner. When the NERClassiferCombiner runs, it will
+first apply the NER tags of CRF-1 to the text, then it will apply
+CRF-2's NER tags to any tokens not tagged by CRF-1 and so on. If
+the option ner.combinationMode is set to NORMAL (default), any label
+applied by CRF-1 cannot be applied by subsequent CRF's. For instance
+if CRF-1 applies the LOCATION tag, no other CRF's LOCATION tag will be
+used. If ner.combinationMode is set to HIGH_RECALL, this limitation
+will be deactivated.
+
+To use NERClassifierCombiner at the command-line, the jars in lib
+and stanford-ner.jar must be in the CLASSPATH. Here is an example command:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
+-ner.useSUTime false -textFile sample-w-time.txt
+
+Let's break this down a bit. The flag "-ner.model" should be followed by a
+list of CRF's to be combined by the NERClassifierCombiner. Some serialized
+CRF's are provided in the classifiers directory. In this example the CRF's
+trained on the CONLL 4 class data and the MUC 7 class data are being combined.
+
+When the flag "-ner.useSUTime" is followed by "false", SUTime is shut off. You should
+note that when the "false" is omitted, the text "4 days ago" suddenly is
+tagged with DATE. These are the kinds of phrases SUTime can identify.
+
+NERClassifierCombiner can be run on different types of input as well. Here is
+an example which is run on CONLL style input:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
+-map word=0,answer=1 -testFile sample-conll-file.txt
+
+It is crucial to include the "-map word=0,answer=1" , which is specifying that
+the input test file has the words in the first column and the answer labels
+in the second column.
+
+It is also possible to serialize and load an NERClassifierCombiner.
+
+This command loads the three sample crfs with combinationMode=HIGH_RECALL
+and SUTime=false, and dumps them to a file named
+test_serialized_ncc.ncc.ser.gz
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -ner.model \
+classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz,\
+classifiers/english.all.3class.distsim.crf.ser.gz -ner.useSUTime false \
+-ner.combinationMode HIGH_RECALL -serializeTo test.serialized.ncc.ncc.ser.gz
+
+An example serialized NERClassifierCombiner with these settings is supplied in
+the classifiers directory. Here is an example of loading that classifier and
+running it on the sample CONLL data:
+
+java -mx2g edu.stanford.nlp.ie.NERClassifierCombiner -loadClassifier \
+classifiers/example.serialized.ncc.ncc.ser.gz -map word=0,answer=1 \
+-testFile sample-conll-file.txt
+
+For a more exhaustive description of NERClassifierCombiner go to
+http://nlp.stanford.edu/software/ncc-faq.shtml

 PROGRAMMATIC USE

@@ -165,19 +236,7 @@ PERSON ORGANIZATION LOCATION
 CHANGES
 --------------------

-2015-04-17 3.5.2 trial ner
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
-
-2015-04-17 3.5.2 synch standalone and CoreNLP functionality
+2015-04-20 3.5.2 synch standalone and CoreNLP functionality

 2015-01-29 3.5.1 Substantial accuracy improvements
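Reviewer note: the new README section describes how `ner.combinationMode` changes the way layered CRFs are merged. The Python toy below follows only the README's wording (earlier taggers win on a token; in NORMAL mode a label already applied by an earlier tagger is not taken from a later one). It works per token on plain tag lists, the function name `combine` is hypothetical, and it should not be read as the NERClassifierCombiner implementation.

```python
def combine(tag_layers, mode="NORMAL"):
    """Merge per-token tags from several taggers, earliest tagger first.

    tag_layers: list of equal-length tag lists; "O" means untagged.
    mode: "NORMAL" or "HIGH_RECALL", mirroring the README's option values.
    """
    merged = list(tag_layers[0])
    for layer in tag_layers[1:]:
        # Labels already applied by earlier taggers (per the README wording).
        used = {tag for tag in merged if tag != "O"}
        for i, tag in enumerate(layer):
            if merged[i] != "O" or tag == "O":
                continue  # token already tagged, or nothing new to add
            if mode == "NORMAL" and tag in used:
                continue  # NORMAL: later taggers may not reuse earlier labels
            merged[i] = tag
    return merged


crf1 = ["PERSON", "O", "O", "LOCATION", "O"]
crf2 = ["PERSON", "O", "LOCATION", "LOCATION", "DATE"]

print(combine([crf1, crf2], mode="NORMAL"))
print(combine([crf1, crf2], mode="HIGH_RECALL"))
```

In this toy, the third token keeps `O` under NORMAL because the first tagger already applied LOCATION elsewhere, while HIGH_RECALL accepts the second tagger's LOCATION for it.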

doc/ner/sample-conll-file.txt

+9
@@ -0,0 +1,9 @@
+John PERSON
+Kerry PERSON
+will O
+fly O
+to O
+Paris LOCATION
+this O
+weekend O
+. O
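Reviewer note: the sample file pairs each token with a gold label, matching the `-map word=0,answer=1` option in the README changes. As a rough illustration of that column mapping (not CoreNLP's reader), the Python sketch below parses the mapping string and groups consecutive identically labelled tokens into entities. The function names are hypothetical, and treating whitespace as the column separator is an assumption.

```python
from itertools import groupby


def parse_map(spec):
    """Turn 'word=0,answer=1' into {'word': 0, 'answer': 1}."""
    return {name: int(col) for name, col in (part.split("=") for part in spec.split(","))}


def read_columns(text, spec="word=0,answer=1"):
    """Read column-formatted text into one dict per token."""
    cols = parse_map(spec)
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append({name: fields[idx] for name, idx in cols.items()})
    return rows


def entities(rows):
    """Group consecutive tokens that share a non-'O' answer label."""
    found = []
    for label, group in groupby(rows, key=lambda row: row["answer"]):
        if label != "O":
            found.append((" ".join(row["word"] for row in group), label))
    return found


# Contents of doc/ner/sample-conll-file.txt as added in this commit.
SAMPLE = """\
John PERSON
Kerry PERSON
will O
fly O
to O
Paris LOCATION
this O
weekend O
. O
"""

rows = read_columns(SAMPLE)
print(entities(rows))
```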
