
Commit e87a53a

Vivek Srikumar authored and Stanford NLP committed
Merge branch 'bioprocess' of origin into bioprocess
1 parent 313e9ec commit e87a53a


138 files changed: +12631 −10320 lines changed

Large commits have some content hidden by default; some of the changed files are therefore not shown in full below.


CONTRIBUTING.md

+3-3
@@ -10,7 +10,7 @@ In order for us to continue to be able to dual-license Stanford CoreNLP, we need
 Therefore, we can accept contributions on any of the following terms:
 * If your contribution is a bug fix of 6 lines or less of new code, we will accept it on the basis that both you and us regard the contribution as de minimis, and not requiring further hassle.
 * You can declare that the contribution is in the public domain (in your commit message or pull request).
-* You can make your contribution available under a non-restrictive open source licensing, such as the Revised (or 3-clause) BSD license, with appropriate licensing information included with the submitted code.
-* You can sign and return to us a contributor license agreement, explicitly licensing us to be able to use the code. Contact us at: [email protected] .
+* You can make your contribution available under a non-restrictive open source license, such as the Revised (or 3-clause) BSD license, with appropriate licensing information included with the submitted code.
+* You can sign and return to us a contributor license agreement (CLA), explicitly licensing us to be able to use the code. You can find these agreements at http://nlp.stanford.edu/software/CLA/ . You can send them to us or contact us at: [email protected] .
 
-You should do development against our master branch. You should make sure that all unit tests still pass. (In general, you will not be able to run our integration tests, since they rely on resources in our filesystem.)
+You should do development against our master branch. The project's source code is in utf-8 character encoding. You should make sure that all unit tests still pass. (In general, you will not be able to run our integration tests, since they rely on resources in our filesystem.)
+20-20
@@ -1,20 +1,20 @@
-5 Iris-setosa Iris-setosa 0.995615365125735
-4.6 Iris-setosa Iris-setosa 0.9994804135630505
-5.1 Iris-setosa Iris-setosa 0.9937095680980086
-4.9 Iris-setosa Iris-setosa 0.9905109629700247
-5.4 Iris-setosa Iris-setosa 0.9982151488134486
-4.4 Iris-setosa Iris-setosa 0.9944214428148407
-5.3 Iris-setosa Iris-setosa 0.9984497925740373
-6.1 Iris-versicolor Iris-versicolor 0.8873152482428373
-6 Iris-versicolor Iris-versicolor 0.9424246013278404
-5.5 Iris-versicolor Iris-versicolor 0.9030026595536319
-6.5 Iris-versicolor Iris-versicolor 0.928816167001929
-6.8 Iris-versicolor Iris-versicolor 0.9569376555329442
-6.2 Iris-versicolor Iris-versicolor 0.9857141927233324
-6.7 Iris-virginica Iris-virginica 0.9698639532763317
-6.4 Iris-virginica Iris-virginica 0.8982390073296296
-5.7 Iris-virginica Iris-virginica 0.9920401400173403
-6.7 Iris-virginica Iris-virginica 0.968576539063806
-6.8 Iris-virginica Iris-virginica 0.9957320369272686
-7.7 Iris-virginica Iris-virginica 0.9900526044768513
-7.3 Iris-virginica Iris-virginica 0.9766204287594443
+5 Iris-setosa Iris-setosa 0.996 0.996
+4.6 Iris-setosa Iris-setosa 0.999 0.999
+5.1 Iris-setosa Iris-setosa 0.994 0.994
+4.9 Iris-setosa Iris-setosa 0.991 0.991
+5.4 Iris-setosa Iris-setosa 0.998 0.998
+4.4 Iris-setosa Iris-setosa 0.994 0.994
+5.3 Iris-setosa Iris-setosa 0.998 0.998
+6.1 Iris-versicolor Iris-versicolor 0.887 0.887
+6 Iris-versicolor Iris-versicolor 0.942 0.942
+5.5 Iris-versicolor Iris-versicolor 0.903 0.903
+6.5 Iris-versicolor Iris-versicolor 0.929 0.929
+6.8 Iris-versicolor Iris-versicolor 0.957 0.957
+6.2 Iris-versicolor Iris-versicolor 0.986 0.986
+6.7 Iris-virginica Iris-virginica 0.970 0.970
+6.4 Iris-virginica Iris-virginica 0.898 0.898
+5.7 Iris-virginica Iris-virginica 0.992 0.992
+6.7 Iris-virginica Iris-virginica 0.969 0.969
+6.8 Iris-virginica Iris-virginica 0.996 0.996
+7.7 Iris-virginica Iris-virginica 0.990 0.990
+7.3 Iris-virginica Iris-virginica 0.977 0.977
+20-20
@@ -1,20 +1,20 @@
-5 Iris-setosa Iris-setosa 0.9919247137755053
-4.6 Iris-setosa Iris-setosa 0.9988153870786971
-5.1 Iris-setosa Iris-setosa 0.9893228231715544
-4.9 Iris-setosa Iris-setosa 0.9835318845429561
-5.4 Iris-setosa Iris-setosa 0.9960427411240634
-4.4 Iris-setosa Iris-setosa 0.9910859075339642
-5.3 Iris-setosa Iris-setosa 0.9965862883009643
-6.1 Iris-versicolor Iris-versicolor 0.8468902641192759
-6 Iris-versicolor Iris-versicolor 0.9307517829994151
-5.5 Iris-versicolor Iris-versicolor 0.7982164305911292
-6.5 Iris-versicolor Iris-versicolor 0.873020490772672
-6.8 Iris-versicolor Iris-versicolor 0.9142958840729118
-6.2 Iris-versicolor Iris-versicolor 0.9691329948474605
-6.7 Iris-virginica Iris-virginica 0.9514065325627161
-6.4 Iris-virginica Iris-virginica 0.8326970803989662
-5.7 Iris-virginica Iris-virginica 0.9861478471561218
-6.7 Iris-virginica Iris-virginica 0.9281387678310443
-6.8 Iris-virginica Iris-virginica 0.9869791941203433
-7.7 Iris-virginica Iris-virginica 0.980694494307154
-7.3 Iris-virginica Iris-virginica 0.9555631398239129
+5 Iris-setosa Iris-setosa 0.992 0.992
+4.6 Iris-setosa Iris-setosa 0.999 0.999
+5.1 Iris-setosa Iris-setosa 0.989 0.989
+4.9 Iris-setosa Iris-setosa 0.984 0.984
+5.4 Iris-setosa Iris-setosa 0.996 0.996
+4.4 Iris-setosa Iris-setosa 0.991 0.991
+5.3 Iris-setosa Iris-setosa 0.997 0.997
+6.1 Iris-versicolor Iris-versicolor 0.847 0.847
+6 Iris-versicolor Iris-versicolor 0.931 0.931
+5.5 Iris-versicolor Iris-versicolor 0.798 0.798
+6.5 Iris-versicolor Iris-versicolor 0.873 0.873
+6.8 Iris-versicolor Iris-versicolor 0.914 0.914
+6.2 Iris-versicolor Iris-versicolor 0.969 0.969
+6.7 Iris-virginica Iris-virginica 0.951 0.951
+6.4 Iris-virginica Iris-virginica 0.833 0.833
+5.7 Iris-virginica Iris-virginica 0.986 0.986
+6.7 Iris-virginica Iris-virginica 0.928 0.928
+6.8 Iris-virginica Iris-virginica 0.987 0.987
+7.7 Iris-virginica Iris-virginica 0.981 0.981
+7.3 Iris-virginica Iris-virginica 0.956 0.956
+9-9
@@ -1,15 +1,15 @@
 CONLL EVAL SUMMARY (Before COREF)
-Identification of Mentions: Recall: (12407 / 14291) 86.81% Precision: (12407 / 34999) 35.44% F1: 50.34%
+Identification of Mentions: Recall: (12407 / 14291) 86.81% Precision: (12407 / 34999) 35.44% F1: 50.34%
 
 CONLL EVAL SUMMARY (After COREF)
-METRIC muc:Coreference: Recall: (6260 / 10539) 59.39% Precision: (6260 / 10027) 62.43% F1: 60.87%
-METRIC bcub:Coreference: Recall: (12379.37 / 18298) 67.65% Precision: (13598.84 / 18298) 74.31% F1: 70.83%
-METRIC ceafm:Coreference: Recall: (10894 / 18298) 59.53% Precision: (10894 / 18298) 59.53% F1: 59.53%
-METRIC ceafe:Coreference: Recall: (3811.5 / 7759) 49.12% Precision: (3811.5 / 8271) 46.08% F1: 47.55%
-METRIC blanc:Coreference links: Recall: (25257 / 54427) 46.4% Precision: (25257 / 40544) 62.29% F1: 53.18%
-Non-coreference links: Recall: (922975 / 938262) 98.37% Precision: (922975 / 952145) 96.93% F1: 97.64%
-BLANC: Recall: (0.72 / 1) 72.38% Precision: (0.8 / 1) 79.61% F1: 75.41%
+METRIC muc:Coreference: Recall: (6256 / 10539) 59.36% Precision: (6256 / 10078) 62.07% F1: 60.68%
+METRIC bcub:Coreference: Recall: (12462.33 / 18385) 67.78% Precision: (13629.92 / 18385) 74.13% F1: 70.81%
+METRIC ceafm:Coreference: Recall: (10928 / 18385) 59.43% Precision: (10928 / 18385) 59.43% F1: 59.43%
+METRIC ceafe:Coreference: Recall: (3832.95 / 7846) 48.85% Precision: (3832.95 / 8307) 46.14% F1: 47.45%
+METRIC blanc:Coreference links: Recall: (25245 / 54427) 46.38% Precision: (25245 / 40608) 62.16% F1: 53.12%
+Non-coreference links: Recall: (932068 / 947431) 98.37% Precision: (932068 / 961250) 96.96% F1: 97.66%
+BLANC: Recall: (0.72 / 1) 72.38% Precision: (0.8 / 1) 79.56% F1: 75.39%
 
-Final conll score ((muc+bcub+ceafe)/3) = 59.75
+Final conll score ((muc+bcub+ceafe)/3) = 59.65
 Final score (pairwise) Precision = 0.57
 done
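
As a quick consistency check of the formula in this expected output, the updated final CoNLL score is the mean of the updated muc, bcub, and ceafe F1 values: (60.68 + 70.81 + 47.45) / 3 ≈ 59.65, while the previous values give (60.87 + 70.83 + 47.55) / 3 = 59.75, matching the old line.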

test/src/edu/stanford/nlp/neural/EmbeddingTest.java → itest/src/edu/stanford/nlp/neural/EmbeddingITest.java

+4-5
@@ -13,11 +13,10 @@
  *
  */
 
-public class EmbeddingTest {
-  public static final String PREFIX = "projects/core/";
-  public static final String wordVectorFile = PREFIX + "data/edu/stanford/nlp/neural/wordVector.txt";
-  public static final String wordFile = PREFIX + "data/edu/stanford/nlp/neural/word.txt";
-  public static final String vectorFile = PREFIX + "data/edu/stanford/nlp/neural/vector.txt";
+public class EmbeddingITest {
+  public static final String wordVectorFile = "edu/stanford/nlp/neural/wordVector.txt";
+  public static final String wordFile = "edu/stanford/nlp/neural/word.txt";
+  public static final String vectorFile = "edu/stanford/nlp/neural/vector.txt";
 
   @Test
   public void testLoadFromOneFile() {
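
The renamed integration test changes its data-file constants from projects/core/data/... paths to the shorter relative paths edu/stanford/nlp/neural/.... As a minimal, hypothetical sketch of how such a relative path can be resolved with standard Java APIs, checking the working directory first and falling back to the classpath (this helper is illustrative only and not part of CoreNLP):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not the project's actual loader.
public class DataPathResolver {
  public static InputStream open(String relativePath) throws IOException {
    Path onDisk = Paths.get(relativePath);
    if (Files.exists(onDisk)) {
      // Found relative to the current working directory (e.g. a source checkout).
      return Files.newInputStream(onDisk);
    }
    // Otherwise fall back to the classpath.
    InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream(relativePath);
    if (in == null) {
      throw new IOException("Could not find data file: " + relativePath);
    }
    return in;
  }
}

For example, DataPathResolver.open(wordVectorFile) would locate edu/stanford/nlp/neural/wordVector.txt wherever the integration-test environment provides it.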

itest/src/edu/stanford/nlp/pipeline/DeterministicCorefAnnotatorITest.java

+2-1
@@ -12,12 +12,12 @@
 import edu.stanford.nlp.ling.CoreAnnotations;
 import edu.stanford.nlp.ling.CoreLabel;
 import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
-import edu.stanford.nlp.dcoref.CorefCoreAnnotations;
 import edu.stanford.nlp.util.CoreMap;
 
 public class DeterministicCorefAnnotatorITest extends TestCase {
   private static AnnotationPipeline pipeline;
 
+  @Override
   public void setUp() throws Exception {
     synchronized(DeterministicCorefAnnotatorITest.class) {
       pipeline = new AnnotationPipeline();
@@ -131,4 +131,5 @@ public static void main(String[] args) throws Exception {
     DeterministicCorefAnnotatorITest itest = new DeterministicCorefAnnotatorITest();
     itest.testDeterministicCorefAnnotator();
   }
+
 }
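
Besides dropping a duplicated import, the substantive edit here is the @Override annotation on setUp(): the class extends JUnit 3's TestCase, whose setUp() runs before each test, and the annotation asks the compiler to confirm the method really overrides the superclass version. A small generic sketch of that pattern (illustrative only, with hypothetical names, not CoreNLP code):

import junit.framework.TestCase;

public class ExamplePipelineITest extends TestCase {
  private static Object pipeline;  // stands in for a shared AnnotationPipeline

  @Override
  public void setUp() throws Exception {
    // JUnit 3 invokes setUp() before every test method; @Override lets the
    // compiler catch a signature typo that would silently stop it overriding.
    synchronized (ExamplePipelineITest.class) {
      if (pipeline == null) {
        pipeline = new Object();  // hypothetical one-time expensive setup
      }
    }
  }

  public void testPipelineIsInitialized() {
    assertNotNull(pipeline);
  }
}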

scripts/makeSerialized.csh

+1-1
@@ -145,7 +145,7 @@ java -mx1500m edu.stanford.nlp.parser.lexparser.LexicalizedParser -evals "factDA
 # This now works
 ( echo "Running xinhuaFactored from serialized (check without specifying -tLPP) on $host -server" ; time java -server -mx1800m edu.stanford.nlp.parser.lexparser.LexicalizedParser -evals "factDA,tsv" -maxLength 40 -loadFromSerializedFile xinhuaFactored.ser.gz -test $ctb 001-025 ) >>& ./serializedParsers.log
 
-( echo "Running chinesePCFG (simplified for use in the RNN parser) on $host -server" ; time java -server -mx4g edu.stanford.nlp.parser.lexparser.LexicalizedParser -evals "factDA,tsv" -tLPP edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams -chineseFactored -PCFG -compactGrammar 0 -saveToSerializedFile chinesePCFG-simple.ser.gz -maxLength 40 -train $ctb7train -test $ctb7test ) >>& ./serializedParsers.log
+( echo "Running chinesePCFG (simplified for use in the RNN parser) on $host -server" ; time java -server -mx4g edu.stanford.nlp.parser.lexparser.LexicalizedParser -evals "factDA,tsv" -tLPP edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams -chineseFactored -PCFG -hMarkov 1 -nomarkNPconj -compactGrammar 0 -saveToSerializedFile chinesePCFG.simple.ser.gz -maxLength 40 -train $ctb7train -test $ctb7test ) >>& ./serializedParsers.log
 
 # German Factored binary from Negra (version 2)
 # $negra 3 is the dev set

scripts/pos-tagger/Makefile

+72
@@ -0,0 +1,72 @@
+# TODO: is there some way to make all of the targets use the same command?
+
+ARABIC_TEST = format=TREES,/u/nlp/data/lexparser/trees/Arabic/2-Unvoc-Test.utf8.txt
+
+CHINESE_TEST = format=TREES,/u/nlp/data/chinese/ctb7/test.mrg
+
+ENGLISH_TEST = /u/nlp/data/pos-tagger/english/test-wsj-22-24
+
+FRENCH_TEST = format=TREES,/u/nlp/data/lexparser/trees/French/FTB-Test.utf8.txt
+
+GERMAN_TEST = /u/nlp/data/pos-tagger/german/german-dev.txt
+
+.SECONDEXPANSION:
+
+all: arabic chinese english french german testing wsj
+.PHONY: all arabic chinese english french german testing wsj
+
+arabic: arabic.tagger arabic-train.tagger
+
+# we release an arabic model trained on everything, with a
+# corresponding model on train only for testing purposes
+arabic.tagger arabic-train.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(ARABIC_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(ARABIC_TEST) -verboseResults false >> $@.out 2>&1
+
+chinese: chinese-distsim.tagger chinese-nodistsim.tagger
+
+chinese-nodistsim.tagger chinese-distsim.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(CHINESE_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(CHINESE_TEST) -verboseResults false >> $@.out 2>&1
+
+english: english-bidirectional-distsim.tagger english-caseless-left3words-distsim.tagger english-left3words-distsim.tagger
+
+english-bidirectional-distsim.tagger english-caseless-left3words-distsim.tagger english-left3words-distsim.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(ENGLISH_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(ENGLISH_TEST) -verboseResults false >> $@.out 2>&1
+
+french: french.tagger
+
+french.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(FRENCH_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(FRENCH_TEST) -verboseResults false >> $@.out 2>&1
+
+german: german-dewac.tagger german-fast.tagger german-fast-caseless.tagger german-hgc.tagger
+
+german-dewac.tagger german-fast.tagger german-fast-caseless.tagger german-hgc.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(GERMAN_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(GERMAN_TEST) -verboseResults false >> $@.out 2>&1
+
+testing: testing.tagger
+
+testing.tagger:
+	@echo Training $@
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+
+wsj: wsj-0-18-bidirectional-distsim.tagger wsj-0-18-bidirectional-nodistsim.tagger wsj-0-18-caseless-left3words-distsim.tagger wsj-0-18-left3words-distsim.tagger wsj-0-18-left3words-nodistsim.tagger
+
+wsj-0-18-bidirectional-distsim.tagger wsj-0-18-bidirectional-nodistsim.tagger wsj-0-18-caseless-left3words-distsim.tagger wsj-0-18-left3words-distsim.tagger wsj-0-18-left3words-nodistsim.tagger: $$@.props
+	@echo Training $@
+	@echo Will test on $(ENGLISH_TEST)
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -props $@.props > $@.out 2>&1
+	java -mx6g edu.stanford.nlp.tagger.maxent.MaxentTagger -model $@ -testFile $(ENGLISH_TEST) -verboseResults false >> $@.out 2>&1
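
A note on how this new Makefile wires up its targets: the .SECONDEXPANSION directive makes GNU Make expand prerequisites a second time, so the escaped $$@ in rules such as "arabic.tagger arabic-train.tagger: $$@.props" resolves to the target name, giving each .tagger target a dependency on its own .props file. Running, for example, "make english" would build english-bidirectional-distsim.tagger, english-caseless-left3words-distsim.tagger, and english-left3words-distsim.tagger, training each from its matching props file and then evaluating it on $(ENGLISH_TEST). This presumably assumes the corresponding .props files sit next to the Makefile and that the /u/nlp/... test sets are reachable, which is likely only true inside the Stanford NLP filesystem.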
