Changes:
--------
(1) Commented out the loading of genrules.mrs in
mrs-initialization.lsp.
(1.1) Added the loading of mt.lsp and trigger.mtr in (a)script
(1.2) Commented out the loading of the type file mrsmunge.tdl
(1.3) Added the loading of the type file mtr.tdl
(3) Added new rules in trigger.mtr
(4) In globals.lsp
(4.1) Added a list of contentful things not to generate in
(setf *duplicate-lex-ids*
  '(;; s-end1-decl-lex - emphatic sentence enders
    ga-sap keredomo-send kedomo-send ga-sap kedo-send shi-send
    yo-2 yo-3 keredo-send exclamation-mark ze zo zo-2
    ;; s-end1-decl-minusahon-lex - emphatic sentence enders
    i-emp
    ;; variant forms of numbers (hankaku)
    zero_card_a one_card_a two_card_a three_card_a four_card_a
    five_card_a six_card_a seven_card_a eight_card_a nine_card_a
    ;; variant forms of numbers (zenkaku)
    zero_card one_card two_card three_card four_card
    five_card six_card seven_card eight_card nine_card
    ;; indefinite pronouns FIXME - improve semantics
    donna douiu dono-det
    ))
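The intended effect of such a list can be sketched in plain Common Lisp.
This is only an illustration, assuming the generator simply skips any
lexical id found in the list; *blocked-ids* and generable-ids are
hypothetical names, not LKB functions, and the candidate ids in the
example call are made up.

```lisp
;; Hypothetical stand-in for *duplicate-lex-ids* (a subset of the ids above).
(defparameter *blocked-ids* '(ga-sap yo-2 i-emp one_card_a))

;; Hypothetical helper, not an LKB function: keep only the candidate
;; lexical ids that are not blocked for generation.
(defun generable-ids (candidates &optional (blocked *blocked-ids*))
  (remove-if (lambda (id) (member id blocked)) candidates))

;; Example call with made-up candidate ids:
;; (generable-ids '(one_card one_card_a yo-1 yo-2))
;; => (ONE_CARD YO-1)
```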
(4.2)
;;;
;;; make generation faster
;;;
(setf *gen-packing-p* t)
(setf *gen-filtering-p* t)
(setf *packing-restrictor* '(RELS HCONS ORTH STEM RULE-NAME))
------------------------------------
(5) In MRS globals.lsp:
;;; Filter out uninformative information on the MRS attributes
;;; This will probably change soon
(setf %mrs-extras-filter%
  (list
   (cons (mrs::vsym "SORT") (mrs::vsym "semsort"))
   (cons (mrs::vsym "E.ASPECT") (mrs::vsym "aspect"))
   (cons (mrs::vsym "E.PASS") (mrs::vsym "bool"))))
;;; Fix defaults for unspecified attributes
(defparameter %mrs-extras-defaults%
  (list
   (list (vsym "E")
         (cons (vsym "E.ASPECT") (vsym "default_aspect"))
         (cons (vsym "E.PASS") (vsym "-")))))
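What "fixing defaults for unspecified attributes" is meant to achieve
can be sketched as follows. This is a standalone illustration, not the
LKB implementation: fill-extras-defaults is a hypothetical helper, and
plain strings stand in for the symbols that vsym would create.

```lisp
;; Same shape as %mrs-extras-defaults% above, with strings instead of vsyms.
(defparameter *example-defaults*
  '(("E" ("E.ASPECT" . "default_aspect") ("E.PASS" . "-"))))

;; Hypothetical helper: return EXTRAS (an alist of attribute . value)
;; extended with every default for VAR-TYPE that EXTRAS leaves unspecified.
(defun fill-extras-defaults (var-type extras
                             &optional (defaults *example-defaults*))
  (let ((result (copy-alist extras)))
    (dolist (default (cdr (assoc var-type defaults :test #'string=)) result)
      (unless (assoc (car default) result :test #'string=)
        (setf result (append result (list default)))))))

;; An event variable that specifies E.PASS but not E.ASPECT:
;; (fill-extras-defaults "E" '(("E.PASS" . "+")))
;; => (("E.PASS" . "+") ("E.ASPECT" . "default_aspect"))
```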
========================================================================
Strings whose first character is a tilde will be interpreted as Perl
regular expressions:
"~.*_s_"
Problem: The type "e" is used in the new genrules (mrs.tdl).
I will try to change "e" to "he".
(1) commented out vref in values.tdl (not used!)
this frees up "he"
(2) Changed postp-case "e" to "he" in values.tdl
(3) Changed "e" to "he" for e-main and e-sub
========================================================================
Problem: All classifiers become "ko"
Problem: can't distinguish between na-adj, sahen, etc
Strings whose first character is a tilde will be interpreted as Perl
regular expressions:
"~.*_s_"
-> should have access to the lexical types in the generator.
Non-generator problems: i-emp appearing on benkyou-suru, kirei, ...
Should I just list all the funny SAPs to reduce ambiguity? I have no
way to generate them when I want them now, ... (and I don't really
want them now).
Problem: progressive unifies with tense!
(do-parse-tty "勉強 し て いる") generates (do-parse-tty "勉強 する")
========================================================================
Reload generation rules:
(progn
  (mt:initialize-transfer)
  (mt:read-transfer-rules
   (list
    "~/svn/jacy/trigger.mtr")
   "Generator Trigger Rules"
   :filter nil :task :trigger :recurse nil :subsume nil))
========================================================================
Is there a clever way of checking changes beyond
(1) update - somewhat expensive
(2) reparse (where can I flop)?
Check against a gold standard to see if the desired parse is still
in there (compare detailed derivations?).
Can I do this for batches, ...
How much data should I be checking against, ...
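The gold-standard check could look roughly like this for a batch of
items. It is a minimal sketch in plain Common Lisp, assuming derivations
are available as strings keyed by an item id; missing-gold-items and
both data layouts are hypothetical, not an existing LKB or [incr tsdb()]
interface.

```lisp
;; GOLD:    alist of (item-id . gold-derivation-string)
;; RESULTS: alist of (item-id derivation-string ...) from the new parse run
;; Returns the ids whose gold derivation is no longer among the results.
(defun missing-gold-items (gold results)
  (loop for (id . derivation) in gold
        for readings = (cdr (assoc id results))
        unless (member derivation readings :test #'string=)
          collect id))

;; Example with made-up derivations: item 2 has lost its gold reading.
;; (missing-gold-items '((1 . "(a b)") (2 . "(c d)"))
;;                     '((1 "(a b)" "(a x)") (2 "(c y)")))
;; => (2)
```

An item that is missing from RESULTS altogether is also reported, since
it then has no readings to match against.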
------------------------------------------------------------------------
I see two things in generation:
(1) adding things in
need to check generation coverage
(2) fixing spurious ambiguity in the grammar
need to check analysis coverage
========================================================================
How to write out MRSs when parsing
(setf tsdb::*tsdb-semantix-hook* "mrs::get-mrs-string")
========================================================================
v2n-kata-rule is adding content in a deprecated manner. This means
that the rule is generated even though it adds content.
We can also constrain ga-no a bit more for different inflections.
Can we get rid of the ersatz thingies?
We should also constrain "you" etc a bit more
We should also constrain "donna" etc a bit more
FIXME [now commented out in *duplicate-lex-ids* ]
========================================================================
Why can we parse but not generate?
(do-parse-tty "犬 猫")
It makes noun compounds but then does not apply the n-infl....