-
Notifications
You must be signed in to change notification settings - Fork 0
/
pragmas.html
276 lines (230 loc) · 7.82 KB
/
pragmas.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>New Pragmas in the BNF Converter</title>
</head>
<body bgcolor="#ffffff" text="#000000">
<h1>New Pragmas in the BNF Converter</h1>
<p>
By Aarne Ranta, September 19, 2003.
<p>
This document is a supplement to the BNFC documentation, aimed to explain
the new pragmas introduced in versions 1.3 and 1.9b. Its content will
be incorporated in the BNFC report later.
<p>
<h2>New pragmas in BNFC version 1.3</h2>
These pragmas do not add to the expressive power of BNFC, but are just
shorthands for groups of rules.
<h3>Terminators and separators</h3>
The <tt>terminator</tt> pragma defines a pair of list rules by what
token terminates each element in the list. For instance,
<pre>
terminator Stm ";" ;
</pre>
tells that each statement (<tt>Stm</tt>) is terminated with a semicolon
(<tt>;</tt>). It is a shorthand for the pair of rules
<pre>
[]. [Stm] ::= ;
(:). [Stm] ::= Stm ";" [Stm] ;
</pre>
The qualifier <tt>nonempty</tt> in the pragma makes one-element list
to be the base case. Thus
<pre>
terminator nonempty Stm ";" ;
</pre>
is shorthand for
<pre>
(:[]). [Stm] ::= Stm ";" ;
(:). [Stm] ::= Stm ";" [Stm] ;
</pre>
The terminator can be specified as empty <tt>""</tt>. No token is
introduced then, but e.g.
<pre>
terminator Stm "" ;
</pre>
is translated to
<pre>
[]. [Stm] ::= ;
(:). [Stm] ::= Stm [Stm] ;
</pre>
<p>
The <tt>separator</tt> pragma is similar to <tt>terminator</tt>,
except that the separating token is not attached to the last
element. Thus
<pre>
separator Stm ";" ;
</pre>
means
<pre>
[]. [Stm] ::= ;
(:[]). [Stm] ::= Stm ;
(:). [Stm] ::= Stm ";" [Stm] ;
</pre>
whereas
<pre>
separator nonempty Stm ";" ;
</pre>
means
<pre>
(:[]). [Stm] ::= Stm ;
(:). [Stm] ::= Stm ";" [Stm] ;
</pre>
Notice that, if the empty token <tt>""</tt> is used, there
is no difference between <tt>terminator</tt> and <tt>separator</tt>.
<p>
<b>Problem</b>.
The grammar generated from
a <tt>separator</tt> without <tt>nonempty</tt>
will actually
also accept a list terminating with a semicolon, whereas
the pretty printer "normalizes" it away. This might be considered
as a bug, but a set of rules
forbidding the terminating semicolon would be much more
complicated. The <tt>nonempty</tt> case is strict.
<h3>Coercions</h3>
The <tt>coercions</tt> pragma is a shorthand for a group of rules
translating between precedence levels. For instance,
<pre>
coercions Exp 3 ;
</pre>
is shorthand for
<pre>
_. Exp ::= Exp1 ;
_. Exp1 ::= Exp2 ;
_. Exp2 ::= Exp3 ;
_. Exp3 ::= "(" Exp ")" ;
</pre>
Because of the total coverage of these coercions,
it does not matter in practice if the integer indicating the highest level
(here <tt>3</tt>) is bigger than the highest level actually occurring,
or if there are some other levels without productions in the grammar.
<h3>Rules</h3>
The <tt>rules</tt> pragma is a shorthand for a set of rules from which
labels are generated automatically. For instance,
<pre>
rules Type ::= "int" | "float" | "double" | "long" ;
</pre>
is shorthand for
<pre>
Type_int. Type ::= "int" ;
Type_float. Type ::= "float" ;
Type_double. Type ::= "double" ;
Type_long. Type ::= "long" ;
</pre>
The labels are created automatically. If the production has just
one item, the label looks natural. If it is longer, the type
name indexed with an integer is used. No global checks are
performed when generating these labels. Any label name clashes that
result from them are captured by BNFC type checking on the generated
rules.
<h2>New pragmas in BNFC version 1.9b: layout syntax</h2>
Those who do not know what layout syntax is or who do not like
it can skip this section.
<p>
These new pragmas define a <b>layout syntax</b> for a language.
Before these pragmas were added, layout syntax was not definable in
BNFC. The layout pragmas are only
available for the files generated for Haskell-related tools;
if Java or C++ programmers want to handle layout, they can use
the Haskell layout resolver as a preprocessor to their
front end, before the lexer. In Haskell, the layout resolver
appears, automatically, in its most natural place, which is between the
lexer and the parser. The layout pragmas of BNFC are not
powerful enough to handle the full layout rule of Haskell 98,
but they suffice for the "regular" cases.
<p>
Here is an example, found in the
grammar <tt>layout/Alfa2.cf</tt>.
<pre>
layout "of", "let", "where", "sig", "struct" ;
</pre>
The first line says that <tt>"of", "let", "where", "sig", "struct"</tt>
are <b>layout words</b>, i.e. start a
<b>layout list</b>. A layout list is a list of expressions
normally enclosed in curly brackets and separated by semicolons,
as shown by the Alfa example
<pre>
ECase. Exp ::= "case" Exp "of" "{" [Branch] "}" ;
separator Branch ";" ;
</pre>
When the layout resolver finds the token <tt>of</tt> in the code
(i.e. in the sequence of its lexical tokens), it
checks if the next token is an opening curly bracket. If it
is, nothing special is done until a layout word is encountered again.
The parser will expect the semicolons and the closing bracket
to appear as usual.
<p>
But, if the token <i>t</i>
following <tt>of</tt> is not an opening curly bracket,
a bracket is inserted, and the start column of <i>t</i>
is remembered as the position at which the elements of the
layout list must begin. Semicolons are inserted at those
positions. When a token is eventually encountered left
of the position of <i>t</i> (or an end-of-file), a closing bracket
is inserted at that point.
<p>
Nested layout blocks are allowed, which means that the layout
resolver maintains a stack of positions. Pushing a position
on the stack corresponds to inserting a left bracket, and
popping from the stack corresponds to inserting a right bracket.
<p>
Here is an example of an Alfa source file using layout:
<pre>
c :: Nat = case x of
True -> b
False -> case y of
False -> b
Neither -> d
d = case x of True -> case y of False -> g
x -> b
y -> h
</pre>
Here is what it looks like after layout resolution:
<pre>
c :: Nat = case x of {
True -> b
;False -> case y of {
False -> b
};Neither -> d
};d = case x of {True -> case y of {False -> g
;x -> b
};y -> h} ;
</pre>
<b>Hint</b>. It is good practice to start a new line after
any layout word, to guarantee alpha convertibility.
For instance, if you change the variable name <tt>x</tt>
to <tt>foo</tt>, the second definition above becomes
syntactically incorrect, whereas the first one remains correct.
<p>
There are two more layout-related pragmas. The <tt>layout stop</tt>
pragma, as in
<pre>
layout stop "in" ;
</pre>
tells the resolver that the layout list can be exited with
some stop words, like <tt>in</tt>, which exits a
<tt>let</tt> list. It is no error in the resolver to exit
some other kind of layout list with <tt>in</tt>, but an error will show up
in the parser.
<p>
The <tt>layout toplevel</tt> pragma tells that the whole source file
is a layout list, even though no layout word indicates this.
The position is the first column, and the resolver adds a semicolon
after every paragraph whose first token is at this position. No curly
brackets are added. The Alfa file above is an example of this, with
two such semicolons added.
<p>
To make layout resolution a stand-alone program, e.g. to serve as a
preprocessor, the programmer can modify the file
<tt>layout/ResolveLayoutAlfa.hs</tt> as indicated in the file,
and either compile it or run it in the Hugs interpreter by
<pre>
runhugs ResolveLayoutX.hs <X-source-file>
</pre>
We may add the generation of <tt>ResolveLayoutX.hs</tt>
to a later version of BNFC.
<p>
<b>Bug</b>. The generated layout resolver does not work correctly
if a layout word is the first token on a line.
</body>
</html>