grammar Simple;
options {language=JavaScript;}
tokens {
    RCURLY='}';
}

@lexer::members {
    this.JAVADOC_CHANNEL = 1;
    this.nesting = 0;
}
/** This example is meant to illustrate how ANTLR can handle so-called "island
* grammars", which are just embedded languages. I chose a particularly
* nasty problem. A simple programming language with javadoc-style
* comments with the usual embedded @author tag but also with embedded
* Simple code actions in curlies such as {x=3}. Now that is a stupid
* thing maybe from a language design point of view, but is hard because it's
* a language (Simple) embedded within another language (JavaDoc) embedded
* within the outermost language (Simple) again. See the input file.
* So the Simple lexer invokes the javadoc lexer which invokes the Simple
* lexer again. The key seems to be returning an EOF token when you
* see the "final" token.
*
* This example is made nasty further by using valid characters of Simple
* (the curlies) to delimit it inside the Javadoc comments. The problem is
* that '}' may be a regular curly inside a Simple statement or it could
* be the signal that the embedded action is over. You must count the
* curly nesting level to decide if it's time to stop the embedded action.
* I consider every "island grammar input chunk" such as javadoc or
* embedded Simple statements as a separate "file". So, when I hit the
* last delimiter token that says to bail out, I just return EOF. Then
* there is no need for an explicit stack of input streams.
*
* Finally, this grammar illustrates how to share input streams as all
* the grammars pull from the same input stream.
*
* A key point to notice is that since I create a new token stream when
* I go off to recognize javadoc comments, the lookahead for the Simple
* parser is not messed up in any way.
*/
program : (variable)*
          (method)+
        ;

variable: 'int' ID ('=' expr)? ';'
        ;

method  : 'method' ID '(' ')' {print("enter method "+$ID.text);}
          block
        ;

block   : '{'
            (variable)*
            (statement)+
          '}'
        ;

statement
        : ID '=' expr ';' {print("assignment to "+$ID.text);}
        | 'return' expr ';'
        | block
        ;

expr    : ID
        | INT
        ;

ID      : ('a'..'z'|'A'..'Z')+ ;

INT     : ('0'..'9')+ ;

WS      : (' '|'\t'|'\n')+ {$channel=HIDDEN;}
        ;

LCURLY  : '{' {this.nesting++;}
        ;
/** If we have a '}' with nesting level 0 then it must match the '{'
* (unseen by this grammar) that started an embedded Simple statement
* block within a javadoc comment.
*/
RCURLY  : '}'
          {
          if ( this.nesting<=0 ) {
              this.emit(org.antlr.runtime.Token.EOF_TOKEN);
              print("exiting embedded simple");
          }
          else {
              this.nesting--;
          }
          }
        ;
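// Worked trace of the nesting counter used by LCURLY and RCURLY above.
// This is only a sketch: it assumes the opening '{' of an embedded action
// was consumed before this lexer was started on the shared input stream
// (it is "unseen by this grammar"), so nesting begins at 0.
//
//   embedded text:  x = 3 ; }
//     x = 3 ;   ID '=' INT ';'   nesting stays 0
//     }         RCURLY           nesting<=0, so EOF is emitted
//
//   embedded text:  { y = 1 ; } }
//     {         LCURLY           nesting 0 -> 1
//     y = 1 ;   ID '=' INT ';'   nesting stays 1
//     }         RCURLY           nesting 1 -> 0, ordinary '}' token
//     }         RCURLY           nesting<=0, so EOF is emitted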
JAVADOC : '/**'
          {
          // create a new javadoc lexer/parser duo that feeds
          // off the current input stream
          print("enter javadoc");
          var j = new JavadocLexer(this.input);
          var tokens = new org.antlr.runtime.CommonTokenStream(j);
          tokens.discardTokenType(JavadocLexer.prototype.WS);
          var p = new JavadocParser(tokens);
          p.comment();
          // returns a JAVADOC token to the Simple parser but on a
          // different channel than the normal token stream so it
          // doesn't get in the way.
          $channel = this.JAVADOC_CHANNEL;
          }
        ;