.\" $FreeBSD$
.\"
.TH FLEX 1 "April 1995" "Version 2.5"
.SH NAME
flex \- fast lexical analyzer generator
.SH SYNOPSIS
.B flex
.B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
.B [\-\-help \-\-version]
.I [filename ...]
.SH OVERVIEW
This manual describes
.I flex,
a tool for generating programs that perform pattern-matching on text. The
manual includes both tutorial and reference sections:
.nf

    Description
        a brief overview of the tool

    Some Simple Examples

    Format Of The Input File

    Patterns
        the extended regular expressions used by flex

    How The Input Is Matched
        the rules for determining what has been matched

    Actions
        how to specify what to do when a pattern is matched

    The Generated Scanner
        details regarding the scanner that flex produces;
        how to control the input source

    Start Conditions
        introducing context into your scanners, and
        managing "mini-scanners"

    Multiple Input Buffers
        how to manipulate multiple input sources; how to
        scan from strings instead of files

    End-of-file Rules
        special rules for matching the end of the input

    Miscellaneous Macros
        a summary of macros available to the actions

    Values Available To The User
        a summary of values available to the actions

    Interfacing With Yacc
        connecting flex scanners together with yacc parsers

    Options
        flex command-line options, and the "%option"
        directive

    Performance Considerations
        how to make your scanner go as fast as possible

    Generating C++ Scanners
        the (experimental) facility for generating C++
        scanner classes

    Incompatibilities With Lex And POSIX
        how flex differs from AT&T lex and the POSIX lex
        standard

    Diagnostics
        those error messages produced by flex (or scanners
        it generates) whose meanings might not be apparent

    Files
        files used by flex

    Deficiencies / Bugs
        known problems with flex

    See Also
        other documentation, related tools

    Author
        includes contact information

.fi
.SH DESCRIPTION
.I flex
is a tool for generating
.I scanners:
programs which recognize lexical patterns in text.
.I flex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate. The description is in
the form of pairs
of regular expressions and C code, called
.I rules. flex
generates as output a C source file,
.B lex.yy.c,
which defines a routine
.B yylex().
This file is compiled and linked with the
.B \-ll
library to produce an executable. When the executable is run,
it analyzes its input for occurrences
of the regular expressions. Whenever it finds one, it executes
the corresponding C code.
.SH SOME SIMPLE EXAMPLES
First, some simple examples to give the flavor of how one uses
.I flex.
The following
.I flex
input specifies a scanner which, whenever it encounters the string
"username", replaces it with the user's login name:
.nf

    %%
    username    printf( "%s", getlogin() );

.fi
By default, any text not matched by a
.I flex
scanner
is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence
of "username" expanded.
In this input, there is just one rule. "username" is the
.I pattern
and the "printf" is the
.I action.
The "%%" marks the beginning of the rules.
.PP
Here's another simple example:
.nf

    %{
            int num_lines = 0, num_chars = 0;
    %}

    %%
    \\n      ++num_lines; ++num_chars;
    .       ++num_chars;

    %%
    int main( void )
    {
        yylex();
        printf( "# of lines = %d, # of chars = %d\\n",
                num_lines, num_chars );
        return 0;
    }

.fi
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts). The line enclosed in "%{" and "%}"
declares two globals, "num_lines" and "num_chars", which are accessible
both inside
.B yylex()
and in the
.B main()
routine declared after the second "%%". There are two rules, one
which matches a newline ("\\n") and increments both the line count and
the character count, and one which matches any character other than
a newline (indicated by the "." regular expression).
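.PP
For comparison, the work done by these two rules can be written by hand
in plain C. The following is only a sketch of the counting logic, not
the code that flex generates; the names "struct counts" and
"count_input" are hypothetical and do not come from flex.
.nf

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: not part of flex. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Hand-written equivalent of the two rules: every character bumps
 * num_chars, and a newline additionally bumps num_lines. */
struct counts count_input(const char *text)
{
    struct counts c = { 0, 0 };

    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == '\n')
            ++c.num_lines;
        ++c.num_chars;
    }
    return c;
}
```

.fi
The scanner reaches the same totals without an explicit loop: the
newline rule and the "." rule between them match every character of
the input exactly once.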
.PP
A somewhat more complicated example:
.nf

    /* scanner for a toy Pascal-like language */

    %{
    /* need this for the calls to atoi() and atof() below */
    #include <stdlib.h>
    %}

    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*

    %%

    {DIGIT}+    {
                printf( "An integer: %s (%d)\\n", yytext,
                        atoi( yytext ) );
                }

    {DIGIT}+"."{DIGIT}*        {
                printf( "A float: %s (%g)\\n", yytext,
                        atof( yytext ) );
                }

    if|then|begin|end|procedure|function        {
                printf( "A keyword: %s\\n", yytext );
                }

    {ID}        printf( "An identifier: %s\\n", yytext );

    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );

    "{"[^}\\n]*"}"     /* eat up one-line comments */

    [ \\t\\n]+          /* eat up whitespace */

    .           printf( "Unrecognized character: %s\\n", yytext );

    %%

    int main( int argc, char **argv )
    {
        ++argv, --argc;  /* skip over program name */
        if ( argc > 0 )
            yyin = fopen( argv[0], "r" );
        else
            yyin = stdin;

        yylex();
        return 0;
    }

.fi
This is the beginning of a simple scanner for a language like
Pascal. It identifies different types of
.I tokens
and reports on what it has seen.
.PP
The details of this example will be explained in the following
sections.
.SH FORMAT OF THE INPUT FILE
The
.I flex
input file consists of three sections, separated by a line with just
.B %%
in it:
.nf
definitions
%%
rules
%%
user code
.fi
The
.I definitions
section contains declarations of simple
.I name
definitions to simplify the scanner specification, and declarations of
.I start conditions,
which are explained in a later section.
.PP
Name definitions have the form:
.nf
name definition
.fi
The "name" is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash).
The definition is taken to begin at the first non-white-space character
following the name and continuing to the end of the line.
The definition can subsequently be referred to using "{name}", which
will expand to "(definition)". For example,
.nf
DIGIT [0-9]
ID [a-z][a-z0-9]*
.fi
defines "DIGIT" to be a regular expression which matches a
single digit, and
"ID" to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.nf
{DIGIT}+"."{DIGIT}*
.fi
is identical to
.nf
([0-9])+"."([0-9])*
.fi
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.
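.PP
The substitution of "{name}" by "(definition)" can be pictured as plain
text replacement. The sketch below assumes a single definition and
ignores quoting, character classes and nested references, all of which
a real implementation has to respect; "expand_name" is a hypothetical
helper, not a flex function.
.nf

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace every "{name}" in pattern with "(definition)", writing the
 * result to out. Returns 0 on success, -1 if out is too small.
 * Hypothetical helper for illustration only. */
int expand_name(const char *pattern, const char *name,
                const char *definition, char *out, size_t outsize)
{
    size_t name_len = strlen(name);
    size_t used = 0;

    assert(pattern != NULL && out != NULL && outsize > 0);
    while (*pattern != '\0') {
        int n;

        if (pattern[0] == '{' &&
            strncmp(pattern + 1, name, name_len) == 0 &&
            pattern[1 + name_len] == '}') {
            /* A reference: emit the parenthesized definition. */
            n = snprintf(out + used, outsize - used, "(%s)", definition);
            pattern += name_len + 2;
        } else {
            /* Ordinary character: copy it through unchanged. */
            n = snprintf(out + used, outsize - used, "%c", *pattern);
            ++pattern;
        }
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;  /* output would have been truncated */
        used += (size_t)n;
    }
    out[used] = '\0';
    return 0;
}
```

.fi
The added parentheses are the important detail: they make an operator
that follows the reference, such as '+' or '*', apply to the whole
definition rather than only to its last element.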
.PP
The
.I rules
section of the
.I flex
input contains a series of rules of the form:
.nf
pattern action
.fi
where the pattern must be unindented and the action must begin
on the same line.
.PP
See below for a further description of patterns and actions.
.PP
Finally, the user code section is simply copied to
.B lex.yy.c
verbatim.
It is used for companion routines which call or are called
by the scanner. The presence of this section is optional;
if it is missing, the second
.B %%
in the input file may be skipped, too.
.PP
In the definitions and rules sections, any
.I indented
text or text enclosed in
.B %{
and
.B %}
is copied verbatim to the output (with the %{}'s removed).
The %{}'s must appear unindented on lines by themselves.
.PP
In the rules section,
any indented or %{} text appearing before the
first rule may be used to declare variables
which are local to the scanning routine and (after the declarations)
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.I POSIX
compliance; see below for other such features).
.PP
In the definitions section (but not in the rules section),
an unindented comment (i.e., a line
beginning with "/*") is also copied verbatim to the output up
to the next "*/".
.SH PATTERNS
The patterns in the input are written using an extended set of regular
expressions. These are:
.nf

    x          match the character 'x'
    .          any character (byte) except newline
    [xyz]      a "character class"; in this case, the pattern
                 matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches
                 an 'a', a 'b', any letter from 'j' through 'o',
                 or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character
                 but those in the class. In this case, any
                 character EXCEPT an uppercase letter.
    [^A-Z\\n]   any character EXCEPT an uppercase letter or
                 a newline
    r*         zero or more r's, where r is any regular expression
    r+         one or more r's
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    r{4}       exactly 4 r's
    {name}     the expansion of the "name" definition
                 (see above)
    "[xyz]\\"foo"
               the literal string: [xyz]"foo
    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                 then the ANSI-C interpretation of \\X.
                 Otherwise, a literal 'X' (used to escape
                 operators such as '*')
    \\0         a NUL character (ASCII code 0)
    \\123       the character with octal value 123
    \\x2a       the character with hexadecimal value 2a
    (r)        match an r; parentheses are used to override
                 precedence (see below)

    rs         the regular expression r followed by the
                 regular expression s; called "concatenation"

    r|s        either an r or an s

    r/s        an r but only if it is followed by an s. The
                 text matched by s is included when determining
                 whether this rule is the "longest match",
                 but is then returned to the input before
                 the action is executed. So the action only
                 sees the text matched by r. This type
                 of pattern is called "trailing context".
                 (There are some combinations of r/s that flex
                 cannot match correctly; see notes in the
                 Deficiencies / Bugs section below regarding
                 "dangerous trailing context".)
    ^r         an r, but only at the beginning of a line (i.e.,
                 when just starting to scan, or right after a
                 newline has been scanned).
    r$         an r, but only at the end of a line (i.e., just
                 before a newline). Equivalent to "r/\\n".
                 Note that flex's notion of "newline" is exactly
                 whatever the C compiler used to compile flex
                 interprets '\\n' as; in particular, on some DOS
                 systems you must either filter out \\r's in the
                 input yourself, or explicitly use r/\\r\\n for "r$".

    <s>r       an r, but only in start condition s (see
                 below for discussion of start conditions)
    <s1,s2,s3>r
               same, but in any of start conditions s1,
                 s2, or s3
    <*>r       an r in any start condition, even an exclusive one.

    <<EOF>>    an end-of-file
    <s1,s2><<EOF>>
               an end-of-file when in start condition s1 or s2

.fi
Note that inside of a character class, all regular expression operators
lose their special meaning except escape ('\\') and the character class
operators, '-', ']', and, at the beginning of the class, '^'.
.PP
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence. For example,
.nf
foo|bar*
.fi
is the same as
.nf
(foo)|(ba(r*))
.fi
since the '*' operator has higher precedence than concatenation,
and concatenation higher than alternation ('|'). This pattern
therefore matches
.I either
the string "foo"
.I or
the string "ba" followed by zero-or-more r's.
To match "foo" or zero-or-more "bar"'s, use:
.nf
foo|(bar)*
.fi
and to match zero-or-more "foo"'s-or-"bar"'s:
.nf
(foo|bar)*
.fi
.PP
In addition to characters and ranges of characters, character classes
can also contain character class
.I expressions.
These are expressions enclosed inside
.B [:
and
.B :]
delimiters (which themselves must appear between the '[' and ']' of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.nf
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
.fi
These expressions all designate a set of characters equivalent to
the corresponding standard C
.B isXXX
function. For example,
.B [:alnum:]
designates those characters for which
.B isalnum()
returns true, i.e., any alphabetic or numeric character.
Some systems don't provide
.B isblank(),
so flex defines
.B [:blank:]
as a blank or a tab.
.PP
For example, the following character classes are all equivalent:
.nf
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
.fi
If your scanner is case-insensitive (the
.B \-i
flag), then
.B [:upper:]
and
.B [:lower:]
are equivalent to
.B [:alpha:].
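.PP
The equivalence of the four character classes listed above can be
checked against the C library functions they are defined in terms of.
The sketch below assumes the default "C" locale and an ASCII character
set, and only examines 7-bit codes; "in_explicit_ranges" and
"alnum_classes_agree" are hypothetical names chosen for this example.
.nf

```c
#include <assert.h>
#include <ctype.h>

/* Hand-written membership test for the class [a-zA-Z0-9]. */
static int in_explicit_ranges(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Returns 1 if the three ways of spelling "alphanumeric" agree for
 * every character code below limit, and 0 otherwise. Only 7-bit codes
 * are checked, so limit may not exceed 128. */
int alnum_classes_agree(int limit)
{
    int c;

    assert(limit >= 0 && limit <= 128);
    for (c = 0; c < limit; ++c) {
        /* corresponds to [[:alnum:]] */
        int by_alnum = isalnum(c) != 0;
        /* corresponds to [[:alpha:][:digit:]] */
        int by_union = isalpha(c) != 0 || isdigit(c) != 0;

        if (by_alnum != by_union || by_alnum != in_explicit_ranges(c))
            return 0;
    }
    return 1;
}
```

.fi
In other locales the isXXX functions may accept additional characters,
in which case the explicit ranges are no longer guaranteed to agree
with the named classes.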
.PP
Some notes on patterns:
.IP -
A negated character class such as the example "[^A-Z]"
above
.I will match a newline
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]"). This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched.
Matching newlines means that a pattern like [^"]* can match the entire
input unless there's another quote in the input.
.IP -
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern and, like '/' and '$',
cannot be grouped inside parentheses. A '^' which does not occur at
the beginning of a rule or a '$' which does not occur at the end of
a rule loses its special properties and is treated as a normal character.
.IP
The following are illegal:
.nf
foo/bar$
<sc1>foo<sc2>bar
.fi
Note that the first of these can be written "foo/bar\\n".
.IP
The following will result in '$' or '^' being treated as a normal character:
.nf
foo|(bar$)
foo|^bar
.fi
If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
could be used (the special '|' action is explained below):
.nf
foo |
bar$ /* action goes here */
.fi
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
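.PP
The first note above, that a negated character class matches newlines
unless the newline is listed in it, can be imitated with the standard
strcspn() function. This is a minimal sketch of the matching behavior
only, under the assumption that the class is applied with a trailing
'*'; the two function names are hypothetical.
.nf

```c
#include <assert.h>
#include <string.h>

/* Length of the longest prefix of text matching [^"]*, i.e. the run of
 * characters up to (not including) the first double quote. A newline
 * is not special here, mirroring flex's negated character classes. */
size_t match_not_quote(const char *text)
{
    assert(text != NULL);
    return strcspn(text, "\"");
}

/* The same for [^"\n]*: the newline is listed in the class, so the
 * match stops at the end of the line. */
size_t match_not_quote_or_newline(const char *text)
{
    assert(text != NULL);
    return strcspn(text, "\"\n");
}
```

.fi
When the input contains no further quote, the first function returns
the length of everything that remains, which is the "match the entire
input" behavior the note warns about.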
.SH HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input looking
for strings which match any of its patterns. If it finds more than
one match, it takes the one matching the most text (for trailing
context rules, this includes the length of the trailing part, even
though it will then be returned to the input). If it finds two
or more matches of the same length, the
rule listed first in the
.I flex
input file is chosen.
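.PP
The selection rule just described (longest match first, earliest rule
on a tie) can be stated compactly in C. The sketch below assumes the
match length of every rule at the current position is already known
and that a length of 0 means "no match"; it is an illustration of the
rule, not a description of how flex is implemented, and "pick_rule" is
a hypothetical name.
.nf

```c
#include <assert.h>
#include <stddef.h>

/* Given the number of characters each rule can match at the current
 * input position (0 meaning "no match"), return the index of the rule
 * that wins: the longest match, with ties going to the rule listed
 * first. Returns -1 if no rule matches, which is the case handled by
 * the default rule. */
int pick_rule(const size_t *match_len, int num_rules)
{
    int best = -1;
    int i;

    assert(match_len != NULL && num_rules >= 0);
    for (i = 0; i < num_rules; ++i) {
        /* Strict '>' keeps the earlier rule when lengths are equal. */
        if (match_len[i] > 0 &&
            (best < 0 || match_len[i] > match_len[best]))
            best = i;
    }
    return best;
}
```

.fi
With the toy Pascal scanner shown earlier, the input "if" is matched
for two characters by both the keyword rule and the {ID} rule, so the
keyword rule, which is listed first, is chosen; for "iffy" the {ID}
rule matches four characters and wins on length.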
.PP
Once the match is determined, the text corresponding to the match
(called the
.I token)
is made available in the global character pointer
.B yytext,
and its length in the global integer
.B yyleng.
The
.I action
corresponding to the matched pattern is then executed (a more
detailed description of actions follows), and then the remaining
input is scanned for another match.
.PP
If no match is found, then the
.I default rule
is executed: the next character in the input is considered matched and
copied to the standard output. Thus, the simplest legal
.I flex
input is:
.nf
%%
.fi
which generates a scanner that simply copies its input (one character
at a time) to its output.
.PP
Note that
.B yytext
can be defined in two different ways: either as a character
.I pointer
or as a character
.I array.
You can control which definition
.I flex
uses by including one of the special directives
.B %pointer
or
.B %array
in the first (definitions) section of your flex input. The default is
.B %pointer,
unless you use the
.B \-l
lex compatibility option, in which case
.B yytext
will be an array.
The advantage of using
.B %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens (unless you run out of dynamic memory). The disadvantage
is that you are restricted in how your actions can modify
.B yytext
(see the next section), and calls to the
.B unput()
function destroy the present contents of
.B yytext,
which can be a considerable porting headache when moving between different
.I lex
versions.
.PP
The advantage of
.B %array
is that you can then modify
.B yytext
to your heart's content, and calls to
.B unput()
do not destroy
.B yytext
(see below). Furthermore, existing
.I lex
programs sometimes access
.B yytext
externally using declarations of the form:
.nf
extern char yytext[];
.fi
This definition is erroneous when used with
.B %pointer,
but correct for
.B %array.
.PP
.B %array
defines
.B yytext
to be an array of
.B YYLMAX
characters, which defaults to a fairly large value. You can change
the size by simply #define'ing
.B YYLMAX
to a different value in the first section of your
.I flex
input. As mentioned above, with
.B %pointer
yytext grows dynamically to accommodate large tokens. While this means your
.B %pointer
scanner can accommodate very large tokens (such as matching entire blocks
of comments), bear in mind that each time the scanner must resize
.B yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.B yytext
presently does
.I not
dynamically grow if a call to
.B unput()
results in too much text being pushed back; instead, a run-time error results.
.PP
Also note that you cannot use
.B %array
with C++ scanner classes
(the
.B c++
option; see below).
.SH ACTIONS
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement. The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action. If the
action is empty, then when the pattern is matched the input token
is simply discarded. For example, here is the specification for a program
which deletes all occurrences of "zap me" from its input:
.nf
%%
"zap me"
.fi
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.PP
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.nf
%%
[ \\t]+ putchar( ' ' );
[ \\t]+$ /* ignore this token */
.fi
.PP
If the action contains a '{', then the action spans till the balancing '}'
is found, and the action may cross multiple lines.
.I flex
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.B %{
and will consider the action to be all the text up to the next
.B %}
(regardless of ordinary braces inside the action).
.PP
An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule." See below for an illustration.
.PP
Actions can include arbitrary C code, including
.B return
statements to return a value to whatever routine called
.B yylex().
Each time
.B yylex()
is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.
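.PP
The way successive calls pick up where the previous one stopped can be
imitated with a small hand-written scanner. This is a sketch only: the
token codes, "scan_string" and "next_token" are hypothetical, and the
function stands in for yylex() rather than reproducing it.
.nf

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; 0 plays the role of "all done". */
enum { TOK_EOF = 0, TOK_NUMBER = 1, TOK_WORD = 2 };

static const char *scan_pos;  /* where the next call resumes */

/* Point the scanner at a new NUL-terminated input string. */
void scan_string(const char *text)
{
    assert(text != NULL);
    scan_pos = text;
}

/* Stand-in for yylex(): characters that are neither letters nor digits
 * are skipped without returning (like a rule with an empty action),
 * each number or word causes a return, and the end of the input
 * returns TOK_EOF. */
int next_token(void)
{
    assert(scan_pos != NULL);
    for (;;) {
        unsigned char c = (unsigned char)*scan_pos;

        if (c == '\0')
            return TOK_EOF;
        if (isdigit(c)) {
            while (isdigit((unsigned char)*scan_pos))
                ++scan_pos;
            return TOK_NUMBER;
        }
        if (isalpha(c)) {
            while (isalpha((unsigned char)*scan_pos))
                ++scan_pos;
            return TOK_WORD;
        }
        ++scan_pos;  /* no return here: keep scanning */
    }
}
```

.fi
A caller such as a parser would invoke next_token() in a loop until it
returns 0, which mirrors the usual way of driving yylex().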
.PP
Actions are free to modify
.B yytext
except for lengthening it (adding
characters to its end--these will overwrite later characters in the
input stream). This, however, does not apply when using
.B %array
(see above); in that case,
.B yytext
may be freely modified in any way.
.PP
Actions are free to modify
.B yyleng
except they should not do so if the action also includes use of
.B yymore()
(see below).
.PP
There are a number of special directives which can be included within
an action:
.IP -
.B ECHO
copies yytext to the scanner's output.
.IP -
.B BEGIN
followed by the name of a start condition places the scanner in the
corresponding start condition (see below).
.IP -
.B REJECT
directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input). The rule is chosen as described
above in "How the Input is Matched", and
.B yytext
and
.B yyleng
set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.I flex
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine special() whenever "frob" is seen:
.nf

            int word_count = 0;
    %%

    frob        special(); REJECT;
    [^ \\t\\n]+   ++word_count;

.fi
Without the
.B REJECT,
any "frob"'s in the input would not be counted as words, since the
scanner normally executes only one action per token.
Multiple uses of
.B REJECT
are allowed, each one finding the next best choice to the currently
active rule. For example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:
.nf
%%
a |
ab |
abc |
abcd ECHO; REJECT;
.|\\n /* eat up any unmatched character */
.fi
(The first three rules share the fourth's action since they use
the special '|' action.)
.B REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in
.I any
of the scanner's actions it will slow down
.I all
of the scanner's matching. Furthermore,
.B REJECT
cannot be used with the
.I \-Cf
or
.I \-CF
options (see below).
.IP
Note also that unlike the other special actions,
.B REJECT
is a
.I branch;
code immediately following it in the action will
.I not
be executed.
.IP -
.B yymore()
tells the scanner that the next time it matches a rule, the corresponding
token should be
.I appended
onto the current value of
.B yytext
rather than replacing it. For example, given the input "mega-kludge"
the following will write "mega-mega-kludge" to the output:
.nf
%%
mega- ECHO; yymore();
kludge ECHO;
.fi
First "mega-" is matched and echoed to the output. Then "kludge"
is matched, but the previous "mega-" is still hanging around at the
beginning of
.B yytext
so the
.B ECHO
for the "kludge" rule will actually write "mega-kludge".
.PP
Two notes regarding use of
.B yymore().
First,
.B yymore()
depends on the value of
.I yyleng
correctly reflecting the size of the current token, so you must not
modify
.I yyleng
if you are using
.B yymore().
Second, the presence of
.B yymore()
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.IP -
.B yyless(n)
returns all but the first
.I n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.B yytext
and
.B yyleng
are adjusted appropriately (e.g.,
.B yyleng
will now be equal to
.IR n ).
For example, on the input "foobar" the following will write out
"foobarbar":
.nf
%%
foobar ECHO; yyless(3);
[a-z]+ ECHO;
.fi
An argument of 0 to
.B yyless
will cause the entire current input string to be scanned again. Unless you've
changed how the scanner will subsequently process its input (using
.B BEGIN,
for example), this will result in an endless loop.
.PP
Note that
.B yyless
is a macro and can only be used in the flex input file, not from
other source files.
.IP -
.B unput(c)
puts the character
.I c
back onto the input stream. It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.nf

    {
    int i;
    /* Copy yytext because unput() trashes yytext */
    char *yycopy = strdup( yytext );
    unput( ')' );
    for ( i = yyleng - 1; i >= 0; --i )
        unput( yycopy[i] );
    unput( '(' );
    free( yycopy );
    }

.fi
Note that since each
.B unput()
puts the given character back at the
.I beginning
of the input stream, pushing back strings must be done back-to-front.
.PP
An important potential problem when using
.B unput()
is that if you are using
.B %pointer
(the default), a call to
.B unput()
.I destroys
the contents of
.I yytext,
starting with its rightmost character and devouring one character to
the left with each call. If you need the value of yytext preserved
after a call to
.B unput()
(as in the above example),
you must either first copy it elsewhere, or build your scanner using
.B %array
instead (see How The Input Is Matched).
.PP
Finally, note that you cannot put back
.B EOF
to attempt to mark the input stream with an end-of-file.
.IP -
.B input()
reads the next character from the input stream. For example,
the following is one way to eat up C comments:
.nf

    %%
    "/*"        {
                int c;

                for ( ; ; )
                    {
                    while ( (c = input()) != '*' &&
                            c != EOF )
                        ;    /* eat up text of comment */

                    if ( c == '*' )
                        {
                        while ( (c = input()) == '*' )
                            ;
                        if ( c == '/' )
                            break;    /* found the end */
                        }

                    if ( c == EOF )
                        {
                        error( "EOF in comment" );
                        break;
                        }
                    }
                }

.fi
(Note that if the scanner is compiled using
.B C++,
then
.B input()
is instead referred to as
.B yyinput(),
in order to avoid a name clash with the
.B C++
stream by the name of
.I input.)
.IP -
.B YY_FLUSH_BUFFER
flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token, it will
first refill the buffer using
.B YY_INPUT
(see The Generated Scanner, below). This action is a special case
of the more general
.B yy_flush_buffer()
function, described below in the section Multiple Input Buffers.
.IP -
.B yyterminate()
can be used in lieu of a return statement in an action. It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
By default,
.B yyterminate()
is also called when an end-of-file is encountered. It is a macro and
may be redefined.
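.PP
Of the directives above,
.B REJECT
is the one whose effect is hardest to follow on paper, so the "abcd"
example given for it is traced here in plain C. The sketch hard-codes
the four literal rules and the catch-all rule from that example and
walks the candidate matches from longest to shortest; it is an
assumption-laden model of the observable behavior, not the mechanism
flex uses, and "simulate_reject" is a hypothetical name.
.nf

```c
#include <assert.h>
#include <string.h>

/* Literal patterns of the four "ECHO; REJECT;" rules, in rule order. */
static const char *const literal_rules[] = { "a", "ab", "abc", "abcd" };
enum { NUM_LITERALS = 4 };

/* Simulate the example scanner on input, appending everything it would
 * ECHO to out (which must hold out_size bytes). Returns 0 on success,
 * -1 if out is too small. */
int simulate_reject(const char *input, char *out, size_t out_size)
{
    size_t used = 0;

    assert(input != NULL && out != NULL && out_size > 0);
    while (*input != '\0') {
        size_t len;

        /* Try candidate lengths from longest to shortest. Each literal
         * rule that matches echoes its text and then REJECTs, handing
         * control to the next-best match. */
        for (len = strlen(input); len >= 1; --len) {
            int i;

            for (i = 0; i < NUM_LITERALS; ++i) {
                if (strlen(literal_rules[i]) == len &&
                    strncmp(input, literal_rules[i], len) == 0) {
                    if (used + len >= out_size)
                        return -1;
                    memcpy(out + used, input, len);
                    used += len;
                }
            }
        }
        /* Last in line is the catch-all rule: it matches one character
         * and discards it, which ends the chain of REJECTs. */
        ++input;
    }
    out[used] = '\0';
    return 0;
}
```

.fi
For the input "abcd" the buffer ends up holding "abcdabcaba", the
output stated in the description of
.B REJECT.
The remaining characters "bcd" match none of the literal rules and are
silently consumed by the catch-all rule.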