Rule Reference

This page contains brief descriptions of all PEGTL rule and combinator classes.

The information about how much input a rule consumes applies only when the rule succeeds; the PEGTL is implemented on the assumption that rules never consume input when they do not succeed.

Remember that there are two failure modes, only the first of which usually leads to back-tracking:

  • Local failure or a return value of false, in which case the rule must rewind the input to the position at which the rule match was attempted.
  • Global failure or an exception (usually of type tao::pegtl::parse_error) that is usually generated by a control-class' raise() static member function.
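
The difference between the two failure modes can be illustrated with a small standalone sketch. It does not use the PEGTL; the cursor type and the functions match_literal and must_literal are hypothetical names introduced only for this illustration, and std::runtime_error stands in for the parse error exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parser input; not a PEGTL type.
struct cursor
{
   std::string data;
   std::size_t pos;
};

// Local failure: return false and rewind to where the match was attempted.
bool match_literal( cursor& in, const std::string& lit )
{
   const std::size_t saved = in.pos;
   for( const char c : lit ) {
      if( in.pos == in.data.size() || in.data[ in.pos ] != c ) {
         in.pos = saved;  // rewind before reporting the failure
         return false;
      }
      ++in.pos;
   }
   return true;
}

// Global failure: an exception propagates past all enclosing alternatives,
// so no back-tracking takes place unless something catches it.
void must_literal( cursor& in, const std::string& lit )
{
   if( !match_literal( in, lit ) ) {
      throw std::runtime_error( "expected '" + lit + "' at offset " + std::to_string( in.pos ) );
   }
}
```

With the input "abc", match_literal( in, "abd" ) returns false and leaves the position at 0, so a caller is free to try another alternative, whereas a failing must_literal call leaves that decision to whoever catches the exception.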

Equivalence

Some rule classes are said to be equivalent to a combination of other rules. These rules are not completely equivalent to the shown definition because that is not how they are implemented, therefore:

  • Rule equivalence is with regard to which inputs will match, but:
  • not with regard to which actions will be invoked while matching.

However, rule equivalence does show exactly where the raise<> rule is inserted and therefore which rule will be used to call the control class' raise().

Parameter Packs

The documentation will use (template parameter) packs when zero-or-more or one-or-more of a (template) parameter are allowed. For example seq< R... > accepts zero-or-more template parameters. In the zero case, i.e. seq<>, we describe R as "empty". When at least one parameter is given, i.e. seq< A > or seq< A, B, C >, R is "non-empty".

Meta Rules

These rules are in namespace tao::pegtl.

action< A, R... >
  • Equivalent to seq< R... >, but:
  • Uses the given class template A for actions.
  • Actions can still be disabled explicitly (via disable) or implicitly (via at or not_at).
control< C, R... >
  • Equivalent to seq< R... >, but:
  • Uses the given class template C as control class.
disable< R... >
  • Equivalent to seq< R... >, but:
  • Disables all actions.
discard
  • Equivalent to success, but:
  • Calls the input's discard() member function.
  • Must not be used where backtracking to before the discard might occur and/or nested within a rule for which an action with input can be called.
  • See Incremental Input for details.
enable< R... >
  • Equivalent to seq< R... >, but:
  • Enables all actions (if any).
require< Num >
  • Succeeds if at least Num further input bytes are available.
  • With Incremental Input, reads the bytes into the buffer.
state< S, R... >
  • Equivalent to seq< R... >, but:
  • Replaces all state arguments with a new instance s of type S.
  • s is constructed with the input and all previous states as arguments.
  • If seq< R... > succeeds then s.success() is called with the input after the match and all previous states as arguments, and, if expected, with A, M, Action, Control as template parameters.

Combinators

Combinators (or combinator rules) are rules that combine (other) rules into new ones.

These are the classical PEG combinator rules defined in namespace tao::pegtl.

at< R... >
  • PEG and-predicate &e
  • Succeeds if and only if seq< R... > would succeed.
  • Consumes nothing, i.e. rewinds after matching.
  • Disables all actions.
not_at< R... >
  • PEG not-predicate !e
  • Succeeds if and only if seq< R... > would not succeed.
  • Consumes nothing, i.e. rewinds after matching.
  • Disables all actions.
opt< R... >
  • PEG optional e?
  • Optional seq< R... >, i.e. attempt to match seq< R... > and signal success regardless of the result.
  • Equivalent to sor< seq< R... >, success >.
plus< R... >
  • PEG one-or-more e+
  • Matches seq< R... > as often as possible and succeeds if it matches at least once.
  • Equivalent to rep_min< 1, R... >.
  • R must be a non-empty rule pack.
seq< R... >
  • PEG sequence e1 e2
  • Sequence or conjunction of rules.
  • Matches the given rules R... in the given order.
  • Fails and stops matching when one of the given rules fails.
  • Consumes everything that the rules R... consumed.
  • Succeeds if R is an empty rule pack.
sor< R... >
  • PEG ordered choice e1 / e2
  • Choice or disjunction of rules.
  • Matches the given rules R... in the given order.
  • Succeeds and stops matching when one of the given rules succeeds.
  • Consumes whatever the first rule that succeeded consumed.
  • Fails if R is an empty rule pack.
star< R... >
  • PEG zero-or-more e*
  • Matches seq< R... > as often as possible and always succeeds.
  • R must be a non-empty rule pack.
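
The matching behaviour described for seq, sor, star and not_at can be modelled at run time with a few closures. The following is only a sketch under simplifying assumptions: it is not how the PEGTL implements these combinators (which are class templates resolved at compile time), it ignores actions and states, and the names cursor, rule and one are hypothetical helpers for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical run-time model; the PEGTL itself uses class templates.
struct cursor
{
   std::string data;
   std::size_t pos;
};

using rule = std::function< bool( cursor& ) >;

// Matches and consumes the single byte c.
rule one( const char c )
{
   return [ c ]( cursor& in ) -> bool {
      if( in.pos < in.data.size() && in.data[ in.pos ] == c ) {
         ++in.pos;
         return true;
      }
      return false;
   };
}

// Sequence: all rules in order; rewinds on failure; an empty pack succeeds.
rule seq( std::vector< rule > rs )
{
   return [ rs ]( cursor& in ) -> bool {
      const std::size_t saved = in.pos;
      for( const rule& r : rs ) {
         if( !r( in ) ) {
            in.pos = saved;
            return false;
         }
      }
      return true;
   };
}

// Ordered choice: the first rule that succeeds wins; an empty pack fails.
rule sor( std::vector< rule > rs )
{
   return [ rs ]( cursor& in ) -> bool {
      for( const rule& r : rs ) {
         if( r( in ) ) {
            return true;
         }
      }
      return false;
   };
}

// Zero-or-more: always succeeds. In this model r must consume input when it
// succeeds, otherwise the loop would not terminate.
rule star( rule r )
{
   return [ r ]( cursor& in ) -> bool {
      while( r( in ) ) {
      }
      return true;
   };
}

// Not-predicate: succeeds when r fails, and never consumes input.
rule not_at( rule r )
{
   return [ r ]( cursor& in ) -> bool {
      const std::size_t saved = in.pos;
      const bool matched = r( in );
      in.pos = saved;
      return !matched;
   };
}
```

For example, seq( { star( one( 'a' ) ), one( 'b' ), not_at( one( 'b' ) ) } ) applied to "aab" succeeds and consumes all three bytes, while seq( { one( 'a' ), one( 'b' ) } ) applied to "ac" fails and leaves the position at 0.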

Convenience

The PEGTL offers a variety of convenience rules that help to write concise grammars and that offer performance benefits over equivalent implementations built from the classical PEG combinators.

These rules are in namespace tao::pegtl.

if_must< R, S... >
  • Attempts to match R and depending on the result proceeds with either must< S... > or failure.
  • Equivalent to seq< R, must< S... > >.
  • Equivalent to if_then_else< R, must< S... >, failure >.
if_must_else< R, S, T >
  • Attempts to match R and depending on the result proceeds with either must< S > or must< T >.
  • Equivalent to if_then_else< R, must< S >, must< T > >.
if_then_else< R, S, T >
  • Equivalent to sor< seq< R, S >, seq< not_at< R >, T > >.
list< R, S >
  • Matches a non-empty list of R separated by S.
  • Equivalent to seq< R, star< S, R > >.
list< R, S, P >
  • Matches a non-empty list of R separated by S where each S can be padded by P.
  • Equivalent to seq< R, star< pad< S, P >, R > >.
list_must< R, S >
  • Matches a non-empty list of R separated by S.
  • Similar to list< R, S >, but if there is an S it must be followed by an R.
  • Equivalent to seq< R, star< if_must< S, R > > >.
list_must< R, S, P >
  • Matches a non-empty list of R separated by S where each S can be padded by P.
  • Similar to list< R, S, P >, but if there is an S it must be followed by an R.
  • Equivalent to seq< R, star< if_must< pad< S, P >, R > > >.
list_tail< R, S >
  • Matches a non-empty list of R separated by S with optional trailing S.
  • Equivalent to seq< list< R, S >, opt< S > >.
list_tail< R, S, P >
  • Matches a non-empty list of R separated by S with optional trailing S and padding P inside the list.
  • Equivalent to seq< list< R, S, P >, opt< star< P >, S > >.
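
The three list flavours differ only in how they treat a separator that is not followed by another element. The sketch below models this for the concrete case where R is a single ASCII digit and S is a comma, without padding. It is a standalone illustration that does not use the PEGTL; the function names are hypothetical, a return value of 0 stands for local failure, and std::runtime_error stands in for the global failure raised by must.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// True when there is an ASCII digit at offset p.
bool digit_at( const std::string& in, const std::size_t p )
{
   return p < in.size() && in[ p ] >= '0' && in[ p ] <= '9';
}

// Models list< digit, one< ',' > >; returns the number of bytes consumed,
// or 0 when not even one element matches.
std::size_t match_list( const std::string& in )
{
   if( !digit_at( in, 0 ) ) {
      return 0;
   }
   std::size_t p = 1;
   // A separator is only consumed together with the element that follows it.
   while( p < in.size() && in[ p ] == ',' && digit_at( in, p + 1 ) ) {
      p += 2;
   }
   return p;
}

// Models list_tail< digit, one< ',' > >: a trailing separator is consumed too.
std::size_t match_list_tail( const std::string& in )
{
   std::size_t p = match_list( in );
   if( p != 0 && p < in.size() && in[ p ] == ',' ) {
      ++p;
   }
   return p;
}

// Models list_must< digit, one< ',' > >: a separator without a following
// element is a global failure instead of the end of the list.
std::size_t match_list_must( const std::string& in )
{
   if( !digit_at( in, 0 ) ) {
      return 0;
   }
   std::size_t p = 1;
   while( p < in.size() && in[ p ] == ',' ) {
      if( !digit_at( in, p + 1 ) ) {
         throw std::runtime_error( "separator must be followed by an element" );
      }
      p += 2;
   }
   return p;
}
```

On the input "1,2," the list model consumes 3 bytes and leaves the trailing comma in the input, the list_tail model consumes all 4 bytes, and the list_must model throws.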
minus< M, S >
  • Succeeds if M matches, and S does not match all of the input that M matched.
  • Equivalent to rematch< M, not_at< S, eof > >.
must< R... >
  • Equivalent to seq< R... >, but:
  • Converts local failure of R... into global failure.
  • Calls raise< R > for the R that failed.
  • Equivalent to seq< sor< R, raise< R > >... >.
opt_must< R, S... >
  • Equivalent to opt< if_must< R, S... > >.
pad< R, S, T = S >
  • Matches an R that can be padded by arbitrarily many S on the left and T on the right.
  • Equivalent to seq< star< S >, R, star< T > >.
pad_opt< R, P >
  • Matches either an optional R padded by arbitrarily many P, or just arbitrarily many P.
  • Equivalent to seq< star< P >, opt< R, star< P > > >.
rematch< R, S... >
  • Succeeds if R matches, and each S matches the input that R matched.
  • Ignores all S for the grammar analysis.
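
Both rematch and minus apply a second rule to exactly the bytes that the first rule consumed. A typical use of minus is "any identifier except a particular reserved word". The sketch below models minus< identifier, string< 'i', 'f' > > with plain string operations; it is not PEGTL code, and identifier_length and identifier_minus are hypothetical names used only here.

```cpp
#include <cstddef>
#include <string>

// Length of a C-style identifier at the start of text, or 0 if there is none.
std::size_t identifier_length( const std::string& text )
{
   const auto first = []( const char c ) {
      return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ) || c == '_';
   };
   const auto other = [ & ]( const char c ) {
      return first( c ) || ( c >= '0' && c <= '9' );
   };
   if( text.empty() || !first( text[ 0 ] ) ) {
      return 0;
   }
   std::size_t n = 1;
   while( n < text.size() && other( text[ n ] ) ) {
      ++n;
   }
   return n;
}

// Models minus< identifier, string< ... > >: M is the identifier, S is the
// excluded string. Returns the bytes consumed, or 0 for local failure.
std::size_t identifier_minus( const std::string& text, const std::string& excluded )
{
   const std::size_t n = identifier_length( text );
   if( n == 0 ) {
      return 0;  // M did not match
   }
   // Second pass restricted to what M matched: fail only when S matches
   // all of it, which corresponds to not_at< S, eof >.
   if( text.substr( 0, n ) == excluded ) {
      return 0;
   }
   return n;
}
```

With the excluded string "if", the input "if(" fails because the identifier is exactly "if", while "iffy (" succeeds and consumes 4 bytes because "if" matches only a prefix of what the identifier rule matched.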
rep< Num, R... >
  • Matches seq< R... > exactly Num times without checking for further matches.
  • Equivalent to seq< seq< R... >, ..., seq< R... > > where seq< R... > is repeated Num times.
rep_max< Max, R... >
  • Matches seq< R... > for at most Max times and verifies that it doesn't match more often.
  • Equivalent to rep_min_max< 0, Max, R... >.
rep_min< Min, R... >
  • Matches seq< R... > as often as possible and succeeds if it matches at least Min times.
  • Equivalent to seq< rep< Min, R... >, star< R... > >.
  • R must be a non-empty rule pack.
rep_min_max< Min, Max, R... >
  • Matches seq< R... > for Min to Max times and verifies that it doesn't match more often.
  • Equivalent to seq< rep< Min, R... >, rep_opt< Max - Min, R... >, not_at< R... > >.
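
The trailing not_at< R... > is what distinguishes rep_min_max from a plain bounded repetition: one match too many makes the whole rule fail instead of being left in the input. The sketch below models this for a rule R that matches a single byte accepted by a predicate. It does not use the PEGTL, and the function and parameter names are hypothetical.

```cpp
#include <cstddef>
#include <string>

bool is_digit( const char c )
{
   return c >= '0' && c <= '9';
}

// Models rep_min_max< Min, Max, R > where R matches one byte accepted by pred.
// On success pos is advanced past the matches; on failure pos is unchanged.
bool rep_min_max( const std::string& in, std::size_t& pos, const unsigned min_count, const unsigned max_count, bool ( *pred )( char ) )
{
   std::size_t p = pos;
   unsigned n = 0;
   // rep< Min > followed by rep_opt< Max - Min >: at most max_count matches.
   while( n < max_count && p < in.size() && pred( in[ p ] ) ) {
      ++p;
      ++n;
   }
   if( n < min_count ) {
      return false;
   }
   // not_at< R >: a further possible match makes the whole rule fail.
   if( p < in.size() && pred( in[ p ] ) ) {
      return false;
   }
   pos = p;
   return true;
}
```

With min_count 1 and max_count 3, the input "12a" succeeds after consuming 2 bytes, whereas "1234" fails without consuming anything because a fourth digit follows the third.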
rep_opt< Num, R... >
  • Matches seq< R... > for zero to Num times without checking for further matches.
  • Equivalent to rep< Num, opt< R... > >.
star_must< R, S... >
  • Equivalent to star< if_must< R, S... > >.
try_catch< R... >
  • Equivalent to seq< R... >, but:
  • Converts global failure (exception) into local failure (return value false).
  • Catches exceptions of type tao::pegtl::parse_error.
try_catch_type< E, R... >
  • Equivalent to seq< R... >, but:
  • Converts global failure (exception) into local failure (return value false).
  • Catches exceptions of type E.
until< R >
  • Consumes all input until R matches.
  • Equivalent to until< R, any >.
until< R, S... >
  • Matches seq< S... > as long as at< R > does not match and succeeds when R matches.
  • Equivalent to seq< star< not_at< R >, not_at< eof >, S... >, R >.
  • Does not apply if S is an empty rule pack, see the previous entry for the semantics of until< R >.

Action Rules

These rules are in namespace tao::pegtl.

These rules replicate the intrusive way actions were called from within the grammar in the PEGTL 0.x with the apply<> and if_apply<> rules. The actions for these rules are classes (rather than class templates as required for parse() and the action<>-rule). These rules respect the current apply_mode, but don't use the control-class to invoke the actions.

apply< A... >
  • Calls A::apply() for all A, in order, with an empty input and all states as arguments.
  • If any A::apply() has a boolean return type and returns false, no further A::apply() calls are made and the result is equivalent to failure, otherwise:
  • Equivalent to success wrt. parsing.
apply0< A... >
  • Calls A::apply0() for all A, in order, with all states as arguments.
  • If any A::apply0() has a boolean return type and returns false, no further A::apply0() calls are made and the result is equivalent to failure, otherwise:
  • Equivalent to success wrt. parsing.
if_apply< R, A... >
  • Equivalent to seq< R, apply< A... > > wrt. parsing, but also:
  • If R matches, calls A::apply(), for all A, in order, with the input matched by R and all states as arguments.
  • If any A::apply() has a boolean return type and returns false, no further A::apply() calls are made.

Atomic Rules

These rules are in namespace tao::pegtl.

Atomic rules do not rely on other rules.

bof
  • Succeeds at "beginning-of-file", i.e. when the input's byte() member function returns zero.
  • Does not consume input.
  • Does not work with inputs that don't have a byte() member function.
bol
  • Succeeds at "beginning-of-line", i.e. when the input's byte_in_line() member function returns zero.
  • Does not consume input.
  • Does not work with inputs that don't have a byte_in_line() member function.
bytes< Num >
  • Succeeds when the input contains at least Num further bytes.
  • Consumes these Num bytes from the input.
eof
  • Succeeds at "end-of-file", i.e. when the input is empty or all input has been consumed.
  • Does not consume input.
failure
  • Dummy rule that never succeeds.
  • Does not consume input.
raise< T >
  • Generates a global failure.
  • Calls the control-class' Control< T >::raise() static member function.
  • T can be a rule, but it does not have to be a rule.
  • Does not consume input.
success
  • Dummy rule that always succeeds.
  • Does not consume input.

ASCII Rules

These rules are in the inline namespace tao::pegtl::ascii.

The ASCII rules operate on single bytes, without restricting the range of values to 7 bits. They are compatible with input with the 8th bit set in the sense that nothing breaks in their presence. Rules like ascii::any or ascii::not_one< 'a' > will match all possible byte values, and all possible byte values excluding 'a', respectively. However, the character class rules like ascii::alpha only match the corresponding ASCII characters.

(It is possible to match UTF-8 multi-byte characters with the ASCII rules, for example the Euro sign code point U+20AC, which is encoded by the UTF-8 sequence E2 82 AC, can be matched by either tao::pegtl::ascii::string< 0xe2, 0x82, 0xac > or tao::pegtl::utf8::one< 0x20ac >.)
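
The byte sequence for the Euro sign quoted above follows directly from the UTF-8 encoding scheme. The following standalone helper, which is not part of the PEGTL and whose name encode_utf8 is chosen only for this illustration, shows the computation; for brevity it assumes that the argument is a valid code point and does not reject surrogates or values above 0x10ffff.

```cpp
#include <string>

// Encodes one code point as UTF-8. Assumes cp is a valid Unicode code point;
// no validation is performed in this sketch.
std::string encode_utf8( const char32_t cp )
{
   std::string out;
   if( cp < 0x80 ) {
      out += static_cast< char >( cp );
   }
   else if( cp < 0x800 ) {
      out += static_cast< char >( 0xC0 | ( cp >> 6 ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
   }
   else if( cp < 0x10000 ) {
      out += static_cast< char >( 0xE0 | ( cp >> 12 ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
   }
   else {
      out += static_cast< char >( 0xF0 | ( cp >> 18 ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 12 ) & 0x3F ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3F ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3F ) );
   }
   return out;
}
```

Calling encode_utf8( 0x20AC ) yields the three bytes E2 82 AC, which are the values passed to ascii::string in the example above.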

alnum
  • Matches and consumes a single ASCII alphabetic or numeric character.
  • Equivalent to ranges< 'a', 'z', 'A', 'Z', '0', '9' >.
alpha
  • Matches and consumes a single ASCII alphabetic character.
  • Equivalent to ranges< 'a', 'z', 'A', 'Z' >.
any
  • Matches and consumes any single byte, including all ASCII characters.
  • Equivalent to bytes< 1 >.
blank
  • Matches and consumes a single ASCII horizontal space or horizontal tabulator character.
  • Equivalent to one< ' ', '\t' >.
digit
  • Matches and consumes a single ASCII decimal digit character.
  • Equivalent to range< '0', '9' >.
ellipsis
  • Matches and consumes three dots.
  • Equivalent to three< '.' >.
eol
  • Depends on the Eol template parameter of the input, by default:
  • Matches and consumes a Unix or MS-DOS line ending, that is:
  • Equivalent to sor< one< '\n' >, string< '\r', '\n' > >.
eolf
  • Equivalent to sor< eof, eol >.
forty_two< C... >
  • Equivalent to rep< 42, one< C... > >.
identifier_first
  • Matches and consumes a single ASCII character permissible as first character of a C identifier.
  • Equivalent to ranges< 'a', 'z', 'A', 'Z', '_' >.
identifier_other
  • Matches and consumes a single ASCII character permissible as subsequent character of a C identifier.
  • Equivalent to ranges< 'a', 'z', 'A', 'Z', '0', '9', '_' >.
identifier
  • Matches and consumes an ASCII identifier as defined for the C programming language.
  • Equivalent to seq< identifier_first, star< identifier_other > >.
istring< C... >
  • Matches and consumes the given ASCII string C... with case insensitive matching.
  • Similar to string< C... >, but:
  • For ASCII letters a-z and A-Z the match is case insensitive.
keyword< C... >
  • Matches and consumes a non-empty string not followed by a subsequent identifier character.
  • Equivalent to seq< string< C... >, not_at< identifier_other > >.
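
The not_at< identifier_other > part is what prevents a keyword from matching the beginning of a longer identifier. A standalone model of this check, not using the PEGTL and with hypothetical function names, could look as follows.

```cpp
#include <cstddef>
#include <string>

// Characters allowed after the first character of a C identifier.
bool is_identifier_other( const char c )
{
   return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ) || ( c >= '0' && c <= '9' ) || c == '_';
}

// Models keyword< C... > at offset pos: returns the number of bytes consumed,
// or 0 when the keyword does not match there.
std::size_t match_keyword( const std::string& in, const std::size_t pos, const std::string& kw )
{
   if( pos > in.size() || in.compare( pos, kw.size(), kw ) != 0 ) {
      return 0;  // string< C... > failed
   }
   const std::size_t end = pos + kw.size();
   if( end < in.size() && is_identifier_other( in[ end ] ) ) {
      return 0;  // not_at< identifier_other > failed
   }
   return kw.size();
}
```

For the keyword "for", the input "for(" matches and consumes 3 bytes, while "format" does not match because the 'm' could continue an identifier.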
lower
  • Matches and consumes a single ASCII lower-case alphabetic character.
  • Equivalent to range< 'a', 'z' >.
not_one< C... >
  • Succeeds when the input is not empty, and:
  • C is an empty character pack or the next input byte is not one of C....
  • Consumes one byte when it succeeds.
not_range< C, D >
  • Succeeds when the input is not empty, and:
  • The next input byte is not in the closed range C ... D.
  • Consumes one byte when it succeeds.
nul
  • Matches and consumes an ASCII nul character.
  • Equivalent to one< 0 >.
one< C... >
  • Succeeds when the input is not empty, and:
  • The next input byte is one of C....
  • Consumes one byte when it succeeds.
  • Fails if C is an empty character pack.
print
  • Matches and consumes any single ASCII character traditionally defined as printable.
  • Equivalent to range< 32, 126 >.
range< C, D >
  • Succeeds when the input is not empty, and:
  • The next input byte is in the closed range C ... D.
  • Consumes one byte when it succeeds.
ranges< C1, D1, C2, D2, ... >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ... >.
ranges< C1, D1, C2, D2, ..., E >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >.
seven
  • Matches and consumes any single true ASCII character that fits into 7 bits.
  • Equivalent to range< 0, 127 >.
shebang
  • Equivalent to seq< string< '#', '!' >, until< eolf > >.
space
  • Matches and consumes a single space, line-feed, carriage-return, horizontal-tab, vertical-tab or form-feed.
  • Equivalent to one< ' ', '\n', '\r', '\t', '\v', '\f' >.
string< C... >
  • Matches and consumes a string, a sequence of bytes or single-byte characters.
  • Equivalent to seq< one< C >... >.
TAO_PEGTL_ISTRING( "..." )
  • Macro where TAO_PEGTL_ISTRING( "foo" ) yields istring< 'f', 'o', 'o' >.
  • The argument must be a string literal.
  • Works for strings up to 512 bytes of length (excluding trailing '\0').
  • Strings may contain embedded '\0'.
TAO_PEGTL_KEYWORD( "..." )
  • Macro where TAO_PEGTL_KEYWORD( "foo" ) yields keyword< 'f', 'o', 'o' >.
  • The argument must be a string literal.
  • Works for keywords up to 512 bytes of length (excluding trailing '\0').
  • Strings may contain embedded '\0'.
TAO_PEGTL_STRING( "..." )
  • Macro where TAO_PEGTL_STRING( "foo" ) yields string< 'f', 'o', 'o' >.
  • The argument must be a string literal.
  • Works for strings up to 512 bytes of length (excluding trailing '\0').
  • Strings may contain embedded '\0'.
three< C >
  • Succeeds when the input contains at least three bytes, and:
  • These three input bytes are all C.
  • Consumes three bytes when it succeeds.
two< C >
  • Succeeds when the input contains at least two bytes, and:
  • These two input bytes are both C.
  • Consumes two bytes when it succeeds.
upper
  • Matches and consumes a single ASCII upper-case alphabetic character.
  • Equivalent to range< 'A', 'Z' >.
xdigit
  • Matches and consumes a single ASCII hexadecimal digit character.
  • Equivalent to ranges< '0', '9', 'a', 'f', 'A', 'F' >.

Unicode Rules

These rules are available in multiple versions,

  • in namespace tao::pegtl::utf8 for UTF-8 encoded inputs,
  • in namespace tao::pegtl::utf16_be for big-endian UTF-16 encoded inputs,
  • in namespace tao::pegtl::utf16_le for little-endian UTF-16 encoded inputs,
  • in namespace tao::pegtl::utf32_be for big-endian UTF-32 encoded inputs,
  • in namespace tao::pegtl::utf32_le for little-endian UTF-32 encoded inputs.

For convenience, they also appear in multiple namespace aliases,

  • namespace alias tao::pegtl::utf16 for native-endian UTF-16 encoded inputs,
  • namespace alias tao::pegtl::utf32 for native-endian UTF-32 encoded inputs.

The following limitations apply to the UTF-16 and UTF-32 rules:

  • Unaligned input leads to unaligned memory access.
  • The line and column numbers are not counted correctly.

Unaligned memory access is not a problem on x86-compatible processors; on some other architectures, such as ARM, an unaligned access can crash the application.

In the following descriptions a Unicode code point is considered valid when it is in the range 0 to 0x10ffff. The parameter N stands for the size of the encoding of the next Unicode code point in the input, i.e.

  • for UTF-8 the rules are multi-byte-sequence-aware and N is either 1, 2, 3 or 4,
  • for UTF-16 the rules are surrogate-pair-aware and N is either 2 or 4, and
  • for UTF-32 everything is simple and N is always 4.

It is an error when a code unit in the range 0xd800 to 0xdfff is encountered outside of a valid UTF-16 surrogate pair (this changed in version 2.6.0).
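
For UTF-16, the value of N and the surrogate rule can be made concrete with a small decoder. The sketch below is a standalone illustration and not the PEGTL's implementation; it assumes that the 16-bit code units have already been converted to native byte order, and the name decode_utf16 is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes the code point that starts at units[ pos ].
// Returns N, the number of bytes of the encoding (2 or 4), or 0 on error.
std::size_t decode_utf16( const std::vector< std::uint16_t >& units, const std::size_t pos, std::uint32_t& cp )
{
   if( pos >= units.size() ) {
      return 0;  // input is empty
   }
   const std::uint32_t u = units[ pos ];
   if( u < 0xD800 || u > 0xDFFF ) {
      cp = u;  // not a surrogate: a single code unit
      return 2;
   }
   if( u >= 0xDC00 ) {
      return 0;  // low surrogate without a preceding high surrogate
   }
   if( pos + 1 >= units.size() ) {
      return 0;  // high surrogate at the end of the input
   }
   const std::uint32_t v = units[ pos + 1 ];
   if( v < 0xDC00 || v > 0xDFFF ) {
      return 0;  // high surrogate not followed by a low surrogate
   }
   cp = 0x10000 + ( ( u - 0xD800 ) << 10 ) + ( v - 0xDC00 );
   return 4;
}
```

The code unit 0x20AC decodes with N equal to 2, the surrogate pair 0xD83D 0xDE00 decodes to U+1F600 with N equal to 4, and a lone 0xD800 is reported as an error.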

any
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point.
  • Consumes the N bytes when it succeeds.
bom
  • Equivalent to one< 0xfeff >.
not_one< C... >
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • C is an empty character pack or the input code point is not one of the given code points C....
  • Consumes the N bytes when it succeeds.
not_range< C, D >
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • The input code point B satisfies B < C || D < B.
  • Consumes the N bytes when it succeeds.
one< C... >
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • C is a non-empty character pack and the input code point is one of the given code points C....
  • Consumes the N bytes when it succeeds.
range< C, D >
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • The input code point B satisfies C <= B && B <= D.
  • Consumes the N bytes when it succeeds.
ranges< C1, D1, C2, D2, ... >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ... >.
ranges< C1, D1, C2, D2, ..., E >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >.
string< C... >
  • Equivalent to seq< one< C >... >.

ICU Support

The following rules depend on the International Components for Unicode (ICU) that provide the means to match characters with specific Unicode character properties. Because of the external dependency, the rules are in the contrib-section, and the required header files are not automatically included in tao/pegtl.hpp.

The ICU-based rules are again available in multiple versions,

  • in namespace tao::pegtl::utf8::icu for UTF-8 encoded inputs,
  • in namespace tao::pegtl::utf16_be::icu for big-endian UTF-16 encoded inputs,
  • in namespace tao::pegtl::utf16_le::icu for little-endian UTF-16 encoded inputs,
  • in namespace tao::pegtl::utf32_be::icu for big-endian UTF-32 encoded inputs, and
  • in namespace tao::pegtl::utf32_le::icu for little-endian UTF-32 encoded inputs.

To use these rules it is necessary to provide an include path to the ICU library, to link the application against libicu, and to manually include one or more of the following header files:

  • tao/pegtl/contrib/icu/utf8.hpp
  • tao/pegtl/contrib/icu/utf16.hpp
  • tao/pegtl/contrib/icu/utf32.hpp

The convenience ICU rules are supplied for all properties found in ICU version 3.4. Users of later versions can use the basic rules manually or create their own convenience rules derived from the basic rules for additional enumeration values found in those later versions of the ICU library.

Basic ICU Rules

Each of the above namespaces provides two basic rules, one for matching binary properties and one for matching the values of enumerated properties.

binary_property< P, V >
  • P is a binary property defined by ICU, see UProperty.
  • V is a boolean value.
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • The code point's property P, i.e. u_hasBinaryProperty( cp, P ), equals V.
  • Consumes the N bytes when it succeeds.
binary_property< P >
  • Identical to binary_property< P, true >.
property_value< P, V >
  • P is an enumerated property defined by ICU, see UProperty.
  • V is an integer value.
  • Succeeds when the input is not empty, and:
  • The next N bytes encode a valid Unicode code point, and:
  • The code point's property P, i.e. u_getIntPropertyValue( cp, P ), equals V.
  • Consumes the N bytes when it succeeds.

ICU Rules for Binary Properties

Convenience wrappers for binary properties.

alphabetic
  • Equivalent to binary_property< UCHAR_ALPHABETIC >.
ascii_hex_digit
  • Equivalent to binary_property< UCHAR_ASCII_HEX_DIGIT >.
bidi_control
  • Equivalent to binary_property< UCHAR_BIDI_CONTROL >.
bidi_mirrored
  • Equivalent to binary_property< UCHAR_BIDI_MIRRORED >.
case_sensitive
  • Equivalent to binary_property< UCHAR_CASE_SENSITIVE >.
dash
  • Equivalent to binary_property< UCHAR_DASH >.
default_ignorable_code_point
  • Equivalent to binary_property< UCHAR_DEFAULT_IGNORABLE_CODE_POINT >.
deprecated
  • Equivalent to binary_property< UCHAR_DEPRECATED >.
diacritic
  • Equivalent to binary_property< UCHAR_DIACRITIC >.
extender
  • Equivalent to binary_property< UCHAR_EXTENDER >.
full_composition_exclusion
  • Equivalent to binary_property< UCHAR_FULL_COMPOSITION_EXCLUSION >.
grapheme_base
  • Equivalent to binary_property< UCHAR_GRAPHEME_BASE >.
grapheme_extend
  • Equivalent to binary_property< UCHAR_GRAPHEME_EXTEND >.
grapheme_link
  • Equivalent to binary_property< UCHAR_GRAPHEME_LINK >.
hex_digit
  • Equivalent to binary_property< UCHAR_HEX_DIGIT >.
hyphen
  • Equivalent to binary_property< UCHAR_HYPHEN >.
id_continue
  • Equivalent to binary_property< UCHAR_ID_CONTINUE >.
id_start
  • Equivalent to binary_property< UCHAR_ID_START >.
ideographic
  • Equivalent to binary_property< UCHAR_IDEOGRAPHIC >.
ids_binary_operator
  • Equivalent to binary_property< UCHAR_IDS_BINARY_OPERATOR >.
ids_trinary_operator
  • Equivalent to binary_property< UCHAR_IDS_TRINARY_OPERATOR >.
join_control
  • Equivalent to binary_property< UCHAR_JOIN_CONTROL >.
logical_order_exception
  • Equivalent to binary_property< UCHAR_LOGICAL_ORDER_EXCEPTION >.
lowercase
  • Equivalent to binary_property< UCHAR_LOWERCASE >.
math
  • Equivalent to binary_property< UCHAR_MATH >.
nfc_inert
  • Equivalent to binary_property< UCHAR_NFC_INERT >.
nfd_inert
  • Equivalent to binary_property< UCHAR_NFD_INERT >.
nfkc_inert
  • Equivalent to binary_property< UCHAR_NFKC_INERT >.
nfkd_inert
  • Equivalent to binary_property< UCHAR_NFKD_INERT >.
noncharacter_code_point
  • Equivalent to binary_property< UCHAR_NONCHARACTER_CODE_POINT >.
pattern_syntax
  • Equivalent to binary_property< UCHAR_PATTERN_SYNTAX >.
pattern_white_space
  • Equivalent to binary_property< UCHAR_PATTERN_WHITE_SPACE >.
posix_alnum
  • Equivalent to binary_property< UCHAR_POSIX_ALNUM >.
posix_blank
  • Equivalent to binary_property< UCHAR_POSIX_BLANK >.
posix_graph
  • Equivalent to binary_property< UCHAR_POSIX_GRAPH >.
posix_print
  • Equivalent to binary_property< UCHAR_POSIX_PRINT >.
posix_xdigit
  • Equivalent to binary_property< UCHAR_POSIX_XDIGIT >.
quotation_mark
  • Equivalent to binary_property< UCHAR_QUOTATION_MARK >.
radical
  • Equivalent to binary_property< UCHAR_RADICAL >.
s_term
  • Equivalent to binary_property< UCHAR_S_TERM >.
segment_starter
  • Equivalent to binary_property< UCHAR_SEGMENT_STARTER >.
soft_dotted
  • Equivalent to binary_property< UCHAR_SOFT_DOTTED >.
terminal_punctuation
  • Equivalent to binary_property< UCHAR_TERMINAL_PUNCTUATION >.
unified_ideograph
  • Equivalent to binary_property< UCHAR_UNIFIED_IDEOGRAPH >.
uppercase
  • Equivalent to binary_property< UCHAR_UPPERCASE >.
variation_selector
  • Equivalent to binary_property< UCHAR_VARIATION_SELECTOR >.
white_space
  • Equivalent to binary_property< UCHAR_WHITE_SPACE >.
xid_continue
  • Equivalent to binary_property< UCHAR_XID_CONTINUE >.
xid_start
  • Equivalent to binary_property< UCHAR_XID_START >.

ICU Rules for Enumerated Properties

Convenience wrappers for enumerated properties.

bidi_class< V >
  • V is of type UCharDirection.
  • Equivalent to property_value< UCHAR_BIDI_CLASS, V >.
block< V >
  • V is of type UBlockCode.
  • Equivalent to property_value< UCHAR_BLOCK, V >.
decomposition_type< V >
  • V is of type UDecompositionType.
  • Equivalent to property_value< UCHAR_DECOMPOSITION_TYPE, V >.
east_asian_width< V >
  • V is of type UEastAsianWidth.
  • Equivalent to property_value< UCHAR_EAST_ASIAN_WIDTH, V >.
general_category< V >
  • V is of type UCharCategory.
  • Equivalent to property_value< UCHAR_GENERAL_CATEGORY, V >.
grapheme_cluster_break< V >
  • V is of type UGraphemeClusterBreak.
  • Equivalent to property_value< UCHAR_GRAPHEME_CLUSTER_BREAK, V >.
hangul_syllable_type< V >
  • V is of type UHangulSyllableType.
  • Equivalent to property_value< UCHAR_HANGUL_SYLLABLE_TYPE, V >.
joining_group< V >
  • V is of type UJoiningGroup.
  • Equivalent to property_value< UCHAR_JOINING_GROUP, V >.
joining_type< V >
  • V is of type UJoiningType.
  • Equivalent to property_value< UCHAR_JOINING_TYPE, V >.
line_break< V >
  • V is of type ULineBreak.
  • Equivalent to property_value< UCHAR_LINE_BREAK, V >.
numeric_type< V >
  • V is of type UNumericType.
  • Equivalent to property_value< UCHAR_NUMERIC_TYPE, V >.
sentence_break< V >
  • V is of type USentenceBreak.
  • Equivalent to property_value< UCHAR_SENTENCE_BREAK, V >.
word_break< V >
  • V is of type UWordBreakValues.
  • Equivalent to property_value< UCHAR_WORD_BREAK, V >.

ICU Rules for Value Properties

Convenience wrappers for enumerated properties that return a value instead of an actual enum.

canonical_combining_class< V >
  • V is of type std::uint8_t.
  • Equivalent to property_value< UCHAR_CANONICAL_COMBINING_CLASS, V >.
lead_canonical_combining_class< V >
  • V is of type std::uint8_t.
  • Equivalent to property_value< UCHAR_LEAD_CANONICAL_COMBINING_CLASS, V >.
trail_canonical_combining_class< V >
  • V is of type std::uint8_t.
  • Equivalent to property_value< UCHAR_TRAIL_CANONICAL_COMBINING_CLASS, V >.

Binary Rules

These rules are available in multiple versions,

  • in namespace tao::pegtl::uint8 for 8-bit integer values,
  • in namespace tao::pegtl::uint16_be for big-endian 16-bit integer values,
  • in namespace tao::pegtl::uint16_le for little-endian 16-bit integer values,
  • in namespace tao::pegtl::uint32_be for big-endian 32-bit integer values,
  • in namespace tao::pegtl::uint32_le for little-endian 32-bit integer values,
  • in namespace tao::pegtl::uint64_be for big-endian 64-bit integer values, and
  • in namespace tao::pegtl::uint64_le for little-endian 64-bit integer values.

These rules read one or more bytes from the input to form (and match) an 8, 16, 32 or 64-bit value, respectively, and template parameters are given as matching std::uint8_t, std::uint16_t, std::uint32_t or std::uint64_t.

In the following descriptions the parameter N is the size of a single value in bytes, i.e. either 1, 2, 4 or 8. The term input value indicates a correspondingly sized integer value read from successive bytes of the input.
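
How an input value is formed from successive bytes depends on the byte order of the chosen namespace. The sketch below shows the idea for 16 and 32 bits, together with a model of range< C, D > for big-endian 16-bit values. It is a standalone illustration rather than PEGTL code; all function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Forms a 16-bit value from two bytes, most significant byte first.
std::uint16_t read_be16( const unsigned char* p )
{
   return static_cast< std::uint16_t >( ( p[ 0 ] << 8 ) | p[ 1 ] );
}

// Forms a 16-bit value from two bytes, least significant byte first.
std::uint16_t read_le16( const unsigned char* p )
{
   return static_cast< std::uint16_t >( ( p[ 1 ] << 8 ) | p[ 0 ] );
}

// Forms a 32-bit value from four bytes, most significant byte first.
std::uint32_t read_be32( const unsigned char* p )
{
   return ( std::uint32_t( p[ 0 ] ) << 24 ) | ( std::uint32_t( p[ 1 ] ) << 16 ) | ( std::uint32_t( p[ 2 ] ) << 8 ) | std::uint32_t( p[ 3 ] );
}

// Models range< C, D > for big-endian 16-bit values, i.e. N is 2.
// On success pos is advanced by N; on failure pos is unchanged.
bool match_range_be16( const unsigned char* data, const std::size_t size, std::size_t& pos, const std::uint16_t c, const std::uint16_t d )
{
   if( pos > size || size - pos < 2 ) {
      return false;  // fewer than N bytes available
   }
   const std::uint16_t b = read_be16( data + pos );
   if( b < c || d < b ) {
      return false;
   }
   pos += 2;
   return true;
}
```

For the bytes 12 34 56 78 (hexadecimal), read_be16 yields 0x1234, read_le16 yields 0x3412 and read_be32 yields 0x12345678.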

any
  • Succeeds when the input contains at least N bytes.
  • Consumes N bytes when it succeeds.
mask_not_one< M, C... >
  • Succeeds when the input contains at least N bytes, and:
  • C is an empty character pack or the (endian adjusted) input value masked with M is not one of the given values C....
  • Consumes N bytes when it succeeds.
mask_not_range< M, C, D >
  • Succeeds when the input contains at least N bytes, and:
  • The (endian adjusted) input value B satisfies ( B & M ) < C || D < ( B & M ).
  • Consumes N bytes when it succeeds.
mask_one< M, C... >
  • Succeeds when the input contains at least N bytes, and:
  • C is a non-empty character pack and the (endian adjusted) input value masked with M is one of the given values C....
  • Consumes N bytes when it succeeds.
mask_range< M, C, D >
  • Succeeds when the input contains at least N bytes, and:
  • The (endian adjusted) input value B satisfies C <= ( B & M ) && ( B & M ) <= D.
  • Consumes N bytes when it succeeds.
mask_ranges< M, C1, D1, C2, D2, ... >
  • Equivalent to sor< mask_range< M, C1, D1 >, mask_range< M, C2, D2 >, ... >.
mask_ranges< M, C1, D1, C2, D2, ..., E >
  • Equivalent to sor< mask_range< M, C1, D1 >, mask_range< M, C2, D2 >, ..., mask_one< M, E > >.
mask_string< M, C... >
  • Equivalent to seq< mask_one< M, C >... >.
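
The mask rules compare only the bits selected by M. A familiar example is the test for a UTF-8 continuation byte, whose two highest bits are 10, which corresponds to mask_one< 0xC0, 0x80 > in the 8-bit case. The sketch below models the comparisons for 8-bit input values with ordinary functions; it is not PEGTL code, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <initializer_list>

// Models the comparison of mask_one< M, C... >: the masked input value b
// must equal one of the given values; an empty list never matches.
bool mask_one_matches( const std::uint8_t b, const std::uint8_t m, const std::initializer_list< std::uint8_t > cs )
{
   for( const std::uint8_t c : cs ) {
      if( ( b & m ) == c ) {
         return true;
      }
   }
   return false;
}

// Models the comparison of mask_range< M, C, D >: the masked input value b
// must lie in the closed range c to d.
bool mask_range_matches( const std::uint8_t b, const std::uint8_t m, const std::uint8_t c, const std::uint8_t d )
{
   return c <= ( b & m ) && ( b & m ) <= d;
}
```

With the mask 0xC0 and the value 0x80, the byte 0x82 matches and the byte 0xE2 does not, which agrees with their roles as continuation and lead byte in the UTF-8 encoding of U+20AC.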
not_one< C... >
  • Succeeds when the input contains at least N bytes, and:
  • C is an empty character pack or the (endian adjusted) input value is not one of the given values C....
  • Consumes N bytes when it succeeds.
not_range< C, D >
  • Succeeds when the input contains at least N bytes, and:
  • The (endian adjusted) input value B satisfies B < C || D < B.
  • Consumes N bytes when it succeeds.
one< C... >
  • Succeeds when the input contains at least N bytes, and:
  • C is a non-empty character pack and the (endian adjusted) input value is one of the given values C....
  • Consumes N bytes when it succeeds.
range< C, D >
  • Succeeds when the input contains at least N bytes, and:
  • The (endian adjusted) input value B satisfies C <= B && B <= D.
  • Consumes N bytes when it succeeds.
ranges< C1, D1, C2, D2, ... >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ... >.
ranges< C1, D1, C2, D2, ..., E >
  • Equivalent to sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >.
string< C... >
  • Equivalent to seq< one< C >... >.

Copyright (c) 2014-2020 Dr. Colin Hirsch and Daniel Frey