diff --git a/README.md b/README.md index deeef46..4bf65f5 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ Features * Predictive Recursive Descent Parsers * Error Message Support * Regular Expression Support -* Abstract Syntax Tree Support +* Grammar Support * Easy to Integrate (One Source File in ANSI C) @@ -28,12 +28,12 @@ Alternatives The current main alternative C based parser combinator is a branch of [Cesium3](https://github.com/wbhart/Cesium3/tree/combinators). -The main advantages of _mpc_ over this are: +This project has several downsides which _mpc_ overcomes: -* Works for Generic Types -* Doesn't rely on Boehm-Demers-Weiser Garbage Collection -* Doesn't use `setjmp` and `longjmp` for errors -* Doesn't pollute namespace +* _mpc_ Works for Generic Types +* _mpc_ Doesn't rely on Boehm-Demers-Weiser Garbage Collection +* _mpc_ Doesn't use `setjmp` and `longjmp` for errors +* _mpc_ Doesn't pollute the namespace View From the Top @@ -76,31 +76,33 @@ mpc_ast_t* parse_maths(const char* input) { If you were to input something like `"(4 * 2 * 11 + 2) - 5"` into this function the `mpc_ast_t` you get out would look something like this: ```c -root: - value: - char: '(' - expression: - product: - value: '4' - char: '*' - value: '2' - char: '*' - value: '11' - char: '+' - value: '2' - char: ')' - char: '-' - value: '5' +>: + value|>: + char: '(' + expression|>: + product|>: + value|regex: '4' + char: '*' + value|regex: '2' + char: '*' + value|regex: '11' + char: '+' + product|value|regex: '2' + char: ')' + char: '+' + product|value|regex: '5' ``` View From the Bottom -------------------- -Parser Combinators are structures that encode how to parse a particular language. They can be combined using intuitive operators to create new parsers of increasing complexity. With these - detailed grammars and languages can be parsed and processed easily. +Parser Combinators are structures that encode how to parse a particular language. 
They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed in a quick, efficient, and easy way. The trick behind Parser Combinators is the observation that by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore instead of describing _how to parse a language_, a user must only specify _the language itself_, and the computer will work out how to parse it ... as if by magic! +As shown in the above example, _mpc_ takes this one step further, and actually allows you to specify the grammar directly, or to build up parsers using library functions. + Parsers ------- @@ -122,7 +124,7 @@ typedef union { } mpc_result_t; ``` -where `mpc_val_t` is synonymous with `void*` and simply represents some pointer to data - the exact type of which is dependant on the parser. Some variations on the above also exist. +where `mpc_val_t` is synonymous with `void*` and simply represents some pointer to data - the exact type of which is dependent on the parser. Some variations on the above also exist. For almost all of the built-in and basic parsers the return type for a successful parser will be `char*`. * * * @@ -146,7 +148,7 @@ Basic Parsers ### String Parsers -All the following functions return basic parsers. All of those parsers return strings with the character(s) they manage to match or an error on failure. They have the following functionality. +All the following functions return basic parsers. All of those parsers return a newly allocated `char*` with the character(s) they manage to match or an error on failure. They have the following functionality. * * * @@ -154,7 +156,7 @@ All the following functions return basic parsers. 
All of those parsers return st mpc_parser_t* mpc_any(void); ``` -Matches any single character +Matches any individual character * * * @@ -227,7 +229,15 @@ Consumes no input, always fails with message `m`. * * * ```c -mpc_parser_t* mpc_lift(mpc_lift_t f); +mpc_parser_t* mpc_failf(const char* fmt, ...); +``` + +Consumes no input, always fails with formatted message given by `fmt` and following parameters. + +* * * + +```c +mpc_parser_t* mpc_lift(mpc_ctor_t f); ``` Consumes no input, always successful, returns the result of function `f` @@ -244,7 +254,7 @@ Consumes no input, always successful, returns `x` Combinators ----------- -Combinators are functions that take one or more parsers and return a new parser. These combinators work independent of exactly what data type those input parsers return on success. In languages such as Haskell ensuring you don't input one type of data into a parser requiring a different type of data is done by the compiler. But in C we don't have that luxury. So it is at the discretion of the programmer to ensure that he or she deals correctly with the outputs of different parser types. +Combinators are functions that take one or more parsers and return a new parser of some given functionality. These combinators work independent of exactly what data type those input parsers return on success. In languages such as Haskell ensuring you don't input one type of data into a parser requiring a different type of data is done by the compiler. But in C we don't have that luxury. So it is at the discretion of the programmer to ensure that he or she deals correctly with the outputs of different parser types. A second annoyance in C is that of manual memory management. Some parsers might get half-way and then fail. This means they need to clean up any partial data that has been collected in the parse. 
In Haskell this is handled by the Garbage Collector, but in C these combinators will need to take _destructor_ functions as input, which say how clean up any partial data that has been collected. @@ -269,19 +279,9 @@ Returns a parser that applies function `f` (optionality taking extra input `x`) * * * -```c -mpc_parser_t* mpc_predictive(mpc_parser_t* a); -``` - -Returns a parser that runs `a` with backtracking disabled. This means if `a` consumes any input, it will not be reverted, even on failure. Turning backtracking off has good performance benefits for grammars which are `LL(1)`. These are grammars where the first character completely determines the parse result - such as the decision of parsing either a C identifier, number, or string literal. This option should not be used for non `LL(1)` grammars or it will produce incorrect results or crash the parser. - -Another way to think of `mpc_predictive` is that it can be applied to a parser (for a performance improvement) if either successfully parsing the first character will result in a completely successful parse, or all of the referenced sub-parsers are also `LL(1)`. - -* * * - ```c mpc_parser_t* mpc_not(mpc_parser_t* a, mpc_dtor_t da); -mpc_parser_t* mpc_not_else(mpc_parser_t* a, mpc_dtor_t da, mpc_lift_t lf); +mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_ctor_t lf); ``` Returns a parser with the following behaviour. If parser `a` succeeds, then it fails and consumes no input. If parser `a` fails, then it succeeds, consumes no input and returns `NULL` (or the result of lift function `lf`). Destructor `da` is used to destroy the result of `a` on success. @@ -290,24 +290,23 @@ Returns a parser with the following behaviour. If parser `a` succeeds, then it f ```c mpc_parser_t* mpc_maybe(mpc_parser_t* a); -mpc_parser_t* mpc_maybe_else(mpc_parser_t* a, mpc_lift_t lf); +mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_ctor_t lf); ``` -Returns a parser that runs `a`. 
If this fails then it still succeeds, but returns `NULL` (or the result of `lf`). +Returns a parser that runs `a`. If `a` is successful then it returns the result of `a`. If `a` is unsuccessful then it succeeds, but returns `NULL` (or the result of `lf`). * * * ```c -mpc_parser_t* mpc_many(mpc_parser_t* a, mpc_fold_t f); -mpc_parser_t* mpc_many_else(mpc_parser_t* a, mpc_fold_t f, mpc_lift_t lf); +mpc_parser_t* mpc_many(mpc_fold_t f, mpc_parser_t* a); ``` -Attempts to run `a` zero or more times. If zero runs are made it succeeds and returns `NULL` (or the result of `lf`). If at least one run is made, results of `a` are combined using fold function `f` and returned. See the _Function Types_ section for more details. +Keeps running `a` until it fails. Results are combined using fold function `f`. See the _Function Types_ section for more details. * * * ```c -mpc_parser_t* mpc_many1(mpc_parser_t* a, mpc_fold_t f); +mpc_parser_t* mpc_many1(mpc_fold_t f, mpc_parser_t* a); ``` Attempts to run `a` one or more times. Results are combined with fold function `f`. @@ -315,50 +314,43 @@ Attempts to run `a` one or more times. Results are combined with fold function ` * * * ```c -mpc_parser_t* mpc_count(mpc_parser_t* a, mpc_dtor_t da, mpc_fold_t f, int n); -mpc_parser_t* mpc_count_else(mpc_parser_t* a, mpc_dtor_t da, mpc_fold_t f, int n, mpc_lift_t lf); +mpc_parser_t* mpc_count(int n, mpc_fold_t f, mpc_parser_t* a, mpc_dtor_t da); ``` -Attempts to run `a` exactly `n` times. If this fails, any partial results are destructed with `da`, and it returns `NULL` (or the result of lift function `lf`). If it is successful, result of `a` are combined using fold function `f`. +Attempts to run `a` exactly `n` times. If this fails, any partial results are destructed with `da`. If successful results of `a` are combined using fold function `f`. * * * ```c -mpc_parser_t* mpc_else(mpc_parser_t* a, mpc_parser_t* b); +mpc_parser_t* mpc_or(int n, ...); ``` -Attempts to run `a`. 
On success returns the result `a`. On failure attempts to run `b`. If `b` also fails then returns an error. Otherwise it returns the result of `b`. +Attempts to run `n` parsers in sequence, returning the first one that succeeds. If all fail, returns an error. * * * ```c -mpc_parser_t* mpc_also(mpc_parser_t* a, mpc_parser_t* b, mpc_dtor_t da, mpc_fold_t f); -mpc_parser_t* mpc_bind(mpc_parser_t* a, mpc_parser_t* b, mpc_dtor_t da, mpc_fold_t f); +mpc_parser_t* mpc_and(int n, mpc_fold_t f, ...); ``` -Attempts to run `a`. Then attempts to run `b`. If `b` fails it destructs the result of `a` using `da`. If both succeed it returns the result of `a` and `b` combined using the fold function `f`. Otherwise it returns an error. +Attempts to run `n` parsers in sequence, returning the fold of the results using fold function `f`. First parsers must be specified, followed by destructors for each parser, excluding the final parser. These are used in case of partial success. For example: `mpc_and(3, mpcf_strfold, mpc_char('a'), mpc_char('b'), mpc_char('c'), free, free);` would attempt to match `'a'` followed by `'b'` followed by `'c'`, and if successful would concatenate them using `mpcf_strfold`. Otherwise would use `free` on the partial results. * * * ```c -mpc_parser_t* mpc_or(int n, ...); +mpc_parser_t* mpc_predictive(mpc_parser_t* a); ``` -Attempts to run `n` parsers in sequence, returning the first one that succeeds. If all fail, returns an error. - -* * * +Returns a parser that runs `a` with backtracking disabled. This means if `a` consumes any input, it will not be reverted, even on failure. Turning backtracking off has good performance benefits for grammars which are `LL(1)`. These are grammars where the first character completely determines the parse result - such as the decision of parsing either a C identifier, number, or string literal. This option should not be used for non `LL(1)` grammars or it will produce incorrect results or crash the parser. 
-```c -mpc_parser_t* mpc_and(int n, mpc_afold_t f, ...); -``` +Another way to think of `mpc_predictive` is that it can be applied to a parser (for a performance improvement) if either successfully parsing the first character will result in a completely successful parse, or all of the referenced sub-parsers are also `LL(1)`. -Attempts to run `n` parsers in sequence, returning the fold of the results using fold function `f`. First parsers must be specified, followed by destructors for each parser, minus the final one. These are used in case of partial success. For example: `mpc_and(3, mpcf_astrfold, mpc_char('a'), mpc_char('b'), mpc_char('c'), free, free);` would attempt to match `'a'` followed by `'b'` followed by `'c'`, and if successful would concatenate them using `mpcf_astrfold`. Otherwise would use `free` on the partial results. Function Types -------------- -The combinator functions take a number of special function types as function pointers. Here is a short explanation of those types are how they are expected to behave. It is important that these behave correctly otherwise it is exceedingly easy to introduce memory leaks into the system. +The combinator functions take a number of special function types as function pointers. Here is a short explanation of those types and how they are expected to behave. It is important that these behave correctly, otherwise it is exceedingly easy to introduce memory leaks or crashes into the system. * * * @@ -371,55 +363,46 @@ Given some pointer to a data value it will ensure the memory it points to is fre * * * ```c -typedef mpc_val_t*(*mpc_apply_t)(mpc_val_t*); -typedef mpc_val_t*(*mpc_apply_to_t)(mpc_val_t*,void*); +typedef mpc_val_t*(*mpc_ctor_t)(void); ``` -This takes in some pointer to data and outputs some new or modified pointer to data, ensuring to free and old data no longer used. The `apply_to` variation takes in an extra pointer to some data such as state of the system. +Returns some data value when called. 
It can be used to create _empty_ versions of data types when certain combinators have no known default value to return. For example it may be used to return a newly allocated empty string. * * * ```c -typedef mpc_val_t*(*mpc_fold_t)(mpc_val_t*,mpc_val_t*); +typedef mpc_val_t*(*mpc_apply_t)(mpc_val_t*); +typedef mpc_val_t*(*mpc_apply_to_t)(mpc_val_t*,void*); ``` -This takes two pointers to data and must output some new combined pointer to data, ensuring to free and old data no longer used. When used with the `many`, `many1` and `count` functions this initially takes in `NULL` for it's first argument and following that takes in for it's first argument whatever was previously returned by the function itself. In this way users have a chance to build some initial data structure before populating it with whatever is passed as the second argument. +This takes in some pointer to data and outputs some new or modified pointer to data, taking care to free any old data no longer used. The `apply_to` variation takes in an extra pointer to some data such as state of the system. * * * ```c -typedef mpc_val_t*(*mpc_afold_t)(int,mpc_val_t**); +typedef mpc_val_t*(*mpc_fold_t)(int,mpc_val_t**); ``` -Similar to the above, but it is passed in a list of pointers to data values which must all be folded together and output as a new single data value. Any old data no longer used must be freed. - -* * * - -```c -typedef mpc_val_t*(*mpc_lift_t)(void); -``` +This takes a list of pointers to data values and must return some combined or folded version of these data values. It must free any old data that is no longer used once combination has taken place. This will ensure no memory is leaked. -This function returns some data value when called. It can be used to create _empty_ versions of data types when certain combinators have no known default value to return. For example it may be used to return a newly allocated empty string rather than `NULL`. 
First Example ------------- Using the above we can create a parser that matches a C identifier with relative ease. -First we build a fold function that will concatenate two strings together - freeing any data we no longer need. +First we build a fold function that will concatenate strings together - freeing any data we no longer need. For the sake of this tutorial we will write it by hand, but this (as well as many other useful fold functions) is actually included in _mpc_ as `mpcf_strfold`. ```c -mpc_val_t* parse_fold_string(mpc_val_t* x, mpc_val_t* y) { - - if (x == NULL) { return y; } - if (y == NULL) { return x; } - - char* x = realloc(x, strlen(x) + strlen(y) + 1); - strcat(x, y); - - free(y); +mpc_val_t* strfold(int n, mpc_val_t** xs) { + char* x = calloc(1, 1); + int i; + for (i = 0; i < n; i++) { + x = realloc(x, strlen(x) + strlen(xs[i]) + 1); + strcat(x, xs[i]); + free(xs[i]); + } return x; - } ``` @@ -428,16 +411,16 @@ Then we can actually specify the grammar using combinators to say how the basic ```c char* parse_ident(char* input) { - mpc_parser_t* alpha = mpc_else(mpc_range('a', 'z'), mpc_range('A', 'Z')); + mpc_parser_t* alpha = mpc_or(2, mpc_range('a', 'z'), mpc_range('A', 'Z')); mpc_parser_t* digit = mpc_range('0', '9'); mpc_parser_t* underscore = mpc_char('_'); - mpc_parser_t* ident0 = mpc_else(alpha, underscore); - mpc_parser_t* ident1 = mpc_many(mpc_or(3, alpha, digit, underscore), parse_fold_string); - mpc_parser_t* ident = mpc_also(ident0, ident1, free, parse_fold_string); + mpc_parser_t* ident0 = mpc_or(2, alpha, underscore); + mpc_parser_t* ident1 = mpc_many(strfold, mpc_or(3, alpha, digit, underscore)); + mpc_parser_t* ident = mpc_and(2, strfold, ident0, ident1, free); mpc_result_t r; - if (!mpc_parse("parse_ident", input, ident, &r)) { + if (!mpc_parse("", input, ident, &r)) { mpc_err_print(r.error); mpc_err_delete(r.error); exit(EXIT_FAILURE); @@ -449,17 +432,13 @@ char* parse_ident(char* input) { } ``` -Note that only `ident` 
must be deleted. This is because in referencing other parsers in how it is built it ensure they will be destructed along with it. +Note that only `ident` must be deleted. When we input a parser into a combinator we should consider it to be part of that combinator now. This means we shouldn't create a parser and input it into multiple places, or it will be freed twice. Self Reference -------------- -Building parsers in the above way can have issues with self-reference or cyclic-reference. - -To overcome this we can separate the construction of parsers into two different steps. Construction and Definition. - -Note that _mpc_ does not detect [left-recursive grammars](http://en.wikipedia.org/wiki/Left_recursion). These will go into an infinite loop when they attempt to parse input, and so should specified instead in right-recursive form. +Building parsers in the above way can have issues with self-reference or cyclic-reference. To overcome this we can separate the construction of parsers into two different steps: Construction and Definition. * * * @@ -467,7 +446,7 @@ Note that _mpc_ does not detect [left-recursive grammars](http://en.wikipedia.or mpc_parser_t* mpc_new(const char* name); ``` -This will construct a parser called `name` which can then be used by others, including itself. Any parser created using `mpc_new` is said to be _retained_. This means it will behave slightly differently to a normal parser. For example when deleting a parser that includes a _retained_ parser, the _retained_ parser it will not be deleted along with it. To delete a retained parser `mpc_delete` must be used on it directly. +This will construct a parser called `name` which can then be used by others, including itself, without ownership being transferred. Any parser created using `mpc_new` is said to be _retained_. This means it will behave differently to a normal parser when referenced. 
When deleting a parser that includes a _retained_ parser, the _retained_ parser will not be deleted along with it. To delete a retained parser `mpc_delete` must be used on it directly. A _retained_ parser can then be defined using... @@ -477,7 +456,7 @@ A _retained_ parser can then be defined using... mpc_parser_t* mpc_define(mpc_parser_t* p, mpc_parser_t* a); ``` -This assigns the contents of parser `a` to `p`, and frees and memory used by `a`. With this technique parsers can now reference each other, as well as themselves, without trouble. +This assigns the contents of parser `a` to `p`, and deletes `a`. With this technique parsers can now reference each other, as well as themselves, without trouble. * * * @@ -485,7 +464,7 @@ This assigns the contents of parser `a` to `p`, and frees and memory used by `a` mpc_parser_t* mpc_undefine(mpc_parser_t* p); ``` -A final step is required. Parsers that reference each other must all be undefined before they are deleted. It is important to do any undefining before deletion. The reason for this is that to delete a parser it must look at each sub-parser that is used by it. If any of these have already been deleted a segfault is unavoidable. +A final step is required. Parsers that reference each other must all be undefined before they are deleted. It is important to do any undefining before deletion. The reason for this is that to delete a parser it must look at each sub-parser that is used by it. If any of these have already been deleted a segfault is unavoidable - even if they were retained beforehand. * * * @@ -495,16 +474,18 @@ void mpc_cleanup(int n, ...); To ease the task of undefining and then deleting parsers `mpc_cleanup` can be used. It takes `n` parsers as input, and undefines them all, before deleting them all. +Note: _mpc_ may have separate stages for construction and definition, but it does not detect [left-recursive grammars](http://en.wikipedia.org/wiki/Left_recursion). 
These will go into an infinite loop when they attempt to parse input, and so should be specified in right-recursive form instead. + Common Parsers --------------- A number of common parsers are included. -* `mpc_parser_t* mpc_eoi(void);` Matches only the end of input, returns `NULL` * `mpc_parser_t* mpc_soi(void);` Matches only the start of input, returns `NULL` -* `mpc_parser_t* mpc_space(void);` Matches some whitespace character (" \f\n\r\t\v") +* `mpc_parser_t* mpc_eoi(void);` Matches only the end of input, returns `NULL` +* `mpc_parser_t* mpc_space(void);` Matches any whitespace character (" \f\n\r\t\v") * `mpc_parser_t* mpc_spaces(void);` Matches zero or more whitespace characters -* `mpc_parser_t* mpc_whitespace(void);` Matches zero or more whitespace characters and frees the result +* `mpc_parser_t* mpc_whitespace(void);` Matches spaces and frees the result, returns `NULL` * `mpc_parser_t* mpc_newline(void);` Matches `'\n'` * `mpc_parser_t* mpc_tab(void);` Matches `'\t'` * `mpc_parser_t* mpc_escape(void);` Matches a backslash followed by any character @@ -519,24 +500,24 @@ A number of common parsers are included. 
* `mpc_parser_t* mpc_alpha(void);` Matches and alphabet character * `mpc_parser_t* mpc_underscore(void);` Matches `'_'` * `mpc_parser_t* mpc_alphanum(void);` Matches any alphabet character, underscore or digit -* `mpc_parser_t* mpc_int(void);` Matches digits and converts to an `int*` -* `mpc_parser_t* mpc_hex(void);` Matches hexdigits and converts to an `int*` -* `mpc_parser_t* mpc_oct(void);` Matches octdigits and converts to an `int*` +* `mpc_parser_t* mpc_int(void);` Matches digits and returns an `int*` +* `mpc_parser_t* mpc_hex(void);` Matches hexdigits and returns an `int*` +* `mpc_parser_t* mpc_oct(void);` Matches octdigits and returns an `int*` * `mpc_parser_t* mpc_number(void);` Matches `mpc_int`, `mpc_hex` or `mpc_oct` * `mpc_parser_t* mpc_real(void);` Matches some floating point number as a string -* `mpc_parser_t* mpc_float(void);` Matches some floating point number and converts to `float*` +* `mpc_parser_t* mpc_float(void);` Matches some floating point number and returns a `float*` * `mpc_parser_t* mpc_char_lit(void);` Matches some character literal surrounded by `'` * `mpc_parser_t* mpc_string_lit(void);` Matches some string literal surrounded by `"` * `mpc_parser_t* mpc_regex_lit(void);` Matches some regex literal surrounded by `/` -* `mpc_parser_t* mpc_ident(void);` Matches a C identifier +* `mpc_parser_t* mpc_ident(void);` Matches a C style identifier Useful Parsers -------------- -* `mpc_parser_t* mpc_start(mpc_parser_t* a);` Matches the start of input an `a` +* `mpc_parser_t* mpc_start(mpc_parser_t* a);` Matches the start of input followed by `a` * `mpc_parser_t* mpc_end(mpc_parser_t* a, mpc_dtor_t da);` Matches `a` followed by the end of input -* `mpc_parser_t* mpc_enclose(mpc_parser_t* a, mpc_dtor_t da);` Matches the start of input, `a` and then the end of input +* `mpc_parser_t* mpc_enclose(mpc_parser_t* a, mpc_dtor_t da);` Matches the start of input, `a`, and the end of input * `mpc_parser_t* mpc_strip(mpc_parser_t* a);` Matches `a` striping 
any surrounding whitespace * `mpc_parser_t* mpc_tok(mpc_parser_t* a);` Matches `a` and strips any trailing whitespace * `mpc_parser_t* mpc_sym(const char* s);` Matches string `s` and strips any trailing whitespace @@ -559,8 +540,8 @@ Fold Functions A number of common fold functions a user might want are included. They reside under the `mpcf_*` namespace. * `void mpcf_dtor_null(mpc_val_t* x);` Empty destructor. Does nothing -* `mpc_val_t* mpcf_lift_null(void);` Returns `NULL` -* `mpc_val_t* mpcf_lift_emptystr(void);` Returns newly allocated empty string +* `mpc_val_t* mpcf_ctor_null(void);` Returns `NULL` +* `mpc_val_t* mpcf_ctor_str(void);` Returns `""` * `mpc_val_t* mpcf_free(mpc_val_t* x);` Frees `x` and returns `NULL` * `mpc_val_t* mpcf_int(mpc_val_t* x);` Converts a decimal string `x` to an `int*` * `mpc_val_t* mpcf_hex(mpc_val_t* x);` Converts a hex string `x` to an `int*` @@ -568,17 +549,14 @@ A number of common fold functions a user might want are included. They reside un * `mpc_val_t* mpcf_float(mpc_val_t* x);` Converts a string `x` to a `float*` * `mpc_val_t* mpcf_escape(mpc_val_t* x);` Converts a string `x` to an escaped version * `mpc_val_t* mpcf_unescape(mpc_val_t* x);` Converts a string `x` to an unescaped version -* `mpc_val_t* mpcf_fst(mpc_val_t* x, mpc_val_t* y);` Returns `x` -* `mpc_val_t* mpcf_snd(mpc_val_t* x, mpc_val_t* y);` Returns `y` -* `mpc_val_t* mpcf_fst_free(mpc_val_t* x, mpc_val_t* y);` Returns `x` and frees `y` -* `mpc_val_t* mpcf_snd_free(mpc_val_t* x, mpc_val_t* y);` Returns `y` and frees `x` -* `mpc_val_t* mpcf_freefold(mpc_val_t* t, mpc_val_t* x);` Returns `NULL` and frees `x` -* `mpc_val_t* mpcf_strfold(mpc_val_t* t, mpc_val_t* x);` Concatenates `t` and `x` and returns result -* `mpc_val_t* mpcf_afst(int n, mpc_val_t** xs);` Returns first argument -* `mpc_val_t* mpcf_asnd(int n, mpc_val_t** xs);` Returns second argument -* `mpc_val_t* mpcf_atrd(int n, mpc_val_t** xs);` Returns third argument -* `mpc_val_t* mpcf_astrfold(int n, 
mpc_val_t** xs);` Concatenates and returns all input strings -* `mpc_val_t* mpcf_between_free(int n, mpc_val_t** xs);` Frees first and third argument and returns second +* `mpc_val_t* mpcf_unescape(mpc_val_t* x);` Converts a string `x` to an unescaped version unescaping `\\/` +* `mpc_val_t* mpcf_fst(int n, mpc_val_t** xs);` Returns first element of `xs` +* `mpc_val_t* mpcf_snd(int n, mpc_val_t** xs);` Returns second element of `xs` +* `mpc_val_t* mpcf_trd(int n, mpc_val_t** xs);` Returns third element of `xs` +* `mpc_val_t* mpcf_fst_free(int n, mpc_val_t** xs);` Returns first element of `xs` and frees others +* `mpc_val_t* mpcf_snd_free(int n, mpc_val_t** xs);` Returns second element of `xs` and frees others +* `mpc_val_t* mpcf_trd_free(int n, mpc_val_t** xs);` Returns third element of `xs` and frees others +* `mpc_val_t* mpcf_strfold(int n, mpc_val_t** xs);` Concatenates all `xs` together as strings and returns result * `mpc_val_t* mpcf_maths(int n, mpc_val_t** xs);` Examines second argument as string to see which operator it is, then operators on first and third argument as if they are `int*`. @@ -616,17 +594,17 @@ int parse_maths(char* input) { mpc_parser_t* Term = mpc_new("term"); mpc_parser_t* Maths = mpc_new("maths"); - mpc_define(Expr, mpc_else( + mpc_define(Expr, mpc_or(2, mpc_and(3, mpcf_maths, Factor, mpc_oneof("*/"), Factor, free, free), Factor )); - mpc_define(Factor, mpc_else( + mpc_define(Factor, mpc_or(2, mpc_and(3, mpcf_maths, Term, mpc_oneof("+-"), Term, free, free), Term )); - mpc_define(Term, mpc_else(mpc_int(), mpc_parens(Expr, free))); + mpc_define(Term, mpc_or(2, mpc_int(), mpc_parens(Expr, free))); mpc_define(Maths, mpc_enclose(Expr, free)); mpc_result_t r; @@ -664,11 +642,11 @@ A cute thing about this is that it uses previous parts of the library to parse t Abstract Syntax Tree -------------------- -One can avoid passing in and around all those clumbsy function pointer if they don't care what type is output by _mpc_. 
For this generic Abstract Syntax Tree type `mpc_ast_t` is included. The combinator functions which act on this don't need information on how to destruct instances of the result as they know it will be a `mpc_ast_t`. So there are a number of combinator functions which work specifically (and only) on this type. They reside under `mpca_*`. +One can avoid passing around all those clumsy function pointers if one doesn't care what type is output by _mpc_. For this, a generic Abstract Syntax Tree type `mpc_ast_t` is included in _mpc_. The combinator functions which act on this don't need information on how to destruct or fold instances of the result as they know it will be an `mpc_ast_t`. So there are a number of combinator functions which work specifically (and only) on parsers that return this type. They reside under `mpca_*`. -Doing things via this method means that all the data processing must take place after the parsing - but to many this will be preferable. +Doing things via this method means that all the data processing must take place after the parsing. In many instances this is no problem or even preferable. -It also allows for one more trick. As all the fold and destructor functions are implicit then the user can simply specify the grammar of the language in some nice way and the system can try to build an AST for them from this alone. For this there are two functions supplied which take in a string and output a parser. The format for these grammars is simple and familar to those who have used parser generators before. It looks something like this. +It also allows for one more trick. As all the fold and destructor functions are implicit, the user can simply specify the grammar of the language in some nice way and the system can try to build a parser for the AST type from this alone. For this there are two functions supplied which take in a string and output a parser. 
The format for these grammars is simple and familar to those who have used parser generators before. It looks something like this. ``` expression : (('+' | '-') )*; @@ -682,11 +660,11 @@ maths : /^/ /$/; String literals are surrounded in double quotes `"`. Character literals in single quotes `'` and regex literals in slashes `/`. References to other parsers are surrounded in braces `<>` and referred to by name. -Parts specified one after another are parsed in order (like `mpc_and`), while parts separated by a pipe `|` are alternatives (like `mpc_or`). Parenthesis `()` are used to specify precidence. `*` can be used to mean zero or more of. `+` for one or more of. `?` for zero or one of. And a number inside braces `{5}` to mean N counts of. +Parts specified one after another are parsed in order (like `mpc_and`), while parts separated by a pipe `|` are alternatives (like `mpc_or`). Parenthesis `()` are used to specify precidence. `*` can be used to mean zero or more of. `+` for one or more of. `?` for zero or one of. `!` for negation. And a number inside braces `{5}` to mean N counts of. Rules are specified by rule name followed by a colon `:`, followed by the definition, and ending in a semicolon `;`. -In a cute bootstrapping this user input is parsed by existing parts of the _mpc_ library. It provides one of the more powerful features of the library. +Like with the regular expressions, this user input is parsed by existing parts of the _mpc_ library. It provides one of the more powerful features of the library. * * * @@ -724,7 +702,7 @@ This opens and reads in the contents of the file given by `filename` and passes Error Reporting --------------- -_mpc_ provides some automatic generation of error messages. These can be enhanced by the user by use of `mpc_expect` but even many of the defaults should provide both useful and readable. An example of an error message might look something like this: +_mpc_ provides some automatic generation of error messages. 
These can be enhanced by the user, with use of `mpc_expect`, but many of the defaults should prove both useful and readable. An example of an error message might look something like this: ``` :0:3: error: expected one or more of 'a' or 'd' at 'k' diff --git a/mpc.c b/mpc.c index fa5a71a..4b0952e 100644 --- a/mpc.c +++ b/mpc.c @@ -1,18 +1,5 @@ #include "mpc.h" -static int snprintf(char* str, size_t size, const char* fmt, ...) { - int x; - va_list va; - va_start(va, fmt); - x = vsprintf(str, fmt, va); - va_end(va); - return x; -} - -static int vsnprintf(char* str, size_t size, const char* fmt, va_list args) { - return snprintf(str, size, fmt, args); -} - /* ** State Type */ @@ -122,12 +109,13 @@ void mpc_err_print(mpc_err_t* x) { } void mpc_err_print_to(mpc_err_t* x, FILE* f) { - char* str = mpc_err_string_new(x); + char* str; mpc_err_string(x, &str); fprintf(f, "%s", str); free(str); } void mpc_err_string_cat(char* buffer, int* pos, int* max, char* fmt, ...) { + /* TODO: Error Checking on Length */ int left = ((*max) - (*pos)); va_list va; va_start(va, fmt); @@ -136,7 +124,33 @@ void mpc_err_string_cat(char* buffer, int* pos, int* max, char* fmt, ...) 
{ va_end(va); } -char* mpc_err_string_new(mpc_err_t* x) { +static char char_unescape_buffer[4]; + +static char* mpc_err_char_unescape(char c) { + + char_unescape_buffer[0] = '\''; + char_unescape_buffer[1] = ' '; + char_unescape_buffer[2] = '\''; + + switch (c) { + + case '\a': return "bell"; + case '\b': return "backspace"; + case '\f': return "formfeed"; + case '\r': return "carriage return"; + case '\v': return "vertical tab"; + case '\0': return "end of input"; + case '\n': return "newline"; + case '\t': return "tab"; + case ' ' : return "space"; + default: + char_unescape_buffer[1] = c; + return char_unescape_buffer; + } + +} + +void mpc_err_string(mpc_err_t* x, char** out) { char* buffer = calloc(1, 1024); int max = 1023; @@ -148,17 +162,16 @@ char* mpc_err_string_new(mpc_err_t* x) { "%s:%i:%i: error: %s\n", x->filename, x->state.row, x->state.col, x->failure); - return buffer; + *out = buffer; + return; } mpc_err_string_cat(buffer, &pos, &max, "%s:%i:%i: error: expected ", x->filename, x->state.row, x->state.col); - if (x->expected_num == 0) { - mpc_err_string_cat(buffer, &pos, &max, "ERROR: NOTHING EXPECTED"); - } else if (x->expected_num == 1) { - mpc_err_string_cat(buffer, &pos, &max, "%s", x->expected[0]); - } else { + if (x->expected_num == 0) { mpc_err_string_cat(buffer, &pos, &max, "ERROR: NOTHING EXPECTED"); } + if (x->expected_num == 1) { mpc_err_string_cat(buffer, &pos, &max, "%s", x->expected[0]); } + if (x->expected_num >= 2) { for (i = 0; i < x->expected_num-2; i++) { mpc_err_string_cat(buffer, &pos, &max, "%s, ", x->expected[i]); @@ -170,22 +183,10 @@ char* mpc_err_string_new(mpc_err_t* x) { } mpc_err_string_cat(buffer, &pos, &max, " at "); - if (x->state.next == '\a') { mpc_err_string_cat(buffer, &pos, &max, "bell"); } - else if (x->state.next == '\b') { mpc_err_string_cat(buffer, &pos, &max, "backspace"); } - else if (x->state.next == '\f') { mpc_err_string_cat(buffer, &pos, &max, "formfeed"); } - else if (x->state.next == '\r') { mpc_err_string_cat(buffer, &pos, &max, "carriage return"); } - else if 
(x->state.next == '\v') { mpc_err_string_cat(buffer, &pos, &max, "vertical tab"); } - else if (x->state.next == '\0') { mpc_err_string_cat(buffer, &pos, &max, "end of input"); } - else if (x->state.next == '\n') { mpc_err_string_cat(buffer, &pos, &max, "newline"); } - else if (x->state.next == '\t') { mpc_err_string_cat(buffer, &pos, &max, "tab"); } - else if (x->state.next == ' ') { mpc_err_string_cat(buffer, &pos, &max, "space"); } - else { mpc_err_string_cat(buffer, &pos, &max, "'%c'", x->state.next); } + mpc_err_string_cat(buffer, &pos, &max, "%s", mpc_err_char_unescape(x->state.next)); mpc_err_string_cat(buffer, &pos, &max, "\n"); - buffer = realloc(buffer, strlen(buffer) + 1); - - return buffer; - + *out = realloc(buffer, strlen(buffer) + 1); } static mpc_err_t* mpc_err_either(mpc_err_t* x, mpc_err_t* y) { @@ -272,9 +273,15 @@ char* mpc_err_filename(mpc_err_t* x) { return x->filename; } -char** mpc_err_expected(mpc_err_t* x, int* num) { - *num = x->expected_num; - return x->expected; +void mpc_err_expected(mpc_err_t* x, char** out, int* out_num, int out_max) { + + int i; + out_max = out_max < x->expected_num ? 
out_max : x->expected_num; + *out_num = 0; + for (i = 0; i < out_max; i++) { + out[i] = x->expected[i]; + (*out_num)++; + } } int mpc_err_line(mpc_err_t* x) { @@ -634,7 +641,7 @@ enum { }; typedef struct { char* m; } mpc_pdata_fail_t; -typedef struct { mpc_lift_t lf; void* x; } mpc_pdata_lift_t; +typedef struct { mpc_ctor_t lf; void* x; } mpc_pdata_lift_t; typedef struct { mpc_parser_t* x; char* m; } mpc_pdata_expect_t; typedef struct { char x; } mpc_pdata_single_t; typedef struct { char x; char y; } mpc_pdata_range_t; @@ -643,7 +650,7 @@ typedef struct { char* x; } mpc_pdata_string_t; typedef struct { mpc_parser_t* x; mpc_apply_t f; } mpc_pdata_apply_t; typedef struct { mpc_parser_t* x; mpc_apply_to_t f; void* d; } mpc_pdata_apply_to_t; typedef struct { mpc_parser_t* x; } mpc_pdata_predict_t; -typedef struct { mpc_parser_t* x; mpc_dtor_t dx; mpc_lift_t lf; } mpc_pdata_not_t; +typedef struct { mpc_parser_t* x; mpc_dtor_t dx; mpc_ctor_t lf; } mpc_pdata_not_t; typedef struct { int n; mpc_fold_t f; mpc_parser_t* x; mpc_dtor_t dx; } mpc_pdata_repeat_t; typedef struct { int n; mpc_parser_t** xs; } mpc_pdata_or_t; typedef struct { int n; mpc_fold_t f; mpc_parser_t** xs; mpc_dtor_t* dxs; } mpc_pdata_and_t; @@ -1285,29 +1292,16 @@ mpc_parser_t* mpc_define(mpc_parser_t* p, mpc_parser_t* a) { } void mpc_cleanup(int n, ...) 
{ - va_list va; - va_start(va, n); - mpc_cleanup_va(n, va); - va_end(va); -} - -void mpc_cleanup_va(int n, va_list va) { - int i; mpc_parser_t** list = malloc(sizeof(mpc_parser_t*) * n); - for (i = 0; i < n; i++) { - list[i] = va_arg(va, mpc_parser_t*); - } - - for (i = 0; i < n; i++) { - mpc_undefine(list[i]); - } - - for (i = 0; i < n; i++) { - mpc_delete(list[i]); - } - + va_list va; + va_start(va, n); + for (i = 0; i < n; i++) { list[i] = va_arg(va, mpc_parser_t*); } + for (i = 0; i < n; i++) { mpc_undefine(list[i]); } + for (i = 0; i < n; i++) { mpc_delete(list[i]); } + va_end(va); + free(list); } @@ -1351,7 +1345,7 @@ mpc_parser_t* mpc_lift_val(mpc_val_t* x) { return p; } -mpc_parser_t* mpc_lift(mpc_lift_t lf) { +mpc_parser_t* mpc_lift(mpc_ctor_t lf) { mpc_parser_t* p = mpc_undefined(); p->type = MPC_TYPE_LIFT; p->data.lift.lf = lf; @@ -1516,7 +1510,7 @@ mpc_parser_t* mpc_predictive(mpc_parser_t* a) { return p; } -mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_lift_t lf) { +mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_ctor_t lf) { mpc_parser_t* p = mpc_undefined(); p->type = MPC_TYPE_NOT; p->data.not.x = a; @@ -1526,10 +1520,10 @@ mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_lift_t lf) { } mpc_parser_t* mpc_not(mpc_parser_t* a, mpc_dtor_t da) { - return mpc_not_lift(a, da, mpcf_lift_null); + return mpc_not_lift(a, da, mpcf_ctor_null); } -mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_lift_t lf) { +mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_ctor_t lf) { mpc_parser_t* p = mpc_undefined(); p->type = MPC_TYPE_MAYBE; p->data.not.x = a; @@ -1538,7 +1532,7 @@ mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_lift_t lf) { } mpc_parser_t* mpc_maybe(mpc_parser_t* a) { - return mpc_maybe_lift(a, mpcf_lift_null); + return mpc_maybe_lift(a, mpcf_ctor_null); } mpc_parser_t* mpc_many(mpc_fold_t f, mpc_parser_t* a) { @@ -1660,13 +1654,13 @@ mpc_parser_t* mpc_real(void) { mpc_parser_t *p0, *p1, *p2, *p30, *p31, 
*p32, *p3; - p0 = mpc_maybe_lift(mpc_oneof("+-"), mpcf_lift_str); + p0 = mpc_maybe_lift(mpc_oneof("+-"), mpcf_ctor_str); p1 = mpc_digits(); - p2 = mpc_maybe_lift(mpc_and(2, mpcf_strfold, mpc_char('.'), mpc_digits(), free), mpcf_lift_str); + p2 = mpc_maybe_lift(mpc_and(2, mpcf_strfold, mpc_char('.'), mpc_digits(), free), mpcf_ctor_str); p30 = mpc_oneof("eE"); - p31 = mpc_maybe_lift(mpc_oneof("+-"), mpcf_lift_str); + p31 = mpc_maybe_lift(mpc_oneof("+-"), mpcf_ctor_str); p32 = mpc_digits(); - p3 = mpc_maybe_lift(mpc_and(3, mpcf_strfold, p30, p31, p32, free, free), mpcf_lift_str); + p3 = mpc_maybe_lift(mpc_and(3, mpcf_strfold, p30, p31, p32, free, free), mpcf_ctor_str); return mpc_expect(mpc_and(4, mpcf_strfold, p0, p1, p2, p3, free, free, free), "real"); @@ -1790,7 +1784,7 @@ static mpc_val_t* mpcf_re_or(int n, mpc_val_t** xs) { static mpc_val_t* mpcf_re_and(int n, mpc_val_t** xs) { int i; - mpc_parser_t* p = mpc_lift(mpcf_lift_str); + mpc_parser_t* p = mpc_lift(mpcf_ctor_str); for (i = 0; i < n; i++) { p = mpc_and(2, mpcf_strfold, p, xs[i], free); } @@ -1803,7 +1797,7 @@ static mpc_val_t* mpcf_re_repeat(int n, mpc_val_t** xs) { if (xs[1] == NULL) { return xs[0]; } if (strcmp(xs[1], "*") == 0) { free(xs[1]); return mpc_many(mpcf_strfold, xs[0]); } if (strcmp(xs[1], "+") == 0) { free(xs[1]); return mpc_many1(mpcf_strfold, xs[0]); } - if (strcmp(xs[1], "?") == 0) { free(xs[1]); return mpc_maybe_lift(xs[0], mpcf_lift_str); } + if (strcmp(xs[1], "?") == 0) { free(xs[1]); return mpc_maybe_lift(xs[0], mpcf_ctor_str); } num = *(int*)xs[1]; free(xs[1]); @@ -1818,14 +1812,14 @@ static mpc_parser_t* mpc_re_escape_char(char c, int range) { case 't': return mpc_char('\t'); case 'v': return mpc_char('\v'); case 'b': return mpc_char('\b'); - case 'A': return mpc_and(2, mpcf_snd, mpc_soi(), mpc_lift(mpcf_lift_str), free); - case 'Z': return mpc_and(2, mpcf_snd, mpc_eoi(), mpc_lift(mpcf_lift_str), free); + case 'A': return mpc_and(2, mpcf_snd, mpc_soi(), mpc_lift(mpcf_ctor_str), 
free); + case 'Z': return mpc_and(2, mpcf_snd, mpc_eoi(), mpc_lift(mpcf_ctor_str), free); case 'd': return mpc_digit(); - case 'D': return mpc_not_lift(mpc_digit(), free, mpcf_lift_str); + case 'D': return mpc_not_lift(mpc_digit(), free, mpcf_ctor_str); case 's': return mpc_space(); - case 'S': return mpc_not_lift(mpc_space(), free, mpcf_lift_str); + case 'S': return mpc_not_lift(mpc_space(), free, mpcf_ctor_str); case 'w': return mpc_alphanum(); - case 'W': return mpc_not_lift(mpc_alphanum(), free, mpcf_lift_str); + case 'W': return mpc_not_lift(mpc_alphanum(), free, mpcf_ctor_str); default: return NULL; } } @@ -1837,8 +1831,8 @@ static mpc_val_t* mpcf_re_escape(mpc_val_t* x) { /* Regex Special Characters */ if (s[0] == '.') { free(s); return mpc_any(); } - if (s[0] == '^') { free(s); return mpc_and(2, mpcf_snd, mpc_soi(), mpc_lift(mpcf_lift_str), free); } - if (s[0] == '$') { free(s); return mpc_and(2, mpcf_snd, mpc_eoi(), mpc_lift(mpcf_lift_str), free); } + if (s[0] == '^') { free(s); return mpc_and(2, mpcf_snd, mpc_soi(), mpc_lift(mpcf_ctor_str), free); } + if (s[0] == '$') { free(s); return mpc_and(2, mpcf_snd, mpc_eoi(), mpc_lift(mpcf_ctor_str), free); } /* Regex Escape */ if (s[0] == '\\') { @@ -1894,7 +1888,7 @@ static mpc_val_t* mpcf_re_range(mpc_val_t* x) { } free(x); - return comp ? mpc_not_lift(p, free, mpcf_lift_str) : p; + return comp ? 
mpc_not_lift(p, free, mpcf_ctor_str) : p; } static mpc_val_t* mpcf_re_invalid(void) { @@ -1904,6 +1898,7 @@ static mpc_val_t* mpcf_re_invalid(void) { mpc_parser_t* mpc_re(const char* re) { char* err_msg; + mpc_parser_t* err_out; mpc_result_t r; mpc_parser_t *Regex, *Term, *Factor, *Base, *Range, *RegexEnclose; @@ -1947,10 +1942,11 @@ mpc_parser_t* mpc_re(const char* re) { RegexEnclose = mpc_enclose(mpc_predictive(Regex), (mpc_dtor_t)mpc_delete); if(!mpc_parse("", re, RegexEnclose, &r)) { - err_msg = mpc_err_string_new(r.error); - r.output = mpc_failf("Invalid Regex: %s", err_msg); - free(err_msg); + mpc_err_string(r.error, &err_msg); + err_out = mpc_failf("Invalid Regex: %s", err_msg); mpc_err_delete(r.error); + free(err_msg); + r.output = err_out; } mpc_delete(RegexEnclose); @@ -1966,8 +1962,8 @@ mpc_parser_t* mpc_re(const char* re) { void mpcf_dtor_null(mpc_val_t* x) { return; } -mpc_val_t* mpcf_lift_null(void) { return NULL; } -mpc_val_t* mpcf_lift_str(void) { return calloc(1, 1); } +mpc_val_t* mpcf_ctor_null(void) { return NULL; } +mpc_val_t* mpcf_ctor_str(void) { return calloc(1, 1); } mpc_val_t* mpcf_free(mpc_val_t* x) { free(x); return NULL; } mpc_val_t* mpcf_int(mpc_val_t* x) { @@ -2109,15 +2105,6 @@ mpc_val_t* mpcf_unescape_regex(mpc_val_t* x) { return y; } -mpc_val_t* mpcf_strcrop(mpc_val_t* x) { - char* copy = malloc(strlen(x)); - strcpy(copy, x); - memmove(copy, copy+1, strlen(copy)-1); - copy[strlen(copy)-2] = '\0'; - free(x); - return copy; -} - mpc_val_t* mpcf_fst(int n, mpc_val_t** xs) { return xs[0]; } mpc_val_t* mpcf_snd(int n, mpc_val_t** xs) { return xs[1]; } mpc_val_t* mpcf_trd(int n, mpc_val_t** xs) { return xs[2]; } @@ -2401,7 +2388,7 @@ mpc_ast_t* mpc_ast_build(int n, const char* tag, ...) 
{ } -mpc_ast_t* mpc_ast_insert_root(mpc_ast_t* a) { +mpc_ast_t* mpc_ast_add_root(mpc_ast_t* a) { mpc_ast_t* r; @@ -2409,7 +2396,7 @@ mpc_ast_t* mpc_ast_insert_root(mpc_ast_t* a) { if (a->children_num == 0) { return a; } if (a->children_num == 1) { return a; } - r = mpc_ast_new("root", ""); + r = mpc_ast_new(">", ""); mpc_ast_add_child(r, a); return r; } @@ -2429,22 +2416,25 @@ int mpc_ast_eq(mpc_ast_t* a, mpc_ast_t* b) { return 1; } -void mpc_ast_add_child(mpc_ast_t* r, mpc_ast_t* a) { +mpc_ast_t* mpc_ast_add_child(mpc_ast_t* r, mpc_ast_t* a) { r->children_num++; r->children = realloc(r->children, sizeof(mpc_ast_t*) * r->children_num); r->children[r->children_num-1] = a; + return r; } -void mpc_ast_add_tag(mpc_ast_t* a, const char* t) { +mpc_ast_t* mpc_ast_add_tag(mpc_ast_t* a, const char* t) { a->tag = realloc(a->tag, strlen(t) + 1 + strlen(a->tag) + 1); memmove(a->tag + strlen(t) + 1, a->tag, strlen(a->tag)+1); memmove(a->tag, t, strlen(t)); memmove(a->tag + strlen(t), "|", 1); + return a; } -void mpc_ast_tag(mpc_ast_t* a, const char* t) { +mpc_ast_t* mpc_ast_tag(mpc_ast_t* a, const char* t) { a->tag = realloc(a->tag, strlen(t) + 1); strcpy(a->tag, t); + return a; } static void mpc_ast_print_depth(mpc_ast_t* a, int d) { @@ -2476,14 +2466,20 @@ mpc_val_t* mpcf_fold_ast(int n, mpc_val_t** xs) { if (n == 0) { return NULL; } if (n == 1) { return xs[0]; } - if (n == 2 && xs[0] == NULL) { return xs[1]; } if (n == 2 && xs[1] == NULL) { return xs[0]; } + if (n == 2 && xs[0] == NULL) { return xs[1]; } r = mpc_ast_new(">", ""); for (i = 0; i < n; i++) { if (as[i] == NULL) { continue; } + + /* + printf("%i\n", i); + mpc_ast_print(as[i]); + */ + if (as[i] && as[i]->children_num > 0) { for (j = 0; j < as[i]->children_num; j++) { @@ -2501,28 +2497,22 @@ mpc_val_t* mpcf_fold_ast(int n, mpc_val_t** xs) { return r; } -mpc_val_t* mpcf_apply_str_ast(mpc_val_t* c) { +mpc_val_t* mpcf_str_ast(mpc_val_t* c) { mpc_ast_t* a = mpc_ast_new("", c); free(c); return a; } -static mpc_val_t* 
mpcf_apply_tag(mpc_val_t* x, void* d) { - mpc_ast_tag(x, d); - return x; -} - -static mpc_val_t* mpcf_apply_add_tag(mpc_val_t* x, void* d) { - mpc_ast_add_tag(x, d); - return x; -} - mpc_parser_t* mpca_tag(mpc_parser_t* a, const char* t) { - return mpc_apply_to(a, mpcf_apply_tag, (void*)t); + return mpc_apply_to(a, (mpc_apply_to_t)mpc_ast_tag, (void*)t); } mpc_parser_t* mpca_add_tag(mpc_parser_t* a, const char* t) { - return mpc_apply_to(a, mpcf_apply_add_tag, (void*)t); + return mpc_apply_to(a, (mpc_apply_to_t)mpc_ast_add_tag, (void*)t); +} + +mpc_parser_t* mpca_root(mpc_parser_t* a) { + return mpc_apply(a, (mpc_apply_t)mpc_ast_add_root); } mpc_parser_t* mpca_not(mpc_parser_t* a) { return mpc_not(a, (mpc_dtor_t)mpc_ast_delete); } @@ -2654,21 +2644,21 @@ static mpc_val_t* mpcaf_grammar_string(mpc_val_t* x) { char* y = mpcf_unescape(x); mpc_parser_t* p = mpc_tok(mpc_string(y)); free(y); - return mpca_tag(mpc_apply(p, mpcf_apply_str_ast), "string"); + return mpca_tag(mpc_apply(p, mpcf_str_ast), "string"); } static mpc_val_t* mpcaf_grammar_char(mpc_val_t* x) { char* y = mpcf_unescape(x); mpc_parser_t* p = mpc_tok(mpc_char(y[0])); free(y); - return mpca_tag(mpc_apply(p, mpcf_apply_str_ast), "char"); + return mpca_tag(mpc_apply(p, mpcf_str_ast), "char"); } static mpc_val_t* mpcaf_grammar_regex(mpc_val_t* x) { char* y = mpcf_unescape_regex(x); mpc_parser_t* p = mpc_tok(mpc_re(y)); free(y); - return mpca_tag(mpc_apply(p, mpcf_apply_str_ast), "regex"); + return mpca_tag(mpc_apply(p, mpcf_str_ast), "regex"); } typedef struct { @@ -2734,9 +2724,9 @@ static mpc_val_t* mpcaf_grammar_id(mpc_val_t* x, void* y) { free(x); if (p->name) { - return mpc_apply(mpca_add_tag(p, p->name), (mpc_apply_t)mpc_ast_insert_root); + return mpca_root(mpca_add_tag(p, p->name)); } else { - return mpc_apply(p, (mpc_apply_t)mpc_ast_insert_root); + return mpca_root(p); } } @@ -2787,7 +2777,7 @@ mpc_parser_t* mpca_grammar_st(const char* grammar, mpca_grammar_st_t* st) { )); if(!mpc_parse("", grammar, 
GrammarTotal, &r)) { - err_msg = mpc_err_string_new(r.error); + mpc_err_string(r.error, &err_msg); err_out = mpc_failf("Invalid Grammar: %s", err_msg); mpc_err_delete(r.error); free(err_msg); diff --git a/mpc.h b/mpc.h index 4b945a3..7d4adc1 100644 --- a/mpc.h +++ b/mpc.h @@ -23,18 +23,18 @@ struct mpc_err_t; typedef struct mpc_err_t mpc_err_t; +void mpc_err_delete(mpc_err_t* x); +void mpc_err_print(mpc_err_t* x); +void mpc_err_print_to(mpc_err_t* x, FILE* f); +void mpc_err_string(mpc_err_t* x, char** out); + int mpc_err_line(mpc_err_t* x); int mpc_err_column(mpc_err_t* x); char mpc_err_unexpected(mpc_err_t* x); -char** mpc_err_expected(mpc_err_t* x, int* num); +void mpc_err_expected(mpc_err_t* x, char** out, int* out_num, int out_max); char* mpc_err_filename(mpc_err_t* x); char* mpc_err_failure(mpc_err_t* x); -void mpc_err_delete(mpc_err_t* x); -void mpc_err_print(mpc_err_t* x); -void mpc_err_print_to(mpc_err_t* x, FILE* f); -char* mpc_err_string_new(mpc_err_t* x); - /* ** Parsing */ @@ -58,23 +58,22 @@ int mpc_fparse_contents(const char* filename, mpc_parser_t* p, mpc_result_t* r); */ typedef void(*mpc_dtor_t)(mpc_val_t*); +typedef mpc_val_t*(*mpc_ctor_t)(void); + typedef mpc_val_t*(*mpc_apply_t)(mpc_val_t*); typedef mpc_val_t*(*mpc_apply_to_t)(mpc_val_t*,void*); typedef mpc_val_t*(*mpc_fold_t)(int,mpc_val_t**); -typedef mpc_val_t*(*mpc_lift_t)(void); /* ** Building a Parser */ -void mpc_delete(mpc_parser_t* p); mpc_parser_t* mpc_new(const char* name); - mpc_parser_t* mpc_define(mpc_parser_t* p, mpc_parser_t* a); mpc_parser_t* mpc_undefine(mpc_parser_t* p); +void mpc_delete(mpc_parser_t* p); void mpc_cleanup(int n, ...); -void mpc_cleanup_va(int n, va_list va); /* ** Basic Parsers @@ -83,7 +82,7 @@ void mpc_cleanup_va(int n, va_list va); mpc_parser_t* mpc_pass(void); mpc_parser_t* mpc_fail(const char* m); mpc_parser_t* mpc_failf(const char* fmt, ...); -mpc_parser_t* mpc_lift(mpc_lift_t f); +mpc_parser_t* mpc_lift(mpc_ctor_t f); mpc_parser_t* 
mpc_lift_val(mpc_val_t* x); mpc_parser_t* mpc_any(void); @@ -95,23 +94,27 @@ mpc_parser_t* mpc_satisfy(int(*f)(char)); mpc_parser_t* mpc_string(const char* s); /* -** Core Parsers +** Combinator Parsers */ -mpc_parser_t* mpc_expect(mpc_parser_t* a, const char* expected); +mpc_parser_t* mpc_expect(mpc_parser_t* a, const char* e); mpc_parser_t* mpc_apply(mpc_parser_t* a, mpc_apply_t f); mpc_parser_t* mpc_apply_to(mpc_parser_t* a, mpc_apply_to_t f, void* x); -mpc_parser_t* mpc_predictive(mpc_parser_t* a); + mpc_parser_t* mpc_not(mpc_parser_t* a, mpc_dtor_t da); -mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_lift_t lf); +mpc_parser_t* mpc_not_lift(mpc_parser_t* a, mpc_dtor_t da, mpc_ctor_t lf); mpc_parser_t* mpc_maybe(mpc_parser_t* a); -mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_lift_t lf); +mpc_parser_t* mpc_maybe_lift(mpc_parser_t* a, mpc_ctor_t lf); + mpc_parser_t* mpc_many(mpc_fold_t f, mpc_parser_t* a); mpc_parser_t* mpc_many1(mpc_fold_t f, mpc_parser_t* a); mpc_parser_t* mpc_count(int n, mpc_fold_t f, mpc_parser_t* a, mpc_dtor_t da); + mpc_parser_t* mpc_or(int n, ...); mpc_parser_t* mpc_and(int n, mpc_fold_t f, ...); +mpc_parser_t* mpc_predictive(mpc_parser_t* a); + /* ** Common Parsers */ @@ -180,19 +183,13 @@ mpc_parser_t* mpc_tok_brackets(mpc_parser_t* a, mpc_dtor_t ad); mpc_parser_t* mpc_tok_squares(mpc_parser_t* a, mpc_dtor_t ad); /* -** Regular Expression Parsers -*/ - -mpc_parser_t* mpc_re(const char* re); - -/* -** Common Fold Functions +** Common Function Parameters */ void mpcf_dtor_null(mpc_val_t* x); -mpc_val_t* mpcf_lift_null(void); -mpc_val_t* mpcf_lift_str(void); +mpc_val_t* mpcf_ctor_null(void); +mpc_val_t* mpcf_ctor_str(void); mpc_val_t* mpcf_free(mpc_val_t* x); mpc_val_t* mpcf_int(mpc_val_t* x); @@ -204,8 +201,6 @@ mpc_val_t* mpcf_escape(mpc_val_t* x); mpc_val_t* mpcf_unescape(mpc_val_t* x); mpc_val_t* mpcf_unescape_regex(mpc_val_t* x); -mpc_val_t* mpcf_strcrop(mpc_val_t* x); - mpc_val_t* mpcf_fst(int n, mpc_val_t** xs); 
mpc_val_t* mpcf_snd(int n, mpc_val_t** xs); mpc_val_t* mpcf_trd(int n, mpc_val_t** xs); @@ -217,13 +212,11 @@ mpc_val_t* mpcf_trd_free(int n, mpc_val_t** xs); mpc_val_t* mpcf_strfold(int n, mpc_val_t** xs); mpc_val_t* mpcf_maths(int n, mpc_val_t** xs); - /* -** Printing +** Regular Expression Parsers */ -void mpc_print(mpc_parser_t* p); - +mpc_parser_t* mpc_re(const char* re); /* ** AST @@ -236,30 +229,36 @@ typedef struct mpc_ast_t { struct mpc_ast_t** children; } mpc_ast_t; -void mpc_ast_delete(mpc_ast_t* a); mpc_ast_t* mpc_ast_new(const char* tag, const char* contents); mpc_ast_t* mpc_ast_build(int n, const char* tag, ...); -mpc_ast_t* mpc_ast_insert_root(mpc_ast_t* a); +mpc_ast_t* mpc_ast_add_root(mpc_ast_t* a); +mpc_ast_t* mpc_ast_add_child(mpc_ast_t* r, mpc_ast_t* a); +mpc_ast_t* mpc_ast_add_tag(mpc_ast_t* a, const char* t); +mpc_ast_t* mpc_ast_tag(mpc_ast_t* a, const char* t); -void mpc_ast_add_child(mpc_ast_t* r, mpc_ast_t* a); -void mpc_ast_add_tag(mpc_ast_t* a, const char* t); -void mpc_ast_tag(mpc_ast_t* a, const char* t); +void mpc_ast_delete(mpc_ast_t* a); void mpc_ast_print(mpc_ast_t* a); + int mpc_ast_eq(mpc_ast_t* a, mpc_ast_t* b); mpc_val_t* mpcf_fold_ast(int n, mpc_val_t** as); -mpc_val_t* mpcf_apply_str_ast(mpc_val_t* c); +mpc_val_t* mpcf_str_ast(mpc_val_t* c); mpc_parser_t* mpca_tag(mpc_parser_t* a, const char* t); mpc_parser_t* mpca_add_tag(mpc_parser_t* a, const char* t); +mpc_parser_t* mpca_root(mpc_parser_t* a); mpc_parser_t* mpca_total(mpc_parser_t* a); + mpc_parser_t* mpca_not(mpc_parser_t* a); mpc_parser_t* mpca_maybe(mpc_parser_t* a); + mpc_parser_t* mpca_many(mpc_parser_t* a); mpc_parser_t* mpca_many1(mpc_parser_t* a); mpc_parser_t* mpca_count(int n, mpc_parser_t* a); + mpc_parser_t* mpca_or(int n, ...); mpc_parser_t* mpca_and(int n, ...); + mpc_parser_t* mpca_grammar(const char* grammar, ...); mpc_err_t* mpca_lang(const char* language, ...); @@ -267,9 +266,11 @@ mpc_err_t* mpca_lang_file(FILE* f, ...); mpc_err_t* 
mpca_lang_filename(const char* filename, ...); /* -** Testing +** Debug & Testing */ +void mpc_print(mpc_parser_t* p); + int mpc_unmatch(mpc_parser_t* p, const char* s, void* d, int(*tester)(void*, void*), mpc_dtor_t destructor, diff --git a/mpc_optimise.h b/mpc_optimise.h index 8a64e1f..1b09a96 100644 --- a/mpc_optimise.h +++ b/mpc_optimise.h @@ -2,7 +2,8 @@ ** Just Some Ideas: ** ** - Predictive Optimisation. Check all first character of all possible roots. If no conflict then predictive. -** - Or Optimisation. Check if any terminal parses are _ored_ together. If so condence into single large range. +** - Or Optimisation. Check if any terminal parsers are _ored_ together. If so condense into single large range. +** - And Optimisation. Check if any terminal parsers are _anded_ together. If so condense into single large string. ** - Not Optimisation. Similar to the above. Convert _nots_ into positive cases by inverting full range of characters. ** - Also Optimisation. Two Character parsers together can be condensed to a single string parser. ** - Lookup Optimisation. Finite State Machine Parser. 
diff --git a/tests/grammar.c b/tests/grammar.c index 8f20f26..935b416 100644 --- a/tests/grammar.c +++ b/tests/grammar.c @@ -16,11 +16,11 @@ void test_grammar(void) { mpc_define(Value, mpca_grammar(" /[0-9]+/ | '(' ')' ", Expr)); mpc_define(Maths, mpca_total(Expr)); - t0 = mpc_ast_build(1, ">", mpc_ast_new("value|regex", "24")); - t1 = mpc_ast_build(1, ">", + t0 = mpc_ast_new("product|value|regex", "24"); + t1 = mpc_ast_build(1, "product|>", mpc_ast_build(3, "value|>", mpc_ast_new("char", "("), - mpc_ast_new("value|regex", "5"), + mpc_ast_new("expression|product|value|regex", "5"), mpc_ast_new("char", ")"))); t2 = mpc_ast_build(3, ">", @@ -37,11 +37,11 @@ void test_grammar(void) { mpc_ast_new("value|regex", "11")), mpc_ast_new("char", "+"), - mpc_ast_new("value|regex", "2")), + mpc_ast_new("product|value|regex", "2")), mpc_ast_new("char", ")")), mpc_ast_new("char", "+"), - mpc_ast_new("value|regex", "5")); + mpc_ast_new("product|value|regex", "5")); PT_ASSERT(mpc_match(Maths, " 24 ", t0, (int(*)(void*,void*))mpc_ast_eq, (mpc_dtor_t)mpc_ast_delete, (void(*)(void*))mpc_ast_print)); PT_ASSERT(mpc_match(Maths, "(5)", t1, (int(*)(void*,void*))mpc_ast_eq, (mpc_dtor_t)mpc_ast_delete, (void(*)(void*))mpc_ast_print));