TLDR; Getting started
a quick getting started will guide you through the implementation of a dumb parser
I needed a solution for building parsers and found all existing solution either
- too complicated to integrate with an additional build step as with ANTLR )
- or too different from the classical BNF notation (as parser combinators like sprache or Eto.Parse). These tools are great, but I don't feel comfortable with them.
SLY is highly inspired by the python lex yacc library (PLY)
A lexer - parser chain is fully described in only 2 C# files :
- Lexer : an enum that lists all the lexems used (plus metadata to describe their patterns) ;
- Parser : a class that lists production rules and their associated actions.
CSLY also has an additional feature that allow to write expression parsers(boolean or numeric expressions for instance) in a very compact and efficient way. (see expression parser)
Install from the NuGet gallery GUI or with the Package Manager Console using the following command:
Install-Package sly
or with dotnet core
dotnet add package sly
Yes 2 Lexers and not just Lexer. CSLY comes with 2 lexers :
- a regex based lexer very flexible but with some performance issues;
- a "generic lexer" based on a Finite State Machine that adresses the performance issue at the cost of some lesser flexibility.This lexer is inspired by this post
The full lexers documentation can be found in the
lexer wiki
Here is a lexer definition for a arithmetic expression parser using the generic lexer.
using sly.lexer;
namespace simpleExpressionParser
{
public enum SimpleExpressionToken
{
// float number
[Lexeme(GenericToken.Double)]
DOUBLE = 1,
// integer
[Lexeme(GenericToken.Int)]
INT = 3,
[Lexeme(GenericToken.Identifier)]
IDENTIFIER = 4,
// the + operator
[Lexeme(GenericToken.SugarToken,"+")]
PLUS = 5,
// the - operator
[Lexeme(GenericToken.SugarToken,"-")]
MINUS = 6,
// the * operator
[Lexeme(GenericToken.SugarToken,"*")]
TIMES = 7,
// the / operator
[Lexeme(GenericToken.SugarToken,"/")]
DIVIDE = 8,
// a left paranthesis (
[Lexeme(GenericToken.SugarToken,"(")]
LPAREN = 9,
// a right paranthesis )
[Lexeme(GenericToken.SugarToken,")")]
RPAREN = 10,
}
}
A parser is of type Parser <TIn,TOut> where :
- TIn is the enum token type as seen before
- TOut is the type of object produced bye the parser. Classicaly it will be an Asbtract Syntax Tree (AST) or it may be an int for an expression parser.
The grammar defining the parser is defined using C# attribute [Production("some grammar rule")]
mapped to methods ( in the same class used for the lexer)
Production rules can used :
- BNF notation
- EBNF notation : using multiplier operator
- zero or more : *
- one or more : +
- zero or more (optional) : ?
A terminal notation must exactly matche (case sensitive) an enum value. Once the syntaxic tree build, the methods of each rule will be used as a syntaxic tree visitor. Each production rule is associated to a method that acts as a visitor for the syntaxic tree.
a mathematical parser calculate a mathematical expression. It takes a string as input and return a numeric value. So each method of the parser will return a numeric value (an int for simplicity concern)
[Production("primary: INT")]
public int Primary(Token<ExpressionToken> intToken)
{
return intToken.IntValue;
}
[Production("primary: LPAREN expression RPAREN")]
public int Group(object discaredLParen, int groupValue ,object discardedRParen)
{
return groupValue;
}
[Production("expression : term PLUS expression")]
[Production("expression : term MINUS expression")]
public int Expression(int left, Token<ExpressionToken> operatorToken, int right)
{
object result = 0;
switch (operatorToken.TokenID)
{
case ExpressionToken.PLUS:
{
result = left + right;
break;
}
case ExpressionToken.MINUS:
{
result = left - right;
break;
}
default:
{
break;
}
}
return result;
}
as we 've seen above a parser is declared on only 2 files : * the lexer enum * the parser class
Once the class with all its methods has been written, it can be used to build the effective parser instance calling ParserBuilder.BuildParser. the builder methods takes 3 parameters :
- an instance of the class containing the lexer and parser definition
- the kind of parser. Currently only a recursive descent parsers are available. this implementation is limited to LL grammar by construction (no left recursion).There are 2 possible types :
- the root rule for the parser (grammar entrypoint).
the parser is typed according to the token type and output (int for our expression parser).
ExpressionParser expressionParserDefinition = new ExpressionParser()
// here see the typing :
// ExpressionToken is the token enum type
// int is the type of a parse evaluation
Parser<ExpressionToken,int> Parser = ParserBuilder.BuildParser<ExpressionToken,int>(expressionParserDefinition,
ParserType.LL_RECURSIVE_DESCENT,
"expression");
then calling
```var result = C#parser.Parse("2 + 2")```
will return the evaluation of the syntax tree.
the parser returns a ParseResult instance containing the evaluation value or a list of errors.
```c#
string expression = "2 + 2";
ParseResult<ExpressionToken> r = Parser.Parse(expression);
if (!r.IsError && r.Result != null && r.Result is int)
{
Console.WriteLine($"result of <{expression}> is {(int)r.Result}");
// outputs : result of <2 + 2> is 4"
}
else
{
if (r.Errors !=null && r.Errors.Any())
{
// display errors
r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
}
}
One build a parser expose :
-
a main API through the
Parse(string content)
method (chain lexical analysis, syntax parsing and finally call your parsing methods) -
the lexer through the Lexer property
-
the syntax parser through the SyntaxParser property (which type is a
ISyntaxParser
)
Many language needs parsing expressions (boolean or numeric). A recursive descent parser is hard to maintain when parsing expressions with multiple precedence levels. So CSLY offers a way to express expression parsing using only operator tokens and precedence level. CSLY will then generates production rules to parse expressions. It also manages precedence and left or right associativity.
here is a parser for a classical numeric expression parser using classical precedence and associativity.
Full expression parser generator documentation can be found on the (https://github.com/b3b00/csly/wiki/expression-parsing)[expression parsing wiki]
using sly.lexer;
using sly.parser.generator;
namespace simpleExpressionParser
{
public class SimpleExpressionParser
{
[Operation((int)ExpressionToken.PLUS, 2, Associativity.Right, 10)]
[Operation((int)ExpressionToken.MINUS, 2, Associativity.Left, 10)]
public int binaryTermExpression(int left, Token<ExpressionToken> operation, int right)
{
int result = 0;
switch (operation.TokenID)
{
case ExpressionToken.PLUS:
{
result = left + right;
break;
}
case ExpressionToken.MINUS:
{
result = left - right;
break;
}
}
return result;
}
[Operation((int)ExpressionToken.TIMES, 2, Associativity.Right, 50)]
[Operation((int)ExpressionToken.DIVIDE, 2, Associativity.Left, 50)]
public int binaryFactorExpression(int left, Token<ExpressionToken> operation, int right)
{
int result = 0;
switch (operation.TokenID)
{
case ExpressionToken.TIMES:
{
result = left * right;
break;
}
case ExpressionToken.DIVIDE:
{
result = left / right;
break;
}
}
return result;
}
[Operation((int)ExpressionToken.MINUS, 1, Associativity.Right, 100)]
public int unaryExpression(Token<ExpressionToken> operation, int value)
{
return -value;
}
[Operand]
[Production("operand : primary_value")]
public int operand(int value)
{
return value;
}
[Production("primary_value : INT")]
public int operand1(Token<ExpressionToken> value)
{
return value.IntValue;
}
[Production("primary_value : LPAREN SimpleExpressionParser_expressions RPAREN")]
public int operand2(Token<ExpressionToken> lparen, int value, Token<ExpressionToken> rparen)
{
return value;
}
}
}
Full examples are available under :
- jsonparser : a json parser
- expressionParser : a mathematical expression parser
- While Language : a dummy language inspired by this parsec sample
- You can also look at Tests that presents a simple EBNF grammar in EBNFTests