This project was developed for building a compiler for own programming language (Beaver language).
Note: I use a lot of ideas and boilerplate from awesome book "Writing an interpreter in Go"(Thorsten Ball) to create my
own compiler. To see other resources I used within the project, see the resources section.
- Parses all the input until the end of the input
- Can recognize identifiers: e.g.
hello_world
- Ignores whitespaces
- Added numbers (integers only): e.g.
246
- Added keywords:
let
,function
- Added operators:
+
,-
,*
,/
,<
,>
,|
- Added comparison and logical operators:
==
,!=
- Brackets and parenthesis support:
{}
,()
- Added keywords:
if
,else
,return
- Recognizes comma and semicolon:
,
,;
- Recognizes EOF
- All not recognized symbols are ILLEGAL tokens: e.g.
$
- Works with strings:
"hello"
We used "top down operator precedence" parser, also known as "Pratt parser"
- Parsing let expressions:
let a = 10;
- Parsing return statements:
return 500;
- Parsing int expressions:
5;
- Prefix operators:
<prefix operator><expression>;
,-10;
,!true;
Supports two operators:!
and-
- Infix operators
<expression><infix operator><expression>
, e.g.5 + 10
,2 - 8
Supports 8 operators:+
,-
,*
,/
,<
,>
,==
,!=
- Working with operations precedences
- Parsing function literals:
function(x, y) {}
- Call expressions:
<expression>(<comma separated expressions>)
- Works with strings:
"hello"
let
statement
let foo = 27;
let bar = 50;
let foobar = foo + bar;
return
statement
return foo;
return foo + bar;
return foobar(foo, bar);
- prefix operators
-5;
-100;
!true;
- infix operators
5 + 100;
5 - 6;
10 / 5;
2 * 90;
4 == 4;
3 != 6;
2 < 87;
5 > 4;
- Uses Beaver parser and prints given tokens from the input
Sample of REPL:
MacBook-Pro:compiler myuser$ go run main.go
Hello myuser! Starting Beaver REPL...
REPL (use Beaver commands):
beaver>>1 + 100
101
beaver>>true == false
false
beaver>>(2 + 10) * (5 + 1)
72
beaver>>if (10 > 100) { true } else { false }
false
beaver>>if (true) { 100 / 5 }
20
beaver>>let x = 10;
beaver>>x
10
beaver>>let y = true;
beaver>>y
true
beaver>>if (x == y) { return x; } else { return y; }
true
beaver>>let hi = "hello, world"
beaver>>hi
hello, world
beaver>>let x 12 * 3
__________
/ _ _ \
_/ _ _ \_
|_| | | | | |_|
\ |_| |_| /
| _ |
| | | | |
| |
|____________|
Woops! Something got wrong here:
expected next token to be '=', got 'INT' instead
- Just evaluates statements and expressions in a while
- Can evaluate integers expressions
- Can evaluate boolean expressions
- Can evaluate null
- Can evaluate prefix expressions:
!
,-
- Can evaluate infix expressions for integers:
+
,-
,*
,/
- Can evaluate infix expressions for comparing:
==
,!=
,>
,<
- Can evaluate conditionals:
if (conditional) { consequence }
orif (conditional) { consequence } else { alternative }
- Can evaluate return statements
- Error Handling
- Binding & Environments
- Evaluates let statements (using environment)
- Can evaluate functions calls, functions assigning
- Closures
- Integers
- Booleans
- Null
- Strings
- Arrays
- Objects
- C-like syntax
- Some elements of functional programming: closures, passing functions as arguments, returning function from function, assigning functions to variables
- Types: integer, boolean
- Conditions
- Only ASCII support
- Add UTF-8 support (using the )
- Add new types: float, double, string, etc.
- Add new operators and operations
- Consider the space as token
To install this software, you need to install Go programming language (depends on OS you have): https://golang.org/doc/install
[Not now] Or you can install this using Docker. In this case read the docs and installation guides
Clone the project:
git clone https://github.com/technoboom/compiler
Run the REPL:
go run ./main.go
To run all tests in the project use command: go test ./...
To test only lexer: go test ./lexer
To test only parser: go test ./parser
To test only ast: go test ./ast
To test tokens: go test ./tokens
Beaver language supports C-like syntax. Basic rules:
- all spaces ignored (maybe will be improved in the future)
- each sentence should contain semicolon at the end of the line
You can define new variable using let
keyword.
let x = 10;
let y = true;
You can use English letters and underscore inside variable identifiers
let arabica_coffee = 95;
let _strength_percent = 50;
let camelCase = true;
let UpperCamelCase = false;
Currently, Beaver language has 4 data types:
- integers:
let myInt = 1000;
- booleans:
let myBool = false;
- strings:
let str = "hello, my dear friend"
- null
The keyword function
used for defining functions.
let multiply = function(a, b) {
a * b;
}
Each function returns the last executed sentence.
In the sample above, the result of multiplication will be returned.
Functions can be assigned to variables, passed as arguments to other
functions, or can be returned by them.
Closures also works.
let newAdder = function(x) {
return function (y) { return y + x; }
};
let addTwo = newAdder(2);
return addTwo(2);
This sample will produce 4
.
In Beaver we can use keywords if
and else
to work with conditionals
if (temperature > 0) {
// it's hot enough
} else {
// you can mold snowballs
}
Alternative is optional and you can you if
even like:
if (temperature < 40) {
// stop, it's really cold here
}
You can use equal operator ==
for comparing two variables/values with one type:
if (number == 1000000) {
// perform the action to congratulate the one millionth visitor
}
Not equal operator !=
gives you a way to get around:
if (column != 1) {
// ignore the column
}
List of available operators:
- Arithmetic:
+
,-
,*
,/
- Comparing:
==
,!=
,>
,<
Whether you are building an interpreter or a compiler most of the steps remain the same. The most common, basic steps are:
- Lexical Analysis
- Parsing
- Semantic Analysis
- Optimization
- Code Generation
Code - representation of commands for computers which is most suitable for human reading and writing.
First step of building a compiler - performing lexical analysis.
Lexical analysis - process of scanning and splitting the code into small independent pieces - tokens.
Each token is associated with a literal string (lexeme) that will be used in next steps.
Literals (e.g., strings, integers, float numbers), keywords, operators are the main goals to recognize
on this step.
You can find my implementation of Lexer in ./lexer
folder.
Also we use structures from ./token
inside our parser.
During this step we are going to give some meanings to the tokens we received on the state of Lexical Analysis.
Each token is an object and it's placed into a tree data structure.
On this step we need to take care about correct language syntax. For different languages there are list of base rules: tabulation, opening and closing brackets, etc.
You can find my implementation of Prett Parser in ./parser
folder.
Also we use structures from ./ast
inside our parser.
On this stage we need to take care about correct language semantics.
As an example, we need to ensure that when we have some variable with some type and we are going to assign another type to this variable we will get an error.
On this stage we need to think about performance of our application. To do this,
we should remove overhead constructions and operations.
We can choose 1 of the options to perform this operation:
- At the stage of processing intermediate code
- At the stage of processing machine code (or other low-level representation)
We use intermediate code to produce targeted low-level code.
- Writing an interpreter in Go (Thorsten Ball)
- Compiler Construction by Niklaus Wirth (2014)
- Series of blog posts about building compilers: http://noeffclue.blogspot.ca/2014/05/compiler-part-1-introduction-to-writing.html