Skip to content

dmytro-feshchenko/compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Build Status codecov Issue Count

Building a compiler in Go programming language

This project was developed for building a compiler for own programming language (Beaver language).
Note: I use a lot of ideas and boilerplate from awesome book "Writing an interpreter in Go"(Thorsten Ball) to create my own compiler. To see other resources I used within the project, see the resources section.

Done features

Lexer:

  • Parses all the input until the end of the input
  • Can recognize identifiers: e.g. hello_world
  • Ignores whitespaces
  • Added numbers (integers only): e.g. 246
  • Added keywords: let, function
  • Added operators: +, -, *, /, <, >, |
  • Added comparison and logical operators: ==, !=
  • Brackets and parenthesis support: {}, ()
  • Added keywords: if, else, return
  • Recognizes comma and semicolon: ,, ;
  • Recognizes EOF
  • All not recognized symbols are ILLEGAL tokens: e.g. $
  • Works with strings: "hello"

Parser

We used "top down operator precedence" parser, also known as "Pratt parser"

Features
  • Parsing let expressions: let a = 10;
  • Parsing return statements: return 500;
  • Parsing int expressions: 5;
  • Prefix operators: <prefix operator><expression>;, -10;, !true;
    Supports two operators: ! and -
  • Infix operators <expression><infix operator><expression>, e.g. 5 + 10, 2 - 8
    Supports 8 operators: +, -, *, /, <, >, ==, !=
  • Working with operations precedences
  • Parsing function literals: function(x, y) {}
  • Call expressions: <expression>(<comma separated expressions>)
  • Works with strings: "hello"
Samples:
  • let statement
let foo = 27;
let bar = 50;
let foobar = foo + bar;
  • return statement
return foo;
return foo + bar;
return foobar(foo, bar);
  • prefix operators
-5;
-100;
!true;
  • infix operators
5 + 100;
5 - 6;
10 / 5;
2 * 90;
4 == 4;
3 != 6;
2 < 87;
5 > 4;

REPL for interpreter

  • Uses Beaver parser and prints given tokens from the input

Sample of REPL:

MacBook-Pro:compiler myuser$ go run main.go
Hello myuser! Starting Beaver REPL...
REPL (use Beaver commands):
beaver>>1 + 100
101
beaver>>true == false
false
beaver>>(2 + 10) * (5 + 1)
72
beaver>>if (10 > 100) { true } else { false }
false
beaver>>if (true) { 100 / 5 }
20
beaver>>let x = 10;
beaver>>x  
10
beaver>>let y = true;
beaver>>y
true
beaver>>if (x == y) { return x; } else { return y; }
true
beaver>>let hi = "hello, world"
beaver>>hi
hello, world
beaver>>let x 12 * 3

     __________
    /  _    _  \
  _/   _    _   \_
 |_|  | |  | |  |_|
  \   |_|  |_|   /
   |      _     |
   |    | | |   |
   |            |
   |____________|

Woops! Something got wrong here:
expected next token to be '=', got 'INT' instead

Evaluator:

  • Just evaluates statements and expressions in a while
  • Can evaluate integers expressions
  • Can evaluate boolean expressions
  • Can evaluate null
  • Can evaluate prefix expressions: !, -
  • Can evaluate infix expressions for integers: +, -, *, /
  • Can evaluate infix expressions for comparing: ==, !=, >, <
  • Can evaluate conditionals:
    if (conditional) { consequence } or if (conditional) { consequence } else { alternative }
  • Can evaluate return statements
  • Error Handling
  • Binding & Environments
  • Evaluates let statements (using environment)
  • Can evaluate functions calls, functions assigning
  • Closures

Types:

  • Integers
  • Booleans
  • Null
  • Strings
  • Arrays
  • Objects

Planned features

  • C-like syntax
  • Some elements of functional programming: closures, passing functions as arguments, returning function from function, assigning functions to variables
  • Types: integer, boolean
  • Conditions

Limitations

  • Only ASCII support

How to improve:

  • Add UTF-8 support (using the )
  • Add new types: float, double, string, etc.
  • Add new operators and operations
  • Consider the space as token

Getting started:

Prerequisites

To install this software, you need to install Go programming language (depends on OS you have): https://golang.org/doc/install

[Not now] Or you can install this using Docker. In this case read the docs and installation guides

Installation

Clone the project:

git clone https://github.com/technoboom/compiler

Run REPL

Run the REPL:

go run ./main.go

Testing

To run all tests in the project use command: go test ./...
To test only lexer: go test ./lexer
To test only parser: go test ./parser
To test only ast: go test ./ast
To test tokens: go test ./tokens

Quick intro into Beaver language:

Syntax:

Beaver language supports C-like syntax. Basic rules:

  • all spaces ignored (maybe will be improved in the future)
  • each sentence should contain semicolon at the end of the line

Variables:

You can define new variable using let keyword.

let x = 10;
let y = true;

You can use English letters and underscore inside variable identifiers

let arabica_coffee = 95;
let _strength_percent = 50;
let camelCase = true;
let UpperCamelCase = false;

Currently, Beaver language has 4 data types:

  • integers: let myInt = 1000;
  • booleans: let myBool = false;
  • strings: let str = "hello, my dear friend"
  • null

Functions

The keyword function used for defining functions.

let multiply = function(a, b) {
    a * b;
}

Each function returns the last executed sentence. In the sample above, the result of multiplication will be returned.
Functions can be assigned to variables, passed as arguments to other functions, or can be returned by them.
Closures also works.

let newAdder = function(x) {
    return function (y) { return y + x; }
};

let addTwo = newAdder(2);
return addTwo(2);

This sample will produce 4.

Conditions

In Beaver we can use keywords if and else to work with conditionals

if (temperature > 0) {
    // it's hot enough
} else {
    // you can mold snowballs
}

Alternative is optional and you can you if even like:

if (temperature < 40) {
    // stop, it's really cold here
}

Operators

You can use equal operator == for comparing two variables/values with one type:

if (number == 1000000) {
    // perform the action to congratulate the one millionth visitor
}

Not equal operator != gives you a way to get around:

if (column != 1) {
    // ignore the column
}

List of available operators:

  • Arithmetic: +, -, *, /
  • Comparing: ==, !=, >, <

Intro into building compiler/interpreter:

Whether you are building an interpreter or a compiler most of the steps remain the same. The most common, basic steps are:

  1. Lexical Analysis
  2. Parsing
  3. Semantic Analysis
  4. Optimization
  5. Code Generation

1. Lexical Analysis

Code - representation of commands for computers which is most suitable for human reading and writing.
First step of building a compiler - performing lexical analysis. Lexical analysis - process of scanning and splitting the code into small independent pieces - tokens.
Each token is associated with a literal string (lexeme) that will be used in next steps.
Literals (e.g., strings, integers, float numbers), keywords, operators are the main goals to recognize on this step.
You can find my implementation of Lexer in ./lexer folder.
Also we use structures from ./token inside our parser.

2. Parsing

During this step we are going to give some meanings to the tokens we received on the state of Lexical Analysis.
Each token is an object and it's placed into a tree data structure. On this step we need to take care about correct language syntax. For different languages there are list of base rules: tabulation, opening and closing brackets, etc.

Pratt Parser

You can find my implementation of Prett Parser in ./parser folder.
Also we use structures from ./ast inside our parser.

3. Semantic Analysis

On this stage we need to take care about correct language semantics.
As an example, we need to ensure that when we have some variable with some type and we are going to assign another type to this variable we will get an error.

4. Optimization

On this stage we need to think about performance of our application. To do this, we should remove overhead constructions and operations.
We can choose 1 of the options to perform this operation:

  1. At the stage of processing intermediate code
  2. At the stage of processing machine code (or other low-level representation)

5. Code Generation

We use intermediate code to produce targeted low-level code.

Resources:

  1. Writing an interpreter in Go (Thorsten Ball)
  2. Compiler Construction by Niklaus Wirth (2014)
  3. Series of blog posts about building compilers: http://noeffclue.blogspot.ca/2014/05/compiler-part-1-introduction-to-writing.html

About

Building a compiler in Go programming language

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published