Skip to content
/ gocaml Public

🐫 Statically typed functional programming language implementation with Go and LLVM

License

Notifications You must be signed in to change notification settings

rhysd/gocaml

Repository files navigation

GoCaml 🐫

Linux and macOS Build Status Windows Build Status Coverage Status

GoCaml is a MinCaml implementation in Go using LLVM. MinCaml is a minimal subset of OCaml for educational purpose. It is statically-typed and compiled into a binary. (spec)

This project aims my practices for understanding type inference, closure transform and introducing own intermediate language (IL) to own language.

Example:

let rec gcd m n =
  if m = 0 then n else
  if m <= n then gcd m (n - m) else
  gcd n (m - n) in
print_int (gcd 21600 337500)

You can see more examples. (e.g. Brainfxxk interpreter, N-Queens puzzle)

Tasks

  • Lexer -> (doc)
  • Parser with goyacc -> (doc)
  • Alpha transform (doc)
  • Type inference (Hindley Milner monomorphic type system) -> (doc)
  • GoCaml intermediate language (GCIL) (doc)
  • K normalization from AST into GCIL (doc)
  • Closure transform (doc)
  • Code generation (LLVM IR, assembly, object, executable) using LLVM (doc)
  • LLVM IR level optimization passes
  • Garbage collection with Boehm GC
  • Debug information (DWARF) using LLVM's Debug Info builder

Difference from Original MinCaml

  • MinCaml assumes external symbols' types are int when it can't be inferred. GoCaml does not have such an assumption. GoCaml assumes unknown return type of external functions as () (void in C), but in other cases, falls into compilation error. When you use nested external functions call, you need to clarify the return type of inner function call. For example, when f in g (f ()) returns int, you need to show it like g ((f ()) + 0). Note that this pitfall does not occur for built-in functions because a compiler knows their types.
  • MinCaml allows - unary operator for float literal. So for example -3.14 is valid but -f (where f is float) is not valid. GoCaml does not allow - unary operator for float values totally. You need to use -. unary operator instead (e.g. -.3.14).
  • GoCaml adds more operators. * and / for integers, && and || for booleans.
  • GoCaml has string type. String value is immutable and used with slices.
  • GoCaml does not have Array.create, which is an alias to Array.make. Array.length is available to obtain the size of array.
  • Some useful built-in functions are added (described in below section).

Language Spec

Program

Program is represented as one expression which MUST be evaluated as unit type. So () is the smallest program for GoCaml.

Sequence Expression

Sequenced program can be represented by joining multiple expressons with ;.

e1; e2; e3; e4

In above program, expressions are evaluated in order of e1 -> e2 -> e3 -> e4 and the sequenced expression is evaluated to the value of e4. Program must be evaluated to unit type, so the e4 expression must be evaluated to () (unit value).

Comments

There is a block comment syntax. It starts with (* and ends with *). Any comment must be closed with *), otherwise it falls into syntax error.

(*
   This is comment.
*)

Constants

There are unit, integer, boolean, float and string constants.

(* integer *)
42;

(* float *)
3.0;
3.14e+10;
3.14e-10;
1.;

(* boolean *)
true;
false;

(* string *)
"hello, world";
"contains\tescapes\n";

(* only one constant which is typed to unit *)
()

Show values

print_* and println_* built-in functions are available to output values to stdout.

print_int 42;
println_bool true

Please see 'Built-in functions' section below for more detail.

Unary operators

You can use some unary prefixed operators.

-42;

(* GoCaml distinguishes float and int in operators. -. is a float version of - *)
-.3.14;

not true;

()

Arithmetic binary operators

As mentioned above, GoCaml distinguishes int and float in operators. Operators for float values are suffixed by . (dot).

(* integer calcuration *)
1 + 2;
1 - 2;
1 * 2;
1 / 2;

(* float calcuration *)
1.0 + 2.0;
1.0 - 2.0;
1.0 * 2.0;
1.0 / 2.0;

()

Integer operators must have integer values as their operands. And float operators must have float values as their operands. There is no implicit conversion. You need to convert explicitly by using built-in functions (e.g. 3.14 +. (int_to_float 42)).

Note that strings don't have any operators for concatenating two strings or slicing sub string. They can be done with str_concat and str_sub built-in functions (See 'Built-in Functions' section).

Relational operators

Equal operator is = (NOT ==), Not-equal operator is <>. Compare operators are the same as C (<, <=, > and >=).

42 = 42; (* => true *)
42 <> 42; (* => false *)

3.14 > 2.0;
1.0 < 3.0;
2.0 >= 2.0;
1.0 <= 3.0;

()

Tuples (described below) and strings can be compared with = or <>, but cannot be compared with <, <=, > and >=. Arrays (described below) cannot be compared directly with any compare operators. You need to compare each element explicitly.

Logical operators

&& and || are available for boolean values.

println_bool (true || false && false || false)

Variable

let expression binds some value to a variable.

(* 42 is bound to a *)
let a = 42 in
(* You can use the variable as value in any expression *)
a + 4;

()

The syntax is let {name} = {e1} in {e2}. e2 will be evaluated where e1 is bound to name. By chain let, you can define multiple variables.

let pi = 3.14.1592 in
let r = 2 in
let area = r *. r *. pi in
print_float area

And you can redefine the same name variable as already-defined ones.

let a = 42 in
println_int a;
let a = true in
println_bool a

The first a and the second a are different variable. Second one just shadows first one. So you can always redefine any variable names as any type. Shadowed variable can be no longer referred.

Functions are first-class object in GoCaml. So you can also bind functions to variable as value.

let rec hello _ = println_str "hello" in
let f = hello in

(* Shows "helllo" *)
f ();

(* Binds external function *)
let p = println_str in

(* Shows "hi" *)
p "hi"

Functions

let rec is a keyword to define a function. Syntax is let rec name params... = e1 in e2 where function name is defined as e1 and then e2 will be evaluated. f a b c is an expression to apply function f with argument a, b and c. As long as the argument is simple, you don't need to use ().

Note that, if you use some complecated expression (for example, binary operators), you need to use () like f (a+b) c. If you specify f a + b c, it would be parsed as (f a) + (b c).

let rec f a b c = a + b + c in
let d = f 10 20 30 in

(* Output: 60 *)
println_int d

You can make a recursive function as below.

let rec fib x =
    if x <= 1 then 1 else
    fib (x - 1) + fib (x - 2)
in
println_int (fib 10)

Functions can be nested.

let rec sqrt x =
    let rec abs x = if x > 0.0 then x else -.x in
    let rec go z p =
        if abs (p -. z) <= 0.00001 then z else
        let (p, z) = z, z -. (z *. z -. x) /. (2.0 *. z) in
        go z p
    in
    go x 0.0
in
println_float (sqrt 10.0)

(* Error because of out of scope: go 10.0 0.0 *)

In above example, abs and go is nested in sqrt. Nested function is a hidden implementation of the outer function because inner scope is not visible from outside.

Functions can capture any outer variables (=environment). Functions which captured outer environment are called 'closure'. As many functional languages or modern languages, GoCaml has closure functions.

(* Define variable *)
let pi = 3.14 in

(* Captures outer defined variable 'pi' into its body *)
let rec circle r = r *. r *. pi in

(* Invoke closure *)
println_float (circle 10.0)

Below is a bit more complecated example:

let rec make_adder x =
    let z = 1 in
    let rec f y = x + y + z in
    f
in
let add = make_adder 3 in

(* Output: 104 *)
println_int (add 100)

Here, inner function f captures hidden variable special_value. make_special_value_adder returns a closure which captured the variable.

Tuples

N-elements tuple can be created with comma-separated expression e1, e2, ..., en. Element of tuple can be extracted with let expression.

(* (int, bool, string) is bound to t *)
let t = 1, true, "aaa" in

(* Destructuring tuple with `let` expression *)
let i, b, s = t in

let rec fst pair = let x, _ = pair in x in

(* Show '42' *)
println_int (fst (42, true))

Arrays

Array can be created with Array.make size elem where created array is allocated with size elemens and all elements are initialized as elem.

arr.(idx) accesses to the element of array where arr is an array and idx is an integer. And arr.(idx) <- val updates the idxth element to val.

(* Make boolean array whose size is 42 *)
let arr = Array.make 42 true in

(* Output: true *)
println_bool arr.(8)

(* Update element *)
arr.(8) <- false;

(* Output: false *)
println_bool arr.(8)

Note that arrays are NOT immutable because of performance (GoCaml doesn't have persistentarray). e1.(e2) <- e3 is always evaluated to () and updates the element destructively. Accessing to out of bounds of arrays causes undefined behavior.

External symbols

All symbols which are not defined but used are treated as external symbols. External symbol means extern names in C. So you have responsibility to define the symbols in other object which will be linked to executable. Please see below 'How to Work with C' section to know how to do that.

Note that all external symbols' types MUST be determined by type inference. Unknown type symbol causes compilation error.

(* This causes compile error because type of 'x' is unknown *)
x;

(* Type of 'y' is known as 'int' because it's passed to argument of `println_int` *)
println_int y;

(* When return type of function is (), it is treated as void in C *)
some_func ();

(* Below 'pow' function is inferred as float -> float -> float *)
print_float (pow 3.0 1.0 2.0)

Prerequisities

  • Go 1.2+ (Go 1.7+ is recommended)
  • GNU make
  • Clang
  • cmake (for building LLVM)
  • Git

Installation

Linux or macOS

For Linux or macOS, below commands build gocam binary at root of the repository.

$ go get -d github.com/rhysd/gocaml
$ cd $GOPATH/src/github.com/rhysd/gocaml

# Full-installation with building LLVM locally
$ make

Using USE_SYSTEM_LLVM=true, it will build gocaml binary with system-installed LLVM libraries. Note that it still clones LLVM repository to obtain Go bindings of llvm-c.

# Use system-installed LLVM. You need to install LLVM in advance (see below)
$ USE_SYSTEM_LLVM=true make

To use USE_SYSTEM_LLVM, you need to install LLVM 4.0.0 with system's package manager in advance.

If you use Debian-family Linux, use LLVM apt repository or download LLVM official binary.

$ sudo apt-get install libllvm4.0 llvm-4.0-dev
$ export LLVM_CONFIG=llvm-config-4.0

If you use macOS, use Homebrew. GoCaml's installation script will automatically detect LLVM installed with Homebrew.

$ brew install llvm

And you need to install libgc as dependency.

# On Debian-family Linux
$ sudo apt-get install libgc-dev

# On macOS
$ brew install bdw-gc

Windows

Currently Windows is not well-supported. You need to clone LLVM repository to $GOPATH/src/llvm.org/ and build Go bindings of llvm-c following the instruction. It needs cmake command and C++ toolchain.

It also needs to build libgc static library and put it to library path.

After installing goyacc, generate a parser code with it.

$ goyacc -o parser/grammar.go parser/grammar.go.y

Finally you can build gocaml binary with go build.

Usage

gocaml command is available to compile sources. Please refer gocaml -help.

Usage: gocaml [flags] [file]

  Compiler for GoCaml.
  When file is given as argument, compiler will compile it. Otherwise, compiler
  attempt to read from STDIN as source code to compile.

Flags:
  -asm
    	Emit assembler code to stdout
  -ast
    	Show AST for input
  -externals
    	Display external symbols
  -g	Compile with debug information
  -gcil
    	Emit GoCaml Intermediate Language representation to stdout
  -help
    	Show this help
  -ldflags string
    	Flags passed to underlying linker
  -llvm
    	Emit LLVM IR to stdout
  -obj
    	Compile to object file
  -opt int
    	Optimization level (0~3). 0: none, 1: less, 2: default, 3: aggressive (default -1)
  -show-targets
    	Show all available targets
  -target string
    	Target architecture triple
  -tokens
    	Show tokens for input

Compiled code will be linked to small runtime. In runtime, some functions are defined to print values and it includes <stdlib.h> and <stdio.h>. So you can use them from GoCaml codes.

gocaml uses clang for linking objects by default. If you want to use other linker, set $GOCAML_LINKER_CMD environment variable to your favorite linker command.

Program Arguments

You can access to program arguments via special global variable argv. argv is always defined before program starts.

print_str "argc: "; println_int (Array.length argv);
print_str "prog: "; println_str (argv.(0))

Built-in Functions

Built-in functions are defined as external symbols.

  • print_int : int -> ()
  • print_bool : bool -> ()
  • print_float : float -> ()
  • print_str : string -> ()

Output the value to stdout.

  • println_int : int -> ()
  • println_bool : bool -> ()
  • println_float : float -> ()
  • println_str : string -> ()

Output the value to stdout with newline.

  • float_to_int : float -> int
  • int_to_float : int -> float
  • int_to_str : int -> string
  • str_to_int : string -> int
  • float_to_str : float -> string
  • str_to_float : string -> float

Convert between float and int, string and int, float and int.

  • str_length : string -> int

Return the size of string.

  • str_concat : string -> string -> string

Concat two strings as a new allocated string because strings are immutable in GoCaml.

  • str_sub : string -> int -> int -> string

Returns substring of first argument. Second argument is an index to start and Third argument is an index to end. Returns string slice [start, end) so it does not cause any allocation.

  • get_line : () -> string
  • get_char : () -> string

Get user input by line or character and return it as string.

  • to_char_code : string -> int
  • from_char_code : int -> string

Covert between a character and integer. First character of string is converted into integer and integer is converted into one character string.

  • do_garbage_collection : () -> ()
  • enable_garbage_collection : () -> ()
  • disable_garbage_collection : () -> ()

These functions control behavior of GC. do_garbage_collection runs GC with stopping the world. enable_garbage_collection/disable_garbage_collection starts/stops GC. (GC is enabled by default)

  • bit_and : int -> int -> int
  • bit_or : int -> int -> int
  • bit_xor : int -> int -> int
  • bit_rsft : int -> int -> int
  • bit_lsft : int -> int -> int
  • bit_inv : int -> int

Built-in functions instead of bitwise operators.

  • time_now : () -> int

Returns epoch time in seconds.

How to Work with C

All symbols not defined in source are treated as external symbols. So you can define it in C source and link it to compiled GoCaml code after.

Let's say to write C code.

// gocaml.h is put in runtime/ directory. Please add it to include directory path.
#include "gocaml.h"

gocaml_int plus100(gocaml_int const i)
{
    return i + 100;
}

Then compile it to an object file:

$ clang -Wall -c plus100.c -o plus100.o

Then you can refer the function from GoCaml code:

println_int (plus100 10)

println_int is a function defined in runtime. So you don't need to care about it.

Finally comile the GoCaml code and the object file together with gocaml compiler. You need to link .o file after compiling GoCaml code by passing the object file to -ldflags.

$ gocaml -ldflags plus100.o test.ml

After the command, you can find test executable. Executing by ./test will show 110.

Cross Compilation

For example, let's say to want to make an x86 binary on x86_64 Ubuntu.

$ cd /path/to/gocaml
$ make clean
# Install gcc-multilib
$ sudo apt-get install gcc-4.8-multilib

Then you need to know target triple string for the architecture compiler will compile into. The format is {name}-{vendor}-{sys}-{abi}. (ABI might be omitted)

You can know all the supported targets by below command:

$ ./gocaml -show-targets

Then you can compile source into object file for the target.

# Create object file for specified target
$ ./gocaml -obj -target i686-linux-gnu source.ml

# Compile runtime for the target
CC=gcc CFLAGS=-m32 make ./runtime/gocamlrt.a

Finally link object files into one executable binary by hand.

$ gcc -m32 -lgc source.o ./runtime/gocamlrt.a

About

🐫 Statically typed functional programming language implementation with Go and LLVM

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •