Skip to content

Latest commit

 

History

History
333 lines (282 loc) · 11.4 KB

Report.md

File metadata and controls

333 lines (282 loc) · 11.4 KB

Mini Python Compiler

Department: 政治大學資訊科學系四年級 Std Num: 110703065 Name: 詹松霖

This is the final project of the NTUT compiler course, check out the Specification

[TOC]

Purpose

As part of a learning project, I set out with three primary goals. First, I aimed to implemement the SSA(Static Single Assignment) Form, which is a common IR used in production ready compilers like GCC and LLVM. Second, I wanted to implement several optimization passes on SSA form to deepen my practical knowledge. Finally, I wanted to try out the programming language Zig, which is aimed to be a modern version of c.

Overview

mini-python.drawio

Components

Tokenizer: Breaks down text into smaller units called tokens. Lexer: Built on top of tokenizer, adds additional tokens like BEGIN, END for downstream parsing tasks. Also provides a peek method. Parser: By leveraging the lexer, it parses the tokens into Abstract Syntax Tree. Analyzer: Performs static analysis on the Abstract Syntax Tree with rules specified in the specification.

Intermediate Representations

Control Flow Graph IR

Before transforming to SSA Form, we have to first obtain the controlw flow graph. Each function definition and the main function would be transformed into seperate ControlFlowGraphs. In this transformation, we unwrap every Statement into Blocks and SimpleStatemts

For example, the following code would be transformed into.

if True:
    print("True")
else:
    print("False")
print("Done")

Image generated by generateMermaidDiagram function.

A more complex example, like a nested for loop would be.

a = list(range(10))
for i in a:
    for j in a:
        print(i+j)
print("done")

SSA Form IR

SSA stands for Single Static Assignment, in this form of intermediate representation, all variables are assigned only once, and φ(···) are inserted where control flows are joined.

def foo(a, b):
    a = a + 1 + b
    print(a)

foo(1, 2)

The code above would be transformed to the following IR.

foo:
0: Block __foo_0_entry
    a_1 = arg 0
    b_1 = arg 1
    tmp_1 = a_1 add 1
    tmp_2 = tmp_1 add b_1
    a_2 = tmp_2
    print a_2
    return None

Code Generation

For the Code generation part, I wanted to implement SelectDAG , but didn't have the time to finish before deadline, so the code generation part is quite naive and simple, though modular , but with excessive malloc every where causing serious performance issue.

foo:
0: Block __foo_0_entry
    a_1 = arg 0
    b_1 = arg 1
    tmp_1 = a_1 add 1       <-malloc for tmp_1
    tmp_2 = tmp_1 add b_1   <-malloc for tmp2
    a_2 = tmp_2
    print a_2
    return None

By looking at the code, we can tell only ONE malloc is needed, and we can reuse the memory tmp_1 for tmp_2, but for simplicity and deadline pressure, I didn't implement a good code generation process.

Extension

I chose to implement a compiler optimization for the extension part, with the SSA Form, it's quite simple to perform a constant propagation and folding optimization in O(n). The optimization code is at https://github.com/SpeedReach/mini-python/blob/v0.1.0/src/optimization/constant_folding.zig

For example:

def foo():
    a = 1 + 4
    b = a * 3 + a
    print(b + 30)

foo()

Before Optimization:

foo:
foo:
4294967295: Block __foo_end
    next 4294967295
0: Block __foo_0_entry
    tmp_1 = 1 add 4
    a_1 = tmp_1
    tmp_2 = a_1 mul 3
    tmp_3 = tmp_2 add a_1
    b_1 = tmp_3
    tmp_4 = b_1 add 30
    print tmp_4
    return None
    next 4294967295

After Optimize:

foo:
4294967295: Block __foo_end
    next 4294967295
0: Block __foo_0_entry
    print 50
    return None
    next 4294967295

Detail Implementation

For in syntax

We transform the for x in list syntax into the following blocks during ControlFlowGraph transformation.

entry:
    i = 0
    n = len(list)
forCond:
    if (i < n)
        goto forBody
    else:
        goto forExit
forBody
    x = list[i]
    ....
    goto forCond
forExit
....

Global Variables

Global variables need special handling in SSA form, because while we treat variables as immutable in SSA form, but a global value may change between function calls. So we do not try to make global variables in a immutable form like other variables in SSA, instead we use a load and store instruction for it. A demonstration of the SSA form is as following.

def update():
    a = a + 1

def read():
    print(a)

a=5
update()
read()
Main: 
4294967295: Block main_end
    next 4294967295
0: Block main_0_entry
    store 5 to a
    tmp_1 = update()
    tmp_2 = tmp_1
    tmp_3 = read()
    tmp_4 = tmp_3
    next 4294967295


read:
4294967295: Block __read_end
    next 4294967295
0: Block __read_0_entry
    tmp_1 = load a
    print tmp_1
    return None


update:
4294967295: Block __update_end
    next 4294967295
0: Block __update_0_entry
    tmp_1 = load a
    tmp_2 = tmp_1 add 1
    store tmp_2 to a
    return None

In code generation, we store all global variables in the .bss section.

SSA Transformation

After acquiring the ControlFlowGraph IR, we have to transform it in to SSA Form. SSA form is an intermediate representation of code where each variable is assigned exactly once. In SSA form, instead of reusing variables, each assignment creates a new version of the variable, typically denoted with a subscript number. This makes it easier to track data flow and perform optimizations.

A key challenge in SSA form is handling control flow, particularly when variables might have different values coming from different paths. This is solved using a special "phi function" (φ) that selects the appropriate value based on which path was taken.

For example:

def a():
    b = 0
    if True:
        b = 1
    print(b)
a:
4294967295: Block __a_end
    next 4294967295
0: Block __a_0_entry
    b_1 = 0
    next 1
1: Block __a_1_ifCondBlock
    if true
    then 2
    else 3
2: Block __a_2_ifBody
    b_2 = 1
    next 3
3: Block __a_3ifExit
    b_3 = Phi(b_1 from 1, b_2 from 2, )
    print b_3
    return None

So in order to generate the SSA Form, we have to achieve the following targets.

  1. Find where to insert (φ)
  2. Rename Variables

I chose the method in the book Cooper, Keith D., and Linda Torczon. *Engineering a Compiler*. 2nd ed., Morgan Kaufmann, 2011. Chap 9.

A detailed explaination can be found in the section 9.3 STATIC SINGLE-ASSIGNMENT FORM.

In short, dominance plays an important role in SSA Form generation.

  1. We first compute Dominance Set for each node. image

  2. Then we compute the Dominance Tree with the Dominance Set from step 1. The dominance tree traversed by bfs during SSA transformation, and is also used to compute the Dominance Frontier.

Definition:

Given a node n in a flow graph,
the set of nodes that strictly dominate n is given by (Dom(n) − n). The node
in that set that is closest to n is called n’s immediate dominator, denoted
IDom(n). The entry node of the flow graph has no immediate dominator.
The dominator tree of a flow graph contains every node of the flow graph.
Its edges encode the IDom sets in a simple way. If m is IDom(n), then the
dominator tree has an edge from m to n
  1. Next we compute the Dominance Frontier with the Dominace Tree and ControlFlowGraph. image

  2. Place (φ) at appropriate spots.

The naive algorithm placed a φ-function for every variable at the start of
every join node. With dominance frontiers, the compiler can determine more
precisely where φ-functions might be needed. The basic idea is simple.
A definition of x in block b forces a φ-function at every node in df(b). Since
that φ-function is a new definition of x, it may, in turn, force the insertion of
additional φ-functions.
  1. Finally, we rename the variables. image

Failing test case analysis

  1. exec/bool3.py and exec/bool4.py fails to execute because I forgot about lazy evaluation for and or when designing the SSA form IR. The current SSA form transforms the following code to.
print(1>2 and len(1))
0: Block main_0_entry
    tmp_1 = 1 gt 2
    tmp_2 = len(1, )
    tmp_3 = tmp_1 and tmp_2
    print tmp_3

It evaluates both side on the binary operation first, then performs and or which makes the test fail when executing len(1).

  1. exec/mandelbrot.py and exec/pascal.py fails when doing malloc, and shows the error
..Fatal glibc error: malloc.c:2599 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
./test: 第 149 列: 591470 中止                  (核心已傾印) ./a.out > out

My guess is the inefficient use of memory, causing it to reach the maximum heap size available.

  1. exec-fail/add1.py This test is exepected to fail during runtime, but due to constant folding, my compiler is able to detect weird behaviors like this, so the following code wouldn't even pass the compile stage.
print(1 + "foo")
  1. syntax/bad/testfile-liste_args_invalide2-1.py This test is expect to fail at parsing stage, which doesn't check for same name parameters in the function. Given the sytax in the specification, I fail to see what makes this code fail during parsing stage, this code seems fine to me.
def a (x,x,) : 1
1

image

  1. typing/bad/testfile-list-6.py This test is expect to fail at type checking stage. I also can't find a reason that this test should fail.
def f():
    return x
x = 1

In the specification mentioned:

The scope of variables is statically defined. A variable is either local to a function or global.
A local variable x is introduced either as a function parameter or via an assignment x = e
anywhere in the function body. The scope of a local variable extends to the full body of the
function. A global variable is introduced via an assignment in the toplevel code (the code
outside of function definitions at the end of the program).