Department: 政治大學資訊科學系四年級 Std Num: 110703065 Name: 詹松霖
This is the final project of the NTUT compiler course, check out the Specification
[TOC]
As part of a learning project, I set out with three primary goals. First, I aimed to implemement the SSA(Static Single Assignment) Form, which is a common IR used in production ready compilers like GCC and LLVM. Second, I wanted to implement several optimization passes on SSA form to deepen my practical knowledge. Finally, I wanted to try out the programming language Zig, which is aimed to be a modern version of c.
Tokenizer: Breaks down text into smaller units called tokens.
Lexer: Built on top of tokenizer, adds additional tokens like BEGIN, END for downstream parsing tasks. Also provides a peek
method.
Parser: By leveraging the lexer, it parses the tokens into Abstract Syntax Tree.
Analyzer: Performs static analysis on the Abstract Syntax Tree with rules specified in the specification.
Before transforming to SSA Form, we have to first obtain the controlw flow graph. Each function definition and the main function would be transformed into seperate ControlFlowGraphs. In this transformation, we unwrap every Statement into Blocks and SimpleStatemts
For example, the following code would be transformed into.
if True:
print("True")
else:
print("False")
print("Done")
Image generated by generateMermaidDiagram
function.
A more complex example, like a nested for loop would be.
a = list(range(10))
for i in a:
for j in a:
print(i+j)
print("done")
SSA stands for Single Static Assignment, in this form of intermediate representation, all variables are assigned only once, and φ(···) are inserted where control flows are joined.
def foo(a, b):
a = a + 1 + b
print(a)
foo(1, 2)
The code above would be transformed to the following IR.
foo:
0: Block __foo_0_entry
a_1 = arg 0
b_1 = arg 1
tmp_1 = a_1 add 1
tmp_2 = tmp_1 add b_1
a_2 = tmp_2
print a_2
return None
For the Code generation part, I wanted to implement SelectDAG , but didn't have the time to finish before deadline, so the code generation part is quite naive and simple, though modular , but with excessive malloc every where causing serious performance issue.
foo:
0: Block __foo_0_entry
a_1 = arg 0
b_1 = arg 1
tmp_1 = a_1 add 1 <-malloc for tmp_1
tmp_2 = tmp_1 add b_1 <-malloc for tmp2
a_2 = tmp_2
print a_2
return None
By looking at the code, we can tell only ONE malloc is needed, and we can reuse the memory tmp_1 for tmp_2, but for simplicity and deadline pressure, I didn't implement a good code generation process.
I chose to implement a compiler optimization for the extension part, with the SSA Form, it's quite simple to perform a constant propagation and folding optimization in O(n). The optimization code is at https://github.com/SpeedReach/mini-python/blob/v0.1.0/src/optimization/constant_folding.zig
For example:
def foo():
a = 1 + 4
b = a * 3 + a
print(b + 30)
foo()
Before Optimization:
foo:
foo:
4294967295: Block __foo_end
next 4294967295
0: Block __foo_0_entry
tmp_1 = 1 add 4
a_1 = tmp_1
tmp_2 = a_1 mul 3
tmp_3 = tmp_2 add a_1
b_1 = tmp_3
tmp_4 = b_1 add 30
print tmp_4
return None
next 4294967295
After Optimize:
foo:
4294967295: Block __foo_end
next 4294967295
0: Block __foo_0_entry
print 50
return None
next 4294967295
We transform the for x in list
syntax into the following blocks during ControlFlowGraph transformation.
entry:
i = 0
n = len(list)
forCond:
if (i < n)
goto forBody
else:
goto forExit
forBody
x = list[i]
....
goto forCond
forExit
....
Global variables need special handling in SSA form, because while we treat variables as immutable in SSA form, but a global value may change between function calls.
So we do not try to make global variables in a immutable form like other variables in SSA, instead we use a load
and store
instruction for it.
A demonstration of the SSA form is as following.
def update():
a = a + 1
def read():
print(a)
a=5
update()
read()
Main:
4294967295: Block main_end
next 4294967295
0: Block main_0_entry
store 5 to a
tmp_1 = update()
tmp_2 = tmp_1
tmp_3 = read()
tmp_4 = tmp_3
next 4294967295
read:
4294967295: Block __read_end
next 4294967295
0: Block __read_0_entry
tmp_1 = load a
print tmp_1
return None
update:
4294967295: Block __update_end
next 4294967295
0: Block __update_0_entry
tmp_1 = load a
tmp_2 = tmp_1 add 1
store tmp_2 to a
return None
In code generation, we store all global variables in the .bss section.
After acquiring the ControlFlowGraph IR, we have to transform it in to SSA Form. SSA form is an intermediate representation of code where each variable is assigned exactly once. In SSA form, instead of reusing variables, each assignment creates a new version of the variable, typically denoted with a subscript number. This makes it easier to track data flow and perform optimizations.
A key challenge in SSA form is handling control flow, particularly when variables might have different values coming from different paths. This is solved using a special "phi function" (φ) that selects the appropriate value based on which path was taken.
For example:
def a():
b = 0
if True:
b = 1
print(b)
a:
4294967295: Block __a_end
next 4294967295
0: Block __a_0_entry
b_1 = 0
next 1
1: Block __a_1_ifCondBlock
if true
then 2
else 3
2: Block __a_2_ifBody
b_2 = 1
next 3
3: Block __a_3ifExit
b_3 = Phi(b_1 from 1, b_2 from 2, )
print b_3
return None
So in order to generate the SSA Form, we have to achieve the following targets.
- Find where to insert (φ)
- Rename Variables
I chose the method in the book Cooper, Keith D., and Linda Torczon. *Engineering a Compiler*. 2nd ed., Morgan Kaufmann, 2011.
Chap 9.
A detailed explaination can be found in the section 9.3 STATIC SINGLE-ASSIGNMENT FORM
.
In short, dominance plays an important role in SSA Form generation.
-
We first compute Dominance Set for each node.
-
Then we compute the Dominance Tree with the Dominance Set from step 1. The dominance tree traversed by bfs during SSA transformation, and is also used to compute the Dominance Frontier.
Definition:
Given a node n in a flow graph,
the set of nodes that strictly dominate n is given by (Dom(n) − n). The node
in that set that is closest to n is called n’s immediate dominator, denoted
IDom(n). The entry node of the flow graph has no immediate dominator.
The dominator tree of a flow graph contains every node of the flow graph.
Its edges encode the IDom sets in a simple way. If m is IDom(n), then the
dominator tree has an edge from m to n
-
Next we compute the Dominance Frontier with the Dominace Tree and ControlFlowGraph.
-
Place (φ) at appropriate spots.
The naive algorithm placed a φ-function for every variable at the start of
every join node. With dominance frontiers, the compiler can determine more
precisely where φ-functions might be needed. The basic idea is simple.
A definition of x in block b forces a φ-function at every node in df(b). Since
that φ-function is a new definition of x, it may, in turn, force the insertion of
additional φ-functions.
exec/bool3.py
andexec/bool4.py
fails to execute because I forgot about lazy evaluation forand
or
when designing the SSA form IR. The current SSA form transforms the following code to.
print(1>2 and len(1))
0: Block main_0_entry
tmp_1 = 1 gt 2
tmp_2 = len(1, )
tmp_3 = tmp_1 and tmp_2
print tmp_3
It evaluates both side on the binary operation first, then performs and
or
which makes the test fail when executing len(1).
exec/mandelbrot.py
andexec/pascal.py
fails when doing malloc, and shows the error
..Fatal glibc error: malloc.c:2599 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)
./test: 第 149 列: 591470 中止 (核心已傾印) ./a.out > out
My guess is the inefficient use of memory, causing it to reach the maximum heap size available.
exec-fail/add1.py
This test is exepected to fail during runtime, but due to constant folding, my compiler is able to detect weird behaviors like this, so the following code wouldn't even pass the compile stage.
print(1 + "foo")
syntax/bad/testfile-liste_args_invalide2-1.py
This test is expect to fail at parsing stage, which doesn't check for same name parameters in the function. Given the sytax in the specification, I fail to see what makes this code fail during parsing stage, this code seems fine to me.
def a (x,x,) : 1
1
typing/bad/testfile-list-6.py
This test is expect to fail at type checking stage. I also can't find a reason that this test should fail.
def f():
return x
x = 1
In the specification mentioned:
The scope of variables is statically defined. A variable is either local to a function or global.
A local variable x is introduced either as a function parameter or via an assignment x = e
anywhere in the function body. The scope of a local variable extends to the full body of the
function. A global variable is introduced via an assignment in the toplevel code (the code
outside of function definitions at the end of the program).