Name	Name	Last commit message	Last commit date
parent directory ..
docs	docs
src	src
README.md	README.md
pom.xml	pom.xml

Chapter 2

In this chapter we extend the language grammar to include arithmetic operations such as addition, subtraction, multiplication, division, and unary minus. This allows us to write statements such as:

return 1 + 2 * 3 + -5;

Here is the complete language grammar for this chapter.

Extensions to Intermediate Representation

In Chapter 1 we introduced following nodes.

Node Name	Type	Description	Inputs	Value
Start	Control	Start of function	None	None for now as we do not have function arguments yet
Return	Control	End of function	Predecessor control node, Data node value	Return value of the function
Constant	Data	Constants such as integer literals	None, however Start node is set as input to enable graph walking	Value of the constant

We extend the set of nodes by adding following additional node types.

Node Name	Type	Description	Inputs	Value
Add	Data	Add two values	Two data nodes, values are added, order not important	Result of the add operation
Sub	Data	Subtract a value from another	Two data nodes, values are subtracted, order matters	Result of the subtract
Mul	Data	Multiply two values	Two data nodes, values are multiplied, order not important	Result of the multiply
Div	Data	Divide a value by another	Two data nodes, values are divided, order matters	Result of the division
Minus	Data	Negate a value	One data node, value is negated	Result of the unary minus

Peephole Optimizations

Nodes in the graph can be peephole-optimized. The graph is viewed through a "peephole", a small chunk of graph, and if a certain pattern is detected we locally rewrite the graph.

During parsing, these peephole optimizations are particularly easy to check and apply: there are no uses (yet) of a just-created Node from a just-parsed piece of syntax, so there's no effort to the "rewrite" part of the problem. We just replace in-place before installing Nodes into the graph.

This replacement might allow us to kill the unused inputs from the replaced Node, basically doing a modest Dead Code Elimination during parsing.

E.g. Suppose we already parsed out a constant 1, and a constant 2; then when we parse an Add(1,2), the peephole rule for constant math replaces the Add with a constant 3. At this point, we also kill the unused Add, which recursively may kill the unused constants 1 and 2.

Constant Folding and Constant Propagation

In this chapter and next we focus on a particular peephole optimization: constant folding and constant propagation. Since we do not have non-constant values until Chapter 4, the main feature we demonstrate now is constant folding. However, we introduce some additional ideas into the compiler at this stage, to set the scene for Chapter 4.

It is useful for the compiler to know at various points of the program whether a node's value is a constant. The compiler can use this knowledge to perform various optimizations such as:

Evaluate expressions at compile time and replace an expression with a constant.
The compiler may be able to identify regions of code that are dead and no longer needed, such as when a conditional branch always take one of the branches (Chapter 6).
Pointers may be known to be not-null, and a null check can be skipped (Chapter 10).
Array indices may be known to be in-range, and a range check can be skipped.
Many additional optimizations are possible when the compiler learns more about the possible set of Node values.

In order to achieve above, we annotate Nodes with Types.

The Type annotation serves two purposes:

it defines the set of operations allowed on the Node
it defines the set of values the Node takes on

The type itself is identified by the Java class sub-typing relationship; all types are subtypes of the class Type. For now, we only have the following hierarchy of types:

Type
+-- TypeInteger

It turns out that the set of values associated with a Type at a specific Node can be conveniently represented as a "lattice". Our lattice has the following structure:

Our lattice elements can be one of three types:

The highest element is "top", denoted by T; assigning T means that the Node's value may or may not be a compile time constant.
All elements in the middle are constants.
The lowest is "bottom", denoted by ⊥; assigning ⊥ means that we know that the Node's value is not a compile time constant.

An invariant of peephole optimizations is that the type of a Node always moves down the lattice.

In later chapters we will explore extending this lattice, as it frequently forms the heart of core optimizations we want our compiler to do.

There are other important properties of the Lattice that we discuss in Chapter 4 and Chapter 10, such as the "meet" and "join" operators and their rules.

Nodes Pre Peephole Optimization

The following visual shows how the graph looks like pre-peephole optimization:

Control nodes appear as square boxes with yellow background
Control edges are in bold red
The edges from Constants to Start are shown in dotted lines as these are not true control edges
We label each edge with its position in the node's list of inputs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chapter02

chapter02

README.md

Chapter 2

Extensions to Intermediate Representation

Peephole Optimizations

Constant Folding and Constant Propagation

Nodes Pre Peephole Optimization

Post-peephole

Files

chapter02

Directory actions

More options

Directory actions

More options

Latest commit

History

chapter02

Folders and files

parent directory

README.md

Chapter 2

Extensions to Intermediate Representation

Peephole Optimizations

Constant Folding and Constant Propagation

Nodes Pre Peephole Optimization

Post-peephole