Commit

Added more information about phases of compiler (freeCodeCamp#1360)
79man authored and Bouncey committed Oct 25, 2017
1 parent ad55721 commit 4522042
Showing 1 changed file with 42 additions and 8 deletions.
50 changes: 42 additions & 8 deletions src/pages/computer-science/compilers/index.md
@@ -3,21 +3,55 @@ title: Compilers
---
## Compilers

Compilers are a kind of translator. We write the source code in JavaScript, Python, and other languages. Then the compiler takes the code and converts it to code the computer understands.
### Programming
At its heart, a bare-bones computer (also known as a stored-program computer) is a machine that reads instructions written in a fixed instruction set and executes them. The set of instructions a computer understands is specific to it. This is known as machine language, and each instruction is identified by an **opcode**. Machine language is often also referred to as binary code.

This converted code is binary code, which is nothing but 1s and 0s. When you run your source code, a compiler translates all the code first, then produces the binary code. Then the computer takes the binary code and runs it.
Humans interact with computers using **Programs**. A program is simply a sequence of opcodes provided to the computer along with data that is necessary for executing the opcodes.

If there are errors in your source code, the compiler detects them. This stops the compilation process. Even after the compiler converts the code, the converted code can still fail when it's running.
For example,
```
ADD 10, 20 // ADD is the Opcode
// and 10, 20 are the two operands(data)
// needed for the ADD instruction to be executed successfully
```
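To make the idea of opcodes and operands concrete, here is a minimal sketch of a toy machine in Python. The tuple format and the `SUB` opcode are assumptions made for illustration; a real processor reads binary-encoded instructions rather than Python tuples.

```python
# Hypothetical toy machine: the instruction format is invented for this example.
def run(program):
    """Execute a list of (opcode, operand_a, operand_b) tuples and return the results."""
    results = []
    for opcode, a, b in program:
        if opcode == "ADD":
            results.append(a + b)
        elif opcode == "SUB":
            results.append(a - b)
        else:
            # A machine only understands the opcodes in its instruction set.
            raise ValueError(f"unknown opcode: {opcode}")
    return results


print(run([("ADD", 10, 20), ("SUB", 7, 2)]))  # prints [30, 5]
```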
Humans develop programs to solve complex problems. Given how simple opcodes are, developing programs with opcodes alone would be cumbersome, and the result would be difficult to debug. To solve this problem, high-level languages like C/C++, Python, Java, JavaScript, etc. were developed.

<b>Parts of a compiler</b>
However, computers cannot execute high-level languages directly. Hence, the need arose for a translator that can take high-level language programs and convert them into machine language instructions suitable for execution by a computer.

Compilation of a program can usually be broken down into several steps:
#### [HUMANS] -> [Highlevel language programs] -> [Translator] -> [Machine Language] -> [Computer]

1. In <b>lexical analysis</b>, a program called a lexer takes in the source code and produces a stream of tokens. These tokens annotate the source code with data that will be useful in later steps. For example, quoted strings in the source code may result in tokens of type 'string literal'.
A **compiler** is a type of translator program that translates high-level languages into binary code, which is nothing but 1s and 0s. Before your program runs, the compiler translates all of the source code and produces the binary code. The computer then takes the binary code and runs it.

2. <b>Parsing</b> involves building an <b>abstract syntax tree (AST)</b>, a data structure that summarizes the source code, using the tokens output by the lexer.
If there are errors in your source code, the compiler detects and flags them. This stops the compilation process. Once all errors are fixed, the compiler converts the code and generates an executable program.
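As a rough illustration of errors stopping compilation, the sketch below uses Python's built-in `compile()` function. Note that Python compiles source to bytecode rather than to machine binary, so this only mirrors the behavior described above; the helper name `try_compile` is made up for this example.

```python
def try_compile(source):
    """Return a compiled code object, or None if the source has a syntax error."""
    try:
        return compile(source, "<example>", "exec")
    except SyntaxError as err:
        # The compiler flags the error and produces no output program.
        print(f"compilation stopped: {err.msg} (line {err.lineno})")
        return None


good = try_compile("total = 10 + 20")  # valid source: a code object is produced
bad = try_compile("total = 10 +")      # incomplete expression: compilation fails
```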

3. <b>Code generation</b> uses the AST to output code in the desired language.
## Parts of a compiler
Most compilers break down into three primary stages: Parsing, Transformation, and Code Generation.

1. *Parsing* is taking raw code and turning it into a more abstract representation of the code.
2. *Transformation* takes this abstract representation and manipulates it to do whatever the compiler needs.
3. *Code Generation* takes the transformed representation of the code and turns it into new code.

#### Parsing
Parsing typically gets broken down into two phases: **Lexical Analysis** and **Syntactic Analysis**.

*Lexical Analysis* takes the raw code and splits it into pieces called tokens. The program that does this is called a tokenizer (or lexer).
```
Tokens are an array of tiny little objects that describe an isolated piece of the syntax.
They could be numbers, labels, punctuation, operators, etc.
```
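
A minimal tokenizer sketch in Python, assuming a tiny arithmetic language. The token categories in `TOKEN_SPEC` are hypothetical and chosen only for this example; real lexers handle many more cases (strings, comments, keywords).

```python
import re

# Hypothetical token categories for a tiny arithmetic language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("PAREN", r"[()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source text into a list of {"type", "value"} token objects."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append({"type": match.lastgroup, "value": match.group()})
        pos = match.end()
    return tokens


for token in tokenize("x = 10 + 20"):
    print(token)
```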

*Syntactic Analysis* takes the tokens and reformats them into a representation that describes each part of the syntax
and their relation to one another. This is known as an intermediate representation or Abstract Syntax Tree.
```
An Abstract Syntax Tree, or AST for short, is a deeply nested object.
It represents code in a way that is both easy to work with and tells us a lot of information.
```
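
The following is a small sketch of syntactic analysis, assuming the tokens have already been produced as a flat list of strings. It is a hand-written recursive-descent parser for `+` and `*` over whole numbers; the node shapes (`Number`, `BinaryOp`) are assumptions for illustration, not a standard AST format.

```python
def parse(tokens):
    """Parse a flat token list such as ["1", "+", "2", "*", "3"] into a nested AST."""
    pos = 0

    def parse_number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        node = {"type": "Number", "value": int(tokens[pos])}
        pos += 1
        return node

    def parse_term():
        # term := number ("*" number)*
        nonlocal pos
        node = parse_number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = {"type": "BinaryOp", "op": "*", "left": node, "right": parse_number()}
        return node

    def parse_expr():
        # expr := term ("+" term)*, so "*" binds tighter than "+"
        nonlocal pos
        node = parse_term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = {"type": "BinaryOp", "op": "+", "left": node, "right": parse_term()}
        return node

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


tree = parse(["1", "+", "2", "*", "3"])
print(tree["op"], tree["right"]["op"])  # prints + *
```

The nesting of the result records that `2 * 3` is evaluated before the addition, which is information the flat token list does not carry.
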
#### Transformation
The next stage for a compiler is transformation. This stage takes the AST from the last step and makes changes to it.
It can manipulate the AST in the same language or it can translate it into an entirely new language.
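
One possible transformation is constant folding, sketched below: any operation whose operands are both known numbers is replaced by its result. The node shapes (`Number`, `Name`, `BinaryOp`) are hypothetical and used only for this example; the transformation stays within the same language.

```python
def fold_constants(node):
    """Return a new AST in which operations on two numbers are replaced by their result."""
    if node["type"] != "BinaryOp":
        return node  # numbers and names are left unchanged
    left = fold_constants(node["left"])
    right = fold_constants(node["right"])
    if left["type"] == "Number" and right["type"] == "Number":
        if node["op"] == "+":
            return {"type": "Number", "value": left["value"] + right["value"]}
        if node["op"] == "*":
            return {"type": "Number", "value": left["value"] * right["value"]}
    return {"type": "BinaryOp", "op": node["op"], "left": left, "right": right}


# AST for: x + (2 * 3)
ast = {
    "type": "BinaryOp",
    "op": "+",
    "left": {"type": "Name", "id": "x"},
    "right": {
        "type": "BinaryOp",
        "op": "*",
        "left": {"type": "Number", "value": 2},
        "right": {"type": "Number", "value": 3},
    },
}
folded = fold_constants(ast)
print(folded["right"])  # prints {'type': 'Number', 'value': 6}
```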

#### Code Generation
The final phase of a compiler is code generation. Sometimes this phase overlaps with transformation, but for the most part code generation just takes the AST and converts it to binary code.
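
A minimal sketch of code generation, under the assumption that the target is a made-up stack machine with `PUSH`, `ADD`, and `MUL` instructions. It emits readable instruction text rather than actual binary, and the AST node shapes are hypothetical.

```python
# Hypothetical mapping from AST operators to stack-machine opcodes.
OPCODES = {"+": "ADD", "*": "MUL"}


def generate(node):
    """Walk the AST and return a list of instructions for a toy stack machine."""
    if node["type"] == "Number":
        return [f"PUSH {node['value']}"]
    if node["type"] == "BinaryOp":
        # Emit code for both operands first, then the operation that combines them.
        return generate(node["left"]) + generate(node["right"]) + [OPCODES[node["op"]]]
    raise ValueError(f"cannot generate code for node type {node['type']!r}")


# AST for: 1 + 2 * 3
ast = {
    "type": "BinaryOp",
    "op": "+",
    "left": {"type": "Number", "value": 1},
    "right": {
        "type": "BinaryOp",
        "op": "*",
        "left": {"type": "Number", "value": 2},
        "right": {"type": "Number", "value": 3},
    },
}
print(generate(ast))  # prints ['PUSH 1', 'PUSH 2', 'PUSH 3', 'MUL', 'ADD']
```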

All compilers need to perform these steps. Most modern compilers also carry out other steps such as checking for type errors and optimizing the resulting compiled code.
