forked from USTC-Resource/USTC-Course

Commit d76715c — Added 2019 Cheng Li's compiler course labs
petergu committed Apr 24, 2020 (1 parent: 3236500)
Showing 73 changed files with 12,682 additions and 1 deletion.
# Course Homepages
- [Zhang Yu – Compilers (H)](http://staff.ustc.edu.cn/~yuzhang/compiler/)
- [Cheng Li – Compilers](http://staff.ustc.edu.cn/~chengli7/courses/compiler18/) [Fall 2019 GitLab](http://210.45.114.30/gbxu/notice_board/issues/1)
- [Zheng Qilong – Compilers](http://staff.ustc.edu.cn/~qlzheng/compiler/)
# Compiler Principles Labs — Cheng Li, Fall 2019

Gu Yimin, Class of 2017, School of the Gifted Young (https://github.com/ustcpetergu)

Teammates: Su Wenzhi, Zhu Fan

Labs for the "Principles and Techniques of Compilers" course by Cheng Li, USTC, Fall 2019.

`lab1_lexical_analyzer`: Lab 1, a lexical analyzer built with flex
`lab2_syntax_analyzer`: Lab 2, a syntax analyzer built with bison on top of Lab 1, turning source code into a syntax tree
`lab3-0`: warm-up on LLVM code generation
`lab3-1`: the main C-Minus compiler: syntax tree to LLVM IR
`lab3-2`: source-code reading report on LLVM passes (dce and adce)
`lab4`: RISC-V machine-code generation and execution, plus a reading of the LLVM RegAllocFast source

Because most of the Fall 2019 labs extended a framework provided by the seven TAs, many of the tutorials and instructions are not our original work. The project's directory layout was already messy and became messier during our development, so actually running the code is nontrivial. For that reason, while keeping the semester's work intact, this directory contains only our original material as far as possible: the main code files, our work and discussion notes, and the lab reports.

The code files are for reading only; they are not in a runnable state. The lab reports are more useful by comparison, and each lab's requirements are summarized in its report.

After the semester I cleaned up the code of lab1, lab2, and lab3-1 (lexical analysis, syntax analysis, and syntax tree to IR) into a single repo that is reasonably easy to run and to develop further (famous last words), at [ustcpetergu/CminusC](https://github.com/ustcpetergu/CminusC). There are surely bugs left, but probably not many (it scored 98/100 on the TAs' test cases).

Course homepage for that semester: http://210.45.114.30/gbxu/notice_board/issues
# C-
`C MINUS` is a subset of the C language; its grammar is described in detail in the appendix to Chapter 9 of *Compiler Construction: Principles and Practice*.
## Lexical Conventions
1. Keywords:
`else if int return void while`
2. Special symbols:
`+ - * / < <= > >= == != = ; , ( ) [ ] { } /* */`
3. Identifiers `ID` and integers `NUM`, defined by the following regular expressions:
`ID = letter letter*`
`NUM = digit digit*`
`letter = a|...|z|A|...|Z`
`digit = 0|...|9`

4. Comments are delimited by `/*...*/` and may span multiple lines. Comments cannot be nested.

> Questions to consider:
> 1. Design a DFA that recognizes the tokens of the C- language.
> 2. Note that `[`, `]`, and `[]` are three distinct tokens. `[]` is used to declare array types, and no whitespace may appear inside `[]`.
## Syntax
1. program → declaration-list
2. declaration-list → declaration-list declaration | declaration
3. declaration → var-declaration | fun-declaration
4. var-declaration → type-specifier `ID` ; | type-specifier `ID` `[` `NUM` `]` ;
5. type-specifier → `int` | `void`
6. fun-declaration → type-specifier `ID` `(` params `)` compound-stmt
7. params → param-list | `void`
8. param-list → param-list , param | param
9. param → type-specifier `ID` | type-specifier `ID` `[]`
10. compound-stmt → `{` local-declarations statement-list `}`
11. local-declarations → local-declarations var-declaration | empty
12. statement-list → statement-list statement | empty
13. statement → expression-stmt | compound-stmt | selection-stmt | iteration-stmt | return-stmt
14. expression-stmt → expression ; | ;
15. selection-stmt → `if` `(` expression `)` statement | `if` `(` expression `)` statement `else` statement
16. iteration-stmt → `while` `(` expression `)` statement
17. return-stmt → `return` ; | `return` expression ;
18. expression → var `=` expression | simple-expression
19. var → `ID` | `ID` `[` expression `]`
20. simple-expression → additive-expression relop additive-expression | additive-expression
21. relop → `<=` | `<` | `>` | `>=` | `==` | `!=`
22. additive-expression → additive-expression addop term | term
23. addop → `+` | `-`
24. term → term mulop factor | factor
25. mulop → `*` | `/`
26. factor → `(` expression `)` | var | call | `NUM`
27. call → `ID` `(` args `)`
28. args → arg-list | empty
29. arg-list → arg-list , expression | expression

> Questions to consider:
> 1. What are the characteristics of the C- grammar as a CFG?
# Sample Programs of C-
```c
int gcd (int u, int v) { /* calculate the gcd of u and v */
    if (v == 0) return u;
    else return gcd(v, u - u / v * v); /* u - u / v * v equals u mod v */
}
int main() {
    int x; int y; int temp;
    x = 72;
    y = 18;
    if (x < y) {
        temp = x;
        x = y;
        y = temp;
    }
    gcd(x, y);
    return 0;
}
```
编译原理和技术/labs/2019-licheng/lab1_lexical_analyzer/doc/lab1-report.md (52 additions)
# Lab1 report

2019.9.21

## Designs & solved problems

**Pattern matching order**

First match comments and `\n`; then special symbols such as `+ - * /`, `<=`, and `, ;`; then the keywords `else, if, int, return, void, while`; then identifiers and numbers. Last, match (and skip) blank characters. If nothing else matches, a catch-all `.` rule matches and reports an error. Most of this ordering can be changed without any problem, but some kinds of patterns must be matched after others: identifiers must come after keywords, or keywords would be matched as identifiers; likewise `==` should be matched before `=`, and `<=` before `<`.
**Regular expression design**

Most patterns are quite easy; just remember to escape special characters. But I didn't find any document that says exactly what must be escaped and what must not, so I had to check by hand whether escaping (or not escaping) a given symbol caused an error.

The most difficult pattern is the multi-line comment `/* ... */`. We need to match the opening `/*` and closing `*/` while making sure no `*/` occurs in between; lone `*`, lone `/`, and even `/*` may legally occur inside. My idea was that some other character must appear between a `/` and a `*`. The expression looks like (`.` here means any character except `*` and `/`, but including `\n`; escaping backslashes are omitted) **`/* (*?.+/?)* */`**. But this cannot handle the case where `/` appears first inside the comment, so the final result is: **`/* (/? .* *? .+ /?)* */`**.

*[Notice 2020.4] this expression may be wrong!*
**Location counter**

I spent most of my time on this.

To record the locations (line and column) of lexemes, a global counter must be updated whenever a pattern matches. Updating the counters by hand in every action, as in `{column += 2; return IF;}`, is quite annoying; it would be better to write the counting code only once. Searching the Internet turned up something called `bison-location`, which seems to require a separate bison file, and I failed to get it working. But I also found a macro called `YY_USER_ACTION`, which is invoked automatically whenever a pattern is matched (the bison approach uses it too). So I copied the counting code from Stack Overflow and adapted it to my program: instead of the `yylloc` variable maintained by bison, I use my own structure for counting. A more elegant way is probably possible.
**List files in directory**

To get all `cminus` files in the `testcase/` directory, I used the `opendir` and `readdir` functions, checking each file's extension to decide whether it is a `cminus` file.

I modified some parts of the code given by the TAs to make the structure clearer, such as moving the `suffix` and `extension` strings into `main` and passing `extension` as the second parameter of `getAllTestcase()`.
**Testcase design**

The testcases should include: normal symbols and identifiers; symbols that may cause ambiguity (`[]` vs. `[`, `<=` vs. `<`); multi-line comments; and multiple comments on one line. Error cases, such as a lone `!` or other illegal symbols, should also be tested.

## Time spent

Some parts were done in scattered spare time, so these figures may not be very accurate.

~0.5h getting familiar with the project and what I needed to do
~0.5h writing the main parts
~1h implementing the location counter
~1h further debugging and testing (e.g. the regular expression handling comments)
~0.5h other functionality (e.g. directory listing)
~0.75h report
编译原理和技术/labs/2019-licheng/lab1_lexical_analyzer/lexical_analyzer.c (40 additions)
```c
#include "lexical_analyzer.h"

const char * strtoken(Token t)
{
	switch (t) {
	case ERROR      : return "ERROR";
	case ADD        : return "ADD";
	case SUB        : return "SUB";
	case MUL        : return "MUL";
	case DIV        : return "DIV";
	case LT         : return "LT";
	case LTE        : return "LTE";
	case GT         : return "GT";
	case GTE        : return "GTE";
	case EQ         : return "EQ";
	case NEQ        : return "NEQ";
	case ASSIN      : return "ASSIN";
	case SEMICOLON  : return "SEMICOLON";
	case COMMA      : return "COMMA";
	case LPARENTHESE: return "LPARENTHESE";
	case RPARENTHESE: return "RPARENTHESE";
	case LBRACKET   : return "LBRACKET";
	case RBRACKET   : return "RBRACKET";
	case LBRACE     : return "LBRACE";
	case RBRACE     : return "RBRACE";
	case ELSE       : return "ELSE";
	case IF         : return "IF";
	case INT        : return "INT";
	case RETURN     : return "RETURN";
	case VOID       : return "VOID";
	case WHILE      : return "WHILE";
	case IDENTIFIER : return "IDENTIFIER";
	case NUMBER     : return "NUMBER";
	case LETTER     : return "LETTER";
	case ARRAY      : return "ARRAY";
	case EOL        : return "EOL";
	case COMMENT    : return "COMMENT";
	case BLANK      : return "BLANK";
	}
	/* Unreachable for valid Token values; keeps the function from
	 * falling off the end (undefined behavior) on unexpected input. */
	return "UNKNOWN";
}
```
编译原理和技术/labs/2019-licheng/lab1_lexical_analyzer/lexical_analyzer.h (50 additions)
```c
#ifndef _LEXICAL_ANALYZER_H_
#define _LEXICAL_ANALYZER_H_

#include <stdio.h>

extern int fileno (FILE *__stream) __THROW __wur;

#ifndef YYTOKENTYPE
#define YYTOKENTYPE
typedef enum cminus_token_type {
	ERROR = 258,
	ADD = 259,
	SUB = 260,
	MUL = 261,
	DIV = 262,
	LT = 263,
	LTE = 264,
	GT = 265,
	GTE = 266,
	EQ = 267,
	NEQ = 268,
	ASSIN = 269,
	SEMICOLON = 270,
	COMMA = 271,
	LPARENTHESE = 272,
	RPARENTHESE = 273,
	LBRACKET = 274,
	RBRACKET = 275,
	LBRACE = 276,
	RBRACE = 277,
	ELSE = 278,
	IF = 279,
	INT = 280,
	RETURN = 281,
	VOID = 282,
	WHILE = 283,
	IDENTIFIER = 284,
	NUMBER = 285,
	ARRAY = 286,
	LETTER = 287,
	EOL = 288,
	COMMENT = 289,
	BLANK = 290
} Token;
#endif /* YYTOKENTYPE */

const char * strtoken(Token t);

#endif /* lexical_analyzer.h */
```