# ERM: Extended Roofline Model


## Execution flow in ERM


The following figure illustrates the execution flow of ERM (a [PDF version](https://github.com/caparrov/ERM/files/1121412/erm-execution-flow-steps.pdf) is also available).

![ERM execution flow](images/erm-execution-flow-steps-web.gif?raw=true "ERM execution flow")



It consists of three steps: compilation, execution of the LLVM IR instruction trace, and analysis of the scheduled DAG, as explained next.

* **Step 1: Compilation**. The first step compiles the source code of the application into LLVM IR. We use clang v3.4 (the LLVM C/C++ front-end) with the compilation flags `-emit-llvm -c -g -O3`. If the code contains intrinsics, the architecture flag must also be specified, e.g., `-mavx` for Intel AVX intrinsics and `-mfpu=neon` for ARM NEON; to analyze scalar code, the additional flags `-fno-vectorize -fno-slp-vectorize` prevent vectorization (see the compilation sketch after this list).

* **Step 2: Execution of the LLVM IR instruction trace**. The next step in the simulation process is to execute the LLVM IR (also known as bitcode) to obtain the dynamic instruction trace and to generate and schedule the computation DAG. Our approach can hence be considered a trace-driven simulation, in which the instruction trace is simulated on a generic microarchitectural model. We take advantage of the modular design of LLVM to implement our analysis with two alternative approaches: using the LLVM interpreter and instrumenting the bitcode file. As explained next, both approaches analyze the same LLVM IR instruction trace and produce exactly the same results, because they share the source code of the library that implements the scheduling Algorithm 2. They differ in the LLVM tools and modules involved in the analysis, and in the language and ISA extensions supported.

  *Interpreter.* The LLVM infrastructure includes an interpreter, lli, that executes bitcode files by analyzing the LLVM IR instruction by instruction. Fig. 2.14(a) shows the structure of the run function of the interpreter, which accommodates the structure of Algorithm 1: the code highlighted in green is the main loop over the instructions of the LLVM IR instruction trace (this is original LLVM code), and the code highlighted in blue is the code we insert to call the analysis function. The main advantage of using the interpreter is that it requires a single step in the execution flow after compilation. Further, since the interpreter is part of the LLVM infrastructure, it is tightly integrated with the entire framework. Unfortunately, it is not actively maintained, and many recent ISA extensions, such as the vector intrinsics listed in Table 2.4, are not supported. It also has limited support for some C++ features, such as variable-length argument functions.

  *Instrumentation and execution of the dynamic trace.* This approach consists of three steps. First, the bitcode file is instrumented by inserting calls to a runtime library that generates the information necessary to build the dynamic dependence graph. Second, the instrumented bitcode is compiled and executed; when executed, the inserted runtime functions record the data of the dynamic computation DAG (e.g., dependences between instructions, addresses and sizes of memory accesses, etc.) in a trace referred to as a task graph. To perform these two steps, ERM relies on Contech [83], an LLVM-based framework for generating dynamic task graphs. Finally, the dynamic task graph is analyzed with an LLVM pass using the LLVM analyzer opt. As with the interpreter, the structure of the LLVM pass, shown in Fig. 2.14(b), matches the scheduling Algorithm 1: the nested for loop, highlighted in green, iterates over the basic blocks of a task and over the instructions of each basic block. Although this approach requires more steps and involves more LLVM modules than using the interpreter, it is not limited by unsupported language or ISA features: once an application has been compiled into LLVM IR and the dynamic task graph is generated, it can be analyzed (see the command sketch after this list).

  The two code snippets in Fig. 2.14 demonstrate the modularity of our approach to generating and analyzing scheduled DAGs: our library that implements the scheduling Algorithm 2 can be used with any tool that parses a dynamic instruction trace according to Algorithm 1.

* **Step 3: Analysis of the scheduled DAG**. The final step is to analyze the scheduled DAG obtained in Step 2 and report the data defined in (2.8)–(2.19). Since the two approaches presented above use the same library for analyzing the dynamic instruction trace, both produce exactly the same output information.
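The compilation step can be illustrated with a minimal sketch. The commands below use exactly the flags listed in Step 1; the file names (`kernel.c`, `kernel.bc`) are placeholders for your own sources:

```sh
# Step 1: compile the application into LLVM IR (bitcode), keeping debug info
clang -emit-llvm -c -g -O3 kernel.c -o kernel.bc

# Code with Intel AVX intrinsics additionally needs the architecture flag
clang -emit-llvm -c -g -O3 -mavx kernel.c -o kernel.bc

# To analyze scalar code, prevent vectorization
clang -emit-llvm -c -g -O3 -fno-vectorize -fno-slp-vectorize kernel.c -o kernel.bc
```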
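Step 2 can then proceed with either approach. The sketch below is indicative only: `lli` and `opt` are standard LLVM tools, but the ERM-extended build of `lli`, the plugin name `libERMAnalysis.so`, and the pass flag `-erm-analysis` are illustrative placeholders rather than the project's actual interface, and the Contech instrumentation step is summarized rather than spelled out:

```sh
# Approach (a): execute the bitcode in the (ERM-extended) LLVM interpreter,
# which analyzes the LLVM IR trace instruction by instruction as it runs
lli kernel.bc

# Approach (b): instrument the bitcode with Contech, then compile and run it
# so the runtime library records the dynamic task graph; finally, analyze the
# result with an LLVM pass run through opt (plugin/pass names are placeholders)
opt -load libERMAnalysis.so -erm-analysis kernel.bc -o /dev/null
```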



## Installing



## Limitations of ERM