Previous | Table of Contents | Next |
The front end of the compiler has completed its task. It has created an abstract syntax tree and symbol table for each of the procedures being compiled. Now the compiler builds a different representationone used for improving the procedures (optimization), code generation, instruction scheduling, and register allocation. First, we must make two decisions concerning the structure of the compiler.
Optimizing compilers use a range of different data structures to represent procedures being compiled. At one extreme the procedure may be represented as a tree; at the other, each procedure may be represented as a sequence of machine instructions for the target machine.
Representing the procedure as a tree makes the original structure of the procedure clear. A procedure consists of declarations, statements, and expressions. Each of these contains components of the same form, so it is natural to represent the procedure as a tree. If a tree structure is used, an abstract syntax tree is the natural choice. The abstract syntax tree is the natural organization for tree-oriented optimization algorithms such as algebraic identities and Sethi-Ullman register numbering.
Representing the procedure as machine instructions makes many optimization algorithms easier. They can each be individually optimized and positioned. The fastest instruction sequence does not naturally match the abstract syntax tree. The individual instructions must be easily manipulatedcreated, replicated, deleted, or movedwhich is more easily done with a sequence of instructions rather than a tree.
The compiler presented here gains the advantages of both trees and instruction sequences. A procedure is represented as a flow graph of sequences of instructions for an abstract RISC processor. This abstract machine has an inexhaustible supply of registers, called temporaries. There is a standard set of instructions for manipulating integers, long integers, floating point numbers, and double-precision numbers.
When tree-oriented algorithms are applicable, the procedure representation is translated into a form called static single assignment (SSA) form. When the compiler translates the flow graph into SSA form, the compiler reconstructs the expression trees, which can then be used in the tree-oriented algorithms. The compiler also computes the nested loops of the procedure, providing the tree structure of statements most needed in the compiler.
The assembly language procedure is not represented as text. As noted in chapter 2, it is stored as a directed graph called the flow graph. The flow graph is a variant of the idea of flow charts originally used in programming. The flow graph has the following components:
Consider Figure 4.1 as a fragment of a procedure representing the computation of the statement A = B + C * (B+A). The computation is broken into individual computations. Before the value of a variable can be referenced, the address of the variable must be loaded and a load operation for the variable must be executed. All values are loaded into temporaries. For typographical purposes integer temporaries are represented by an integer prefixed with a letter T. Note that the addresses of A and B are used twice and loaded only once. The name A indicates the constant address of variable A. The value of B is used twice and loaded only once. These are examples of redundant expression elimination. The individual operation names (or opcodes) will be described later: iLDC stands for load integer constant, iSLD stands for load integer value from static memory, iADD is integer add, iMUL is integer multiply, and iSST is integer store into static memory. These names are taken from the Massive Scalar Compiler Project at Rice University.
Previous | Table of Contents | Next |