


Chapter 4
Flow Graph

The front end of the compiler has completed its task. It has created an abstract syntax tree and symbol table for each of the procedures being compiled. Now the compiler builds a different representation—one used for improving the procedures (optimization), code generation, instruction scheduling, and register allocation. First, we must make two decisions concerning the structure of the compiler.

4.1 How Are Procedures Stored?

Optimizing compilers use a range of different data structures to represent procedures being compiled. At one extreme the procedure may be represented as a tree; at the other, each procedure may be represented as a sequence of machine instructions for the target machine.

Representing the procedure as a tree makes the original structure of the procedure clear. A procedure consists of declarations, statements, and expressions, each of which contains components of the same form, so a tree is a natural representation. If a tree structure is used, the abstract syntax tree is the obvious choice: it is the natural organization for tree-oriented optimization algorithms such as the application of algebraic identities and Sethi-Ullman register numbering.
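As a brief illustration of why such algorithms want a tree, the following C++ sketch computes the classic Sethi-Ullman label of an expression-tree node: the minimum number of registers needed to evaluate the subtree without storing intermediate results. The node type and names are illustrative assumptions for this discussion, not the abstract syntax tree classes used in this book.

    #include <algorithm>

    // Illustrative binary expression-tree node; leaves have no children.
    struct ExprNode {
        ExprNode *left = nullptr;
        ExprNode *right = nullptr;
    };

    int sethiUllmanLabel(const ExprNode *node) {
        // A leaf needs one register to hold its value.
        if (node->left == nullptr && node->right == nullptr)
            return 1;
        // Interior nodes are assumed to be binary (both children present).
        int l = sethiUllmanLabel(node->left);
        int r = sethiUllmanLabel(node->right);
        // One extra register is needed only when both subtrees
        // require the same number of registers.
        return (l == r) ? l + 1 : std::max(l, r);
    }

Evaluating the subtree with the larger label first achieves this minimum, which is why the algorithm is naturally phrased over trees rather than instruction sequences.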

Representing the procedure as machine instructions makes many optimization algorithms easier: each instruction can be optimized and positioned individually, and the fastest instruction sequence does not naturally match the shape of the abstract syntax tree. The individual instructions must be easy to manipulate (created, replicated, deleted, or moved), which is more easily done with a sequence of instructions than with a tree.

The compiler presented here gains the advantages of both trees and instruction sequences. A procedure is represented as a flow graph of sequences of instructions for an abstract RISC processor. This abstract machine has an inexhaustible supply of registers, called temporaries. There is a standard set of instructions for manipulating integers, long integers, floating point numbers, and double-precision numbers.

When tree-oriented algorithms are applicable, the procedure representation is translated into static single assignment (SSA) form. When the compiler translates the flow graph into SSA form, it reconstructs the expression trees, which can then be used by the tree-oriented algorithms. The compiler also computes the nested loops of the procedure, providing the tree structure of statements most needed in the compiler.

The assembly language procedure is not represented as text. As noted in chapter 2, it is stored as a directed graph called the flow graph. The flow graph is a variant of the idea of flow charts originally used in programming. The flow graph has the following components:

  The instructions are much like machine instructions in an abstract RISC processor. Each instruction consists of an operation code (opcode) representing the operation being performed, a set of input operands that are used to perform the operation indicated by the opcode, and a set of output targets that name the values being changed.
  The individual instructions have operands that are constants or temporaries. The set of temporaries is an arbitrarily large set of objects, like the physical registers in a real processor but without a fixed bound. Each temporary holds a value for some portion of the execution of the procedure: some set of instructions evaluates an expression and places the result in a target temporary, and instructions that use this value reference that temporary as an operand.
  The instructions form a program in the same manner that assembly code on a real processor forms a program. The execution starts with the first instruction. Instructions are executed in turn until a branching instruction is found. The instructions are broken into sequences called blocks. The only instruction that is the destination of a branching instruction is the first instruction in a block. The only branching instructions are the last instructions in the block. At the end of the block there is a branching instruction representing each possible path out of the block.
  The blocks form a flow graph, with the blocks as its nodes. The edges between blocks represent the possible execution paths leaving each block: the edge (B1, B2) indicates that execution of the procedure can travel directly from B1 to B2. The flow graph has two distinguished nodes: the start block Entry and the exit block Exit. One way these structures might be declared is sketched below.
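The following C++ declarations are an illustrative sketch only; the type and field names are assumptions for this discussion, not the classes actually used by the compiler described in this book.

    #include <vector>

    using Temporary = int;                  // index into the unbounded set of temporaries

    enum class Opcode { iLDC, iSLD, iADD, iMUL, iSST /* ... */ };

    // An operand is either a constant or a temporary.
    struct Operand {
        bool isConstant;
        long constant;                      // valid when isConstant is true
        Temporary temp;                     // valid when isConstant is false
    };

    struct Instruction {
        Opcode opcode;                      // the operation performed
        std::vector<Operand> operands;      // inputs to the operation
        std::vector<Temporary> targets;     // outputs: temporaries whose values change
    };

    struct Block {
        std::vector<Instruction> insts;     // branches may appear only at the end
        std::vector<Block*> successors;     // one edge per possible path out of the block
        std::vector<Block*> predecessors;
    };

    struct FlowGraph {
        std::vector<Block*> blocks;
        Block* entry;                       // distinguished start block
        Block* exit;                        // distinguished exit block
    };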

Consider Figure 4.1 as a fragment of a procedure representing the computation of the statement A = B + C * (B+A). The computation is broken into individual operations. Before the value of a variable can be referenced, the address of the variable must be loaded and a load operation for the variable must be executed. All values are loaded into temporaries; for typographical purposes, integer temporaries are represented by an integer prefixed with the letter T. The name A indicates the constant address of the variable A. Note that the addresses of A and B are used twice but loaded only once, and the value of B is used twice but loaded only once; these are examples of redundant expression elimination. The individual operation names (or opcodes) will be described later: iLDC stands for load integer constant, iSLD stands for load integer value from static memory, iADD is integer add, iMUL is integer multiply, and iSST is integer store into static memory. These names are taken from the Massively Scalar Compiler Project at Rice University.
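The listing below is one plausible lowering of A = B + C * (B+A) into these instructions. The temporary numbering and operand ordering are illustrative and need not match Figure 4.1 exactly.

    iLDC  B        => T1      T1 holds the address of B
    iSLD  T1       => T2      T2 holds the value of B
    iLDC  A        => T3      T3 holds the address of A
    iSLD  T3       => T4      T4 holds the value of A
    iADD  T2, T4   => T5      T5 = B + A
    iLDC  C        => T6      T6 holds the address of C
    iSLD  T6       => T7      T7 holds the value of C
    iMUL  T7, T5   => T8      T8 = C * (B + A)
    iADD  T2, T8   => T9      T9 = B + C * (B + A)
    iSST  T9, T3              store T9 into A through the address in T3

Here the address of A (T3) is used by both the load and the final store, and the value of B (T2) feeds both additions, so neither is computed twice.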



