4. Flow Graph

Table of Contents

Chapter 3
Graphs

A prime prerequisite for being a compiler writer is being a “data structure junkie.” One must live, breathe, and love data structures, so we will not provide the usual complete list of all background mathematics that usually appears in a compiler book. We assume that you have access to any one of a number of data structure or introductory compiler writing books, such as Lorho (1984) or Fischer and LeBlanc (1988). This design assumes that you are familiar with the following topics, which are addressed by each of the data structure books referenced.

• Equivalence relations and partitions. The compiler frequently computes equivalence relations or partitions sets. An equivalence relation is frequently represented as a partition: All of the elements that are mutually equivalent are grouped together into a set of elements. Hence the whole set can be represented by a set of disjoint sets of elements. Partitions are frequently implemented as UNION/FIND data structures. This approach was pioneered by Tarjan (1975).

• Partial ordering relations on sets. A compiler contains a number of explicit and implicit partial orderings. Operands must be computed before the expression for which they are an operand, for example. The compiler must be able to represent these relations.

The topics that are addressed in this chapter concern graphs. A number of the data structures within a compiler—the flow graph and the call graph, for instance—are represented as directed graphs. Undirected graphs are used to represent the interference relationship for register allocation. Thus these topics are addressed here to the extent that the theory is used in implementing the compiler. The topics addressed are as follows:

Data structures for implementing directed and undirected graphs

Depth-first search and the classification of edges in a directed graph

Dominators, postdominators, and dominance frontiers

Computing loops in a graph

Representing sets

3.1 Directed Graphs

A directed graph consists of a set of nodes N and a set of edges E. Each edge has a node that is its tail, and a node that is its head. Some books define an edge to be an ordered pair of nodes—tail and head; however, this makes the description of the compiler more difficult. It is possible to have two edges with the same tail and head. In a flow graph containing a C switch statement or a Pascal case statement, two different alternatives that have the same statement bodies will create two edges having identical tails and heads.

For a flow graph, there are two distinguished nodes. Entry is a node with no predecessors, representing the point where the procedure starts. Exit is a node with no successors, where the procedure exits. All execution paths start at Entry; all finite paths representing a complete execution end at Exit. Note that infinite-length paths are possible, representing infinite loops in the flow graph.

If a procedure has multiple entry points, as is possible in Fortran, then a single Entry node is created that contains no instructions, with an edge between Entry and each actual entry point. When instructions are emitted, the procedure entry code is inserted at each of the entry points. The existence of the single Entry node ensures that the program analysis will be performed correctly. Similarly, if there are multiple nodes with no successors, then a single Exit node is created, with an edge between each original exit node and Exit.

Each execution of the procedure is represented by a path from Entry to Exit. Unfortunately, the converse is not true: there are paths from Entry to Exit that do not represent paths of execution; for example, if there are two conditional branches in the flow graph branching on the same conditional expression. In this case the second conditional branch can only branch in the same direction as the first one. The path that branches the other way is not possible. The compiler cannot identify this situation, so it assumes that all paths are possible. This assumption decreases the amount of optimization.

The graph in Figure 3.1 represents the flow graph for the running example. Node BO is the Entry node. Node B5 is the Exit node. Any execution path in the procedure is represented as a path between B0 and B5.

Directed graphs are implemented using two different techniques. Usually the nodes are represented as some data structure and the edges are represented by adding two attributes to each node: the set of successors and the set of predecessors of the node. The set of successors of X is the set of nodes Y that are heads of edges, with the tail X. Similarly, the set of predecessors of X is the set of nodes P that are the tails of edges, with head X. Thus in Figure 3.1 the predecessors of B3 are B2 and B6, while the successors of B3 are B2 and B4. Note that any node X satisfies the relation: X is a predecessor of each of its successors, and X is a successor of each of its predecessors. These sets are implemented as linked lists, with the head of the list contained in the data structure representing the node.

Figure 3.1 Flow Graph for MAXCOL

An alternative technique is to assign an integer to each node and represent each edge as a bit in a Boolean matrix. If there is an edge between nodes X and Y, then the bit in the position EDGE[X,Y] is set to true; otherwise, it is false.

The successor/predecessor representation has the advantage that it is efficient to scan through all the edges leaving or entering a node. It is also space efficient if the directed graph is sparse, as is true of most flow graphs. The matrix approach is more efficient in building the directed graph because it is easier to check whether a particular node is already a successor. We will use a derivative of the matrix approach during register allocation; otherwise, the successor/predecessor implementation will be used.

In an undirected graph the edges do not have a sense of direction. One is not traveling from one node to another in a particular direction. Instead, undirected graphs represent the idea of neighbors: two nodes are adjacent or they are not. The techniques for implementing directed graphs are used to implement undirected graphs: for each edge {X,Y} in the undirected graph, build two edges (Y,X) and (Y,X) in the implementation. In the matrix form, this means that the matrix is symmetric and only half of the matrix need be stored.

Table of Contents

Chapter 3Graphs

3.1 Directed Graphs

Chapter 3
Graphs