12. Scheduling and Rescheduling

Table of Contents

Chapter 11
Limiting Resources

The compiler is now ready to assign machine resources (registers, condition codes, and so forth) to each computation. There are several available algorithms for resource allocation, each of which performs best on particular classes of programs. Each register allocation algorithm works well when there is an abundance of physical registers available. Each performs badly when the required number of physical registers greatly exceeds the number available.

The algorithm presented here is a wedding of these, attempting to use each type of resource allocation where it works best. The algorithm is structured as a sequence of algorithms that each do a part of the allocation. Previous algorithms organized in this fashion have suffered from the phase-ordering problem: allocating one set of temporary registers has made it more difficult to allocate other sets. The compiler mitigates this problem by performing the LIMIT phase.

The first thing that must be done, therefore, is to reduce the number of required registers. This is done before the register allocation and scheduling phases, allowing each phase to assume that an adequate number of physical registers is available. These functions are performed in the following order, as is shown graphically in Figure 11.1:

1. The LIMIT phase reduces the need for machine resources as much as possible without slowing the execution of the program. It

• Performs peephole optimization to create the exact sequence of target instructions.

• Performs register coalescing and register renaming to avoid as many copy operations as possible and replaces each temporary that is used in multiple independent parts of the flow graph by distinct temporaries.

Figure 11.1 Back-End Structure

• Reduces the register pressure, limiting the number of registers needed at each program point to fit the physical registers available.

2. The SCHEDULE phase reorganizes the target instructions to reduce the number of machine cycles needed for executing the flow graph. At the same time, it avoids increasing the register pressure beyond the available set of registers.

3. The REGISTER phase assigns the temporaries to physical registers. This is done in three steps.

• First, temporaries that are live between blocks (global temporaries) are assigned to registers.

• Within a block, temporaries that can share storage with a global temporary are assigned registers.

• Then unassigned temporaries that are live within a block are assigned registers.

4. The RESCHEDULE phase is a reexecution of the SCHEDULE phase. It is only performed if the register allocator has inserted load and store operations.

The register allocation phases must use the target resources effectively. That means using the fewest possible registers. When there are insufficient registers, the register allocator inserts the fewest possible load and store operations. Using the minimum number of registers and inserting the minimum number of load and store operations is unrealistic since the problems are NP-complete. Instead, we use heuristics to do as good a job as possible.

11.1 Design of LIMIT

LIMIT performs four actions to reduce the number of instructions and registers used. These are grouped into two separate subphases. First, LIMIT performs peephole optimization, register renaming, and register coalescing. Then it reduces the number of registers needed (it reduces the register pressure) so that no more registers are needed at any point of the flow graph than exist in the machine.

As noted above, this compiler combines the operations of peephole optimization, register renaming, and register coalescing into a single algorithm. Why? When implemented separately, the algorithms take large amounts of space and time. Consider each of the parts:

• Register renaming involves breaking a single temporary into multiple temporaries if it is used independently in different parts of the flow graph. This involves knowing which evaluations of a temporary may be used at each use. This information can be given by the static single assignment (SSA) form of the graph.

• Register coalescing removes register-to-register copies using a data structure called the conflict graph. The conflict graph can be the largest data structure in the compiler. We observe that much of renaming can occur as renaming on the static single assignment form of the flow graph without the conflict graph. The conflict graph need only be built for a small fraction of the temporary registers, decreasing its size.

• Peephole optimization works best when the compiler can inspect the definitions of the operands of each instruction. We have this with the static single assignment form, so peephole optimization can be performed here. It certainly needs to be performed before instruction scheduling.

• As a cleanup phase, dead code must be eliminated. Again this algorithm operates on the static single assignment form of the flow graph.

Since the algorithms all work on the static single assignment form, they can be performed sequentially; however, they can also be combined. Peephole optimization can be performed at the same time that register copies are initially being eliminated for register coalescing. And we will see shortly that register renaming and register coalescing can be combined into one algorithm that computes a partition of the temporaries for reforming the normal flow graph.

After these algorithms based on static single assignment form, the algorithm operates on the normal flow graph and the loop structure to insert load and store operations to reduce the number of registers needed to match the registers available in the target machine. Thus the main procedure for LIMIT has the form shown in Figure 11.2.

The algorithms could be combined further if abnormal edges in the flow graph did not exist.¹Peephole optimization, local coalescing, and the construction of the static single assignment form could be done simultaneously. Since the compiler must avoid copy operations on abnormal edges, these edges and the corresponding φ-nodes must be identified before any coalescing or peephole optimizations are performed. This identification can be done during the construction of the static single assignment form; however, it is described separately for simplicity.

¹By now, you have figured out that abnormal edges are the bane of the compiler writer’s existence.

Figure 11.2 LIMIT Main Procedure

Figure 11.3 Effect of Abnormal Edges

All copies caused by φ-nodes on abnormal edges must be eliminated. So nothing can be done to temporaries in these φ-nodes that will cause a copy. This is achieved by not changing the points of evaluation or use. Actually, uses can be eliminated but not added. To identify these temporaries, the algorithm in Figure 11.3 is performed, which computes two sets: Occurs_In_Abnormal_φ_node, which is the set of all temporaries involved in these edges, and Pairs_In_AbnormaL_φ_node, which is the set of pairs that could cause a copy if the compiler is not careful. The algorithm simply looks at all blocks that have any φ-nodes and considers each predecessor to see if it is formed with an abnormal edge. If so, each of the φ-nodes is scanned and the sets formed.

Table of Contents

Chapter 11Limiting Resources

11.1 Design of LIMIT

Chapter 11
Limiting Resources