14. The Object Module

Table of Contents

Chapter 13
Register Allocation

The compiler has already performed register renaming and register coalescing, reduced the register pressure, and scheduled the instructions. It is time to assign registers to each temporary. The register allocator must satisfy the following constraints in order of importance:

Correctness: The compiler must assign distinct temporaries to distinct registers if there is a point in the flow graph where both temporaries might contain distinct values and both are used later. If either of the temporaries is uninitialized at that point, then the compiler is free to assume that the two temporaries have the same value.

Avoid spilling: The compiler should assign temporaries to registers so that the number of LOAD and STORE instructions inserted by the register allocator are as small as possible during the execution of the program.

Use few registers: The compiler should use as few registers in the register set as possible. Registers that are saved by the calling procedure should be used before registers saved by the called procedure.

Many compilers take a simplistic view of register allocation. They describe register allocation in terms of some algorithmic problem—such as graph coloring or bin packing—and then use some heuristic solution for that particular formulation. Such register allocators perform well on problems needing few registers; however, if the number of registers needed is significantly greater than the number of registers available, each of these register allocation methods generates a large number of spill instructions, namely, the loads and stores to memory generated by the register allocator.

The problem is that each of these allocation techniques uses one of the two types of information available. The graph-coloring allocators use the concept of interference or conflict graphs. The conflict graph has no concept of which instructions are near each other, so it performs poorly on blocks. The bin-packing register allocators perform well on blocks, but have to use an approximation to handle control flow. It is possible to create situations where one algorithm will work better than another. This approach to register allocation was chosen to expose the best attributes of each of the algorithms.

This compiler combines the two. Recall that the compiler has already inserted spilling instructions to reduce the register pressure to less than or equal to the number of registers available. The compiler will now use three distinct allocation algorithms to allocate registers:

• The compiler uses a derivative of graph-coloring register allocation introduced by Preston Briggs (1992) to perform allocation of temporaries that are live across block boundaries.

• The compiler uses a derivative of the FAT algorithm introduced by Laurie Hendron (1993) to perform allocation of the local temporaries that can be allocated to the same registers as global temporaries.

• The compiler uses the standard single-pass register allocation algorithm to allocate registers to temporaries that are live only within a single block. This is a bin-packing algorithm that allocates the local temporaries one at a time as the block is walked in reverse execution order.

By separating the assignment of local and global temporaries, the compiler introduces the possibility of a phase-ordering problem: The assignment of global temporaries may inhibit the assignment of local temporaries. This is unavoidable since the optimal allocation of registers is an NP-complete problem. The design is such that the particular choice of algorithms to use will avoid as much of the problem as is possible.

To illustrate the interplay between global and local register allocation, consider the pictorial representation of a block in Figure 13.1 The set of instructions where a temporary is live is represented by a horizontal distance. Each temporary is represented by a distinct row. The global register allocator will create situations such as R1, R2, and R4. R1 contains a value assigned in another block and used for the last time in this block. R2 is assigned a value in this block and used in another block. R4 combines the two: R4 is assigned a value in this block, control flow leaves the block, and returns to the block using the value earlier in the block. R3 is a typical local temporary. It is assigned a value in the block and used for the last time later in the block. In large procedures, this is most of the temporaries. R1, R2, and R4 are allocated by the global allocator. R2 and R4 are combined with other local temporaries by the FAT algorithm. R3 is allocated using the local allocator.

Figure 13.1 Pictorial Representation of Block

Figure 13.2 Driver for Register Allocation

Recall that all of these algorithms are approximations to an optimal allocation. An optimal allocation can be discovered by solving an integer programming problem; however, this technique is too expensive for a production compiler.

The main algorithm for register allocation is to perform each form of allocation in turn. The FAT algorithm and the local register allocator work together by the FAT algorithm creating data structures that are used by the local register allocator. The calling structure is shown in Figure 13.2. First, perform global register allocation and then apply the FAT algorithm and local register allocation algorithms on each block.

Table of Contents

Chapter 13Register Allocation

Chapter 13
Register Allocation