Previous | Table of Contents | Next |
The compiler is now ready to assign machine resources (registers, condition codes, and so forth) to each computation. There are several available algorithms for resource allocation, each of which performs best on particular classes of programs. Each register allocation algorithm works well when there is an abundance of physical registers available. Each performs badly when the required number of physical registers greatly exceeds the number available.
The algorithm presented here is a wedding of these, attempting to use each type of resource allocation where it works best. The algorithm is structured as a sequence of algorithms that each do a part of the allocation. Previous algorithms organized in this fashion have suffered from the phase-ordering problem: allocating one set of temporary registers has made it more difficult to allocate other sets. The compiler mitigates this problem by performing the LIMIT phase.
The first thing that must be done, therefore, is to reduce the number of required registers. This is done before the register allocation and scheduling phases, allowing each phase to assume that an adequate number of physical registers is available. These functions are performed in the following order, as is shown graphically in Figure 11.1:
Figure 11.1 Back-End Structure
The register allocation phases must use the target resources effectively. That means using the fewest possible registers. When there are insufficient registers, the register allocator inserts the fewest possible load and store operations. Using the minimum number of registers and inserting the minimum number of load and store operations is unrealistic since the problems are NP-complete. Instead, we use heuristics to do as good a job as possible.
LIMIT performs four actions to reduce the number of instructions and registers used. These are grouped into two separate subphases. First, LIMIT performs peephole optimization, register renaming, and register coalescing. Then it reduces the number of registers needed (it reduces the register pressure) so that no more registers are needed at any point of the flow graph than exist in the machine.
As noted above, this compiler combines the operations of peephole optimization, register renaming, and register coalescing into a single algorithm. Why? When implemented separately, the algorithms take large amounts of space and time. Consider each of the parts:
Since the algorithms all work on the static single assignment form, they can be performed sequentially; however, they can also be combined. Peephole optimization can be performed at the same time that register copies are initially being eliminated for register coalescing. And we will see shortly that register renaming and register coalescing can be combined into one algorithm that computes a partition of the temporaries for reforming the normal flow graph.
After these algorithms based on static single assignment form, the algorithm operates on the normal flow graph and the loop structure to insert load and store operations to reduce the number of registers needed to match the registers available in the target machine. Thus the main procedure for LIMIT has the form shown in Figure 11.2.
The algorithms could be combined further if abnormal edges in the flow graph did not exist.1Peephole optimization, local coalescing, and the construction of the static single assignment form could be done simultaneously. Since the compiler must avoid copy operations on abnormal edges, these edges and the corresponding φ-nodes must be identified before any coalescing or peephole optimizations are performed. This identification can be done during the construction of the static single assignment form; however, it is described separately for simplicity.
1By now, you have figured out that abnormal edges are the bane of the compiler writers existence.
Figure 11.2 LIMIT Main Procedure
Figure 11.3 Effect of Abnormal Edges
All copies caused by φ-nodes on abnormal edges must be eliminated. So nothing can be done to temporaries in these φ-nodes that will cause a copy. This is achieved by not changing the points of evaluation or use. Actually, uses can be eliminated but not added. To identify these temporaries, the algorithm in Figure 11.3 is performed, which computes two sets: Occurs_In_Abnormal_φ_node, which is the set of all temporaries involved in these edges, and Pairs_In_AbnormaL_φ_node, which is the set of pairs that could cause a copy if the compiler is not careful. The algorithm simply looks at all blocks that have any φ-nodes and considers each predecessor to see if it is formed with an abnormal edge. If so, each of the φ-nodes is scanned and the sets formed.
Previous | Table of Contents | Next |