13. Register Allocation

Table of Contents

Chapter 12
Scheduling and Rescheduling

Reduced instruction set computing (RISC) processors increase execution speed by making the instructions simple (thus keeping the hardware simple) and by a number of other devices for increasing the number of instructions that can be executed in a fixed interval of time.

The processor is pipelined. This means that the execution of individual instructions is broken into small tasks that are approximately equal in size. The individual tasks for a single instruction are performed on an assembly line (called a pipeline). Pipelining gains the efficiencies of an assembly line. A second instruction can start execution, or be issued, when the first instruction is moved from the first stage. One instruction can be issued during each instruction cycle for each pipeline. However, each instruction may not complete its evaluation for one or more cycles after it is issued. If a later instruction attempts to use the value computed in an earlier instruction that has not yet completed all stages of the pipeline, the processor will stall, or delay the issuing of the second instruction until the first one has completed. To gain performance, the compiler will reorder the instructions to avoid processor stalls.¹

¹Early RISC processors did not stall when a value was not ready. Instead they executed the instruction using garbage as input. It was the responsibility of the compiler to ensure that such execution did not happen. All recent processors will stall while waiting for operands since the indeterminancy of some instructions, particularly LOAD, multiply, and divide instructions, made scheduling difficult.

The processor will issue multiple instructions at the same time. The processor will load a small set of instructions, called a packet, and analyze the relationship between the instructions. If the instructions do not use or change the operands of the other instructions in the set, then the instructions may be issued at the same time. This gains performance if the processor has more than one computing unit, such as an integer arithmetic unit and floating-point unit.

The processor may have more than one integer unit and more than one floating-point unit. In that case, the packet of instructions that can be fetched is larger and more than one arithmetic instruction may be issued simultaneously. Not all of the arithmetic units may be identical. In that case the compiler will reorder the instructions so that each packet will have instructions that can execute on different arithmetic units.

A processor with these three characteristics is called a superscalar processor. Most processors in use today are superscalar. Many of them have an additional characteristic called out-of-order execution. Such a processor will operate as described above and will allow later instructions in a packet to execute even when the instructions that precede them are constrained from execution. This later characteristic will not be discussed here since there are few things that the compiler can do to enhance the execution of out-of-order processors that are not important for the normal superscalar processors.

The Digital Alpha 21164 is an example of a superscalar processor. Consider how it matches the criteria above. First, the Alpha is not an out-of-order execution processor. All instructions are executed in order; if the execution of an instruction is delayed, all instructions following it are delayed also.

The Alpha is pipelined. Most instructions for the integer arithmetic units take one cycle; the floating-point instructions take four cycles. Some of the exceptions to these rules are the conditional move, LOAD, multiplication, and floating-division instructions.

The Alpha will attempt to issue four four-byte instructions during each clock cycle. The block of instructions must be aligned on an address that is a multiple of sixteen. If the address is not a multiple of sixteen, then the packet that contains the current instruction is fetched and the initial instructions in the packet are ignored, thus decreasing the number of instruction that can be issued during that clock cycle. If the instructions of a packet contain dependences so that they cannot all be issued, then the initial part of the packet is issued, up to the first instruction that cannot be issued immediately.

The Alpha contains two integer arithmetic units and two floating-point units. Both integer arithmetic units can execute most integer instructions. The exceptions are that shift operations are done on one of the units and branching operations are done on the other. There is a floating-point multiplication unit and a floating-point addition unit. Some instructions are shared between the two units.

Ideally, the Alpha will issue four instructions during a clock cycle. Two of the instructions will be executed by the two integer arithmetic units. One of the other instructions will be an instruction that can be executed by the floating-point multiplication unit, and the final instruction can be executed by the floating-point addition unit. These instructions can occur in any order in the packet of instructions.

Table of Contents

Chapter 12Scheduling and Rescheduling

Chapter 12
Scheduling and Rescheduling