5.1 Hardware Basis for Control of Flow | Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles

Control of logical flow based upon currently calculated conditions gives computer programs their extraordinary power and versatility. Procedure calls and subroutine calls can be important forms of control when they help organize a program into coherent modules, but they are perhaps not as essential, or as fundamental, as branch instructions. A running computer program makes "decisions" by comparing a current result against a fixed standard or a computed quantity using some logical relationship, such as equality or greater-than. The machine must therefore have some type of instruction that sends program flow along different paths for the two opposite outcomes of an implicit test, such as less-than-or-equal-to versus greater-than.

Some historical architectures had skip instructions that would increment the instruction pointer (IP) either normally or by an extra amount, depending on the outcome of a test; that is, one instruction could be conditionally skipped. One architecture that was contemporary with the development of the FORTRAN language had a three-way test in hardware, with space in the instruction word for three address values. Whether the tested value was negative, zero, or positive determined which of the three addresses was loaded into the instruction pointer. This hardware feature found its way into software as a three-way version of the IF statement in the original FORTRAN language specification.
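The two historical mechanisms can be modeled in a few lines of C. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, instructions are taken to be fixed-size, and `ip` in the skip model is the address of the skip instruction itself. Neither function reproduces any particular historical machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical three-address branch: the instruction word carries three
   target addresses, and the sign of the tested value picks one of them. */
typedef struct {
    uint32_t if_negative;
    uint32_t if_zero;
    uint32_t if_positive;
} ThreeWayTargets;

uint32_t three_way_next_ip(int64_t tested, ThreeWayTargets targets) {
    if (tested < 0)
        return targets.if_negative;
    if (tested == 0)
        return targets.if_zero;
    return targets.if_positive;
}

/* Hypothetical skip instruction located at address ip: the IP normally
   advances by one instruction, but by two (skipping the next instruction)
   when the test succeeds. */
uint32_t skip_next_ip(uint32_t ip, uint32_t insn_size, bool test_succeeds) {
    assert(insn_size > 0);  /* fixed-size instructions assumed */
    return ip + (test_succeeds ? 2u * insn_size : insn_size);
}
```

The three-way form needs room for three addresses in one instruction word. The skip form needs none, because its only "target" is implied by the instruction size.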

Computer architectures commonly include a set of branch instructions, almost all of which are dichotomous. Depending on the result of the test, either the branch is "taken" or it "falls through." If the branch is taken, the instruction pointer must be altered, and this is usually accomplished by adding a signed offset contained as an immediate operand within the instruction. If the branch falls through, the already updated value in the instruction pointer is used to fetch the next instruction in line, i.e., the instruction that immediately follows the branch instruction itself.
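The taken/fall-through choice reduces to a single next-IP computation. The sketch below is hypothetical and makes one assumption: the offset is added to the already advanced IP. Real ISAs differ on which address a branch offset is relative to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Next-IP logic for a generic two-way branch (hypothetical model).
   next_in_line is the IP already advanced past the branch; a taken branch
   adds its signed immediate offset to that value. */
uint64_t branch_next_ip(uint64_t next_in_line, int64_t offset, bool taken) {
    if (!taken)
        return next_in_line;                 /* falls through */
    return next_in_line + (uint64_t)offset;  /* taken; unsigned wraparound
                                                lets a negative offset
                                                move backward */
}
```

A negative offset produces a backward branch, which is how loops are closed.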

Modern computer architectures have used several major techniques at the hardware level to support conditional branching: condition codes, state-management approaches that keep comparison results in general registers, the combined compare-and-branch instructions of the PA-RISC architecture, and now the special predicate registers of the EPIC Itanium architecture.

5.1.1 Condition Codes

Some architectures have used special storage locations inside the CPU to hold condition code values. Each condition code reduces to a single-bit Boolean value signifying the presence or absence of some recently tested condition.

For instance, CISC architectures (e.g., VAX) use single-bit condition codes that retain the result of a recent test that "came out negative" instead of positive, that "came out zero" instead of nonzero, and a limited number of other tests. One example from those architectures would be to branch if some quantity x is currently equal to another quantity y:

 cmp x,y
 beq equal

where equal is the symbolic label on the instruction to be executed next if the branch is taken. Another example would be to branch if some result r has just been computed and is negative (i.e., less than zero):

 sub x1,y1,r     // r = x1 - y1
 blt negative

where negative is the symbolic label on the instruction to be executed next if the branch is taken.
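The following C model shows how the two examples use condition codes. It is a minimal sketch with hypothetical names. It keeps only the N (negative) and Z (zero) bits; the VAX also has V and C bits, which are omitted. Arithmetic overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two of the single-bit condition codes held inside the CPU. */
typedef struct {
    bool n;  /* last result was negative */
    bool z;  /* last result was zero */
} CondCodes;

/* cmp x,y : set the codes as if computing x - y, discarding the difference. */
void cmp(CondCodes *cc, int64_t x, int64_t y) {
    cc->n = x < y;
    cc->z = x == y;
}

/* sub x1,y1,r : r = x1 - y1, with the codes describing r.
   Overflow is ignored in this sketch. */
int64_t sub(CondCodes *cc, int64_t x1, int64_t y1) {
    int64_t r = x1 - y1;
    cc->n = r < 0;
    cc->z = r == 0;
    return r;
}

/* The branches read only the condition codes, never x, y, or r directly. */
bool beq(const CondCodes *cc) { return cc->z; }
bool blt(const CondCodes *cc) { return cc->n; }
```

In this model, `cmp` and `beq` communicate only through the hidden `CondCodes` state. Any instruction executed between them that also sets the codes would change the outcome of the branch.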

Numerous questions arise from this approach. When a test is made, should all of the separate condition codes be set appropriately to 0 or 1, or should some of them be permitted to retain their settings from previous tests? Can the operation of the ISA be designed so that the way each condition code is set seems natural and easy for programmers to remember? What happens to the values stored in the condition codes across procedure calls? How can the values stored in the condition codes for a user program be retained across potential interrupt-handling code within an operating system? Are there time penalties associated with storing condition code values as part of the execution of some instruction and then retrieving the code values as part of the execution of a subsequent conditional branch?

Several of these questions have a close relationship to the concept of machine state. Modern operating systems handle interrupts, manage resources, schedule processes, and implement memory and file protection. While such work of the operating system is occurring, user processes are essentially suspended. When any suspended process resumes execution, it must be able to pick up exactly where it left off. No pair of instructions within a program should ever produce wrong or ambiguous results, even if the operating system intrudes between them to attend to events logically unrelated to that program. Condition codes must therefore be preserved across virtually any such intrusion. The state of a process is nearly synonymous with the machine state at the point of suspension, since the machine state includes the contents of the condition codes and the processor registers.

5.1.2 State-Management Approaches

If virtually every instruction sets the condition codes as a side effect, then the special mechanisms and instructions used to support interrupt handling must preserve and restore the entire set of condition codes. Their values are part of the state of the interrupted process and must be restored when that process runs again.

Some architectures would only be able to preserve such bits of state information by performing a store to memory with a subsequent load from memory, increasing overhead for the operating system when it manages scheduling and interrupt handling.
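The store-then-load pattern can be sketched as follows. This hypothetical model treats the condition codes as one more field of the processor state. The bit assignments, register count, and function names are illustrative assumptions, not any real architecture's layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical condition-code bit assignments, for illustration only. */
#define CC_N 0x8u
#define CC_Z 0x4u

/* Live processor state of the running process (sizes are illustrative). */
typedef struct {
    uint64_t gr[8];  /* a few general registers */
    uint8_t  cc;     /* condition codes, held inside the CPU */
} CpuState;

/* On an interrupt: store the live state to a save area in memory. */
void save_state(const CpuState *cpu, CpuState *save_area) { *save_area = *cpu; }

/* Before resuming: load the saved state back into the CPU. */
void restore_state(CpuState *cpu, const CpuState *save_area) { *cpu = *save_area; }

/* A stand-in for interrupt-handling code whose own compares overwrite cc. */
void interrupt_handler(CpuState *cpu) { cpu->cc = CC_N; }
```

If `save_state` and `restore_state` were omitted, a branch in the interrupted process that resumed after the handler ran would test the handler's condition codes rather than its own.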

Accordingly, some RISC architectures (e.g., Alpha) adopt a different state-management approach to conditional branching without using condition codes. Typically the strategy is to design the compare instructions to write the equivalent of a Boolean condition code into a general register. The branch instructions test the value in that register against an implied reference standard of zero. In this approach, the first example in Section 5.1.1 would become:

 cmpeq  x,y,t     // t=1 if x=y
 bne    t,equal

where the branch to the instruction at the label equal is taken if the value in register t is not equal to zero. If x is actually equal to y, then t is set to 1; otherwise, it is set to 0. The second example would become:

 subq   x1,y1,r   // r = x1 - y1
 blt    r,negative

where the branch to the instruction at the label negative is taken if the value in register r is negative as a result of the subtraction. Otherwise the branch falls through.
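In C terms, the register-based approach looks like this. The sketch is hypothetical: ordinary `int64_t` values stand in for general registers, and the functions are named after the mnemonics only for readability. The key difference from the condition-code model is that the branch takes its test value as an explicit operand, so there is no hidden state to preserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare results go into an ordinary general register (here an int64_t). */
int64_t cmpeq(int64_t x, int64_t y) { return x == y ? 1 : 0; }  /* t = 1 if x = y */

/* r = x1 - y1 with two's-complement wraparound (computed in unsigned
   arithmetic to avoid C's signed-overflow undefined behavior). */
int64_t subq(int64_t x1, int64_t y1) {
    return (int64_t)((uint64_t)x1 - (uint64_t)y1);
}

/* Branches test a register against the implied standard of zero. */
bool bne(int64_t t) { return t != 0; }  /* taken if register is nonzero */
bool blt(int64_t r) { return r < 0; }   /* taken if register is negative */
```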

Because a comparison result now resides in an ordinary general register, it is saved and restored along with the other register contents, which are standard operations at a context switch. Thus neither hardware nor software needs any special capabilities or strategies beyond those already in place.

Not all RISC architectures adopt a state-management approach. Some, such as the PowerPC, use condition codes while still adhering to RISC principles. The EPIC Itanium ISA, though it has many RISC-like features, takes yet another approach.

5.1.3 Predicate Registers

The Itanium architecture includes not only large numbers of integer and floating-point registers for use in computation, but also several kinds of additional processor registers (Appendix D), which we shall take up when uses for them arise.

Architectures that use condition codes typically have only half a dozen of them, and each one reports on only one variety of conditional outcome. Architectures that devote an entire N-bit general register to each conditional outcome can preserve arbitrary outcomes across state transitions, but they set up a competition between computational and comparative needs for the fixed pool of general registers, which are the fastest form of storage available. Yet it is not difficult to think of applications where it would be convenient to retain many Boolean conditions, once determined, through extensive code sequences.

Such requirements led to the Itanium architecture providing a bank of 64 single-bit predicate registers, Pr₀ through Pr₆₃. Of these, the first 16 are said to be static and are used to retain conditional outcomes in sequential code. The higher-numbered predicate registers have uses in specially coded loop structures. Because there are 64 predicate registers in all, the Itanium design provides ways to save or restore the entire 64-bit vector en masse, using essentially one quad word transfer.

As in C, the value 1 in a predicate register signifies True and the value 0 False. Just as general register Gr₀ is hard-wired to contain the fixed value 0, so predicate register Pr₀ is hard-wired to contain the fixed value 1. Any attempt to write into Pr₀ is ignored without signaling any error condition.
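A predicate bank with these properties maps naturally onto a single 64-bit word. The sketch below models it under stated assumptions. The `PredFile` type and the `pr_*` functions are hypothetical helpers, not Itanium mnemonics. `pr_save` and `pr_restore` stand in for the en-masse transfer described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 64 single-bit predicate registers packed into one word: bit i = Pr_i. */
typedef struct {
    uint64_t bits;
} PredFile;

void pr_init(PredFile *pf) { pf->bits = UINT64_C(1); }  /* only Pr0 set */

bool pr_is_static(unsigned i) { return i < 16; }  /* Pr0..Pr15 are static */

bool pr_read(const PredFile *pf, unsigned i) {
    assert(i < 64);
    if (i == 0)
        return true;                 /* Pr0 is hard-wired to 1 */
    return (pf->bits >> i) & 1u;
}

void pr_write(PredFile *pf, unsigned i, bool value) {
    assert(i < 64);
    if (i == 0)
        return;                      /* write ignored, no error signaled */
    uint64_t mask = UINT64_C(1) << i;
    pf->bits = value ? (pf->bits | mask) : (pf->bits & ~mask);
}

/* Save or restore the whole bank as one 64-bit quantity. */
uint64_t pr_save(const PredFile *pf) { return pf->bits; }
void pr_restore(PredFile *pf, uint64_t saved) {
    pf->bits = saved | UINT64_C(1);  /* bit 0 stays 1 whatever is restored */
}
```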

In this chapter, we shall first show how predicate registers provide a way of handling if…then…else constructs and simple loop control using standard branch instructions. Except for a new syntax for the lines of assembly language code, this material will seem somewhat familiar if you have programmed for other systems in assembly language.

Later in this chapter, we shall show how the all-pervasive use of predication with Itanium instructions can entirely eliminate many occurrences of branch instructions that would otherwise be necessary with other architectures.
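As a preview, the idea of replacing a branch with predication can be modeled in C. This is only a sketch under stated assumptions. A single compare is assumed to yield a predicate and its complement. Each guarded "instruction" commits its result only when its predicate is 1. The helper names and the masking trick are illustrative and are not Itanium code; the actual instruction forms are taken up later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Branching form of: if (a == b) x = x + 1; else x = x - 1; */
int64_t with_branch(int64_t a, int64_t b, int64_t x) {
    if (a == b)
        return x + 1;
    return x - 1;
}

/* Commit new_val only if the guarding predicate is 1; otherwise keep old_val.
   The mask keeps this helper free of conditional jumps. */
static uint64_t guarded(bool p, uint64_t new_val, uint64_t old_val) {
    uint64_t mask = (uint64_t)0 - (uint64_t)p;  /* all ones if p, else zero */
    return (new_val & mask) | (old_val & ~mask);
}

/* Predicated form: both guarded "instructions" are issued in sequence,
   and only the one whose predicate is 1 changes the result. */
int64_t with_predicates(int64_t a, int64_t b, int64_t x) {
    bool p1 = (a == b);  /* guards the "then" path */
    bool p2 = !p1;       /* guards the "else" path */
    uint64_t r = (uint64_t)x;
    r = guarded(p1, (uint64_t)x + 1u, r);  /* (p1) x = x + 1 */
    r = guarded(p2, (uint64_t)x - 1u, r);  /* (p2) x = x - 1 */
    return (int64_t)r;
}
```

Both versions compute the same result. The predicated one, however, is a straight-line sequence with no jump to a label.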