5.2 Integer Compare Instructions | Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles

Comparisons between pairs of data values lie close to the heart of all truly useful CPU operations, from a programmer's perspective. The outcome of a comparison is a Boolean true or false condition that can be used to select alternative sequences of code at a juncture in an algorithm where a choice or branching is needed.

The relationship between comparisons and the concept of predication is very tight. For one outcome (true or false), we want a certain set of actions to occur; for the other outcome (false or true), we want a different set of actions to occur. That is, we want to predicate one set of actions upon one premise (true or false), while we want to predicate the other set of actions upon the opposite premise (false or true). In order to achieve this in the linear world of stored machine instructions, we need to direct the subsequent flow of control down one path for one alternative and down a second path for the other.

The Itanium architecture supports predication more thoroughly than any previous architecture. Its approach to predication is to capture the Boolean result from a comparison in a pair of predicate registers. In general, the first of the pair of predicate registers will be set to 1 and the second to 0 for one outcome, while for the logically opposite outcome, the first predicate register will instead be set to 0 and the second to 1. In special cases where only one predicate register is essential, the permanently true predicate register Pr₀ can be substituted as a placeholder.

The Itanium ISA provides a full complement of instructions that compare pairs of 32- or 64-bit integers, or pairs of double-precision floating-point numbers; these instructions set a pair of predicate registers as just outlined. The Itanium ISA also provides parallel forms of compare instructions that make multiple simultaneous comparisons of data elements of various smaller widths and types packed into single registers; these instructions capture the multiple independent Boolean true or false outcomes into a regular (Gr or Fr) register.

We shall proceed to discuss the features of Itanium integer compare and branch instructions that would be most similar to such instructions in other architectures, leading to the ability to present sample programs containing traditional loop structures. Later, we will come back to aspects of compare instructions, branch instructions, and loop control that are very specific to the Itanium architecture.

5.2.1 Signed Comparison and Equality

Almost all modern computer architectures provide not only for detecting whether two integer quantities are equal, but also for determining how the two quantities actually differ. In algebraic notation, there are six useful cases: equal (=), not equal (!=), less than (<), less than or equal (<=), greater than or equal (>=), and greater than (>).

The Itanium ISA supports a dual set of signed comparison instructions, cmp to compare 64-bit quad words and cmp4 to compare 32-bit double words. From the discussion of integer load instructions in Section 4.5.3, you should appreciate the tie-in between having the ld4 instruction perform zero-extension and the desirability of having compare instructions that are specifically sensitive to the rightmost 32 bits without needing to perform sign-extension. The designers of the Itanium architecture anticipated the continued importance of 32-bit data during the migration to 64 bits.
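The value of a dedicated 32-bit compare can be sketched in C. This is only a model under stated assumptions: the function names `cmp_lt_model` and `cmp4_lt_model` are hypothetical, the 64-bit argument stands in for a general register, and the example assumes the register holds a 32-bit value that was zero-extended on load, as the text describes for ld4.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit signed "less than":
   all 64 bits of each register take part in the comparison. */
bool cmp_lt_model(uint64_t r2, uint64_t r3) {
    /* Conversion to int64_t relies on the usual two's complement behavior. */
    return (int64_t)r2 < (int64_t)r3;
}

/* Hypothetical model of a 32-bit signed "less than":
   only the rightmost 32 bits of each register take part. */
bool cmp4_lt_model(uint64_t r2, uint64_t r3) {
    return (int32_t)(uint32_t)r2 < (int32_t)(uint32_t)r3;
}
```

With the 32-bit value -1 zero-extended into a register (bit pattern 0x00000000FFFFFFFF), the 64-bit model sees a large positive number, while the 32-bit model still sees -1 without any separate sign-extension step.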

While here we describe the syntax and behavior of cmp, the 64-bit signed comparison, nearly everything in this section applies equally well to cmp4, the 32-bit signed comparison. There are several forms:

 cmp.crel.ctype p1,p2=r2,r3     // two registers
 cmp.crel.ctype p1,p2=imm8,r3   // immed and one register
 cmp.crel.ctype p1,p2=r0,r3     // test 0 versus register
 cmp.crel.ctype p1,p2=r3,r0     // test register versus 0

where two predicate registers (Pr₀ through Pr₆₃) must always be specified, though Pr₀ (permanently 1) may be used in either position. In the most common use of these instructions, they can be read in pseudo-English from left to right.

The two-register form, for instance, sets p1 true and p2 false if r2 crel r3 is true given the current contents in the two general registers; otherwise, it sets p1 false and p2 true.
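The behavior of the two-register form can be modeled in a few lines of C. This is a minimal sketch, not a definition of the hardware: the type names `pred_pair` and `crel_model` and the function `cmp_model` are hypothetical, and only the standard comparison (no ctype completer) is modeled. The special role of Pr₀ as a target is not represented.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two predicate registers written by a compare. */
typedef struct {
    bool p1;  /* receives the truth value of "r2 crel r3" */
    bool p2;  /* receives the logical opposite */
} pred_pair;

/* Hypothetical names for the six signed relations. */
typedef enum { REL_EQ, REL_NE, REL_LT, REL_LE, REL_GE, REL_GT } crel_model;

/* Models a standard signed 64-bit compare of two registers. */
pred_pair cmp_model(crel_model rel, int64_t r2, int64_t r3) {
    bool t = false;
    switch (rel) {
    case REL_EQ: t = (r2 == r3); break;
    case REL_NE: t = (r2 != r3); break;
    case REL_LT: t = (r2 <  r3); break;
    case REL_LE: t = (r2 <= r3); break;
    case REL_GE: t = (r2 >= r3); break;
    case REL_GT: t = (r2 >  r3); break;
    }
    pred_pair out = { t, !t };  /* the pair is always complementary */
    return out;
}
```

In this model, `cmp_model(REL_LT, -5, 3)` yields p1 true and p2 false, matching the left-to-right reading "p1, p2 receive the outcome of r2 less than r3."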

The conditional relation codes for crel (the comparison relationship completer) for the compare instructions are obvious and easy to remember: eq, ne, lt, le, ge, and gt in the same order as above. Notice that when two registers are involved, there is complete symmetry in which quantity is to the left and which is to the right in the comparison. From a programmer's perspective, it is very convenient to have this complete set of signed comparisons.

As an interesting aside, these six comparisons can be collapsed into variations of just two fundamental operations at the digital logic level. First, the assembler can interchange the programmer's order of p1 and p2, thus only requiring that the machine hardware be built to report either equality or inequality. The four other cases can be similarly handled; for instance, a determination for a <= b is equivalent to a determination for b < a with reversal of the assignment of the two predicates. When a is an immediate operand, the assembler would instead replace it with a test on (a - 1) < b with no reversal of the assignment of the two predicates.
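The two rewritings of a <= b described above can be checked with a small C sketch. The function names are hypothetical, and the sketch assumes a single "less than" primitive; the immediate is given an 8-bit signed type to mirror the imm8 operand, so that subtracting 1 cannot overflow in 64-bit arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* The one ordering primitive assumed to exist in this sketch. */
bool lt_prim(int64_t a, int64_t b) {
    return a < b;
}

/* Register form of a <= b: evaluate b < a, then swap the roles of the
   two predicates, which amounts to complementing the outcome. */
bool le_via_swap(int64_t a, int64_t b) {
    return !lt_prim(b, a);
}

/* Immediate form of a <= b: evaluate (a - 1) < b with no swap.
   The widening to int64_t happens before the subtraction. */
bool le_via_imm(int8_t imm, int64_t b) {
    return lt_prim((int64_t)imm - 1, b);
}
```

Both functions agree with the direct test a <= b for every operand pair in range, which is why the full set of six relations costs the programmer nothing even if the hardware implements fewer.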

In two's complement representation, simply testing the value of a single integer against zero is sufficient to determine whether the integer is negative, zero, or positive. An architecture may or may not give such a test an opcode distinct from a general compare instruction. The Itanium ISA neatly takes advantage of this by using register r0, which always contains the value 0, in one of the positions of the compare instruction. (We defer discussing additional "parallel" capabilities of these special forms with zero.)
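As a brief illustration, the three-way classification against zero might be modeled as follows. The function name `sign_class` is hypothetical, and the constant `r0` merely stands in for the register that always reads as 0.

```c
#include <stdint.h>

/* Returns -1, 0, or +1 according to whether r3 is negative, zero,
   or positive, using only comparisons against a zero operand. */
int sign_class(int64_t r3) {
    const int64_t r0 = 0;      /* stands in for register r0 */
    if (r3 < r0)  return -1;   /* like a "lt" compare of r3 versus r0 */
    if (r3 == r0) return 0;    /* like an "eq" compare of r3 versus r0 */
    return 1;                  /* otherwise r3 is greater than zero */
}
```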

Eight choices are offered for ctype (the comparison type completer). The null choice of none at all gives a standard comparison as just described. The choice of unc for an unconditional comparison is described in Section 5.7.1. The remaining six choices (or, and, or.andcm, orcm, andcm, and.orcm) specify so-called parallel inequality forms of comparison that we discuss later.

5.2.2 Unsigned Comparison

In the current context of compare instructions, it is useful to return to the distinction made in Chapter 1 between representations of signed and unsigned numbers. Binary computers compare operands at the bit-pattern level. The comparisons for equality (eq) and inequality (ne) in Section 5.2.1 will also work in the context of unsigned integer data. If two bit patterns match at all bit positions, the two unsigned quantities represented are equal, while if the two bit patterns differ at one or more bit positions, the two unsigned quantities represented are not equal.

In this section, we explore inequality for unsigned quantities. We cannot use the same machine instructions as for signed integers. Consider two quantities and their bit patterns, a = 110 and b = 011. If these are 3-bit unsigned representations, then all instructions that compare them should report results consistent with the inequality a > b, in the sense that 6 > 3. But if instead these are 3-bit two's complement representations, then all instructions that compare them should report results consistent with the inequality a < b, in the sense that -2 < +3. Notice that this is not a simple logical reversal of the unsigned comparison result a > b; that would have been a <= b.
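The 3-bit example can be reproduced in C. This is a sketch with hypothetical helper names; it simply decodes the same bit pattern under the two interpretations so that the ordering of a and b can be compared.

```c
/* Value of a 3-bit pattern read as an unsigned number (0 through 7). */
int as_unsigned3(unsigned bits) {
    return (int)(bits & 0x7u);
}

/* Value of a 3-bit pattern read as two's complement (-4 through +3):
   when the top bit (weight 4) is set, the value is the pattern minus 8. */
int as_signed3(unsigned bits) {
    int v = (int)(bits & 0x7u);
    return (v & 0x4) ? v - 8 : v;
}
```

For a = 110 (0x6) and b = 011 (0x3), `as_unsigned3` gives 6 and 3, so a > b, while `as_signed3` gives -2 and 3, so a < b. The bit patterns are identical in both cases; only the interpretation changes the answer.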

Machine instructions for any architecture cannot guess which representational semantics we intend. Therefore, a second set of compare operations is required to make the appropriate unsigned comparisons, and most architectures offer one. In algebraic notation, there are four useful cases this time, since equality and inequality are already covered: less than (<), less than or equal (<=), greater than or equal (>=), and greater than (>).

The Itanium ISA uses a dual set of comparison instructions (cmp and cmp4) for unsigned as for signed data, with the same choices of ctype (the comparison type), but with different choices of crel (the comparison relationship completer). The comparison relationships are crel = ltu, leu, geu, gtu in the same order as above. It is very convenient, from a programmer's perspective, to have a complete set of unsigned comparisons.

Markstein has illustrated the usefulness of unsigned comparisons in the context of coding multiprecision arithmetic routines, e.g., detecting carry conditions when adding integers that are wider than 64 bits.
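The carry-detection idea can be sketched generically in C. This is not Markstein's code; it is a common technique shown under stated assumptions, with the hypothetical type `u128_model` representing a 128-bit integer as two 64-bit halves. The key step is the unsigned comparison: after a wrapping addition, a carry occurred exactly when the sum is less than one of the addends.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned integer held as two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_model;

/* Adds two 128-bit values; any carry out of the high half is discarded. */
u128_model add128(u128_model a, u128_model b) {
    u128_model s;
    s.lo = a.lo + b.lo;               /* wraps modulo 2^64 */
    uint64_t carry = (s.lo < a.lo);   /* unsigned compare detects the wrap */
    s.hi = a.hi + b.hi + carry;       /* propagate the carry upward */
    return s;
}
```

A signed comparison would give the wrong answer here whenever the low halves have their top bit set, which is why the unsigned relations (such as ltu) are the natural fit.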

We draw your attention to two common instances of unsigned data where you should be careful to use unsigned instead of signed comparisons. First, any table of arbitrarily assigned codes like ASCII or Unicode character sets (Section 2.5.3) has a sorting order that will almost always be an unsigned counting sequence. Second, memory addresses are intrinsically unsigned. Always use the unsigned forms of compare instruction for addresses.
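Both cautions can be illustrated with a short C sketch. The function names are hypothetical, 8-bit codes are assumed for the character case, and a 64-bit unsigned integer stands in for an address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Orders two 8-bit character codes as unsigned counting values. */
bool code_before(uint8_t a, uint8_t b) {
    return a < b;
}

/* The same bit patterns misread as signed: any code of 0x80 or above
   becomes negative and sorts ahead of every ASCII character. */
bool code_before_signed_misuse(uint8_t a, uint8_t b) {
    return (int8_t)a < (int8_t)b;
}

/* Orders two addresses as unsigned quantities. */
bool addr_below(uint64_t a, uint64_t b) {
    return a < b;
}

/* The same addresses misread as signed: an address with its top bit set
   appears negative and so appears to lie below every low address. */
bool addr_below_signed_misuse(uint64_t a, uint64_t b) {
    return (int64_t)a < (int64_t)b;
}
```

For the codes 0x41 and 0xE9, the unsigned comparison places 0x41 first, while the signed misreading places 0xE9 first. Likewise, address 0x1000 lies below 0x8000000000000000 only under the unsigned comparison.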