5.7 Other Structured Programming Constructs | ItaniumR Architecture for Programmers. Understanding 64-Bit Processors and EPIC Principles

Here we introduce unc (the unconditional compare type completer) for the Itanium compare instructions and outline, at the assembly language level, a few additional structured programming constructs used by high-level languages.

5.7.1 Unconditional Compare Instructions

We shall briefly discuss the unconditional form of Itanium integer compare instructions with the compare type completer ctype=unc, whose behavior is best understood by contrasting it with the ordinary form of a compare instruction with no completer. The unconditional compare instructions have familiar features:

 cmp.crel.unc p1,p2=r2,r3       // two registers cmp.crel.unc p1,p2=imm8,r3     // immed and one register cmp.crel.unc p1,p2=r0,r3       // test 0 versus register cmp.crel.unc p1,p2=r3,r0       // test register versus 0

where standard codes for crel (the compare relationship completer) are supported (Sections 5.2.1 and 5.2.2). The opcode cmp4 behaves similarly for 32-bit comparisons.

An ordinary compare instruction that is predicated true executes and sets both predicate registers according to some comparative test. If it is predicated false, it does not execute at all, and neither predicate register changes state from previously contained Boolean values.

An unconditional compare that is predicated true behaves exactly like a predicated true ordinary compare. If predicated false, it unconditionally sets both destination predicate registers to 0 (false) without ever actually performing a comparison. How could this be useful?

5.7.2 Nested If...Then...Else Structures

Suppose that we needed to devise an if…then…else construct where the THEN and/or ELSE block contains another if…then…else construct:

 <entrance from prior code> if [outer conditional relationship] then     if [THEN-block conditional relationship]     then         <do inner THEN block A>     else         <do inner ELSE block B> else     if [ELSE-block conditional relationship]     then         <do inner THEN block C>     else         <do inner ELSE block D> endif <exit to subsequent code>

Could this nested structure be programmed wholly with predication, and thus no branches?

A predicated unconditional comparison, if false, transfers its own predication onto both of its destination predicates. This can effectively be used to quash entire blocks of instructions that should not execute:

 <entrance from prior code> cmp.crel pt,pf = [outer conditional test] (pt) cmp.crel.unc pa,pb = [THEN-block conditional test] (pa) <do inner THEN block A> (pb) <do inner ELSE block B> (pf) cmp.crel.unc pc,pd = [ELSE-block conditional test] (pc) <do inner THEN block C> (pd) <do inner ELSE block D> <exit to subsequent code>

where Pr₀ (the permanently true register) can be substituted as a placeholder for any THEN or ELSE blocks that should be executed regardless.

Trace what happens. If pt is 1, the THEN-block unconditional compare executes and sets either pa or pb to 1 and the other to 0; block A or block B will execute. In this case pf will be 0 and thus both pc and pd are forced to 0 by the unconditional compare of the ELSE-block; neither block C nor block D can execute. Similar analysis for all other assumptions for the three comparisons will show that exactly one of A, B, C, or D will execute.

This entirely branch-free implementation may be highly useful for a situation where this nested conditional logic involves rather short code blocks for A through D. Keep in mind, however, that while all of the code blocks must be loaded and pass into execution units, only one of the four predicated code blocks (A, B, C, or D) will actually have an effect. If one or more of the blocks has appreciable length, different implementations that do contain branch instructions should be considered.

5.7.3 Multiway Branching

The Itanium architecture includes templates that accommodate one, two, or three branch instructions in one bundle. Three independent B-units can analyze and execute up to three branch instructions in parallel during the same clock cycle. How could that be useful?

We just demonstrated one programming technique that selects which one of four code blocks should have an effect (Section 5.7.2). Similarly, suppose that we have a programming application where various conditional tests determine which one of four blocks of code (A, B, C, or D) should execute next:

 <compare instructions to set predicates pb, pc, pd> (pb) br.cond.dpnt.few B      // 3 branch instructions (pc) br.cond.dpnt.few C      //  in one group (pd) br.cond.dpnt.few D      //   in a bundle                              // fall through to A A:     <instructions for A>     br.sptk.many JOIN        // skip ahead to JOIN B:     <instructions for B>     br.sptk.many JOIN        // skip ahead to JOIN C:     <instructions for C>     br.sptk.many JOIN        // skip ahead to JOIN D:     <instructions for D>                              // fall through to JOIN JOIN:

Notice that two cases (B, C) involve taking two branches in reaching the appropriate instruction block and going to JOIN.

Does it matter in which order the three branch instructions are slotted into the bundle? Not if the conditions defining the three predicates are mutually exclusive. Otherwise, if more than one of pb, pc, or pd could be simultaneously true, then the path taken is determined by the order of the branch instructions in the bundle.

Taking advantage of the zero latency of compare branch pairs, compare instructions can precede the branch instructions in the same group. The entire multiway branching construct would then execute in a single clock cycle, limited only by the availability of required execution unit types.

Other architectures may achieve equivalent multiway branching by having the compiler construct a jump table, perhaps in the form of an array of offsets, one of which can be added to the current address in the instruction pointer in order to produce a jump at runtime. The new value of the instruction pointer (IP) determines which block of instructions (A, B, C, or D) executes next.

5.7.4 Simple Case Structures

In the last two sections, we have, in fact, been discussing case structures. We now return to the view of case structures in high-level languages and consider their implementations in assembly language.

Many programming languages have offered one or another method for implementing a multiway branch. In classic FORTRAN this was the "computed" GOTO instruction, in which a set of possible values of a testable integer quantity could be associated with a set of destination addresses. In classic BASIC a "computed" GOSUB was also offered. The C programming language introduced a more versatile switch construct. Most current languages, such as the newer BASIC variants after which this pseudocode is patterned, can handle cases based on arbitrary test conditions:

 SELECT <selector variable or expression> CASE  10 to  7,  3      <what to do when selector (>= -10 AND <= -7) OR = -3> CASE +5      <what to do when selector = +5> CASE 0      <what to do when selector = 0> CASE ELSE      <what to do otherwise> END SELECT

All of the "what to do" segments may be arbitrarily complex instruction blocks. They may also be of null length (in order to help define ELSE by a process of elimination). The chief advantage of such case structures is the way they make obvious how to insert additional cases later when a program is being extended.

A compiler would first bring the value of the selector variable or evaluated expression into a general register. For each case, the compiler would need to perform enough comparisons that would resolve into a Boolean criterion (i.e., Itanium predicate corresponding to that case). When the condition for a case is rather complex, the Itanium ISA provides specifically designed versions of the compare instruction, which will be presented in Chapter 6.

Any of the methods previously shown can be used to organize the necessary predicated branch instructions. With the SELECT construct, the cases are mutually exclusive; thus the code for each must branch to a rejoining point below the END SELECT statement.

For the switch construct in C, a break statement indicates the end of the pertinent code block for each case. Consider a very simple illustration where a result R is computed in one of three ways, depending on the value of some selector W:

 switch (W) { case 1:     R = P - Q;     break; case 2:     R = P + Q;     break; case 4:     R = P;     break; }

In the absence of the break statement, program execution would flow into the block intended for the next case. That is sometimes quite useful, but not here because we said we would illustrate mutual exclusivity.

This simple case structure is amenable to a very compact implementation in Itanium assembly language using compare instructions and predication:

 cmp.eq     p1,p0 = 1,rW           // p1=1 if rW=1 cmp.eq     p2,p0 = 2,rW           // p2=1 if rW=2 cmp.eq     p4,p0 = 4,rW;;         // p4=1 if rW=4 (p1) sub   rR = rP,rQ             // R = P - Q (p2) add   rR = rP,rQ             // R = P + Q (p4) mov   rR = rP;;              // R = P                                   // R has been determined

where notation like rR means an assigned general register. Subject to availability of enough execution units of the correct types in the processor implementation, these instructions can coexist in two groups and potentially execute in two clock cycles.

The Itanium architecture permits several instructions to target the same register (rR) in one clock cycle if at most one of those instructions is predicated true. Without such mutual exclusivity for the cases, the assembly language would have to include more stops (;;).