4.8 Itanium Addressing Modes

The topic of addressing modes (the means of specifying locations) is much simpler for an EPIC or RISC architecture than for CISC architectures. We have split our discussion of addressing modes into two parts: this section discusses Itanium addressing modes, while the next section explores addressing modes that are used in other computer architectures.

Most of the Itanium instructions that operate on integer or Boolean data employ the same immediate and register direct addressing modes as the arithmetic instructions. A good way to understand addressing modes is always to seek an answer to the question, "Where, and through what means, can the data be found?"

4.8.1 Immediate Addressing

With immediate addressing, the instruction itself contains the operand data. For the Itanium ISA, no additional address calculation or fetch from memory is required, as the immediate operand is already in the CPU. The size of the immediate operand varies widely from one Itanium instruction to another, depending on use and available space.

An immediate operand is part of the .text section of a program, which is treated as a read-only memory segment by the operating system. Immediate addressing is almost exclusively used for numerical constants whose values are known at the time of program assembly or compilation. Furthermore, immediate addressing makes sense only for source, and not destination, operands.

4.8.2 Register Direct Addressing

In direct addressing, the instruction contains an address that points to the data. For the Itanium architecture and certain others, this form of addressing is restricted to register direct addressing. Bits within the instruction specify the number or name (i.e., the address) of the register Rx that contains the data.

EPIC and RISC architectures require that virtually all computations take place on data already in registers. Register direct addressing predominates as you look through Itanium assembly language programs. As EPIC and RISC architectures have rather large numbers of registers, numerous bits within the typical instruction are utilized as register specifiers (Section 4.1.2).

Years ago, when computers had very little memory, some architectures could also specify full memory addresses within their instruction words. That form of addressing is known as memory direct addressing.

4.8.3 Register Indirect Addressing

When a register contains an address pointer to the actual data, the process of accessing such data is referred to as indirect addressing. For many assemblers, the syntax for indirect addressing involves simply putting the name of the register in parentheses, (Rx), while Itanium assemblers expect square brackets, [Rx].

A bit field within the instruction contains the register number, x. During execution, when the operand is required, the contents of register x are sent to memory and interpreted as the effective address of the information unit containing the data. In EPIC or RISC designs, this addressing mode is more strictly called register indirect addressing.

Most CISC architectures support a variety of two-phase addressing in which the address pointer is found in some information unit in memory rather than in a register, i.e., memory indirect addressing. Since the speed of memory access has not kept up with the speed of central processors, memory indirect addressing has largely fallen out of use.

4.8.4 Autoincrement Addressing

Many computer architectures have offered a variation of register indirect addressing that first uses a register as an address pointer to the actual data and then adjusts the value (address) in that register so that it points to the next identically sized element. We refer to this process of accessing data as autoincrement addressing.

The Itanium architecture offers flexibility in postincrementing, not limiting it to the size of specific data elements. For both load and store operations, the desired postincrement amount can be expressed as a signed 9-bit immediate constant inside the instruction. For load operations, the postincrement amount can also be determined dynamically, using a value in a general register.

Note that the signed nature of Itanium postincrementing allows running through a list in memory in either direction, once a register has been pointed at the first or last element.
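The effect of a signed postincrement can be illustrated with a small model. The following is only a sketch, not Itanium code: the function `load_postinc`, the dictionary-based `memory` and `regs`, and the addresses are all hypothetical names and values chosen for illustration. The range check reflects the signed 9-bit immediate field (-256 through 255).

```python
MASK64 = (1 << 64) - 1  # registers are modeled as 64-bit unsigned values

def load_postinc(memory, regs, ptr, inc):
    """Model a load of the form rd=[ptr],inc (hypothetical helper).

    The value is read at the address currently in the pointer register;
    only afterward is the pointer adjusted by the signed amount inc.
    """
    if not -256 <= inc <= 255:
        raise ValueError("postincrement must fit in a signed 9-bit field")
    ea = regs[ptr]                      # effective address = old pointer value
    value = memory[ea]
    regs[ptr] = (ea + inc) & MASK64     # postincrement with 64-bit wraparound
    return value

# Illustrative data: three 8-byte elements at made-up addresses.
memory = {0x1000: 1, 0x1008: 4, 0x1010: 9}
regs = {"r14": 0x1000}

# Walk upward from the first element with a positive increment.
forward = [load_postinc(memory, regs, "r14", 8) for _ in range(3)]

# Point at the last element and walk downward with a negative increment.
regs["r14"] = 0x1010
backward = [load_postinc(memory, regs, "r14", -8) for _ in range(3)]

print(forward, backward)
```

The same pointer register serves both traversals; only the sign of the increment and the starting element differ.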

While some RISC architectures have offered autoincrement addressing, others, like the Alpha architecture, have not, because of a philosophy of optimizing the operations across the instruction set by trimming out all multifunctional instructions. The Itanium architecture follows a more temperate approach and includes certain highly useful multifunctional instructions if they can be implemented to execute as rapidly as all other instructions.

4.8.5 Summary of Itanium Addressing Modes

Table 4-4 recapitulates the four addressing modes that we have encountered for the Itanium ISA. As already mentioned, autoincrement addressing is implemented as a special case of register indirect addressing. Be sure that you comprehend the concept of effective address.

Table 4-4. Effective Addresses for Itanium Addressing Modes
Addressing Mode	Assembler Syntax	Effective Address
Immediate	`imm`	Various bits within the instruction are organized into an integer value, often signed, or into a subcode used internally to select cases of the instruction
Register Direct	`Rx`	The named register
Register Indirect	`[Rx]`	Contents of the named register
Postincrement	`[Rx], imm` or `[Rx], Ry`	Contents of the named pointer register; then the pointer register is postincremented by the signed quantity given statically as `imm` or dynamically in the register `Ry`
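
The distinctions in Table 4-4 can be made concrete with a toy model that answers the question "where is the data?" for each mode. This is a minimal sketch under simplifying assumptions: `fetch_operand`, the mode names, and the register and memory contents are hypothetical, and only source operands are modeled.

```python
MASK64 = (1 << 64) - 1

def fetch_operand(mode, regs, memory, rx=None, imm=None, inc=0):
    """Return the source operand selected by one of the four modes."""
    if mode == "immediate":
        return imm                       # data is encoded in the instruction
    if mode == "register_direct":
        return regs[rx]                  # data is the register's contents
    if mode == "register_indirect":
        return memory[regs[rx]]          # register holds the effective address
    if mode == "postincrement":
        ea = regs[rx]                    # use the pointer first ...
        regs[rx] = (ea + inc) & MASK64   # ... then adjust it
        return memory[ea]
    raise ValueError(f"unknown mode: {mode}")

# Made-up machine state for illustration.
regs = {"r14": 0x2000, "r20": 7}
memory = {0x2000: 11, 0x2008: 22}

results = [
    fetch_operand("immediate", regs, memory, imm=-56),
    fetch_operand("register_direct", regs, memory, rx="r20"),
    fetch_operand("register_indirect", regs, memory, rx="r14"),
    fetch_operand("postincrement", regs, memory, rx="r14", inc=8),
    fetch_operand("register_indirect", regs, memory, rx="r14"),
]
print(results)
```

The last two calls show the special-case relationship: the postincrement access returns the same datum as a plain register indirect access, but leaves the pointer at the next element.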

4.8.6 Addressing Details in Previous Programs

We can now benefit from a closer look at some of the addressing details in previous programs. First, we will look at SQUARES (Figure 1-3) using the gdb debugger in a Linux programming environment.

Addressing in SQUARES

We previously obtained a symbol table for SQUARES using the nm command (Figure 3-4). Only a few entries from that table are of interest here, which we rearranged into descending order by address:

 6000000000000968 A _end
 6000000000000798 ? _GLOBAL_OFFSET_TABLE_
 6000000000000770 d sq3
 6000000000000768 d sq2
 6000000000000760 d sq1
 6000000000000750 D __data_start
 4000000000000640 ? _fini
 4000000000000590 t done
 4000000000000520 t first
 4000000000000520 T main
 4000000000000400 T _start

We also demonstrated in Section 3.8.4 how to disassemble a binary program with the debugger. Here is an excerpt from SQUARES where the first computed square is being stored at sq1:

 0x4000000000000530 <main+16>:   [MMI]      mov r20=1;;
 0x4000000000000531 <main+17>:              addl r14=-56,r1
 0x4000000000000532 <main+18>:              nop.i 0x0;;
 0x4000000000000540 <main+32>:   [MMI]      st8 [r14]=r20;;
 0x4000000000000541 <main+33>:              add r21=r22,r21
 0x4000000000000542 <main+34>:              nop.i 0x0;;

Comparing this information with the assembly language source for SQUARES, we can infer that register r1 has been given the task of the global pointer, symbolically gp.

Now we are in a position to verify how the address of sq1 is computed at runtime from the value given to the global pointer when the program is run:

 L> gdb bin/squares
 [messages deleted here]
 (gdb) break done
 Breakpoint 1 at 0x4000000000000590
 (gdb) run
 Starting program: /home/user/bin/squares
 Breakpoint 1, 0x4000000000000590 in done ()
 (gdb) p/x $r1
 $2 = 0x6000000000000798
 (gdb) q
 The program is running.  Exit anyway? (y or n) y
 L>

Observe that immediate values in the disassembly are shown in decimal radix. We first convert decimal -56 in the instruction at <main+17> to its 64-bit two's complement form, 0xffff...ffc8, then add that to the value 0x6000...0798 just shown for the global pointer (r1), and obtain the address value 0x6000...0760, in agreement with the symbol table entry for sq1. You can also verify the addressing details for sq2 and sq3.
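
The hand calculation can be checked mechanically. The sketch below uses only the values printed above (the global pointer from gdb, the immediate -56 from the disassembly, and the nm addresses); the helper `to_u64` and the variable names are ours. The offsets for sq2 and sq3 are derived here from the symbol table and are not taken from a disassembly.

```python
MASK64 = (1 << 64) - 1

def to_u64(value):
    """Reduce a Python integer to its 64-bit two's complement bit pattern."""
    return value & MASK64

gp = 0x6000000000000798           # p/x $r1 in the gdb session
offset = -56                      # immediate of the addl at <main+17>

offset_bits = to_u64(offset)      # sign-extended form of -56
sq1 = to_u64(gp + offset_bits)    # 64-bit addition discards the carry out

print(hex(offset_bits))
print(hex(sq1))

# Offsets that sq2 and sq3 would need, derived from their nm addresses.
symbols = {"sq1": 0x6000000000000760,
           "sq2": 0x6000000000000768,
           "sq3": 0x6000000000000770}
offsets = {name: addr - gp for name, addr in symbols.items()}
print(offsets)
```

Adding the sign-extended bit pattern and discarding the carry gives the same result as subtracting 56 from the global pointer, which is why a negative immediate can reach data located below gp.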

Addressing in DOTPROD

We used a different method to address data in the DOTPROD program (Figure 4-5), using movl instructions that establish pointers to each vector.

We can use the nm command to obtain a symbol table for DOTPROD. Only a few entries, rearranged into descending order by address, concern us here:

 6000000000000978 A _end
 60000000000007a8 ? _GLOBAL_OFFSET_TABLE_
 600000000000077e d W
 6000000000000778 d V
 6000000000000770 d P
 6000000000000760 D __data_start
 4000000000000680 ? _fini
 40000000000005e0 t done
 4000000000000520 t first
 4000000000000520 T main
 4000000000000420 T _start

This time, we need to show that the movl instructions do indeed contain the correct addresses established at the end of the assembly and linking processes.

One way to examine those addresses would be to disassemble the program and recover the full 64-bit binary value from each two-slot movl instruction. Instead of reverse engineering each number, we can simply use the debugger:

 L> gdb bin/dotprod
 [messages deleted here]
 (gdb) break first
 Breakpoint 1 at 0x4000000000000520
 (gdb) run
 Starting program: /home/user/bin/dotprod
 Breakpoint 1, 0x4000000000000520 in main ()
 (gdb) stepi 6
 0x4000000000000550 in main ()
 (gdb) p/x $r14
 $1 = 0x6000000000000778
 (gdb) p/x $r15
 $2 = 0x600000000000077e
 (gdb) p/x $r16
 $3 = 0x6000000000000770
 (gdb) q
 The program is running.  Exit anyway? (y or n) y
 L>

We had to direct the debugger to step through six instructions because the three movl instructions take up three instruction bundles, and the gdb debugger counts two steps per bundle: one for the nop in the first slot and one for the movl, which occupies the remaining two slots. We see that the three pointer registers have been dynamically loaded with the addresses expected from our inspection of the symbol table entries.
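
The numbers in this session can be cross-checked with a few lines of arithmetic. The sketch below relies on the fact that an Itanium bundle is 128 bits (16 bytes) long; the assumption that the six steps span exactly three bundles is inferred from the session above, and the helper and variable names are ours.

```python
BUNDLE_BYTES = 16  # one 128-bit instruction bundle

def address_after_bundles(start, bundles):
    """Address of the bundle that follows a run of consecutive bundles."""
    return start + bundles * BUNDLE_BYTES

# Three bundles (one per movl) starting at main.
main = 0x4000000000000520
stop = address_after_bundles(main, 3)
print(hex(stop))

# Values printed by gdb, compared against the nm symbol table.
symbols = {"V": 0x6000000000000778,
           "W": 0x600000000000077e,
           "P": 0x6000000000000770}
registers = {"r14": 0x6000000000000778,
             "r15": 0x600000000000077e,
             "r16": 0x6000000000000770}
pointer_for = {"r14": "V", "r15": "W", "r16": "P"}

matches = {reg: registers[reg] == symbols[sym]
           for reg, sym in pointer_for.items()}
print(matches)
```

The computed stopping address agrees with the location reported after `stepi 6`, and each pointer register holds the address that nm lists for its vector.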

We have led you through these two examples in some detail in order to demystify the addressing details of programming as fully as possible. When working with your own programs, you should never hesitate to use techniques like these to resolve any ambiguities in your assessment of program behavior.

Addressing in the HP-UX environment

The cc_bundled command produces by default an executable program file in which addresses appear to be only 32 bits wide. For instance, a few lines selected and reordered from the output of the nm -x command for SQUARES compiled without the +DD64 option are as follows:

 [Index]    Value     Size       Type  Bind  O  Shndx   Name
 [90]     |0x400001b0|0x00000000|NOTYP|GLOB |0|   .sbss|__gp
 [65]     |0x40000010|0x00000008|OBJT |LOCAL|0|   .data|sq3
 [64]     |0x40000008|0x00000008|OBJT |LOCAL|0|   .data|sq2
 [63]     |0x40000000|0x00000008|OBJT |LOCAL|0|   .data|sq1
 [67]     |0x04000930|0x00000000|NOTYP|LOCAL|0|   .text|done
 [66]     |0x040008c0|0x00000000|NOTYP|LOCAL|0|   .text|first
 [98]     |0x040008c0|0x00000090|FUNC |GLOB |0|   .text|main
 [36]     |0x040008c0|0x00000000|SECT |LOCAL|0|   .text|.text

The 32-bit address 0x400001b0 corresponds to a 64-bit address 0x20000000400001b0 that can be shown to be in register r1 using the adb debugger. The two instruction bundles relevant to storing a value of sq1 can be found as follows:

 adb> main+0x10/2i
 main + 0x10:
       addl             r20=1,r0;;
       addl             r14=0xfffffffffffffe50,r1
       nop.i            0;;
       st8              [r14]=r20;;
       add              r21=r22,r21
       nop.i            0;;
 adb>

It is easy to show that the addl instruction yields a 64-bit value in register r14 that leads to the storage for sq1:

 adb> 0x20000000400001b0+0xfffffffffffffe50/jx
 0x2000000040000000:                 0x1
 adb>

We used the capability of adb to evaluate a hexadecimal expression, which we could also verify by hand.
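
The hand verification can also be run in the other direction, deriving the immediate that the addl instruction must carry from the nm values alone. This is a sketch using only numbers shown above; the variable names are ours.

```python
MASK64 = (1 << 64) - 1

gp_32 = 0x400001b0     # __gp as listed by nm -x
sq1_32 = 0x40000000    # sq1 as listed by nm -x

# Signed distance from the global pointer down to sq1.
offset = sq1_32 - gp_32

# The same distance as a 64-bit two's complement bit pattern,
# which is how adb displays the addl immediate.
imm = offset & MASK64
print(offset, hex(imm))

# Adding it to the 64-bit global pointer seen in r1 gives the
# 64-bit address of sq1 (the carry out of bit 63 is discarded).
gp_64 = 0x20000000400001b0
target = (gp_64 + imm) & MASK64
print(hex(target))
```

The derived immediate matches the constant shown in the adb disassembly, and the sum matches the address at which adb displayed the stored value 0x1.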

In the DOTPROD program, the correctness of the addressing for V, W, and P as set up in registers r14, r15, and r16 can similarly be shown by comparing nm and adb output:

 [Index]    Value     Size       Type  Bind  O  Shndx   Name
 [90]     |0x400001b0|0x00000000|NOTYP|GLOB |0|   .sbss|__gp
 [65]     |0x4000000e|0x00000002|OBJT |LOCAL|0|   .data|W
 [64]     |0x40000008|0x00000002|OBJT |LOCAL|0|   .data|V
 [63]     |0x40000000|0x00000008|OBJT |LOCAL|0|   .data|P
 [67]     |0x04000980|0x00000000|NOTYP|LOCAL|0|   .text|done
 [66]     |0x040008c0|0x00000000|NOTYP|LOCAL|0|   .text|first
 [97]     |0x040008c0|0x000000e0|FUNC |GLOB |0|   .text|main
 [36]     |0x040008c0|0x00000000|SECT |LOCAL|0|   .text|.text

 H> adb bin/dotprod
 adb> main:b
 adb> :r
 Process 9801 Thread 9918 Execed
 Breakpoint 1 set at address 0x40008c0
 main:
 >       nop.m            0
         movl             r14=0x2000000040000008;;
 Hit Breakpoint 1 at address 0x40008c0
 adb> ,6:s
 main + 0x20:
 >       nop.m            0
         movl             r16=0x2000000040000000;;
 main + 0x20:
         nop.m            0
 >         movl             r16=0x2000000040000000;;
 main + 0x30:
 >       addl             r20=0,r0;;
         ld2              r21=[r14],2
         nop.i            0;;
 adb> r
 gp    0x20000000400001b0  r2    0x200000007edcc1a0   r3    0x200000007ef41820
 r4    0                   r5    0xc000000000000408   r6    0x200000007eff6240
 r7    0x200000007efa02a8  r8    0x40008c0   r9    0x200000007ffffa50
 r10   0x40000000          r11   0x200000007ef9c5cc   sp    0x200000007ffffa00
 tp    0x200000007ed20000  r14   0x2000000040000008   r15   0x200000004000000e
 r16   0x2000000040000000  r17   0x7ffffd24   r18   0x7ffffbd8
 [remainder of register dump deleted]
 adb> q
 H>

We clearly see that the three registers that we chose for addressing have indeed been set up to point to the addresses of V, W, and P.