6.5 CPU Memory Access

Most CPUs have two or three different ways to access memory. The most common memory addressing modes modern CPUs support are direct, indirect, and indexed . A few CPUs (like the 80x86) support additional addressing modes like scaled indexed , while some RISC CPUs only support indirect access to memory. Having additional memory addressing modes makes memory access more flexible. Sometimes a particular addressing mode can allow you to access data in a complex data structure with a single instruction, where two or more instructions would be required on a CPU without that addressing mode. Therefore, having a wide variety of ways to access memory is generally good as these complex addressing modes allow you to use fewer instructions.

It would seem that the 80x86 processor family (with many different typesof memory addressing modes) would be more efficient than a RISC processor that only supports a small number of addressing modes. In many respects, this is absolutely true; those RISC processors can often take three to five instructions to do what a single 80x86 instruction does. However, this does not mean that an 80x86 program will run three to five times faster. Don't forget that access to memory is very slow, usually requiring wait states. Whereas the 80x86 frequently accesses memory, RISC processors rarely do. Therefore, that RISC processor can probably execute the first four instructions, which do not access memory at all, while the single 80x86 instruction, which accesses memory, is spinning on some wait states. In the fifth instruction the RISC CPU might access memory and will incur wait states of its own. If both processors execute an average of one instruction per clock cycle and have to insert 30 wait states for a main memory access, we're talking about a difference of 31 clock cycles (80x86) versus 35 clock cycles (RISC), only about a 12 percent difference.

If an application must access slow memory, then choosing an appropriate addressing mode will often allow that application to compute the same result with fewer instructions and with fewer memory accesses, thus improving performance. Therefore, understanding how an application can use the different addressing modes a CPU provides is important if you want to write fast and compact code.

6.5.1 The Direct Memory Addressing Mode

The direct addressing mode encodes a variable's memory address as part of the actual machine instruction that accesses the variable. On the 80x86 for example, direct addresses are 32-bit values appended to the instruction's encoding. Generally, a program uses the direct addressing mode to access global static variables . Here's an example of the direct addressing mode in HLA assembly language:

 static      i:dword;           . . .      mov( eax, i ); // Store EAX's value into the i variable.

When accessing variables whose memory address is known prior to the execution of the program, the direct addressing mode is ideal. With a single instruction you can reference the memory location associated with the variable. On those CPUs that don't support a direct addressing mode, you may need an extra instruction (or more) to load a register with the variable's memory address prior to accessing that variable.

6.5.2 The Indirect Addressing Mode

The indirect addressing mode typically uses a register to hold a memory address (there are a few CPUs that use memory locations to hold the indirect address, but this form of indirect addressing is rare in modern CPUs).

There are a couple of advantages of the indirect addressing mode over the direct addressing mode. First, you can modify the value of an indirect address (the value being held in a register) at run time. Second, encoding which register specifies the indirect address takes far fewer bits than encoding a 32-bit (or 64-bit) direct address, so the instructions are smaller. One disadvantage of the indirect addressing mode is that it may take one or more instructions to load a register with an address before you can access that address.

Here is a typical example of a sequence in HLA assembly language that uses an 80x86 indirect addressing mode (brackets around the register name denote the use of indirect addressing):

 static      byteArray: byte[16];           . . .      lea( ebx, byteArray ); // Loads EBX register with the address                             // of byteArray.      mov( [ebx], al );      // Loads byteArray[0] into AL.      inc( ebx );            // Point EBX at the next byte in memory                             // (byteArray[1]).      mov( [ebx], ah );      // Loads byteArray[1] into AH.

The indirect addressing mode is useful for many operations, such as accessing objects referenced by a pointer variable.

6.5.3 The Indexed Addressing Mode

The indexed addressing mode combines the direct and indirect addressing modes into a single addressing mode. Specifically, the machine instructions using this addressing mode encode both an offset (direct address) and a register in the bits that make up the instruction. At run time, the CPU computes the sum of these two address components to create an effective address . This addressing mode is great for accessing array elements and for indirect access to objects like structures and records. Though the instruction encoding is usually larger than for the indirect addressing mode, the indexed addressing mode offers the advantage that you can specify an address directly within an instruction without having to use a separate instruction to load the address into a register.

Here is a typical example of a sequence in HLA that uses an 80x86 indexed addressing mode:

 static      byteArray: byte[16];           . . .      mov( 0, ebx );                  // Initialize an index into the array.      while( ebx < 16 ) do          mov( 0, byteArray[ebx] );   // Zeros out byteArray[ebx].          inc( ebx );                 // EBX := EBX +1, move on to the                                      // next array element.      endwhile;

The byteArray[ebx] instruction in this short program demonstrates the indexed addressing mode. The effective address is the address of the byteArray variable plus the current value in the EBX register.

To avoid wasting space encoding a 32-bit or 64-bit address into every instruction that uses an indexed addressing mode, many CPUs provide a shorter form that encodes an 8-bit or 16-bit offset as part of the instruction. When using this smaller form, the register provides the base address of the object in memory, and the offset provides a fixed displacement into that data structure in memory. This is useful, for example, when accessing fields of a record or structure in memory via a pointer to that structure. The earlier HLA example encodes the address of byteArray using a 4-byte address. Compare this with the following use of the indexed addressing mode:

 lea( ebx, byteArray ); // Loads the address of byteArray into EBX.      . . .  mov( al, [ebx+2] );   // Stores al into byteArray[2]

This last instruction encodes the displacement value using a single byte (rather than four bytes), hence the instruction is shorter and more efficient.

6.5.4 The Scaled Indexed Addressing Modes

The scaled indexed addressing mode, available on several CPUs, provides two facilities above and beyond the indexed addressing mode:

The ability to use two registers (plus an offset) to compute the effective address
The ability to multiply one of those two registers' values by a common constant (typically 1, 2, 4, or 8) prior to computing the effective address.

This addressing mode is especially useful for accessing elements of arrays whose element sizes match one of the scaling constants (see the discussion of arrays in the next chapter for the reasons).

The 80x86 provides a scaled index addressing mode that takes one of several forms, as shown in the following HLA statements:

 mov( [ebx+ecx*1], al );            // EBX is base address, ecx is index.  mov( wordArray[ecx*2], ax );       // wordArray is base address, ecx is index.  mov( dwordArray[ebx+ecx*4], eax ); // Effective address is combination                                     // of offset(dwordArray)+ebx+(ecx*4).