3.2 The 80x86 Addressing Modes

The 80x86 processors let you access memory in many different ways. Until now, you've only seen a single way to access a variable, the so-called displacement-only addressing mode. In this section you'll see some additional ways your programs can access memory using 80x86 memory addressing modes. The 80x86 memory addressing modes provide flexible access to memory, allowing you to easily access variables, arrays, records, pointers, and other complex data types. Mastery of the 80x86 addressing modes is the first step toward mastering 80x86 assembly language.

When Intel designed the original 8086 processor, it provided it with a flexible, though limited, set of memory addressing modes. Intel added several new addressing modes when it introduced the 80386 microprocessor while retaining all the modes of the previous processors. However, in 32-bit environments like Windows, BeOS, and Linux, these earlier addressing modes are not very useful; indeed, HLA doesn't even support the use of these older, 16-bit-only addressing modes. Fortunately, anything you can do with the older addressing modes can be done with the new addressing modes as well (even better, as a matter of fact). Therefore, you won't need to bother learning the old 16-bit addressing modes when writing code for today's high-performance operating systems. Do keep in mind, however, that if you intend to work under MS-DOS or some other 16-bit operating system, you will need to study up on those old addressing modes (see the 16-bit edition of this book on the accompanying CD-ROM for details).

3.2.1 80x86 Register Addressing Modes

Most 80x86 instructions can operate on the 80x86's general purpose register set. By specifying the name of the register as an operand to the instruction, you can access the contents of that register. Consider the 80x86 mov (move) instruction:

 mov( source, destination );

This instruction copies the data from the source operand to the destination operand. The 8-bit, 16-bit, and 32-bit registers are certainly valid operands for this instruction. The only restriction is that both operands must be the same size. Now let's look at some actual 80x86 mov instructions:

     mov( bx, ax );    // Copies the value from BX into AX
     mov( al, dl );    // Copies the value from AL into DL
     mov( edx, esi );  // Copies the value from EDX into ESI
     mov( bp, sp );    // Copies the value from BP into SP
     mov( cl, dh );    // Copies the value from CL into DH
     mov( ax, ax );    // Yes, this is legal!

The registers are the best place to keep variables. Instructions using the registers are shorter and faster than those that access memory. Of course, most computations require at least one register operand, so the register addressing mode is very popular in 80x86 assembly code. Throughout this chapter you'll see the abbreviated operands reg and r/m (register/memory) used wherever you may use one of the 80x86's general purpose registers.

3.2.2 80x86 32-Bit Memory Addressing Modes

The 80x86 provides hundreds of different ways to access memory. This may seem like quite a bit at first, but fortunately most of the addressing modes are simple variants of one another so they're very easy to learn. And learn them you should! The key to good assembly language programming is the proper use of memory addressing modes.

The addressing modes provided by the 80x86 family include displacement-only, base, displacement plus base, base plus indexed, and displacement plus base plus indexed. Variations on these five forms provide all the different addressing modes on the 80x86. See, from hundreds down to five. It's not so bad after all!

3.2.2.1 The Displacement-Only Addressing Mode

The most common addressing mode, and the one that's easiest to understand, is the displacement-only (or direct) addressing mode. The displacement-only addressing mode consists of a 32-bit constant that specifies the address of the target location. Assuming that variable J is an int8 variable appearing at address $8088, the instruction "mov( J, al );" loads the AL register with a copy of the byte at memory location $8088. Likewise, if int8 variable K is at address $1234 in memory, then the instruction "mov( dl, K );" stores the value in the DL register to memory location $1234 (see Figure 3-1).

Figure 3-1: Displacement-Only (Direct) Addressing Mode.

The displacement-only addressing mode is perfect for accessing simple scalar variables.

Intel named this the displacement-only addressing mode because a 32-bit constant (displacement) follows the mov opcode in memory. On the 80x86 processors, this displacement is an offset from the beginning of memory (that is, address zero). The examples in this chapter will often access bytes in memory. Don't forget, however, that you can also access words and double words on the 80x86 processors by specifying the address of their first byte (see Figure 3-2).

Figure 3-2: Accessing a Word or DWord Using the Displacement Only Addressing Mode.
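The following short program is a minimal sketch of displacement-only access to objects of each size. The variable names (byteVar, wordVar, dwordVar) and their initial values are hypothetical, chosen only for illustration; the program assumes the HLA Standard Library is available for output.

```hla
program DispOnlyDemo;
#include( "stdlib.hhf" );

static
    byteVar:  byte  := $12;          // hypothetical 8-bit variable
    wordVar:  word  := $3456;        // hypothetical 16-bit variable
    dwordVar: dword := $789A_BCDE;   // hypothetical 32-bit variable

begin DispOnlyDemo;

    // In each case the instruction encodes the variable's address as a
    // constant; the register size selects how many bytes are read.
    mov( byteVar, al );      // Reads one byte at byteVar's address.
    mov( wordVar, bx );      // Reads two bytes starting at wordVar's first byte.
    mov( dwordVar, ecx );    // Reads four bytes starting at dwordVar's first byte.

    // Display the three registers.
    stdout.put( "al=", al, " bx=", bx, " ecx=", ecx, nl );

end DispOnlyDemo;
```

Note that the operand sizes must agree: loading wordVar into AL or ECX, for example, would violate the same-size restriction on mov operands.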

3.2.2.2 The Register Indirect Addressing Modes

The 80x86 CPUs let you access memory indirectly through a register using the register indirect addressing modes. The term "indirect" means that the operand is not the actual address, but rather, the operand's value specifies the memory address to use. In the case of the register indirect addressing modes, the value the register holds is the address of the memory location to access. For example, the instruction "mov( eax, [ebx] );" tells the CPU to store EAX's value at the location whose address is in EBX (the square brackets around EBX tell HLA to use the register indirect addressing mode).

There are eight forms of this addressing mode on the 80x86; the following instructions are examples of these eight forms:

      mov( [eax], al );
      mov( [ebx], al );
      mov( [ecx], al );
      mov( [edx], al );
      mov( [edi], al );
      mov( [esi], al );
      mov( [ebp], al );
      mov( [esp], al );

These eight addressing modes reference the memory location at the offset found in the register enclosed by brackets (EAX, EBX, ECX, EDX, EDI, ESI, EBP, or ESP, respectively).

Note that the register indirect addressing modes require a 32-bit register. You cannot specify a 16-bit or 8-bit register when using an indirect addressing mode.^[1] Technically, you could load a 32-bit register with an arbitrary numeric value and access that location indirectly using the register indirect addressing mode:

      mov( $1234_5678, ebx );
      mov( [ebx], al );     // Attempts to access location $1234_5678.

Unfortunately (or fortunately, depending on how you look at it), this will probably cause the operating system to generate a protection fault because it's not always legal to access arbitrary memory locations. As it turns out, there are better ways to load the address of some object into a register; you'll see how to do this shortly.

The register indirect addressing mode has many uses. You can use it to access data referenced by a pointer, you can use it to step through array data, and, in general, you can use it whenever you need to modify the address of a variable while your program is running.

The register indirect addressing mode provides an example of an anonymous variable. When using the register indirect addressing mode you refer to the value of a variable by its numeric memory address (e.g., the value you load into a register) rather than by the name of the variable. Hence the phrase "anonymous variable."

HLA provides a simple operator that you can use to take the address of a static variable and put this address into a 32-bit register. This is the "&" (address of) operator (note that this is the same symbol that C/C++ uses for the address-of operator). The following example loads the address of variable J into EBX and then stores EAX's current value into J using the register indirect addressing mode:

     mov( &J, ebx );        // Load address of J into EBX.
     mov( eax, [ebx] );     // Store EAX into J.

Of course, it would have been easier to store EAX's value directly into J rather than using two instructions to do this indirectly. However, you can easily imagine a code sequence where the program loads one of several different addresses into EBX prior to the execution of the "mov( eax, [ebx] );" statement, thus storing EAX into one of several different locations depending on the execution path of the program.
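A sketch of such a sequence might look like the following. It assumes, for illustration only, that J and K are both int32 static variables (so the 32-bit store fits) and that a hypothetical boolean variable named useK selects the destination; none of these declarations come from the text above.

```hla
program IndirectDemo;
#include( "stdlib.hhf" );

static
    J:    int32   := 0;      // assumed to be 32 bits wide for this sketch
    K:    int32   := 0;
    useK: boolean := true;   // hypothetical flag that picks the destination

begin IndirectDemo;

    // Choose the destination address at run time.
    mov( &J, ebx );          // Default: EBX holds the address of J.
    if( useK ) then

        mov( &K, ebx );      // Alternate path: EBX holds the address of K.

    endif;

    // The same store instruction now writes to whichever variable
    // EBX points at.
    mov( 25, eax );
    mov( eax, [ebx] );

    stdout.put( "J = ", J, ", K = ", K, nl );

end IndirectDemo;
```

With useK initialized to true, the store lands in K and J is left unchanged; changing the initializer to false sends the same store to J instead.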

Caution

The "&" (address-of) operator is not a general address-of operator like the "&" operator in C/C++. You may only apply this operator to static variables.^[2] You cannot apply it to generic address expressions or other types of variables. Later, you will learn about the "load effective address" instruction that provides a general solution for obtaining the address of some variable in memory.

3.2.2.3 Indexed Addressing Modes

The indexed addressing modes use the following syntax:

     mov( VarName[ eax ], al );
     mov( VarName[ ebx ], al );
     mov( VarName[ ecx ], al );
     mov( VarName[ edx ], al );
     mov( VarName[ edi ], al );
     mov( VarName[ esi ], al );
     mov( VarName[ ebp ], al );
     mov( VarName[ esp ], al );

VarName is the name of some variable in your program.

The indexed addressing mode computes an effective address^[3] by adding the address of the variable to the value of the 32-bit register appearing inside the square brackets. Their sum is the actual address in memory the instruction accesses. So if VarName is at address $1100 in memory and EBX contains an eight, then "mov( VarName[ ebx ], al );" loads the byte at address $1108 into the AL register (see Figure 3-3).

Figure 3-3: Indexed Addressing Mode.

The indexed addressing mode is really handy for accessing elements of arrays. You will see how to use this addressing mode for that purpose a little later in this book.
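As a small preview, the following sketch sums the bytes of a four-element byte array by stepping EBX through the offsets 0 through 3. The array name ByteArray and its contents are hypothetical, and the program assumes the HLA Standard Library for output.

```hla
program IndexedDemo;
#include( "stdlib.hhf" );

static
    ByteArray: byte[ 4 ] := [ 10, 20, 30, 40 ];   // hypothetical data

begin IndexedDemo;

    mov( 0, eax );                       // EAX accumulates the sum.
    mov( 0, ebx );                       // EBX is the byte offset into ByteArray.
    while( ebx < 4 ) do

        // Effective address = address of ByteArray + EBX.
        movzx( ByteArray[ ebx ], ecx );  // Zero-extend the selected byte into ECX.
        add( ecx, eax );
        inc( ebx );                      // Advance to the next byte.

    endwhile;

    stdout.put( "sum = ", (type uns32 eax), nl );

end IndexedDemo;
```

Because each element is a single byte, the offset in EBX and the element number happen to coincide here; for larger elements the offset must be the element number times the element size.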

3.2.2.4 Variations on the Indexed Addressing Mode

There are two important syntactical variations of the indexed addressing mode. Both forms generate the same basic machine instructions, but their syntax suggests other uses for these variants.

The first variant uses the following syntax:

     mov( [ ebx + constant ], al );
     mov( [ ebx - constant ], al );

These examples use only the EBX register. However, you can use any of the other 32-bit general purpose registers in place of EBX. This addressing mode computes its effective address by adding the value in EBX to the specified constant, or subtracting the specified constant from EBX (see Figures 3-4 and 3-5).

Figure 3-4: Indexed Addressing Mode Using a Register Plus a Constant.

Figure 3-5: Indexed Addressing Mode Using a Register Minus a Constant.

This particular variant of the addressing mode is useful if a 32-bit register contains the base address of a multibyte object and you wish to access a memory location some number of bytes before or after that location. One important use of this addressing mode is accessing fields of a record (or structure) when you have a pointer to the record data. This addressing mode is also invaluable for accessing automatic (local) variables in procedures (see the chapter on procedures for more details).
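The following sketch illustrates both the plus and minus forms on a simple two-dword object. The name PairOfDwords and its values are hypothetical; a real program would more likely use a record declaration, which this sketch avoids to stay short.

```hla
program RegPlusConstDemo;
#include( "stdlib.hhf" );

static
    PairOfDwords: dword[ 2 ] := [ 3, 7 ];   // hypothetical two-field object

begin RegPlusConstDemo;

    mov( &PairOfDwords, ebx );   // EBX points at the first dword.
    mov( [ebx + 4], eax );       // Reads the dword 4 bytes after that address.

    add( 4, ebx );               // EBX now points at the second dword.
    mov( [ebx - 4], ecx );       // Reads the dword 4 bytes before that address.

    stdout.put
    (
        "second = ", (type uns32 eax),
        ", first = ", (type uns32 ecx), nl
    );

end RegPlusConstDemo;
```

In both cases the constant is encoded in the instruction itself, so EBX is not modified by the memory access.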

The second variant of the indexed addressing mode is actually a combination of the previous two forms. The syntax for this version is the following:

     mov( VarName[ ebx + constant ], al );
     mov( VarName[ ebx - constant ], al );

Once again, this example uses only the EBX register. You may, however, substitute any of the 32-bit general purpose registers in lieu of EBX in these two examples. This particular form is quite useful when accessing elements of an array of records (structures) in an assembly language program (more on that in the next chapter).

These instructions compute their effective address by adding the constant to (or subtracting it from) VarName's address and then adding the value in EBX to this result. Note that HLA, not the CPU, computes the sum or difference of VarName's address and constant. The actual machine instructions above contain a single constant value that the instructions add to the value in EBX at runtime. Because HLA substitutes a constant for VarName, it can reduce an instruction of the form

 mov( VarName[ ebx + constant], al );

to an instruction of the form

 mov( constant1[ ebx + constant2], al );

Because of the way these addressing modes work, this is semantically equivalent to

 mov( [ebx + (constant1 + constant2)], al );

HLA will add the two constants together at compile time, effectively producing the following instruction:

 mov( [ebx + constant_sum], al );

Of course, there is nothing special about subtraction. You can easily convert the addressing mode involving subtraction to addition by simply taking the two's complement of the 32-bit constant and then adding this complemented value (rather than subtracting the uncomplemented value).

3.2.2.5 Scaled Indexed Addressing Modes

The scaled indexed addressing modes are similar to the indexed addressing modes with two differences: (1) the scaled indexed addressing modes allow you to combine two registers plus a displacement, and (2) the scaled indexed addressing modes let you multiply the index register by a (scaling) factor of 1, 2, 4, or 8. The syntax for these addressing modes is

     VarName[ IndexReg₃₂*scale ]
     VarName[ IndexReg₃₂*scale + displacement ]
     VarName[ IndexReg₃₂*scale - displacement ]
     [ BaseReg₃₂ + IndexReg₃₂*scale ]
     [ BaseReg₃₂ + IndexReg₃₂*scale + displacement ]
     [ BaseReg₃₂ + IndexReg₃₂*scale - displacement ]
     VarName[ BaseReg₃₂ + IndexReg₃₂*scale ]
     VarName[ BaseReg₃₂ + IndexReg₃₂*scale + displacement ]
     VarName[ BaseReg₃₂ + IndexReg₃₂*scale - displacement ]

In these examples, BaseReg₃₂ represents any general purpose 32-bit register; IndexReg₃₂ represents any general purpose 32-bit register except ESP, and scale must be one of the constants: 1, 2, 4, or 8.

The primary difference between the scaled indexed addressing mode and the indexed addressing mode is the inclusion of the IndexReg₃₂*scale component. These modes compute the effective address by adding in the value of this new register multiplied by the specified scaling factor (see Figure 3-6 for an example involving EBX as the base register and ESI as the index register).

Figure 3-6: The Scaled Indexed Addressing Mode.

In Figure 3-6, suppose that EBX contains $100, ESI contains $20, and VarName is at base address $2000 in memory. Then the following instruction:

 mov( VarName[ ebx + esi*4 + 4 ], al );

will move the byte at address $2184 ($2000 + $100 + $20*4 + 4) into the AL register.

The scaled indexed addressing mode is useful for accessing elements of arrays whose elements are 2, 4, or 8 bytes each. This addressing mode is also useful for accessing elements of an array when you have a pointer to the beginning of the array.
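Both uses can be sketched as follows for an array of 4-byte elements. The array name DwordArray and its contents are hypothetical, and the program assumes the HLA Standard Library for output.

```hla
program ScaledIndexDemo;
#include( "stdlib.hhf" );

static
    DwordArray: dword[ 4 ] := [ 100, 200, 300, 400 ];   // hypothetical data

begin ScaledIndexDemo;

    mov( 2, esi );                     // ESI holds an element number, not a byte offset.

    // Named form: effective address = address of DwordArray + ESI*4.
    mov( DwordArray[ esi*4 ], eax );   // Reads element 2.

    // Pointer form: EBX holds the address of the start of the array.
    mov( &DwordArray, ebx );
    mov( [ebx + esi*4], ecx );         // Reads element 2 again, via the pointer.
    mov( [ebx + esi*4 + 4], edx );     // Displacement of 4 selects element 3.

    stdout.put
    (
        "eax = ", (type uns32 eax),
        ", ecx = ", (type uns32 ecx),
        ", edx = ", (type uns32 edx), nl
    );

end ScaledIndexDemo;
```

The scale factor lets the index register hold the element number directly, so the program does not need a separate multiply or shift to convert it to a byte offset.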

Caution

Although this addressing mode contains two variable components (the base and index registers), don't get the impression that you use this addressing mode to access elements of a two-dimensional array by loading the two array indices into the two registers. Two-dimensional array access is quite a bit more complicated than this. The next chapter will consider multidimensional array access and discuss how to do this.

3.2.2.6 Addressing Mode Wrap-Up

Well, believe it or not, you've just learned several hundred addressing modes! That wasn't hard now, was it? If you're wondering where all these modes came from, just consider the fact that the register indirect addressing mode isn't a single addressing mode, but eight different addressing modes (involving the eight different registers). Combinations of registers, constant sizes, and other factors multiply the number of possible addressing modes on the system. In fact, you need only memorize about two dozen forms and you've got it made. In practice, you'll use fewer than half the available addressing modes in any given program (and many addressing modes you may never use at all). So learning all these addressing modes is actually much easier than it sounds.

^[1]Actually, the 80x86 does support addressing modes involving certain 16-bit registers, as mentioned earlier. However, HLA does not support these modes and they are not useful under 32-bit operating systems.

^[2]Note: The term "static" here indicates a static, readonly, or storage object.

^[3]The effective address is the ultimate address in memory that an instruction will access, once all the address calculations are complete.