The Basics of the CPU | Debugging Applications for Microsoft® .NET and Microsoft Windows® (Pro-Developer)


The Intel instruction set has been around for quite a while and has its roots in the 8086 CPU that Intel first released in 1978. In the days of MS-DOS and 16-bit Microsoft Windows, assembly language used to be a little quirky and hard to use because of the way the CPU handled memory, which was through 64-KB blocks of memory called segments. Fortunately, today on Microsoft Windows 98 and Microsoft Windows 2000, the CPU has direct access to the entire address space, which means that assembly language is much easier to deal with.

The assembly language that I'll be introducing in this chapter will be the basic 32-bit instruction set that is compatible across all x86 architecture CPUs from both Intel and Advanced Micro Devices (AMD). The advanced features on the Intel Pentiums, such as MMX, aren't generally an issue because Windows uses relatively few such features. I won't get into the real grungy parts of assembly-language instruction formats such as the ModR/M and SIB bytes, which both indicate ways to access memory. For the purposes of this chapter, memory access is memory access. I also won't be covering floating-point instructions. Operations on the Intel CPU floating-point unit (FPU) are similar to normal instructions. The main differences are that the FPU has its own set of registers and the floating-point instructions use a register stack-based architecture. If this chapter inspires you to learn more about the Intel family of CPUs—and I hope it does—you should download the three-volume "Intel Architecture Software Developer's Manual" Adobe PDF files from www.intel.com. Intel even offers the manuals in book form for free if you want to be cool and have them on your bookshelf.

One key point to remember is that the x86 CPUs are very flexible and provide you with many ways to carry out similar operations. Fortunately for us, the Microsoft compilers do a good job of picking the fastest way to do an operation and reusing that construct wherever applicable, so recognizing what a section of code is doing is easier. In this chapter, I'll cover the most commonly used instructions you'll see in assembly language. If you're interested in all the assembly-language instructions, you can consult the Intel manuals.

Registers

The first topic I want to cover is the registers. Because every bit of data that your application handles passes through the registers at one time or another, knowing the purpose of each register can help you recognize code gone awry. x86 CPUs have eight general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, and EBP), six segment registers (CS, DS, ES, SS, FS, and GS), an instruction pointer (EIP), and a flags register (EFLAGS). The CPU has other registers as well, such as the debug and machine control registers, but they are special-purpose registers and you won't encounter them in normal user-mode debugging. The general-purpose registers, which are listed in Table 6-1, are all 32-bit registers. Notice that some of the registers have mnemonics that access different portions of the complete 32-bit register. The only segment register of interest for this chapter is the FS register, which the operating system uses to reach the thread information block (TIB) that describes the currently executing thread. The other segment registers are used, but the operating system configures them in such a way that they are transparent to normal operation. The instruction pointer holds the address of the currently executing instruction.

Table 6-1 General-Purpose Registers

32-Bit Register	16-Bit Access	Low-Byte Access (bits 0-7)	High-Byte Access (bits 8-15)	Special Uses
EAX	AX	AL	AH	Integer function return values are stored here.
EBX	BX	BL	BH
ECX	CX	CL	CH	Loop instruction counters use this register for counting.
EDX	DX	DL	DH	The high 32 bits of 64-bit values are stored here.
ESI	SI			In memory move or compare instructions, the source address is stored here.
EDI	DI			In memory move or compare instructions, the destination address is stored here.
ESP	SP			The stack pointer. This register is changed implicitly when calling functions, returning from functions, making room on the stack for local variables, and cleaning up the stack.
EBP	BP			Base/frame pointer. This register holds the stack frame for a procedure.
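
To make the sub-register columns in Table 6-1 concrete, here is a minimal C++ sketch that models a register as a plain 32-bit integer. The RegisterViews type and the SplitRegister and WriteLowByte helpers are hypothetical names made up for this illustration; they aren't part of any debugger or Windows API.

```cpp
#include <cstdint>

// Hypothetical illustration type: the views Table 6-1 lists for a register
// such as EAX, modeled on a plain 32-bit value.
struct RegisterViews {
    std::uint32_t full;   // EAX: all 32 bits
    std::uint16_t low16;  // AX:  bits 0-15
    std::uint8_t  low8;   // AL:  bits 0-7
    std::uint8_t  high8;  // AH:  bits 8-15
};

// Splits a 32-bit register value into its 16-bit and 8-bit views.
inline RegisterViews SplitRegister(std::uint32_t value) {
    RegisterViews views;
    views.full  = value;
    views.low16 = static_cast<std::uint16_t>(value & 0xFFFFu);
    views.low8  = static_cast<std::uint8_t>(value & 0xFFu);
    views.high8 = static_cast<std::uint8_t>((value >> 8) & 0xFFu);
    return views;
}

// Models a write to AL: only bits 0-7 change; bits 8-31 keep their old value.
inline std::uint32_t WriteLowByte(std::uint32_t value, std::uint8_t newLowByte) {
    return (value & 0xFFFFFF00u) | newLowByte;
}
```

With EAX holding 0x12345678, SplitRegister gives AX = 0x5678, AL = 0x78, and AH = 0x56, and WriteLowByte(0x12345678, 0xFF) yields 0x123456FF.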

The flags register, EFLAGS, contains the status flags and the control flags. Various instructions set bits in EFLAGS to indicate the result of those instructions. For example, the ZF (Zero Flag) bit is set to 1 if the result of an instruction is 0. In Chapter 4, I described setting the CPU into single-step mode, which involved setting the TF (Trap Flag) in the EFLAGS register. Figure 6-1 shows the Registers window from the Visual C++ debugger. The Registers window displays the EFLAGS register as EFL. Notice that I'm not showing floating-point registers in the Registers window. You can hide the floating-point registers by right-clicking in the Registers window and unchecking Floating-Point Registers on the menu.

Figure 6-1 Visual C++ Registers window

Table 6-2 lists the flag values shown in the Registers window. The Visual C++ documentation doesn't mention what the flag values in the Registers window mean, so you might never have seen these values before. Unfortunately, the mnemonics Visual C++ uses for these flags don't correspond to the Intel mnemonics, so you'll have to translate when referring to the Intel documentation.

One minor problem with the Registers window is that, unlike the regular registers, which turn a different color when they change, the flags keep the same color when they update. You need to keep an eye on the particular flag you're interested in to see it change. Fortunately, you rarely need to look at the individual flag values. What I do to make spotting flag changes easier is to click the New File button and open a new scratch file. I then copy the existing flags from the Registers window and paste them into the scratch text window to compare the values before and after.

Table 6-2 Registers Window Flag Values

Registers Window Flag	Meaning	Intel Manual Mnemonic	Notes
OV	Overflow Flag	OF	Set to 1 if the operation resulted in a signed integer overflow or underflow.
UP	Direction Flag	DF	Set to 1 if string instructions are processed from highest address to lowest address (autodecrement). 0 means that string instructions are processed from lowest address to highest address (autoincrement).
EI	Interrupt Enable Flag	IF	Set to 1 if interrupts are enabled. This flag will always be 1 in a user-mode debugger.
PL	Sign Flag	SF	Reflects the most significant bit of an instruction result. Set to 0 for positive values, 1 for negative values.
ZR	Zero Flag	ZF	Set to 1 if the instruction result is 0. This flag is important for compare instructions.
AC	Auxiliary Carry Flag	AF	Set to 1 if an arithmetic operation generated a carry or a borrow out of bit 3 of the result. This flag is used for binary-coded decimal (BCD) arithmetic.
PE	Parity Flag	PF	Set to 1 if the least significant byte of the result contains an even number of bits set to 1.
CY	Carry Flag	CF	Set to 1 if an arithmetic operation generates a carry or a borrow out of the most significant bit of the result. Also set to 1 on an overflow condition for unsigned integer arithmetic.
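
The descriptions in Table 6-2 can be sketched in C++ for one specific case, a 32-bit ADD. This is only a rough model under stated assumptions: StatusFlags and FlagsAfterAdd are hypothetical names, the sketch covers just the six status flags, and other instructions affect the flags differently, so the Intel manuals remain the authority on the exact rules.

```cpp
#include <cstdint>

// Hypothetical illustration type: the six status flags from Table 6-2.
struct StatusFlags {
    bool carry;      // CY / CF
    bool parity;     // PE / PF
    bool auxiliary;  // AC / AF
    bool zero;       // ZR / ZF
    bool sign;       // PL / SF
    bool overflow;   // OV / OF
};

// Models the status flags after a 32-bit ADD of a and b.
inline StatusFlags FlagsAfterAdd(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t result = a + b;  // unsigned addition wraps at 32 bits

    // Count the 1 bits in the least significant byte for the parity flag.
    unsigned ones = 0;
    for (std::uint32_t bits = result & 0xFFu; bits != 0; bits >>= 1) {
        ones += bits & 1u;
    }

    StatusFlags flags;
    flags.carry     = result < a;                         // carry out of bit 31
    flags.parity    = (ones % 2) == 0;                    // even number of 1 bits
    flags.auxiliary = ((a ^ b ^ result) & 0x10u) != 0;    // carry out of bit 3
    flags.zero      = result == 0;
    flags.sign      = (result & 0x80000000u) != 0;        // most significant bit
    // Signed overflow: both inputs share a sign that the result doesn't have.
    flags.overflow  = (~(a ^ b) & (a ^ result) & 0x80000000u) != 0;
    return flags;
}
```

For example, FlagsAfterAdd(0x7FFFFFFF, 1) reports the overflow and sign flags set with the carry flag clear, while FlagsAfterAdd(0xFFFFFFFF, 1) reports the carry and zero flags set with the overflow flag clear. That difference is the signed versus unsigned distinction between OV and CY in the table.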

One important feature of the Registers window is that you can edit the values in it. Although the Registers window looks like a standard text window, such as the Output window, you can change the values in it if you put the cursor on the first number to the right of the equal sign for the register you want to change and type in your revision.

Instruction Format and Memory Addressing

The basic instruction format for the Intel CPUs is shown below. All instructions follow the same pattern.

[prefix] instruction [operands]

For the most part, you see prefixes only on some string instructions. (I'll cover the common situations in which string instructions use prefixes in the "String Manipulation" section later in the chapter.) The operands format, shown below, indicates the direction of the operation. The source goes into the destination, so read the operands from right to left.

Single-operand instructions: XXX source
Two-operand instructions: XXX destination, source

The source operand can be a register, a memory reference, or an immediate value—that is, a hard-coded value. The destination operand can be a register or a memory reference. The Intel CPUs don't allow both a source and a destination to be memory references.
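
The operand rules above can be written down as a small check. This is a sketch that models only the two rules just stated (the destination can't be an immediate value, and the source and destination can't both be memory references); OperandKind and IsValidOperandPair are hypothetical names for illustration, and individual instructions have further restrictions that the sketch ignores.

```cpp
// Hypothetical illustration type: the three kinds of operand described above.
enum class OperandKind { Register, Memory, Immediate };

// Returns true if a two-operand instruction could use this combination
// under the general rules in the text.
inline bool IsValidOperandPair(OperandKind destination, OperandKind source) {
    // An immediate is a hard-coded value, so nothing can be stored into it.
    if (destination == OperandKind::Immediate) {
        return false;
    }
    // The CPU doesn't allow both operands to be memory references.
    if (destination == OperandKind::Memory && source == OperandKind::Memory) {
        return false;
    }
    return true;
}
```

In these terms, an instruction of the form XXX EAX, [EBX] is a (Register, Memory) pair and passes the check, while a form with two bracketed operands is a (Memory, Memory) pair and fails it.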

Memory references are those operands that appear within brackets. For example, the memory reference [0040129Ah] means "get the value at memory location 0x0040129A." The h suffix is the assembly-language way of specifying a hexadecimal number. Using [0040129Ah] is the same as dereferencing a pointer to an integer in C with *pIVal. Memory references can also go through registers, as in [EAX], which means "get the memory at the address in EAX." Another common memory reference specifies an address by adding an offset to a register value: [EAX+0Ch] means "add 0xC to the value in EAX and get that memory." Some memory references become fairly complicated, such as [EAX+EBX*2], in which the address comes from a calculation involving several registers.
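
The arithmetic behind the bracketed forms above is simple enough to sketch in C++. EffectiveAddress is a hypothetical helper name, and the register values in the usage note are made up for illustration; the sketch assumes the 32-bit form base + index * scale + displacement, where the scale is 1, 2, 4, or 8.

```cpp
#include <cstdint>

// Hypothetical helper: computes the address a memory reference resolves to.
//   [EAX]        -> EffectiveAddress(eax, 0, 1, 0)
//   [EAX+0Ch]    -> EffectiveAddress(eax, 0, 1, 0x0C)
//   [EAX+EBX*2]  -> EffectiveAddress(eax, ebx, 2, 0)
inline std::uint32_t EffectiveAddress(std::uint32_t base,
                                      std::uint32_t index,
                                      std::uint32_t scale,
                                      std::uint32_t displacement) {
    // Unsigned arithmetic wraps at 32 bits, like a 32-bit address calculation.
    return base + index * scale + displacement;
}
```

If EAX held 0x00401000, [EAX+0Ch] would resolve to 0x0040100C. If EAX held 0x0012F980 and EBX held 4, [EAX+EBX*2] would resolve to 0x0012F988.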

To differentiate the sizes of memory references, you'll often see a memory reference preceded by a pointer size. The pointer sizes are shown as BYTE PTR, WORD PTR, and DWORD PTR for byte, word, and double-word references, respectively. You can think of these just as you think of a C++ cast. If the disassembly doesn't specify a pointer size, the size is a double word.
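
To see what the three pointer sizes mean for the same address, here is a minimal C++ sketch. ReadBytePtr, ReadWordPtr, and ReadDwordPtr are hypothetical helper names. They assemble the value one byte at a time in little-endian order, which is how x86 CPUs store multibyte values, so the sketch behaves the same on any host machine.

```cpp
#include <cstdint>

// BYTE PTR [p]: reads one byte.
inline std::uint8_t ReadBytePtr(const std::uint8_t* p) {
    return p[0];
}

// WORD PTR [p]: reads two bytes, lowest address is the low byte.
inline std::uint16_t ReadWordPtr(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// DWORD PTR [p]: reads four bytes, lowest address is the low byte.
inline std::uint32_t ReadDwordPtr(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Given the four bytes 78 56 34 12 in memory, the BYTE PTR read returns 0x78, the WORD PTR read returns 0x5678, and the DWORD PTR read returns 0x12345678.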

Sometimes an instruction's memory reference is straightforward and you can easily see the address for that memory. For example, a reference to [EBX] is just a reference to the memory at the address held in the EBX register, so you can simply pull up the Memory window and type in EBX to look at it. Other times, however, it isn't possible to figure out the memory reference without performing some complicated hexadecimal multiplication. Fortunately, the Registers window will show you what memory the instruction is about to reference.

Notice the line "0012F988 = 0012F9D4" at the bottom of Figure 6-1. That line is the effective address display. The current instruction, in this case at 0x5F42D8B8, is referencing the address 0x0012F988, the left-hand side of the line. The right-hand side of the line is the value at the 0x0012F988 memory location, 0x0012F9D4. Only those instructions that access memory will show the effective address in the Registers window. Because x86 CPUs allow only one of the operands to be a memory reference, just keeping an eye on the effective address display can show you what memory you're about to access and what the value is at that memory location.

If the memory access isn't valid, the CPU generates either a General Protection Fault (GPF) or a page fault. A GPF indicates that you're trying to access memory that you don't have access to. A page fault indicates that you're trying to access a memory location that doesn't exist. If you're looking at a line of assembly language that crashes, the part to look at is the memory reference. That will tell you which values were invalid. For example, if the memory reference is [EAX], you need to look at the value in EAX. If EAX holds an invalid address, you need to start scanning backward in the assembly-language listing to see what instruction set EAX to the invalid value. Keep in mind that you might need to go back several calls to find the instruction. I'll show you how to walk the stack manually in the section "The Memory Window and the Disassembly Window" later in the chapter.