9.3 Executing Instructions, Step by Step

To write great code, you need to understand how a CPU executes individual machine instructions. To that end, let's consider four representative 80x86 instructions: mov, add, loop, and jnz (jump if not zero). By understanding these four instructions, you can get a good feel for how a CPU executes all the instructions in the instruction set.

The mov instruction copies the data from the source operand to the destination operand. The add instruction adds the value of its source operand to its destination operand. The loop and jnz instructions are conditional-jump instructions: they test some condition and then jump to some other instruction in memory if the condition is true, or continue with the next instruction if the condition is false. The jnz instruction tests a Boolean variable within the CPU known as the zero flag and either transfers control to the target instruction if the zero flag contains zero, or continues with the next instruction if the zero flag contains one. The program specifies the address of the target instruction (the instruction to jump to) by specifying the distance, in bytes, from the jnz instruction to the target instruction in memory. As the step list in section 9.3.3 shows, the CPU adds this distance to the address of the instruction that follows the jnz.

The loop instruction decrements the value of the ECX register and transfers control to a target instruction if ECX does not contain zero (after the decrement). This is a good example of a Complex Instruction Set Computer (CISC) instruction because it does multiple operations:

It subtracts one from ECX.
It does a conditional jump if ECX does not contain zero.

That is, loop is roughly equivalent to the following instruction sequence:

 sub( 1, ecx ); // On the 80x86, the sub instruction sets the zero flag
 jnz SomeLabel; // if the result of the subtraction is zero.
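To make the word "roughly" concrete, here is a minimal Python sketch of the two forms. The function names and tuple return values are illustrative assumptions, and ECX is modeled as a 32-bit unsigned value. The sketch highlights one real difference: on the 80x86, loop leaves the flags untouched, whereas sub updates them.

```python
MASK32 = 0xFFFFFFFF  # ECX is a 32-bit register, so arithmetic wraps at 2**32

def sub_then_jnz(ecx):
    """Model `sub( 1, ecx ); jnz SomeLabel;`.

    Returns (new_ecx, zero_flag, jump_taken).
    """
    ecx = (ecx - 1) & MASK32
    zero_flag = 1 if ecx == 0 else 0       # sub sets the zero flag on a zero result
    return ecx, zero_flag, zero_flag == 0  # jnz jumps when the zero flag is clear

def loop_instr(ecx, zero_flag):
    """Model `loop SomeLabel;`.

    Returns (new_ecx, zero_flag, jump_taken); the incoming zero flag
    passes through unchanged because loop does not modify the flags.
    """
    ecx = (ecx - 1) & MASK32
    return ecx, zero_flag, ecx != 0
```

Both functions make the same jump decision for any starting ECX value; they differ only in what happens to the zero flag.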

To execute the mov, add, jnz, and loop instructions, the CPU has to execute a number of different steps. Although each 80x86 CPU is different and doesn't necessarily execute the exact same steps, these CPUs do execute a similar sequence of operations. Each operation requires a finite amount of time to execute, and the time required to execute the entire instruction generally amounts to one clock cycle per operation, or stage, as these steps are usually called. The more stages an instruction needs, the slower it will run. Complex instructions generally run slower than simple instructions because they usually have more execution stages.
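The one-cycle-per-stage rule can be turned into a quick back-of-the-envelope calculation. The sketch below is a simplified model under that single assumption; the function name and the 100 MHz clock in the usage lines are arbitrary choices for illustration.

```python
def instruction_time_ns(stage_count, clock_mhz):
    """Return execution time in nanoseconds, assuming one clock cycle per stage."""
    cycle_time_ns = 1000.0 / clock_mhz  # one cycle at 1 MHz lasts 1000 ns
    return stage_count * cycle_time_ns

# Usage: a 5-stage instruction versus a 12-stage instruction at 100 MHz.
short_ns = instruction_time_ns(5, 100)
long_ns = instruction_time_ns(12, 100)
```

Under this model the 12-stage instruction takes 2.4 times as long as the 5-stage one, regardless of the clock frequency.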

9.3.1 The mov Instruction

Although each CPU is different and may run different steps when executing instructions, the 80x86 mov( srcReg,destReg ); instruction could use the following execution steps:

Fetch the instruction's opcode from memory.
Update the EIP (extended instruction pointer) register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the data from the source register (srcReg).
Store the fetched value into the destination register (destReg).
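The five steps above can be traced with a small Python model. The one-byte opcode layout used here (source register in bits 2-3, destination register in bits 0-1) and the register numbering are hypothetical; they are not the real 80x86 encoding.

```python
REG_NAMES = ["eax", "ebx", "ecx", "edx"]  # hypothetical 2-bit register numbering

def mov_reg_reg(memory, regs):
    """Run one toy `mov( srcReg, destReg );` and return the list of steps performed."""
    steps = []
    opcode = memory[regs["eip"]]
    steps.append("fetch opcode")
    regs["eip"] += 1
    steps.append("update EIP")
    src = REG_NAMES[(opcode >> 2) & 0b11]
    dest = REG_NAMES[opcode & 0b11]
    steps.append("decode opcode")
    value = regs[src]
    steps.append("fetch source register")
    regs[dest] = value
    steps.append("store into destination register")
    return steps

# Usage: toy opcode 0b0001 encodes src = eax (00) and dest = ebx (01).
regs = {"eip": 0, "eax": 42, "ebx": 0, "ecx": 0, "edx": 0}
steps = mov_reg_reg(bytes([0b0001]), regs)
```

After the call, EBX holds the value copied from EAX, EIP points at the byte after the opcode, and the steps list has five entries, one per clock cycle under the one-cycle-per-step assumption.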

The mov( srcReg,destMem ); instruction could use the following execution steps:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the displacement associated with the memory operand from the memory location immediately following the opcode.
Update EIP to point at the first byte beyond the operand that follows the opcode.
If the mov instruction uses a complex addressing mode (for example, the indexed addressing mode), compute the effective address of the destination memory location.
Fetch the data from srcReg.
Store the fetched value into the destination memory location.

Note that a mov( srcMem,destReg ); instruction is very similar; it simply swaps the register access and the memory access in these steps, fetching the data from the source memory location and storing it into destReg.
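The register-to-memory steps can be sketched the same way. The model below assumes a made-up instruction layout (one opcode byte followed by a 4-byte little-endian displacement), always stores EAX, and treats "indexed addressing" as displacement plus an index register. All names are illustrative.

```python
def mov_reg_mem(memory, regs, index_reg=None):
    """Toy `mov( srcReg, destMem );` that stores EAX; returns the cycle count."""
    cycles = 0
    _opcode = memory[regs["eip"]]          # fetch the opcode
    cycles += 1
    regs["eip"] += 1                       # EIP -> byte following the opcode
    cycles += 1
    cycles += 1                            # decode (the toy has only one opcode)
    start = regs["eip"]
    disp = int.from_bytes(memory[start:start + 4], "little")  # fetch displacement
    cycles += 1
    regs["eip"] += 4                       # EIP -> first byte beyond the operand
    cycles += 1
    address = disp
    if index_reg is not None:              # extra step for a complex addressing mode
        address = disp + regs[index_reg]
        cycles += 1
    value = regs["eax"]                    # fetch the data from the source register
    cycles += 1
    memory[address:address + 4] = value.to_bytes(4, "little")  # store to memory
    cycles += 1
    return cycles

# Usage: displacement 16, first direct, then indexed by EBX = 4.
memory = bytearray(32)
memory[0] = 0x01                           # hypothetical opcode value
memory[1:5] = (16).to_bytes(4, "little")
regs = {"eip": 0, "eax": 0x11223344, "ebx": 4}
direct_cycles = mov_reg_mem(memory, regs)
regs["eip"] = 0
indexed_cycles = mov_reg_mem(memory, regs, index_reg="ebx")
```

The direct store takes seven cycles and the indexed store eight, which matches the step list when the effective-address computation is skipped or performed.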

The mov( constant,destReg ); instruction could use the following execution steps:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the constant associated with the source operand from the memory location immediately following the opcode.
Update EIP to point at the first byte beyond the constant that follows the opcode.
Store the constant value into the destination register.

Assuming each step requires one clock cycle for execution, this sequence will require six clock cycles to execute.

The mov( constant,destMem ); instruction could use the following execution steps:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the displacement associated with the memory operand from the memory location immediately following the opcode.
Update EIP to point at the first byte beyond the operand that follows the opcode.
Fetch the constant operand's value from the memory location immediately following the displacement associated with the memory operand.
Update EIP to point at the first byte beyond the constant.
If the mov instruction uses a complex addressing mode (for example, the indexed addressing mode), compute the effective address of the destination memory location.
Store the constant value into the destination memory location.
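For comparison, the step counts of the four mov forms described in this section can be tallied in a small table. The counts below come from counting the steps in the lists above, with the optional effective-address step tracked separately. The dictionary and function names are illustrative, and the one-cycle-per-step assumption still applies.

```python
# form -> (steps without the effective-address step, has an optional address step)
MOV_FORMS = {
    "mov( srcReg, destReg )":   (5, False),
    "mov( srcReg, destMem )":   (7, True),
    "mov( constant, destReg )": (6, False),
    "mov( constant, destMem )": (8, True),
}

def mov_cycles(form, complex_addressing=False):
    """Return the cycle count for a mov form, assuming one cycle per step."""
    base, has_address_step = MOV_FORMS[form]
    extra = 1 if (has_address_step and complex_addressing) else 0
    return base + extra
```

Under these assumptions, mov( constant, destMem ) with an indexed addressing mode needs nine cycles, nearly twice as many as the register-to-register form.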

9.3.2 The add Instruction

The add instruction is a little more complex. Here's a typical set of operations that the add( srcReg,destReg ); instruction must complete:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the value of the source register and send it to the arithmetic logic unit (ALU), which handles arithmetic on the CPU.
Fetch the value of the destination register operand and send it to the ALU.
Instruct the ALU to add the values.
Store the result back into the destination register operand.
Update the flags register with the result of the addition operation.

Note: The flags register, also known as the condition-codes register or program-status word, is an array of Boolean variables in the CPU that tracks whether the previous instruction produced an overflow, a zero result, a negative result, or other such condition.
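The last three steps (add, store, update flags) can be illustrated with a short Python sketch of a 32-bit addition that also reports the condition flags. The function name and the dictionary keys are illustrative choices, and both inputs are assumed to be unsigned values in the range 0 to 2**32 - 1.

```python
def add32(dest, src):
    """Add two 32-bit values; return (result, flags) as an ALU might report them."""
    full = dest + src
    result = full & 0xFFFFFFFF            # keep only the low 32 bits
    flags = {
        "zero": int(result == 0),         # result was zero
        "sign": result >> 31,             # high bit set: negative as a signed value
        "carry": int(full > 0xFFFFFFFF),  # unsigned result did not fit in 32 bits
        # Signed overflow: both inputs share a sign bit that the result lacks.
        "overflow": (((dest ^ result) & (src ^ result)) >> 31) & 1,
    }
    return result, flags

# Usage: 1 + 0xFFFFFFFF wraps to zero; 0x7FFFFFFF + 1 overflows as a signed add.
wrapped, wrapped_flags = add32(1, 0xFFFFFFFF)
overflowed, overflowed_flags = add32(0x7FFFFFFF, 1)
```

A later jnz instruction would consult only the "zero" entry of this record.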

If the source operand is a memory location instead of a register, and the add instruction takes the form add( srcMem,destReg ); then the operation is slightly more complicated:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the displacement associated with the memory operand from the memory location immediately following the opcode.
Update EIP to point at the first byte beyond the operand that follows the opcode.
If the add instruction uses a complex addressing mode (for example, the indexed addressing mode), compute the effective address of the source memory location.
Fetch the source operand's data from memory and send it to the ALU.
Fetch the value of the destination register operand and send it to the ALU.
Instruct the ALU to add the values.
Store the result back into the destination register operand.
Update the flags register with the result of the addition operation.

If the source operand is a constant and the destination operand is a register, the add instruction takes the form add( constant,destReg ); and here is how the CPU might deal with it:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the constant operand that immediately follows the opcode in memory and send it to the ALU.
Update EIP to point at the first byte beyond the constant that follows the opcode.
Fetch the value of the destination register operand and send it to the ALU.
Instruct the ALU to add the values.
Store the result back into the destination register operand.
Update the flags register with the result of the addition operation.

This instruction sequence requires nine cycles to complete.

If the source operand is a constant, and the destination operand is a memory location, then the add instruction takes the form add( constant, destMem ); and the operation is slightly more complicated:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the byte following the opcode.
Decode the instruction's opcode to see what instruction it specifies.
Fetch the displacement associated with the memory operand from the memory location immediately following the opcode.
Update EIP to point at the first byte beyond the operand that follows the opcode.
If the add instruction uses a complex addressing mode (for example, the indexed addressing mode), compute the effective address of the destination memory location.
Fetch the constant operand that immediately follows the memory operand's displacement value and send it to the ALU.
Fetch the destination operand's data from memory and send it to the ALU.
Update EIP to point at the first byte beyond the constant that follows the memory operand.
Instruct the ALU to add the values.
Store the result back into the destination memory operand.
Update the flags register with the result of the addition operation.

This instruction sequence requires 11 or 12 cycles to complete, depending on whether the effective address computation is necessary.
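The 11-versus-12 cycle count can be checked with a toy read-modify-write model. The instruction layout assumed here (one opcode byte, a 4-byte displacement, then a 4-byte constant, all little-endian) is a simplification for illustration, as are the function name and the choice to report only the zero flag.

```python
def add_const_mem(memory, regs, index_reg=None):
    """Toy `add( constant, destMem );`; returns (cycles, zero_flag)."""
    eip = regs["eip"]
    cycles = 3                    # fetch opcode, update EIP, decode
    eip += 1
    disp = int.from_bytes(memory[eip:eip + 4], "little")
    eip += 4
    cycles += 2                   # fetch displacement, update EIP
    address = disp
    if index_reg is not None:
        address += regs[index_reg]
        cycles += 1               # effective-address computation
    constant = int.from_bytes(memory[eip:eip + 4], "little")
    old = int.from_bytes(memory[address:address + 4], "little")
    eip += 4
    cycles += 3                   # fetch constant, fetch memory operand, update EIP
    result = (old + constant) & 0xFFFFFFFF
    memory[address:address + 4] = result.to_bytes(4, "little")
    zero_flag = int(result == 0)
    cycles += 3                   # add, store result, update flags
    regs["eip"] = eip
    return cycles, zero_flag

# Usage: add 5 to the dword at address 16, then to the dword at 16 + ESI.
memory = bytearray(32)
memory[0] = 0x01                                  # hypothetical opcode value
memory[1:5] = (16).to_bytes(4, "little")          # displacement
memory[5:9] = (5).to_bytes(4, "little")           # constant
memory[16:20] = (10).to_bytes(4, "little")
memory[20:24] = (0xFFFFFFFB).to_bytes(4, "little")
regs = {"eip": 0, "esi": 4}
direct = add_const_mem(memory, regs)
regs["eip"] = 0
indexed = add_const_mem(memory, regs, index_reg="esi")
```

The direct form reports 11 cycles and the indexed form 12. In the indexed case the sum wraps to zero, so the modeled zero flag is set.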

9.3.3 The jnz Instruction

Because the 80x86 jnz instruction does not allow different types of operands, there is only one sequence of steps needed for this instruction. The jnz label; instruction might use the following sequence of steps:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the displacement value following the opcode.
Decode the opcode to see what instruction it specifies.
Fetch the displacement value (the jump distance) and send it to the ALU.
Update the EIP register to hold the address of the instruction following the displacement operand.
Test the zero flag to see if it is clear (that is, if it contains zero).
If the zero flag was clear, copy the value in EIP to the ALU.
If the zero flag was clear, instruct the ALU to add the displacement and EIP values.
If the zero flag was clear, copy the result of the addition back to the EIP.

Notice how the jnz instruction requires fewer steps, and thus runs in fewer clock cycles, if the jump is not taken. This is very typical for conditional-jump instructions.
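The difference between the taken and not-taken paths shows up clearly in a small Python model. The sketch assumes a two-byte instruction: an opcode byte followed by a one-byte signed displacement that is added to the address of the next instruction. The function name and the surrounding filler bytes are illustrative.

```python
def jnz(memory, regs, zero_flag):
    """Toy `jnz label;`; updates regs["eip"] and returns the cycle count."""
    eip = regs["eip"]
    cycles = 0
    _opcode = memory[eip]         # fetch the opcode
    eip += 1                      # EIP -> displacement byte
    cycles += 3                   # fetch, update EIP, decode
    displacement = int.from_bytes(memory[eip:eip + 1], "little", signed=True)
    eip += 1                      # EIP -> instruction after the displacement
    cycles += 3                   # fetch displacement, update EIP, test zero flag
    if zero_flag == 0:            # jump taken only when the zero flag is clear
        eip = (eip + displacement) & 0xFFFFFFFF
        cycles += 3               # copy EIP to ALU, add displacement, write EIP
    regs["eip"] = eip
    return cycles

# Usage: the jnz sits at address 4 with displacement -4 (0xFC), so the next
# instruction is at address 6 and the jump target is address 2.
code = bytes([0x90, 0x90, 0x90, 0x90, 0x75, 0xFC])
taken_regs = {"eip": 4}
taken_cycles = jnz(code, taken_regs, zero_flag=0)
fall_regs = {"eip": 4}
fall_cycles = jnz(code, fall_regs, zero_flag=1)
```

With the zero flag clear, the model spends nine cycles and lands on address 2; with the zero flag set, it spends six cycles and falls through to address 6.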

9.3.4 The loop Instruction

Because the 80x86 loop instruction does not allow different types of operands, there is only one sequence of steps needed for this instruction. The 80x86 loop instruction might use an execution sequence like the following:

Fetch the instruction's opcode from memory.
Update the EIP register with the address of the displacement operand following the opcode.
Decode the opcode to see what instruction it specifies.
Fetch the value of the ECX register and send it to the ALU.
Instruct the ALU to decrement this value.
Send the result back to the ECX register. Set a special internal flag if this result is nonzero.
Fetch the displacement value (the jump distance) following the opcode in memory and send it to the ALU.
Update the EIP register with the address of the instruction following the displacement operand.
Test the special internal flag to see if ECX was nonzero.
If the flag was set (that is, it contains one), copy the value in EIP to the ALU.
If the flag was set, instruct the ALU to add the displacement and EIP values.
If the flag was set, copy the result of the addition back to the EIP register.

As with the jnz instruction, you'll note that the loop instruction executes more rapidly if the branch is not taken and the CPU continues execution with the instruction that immediately follows the loop instruction.
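To see how this adds up over a whole loop, the sketch below totals the cycles spent in the loop instruction itself, using 12 cycles for a taken branch and 9 for a not-taken branch (the step counts from the list above, assuming one cycle per step). The function name is illustrative, and the cycles of the loop body are deliberately ignored.

```python
def loop_cycles(ecx):
    """Return (taken_branches, total_cycles) spent in `loop` until ECX reaches zero.

    Pass a nonzero starting value: starting from 0 would wrap to 0xFFFFFFFF
    and model 2**32 iterations.
    """
    taken_cost, not_taken_cost = 12, 9
    taken = 0
    total = 0
    while True:
        ecx = (ecx - 1) & 0xFFFFFFFF   # loop decrements ECX first
        if ecx != 0:
            taken += 1                 # branch back to the top of the loop
            total += taken_cost
        else:
            total += not_taken_cost    # final pass falls through
            return taken, total
```

For a starting ECX of n, the branch is taken n - 1 times and falls through once, so under these assumptions the loop instruction alone costs 12(n - 1) + 9 cycles.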