Computers derive much of their power from their ability to execute code selectively and from the speed at which they execute repetitive algorithms. Programs in high-level languages like Ada, C++, or Pascal use if-then, if-then-else, and case structures to execute code and loop structures selectively, such as while (pre-test) loops, until (post-test) loops, and for (counter-controlled) loops to repetitively execute code. Some high-level languages have a goto statement for unconditional branching. Somewhat more primitive languages (like older versions of BASIC) depend on fairly simple if statements and an abundance of goto statements for both selective execution and looping.
The 80×86 assembly language programmer’s job is similar to the old BASIC programmer’s job. The 80×86 microprocessor can execute some instructions that are roughly comparable to for statements, but most branching and looping is done with 80×86 statements that are similar to, but even more primitive than, simple if and goto statements. The objective of this chapter is to describe the machine implementation of language structures such as if-then, if-then-else, while, until, and for.
The 80×86 jmp (jump) instruction corresponds to goto in a high-level language. As coded in assembly language, jmp usually has the form
jmp StatementLabel
where StatementLabel corresponds to the name field of some other assembly language statement. Recall that the name field is followed by a colon (:) when used to label an executable statement. The colon is not used in the jmp statement itself. As an example, if there were alternative conditions under which a program should be terminated, the code might contain
jmp quit ; exit from program . . quit: INVOKE ExitProcess, 0 ; exit with return code 0 . .
Figure 5.1 shows a complete example: a program that will input numbers repeatedly and, after each number is entered, display the count of the numbers so far, the cumulative sum, and the average. The program implements the following pseudocode design.
program to input numbers and display running average and sum ; author: R. Detmer ; date: revised 9/97 .386 .MODEL FLAT INCLUDE io.h cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character .STACK 4096 ; reserve 4096-byte stack .DATA ; reserve storage for data sum DWORD ? explain BYTE cr,Lf,"As you input numbers one at a time, this",cr,Lf BYTE "program will report the count of numbers so far,",cr,Lf BYTE "the sum so far, and the average.",cr,Lf,Lf,0 prompt BYTE "number? ",0 number BYTE 16 DUP (?) countLabel BYTE "count",0 sumLabel BYTE " sum",0 avgLabel BYTE " average",0 value BYTE 11 DUP (?), 0 nextPrompt BYTE cr,Lf,Lf,"next ",0 .CODE ; start of main program code _start: output explain ; initial instructions mov sum,0 ; sum := 0 mov ebx,0 ; count := 0 forever: output prompt ; prompt for number input number,16 ; read ASCII characters atod number ; convert to integer add sum,eax ; add number to sum inc ebx ; add 1 to count dtoa value,ebx ; convert count to ASCII output countLabel ; display label for count output value ; display count dtoa value,sum ; convert sum to ASCII output sumLabel ; display label for sum output value ; display sum mov eax,sum ; get sum cdq ; extend sum to 64 bits idiv ebx ; sum / count dtoa value,eax ; convert average to ASCII output avgLabel ; display label for average output value ; output average output nextPrompt ; skip down, start next prompt jmp forever ; repeat PUBLIC _start ; make entry point public END
Figure 5.1: Program with forever loop
display instructions; sum := 0; count := 0; forever loop prompt for number; input ASCII characters for number; convert number to 2's complement form; add number to sum; add 1 to count; convert count to ASCII; display label and count; convert sum to ASCII; display label and sum; average := sum / count; display label and average; end loop;
This program must store values for count and sum, and all registers except EBX and ECX are used by the input/output macros and/or the division instruction. The value of count is kept in EBX, and sum is stored in a doubleword reserved in the data segment. Note that sum could have been initialized to zero by the DWORD directive instead of by the mov statement; as implemented, the code is more consistent with the design, but is slightly wasteful of time and space since sum only needs to be initialized once.
This program has several faults. One slight shortcoming is that it does not round the average. The major fault, however, is that it contains a forever loop with no way to get out. In fact, the usual termination code for a program is not even included since it could not be reached anyway. Fortunately there is a way to stop this program without turning off or resetting the computer; simply press control-C when the prompt for a number appears. This works because the input macro uses a Kernel32 service for input, and this function gives special treatment to control-C. Figure 5.2 shows a sample run of this program.
As you input numbers one at a time, this program will report the count of numbers so far, the sum so far, and the average. number? 75 count 1 sum 75 average 75 next number? 93 count 2 sum 168 average 84 next number? 78 count 3 sum 246 average 82 next number? (control-C pressed)
Figure 5.2: Sample run of program
The one jmp in the program in Fig. 5.1 transfers control to a point that precedes the jmp statement itself. This is called a backward reference. The code illustrates a forward reference.
jmp quit ; exit from program . . quit: INVOKE ExitProcess, 0 ; exit with return code 0
There are several 80×86 jmp instructions, in two major groups. All work by changing the value in the instruction pointer register EIP, so that the next instruction to be executed comes from a new address rather than from the address immediately following the current instruction. Jumps can be intersegment, changing the code segment register CS as well as EIP. However, this does not happen with flat memory model programming, so these instructions will not be covered. The intrasegment jumps are summarized in Fig. 5.3; the first two are the most commonly used.
Clock Cycles |
|||||
---|---|---|---|---|---|
Type |
386 |
486 |
Pentium |
Number of Bytes |
Opcode |
relative near |
7+ |
3 |
1 |
5 |
E9 |
relative short |
7+ |
3 |
1 |
2 |
EB |
register indirect |
10+ |
5 |
2 |
2 |
FF |
memory indirect |
10+ |
5 |
2 |
2+ |
FF |
Figure 5.3: jmp instructions
Each relative jump instruction contains the displacement of the target from the jmp statement itself. This displacement is added to the address of the next instruction to find the address of the target. The displacement is a signed number, positive for a forward reference and negative for a backward reference. For the relative short version of the instruction, only a single byte of displacement is stored; this is changed to a signextended to a doubleword before the addition. The relative near format includes a 32-bit displacement.
The 8-bit displacement in an relative short jump can serve for a target statement up to 128 bytes before or 127 bytes after the jmp instruction. This displacement is measured from the byte following the object code of the jmp itself since at the time an instruction is being executed, EIP logically contains the address of the next instruction to be executed. The 32-bit displacement in a relative near jump instruction can serve for a target statement up to 2,147,483,648 bytes before or 2,147,483,647 bytes after the jmp instruction.
There is no difference in the coding for a relative short jump and for a relative near jump. The assembler uses a short jump if the target is within the small range in order to generate more compact code. A near jump is used automatically if the target is more than 128 bytes away.
The indirect jump instructions use a 32-bit address for the target rather than a displacement. However, this address is not encoded in the instruction itself. Instead, it is either in a register or in a memory doubleword. Thus the format
jmp edx
means to jump to the address stored in EDX. The memory indirect format can use any valid reference to a doubleword of memory. If Target is declared as a DWORD in the data section, then
jmp Target
jumps to the address stored in that doubleword, not to that point in the data section. Using register indirect addressing, you could have
jmp DWORD PTR [ebx]
that causes a jump to the address stored at the doubleword whose address is in EBX! Fortunately, these indirect forms are rarely needed.
Exercises 5.1
hardLoop: jmp hardLoop
is executed, it continues to execute "forever." What is the object code for this statement?
.DATA ... addrStore DWORD ? ... .CODE ... doAgain: ... (3 instructions) jmp doAgain ... (200 instructions) jmp doAgain ... jmp addrStore ... jmp eax ... jmp [edi]
Programming Exercise 5.1
As you input numbers one at a time, this program will report the sum so far and the average. number 1 ? 10 sum 10 average 10 number 2 ? 50 sum 60 average 30
and so forth.
Conditional jump instructions make it possible to implement if structures, other selection structures, and loop structures in 80×86 machine language. There are many of these instructions. Each has the format
j− targetStatement
where the last part of the mnemonic identifies the condition under which the jump is to be executed. If the condition holds, then the jump takes place; otherwise, the next instruction (the one following the conditional jump) is executed.
With one exception (the jcxz/jecxz instruction, covered in Section 5.4), the "conditions" considered by the conditional jump instructions are settings of various flags in the flag registers. For example, the instruction
jz endWhile
means to jump to the statement with label endWhile if the zero flag ZF is set to 1; otherwise fall through to the next statement.
Conditional jump instructions do not modify the flags; they only react to previously set flag values. Recall how the flags in the flag register get values in the first place. Some instructions (like mov) leave some or all flags unchanged, some (like add) explicitly set some flags according to the value of a result, and still others (like div) unpredictably alter some flags, leaving them with unknown values.
Suppose, for example, that the value in the EAX register is added to a sum representing an account balance, and three distinct treatments are needed, depending on whether the new balance is negative, zero, or positive. A pseudocode design for this could be
add value to balance; if balance < 0 then ... { design for negative balance } elseif balance = 0 then ... { design for zero balance } else ... { design for positive balance } end if;
The following 80×86 code fragment implements this design.
add balance,eax ; add value to balance jns elseIfZero ; jump if balance not negative ... ; code for negative balance jmp endBalanceCheck elseIfZero: jnz elsePos ; jump if balance not zero ... ; code for zero balance jmp endBalanceCheck elsePos: ... ;code for positive balance endBalanceCheck:
Appropriate flags are set or cleared by the add instruction. No other instruction shown in the above code fragment changes the flags. The design checks first for (balance < 0). The code does this with the instruction
jns elseIfZero
which says to jump to elseIfZero if the sign flag is not set; that is, if (balance < 0) is not true. The code following this instruction corresponds to statements following the first then in the design. The statement
jmp endBalanceCheck
at the end of this block of statements is necessary so that the CPU skips the statements that correspond to the other cases. If the first conditional jump transfers control to elseIfZero, then the balance must be non-negative. The design checks to see if the balance is zero; the instruction
elseIfZero: jnz elsePos
jumps to elsePos if the zero flag ZF=0. The last instruction that set flags is the add at the beginning, so the jump occurs of the balance was not zero. The code for the (balance=0) case must again end with an unconditional jump to endBalanceCheck. Finally, the code that corresponds to the else in the design is at elsePos. This last block of code does not need a jump to endBalanceCheck since execution will fall through to this point.
The 80×86 code above directly corresponds to the order of statements in the design. If you are actually doing production coding in assembly language, a good technique is to initially code following a careful design and then reexamine the code to see if there are places where you can make it more efficient if there is a need to do so. This corresponds to what happens in many high-level language compilers. Most initially produce machine language that corresponds to the order of the high-level language statements being translated. Some compilers may then optimize the code, rearranging some statements for efficiency.
In the previous code, the label endBalanceCheck is on a line by itself. Technically this label will reference the address of whatever statement follows it, but it is far simpler to treat it as the part of the current design structure without worrying about what comes next. If what comes after this structure is changed, the code for this structure can remain the same. If the next statement requires another label, that is perfectly okay-multiple labels can reference the same spot in memory. Labels are not part of object code, so extra labels do not add to the length of object code or to execution time.
When writing code to mirror a design, one often wants to use labels like if, then, else, and endif. Unfortunately, IF, ELSE, and ENDIF are MASM directives, so they cannot be used as labels. In addition, IF1, IF2, and several other desirable labels are also reserved for use as directives. One solution is to use long descriptive labels like elseIfZero in the above example. Since no reserved word contains an underscore, another solution is to use labels like if_1 and endif_2 that parallel keywords in the original design.
The terms set a flag and reset a flag are often used to mean "give the value 1" to a flag and "give the value 0" to a flag, respectively. (Sometimes the word clear is used instead of reset.) As you have seen, many instructions set or reset flags. However, the cmp (compare) instructions are probably the most common way to establish flag values.
Each cmp instruction compares two operands and sets or resets AF, CF, OF, PF, SF, and ZF. The only job of a cmp instruction is to fix flag values; this is not just a side effect of some other function. Each has the form
cmp operand1, operand2
A cmp executes by calculating operand1 minus operand2, exactly like a sub instruction; the value of the difference and what happens in doing the subtraction determines the flag settings. A cmp instruction is unlike sub in that the value at the operand1 location is not changed. The flags that are of most interest in this book are CF, OF, SF, and ZF. The carry flag CF is set if there is a borrow for the subtraction and reset if no borrow is required. The overflow flag OF is set if there is an overflow and reset otherwise. The sign flag SF is set if the difference represents a negative 2's complement number (the leading bit is one) and is reset if the number is zero or positive. Finally, the zero flag ZF is set if the difference is zero and is reset if it is nonzero.
Here are a few examples showing how the flags are set or reset when some representative byte length numbers are compared. Recall that the subtraction operation is the same for unsigned and signed (2's complement) values. Just as a single bit pattern can be interpreted as a unsigned number or a signed number, flag values have different interpretations after comparison of unsigned or signed values. The "interpretation" columns below show the relationship of the operands under both signed and unsigned interpretations.
What flag values characterize the relations equal, less than, and greater than? Equality is easy; the ZF flag is set if and only if operand1 has the same value as operand2 no matter whether the numbers are interpreted as signed or unsigned. This is illustrated by Example 1 below. Less than and greater than take a bit more analysis.
flags |
interpretation |
||||||||
---|---|---|---|---|---|---|---|---|---|
operand1 |
operand2 |
difference |
CF |
OF |
SF |
ZF |
signed |
unsigned |
|
1 |
3B |
3B |
00 |
0 |
0 |
0 |
1 |
op1=op2 |
op1=op2 |
2 |
3B |
15 |
26 |
0 |
0 |
0 |
0 |
op1>op2 |
op1>op2 |
3 |
15 |
3B |
DA |
1 |
0 |
1 |
0 |
op1 |
op1 |
4 |
F9 |
F6 |
03 |
0 |
0 |
0 |
0 |
op1>op2 |
op1>op2 |
5 |
F6 |
F9 |
FD |
1 |
0 |
1 |
0 |
op1 |
op1 |
6 |
15 |
F6 |
1F |
1 |
0 |
0 |
0 |
op1>op2 |
op1 |
7 |
F6 |
15 |
E1 |
0 |
0 |
1 |
0 |
op1 |
op1>op2 |
8 |
68 |
A5 |
C3 |
1 |
1 |
1 |
0 |
op1>op2 |
op1 |
9 |
A5 |
68 |
3D |
0 |
1 |
0 |
0 |
op1 |
op1>op2 |
When one first thinks about less than, it seems as if the carry flag should be set for a borrow whenever operand1 is less than operand2. This logic is correct if one interprets the operands as unsigned numbers. Examples 3, 5, 6, and 8 all have operand1 < operand2 as unsigned numbers, and these are exactly the examples where CF=1. Therefore, for unsigned numbers, CF=0 means that operand1 ≥ operand2. Strict inequality for unsigned numbers is characterized by CF=0 and ZF=0, that is operand1 ≥ operand2 and operand1 [.notequal] operand2.
Examples 3, 5, 7, and 9 have operand1 < operand2 as signed numbers. What characterizes this situation is that SF[.notequal]OF. In the remaining examples, SF=OF, and operand1 ≥ operand2 are signed numbers. Strict inequality for unsigned numbers is characterized by SF=OF and ZF=0, that is operand1 ≥ operand2 and operand1 [.notequal] operand2.
The cmp instructions are listed in Fig. 5.4. Looking back at Fig. 4.5, one sees that the entries in the various columns are almost all the same as for sub instructions. When the first operand is in memory, the cmp instructions require fewer clock cycles than corresponding sub instructions since the result need not be stored. There are alternative opcodes for some operand combinations-the ones listed are those chosen by MASM 6.11.
Clock Cycles |
||||||
---|---|---|---|---|---|---|
Destination Operand |
Source Operand |
386 |
486 |
Pentium |
Number of Bytes |
opcode |
register 8 |
immediate 8 |
2 |
1 |
1 |
3 |
80 |
register 16 |
immediate 8 |
2 |
1 |
1 |
3 |
83 |
register 32 |
immediate 8 |
2 |
1 |
1 |
3 |
83 |
register 16 |
immediate 16 |
2 |
1 |
1 |
4 |
81 |
register 32 |
immediate 32 |
2 |
1 |
1 |
6 |
81 |
AL |
immediate 8 |
2 |
1 |
1 |
2 |
3C |
AX |
immediate 16 |
2 |
1 |
1 |
3 |
3D |
EAX |
immediate 32 |
2 |
1 |
1 |
5 |
3D |
memory byte |
immediate 8 |
5 |
2 |
2 |
3+ |
80 |
memory word |
immediate 8 |
5 |
2 |
2 |
3+ |
83 |
memory doubleword |
immediate 8 |
5 |
2 |
2 |
3+ |
83 |
memory word |
immediate 16 |
5 |
2 |
2 |
4+ |
81 |
memory doubleword |
immediate 32 |
5 |
2 |
2 |
6+ |
81 |
register 8 |
register 8 |
2 |
1 |
1 |
2 |
38 |
register 16 |
register 16 |
2 |
1 |
1 |
2 |
3B |
register 32 |
register 32 |
2 |
1 |
1 |
2 |
3B |
register 8 |
memory byte |
6 |
2 |
2 |
2+ |
3A |
register 16 |
memory word |
6 |
2 |
2 |
2+ |
3B |
register 32 |
memory doubleword |
6 |
2 |
2 |
2+ |
3B |
memory byte |
register 8 |
5 |
2 |
2 |
2+ |
38 |
memory word |
register 16 |
5 |
2 |
2 |
2+ |
39 |
memory doubleword |
register 32 |
5 |
2 |
2 |
2+ |
39 |
Figure 5.4: cmp instructions
A few reminders are in order about immediate operands. These can be coded in your choice of bases or as characters. Assuming that pattern references a word in the data segment, each of the following is allowable.
cmp eax, 356 cmp pattern, 0d3a6h cmp bh, '$'
Note that an immediate operand must be the second operand. The instruction
cmp 100, total ; illegal
is not acceptable since the first operand is immediate.
Finally it is time to list the conditional jump instructions; they are shown in Fig. 5.5. Many of these have alternative mnemonics that generate exactly the same machine code; these describe the same set of conditions a different way. Often one mnemonic is more natural than the other for implementation of a given design.
Appropriate for use after comparison of unsigned operands |
||||
---|---|---|---|---|
opcode |
||||
mnemonic |
description |
flags to jump |
short |
near |
ja |
jump if above |
CF=0 and ZF=0 |
77 |
OF 87 |
jnbe |
jump if not below or equal |
|||
jae |
jump if above or equal |
CF=0 |
73 |
OF 83 |
jnb |
jump if not below |
|||
jb |
jump if below |
CF=1 |
72 |
OF 82 |
jnae |
jump if not above or equal |
|||
jbe |
jump if below or equal |
CF=1 or ZF=1 |
76 |
OF 86 |
jna |
jump if not above |
|||
jg |
jump if greater |
SF=OF and ZF=0 |
7F |
OF 8F |
jnle |
jump if not less or equal |
|||
jge |
jump if greater or equal |
SF=OF |
7D |
OF 8D |
jnl |
jump if not less |
|||
jl |
jump if less |
SF≠OF |
7C |
OF 8C |
jnge |
jump if not greater or equal |
|||
jle |
jump if less or equal |
SF≠OF or ZF=1 |
7E |
OF 8E |
jng |
jump if not greater |
|||
Opcode |
||||
Other conditional jumps |
||||
mnemonic |
description |
flags to jump |
short |
near |
je |
jump if equal |
ZF=1 |
74 |
OF 84 |
jz |
jump if zero |
|||
jne |
jump if not equal |
ZF=0 |
75 |
OF 85 |
jnz |
jump if not zero |
|||
js |
jump if sign |
SF=1 |
78 |
OF 88 |
jns |
jump if not sign |
SF=0 |
79 |
OF 89 |
jc |
jump if carry |
CF=1 |
72 |
0F 82 |
jnc |
jump if not carry |
CF=0 |
73 |
0F 83 |
jp |
jump if parity |
PF=1 |
7A |
OF 8A |
jpe |
jump if parity even |
|||
jnp |
jump if not parity |
PF=0 |
7B |
OF 8B |
jpo |
jump if parity odd |
|||
jo |
jump if overflow |
OF=1 |
70 |
OF 80 |
jno |
jump if not overflow |
OF=0 |
71 |
OF 81 |
Figure 5.5: Conditional jump instructions
Conditional jump instructions always compare the first operand to the second operand. For example, for the instruction jg, "jump if greater" means to jump if operand1 > operand2.
Each conditional jump instruction takes a single clock cycle for execution. No conditional jump instruction changes any flag value. Each instruction has a short version and a near version. Just as with short unconditional jump instructions, a short conditional jump encodes a single-byte displacement and can transfer control 128 bytes before or 127 bytes after the address of the byte following the instruction itself. A short conditional jump requires two bytes of object code, one for the opcode and one for the displacement. A near conditional jump encodes a 32-bit displacement in addition to a two-byte opcode, giving a total length of six bytes. It can transfer control up to 2,147,483,648 bytes backward or 2,147,483,647 forward. The number of bytes and number of clock cycles for conditional jump instructions is summarized in Fig. 5.6.
Clock Cycles |
||||
---|---|---|---|---|
386 |
486 |
Pentium |
Number of Bytes |
|
short conditional jump |
7+, 3 |
3, 1 |
1 |
2 |
near conditional jump |
7+, 3 |
3, 1 |
1 |
6 |
For the 80386 and 80486 the longer time is when the jump is executed; the shorter time is for no jump. |
Figure 5.6: Timing and size of conditional jump instructions
One more pair of examples will illustrate the difference between the conditional jumps appropriate after comparison of signed and unsigned numbers. Suppose a value is stored in EAX and some action needs to be taken when that value is larger than 100. If the value is unsigned, one might code
cmp eax, 100 ja bigger
The jump would be chosen for any value bigger than 0000006416, including values between 8000000016 and FFFFFFFF16, which represent both large unsigned numbers and negative 2's complement numbers. If the value in EAX is interpreted as signed, then the instructions
cmp ax,100 jg bigger
are appropriate. The jump will only be taken for values between 00000064 and 7FFFFFFF, not for those bit patterns that represent negative 2's complement numbers.
We now look at three examples showing implementation of if structures. The implementations are consistent with what a high-level language compiler would use. First consider the design
if value < 10 then add 1 to smallCount; else add 1 to largeCount; end if;
Suppose that value is stored in the EBX register and that smallCount and largeCount reference words in memory. The following 80×86 code implements this design.
cmp ebx, 10 ; value < 10 ? jnl elseLarge inc smallCount ; add 1 to small_count jmp endValueCheck elseLarge: inc largeCount ; add 1 to large_count endValueCheck:
Note that this code is completely self-contained; you do not need to know what comes before or after in the overall design to implement this portion. You must have a plan for making labels, though, to avoid duplicates and reserved words. A compiler often produces a label consisting of a letter followed by a sequence number, but most of the time we can do better as humans writing code.
Now consider the design
if (total ≥ 100) or (count = 10) then add value to total; end if;
Assume that total and value reference doublewords in memory and that count is stored in the CX register. Here is assembly language code to implement this design.
cmp total, 100 ; total >= 100 ? jge addValue cmp cx, 10 ; count = 10 ? jne endAddCheck addValue: mov ebx, value ; copy value add total, ebx ; add value to total endAddCheck:
Notice that the design's or requires two cmp instructions. If either of the corresponding tests is passed, then the addition is performed. (Why was the addition done with two statements? Why not use add total,value?) This code implements a short-cut or - if the first condition is true, then the second is not checked at all. The code implemented for some languages always checks both operands of an or operation, even if the first is true.
Finally consider the design
if (count > 0) and (ch = backspace) then subtract 1 from count; end if;
For this third example, assume that count is in the CX register, ch is in the AL register and that backspace has been equated to 0816, the ASCII backspace character. This design can be implemented as follows.
cmp cx, 0 ; count > 0 ? jng endCheckCh cmp al, backspace ; ch a backspace? jne endCheckCh dec count ; subtract 1 from count endCheckCh:
This compound condition uses and, so both parts must be true in order to execute the action. This code implements a short-cut and - if the first condition is false, then the second is not checked at all. The code implemented for some languages always checks both operands of an and operation, even if the first is false.
This section ends with an example implementing a simple game program. The computer asks one player to enter a number. After it is typed in, the screen is cleared, and the other player tries to guess the number. After each guess the computer reports "too low," "too high," or "you got it." After the number is finally guessed, the number of attempts is reported, and the players are asked if they want to play another game. The pseudocode design in Fig. 5.7 gives a more precise description.
until response='N' or response='n' loop prompt first player for target; input target and convert to 2's complement form; clear screen; count := 0; until guess=target loop add 1 to count; prompt second player for guess; input guess and convert to 2's complement; if guess=target then display "you got it"; elseif guess
Figure 5.7: Design for game program
The assembly language source code for the game program is shown in Fig. 5.8. Note that screen is cleared by writing 24 line feed characters. The loop and selection structures in the program faithfully follow the design. Recall that an until loop is a post-test loop. The next section carefully describes how to implement both until and while loops.
; program to implement number guessing game ; author: R. Detmer ; date: revised 9/97 .386 .MODEL FLAT INCLUDE io.h ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character .STACK 4096 ; reserve 4096-byte stack .DATA ; reserve storage for data prompt1 BYTE cr,Lf,Lf,"Player 1, please enter a number: ", 0 target DWORD ? clear BYTE 24 DUP (Lf), 0 prompt2 BYTE cr,Lf,"Player 2, your guess? ", 0 stringIn BYTE 20 DUP (?) lowOutput BYTE "too low", cr, Lf, 0 highOutput BYTE "too high", cr, Lf, 0 gotItOutput BYTE "you got it", cr, Lf, 0 countLabel BYTE Lf, "Number of guesses:" countOut BYTE 6 DUP (?) BYTE cr, Lf, Lf, Lf, "Do you want to play again? ",0 .CODE ; start of main program code _start: untilDone: output prompt1 ; ask player 1 for target input stringIn, 20 ; get number atod stringIn ; convert to integer mov target,eax ; store target output clear ; clear screen mov cx, 0 ; zero count untilMatch: inc cx ; increment count of guesses output prompt2 ; ask player 2 for guess input stringIn, 20 ; get number atod stringIn ; convert to integer cmp eax, target ; compare guess and target jne ifLess ; guess = target ? equal: output gotItOutput ; display "you got it" jmp endCompare ifLess: jnl isGreater ; guess < target ? output lowOutput ; display "too low" jmp endCompare isGreater: output highOutput ; display "too high" endCompare: cmp eax, target ; compare guess and target jne untilMatch ; ask again if guess not = target itoa countOut, cx ; convert count to ASCII output countLabel ; display label, count and prompt input stringIn, 20 ; get response cmp stringIn, 'n' ; response = 'n' ? je endUntilDone ; exit if so cmp stringIn, 'N' ; response = 'N' ? jne untilDone ; repeat if not endUntilDone: INVOKE ExitProcess, 0 ; exit with return code 0 PUBLIC _start ; make entry point public END ; end of source code
Figure 5.8: Game program
The outside until loop in the game program is terminated by either a "N" or "n" response to a query to the players. The input macro is used to get the response in the same input area as used for numbers earlier. Since the address of a multibyte object is the address of its first byte, the instruction
cmp stringIn, 'n' ; response = 'n' ?
is really comparing the first (and probably only) character of input to the letter "n". This is not a comparison of two strings.
Exercises 5.2
(a) |
cmp eax, value jl dest |
(b) |
cmp eax, value jb dest |
(c) |
cmp eax, 04fh je dest |
(d) |
cmp eax, 79 jne dest |
(e) |
cmp value, 0 jl dest |
(f) |
cmp value, −200 jge dest |
(g) |
add eax, 200 js dest |
(h) |
add value, 200 jz dest |
Assumptions: count is in ECX; value references a doubleword in memory
Assumptions: count is in ECX; value references a doubleword in memory
Assumptions: each of a, b, and c references a doubleword in memory; the character check is in the AL register
Assumption: value is in EDX
Assumptions: ch is in AL; each of lowerCount, upperCount, and other-Count references a doubleword in memory
Programming Exercises 5.2
until (value ≥ 0) and (value ≤ 1000) loop input value and convert to 2's complement; if (value < 0) or (value > 1000) then display "enter value 0 to 1000"; end if; end until;
Most programs contain loops. Commonly used loop structures include while, until, and for loops. This section describes how to implement all three of these structures in 80×86 assembly language. The next section describes additional instructions that can be used to implement for loops.
A while loop can be indicated by the following pseudocode design.
while continuation condition loop ... { body of loop } end while;
The continuation condition, a Boolean expression, is checked first. If it is true, then the body of the loop is executed. The continuation condition is then checked again. Whenever the value of the Boolean expression is false, execution continues with the statement following end while.
An 80×86 implementation of a while loop follows a pattern much like this one.
while: . ; code to check Boolean expression . . body: . ; loop body . . jmp while ; go check condition again endWhile:
It often takes several statements to check the value of the Boolean expression. If it is determined that the value is false, then there will be a jump to endWhile. If it is determined that the continuation condition is true, then the code will either fall through to body or there will be a jump to its label. Notice that the body of the loop ends with a jmp to go check the condition again. Two common mistakes are to omit this jump or to jump to the body instead.
The label while in this model is not allowed in actual code since while is a reserved word in MASM. In fact, MASM 6.11 has a while directive that simplifies writing code for while loops. It is not used in this book since our main concern is understanding how structures are implemented at the machine language level.
For an example, suppose that the design
------------ while (sum < 1000) loop ... { body of loop } end while; ------------
is to be coded in 80×86 assembly language. Assuming that sum references a doubleword in memory, one possible implementation is
whileSum: cmp sum, 1000 ; sum < 1000? jnl endWhileSum ; exit loop if not . ; body of loop . . jmp whileSum ; go check condition again endWhileSum:
The statement
jnl endWhileSum
directly implements the design. An alternative would be to use
jge endWhileSum
which transfers control to the end of the loop if sum ≥ 1000. This works since the inequality (sum ≥ 1000) will be true exactly when the (sum < 1000) is false, but the jnl mnemonic makes it easier to implement the design without having to reverse the inequality.
For a short example showing a complete loop body, suppose that the integer base 2 logarithm of a positive number needs to be determined. The integer base 2 logarithm of a number is the largest integer x such that
2x ≤ number
The following design does the job.
---------- x := 0; twoToX := 1; while twoToX ≤ number multiply twoToX by 2; add 1 to x; end while; subtract 1 from x; ----------
Assuming that number references a doubleword in memory, the following 80×86 code implements the design, using the EAX register for twoToX and the CX register for x.
mov cx, 0 ; x := 0 mov eax, 1 ; twoToX := 1 whileLE: cmp eax, number ; twoToX <= number? jnle endWhileLE ; exit if not body: add eax, eax ; multiply twoToX by 2 inc cx ; add 1 to x jmp whileLE ; go check condition again endWhileLE: dec cx ; subtract 1 from x
Often the continuation condition in a while is compound, having two parts connected by Boolean operators and or or. Both operands of an and must be true for a true conjunction. With an or, the only way the disjunction can be false is if both operands are false.
Changing a previous example to include a compound condition, suppose that the following design is to be coded.
while (sum < 1000) and (count ≤ 24) loop ... { body of loop } end while;
Assuming that sum references a doubleword in memory and the value of count is in CX, an implementation is
whileSum: cmp sum, 1000 ; sum < 1000? jnl endWhileSum ; exit if not cmp cx, 24 ; count <= 24 jnle endWhileSum ; exit if not . ; body of loop . . jmp whileSum ; go check condition again endWhileSum:
Modifying the example another time, here is a design with an or instead of an and.
while (sum < 1000) or (flag = 1) loop ... { body of loop } end while;
This time, assume that sum is in the EAX register and that flag is a single byte in the DH register. Here is 80×86 code that implements the design.
whileSum: cmp eax, 1000 ; sum < 1000? jl body ; execute body if so cmp dh,1 ; flag = 1? jne endWhileSum ; exit if not body: . ; body of loop . . jmp whileSum ; go check condition again endWhileSum:
Notice the difference in the previous two examples. For an and the loop is exited if either operand of the compound condition is false. For an or the loop body is executed if either operand of the compound condition is true.
Sometimes processing in a loop is to continue while normal values are encountered and to terminate when some sentinel value is encountered. If data are being entered from the keyboard, this design can be written
get value from keyboard; while (value is not sentinel) loop ... { body of loop } get value from keyboard; end while;
In some high-level languages, implementation code must exactly parallel this design. One of the advantages of assembly language is that one has more flexibility. An equivalent design is
while (value entered from keyboard is not sentinel) loop ... { body of loop } end while;
This design does not require two separate instructions to input data. It can be coded in some high-level languages and also in 80×86 assembly language.
For a concrete example illustrating implementation of such a design, suppose that non-negative numbers entered at the keyboard are to be added, with any negative entry serving as a sentinel value. A design looks like
sum := 0; while (number keyed in is not negative) loop add number to sum; end while;
Assuming appropriate definitions in the data segment, the 80×86 code could be
mov ebx, 0 ; sum := 0 whileNotNeg: output prompt ; prompt for input input number,10 ; get number from keyboard atod number ; convert to 2's complement js endWhile ; exit if negative add ebx, eax ; add number to sum jmp whileNotNeg ; go get next number endWhile:
Recall that the atod macro affects the sign flag SF, setting it if the ASCII characters are converted to a negative number in the EAX register and clearing it otherwise.
The body of a for loop, a counter-controlled loop, is executed once for each value of a loop index (or counter) in a given range. In some high-level languages, the loop index can be some type other than integer; in assembly language the index is usually an integer. A for loop can be described by the following pseudocode.
for index := initialValue to finalValue loop ... { body of loop } end for;
A for loop can easily be translated into a while structure.
index := initialValue; while index ≤ finalValue loop ... { body of loop } add 1 to index; end while;
Such a while is readily coded in 80×86 assembly language.
As an example, suppose that a collection of numbers needs to be added and no value is convenient as a sentinel. Then one might want to ask a user how many numbers are to be entered and loop for that many entries. The design looks like
prompt for tally of numbers; input tally; sum := 0 for count := 1 to tally loop prompt for number; input number; add number to sum; end for;
Making straightforward assumptions about definitions in the data segment, here is an 80×86 implementation of the design.
output prompt1 ; prompt for tally input value, 20 ; get tally (ASCII) atoi value ; convert to 2's complement mov tally, ax ; store tally mov edx, 0 ; sum := 0 mov bx, 1 ; count := 1 forCount: cmp bx, tally ; count <= tally? jnle endFor ; exit if not output prompt2 ; prompt for number input value, 20 ; get number (ASCII) atod value ; convert to 2's complement add edx, eax ; add number to sum inc bx ; add 1 to count jmp forCount ; repeat endFor:
In a for loop implementation where one is sure that the body of the loop will be executed at least once (i.e., initialValue ≤ finalValue), one can check the index against the final value at the end of the loop body rather than prior to the body. Other variations are also possible. Additional instructions for implementing for loops will be covered in Section 5.4.
You have already seen examples of until loops. In general, an until loop can be expressed as follows in pseudocode.
until termination condition loop ... { body of loop } end until;
The body of the loop is executed at least once; then the termination condition is checked. If it is false, then the body of the loop is executed again; if true, execution continues with the statement following end until.
An 80×86 implementation of an until loop usually looks like the following code fragment.
until: . ; start of loop body . . . . . ; code to check termination condition endUntil:
If the code to check the termination condition determines that the value is false, then there will be a jump to until. If it is determined that the value is true, then the code will either fall through to endUntil or there will be a jump to that label.
The game program implemented in Fig. 5.8 contained two simple until loops. Here is an example with a compound terminating condition. Given the design
count := 0; until (sum > 1000) or (count = 100) loop ... { body of loop } add 1 to count; end until;
the following 80×86 code provides an implementation. Assume that sum references a word in the data segment and that count is stored in CX.
mov cx, 0 ; count := 0 until: . ; body of loop . . inc cx ; add 1 to count cmp sum, 1000 ; sum > 1000 ? jg endUntil ; exit if sum > 1000 cmp cx, 100 ; count = 100 ? jne until ; continue if count not = 100 endUntil:
Other loop structures can also be coded in assembly language. The forever loop is frequently useful. As it appears in pseudocode, it almost always has an exit loop statement to transfer control to the end of the loop; this is often conditional—that is, in an if statement. Here is a fragment of a typical design.
forever loop . . . if (response = 's') or (response = 'S') then exit loop; end if; . . . end loop;
Assuming that the value of response is in the AL register, this can be implemented as follows in 80×86 assembly language.
forever: . . . cmp al, 's' ; response = 's'? je endLoop ; exit loop if so cmp al, 'S' ; response = 'S'? je endLoop ; exit loop if so . . . jmp forever ; repeat loop body endLoop:
Exercises 5.3
Programming Exercise 5.3
display "First number? "; input number; minimum := number; maximum := number; while (response to "Another number? " is 'Y' or 'y') loop input number; if (number < minimum) then minimum := number; end if; if (number > maximum) then maximum := number; end if; end while; display the minimum value; display the maximum value;
sum := 0; count := 0; while (number entered from keyboard ≠ -999999) loop add number to sum; add 1 to count; end while; if (count = 0) then display "No numbers entered"; else average := sum/count; display sum and average; end if;
ACount := 0; BCount := 0; CCount := 0; DCount := 0; FCount := 0; while (grade entered at keyboard ≥ 0) loop if (grade ≥ 90) then add 1 to ACount; elseif (grade ≥ 80) then add 1 to BCount; elseif (grade ≥ 70) then add 1 to CCount; elseif (grade ≥ 60) then add 1 to DCount; else add 1 to FCount; end if; end while; display "Number of As", ACount; display "Number of Bs", BCount; display "Number of Cs", CCount; display "Number of Ds", DCount; display "Number of Fs", FCount;
gcd := number1; remainder := number2; until (remainder = 0) loop dividend := gcd; gcd := remainder; remainder := dividend mod gcd; end until;
Write a complete 80×86 assembly language program that implements the following design, with appropriate prompts for input and labels for output.
until (number1 > 0) loop input number1; end until; until (number2 > 0) loop input number2; end until; find gcd of number1 and number2; (see design above) display gcd;
total := 0; forever loop display "number? "; input number; display "action (+, -, c or q) ? "; input action; if (action = '+') then add number to total; elseif (action = '-') then subtract number from total; elseif (action = 'c') or (action = 'C') then total := 0; elseif (action = 'q') or (action = 'Q') then exit loop; else display "Unknown action"; end if; display "total", total; end loop;
Often the number of times the body of a loop must be executed is known in advance, either as a constant that can be coded when a program is written, or as the value of a variable that is assigned before the loop is executed. The for loop structure is ideal for coding such a loop.
The previous section showed how to translate a for loop into a while loop. This technique always works and is frequently the best way to code a for loop. However, the 80×86 microprocessor has instructions that make coding certain for loops very easy.
Consider the following two for loops, the first of which counts forward and the second of which counts backward.
for index := 1 to count loop ... { body of loop } end for;
and
for index := count downto 1 loop ... { body of loop } end for;
The body of each loop executes count times. If the value of index is not needed for display or for calculations within the body of the loop, then the loop that counts down is equivalent to the loop that counts up, although the design may not be as natural. Backward for loops are very easy to implement in 80×86 assembly language with the loop instruction.
The loop instruction has the format
loop statementLabel
where statementLabel is the label of a statement that is a short displacement (128 bytes backward or 127 bytes forward) from the loop instruction. The loop instruction causes the following actions to take place:
In addition to the loop instruction, there are two conditional loop instructions that are less frequently used. Features of all three instructions are summarized in Fig. 5.9. Each requires two bytes of object code; the first byte is the opcode and the second byte is the displacement to the destination statement. Two times are given for 80486 and Pentium instructions, the first showing how many clock cycles are required if the jump is not taken, and the second showing how many clock cycles are required if it is taken. The situation is more complex for the 80386, but it also has two distinct execution times. None of these instructions changes any flag.
Clock Cycles |
|||||
---|---|---|---|---|---|
Mnemonic |
386 |
486 |
Pentium |
Number of Bytes |
Opcode |
loop |
11+ |
6/7 |
5/6 |
2 |
E2 |
loope/loopz |
11+ |
6/9 |
7/8 |
2 |
E1 |
loopne/loopnz |
11+ |
6/9 |
7/8 |
2 |
E0 |
Figure 5.9: loop instructions
Although the ECX register is a general register, it has a special place as a counter in the loop instruction and in several other instructions. No other register can be substituted for ECX in these instructions. In practice this often means that when a loop is coded, either ECX is not used for other purposes or a counter value is put in ECX before a loop instruction is executed but is saved elsewhere to free ECX for other uses for most of the body of the loop.
The backward for loop structure
for count := 20 downto 1 loop ... { body of loop } end for;
can be coded as follows in 80×86 assembly language.
mov ecx, 20 ; number of iterations forCount: . ; body of loop . . loop forCount ; repeat body 20 times
The counter in the ECX register will be 20 the first time the body of the loop is executed and will be decremented to 19 by the loop instruction. The value 19 is not zero, so control transfers to the start of the loop body at label forCount. The second time the body of the loop is executed, the ECX register will contain 19. The last time the value in ECX will be one; it will be decremented to zero by the loop instruction, and the jump to for-Count will not be taken.
The obvious label to mark the body of a for loop is for. Unfortunately this is a reserved word in MASM. It is used for a directive that simplifies coding of for loops. Again, our primary interest is in learning how the computer works at the machine level, so this directive will not be used.
Now suppose that the doubleword in memory referenced by number contains the number of times a loop is to be executed. The 80×86 code to implement a backward for loop could be
mov ecx, number ; number of iterations forIndex: . ; body of loop . . loop forIndex ; repeat body number times
This is safe code only if the value stored at number is not zero. If it is zero, then the loop body is executed, the zero value is decremented to FFFFFFFF (a borrow is required to do the subtraction), the loop body is executed again, the value FFFFFFFF is decremented to FFFFFFFE, and so forth. The body of the loop is executed 4,294,967,296 times before the value in ECX gets back down to zero! To avoid this problem, one could code
mov ecx, number ; number of iterations cmp ecx, 0 ; number = 0 ? je endFor ; skip loop if number = 0 forIndex: . ; body of loop . . loop forIndex ; repeat body number times endFor:
If number is a signed value and might be negative, then
jle endFor ; skip loop if number <= 0
is a more appropriate conditional jump.
There is another way to guard a for loop so that it is not executed when the value in ECX is zero. The 80×86 instruction set has a jecxz conditional jump instruction that jumps to its destination if the value in the ECX register is zero. Using the jecxz instruction, the example above can be coded as
mov ecx, number ; number of iterations jecxz endFor ; skip loop if number = 0 forIndex: . ; body of loop . . loop forIndex ; repeat body number times endFor:
There is also a jcxz instruction that checks the CX register rather than the ECX register. Both instructions are two bytes long, the opcode E3 plus a single-byte displacement; the prefix byte 67 distinguishes between the 16-bit size and the 32-bit versions. Like the other conditional jump instructions, jcxz/jecxz affects no flag value. They do take longer to execute, six clock cycles on a Pentium if the jump takes place (if the value in ECX is zero), and five clock cycles to fall through to the next statement otherwise.
The jecxz instruction can be used to code a backward for loop when the loop body is longer than 127 bytes, too large for the loop instruction's single-byte displacement. For example, the structure
for counter := 50 downto 1 loop ... { body of loop } end for;
could be coded as
mov ecx, 50 ; number of iterations forCounter: . ; body of loop . . dec ecx ; decrement loop counter jecxz endFor ; exit if counter = 0 jmp forCounter ; otherwise repeat body endFor:
However, since the dec instruction sets or resets the zero flag ZF, the faster conditional jump
jz endFor
can be used instead of the jecxz instruction.
It is often convenient to use a loop statement to implement a for loop, even when the loop index increases and must be used within the body of the loop. The loop statement uses ECX to control the number of iterations, while a separate counter serves as the loop index.
For example, to implement the for loop
for index := 1 to 50 loop ...{ loop body using index } end for;
the EBX register might be used to store index counting from 1 to 50 while the ECX register counts down from 50 to 1.
mov ebx, 1 ; index := 1 mov ecx, 50 ; number of iterations for loop forNbr: . . ; use value in EBX for index . inc ebx ; add 1 to index loop forNbr ; repeat
Figure 5.9 listed two variants of the loop instruction, loopz/loope and loopnz/loopne. Each of these work like loop, decrementing the counter in ECX. However, each examines the value of the zero flag ZF as well as the new value in the ECX register to decide whether or not to jump to the destination location. The loopz/loope instruction jumps if the new value in ECX is nonzero and the zero flag is set (ZF=1). The loopnz/loopne instruction jumps if the new value in ECX is nonzero and the zero flag is clear (ZF=0).
The loopz and loopnz instructions are useful in special circumstances. Some programming languages allow loop structures such as
for year := 10 downto 1 until balance = 0 loop ... { body of loop } end for;
This confusing structure means to terminate loop execution using whichever loop control is satisfied first. That is, the body of the loop is executed 10 times (for year = 10, 9,…,1) unless the condition balance = 0 is true at the bottom of some execution of the loop body, in which case the loop terminates with fewer than 10 iterations. If the value of balance is in the EBX register, the following 80×86 code could be used.
mov ecx, 10 ; maximum number of iterations forYear: . ; body of loop . . cmp ebx, 0 ; balance = 0 ? loopne forYear ; repeat 10 times if balance not 0
Exercises 5.4
ASCIIcount BYTE 11 DUP (?) ASCIIsum BYTE 11 DUP (?) BYTE 13, 10, 0 ; carriage return, linefeed
Programming Exercise 5.4
number square 1 1 2 4 3 9 4 16 5 25
A B C 3 4 5 4 3 5
Programs frequently use arrays to store collections of data values. Loops are commonly used to manipulate the data in arrays. This section shows one way to access 1-dimensional arrays in 80×86 assembly language; other techniques will appear in Chapter 9 with discussion of additional memory addressing modes.
This section contains a complete program to implement the design below. The program first accepts a collection of positive numbers from the keyboard, counting them and storing them in an array. It then calculates the average of the numbers by going back through the numbers stored in the array, accumulating the total in sum. Finally the numbers in the array are scanned again, and this time the numbers larger than the average are displayed. The first two loops could be combined, of course, with the sum being accumulated as the numbers are keyed in. As a general programming philosophy, clearer code results from separating tasks; they should be combined only if there is a real need to save execution time or bytes of object code.
nbrElts := 0; { input numbers into array } get address of first item of array; while (number from keyboard > 0) loop convert number to 2's complement; store number at address in array; add 1 to nbrElts; get address of next item of array; end while; sum := 0; { find sum and average } get address of first item of array; for count := nbrElts downto 1 loop add doubleword at address in array to sum; get address of next item of array; end for; average := sum/nbrElts; display average; get address of first item of array; { list big numbers } for count := nbrElts downto 1 loop if doubleword of array > average then convert doubleword to ASCII; display value; end if; get address of next item of array; end for;
This design contains the curious instructions "get address of first item of array" and "get address of next item of array." These reflect the particular assembly language implementation, one which works well if the task at hand involves moving sequentially through an array. The 80×86 feature which makes this possible is register indirect addressing, first discussed in Chapter 3. The example will use the EBX register to contain the address of the word currently being accessed; then [ebx] references the doubleword at the address in the EBX register rather than the doubleword in the register itself. In the 80×86 architecture any of the general registers EAX, EBX, ECX, and EDX or the index registers EDI and ESI are appropriate for use as a "pointer." The ESI and EDI registers are often reserved for use with strings, which are usually arrays of characters. String operations are covered in Chapter 7. The program listing appears in Fig. 5.10.
; input a collection of numbers ; report their average and the numbers which are above average ; author: R. Detmer ; date: revised 9/97 .386 .MODEL FLAT INCLUDE io.h ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character maxNbrs EQU 100 ; size of number array .STACK 4096 .DATA directions BYTE cr, Lf, 'You may enter up to 100 numbers' BYTE ' one at a time.',cr,Lf BYTE 'Use any negative number to terminate input.',cr,Lf,Lf BYTE 'This program will then report the average and list',cr,Lf BYTE 'those numbers which are above the average.',cr,Lf,Lf,Lf,0 prompt BYTE 'Number? ',0 number BYTE 20 DUP (?) nbrArray DWORD maxNbrs DUP (?) nbrElts DWORD ? avgLabel BYTE cr,Lf,Lf,'The average is' outValue BYTE 11 DUP (?), cr,Lf,0 aboveLabel BYTE cr,Lf,'Above average:',cr,Lf,Lf,0 .CODE _start: ; input numbers into array output directions ; display directions mov nbrElts,0 ; nbrElts := 0 lea ebx,nbrArray ; get address of nbrArray whilePos: output prompt ; prompt for number input number,20 ; get number atod number ; convert to integer jng endWhile ; exit if not positive mov [ebx],eax ; store number in array inc nbrElts ; add 1 to nbrElts add ebx,4 ; get address of next item of array jmp whilePos ; repeat endWhile: ; find sum and average mov eax,0 ; sum := 0 lea ebx,nbrArray ; get address of nbrArray mov ecx,nbrElts ; count := nbrElts jecxz quit ; quit if no numbers forCount1: add eax,[ebx] ; add number to sum add ebx,4 ; get address of next item of array loop forCount1 ; repeat nbrElts times cdq ; extend sum to quadword idiv nbrElts ; calculate average dtoa outValue,eax ; convert average to ASCII output avgLabel ; print label and average output aboveLabel ; print label for big numbers ; display numbers above average lea ebx,nbrArray ; get address of nbrArray mov ecx,nbrElts ; count := nbrElts forCount2: cmp [ebx],eax ; doubleword > average ? jng endIfBig ; continue if average not less dtoa outValue,[ebx] ; convert value from array to ; ASCII output outValue ; display value endIfBig: add ebx,4 ; get address of next item of array loop forCount2 ; repeat quit: INVOKE ExitProcess, 0 ; exit with return code 0 PUBLIC _start ; make entry point public END ; end of source code
Figure 5.10: Program using array
The design statement "get address of first item of array" is implemented by the 80×86 statement
lea ebx, nbrArray
The mnemonic lea stands for "load effective address." The lea instruction has the format
lea destination, source
The destination will normally be a 32-bit general register; the source is any reference to memory. The address of the source is loaded into the register. (Contrast this with mov destination, source where the value at the source address is copied to the destination.) The lea instruction has opcode 8D takes one clock cycle on a Pentium, one or two on an 80486, and two on an 80386.
The design statement "get address of next item of array" is implemented using the 80×86 statement
add ebx, 4
Since each doubleword occupies four bytes of storage, adding 4 to the address of the current element of an array gives the address of the next element of the array.
If one were planning to code this program in a high-level language, then the design of the first two loops might be
nbrElts := 0; { input numbers into array } while number from keyboard > 0 loop add 1 to nbrElts; store number in nbrSrray[nbrElts]; end while; sum := 0; { find sum and average } for count := 1 to nbrElts loop add nbrArray[count] to sum; end for;
This design exploits one of the principal features of arrays, namely that any element can be accessed at any time by simply giving its index; the elements do not have to be accessed sequentially. Such random access can be implemented using register indirect addressing. For example, the design statement "add nbrArray[count] to sum" can be implemented as follows, assuming the same register usage as before—the ECX register for count and the EAX register for sum.
mov edx,ecx ; count dec edx ; count-1 add edx,edx ; 2*(count-1) add edx,edx ; 4*(count-1) lea ebx,nbrArray ; starting address of array add ebx,edx ; address of nbrArray[count] add eax,[ebx] ; add array[count] to sum
The technique here is to calculate the number of bytes in the array prior to the desired element and add this number to the starting address. There are more efficient ways to directly access an array element; these will be covered in later chapters.
Exercises 5.5
Programming Exercises 5.5
nbrElts := 0; get address of first item of array; while (number from keyboard > 0) loop convert number to 2's complement; store number at address in array; add 1 to nbrElts; get address of next item of array; end while; until (response = 'N') or (response = 'n') display "Search for? "; input keyValue; convert keyValue to 2's complement; get address of first item of array; count := 1; forever loop if count > nbrElts then display keyValue, "not in array"; exit loop; end if; if keyValue = current element of array then display keyValue, "is element", count; exit loop; end if; add 1 to count; get address of next item of array; end loop; display "Search for another number? "; input response; end until;
until (response = 'N') or (response = 'n') display "Search for? "; input keyValue; convert keyValue to 2's complement; store keyValue at position (nbrElts+1) in array; get address of first item of array; count := 1; while keyValue not equal to current array element loop add 1 to count; get address of next word of array; end while; if count > nbrElts then display keyValue, "not in array"; exit loop; else display keyValue, "is element", count; exit loop; end if; display "Search for another number? "; input response; end until;
prime[1] := 2; { first prime number } prime[2] := 3; { second prime number } primeCount := 2; candidate := 4; { first candidate for a new prime } while primeCount < 100 loop index := 1; while (index ≤ primeCount) and (prime[index] does not evenly divide candidate)loop add 1 to index; end while; if (index > primeCount) then {no existing prime evenly divides the candidate, so it is a new prime} add 1 to primeCount; prime[primeCount] := candidate; end if; add 1 to candidate; end while; display "Prime Numbers"; for index := 1 to 100 loop {display the numbers 5 per line } display prime[index]; if index is divisible by 5 then skip to a new line; end if; end for;
Chapter 2 discussed the central processing unit's basic operation cycle:
A CPU must have circuitry to perform each of these functions. One of the things that computer designers have done to speed up CPU operation is to design CPUs with stages that can carry out these (and other) operations almost independently.
The first stage of the CPU might have the job of fetching the next instruction from memory, perhaps doing just enough decoding to recognize the number of bytes the instruction has and update the program counter PC. The first stage passes on information to the second stage whose job might be to finish decoding the instruction, perhaps also computing some operand addresses. Meanwhile the first stage can be fetching the next instruction from memory. The second stage could pass a fully-decoded instruction to the third stage for execution. Meanwhile, the first stage could have passed on its second instruction to stage two, so that the first stage can be fetching a third instruction. This sort of design is called a pipeline. If the pipeline is kept full, the resulting throughput of the CPU is three times faster than if it had to finish the complete fetch-decode-execute process for each instruction before proceeding to the next one.
Figure 5.11 illustrates the operation of a pipeline. The instructions being processed are shown as horizontal strips of three boxes labeled with 1, 2, and 3 to indicate stages. The horizontal axis shows time. You can see that at any given time parts of three instructions are being executed.
CPU Stage |
Instruction being processed |
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
1 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
2 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
|
3 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
||
Time interval |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
Figure 5.11: Instructions in a pipeline
A pipelined CPU is not as simple as illustrated above. One problem may occur if, say, stage 2 needs to compute an address based on the contents of a register modified by stage 3 of the previous instruction; the register might not yet contain the correct address. A CPU can be designed to avoid such problems, usually at the cost of a "hole" in the pipeline.
A more serious problem occurs when the CPU executes a conditional jump instruction. With a conditional jump the CPU cannot tell which of two possible sequences of instructions will be executed next until the condition itself is evaluated by the last stage. Earlier stages may be working on one instruction stream, only to be forced to discard all this work and refill the pipeline from the beginning with instructions from the alternative stream.
This chapter introduced 80×86 instructions that can be used to implement many high-level design or language features including if statements, various loops structures, and arrays.
The jmp instruction unconditionally transfers control to a destination statement. It has several versions, including one that jumps to a short destination 128 bytes before or 127 bytes after the jmp and one that jumps to a near destination a 32-bit displacement away. The jmp instruction is used in implementing various loop structures, typically transferring control back to the beginning of the loop, and in the if-then-else structure at the end of the "then code" to transfer control to endif so that the else code is not also executed. A jmp statement corresponds directly to the goto statement that is available in most high-level languages.
Conditional jump instructions examine the settings of one or more flags in the flag register and jump to a destination statement or fall through to the next instruction depending on the flag values. Conditional jump instructions have short and near displacement versions. There is a large collection of conditional jump instructions. They are used in if statements and loops, often in combination with compare instructions, to check Boolean conditions.
The cmp (compare) instructions have the sole purpose of setting or resetting flags in the EFLAGS register. Each compares two operands and assigns flag values. The comparison is done by subtracting the second operand from the first. The difference is not retained as it is with a sub instruction. Compare instructions often precede conditional jump instructions.
Loop structures like while, until, and for loops can be implemented using compare, jump, and conditional jump instructions. The loop instruction provides another way to implement many for loops. To use the loop instruction, a counter is placed in the ECX register prior to the start of the loop. The loop instruction itself is at the bottom of the loop body; it decrements the value in ECX and transfers control to a destination (normally the first statement of the body) if the new value in ECX is not zero. This results in the body of the loop being executed the number of times originally placed in the ECX register. The conditional jump jecxz instruction can be used to guard against executing such a loop when the initial counter value is zero.
Storage for an array can be reserved using the DUP directive in the data segment of a program. The elements of an array can be sequentially accessed by putting the address of the first element of the array in a register and adding the size of an array element repeatedly to get to the next element. The current element is referenced using register indirect addressing. The lea (load effective address) instruction is commonly used to load the initial address of the array.
Pipelining is done by a CPU with multiple stages that work on more than one instruction at a time, doing such tasks as fetching one, while decoding another, while executing a third. This can greatly speed up CPU operation.
Preface