Table of contents:

Computers derive much of their power from their ability to execute code selectively and from the speed at which they execute repetitive algorithms. Programs in high-level languages like Ada, C++, or Pascal use if-then, if-then-else, and case structures to execute code and loop structures selectively, such as while (pre-test) loops, until (post-test) loops, and for (counter-controlled) loops to repetitively execute code. Some high-level languages have a goto statement for unconditional branching. Somewhat more primitive languages (like older versions of BASIC) depend on fairly simple if statements and an abundance of goto statements for both selective execution and looping.

The 80×86 assembly language programmer’s job is similar to the old BASIC programmer’s job. The 80×86 microprocessor can execute some instructions that are roughly comparable to for statements, but most branching and looping is done with 80×86 statements that are similar to, but even more primitive than, simple if and goto statements. The objective of this chapter is to describe the machine implementation of language structures such as if-then, if-then-else, while, until, and for.

Unconditional Jumps

The 80×86 jmp (jump) instruction corresponds to goto in a high-level language. As coded in assembly language, jmp usually has the form

 jmp StatementLabel

where StatementLabel corresponds to the name field of some other assembly language statement. Recall that the name field is followed by a colon (:) when used to label an executable statement. The colon is not used in the jmp statement itself. As an example, if there were alternative conditions under which a program should be terminated, the code might contain

 jmp quit ; exit from program
 .
 .
 quit: INVOKE ExitProcess, 0 ; exit with return code 0
 .
 .

Figure 5.1 shows a complete example: a program that will input numbers repeatedly and, after each number is entered, display the count of the numbers so far, the cumulative sum, and the average. The program implements the following pseudocode design.

program to input numbers and display running average and sum
; author: R. Detmer
; date: revised 9/97

.386
.MODEL FLAT

INCLUDE io.h

cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character

.STACK 4096 ; reserve 4096-byte stack

.DATA ; reserve storage for data
sum DWORD ?
explain BYTE cr,Lf,"As you input numbers one at a time, this",cr,Lf
 BYTE "program will report the count of numbers so far,",cr,Lf
 BYTE "the sum so far, and the average.",cr,Lf,Lf,0
prompt BYTE "number? ",0
number BYTE 16 DUP (?)
countLabel BYTE "count",0
sumLabel BYTE " sum",0
avgLabel BYTE " average",0
value BYTE 11 DUP (?), 0
nextPrompt BYTE cr,Lf,Lf,"next ",0
.CODE ; start of main program code
_start:
 output explain ; initial instructions
 mov sum,0 ; sum := 0
 mov ebx,0 ; count := 0

forever: output prompt ; prompt for number
 input number,16 ; read ASCII characters
 atod number ; convert to integer

 add sum,eax ; add number to sum
 inc ebx ; add 1 to count
 dtoa value,ebx ; convert count to ASCII
 output countLabel ; display label for count
 output value ; display count

 dtoa value,sum ; convert sum to ASCII
 output sumLabel ; display label for sum
 output value ; display sum

 mov eax,sum ; get sum
 cdq ; extend sum to 64 bits
 idiv ebx ; sum / count
 dtoa value,eax ; convert average to ASCII
 output avgLabel ; display label for average
 output value ; output average

 output nextPrompt ; skip down, start next prompt
 jmp forever ; repeat

PUBLIC _start ; make entry point public
 END

Figure 5.1: Program with forever loop


display instructions;
sum := 0;
count := 0;
forever loop
prompt for number;
input ASCII characters for number;
convert number to 2's complement form;
add number to sum;
add 1 to count;
convert count to ASCII;
display label and count;
convert sum to ASCII;
display label and sum;
average := sum / count;
display label and average;
end loop;

This program must store values for count and sum, and all registers except EBX and ECX are used by the input/output macros and/or the division instruction. The value of count is kept in EBX, and sum is stored in a doubleword reserved in the data segment. Note that sum could have been initialized to zero by the DWORD directive instead of by the mov statement; as implemented, the code is more consistent with the design, but is slightly wasteful of time and space since sum only needs to be initialized once.

This program has several faults. One slight shortcoming is that it does not round the average. The major fault, however, is that it contains a forever loop with no way to get out. In fact, the usual termination code for a program is not even included since it could not be reached anyway. Fortunately there is a way to stop this program without turning off or resetting the computer; simply press control-C when the prompt for a number appears. This works because the input macro uses a Kernel32 service for input, and this function gives special treatment to control-C. Figure 5.2 shows a sample run of this program.

 As you input numbers one at a time, this
 program will report the count of numbers so far,
 the sum so far, and the average.

 number? 75
 count 1 sum 75 average 75

 next number? 93
 count 2 sum 168 average 84
 next number? 78
 count 3 sum 246 average 82
 next number? (control-C pressed)

Figure 5.2: Sample run of program

The one jmp in the program in Fig. 5.1 transfers control to a point that precedes the jmp statement itself. This is called a backward reference. The code illustrates a forward reference.

 jmp quit ; exit from program
 .
 .
 quit: INVOKE ExitProcess, 0 ; exit with return code 0

There are several 80×86 jmp instructions, in two major groups. All work by changing the value in the instruction pointer register EIP, so that the next instruction to be executed comes from a new address rather than from the address immediately following the current instruction. Jumps can be intersegment, changing the code segment register CS as well as EIP. However, this does not happen with flat memory model programming, so these instructions will not be covered. The intrasegment jumps are summarized in Fig. 5.3; the first two are the most commonly used.

	Clock Cycles
Type	386	486	Pentium	Number of Bytes	Opcode
relative near	7+	3	1	5	E9
relative short	7+	3	1	2	EB
register indirect	10+	5	2	2	FF
memory indirect	10+	5	2	2+	FF

Figure 5.3: jmp instructions

Each relative jump instruction contains the displacement of the target from the jmp statement itself. This displacement is added to the address of the next instruction to find the address of the target. The displacement is a signed number, positive for a forward reference and negative for a backward reference. For the relative short version of the instruction, only a single byte of displacement is stored; this is changed to a signextended to a doubleword before the addition. The relative near format includes a 32-bit displacement.

The 8-bit displacement in an relative short jump can serve for a target statement up to 128 bytes before or 127 bytes after the jmp instruction. This displacement is measured from the byte following the object code of the jmp itself since at the time an instruction is being executed, EIP logically contains the address of the next instruction to be executed. The 32-bit displacement in a relative near jump instruction can serve for a target statement up to 2,147,483,648 bytes before or 2,147,483,647 bytes after the jmp instruction.

There is no difference in the coding for a relative short jump and for a relative near jump. The assembler uses a short jump if the target is within the small range in order to generate more compact code. A near jump is used automatically if the target is more than 128 bytes away.

The indirect jump instructions use a 32-bit address for the target rather than a displacement. However, this address is not encoded in the instruction itself. Instead, it is either in a register or in a memory doubleword. Thus the format

 jmp edx

means to jump to the address stored in EDX. The memory indirect format can use any valid reference to a doubleword of memory. If Target is declared as a DWORD in the data section, then

 jmp Target

jumps to the address stored in that doubleword, not to that point in the data section. Using register indirect addressing, you could have

 jmp DWORD PTR [ebx]

that causes a jump to the address stored at the doubleword whose address is in EBX! Fortunately, these indirect forms are rarely needed.

Exercises 5.1

If the statement
```
hardLoop: jmp hardLoop
```
is executed, it continues to execute "forever." What is the object code for this statement?

Identify the type (relative near, relative short, register indirect, or memory indirect) of each jmp instruction in the following code fragment.

.DATA
 ...
addrStore DWORD ?
 ...
.CODE
 ...
doAgain:
 ... (3 instructions)
 jmp doAgain
 ... (200 instructions)
 jmp doAgain
 ...
 jmp addrStore
 ...
 jmp eax
 ...
 jmp [edi]

Programming Exercise 5.1

Modify the program in Fig. 5.1 so that the prompt rather than the response to it tells which number is being entered. That is, the sample run in Fig. 5.2 would be changed to
```
As you input numbers one at a time, this program
will report the sum so far and the average.
 number 1 ? 10
 sum 10 average 10

 number 2 ? 50
 sum 60 average 30
```

and so forth.

Conditional Jumps, Compare Instructions, and if Structures

Conditional jump instructions make it possible to implement if structures, other selection structures, and loop structures in 80×86 machine language. There are many of these instructions. Each has the format

 j− targetStatement

where the last part of the mnemonic identifies the condition under which the jump is to be executed. If the condition holds, then the jump takes place; otherwise, the next instruction (the one following the conditional jump) is executed.

With one exception (the jcxz/jecxz instruction, covered in Section 5.4), the "conditions" considered by the conditional jump instructions are settings of various flags in the flag registers. For example, the instruction

 jz endWhile

means to jump to the statement with label endWhile if the zero flag ZF is set to 1; otherwise fall through to the next statement.

Conditional jump instructions do not modify the flags; they only react to previously set flag values. Recall how the flags in the flag register get values in the first place. Some instructions (like mov) leave some or all flags unchanged, some (like add) explicitly set some flags according to the value of a result, and still others (like div) unpredictably alter some flags, leaving them with unknown values.

Suppose, for example, that the value in the EAX register is added to a sum representing an account balance, and three distinct treatments are needed, depending on whether the new balance is negative, zero, or positive. A pseudocode design for this could be

add value to balance;

if balance < 0
then
...
{ design for negative balance }
elseif balance = 0
then
...
{ design for zero balance }
else
...
{ design for positive balance }
end if;

The following 80×86 code fragment implements this design.

 add balance,eax ; add value to balance
 jns elseIfZero ; jump if balance not negative
 ... ; code for negative balance
 jmp endBalanceCheck
elseIfZero: jnz elsePos ; jump if balance not zero
 ... ; code for zero balance
 jmp endBalanceCheck
elsePos: ... ;code for positive balance

endBalanceCheck:

Appropriate flags are set or cleared by the add instruction. No other instruction shown in the above code fragment changes the flags. The design checks first for (balance < 0). The code does this with the instruction

 jns elseIfZero

which says to jump to elseIfZero if the sign flag is not set; that is, if (balance < 0) is not true. The code following this instruction corresponds to statements following the first then in the design. The statement

 jmp endBalanceCheck

at the end of this block of statements is necessary so that the CPU skips the statements that correspond to the other cases. If the first conditional jump transfers control to elseIfZero, then the balance must be non-negative. The design checks to see if the balance is zero; the instruction

 elseIfZero: jnz elsePos

jumps to elsePos if the zero flag ZF=0. The last instruction that set flags is the add at the beginning, so the jump occurs of the balance was not zero. The code for the (balance=0) case must again end with an unconditional jump to endBalanceCheck. Finally, the code that corresponds to the else in the design is at elsePos. This last block of code does not need a jump to endBalanceCheck since execution will fall through to this point.

The 80×86 code above directly corresponds to the order of statements in the design. If you are actually doing production coding in assembly language, a good technique is to initially code following a careful design and then reexamine the code to see if there are places where you can make it more efficient if there is a need to do so. This corresponds to what happens in many high-level language compilers. Most initially produce machine language that corresponds to the order of the high-level language statements being translated. Some compilers may then optimize the code, rearranging some statements for efficiency.

In the previous code, the label endBalanceCheck is on a line by itself. Technically this label will reference the address of whatever statement follows it, but it is far simpler to treat it as the part of the current design structure without worrying about what comes next. If what comes after this structure is changed, the code for this structure can remain the same. If the next statement requires another label, that is perfectly okay-multiple labels can reference the same spot in memory. Labels are not part of object code, so extra labels do not add to the length of object code or to execution time.

When writing code to mirror a design, one often wants to use labels like if, then, else, and endif. Unfortunately, IF, ELSE, and ENDIF are MASM directives, so they cannot be used as labels. In addition, IF1, IF2, and several other desirable labels are also reserved for use as directives. One solution is to use long descriptive labels like elseIfZero in the above example. Since no reserved word contains an underscore, another solution is to use labels like if_1 and endif_2 that parallel keywords in the original design.

The terms set a flag and reset a flag are often used to mean "give the value 1" to a flag and "give the value 0" to a flag, respectively. (Sometimes the word clear is used instead of reset.) As you have seen, many instructions set or reset flags. However, the cmp (compare) instructions are probably the most common way to establish flag values.

Each cmp instruction compares two operands and sets or resets AF, CF, OF, PF, SF, and ZF. The only job of a cmp instruction is to fix flag values; this is not just a side effect of some other function. Each has the form

 cmp operand1, operand2

A cmp executes by calculating operand1 minus operand2, exactly like a sub instruction; the value of the difference and what happens in doing the subtraction determines the flag settings. A cmp instruction is unlike sub in that the value at the operand1 location is not changed. The flags that are of most interest in this book are CF, OF, SF, and ZF. The carry flag CF is set if there is a borrow for the subtraction and reset if no borrow is required. The overflow flag OF is set if there is an overflow and reset otherwise. The sign flag SF is set if the difference represents a negative 2's complement number (the leading bit is one) and is reset if the number is zero or positive. Finally, the zero flag ZF is set if the difference is zero and is reset if it is nonzero.

Here are a few examples showing how the flags are set or reset when some representative byte length numbers are compared. Recall that the subtraction operation is the same for unsigned and signed (2's complement) values. Just as a single bit pattern can be interpreted as a unsigned number or a signed number, flag values have different interpretations after comparison of unsigned or signed values. The "interpretation" columns below show the relationship of the operands under both signed and unsigned interpretations.

What flag values characterize the relations equal, less than, and greater than? Equality is easy; the ZF flag is set if and only if operand1 has the same value as operand2 no matter whether the numbers are interpreted as signed or unsigned. This is illustrated by Example 1 below. Less than and greater than take a bit more analysis.

				flags				interpretation
	operand1	operand2	difference	CF	OF	SF	ZF	signed	unsigned
1	3B	3B	00	0	0	0	1	op1=op2	op1=op2
2	3B	15	26	0	0	0	0	op1>op2	op1>op2
3	15	3B	DA	1	0	1	0	op1	op1
4	F9	F6	03	0	0	0	0	op1>op2	op1>op2
5	F6	F9	FD	1	0	1	0	op1	op1
6	15	F6	1F	1	0	0	0	op1>op2	op1
7	F6	15	E1	0	0	1	0	op1	op1>op2
8	68	A5	C3	1	1	1	0	op1>op2	op1
9	A5	68	3D	0	1	0	0	op1	op1>op2

When one first thinks about less than, it seems as if the carry flag should be set for a borrow whenever operand1 is less than operand2. This logic is correct if one interprets the operands as unsigned numbers. Examples 3, 5, 6, and 8 all have operand1 < operand2 as unsigned numbers, and these are exactly the examples where CF=1. Therefore, for unsigned numbers, CF=0 means that operand1 ≥ operand2. Strict inequality for unsigned numbers is characterized by CF=0 and ZF=0, that is operand1 ≥ operand2 and operand1 [.notequal] operand2.

Examples 3, 5, 7, and 9 have operand1 < operand2 as signed numbers. What characterizes this situation is that SF[.notequal]OF. In the remaining examples, SF=OF, and operand1 ≥ operand2 are signed numbers. Strict inequality for unsigned numbers is characterized by SF=OF and ZF=0, that is operand1 ≥ operand2 and operand1 [.notequal] operand2.

The cmp instructions are listed in Fig. 5.4. Looking back at Fig. 4.5, one sees that the entries in the various columns are almost all the same as for sub instructions. When the first operand is in memory, the cmp instructions require fewer clock cycles than corresponding sub instructions since the result need not be stored. There are alternative opcodes for some operand combinations-the ones listed are those chosen by MASM 6.11.

		Clock Cycles
Destination Operand	Source Operand	386	486	Pentium	Number of Bytes	opcode
register 8	immediate 8	2	1	1	3	80
register 16	immediate 8	2	1	1	3	83
register 32	immediate 8	2	1	1	3	83
register 16	immediate 16	2	1	1	4	81
register 32	immediate 32	2	1	1	6	81
AL	immediate 8	2	1	1	2	3C
AX	immediate 16	2	1	1	3	3D
EAX	immediate 32	2	1	1	5	3D
memory byte	immediate 8	5	2	2	3+	80
memory word	immediate 8	5	2	2	3+	83
memory doubleword	immediate 8	5	2	2	3+	83
memory word	immediate 16	5	2	2	4+	81
memory doubleword	immediate 32	5	2	2	6+	81
register 8	register 8	2	1	1	2	38
register 16	register 16	2	1	1	2	3B
register 32	register 32	2	1	1	2	3B
register 8	memory byte	6	2	2	2+	3A
register 16	memory word	6	2	2	2+	3B
register 32	memory doubleword	6	2	2	2+	3B
memory byte	register 8	5	2	2	2+	38
memory word	register 16	5	2	2	2+	39
memory doubleword	register 32	5	2	2	2+	39

Figure 5.4: cmp instructions

A few reminders are in order about immediate operands. These can be coded in your choice of bases or as characters. Assuming that pattern references a word in the data segment, each of the following is allowable.

 cmp eax, 356
 cmp pattern, 0d3a6h
 cmp bh, '$'

Note that an immediate operand must be the second operand. The instruction

 cmp 100, total ; illegal

is not acceptable since the first operand is immediate.

Finally it is time to list the conditional jump instructions; they are shown in Fig. 5.5. Many of these have alternative mnemonics that generate exactly the same machine code; these describe the same set of conditions a different way. Often one mnemonic is more natural than the other for implementation of a given design.

Appropriate for use after comparison of unsigned operands
			opcode
mnemonic	description	flags to jump	short	near
ja	jump if above	CF=0 and ZF=0	77	OF 87
jnbe	jump if not below or equal
jae	jump if above or equal	CF=0	73	OF 83
jnb	jump if not below
jb	jump if below	CF=1	72	OF 82
jnae	jump if not above or equal
jbe	jump if below or equal	CF=1 or ZF=1	76	OF 86
jna	jump if not above
jg	jump if greater	SF=OF and ZF=0	7F	OF 8F
jnle	jump if not less or equal
jge	jump if greater or equal	SF=OF	7D	OF 8D
jnl	jump if not less
jl	jump if less	SF≠OF	7C	OF 8C
jnge	jump if not greater or equal
jle	jump if less or equal	SF≠OF or ZF=1	7E	OF 8E
jng	jump if not greater
			Opcode
Other conditional jumps
mnemonic	description	flags to jump	short	near
je	jump if equal	ZF=1	74	OF 84
jz	jump if zero
jne	jump if not equal	ZF=0	75	OF 85
jnz	jump if not zero
js	jump if sign	SF=1	78	OF 88
jns	jump if not sign	SF=0	79	OF 89
jc	jump if carry	CF=1	72	0F 82
jnc	jump if not carry	CF=0	73	0F 83
jp	jump if parity	PF=1	7A	OF 8A
jpe	jump if parity even
jnp	jump if not parity	PF=0	7B	OF 8B
jpo	jump if parity odd
jo	jump if overflow	OF=1	70	OF 80
jno	jump if not overflow	OF=0	71	OF 81

Figure 5.5: Conditional jump instructions

Conditional jump instructions always compare the first operand to the second operand. For example, for the instruction jg, "jump if greater" means to jump if operand1 > operand2.

Each conditional jump instruction takes a single clock cycle for execution. No conditional jump instruction changes any flag value. Each instruction has a short version and a near version. Just as with short unconditional jump instructions, a short conditional jump encodes a single-byte displacement and can transfer control 128 bytes before or 127 bytes after the address of the byte following the instruction itself. A short conditional jump requires two bytes of object code, one for the opcode and one for the displacement. A near conditional jump encodes a 32-bit displacement in addition to a two-byte opcode, giving a total length of six bytes. It can transfer control up to 2,147,483,648 bytes backward or 2,147,483,647 forward. The number of bytes and number of clock cycles for conditional jump instructions is summarized in Fig. 5.6.

	Clock Cycles
	386	486	Pentium	Number of Bytes
short conditional jump	7+, 3	3, 1	1	2
near conditional jump	7+, 3	3, 1	1	6
For the 80386 and 80486 the longer time is when the jump is executed; the shorter time is for no jump.

Figure 5.6: Timing and size of conditional jump instructions

One more pair of examples will illustrate the difference between the conditional jumps appropriate after comparison of signed and unsigned numbers. Suppose a value is stored in EAX and some action needs to be taken when that value is larger than 100. If the value is unsigned, one might code

 cmp eax, 100
 ja bigger

The jump would be chosen for any value bigger than 0000006416, including values between 8000000016 and FFFFFFFF16, which represent both large unsigned numbers and negative 2's complement numbers. If the value in EAX is interpreted as signed, then the instructions

 cmp ax,100
 jg bigger

are appropriate. The jump will only be taken for values between 00000064 and 7FFFFFFF, not for those bit patterns that represent negative 2's complement numbers.

We now look at three examples showing implementation of if structures. The implementations are consistent with what a high-level language compiler would use. First consider the design


if value < 10
then
 add 1 to smallCount;
else
 add 1 to largeCount;
end if;

Suppose that value is stored in the EBX register and that smallCount and largeCount reference words in memory. The following 80×86 code implements this design.

 cmp ebx, 10 ; value < 10 ?
 jnl elseLarge
 inc smallCount ; add 1 to small_count
 jmp endValueCheck
 elseLarge: inc largeCount ; add 1 to large_count
 endValueCheck:

Note that this code is completely self-contained; you do not need to know what comes before or after in the overall design to implement this portion. You must have a plan for making labels, though, to avoid duplicates and reserved words. A compiler often produces a label consisting of a letter followed by a sequence number, but most of the time we can do better as humans writing code.

Now consider the design


if (total ≥ 100) or (count = 10)
then
 add value to total;
end if;

Assume that total and value reference doublewords in memory and that count is stored in the CX register. Here is assembly language code to implement this design.

 cmp total, 100 ; total >= 100 ?
 jge addValue
 cmp cx, 10 ; count = 10 ?
 jne endAddCheck
 addValue: mov ebx, value ; copy value
 add total, ebx ; add value to total
 endAddCheck:

Notice that the design's or requires two cmp instructions. If either of the corresponding tests is passed, then the addition is performed. (Why was the addition done with two statements? Why not use add total,value?) This code implements a short-cut or - if the first condition is true, then the second is not checked at all. The code implemented for some languages always checks both operands of an or operation, even if the first is true.

Finally consider the design

if (count > 0) and (ch = backspace)
then
 subtract 1 from count;
end if;

For this third example, assume that count is in the CX register, ch is in the AL register and that backspace has been equated to 0816, the ASCII backspace character. This design can be implemented as follows.

 cmp cx, 0 ; count > 0 ?
 jng endCheckCh
 cmp al, backspace ; ch a backspace?
 jne endCheckCh
 dec count ; subtract 1 from count
 endCheckCh:

This compound condition uses and, so both parts must be true in order to execute the action. This code implements a short-cut and - if the first condition is false, then the second is not checked at all. The code implemented for some languages always checks both operands of an and operation, even if the first is false.

This section ends with an example implementing a simple game program. The computer asks one player to enter a number. After it is typed in, the screen is cleared, and the other player tries to guess the number. After each guess the computer reports "too low," "too high," or "you got it." After the number is finally guessed, the number of attempts is reported, and the players are asked if they want to play another game. The pseudocode design in Fig. 5.7 gives a more precise description.


 until response='N' or response='n' loop

 prompt first player for target;
 input target and convert to 2's complement form;
 clear screen;
 count := 0;

 until guess=target loop

 add 1 to count;
 prompt second player for guess;
 input guess and convert to 2's complement;

 if guess=target

 then
 display "you got it";
 elseif guess

Figure 5.7: Design for game program

The assembly language source code for the game program is shown in Fig. 5.8. Note that screen is cleared by writing 24 line feed characters. The loop and selection structures in the program faithfully follow the design. Recall that an until loop is a post-test loop. The next section carefully describes how to implement both until and while loops.


; program to implement number guessing game
; author: R. Detmer
; date: revised 9/97

.386
.MODEL FLAT

INCLUDE io.h

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character
.STACK 4096 ; reserve 4096-byte stack
.DATA ; reserve storage for data
prompt1 BYTE cr,Lf,Lf,"Player 1, please enter a number: ", 0
target DWORD ?
clear BYTE 24 DUP (Lf), 0
prompt2 BYTE cr,Lf,"Player 2, your guess? ", 0
stringIn BYTE 20 DUP (?)
lowOutput BYTE "too low", cr, Lf, 0
highOutput BYTE "too high", cr, Lf, 0
gotItOutput BYTE "you got it", cr, Lf, 0
countLabel BYTE Lf, "Number of guesses:"
countOut BYTE 6 DUP (?)
 BYTE cr, Lf, Lf, Lf, "Do you want to play again? ",0
.CODE ; start of main program code
_start:

untilDone: output prompt1 ; ask player 1 for target
 input stringIn, 20 ; get number
 atod stringIn ; convert to integer
 mov target,eax ; store target
 output clear ; clear screen
 mov cx, 0 ; zero count

untilMatch: inc cx ; increment count of guesses
 output prompt2 ; ask player 2 for guess
 input stringIn, 20 ; get number
 atod stringIn ; convert to integer

 cmp eax, target ; compare guess and target
 jne ifLess ; guess = target ?
equal: output gotItOutput ; display "you got it"
 jmp endCompare
ifLess: jnl isGreater ; guess < target ?
 output lowOutput ; display "too low"
 jmp endCompare
isGreater: output highOutput ; display "too high"
endCompare:
 cmp eax, target ; compare guess and target
 jne untilMatch ; ask again if guess not = target

 itoa countOut, cx ; convert count to ASCII
 output countLabel ; display label, count and prompt
 input stringIn, 20 ; get response
 cmp stringIn, 'n' ; response = 'n' ?
 je endUntilDone ; exit if so
 cmp stringIn, 'N' ; response = 'N' ?
 jne untilDone ; repeat if not
endUntilDone:

 INVOKE ExitProcess, 0 ; exit with return code 0

PUBLIC _start ; make entry point public
 END ; end of source code

Figure 5.8: Game program

The outside until loop in the game program is terminated by either a "N" or "n" response to a query to the players. The input macro is used to get the response in the same input area as used for numbers earlier. Since the address of a multibyte object is the address of its first byte, the instruction


 cmp stringIn, 'n' ; response = 'n' ?

is really comparing the first (and probably only) character of input to the letter "n". This is not a comparison of two strings.

Exercises 5.2

Assume for each part of this problem that the EAX register contains 00 00 00 4F and the doubleword referenced by value contains FF FF FF 38. Determine whether or not each of the conditional jump statements causes a jump to dest.

(a)	cmp eax, value jl dest	(b)	cmp eax, value jb dest
(c)	cmp eax, 04fh je dest	(d)	cmp eax, 79 jne dest
(e)	cmp value, 0 jl dest	(f)	cmp value, −200 jge dest
(g)	add eax, 200 js dest	(h)	add value, 200 jz dest

Each part of this problem gives a design with an if structure and some assumptions about how the variables are stored in an assembly language program. Give a fragment of assembly language code that implements the design.
1. design: if count = 0 then count := value; end if;
  Assumptions: count is in ECX; value references a doubleword in memory
2. design: if count > value then count := 0; end if;
  Assumptions: count is in ECX; value references a doubleword in memory
3. design: if a + b = c then check := 'Y'; else check := 'N'; end if;
  Assumptions: each of a, b, and c references a doubleword in memory; the character check is in the AL register
4. design: if (value ≤ -1000) or (value ≥ 1000) then value := 0; end if;
  Assumption: value is in EDX
5. design: if (ch ≥ 'a') and (ch ≤ 'z') then add 1 to lowerCount; else if (ch ≥ 'A') and (ch ≤ 'Z') then add 1 to upperCount; else add 1 to otherCount; end if; end if;
  Assumptions: ch is in AL; each of lowerCount, upperCount, and other-Count references a doubleword in memory

Programming Exercises 5.2

Modify the game program to accept only numbers between 0 and 1000 from either player. A design for the new code section is


until (value ≥ 0) and (value ≤ 1000) loop
 input value and convert to 2's complement;
 if (value < 0) or (value > 1000)
 then
 display "enter value 0 to 1000";
 end if;
end until;

Modify the game program so that it only allows Player 2 five attempts at guessing the number entered by Player 1. If the fifth attempt is incorrect, display "Sorry, the number is value of target" and proceed to asking the players if they want another game.

5 3 Implementing Loop Structures

Most programs contain loops. Commonly used loop structures include while, until, and for loops. This section describes how to implement all three of these structures in 80×86 assembly language. The next section describes additional instructions that can be used to implement for loops.

A while loop can be indicated by the following pseudocode design.


while continuation condition loop
 ... { body of loop }
end while;

The continuation condition, a Boolean expression, is checked first. If it is true, then the body of the loop is executed. The continuation condition is then checked again. Whenever the value of the Boolean expression is false, execution continues with the statement following end while.

An 80×86 implementation of a while loop follows a pattern much like this one.


 while: . ; code to check Boolean expression
 .
 .
 body: . ; loop body
 .
 .
 jmp while ; go check condition again
 endWhile:

It often takes several statements to check the value of the Boolean expression. If it is determined that the value is false, then there will be a jump to endWhile. If it is determined that the continuation condition is true, then the code will either fall through to body or there will be a jump to its label. Notice that the body of the loop ends with a jmp to go check the condition again. Two common mistakes are to omit this jump or to jump to the body instead.

The label while in this model is not allowed in actual code since while is a reserved word in MASM. In fact, MASM 6.11 has a while directive that simplifies writing code for while loops. It is not used in this book since our main concern is understanding how structures are implemented at the machine language level.

For an example, suppose that the design


------------
while (sum < 1000) loop
 ... { body of loop }
end while;
------------

is to be coded in 80×86 assembly language. Assuming that sum references a doubleword in memory, one possible implementation is


 whileSum: cmp sum, 1000 ; sum < 1000?
 jnl endWhileSum ; exit loop if not
 . ; body of loop
 .
 .
 jmp whileSum ; go check condition again
 endWhileSum:

The statement


 jnl endWhileSum

directly implements the design. An alternative would be to use


 jge endWhileSum

which transfers control to the end of the loop if sum ≥ 1000. This works since the inequality (sum ≥ 1000) will be true exactly when the (sum < 1000) is false, but the jnl mnemonic makes it easier to implement the design without having to reverse the inequality.

For a short example showing a complete loop body, suppose that the integer base 2 logarithm of a positive number needs to be determined. The integer base 2 logarithm of a number is the largest integer x such that


 2x ≤ number

The following design does the job.


----------
x := 0;
twoToX := 1;
while twoToX ≤ number
 multiply twoToX by 2;
 add 1 to x;
end while;
subtract 1 from x;
----------

Assuming that number references a doubleword in memory, the following 80×86 code implements the design, using the EAX register for twoToX and the CX register for x.


 mov cx, 0 ; x := 0
 mov eax, 1 ; twoToX := 1
 whileLE: cmp eax, number ; twoToX <= number?
 jnle endWhileLE ; exit if not
 body: add eax, eax ; multiply twoToX by 2
 inc cx ; add 1 to x
 jmp whileLE ; go check condition again
 endWhileLE:
 dec cx ; subtract 1 from x

Often the continuation condition in a while is compound, having two parts connected by Boolean operators and or or. Both operands of an and must be true for a true conjunction. With an or, the only way the disjunction can be false is if both operands are false.

Changing a previous example to include a compound condition, suppose that the following design is to be coded.


while (sum < 1000) and (count ≤ 24) loop
 ... { body of loop }
end while;

Assuming that sum references a doubleword in memory and the value of count is in CX, an implementation is


 whileSum: cmp sum, 1000 ; sum < 1000?
 jnl endWhileSum ; exit if not
 cmp cx, 24 ; count <= 24
 jnle endWhileSum ; exit if not
 . ; body of loop
 .
 .
 jmp whileSum ; go check condition again
 endWhileSum:

Modifying the example another time, here is a design with an or instead of an and.


while (sum < 1000) or (flag = 1) loop
 ... { body of loop }
end while;

This time, assume that sum is in the EAX register and that flag is a single byte in the DH register. Here is 80×86 code that implements the design.


 whileSum: cmp eax, 1000 ; sum < 1000?
 jl body ; execute body if so
 cmp dh,1 ; flag = 1?
 jne endWhileSum ; exit if not
 body: . ; body of loop
 .
 .
 jmp whileSum ; go check condition again
 endWhileSum:

Notice the difference in the previous two examples. For an and the loop is exited if either operand of the compound condition is false. For an or the loop body is executed if either operand of the compound condition is true.

Sometimes processing in a loop is to continue while normal values are encountered and to terminate when some sentinel value is encountered. If data are being entered from the keyboard, this design can be written


get value from keyboard;
while (value is not sentinel) loop
 ... { body of loop }
 get value from keyboard;
end while;

In some high-level languages, implementation code must exactly parallel this design. One of the advantages of assembly language is that one has more flexibility. An equivalent design is


while (value entered from keyboard is not sentinel) loop
 ... { body of loop }
end while;

This design does not require two separate instructions to input data. It can be coded in some high-level languages and also in 80×86 assembly language.

For a concrete example illustrating implementation of such a design, suppose that non-negative numbers entered at the keyboard are to be added, with any negative entry serving as a sentinel value. A design looks like


sum := 0;
while (number keyed in is not negative) loop
 add number to sum;
end while;

Assuming appropriate definitions in the data segment, the 80×86 code could be


 mov ebx, 0 ; sum := 0
whileNotNeg: output prompt ; prompt for input
 input number,10 ; get number from keyboard
 atod number ; convert to 2's complement
 js endWhile ; exit if negative
 add ebx, eax ; add number to sum
 jmp whileNotNeg ; go get next number
 endWhile:

Recall that the atod macro affects the sign flag SF, setting it if the ASCII characters are converted to a negative number in the EAX register and clearing it otherwise.

The body of a for loop, a counter-controlled loop, is executed once for each value of a loop index (or counter) in a given range. In some high-level languages, the loop index can be some type other than integer; in assembly language the index is usually an integer. A for loop can be described by the following pseudocode.


for index := initialValue to finalValue loop
 ... { body of loop }
end for;

A for loop can easily be translated into a while structure.


index := initialValue;
while index ≤ finalValue loop
 ... { body of loop }
 add 1 to index;
end while;

Such a while is readily coded in 80×86 assembly language.

As an example, suppose that a collection of numbers needs to be added and no value is convenient as a sentinel. Then one might want to ask a user how many numbers are to be entered and loop for that many entries. The design looks like


prompt for tally of numbers;
input tally;
sum := 0
for count := 1 to tally loop
 prompt for number;
 input number;
 add number to sum;
end for;

Making straightforward assumptions about definitions in the data segment, here is an 80×86 implementation of the design.


 output prompt1 ; prompt for tally
 input value, 20 ; get tally (ASCII)
 atoi value ; convert to 2's complement
 mov tally, ax ; store tally

 mov edx, 0 ; sum := 0
 mov bx, 1 ; count := 1

forCount: cmp bx, tally ; count <= tally?
 jnle endFor ; exit if not
 output prompt2 ; prompt for number
 input value, 20 ; get number (ASCII)
 atod value ; convert to 2's complement
 add edx, eax ; add number to sum
 inc bx ; add 1 to count
 jmp forCount ; repeat
endFor:

In a for loop implementation where one is sure that the body of the loop will be executed at least once (i.e., initialValue ≤ finalValue), one can check the index against the final value at the end of the loop body rather than prior to the body. Other variations are also possible. Additional instructions for implementing for loops will be covered in Section 5.4.

You have already seen examples of until loops. In general, an until loop can be expressed as follows in pseudocode.


until termination condition loop
 ... { body of loop }
end until;

The body of the loop is executed at least once; then the termination condition is checked. If it is false, then the body of the loop is executed again; if true, execution continues with the statement following end until.

An 80×86 implementation of an until loop usually looks like the following code fragment.


 until: . ; start of loop body
 . .
 . .
 . ; code to check termination condition
 endUntil:

If the code to check the termination condition determines that the value is false, then there will be a jump to until. If it is determined that the value is true, then the code will either fall through to endUntil or there will be a jump to that label.

The game program implemented in Fig. 5.8 contained two simple until loops. Here is an example with a compound terminating condition. Given the design


count := 0;
until (sum > 1000) or (count = 100) loop
 ... { body of loop }
 add 1 to count;
end until;

the following 80×86 code provides an implementation. Assume that sum references a word in the data segment and that count is stored in CX.


 mov cx, 0 ; count := 0
 until: . ; body of loop
 .
 .
 inc cx ; add 1 to count
 cmp sum, 1000 ; sum > 1000 ?
 jg endUntil ; exit if sum > 1000
 cmp cx, 100 ; count = 100 ?
 jne until ; continue if count not = 100
 endUntil:

Other loop structures can also be coded in assembly language. The forever loop is frequently useful. As it appears in pseudocode, it almost always has an exit loop statement to transfer control to the end of the loop; this is often conditional—that is, in an if statement. Here is a fragment of a typical design.


forever loop
 .
 .
 .
 if (response = 's') or (response = 'S')
 then
 exit loop;
 end if;
 .
 .
 .
end loop;

Assuming that the value of response is in the AL register, this can be implemented as follows in 80×86 assembly language.


 forever: .
 .
 .
 cmp al, 's' ; response = 's'?
 je endLoop ; exit loop if so
 cmp al, 'S' ; response = 'S'?
 je endLoop ; exit loop if so
 .
 .
 .
 jmp forever ; repeat loop body
 endLoop:

Exercises 5.3

Each part of this problem contains a design with a while loop. Assume that sum references a doubleword in the data segment and that the value of count is in the ECX register. Give a fragment of 80×86 code that implements the design.
1. sum := 0; count := 1; while (sum < 1000) loop add count to sum; add 1 to count; end while;
2. sum := 0; count := 1; while (sum < 1000) and (count ≤ 50) loop add count to sum; add 1 to count; end while;
3. sum := 0; count := 100; while (sum < 1000) or (count ≥ 0) loop add count to sum; subtract 1 from count; end while;
Each part of this problem contains a design with a until loop. Assume that sum references a doubleword in the data segment and that the value of count is in the ECX register. Give a fragment of 80×86 code that implements the design.
1. sum := 0; count := 1; until (sum > 5000) loop add count to sum; add 1 to count; end until;
2. sum := 0; count := 1; until (sum > 5000) or (count = 40) loop add count to sum; add 1 to count; end until;
3. sum := 0; count := 1; until (sum ≥ 5000) and (count > 40) loop add count to sum; add 1 to count; end until;
Each part of this problem contains a design with a for loop. Assume that sum references a doubleword in the data segment and that the value of count is in the ECX register. Give a fragment of 80×86 code that implements the design.
1. sum := 0; for count := 1 to 100 loop add count to sum; end for;
2. sum := 0; for count := -10 to 50 loop add count to sum; end for;
3. sum := 1000; for count := 100 downto 50 loop subtract 2*count from sum; end for;

Programming Exercise 5.3

Write a complete 80×86 assembly language program that will accept numbers from the keyboard and report the minimum and maximum of the numbers. Implement the following design, adding appropriate labels to output.


display "First number? ";
input number;
minimum := number;
maximum := number;
while (response to "Another number? " is 'Y' or 'y') loop
 input number;
 if (number < minimum)
 then
 minimum := number;
 end if;
 if (number > maximum)
 then
 maximum := number;
 end if;
end while;
display the minimum value;
display the maximum value;

Write a complete 80×86 assembly language program that will accept numbers from the keyboard and report the sum and average of the numbers. The count of numbers is not known in advance; use the value −999999 as a sentinel to terminate input. Implement the following design, adding appropriate prompts for input and labels for output.
```
sum := 0;
count := 0;

while (number entered from keyboard ≠ -999999) loop
 add number to sum;
 add 1 to count;
end while;

if (count = 0)
then
 display "No numbers entered";
else
 average := sum/count;
 display sum and average;
end if;
```

Write a complete 80×86 assembly language program to help your overworked instructor analyze examination grades. The program will input an unknown number of examination grades, using any negative grade as a sentinel, and then report the number of As (90–100), Bs (80–89), Cs (70–79), Ds (60–69), and Fs (under 60). Implement the following design. Prompt for input as appropriate.


ACount := 0;
BCount := 0;
CCount := 0;
DCount := 0;
FCount := 0;

while (grade entered at keyboard ≥ 0) loop
 if (grade ≥ 90)
 then
 add 1 to ACount;
 elseif (grade ≥ 80)
 then
 add 1 to BCount;
 elseif (grade ≥ 70)
 then
 add 1 to CCount;
 elseif (grade ≥ 60)
 then
 add 1 to DCount;
 else
 add 1 to FCount;
 end if;
end while;

display "Number of As", ACount;
display "Number of Bs", BCount;
display "Number of Cs", CCount;
display "Number of Ds", DCount;
display "Number of Fs", FCount;

The greatest common divisor of two non-negative integers is the largest integer that evenly divides both numbers. The following algorithm will find the greatest common divisor of number1 and number2.
```
gcd := number1;
remainder := number2;

until (remainder = 0) loop
 dividend := gcd;
 gcd := remainder;
 remainder := dividend mod gcd;
end until;
```
Write a complete 80×86 assembly language program that implements the following design, with appropriate prompts for input and labels for output.
```
until (number1 > 0) loop
 input number1;
 end until;

 until (number2 > 0) loop
 input number2;
 end until;

 find gcd of number1 and number2; (see design above)
 display gcd;
```

Write a complete 80×86 assembly language program to simulate a simple calculator. The calculator does addition and subtraction operations and also accepts commands to clear the accumulated value or to quit. Implement the following design.


total := 0;

forever loop
 display "number? ";
 input number;

 display "action (+, -, c or q) ? ";
 input action;

 if (action = '+')
 then
 add number to total;
 elseif (action = '-')
 then
 subtract number from total;
 elseif (action = 'c') or (action = 'C')
 then
 total := 0;
 elseif (action = 'q') or (action = 'Q')
 then
 exit loop;
 else
 display "Unknown action";
 end if;

 display "total", total;
end loop;

5 4 for Loops in Assembly Language

Often the number of times the body of a loop must be executed is known in advance, either as a constant that can be coded when a program is written, or as the value of a variable that is assigned before the loop is executed. The for loop structure is ideal for coding such a loop.

The previous section showed how to translate a for loop into a while loop. This technique always works and is frequently the best way to code a for loop. However, the 80×86 microprocessor has instructions that make coding certain for loops very easy.

Consider the following two for loops, the first of which counts forward and the second of which counts backward.


for index := 1 to count loop
 ... { body of loop }
end for;

and


for index := count downto 1 loop
 ... { body of loop }
end for;

The body of each loop executes count times. If the value of index is not needed for display or for calculations within the body of the loop, then the loop that counts down is equivalent to the loop that counts up, although the design may not be as natural. Backward for loops are very easy to implement in 80×86 assembly language with the loop instruction.

The loop instruction has the format


 loop statementLabel

where statementLabel is the label of a statement that is a short displacement (128 bytes backward or 127 bytes forward) from the loop instruction. The loop instruction causes the following actions to take place:

the value in ECX is decremented
if the new value in ECX is zero, then execution continues with the statement following the loop instruction
if the new value in ECX is nonzero, then a jump to the instruction at statementLabel takes place

In addition to the loop instruction, there are two conditional loop instructions that are less frequently used. Features of all three instructions are summarized in Fig. 5.9. Each requires two bytes of object code; the first byte is the opcode and the second byte is the displacement to the destination statement. Two times are given for 80486 and Pentium instructions, the first showing how many clock cycles are required if the jump is not taken, and the second showing how many clock cycles are required if it is taken. The situation is more complex for the 80386, but it also has two distinct execution times. None of these instructions changes any flag.

	Clock Cycles
Mnemonic	386	486	Pentium	Number of Bytes	Opcode
loop	11+	6/7	5/6	2	E2
loope/loopz	11+	6/9	7/8	2	E1
loopne/loopnz	11+	6/9	7/8	2	E0

Figure 5.9: loop instructions

Although the ECX register is a general register, it has a special place as a counter in the loop instruction and in several other instructions. No other register can be substituted for ECX in these instructions. In practice this often means that when a loop is coded, either ECX is not used for other purposes or a counter value is put in ECX before a loop instruction is executed but is saved elsewhere to free ECX for other uses for most of the body of the loop.

The backward for loop structure


for count := 20 downto 1 loop
 ... { body of loop }
end for;

can be coded as follows in 80×86 assembly language.


 mov ecx, 20 ; number of iterations
 forCount: . ; body of loop
 .
 .
 loop forCount ; repeat body 20 times

The counter in the ECX register will be 20 the first time the body of the loop is executed and will be decremented to 19 by the loop instruction. The value 19 is not zero, so control transfers to the start of the loop body at label forCount. The second time the body of the loop is executed, the ECX register will contain 19. The last time the value in ECX will be one; it will be decremented to zero by the loop instruction, and the jump to for-Count will not be taken.

The obvious label to mark the body of a for loop is for. Unfortunately this is a reserved word in MASM. It is used for a directive that simplifies coding of for loops. Again, our primary interest is in learning how the computer works at the machine level, so this directive will not be used.

Now suppose that the doubleword in memory referenced by number contains the number of times a loop is to be executed. The 80×86 code to implement a backward for loop could be


 mov ecx, number ; number of iterations
 forIndex: . ; body of loop
 .
 .
 loop forIndex ; repeat body number times

This is safe code only if the value stored at number is not zero. If it is zero, then the loop body is executed, the zero value is decremented to FFFFFFFF (a borrow is required to do the subtraction), the loop body is executed again, the value FFFFFFFF is decremented to FFFFFFFE, and so forth. The body of the loop is executed 4,294,967,296 times before the value in ECX gets back down to zero! To avoid this problem, one could code


 mov ecx, number ; number of iterations
 cmp ecx, 0 ; number = 0 ?
 je endFor ; skip loop if number = 0
 forIndex: . ; body of loop
 .
 .
 loop forIndex ; repeat body number times
 endFor:

If number is a signed value and might be negative, then


 jle endFor ; skip loop if number <= 0

is a more appropriate conditional jump.

There is another way to guard a for loop so that it is not executed when the value in ECX is zero. The 80×86 instruction set has a jecxz conditional jump instruction that jumps to its destination if the value in the ECX register is zero. Using the jecxz instruction, the example above can be coded as


 mov ecx, number ; number of iterations
 jecxz endFor ; skip loop if number = 0
 forIndex: . ; body of loop
 .
 .
 loop forIndex ; repeat body number times
 endFor:

There is also a jcxz instruction that checks the CX register rather than the ECX register. Both instructions are two bytes long, the opcode E3 plus a single-byte displacement; the prefix byte 67 distinguishes between the 16-bit size and the 32-bit versions. Like the other conditional jump instructions, jcxz/jecxz affects no flag value. They do take longer to execute, six clock cycles on a Pentium if the jump takes place (if the value in ECX is zero), and five clock cycles to fall through to the next statement otherwise.

The jecxz instruction can be used to code a backward for loop when the loop body is longer than 127 bytes, too large for the loop instruction's single-byte displacement. For example, the structure


for counter := 50 downto 1 loop
 ... { body of loop }
end for;

could be coded as


 mov ecx, 50 ; number of iterations
 forCounter: . ; body of loop
 .
 .
 dec ecx ; decrement loop counter
 jecxz endFor ; exit if counter = 0
 jmp forCounter ; otherwise repeat body
 endFor:

However, since the dec instruction sets or resets the zero flag ZF, the faster conditional jump


 jz endFor

can be used instead of the jecxz instruction.

It is often convenient to use a loop statement to implement a for loop, even when the loop index increases and must be used within the body of the loop. The loop statement uses ECX to control the number of iterations, while a separate counter serves as the loop index.

For example, to implement the for loop


for index := 1 to 50 loop
 ...{ loop body using index }
end for;

the EBX register might be used to store index counting from 1 to 50 while the ECX register counts down from 50 to 1.


 mov ebx, 1 ; index := 1
 mov ecx, 50 ; number of iterations for loop
 forNbr: .
 . ; use value in EBX for index
 .
 inc ebx ; add 1 to index
 loop forNbr ; repeat

Figure 5.9 listed two variants of the loop instruction, loopz/loope and loopnz/loopne. Each of these work like loop, decrementing the counter in ECX. However, each examines the value of the zero flag ZF as well as the new value in the ECX register to decide whether or not to jump to the destination location. The loopz/loope instruction jumps if the new value in ECX is nonzero and the zero flag is set (ZF=1). The loopnz/loopne instruction jumps if the new value in ECX is nonzero and the zero flag is clear (ZF=0).

The loopz and loopnz instructions are useful in special circumstances. Some programming languages allow loop structures such as


for year := 10 downto 1 until balance = 0 loop
 ... { body of loop }
end for;

This confusing structure means to terminate loop execution using whichever loop control is satisfied first. That is, the body of the loop is executed 10 times (for year = 10, 9,…,1) unless the condition balance = 0 is true at the bottom of some execution of the loop body, in which case the loop terminates with fewer than 10 iterations. If the value of balance is in the EBX register, the following 80×86 code could be used.


 mov ecx, 10 ; maximum number of iterations
 forYear: . ; body of loop
 .
 .
 cmp ebx, 0 ; balance = 0 ?
 loopne forYear ; repeat 10 times if balance not 0

Exercises 5.4

Each part of this problem has a for loop implemented with a loop statement. How many times is each loop body executed?
1. mov ecx, 10 forA: . . ; body of loop . loop forA
2. mov ecx, 1 forB: . . ; body of loop . loop forB
3. mov ecx, 0 forC: . . ; body of loop . loop forC
4. mov ecx, -1 forD: . . ; body of loop . loop forD
Each part of this problem contains a design with a for loop. Assume that sum references a doubleword in the data segment. Give a fragment of 80×86 code that uses a loop statement to implement the design. Use the dtoa and output macros for display, assuming that the data segment contains
```
ASCIIcount BYTE 11 DUP (?)
ASCIIsum BYTE 11 DUP (?)
 BYTE 13, 10, 0 ; carriage return, linefeed
```
1. sum := 0; for count := 50 downto 1 loop add count to sum; display count, sum; end for;
2. sum := 0; for count := 1 to 50 loop add count to sum; display count, sum; end for;
3. sum := 0; for count := 1 to 50 loop add (2*count - 1) to sum; display count, sum; end for;

Programming Exercise 5.4

Write a complete 80×86 program to input a positive integer value N and to display a table of integers from 1 to N and their squares. Use a two-column format such as
```
number square
 1 1
 2 4
 3 9
 4 16
 5 25
```
A Pythagorean triple consists of three positive integers A, B, and C such that A2 + B2 = C2. For example, the numbers 3, 4, and 5 form a Pythagorean triple since 9 + 16 = 25. Write a complete 80×86 program to input a value for C and then display all possible Pythagorean triples with this value for C, if any. For example, if 5 is entered for the value of C, then the output might be
```
A B C
3 4 5
4 3 5
```

5 5 Arrays

Programs frequently use arrays to store collections of data values. Loops are commonly used to manipulate the data in arrays. This section shows one way to access 1-dimensional arrays in 80×86 assembly language; other techniques will appear in Chapter 9 with discussion of additional memory addressing modes.

This section contains a complete program to implement the design below. The program first accepts a collection of positive numbers from the keyboard, counting them and storing them in an array. It then calculates the average of the numbers by going back through the numbers stored in the array, accumulating the total in sum. Finally the numbers in the array are scanned again, and this time the numbers larger than the average are displayed. The first two loops could be combined, of course, with the sum being accumulated as the numbers are keyed in. As a general programming philosophy, clearer code results from separating tasks; they should be combined only if there is a real need to save execution time or bytes of object code.


 nbrElts := 0; { input numbers into array }
 get address of first item of array;

 while (number from keyboard > 0) loop
 convert number to 2's complement;
 store number at address in array;
 add 1 to nbrElts;
 get address of next item of array;
 end while;

 sum := 0; { find sum and average }
 get address of first item of array;

 for count := nbrElts downto 1 loop
 add doubleword at address in array to sum;
 get address of next item of array;
 end for;

 average := sum/nbrElts;
 display average;

 get address of first item of array; { list big numbers }

 for count := nbrElts downto 1 loop
 if doubleword of array > average
 then
 convert doubleword to ASCII;
 display value;
 end if;
 get address of next item of array;
 end for;

This design contains the curious instructions "get address of first item of array" and "get address of next item of array." These reflect the particular assembly language implementation, one which works well if the task at hand involves moving sequentially through an array. The 80×86 feature which makes this possible is register indirect addressing, first discussed in Chapter 3. The example will use the EBX register to contain the address of the word currently being accessed; then [ebx] references the doubleword at the address in the EBX register rather than the doubleword in the register itself. In the 80×86 architecture any of the general registers EAX, EBX, ECX, and EDX or the index registers EDI and ESI are appropriate for use as a "pointer." The ESI and EDI registers are often reserved for use with strings, which are usually arrays of characters. String operations are covered in Chapter 7. The program listing appears in Fig. 5.10.


; input a collection of numbers
; report their average and the numbers which are above average
; author: R. Detmer
; date: revised 9/97

.386
.MODEL FLAT

INCLUDE io.h

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD

cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character
maxNbrs EQU 100 ; size of number array

.STACK 4096

.DATA
directions BYTE cr, Lf, 'You may enter up to 100 numbers'
 BYTE ' one at a time.',cr,Lf
 BYTE 'Use any negative number to terminate
 input.',cr,Lf,Lf
 BYTE 'This program will then report the average and
 list',cr,Lf
 BYTE 'those numbers which are above the
 average.',cr,Lf,Lf,Lf,0
prompt BYTE 'Number? ',0
number BYTE 20 DUP (?)
nbrArray DWORD maxNbrs DUP (?)
nbrElts DWORD ?
avgLabel BYTE cr,Lf,Lf,'The average is'
outValue BYTE 11 DUP (?), cr,Lf,0
aboveLabel BYTE cr,Lf,'Above average:',cr,Lf,Lf,0

.CODE
_start:
; input numbers into array

 output directions ; display directions
 mov nbrElts,0 ; nbrElts := 0
 lea ebx,nbrArray ; get address of nbrArray

whilePos: output prompt ; prompt for number
 input number,20 ; get number
 atod number ; convert to integer
 jng endWhile ; exit if not positive
 mov [ebx],eax ; store number in array
 inc nbrElts ; add 1 to nbrElts
 add ebx,4 ; get address of next item of array
 jmp whilePos ; repeat
endWhile:

; find sum and average

 mov eax,0 ; sum := 0
 lea ebx,nbrArray ; get address of nbrArray
 mov ecx,nbrElts ; count := nbrElts

 jecxz quit ; quit if no numbers
forCount1: add eax,[ebx] ; add number to sum
 add ebx,4 ; get address of next item of array
 loop forCount1 ; repeat nbrElts times

 cdq ; extend sum to quadword
 idiv nbrElts ; calculate average
 dtoa outValue,eax ; convert average to ASCII
 output avgLabel ; print label and average
 output aboveLabel ; print label for big numbers

; display numbers above average
 lea ebx,nbrArray ; get address of nbrArray
 mov ecx,nbrElts ; count := nbrElts

forCount2: cmp [ebx],eax ; doubleword > average ?
 jng endIfBig ; continue if average not less
 dtoa outValue,[ebx] ; convert value from array to
 ; ASCII
 output outValue ; display value
endIfBig:
 add ebx,4 ; get address of next item of array
 loop forCount2 ; repeat

quit: INVOKE ExitProcess, 0 ; exit with return code 0

PUBLIC _start ; make entry point public

 END ; end of source code

Figure 5.10: Program using array

The design statement "get address of first item of array" is implemented by the 80×86 statement


 lea ebx, nbrArray

The mnemonic lea stands for "load effective address." The lea instruction has the format


 lea destination, source

The destination will normally be a 32-bit general register; the source is any reference to memory. The address of the source is loaded into the register. (Contrast this with mov destination, source where the value at the source address is copied to the destination.) The lea instruction has opcode 8D takes one clock cycle on a Pentium, one or two on an 80486, and two on an 80386.

The design statement "get address of next item of array" is implemented using the 80×86 statement


 add ebx, 4

Since each doubleword occupies four bytes of storage, adding 4 to the address of the current element of an array gives the address of the next element of the array.

If one were planning to code this program in a high-level language, then the design of the first two loops might be


nbrElts := 0; { input numbers into array }
while number from keyboard > 0 loop
 add 1 to nbrElts;
 store number in nbrSrray[nbrElts];
end while;

sum := 0; { find sum and average }
for count := 1 to nbrElts loop
 add nbrArray[count] to sum;
end for;

This design exploits one of the principal features of arrays, namely that any element can be accessed at any time by simply giving its index; the elements do not have to be accessed sequentially. Such random access can be implemented using register indirect addressing. For example, the design statement "add nbrArray[count] to sum" can be implemented as follows, assuming the same register usage as before—the ECX register for count and the EAX register for sum.


 mov edx,ecx ; count
 dec edx ; count-1
 add edx,edx ; 2*(count-1)
 add edx,edx ; 4*(count-1)
 lea ebx,nbrArray ; starting address of array
 add ebx,edx ; address of nbrArray[count]
 add eax,[ebx] ; add array[count] to sum

The technique here is to calculate the number of bytes in the array prior to the desired element and add this number to the starting address. There are more efficient ways to directly access an array element; these will be covered in later chapters.

Exercises 5.5

Modify the program in Fig. 5.10, adding a loop that will display those elements of the array that are smaller than the average. (The numbers that are equal to the average should not be displayed by either loop.)
Modify the program in Fig. 5.10, replacing the last loop by one that displays all numbers that are within 5 of the average. Include values equal to average–5 or to average+5.
Modify the program in Fig. 5.10, adding a loop that will display the list of numbers backwards. (Hint: Find the address of nbrArray[nbrElts], display the element at this address first, and subtract 4 repeatedly until all elements are displayed.)
Modify the program in Fig. 5.10 to ensure that the user gives at most maxNbrs values.

Programming Exercises 5.5

It is often necessary to search an array for a given value. Write a complete program that inputs a collection of integers and then sequentially searches for values stored in the array. Implement the following design.


nbrElts := 0;
get address of first item of array;
while (number from keyboard > 0) loop
 convert number to 2's complement;
 store number at address in array;
 add 1 to nbrElts;
 get address of next item of array;
end while;
until (response = 'N') or (response = 'n')
 display "Search for? ";
 input keyValue;
 convert keyValue to 2's complement;
 get address of first item of array;
 count := 1;
 forever loop
 if count > nbrElts
 then
 display keyValue, "not in array";
 exit loop;
 end if;
 if keyValue = current element of array
 then
 display keyValue, "is element", count;
 exit loop;
 end if;
 add 1 to count;
 get address of next item of array;
 end loop;
 display "Search for another number? ";
 input response;
end until;

Programming Exercise 1 above shows one way to search an array. An alternative way is to put the value you are searching for at the end of the array. A search then always finds the value, and success or failure depends on whether the value was found before or after position nbrElts. Write a complete program that uses this technique. The design is the same as in Exercise 1 except for the body of the search loop; it is replaced by the following.


until (response = 'N') or (response = 'n')
 display "Search for? ";
 input keyValue;
 convert keyValue to 2's complement;
 store keyValue at position (nbrElts+1) in array;
 get address of first item of array;
 count := 1;

while keyValue not equal to current array element loop
 add 1 to count;
 get address of next word of array;
end while;

if count > nbrElts
then
 display keyValue, "not in array";
 exit loop;
 else
 display keyValue, "is element", count;
 exit loop;
 end if;

 display "Search for another number? ";
 input response;
end until;

There are many ways to determine prime numbers. Here is a design for one way to find the first 100 primes. Implement this design in 80×86 assembly language.


prime[1] := 2; { first prime number }
prime[2] := 3; { second prime number }
primeCount := 2;
candidate := 4; { first candidate for a new prime }
while primeCount < 100 loop
 index := 1;
 while (index ≤ primeCount)
 and (prime[index] does not evenly divide
 candidate)loop
 add 1 to index;
 end while;
 if (index > primeCount)
 then {no existing prime evenly divides the candidate, so it is a new prime}
 add 1 to primeCount;
 prime[primeCount] := candidate;
 end if;
 add 1 to candidate;
end while;

display "Prime Numbers";
for index := 1 to 100 loop {display the numbers 5 per line }
 display prime[index];
 if index is divisible by 5 then skip to a new line;
 end if;
end for;

5 6 Something Extra Pipelining

Chapter 2 discussed the central processing unit's basic operation cycle:

fetch an instruction from memory
decode the instruction
execute the instruction

A CPU must have circuitry to perform each of these functions. One of the things that computer designers have done to speed up CPU operation is to design CPUs with stages that can carry out these (and other) operations almost independently.

The first stage of the CPU might have the job of fetching the next instruction from memory, perhaps doing just enough decoding to recognize the number of bytes the instruction has and update the program counter PC. The first stage passes on information to the second stage whose job might be to finish decoding the instruction, perhaps also computing some operand addresses. Meanwhile the first stage can be fetching the next instruction from memory. The second stage could pass a fully-decoded instruction to the third stage for execution. Meanwhile, the first stage could have passed on its second instruction to stage two, so that the first stage can be fetching a third instruction. This sort of design is called a pipeline. If the pipeline is kept full, the resulting throughput of the CPU is three times faster than if it had to finish the complete fetch-decode-execute process for each instruction before proceeding to the next one.

Figure 5.11 illustrates the operation of a pipeline. The instructions being processed are shown as horizontal strips of three boxes labeled with 1, 2, and 3 to indicate stages. The horizontal axis shows time. You can see that at any given time parts of three instructions are being executed.

CPU Stage	Instruction being processed
1	1	2	3	4	5	6	7	8	9	10	11
2		1	2	3	4	5	6	7	8	9	10
3			1	2	3	4	5	6	7	8	9
Time interval	1	2	3	4	5	6	7	8	9	10	11

Figure 5.11: Instructions in a pipeline

A pipelined CPU is not as simple as illustrated above. One problem may occur if, say, stage 2 needs to compute an address based on the contents of a register modified by stage 3 of the previous instruction; the register might not yet contain the correct address. A CPU can be designed to avoid such problems, usually at the cost of a "hole" in the pipeline.

A more serious problem occurs when the CPU executes a conditional jump instruction. With a conditional jump the CPU cannot tell which of two possible sequences of instructions will be executed next until the condition itself is evaluated by the last stage. Earlier stages may be working on one instruction stream, only to be forced to discard all this work and refill the pipeline from the beginning with instructions from the alternative stream.

Chapter Summary

This chapter introduced 80×86 instructions that can be used to implement many high-level design or language features including if statements, various loops structures, and arrays.

The jmp instruction unconditionally transfers control to a destination statement. It has several versions, including one that jumps to a short destination 128 bytes before or 127 bytes after the jmp and one that jumps to a near destination a 32-bit displacement away. The jmp instruction is used in implementing various loop structures, typically transferring control back to the beginning of the loop, and in the if-then-else structure at the end of the "then code" to transfer control to endif so that the else code is not also executed. A jmp statement corresponds directly to the goto statement that is available in most high-level languages.

Conditional jump instructions examine the settings of one or more flags in the flag register and jump to a destination statement or fall through to the next instruction depending on the flag values. Conditional jump instructions have short and near displacement versions. There is a large collection of conditional jump instructions. They are used in if statements and loops, often in combination with compare instructions, to check Boolean conditions.

The cmp (compare) instructions have the sole purpose of setting or resetting flags in the EFLAGS register. Each compares two operands and assigns flag values. The comparison is done by subtracting the second operand from the first. The difference is not retained as it is with a sub instruction. Compare instructions often precede conditional jump instructions.

Loop structures like while, until, and for loops can be implemented using compare, jump, and conditional jump instructions. The loop instruction provides another way to implement many for loops. To use the loop instruction, a counter is placed in the ECX register prior to the start of the loop. The loop instruction itself is at the bottom of the loop body; it decrements the value in ECX and transfers control to a destination (normally the first statement of the body) if the new value in ECX is not zero. This results in the body of the loop being executed the number of times originally placed in the ECX register. The conditional jump jecxz instruction can be used to guard against executing such a loop when the initial counter value is zero.

Storage for an array can be reserved using the DUP directive in the data segment of a program. The elements of an array can be sequentially accessed by putting the address of the first element of the array in a register and adding the size of an array element repeatedly to get to the next element. The current element is referenced using register indirect addressing. The lea (load effective address) instruction is commonly used to load the initial address of the array.

Pipelining is done by a CPU with multiple stages that work on more than one instruction at a time, doing such tasks as fetching one, while decoding another, while executing a third. This can greatly speed up CPU operation.

Chapter 6 Procedures

Preface

Branching and Looping