Computers are frequently used to manipulate characters strings as well as numeric data. In data processing applications names, addresses, and so forth must be stored and sometimes rearranged. Text editor and word processor programs must be capable of searching for and moving strings of characters. An assembler must be able to separate assembly language statement elements, identifying those that are reserved mnemonics. Even when computation is primarily numerical, it is often necessary to convert either a character string to an internal numerical format when a number is entered at the keyboard or an internal format to a character string for display purposes.
An 80x86 microprocessor has instructions to manipulate character strings. The same instructions can manipulate strings of doublewords or words. This chapter covers 80x86 instructions that are used to handle strings, with emphasis on character strings. A variety of applications are given, including procedures that are similar to those in some high-level languages and the procedure called by the dtoa macro.
Five 80x86 instructions are designed for string manipulation: movs (move string), cmps (compare string), scas (scan string), stos (store string), and lods (load string). The movs instruction is used to copy a string from one memory location to another. The cmps instruction is designed to compare the contents of two strings. The scas instruction can be used to search a string for one particular value. The stos instruction can store a new value in some position of a string. Finally, the lods instruction copies a value out of some position of a string.
A string in the 80x86 architecture refers to a contiguous collection of bytes, words, or doublewords in memory. Strings are commonly defined in a program's data segment using such directives as
response BYTE 20 DUP (?) label1 BYTE 'The results are ', 0 wordString WORD 50 DUP (?) arrayD DWORD 60 DUP (0)
Note that strings and arrays are actually the same except for the way we look at them.
Each string instruction applies to a source string, a destination string, or both. The bytes, words, or doublewords of these strings are processed one at a time by the string instruction. Register indirect addressing is used to locate the individual byte, word, or doubleword elements. The 80x86 instructions access elements of the source string using the address in the source index register ESI. Elements in the destination string are accessed using the address in the destination index register EDI. If you program using a segmented memory model, then you also must know that the source element is in the data segment (at address DS:ESI) while the destination element is in the extra segment (at address ES:EDI). With flat memory model programming, both segment registers contain the same segment number and no distinction between segments.
Since the source and destination addresses of string elements are always given by ESI and EDI, respectively, no operands are needed to identify these locations. Without any operand, however, the assembler cannot tell the size of the string element to be used. For example, just movs by itself could say to move a byte, a word, or a doubleword. The Microsoft Macro Assembler offers two ways around this dilemma. The first method is to use destination and source operands; these are ignored except that MASM notes their type (both operands must be the same type) and uses that element size. The second method is to use special versions of the mnemonics that define the element size-instructions that operate on bytes use a b suffix, word string instructions use a w suffix, and doubleword string instructions use a d suffix. For example, movsb is used to move byte strings, movsw is used to move word strings, and movsd is used to move doubleword strings. Any of these instructions assemble as a movs and none uses an operand since the assembler knows the element size from the mnemonic. This book will use mnemonics with b, w, and d suffixes rather than operands for string instructions.
Although a string instruction operates on only one string element at a time, it always gets ready to operate on the next element. It does this by changing the source index register ESI and/or the destination index register EDI to contain the address of the next element of the string(s). When byte-size elements are being used, the index registers are changed by one; for words, ESI and EDI are changed by two; and for doublewords, the registers are changed by four. The 80x86 can move either forward through a string, from lower to higher addresses, or backward, from higher to lower addresses. The movement direction is determined by the value of the direction flag DF, bit 10 of the EFLAGS register. If DF is set to 1, then the addresses in ESI and EDI are decremented by string instructions, causing right to left string operations. If DF is clear (0), then the values in ESI and EDI are incremented by string instructions, so that strings are processed left to right.
The 80x86 has two instructions whose sole purpose is to reset or set the direction flag DF. The cld instruction clears DF to 0 so that ESI and EDI are incremented by string instructions and strings are processed left to right. The std instruction sets DF to 1 so that strings are processed backward. Neither instruction affects any flag other than DF. Technical data about these instructions appear in Fig. 7.1.
Clock Cycles |
|||||
---|---|---|---|---|---|
Instruction |
386 |
486 |
Pentium |
Number of Bytes |
Opcode |
cld |
2 |
2 |
2 |
1 |
FC |
std |
2 |
2 |
2 |
1 |
FD |
Figure 7.1: cld and std instructions
Finally it is time to present all the details about a string instruction. The move string instruction movs transfers one string element (byte, word, or doubleword) from a source string to a destination string. The source element at address DS:ESI is copied to address ES:EDI. After the string element is copied, both index registers are changed by the element size (1, 2, or 4), incremented if the direction flag DF is 0 or decremented if DF is 1. The movs instruction does not affect any flag. It comes in movsb, movsw, and movsd versions; Fig. 7.2 gives information about each form.
Clock Cycles |
||||||
---|---|---|---|---|---|---|
Mnemonic |
Element size |
386 |
486 |
Pentium |
Number of Bytes |
Opcode |
movsb |
byte |
7 |
7 |
4 |
1 |
A4 |
movsw |
word |
7 |
7 |
4 |
1 |
A5 |
movsd |
doubleword |
7 |
7 |
4 |
1 |
A5 |
Figure 7.2: movs instructions (use ESI and EDI)
Figure 7.3 gives an example of a program that uses the movs instruction. The important part of the example is the procedure strcopy. This procedure has two parameters passed on the stack, which give the destination and source addresses of byte or character strings. The source string is assumed to be null-terminated. Procedure strcopy produces an exact copy of the source string at the destination location, terminating the destination string by a null byte.
; test of "strcopy" procedure ; author: R. Detmer revised: 10/97 .386 .MODEL FLAT ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD INCLUDE io.h ; header file for input/output cr equ 0dh ; carriage return character Lf equ 0ah ; line feed .STACK 4096 ; reserve 4096-byte stack .DATA ; reserve storage for data prompt BYTE cr, Lf, "Original string? ",0 stringIn BYTE 80 DUP (?) display BYTE cr, Lf, "Your string was...", cr, Lf stringOut BYTE 80 DUP (?) .CODE _start: output prompt ; ask for string input stringIn, 80 ; get source string lea eax, stringOut ; destination address push eax ; first parameter lea eax, stringIn ; source push eax ; second parameter call strcopy ; call string copy procedure output display ; print result INVOKE ExitProcess, 0 ; exit with return code 0 PUBLIC _start ; make entry point public strcopy PROC NEAR32 ; Procedure to copy string until null byte in source is copied. ; It is assumed that destination location is long enough for copy. ; Parameters are passed on the stack: ; (1) address of destination ; (2) address of source push ebp ;save base pointer mov ebp,esp ;copy stack pointer push edi ;save registers and flags push esi pushf mov esi,[ebp+8] ;initial source address mov edi,[ebp+12] ;destination cld ;clear direction flag whileNoNull: cmp BYTE PTR [esi],0 ;null source byte? je endWhileNoNull ;stop copying if null movsb ;copy one byte jmp whileNoNull ;go check next byte endWhileNoNull: mov BYTE PTR [edi],0 ;terminate destination string popf ;restore flags and registers pop esi pop edi pop ebp ret 8 ;exit procedure, discarding parameters strcopy ENDP END
Figure 7.3: String copy program
The procedure only uses registers ESI and EDI. It saves each of these and the flag register on the stack so that the procedure will return with them unchanged. The index registers ESI and EDI must be initialized to the addresses of the first string bytes to be processed. The values for ESI and EDI are the arguments that were pushed on the stack. The direction flag is cleared for left-to-right copying.
After initialization, the procedure executes the following pseudocode design:
while next source byte is not null copy source byte to destination; increment source index; increment destination index; end while; put null byte at end of destination string;
To check whether the next source byte is null, the statement
cmp BYTE PTR [esi],0 ; null source byte?
is used. Recall that the notation [esi] indicates register indirect addressing, so that the element at the address in ESI is used, that is, the current byte of the source string. The operator BYTE PTR is necessary since MASM cannot tell from the operands [esi] and 0 whether byte, word, or doubleword comparison is needed. Copying the source byte and incrementing both index registers is accomplished by the movsb instruction. Finally,
mov BYTE PTR [edi],0 ; terminate destination string
serves to move a null byte to the end of the destination string since EDI was incremented after the last byte of the source was copied to the destination. Again, the operator BYTE PTR tells MASM that the destination is a byte rather than a word or doubleword.
The program to test strcopy simply inputs a string from the keyboard, calls strcopy to copy it somewhere else, and finally displays the string copy. The most interesting part of the code is the collection of instructions needed to call the procedure. The arguments are not removed from the stack since the procedure does that job.
Normally the source string for a movs instruction does not overlap the destination string. However, occasionally this is useful. Suppose that you want to initialize a 80 character-long string at starSlash with the pattern */, repeated 40 times. The following code can do this task.
starSlash BYTE 80 DUP (?) ... mov starSlash, '*' ; first * mov starSlash+1, '/' ; first / lea esi, starSlash ; source address lea edi, starSlash+2 ; destination cld ; process left to right mov ecx, 78 ; characters to copy forCount: movsb ; copy next character loop forCount ; repeat
In this example, the first time movsb is executed, a * from the first string position is copied to the third position. The next iteration, a / is copied from the second to the fourth position. The third time, a * is copied from the third to the fifth position, and so on. The next section introduces an easier way to repeat a movs instruction.
Exercises 7.1
.386 .MODEL FLAT ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD INCLUDE io.h ; header file for input/output cr equ 0dh ; carriage return character Lf equ 0ah ; line feed .STACK 4096 ; reserve 4096-byte stack .DATA ; global data string BYTE 'ABCDEFGHIJ' BYTE cr, Lf, 0 .CODE setup1 PROC NEAR32 lea esi, string ; beginning of string lea edi, string+5 ; address of 'F' cld ; forward movement ret setup1 ENDP _start: call setup1 ; set source, destination, direction movsb ; move 4 characters movsb movsb movsb output string ; display modified string INVOKE ExitProcess, 0 ; exit with return code 0 PUBLIC _start ; make entry point public END
setup2 PROC NEAR32 lea esi, string ; beginning of string lea edi, string+2 ; address of 'C' cld ; forward movement ret setup2 ENDP
setup3 PROC NEAR32 lea esi, string+9 ; end of string lea edi, string+4 ; address of 'E' std ; backward movement ret setup3 ENDP
setup4 PROC NEAR32 lea esi, string+9 ; end of string lea edi, string+7 ; address of 'H' std ; backward movement ret setup4 ENDP
Programming Exercises 7.1
Each 80x86 string instruction operates on one string element at a time. However, the 80x86 architecture includes three repeat prefixes that change the string instructions into versions that repeat automatically either for a fixed number of iterations or until some condition is satisfied. The three repeat prefixes actually correspond to two different single-byte codes; these are not themselves instructions, but supplement machine codes for the primitive string instructions, making new instructions.
Figure 7.4 shows two program fragments, each of which copies a fixed number of characters from sourceStr to destStr. The number of characters is loaded into the ECX register from count. The code in part (a) uses a loop. Since the count of characters might be zero, the loop is guarded by a jecxz instruction. The body of the loop uses movsb to copy one character at a time. The loop instruction takes care of counting loop iterations. The program fragment in part (b) is functionally equivalent to the one in part (a). After the count is copied into ECX, it uses the repeat prefix rep with a movsb instruction; the rep movsb instruction does the same thing as the last four lines in part (a).
lea esi, sourceStr ; source string lea edi, destStr ; destination cld ; forward movement mov ecx, count ; count of characters to copy jecxz endCopy ; skip loop if count is zero copy: movsb ; move 1 character loop copy ; decrement count and continue endCopy: (a) movsbiterated in a loop lea esi, sourceStr ; source string lea edi, destStr ; destination cld ; forward movement mov ecx, count ; count of characters to copy rep movsb ; move characters (b) repeat prefix with movsb
Figure 7.4: Copying a fixed number of characters of a string
The rep prefix is normally used with the movs instructions and with the stos instruction (discussed below). It causes the following design to be executed:
while count in ECX > 0 loop perform primitive instruction; decrement ECX by 1; end while;
Note that this is a while loop. The primitive instruction is not executed at all if ECX contains zero. It is not necessary to guard a repeated string instruction as it often is with an ordinary for loop implemented with the loop instruction.
The other two repeat prefixes are repe, with equivalent mnemonic repz, and repne, which is the same as repnz. The mnemonic repe stands for "repeat while equal" and repz stands for "repeat while zero." Similarly repne and repnz mean "repeat while not equal" and "repeat while not zero," respectively. Each of these repeat prefixes is appropriate for use with the two string instructions cmps and scas, which affect the zero flag ZF.
The names of these mnemonics partially describe their actions. Each instruction works the same as rep, iterating a primitive instruction while ECX is not zero. However, each also examines ZF after the string instruction is executed. The repe and repz continue iterating while ZF=1, as it would be following a comparison where two operands were equal. The repne and repnz continue iterating while ZF=0, as it would be following a comparison where two operands were different. Repeat prefixes themselves do not affect any flag. The three repeat prefixes are summarized in Fig. 7.5. Note that rep and repz (repe) generate exactly the same code.
Mnemonic |
Loop while |
Number of bytes |
Opcode |
---|---|---|---|
rep |
ECX>0 |
1 |
F3 |
repz/repe |
ECX>0 and ZF=1 |
1 |
F3 |
repnz/repne |
ECX>0 and ZF=0 |
1 |
F2 |
Figure 7.5: Repeat prefixes
The repz and repnz prefixes do not quite produce true while loops with the conditions shown in Fig. 7.5. The value in ECX is checked prior to the first iteration of the primitive instruction, as it should be with a while loop. However, ZF is not checked until after the primitive instruction is executed. In practice, this is very convenient since the instruction is skipped for a zero count, but the programmer does not have to do anything special to initialize ZF prior to repeated instructions.
Figure 7.6 shows how the repeat prefix rep combines with the movs instructions. In the clock cycles columns, there is a "set up" time plus a time for each iteration. The n in the table represents the number of iterations, so that, for example, a rep movsb takes 33 (13+4×5) clock cycles on a Pentium to move five bytes. (The table entries are not strictly accurate since there are special timings for the 486 and Pentium when n=0 or n=1.)
Element |
Clock Cycles |
Number |
||||
---|---|---|---|---|---|---|
Mnemonic |
size |
386 |
486 |
Pentium |
of bytes |
Opcode |
rep movsb |
byte |
7+4n |
12+3n |
13+4n |
2 |
F3 A4 |
rep movsw |
word |
F3 A5 |
||||
rep movsd |
doubleword |
F3 A5 |
Figure 7.6: rep movs instructions
The cmps instructions, summarized in Fig. 7.7, compare elements of source and destination strings. Chapter 5 explained how a cmp instruction subtracts two operands and sets flags based on the difference. Similarly, cmps subtracts two string elements and sets flags based on the difference; neither operand is changed. If a cmps instruction is used in a loop, it is appropriate to follow cmps by almost any of the conditional jump instructions, depending on the design being implemented.
Element |
Clock Cycles |
Number |
||||
---|---|---|---|---|---|---|
Mnemonic |
size |
386 |
486 |
Pentium |
of bytes |
Opcode |
cmpsb |
byte |
10 |
8 |
5 |
1 |
A6 |
cmpsw |
word |
A7 |
||||
cmpsd |
doubleword |
A7 |
||||
repe cmpsb |
byte |
5+9n |
7+7n |
9+4n |
2 |
F3 A6 |
repe cmpsw |
word |
F3 A7 |
||||
repe cmpsd |
doubleword |
F3 A7 |
||||
repne cmpsb |
byte |
5+9n |
7+7n |
9+4n |
2 |
F2 A6 |
repne cmpsw |
word |
F2 A7 |
||||
repne cmpsd |
doubleword |
F2 A7 |
Figure 7.7: cmps instructions
Repeat prefixes are often used with cmps instructions. In fact, for the task of finding if two strings are identical, the repe prefix is a perfect companion for cmps. Figure 7.7 summarizes all the cmps instructions, including repeated ones. Again, the timings are not strictly accurate for 486 and Pentium CPUs; for rep cmps there are special timings when n=0.
It is often necessary to search for one string embedded in another. Suppose that the task at hand is to find the position, if any, at which the string at key appears in the string at target. One simple algorithm to do this is
position := 1; while position < (targetLength - keyLength + 1) loop if key matches the substring of target starting at position then report success; exit process; end if; add 1 to position; end while; report failure;
This algorithm checks to see if the key string matches the portion of the target string starting at each possible position. Using 80x86 registers, checking for one match can be done as follows:
ESI := address of key; EDI := address of target + position - 1; ECX := length of key; forever loop if ECX = 0 then exit loop; end if; compare [ESI] and [EDI] setting ZF; increment ESI; increment EDI; decrement ECX; if ZF = 0 then exit loop; end if; end loop; if ZF = 1 then match was found; end if;
The forever loop is exactly what is done by the repeated string instruction repe cmpsb. Since the loop is terminated when either ECX = 0 or when ZF = 0, it is necessary to be sure that the last pair of characters compared were the same; this is the reason for the extra if structure at the end of the design. Figure 7.8 shows a complete program that implements this design.
; program to search for one string embedded in another ; author: R. Detmer revised: 10/97 .386 .MODEL FLAT ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD INCLUDE io.h cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character .STACK 4096 ; reserve 4096-byte stack .DATA prompt1 BYTE "String to search? ", 0 prompt2 BYTE cr, Lf, "Key to search for? ", 0 target BYTE 80 DUP (?) key BYTE 80 DUP (?) trgtLength DWORD ? keyLength DWORD ? lastPosn DWORD ? failure BYTE cr,Lf,Lf,"The key does not appear in the string.",cr,Lf,0 success BYTE cr,Lf,Lf,'The key appears at position' position BYTE 11 DUP (?) BYTE " in the string.", cr, Lf, 0 PUBLIC _start ; make entry point public .CODE _start: output prompt1 ; ask for input target,80 ; and input target string lea eax, target ; find length of string push eax ; length parameter call strlen mov trgtLength,eax ; save length of target output prompt2 ; ask for input key,80 ; and input key string lea eax, key ; find length of string push eax ; length parameter call strlen mov keyLength,eax ; save length of key ; calculate last position of target to check mov eax,trgtLength sub eax,keyLength inc eax ; trgtLength − keyLength + 1 mov lastPosn, eax cld ; left to right comparison mov eax,1 ; starting position whilePosn: cmp eax,lastPosn ; position <= last_posn? jnle endWhilePosn ; exit if past last position lea esi,target ; address of target string add esi,eax ; add position dec esi ; address of position to check lea edi,key ; address of key mov ecx,keyLength ; number of positions to check repe cmpsb ; check jz found ; exit on success inc eax ; increment position jmp whilePosn ; repeat endWhilePosn: output failure ; the search failed jmp quit ; exit found: dtoa position,eax ; convert position to ASCII output success ; search succeeded quit: INVOKE ExitProcess, 0 ; exit with return code 0 strlen PROC NEAR32 ; find length of string whose address is passed on stack ; length returned in EAX push ebp ; establish stack frame mov ebp, esp pushf ; save flags push ebx ; and EBX sub eax, eax ; length := 0 mov ebx, [ebp+8] ; address of string whileChar: cmp BYTE PTR [ebx], 0 ; null byte? je endWhileChar ; exit if so inc eax ; increment length inc ebx ; point at next character jmp whileChar ; repeat endWhileChar: pop ebx ; restore registers and flags popf pop ebp ret 4 ; return, discarding parameter strlen ENDP END
Figure 7.8: String search program
The scan string instruction scas is used to scan a string for the presence or absence of a particular string element. The string that is examined is a destination string; that is, the address of the element being examined is in the destination index register EDI. With a scasb instruction, the element searched for is the byte in the AL register; with a scasw, it is the word in the AX register; and with a scasd, it is the doubleword in the EAX register. The scasb, scasw, and scasd instructions use no operand since the mnemonics tell the element size. Figure 7.9 summarizes the scas instructions; as with the previous repeated instructions, there are special timings for n=0 on the 486 and Pentium.
Element |
Clock Cycles |
Number |
||||
---|---|---|---|---|---|---|
Mnemonic |
size |
386 |
486 |
Pentium |
of bytes |
Opcode |
scasb |
byte |
7 |
6 |
4 |
1 |
AE |
scasw |
word |
AF |
||||
scasd |
doubleword |
AF |
||||
repe scasb |
byte |
5+8n |
7+5n |
9+4n |
2 |
F3 AE |
repe scasw |
word |
F3 AF |
||||
repe scasd |
doubleword |
F3 AF |
||||
repne scasb |
byte |
5+8n |
7+5n |
9+4n |
2 |
F2 AE |
repne scasw |
word |
F2 AF |
||||
repne scasd |
doubleword |
F2 AF |
Figure 7.9: scas instructions (use EDI)
The program shown in Fig. 7.10 inputs a string and a character and uses repne scasb to locate the position of the first occurrence of the character in the string. It then displays the part of the string from the character to the end. The length of the string is calculated using the strlen procedure that previously appeared in Fig. 7.8; this time we assume that strlen is separately assembled. The lea instruction is used to load the offset of the string to be searched and cld ensures a forward search.
; Program to locate a character within a string. ; The string is displayed from the character to the end. ; author: R. Detmer revised: 10/97 .386 .MODEL FLAT ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD INCLUDE io.h EXTRN strlen:NEAR32 PUBLIC _start cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character .STACK 4096 ; reserve 4096-byte stack .DATA prompt1 BYTE "String? ", 0 prompt2 BYTE cr, Lf, Lf, "Character? ", 0 string BYTE 80 DUP (?) char BYTE 5 DUP (?) label1 BYTE cr, Lf, Lf, "The rest of the string is ---", 0 crlf BYTE cr, Lf, 0 .CODE _start: output prompt1 ; prompt for string input string,80 ; get string lea eax, string ; find length of string push eax ; length parameter call strlen inc ecx ; include null in string length mov ecx, eax ; save length of string output prompt2 ; prompt for character input char,5 ; get character mov al, char ; character to AL lea edi, string ; offset of string cld ; forward movement repne scasb ; scan while character not found dec edi ; back up to null or matching character output label1 ; print label output [edi] ; output string output crlf ; skip to new line INVOKE ExitProcess, 0 ; exit with return code 0 END
Figure 7.10: Program to find character in string
After the search, the destination index EDI will be one greater than desired since a string instruction always increments index registers whether or not flags were set. If the search succeeded, EDI will contain the address of the character following the one that matched with AL, or the address of the character after the end of the string if ECX was decremented to zero. The dec edi instruction takes care of both cases, backing up to the position of the matching character if there was one, or to the null byte at the end of the string otherwise. The string length was incremented so that the null character would be included in the search. The output macro displays the last portion of the string, whose address is in EDI.
The store string instruction stos copies a byte, a word, or a doubleword from AL, AX, or EAX to an element of a destination string. A stos instruction affects no flag, so that when it is repeated with rep, it copies the same value into consecutive positions of a string. For example, the following code will store spaces in the first 30 bytes of string.
mov ecx,30 ; 30 bytes mov al, ' ' ; character to store lea edi, string ; address of string cld ; forward direction rep stosb ; store spaces
Information about the stos instructions is in Fig. 7.11. As with previous repeated string instructions, the 80486 and Pentium have special timings when n=0.
Element |
Clock Cycles |
Number |
||||
---|---|---|---|---|---|---|
Mnemonic |
size |
386 |
486 |
Pentium |
of bytes |
Opcode |
stosb |
byte |
4 |
5 |
3 |
1 |
AA |
stosw |
word |
AB |
||||
stosd |
doubleword |
AB |
||||
rep stosb |
byte |
5+5n |
7+4n |
9n |
2 |
F3 A6 |
rep stosw |
word |
F3 A7 |
||||
rep stosd |
doubleword |
F3 A7 |
Figure 7.11: stos instructions (use EDI)
The load string instruction lods is the final string instruction. This instruction copies a source string element to the AL, AX, or EAX register, depending on the string element size. A lods instruction sets no flag. It is possible to use a rep prefix with lods but it is not helpful-all values except for the last string element would be replaced as successive values were copied to the destination register. A lods instruction is useful in a loop set up with other instructions, making it possible to easily process string elements one at a time. The lods instructions are summarized in Fig. 7.12. Repeated versions are not included since they are not used.
Element |
Clock Cycles |
Number |
||||
---|---|---|---|---|---|---|
Mnemonic |
size |
386 |
486 |
Pentium |
of bytes |
Opcode |
lodsb |
byte |
5 |
5 |
2 |
1 |
AC |
lodsw |
word |
AD |
||||
lodsd |
doubleword |
AD |
Figure 7.12: lods instructions (use ESI)
Exercises 7.2
For each exercise below, assume that the data segment contains
source BYTE "brown" dest BYTE "brine"
lea esi, source lea edi, dest cld mov ecx, 5 repne cmpsb
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the repne cmpsb instruction? What will be stored in ECX?
lea esi, source lea edi, dest cld mov ecx, 5 repe cmpsb
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the repe cmpsb instruction? What will be stored in ECX?
mov al, 'w' lea edi, dest cld mov ecx, 5 repe scasb
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the repe scasb instruction? What will be stored in ECX?
mov al, 'n' lea edi, dest cld mov ecx, 5 repne scasb
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the repne scasb instruction? What will be stored in ECX?
mov al, '*' lea edi, dest cld mov ecx, 5 rep stosb
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the rep stosb instruction? What will be stored in ECX? What will be stored in the destination string?
lea esi, source lea edi, dest cld mov ecx, 5 for6: lodsb inc al stosb loop for6 endFor6:
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the for loop? What will be stored in ECX? What will be stored in the destination string?
lea esi, source lea edi, dest cld mov ecx, 3 rep movsb
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the rep movsb instruction? What will be stored in ECX? What will be stored in the destination string?
lea esi, source+4 lea edi, dest+4 std mov ecx, 3 rep movsb
Assuming that ESI starts at 00010010 and EDI starts at 00010015, what will be the values stored in ESI and EDI following the rep movsb instruction? What will be stored in ECX? What will be stored in the destination string?
Programming Exercises 7.2
Sometimes character data are available in one format but need to be in another format for processing. One instance of this occurs when characters are transmitted between two computer systems, one normally using ASCII character codes and the other normally using EBCDIC character codes. Another time character codes need to be altered is to transmit them to a device that cannot process all possible codes; it is sometimes easier to replace the unsuitable codes by acceptable codes than to delete them entirely.
The 80x86 instruction set includes the xlat instruction to translate one character to another character. In combination with other string-processing instructions, it can easily translate all the characters in a string.
The xlat instruction requires only one byte of object code, the opcode D7. It takes five clock cycles to execute on an 80386, and four clock cycles on an 80486 or a Pentium. Prior to execution, the character to be translated is in the AL register. The instruction works by using a translation table in the data segment to look up the translation of the byte in AL. This translation table normally contains 256 bytes of data, one for each possible 8-bit value in AL. The byte at offset zero in the table-the first byte-is the character to which 00 is translated. The byte at offset one is the character to which 01 is translated. In general xlat uses the character being translated as an offset into the table, and the byte at that offset then replaces the character in AL.
The xlat instruction has no operand. The EBX register must contain the address of the translation table.
Figure 7.13 illustrates a short program which translates each character of string in place; that is, it replaces each character by its translation using the original location in memory. The heart of the program is the translation table and the sequence of instructions
; Translate uppercase letters to lowercase; don't change lower ; case letters and digits. Translate other characters to spaces. ; author: R. Detmer revised: 10/97 .386 .MODEL FLAT ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD INCLUDE io.h PUBLIC _start cr EQU 0dh ; carriage return character Lf EQU 0ah ; linefeed character .STACK 4096 ; reserve 4096-byte stack .DATA string BYTE 'This is a #!$& STRING',0 strLength EQU $ - string - 1 label1 BYTE 'Original string ->',0 label2 BYTE cr, Lf, 'Translated string ->',0 crlf BYTE cr, Lf, 0 table BYTE 48 DUP (' '), '0123456789', 7 DUP (' ') BYTE 'abcdefghijklmnopqrstuvwxyz', 6 DUP (' ') BYTE 'abcdefghijklmnopqrstuvwxyz', 133 DUP (' ') .CODE _start: output label1 ; display original string output string output crlf mov ecx, strLength ; string length lea ebx, table ; address of translation table lea esi, string ; address of string lea edi, string ; destination also string forIndex: lodsb ; copy next character to AL xlat ; translate character stosb ; copy character back into string loop forIndex ; repeat for all characters output label2 ; display altered string output string output crlf INVOKE ExitProcess, 0 END
Figure 7.13: Translation program
mov ecx, strLength ; string length lea ebx, table ; address of translation table lea esi, string ; address of string lea edi, string ; destination also string forIndex: lodsb ; copy next character to AL xlat ; translate character stosb ; copy character back into string loop forIndex ; repeat for all characters
These instructions implement a for loop with the design
for index := 1 to stringLength loop load source character into AL; translate character in AL; copy character in AL to destination; end for;
One new feature in this program is the use of the location counter symbol $. Recall that the assembler calculates addresses as if they start at 00000000, and increments a counter every time it generates bytes of object code. The dollar sign symbol refers to the value of this counter at the time it is encountered in assembly. In this particular program, it will be at the address just beyond the null byte of the string. Since the symbol string actually references its address, the expression string - $ is the length of string, including the null byte. The value equated to strLength is string - $ - 1, which excludes the null byte. The assembly process will be discussed more in Chapter 9.
Each ASCII code is translated to another ASCII code by this program. Uppercase letters are translated to lowercase, lowercase letters and digits are unchanged, and all other characters are translated to spaces. Construction of such a table involves looking at a table of ASCII codes (see Appendix A). For this program the translation table is defined by
table BYTE 48 DUP (' '), '0123456789', 7 DUP (' ') BYTE 'abcdefghijklmnopqrstuvwxyz', 6 DUP (' ') BYTE 'abcdefghijklmnopqrstuvwxyz', 133 DUP (' ')
Careful counting will show that exactly 256 bytes are defined. Recall that a BYTE directive stores the ASCII code of each character operand. Each of the first 48 bytes of the table will contain the ASCII code for a space (i.e., blank), 2016. Therefore if the code in the AL register represents any of the first 48 ASCII characters-a control character, or one of the printable characters from 2016 (space) to 2F16 (/)-it will be translated to a space.
Note that it is legal to translate a character to itself. Indeed, this is what will happen for digits; the ASCII codes 3016 to 3916 for digits 0 through 9 appear at offsets 3016 to 3916. The codes for the seven characters : through @ are next in an ASCII chart; each of these will be translated to a space. The next ASCII characters are the uppercase letters and the next entries in the table are codes for the lowercase letters. For example, the table contains 6116 at offset 4116, so an uppercase A (ASCII code 4116) will be translated to a lower case a (ASCII code 6116). The next six blanks are at the offsets 9116 ([) through 9616 (‘), so that each of these characters is translated to a blank. The ASCII code for each lowercase letter is assembled at an offset equal to its value, so each lowercase letter is translated to itself. Finally, the translation table contains 133 ASCII codes for blanks; these are the destinations for {, |, }, ~, DEL, and each of the 128 bit patterns starting with a one, none of them codes for ASCII characters.
Figure 7.14 shows the output of the program in Fig. 7.13. Notice that "strange" characters are not deleted, they are replaced by blanks.
Original string ->This is a #!$& STRING Translated string ->this is a string
Figure 7.14: Output from translation program
Exercises 7.3
81 a |
C1 A |
40 space |
82 b |
C2 B |
4B . |
83 c |
C3 C |
6B , |
84 d |
C4 D |
|
85 e |
C5 E |
F0 0 |
86 f |
C6 F |
F1 1 |
87 g |
C7 G |
F2 2 |
88 h |
C8 H |
F3 3 |
89 i |
C9 I |
F4 4 |
91 j |
D1 J |
F5 5 |
92 k |
D2 K |
F6 6 |
93 l |
D3 L |
F7 7 |
94 m |
D4 M |
F8 8 |
95 n |
D5 N |
F9 9 |
96 o |
D6 O |
|
97 p |
D7 P |
|
98 q |
D8 Q |
|
99 r |
D9 R |
|
A2 s |
E2 S |
|
A3 t |
E3 T |
|
A4 u |
E4 U |
|
A5 v |
E5 V |
|
A6 w |
E6 W |
|
A7 X |
E7 X |
|
A8 y |
E8 Y |
|
A9 z |
E9 Z |
Give a translation table that would be suitable for xlat translation of EBCDIC codes for letters, digits, space, period, and comma to the corresponding ASCII codes, translating every other EBCDIC code to a null character.
movzx eax, al ; clear high order bits in EAX mov al, [ebx+eax] ; copy new character from table to AL
Given that [ebx+eax] references the memory byte at the address that is the sum of the contents of EBX and EAX, explain why this pair of instructions is equivalent to a single xlat instruction.
Programming Exercises 7.3
The dtoa and itoa macros have been used to convert 2’s complement integers to strings of ASCII characters for output. The code for these operations is similar. In this section we examine the slightly shorter code for itoa.
The itoa macro expands into the following sequence of instructions.
push ebx ; save EBX mov bx, source push bx ; source parameter lea ebx, dest ; destination address push ebx ; destination parameter call itoaproc ; call itoaproc(source,dest) pop ebx ; restore EBX
These instructions call procedure itoaproc after pushing the source value and the destination address on the stack. The actual source and destination are used in the expanded macro, not the names source and dest. So that the user does not need to worry about any register contents being altered, EBX is initially saved on the stack and is restored at the end of the sequence. The parameters are removed from the stack by procedure itoaproc since the alternative add esp,6 following the call instruction potentially changes the flags.
The real work of 2’s complement integer to ASCII conversion is done by the procedure itoaproc. The assembled version of this procedure is contained in the file IO.OBJ. The source code from file IO.ASM is shown in Fig. 7.15. The procedure begins by saving all of the registers that it alters on the stack; the flag register is also saved so that the procedure call to itoaproc will not change flag settings. The flag register and other registers are restored immediately before returning from the procedure.
; itoaproc(source, dest) ; convert integer (source) to string of 6 characters at destination address itoaproc PROC NEAR32 push ebp ; save base pointer mov ebp, esp ; establish stack frame push eax ; Save registers push ebx ; used by push ecx ; procedure push edx push edi pushf ; save flags mov ax, [ebp+12] ; first parameter (source integer) mov edi, [ebp+8] ; second parameter (dest address) ifSpecial: cmp ax,8000h ; special case −32,768? jne EndIfSpecial ; if not, then normal case mov BYTE PTR [edi],'-' ; manually put in ASCII codes mov BYTE PTR [edi+1],'3' ; for −32,768 mov BYTE PTR [edi+2],'2' mov BYTE PTR [edi+3],'7' mov BYTE PTR [edi+4],'6' mov BYTE PTR [edi+5],'8' jmp ExitIToA ; done with special case EndIfSpecial: mov dx, ax ; save source number mov al,' ' ; put blanks in mov ecx,5 ; first five cld ; bytes of rep stosb ; destination field mov ax, dx ; copy source number mov cl,' ' ; default sign (blank for +) IfNeg: cmp ax,0 ; check sign of number jge EndIfNeg ; skip if not negative mov cl,'-' ; sign for negative number neg ax ; number in AX now >= 0 EndIfNeg: mov bx,10 ; divisor WhileMore: mov dx,0 ; extend number to doubleword div bx ; divide by 10 add dl,30h ; convert remainder to character mov [edi],dl ; put character in string dec edi ; move forward to next position cmp ax,0 ; check quotient jnz WhileMore ; continue if quotient not zero mov [edi],cl ; insert blank or "-" for sign ExitIToA: popf ; restore flags and registers pop edi pop edx pop ecx pop ebx pop eax pop ebp ret 6 ;exit, discarding parameters itoaproc ENDP
Figure 7.15: Integer to ASCII conversion procedure
The basic idea of the procedure is to build a string of characters right to left by repeatedly dividing the number by 10, using the remainder to determine the rightmostcharacter. For instance, dividing the number 2895 (0B4F16) by 10 gives a remainder of 5 and a quotient of 289 (012116), the last digit of the number and a new number with which to repeat the process. This scheme works nicely for positive numbers, but a negative number must be changed to its absolute value before starting the division loop. To complicate things further, the bit pattern 800016 represents the negative number −32,76810, but +32,768 cannot be represented in 2’s complement form in a 16-bit word.
After standard entry code, the value parameter is copied to AX and the destination address to EDI. The procedure then checks for the special case 800016. If this is the value, then the ASCII codes for 32768 are moved one at a time to the destination, using the fact that the destination address is in EDI. The location for the minus sign is in the EDI register, so register indirect addressing can be used to put this character in the correct memory byte. The location for the character 3 is one byte beyond the address contained in EDI; this address is referenced by [edi+1]. The remaining four characters are similarly put in place, and the procedure is exited.
The next step of the procedure is to put five leading blanks in the six-byte-long destination field. The procedure does this with a rep stosb, which uses EDI to point to successive bytes in destination field. Note that EDI is left pointing at the last byte of the destination field.
The procedure next stores the correct "sign" in the CL register. A blank is used for a number greater than or equal to zero, and minus character (−) is used for a negative number. A negative number is also negated, giving its absolute value for subsequent processing.
Finally the main idea is executed. The divisor 10 is placed in the BX register. The non-negative number is extended to a doubleword by moving zeros to DX. Division by 10 in BX gives a remainder from 0 to 9 in DX, the last decimal digit of the number. This is converted to the corresponding ASCII code by adding 3016; recall that the ASCII codes for digits 0 through 9 are 3016 through 3916. A mov using register indirect addressing puts the character in place in the destination string, and EDI is decremented to point at the next position to the left.
This process is repeated until the quotient is zero. Finally the "sign" stored in CL (blank or –) is copied to the immediate left of the last code for a digit. Other positions to the left, if any, were previously filled with blanks.
Exercises 7.4
Programming Exercises 7.4
The number will be converted to a string of ASCII characters starting at the offset in the data segment. Do not use a blank in front of a positive number. If the length is less than or equal to the actual number of characters needed to display the number, fill the entire field with pound signs (#). If the length is larger than needed, pad with extra spaces to the left of the number. The procedure will remove parameters from the stack and must modify no register.
The procedure will remove parameters from the stack and must modify no register. (The remainder upon division by 16 produces a decimal value corresponding to the rightmost hex digit.)
The procedure will remove parameters from the stack and must modify no register. (The remainder upon division by 2 gives the rightmost bit.)
Early digital computers had very simple instruction sets. When designers began to use microcode to implement instructions in the 1960s, it became possible to have much more complex instructions. At the same time high-level programming languages were becoming popular, but language compilers were fairly primitive. This made it desirable to have machine language statements that almost directly implemented high-level language statements, increasing the pressure to produce computer architectures with many complex instructions.
The Intel 80x86 machines use complex instruction set computer (CISC) designs. Instructions such as the string instructions discussed in this chapter would never have appeared in early computers. CISC machines also have a variety of memory addressing modes, and the 80x86 family is typical in this respect, although you have only seen a few of its modes so far. Often CISC instructions take several clock cycles to execute.
Reduced instruction set computer (RISC) designs began to appear in the 1980s. These machines have relatively few instructions and few memory addressing modes. Their instructions are so simple that any one can be executed in a single clock cycle. As compiler technology improved, it became possible to produce efficient code for RISC machines. Of course, it often takes many more instructions to implement a given high-level language statement on a RISC than on a CISC machine, but the overall operation is often faster because of the speed with which individual instructions execute.
In RISC architectures, instructions are all the same format; that is, the same number of bytes are encoded in a common pattern. This is not the case with CISC architectures. If the 80x86 chips were RISC designs, then this book would have no questions asking "How many clock cycles?" or "How many bytes?" When we look at the many 80x86 instruction formats in Chapter 9, you may wish for the simplicity of a RISC machine.
One unusual feature of many RISC designs is a relatively large collection of registers (sometimes over 500), of which only a small number (often 32) are visible at one time. Registers are used to pass parameters to procedures, and the registers that are used to store arguments in the calling program overlap the registers that are used to receive the parameter values in the procedure. This provides a simple but very efficient method of communication between a calling program and a procedure.
There are proponents of both CICS and RISC designs. At this point in time it is not obvious that one is clearly better than the other. However, the popular Intel 80x86 and Motorola 680x0 families are both CISC designs, so we will be dealing with CISC systems at least in the near future.
The word string refers to a collection of consecutive bytes, words, or doublewords in memory. The 80x86 instruction set includes five instructions for operating on strings: movs (to move or copy a string from a source to a destination location), cmps (to compare two strings), scas (to scan a string for a particular element), stos (to store a given value in a string), and lods (to copy a string element into EAX, AX, or AL). Each of these has mnemonic forms ending with b, w, or d to give the size of the string element.
A string instruction operates on one string element at a time. When a source string is involved, the source index register ESI contains the address of the string element. When a destination string is involved, the destination index register EDI contains the address of the string element. An index register is incremented or decremented after the string element is accessed, depending on whether the direction flag DF is reset to zero or set to one; the cld and std instructions are used to give the direction flag a desired value.
Repeat prefixes rep, repe (repz), and repne (repnz) are used with some string instructions to cause them to repeat automatically. The number of times to execute a primitive instruction is placed in the ECX register. The conditional repeat forms use the count in ECX but will also terminate instruction execution if the zero flag gets a certain value; these are appropriate for use with the cmps and scas instructions that set or reset ZF.
The xlat instruction is used to translate the characters of a string. It requires a 256-byte-long translation table that starts with the destination byte to which the source byte 00 is translated and ends with the destination byte to which the source byte FF is translated. The xlat instruction can be used for such applications as changing ASCII codes to EBCDIC codes or for changing the case of letters within a given character coding system.
The itoa macro expands to code that calls a procedure itoaproc. Basically this procedure works by repeatedly dividing a non-negative number by 10 and using the remainder to get the rightmost character of the destination string.
The 80x86 chips are examples of complex instruction set computer (CISC) architecture. They include many complex instructions and offer many different addressing modes. Reduced instruction set computer (RISC) architectures implement fewer and simpler instructions and have more limited addressing options. Even though RISC computers take more instructions to accomplish a task, they are usually quite fast since they execute their simple instructions very rapidly.
Preface