String Operations

Table of contents:

Computers are frequently used to manipulate characters strings as well as numeric data. In data processing applications names, addresses, and so forth must be stored and sometimes rearranged. Text editor and word processor programs must be capable of searching for and moving strings of characters. An assembler must be able to separate assembly language statement elements, identifying those that are reserved mnemonics. Even when computation is primarily numerical, it is often necessary to convert either a character string to an internal numerical format when a number is entered at the keyboard or an internal format to a character string for display purposes.

An 80x86 microprocessor has instructions to manipulate character strings. The same instructions can manipulate strings of doublewords or words. This chapter covers 80x86 instructions that are used to handle strings, with emphasis on character strings. A variety of applications are given, including procedures that are similar to those in some high-level languages and the procedure called by the dtoa macro.

Using String Instructions

Five 80x86 instructions are designed for string manipulation: movs (move string), cmps (compare string), scas (scan string), stos (store string), and lods (load string). The movs instruction is used to copy a string from one memory location to another. The cmps instruction is designed to compare the contents of two strings. The scas instruction can be used to search a string for one particular value. The stos instruction can store a new value in some position of a string. Finally, the lods instruction copies a value out of some position of a string.

A string in the 80x86 architecture refers to a contiguous collection of bytes, words, or doublewords in memory. Strings are commonly defined in a program's data segment using such directives as

 response BYTE 20 DUP (?)
 label1 BYTE 'The results are ', 0
 wordString WORD 50 DUP (?)
 arrayD DWORD 60 DUP (0)

Note that strings and arrays are actually the same except for the way we look at them.

Each string instruction applies to a source string, a destination string, or both. The bytes, words, or doublewords of these strings are processed one at a time by the string instruction. Register indirect addressing is used to locate the individual byte, word, or doubleword elements. The 80x86 instructions access elements of the source string using the address in the source index register ESI. Elements in the destination string are accessed using the address in the destination index register EDI. If you program using a segmented memory model, then you also must know that the source element is in the data segment (at address DS:ESI) while the destination element is in the extra segment (at address ES:EDI). With flat memory model programming, both segment registers contain the same segment number and no distinction between segments.

Since the source and destination addresses of string elements are always given by ESI and EDI, respectively, no operands are needed to identify these locations. Without any operand, however, the assembler cannot tell the size of the string element to be used. For example, just movs by itself could say to move a byte, a word, or a doubleword. The Microsoft Macro Assembler offers two ways around this dilemma. The first method is to use destination and source operands; these are ignored except that MASM notes their type (both operands must be the same type) and uses that element size. The second method is to use special versions of the mnemonics that define the element size-instructions that operate on bytes use a b suffix, word string instructions use a w suffix, and doubleword string instructions use a d suffix. For example, movsb is used to move byte strings, movsw is used to move word strings, and movsd is used to move doubleword strings. Any of these instructions assemble as a movs and none uses an operand since the assembler knows the element size from the mnemonic. This book will use mnemonics with b, w, and d suffixes rather than operands for string instructions.

Although a string instruction operates on only one string element at a time, it always gets ready to operate on the next element. It does this by changing the source index register ESI and/or the destination index register EDI to contain the address of the next element of the string(s). When byte-size elements are being used, the index registers are changed by one; for words, ESI and EDI are changed by two; and for doublewords, the registers are changed by four. The 80x86 can move either forward through a string, from lower to higher addresses, or backward, from higher to lower addresses. The movement direction is determined by the value of the direction flag DF, bit 10 of the EFLAGS register. If DF is set to 1, then the addresses in ESI and EDI are decremented by string instructions, causing right to left string operations. If DF is clear (0), then the values in ESI and EDI are incremented by string instructions, so that strings are processed left to right.

The 80x86 has two instructions whose sole purpose is to reset or set the direction flag DF. The cld instruction clears DF to 0 so that ESI and EDI are incremented by string instructions and strings are processed left to right. The std instruction sets DF to 1 so that strings are processed backward. Neither instruction affects any flag other than DF. Technical data about these instructions appear in Fig. 7.1.

	Clock Cycles
Instruction	386	486	Pentium	Number of Bytes	Opcode
cld	2	2	2	1	FC
std	2	2	2	1	FD

Figure 7.1: cld and std instructions

Finally it is time to present all the details about a string instruction. The move string instruction movs transfers one string element (byte, word, or doubleword) from a source string to a destination string. The source element at address DS:ESI is copied to address ES:EDI. After the string element is copied, both index registers are changed by the element size (1, 2, or 4), incremented if the direction flag DF is 0 or decremented if DF is 1. The movs instruction does not affect any flag. It comes in movsb, movsw, and movsd versions; Fig. 7.2 gives information about each form.

		Clock Cycles
Mnemonic	Element size	386	486	Pentium	Number of Bytes	Opcode
movsb	byte	7	7	4	1	A4
movsw	word	7	7	4	1	A5
movsd	doubleword	7	7	4	1	A5

Figure 7.2: movs instructions (use ESI and EDI)

Figure 7.3 gives an example of a program that uses the movs instruction. The important part of the example is the procedure strcopy. This procedure has two parameters passed on the stack, which give the destination and source addresses of byte or character strings. The source string is assumed to be null-terminated. Procedure strcopy produces an exact copy of the source string at the destination location, terminating the destination string by a null byte.

; test of "strcopy" procedure
; author: R. Detmer revised: 10/97

.386
.MODEL FLAT
ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h ; header file for input/output
cr equ 0dh ; carriage return character
Lf equ 0ah ; line feed

.STACK 4096 ; reserve 4096-byte stack

.DATA ; reserve storage for data
prompt BYTE cr, Lf, "Original string? ",0
stringIn BYTE 80 DUP (?)
display BYTE cr, Lf, "Your string was...", cr, Lf
stringOut BYTE 80 DUP (?)

.CODE
_start: output prompt ; ask for string
 input stringIn, 80 ; get source string
 lea eax, stringOut ; destination address
 push eax ; first parameter
 lea eax, stringIn ; source
 push eax ; second parameter
 call strcopy ; call string copy procedure
 output display ; print result
 INVOKE ExitProcess, 0 ; exit with return code 0

PUBLIC _start ; make entry point public

strcopy PROC NEAR32

; Procedure to copy string until null byte in source is copied.
; It is assumed that destination location is long enough for copy.

; Parameters are passed on the stack:
; (1) address of destination
; (2) address of source
 push ebp ;save base pointer
 mov ebp,esp ;copy stack pointer

 push edi ;save registers and flags
 push esi
 pushf

 mov esi,[ebp+8] ;initial source address
 mov edi,[ebp+12] ;destination
 cld ;clear direction flag
whileNoNull:
 cmp BYTE PTR [esi],0 ;null source byte?
 je endWhileNoNull ;stop copying if null
 movsb ;copy one byte
 jmp whileNoNull ;go check next byte
endWhileNoNull:
 mov BYTE PTR [edi],0 ;terminate destination string

 popf ;restore flags and registers

 pop esi
 pop edi
 pop ebp
 ret 8 ;exit procedure, discarding parameters
strcopy ENDP

 END

Figure 7.3: String copy program

The procedure only uses registers ESI and EDI. It saves each of these and the flag register on the stack so that the procedure will return with them unchanged. The index registers ESI and EDI must be initialized to the addresses of the first string bytes to be processed. The values for ESI and EDI are the arguments that were pushed on the stack. The direction flag is cleared for left-to-right copying.

After initialization, the procedure executes the following pseudocode design:

while next source byte is not null
 copy source byte to destination;
 increment source index;
 increment destination index;
end while;
put null byte at end of destination string;

To check whether the next source byte is null, the statement

 cmp BYTE PTR [esi],0 ; null source byte?

is used. Recall that the notation [esi] indicates register indirect addressing, so that the element at the address in ESI is used, that is, the current byte of the source string. The operator BYTE PTR is necessary since MASM cannot tell from the operands [esi] and 0 whether byte, word, or doubleword comparison is needed. Copying the source byte and incrementing both index registers is accomplished by the movsb instruction. Finally,

 mov BYTE PTR [edi],0 ; terminate destination string

serves to move a null byte to the end of the destination string since EDI was incremented after the last byte of the source was copied to the destination. Again, the operator BYTE PTR tells MASM that the destination is a byte rather than a word or doubleword.

The program to test strcopy simply inputs a string from the keyboard, calls strcopy to copy it somewhere else, and finally displays the string copy. The most interesting part of the code is the collection of instructions needed to call the procedure. The arguments are not removed from the stack since the procedure does that job.

Normally the source string for a movs instruction does not overlap the destination string. However, occasionally this is useful. Suppose that you want to initialize a 80 character-long string at starSlash with the pattern */, repeated 40 times. The following code can do this task.

 starSlash BYTE 80 DUP (?)
 ...
 mov starSlash, '*' ; first *
 mov starSlash+1, '/' ; first /
 lea esi, starSlash ; source address
 lea edi, starSlash+2 ; destination
 cld ; process left to right
 mov ecx, 78 ; characters to copy
 forCount: movsb ; copy next character
 loop forCount ; repeat

In this example, the first time movsb is executed, a * from the first string position is copied to the third position. The next iteration, a / is copied from the second to the fourth position. The third time, a * is copied from the third to the fifth position, and so on. The next section introduces an easier way to repeat a movs instruction.

Exercises 7.1

What will be the output of the following program?

.386
.MODEL FLAT
ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h ; header file for input/output
cr equ 0dh ; carriage return character
Lf equ 0ah ; line feed

.STACK 4096 ; reserve 4096-byte stack
.DATA ; global data
string BYTE 'ABCDEFGHIJ'
 BYTE cr, Lf, 0
.CODE
setup1 PROC NEAR32
 lea esi, string ; beginning of string
 lea edi, string+5 ; address of 'F'
 cld ; forward movement
 ret
setup1 ENDP

_start: call setup1 ; set source, destination, direction
 movsb ; move 4 characters
 movsb
 movsb
 movsb
 output string ; display modified string
 INVOKE ExitProcess, 0 ; exit with return code 0
PUBLIC _start ; make entry point public
 END

Repeat Problem 1, replacing the procedure setup1 by

setup2 PROC NEAR32
 lea esi, string ; beginning of string
 lea edi, string+2 ; address of 'C'
 cld ; forward movement
 ret
setup2 ENDP

Repeat Problem 1, replacing the procedure setup1 by

setup3 PROC NEAR32
 lea esi, string+9 ; end of string
 lea edi, string+4 ; address of 'E'
 std ; backward movement
 ret
setup3 ENDP

Repeat Problem 1, replacing the procedure setup1 by

setup4 PROC NEAR32
 lea esi, string+9 ; end of string
 lea edi, string+7 ; address of 'H'
 std ; backward movement
 ret
setup4 ENDP

Programming Exercises 7.1

Write a program that copies strings read in one at a time from the keyboard into a large storage area for later processing. Specifically, use the input macro to input a string, then copy the string to the first of a 1024 byte block of storage that has been reserved in the data segment. (Recall that the input macro produces a null-terminated string.) Follow the string by a carriage return and a linefeed character in this storage area. Repeat the process with additional strings, copying each subsequent string to the storage area so that it immediately follows the linefeed after the last string. Exit the loop when the first character of the source string is $-do not copy this last string to the storage area. Do, however, place a null byte after the linefeed of the last string in the storage area. Finally, use the output macro to display all the characters in the data area. The result should be the strings that were entered, one per line.

Repeat Prefixes and More String Instructions

Each 80x86 string instruction operates on one string element at a time. However, the 80x86 architecture includes three repeat prefixes that change the string instructions into versions that repeat automatically either for a fixed number of iterations or until some condition is satisfied. The three repeat prefixes actually correspond to two different single-byte codes; these are not themselves instructions, but supplement machine codes for the primitive string instructions, making new instructions.

Figure 7.4 shows two program fragments, each of which copies a fixed number of characters from sourceStr to destStr. The number of characters is loaded into the ECX register from count. The code in part (a) uses a loop. Since the count of characters might be zero, the loop is guarded by a jecxz instruction. The body of the loop uses movsb to copy one character at a time. The loop instruction takes care of counting loop iterations. The program fragment in part (b) is functionally equivalent to the one in part (a). After the count is copied into ECX, it uses the repeat prefix rep with a movsb instruction; the rep movsb instruction does the same thing as the last four lines in part (a).

 lea esi, sourceStr ; source string
 lea edi, destStr ; destination
 cld ; forward movement
 mov ecx, count ; count of characters to copy
 jecxz endCopy ; skip loop if count is zero
 copy: movsb ; move 1 character
 loop copy ; decrement count and continue
 endCopy:

 (a) movsbiterated in a loop


 lea esi, sourceStr ; source string
 lea edi, destStr ; destination
 cld ; forward movement
 mov ecx, count ; count of characters to copy
 rep movsb ; move characters

 (b) repeat prefix with movsb

Figure 7.4: Copying a fixed number of characters of a string

The rep prefix is normally used with the movs instructions and with the stos instruction (discussed below). It causes the following design to be executed:

while count in ECX > 0 loop
 perform primitive instruction;
 decrement ECX by 1;
end while;

Note that this is a while loop. The primitive instruction is not executed at all if ECX contains zero. It is not necessary to guard a repeated string instruction as it often is with an ordinary for loop implemented with the loop instruction.

The other two repeat prefixes are repe, with equivalent mnemonic repz, and repne, which is the same as repnz. The mnemonic repe stands for "repeat while equal" and repz stands for "repeat while zero." Similarly repne and repnz mean "repeat while not equal" and "repeat while not zero," respectively. Each of these repeat prefixes is appropriate for use with the two string instructions cmps and scas, which affect the zero flag ZF.

The names of these mnemonics partially describe their actions. Each instruction works the same as rep, iterating a primitive instruction while ECX is not zero. However, each also examines ZF after the string instruction is executed. The repe and repz continue iterating while ZF=1, as it would be following a comparison where two operands were equal. The repne and repnz continue iterating while ZF=0, as it would be following a comparison where two operands were different. Repeat prefixes themselves do not affect any flag. The three repeat prefixes are summarized in Fig. 7.5. Note that rep and repz (repe) generate exactly the same code.

Mnemonic	Loop while	Number of bytes	Opcode
rep	ECX>0	1	F3
repz/repe	ECX>0 and ZF=1	1	F3
repnz/repne	ECX>0 and ZF=0	1	F2

Figure 7.5: Repeat prefixes

The repz and repnz prefixes do not quite produce true while loops with the conditions shown in Fig. 7.5. The value in ECX is checked prior to the first iteration of the primitive instruction, as it should be with a while loop. However, ZF is not checked until after the primitive instruction is executed. In practice, this is very convenient since the instruction is skipped for a zero count, but the programmer does not have to do anything special to initialize ZF prior to repeated instructions.

Figure 7.6 shows how the repeat prefix rep combines with the movs instructions. In the clock cycles columns, there is a "set up" time plus a time for each iteration. The n in the table represents the number of iterations, so that, for example, a rep movsb takes 33 (13+4×5) clock cycles on a Pentium to move five bytes. (The table entries are not strictly accurate since there are special timings for the 486 and Pentium when n=0 or n=1.)

	Element	Clock Cycles			Number
Mnemonic	size	386	486	Pentium	of bytes	Opcode

rep movsb	byte	7+4n	12+3n	13+4n	2	F3 A4
rep movsw	word					F3 A5
rep movsd	doubleword					F3 A5

Figure 7.6: rep movs instructions

The cmps instructions, summarized in Fig. 7.7, compare elements of source and destination strings. Chapter 5 explained how a cmp instruction subtracts two operands and sets flags based on the difference. Similarly, cmps subtracts two string elements and sets flags based on the difference; neither operand is changed. If a cmps instruction is used in a loop, it is appropriate to follow cmps by almost any of the conditional jump instructions, depending on the design being implemented.

	Element	Clock Cycles			Number
Mnemonic	size	386	486	Pentium	of bytes	Opcode

cmpsb	byte	10	8	5	1	A6
cmpsw	word					A7
cmpsd	doubleword					A7
repe cmpsb	byte	5+9n	7+7n	9+4n	2	F3 A6
repe cmpsw	word					F3 A7
repe cmpsd	doubleword					F3 A7
repne cmpsb	byte	5+9n	7+7n	9+4n	2	F2 A6
repne cmpsw	word					F2 A7
repne cmpsd	doubleword					F2 A7

Figure 7.7: cmps instructions

Repeat prefixes are often used with cmps instructions. In fact, for the task of finding if two strings are identical, the repe prefix is a perfect companion for cmps. Figure 7.7 summarizes all the cmps instructions, including repeated ones. Again, the timings are not strictly accurate for 486 and Pentium CPUs; for rep cmps there are special timings when n=0.

It is often necessary to search for one string embedded in another. Suppose that the task at hand is to find the position, if any, at which the string at key appears in the string at target. One simple algorithm to do this is

position := 1;
while position < (targetLength - keyLength + 1) loop
 if key matches the substring of target starting at position
 then
 report success;
 exit process;
 end if;
 add 1 to position;
end while;
report failure;

This algorithm checks to see if the key string matches the portion of the target string starting at each possible position. Using 80x86 registers, checking for one match can be done as follows:

ESI := address of key;
EDI := address of target + position - 1;
ECX := length of key;

forever loop
 if ECX = 0 then exit loop; end if;
 compare [ESI] and [EDI] setting ZF;
 increment ESI;
 increment EDI;
 decrement ECX;
 if ZF = 0 then exit loop; end if;
end loop;

if ZF = 1
then
 match was found;
end if;

The forever loop is exactly what is done by the repeated string instruction repe cmpsb. Since the loop is terminated when either ECX = 0 or when ZF = 0, it is necessary to be sure that the last pair of characters compared were the same; this is the reason for the extra if structure at the end of the design. Figure 7.8 shows a complete program that implements this design.

; program to search for one string embedded in another
; author: R. Detmer revised: 10/97

.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h

cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character

.STACK 4096 ; reserve 4096-byte stack

.DATA
prompt1 BYTE "String to search? ", 0
prompt2 BYTE cr, Lf, "Key to search for? ", 0
target BYTE 80 DUP (?)
key BYTE 80 DUP (?)
trgtLength DWORD ?
keyLength DWORD ?
lastPosn DWORD ?
failure BYTE cr,Lf,Lf,"The key does not appear in the string.",cr,Lf,0
success BYTE cr,Lf,Lf,'The key appears at position'
position BYTE 11 DUP (?)
 BYTE " in the string.", cr, Lf, 0

PUBLIC _start ; make entry point public
.CODE

_start: output prompt1 ; ask for
 input target,80 ; and input target string
 lea eax, target ; find length of string
 push eax ; length parameter
 call strlen
 mov trgtLength,eax ; save length of target
 output prompt2 ; ask for
 input key,80 ; and input key string
 lea eax, key ; find length of string
 push eax ; length parameter
 call strlen
 mov keyLength,eax ; save length of key

; calculate last position of target to check
 mov eax,trgtLength
 sub eax,keyLength
 inc eax ; trgtLength − keyLength + 1
 mov lastPosn, eax
 cld ; left to right comparison

 mov eax,1 ; starting position
whilePosn: cmp eax,lastPosn ; position <= last_posn?
 jnle endWhilePosn ; exit if past last position
 lea esi,target ; address of target string
 add esi,eax ; add position
 dec esi ; address of position to check
 lea edi,key ; address of key
 mov ecx,keyLength ; number of positions to check
 repe cmpsb ; check
 jz found ; exit on success
 inc eax ; increment position
 jmp whilePosn ; repeat
endWhilePosn:
 output failure ; the search failed
 jmp quit ; exit
found: dtoa position,eax ; convert position to ASCII
 output success ; search succeeded
quit:
 INVOKE ExitProcess, 0 ; exit with return code 0
strlen PROC NEAR32
; find length of string whose address is passed on stack
; length returned in EAX
 push ebp ; establish stack frame
 mov ebp, esp
 pushf ; save flags
 push ebx ; and EBX
 sub eax, eax ; length := 0
 mov ebx, [ebp+8] ; address of string
whileChar: cmp BYTE PTR [ebx], 0 ; null byte?
 je endWhileChar ; exit if so
 inc eax ; increment length
 inc ebx ; point at next character
 jmp whileChar ; repeat
endWhileChar:
 pop ebx ; restore registers and flags
 popf
 pop ebp
 ret 4 ; return, discarding parameter
strlen ENDP

 END

Figure 7.8: String search program

The scan string instruction scas is used to scan a string for the presence or absence of a particular string element. The string that is examined is a destination string; that is, the address of the element being examined is in the destination index register EDI. With a scasb instruction, the element searched for is the byte in the AL register; with a scasw, it is the word in the AX register; and with a scasd, it is the doubleword in the EAX register. The scasb, scasw, and scasd instructions use no operand since the mnemonics tell the element size. Figure 7.9 summarizes the scas instructions; as with the previous repeated instructions, there are special timings for n=0 on the 486 and Pentium.

	Element	Clock Cycles			Number
Mnemonic	size	386	486	Pentium	of bytes	Opcode

scasb	byte	7	6	4	1	AE
scasw	word					AF
scasd	doubleword					AF
repe scasb	byte	5+8n	7+5n	9+4n	2	F3 AE
repe scasw	word					F3 AF
repe scasd	doubleword					F3 AF
repne scasb	byte	5+8n	7+5n	9+4n	2	F2 AE
repne scasw	word					F2 AF
repne scasd	doubleword					F2 AF

Figure 7.9: scas instructions (use EDI)

The program shown in Fig. 7.10 inputs a string and a character and uses repne scasb to locate the position of the first occurrence of the character in the string. It then displays the part of the string from the character to the end. The length of the string is calculated using the strlen procedure that previously appeared in Fig. 7.8; this time we assume that strlen is separately assembled. The lea instruction is used to load the offset of the string to be searched and cld ensures a forward search.

; Program to locate a character within a string.
; The string is displayed from the character to the end.
; author: R. Detmer revised: 10/97

.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h
EXTRN strlen:NEAR32
PUBLIC _start

cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character

.STACK 4096 ; reserve 4096-byte stack

.DATA
prompt1 BYTE "String? ", 0
prompt2 BYTE cr, Lf, Lf, "Character? ", 0
string BYTE 80 DUP (?)
char BYTE 5 DUP (?)
label1 BYTE cr, Lf, Lf, "The rest of the string is ---", 0
crlf BYTE cr, Lf, 0

.CODE
_start: output prompt1 ; prompt for string
 input string,80 ; get string
 lea eax, string ; find length of string
 push eax ; length parameter
 call strlen
 inc ecx ; include null in string length
 mov ecx, eax ; save length of string

 output prompt2 ; prompt for character
 input char,5 ; get character
 mov al, char ; character to AL

 lea edi, string ; offset of string
 cld ; forward movement
 repne scasb ; scan while character not found
 dec edi ; back up to null or matching character

 output label1 ; print label
 output [edi] ; output string
 output crlf ; skip to new line

 INVOKE ExitProcess, 0 ; exit with return code 0
 END

Figure 7.10: Program to find character in string

After the search, the destination index EDI will be one greater than desired since a string instruction always increments index registers whether or not flags were set. If the search succeeded, EDI will contain the address of the character following the one that matched with AL, or the address of the character after the end of the string if ECX was decremented to zero. The dec edi instruction takes care of both cases, backing up to the position of the matching character if there was one, or to the null byte at the end of the string otherwise. The string length was incremented so that the null character would be included in the search. The output macro displays the last portion of the string, whose address is in EDI.

The store string instruction stos copies a byte, a word, or a doubleword from AL, AX, or EAX to an element of a destination string. A stos instruction affects no flag, so that when it is repeated with rep, it copies the same value into consecutive positions of a string. For example, the following code will store spaces in the first 30 bytes of string.

 mov ecx,30 ; 30 bytes
 mov al, ' ' ; character to store
 lea edi, string ; address of string
 cld ; forward direction
 rep stosb ; store spaces

Information about the stos instructions is in Fig. 7.11. As with previous repeated string instructions, the 80486 and Pentium have special timings when n=0.

	Element	Clock Cycles			Number
Mnemonic	size	386	486	Pentium	of bytes	Opcode

stosb	byte	4	5	3	1	AA
stosw	word					AB
stosd	doubleword					AB
rep stosb	byte	5+5n	7+4n	9n	2	F3 A6
rep stosw	word					F3 A7
rep stosd	doubleword					F3 A7

Figure 7.11: stos instructions (use EDI)

The load string instruction lods is the final string instruction. This instruction copies a source string element to the AL, AX, or EAX register, depending on the string element size. A lods instruction sets no flag. It is possible to use a rep prefix with lods but it is not helpful-all values except for the last string element would be replaced as successive values were copied to the destination register. A lods instruction is useful in a loop set up with other instructions, making it possible to easily process string elements one at a time. The lods instructions are summarized in Fig. 7.12. Repeated versions are not included since they are not used.

	Element	Clock Cycles			Number
Mnemonic	size	386	486	Pentium	of bytes	Opcode

lodsb	byte	5	5	2	1	AC
lodsw	word					AD
lodsd	doubleword					AD

Figure 7.12: lods instructions (use ESI)

Exercises 7.2

For each exercise below, assume that the data segment contains

source BYTE "brown"
dest BYTE "brine"

Suppose that the following instructions are executed:
```
lea esi, source
lea edi, dest
cld
mov ecx, 5
repne cmpsb
```
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the repne cmpsb instruction? What will be stored in ECX?
Suppose that the following instructions are executed:
```
lea esi, source
lea edi, dest
cld
mov ecx, 5
repe cmpsb
```
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the repe cmpsb instruction? What will be stored in ECX?
Suppose that the following instructions are executed:
```
mov al, 'w'
lea edi, dest
cld
mov ecx, 5
repe scasb
```
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the repe scasb instruction? What will be stored in ECX?
Suppose that the following instructions are executed:
```
mov al, 'n'
lea edi, dest
cld
mov ecx, 5
repne scasb
```
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the repne scasb instruction? What will be stored in ECX?
Suppose that the following instructions are executed:
```
mov al, '*'
lea edi, dest
cld
mov ecx, 5
rep stosb
```
Assuming that EDI starts at 00010005, what will be the value stored in EDI following the rep stosb instruction? What will be stored in ECX? What will be stored in the destination string?
Suppose that the following instructions are executed:
```
 lea esi, source
 lea edi, dest
 cld
 mov ecx, 5
for6: lodsb
 inc al
 stosb
 loop for6
endFor6:
```
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the for loop? What will be stored in ECX? What will be stored in the destination string?
Suppose that the following instructions are executed:
```
lea esi, source
lea edi, dest
cld
mov ecx, 3
rep movsb
```
Assuming that ESI starts at 00010000 and EDI starts at 00010005, what will be the values stored in ESI and EDI following the rep movsb instruction? What will be stored in ECX? What will be stored in the destination string?
Suppose that the following instructions are executed:
```
lea esi, source+4
lea edi, dest+4
std
mov ecx, 3
rep movsb
```
Assuming that ESI starts at 00010010 and EDI starts at 00010015, what will be the values stored in ESI and EDI following the rep movsb instruction? What will be stored in ECX? What will be stored in the destination string?

Programming Exercises 7.2

Write a NEAR32 procedure index to find the position of the first occurrence of a character in a null-terminated string. Specifically, the procedure must have two parameters: (1) a character and (2) the address of a string in the data segment. Use the stack to pass the parameters: For the character, use an entire word with the character in the low-order byte. Use the EAX register to return the position of the character within the string; return zero if the character is not found. No other register should be altered. Procedure index will not remove parameters from the stack.
Write a NEAR32 procedure append that will append one null-terminated string to the end of another. Specifically, the procedure must have two parameters: (1) the address of string1 in the data segment and (2) the address of string2 in the data segment. Use the stack to pass the parameters. The procedure should copy the characters of string2 to the end of string1 with the first character of string2 replacing the null byte at the end of string1, and so on. (Warning: In the data section, enough space must be reserved after the null byte of the first string to hold the characters from the second string.) All registers used by the procedure should be saved and restored. Procedure append will not remove parameters from the stack.
Write a complete program that prompts for and inputs a person's name in the "LastName, FirstName" format and builds a new string with the name in the format "FirstName LastName." A comma and a space separate the names originally and there is no character except the null following FirstName; only a space separates the names in the new string. After you generate the new string in memory, display it.
Write a complete program which prompts for and inputs a person's name in the "LastName, FirstName" format and builds a new string with the name in the format "FirstName LastName." One or more spaces separate the names originally and there may be spaces following FirstName. Only a single space separates the names in the new string. After you generate the new string in memory, display it.
Write a complete program that prompts for and inputs a string and a single character. Construct a new string that is identical to the old one except that it is shortened by removing each occurrence of the character. After you generate the new string in memory, display it.
Write a complete program that prompts for and inputs a sentence and a single word. Construct a new sentence that is identical to the old one except that it is shortened by removing each occurrence of the word. After you generate the new sentence in memory, display it.
Write a complete program that prompts for and inputs a sentence and two words. Construct a new sentence that is identical to the old one except that each occurrence of the first word is replaced by the second word. After you generate the new sentence in memory, display it.

Character Translation

Sometimes character data are available in one format but need to be in another format for processing. One instance of this occurs when characters are transmitted between two computer systems, one normally using ASCII character codes and the other normally using EBCDIC character codes. Another time character codes need to be altered is to transmit them to a device that cannot process all possible codes; it is sometimes easier to replace the unsuitable codes by acceptable codes than to delete them entirely.

The 80x86 instruction set includes the xlat instruction to translate one character to another character. In combination with other string-processing instructions, it can easily translate all the characters in a string.

The xlat instruction requires only one byte of object code, the opcode D7. It takes five clock cycles to execute on an 80386, and four clock cycles on an 80486 or a Pentium. Prior to execution, the character to be translated is in the AL register. The instruction works by using a translation table in the data segment to look up the translation of the byte in AL. This translation table normally contains 256 bytes of data, one for each possible 8-bit value in AL. The byte at offset zero in the table-the first byte-is the character to which 00 is translated. The byte at offset one is the character to which 01 is translated. In general xlat uses the character being translated as an offset into the table, and the byte at that offset then replaces the character in AL.

The xlat instruction has no operand. The EBX register must contain the address of the translation table.

Figure 7.13 illustrates a short program which translates each character of string in place; that is, it replaces each character by its translation using the original location in memory. The heart of the program is the translation table and the sequence of instructions

; Translate uppercase letters to lowercase; don't change lower
; case letters and digits. Translate other characters to spaces.
; author: R. Detmer revised: 10/97

.386
.MODEL FLAT

ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h
PUBLIC _start
cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character

.STACK 4096 ; reserve 4096-byte stack

.DATA
string BYTE 'This is a #!$& STRING',0
strLength EQU $ - string - 1
label1 BYTE 'Original string ->',0
label2 BYTE cr, Lf, 'Translated string ->',0
crlf BYTE cr, Lf, 0
table BYTE 48 DUP (' '), '0123456789', 7 DUP (' ')
 BYTE 'abcdefghijklmnopqrstuvwxyz', 6 DUP (' ')
 BYTE 'abcdefghijklmnopqrstuvwxyz', 133 DUP (' ')

.CODE
_start: output label1 ; display original string
 output string
 output crlf
 mov ecx, strLength ; string length
 lea ebx, table ; address of translation table
 lea esi, string ; address of string
 lea edi, string ; destination also string
forIndex: lodsb ; copy next character to AL
 xlat ; translate character
 stosb ; copy character back into string
 loop forIndex ; repeat for all characters

 output label2 ; display altered string
 output string
 output crlf

 INVOKE ExitProcess, 0
 END

Figure 7.13: Translation program

 mov ecx, strLength ; string length
 lea ebx, table ; address of translation table
 lea esi, string ; address of string
 lea edi, string ; destination also string
forIndex: lodsb ; copy next character to AL
 xlat ; translate character
 stosb ; copy character back into string
 loop forIndex ; repeat for all characters

These instructions implement a for loop with the design

for index := 1 to stringLength loop
 load source character into AL;
 translate character in AL;
 copy character in AL to destination;
end for;

One new feature in this program is the use of the location counter symbol $. Recall that the assembler calculates addresses as if they start at 00000000, and increments a counter every time it generates bytes of object code. The dollar sign symbol refers to the value of this counter at the time it is encountered in assembly. In this particular program, it will be at the address just beyond the null byte of the string. Since the symbol string actually references its address, the expression string - $ is the length of string, including the null byte. The value equated to strLength is string - $ - 1, which excludes the null byte. The assembly process will be discussed more in Chapter 9.

Each ASCII code is translated to another ASCII code by this program. Uppercase letters are translated to lowercase, lowercase letters and digits are unchanged, and all other characters are translated to spaces. Construction of such a table involves looking at a table of ASCII codes (see Appendix A). For this program the translation table is defined by

table BYTE 48 DUP (' '), '0123456789', 7 DUP (' ')
 BYTE 'abcdefghijklmnopqrstuvwxyz', 6 DUP (' ')
 BYTE 'abcdefghijklmnopqrstuvwxyz', 133 DUP (' ')

Careful counting will show that exactly 256 bytes are defined. Recall that a BYTE directive stores the ASCII code of each character operand. Each of the first 48 bytes of the table will contain the ASCII code for a space (i.e., blank), 2016. Therefore if the code in the AL register represents any of the first 48 ASCII characters-a control character, or one of the printable characters from 2016 (space) to 2F16 (/)-it will be translated to a space.

Note that it is legal to translate a character to itself. Indeed, this is what will happen for digits; the ASCII codes 3016 to 3916 for digits 0 through 9 appear at offsets 3016 to 3916. The codes for the seven characters : through @ are next in an ASCII chart; each of these will be translated to a space. The next ASCII characters are the uppercase letters and the next entries in the table are codes for the lowercase letters. For example, the table contains 6116 at offset 4116, so an uppercase A (ASCII code 4116) will be translated to a lower case a (ASCII code 6116). The next six blanks are at the offsets 9116 ([) through 9616 (‘), so that each of these characters is translated to a blank. The ASCII code for each lowercase letter is assembled at an offset equal to its value, so each lowercase letter is translated to itself. Finally, the translation table contains 133 ASCII codes for blanks; these are the destinations for {, |, }, ~, DEL, and each of the 128 bit patterns starting with a one, none of them codes for ASCII characters.

Figure 7.14 shows the output of the program in Fig. 7.13. Notice that "strange" characters are not deleted, they are replaced by blanks.

 Original string ->This is a #!$& STRING

 Translated string ->this is a string

Figure 7.14: Output from translation program

Exercises 7.3

Here is a partial hexadecimal/EBCDIC conversion table:

81 a	C1 A	40 space
82 b	C2 B	4B .
83 c	C3 C	6B ,
84 d	C4 D
85 e	C5 E	F0 0
86 f	C6 F	F1 1
87 g	C7 G	F2 2
88 h	C8 H	F3 3
89 i	C9 I	F4 4
91 j	D1 J	F5 5
92 k	D2 K	F6 6
93 l	D3 L	F7 7
94 m	D4 M	F8 8
95 n	D5 N	F9 9
96 o	D6 O
97 p	D7 P
98 q	D8 Q
99 r	D9 R
A2 s	E2 S
A3 t	E3 T
A4 u	E4 U
A5 v	E5 V
A6 w	E6 W
A7 X	E7 X
A8 y	E8 Y
A9 z	E9 Z

Give a translation table that would be suitable for xlat translation of EBCDIC codes for letters, digits, space, period, and comma to the corresponding ASCII codes, translating every other EBCDIC code to a null character.

Give a translation table that would be suitable for xlat translation of ASCII codes for lowercase letters to the corresponding uppercase letters, leaving all other characters unchanged.
Here is an alternative to the xlat instruction.
```
movzx eax, al ; clear high order bits in EAX
mov al, [ebx+eax] ; copy new character from table to AL
```
Given that [ebx+eax] references the memory byte at the address that is the sum of the contents of EBX and EAX, explain why this pair of instructions is equivalent to a single xlat instruction.

Programming Exercises 7.3

In the United States, decimal numbers are written with a decimal point separating the integral part from the fractional part and with commas every three positions to the left of the decimal point. In many European countries, decimal numbers are written with the roles of commas and decimal points reversed. For example, the number 1,234,567.89 would be written 1.234.567,89. Write a program that will interchange commas and periods, translating a string of characters representing either format of number to the other format. Use the xlat instruction with a translation table that translates a period to a comma, a comma to a period, each digit to itself, and any other character to a space. Prompt for and input the number to be translated. Translate the string. Display the new number format with an appropriate label.

Converting a 2 s Complement Integer to an ASCII String

The dtoa and itoa macros have been used to convert 2’s complement integers to strings of ASCII characters for output. The code for these operations is similar. In this section we examine the slightly shorter code for itoa.

The itoa macro expands into the following sequence of instructions.

 push ebx ; save EBX
 mov bx, source
 push bx ; source parameter
 lea ebx, dest ; destination address
 push ebx ; destination parameter
 call itoaproc ; call itoaproc(source,dest)
 pop ebx ; restore EBX

These instructions call procedure itoaproc after pushing the source value and the destination address on the stack. The actual source and destination are used in the expanded macro, not the names source and dest. So that the user does not need to worry about any register contents being altered, EBX is initially saved on the stack and is restored at the end of the sequence. The parameters are removed from the stack by procedure itoaproc since the alternative add esp,6 following the call instruction potentially changes the flags.

The real work of 2’s complement integer to ASCII conversion is done by the procedure itoaproc. The assembled version of this procedure is contained in the file IO.OBJ. The source code from file IO.ASM is shown in Fig. 7.15. The procedure begins by saving all of the registers that it alters on the stack; the flag register is also saved so that the procedure call to itoaproc will not change flag settings. The flag register and other registers are restored immediately before returning from the procedure.

; itoaproc(source, dest)
; convert integer (source) to string of 6 characters at destination address

itoaproc PROC NEAR32
 push ebp ; save base pointer
 mov ebp, esp ; establish stack frame
 push eax ; Save registers
 push ebx ; used by
 push ecx ; procedure
 push edx
 push edi
 pushf ; save flags

 mov ax, [ebp+12] ; first parameter (source integer)
 mov edi, [ebp+8] ; second parameter (dest address)
ifSpecial: cmp ax,8000h ; special case −32,768?
 jne EndIfSpecial ; if not, then normal case
 mov BYTE PTR [edi],'-' ; manually put in ASCII codes
 mov BYTE PTR [edi+1],'3' ; for −32,768
 mov BYTE PTR [edi+2],'2'
 mov BYTE PTR [edi+3],'7'
 mov BYTE PTR [edi+4],'6'
 mov BYTE PTR [edi+5],'8'
 jmp ExitIToA ; done with special case
EndIfSpecial:

 mov dx, ax ; save source number

 mov al,' ' ; put blanks in
 mov ecx,5 ; first five
 cld ; bytes of
 rep stosb ; destination field

 mov ax, dx ; copy source number
 mov cl,' ' ; default sign (blank for +)
IfNeg: cmp ax,0 ; check sign of number
 jge EndIfNeg ; skip if not negative
 mov cl,'-' ; sign for negative number
 neg ax ; number in AX now >= 0
EndIfNeg:

 mov bx,10 ; divisor
WhileMore: mov dx,0 ; extend number to doubleword
 div bx ; divide by 10
 add dl,30h ; convert remainder to character
 mov [edi],dl ; put character in string
 dec edi ; move forward to next position
 cmp ax,0 ; check quotient
 jnz WhileMore ; continue if quotient not zero

 mov [edi],cl ; insert blank or "-" for sign
ExitIToA: popf ; restore flags and registers
 pop edi
 pop edx
 pop ecx
 pop ebx
 pop eax
 pop ebp
 ret 6 ;exit, discarding parameters
itoaproc ENDP

Figure 7.15: Integer to ASCII conversion procedure

The basic idea of the procedure is to build a string of characters right to left by repeatedly dividing the number by 10, using the remainder to determine the rightmostcharacter. For instance, dividing the number 2895 (0B4F16) by 10 gives a remainder of 5 and a quotient of 289 (012116), the last digit of the number and a new number with which to repeat the process. This scheme works nicely for positive numbers, but a negative number must be changed to its absolute value before starting the division loop. To complicate things further, the bit pattern 800016 represents the negative number −32,76810, but +32,768 cannot be represented in 2’s complement form in a 16-bit word.

After standard entry code, the value parameter is copied to AX and the destination address to EDI. The procedure then checks for the special case 800016. If this is the value, then the ASCII codes for 32768 are moved one at a time to the destination, using the fact that the destination address is in EDI. The location for the minus sign is in the EDI register, so register indirect addressing can be used to put this character in the correct memory byte. The location for the character 3 is one byte beyond the address contained in EDI; this address is referenced by [edi+1]. The remaining four characters are similarly put in place, and the procedure is exited.

The next step of the procedure is to put five leading blanks in the six-byte-long destination field. The procedure does this with a rep stosb, which uses EDI to point to successive bytes in destination field. Note that EDI is left pointing at the last byte of the destination field.

The procedure next stores the correct "sign" in the CL register. A blank is used for a number greater than or equal to zero, and minus character (−) is used for a negative number. A negative number is also negated, giving its absolute value for subsequent processing.

Finally the main idea is executed. The divisor 10 is placed in the BX register. The non-negative number is extended to a doubleword by moving zeros to DX. Division by 10 in BX gives a remainder from 0 to 9 in DX, the last decimal digit of the number. This is converted to the corresponding ASCII code by adding 3016; recall that the ASCII codes for digits 0 through 9 are 3016 through 3916. A mov using register indirect addressing puts the character in place in the destination string, and EDI is decremented to point at the next position to the left.

This process is repeated until the quotient is zero. Finally the "sign" stored in CL (blank or –) is copied to the immediate left of the last code for a digit. Other positions to the left, if any, were previously filled with blanks.

Exercises 7.4

Why does itoaproc use a destination string six bytes long?
Suppose that negative numbers are not changed before the division loop of itoaproc begins and that an idiv instruction is used rather than a div instruction in this loop. Recall that when a negative number is divided by a positive number, both quotient and remainder will be negative. For instance, −1273 = 10*(−127) + (−3). How could the rest of the division loop be modified to produce the correct ASCII codes for both positive and negative numbers?

Programming Exercises 7.4

Rewrite itoaproc, adding a length parameter. Specifically, the new itoaNew will be a NEAR32 procedure with three parameters, passed on the stack:
1. the 2’s complement number to convert to ASCII characters (a word)
2. the address of the ASCII string (a doubleword)
3. the desired length of the ASCII string (a word)
The number will be converted to a string of ASCII characters starting at the offset in the data segment. Do not use a blank in front of a positive number. If the length is less than or equal to the actual number of characters needed to display the number, fill the entire field with pound signs (#). If the length is larger than needed, pad with extra spaces to the left of the number. The procedure will remove parameters from the stack and must modify no register.
Write a NEAR32 procedure hexString that converts a 32-bit integer to a string of exactly eight characters representing its value as a hexadecimal number. (That is, the output characters will be 0–9 and A–F, with no blanks.) The procedure will have two parameters, passed on the stack:
1. the number
2. the address of the destination string
The procedure will remove parameters from the stack and must modify no register. (The remainder upon division by 16 produces a decimal value corresponding to the rightmost hex digit.)
Write a NEAR32 procedure binaryString that converts a 32-bit integer to a string of exactly 32 characters representing its value as a binary number. The procedure will have two parameters, passed on the stack:
1. the number
2. the address of the destination string
The procedure will remove parameters from the stack and must modify no register. (The remainder upon division by 2 gives the rightmost bit.)

Other Architectures CISC versus RISC Designs

Early digital computers had very simple instruction sets. When designers began to use microcode to implement instructions in the 1960s, it became possible to have much more complex instructions. At the same time high-level programming languages were becoming popular, but language compilers were fairly primitive. This made it desirable to have machine language statements that almost directly implemented high-level language statements, increasing the pressure to produce computer architectures with many complex instructions.

The Intel 80x86 machines use complex instruction set computer (CISC) designs. Instructions such as the string instructions discussed in this chapter would never have appeared in early computers. CISC machines also have a variety of memory addressing modes, and the 80x86 family is typical in this respect, although you have only seen a few of its modes so far. Often CISC instructions take several clock cycles to execute.

Reduced instruction set computer (RISC) designs began to appear in the 1980s. These machines have relatively few instructions and few memory addressing modes. Their instructions are so simple that any one can be executed in a single clock cycle. As compiler technology improved, it became possible to produce efficient code for RISC machines. Of course, it often takes many more instructions to implement a given high-level language statement on a RISC than on a CISC machine, but the overall operation is often faster because of the speed with which individual instructions execute.

In RISC architectures, instructions are all the same format; that is, the same number of bytes are encoded in a common pattern. This is not the case with CISC architectures. If the 80x86 chips were RISC designs, then this book would have no questions asking "How many clock cycles?" or "How many bytes?" When we look at the many 80x86 instruction formats in Chapter 9, you may wish for the simplicity of a RISC machine.

One unusual feature of many RISC designs is a relatively large collection of registers (sometimes over 500), of which only a small number (often 32) are visible at one time. Registers are used to pass parameters to procedures, and the registers that are used to store arguments in the calling program overlap the registers that are used to receive the parameter values in the procedure. This provides a simple but very efficient method of communication between a calling program and a procedure.

There are proponents of both CICS and RISC designs. At this point in time it is not obvious that one is clearly better than the other. However, the popular Intel 80x86 and Motorola 680x0 families are both CISC designs, so we will be dealing with CISC systems at least in the near future.

Summary

The word string refers to a collection of consecutive bytes, words, or doublewords in memory. The 80x86 instruction set includes five instructions for operating on strings: movs (to move or copy a string from a source to a destination location), cmps (to compare two strings), scas (to scan a string for a particular element), stos (to store a given value in a string), and lods (to copy a string element into EAX, AX, or AL). Each of these has mnemonic forms ending with b, w, or d to give the size of the string element.

A string instruction operates on one string element at a time. When a source string is involved, the source index register ESI contains the address of the string element. When a destination string is involved, the destination index register EDI contains the address of the string element. An index register is incremented or decremented after the string element is accessed, depending on whether the direction flag DF is reset to zero or set to one; the cld and std instructions are used to give the direction flag a desired value.

Repeat prefixes rep, repe (repz), and repne (repnz) are used with some string instructions to cause them to repeat automatically. The number of times to execute a primitive instruction is placed in the ECX register. The conditional repeat forms use the count in ECX but will also terminate instruction execution if the zero flag gets a certain value; these are appropriate for use with the cmps and scas instructions that set or reset ZF.

The xlat instruction is used to translate the characters of a string. It requires a 256-byte-long translation table that starts with the destination byte to which the source byte 00 is translated and ends with the destination byte to which the source byte FF is translated. The xlat instruction can be used for such applications as changing ASCII codes to EBCDIC codes or for changing the case of letters within a given character coding system.

The itoa macro expands to code that calls a procedure itoaproc. Basically this procedure works by repeatedly dividing a non-negative number by 10 and using the remainder to get the rightmost character of the destination string.

The 80x86 chips are examples of complex instruction set computer (CISC) architecture. They include many complex instructions and offer many different addressing modes. Reduced instruction set computer (RISC) architectures implement fewer and simpler instructions and have more limited addressing options. Even though RISC computers take more instructions to accomplish a task, they are usually quite fast since they execute their simple instructions very rapidly.

Preface