The Assembly Process

The job of an assembler is to turn assembly language source code into object code. With simpler computer systems this object code is machine language, ready to be loaded into memory and executed. With more complex systems, object code produced by the assembler must be "fixed up" by a linker and/or loader before it can be executed. The first section of this chapter describes the assembly process for a typical assembler and gives some details particular to the Microsoft Macro Assembler. The second section is very specific to the 80x86 microprocessor family; it details the structure of its machine language. The third and fourth sections discuss macros and conditional assembly, respectively. Most assemblers have these capabilities, and these sections describe how MASM implements them. The final section describes the macros in the header file IO.H.

Two Pass and One Pass Assembly

One of the many reasons for writing assembly language rather than machine language is that assemblers allow the use of identifiers or symbols to reference data in the data segment and instructions in the code segment. To code in machine language, a programmer must know run-time addresses for data and instructions. An assembler maintains a symbol table that associates each identifier with various attributes. One attribute is a location, typically relative to the beginning of a segment, but sometimes an absolute address to be used at run time. Another attribute is the type of the symbol, where possible types include labels for data or instructions, symbols equated to constants, procedure names, macro names, and segment names. Some assemblers start assembling a source program with a symbol table that includes all the mnemonics for the language, all register names, and other symbols with reserved usage.

The other main job of an assembler is to output object code that is close to the machine language executed when a program is run. A two-pass assembler scans the source code once to produce a symbol table and a second time to produce the object code. A one-pass assembler only scans the source code one time, but often must patch the object code produced during this scan. A simple example shows why: If the segment

 jmp endLoop
 add eax, ecx
 endLoop:

is scanned, the assembler finds a forward reference to endLoop in the jmp instruction. At this point the assembler cannot tell the address of endLoop, much less whether this destination is short (within 2^7 bytes of the address of the add instruction) or near (within 2^32 bytes). The first option would use an EB opcode and a single-byte displacement. The second option would use an E9 opcode and a doubleword displacement. Clearly the final code must wait at least until the assembler reaches the source code line with the endLoop label.
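
The choice between the two encodings can be sketched in Python. This is only an illustration under stated assumptions, not a description of how MASM works: the helper name encode_jmp and its arguments are invented for the example, and the displacement is measured from the instruction that follows the jmp.

```python
import struct

def encode_jmp(jmp_location, target_location):
    """Return machine code for an unconditional relative jmp.

    Hypothetical helper: it tries the short form (EB, 1-byte displacement)
    first and falls back to the near form (E9, 4-byte displacement).
    """
    short_disp = target_location - (jmp_location + 2)   # a short jmp is 2 bytes long
    if -128 <= short_disp <= 127:
        return bytes([0xEB]) + struct.pack("<b", short_disp)
    near_disp = target_location - (jmp_location + 5)    # a near jmp is 5 bytes long
    return bytes([0xE9]) + struct.pack("<i", near_disp)

# jmp at location 0, followed by a 2-byte add, so endLoop is at location 4
print(encode_jmp(0, 4).hex(" "))        # eb 02
# a distant target forces the near form
print(encode_jmp(0, 0x1000).hex(" "))   # e9 fb 0f 00 00
```

Until the target location is known, neither the opcode nor the length of the instruction can be settled, which is why a one-pass assembler must be prepared to patch code it has already produced.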

Typical assemblers use two passes, and some actually use three or more passes. The Microsoft Macro Assembler is a one-pass assembler. This book will not attempt to cover details of how it fixes up object code. You can see part of MASM’s symbol table by looking at the end of an assembly listing. The remainder of this section concentrates on a typical symbol table, drawing examples from the program and listing file that appear in Chapter 3.

If a symbol is a label for data, then the symbol table may include the size of the data. For instance, the program in Fig. 3.1 contains the directive

 number2 DWORD ?

and the corresponding line in the listing file (Fig. 3.7) is

 number2 . . . . . . . . . . . . Dword 00000004 _DATA

This shows that the size of number2 has been recorded as a doubleword. Having the size recorded enables MASM to detect incorrect usage of a symbol—with this definition of number2, MASM would indicate an error for the instruction

 mov bh, number2

since the BH register is byte size while the symbol table identifies number2 as doubleword size. In addition to the size, if a symbol is associated with multiple objects, a symbol table may contain the number of objects or the total number of bytes associated with the symbol. The MASM symbol listing does not show this.

If a symbol is equated to a value, then the value is usually stored in the symbol table. When the assembler encounters the symbol in subsequent code, it substitutes the value recorded in the symbol table. In the example program, the source code line

 cr EQU 0dh ; carriage return character

is reflected in the listing file line

 cr . . . . . . . . . . . . . . . Number 0000000Dh

If a symbol is a label for data or an instruction, then its location is entered in the symbol table. An assembler keeps a location counter to compute this value. With a typical assembler, the location counter is set to zero at the beginning of a program or at the beginning of each major subdivision of the program. The Microsoft Macro Assembler sets the location counter to zero at the beginning of each segment. As an assembler scans source code, the location of each datum or instruction is the value of the location counter before the statement is assembled. The number of bytes required by the statement is added to the location counter to give the location of the next statement. Again looking at the line

 number2 DWORD ?

the listing file shows

 number2 . . . . . . . . . . . . Dword 00000004 _DATA

with 00000004 in the Value column. This is the value of the location counter at the time number2 is encountered in the data segment. The value is 00000004 since the only item preceding number2 was number1, and it took four bytes.
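
A symbol table holding these attributes might be modeled as follows. This is a minimal Python sketch, not MASM's internal format: the Symbol class, the sizes_match helper, and the entry for number1 (assumed to be a doubleword at location 00000000) are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Symbol:
    kind: str                      # "Dword", "Number", ...
    value: int                     # location within the segment, or equated value
    segment: Optional[str] = None  # None for symbols equated to constants

SIZE_IN_BYTES = {"Byte": 1, "Word": 2, "Dword": 4}
REGISTER_SIZE = {"bh": 1, "bx": 2, "ebx": 4}   # small subset, enough for the example

symbol_table = {
    "number1": Symbol("Dword", 0x00000000, "_DATA"),   # assumed entry
    "number2": Symbol("Dword", 0x00000004, "_DATA"),
    "cr":      Symbol("Number", 0x0D),
}

def sizes_match(register, name):
    """Check a register operand against the size recorded for a data symbol."""
    return REGISTER_SIZE[register] == SIZE_IN_BYTES[symbol_table[name].kind]

print(sizes_match("bh", "number2"))    # False: mov bh, number2 is an error
print(sizes_match("ebx", "number2"))   # True
```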

The location counter is used the same way when instructions are assembled. Suppose that the location counter has value 0000012E when MASM reaches the code fragment shown in Fig. 9.1. The location for the symbol while1 will be 0000012E. The cmp instruction requires three bytes of object code. (Section 9.2 details how to determine the object code of an 80x86 instruction.) Therefore the location counter will have value 00000131 when MASM reaches the jnle instruction. The jnle instruction requires two bytes of object code, so the location counter will increase to 00000133 for the first add instruction. The first add instruction takes two bytes of object code, so the location counter is 00000135 when MASM reaches the second add instruction. Three bytes are required for add ebx, 4, so the location counter is 00000138 for the inc instruction. The inc instruction takes a single byte, so the location counter is 00000139 for the jmp instruction. The jmp instruction requires two bytes, making the location counter 0000013B when the assembler reaches the label endWhile1. Therefore 0000013B is recorded in the symbol table as the location of endWhile1.

 while1:    cmp   ecx, 100     ; count <= 100 ?
            jnle  endWhile1    ; exit if not
            add   eax, [ebx]   ; add value to sum
            add   ebx, 4       ; address of next value
            inc   ecx          ; add 1 to count
            jmp   while1
 endWhile1:

Figure 9.1: Code with forward reference
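
The bookkeeping just described can be reproduced with a short Python sketch. The byte counts are the ones quoted in the text for each statement of Fig. 9.1; the function name build_symbol_table and the tuple layout are assumptions made for the illustration.

```python
# (label, source statement, bytes of object code)
statements = [
    ("while1",    "cmp ecx, 100",   3),
    (None,        "jnle endWhile1", 2),
    (None,        "add eax, [ebx]", 2),
    (None,        "add ebx, 4",     3),
    (None,        "inc ecx",        1),
    (None,        "jmp while1",     2),
    ("endWhile1", None,             0),
]

def build_symbol_table(statements, start):
    """Record each label at the location counter value before its statement."""
    symbols = {}
    location = start
    for label, _source, size in statements:
        if label is not None:
            symbols[label] = location
        location += size          # advance past the statement just assembled
    return symbols

symbols = build_symbol_table(statements, 0x12E)
for name, location in symbols.items():
    print(f"{name:<10} {location:08X}")
# while1     0000012E
# endWhile1  0000013B
```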

The location of a symbol is needed for a variety of purposes. Suppose that MASM encounters the statement

 mov eax, number

where number is the label on a DWORD directive in the data segment. Since the addressing mode for number is direct, the assembler needs the offset of number for the object code; this offset is precisely the location of number recorded in the symbol table.

The primary job of an assembler is to generate object code. However, a typical assembler does many other tasks. One duty is to reserve storage. A statement like

 WORD 20 DUP(?)

sets aside 20 words of storage. This storage reservation is typically done one of two ways:

the assembler may write 40 bytes with some known value (like 00) to the object file, or
the assembler may insert a command that ultimately causes the loader to skip 40 bytes when the program is loaded into memory

In the latter case, storage at run time will contain whatever values are left over from execution of other programs.

In addition to reserving storage, assemblers can initialize the reserved memory with specified values. The MASM statement

 WORD 10, 20, 30

not only reserves three words of storage, it initializes the first to 000A, the second to 0014 and the third to 001E. Initial values may be expressed in a variety of ways using MASM and most other assemblers. Numbers may be given in different number systems, often binary, octal, decimal, and hexadecimal. The assembler converts character values to corresponding ASCII or EBCDIC character codes. Assemblers usually allow expressions as initial values. The Microsoft Macro Assembler is typical in accepting expressions that are put together with addition, subtraction, negation, multiplication, division, not, and, or, exclusive or, shift, and relational operators. Such an expression is evaluated at assembly time, producing the value that is actually used in the object code.
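
The bytes produced for such a directive can be sketched in Python. The helper name word_directive is an assumption for the example; the only facts it relies on are that each word occupies two bytes and that the 80x86 stores the low-order byte first.

```python
import struct

def word_directive(*values):
    """Return the bytes reserved by a WORD directive, low-order byte first."""
    return b"".join(struct.pack("<H", value & 0xFFFF) for value in values)

data = word_directive(10, 20, 30)
print(data.hex(" ").upper())   # 0A 00 14 00 1E 00

# An expression is evaluated before the value is stored, as at assembly time.
print(word_directive(2 * 8 + 4).hex(" ").upper())   # 14 00
```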

Most assemblers can produce a listing file that shows the original source code and some sort of representation of the corresponding object code. Another responsibility of an assembler is to produce error messages when there are errors in the source code. Rudimentary assemblers just display a line number and an error code for each error. Slightly less primitive assemblers produce a separate page with line numbers and error messages. Most assemblers can include an error message in the listing file at the point where the error occurs. The Microsoft Macro Assembler includes messages in the optional listing file and also displays them on the console.

In addition to the listing that shows source and object code, an assembler often can generate a listing of symbols used in the program. Such a listing may include information about each symbol’s attributes—taken from the assembler’s symbol table—as well as cross references that indicate the line where the symbol is defined and each line where it is referenced.

Some assemblers begin assembling instructions with the location counter set to a particular actual memory address and thus generate object code that is ready to be loaded at that address. This is the only way to generate object code with some simpler systems. Generally such code is not linked; it is ready to load and run.

One file can reference objects in another. Recall that the EXTRN directive facilitates this for MASM. A linker combines separate object code files into a single file. If one file references objects in the other, the linker changes the references from "to be determined" to locations in the combined file.

Most assemblers produce object code that is relocatable; that is, it can be loaded at any address. One way to do this is to put a map in the object code file that records each place in the program where an address must be modified. Address modifications are usually carried out by the loader. The loader finally produces true machine language, ready for execution.
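
As a rough illustration of the map-based approach, the Python sketch below patches each recorded address when a program is loaded. The function name relocate, the list of fix-up offsets, and the load address 00404000 are all assumptions made for the example; real object file formats record this information in their own ways.

```python
import struct

def relocate(code, fixup_offsets, load_address):
    """Add load_address to each 32-bit address listed in the relocation map."""
    image = bytearray(code)
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", image, offset)
        struct.pack_into("<I", image, offset, (value + load_address) & 0xFFFFFFFF)
    return bytes(image)

# mov eax, number2 with number2 at segment offset 00000004:
# opcode A1 followed by the 4-byte offset, which starts at byte 1.
object_code = bytes([0xA1, 0x04, 0x00, 0x00, 0x00])
loaded = relocate(object_code, [1], 0x00404000)
print(loaded.hex(" "))   # a1 04 40 40 00
```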

Another way to get relocatable code is to write it with only relative references; that is, so that each instruction only references an object at some distance from itself, not at a fixed address. In an 80x86 system, most jump instructions are relative, so if a programmer stores data in registers or on the stack, it is fairly easy to produce such a program.

With MASM, a programmer can actually directly reference the location counter using the $ symbol. The code fragment from Fig. 9.1 could be rewritten as

 cmp ecx, 100 ; count <= 100 ?
 jnle $+10 ; exit if not
 add eax, [ebx] ; add value to sum
 add ebx, 4 ; address of next value
 inc ecx ; add 1 to count
 jmp $-11

This works since the value of the location counter $ is the location of the beginning of the jnle statement as it is assembled. Its two bytes and the eight bytes of the next four statements need to be skipped to exit the loop. Similarly the backward reference must skip the inc statement and the four other statements back through the beginning of the cmp statement, a total of eleven bytes. Although MASM allows use of $ to reference the location counter, obviously this can produce confusing code and should normally be avoided.

Exercises 9.1

Describe the differences between object code and machine language.
Suppose that every symbol reference in an assembly language program is a backward reference. Would a one-pass assembler ever have to "fix up" the code it produced? Explain your answer.
Assemble the following code fragment
```
Array DWORD 10 DUP(?)
ArrSize EQU SIZE Array
```
To what value is ArrSize equated? What conclusion can you draw about whether or not MASM records an attribute that tracks the number of bytes associated with a variable?
This section states that storage reservation with a directive like WORD can work by putting the correct number of some known byte value in the object file or by inserting a command that ultimately causes the loader to skip the correct number of bytes. State one advantage and one disadvantage of each design.

80x86 Instruction Coding

This section describes the structure of 80x86 machine language. From this information one could almost assemble an 80x86 assembly language program by hand. However, the primary purpose here is to acquire a better understanding of the capabilities and limitations of the 80x86 microprocessor family.

An 80x86 instruction consists of several fields, which are summarized in Fig. 9.2. Some instructions have only an opcode, while others require that other fields be included. Any included fields always appear in this order. Each of these components is discussed below.

Field	Number of bytes	Purpose

instruction prefix	0 or 1	F3 for REP, REPE, or REPZ; F2 for REPNE or REPNZ; F0 for LOCK (hexadecimal values)
address size	0 or 1	value 67 (hex) if present; indicates that a displacement is a 16-bit address rather than the default 32-bit size
operand size	0 or 1	value 66 (hex) if present; indicates that a memory operand is 16-bit if in 32-bit mode or 32-bit if in 16-bit mode
segment override	0 or 1	indicates that an operand is in a segment other than the default segment
opcode	1 or 2	operation code
mod-reg-r/m	0 or 1	indicates register or memory operand, encodes register(s)
scaled index base byte	0 or 1	additional scaling and register information
displacement	0 to 4	an address
immediate	0 to 4	an immediate value

Figure 9.2: 80x86 instruction fields

The repeat prefixes for string instructions were discussed in Chapter 7. There you learned that adding a repeat prefix to one of the basic string instructions effectively changes it into a new instruction that automatically iterates a basic operation. The repeat prefix is coded in the instruction prefix byte, with the opcode of the basic string instruction in the opcode byte. Repeat prefix bytes can be coded only with the basic string instructions.

The LOCK prefix is not illustrated in this book's code. It can be used with a few selected instructions and causes the system bus to be locked during execution of the instruction. Locking the bus guarantees that the 80x86 processor has exclusive use of shared memory.

All the code in this book uses 32-bit memory addresses. In a 32-bit address environment it is possible to have an instruction that only contains a 16-bit address. When an address size byte of 67 is coded, a two-byte rather than a four-byte displacement is used in the displacement field. This prefix byte will not appear in machine code generated from the assembly language code shown in this book.

On the other hand, the operand size byte has frequently been generated from this book's assembly language code. The 80x86 CPU has a status bit that determines whether operands are 16-bit or 32-bit. With the assembly and linking options we have used, that bit is always set to indicate 32-bit operands. Each time you code a word-size operand, the generated instruction includes the 66 prefix byte to indicate the 16-bit operand. Other assembly and linking options (not used in this book) cause the default operand size to be 16-bit; in this case a 66 prefix byte indicates a 32-bit operand.

What indicates a byte-size operand? A different opcode. Why don't 16-bit and 32-bit operands use distinct opcodes? This design decision was made by Intel. The original 8086 processor design had 16-bit registers and used separate opcodes for 8-bit and 16-bit operand sizes; no instruction used 32-bit operands. When the 80386 was designed with 32-bit registers, the choice was made to "share" opcodes for 16-bit and 32-bit operand sizes rather than to introduce many new opcodes.

The mod-reg-r/m byte has different uses for different instructions. When present it always has three fields: a two-bit mod field (for "mode"), a three-bit reg field (for "register," but sometimes used for other purposes), and a three-bit r/m field (for "register/memory"). The mod-reg-r/m byte is examined below.

The opcode field completely identifies many instructions, but some require additional information, for example, to determine the type of operand or even to determine the operation itself. You have previously seen the latter situation. For example, each of the instructions add, or, adc, sbb, and, sub, xor, and cmp having a byte-size operand in a register or memory and an immediate operand uses the opcode 80. Which of these eight instructions is meant is determined by the reg field of the mod-reg-r/m byte. For the particular case of the 80 opcode, the reg field is 000 for add, 001 for or, 010 for adc, 011 for sbb, 100 for and, 101 for sub, 110 for xor, and 111 for cmp.

The opcode 80 is one of twelve in which the reg field of the mod-reg-r/m byte actually determines the instruction. The others are 81, 82, 83, D0, D1, D2, D3, F6, F7, FE, and FF. The table in Fig. 9.3 gives reg field information for the most common instructions.

	reg field
Opcode	000	001	010	011	100	101	110	111
80, 81, 82, 83	ADD	OR	ADC	SBB	AND	SUB	XOR	CMP
D0, D1, D2, D3	ROL	ROR	RCL	RCR	SHL	SHR		SAR
F6, F7	TEST		NOT	NEG	MUL	IMUL	DIV	IDIV
FE, FF	INC	DEC					PUSH

Figure 9.3: reg field for specified opcodes
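
The lookup that Fig. 9.3 describes can be sketched in Python. The table below simply transcribes the figure; the names GROUPS and group_mnemonic are assumptions made for the example.

```python
# Rows of Fig. 9.3; None marks a reg value that the figure leaves blank.
# The figure lists PUSH in the FE, FF row; on the processor it applies to FF.
GROUPS = {
    (0x80, 0x81, 0x82, 0x83): ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"],
    (0xD0, 0xD1, 0xD2, 0xD3): ["ROL", "ROR", "RCL", "RCR", "SHL", "SHR", None, "SAR"],
    (0xF6, 0xF7):             ["TEST", None, "NOT", "NEG", "MUL", "IMUL", "DIV", "IDIV"],
    (0xFE, 0xFF):             ["INC", "DEC", None, None, None, None, "PUSH", None],
}

def group_mnemonic(opcode, modrm_byte):
    """Look up the instruction selected by the reg field (bits 5-3)."""
    reg = (modrm_byte >> 3) & 0b111
    for opcodes, names in GROUPS.items():
        if opcode in opcodes:
            return names[reg]
    raise ValueError(f"opcode {opcode:02X} is not in Fig. 9.3")

print(group_mnemonic(0x80, 0xFB))   # CMP  (80 FB 05 is cmp bl, 5)
print(group_mnemonic(0xF7, 0xD8))   # NEG  (F7 D8 is neg eax)
```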

Each two-operand, nonimmediate 80x86 instruction has at least one register operand. The reg field contains a code for this register. Figure 9.4 shows how the eight possible register codes are assigned. The meaning of a reg code varies with the operand size and with the instruction, so that, for example, the same code is used for ECX and CL. These codes are used any time information about a register is encoded in an instruction, whether in the reg field or other places.

reg code	register 32	register 16	register 8	segment register

000	EAX	AX	AL	ES
001	ECX	CX	CL	CS
010	EDX	DX	DL	SS
011	EBX	BX	BL	DS
100	ESP	SP	AH	FS
101	EBP	BP	CH	GS
110	ESI	SI	DH
111	EDI	DI	BH

Figure 9.4: 80x86 register codes

The mod field is also used to determine the type of operands an instruction has. Often the same opcode is used for an instruction that has two register operands or one register operand and one memory operand. The choice mod=11 means that the instruction is a register-to-register operation or an immediate-to-register operation. For a register-to-register operation, the destination register is coded in the reg field and the source register is coded in the r/m field. Both use the register codes shown in Fig. 9.4. For an immediate-to-register operation, the operation is coded as shown in Fig. 9.3 and the destination register is coded in the r/m field. The situation is complicated for the other possible mod values and depends on the r/m field as well as the mod field. For r/m=100, it also depends on the scaled index base (SIB) byte.

The SIB byte consists of three fields, a two-bit scaling field, a three-bit index register field, and a three-bit base register field.

The scale values are 00 for 1, 01 for 2, 10 for 4, and 11 for 8.

The index and base register encodings are as shown in Fig. 9.4, except that 100 cannot appear in the index register field since ESP cannot be an index register. Figure 9.5 shows the different encodings. The mod field in these formats tells how many bytes there are in the displacement. A value of 00 means that there is no displacement in the machine code, except when r/m=101, in which case there is only a displacement. This special case is used for direct memory addressing, so it occurs frequently. A mod value of 01 means that there is a displacement byte in the machine code; this byte is treated as a signed number and is extended to a doubleword before it is added to the value from the base register and/or index register. A value of 10 means that there is a displacement doubleword in the machine code; this doubleword is added to the value that comes from the base register and/or scaled index register. The value in the index register is multiplied by the scaling factor.

mod	r/m	base from SIB	operand (scale and index from SIB)

00	000		DS:[EAX]
	001		DS:[ECX]
	010		DS:[EDX]
	011		DS:[EBX]
	100 (use SIB)	000	DS:[EAX + (scale*index)]
		001	DS:[ECX + (scale*index)]
		010	DS:[EDX + (scale*index)]
		011	DS:[EBX + (scale*index)]
		100	SS:[ESP + (scale*index)]
		101	DS:[displacement32 + (scale*index)]
		110	DS:[ESI + (scale*index)]
		111	DS:[EDI + (scale*index)]
	101		DS:displacement32
	110		DS:[ESI]
	111		DS:[EDI]
01	000		DS:[EAX + displacement8]
	001		DS:[ECX + displacement8]
	010		DS:[EDX + displacement8]
	011		DS:[EBX + displacement8]
	100 (use SIB)	000	DS:[EAX + (scale*index) + displacement8]
		001	DS:[ECX + (scale*index) + displacement8]
		010	DS:[EDX + (scale*index) + displacement8]
		011	DS:[EBX + (scale*index) + displacement8]
		100	SS:[ESP + (scale*index) + displacement8]
		101	SS:[EBP + (scale*index) + displacement8]
		110	DS:[ESI + (scale*index) + displacement8]
		111	DS:[EDI + (scale*index) + displacement8]
	101		SS:[EBP + displacement8]
	110		DS:[ESI + displacement8]
	111		DS:[EDI + displacement8]
10	000		DS:[EAX + displacement32]
	001		DS:[ECX + displacement32]
	010		DS:[EDX + displacement32]
	011		DS:[EBX + displacement32]
	100 (use SIB)	000	DS:[EAX + (scale*index) + displacement32]
		001	DS:[ECX + (scale*index) + displacement32]
		010	DS:[EDX + (scale*index) + displacement32]
		011	DS:[EBX + (scale*index) + displacement32]
		100	SS:[ESP + (scale*index) + displacement32]
		101	SS:[EBP + (scale*index) + displacement32]
		110	DS:[ESI + (scale*index) + displacement32]
		111	DS:[EDI + (scale*index) + displacement32]
	101		SS:[EBP + displacement32]
	110		DS:[ESI + displacement32]
	111		DS:[EDI + displacement32]

mod	reg	r/m	operands
11	dest	source	source register, destination register
	operation	dest	destination register, immediate operand

Figure 9.5: 80x86 instruction encodings

It is time for some examples. The first example shows the kind of instruction seen frequently in this book.

 add ecx, value

Suppose that at execution time value references the memory doubleword at address 1B27D48C. From Fig. 4.5 or Appendix D, this add instruction has opcode 03. The direct address consists only of the 32-bit displacement; there is no index register or base register used. Therefore the components of the mod-reg-r/m byte are mod=00, reg=001 (for ECX), and r/m=101 (for direct addressing), giving 00 001 101 or 0D after regrouping and converting to hexadecimal. The final part of the instruction is the displacement, so the entire instruction is encoded as 03 0D 1B27D48C (where the bytes of the address will actually be stored in reverse order, low-order byte first).
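
The byte order of the stored address can be seen with a short Python sketch. It assumes only the opcode, the mod-reg-r/m byte, and the address given above, and uses the standard struct module to lay out the doubleword low-order byte first.

```python
import struct

address = 0x1B27D48C
# opcode 03, mod-reg-r/m byte 0D, then the 32-bit displacement
machine_code = bytes([0x03, 0x0D]) + struct.pack("<I", address)
print(machine_code.hex(" ").upper())   # 03 0D 8C D4 27 1B
```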

Now consider the instruction

 add ecx, eax

This instruction also has opcode 03. The mod field is 11 since there are two register operands. The reg field specifies the destination register, 001 for ECX. The r/m field gives the source register, 000 for EAX. The mod-reg-r/m byte of the instruction is therefore 11 001 000, or C8 in hex. The machine code for the instruction is 03 C8.

Next consider the instruction

 mov edx, [ebx]

Figure 4.3 or Appendix D gives the opcode as 8B. Since the operand [ebx] is indirect addressing using no displacement, the mod field is 00. The reg field contains 010, the code for EDX. The fourth line of the mod=00 group shows address DS:[EBX], that is, register indirect addressing in the data segment using the address in EBX. Therefore the r/m field is 011. Putting these fields together gives a mod-reg-r/m byte of 00 010 011 or 13, and the entire instruction assembles to 8B 13.

Now look at

 xor ecx, [edx+2]

Figure 8.2 or Appendix D gives the opcode of this instruction as 33. The memory operand uses indirect addressing and a displacement of 2, small enough to encode in a single byte 02. Therefore the mod field is 01. The reg field contains 001 for ECX. Figure 9.5 gives the r/m field as 010. Putting this together gives a mod-reg-r/m byte of 01 001 010 or 4A, so this instruction has machine code 33 4A 02.
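
The regrouping of the three fields into one byte is easy to check mechanically. The following Python sketch packs the fields for the four examples so far; the helper name modrm and the dictionary REG32 (transcribed from Fig. 9.4) are assumptions made for the illustration.

```python
# 32-bit register codes from Fig. 9.4
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def modrm(mod, reg, rm):
    """Pack a 2-bit mod, 3-bit reg, and 3-bit r/m field into one byte."""
    return (mod << 6) | (reg << 3) | rm

examples = {
    "add ecx, value":   modrm(0b00, REG32["ecx"], 0b101),          # direct addressing
    "add ecx, eax":     modrm(0b11, REG32["ecx"], REG32["eax"]),   # register to register
    "mov edx, [ebx]":   modrm(0b00, REG32["edx"], REG32["ebx"]),   # no displacement
    "xor ecx, [edx+2]": modrm(0b01, REG32["ecx"], REG32["edx"]),   # byte displacement
}
for source, byte in examples.items():
    print(f"{source:<18} {byte:02X}")
# add ecx, value     0D
# add ecx, eax       C8
# mov edx, [ebx]     13
# xor ecx, [edx+2]   4A
```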

Next consider an instruction that uses scaling.

 add eax, [ebx + 4*ecx]

This type of instruction is useful to process an array almost as in a high level language. You can store the starting address of the array in EBX, and the array index in ECX (assuming that indexing starts at 0). The index is multiplied by the scaling factor 4 (the size of a doubleword), and added to the base address to get the address of the array element. Figure 4.5 gives the opcode as 03. The mod-reg-r/m byte is 00 000 100 or 04 for no displacement, destination register EAX, and SIB byte used. The SIB byte is required since the instruction includes both base and index registers. Its fields are scale=10 for 4, index=001 for ECX, and base=011 for EBX, giving a SIB byte of 10 001 011 or 8B. The object code is therefore 03 04 8B.

Next we look at

 sub ecx, value[ebx + 2*edi]

where value references an address in the data segment. The opcode for this sub instruction is 2B. This address is treated as a 32-bit displacement, and there is both a base and an index register. Therefore mod=10, reg=001 (for ECX), and r/m=100 (for SIB needed). The fields of the SIB byte are 01 (for scaling factor 2), 111 (for index register EDI), and 011 (for base register EBX). The displacement doubleword will contain the run-time address of value. The machine code is therefore 2B 8C 7B xxxxxxxx, where the x's represent the address of value.

If the second operand in the last example is changed to value[EBX+2*EDI+10], then the displacement/address (represented above by xxxxxxxx) is simply 10 larger. That is, the assembler combines the displacement 10 and the displacement corresponding to value.
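
The SIB byte can be packed in the same way as the mod-reg-r/m byte. In this Python sketch the helper name sib and the two lookup tables are assumptions made for the illustration; the register codes come from Fig. 9.4 and the scale codes from the list given earlier.

```python
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}

def sib(scale, index, base):
    """Pack the scale, index register, and base register fields into one byte."""
    if index == "esp":
        raise ValueError("ESP cannot be an index register")
    return (SCALE_BITS[scale] << 6) | (REG32[index] << 3) | REG32[base]

print(f"{sib(4, 'ecx', 'ebx'):02X}")   # 8B for [ebx + 4*ecx]
print(f"{sib(2, 'edi', 'ebx'):02X}")   # 7B for value[ebx + 2*edi]
```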

You may have noticed that the first group in Fig. 9.5 does not show how to encode the operand [ebp]. It is encoded as [ebp+0], using a byte-size displacement. For example

 mov eax, [ebp]

is encoded as 8B 45 00, opcode 8B, mod-reg-r/m byte 01 000 101 (1-byte displacement, destination EAX, base register EBP), and displacement 00.

Figure 9.5 points out again that indirect addresses using ESP and EBP are in the stack segment, not the data segment. One would rarely want to override this. However, you might want to reference data in, say, the extra segment. To do this, you might code an instruction like

 cmp ax, WORD PTR es:[edx + 2*esi + 512]

This example has been chosen to involve almost all of the possible components of an 80x86 instruction. It uses an operand size prefix since word-size operands are being used. It uses a segment override prefix for ES. It uses base and index registers and a 32-bit displacement. The code generated is 66 26 3B 84 72 00000200: operand size prefix 66, segment override 26 (for ES), opcode 3B, mod-reg-r/m byte 84, SIB 72, and displacement 00000200. The possible segment override bytes are in Fig. 9.6.

Prefix	Segment

2E	CS
3E	DS
26	ES
36	SS
64	FS
65	GS

Figure 9.6: Segment override prefixes
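
The instruction cmp ax, WORD PTR es:[edx + 2*esi + 512] can be put together field by field, in the order given in Fig. 9.2. The Python sketch below is only an illustration: the dictionary name SEGMENT_OVERRIDE is an assumption, and the field values are the ones worked out in the text.

```python
import struct

# Prefix bytes from Fig. 9.6
SEGMENT_OVERRIDE = {"cs": 0x2E, "ds": 0x3E, "es": 0x26,
                    "ss": 0x36, "fs": 0x64, "gs": 0x65}

code = bytes([
    0x66,                                  # operand size prefix (16-bit operand)
    SEGMENT_OVERRIDE["es"],                # segment override for ES
    0x3B,                                  # opcode for cmp
    (0b10 << 6) | (0b000 << 3) | 0b100,    # mod=10, reg=000 (AX), r/m=100 (SIB follows)
    (0b01 << 6) | (0b110 << 3) | 0b010,    # scale 2, index ESI, base EDX
]) + struct.pack("<I", 512)                # 32-bit displacement, low-order byte first

print(code.hex(" ").upper())   # 66 26 3B 84 72 00 02 00 00
```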

While it may seem that opcode assignments are completely random, there are actually several patterns. For example, given a doubleword operand referenced by value, the opcode for the memory-to-register instruction mov eax,value is A1 and the opcode for the register-to-memory instruction mov value,eax is A3. In binary, these differ only in bit position 1, the next-to-last bit. Bit 1 often serves as a direction bit, having value 1 when the first operand is in memory and 0 when the first operand is in a register.

Similarly, corresponding instructions with doubleword operands and byte-size operands often have opcodes that differ only in bit position 0, the last bit. For example, given a byte referenced by bVal and a doubleword referenced by dVal, then the opcode for cmp bVal,dl is 38 and for cmp dVal,edx is 39. Bit 0 often serves as a size bit, having value 1 for doubleword (or word) operands and value 0 for byte operands.

Another set of patterns occurs in some single byte instructions where the same instruction is available for each of the registers: the opcode ends in the appropriate register code. For instance, the inc instructions for register32 operands (Fig. 4.6) have opcodes 40 through 47, and the last three bits are 000 through 111, the register codes for the registers to be incremented. Another way of looking at this is that the opcodes for this class of inc instructions are obtained by adding 40 and the register code.
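
These patterns can be checked with a few lines of Python. The function names differing_bits and inc_reg32_opcode are assumptions made for the example; the opcodes are the ones quoted above.

```python
def differing_bits(opcode_a, opcode_b):
    """Return the bit positions in which two opcodes differ."""
    diff = opcode_a ^ opcode_b
    return [bit for bit in range(8) if diff & (1 << bit)]

print(differing_bits(0xA1, 0xA3))   # [1]  mov eax,value versus mov value,eax
print(differing_bits(0x38, 0x39))   # [0]  cmp bVal,dl versus cmp dVal,edx

def inc_reg32_opcode(reg_code):
    """Opcode of inc for the 32-bit register with the given code."""
    return 0x40 + reg_code

print(f"{inc_reg32_opcode(0b001):02X}")   # 41 for inc ecx
```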

Exercises 9.2

Why can no 80x86 assembly language instruction specify two memory operands?
Find the machine code for each of the following instructions. Make the following assumptions:
```
dbl DWORD ? ; run-time location 1122AABB
wrd WORD ? ; run-time location 3344CCDD
byt BYTE ? ; run-time location 5566EEFF
```
1. add dbl, ecx
2. add wrd, cx
3. add byt, cl
4. add edx, ebx
5. add dx, bx
6. add dl, bh
7. push ebp
8. cmp ecx, dbl
9. cmp al, byt
10. inc ecx
11. inc cx
12. pop eax
13. push dbl
14. or al, 35
15. sub dbl, 2 (byte-size immediate operand)
16. and ebx, 0ff000000h (doubleword-size immediate operand)
17. xchg ebx, ecx
18. xchg eax, ecx (note accumulator operand)
19. cwd
20. shl edx, 1
21. neg WORD PTR [EBX]
22. imul ch
23. div dbl
24. dec DWORD PTR [ebx+esi]
25. and ecx, [ebx+4*edi]
26. sub ebx, dbl[4*eax]

Programming Exercises 9.2

Assume that arr[0..nbr] contains a collection of doublewords in increasing order. The following design describes a binary search for keyValue, returning the index of keyValue if it is present in the array and -1 if it is absent.
```
procedure binarySearch(arr : array, nbr : integer, keyValue : integer) : integer
topIndex := nbr;
bottomIndex := 0;
while (bottomIndex ≤ topIndex) loop
    midIndex := (bottomIndex + topIndex) div 2;
    if (keyValue = arr[midIndex])
    then
        return midIndex;
    elseif (keyValue < arr[midIndex])
    then
        topIndex := midIndex - 1;
    else
        bottomIndex := midIndex + 1;
    end if;
end loop;
return -1;
```
Implement this design as an 80x86 NEAR32 procedure binarySearch with three parameters, (1) the address of an array of doublewords, (2) a doubleword nbr, and (3) a doubleword keyValue. Return the appropriate result in EAX. The procedure will change no register other than EAX, and it will be responsible for removing parameters from the stack. Use scaled and indexed addressing appropriately to address array elements. Write a short test driver program to test your procedure binarySearch.
The first nbrElts values in an array a[1..maxIndex] can be sorted into increasing order using the selection sort algorithm.
```
procedure selectionSort(a : array, nbrElts : integer)
for position := 1 to nbrElts-1 loop
    smallSpot := position;
    smallValue := a[position];
    for i := position+1 to nbrElts loop
        if a[i] < smallValue
        then
            smallSpot := i;
            smallValue := a[i];
        end if;
    end for;
    a[smallSpot] := a[position];
    a[position] := smallValue;
end for;
```
Implement this algorithm in a NEAR32 procedure selectionSort with two parameters: (1) the address of an array a of doubleword integers, and (2) a doubleword nbrElts. The procedure will change no register and it will be responsible for removing parameters from the stack. Use scaled and indexed addressing appropriately to address array elements, noting that the algorithm as written starts with index 1, not index 0. Write a short test driver program to test your procedure.
The quick sort algorithm sorts an array slice a[leftEnd..rightEnd] into increasing order by identifying a middle value in the array and moving elements of the array so that all elements on the left are smaller than the middle value and all on the right are larger than the middle value. Then the procedure is recursively called to sort the left and right sides. The recursion terminates when the portion to be sorted has one or fewer elements. Here is a design.
```
procedure quickSort(a:array, leftEnd:integer, rightEnd:integer)
if leftEnd < rightEnd
then
    left := leftEnd;
    right := rightEnd;

    while left < right loop
        while (left < right) and (a[left] <= a[right]) loop
            add 1 to left;
        end while;
        swap a[left] and a[right];

        while (left < right) and (a[left] <= a[right]) loop
            subtract 1 from right;
        end while;
        swap a[left] and a[right];
    end while;

    quickSort(a, leftEnd, left-1);
    quickSort(a, right+1, rightEnd);
end if;
```
Implement this algorithm in a NEAR32 procedure quickSort with three parameters: (1) the address of an array a of doubleword integers, (2) a doubleword leftEnd, and (3) a doubleword rightEnd. The procedure will change no register and it will be responsible for removing parameters from the stack. Use scaled and indexed addressing appropriately to address array elements. Write a short test driver program to test your procedure.

Macro Definition and Expansion

A macro was defined in Chapter 3 as a statement that is shorthand for a sequence of other statements. The assembler expands a macro to the statements it represents, and then assembles these new statements. Many previous chapters have made extensive use of macros defined in the file IO.H. This section explains how to write macro definitions and tells how MASM uses these definitions to expand macros into other statements.

A macro definition resembles a procedure definition in a high-level language. The first line gives the name of the macro being defined and a list of parameters; the main part of the definition consists of a collection of statements that describe the action of the macro in terms of the parameters. A macro is called much like a high-level language procedure, too; the name of the macro is followed by a list of arguments.

These similarities are superficial. A procedure call in a high-level language is generally compiled into a sequence of instructions to push parameters on the stack followed by a call instruction, whereas a macro call actually expands into statements given in the macro, with the arguments substituted for the parameters used in the macro definition. Code in a macro is repeated every time a macro is called, but there is just one copy of the code for a procedure. Macros often execute more rapidly than procedure calls since there is no overhead for passing parameters or for call and ret instructions, but this is usually at the cost of more bytes of object code.
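
As a rough illustration of the difference, consider a hypothetical procedure sumProc and a hypothetical macro sumMac that each put the sum of two doublewords in EAX. Neither name comes from IO.H or from earlier chapters, and the sketch assumes that x and y are doublewords in the data segment and that sumProc removes its own parameters from the stack.

```asm
; Procedure version: one copy of the body exists in the program.
; Every use costs two pushes, a call, and a ret at run time.
        push  x                 ; first parameter
        push  y                 ; second parameter
        call  sumProc           ; hypothetical procedure, sum returned in EAX

; Macro version: the body is copied in at every use.
; Assuming sumMac is defined to load its first argument and add
; the second, the statement
;       sumMac x, y
; would be replaced at assembly time by
        mov   eax, x
        add   eax, y
```

The macro version executes two instructions instead of four or more, but those two instructions are repeated in the object code at every place the macro is called.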

Every macro definition is bracketed by MACRO and ENDM directives. The format of a macro definition is

 name MACRO list of parameters
 assembly language statements
 ENDM

The parameters in the MACRO directive are ordinary symbols, separated by commas. The assembly language statements may use the parameters as well as registers, immediate operands, or symbols defined outside the macro. These statements may even include macro calls.

A macro definition can appear anywhere in an assembly language source code file as long as the definition comes before the first statement that calls the macro. It is good programming practice to place macro definitions near the beginning of a source file.

The remainder of this section gives several examples of macro definitions and macro calls. Suppose that a program design requires several pauses where the user is prompted to press the [Enter] key. Rather than write this code every time or use a procedure, a macro pause can be defined. Figure 9.7 gives such a definition.

 pause MACRO
 ; prompt user and wait for [Enter] to be pressed
 output pressMsg ; "Press [Enter]"
 input stringIn,5 ; input
 ENDM

Figure 9.7: pause macro

The pause macro has no parameter, so a call expands to almost exactly the same statements as are in the definition. If the statement

 pause

is included in subsequent source code, then the assembler expands this macro call into the statements

 output pressMsg ; "Press [Enter]"
 input stringIn,5 ; input

Of course, each of these statements is itself a macro call and will expand to additional statements. Notice that the pause macro is not self-contained; it references two fields in the data segment:

 pressMsg BYTE "Press [Enter] to continue", 0
 stringIn BYTE 5 DUP (?)

Note again that the definition and expansion for the pause macro contain no ret statement. Although macros look much like procedures, they generate in-line code when the macro call is expanded at assembly time.

Figure 9.8 gives a definition of a macro add2 that finds the sum of two parameters, putting the result in the EAX register. The parameters used to define the macro are nbr1 and nbr2. These labels are local to the definition. The same names could be used for other purposes in the program, although some human confusion might result.

 add2 MACRO nbr1, nbr2
 ; put sum of two doubleword parameters in EAX
 mov eax, nbr1
 add eax, nbr2
 ENDM

Figure 9.8: Macro to add two integers

The statements to which add2 expands depend on the arguments used in a call. For example, the macro call

 add2 value, 30 ; value + 30

expands to

 ; put sum of two doubleword parameters in EAX
 mov eax, value
 add eax, 30

The statement

 add2 value1, value2 ; value1 + value2

expands to

 ; put sum of two doubleword parameters in EAX
 mov eax, value1
 add eax, value2

The macro call

 add2 eax, ebx ; sum of two values

expands to

 ; put sum of two doubleword parameters in EAX
 mov eax, eax
 add eax, ebx

The instruction mov eax,eax is legal, even if it accomplishes nothing.

In each of these examples, the first argument is substituted for the first parameter nbr1 and the second argument is substituted for the second parameter nbr2. Each macro call results in a mov instruction and an add instruction, but since the types of arguments differ, the object code will vary.

If one of the arguments is missing, the macro will still be expanded. For instance, the statement

 add2 value

expands to

 ; put sum of two doubleword parameters in EAX
 mov eax, value
 add eax,

The argument value replaces nbr1 and an empty string replaces nbr2. The assembler will report an error, but it will be for the illegal add instruction that results from the macro expansion, not directly because of the missing argument.

Similarly, the macro call

 add2 , value

expands to

 ; put sum of two doubleword parameters in EAX
 mov eax,
 add eax, value

The comma in the macro call separates the first missing argument from the second argument value. An empty argument replaces the parameter nbr1. The assembler will again report an error, this time for the illegal mov instruction.

Figure 9.9 shows the definition of a macro swap that will exchange the contents of two doublewords in memory. It resembles the 80x86 xchg instruction, which does not allow two memory operands.

 swap MACRO dword1, dword2
 ; exchange two doublewords in memory
 push eax
 mov eax, dword1
 xchg eax, dword2
 mov dword1, eax
 pop eax
 ENDM

Figure 9.9: Macro to swap two memory words

As with the add2 macro, the code generated by calling the swap macro depends on the arguments used. For example, the call

 swap [ebx], [ebx+4] ; swap adjacent words in array

expands to

 ; exchange two doublewords in memory
 push eax
 mov eax, [ebx]
 xchg eax, [ebx+4]
 mov [ebx], eax
 pop eax

It might not be obvious to the user that the swap macro uses the EAX register, so the push and pop instructions in the macro protect the user from accidentally losing the contents of this register.
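
One limitation follows directly from this expansion: because the macro saves and restores EAX around its work, EAX itself cannot usefully be one of the arguments. The sketch below traces a hypothetical call swap eax, value, where value is assumed to be a doubleword in the data segment.

```asm
; swap eax, value   expands to:
        push  eax               ; save original EAX on the stack
        mov   eax, eax          ; no effect
        xchg  eax, value        ; EAX = old value, value = original EAX
        mov   eax, eax          ; no effect
        pop   eax               ; EAX = original EAX again
; Net result: value receives the original EAX, but EAX is restored,
; so the two operands have not been exchanged.
```

When one operand is a register, the xchg instruction can be used directly and the macro is not needed.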

Figure 9.10 gives a definition of a macro min2, which finds the minimum of two doubleword signed integers, putting the smaller in the EAX register. The code for this macro must implement a design with an if statement, and this requires at least one assembly language statement with a label. If an ordinary label were used, then it would appear every time a min2 macro call was expanded and the assembler would produce error messages because of duplicate labels. The solution is to use a LOCAL directive to define a symbol endIfMin that is local to the min2 macro.

 min2 MACRO first, second
 LOCAL endIfMin
 ; put smaller of two doublewords in the EAX register
 mov eax, first
 cmp eax, second
 jle endIfMin
 mov eax, second
 endIfMin:
 ENDM

Figure 9.10: Macro to find smaller of two memory words

The LOCAL directive is used only within a macro definition and must be the first statement after the MACRO directive. (Not even a comment can separate the MACRO and LOCAL directives.) It lists one or more symbols, separated by commas, which are used within the macro definition. Each time the macro is expanded and one of these symbols is needed, it is replaced by a symbol starting with two question marks and ending with four hexadecimal digits (??0000, ??0001, etc.). The same ??dddd symbol replaces the local symbol in each place it is used within one particular expansion of a macro call. The same symbols may be listed in LOCAL directives in different macro definitions or may be used as regular symbols in code outside of macro definitions.

The macro call

 min2 [ebx], ecx ; find smaller of two values

might expand to the code

 LOCAL endIfMin
 ; put smaller of two doublewords in the EAX register
 mov eax, [ebx]
 cmp eax, ecx
 jle ??000C
 mov eax, ecx
 ??000C:

Here endIfMin has been replaced by ??000C in the two places it appears within the macro definition. Another expansion of the same macro would use a different number after the question marks.
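
To see why distinct generated symbols matter, here is a sketch of the instructions produced by two consecutive min2 calls. It assumes these are the first two expansions in the program that need a local symbol, so that the counter starts at 0000; in a real program the numbers depend on how many local symbols were generated earlier. The names x, y, and z are hypothetical doublewords in the data segment.

```asm
; min2 x, y
        mov   eax, x
        cmp   eax, y
        jle   ??0000            ; jumps to the label in THIS expansion
        mov   eax, y
??0000:

; min2 z, ecx
        mov   eax, z
        cmp   eax, ecx
        jle   ??0001            ; a different label, so no duplicate
        mov   eax, ecx
??0001:
```

Had an ordinary label been used in the definition, both expansions would have defined the same label and the assembler would have reported a duplicate-symbol error.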

The MASM assembler has several directives that control how macros and other statements are shown in .LST files. The most useful are

.LIST that causes statements to be included in the listing file
.NOLIST that completely suppresses the listing of all statements, and
.NOLISTMACRO that selectively suppresses macro expansions while allowing the programmer’s original statements to be listed

The file IO.H starts with a .NOLIST directive so that macro definitions do not clutter the listing. Similarly, IO.H ends with .NOLISTMACRO and .LIST directives so that macro expansion listings do not obscure the programmer’s code, but original statements are listed.
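
The same pattern can be applied to a programmer's own macro definitions. The sketch below is an assumption about how one might arrange a source file, not something taken from IO.H; the macro name clearEAX is hypothetical.

```asm
.NOLIST                         ; stop listing: hide the definition below
clearEAX MACRO                  ; hypothetical macro
        mov   eax, 0            ; put zero in EAX
        ENDM
.NOLISTMACRO                    ; list macro calls, but not their expansions
.LIST                           ; resume listing of ordinary statements

        clearEAX                ; the call is listed; its expansion is not
```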

Exercises 9.3

Using the macro definition for add2 given in Fig. 9.8, show the sequence of statements to which each of the following macro calls expands.
1. add2 25, ebx
2. add2 ecx, edx
3. add2 ; no argument
4. add2 value1, value2, value3
(Hint: the third argument is ignored since it has no matching parameter.)
Using the macro definition for swap given in Fig. 9.9, show the sequence of statements to which each of the following macro calls expands.
1. swap value1, value2
2. swap temp, [ebx]
3. swap value
Using the macro definition for min2 given in Fig. 9.10, show the sequence of statements to which each of the following macro calls expands.
1. min2 value1, value2 (Assume the local symbol counter is at 000A.)
2. min2 cx, value (Assume the local symbol counter is at 0019.)

Programming Exercises 9.3

Write a definition of a macro add3 that has three doubleword integer parameters and puts the sum of the three numbers in the EAX register.
Write a definition of a macro max2 that has two doubleword integer parameters and puts the maximum of the two numbers in the EAX register.
Write a definition of a macro min3 that has three doubleword integer parameters and puts the minimum of the three numbers in the EAX register.
Write a definition of a macro toUpper with one parameter, the address of a byte in memory. The code generated by the macro will examine the byte, and if it is the ASCII code for a lowercase letter, will replace it by the ASCII code for the corresponding uppercase letter.

Conditional Assembly

The Microsoft Macro Assembler can test various conditions at assembly time and alter how the source code is assembled on the basis of these conditions. For instance, a block of code may be assembled or skipped based on the definition of a constant. This ability to do conditional assembly is especially useful in macro definitions. For example, two calls to the same macro may be expanded into different sequences of statements based on the number of arguments present. This section describes some of the ways that conditional assembly can be used.

Figure 9.11 shows a definition for a macro addAll that will add one to five doubleword integers, putting the sum in the EAX register. It employs the conditional assembly directive IFNB ("if not blank"). This directive is most often used in macro definitions, although it is legal in open code, that is, regular code outside a macro. When an addAll macro call is expanded and one of its IFNB directives is encountered, MASM examines the value of the macro parameter whose name is enclosed between < and >. If that parameter has a corresponding argument passed to it, then it is "not blank" and the add instruction for that argument is included in the expansion of the macro. If a parameter does not have a corresponding argument, the add instruction is not assembled.

 addAll MACRO nbr1, nbr2, nbr3, nbr4, nbr5
 ; add up to 5 doubleword integers, putting sum in EAX
 mov eax, nbr1 ; first operand
 IFNB <nbr2>
 add eax, nbr2 ; second operand
 ENDIF
 IFNB <nbr3>
 add eax, nbr3 ; third operand
 ENDIF
 IFNB <nbr4>
 add eax, nbr4 ; fourth operand
 ENDIF
 IFNB <nbr5>
 add eax, nbr5 ; fifth operand
 ENDIF
 ENDM

Figure 9.11: addAll macro using conditional assembly

Given the macro call

 addAll ebx, ecx, edx, number, 1

each of the five macro parameters has a corresponding argument, so the macro expands to

 mov eax, ebx ; first operand
 add eax, ecx ; second operand
 add eax, edx ; third operand
 add eax, number ; fourth operand
 add eax, 1 ; fifth operand

The macro call

 addAll ebx, ecx, 45 ; ebx + ecx + 45

has only three arguments. The argument ebx becomes the value for parameter nbr1, ecx is substituted for nbr2, and 45 will be used for nbr3, but the parameters nbr4 and nbr5 will be blank. Therefore the macro expands to the statements

 mov eax, ebx ; first operand
 add eax, ecx ; second operand
 add eax, 45 ; third operand

Although it would be unusual to do so, arguments other than trailing ones can be omitted. For example, the macro call

 addAll ebx, ,ecx

has ebx corresponding to nbr1 and ecx matched to nbr3, but all other parameters will be blank. Therefore the macro expands to

 mov eax, ebx ; first operand
 add eax, ecx ; third operand

If the first argument is omitted in an addAll macro call, the macro will still be expanded. However, the resulting statement sequence will contain a mov instruction with a missing operand, and this statement will cause MASM to issue an error message. For example, the macro call

 addAll , value1, value2

expands to

 mov eax, ; first operand
 add eax, value1 ; second operand
 add eax, value2 ; third operand

An unusual use of the addAll macro is illustrated by the call

 addAll value, eax, eax, value, eax ; 10 * value

that expands to

 mov eax, value ; first operand
 add eax, eax ; second operand
 add eax, eax ; third operand
 add eax, value ; fourth operand
 add eax, eax ; fifth operand

The comment "10 * value" explains the purpose of this call.

The Microsoft assembler provides several conditional assembly directives. The IFNB directive has a companion IFB ("if blank") that checks if a macro parameter is blank.

The IF and IFE directives examine an expression whose value can be determined at assembly time. For IF, MASM assembles conditional code if the value of the expression is not zero. For IFE, MASM includes conditional code if the value is zero.

The IFDEF and IFNDEF directives are similar to IF and IFE. They examine a symbol, and MASM assembles conditional code depending on whether or not the symbol has previously been defined in the program.
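
A brief sketch of these four directives in open code follows. The symbols debugLevel and testBuild and the message names are hypothetical, and the sketch assumes that testBuild is not defined anywhere earlier in the program; output is the IO.H macro.

```asm
debugLevel  EQU  2              ; hypothetical constant, known at assembly time

        IF debugLevel           ; value is nonzero, so this is assembled
        output debugMsg         ; debugMsg: hypothetical string in data segment
        ENDIF

        IFE debugLevel          ; value is not zero, so this is skipped
        output quietMsg
        ENDIF

        IFDEF testBuild         ; testBuild is undefined, so this is skipped
        output testMsg
        ENDIF

        IFNDEF testBuild        ; assembled, because testBuild is undefined
        output releaseMsg
        ENDIF
```

Under these assumptions only the calls output debugMsg and output releaseMsg reach the object code.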

Each conditional assembly block is terminated by the ENDIF directive. ELSEIF and ELSE directives are available to provide alternative code. In general, blocks of conditional assembly code look like

 IF... [operands]
 statements
 ELSEIF ...
 statements
 ELSE
 statements
 ENDIF

Operands vary with the type of IF and are not used with all types. The ELSEIF directive and statements following it are optional, as are the ELSE directive and statements following it. There can be more than one ELSEIF directive, but at most one ELSE directive.

The above syntax strongly resembles what appears in many high-level languages. It is important to realize, however, that these directives are used at assembly time, not at execution time. That is, they control assembly of statements that are later executed, not the order of statement execution.
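
The distinction can be made concrete with a small sketch. The constant maxCount, the doubleword count, and the two message names are hypothetical; output is the IO.H macro.

```asm
maxCount  EQU  100              ; hypothetical assembly-time constant

; Assembly-time decision: MASM evaluates the expression once and
; places only ONE of the two output calls in the object code.
        IF maxCount GT 255
        output bigMsg
        ELSE
        output smallMsg         ; this one is assembled, since 100 <= 255
        ENDIF

; Run-time decision: BOTH output calls are in the object code, and
; the processor chooses between them each time the code executes.
        cmp   count, 255        ; count: hypothetical doubleword in memory
        jle   smallCase
        output bigMsg
        jmp   endCheck
smallCase:
        output smallMsg
endCheck:
```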

The EXITM directive can be used to make some macro definitions simpler to write and understand. When MASM is processing a macro call and finds an EXITM directive, it immediately stops expanding the macro, ignoring any statements following EXITM in the macro definition. The design

if condition
then
 process assembly language statements for condition;
else
 process statements for negation of condition;
end if;

and the alternative design

if condition
then
 process assembly language statements for condition;
 terminate expansion of macro;
end if;

process statements for negation of condition;

are equivalent, assuming that no macro definition statements follow those sketched in the designs. These alternative designs can be implemented using

 IF... [operands]
 assembly language statements for condition
 ELSE
 assembly language statements for negation of condition
 ENDIF

and

 IF... [operands]
 assembly language statements for condition
 EXITM
 ENDIF
 assembly language statements for negation of condition

Notice that the EXITM directive is not needed when the ELSE directive is used. A macro definition using EXITM appears in Fig. 9.12.

 min2 MACRO value1,value2,extra
 LOCAL endIfLess
 ; put smaller of value1 and value2 in EAX

 IFB <value1>
 .ERR <first argument missing in min2 macro>
 EXITM
 ENDIF

 IFB <value2>
 .ERR
 EXITM
 ENDIF

 IFNB <extra>
 .ERR
 EXITM
 ENDIF

 mov eax, value1 ;; first value to EAX
 cmp eax, value2 ;; value1 <= value2?
 jle endIfLess ;; done if so
 mov eax, value2 ;; otherwise value2 smaller
 endIfLess:
 ENDM

Figure 9.12: Improved min2 macro

Examples in the previous section showed macro calls that expanded to illegal statements as a result of missing arguments. Such illegal statements are detected by MASM during subsequent assembly rather than as the macro is expanded. The designer of a macro definition may wish to include safeguards to ensure that the correct number of arguments is included in a macro call, or that the call is valid in other ways. Conditional assembly directives make this possible. If, however, assembly errors are eliminated by avoiding generation of illegal statements, a user may not know when a macro call is faulty. It requires additional effort to inform the user of an error. One way to do this is with the .ERR directive. This directive generates a forced error at assembly time, resulting in a message to the console and a message to the listing file, if any. It also ensures that no .obj file is produced for the assembly. The .ERR directive is often followed by a string enclosed by < and >. This string is included in the error message.

The min2 macro definition in Fig. 9.12 incorporates safeguards to ensure that the macro is called with the correct number of parameters. The conditional block

 IFB <value1>
 .ERR <first argument missing in min2 macro>
 EXITM
 ENDIF

examines the first argument. If it is missing, then the .ERR directive displays the message "first argument missing in min2 macro." Note that the conditional block ends with an EXITM directive, so that if the first argument is missing, no further expansion of the macro is done. An alternative way to suppress additional macro expansion would be to nest the rest of the macro definition between an ELSE directive and the ENDIF directive for this first conditional block.
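
That alternative might look like the following sketch, which checks only the first argument to keep the example short. The name min2else is hypothetical, chosen to avoid clashing with min2.

```asm
min2else MACRO value1, value2
        LOCAL endIfLess
        IFB <value1>
        .ERR <first argument missing in min2else macro>
        ELSE                    ;; rest of definition is nested in ELSE
        mov   eax, value1       ;; first value to EAX
        cmp   eax, value2       ;; value1 <= value2?
        jle   endIfLess         ;; done if so
        mov   eax, value2       ;; otherwise value2 smaller
endIfLess:
        ENDIF
        ENDM
```

With several checks, this style leads to deeper and deeper nesting, which is one reason the EXITM form in Fig. 9.12 can be easier to read.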

The conditional block

 IFB <value2>
 .ERR
 EXITM
 ENDIF

examines the second argument, generating an error if it is missing. The conditional block

 IFNB <extra>
 .ERR
 EXITM
 ENDIF

tells MASM to check to see if a third argument was listed in the macro call that is being expanded. Since there should be no third argument, an error is generated if the argument is not blank.

Exercises 9.4

Using the macro definition for min2 given in Fig. 9.12, show the sequence of statements to which each of the following macro calls expands.
1. min2 nbr1, nbr2 (Assume the local symbol counter is at 0004.)
2. min2 , value (Assume the local symbol counter is at 0011.)
3. min2 ecx (Assume the local symbol counter is at 000B.)
4. min2 nbr1, nbr2, nbr3 (Assume the local symbol counter is at 01D0.)

Programming Exercises 9.4

Rewrite the macro definition for swap from Fig. 9.9, so that a swap macro call must have exactly two arguments; use .ERR with appropriate messages if there are missing or extra arguments.
Write a definition of a macro min3 that has exactly three doubleword integer parameters and that puts the minimum of the three numbers in the EAX register. Use .ERR with appropriate messages if there are missing or extra arguments in a min3 call.

Macros in IO.H

Macros in the file IO.H are designed to provide simple, safe access to standard input and output devices. Figure 9.13 shows the contents of IO.H and the remainder of the section discusses the directives and macros in the file.

; IO.H - header file for I/O macros
; 32-bit version for flat memory model
; R. Detmer last revised 8/2000
.NOLIST ; turn off listing
.386

 EXTRN itoaproc:near32, atoiproc:near32
 EXTRN dtoaproc:near32, atodproc:near32
 EXTRN inproc:near32, outproc:near32

itoa MACRO dest,source,xtra ;; convert integer to ASCII string

 IFB <source>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 push ebx ;; save EBX
 mov bx, source
 push bx ;; source parameter
 lea ebx,dest ;; destination address
 push ebx ;; destination parameter
 call itoaproc ;; call itoaproc(source,dest)
 pop ebx ;; restore EBX
 ENDM

atoi MACRO source,xtra ;; convert ASCII string to integer in AX
 ;; offset of terminating character in ESI

 IFB <source>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 push ebx ;; save EBX
 lea ebx,source ;; source address to EBX
 push ebx ;; source parameter on stack
 call atoiproc ;; call atoiproc(source)
 pop ebx ;; parameter removed by ret
 ENDM

dtoa MACRO dest,source,xtra ;; convert double to ASCII string

 IFB <source>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 push ebx ;; save EBX
 mov ebx, source
 push ebx ;; source parameter
 lea ebx,dest ;; destination address
 push ebx ;; destination parameter
 call dtoaproc ;; call dtoaproc(source,dest)
 pop ebx ;; restore EBX
 ENDM

atod MACRO source,xtra ;; convert ASCII string to integer in EAX
 ;; offset of terminating character in ESI

 IFB <source>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 lea eax,source ;; source address to EAX
 push eax ;; source parameter on stack
 call atodproc ;; call atodproc(source)
 ;; parameter removed by ret
 ENDM

output MACRO string,xtra ;; display string

 IFB <string>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 push eax ;; save EAX
 lea eax,string ;; string address
 push eax ;; string parameter on stack
 call outproc ;; call outproc(string)
 pop eax ;; restore EAX
 ENDM

input MACRO dest,length,xtra ;; read string from keyboard

 IFB <length>
 .ERR
 EXITM
 ENDIF

 IFNB <xtra>
 .ERR
 EXITM
 ENDIF

 push ebx ;; save EBX
 lea ebx,dest ;; destination address
 push ebx ;; dest parameter on stack
 mov ebx,length ;; length of buffer
 push ebx ;; length parameter on stack
 call inproc ;; call inproc(dest,length)
 pop ebx ;; restore EBX
 ENDM

.NOLISTMACRO ; suppress macro expansion listings
.LIST ; begin listing

Figure 9.13: IO.H

Most of the file IO.H consists of macro definitions that, when used, generate code to call external procedures. However, the file does contain other directives. It begins with a .NOLIST directive; this suppresses the listing of all source code, in particular the contents of IO.H. It then has EXTRN directives that identify the external procedures called by the macros. The file ends with a .NOLISTMACRO directive to suppress listing of any macro expansions and a .LIST directive so that the user's statements following the directive INCLUDE io.h will again be shown in the listing file.

The bulk of the file IO.H consists of definitions for itoa, atoi, dtoa, atod, output, and input macros. These definitions have similar structures. Each uses IFB and IFNB directives to check that a macro call has the correct number of arguments. If not, .ERR directives are used to generate forced errors and appropriate messages. Actually, the checks are not quite complete.

Assuming that its arguments are correct, an input/output macro call expands to a sequence of instructions that call the appropriate external procedure, for instance itoaproc for the macro itoa. Parameters are passed on the stack, but some code sequences use a register to temporarily contain a value, with push and pop instructions to ensure that these registers are not changed following a macro call.
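
For example, using the definitions in Fig. 9.13, the two macro calls inside the pause macro of Fig. 9.7 would expand roughly as follows. The comments here are added for explanation; comments that begin with ;; in a macro definition are not copied into the expansion.

```asm
; output pressMsg
        push  eax               ; save EAX
        lea   eax, pressMsg     ; address of the string
        push  eax               ; string parameter on stack
        call  outproc           ; display the string
        pop   eax               ; restore EAX

; input stringIn,5
        push  ebx               ; save EBX
        lea   ebx, stringIn     ; address of the input buffer
        push  ebx               ; dest parameter on stack
        mov   ebx, 5            ; length of buffer
        push  ebx               ; length parameter on stack
        call  inproc            ; read the string
        pop   ebx               ; restore EBX
```

So each pause call contributes twelve instructions to the program, which illustrates how nested macro calls can increase the size of the object code.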

Exercises 9.5

Notice that itoa has only one error message that is used if either or both arguments are missing. Rewrite the definition of itoa to provide complete argument checking. That is, check separately for missing source and dest arguments, generating specific messages for each missing argument. Allow for the possibility that both are missing.

Summary

This chapter has discussed the assembly process. A typical two-pass assembler scans an assembly language program twice, using a location counter to construct a symbol table during the first pass, and completing assembly during the second pass. The symbol table contains information about each identifier used in the program, including its type, size, and location. Assembly can be done in a single pass if the object code is "fixed up" when forward references are resolved.

A machine instruction may have one or more prefix bytes. However, the main byte of machine code for each 80x86 instruction is its opcode. Some instructions are a single byte long, but most consist of multiple bytes. The next byte often has the format mod reg r/m where reg indicates a source or destination register, and the other two fields combine to describe the addressing mode. Other instruction bytes contain additional addressing information, immediate data, or the address of a memory operand.

Macros are defined using MACRO and ENDM directives. Macros may use parameters that are associated with corresponding arguments in macro calls. A call is expanded at assembly time. The statements in the expansion of a macro call appear in the macro definition, with arguments substituted for parameters. A macro definition may declare local labels that MASM expands to different symbols for different macro calls.

Conditional assembly may be used in regular code or in macro definitions to generate different statements, based on conditions that can be checked at assembly time. The IFB and IFNB directives are used in macros to check for the absence or presence of arguments. Several other conditional assembly directives are also available, including IF, IFE, IFDEF, and IFNDEF. An ELSE directive may be used to provide two alternative blocks of code, and the ENDIF directive ends a conditional assembly block.

If the assembler encounters an EXITM directive when expanding a macro definition, it immediately terminates expansion of the macro. The .ERR directive triggers a forced error so that MASM displays an error message and produces no .OBJ file for the assembly.

The file IO.H contains definitions for a collection of input/output macros, and a few directives. These macro definitions use conditional assembly to check for missing or extra arguments and generate code that calls external procedures.

reg code	register 32	register 16	register 8	segment register

000	EAX	AX	AL	ES
001	ECX	CX	CL	CS
010	EDX	DX	DL	SS
011	EBX	BX	BL	DS
100	ESP	SP	AH	FS
101	EBP	BP	CH	GS
110	ESI	SI	DH
111	EDI	DI	BH
