1.2. Intel Pentium Processor Commands and Registers


This section gives an overview of the Intel Pentium commands and registers. The material is useful when investigating executable code, both for beginners and for experienced users, and it can serve as a reference that is always handy.

1.2.1. Pentium Microprocessor Registers

The Pentium microprocessor comprises general-purpose registers, the flags register, segment registers, control registers, system address registers, and debug registers. The EIP register, also known as the instruction pointer, deserves special mention. It always contains the address of the next command to be executed, relative to the start of the code segment. This register cannot be accessed directly; however, many commands change its contents indirectly, for example, the commands that pass control.

General-Purpose Registers

The list of general-purpose registers includes the following:

  • EAX = (16 + AX = (AH + AL))

  • EBX = (16 + BX = (BH + BL))

  • ECX = (16 + CX = (CH + CL))

  • EDX = (16 + DX = (DH + DL))

  • ESI = (16 + SI)

  • EDI = (16 + DI)

  • EBP = (16 + BP)

  • ESP = (16 + SP)

The EAX, EBX, EDX, and ECX registers are called working registers. Note that all of these registers have subregisters. For example, the low 16 bits of the EAX register are designated as AX. The least significant byte of AX is in turn designated as AL, and the most significant byte is AH. The EDI and ESI registers are called index registers. They play a special role in index operations. The EBP register is usually employed for addressing parameters and local variables in the stack. The ESP register is the stack pointer that is automatically modified by PUSH, POP, RET, and CALL; however, it is rarely used explicitly. The ESI, EDI, ESP, and EBP registers also have subregisters. For example, the low 16 bits of the EDI register are designated as DI.
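The subregister layout can be illustrated with a minimal Python sketch. The helper name `split_eax` is hypothetical; it only models how AX, AH, and AL overlay the low bits of a 32-bit value and does not touch real hardware.

```python
def split_eax(eax):
    """Return the AX, AH, and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF        # keep the value within 32 bits
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # most significant byte of AX
    al = ax & 0xFF           # least significant byte of AX
    return {"AX": ax, "AH": ah, "AL": al}


parts = split_eax(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```

The same masking applies to EBX, ECX, and EDX; for ESI, EDI, EBP, and ESP only the 16-bit view exists.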

Flags Register

The flags register contains 32 bits. The bit values used by this register are as follows:

  • Bit 0, carry flag (CF) — This bit is set to one if in the course of addition or multiplication there was a carry from the most significant bit or if a bit was borrowed in the course of subtraction.

  • Bit 1 - One.

  • Bit 2, parity flag (PF) — This bit is set to one if the least significant byte of the result contains an even number of ones; otherwise, this bit is set to zero.

  • Bit 3 - Zero.

  • Bit 4, auxiliary carry flag (AF) - This bit is set to one if there was a carry (or borrow) from bit 3 into bit 4.

  • Bit 5 - Zero.

  • Bit 6, zero flag (ZF) — This bit is set to one if the result of the operation is zero; otherwise, this bit is set to zero.

  • Bit 7, sign flag (SF) — This bit equals the most significant bit of the result of the previous operation.

  • Bit 8, trap flag (TF) - Setting this flag to one results in INT 1 (the single-step debug exception) being raised after each command. This flag is used by debuggers for single-step execution.

  • Bit 9, interrupt flag (IF) — Resetting this flag to zero results in the microprocessor ceasing to accept interrupts from external devices.

  • Bit 10, direction flag (DF) — This flag is taken into account in string operations. If the flag is set to one, the address is automatically decremented in string operations.

  • Bit 11, overflow flag (OF) — This bit is set to one if the result of the operation over a signed number has exceeded the allowed limits.

  • Bits 12 and 13, input/output privilege level (IOPL) - These bits define the privilege level required to allow the code to execute input/output commands and other privileged commands.

  • Bit 14, nested task flag (NT).

  • Bit 15 — Zero.

  • Bit 16, resume flag (RF) — This flag is used with the debug breakpoint registers.

  • Bit 17, virtual mode flag (VM) - In protected mode, this flag enables the virtual 8086 mode.

  • Bit 18, alignment control flag (AC) — If this flag is set to one, exception 17 is thrown if an unaligned operand is accessed.

  • Bit 19, virtual function of the IF flag (VIF) — This flag works in the protected mode.

  • Bit 20, virtual interrupt pending flag (VIP).

  • Bit 21, identification flag (ID) - If a program can change this bit, the processor supports the CPUID identification command.

  • Bits 22–31 — Must be zero.
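As a rough illustration of the bit layout listed above, the following Python sketch decodes a flags-register value into the names of the flags that are set. The function name `decode_eflags` and the dictionary are assumptions made for this example; the short name ID for bit 21 follows Intel's usual naming.

```python
# Bit positions taken from the list above (single-bit flags only).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8,
    "IF": 9, "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
    "AC": 18, "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return (list of set flag names, IOPL) for a 32-bit flags value."""
    set_flags = [name for name, bit in FLAG_BITS.items() if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11    # bits 12 and 13 form a 2-bit field
    return set_flags, iopl


print(decode_eflags(0x246))
```

The value 0x246 has bits 1, 2, 6, and 9 set, so the sketch reports PF, ZF, and IF with IOPL equal to zero; bit 1 is the constant one and has no name.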

Segment Registers

Segment registers include CS, the code segment; DS, the data segment; SS, the stack segment; and ES, GS, and FS, auxiliary registers. All segment registers are 16-bit registers. Segment registers take part in forming the memory address either directly or as selectors that point to a structure in a descriptor table; this structure defines the segment in which the address being formed is located.

Control Registers

The list of control registers includes the following:

  • The CR0 register:

    • Bit 0, protection enabled flag (PE) — Switches the processor to protected mode.

    • Bit 1, monitor coprocessor flag (MP) — Causes exception 7 with each WAIT command.

    • Bit 2, coprocessor emulation (EM) — Causes exception 7 with each coprocessor command.

    • Bit 3, task switching flag (TS) — Determines whether or not the given coprocessor context relates to the current task. It causes exception 7 when executing the next coprocessor command.

    • Bit 4, extension type (ET) - Indicates support for coprocessor instructions.

    • Bit 5, numeric error (NE) — Enables native mechanisms for reporting coprocessor errors.

    • Bits 6–15, reserved.

    • Bit 16, write protect (WP) — Enables write protection at the supervisor privilege level.

    • Bit 17, reserved.

    • Bit 18, alignment mask (AM) — Enables automatic alignment checking.

    • Bits 19–28, reserved.

    • Bit 29, not write-through (NW) — Disables write-through for writes that hit the cache or invalidation cycles.

    • Bit 30, cache disable (CD) — Prevents the cache from filling.

    • Bit 31, paging (PG) — Enables paging when set to one.

  • The CR1 register is reserved for future use.

  • The CR2 register stores the 32-bit linear address, at which the last page fault occurred.

  • In the CR3 register, the 20 most significant bits store the physical base address of the page directory table. Other bits are as follows:

    • Bit 3, page level write transparent (PWT) — Controls the write-through or write-back page caching policy.

    • Bit 4, page-level cache disable (PCD) — Controls caching of the current page directory.

  • The CR4 register:

    • Bit 0, virtual 8086 mode extensions (VME) — Enables interrupt- and exception-handling extensions in virtual 8086 mode when set to one.

    • Bit 1, protected-mode virtual interrupts (PVI) — Enables hardware support for a virtual interrupt flag (VIF) in protected mode when set to one.

    • Bit 2, time stamp disable (TSD) — Restricts the execution of the RDTSC instruction to procedures running at privilege level 0.

    • Bit 3, debugging extensions (DE) — Enables breakpoints on accessing input/output ports.

    • Bit 4, page size extensions (PSE) — Enables 4-MB pages when set to one.

    • Bit 5, physical address extension (PAE) — Enables the paging mechanism to reference at least 36-bit physical addresses when set to one.

    • Bit 6, machine-check enable (MCE) — Enables the machine-check exception when set to one.

    • Bit 7, page global enable (PGE) — Enables the global page feature when set to one.

    • Bit 8, performance-monitoring counter enable (PCE) - Enables execution of the RDPMC instruction for programs or procedures running at any protection level when set to one.

    • Bit 9, operating system support for FXSAVE and FXRSTOR instructions (OSFXSR) — Enables the FXSAVE and FXRSTOR instructions to save and restore the contents of the XMM and MXCSR registers, along with the contents of the x87 floating-point unit (FPU) and MMX registers, when set to one.

System Address Registers

These registers are used in the protected mode of Intel processors. The Windows operating system also operates in this mode.

  • GDTR — This is a 6-byte register containing the linear address of the global descriptor table (GDT).

  • IDTR - This is a 6-byte register containing the 32-bit linear address of the interrupt descriptor table (IDT).

  • LDTR — This is a 10-byte register containing the 16-bit selector (index) for GDT and an 8-byte descriptor.

  • TR — This is a 10-byte register containing the 16-bit selector for GDT and the entire 8-byte descriptor from GDT, describing the task state segment (TSS) of the current task. TSS is a segment of special format that contains all required information about the given task, and a special field that ensures task interactions and intercommunications.

Debug Registers

  • DR0-DR3 — These registers store the 32-bit linear addresses of the breakpoints. The operating mechanism of these registers is as follows: Any address formed by a program is compared with the addresses stored in the debug registers. If a match is encountered, the processor generates the debug exception (INT 1).

  • DR6 (equivalent to DR4) - This register reflects the breakpoint status. Bits of this register are set according to the debug conditions that have caused the debug exception. Significant bits of this register are as follows:

    • Bit 0, breakpoint condition detected (B0) - If this bit is set to one, this indicates that the last exception occurred when the breakpoint determined in DR0 was reached.

    • Bit 1 — This bit is similar to B0 but in relation to the DR1 register.

    • Bit 2 — This bit is similar to B0 but in relation to the DR2 register.

    • Bit 3 — This bit is similar to B0 but in relation to the DR3 register.

    • Bit 13, debug register access detected (BD) — Protects debug registers.

    • Bit 14, single step (BS) — If this bit is set to one, the exception was generated because the trap flag (bit 8 in flags register) is set to one.

    • Bit 15, task switch (BT) — If the value of this bit is one, the exception was caused by switching to the task with the trap bit set.

  • DR7 (equivalent to DR5) - This register controls the setting of breakpoints. For each of the debug registers (DR0-DR3), it contains fields that determine the conditions under which the debug exception is generated. The first four pairs of bits (8 bits) of this register, one pair per register, indicate whether the corresponding register defines a breakpoint for the local task (in which case the first bit of the pair must be set to one) or for all tasks running in the system (in which case the second bit of the pair must be set to one). Bits 16-31 define the type of access for which the exception is raised (fetching a command, or reading or writing memory) and specify the data size:

    • Bits 16-17, 20-21, 24-25, and 28-29 define the type of access as follows: 00 for command execution, 01 for writing, 11 for reading and writing, and 10 for "not used."

    • Bits 18-19, 22-23, 26–27, and 30-31 define the size of the operand as follows: 00 for byte, 01 for 2 bytes, 11 for 4 bytes, and 10 for "not used."
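The DR7 field layout described above can be sketched in Python as follows. This is only a model under the encoding given in the text; the names `decode_dr7`, `ACCESS_TYPES`, and `OPERAND_SIZES` are hypothetical, and the code 10 is treated as unused, as in the list above.

```python
ACCESS_TYPES = {0b00: "execute", 0b01: "write", 0b11: "read/write", 0b10: "not used"}
OPERAND_SIZES = {0b00: 1, 0b01: 2, 0b11: 4, 0b10: None}  # None marks the unused code


def decode_dr7(dr7):
    """Decode the enable, access-type, and size fields for DR0-DR3."""
    breakpoints = []
    for n in range(4):
        breakpoints.append({
            "register": f"DR{n}",
            "local": bool((dr7 >> (2 * n)) & 1),        # first bit of the pair
            "global": bool((dr7 >> (2 * n + 1)) & 1),   # second bit of the pair
            "access": ACCESS_TYPES[(dr7 >> (16 + 4 * n)) & 0b11],
            "size": OPERAND_SIZES[(dr7 >> (18 + 4 * n)) & 0b11],
        })
    return breakpoints


for entry in decode_dr7(0x000D0001):
    print(entry)
```

In the sample value 0x000D0001, bit 0 enables DR0 for the local task, and bits 16-19 hold 1101, which selects a write breakpoint on a 4-byte operand.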

1.2.2. Main Instruction Set

The main instruction set includes all commands of the microprocessor, except for the coprocessor instructions and MMX instructions.

The designations adopted for presenting materials in subsequent tables are as follows:

  • dest and src — Destination operand and source operand

  • m — Operand located in memory

  • r — Operand that is a processor register

  • r8, r16, and r32 — 8-, 16-, and 32-bit processor registers, respectively

  • mm — 64-bit MMX register

  • m32 and m64 — 32-bit and 64-bit operands, respectively, located in memory

  • ir32 — Normal processor registers

  • imm — Immediate operand (constant), 1 byte in size

Table 1.2: Data exchange commands

Command

Description

MOV dest, src

Load data to or from the register, memory, or immediate operand. For example: MOV AX, 10; MOV EBX, ESI; MOV AL, BYTE PTR MEM; and MOV DWORD PTR MEM, 10000h.

XCHG r/m, r

Exchange data between registers or between a register and the memory. The command for exchanging data between memory cells is not provided in the Intel processor instruction set.

BSWAP reg32

Swap bytes from the least significant-most significant order into the most significant-least significant order. Bits 7-0 exchange positions with bits 31-24, and bits 15-8 exchange positions with bits 23-16. This command was introduced in the Intel 486 processor.

MOVSXB r, r/m

Extend a byte to a word or double word with duplication of the sign bit and load it into the destination. For example: MOVSXB AX, BL and MOVSXB EAX, BYTE PTR MEM. The command was introduced in the Intel 386 processor.

MOVSXW r, r/m

Load the source word, extended to a double word with duplication of the sign bit, into the destination. For example: MOVSXW EAX, WORD PTR MEM. This command was introduced in the Intel 386 processor.

MOVZXB r, r/m

Load the source byte, extended with zeros to a word or double word, into the destination. For example: MOVZXB AX, BL and MOVZXB EAX, BYTE PTR MEM. This command was introduced in the Intel 386 processor.

MOVZXW r, r/m

Load the source word, extended with zeros to a double word, into the destination. For example: MOVZXW EAX, WORD PTR MEM. This command was introduced in the Intel 386 processor.

XLAT

Load into AL a byte from the table in the data segment whose starting point is pointed to by EBX (BX). The initial value of AL plays the role of the offset.

LEA r, m

Load the effective address, for example:

LEA EAX, MEM;

LEA EAX, [EBX]

This command has certain "magic" properties that allow efficient arithmetic. For example, the LEA EAX, [EAX*8] command multiplies the contents of EAX by 8, and the LEA EAX, [EAX] [EAX*4] command multiplies the contents of EAX by 5. The LEA ECX, [EAX] [ESI+5] command is equivalent to the following three commands:

 MOV ECX, EAX
 ADD ECX, ESI
 ADD ECX, 5

Note that the LEA command allows multiplying only by 2, 4, and 8; therefore, if it is necessary to use a different multiplier, multiplication must be combined with addition.

LDS r, m

Load the DS:reg pair from memory. In this case, the offset word (or double word) comes first in memory and is loaded into the register, and the next word is loaded into DS.

LES r, m

Similar to the previous command but in relation to the ES:reg pair.

LFS r, m

Similar to the previous command but in relation to FS:reg.

LGS r, m

Similar to the previous command but in relation to GS:reg.

LSS r, m

Similar to the previous command but in relation to SS:reg.

Conditional settings of the first bit of the byte:

SETcc r/m

Check the cc condition. If this condition has been met, then the first bit of the byte is set to one; otherwise, this bit is set to zero. Conditions are similar to the ones used in conditional jumps (JE, JC). For example: SETE AL. This command was introduced in the Intel 386 processor. All variants of this command are as follows:

SETA/SETNBE - Set if above (unsigned greater).

SETAE/SETNB - Set if above or equal (unsigned).

SETB/SETNAE - Set if below (unsigned smaller).

SETBE/SETNA - Set if below or equal (unsigned).

SETC - Set if there is a carry.

SETE/SETZ - Set if equal (zero).

SETG/SETNLE - Set if greater (signed).

SETGE/SETNL - Set if greater or equal (signed).

SETL/SETNGE - Set if smaller (signed).

SETLE/SETNG - Set if smaller or equal (signed).

SETNC - Set if there is no carry.

SETNE/SETNZ - Set if not equal (not zero).

SETNO — Set if there is no overflow.

SETNP/SETPO — Set if there is no parity.

SETNS — Set if there is no sign.

SETO — Set if there is overflow.

SETP/SETPE — Set if there is parity.

SETS — Set if there is a sign.

LAHF

Load flags into AH (obsolete).

SAHF

Save AH into the flags register (obsolete).

Conditional moves: CMOVcc dest, src

CMOVA/CMOVNBE - Move if above (unsigned greater).

CMOVAE/CMOVNB - Move if above or equal (unsigned).

CMOVB/CMOVNAE - Move if below (unsigned smaller).

CMOVBE/CMOVNA - Move if below or equal (unsigned).

CMOVC - Move if there is carry.

CMOVE/CMOVZ - Move if equal (zero).

CMOVG/CMOVNLE - Move if greater (signed).

CMOVGE/CMOVNL - Move if greater or equal (signed).

CMOVL/CMOVNGE - Move if smaller (signed).

CMOVLE/CMOVNG - Move if smaller or equal (signed).

CMOVNC - Move if there is no carry.

CMOVNE/CMOVNZ - Move if not equal (not zero).

CMOVNO - Move if there is no overflow.

CMOVNP/CMOVPO - Move if there is no parity.

CMOVNS - Move if there is no sign.

CMOVO - Move if there is overflow.

CMOVP/CMOVPE — Move if there is parity.

CMOVS — Move if there is a sign.
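The address arithmetic that Table 1.2 describes for LEA can be modeled with a short Python sketch. The function `lea` and its keyword parameters are hypothetical names chosen for this illustration; the sketch only reproduces the base + index*scale + displacement computation with 32-bit wraparound and is not an assembler.

```python
MASK32 = 0xFFFFFFFF


def lea(base=0, index=0, scale=1, disp=0):
    """Model the effective address base + index*scale + disp, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & MASK32


eax, esi = 7, 20
print(lea(index=eax, scale=8))            # models LEA EAX, [EAX*8]
print(lea(base=eax, index=eax, scale=4))  # models LEA EAX, [EAX] [EAX*4]
print(lea(base=eax, index=esi, disp=5))   # models LEA ECX, [EAX] [ESI+5]
```

Because the scale is restricted to 1, 2, 4, or 8, a multiplier such as 5 has to be expressed as base plus index times 4, which is exactly the combination of multiplication and addition mentioned in the LEA description.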

Table 1.3: Input/output commands

Command

Description

IN AL (AX, EAX) , Port

IN AL (AX, EAX) , DX

Load from the input/output port into the accumulator. The port is addressed either directly or through the DX register.

OUT port, AL (AX, EAX)

OUT DX, AL (AX, EAX)

Output into the input/output port. The port is addressed either directly or through the DX register.

[REP] INSB

[REP] INSW

[REP] INSD

Input the data from the port addressed by the DX register into the following memory cell: ES:[EDI/DI]. After input of a byte, word, or double word, EDI/DI is corrected by 1, 2, or 4. If the REP prefix is present, the process continues until the contents of ECX/CX equal zero.

[REP] OUTSB

[REP] OUTSW

[REP] OUTSD

Output the data from the DS:[ESI/SI] memory cell into the output port, the address of which is stored in the DX register. After output of a byte, word, or double word, the ESI/SI pointer is corrected by 1, 2, or 4.

Table 1.4: Instructions for operations over the stack

Command

Description

PUSH r/m

Load a word or double word into the stack. Because the stack becomes unaligned by the double word boundary if a word is loaded into it, it is recommended to push double words into the stack anyway.

PUSH const

Load an immediate 32-bit operand into the stack.

PUSHA

Load the EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP registers into the stack. This command was introduced in the Intel 386 processor.

POP r/m

Retrieve a word or double word from the stack.

POPA

Retrieve the data from the stack into the EAX, EBX, ECX, EDX, ESI, EDI, and EBP registers; the value stored for ESP is skipped rather than loaded. The command was introduced in the Intel 386 processor.

PUSHF

Load the flags register into the stack.

POPF

Retrieve the flags register from the stack.
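How PUSH and POP move the stack pointer can be sketched with a small Python model. The class `Stack32` and its starting ESP value are assumptions made for this illustration; it handles only 32-bit operands and stores them in little-endian order, as the processor does.

```python
class Stack32:
    """Toy model of a downward-growing stack of 32-bit values."""

    def __init__(self, esp=0x1000):
        self.esp = esp       # hypothetical initial stack pointer
        self.memory = {}     # address -> byte value

    def push(self, value):
        self.esp -= 4        # ESP is decremented first
        for i in range(4):   # then the value is stored, low byte first
            self.memory[self.esp + i] = (value >> (8 * i)) & 0xFF

    def pop(self):
        value = sum(self.memory[self.esp + i] << (8 * i) for i in range(4))
        self.esp += 4        # ESP is incremented after the read
        return value


stack = Stack32()
stack.push(0x11223344)
stack.push(5)
print(hex(stack.esp), stack.pop(), hex(stack.pop()), hex(stack.esp))
```

The model makes the last-in, first-out order visible: the value pushed last is the first one returned, and ESP ends at its starting value once every push has been matched by a pop.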

Table 1.5: Instructions for integer arithmetic

Command

Description

ADD dest, src

Add two operands. The first operand can be a register or memory cell, and the second operand can be a register, memory cell, or constant. If both operands are memory cells, this operation is impossible.

XADD dest, src

Exchange operands and then carry out the ADD operation. This command was introduced in the Intel 486 processor.

ADC dest, src

Add with carry: the operands are added, and the value of the carry flag is added to the least significant bit of the result.

INC r/m

Increment the operand.

SUB dest, src

Subtract one operand from another operand. All other features are similar to the addition (the ADD command).

SBB dest, src

Subtract with borrow: the operands are subtracted, and the value of the carry bit (flag) is subtracted from the least significant bit of the result.

DEC r/m

Decrement the operand.

CMP r/m, r/m

Compare (subtracts the operands without changing their values).

CMPXCHG r, m, a

Compare and exchange. This command involves three operands: a source register, a destination operand in memory, and the accumulator (AL, AX, or EAX). If the values in the destination operand and the accumulator are equal, the destination operand is replaced with the source operand; otherwise, the value of the destination operand is loaded into the accumulator. This command was introduced in the Intel 486 processor.

CMPXCHG8B r, m, a

Compare and exchange 8 bytes. The command was introduced in the Intel Pentium processor. It compares the number stored in the EDX:EAX pair of registers with the 8-byte number in memory.

NEG r/m

Invert the operand sign.

AAA

ASCII adjust after addition. This command adjusts the result of addition as set by the American Standard Code for Information Interchange (ASCII) (binary addition of two unpacked BCDs). The AAA instruction must follow an ADD instruction that adds (binary addition) two unpacked BCDs and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register so that they contain the correct one-digit, unpacked BCD result.

If the addition produces a decimal carry, the AH register is incremented by one and the AL register is incremented by six (binary addition).

For example, assume that AX contains 0009H. In this case, executing the ADD AL, 8 / AAA pair of commands results in AX containing 0107H, in other words, unpacked BCD 17.

AAS

ASCII adjust after subtraction. This operation adjusts the result of the subtraction of two unpacked BCDs to create an unpacked BCD result.[a]

If the subtraction produces a decimal borrow, the AH register is decremented by one and the AL register is decremented by six (binary subtraction).

Consider the following example:

      MOV AX, 205H ; Load the 25 ASCII number.
      SUB AL, 8    ; Binary subtraction
      AAS

As a result, AX will contain the 0107H code, in other words, unpacked BCD 17.

AAM

ASCII adjust after multiplication. This instruction adjusts the result of the multiplication of two unpacked BCDs to create a pair of unpacked (base 10) BCDs. For this command, it is assumed that the AX register contains the result of binary multiplication of two decimal system digits (ranging from 0 to 81). After completion of this operation, the AX register will contain a 2-byte product in ASCII format. It is assumed that the least significant digit is contained in AL and the most significant digit is contained in AH. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCDs and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register so that they contain the correct two-digit, unpacked (base 10) BCD result.

AAD

ASCII adjust before division. This command adjusts two unpacked BCDs (the least significant digit in the AL register and the most significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD. The AAD instruction sets the value in the AL register to (AL + (10*AH)) and then clears the AH register to 00H. The value in the AX register then equals the binary equivalent of the original unpacked, two-digit (base 10) number in registers AH and AL.

DAA

Decimal adjust AL after addition. This operation adjusts the sum of two packed BCDs to create a packed BCD result and is only useful when it follows an ADD instruction that adds (binary addition) a pair of two-digit, packed BCDs and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register so that they contain the correct two-digit, packed BCD result.

DAS

Decimal adjust after subtraction. This instruction adjusts the result of the subtraction of two packed BCDs to create a packed BCD result and is only useful when it follows a SUB instruction that subtracts (binary subtraction) a single two-digit, packed BCD from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register so that they contain the correct two-digit, packed BCD result.

MUL r/m

Multiply AL(AX, EAX) by an unsigned integer number. The result will be contained in AX, DX:AX, EDX:EAX.

IMUL r/m

Perform signed multiplication (similar to MUL). All operands are considered signed. This instruction has three forms, depending on the number of operands. The one-operand form is identical to that used by the MUL instruction. The two-operand form is as follows: IMUL r, src, which computes r <- r*src. The three-operand form of this instruction is as follows:

IMUL dst, src, imm, which computes dst <- src*imm.

DIV r/m (src)

Perform unsigned division. This operation is similar to unsigned multiplication. It divides the accumulator and its extension (AH:AL, DX:AX, EDX:EAX) by the divisor src. The quotient is then placed into the accumulator, and the remainder is saved in the accumulator extension.

IDIV r/m

Perform signed division. This is similar to unsigned division.

CBW

Convert a byte to a word (CBW). This instruction doubles the size of the operand through sign extension. It extends the byte (AL) into a word and copies the sign bit in the source operand into every bit in the AH register.

CWD

Convert a word to a double word. This instruction doubles the size of the source operand (AX) into the double word (DX:AX) and copies the sign bit (bit 15) of the word in the AX register into every bit of the DX register.

CWDE

Convert a word to a double word. This instruction doubles the source operand (AX) through sign extension. This is similar to CWD but uses EAX as the destination.

CDQ

Convert a double word (EAX) to a quadword (EDX:EAX).

[a]Recall that ASCII numbers assume one digit is used per byte and BCD numbers assume that one digit is used per nibble (4 bits). In other words, the AX register can contain either a two-digit ASCII number or a four-digit BCD number.
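The two worked examples for AAA and AAS in Table 1.5 can be checked with a small Python model. The function names are hypothetical, and the model is deliberately simplified: it follows the description in the table (AL is adjusted by six and AH by one when a decimal carry or borrow occurs) and does not track the CF and AF results of the adjustment itself.

```python
def add_al_then_aaa(ax, operand):
    """Model ADD AL, operand followed by AAA; returns the new AX."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    af = ((al & 0x0F) + (operand & 0x0F)) > 0x0F  # carry out of bit 3
    al = (al + operand) & 0xFF                    # ADD AL, operand
    if (al & 0x0F) > 9 or af:                     # decimal carry occurred
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
    al &= 0x0F                                    # keep one decimal digit
    return (ah << 8) | al


def sub_al_then_aas(ax, operand):
    """Model SUB AL, operand followed by AAS; returns the new AX."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    af = (al & 0x0F) < (operand & 0x0F)           # borrow into bit 3
    al = (al - operand) & 0xFF                    # SUB AL, operand
    if (al & 0x0F) > 9 or af:                     # decimal borrow occurred
        al = (al - 6) & 0xFF
        ah = (ah - 1) & 0xFF
    al &= 0x0F
    return (ah << 8) | al


print(hex(add_al_then_aaa(0x0009, 8)))  # 9 + 8
print(hex(sub_al_then_aas(0x0205, 8)))  # 25 - 8
```

Both calls produce 0x107, which matches the 0107H result stated for the two examples: unpacked BCD 17 with the digit 1 in AH and the digit 7 in AL.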

Table 1.6: Logical operations

Command

Description

AND dest, src

Logical AND operation. This resets to zero every bit of dest, provided that the corresponding bit of src is zero.

TEST dest, src

Similar to AND but does not change dest. This operation is used for checking whether there are nonzero bits.

OR dest, src

Logical OR. This sets to one all bits in dest, for which the corresponding bits in src are not zero.

XOR dest, src

Exclusive OR. Each bit of the result is one if the corresponding bits of the two operands are different; each bit is zero if the corresponding bits of the operands are the same.

NOT dest

Inverts the values of all bits.

Table 1.7: Shift operations

Command

Description

RCL/RCR dest, src

Rotate through carry left and rotate through carry right. These commands cyclically shift all bits of the destination operand to the left or right, with the carry flag included in the rotation. Src may be either CL or the immediate operand.

ROL/ROR dest, src

Rotate left and rotate right. These commands are similar to RCL/RCR but use CF differently. CF doesn't participate in the cyclic shift, and its original value is not a part of the result. But CF receives a copy of the bit shifted from one end to the other.

SAL/SAR dest, src

Shift arithmetically left or right. In the right shift, the most significant bit is duplicated. In the left shift, the least significant bit is filled with zero. The "popped out" bit is loaded into CF.

SHL/SHR dest, src

Shift logically left or right. A logical shift right is different from SAR in that the most significant bit is also filled with zero.

SHLD/SHRD dest, src, count

Three-operand commands for left and right shifts. The first operand, as usual, can be either a register or a memory cell. The second operand must be a general-purpose register, and the third operand is either CL or the immediate operand. The essence of this operation is that dest and src are first joined and then shifted by the number of bits specified by count. The result is then placed into dest.
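The differences between the rotate and shift commands in Table 1.7 are easier to see on concrete values. The following Python sketch models 8-bit operands only; all function names are hypothetical, and the shift counts are assumed to be small (the count masking done by the processor is not modeled).

```python
def rol8(value, count):
    """ROL on a byte: bits leaving on the left re-enter on the right."""
    count %= 8
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def rcl8(value, cf, count):
    """RCL on a byte: CF acts as a ninth bit in the rotation."""
    wide = ((cf & 1) << 8) | (value & 0xFF)   # CF sits above bit 7
    count %= 9
    wide = ((wide << count) | (wide >> (9 - count))) & 0x1FF
    return wide & 0xFF, wide >> 8             # (new value, new CF)


def sar8(value, count):
    """SAR on a byte: the most significant bit is duplicated."""
    value &= 0xFF
    signed = value - 0x100 if value & 0x80 else value
    return (signed >> count) & 0xFF


def shr8(value, count):
    """SHR on a byte: vacated high bits are filled with zero."""
    return (value & 0xFF) >> count


print(hex(rol8(0x80, 1)), rcl8(0x80, 0, 1), hex(sar8(0xF0, 2)), hex(shr8(0xF0, 2)))
```

Rotating 0x80 left by one bit gives 0x01 with ROL, while RCL gives 0x00 and moves the bit into CF. Shifting 0xF0 right by two gives 0xFC with SAR and 0x3C with SHR.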

Table 1.8: String operations

Command

Description

REP

Repeat the string operation until ECX is decremented to zero. There are several variations of this prefix, such as REPZ (REPE) for repeat while zero (ZF = 1) and REPNZ (REPNE) for repeat while not zero (ZF = 0).

MOVS dest, src

Move a byte, word, or double word from the string addressed by DS:[ESI] into the dest string addressed by ES:[EDI]. The EDI and ESI registers are automatically corrected according to the value of the direction flag (DF). This command has the following variants: MOVSB (byte) for moving by single bytes, MOVSW (word) for moving by words, and MOVSD (double word) for moving by 4-byte blocks. Dest and src do not need to be specified explicitly.

LODS src

Load a string. This is the command for loading a string element into an accumulator. The following variants of the command are available: LODSB, LODSW, and LODSD. When executing this command, a byte, word, or double word is loaded into AL, AX, or EAX, respectively. The ESI register is automatically changed by 1, 2, or 4, in the direction determined by the state of DF. The REP prefix is not used.

STOS dest

An inverse of LODS. In other words, this command passes a byte, word, or double word from an accumulator into the string and automatically corrects EDI.

SCAS dest

Scan a string. It subtracts a string element, dst, from the contents of an accumulator (AL, AX, EAX) and modifies flags. The REPNE prefix allows the required element within the string to be found.

CMPS dest, src

Compare strings. This command subtracts a byte, word, or double word of the dst string from the corresponding element of the src string. Flags are modified depending on the subtraction result. The EDI and ESI registers are automatically shifted to the next element. If the REPE prefix is used, the command continues comparison until the end of the string is reached or as long as elements are equal. If the REPNE prefix is used, the command continues comparison until the end of the string is reached or until elements are equal.
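The interplay of the REPNE prefix, ECX, EDI, and ZF described in Table 1.8 can be sketched in Python. The function `repne_scasb` is a hypothetical model of REPNE SCASB; memory is represented by a `bytes` object, and EDI is simply an index into it.

```python
def repne_scasb(memory, edi, ecx, al, df=0):
    """Model REPNE SCASB: scan bytes until AL is found or ECX reaches zero."""
    step = -1 if df else 1      # DF = 1 makes the scan run backward
    zf = 0
    while ecx != 0:
        zf = 1 if memory[edi] == al else 0   # compare AL with ES:[EDI]
        edi += step                          # EDI is corrected after each element
        ecx -= 1
        if zf:                               # REPNE stops when elements are equal
            break
    return edi, ecx, zf


text = b"hello"
edi, ecx, zf = repne_scasb(text, 0, len(text), ord("l"))
print(edi, ecx, zf)
```

When the element is found, ZF is one and EDI already points past the match, so the matching position is EDI - 1 for a forward scan. If ECX reaches zero with ZF still zero, the element is not in the string.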

Table 1.9: Commands for operations over flags

Command

Description

CLC

Clear the carry flag in the EFLAGS register.

CMC

Complement the carry flag. This inverts CF.

STC

Set CF in the EFLAGS register.

CLD

Clear the direction flag. This resets DF to zero.

STD

Set DF in the EFLAGS register.

CLI

Clear the interrupt flag. This disables maskable hardware interrupts.

STI

Set the interrupt flag. This enables maskable hardware interrupts.

CLTS

Reset the task switching flag (TS) in the CR0 register.

Table 1.10: Control flow commands

Command

Description

JMP target

There are five forms of this command, differing by the distance of the destination and the current address and by the method of specifying the target address. When working in Windows, jumps within the limits of a 32-bit segment are mainly used (NEAR). The target address can be specified directly (by a label) or indirectly; in other words, this value can be stored in the memory cell or register (JMP [EAX] ).

JMP target

Another type of jump, a short jump, takes only 2 bytes. The offset within which the jump takes place ranges from -128 to +127. The use of such jumps is limited.

An intersegment jump can appear as follows: JMP FWORD PTR L, where L is the pointer to the structure containing a 48-bit address, which starts with the 32-bit offset and is followed by a 16-bit selector (segment, call gate, or task state segment). The following variant of an intersegment jump is also possible: JMP FWORD PTR ES:[EDI].

Conditional jumps

JA/JNBE — Jump if above, jump if not below or equal.

JAE/JNB — Jump if above or equal, jump if not below.

JB/JNAE - Jump if below, jump if not above or equal.

JBE/JNA — Jump if below or equal, jump if not above.

JC — Jump if there is a carry.

JE/JZ — Jump if equal, jump if zero.

JG/JNLE — Jump if greater, jump if not less or equal.

JGE/JNL — Jump if greater or equal, jump if not less.

JL/JNGE — Jump if less, jump if not greater or equal.

JLE/JNG — Jump if less or equal, jump if not greater.

JNC — Jump if there is no carry.

JNE/JNZ — Jump if not equal, jump if not zero.

JNO — Jump if there is no overflow.

JNP/JPO — Jump if there is no parity, jump if the parity is odd.

JNS — Jump if there is no sign.

JO — Jump if there is overflow.

JP/JPE — Jump if there is parity, jump if the parity is even.

JS — Jump if there is a sign.

JCXZ — Jump if CX equals zero.

JECXZ — Jump if ECX equals zero.

In the flat memory model, conditional jump commands carry out jumps within a 32-bit segment.

Loop control; all commands of this group decrement the contents of the ECX register

LOOP — Perform a loop operation if ECX content does not equal zero.

LOOPE (LOOPZ) - Perform a loop operation if the contents of ECX do not equal zero and ZF equals one.

LOOPNE (LOOPNZ) - Perform a loop operation if the contents of ECX do not equal zero and ZF equals zero.

CALL target

Pass control to the procedure (label) and save the address of the command that follows CALL into the stack. In the flat memory model, the return address is a 32-bit offset. An intersegment call requires both the selector and the offset to be pushed into the stack (in other words, a 48-bit value, where 16 bits are for the selector and 32 bits are for the offset).

RET [N]

Return from the procedure. The optional parameter N specifies that the command also automatically clears the stack (frees N bytes). There are several variants of the command that the assembler chooses automatically, depending on the procedure type (NEAR or FAR). However, it is also possible to explicitly specify the return type (RETN or RETF). In the flat memory model, RETN with a 4-byte return address is used by default.

Table 1.11: Commands for supporting high-level programming languages

Command

Description

ENTER par1, par2

Prepare the stack when entering a procedure. The par1 parameter shows the number of bytes for local variables within a procedure, and par2 specifies the nesting level of the procedure. When par2 equals zero, nesting is not allowed (this situation arises when programming in the C language).

LEAVE

Exit a high-level procedure. This restores the original stack contents after executing the ENTER command.

BOUND r16, m16 or BOUND r32, m32

Check the array index against the bounds. It is assumed that the register contains the current index of the array and that the second operand defines 2 words or 2 double words in the memory. The first of these values is considered the minimum index value, and the second is the maximum index value. If the current index goes beyond these limits, then the INT 5 interrupt is generated. These commands are used to check whether the index falls within the specified range, which is important for debugging purposes.
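The check that BOUND performs can be sketched in a few lines of Python. This is only a model under stated assumptions: the exception class `BoundRangeExceeded` stands in for the INT 5 interrupt, the function name `bound` is hypothetical, and both limits are assumed to be inclusive.

```python
class BoundRangeExceeded(Exception):
    """Stands in for the INT 5 interrupt raised by BOUND (illustrative only)."""

def bound(index, limits):
    """Model of BOUND: limits is the (minimum, maximum) pair read from memory."""
    lower, upper = limits
    # Both limits are treated as inclusive in this sketch.
    if index < lower or index > upper:
        raise BoundRangeExceeded(f"index {index} outside [{lower}, {upper}]")
    return index
```

For example, `bound(3, (0, 9))` returns 3, while `bound(10, (0, 9))` raises the exception, which corresponds to the processor transferring control to the INT 5 handler.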

Table 1.12: Interrupt commands

Command

Description

INT n

Call to the interrupt procedure. This is a 2-byte command. The contents of the flags register are pushed into the stack, followed by the fully qualified return address. In addition, the trap flag (TF) is reset. After this, an indirect jump through the nth element of the interrupt descriptor table is carried out to the interrupt handler. The 1-byte INT 3 command raises the breakpoint exception and is actively used in debuggers.

INTO

Similar to the INT 4 command, provided that the overflow flag (OF) equals one. If OF equals zero, the command doesn't carry out any actions.

IRET

Interrupt return. This command retrieves the return address and flags register from the stack and returns from the interrupt. The privilege level bit will be modified only if the current privilege level equals zero.

Table 1.13: Processor synchronization commands

Command

Description

HLT

Halt. This command stops instruction execution and switches the processor to the halt state. The processor can be switched back to normal operation by an external interrupt.

LOCK

Assert LOCK# signal prefix. This is a bus locking prefix. It forces the processor to form the LOCK# signal for the time of execution of the command that follows the prefix. In a multiprocessor system, this signal blocks requests to the bus from other processors.

NOP

No operation.

WAIT (FWAIT)

Synchronize with the coprocessor. Most coprocessor commands perform this synchronization automatically.

Table 1.14: Commands for processing chains of bits (introduced in the Intel 386 processor)

Command

Description

BSF (BSR) dest, src

Bit scan forward and bit scan reverse. Here, Dest is a 16-bit or a 32-bit register. Src is a register or a memory cell. When the BSF command is executed, the src operand is scanned starting from least significant bits. The BSR command scans starting from the most significant bits. The number of the first encountered bit set to one is placed into the dest register, and the zero flag is reset to zero. If src contains zero, then ZF equals one, and the contents of dest are undefined.

BT dest, src

Bit test. This selects the bit in the bit string specified by dest at the bit position specified by src and stores its value in the carry flag.

BTC dest, src

Bit test and complement. This selects the bit in the bit string specified by dest at the bit position specified by src, stores the bit value in CF, and complements the bit value in the bit string.

BTR dest, src

Bit test and reset. This selects the bit in the bit string specified by dest at the bit position specified by src, stores the bit value in CF, and resets the bit value in the bit string to zero.

BTS dest, src

Bit test and set. This selects the bit in the bit string specified by dest at the bit position specified by src, stores the bit value in CF, and sets the bit value in the bit string to one.
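The behavior of the commands in Table 1.14 can be modeled in Python as follows. This is a sketch, not processor code: the function names mirror the mnemonics, `value` is the bit string, `pos` is the bit position, and an undefined destination is represented by `None`.

```python
def bsf(src):
    """Bit scan forward: return (ZF, index of the lowest set bit)."""
    if src == 0:
        return 1, None                      # ZF = 1, destination undefined
    return 0, (src & -src).bit_length() - 1

def bsr(src):
    """Bit scan reverse: return (ZF, index of the highest set bit)."""
    if src == 0:
        return 1, None
    return 0, src.bit_length() - 1

def bt(value, pos):
    """Bit test: return CF, the value of the selected bit."""
    return (value >> pos) & 1

def bts(value, pos):
    """Bit test and set: return (CF, new value)."""
    return bt(value, pos), value | (1 << pos)

def btr(value, pos):
    """Bit test and reset: return (CF, new value)."""
    return bt(value, pos), value & ~(1 << pos)

def btc(value, pos):
    """Bit test and complement: return (CF, new value)."""
    return bt(value, pos), value ^ (1 << pos)
```

For the value 101000b, `bsf` reports bit 3 and `bsr` reports bit 5; in both cases ZF is zero because the source is nonzero.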

Table 1.15: Protection control commands

Command

Description

LGDT src

Load the value in the source operand into GDTR. Src is a 6-byte value (memory location).

SGDT dest

Store GDTR in the memory.

LIDT src

Load the value from the source operand into IDTR.

SIDT dest

Store IDTR in the memory.

LLDT src

Load the local descriptor table register (LDTR). This loads the source operand (16 bits) into the segment selector field of LDTR.

SLDT dest

Store LDTR. This stores the segment selector from LDTR in the destination operand. The destination operand can be a general-purpose register or a memory location (16 bits).

LMSW src

Load the machine status word (MSW). This loads the source operand into MSW, bits 0-15 of register CR0. The source operand can be a 16-bit general-purpose register or a memory location.

SMSW dest

Store MSW. This saves MSW into a register or memory location (16 bits).

LTR src

Load the task register (TR). This loads the source operand into the segment selector field of TR. The source operand (a general-purpose register or a memory location) contains a segment selector that points to a task state segment.

STR dest

Store TR. This stores the segment selector from TR in the destination operand. The destination operand can be a general-purpose register or a memory location (16 bits).

LAR dest, src

Load access rights byte. This loads the access rights from the segment descriptor specified by the second operand (src) into the first operand (dest) and sets ZF in the flags register.

LSL dest, src

Load segment limit. This loads the unscrambled segment limit from the segment descriptor specified with the second operand (source operand) into the first operand (destination operand) and sets ZF in the EFLAGS register. The source operand (which can be a register or a memory location) contains the segment selector for the segment descriptor being accessed. The destination operand is a general-purpose register.

ARPL r/m, r

Adjust RPL field of the segment selector. This compares RPL fields of two segment selectors, and if the RPL field of the destination operand is less than the RPL field of the source operand, ZF is set to one and the RPL field of the destination operand is increased to match that of the source operand.

VERR seg

Verify a segment for reading. This sets ZF to one if the task is allowed to read in the SEG segment.

VERW seg

Verify a segment for writing. This sets ZF to one if the task is allowed to write into the SEG segment.

Table 1.16: Commands for exchanging data with control registers

Command

Description

MOV CRn, src

Load src into the CRn control register.

MOV dest, CRn

Read from the CRn control register.

MOV DRn, src

Load src into the DRn debug register.

MOV dest, DRn

Read from the DRn debug register.

MOV TRn, src

Load src into the TRn test register.

MOV dest, TRn

Read from the TRn test register.

RDTSC

Read the timestamp counter. The TSC value is stored into the EDX:EAX pair of registers.

Table 1.17: Commands for identifying and controlling architecture

Command

Description

CPUID

CPU identification. This returns the processor identification information. It depends on the contents of the EAX register.

If EAX=0, the processor returns the string of characters containing information about the manufacturer into the EBX, EDX, and ECX registers. For example, AMD processors return the AuthenticAMD string and Intel processors return the GenuineIntel string.

If EAX=1, the identification code is returned in the least significant word of the EAX register.

If EAX=2, processor configuration parameters are returned in the EAX, EBX, ECX, and EDX registers.

RDMSR

Read from a model-specific register (MSR). The number of the MSR is specified in ECX, and its contents are returned in the EDX:EAX pair of registers.

RDPMC

Read performance-monitoring counters. This places the value of one of the two programmable performance-monitoring counters into the EDX:EAX pair of registers. The choice of the counter depends on the contents of the ECX register.

WRMSR

Write to the MSR. This writes the contents of the EDX:EAX pair of registers into the MSR whose number is specified in ECX.

SYSENTER

Fast system call.

SYSEXIT

Exit from the system call.
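For the CPUID command with EAX=0, the manufacturer string is spread over three registers in the order EBX, EDX, ECX. The Python sketch below reassembles it, assuming each register holds four ASCII characters with the first character in the least significant byte. The function name `vendor_string` is hypothetical, and the register values are passed in as plain integers because Python cannot execute CPUID directly.

```python
import struct

def vendor_string(ebx, edx, ecx):
    """Rebuild the 12-character manufacturer string from the CPUID (EAX=0) output."""
    # "<III" packs three 32-bit values in little-endian byte order,
    # which puts the characters back into reading order.
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")
```

With EBX = 756E6547h, EDX = 49656E69h, and ECX = 6C65746Eh, the function returns the GenuineIntel string.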

Table 1.18: Caching control commands

Command

Description

INVD

Invalidate internal caches. This invalidates (flushes) the processor's internal caches and issues a special-function bus cycle that directs external caches to flush themselves. Data held in internal caches is not written back to main memory.

WBINVD

Write back and invalidate caches. This writes all modified cache lines and invalidates the caches.

INVLPG r/m

Invalidate TLB entry. This invalidates (flushes) the translation lookaside buffer (TLB) entry specified with the source.

1.2.3. Arithmetic Coprocessor Commands

In this section, I'll cover the main issues related to the operation of the arithmetic coprocessor.

Before the release of the Intel 80486 processor, coprocessors were supplied separately. Nowadays, the coprocessor is a built-in and integral part of the processor.

Structure and Operation

The arithmetic coprocessor operates over its own set of commands and over its own set of registers. However, command prefetching is carried out by the processor.

The arithmetic coprocessor carries out operations over the following data types: word integer (16 bits), short integer (32 bits), long integer (64 bits), packed BCD (80 bits), short real number (32 bits), long real number (64 bits), and extended real number (80 bits). The formats in which real numbers are stored were considered in Section 1.1. In addition to normal numbers, some coprocessor operations might result in special cases.

Special Cases

The special cases that might occur as a result of coprocessor operations are as follows:

  • Positive zero — All bits are set to zero.

  • Negative zero — The sign bit equals one.

  • Positive infinity — The sign bit is set to zero, all bits of the mantissa are set to zero, and all bits of the exponent are set to one.

  • Negative infinity — The sign bit is set to one, all bits of the mantissa are set to zero, and all bits of the exponent are set to one.

  • Denormalized numbers — All bits of the exponent are set to zero.

  • Indefinite numbers — The sign bit is set to one, all bits of the exponent are set to one, the first bit of the mantissa is set to one (for an 80-bit number, the first 2 bits of the mantissa are set to one), and the other bits are zeros.

  • Signaling NaNs[4] (SNaNs) — All bits of the exponent are set to one, the first bit of the mantissa is zero (for an 80-bit number, the first 2 bits are one and zero), and there are ones among the other bits.

  • Quiet NaNs (QNaNs) - All bits of the exponent are set to one, the first bit of the mantissa is one (for an 80-bit number, the first 2 bits of the mantissa are ones), and there are ones among the other bits of the mantissa.

  • Unsupported numbers do not correspond to standard numbers and are not described as special cases.
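The special cases listed above can be recognized by inspecting the bit fields of a number. The Python sketch below does this for the 32-bit short real format only, assuming the usual layout of 1 sign bit, 8 exponent bits, and 23 mantissa bits, and assuming that a quiet NaN is marked by a one in the first mantissa bit. The function names `classify32` and `float_bits` are hypothetical.

```python
import struct

def classify32(bits):
    """Classify a 32-bit short real number given as its raw bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0:
        if mantissa == 0:
            return "negative zero" if sign else "positive zero"
        return "denormalized"
    if exponent == 0xFF:
        if mantissa == 0:
            return "negative infinity" if sign else "positive infinity"
        if mantissa & 0x400000:               # first mantissa bit is one
            if sign and mantissa == 0x400000:
                return "indefinite"           # the one special quiet NaN
            return "QNaN"
        return "SNaN"
    return "normal"

def float_bits(x):
    """Return the raw 32-bit pattern of a Python float stored as a short real."""
    return struct.unpack("<I", struct.pack("<f", x))[0]
```

For example, `classify32(0xFFC00000)` reports the indefinite number, and `classify32(float_bits(1.0))` reports a normal number.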

When the coprocessor executes an operation, the processor waits for this operation to complete. In other words, before each coprocessor command, the assembler automatically generates the command that checks whether the coprocessor is busy. If the coprocessor is busy, the processor is switched to the waiting state. Sometimes programmers need to manually insert the WAIT command after the coprocessor command.

Data Registers

The coprocessor has eight 80-bit data registers that represent a stack structure. These registers are also called the coprocessor stack. The registers are named R0-R7; however, they cannot be accessed directly. Each register can take any position in the stack. The names of the relative stack registers are ST(0)-ST(7).

There is also the status register (or the status word, SW), the flags of which allow you to assess the result of the completed operation. The control register (or the control word, CW) contains the bits that influence the result of execution of the coprocessor commands.

The tags register (or the tag word, TW) is made up of 16 bits describing the contents of the coprocessor registers — 2 bits per data register. The tag reflects the contents of the data register. Here are the tag values: 00 for a real nonzero number, 01 for true zero, 10 for special numbers, and 11 for no data.
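Decoding the tag word is a matter of reading it 2 bits at a time. The Python sketch below assumes that the tag for register R0 sits in the two least significant bits and the tag for R7 in the two most significant bits; the names `TAG_MEANING` and `decode_tag_word` are hypothetical.

```python
TAG_MEANING = {
    0b00: "valid",    # real nonzero number
    0b01: "zero",     # true zero
    0b10: "special",  # special numbers
    0b11: "empty",    # no data
}

def decode_tag_word(tw):
    """Return the tag descriptions for R0..R7 from a 16-bit tag word."""
    return [TAG_MEANING[(tw >> (2 * i)) & 0b11] for i in range(8)]
```

A tag word of FFFFh therefore describes a completely empty coprocessor stack. Note that the tags describe the physical registers R0-R7, not the relative names ST(0)-ST(7).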

In addition to the previously listed registers, the coprocessor has the FIP and FDP registers. The FIP register contains the address of the last executed command, except for FINIT, FCLEX, FLDCW, FSTCW, FSTSW, FSTSW AX, FSTENV, FLDENV, FSAVE, FRSTOR, and FWAIT. The FDP register contains the address of the command operand, except for the preceding commands.

When carrying out computations using a coprocessor command, the most important role is delegated to exceptions, also called special cases. A typical exception is division by zero. Exception bits are stored in the status register. Exceptions must be taken into account to obtain correct results.

Exceptions

The list of exceptions is as follows:

  • Inexact result (rounding)

  • Invalid operation

  • Division by zero

  • Underflow (tiny result)

  • Overflow (too large result)

  • Denormalized operand

The Status Word

The coprocessor status word reflects its overall state. It includes the following bits:

  • Bit 0, invalid operation exception (IE) flag

  • Bit 1, denormalized operand exception (DE) flag

  • Bit 2, division by zero exception (ZE) flag

  • Bit 3, overflow exception (OE) flag

  • Bit 4, underflow exception (UE) flag

  • Bit 5, inexact result (precision) exception (PE) flag

  • Bit 6, stack fault exception (SF) flag

  • Bit 7, exception summary (ES) flag

  • Bits 8, 9, 10, and 14, condition flags (C0, C1, C2, and C3)

  • Bits 11-13, number (0-7) specifying which register is the top of the stack

  • Bit 15, FPU busy flag — Matches ES
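The bit layout above translates directly into shifts and masks. The following Python sketch unpacks a 16-bit status word into named fields; the function name `decode_status_word` and the key names `TOP` and `B` (for the stack-top field and the busy flag) are labels chosen for this illustration.

```python
FLAG_BITS = ["IE", "DE", "ZE", "OE", "UE", "PE", "SF", "ES"]  # bits 0..7

def decode_status_word(sw):
    """Split a 16-bit coprocessor status word into its fields."""
    fields = {name: (sw >> bit) & 1 for bit, name in enumerate(FLAG_BITS)}
    fields["C0"] = (sw >> 8) & 1
    fields["C1"] = (sw >> 9) & 1
    fields["C2"] = (sw >> 10) & 1
    fields["TOP"] = (sw >> 11) & 0b111   # register number of the stack top
    fields["C3"] = (sw >> 14) & 1
    fields["B"] = (sw >> 15) & 1         # busy flag
    return fields
```

For example, a status word of 3800h decodes to TOP = 7 with all flags clear, and 0104h decodes to ZE = 1 (division by zero) and C0 = 1.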

The Control Word

The control word of the arithmetic coprocessor determines one of several available methods of processing numeric data. Bits of the control word (CW) are as follows:

  • Bit 0, invalid operation mask (IM)

  • Bit 1, denormalized operand mask (DM)

  • Bit 2, division by zero mask (ZM)

  • Bit 3, overflow mask (OM)

  • Bit 4, underflow mask (UM)

  • Bit 5, inexact result (precision) mask (PM)

  • Bits 6 and 7, reserved

  • Bits 8 and 9, precision control (PC)

  • Bits 10 and 11, rounding control (RC)

  • Bit 12, infinity control (IC)

  • Bits 13-15, reserved
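The control word can be decoded the same way as any bit field. The Python sketch below relies on two facts not listed above and therefore labeled as assumptions: the usual encodings of the PC field (00 single, 10 double, 11 extended precision) and of the RC field (00 to nearest, 01 down, 10 up, 11 toward zero). The names `PRECISION`, `ROUNDING`, and `decode_control_word` are hypothetical.

```python
MASK_BITS = ["IM", "DM", "ZM", "OM", "UM", "PM"]  # bits 0..5

# Assumed field encodings (not given in the list above).
PRECISION = {0b00: "single", 0b01: "reserved", 0b10: "double", 0b11: "extended"}
ROUNDING = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "toward zero"}

def decode_control_word(cw):
    """Split a 16-bit coprocessor control word into masks and control fields."""
    return {
        "masks": {name: (cw >> bit) & 1 for bit, name in enumerate(MASK_BITS)},
        "PC": PRECISION[(cw >> 8) & 0b11],
        "RC": ROUNDING[(cw >> 10) & 0b11],
        "IC": (cw >> 12) & 1,
    }
```

Applied to 037Fh, a value commonly seen right after coprocessor initialization, the sketch reports all six exceptions masked, extended precision, and rounding to nearest.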

The following are possible causes of exceptions:

  • Stack fault. The result is an indefinite number.

  • Operation over an unsupported number. The result is an indefinite number.

  • Operation over an SNaN. The result is a QNaN.

  • Comparison of a number with QNaN or SNaN. The result is C0=C2=C3=1.

  • Addition of infinities (the same sign) or subtraction of infinities (different signs). The result is an indefinite number.

  • Multiplication of infinity by zero. The result is an indefinite number.

  • Division of infinity by infinity or division of zero by zero. The result is an indefinite number.

  • FPREM and FPREM1 commands if the divisor is zero or if the dividend equals infinity. The result is an indefinite number, and C2=0.

  • Trigonometric operations over infinity. The result is an indefinite number, and C2=0.

  • Root or logarithm operations if the argument is negative. The result is an indefinite number.

  • FBSTP command if the source register is empty, contains a QNaN or SNaN, contains infinity, or is more than 18 decimal characters in length. The result is an indefinite number.

  • FXCH if one of the operands is empty. The result is an indefinite number.

Coprocessor Commands

Tables 1.19-1.23 provide a complete list of the FPU commands and a brief description of the operations they carry out.

Table 1.19: Data exchange commands

Command

Description

FLD src

Load a real number into ST(0) (stack top) from the memory location. In this case, ST(0)->ST(1). The memory location might be 32 bits, 64 bits, or 80 bits. The FLD ST(0) command duplicates the stack top.

FILD src

Load an integer number from the memory into ST(0). In this case, ST(0)->ST(1). The memory area can be 16 bits, 32 bits, or 64 bits.

FBLD src

Load a BCD into ST(0) from an 80-bit memory area.

FLDZ

Load 0 into ST(0).

FLD1

Load 1 into ST(0).

FLDPI

Load PI into ST(0).

FLDL2T

Load LOG2(10) into ST(0).

FLDL2E

Load LOG2(e) into ST(0).

FLDLG2

Load LG(2) into ST(0).

FLDLN2

Load LN(2) into ST(0).

FST dest

Write a real number from ST(0) into the memory. The memory area might be 32 bits or 64 bits.

FSTP dest

Write a real number from ST(0) into the memory. The memory area might be 32 bits, 64 bits, or 80 bits. In this case, the stack top is popped from the stack.

FBST dest

Write a BCD into the memory. The memory area is 80 bits.

FBSTP dest

Write a BCD into the memory. The memory area is 80 bits. In this case, the stack top is popped from the stack.

FXCH st(i)

Exchange the values of the stack top and the ST(i) register. If the operand is not specified, then ST(0) and ST(1) are exchanged.

FCMOVc dest, src

Conditional data move. This command copies ST(i) (src) into ST(0) (dest) if the condition holds. There are the following forms of this command:

  • FCMOVE - Copy if equal (ZF = 1)

  • FCMOVNE - Copy if not equal (ZF = 0)

  • FCMOVB - Copy if below (CF = 1)

  • FCMOVBE - Copy if below or equal (CF = 1 or ZF = 1)

  • FCMOVNBE - Copy if not below or equal (CF = 0 and ZF = 0)

  • FCMOVNB - Copy if not below (CF = 0)

  • FCMOVU - Copy if unordered (incomparable) (PF = 1)

  • FCMOVNU - Copy if not unordered (comparable) (PF = 0)

Table 1.20: Data comparison commands

Command

Description

FCOM

Compare two real numbers, ST(0) and ST(1). Flags are set the same way as for the subtraction operation: ST(0) - ST(1).

In this command and further on (up to the FCOMI command), the C0, C2, and C3 flags are set as follows:

ST(0)>src C0 = 0, C2 = 0, C3 = 0

ST(0)<src C0 = 1, C2 = 0, C3 = 0

ST(0)=src C0 = 0, C2 = 0, C3 = 1

If operands are unordered (cannot be compared), then

C0 = C2 = C3 = 1.

FCOM src

Compare ST(0) with the operand contained in the memory. The operand might be 32 bits or 64 bits.

FCOMP src

Compare the real number in ST(0) with the operand. ST(0) is then popped from the stack. The operand might be a register or a memory area.

FCOMPP

Compare ST(0) and ST(1). Two registers are popped from the stack.

FICOM src

Compare ST(0) with an integer operand in the memory. The operand might be either 16 bits or 32 bits.

FICOMP src

Compare ST(0) with an integer operand in a 16-bit or 32-bit memory area. In the course of this operation, ST(0) is popped from the stack.

FTST

Test whether ST(0) equals zero.

FUCOM ST(i)

Make an unordered comparison of ST(0) with ST(i).

FUCOMP ST(i)

Make an unordered comparison of ST(0) with ST(i). In the course of this operation, the stack is popped.

FUCOMPP

Make an unordered comparison of ST(0) with ST(1). In the course of this operation, the stack is popped twice.

FCOMI src

Compare and set flags. This command and the three that follow it in this table (FCOMIP, FUCOMI, and FUCOMIP) set the bits of the flags register as follows:

ST(0) >src ZF=0, PF=0, CF = 0

ST(0) <src ZF=0, PF=0, CF = 1

ST(0) =src ZF=1, PF=0, CF = 0

If the operands are unordered, then all three flags are set to one.

FCOMIP src

Compare, set flags, and pop.

FUCOMI src

Make an unordered comparison and set flags.

FUCOMIP src

Make an unordered comparison, set flags, and pop.

FXAM

Analyze the contents of the stack top. The result is stored into bits C3, C2, and C0 as follows:

000 - Unsupported format

001 - NaN

010 - Normalized number

011 - Infinity

100 - Zero

101 - Empty register

110 - Denormalized number
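The flag settings produced by the comparison commands in Table 1.20 can be summarized in a short Python model. In this sketch, `fcom` returns the condition flags in the order (C3, C2, C0) and `fcomi` returns (ZF, PF, CF); both names are hypothetical, and an unordered comparison is modeled by passing a NaN as one of the operands.

```python
import math

def fcom(st0, src):
    """Return (C3, C2, C0) as set by the FCOM family of commands."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)          # unordered operands
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)              # equal

def fcomi(st0, src):
    """Return (ZF, PF, CF) as set by the FCOMI family of commands."""
    # The two tables line up: ZF mirrors C3, PF mirrors C2, CF mirrors C0.
    c3, c2, c0 = fcom(st0, src)
    return (c3, c2, c0)
```

Because FCOMI writes its result straight into ZF, PF, and CF, its outcome can be tested with ordinary conditional jumps such as JA, JB, and JE without first moving the status word through AX.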

Table 1.21: Arithmetic commands

Command

Description

FADD src

FADD ST(i), ST

Add floating point numbers. The first form: ST(0) <- ST(0) + src, where src is a 32-bit or 64-bit number. The second form: ST(i) <- ST(i) + ST(0).

FADDP ST(i), ST

Add the floating point number: ST(i) <- ST(i) + ST(0). In the course of this operation, the stack is popped.

FIADD src

Add the integer number: ST(0) <- ST(0) + src, where src is a 16-bit or 32-bit number.

FSUB src

FSUB ST(i), ST

Subtract floating point numbers. The first form: ST(0) <- ST(0) - src, where src is a 32-bit or 64-bit number. The second form: ST(i) <- ST(i) - ST(0).

FSUBP ST(i), ST

Subtract the floating point number: ST(i) <- ST(i) - ST(0). When carrying out this operation, the stack is popped.

FSUBR ST(i), ST

Subtract the floating point numbers in reverse order: ST(i) <- ST(0) - ST(i).

FSUBRP ST(i), ST

Subtract the floating point numbers in reverse order and pop: ST(i) <- ST(0) - ST(i). When carrying out this operation, the stack is popped.

FISUB src

Subtract integer numbers: ST(0) <- ST(0) - src, where src is a 16-bit or 32-bit number.

FISUBR src

Subtract integer numbers in reverse order: ST(0) <- src - ST(0), where src is a 16-bit or 32-bit number.

FMUL

FMUL ST(i)

FMUL ST(i), ST

Multiply floating point numbers. The first case: ST(0) <- ST(0)*ST(1). The second case: ST(0) <- ST(i)*ST(0). The third case: ST(i) <- ST(i)*ST(0).

FMULP ST(i), ST(0)

Multiply the floating point numbers and pop: ST(i) <- ST(i)*ST(0). When carrying out this operation, the stack is popped.

FIMUL src

Multiply ST(0) by an integer number: ST(0) <- ST(0)*src. The operand might be a 16-bit or 32-bit number.

FDIV

FDIV ST(i)

FDIV ST(i), ST

Divide floating point numbers. The first case: ST(0) <- ST(0)/ST(1). The second case: ST(0) <- ST(0)/ST(i). The third case: ST(i) <- ST(i)/ST(0).

FDIVP ST(i), ST

Divide the floating point numbers and pop: ST(i) <- ST(i)/ST(0). When carrying out this operation, the stack is popped.

FIDIV src

Divide by an integer number: ST(0) <- ST(0)/src. The divisor might be a 16-bit or a 32-bit number.

FDIVR ST(i), ST

Divide the floating point numbers in reverse order: ST(i) <- ST(0)/ST(i).

FDIVRP ST(i), ST

Divide the floating point numbers in reverse order and pop: ST(i) <- ST(0)/ST(i). When carrying out this operation, the stack is popped.

FIDIVR src

Divide integer numbers reverse: ST(0) <- src/ST(0).

FSQRT

Extract the square root from ST(0) and store back.

FSCALE

Scale by a power of two: ST(0) <- ST(0)*2^ST(1).

FXTRACT

Extract the exponent and mantissa from the number in ST(0). The exponent replaces ST(0), and the mantissa is then pushed into the stack, so after the command the mantissa is in ST(0) and the exponent is in ST(1).

FPREM

Find the remainder from the division:

ST(0) <- ST(0) MOD ST(1).

FPREM1

Find the remainder from the division according to the IEEE standard.

FRNDINT

Round the number stored in ST(0) to an integer according to the current rounding mode:

ST(0) <- int(ST(0)).

FABS

Find the absolute value: ST(0) <- ABS (ST(0)).

FCHS

Invert the sign: ST(0) <- (-ST(0)).
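Several of the less obvious commands in Table 1.21 have close counterparts in Python's math module, which makes their results easy to check by hand. The sketch below is only a model: it works on finite, nonzero Python floats rather than 80-bit registers, it assumes that FSCALE truncates ST(1) toward zero before scaling, and it ignores the fact that the real FPREM command may need several iterations. All function names are hypothetical.

```python
import math

def fscale(st0, st1):
    """ST(0) * 2^ST(1), with ST(1) truncated toward zero (assumed)."""
    return math.ldexp(st0, math.trunc(st1))

def fxtract(st0):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2."""
    m, e = math.frexp(st0)        # st0 == m * 2**e with 0.5 <= |m| < 1
    return m * 2.0, e - 1

def fprem(st0, st1):
    """Remainder with the quotient truncated toward zero."""
    return math.fmod(st0, st1)

def fprem1(st0, st1):
    """IEEE remainder: the quotient is rounded to the nearest integer."""
    return math.remainder(st0, st1)
```

The difference between the two remainder commands shows up as soon as the quotient is closer to the next integer: for 7 and 4, `fprem` gives 3.0 while `fprem1` gives -1.0.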

Table 1.22: Transcendental functions

Command

Description

FCOS

Compute the cosine: ST(0) <- cos(ST(0)). The contents of ST(0) are interpreted as an angle measured in radians.

FPTAN

Compute the partial tangent. The contents of ST(0) are interpreted as an angle in radians. The tangent value is returned to the place of the argument, then the value of one is pushed into the stack.

FPATAN

Compute the arctangent. The following function is computed:

Arctg(ST(1)/ST(0))

After the computation, the stack is popped and the result goes to the top of the stack.

FSIN

Compute the sine: ST(0) <- sin(ST(0)). The contents of ST(0) are interpreted as an angle in radians.

FSINCOS

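Compute the sine and cosine of ST(0). The sine replaces ST(0), and the cosine is then pushed into the stack, so after the command ST(0) holds the cosine and ST(1) holds the sine.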

F2XM1

Compute 2^X - 1: ST(0) <- 2^ST(0) - 1.

FYL2X

Compute Y*LOG2(X), where X = ST(0) and Y = ST(1). When this function is executed, the stack is popped and the result is placed into the new stack top.

FYL2XP1

Compute Y*LOG2(X + 1), where X = ST(0) and Y = ST(1). When this function is executed, the stack is popped and the result is placed into the new stack top.
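The formulas behind the commands in Table 1.22 can be checked with a few Python one-liners. This is a sketch of the mathematics only: the functions take x and y directly instead of modeling the register stack, they do not enforce the argument ranges that the hardware imposes, and FPATAN is modeled with `math.atan2` on the assumption that the quadrant is taken from the signs of both operands. All function names are hypothetical.

```python
import math

def f2xm1(x):
    """2^x - 1."""
    return 2.0 ** x - 1.0

def fyl2x(x, y):
    """y * log2(x)."""
    return y * math.log2(x)

def fyl2xp1(x, y):
    """y * log2(x + 1)."""
    return y * math.log2(x + 1.0)

def fpatan(st1, st0):
    """Arctangent of st1 / st0, in radians."""
    return math.atan2(st1, st0)
```

As a usage example, `fyl2x(8.0, 2.0)` gives 6.0, and `fyl2x(x, math.log10(2))` converts a base-2 logarithm into a decimal one, which is the reason the coprocessor offers constants such as LG(2) and LN(2).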

Table 1.23: Coprocessor control commands

Command

Description

FINIT

Initialize the coprocessor.

FNINIT

Initialize the coprocessor without waiting.

FSTSW AX

Write the status word into AX (SW -> AX)

FSTSW dest

Write the status word into dest (16 bits).

FNSTSW dest

Save the status word into dest (16 bits) without checking for error conditions.

FLDCW src

Load the control word (16 bits) from src.

FSTCW dest

Save the control word into dest.

FCLEX

Clear FPU exception flags after checking for error conditions.

FNCLEX

Clear FPU exception flags without checking for error conditions.

FSTENV dest

Store the FPU environment (SW, CW, TAGW, FIP, FDP) in the memory after checking for error conditions.

FNSTENV dest

Store the FPU environment (SW, CW, TAGW, FIP, FDP) in the memory without checking for error conditions.

FLDENV src

Load the FPU environment from the memory.

FSAVE dest

Save the FPU state (the environment, SW, CW, TAGW, FIP, FDP, and the data registers) in the memory after checking for error conditions.

FNSAVE dest

Save the FPU state (the environment, SW, CW, TAGW, FIP, FDP, and the data registers) in the memory without checking for error conditions.

FRSTOR src

Restore the FPU state.

FINCSTP

Increment the FPU register's stack pointer.

FDECSTP

Decrement the FPU register's stack pointer.

FFREE ST(i)

Free the FPU register. Label ST(i) as free.

FNOP

No operation (FPU).

WAIT (FWAIT)

Instruct the processor to wait for FPU to complete the current operation.

1.2.4. MMX Extension

MMX Architecture

The MMX extension is mainly oriented toward use in multimedia applications. The main idea of MMX consists of simultaneous processing of several data elements per instruction. The MMX extension was introduced in the Pentium P55C modification of the Intel Pentium processor and is present in all later modifications of this processor.

The MMX extension uses new types of packed data: packed bytes (8 bytes), packed words (4 words), packed double words (2 double words), and quadwords. As you can see, these are 64-bit numbers. The MMX extension includes eight general-purpose registers (designated as MM0-MM7). The size of these registers is 64 bits. Physically, these registers occupy the least significant 64 bits of the FPU data registers (R0-R7). MMX commands "spoil" the status register and the tags register. Therefore, combined use of MMX commands and coprocessor commands might cause certain difficulties. In other words, before you use MMX commands, you'll have to save the coprocessor context, which can considerably slow the operation of your program. Also, it is important to note that MMX commands operate directly on coprocessor registers, not on the pointers to the stack elements.
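The packed data types are simply different views of the same 64 bits. The Python sketch below splits a 64-bit value into bytes, words, and double words, listing the least significant element first; the function names are hypothetical, and the elements are treated as unsigned.

```python
import struct

def as_bytes(q):
    """View a 64-bit value as 8 packed bytes, least significant first."""
    return list(struct.pack("<Q", q))

def as_words(q):
    """View a 64-bit value as 4 packed words, least significant first."""
    return list(struct.unpack("<4H", struct.pack("<Q", q)))

def as_dwords(q):
    """View a 64-bit value as 2 packed double words, least significant first."""
    return list(struct.unpack("<2I", struct.pack("<Q", q)))
```

For the value 0807060504030201h, `as_bytes` yields the bytes 1 through 8, `as_words` yields 0201h, 0403h, 0605h, 0807h, and `as_dwords` yields 04030201h and 08070605h.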

MMX Instructions

MMX instructions are briefly outlined in Tables 1.24 and 1.25.

Table 1.24: MMX extension commands

Command

Description

EMMS

Clear the registers stack. This sets all bits of the tags word to one.

MOVD mm, m32/ir32

Move the data into the 32 least significant bits of an MMX register and fill the most significant bits with zeros.

MOVD m32/ir32, mm

Move the data from the 32 least significant bits of an MMX register.

MOVQ mm, mm/m64

Move the data into an MMX register.

MOVQ mm/m64, mm

Move the data from an MMX register.

PACKSSDW mm, mm/m64

Pack double words into words with signed saturation. This command packs, with signed saturation, 2 double words in mm and 2 double words in mm/m64 into 4 words in mm. In other words, this command converts 2 double words from mm into the 2 least significant words of mm and 2 double words from mm/m64 into the 2 most significant words. If the value of some double word happens to be greater than 32,767 or less than -32,768, then 32,767 or -32,768, respectively, will be written into the corresponding word.

PACKSSWB mm, mm/m64

Pack words into bytes with signed saturation. This command packs, with signed saturation, 4 words in mm and 4 words in mm/m64 into 8 bytes in mm. In other words, 4 words from mm are converted into the 4 least significant bytes of mm, and 4 words from mm/m64 are converted into the 4 most significant bytes. If the value of some word happens to be greater than 127 or less than -128, then 127 or -128, respectively, will be placed into the corresponding byte.

PACKUSWB mm, mm/m64

Pack and saturate 4 signed words from the destination operand (first operand) and 4 signed words from the source operand (second operand) into 8 unsigned bytes in the destination operand. If the signed value of a word is beyond the range of an unsigned byte (that is, greater than 255 or less than 0), the saturated byte value of 255 or 0, respectively, is stored in the destination.

PADDB mm, mm/m64

PADDW mm, mm/m64

PADDD mm, mm/m64

Add the individual data elements (bytes, words, or double words) of the source operand (second operand) to the individual data elements of the destination operand (first operand). If the result of an individual addition exceeds the range for the specified data type (overflows), the result wraps around, meaning that it is truncated so that only the lower (least significant) bits of the result are returned (that is, the carry is ignored).

PADDSB mm, mm/m64

PADDSW mm, mm/m64

Add packed bytes (words) with signed saturation.

PADDUSB mm, mm/m64

PADDUSW mm, mm/m64

Add packed bytes (words) with unsigned saturation.

PAND mm, mm/m64

Perform the logical AND operation.

PANDN mm, mm/m64

Perform the logical AND NOT operation. This performs a bitwise logical NOT on the quadword destination operand (first operand). Then, the instruction performs a bitwise logical AND operation on the inverted destination operand and the quadword source operand (second operand). Each bit of the result of the AND operation is set to one if the corresponding bits of the source and inverted destination bits are one; otherwise, it is set to zero. The result is stored in the destination operand location.

PCMPEQB mm, mm/m64

PCMPEQD mm, mm/m64

PCMPEQW mm, mm/m64

Packed compare for equal. This compares the individual data elements (bytes, words, or double words) in the destination operand (first operand) to the corresponding data elements in the source operand (second operand). If two data elements are equal, the corresponding data element in the destination operand is set to all ones (true); otherwise, it is set to all zeros (false). The destination operand must be an MMX register; the source operand may be either an MMX register or a 64-bit memory location.

PCMPGTB mm, mm/m64

PCMPGTD mm, mm/m64

PCMPGTW mm, mm/m64

Packed compare for greater than. This compares the individual signed data elements (bytes, words, or double words) in the destination operand (first operand) to the corresponding signed data elements in the source operand (second operand). If a data element in the destination operand is greater than its corresponding data element in the source operand, the data element in the destination operand is set to all ones (true); otherwise, it is set to all zeros (false). The destination operand must be an MMX register; the source operand may be either an MMX register or a 64-bit memory location.

PMADDWD mm, mm/m64

Packed multiply and add. This multiplies the individual signed words of the destination operand by the corresponding signed words of the source operand, producing 4 signed, double word results. The 2 double word results from the multiplication of the high-order words are added together and stored in the upper double word of the destination operand; the 2 double word results from the multiplication of the low-order words are added together and stored in the lower double word of the destination operand. The destination operand must be an MMX register; the source operand may be either an MMX register or a 64-bit memory location.

PMULHW mm, mm/m64

Packed multiply higher. This multiplies the 4 signed words of the source operand (second operand) by the 4 signed words of the destination operand (first operand), producing 4 signed, double word, intermediate results. The high-order word of each intermediate result is then written to its corresponding word location in the destination operand. The destination operand must be an MMX register; the source operand may be either an MMX register or a 64-bit memory location.

PMULLW mm, mm/m64

Packed multiply low. This multiplies the 4 signed or unsigned words of the source operand (second operand) by the 4 signed or unsigned words of the destination operand (first operand), producing 4 double word intermediate results. The low-order word of each intermediate result is then written to its corresponding word location in the destination operand. The destination operand must be an MMX register; the source operand may be either an MMX register or a 64-bit memory location.

POR mm, mm/m64

Bitwise logical OR.

PSHIMD mm, imm

PSHIMQ mm, imm

PSHIMW mm, imm

PSHIMD represents the PSLLD, PSRAD, and PSRLD instructions with the immediate operand (a counter).

PSHIMQ represents the PSLLQ and PSRLQ instructions with the immediate operand (a counter).

PSHIMW represents the PSLLW, PSRAW, and PSRLW instructions with the immediate operand (a counter).

PSLLD mm, mm/m64

PSLLQ mm, mm/m64

PSLLW mm, mm/m64

Packed shift left logical. This shifts the bits in the data elements (words, double words, or a quadword) in the destination operand (first operand) to the left by the number of bits specified in the unsigned count operand (second operand). The result of the shift operation is written to the destination operand. As the bits in the data elements are shifted left, the empty low-order bits are cleared (set to zero). If the value specified by the count operand is greater than 15 (for words), 31 (for double words), or 63 (for a quadword), then the destination operand is set to all zeros.

PSRAD mm, mm/m64

PSRAW mm, mm/m64

Packed shift right arithmetic. This shifts the bits in the data elements (words or double words) in the destination operand (first operand) to the right by the number of bits specified in the unsigned count operand (second operand). The result of the shift operation is written to the destination operand. The empty high-order bits of each element are filled with the initial value of the sign bit of the data element. If the value specified by the count operand is greater than 15 (for words) or 31 (for double words), each destination data element is filled with the initial value of the sign bit of the element.

PSRLD mm, mm/m64

PSRLQ mm, mm/m64

PSRLW mm, mm/m64

Packed shift right logical. This shifts the bits in the data elements (words, double words, or quadwords) in the destination operand (first operand) to the right by the number of bits specified in the unsigned count operand (second operand). The result of the shift operation is written to the destination operand. As the bits in the data elements are shifted right, the empty high-order bits are cleared (set to zero). If the value specified by the count operand is greater than 15 (for words), 31 (for double words), or 63 (for a quadword), then the destination operand is set to all zeros.

PSUBB mm, mm/m64

PSUBW mm, mm/m64

PSUBD mm, mm/m64

Packed subtract. This subtracts the individual data elements (bytes, words, or double words) of the source operand (second operand) from the individual data elements of the destination operand (first operand). If the result of the subtraction exceeds the range for the specified data type (overflows), the result is wrapped around. This means that the result is truncated so that only the lower (least significant) bits of the result are returned (that is, the carry is ignored).

PSUBSB mm, mm/m64

PSUBSW mm, mm/m64

Packed subtract with saturation. This subtracts the individual signed data elements (bytes or words) of the source operand (second operand) from the individual signed data elements of the destination operand (first operand). If the result of the subtraction exceeds the range for the specified data type, the result is saturated. The destination operand must be an MMX register; the source operand can be either an MMX register or a quadword memory location.

PSUBUSB mm, mm/m64

PSUBUSW mm, mm/m64

Packed subtract unsigned with saturation. This subtracts the individual unsigned data elements (bytes or words) of the source operand (second operand) from the individual unsigned data elements of the destination operand (first operand). If the result of an individual subtraction falls below the range of the specified unsigned data type, the result is saturated: the minimum value, zero, is used as the result.

PUNPCKHBW mm, mm/m64

Interleave the 4 high-order bytes of the source operand and the 4 high-order bytes of the destination operand and write them to the destination operand.

PUNPCKHWD mm, mm/m64

Interleave the 2 high-order words of the source operand and the 2 high-order words of the destination operand and write them to the destination operand.

PUNPCKHDQ mm, mm/m64

Interleave the high-order double word of the source operand and the high-order double word of the destination operand and write them to the destination operand.

PUNPCKLBW mm, mm/m64

Unpack the low-order bytes of the source operand and interleave them with the low-order bytes of the destination operand.

PUNPCKLWD mm, mm/m64

Unpack the low-order words of the source operand and interleave them with the low-order words of the destination operand.

PUNPCKLDQ mm, mm/m64

Unpack the low-order double words of the source operand and interleave them with the low-order double words of the destination operand.

PXOR mm, mm/m64

Bitwise logical exclusive OR.
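
The lane-by-lane behavior described in the table above can be checked with a small software model. The following is only a sketch in Python, not processor code: a 64-bit MMX register is modeled as a plain integer, and every function name (`split`, `join`, `pcmpeq`, and so on) is a hypothetical helper introduced for illustration. The `bits` argument selects the element size (8, 16, or 32).

```python
def split(value, bits):
    """Split a 64-bit integer into unsigned lanes, lowest lane first."""
    mask = (1 << bits) - 1
    return [(value >> shift) & mask for shift in range(0, 64, bits)]

def join(lanes, bits):
    """Pack lanes (lowest first) into one integer, truncating each lane."""
    mask = (1 << bits) - 1
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & mask) << (i * bits)
    return result

def signed(lane, bits):
    """Reinterpret an unsigned lane as a two's-complement signed value."""
    return lane - (1 << bits) if lane >> (bits - 1) else lane

def pairs(dst, src, bits):
    """Yield corresponding (destination, source) lanes."""
    return zip(split(dst, bits), split(src, bits))

def pcmpeq(dst, src, bits):
    """PCMPEQB/W/D: all ones where the lanes are equal, else all zeros."""
    ones = (1 << bits) - 1
    return join([ones if d == s else 0 for d, s in pairs(dst, src, bits)], bits)

def pcmpgt(dst, src, bits):
    """PCMPGTB/W/D: signed compare, all ones where destination > source."""
    ones = (1 << bits) - 1
    return join([ones if signed(d, bits) > signed(s, bits) else 0
                 for d, s in pairs(dst, src, bits)], bits)

def psub(dst, src, bits):
    """PSUBB/W/D: subtract with wraparound (the borrow is ignored)."""
    return join([d - s for d, s in pairs(dst, src, bits)], bits)

def psubs(dst, src, bits):
    """PSUBSB/W: signed subtract, clamped to the signed range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return join([min(max(signed(d, bits) - signed(s, bits), lo), hi)
                 for d, s in pairs(dst, src, bits)], bits)

def psubus(dst, src, bits):
    """PSUBUSB/W: unsigned subtract, clamped at zero."""
    return join([max(d - s, 0) for d, s in pairs(dst, src, bits)], bits)

def psra(dst, count, bits):
    """PSRAW/D: arithmetic right shift; large counts leave only sign copies."""
    count = min(count, bits - 1)
    return join([signed(d, bits) >> count for d in split(dst, bits)], bits)

def pmaddwd(dst, src):
    """PMADDWD: multiply signed words, then add adjacent products."""
    p = [signed(d, 16) * signed(s, 16) for d, s in pairs(dst, src, 16)]
    return join([p[0] + p[1], p[2] + p[3]], 32)

def punpcklbw(dst, src):
    """PUNPCKLBW: interleave the 4 low bytes, destination byte first."""
    d, s = split(dst, 8), split(src, 8)
    out = []
    for i in range(4):
        out += [d[i], s[i]]
    return join(out, 8)
```

For example, `psub(0x10, 0x20, 8)` wraps the low byte around to `0xF0`, while `psubus(0x10, 0x20, 8)` saturates it to zero. The model ignores operand encoding and timing; it reproduces only the arithmetic described in the table.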

Table 1.25: New MMX commands

Command

Description

PADDQ xmm, xmm/m128

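Add packed quadword integers. The two 64-bit elements of the 128-bit source operand are added to the corresponding elements of the destination operand; a carry out of an element is ignored (the result wraps around).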

PSUBQ xmm, xmm/m128

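Subtract packed quadword integers. The two 64-bit elements of the 128-bit source operand are subtracted from the corresponding elements of the destination operand; a borrow out of an element is ignored (the result wraps around).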

PMULUDQ xmm, xmm/m128

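Multiply packed unsigned double words. The low double word of each quadword of the destination operand is multiplied by the low double word of the corresponding quadword of the source operand, and the two 64-bit products are stored in the quadwords of the destination operand.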

PSLLDQ xmm, imm

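Shift left logical the double quadword. This shifts the contents of the destination operand (first operand) to the left by the number of bytes specified by the immediate operand (that is, by imm x 8 bits). The empty low-order bytes are cleared (set to zero). If the count is greater than 15, the destination operand is set to all zeros.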

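PSRLDQ xmm, imm

Shift right logical the double quadword. This shifts the contents of the destination operand (first operand) to the right by the number of bytes specified by the immediate operand (that is, by imm x 8 bits). The empty high-order bytes are cleared (set to zero). If the count is greater than 15, the destination operand is set to all zeros.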

PSHUFHW xmm, xmm/m128, imm

Shuffle the packed high words. This instruction shuffles the word integers packed into the high quadword of the source operand and stores the shuffled result in the high quadword of the destination operand. An 8-bit immediate operand specifies the shuffle order.

PSHUFLW xmm, xmm/m128, imm

Shuffle the packed low words. The PSHUFLW instruction copies words from the low quadword of the source operand (second operand) and inserts them in the low quadword of the destination operand (first operand) at word locations selected with the order operand (third operand).

PSHUFD xmm, xmm/m128, imm

Shuffle the packed double words. This copies double words from the source operand (second operand) and inserts them into the destination operand (first operand) at the locations selected with the order operand (third operand).

PUNPCKHQDQ xmm, xmm/m128

Unpack the high quadwords. This instruction interleaves the high quadword of the source operand and the high quadword of the destination operand and writes them to the destination register.

PUNPCKLQDQ xmm, xmm/m128

Unpack the low quadwords. This instruction interleaves the low quadword of the source operand and the low quadword of the destination operand and writes them to the destination register.

MOVDQ2Q mm, xmm

Move the quadword integer from an XMM to an MMX register. This instruction moves the low quadword integer from an XMM source register to an MMX destination register.

MOVQ2DQ xmm, mm

Copy the content of the mm register into the least significant half of xmm. The MOVQ2DQ (move quadword integer from an MMX to an XMM register) instruction moves the quadword integer from an MMX source register to the low quadword of an XMM destination register. The high quadword of the destination register is cleared (set to zero).

MOVNTDQ m128, xmm

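Store the double quadword using a nontemporal hint. This instruction stores the packed integers from an XMM register to a 128-bit memory location, hinting to the processor that the data should not be cached. The address must be aligned to a 16-byte boundary.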

MOVDQA xmm, xmm/m128

MOVDQA xmm/m128, xmm

Move the aligned double quadword. The MOVDQA instruction transfers a double quadword operand from memory to an XMM register, or vice versa. Alternatively, it transfers it between XMM registers. The memory address must be aligned to a 16-byte boundary.

MOVDQU xmm, xmm/m128

MOVDQU xmm/m128, xmm

Move the unaligned double quadword. This instruction performs the same operations as the MOVDQA instruction, except that 16-byte alignment of a memory address is not required.

MOVMSKPD r32, xmm

Extract the sign mask from two packed, double-precision, floating-point values. This copies the values of sign bits (63 and 127) into bits 0 and 1 of the r32 register. Other bits are cleared.

MASKMOVDQU xmm, xmm

Store selected bytes from the source operand (first operand) into a 128-bit memory location. The mask operand (second operand) selects which bytes from the source operand are written to memory: a byte is stored if the most significant bit of the corresponding mask byte is set. The source and mask operands are XMM registers. The location of the first byte of the memory location is specified by the DI/EDI and DS registers. The memory location does not need to be aligned on a natural boundary. (The size of the store address depends on the address-size attribute.)
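
The shuffle, byte-shift, and mask operations in Table 1.25 are easier to follow with concrete values. Below is a minimal Python sketch that models a 128-bit XMM register as an integer and memory as a `bytearray`; all function names are hypothetical helpers, not real APIs, and the model covers only the data movement described in the table.

```python
MASK128 = (1 << 128) - 1

def lanes(value, bits):
    """Split a 128-bit integer into unsigned lanes, lowest lane first."""
    mask = (1 << bits) - 1
    return [(value >> shift) & mask for shift in range(0, 128, bits)]

def pack(parts, bits):
    """Pack lanes (lowest first) back into one 128-bit integer."""
    result = 0
    for i, part in enumerate(parts):
        result |= part << (i * bits)
    return result

def pshufd(src, order):
    """PSHUFD: each 2-bit field of order picks a source double word."""
    s = lanes(src, 32)
    return pack([s[(order >> (2 * i)) & 3] for i in range(4)], 32)

def pshuflw(src, order):
    """PSHUFLW: shuffle the 4 low words; the high quadword is copied as is."""
    w = lanes(src, 16)
    low = [w[(order >> (2 * i)) & 3] for i in range(4)]
    return pack(low + w[4:], 16)

def pslldq(dst, count):
    """PSLLDQ: shift left by whole bytes; counts above 15 give zero."""
    return 0 if count > 15 else (dst << (8 * count)) & MASK128

def psrldq(dst, count):
    """PSRLDQ: shift right by whole bytes; counts above 15 give zero."""
    return 0 if count > 15 else dst >> (8 * count)

def movmskpd(src):
    """MOVMSKPD: copy sign bits 63 and 127 into bits 0 and 1."""
    return ((src >> 63) & 1) | (((src >> 127) & 1) << 1)

def maskmovdqu(memory, address, src, mask):
    """MASKMOVDQU: store source bytes whose mask byte has its top bit set."""
    s, m = lanes(src, 8), lanes(mask, 8)
    for i in range(16):
        if m[i] & 0x80:
            memory[address + i] = s[i]
```

With an order operand of `0x1B`, `pshufd` reverses the four double words, because the 2-bit fields of `0x1B` read 3, 2, 1, 0 from the lowest field upward. In `maskmovdqu`, the `address` argument stands in for the location that the processor takes from DI/EDI and DS.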

New MMX Instructions

With the release of the Pentium 4 processor, the previously listed instructions of the MMX group gained access to the 128-bit registers (xmm). Table 1.25 lists the new MMX instructions.
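
A 128-bit register does not make the integer instructions operate on one 128-bit number; the register is still treated as independent elements. The Python sketch below illustrates this for quadword arithmetic and for moves between the two register files. It is an illustrative model only: registers are plain integers and the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def paddq(dst, src):
    """PADDQ on xmm: two independent 64-bit additions, no carry between them."""
    lo = ((dst & MASK64) + (src & MASK64)) & MASK64
    hi = ((dst >> 64) + (src >> 64)) & MASK64
    return (hi << 64) | lo

def psubq(dst, src):
    """PSUBQ on xmm: two independent 64-bit subtractions with wraparound."""
    lo = ((dst & MASK64) - (src & MASK64)) & MASK64
    hi = ((dst >> 64) - (src >> 64)) & MASK64
    return (hi << 64) | lo

def pmuludq(dst, src):
    """PMULUDQ on xmm: multiply the low double word of each quadword."""
    lo = (dst & MASK32) * (src & MASK32)
    hi = ((dst >> 64) & MASK32) * ((src >> 64) & MASK32)
    return (hi << 64) | lo

def movq2dq(mm):
    """MOVQ2DQ: MMX value goes to the low quadword; the high one is zero."""
    return mm & MASK64

def movdq2q(xmm):
    """MOVDQ2Q: only the low quadword of the XMM register is moved."""
    return xmm & MASK64
```

Adding 1 to a register whose low quadword is all ones clears that quadword and leaves the high quadword untouched, which is the visible difference from a true 128-bit addition.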

[4]NaN stands for "not a number." NaNs are nonnumbers; they are not part of the real number set. The encoding space for NaNs in floating-point format is beyond the ends of the real number line. This space includes any value with the maximum allowable biased exponent and a nonzero fraction (the sign bit is ignored for NaNs).




Disassembling Code: IDA Pro and SoftICE
ISBN: 1931769516
Year: 2006
Pages: 63
Authors: Vlad Pirogov
