Chapter 1: Introduction | 32/64-Bit 80x86 Assembly Language Architecture

Overview

When the processor manufacturer Intel is mentioned, two 64-bit processors come to mind: EM64T and the Itanium. For AMD, it is the AMD64. Non-80x86 manufacturers discovered years ago that competing against an established desktop market is difficult, if not impossible. The key to successfully entering the market is being able to run a large body of pre-existing applications. Intel and AMD have built their business on this by periodically creating superset instruction sets for their 80x86 processors, so that pre-existing software still runs on the new equipment and new software can be written for it.

The technology has forked into two 64-bit paths. One is the Itanium-based platform, with a new 64-bit primary instruction set belonging to the IA-64 family. The other is a superset of IA-32, referred to as Extended Memory 64 Technology (EM64T). Newer P4 and Xeon processors are of this second type.

This book targets the AMD32/64, IA-32, and EM64T processor technology. It is not written for the Itanium series. (Look for a future book to cover the Itanium processor.) The EM64T supports a new superset instruction set, SSE-3, and 64-bit extensions to the IA-32 general-purpose instruction set. It also allows 64-bit operating systems such as Windows XP Professional x64 and Windows Server 2003 x64 editions to run both 64-bit and 32-bit software on the same machine.

This book can be used for both the 32-bit and 64-bit instruction sets, but which applications will run depends on the operating system, as the following table shows.

Operating System	32-bit Apps (IA-32)	64-bit Apps (AMD64/EM64T)
Win9X (32-bit)	X	-
WinXP (32-bit)	X	-
Win2K (32-bit)	X	-
Win2003 (32-bit)	X	-
WinXP x64 (64-bit)	X	X
Win Server 2003 x64 (64-bit)	X	X

The 80x86 processor has joined the domain of the supercomputer since the introduction of SIMD (single instruction, multiple data) instructions, such as those in Intel's Pentium III (used in the Xbox) and later x86 processors including the Pentium 4, as well as AMD's 3DNow! extensions used in PCs. And now they are available in 64-bit. Both fixed-point (including integer) and floating-point math are used by the computer, video gaming, and embedded worlds in assembly and vector-based operations.

3D graphics rendering hardware has seen major increases in the number of polygons it can handle, thanks to geometry engines built into the rendering hardware to accelerate mathematical calculations. There is also the recent introduction of programmable vertex and pixel shaders built into newer video cards that use this same vector functionality. (This is another type of assembly language programming. For more information on shaders, read my book Learn Vertex and Pixel Shader Programming with DirectX 9.) These work well for rendering polygons with textures, depth ordering with Z-buffers or W-buffers, and translucency controlled by alpha channels, with lighting, perspective correction, etc., at relatively high speeds. The problem is that the burden of all the other 3D processing (culling, transformations, rotations, etc.) is placed on the computer's central processing unit (CPU), which is also needed for artificial intelligence (AI), terrain following, landscape management, property management, sound, etc.

Fortunately for most programmers, a continuously growing market of middleware providers is developing key building blocks such as the Unreal 3D rendering libraries and physics packages such as Havok. Whether you are looking to be employed by these companies and create this technology or merely wish to use these libraries, keep in mind that new hardware technology has created a surplus of CPU power that can now be used for aspects of your programming projects as well as for developing new technologies. All of this creates openings for programmers who write assembly language, whether for a scalar or a parallel architecture.

There are perhaps only two reasons for writing code in assembly language: writing low-level kernels in operating systems and writing high-speed, optimized critical code. A vector processor can be given sequences and arrays of calculations to perform, raising performance above that of the scalar code that high-level compilers typically generate.

Hint	Check out the following web site for additional information, code, links, etc., related to this book: http://www.leiterman.com/books.html.

There are exceptions to this, as some vectorizing compilers do exist, but they have not yet been adopted by the mainstream marketplace. They are well worth investigating if you need high-level C code that takes advantage of SIMD instruction sets.
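To make the scalar-versus-vector distinction concrete, here is a minimal C sketch that adds two float arrays both ways. The function names add_scalar and add_vector are hypothetical and are not from the book's companion code. The vector path uses the SSE intrinsics from <xmmintrin.h> so that each ADDPS instruction performs four additions; it assumes an x86 compiler that advertises SSE (GCC and Clang define __SSE__, MSVC defines _M_X64 or _M_IX86_FP), and otherwise falls back to plain scalar code.

```c
#include <stddef.h>

/* Use SSE only when the compiler says it is available (GCC/Clang define
   __SSE__; MSVC defines _M_X64, or _M_IX86_FP >= 1 on 32-bit x86). */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#endif

/* Scalar version: one float addition per loop iteration, the kind of
   code a non-vectorizing compiler typically generates. */
void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Vector version: each _mm_add_ps (ADDPS) adds four floats at once. */
void add_vector(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#ifdef HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* load a[i..i+3], unaligned OK */
        __m128 vb = _mm_loadu_ps(b + i);            /* load b[i..i+3] */
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb)); /* dst[i..i+3] = a + b */
    }
#endif
    /* Leftover elements (or all of them without SSE) are done one at a time. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Keeping the scalar version around is deliberate: it serves as the reference against which the vector version's results can be checked element by element.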

One other item to keep in mind: if you understand this information, it may be easier for you to get a job in the game or embedded software development industry, because you will have strengthened your programming foundation and may have a leg up on your competition. Even if you rarely program in 80x86 assembly language, peeking at the disassembly output of your high-level compiler while debugging your application can give you insight into code bloat caused by your coding methodology, and you will be better able to resolve some of the weird bugs you encounter in your applications.

Goal	A better understanding of 80x86 assembly.

I know that a number of you like technical books to be arranged like a resource bible, but I hate for assembly books (no matter how detailed) to be arranged that way, because:

It takes me too long to find what I am looking for!
They almost always put me to sleep!

This book is not arranged like a bible, but it contains the same information. It is instead arranged in chapters of functionality, and the instruction mnemonic lookup in Appendix B turns it into an abstracted bible. If you want that bible-like alphabetical organization, just look at the index or Appendix B of this book, scan for the instruction you are looking for, and turn to the page.

Info	Appendix B is the master instruction index, listing each instruction and the processors that support it.

I program multiple processors in assembly and occasionally have to reach for a book to look up the correct mnemonic. Quite often my own books! Manufacturers almost always seem to camouflage the instructions you need. Shift and rotate mnemonics, for instance, can be located all over the place in a book. In the 80x86, {psllw, pslld, psllq, …, shld, shr, shrd} are mild cases due to the closeness of their spellings, but for Boolean bit logic, {and, or, pand, xor} are all over the place in an alphabetical arrangement. When grouped in chapters of functionality, however, one merely turns to the chapter related to the functionality required and then leafs through the pages. For these examples, merely turn to Chapter 4, "Bit Mangling," or Chapter 5, "Bit Wrangling." Okay, okay, so I had a little fun with the chapter titles, but there is no need to wade through pages of extra information trying to find what you are looking for. In addition (not meant to be a pun), there are practical examples near the descriptions as well as in Chapter 19, which are even more helpful in jogging your memory as to an instruction's usage. Even the companion code for this book uses this same orientation.
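Since shifting and rotating are easy to confuse, here is a minimal C sketch of the difference; the function names are hypothetical. A logical shift left, like SHL, discards the bits that leave the top and brings in zeros at the bottom, while a rotate left, like ROL, feeds the departing bits back in at the bottom. C has no rotate operator, so the rotate is written as two shifts ORed together, an idiom that GCC and Clang typically recognize and compile to a single ROL.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift left (like SHL): bits shifted out of the top are lost and
   zeros enter at the bottom. n must be 0..31 to avoid undefined behavior. */
uint32_t shift_left32(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x << n;
}

/* Rotate left (like ROL): bits shifted out of the top re-enter at the
   bottom. Masking keeps both shift counts in 0..31, so every n is defined. */
uint32_t rotate_left32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For example, shifting 0x80000001 left by 1 yields 0x00000002, while rotating it left by 1 yields 0x00000003, because the top bit wraps around.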

The examples are for the 80x86. I tried to minimize printed code as much as possible so that the pages of the book do not turn into a mere source code listing! Hopefully I did not overtrim and make things confusing. If that occurs, open your source code editor or integrated development environment (IDE) to the chapter and project in the accompanying code related to the point in the book you are trying to understand. By the way, if you find a discrepancy between the code and the book, favor the code: the code in the book was cut and pasted from it, and elements can be lost during the editing process.

The book is also written in a friendly style, so as to occasionally be amusing and thus help you remember the information over a longer period of time. What good is a technical book that is purely mathematical in nature, difficult to extract any information from, and just puts you (I mean me) to sleep? You would most likely have to reread it once you woke up! The idea is that you should be able to sit down in a comfortable setting and read the book cover to cover to get a global overview. (My favorite place to read is in a lawn chair on the back patio with a wireless laptop.) Then go back to your computer and, using the book as a tool, implement what you need or cut and paste into your code. But use it at your own risk! You should use this book as a companion to more in-depth technical documentation, to help you gain an understanding of that information.

I have attempted to layer the information so that you can master it at your own skill level. Regarding cutting and pasting: you will find that portions of this book also appear in one of my other published books, Vector Game Math Processors. There is a degree of overlap, but this book should be considered the prequel and a foundation for that book. Any information duplicated between the two has been enhanced in this book, as it is now almost three years later and the technology has been extended.

The code is broken down by platform, chapter, and project, but most of it has not been optimized. This is explained later, but briefly, optimized code is difficult to read and understand, so I tried to keep this book as clear and readable as possible. Performance tools such as Intel's VTune analyzer are available to help with optimization.

This book, as mentioned, is divided into chapters of functionality. It is aimed at learning to write 80x86 assembly language for games and for embedded and scientific applications. (Except for writing a binary-coded decimal (BCD) package, there is not a big need for assembly language in pure business applications; graphically or statistically oriented applications are another matter.) With that in mind, you will learn from this book:

Adapted coding standards that this book recommends
Bit manipulations and movement
Converting data from one form into another
Addition/subtraction (integer/floating-point/BCD)
Multiplication/division (integer/floating-point)
Special functions
(Some) trigonometric functionality
Branching and branchless coding
Some vector foundations
Debugging

It is very important to write functions in a high-level language such as C before rewriting them in assembly. This allows the algorithm to be debugged and mathematical patterns to be identified before the assembly version is written. In addition, the results of both versions can be compared to verify that they are identical and thus that the assembly code is functioning as expected. Do not write code destined for assembly in C++, because you will have to untangle it later. Assembly language is designed for low-level development, and C++ for high-level object-oriented development using inheritance, name mangling, and other levels of abstraction, which make the code harder to simplify. There is, of course, no reason not to wrap your assembly code with C++ functions or libraries. But I strongly recommend that you debug your assembly language function before locking it away in a static or dynamic library, as debugging it will then become harder.
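A minimal sketch of that reference-first workflow in C follows. All names here are hypothetical: max_ref is the plain C reference, and max_candidate stands in for the optimized routine; in a real project the candidate would be your assembly function, declared extern and linked in, but here a branchless C version of the kind you might hand-code in assembly plays that role. verify_max runs both over the same inputs and asserts that they agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: a straightforward C reference, written and debugged first. */
int32_t max_ref(int32_t a, int32_t b)
{
    return (a > b) ? a : b;
}

/* Step 2: the optimized candidate (hypothetical stand-in for the assembly
   routine). -(a > b) is all ones when a > b and zero otherwise, so the
   mask selects a or b without a conditional branch. */
int32_t max_candidate(int32_t a, int32_t b)
{
    return b ^ ((a ^ b) & -(int32_t)(a > b));
}

/* Step 3: compare both versions on every pair of test inputs,
   including the extreme values where hand-written code often fails. */
void verify_max(const int32_t *xs, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            assert(max_ref(xs[i], xs[j]) == max_candidate(xs[i], xs[j]));
}
```

Feeding verify_max a table such as {INT32_MIN, -7, -1, 0, 1, 42, INT32_MAX} exercises the sign and boundary cases before the function is locked away in a library.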

Tip	Sometimes examining compiler output can give insight into writing optimized code. (That means peeking at the disassembly view while debugging your application.)
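As a small illustration of the Tip, here is a C file (hypothetically named inspect.c) worth compiling with optimization and viewing in the debugger's disassembly window, or with GCC's -S option, which writes the generated assembly to a .s file. The comments describe what an optimizing x86 compiler typically emits; exact output varies by compiler, version, and flags, so treat them as expectations to check rather than guarantees.

```c
#include <stdint.h>

/* Try, for example: gcc -O2 -S inspect.c   (writes inspect.s) */

/* Unsigned divide by a power of two: an optimizing x86 compiler
   typically emits a single right shift (SHR) instead of DIV. */
uint32_t udiv8(uint32_t x)
{
    return x / 8u;
}

/* Signed divide by 8: C division rounds toward zero, so the compiler
   typically adds a bias of 7 for negative values before an arithmetic
   shift (SAR); those extra instructions show up in the listing. */
int32_t sdiv8(int32_t x)
{
    return x / 8;
}
```

Comparing the two listings shows why a choice as small as signed versus unsigned can change the code the compiler produces.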