Chapter 6: Memory Organization and Access | The Art of Assembly Language

This chapter describes the basic components that make up a computer system: the CPU, memory, I/O, and the bus that connects them. Although you can write software without this knowledge, writing great, high-performance code requires an understanding of this material.

This chapter begins by discussing bus organization and memory organization. These two hardware components may have as large a performance impact on your software as the CPU's speed. Knowing about memory performance characteristics, data locality, and cache operation can help you design software that runs as fast as possible. Writing great code requires a strong knowledge of the computer's architecture.

6.1 The Basic System Components

The basic operational design of a computer system is called its architecture. John von Neumann, a pioneer in computer design, is credited with the principal architecture in use today. For example, the 80x86 family uses the von Neumann architecture (VNA). A typical von Neumann system has three major components: the central processing unit (CPU), memory, and input/output (I/O), as shown in Figure 6-1.

Figure 6-1: Typical von Neumann machine

In VNA machines, like the 80x86, the CPU is where all the action takes place. All computations occur within the CPU. Data and machine instructions reside in memory until the CPU requires them, at which point the system transfers the data into the CPU. To the CPU, most I/O devices look like memory; the major difference is that I/O devices are generally located in the outside world, whereas memory is located within the same machine.

6.1.1 The System Bus

The system bus connects the various components of a VNA machine. Most CPUs have three major buses: the address bus, the data bus, and the control bus. A bus is a collection of wires on which electrical signals pass between components of the system. These buses vary from processor to processor, but each bus carries comparable information on most processors. For example, the data buses on the Pentium and 80386 may have different implementations, but both variants carry data between the processor, I/O, and memory.

6.1.1.1 The Data Bus

CPUs use the data bus to shuffle data between the various components in a computer system. The size of this bus varies widely among CPUs. Indeed, bus size is one of the main attributes that defines the 'size' of the processor.

Most modern, general-purpose CPUs employ a 32-bit-wide or 64-bit-wide data bus. Some processors use 8-bit or 16-bit data buses, and there may well be some CPUs with 128-bit buses by the time you read this. For the most part, however, the CPUs in personal computers tend to use 32-bit or 64-bit data buses (and 64-bit data buses are the most prevalent).

You'll often hear a processor called an 8-, 16-, 32-, or 64-bit processor. The processor size is the smaller of two values: the number of data lines on the processor and the size of its largest general-purpose integer register. For example, the 32-bit Intel 80x86 CPUs from the Pentium onward have 64-bit data buses but provide only 32-bit general-purpose integer registers, so we'll classify these devices as 32-bit processors. The AMD x86-64 processors support 64-bit integer registers and a 64-bit bus, so they're 64-bit processors.
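The classification rule above is easy to express in code. The following is only a minimal sketch in Python; the function name `processor_size` and its parameter names are hypothetical and chosen purely for illustration.

```python
def processor_size(data_bus_bits, largest_int_register_bits):
    """Classify a processor by the smaller of its data bus width and
    its largest general-purpose integer register width (both in bits)."""
    return min(data_bus_bits, largest_int_register_bits)


# A 64-bit data bus paired with 32-bit integer registers.
print(processor_size(64, 32))
# A 64-bit data bus paired with 64-bit integer registers.
print(processor_size(64, 64))
```

The first call reports 32 and the second reports 64, matching the two examples in the text.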

Although the 80x86 family members with 8-, 16-, 32-, and 64-bit data buses can process data blocks up to the bit width of the bus, they can also access smaller memory units of 8, 16, or 32 bits. Therefore, anything you can do with a small data bus can be done with a larger data bus as well; the larger data bus, however, may access memory faster and can access larger chunks of data in one memory operation. You'll read about the exact nature of these memory accesses a little later in this chapter.

6.1.1.2 The Address Bus

The data bus on an 80x86 family processor transfers information between a particular memory location or I/O device and the CPU. The only question is, 'Which memory location or I/O device?' The address bus answers that question. To uniquely identify each memory location and I/O device, the system designer assigns a unique memory address to each. When the software wants to access a particular memory location or I/O device, it places the corresponding address on the address bus. Circuitry within the device checks this address and transfers data if there is an address match. All other memory locations ignore the request on the address bus.
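To make the address-matching idea concrete, here is a small software model, written in Python. It is a sketch under stated assumptions and not a description of real decoding hardware: the `Device` class, the `select_device` function, and the address ranges given to "RAM" and "ROM" are all hypothetical.

```python
class Device:
    """A hypothetical memory or I/O device occupying a contiguous address range."""

    def __init__(self, name, base, size):
        self.name = name
        self.base = base
        self.size = size

    def matches(self, address):
        # Stand-in for the device's decoding circuitry: respond only
        # to addresses inside this device's own range.
        return self.base <= address < self.base + self.size


def select_device(devices, address):
    """Place `address` on the bus and return the name of the device that responds."""
    responders = [d.name for d in devices if d.matches(address)]
    # With unique address assignments, at most one device can match.
    return responders[0] if responders else None


# An assumed address map, for illustration only.
devices = [
    Device("RAM", base=0x00000, size=0xA0000),
    Device("ROM", base=0xF0000, size=0x10000),
]

print(select_device(devices, 0x00400))
print(select_device(devices, 0xF1234))
print(select_device(devices, 0xB0000))
```

Every device sees every address, but only the one whose range contains the address responds. An address that no device claims, such as 0xB0000 in this assumed map, is simply ignored.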

With a single address bus line, a processor could access exactly two unique addresses: zero and one. With n address lines, the processor can access 2ⁿ unique addresses (because there are 2ⁿ unique values in an n-bit binary number). Therefore, the number of bits on the address bus will determine the maximum number of addressable memory and I/O locations. Early 80x86 processors, for example, provided only 20 lines on the address bus. Therefore, they could only access up to 1,048,576 (or 2²⁰) memory locations. Larger address buses can access more memory (see Table 6-1).

Table 6-1: 80x86 Addressing Capabilities
Processor	Address Bus Size (bits)	Maximum Addressable Memory
8088, 8086, 80186, 80188	20	1,048,576 (1 megabyte)
80286, 80386sx	24	16,777,216 (16 megabytes)
80386dx	32	4,294,967,296 (4 gigabytes)
80486, Pentium	32	4,294,967,296 (4 gigabytes)
Pentium Pro, II, III, IV	36	68,719,476,736 (64 gigabytes)
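
The figures in Table 6-1 follow directly from the 2ⁿ rule. As a quick check, the short Python sketch below computes the number of addressable locations for each address bus width in the table; the function name `addressable_locations` is hypothetical.

```python
def addressable_locations(address_lines):
    """Number of unique addresses reachable with the given number of address lines."""
    return 1 << address_lines  # same value as 2 ** address_lines


for lines in (1, 20, 24, 32, 36):
    print(f"{lines:2d} address lines -> {addressable_locations(lines):,} locations")
```

Each additional address line doubles the number of locations, which is why the jump from 20 to 24 lines multiplies the addressable memory by 16.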

Newer processors will support 40-, 48-, and 64-bit address buses. The time is coming when most programmers will consider 4 GB (gigabytes) of storage to be too small, just as we consider 1 MB (megabyte) insufficient today. (There was a time when 1 MB was considered far more than anyone would ever need!) Many other processors (such as SPARC and IA-64) already provide much larger address buses and, in fact, support addresses up to 64 bits in the software.

A 64-bit address range is truly infinite as far as memory is concerned. No one will ever put 2⁶⁴ bytes of memory into a computer system and feel that they need more. Of course, people have made claims like this in the past. A few years ago, no one ever thought a computer would need 1 GB of memory, but computers with a gigabyte of memory or more are very common today. However, 2⁶⁴ really is infinity for one simple reason: it's physically impossible to build that much memory based on estimates of the current size of the universe (which put the number of elementary particles in the universe at about 2⁵⁶). Unless you can attach one byte of memory to every elementary particle in the known universe, you're not even going to come close to approaching 2⁶⁴ bytes of memory on a given computer system. Then again, maybe we really will use whole planets as computer systems one day, as Douglas Adams predicts in The Hitchhiker's Guide to the Galaxy. Who knows?

6.1.1.3 The Control Bus

The control bus is an eclectic collection of signals that control how the processor communicates with the rest of the system. To illustrate its importance, consider the data bus for a moment. The CPU uses the data bus to move data between itself and memory. This prompts the question, 'How does the system know whether it is sending or receiving data?' Well, the system uses two lines on the control bus, read and write, to determine the data flow direction (CPU to memory, or memory to CPU). So when the CPU wants to write data to memory, it asserts (places a signal on) the write control line. When the CPU wants to read data from memory, it asserts the read control line.
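
The role of the read and write lines can be modeled in a few lines of Python. This is a deliberately simplified sketch: the function `bus_cycle`, its keyword arguments, and the use of a `bytearray` to stand in for memory are all assumptions made for illustration, and a real bus cycle involves timing and other signals that this model leaves out.

```python
def bus_cycle(memory, address, *, read=False, write=False, data=None):
    """Perform one simplified bus cycle against a bytearray standing in for memory."""
    if read == write:
        raise ValueError("assert exactly one of the read and write lines")
    if write:
        if data is None:
            raise ValueError("a write cycle needs a value on the data bus")
        memory[address] = data & 0xFF   # data flows from the CPU to memory
        return None
    return memory[address]              # data flows from memory to the CPU


memory = bytearray(16)
bus_cycle(memory, 3, write=True, data=0x5A)   # CPU asserts the write line
print(hex(bus_cycle(memory, 3, read=True)))   # CPU asserts the read line
```

The same address and data paths serve both operations; only the asserted control line tells the memory which way the data is moving.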

Although the exact composition of the control bus varies among processors, some control lines are common to all processors and are worth a brief mention. Among these are the system clock lines, interrupt lines, byte enable lines, and status lines.

The byte enable lines appear on the control bus of some CPUs that support byte-addressable memory. These control lines allow 16-, 32-, and 64-bit processors to deal with smaller chunks of data by communicating the size of the accompanying data. Additional details appear later in the sections on 16-bit and 32-bit buses.
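
One way to picture byte enable lines is as a bit mask with one bit per byte lane of the data bus. The Python sketch below assumes a 32-bit data bus with four byte lanes, where lane 0 carries the byte at the lowest address; that layout, the function name `byte_enables`, and the decision to reject accesses that span two bus-width blocks are all illustrative assumptions and not a description of any particular CPU.

```python
def byte_enables(address, size, bus_bytes=4):
    """Return a bit mask with one bit set for each byte lane taking part
    in an access of `size` bytes starting at `address`."""
    lane = address % bus_bytes          # first byte lane used by the access
    if size < 1 or lane + size > bus_bytes:
        # In this simplified model, such an access would need more than one bus cycle.
        raise ValueError("access does not fit in a single bus cycle")
    return ((1 << size) - 1) << lane


print(format(byte_enables(0x1001, 1), "04b"))  # one byte in lane 1
print(format(byte_enables(0x1002, 2), "04b"))  # two bytes in lanes 2 and 3
print(format(byte_enables(0x1000, 4), "04b"))  # all four lanes
```

The number of set bits conveys the size of the access, and their position tells the memory subsystem which part of the data bus holds valid data.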

The control bus also contains a signal that helps distinguish between address spaces on the 80x86 family of processors. The 80x86 family, unlike many other processors, provides two distinct address spaces: one for memory and one for I/O. However, it does not have two separate physical address buses (for I/O and memory). Instead, the system shares the address bus for both I/O and memory addresses, and additional control lines indicate whether the address is intended for memory or I/O. When these signals are active, the I/O devices use the address on the LO (low-order) 16 bits of the address bus. When they are inactive, the I/O devices ignore the signals on the address bus, and the memory subsystem takes over at that point.
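
The shared address bus with a memory-or-I/O selection can be sketched as a simple routing decision. In the Python model below, the function name `route_address` and the single boolean `io_active` (standing in for the control lines that select the I/O address space) are hypothetical; only the 16-bit width of the I/O address comes from the text.

```python
IO_ADDRESS_MASK = 0xFFFF  # I/O devices look only at the LO 16 bits of the address bus


def route_address(address, io_active):
    """Decide which subsystem responds to `address` and which address it sees."""
    if io_active:
        # I/O cycle: the I/O devices use only the low-order 16 bits.
        return ("io", address & IO_ADDRESS_MASK)
    # Memory cycle: the I/O devices ignore the bus and memory sees the full address.
    return ("memory", address)


print(route_address(0x12345678, io_active=True))
print(route_address(0x12345678, io_active=False))
```

With the I/O selection active, the value 0x12345678 on the address bus reaches the I/O devices as 0x5678; with it inactive, the memory subsystem sees the full address.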