Lesson 1: Microprocessors | Microsoft Corporation


A microprocessor is an integrated circuit that contains a complete CPU on a single chip. In this lesson, we examine the microprocessor from its inception to the current state-of-the-art chip. It is important for a computer technician to understand the development of the processor and what makes each version different from its predecessors. This knowledge gives us an understanding of the enhancements each new design offers over earlier ones and how the system components can take advantage of the new features.

After this lesson, you will be able to

Describe how a microprocessor works.

Define different types of processors and describe their advantages and limitations.

Estimated lesson time: 40 minutes

Computer technicians, fortunately, aren't required to design microprocessors, only to understand how they work. Microprocessors can be viewed as little black boxes that provide answers or perform a variety of chores on command. We also need to understand the external data bus, because it is the means by which the CPU accesses system resources.

The External Data Bus

In previous lessons, you learned that information is transmitted throughout a computer by binary code traveling through a bus. The external data bus (also known as the external bus or simply data bus) is the primary route for data in a PC. All data-handling components or optional data devices are connected to it; therefore, any information (code) placed on that bus is available to all devices connected to the computer.

As mentioned in Chapter 1, "Introduction to Computers," early computers used eight conductors (an 8-bit data bus), which allowed for the transfer of 1 byte of information at a time. As computers evolved, the width of the external data bus increased to 16, 32, and finally to the current width of 64 conductors. The wider bus lets more data flow at the same time, just as adding more lanes to a highway allows more cars to move through a point in a given amount of time.
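The highway analogy can be put into rough numbers. The Python sketch below is only an illustration: the helper name `transfers_needed` and the 1,024-byte payload are assumptions chosen for the example, and it ignores clock speed, wait states, and any other real-world overhead. It counts how many bus transfers are needed to move the same block of data across buses of different widths.

```python
import math


def transfers_needed(payload_bytes, bus_width_bits):
    """Return how many bus transfers it takes to move payload_bytes.

    Hypothetical helper: assumes every transfer can fill the whole bus
    and ignores timing and protocol overhead.
    """
    bytes_per_transfer = bus_width_bits // 8
    return math.ceil(payload_bytes / bytes_per_transfer)


if __name__ == "__main__":
    for width in (8, 16, 32, 64):
        count = transfers_needed(1024, width)
        print(f"{width:2d}-bit bus: {count} transfers for 1,024 bytes")
```

Doubling the bus width halves the number of transfers, which is the "more lanes" effect described above.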

Figure 4.1 shows a CPU attached to its motherboard. The motherboard is the main circuit board, which contains the external data bus and connections for expansion devices that are not part of the board's basic design. The expansion slots act as "on ramps" to the external bus. Expansion cards, once commonly known as "daughter cards," are placed in slots on the motherboard. Other forms of on ramp are the slots that hold memory and the sets of pins used to attach drive cables. Connectors on the motherboard grant access to the data bus for keyboards, mouse devices, and peripheral devices like modems and printers through the use of COM and LPT ports.

Figure 4.1 Motherboard

To understand how a computer moves data between components, visualize each device on the data bus (including the CPU) connected to the bus by means of a collection of on/off switches. By "looking at" which conductors have power and which ones do not, the device can read the data as it is sent by another device. The on/off state of a line gives the value of 1 (on) or 0 (off). The wires "spell out" a code of binary numbers that the computer interprets and then routes to another system component or to the user by means of an output device such as a monitor or printer. Communication occurs when voltage is properly applied to, or read from, any of the conductors by the system. Figure 4.2 illustrates a data bus connected to a CPU and a device.

Figure 4.2 External data bus

Coded messages can be sent into or out of any device connected to the external data bus. Think of the data bus as a large highway with parallel lanes. Extending that analogy, bits are like cars traveling side by side—each carries part of a coded message. Microprocessors are used to turn the coded messages into data that performs a meaningful task for the computer's user.

NOTE
All hardware that uses data is connected in some way to the data bus, or to another device that is connected to the data bus.

The CPU

The CPU is the part of a computer in which arithmetic and logical operations are performed and instructions are decoded and executed. The CPU controls the operation of the computer. Early PCs used several chips to handle the task. Some functions are still handled by support chips, which are often referred to collectively as a "chip set." Figure 4.3 shows a close-up of a CPU and other chips on a motherboard.

Figure 4.3 CPU

Although it is not necessary to know exactly what goes on inside the processor, learning a few terms that you will encounter as a computer professional will be helpful to you in the discussion that follows.

Transistors

Transistors, the main components of microprocessors, are small, electronic switches. The on/off positions of the transistors form the binary codes discussed earlier in this lesson. Although transistors might seem simple, their development required many years of painstaking research. Before transistors were available, computers relied on slow, inefficient vacuum tubes and mechanical switches to process information. The first large-scale computers took up a huge amount of space, and technicians actually went inside them to "program" by turning on and off specific tubes!

Many materials, including most metals, allow electrical current to flow through them—these are known as electrical conductors. Materials that don't pass electrical current are called insulators. Pure silicon (which is used to make most transistors) is a semiconductor; its degree of conductivity can be adjusted, or modulated, by adding impurities during production.

Transistor switches have three terminals: the source, the gate, and the drain. When positive voltage is applied to the gate, electrons are attracted, forming an electron channel between the source and the drain. Positive voltage applied to the drain pulls electrons from the source to the drain, turning the transistor on. Removing the voltage turns it off by breaking the pathway.

In the late 1950s, a major development in transistor technology took place. A team of engineers put two transistors on a silicon wafer, creating the world's first integrated circuit and paving the way for the development of compact computers.

Integrated Circuits

An integrated circuit (IC) is an electronic device consisting of a number of miniature transistors and other circuit elements (resistors and capacitors, for instance). An IC functions just as a large collection of these parts would, but it is a fraction of the size and uses a fraction of the power. ICs make today's microelectronics possible. The original transistors were small plastic boxes about the size of a peanut (outside its shell), and could handle only one function. The word "integrated" denotes that IC devices combine many circuits—and some of their functions—into one package. A prime example of this technology is the microprocessor.

Microprocessors

On November 15, 1971, Intel shipped the first commercial microprocessor, the Model 4004. It ran a product called the Busicom calculator. The 108-kHz 4004 had 2300 transistors and a 4-bit data bus and could address 640 bytes of RAM. Computer engineers quickly took advantage of the potential this new type of chip offered, leading the way to the first personal computers.

A year later, the Intel 8008 appeared. Radio Electronics Magazine reported that hobbyist Don Lancaster used an 8008 to build what is considered the first personal computer. The article called it a "TV typewriter."

The Intel 8080 appeared in 1974. It sold then for $400, and now sells for about one dollar. It powered traffic lights, but of more interest to our discussion is the fact that it formed the core of the Altair computer of 1975. It was sold in kit form for $395 and was named for a world in the Star Trek TV series. Figure 4.4 shows a picture of the 8080 die. By today's standards, it was very weak: 6000 transistors, an 8-bit bus, and a 2-MHz clock speed. It could address 64 K of RAM, and users programmed the Altair by throwing manual switches located on the case.

Figure 4.4 The Intel 8080 Microprocessor

Microprocessor Design

Before going further into microprocessor-development history, it is important to discuss in general terms how they operate. Microprocessors are usually divided into three subsystems: the control unit (CU), the arithmetic logic unit (ALU), and the input/output unit. The term CPU is used to denote a combined CU and ALU, contained in a single package.

The advent of the control unit marked a radical improvement in processor design, allowing CPU operations to be based in part on code provided by an external program like a BIOS (basic input/output system). This extended the ability of a PC to use new hardware components that were not part of the original design.

The ALU is just what its name implies: the part of the IC that handles the basic math functions of computation. The I/O unit fetches data from the outside and passes data back to the external bus.

Registers

Registers are temporary memory storage areas used during data manipulation. Physically, registers are rows of microscopic switches, which are set on or off. Each row forms a binary number: off = 0, on = 1. The rightmost switch is worth 1, the next 2, the next 4, and so on. Hence off.off.on equals the number 1, and off.on.on equals the number 3 (0 + 2 + 1). The CPU uses registers like scratch pads, to hold data while it works on a task. Changes in data during an operation are also stored in a register, then sent out to other components as the job is finished. The number and width of registers vary from one type of machine to another. The wider the register, the more bits the machine can handle at one time, just as with the width of the external bus. As register width moved from 4 to 8 to 16 to 32 to 64 bits, PCs increased in performance.
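The switch-to-number rule can be sketched in a few lines of Python. The function name `switches_to_number` and the use of the strings "on" and "off" are assumptions made for this illustration; a real register is hardware, not a list.

```python
def switches_to_number(switches):
    """Convert a row of switch states to the integer it represents.

    The row is listed left to right, as it would be written down, so
    the last entry is the least significant bit. "on" counts as 1 and
    "off" counts as 0.
    """
    value = 0
    for state in switches:
        # Shift what we have so far one place left, then add the new bit.
        value = value * 2 + (1 if state == "on" else 0)
    return value


if __name__ == "__main__":
    print(switches_to_number(["off", "off", "on"]))   # 1
    print(switches_to_number(["off", "on", "on"]))    # 3
    print(switches_to_number(["on"] * 8))             # 255, the largest 8-bit value
```

The last line shows why width matters: an 8-switch register tops out at 255, and each extra switch doubles the range.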

Codes

Computers use various binary-based codes to represent information. In Chapter 2, "Understanding Electronic Communication," we saw how ASCII code is a binary representation of characters on a keyboard. These codes are sent on the external data bus by a system component to be read by other devices. Press a key on a PC keyboard and an ASCII code is generated and sent over the data bus. Transferring information to and from the CPU (and other hardware) is only the first step in manipulating data.
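As a simplified illustration of a character becoming a pattern on an 8-bit bus, the sketch below converts a single key to its ASCII code and then to eight bits. The helper name `key_to_bus_pattern` is hypothetical, and the model follows the lesson's simplified description rather than the details of any particular keyboard controller.

```python
def key_to_bus_pattern(key):
    """Return the 8-bit binary pattern for one ASCII character.

    Hypothetical helper for illustration: each character of the result
    stands for one conductor on an 8-bit bus ("1" = on, "0" = off).
    """
    code = ord(key)  # numeric ASCII code of the character
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "08b")


if __name__ == "__main__":
    for key in "A2+":
        print(key, ord(key), key_to_bus_pattern(key))
```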

Other codes tell the PC how to display data on the monitor, talk to devices such as printers, and take in data streams from scanners. Each of those operations requires system resources and the manipulation of binary numbers.

In addition to the code that represents data, special machine code is required in order for the CPU to turn the string of numbers into something useful to an application. As with the data code, this machine code is sent in the form of binary numbers on the data bus. CPU designs, in turn, are different enough that machine code must be written specifically for each of them.

The Clock

Timing is essential in PC operations. Without some means of synchronization, chaos would ensue. Timing allows the electronic devices in the computer to coordinate and execute all internal commands in the proper order.

Timing is achieved by placing a special conductor in the CPU and pulsing it with voltage. Each pulse of voltage received by this conductor is called a "clock cycle." All the switching activity in the computer occurs while the clock is sending a pulse. This process somewhat resembles several musicians using a metronome to synchronize their playing, with all the violinists moving their bows at the same time. Thanks to this synchronization, you get musical phrasing instead of a jumble of notes.

Virtually every computer command needs at least two clock cycles. Some commands might require hundreds of clock cycles to process. Figure 4.5 shows an external data bus with a CPU and two devices. Notice that the crystal or clock is attached to the CPU to generate the timing.

Figure 4.5 CPU with clock

Clock Speed

It is common for computers to be marketed to consumers based on features that show off their best points. One principal selling point is the system clock rate, measured in megahertz (MHz), or millions of cycles per second. The clock rate suggests how many commands can be completed each second, given that a command needs at least two cycles to execute. The process of adding two numbers together would take about four commands (eight clock cycles). A computer running at 450 MHz can therefore do about 56 million simple calculations per second (450 million cycles divided by 8).
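The arithmetic behind this kind of estimate can be checked with a short sketch. The function name and its default parameters are assumptions for illustration; the figures of four commands per calculation and two cycles per command are the lesson's rough numbers, not measurements of any real CPU.

```python
def calculations_per_second(clock_mhz, commands_per_calc=4, cycles_per_command=2):
    """Estimate simple calculations per second from the clock rate.

    Hypothetical helper: divides the number of clock cycles available
    each second by the cycles one calculation is assumed to consume.
    """
    cycles_per_calc = commands_per_calc * cycles_per_command
    return clock_mhz * 1_000_000 / cycles_per_calc


if __name__ == "__main__":
    for mhz in (8, 450, 750):
        print(f"{mhz} MHz: about {calculations_per_second(mhz):,.0f} calculations per second")
```

The estimate scales linearly with clock rate, which is why the figure is popular in marketing even though real workloads depend on much more than the clock.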

Clock speed is determined by the CPU manufacturer and represents the fastest speed at which the CPU can be reliably operated. The Intel 8088 processor, as used in the original IBM PC, had a clock speed of 4.77 MHz. Today's processors have clock speeds that run up to and, in some, exceed 750 MHz.

NOTE
Remember that this speed is the CPU's maximum speed. If you place too many clock cycles on a CPU, it can fail or overheat and stop working.

The system crystal determines the speed at which a CPU operates. The system crystal is usually a quartz oscillator, very similar to the one in a wristwatch. You can find the system crystal soldered to the motherboard. Look for a silver part, usually with a label that indicates the crystal speed.

IMPORTANT
A computer has two clocks: one to set the speed and timing and a second clock to keep time for date/time calculations. They are two entirely different devices.

Memory

The CPU's ability to hold large amounts of information at once is very limited. To compensate, additional chips are installed in the computer for the sole purpose of temporarily storing information that the CPU needs. These chips are called random access memory (RAM). The term random access is used because the CPU can place or retrieve bytes of information in or from any RAM location at any time. RAM is explored in greater detail in Chapter 7, "Memory."

Address Bus

The word "location" is italicized in the last paragraph to underscore the importance of location in PC memory operations. The content of RAM is changing all the time, as programs and the computer itself use portions of it to note, calculate, and hold results of actions. It is essential for the system to know what memory is assigned to which task and when that memory is free for a new use. To do so, the system has to have a way to address segments of memory and to quickly change the holdings in that position. The portion of the PC that does this is the address bus.

Think of the address bus as a large, virtual table in which the columns are individual bits (like letters) and each row contains a string of bits (making up a word). The actual lengths of these words will vary depending on the number of bits the address bus can handle in a single pass. Figure 4.6 shows a table containing 1s and 0s. Each segment is given an address, just like the one that identifies a home or post office box. The system uses this address to send data to or retrieve data from memory.

Figure 4.6 Memory spreadsheet

Like all the other buses in a PC, this one is a collection of conductors. It links the physical memory to the system and moves signals as memory is used. The number of conductors in the address bus determines the maximum amount of memory that can be used (memory that is addressable) by the CPU. Remember that computers count in binary notation. Each binary digit—in this case, a conductor—that is added to the left will double the number of possible combinations.

Early data buses used eight conductors and, therefore, 256 (2⁸) combinations of code were possible. The maximum number of patterns the address bus can generate determines how much RAM the system can address. The 8088 used 20 address conductors and could address up to 1,048,576 (2²⁰) bytes of memory. Today's PCs can address a lot more than that, and, in many cases, the actual limiting factor is not the number of patterns, but the capacity of the motherboard to socket memory chips. In all cases, the maximum amount of addressable memory is 2^X locations, where X is the number of conductors in the address bus.
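The doubling rule is easy to verify. The sketch below assumes, as the lesson does, that each address pattern selects one byte of memory; the function name `addressable_bytes` is made up for the example.

```python
def addressable_bytes(address_lines):
    """Return how many distinct locations an address bus can select.

    Assumes one byte per address; each extra conductor doubles the total.
    """
    return 2 ** address_lines


if __name__ == "__main__":
    for lines in (8, 20, 24, 32):
        print(f"{lines:2d} address lines: {addressable_bytes(lines):,} bytes")
```

With 20 lines the result is 1,048,576 bytes (1 MB); 24 lines give 16 MB, and 32 lines give 4 GB.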

The CPU does not directly connect to the memory bus, but sends requests and obtains results using the system's memory controller. This circuitry acts as both postmaster and translator, providing the proper strings of data in the right order, at the right time, and in a form the CPU can use. As mentioned before, any write or read action will require at least two clock cycles to execute. (It can require more clock cycles on systems that do not have memory tuned to the maximum system clock speed. In that case, the PC will have to use additional clock cycles while it waits for the memory to be ready for the next part of the operation.)

Figure 4.7 shows a diagram of the process with the CPU and RAM stack on the external data bus. The address bus is connected to the memory controller. It fetches and places data in memory.

Figure 4.7 CPU and RAM

How Microprocessors Work

Current CPUs, such as the Intel Pentium III, are collections of millions of switches and bus pathways. They operate all kinds of machines, in addition to PCs, and are found in cameras, cars, microwave ovens, TVs, and all kinds of gadgets. Here, however, we are interested only in how they work inside a PC. Let's look at a simple task: adding two numbers such as 2 and 2 together and obtaining their sum (2 + 2 = 4). The CPU can do math problems very quickly, but it requires several very quick steps to do it. Knowing how a CPU performs a simple task will help you understand how developments in PC design have improved PC performance.

When the user pushes a number key (in a program like Calculator, which can add numbers), the keystroke causes the microprocessor's prefetch unit to ask for instructions on what to do with the new data. The data is sent through the address bus to the PC's RAM and is placed in the instruction cache, with a reference code (let's call it 2 = a).

The prefetch unit obtains a copy of the code and sends it to the decode unit. There it is translated into a string of binary code and routed to the control unit and the data cache to tell them what to do with the instruction. The control unit sends it to an address called "X" in the data cache to await the next part of the process.

When the plus (+) key is pressed, the prefetch unit again asks the instruction cache for instructions about what to do with the new data. The prefetch unit translates the code and passes it to the control unit and data cache, which alerts the ALU that an ADD function will be carried out. The process is repeated when the user presses the "2" key.

Next (yes, there's still more to do), the control unit takes the code and sends the actual ADD command to the ALU. The ALU adds "a" and "b" together after they have been sent up from the data cache. The ALU then sends the code for "4" to be stored in an address register.

Pressing the equal sign (=) key is the last act the user must execute before getting the answer, but the computer still has a good bit of work ahead of it. The prefetch unit checks the instruction cache for help in dealing with the new keystroke. The resulting instruction is stored, and a copy of the code is sent to the decode unit for processing. There, the instruction is translated into binary code and routed to the control unit. Now that the sum has (finally) been computed, a print command retrieves the proper address, registers the contents, and displays them. (That involves a separate flurry of activity in the display system, which we won't worry about.)

As you can see, a microprocessor must go through many more steps than human beings are required to take, merely to arrive at the conclusion that 2 + 2 = 4. The computer must execute a complicated dance in order to manage the code, place it, and fetch it in memory; then it has to be told what to do with it. Yet the result usually appears as fast as you can type the request. You can see that clock cycles and, hence, processor speed, have a significant effect on performance. Other issues that affect performance include memory access and speed, as well as the response time of components such as the display system.
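The walkthrough can be mimicked with a toy model. The sketch below is not how any real CPU is organized: the names `run_calculator`, `data_cache`, `pending_op`, and `result_register` are invented for illustration, and the prefetch and decode stages are collapsed into a simple `if` chain. It only shows the order of events: operands are parked, the ALU is alerted to an ADD, the sum is stored, and the equal sign retrieves it.

```python
def run_calculator(keystrokes):
    """Toy model of the 2 + 2 walkthrough; returns the displayed value."""
    data_cache = {}          # operands waiting for the ALU, named "a" and "b"
    pending_op = None        # operation the ALU has been alerted to
    result_register = None   # where the ALU stores its answer
    displayed = None         # what the user finally sees

    for key in keystrokes:
        if key.isdigit():
            # "Decode" the keystroke and park the operand in the data cache.
            name = "a" if "a" not in data_cache else "b"
            data_cache[name] = int(key)
            if name == "b" and pending_op == "ADD":
                # Both operands are present, so the ALU performs the ADD.
                result_register = data_cache["a"] + data_cache["b"]
        elif key == "+":
            pending_op = "ADD"
        elif key == "=":
            # Retrieve the stored result for display.
            displayed = result_register
        else:
            raise ValueError(f"unsupported key: {key!r}")
    return displayed


if __name__ == "__main__":
    print(run_calculator("2+2="))
```

Even this stripped-down version needs several distinct steps per keystroke, which hints at why clock cycles add up so quickly.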

PC Microprocessor Developments and Features

PC microprocessor design grows more complex with each generation, and CPU packaging keeps changing to provide room for additional features and operating requirements. Microprocessors have evolved from the 4004 described earlier into today's high-speed Pentiums. Each new processor has brought higher performance and spawned new technology. Six basic elements are customarily used to gauge the performance and capability of a CPU design:

Speed: The maximum clock rate, measured in megahertz (millions of clock cycles per second). The higher the speed, the quicker a command will be executed.

Number of transistors: More switches, more computing power.

Registers: The size (in bits) of the internal registers. The larger the registers, the more complicated the commands that can be processed in one step.

External data bus: As data bus size increases, so does the amount and complexity of code (information) that can be transferred between all devices in the computer.

Address bus: The size of the address bus determines the maximum amount of memory that can be addressed by the CPU.

Internal cache: The internal cache is high-speed memory built into the processor. This is a place to store frequently used data instead of sending it to slower devices (speed is relative in computers) such as RAM and hard disk drives. It is built into the processor and has a dramatic effect on speed. We cover cache in more detail later in this lesson.

Intel has held most of the PC CPU market share since the original IBM PC was introduced. Closely following each new Intel launch, rivals such as Advanced Micro Devices (AMD) and Cyrix have offered alternative chips that are generally compatible with the Intel models. This development, in turn, drives prices down and spurs a new round of CPU design. Another player is Motorola, a firm that manufactures the microprocessors used in the Apple family of computers, among others.

Intel's 8086 and 8088: The Birth of the PC

We have already introduced the "pre-PC" CPUs. Now we take a look at the models that have powered one of the most dramatic developments of the modern world: the inexpensive, general-purpose computer.

On June 6, 1978, Intel introduced its first 16-bit microprocessor, known as the 8086. It had 29,000 transistors, 16-bit registers, a 16-bit external data bus, and a 20-bit address bus to allow it to access 1 MB of memory. When IBM entered the computer business, the 8086 was too powerful (and expensive) to meet its requirements.

Intel then released the 8088 processor, which was identical to the 8086 except for an 8-bit external data bus, and a slower top clock rate. This meant that 8-bit components (more common at the time) could be used for the construction of PCs, and 8-bit applications written for earlier machines could be converted for PC use. The following table compares the 8088 and 8086 chips.

Chip	Number of Transistors	CPU Speed (MHz)	Register Width	External Data Bus	Address Bus	Internal Cache
Intel 8088	29,000	4.77-8	16-bit	8-bit	20-bit	None
Intel 8086	29,000	4.77-10	16-bit	16-bit	20-bit	None

The early 8088 processors ran at 4.77 MHz, while later versions ran at 8 MHz. The 8086 and 8088 processors came as a 40-pin DIP (dual inline package) containing approximately 29,000 transistors. The DIP is so named because of the two rows of pins on either side of the processor, as shown in Figure 4.8. These fit into a set of slots on a raised socket on the motherboard. The small u-shaped notch at one end of a DIP-style CPU denotes the end that has pin 1. During installation, you will need to be sure to line it up correctly, or you might have to repeat the process.

Figure 4.8 DIP (dual inline package) used for 8086, 8088, and 80286 CPUs

NOTE
The 8088 and 8086 are software-compatible—they can run exactly the same programs (assuming the PCs that use them don't have other complicating factors). The benefit of using an 8086 is its 16-bit external data bus. This allows an 8086-based computer to execute the same software faster than an 8088 computer with the same clock speed.

The early IBM personal computers based on the 8086 and 8088 chips featured:

16 KB of memory.

Cassette tape recorder or floppy disk drive for program and data storage.

Nongraphics monochrome monitor and monochrome display adapter (MDA).

Soon, a new industry was born as third-party vendors started manufacturing add-ons and improved models of the basic design. Graphics cards with color and better resolution, clocks, additional memory, and peripherals, such as printers, extended the features of the new appliance. "Clones" offered some of these extras at very competitive prices, as a way to attract buyers who wanted a lower price and did not need the comfort of purchasing from a big name like IBM.

NOTE
A clone is a computer that contains the same microprocessor and runs the same programs as a better-known, more prestigious, and often more expensive machine.

Most of the 8088 and 8086-based PCs used some variation of MS-DOS. The variations limited the growth of the software market because of the compatibility issues they presented between versions of MS-DOS. Buyers had to be sure that a program would run on their specific version of MS-DOS.

As users found more ways to take advantage of the PC's power, developers and owners alike soon felt the limitations of the original IBM PC design. The engineers who created it never envisioned the need for more than 16 K of RAM. "Who would ever need more than that?" one is quoted as saying. The cassette drive was never a big seller; most buyers opted for one or two 5.25-inch floppy disk drives, and many soon craved color graphics and the space of the "massive" 5- and 10-MB hard disk drives.

To meet that growing demand, IBM introduced a more robust PC, the XT (eXtended Technology), that could take advantage of a hard disk drive and came with either a monochrome or four-color display and more RAM. Clone makers soon followed suit.

The 80286 and the IBM PC AT

In February 1982, Intel introduced the 6-MHz 80286 microprocessor, commonly called the 286, with a 24-bit address path; later versions pushed the clock speed to 10 and 12.5 MHz. In 1984, IBM unveiled its PC AT (Advanced Technology) computer, based on the 286. It had a larger, boxier design and came with a standard hard drive and a new expansion slot format that rendered older add-on cards obsolete.

The AT could run the same applications as the PC XT (8088), but run them faster. The use of a 24-bit address path allowed the 286 to access up to 16 MB of memory. The clone-makers soon followed suit, taking advantage of third-party versions of the 286. Chip makers Harris and AMD produced versions of the 286 that could run at up to 20 MHz.

Computers based on the 80286 chip featured:

Two memory modes (real and protected).

16 MB of addressable memory.

Clock speeds up to 20 MHz.

Reduced command set (fewer program commands to do more work).

Multitasking abilities.

Virtual memory support.

Virtual Memory

Virtual memory is the art of using hard disk space to hold data not immediately required by the processor; it is placed in and out of RAM as needed. Although using virtual memory slowed the system down (electronic RAM is much faster than a mechanical hard drive), it allowed the 286 to address up to 1 GB (gigabyte—one thousand megabytes) of memory (16 MB of actual memory and 984 MB of virtual memory). Virtual memory required the use of operating systems more advanced than MS-DOS, leading to the development of products such as Microsoft Windows, IBM OS/2, and SCO's PC version of UNIX.
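The idea of moving data in and out of RAM as needed can be sketched with a toy model. Everything here is an assumption made for illustration: the class name `ToyVirtualMemory`, the tiny number of RAM slots, and especially the choice to evict the least recently used block, since the lesson does not say which block a real system would move to disk.

```python
from collections import OrderedDict


class ToyVirtualMemory:
    """Toy model: a small "RAM" backed by a larger "disk"."""

    def __init__(self, ram_slots):
        self.ram_slots = ram_slots
        self.ram = OrderedDict()   # block id -> data, least recently used first
        self.disk = {}             # blocks that have been swapped out of RAM
        self.swaps = 0             # number of slow disk transfers so far

    def write(self, block, data):
        self._make_resident(block)
        self.ram[block] = data

    def read(self, block):
        self._make_resident(block)
        return self.ram[block]

    def _make_resident(self, block):
        if block in self.ram:
            self.ram.move_to_end(block)  # mark as recently used
            return
        if len(self.ram) >= self.ram_slots:
            # RAM is full: push the least recently used block out to disk.
            old_block, old_data = self.ram.popitem(last=False)
            self.disk[old_block] = old_data
            self.swaps += 1
        if block in self.disk:
            # The block was swapped out earlier: bring it back into RAM.
            self.ram[block] = self.disk.pop(block)
            self.swaps += 1
        else:
            self.ram[block] = None       # brand-new, empty block


if __name__ == "__main__":
    vm = ToyVirtualMemory(ram_slots=2)
    vm.write(0, "a")
    vm.write(1, "b")
    vm.write(2, "c")            # RAM is full, so block 0 goes to disk
    print(vm.read(0), vm.swaps)
```

The program sees three blocks even though only two fit in "RAM" at once; the price is the swap counter, which stands in for the slow trips to the hard drive.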

Real Mode vs. Protected Mode

The 286 might have made older hardware outdated, but Intel had no desire to invoke industry ire and slow the adoption of the new chip by requiring all-new software applications. The result was a CPU with two operating modes: real and protected.

In real mode, sometimes called compatibility mode, a 286 emulates the 8086 processor and addresses only the first 1 MB of memory. This mode is used to run older software. Protected mode allows access to all memory on the system, physical and virtual. In protected mode, a program can write only to the memory allocated to it, with specific memory blocks allocated to different programs. This mode can go well beyond the 16 MB of "true" memory, opening up the possibility of multitasking—running more than one program at a time.

This development required new, more powerful operating systems and applications, but they were slow in coming. By the time they arrived on the market, the 286 was functionally obsolete, but it paved the way for today's powerful multitasking environments such as Windows 95 and 98, Windows NT and 2000. Another major drawback to the 286's memory management scheme was its need to reboot the system when changing between real and protected modes.

The original 286 processor came packaged in DIP (already shown), PGA (pin grid array), and PLCC (plastic leadless chip carrier) designs. The PLCC can be recognized by the arrangement of thin legs around its perimeter. The PLCC's major advantage is its stronger leads (pins), which make it more difficult to damage during removal or installation. PLCCs became popular because they made it easier to upgrade a PC with a faster CPU.

NOTE
PGA and PLCC models look very much alike, but CPUs designed for some types can't be socketed in the other type. Verify the type you need before ordering or attempting a replacement or upgrade (see Figure 4.9).

Figure 4.9 Plastic Leadless Chip Carrier (PLCC) CPU Package

The 80386 Arrives

On June 16, 1985, Intel introduced the original 80386 (commonly known as the 386). This true 32-bit processor was equipped with a 32-bit external data bus, 32-bit registers, and a 32-bit address bus. The first models shipped with a clock speed of 16 MHz, and the CPU sported 275,000 transistors. It could directly address 4 GB of RAM, and 64 TB (terabytes—a terabyte is approximately one trillion bytes) of virtual memory. According to Intel, the 386 could hold an eight-page history of every person on earth in that address space. The 386 was a true generational leap in PC computing, with true multitasking capability—it really could run more than one program at a time. That was due to a third memory mode, called virtual real mode, that allowed independent MS-DOS sessions (called "virtual machines") to coexist on the same system at once. It spawned a host of programs called "memory managers" designed to optimize (and troubleshoot) the more complex world of virtual memory.

The original 80386 chips shipped with speeds of 12 or 16 MHz. Intel produced faster versions (25 and 33 MHz), while AMD manufactured a 40-MHz variant. The 386 provided both the real and protected modes available in the 286.

By April of 1989, the 386 was running at clock speeds of 33 MHz, and Intel was calling it the 80386DX to distinguish it from a lower-cost model, the 386SX.

The 386SX: A Scaled-Down Version

The 386SX came on the scene in June, 1988. Intel wanted to increase the sales of 386-based machines without dramatically dropping the price of its flagship CPU. The result was the introduction of a scaled-down model for "entry-level" computers. It had a 16-bit external data bus and a 24-bit address bus (it could address only 16 MB of memory). The 16-bit configuration allowed it to be used as an upgrade chip for existing 16-bit motherboards, thereby providing an easy transition to the next generation of computers.

The following table compares members of the 80386 chip family from Intel and rival AMD. The AMD 80386DXLV is notable as the first PC CPU with an internal cache.

Chip	Number of Transistors	CPU Speed (MHz)	Register Width	External Data Bus	Address Bus	Internal Cache
Intel 80386SX	275,000	16-25	32-bit	16-bit	24-bit	None
Intel 80386DX	275,000	16-33	32-bit	32-bit	32-bit	None
AMD 80386DX	275,000	20-40	32-bit	32-bit	32-bit	None
AMD 80386DXL	275,000	20-33	32-bit	32-bit	32-bit	None
AMD 80386DXLV	275,000	20-33	32-bit	32-bit	32-bit	8 KB

NOTE
The terms "SX" and "DX" are not acronyms; they do not stand for longer terms.

386 Packaging

The 386 was usually placed in either a PLCC package or a PGA package. The PGA mount can be found with the 80386, 486, and some older Pentiums up to 166-MHz models. The pins are evenly distributed in concentric rows along the bottom of the chip (see Figure 4.10).

Figure 4.10 PGA (Pin Grid Array)

PGA chips go into regular PGA or the popular ZIF (zero insertion force) sockets. Care must be used when inserting or removing CPUs from a PGA mount: it is very easy to bend the pins if you do not pull perfectly straight up from the socket or if you push down even slightly unevenly. ZIF mounts are a bit better, but much tech time has been wasted straightening pins, and it is possible to ruin a CPU! PGA mounts are often "hidden" under a CPU fan, which presents another hurdle during repair or upgrade.

A variation of the PGA is the SPGA (staggered pin grid array). It looks almost the same, but with (surprise!) staggered rows of pins. This allows engineers to place more connectors in a smaller area. It also adds emphasis to the caution given earlier about not bending pins through careless removal or insertion.

Both the PGA and SPGA have three pointed corners and one "snipped" corner. Use that corner to line up the chip with the socket. If the chip does not go in smoothly, double-check the alignment!

Laptop Designs and the Plastic Quad Flat Pack

Some forms of portable PC have existed from the days of the 8088. The early models, such as the Osborne and the original Compaq, were known as "luggables"—tipping the scales at close to 30 pounds. Their cases looked more suited for holding sewing machines than computers. Modern laptop computers started to gain popularity with the advent of the 386 chip and the use of flat screen monitors incorporated in the design, rather than conventional video tubes.

To seat 80286, 80386, and 486 CPUs (the latter are covered in the section that follows) on the more compact laptop motherboards, many vendors use plastic quad flat pack (PQFP) mounts, which are also more secure than traditional socket types designed for systems that will not be moved as much. PQFPs require a submount called a "carrier ring" (see Figure 4.11). PQFPs require a special tool for placing or removing a CPU. Be sure to get the tool before attempting repairs on PQFP-mounted CPUs.

Figure 4.11 PQFP (Plastic Quad Flat Pack)

80486

April 10, 1989, brought us the 80486 line of processors. Once again, the rallying cry was "better and faster." By this time, applications like CorelDRAW and Adobe Photoshop, and desktop-publishing tools like PageMaker and Ventura Publisher, were generating more interest in faster systems. Microsoft Windows was gaining popularity and was on its way to becoming the standard desktop environment.

The 486 processor started life at 25 MHz and could address 4 GB of RAM and 64 TB of virtual memory. It was the first PC CPU to break the 1-million transistor mark, with 1,200,000. It provided a built-in math coprocessor (older PC CPUs offered separate math coprocessors as an option, usually with a similar number ending in a 7 rather than a 6). The combination sped up graphics programs that used floating-point math.

The 486SX and Beyond

Once again, Intel sought a way to increase sales without weakening the price of its flagship 486DX CPU, so it added an SX version in April 1991. This time, the company achieved its goal by removing the math coprocessor, reducing the number of transistors to 1,185,000. Users could upgrade the SX to a 486DX by adding an optional OverDrive processor to restore the missing component.

The 486 label was attached to other chip designs during its active development phase, both by Intel and third-party chip makers. The 486SL, a variant with a 20- to 33-MHz clock and 1.4 million transistors, debuted in 1992. It was very popular in high-performance laptop computers, running at lower voltage (3.3 volts instead of 5 volts) than the usual 486. These small (and, for that time, powerful) machines also included System Management Mode (SMM), which can dim the LCD screen and power down the hard disk drive, extending the life of the battery.

System Management Mode

System Management Mode (SMM) is a hardware-based function that allows the microprocessor to selectively shut down the monitor, hard drives, and any other peripherals not in use. SMM works at the chip level, whether the microprocessor is operating in real, protected, or virtual 8086 mode. SMM is transparent to all software running on the system, which decreases the likelihood of lockups.

Clock-Doubling Debuts

The need for speed spurred the introduction of new models of the 486 family through the spring of 1994, the last variations being the DX2 and DX4. These chips were models with faster clock speeds of up to 100 MHz. The processors were either 25- or 33-MHz versions that had been altered to run internally at double or triple their external speed. For example, the DX4 version of the 486 33-MHz processor ran at 33 MHz externally, but at 100 MHz internally (3 x 33.3 MHz). This meant that internal operations, such as numeric calculations or moving data from one register to another, occurred at 100 MHz, while external operations, like loading data from memory, took place at 33 MHz.
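The relationship between the external and internal clocks is a simple multiplication. As a minimal sketch (the function name `internal_clock_mhz` is hypothetical, and 33.3 MHz is used as the nominal "33-MHz" board speed), the DX2 and DX4 figures can be checked like this:

```python
def internal_clock_mhz(external_mhz: float, multiplier: float) -> float:
    """Internal (core) clock of a clock-multiplied CPU."""
    # The motherboard still runs at external_mhz; only the core runs faster.
    return external_mhz * multiplier

print(round(internal_clock_mhz(33.3, 2)))  # DX2 on a 33-MHz board
print(round(internal_clock_mhz(33.3, 3)))  # DX4 on a 33-MHz board
print(round(internal_clock_mhz(25, 3)))    # DX4 on a 25-MHz board
```

Only the work done inside the chip benefits from the multiplier; every trip to memory still happens at the external speed.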

Slower external clock speeds allowed existing motherboard and memory designs to be used. Upgrades were less expensive, and new machines based on the DX technology could quote faster benchmarks at lower costs. The DX4 offered 16 KB of on-board cache, further boosting performance. The DX2 50-MHz-based machines should not be confused with machines designed around the 50-MHz 486DX processor—the latter performed much better.

Vendors such as AMD rode the wave with their own editions of the 486 for users feeling a need for greater speed. The following table lists the most popular 486 chips and third-party work-alikes.

Chip	CPU Speed (MHz)	Register Width	External Data Bus	Address Bus	Internal Cache
Intel 80486DX	25, 33, 50	32-bit	32-bit	32-bit	8 KB
Intel 80486DX/2	50, 66	32-bit	32-bit	32-bit	8 KB
Intel 80486DX/4	75, 100	32-bit	32-bit	32-bit	16 KB
Intel 80486SX	16, 20, 25	32-bit	32-bit	32-bit	8 KB
Intel 80486SL	16, 20, 25	32-bit	32-bit	32-bit	8 KB
AMD AM486DX	33, 40	32-bit	32-bit	32-bit	8 KB
AMD AM486DXLV	33	32-bit	32-bit	32-bit	8 KB
AMD AM486DX2	50, 80	32-bit	32-bit	32-bit	8 KB
AMD AM486DX4	100, 120	32-bit	32-bit	32-bit	8 KB
AMD AM486DX "Enhanced"	120, 133	32-bit	32-bit	32-bit	16 KB W/B
AMD AM486DXL2	50, 80	32-bit	32-bit	32-bit	8 KB
AMD AM486SX	33, 40	32-bit	32-bit	32-bit	8 KB
AMD AM486SXLV	33	32-bit	32-bit	32-bit	8 KB
AMD AM486SX2	33	32-bit	32-bit	32-bit	8 KB
CYRX CX486DX	33	32-bit	32-bit	32-bit	8 KB W/B
CYRX CX486DX2	50-80	32-bit	32-bit	32-bit	8 KB W/B
CYRX CX486DLC	33-40	32-bit	32-bit	32-bit	1 KB W/B
CYRX CX486SLC	20-33	32-bit	32-bit	32-bit	1 KB W/B
CYRX CX486SLC2	50	32-bit	32-bit	32-bit	1 KB W/B

NOTE
W/T (write-through) and W/B (write-back) cache are explained in Chapter 7, "Memory."

Heat Sinks and Fans

The 486 is notable for one other item: the addition of a standard heat sink and, usually, a fan mounted on the CPU and powered by the PC. To maintain stable operation, the PC must provide proper cooling for the 486 and newer CPUs. Failure of the cooling apparatus can lead to erratic behavior and, if left uncorrected, can damage the chip. If a customer complains of strange noises inside the PC, the CPU fan is a good place to look. As fan bearings age, they start to whine.

Pentium

By 1993, Windows was standard, and users expected a lot more from PCs in performance and features. Increasing software sophistication led to increasing memory usage and hard disk drive requirements. The market was ready for a major upgrade in CPUs, and Intel once again addressed that need. The new Pentium processor signaled a radical redesign of both the CPU and naming conventions.

With its CPUs identified by numbers, Intel faced a business problem: numbers cannot be trademarked. The company's strategy was to substitute a trademarkable name, "Pentium," for the upcoming chip that would otherwise have been named the 80586, or "586." The name is based on the Greek word for five, pente. The original design has been revamped several times since 1993, and now there are Pentium IIs and IIIs. Like the older PC CPUs, the Pentium has spawned its share of clones, leading to entry-level PCs priced under $400.

The Pentium (Series I) offers the following features:

Speeds of 60 to over 200 MHz.

32-bit address bus and 32-bit registers.

64-bit data path to improve the speed of data transfers.

Dual pipeline, 32-bit data bus that allows the chip to process two separate lines of code simultaneously.

At least 8-KB write-back cache for data and an 8-KB write-through cache for programs. (Types of caches are explained in more detail in Chapter 7, "Memory.")

"Branch prediction"—in which the program cache attempts to anticipate branching within the code. The CPU stores a few lines of code from each branch so that when the program reaches the branch, the Pentium already has the code stored within the cache.

The following table lists the first generation of Pentium and Pentium-compatible chips.

Chip	Speed (MHz)	Register Width	External Data Bus	Address Bus	Internal Cache
Intel Pentium	60, 66	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
Intel Pentium	75	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
Intel Pentium	90, 100	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
Intel Pentium	120, 133	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
Intel Pentium	150, 166	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
Intel Pentium	180, 200	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
CYRIX 6x86(P-rating)	100, 120, 133, 200	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T
AMD K5 (P-rating)	75, 90	32-bit	64-bit	32-bit	8 KB W/B and 8 KB W/T

NOTE
W/T (write-through) and W/B (write-back) caches are explained in Chapter 7, "Memory." P-rating is a standard method of rating chips by their equivalency to a Pentium chip. It avoids direct comparison of clock speeds. Each processor is tested on an identical system and measured accordingly. If a chip performs 1.5 percent slower than a Pentium chip, it gets the same rating as the next lower chip.

Mass-producing reliable P66 Pentiums proved difficult, and many were rejected during quality control. The faulty chips were stable at clock speeds of 60 MHz, so Intel sold them as the P60. Some users change their P60 processor clock speed to 66 MHz by changing a jumper on the motherboard. This might work, but computer performance and longevity can be unpredictable.

Intel continued to use the 0.8-micron manufacturing process, begun with the 486, to fit 3.1 million transistors on the Pentium chip. (A micron is 1/1000 of a millimeter; a 0.8-micron process can draw about 16,000 lines and spaces per inch on the die.) The P66 used considerable power and consequently generated a large amount of heat. Operating a reliable heat sink and fan became critical with the advent of the Pentium.
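The "lines per inch" figures quoted for a manufacturing process can be checked with a little arithmetic. The sketch below assumes that each drawn line is followed by a space of the same width, so one line-and-space pair occupies twice the feature size. The function name `line_pairs_per_inch` is ours, not an industry term.

```python
MICRONS_PER_INCH = 25_400  # 1 inch = 25.4 mm = 25,400 microns

def line_pairs_per_inch(feature_microns: float) -> float:
    """Lines per inch when every line is followed by a space of equal width."""
    return MICRONS_PER_INCH / (2 * feature_microns)

print(round(line_pairs_per_inch(0.8)))  # 0.8-micron process
print(round(line_pairs_per_inch(0.6)))  # 0.6-micron process
```

Under that assumption the results land close to the rounded figures of about 16,000 and 21,000 used in this lesson.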

A second generation of Pentium chips was released in 1994. These chips were made using a 0.6-micron manufacturing process (approximately 21,000 lines and spaces per inch) and, as a result, they required considerably less power, despite an additional 200,000 transistors. Intel was able to change the power supply from 5 volts to 3.3 volts (the DX4 also had a reduced power supply), which reduced the amount of heat produced by nearly one half. The P90 and P100 processors were released at this time. These processors ran internally at 1.5 times the external speed (60 or 66 MHz, the fastest system board speeds then available). A P75 processor was also released for use in lower-specification machines and laptop computers.
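Why does a drop from 5 volts to 3.3 volts matter so much for heat? A common rule of thumb, which we assume here and which ignores leakage and any change in clock speed, is that the switching power of a chip scales with the square of its supply voltage. The helper `relative_dynamic_power` below is a hypothetical name for that rule.

```python
def relative_dynamic_power(new_volts: float, old_volts: float) -> float:
    """Ratio of new to old switching power, assuming power scales with voltage squared."""
    return (new_volts / old_volts) ** 2

ratio = relative_dynamic_power(3.3, 5.0)
print(f"Power at 3.3 V is about {ratio:.0%} of the power at 5 V")
```

Under this simplification the voltage change alone cuts power roughly in half, which is in line with the reduction in heat described above. Real chips will differ, because transistor count and clock speed changed at the same time.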

Superscalar Technology

The main components of a processor—registers, decoders, and ALUs (arithmetic/ logic units)—are collectively known as the instruction pipeline. To carry out a single instruction, a processor must:

Read the instruction.

Decode the instruction.

Fetch operands (for math functions).

Execute the instruction.

Write back the results.

Early processors carried out these steps one at a time. Pipelining overlaps the steps, so that while one instruction is executing, the next is being decoded and a third is being read. In the best case, one instruction completes every clock cycle, which increases the speed of processing. Superscalar technology allows the Pentium to have two instruction pipelines, called U and V. The U pipeline can execute the full range of Pentium instructions, while the V pipeline can execute a limited number. When possible, the Pentium processor breaks up a program into discrete tasks that are then shared between the pipelines, thus allowing the Pentium to execute two simple instructions simultaneously. Software must be specifically written to take advantage of this innovative feature, which is known as multithreading.
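To see why pipelining and a second pipeline help, it can be useful to count clock cycles. The sketch below is an idealized model, not a description of the real Pentium: it assumes one clock cycle per step, no stalls, and (for the two-pipeline case) that instructions can always be paired. The function names are ours.

```python
STAGES = ["read", "decode", "fetch operands", "execute", "write back"]

def cycles_unpipelined(n_instructions: int) -> int:
    """Each instruction finishes all stages before the next one starts."""
    return n_instructions * len(STAGES)

def cycles_pipelined(n_instructions: int, pipelines: int = 1) -> int:
    """Idealized pipeline: a new instruction (or group) enters every cycle.

    Assumes one cycle per stage, no stalls, and that instructions can always
    be paired across pipelines, which real code rarely allows.
    """
    if n_instructions == 0:
        return 0
    groups = -(-n_instructions // pipelines)  # ceiling division
    # The first group needs len(STAGES) cycles; each later group adds one.
    return len(STAGES) + groups - 1

for n in (1, 10, 100):
    print(n, cycles_unpipelined(n), cycles_pipelined(n), cycles_pipelined(n, pipelines=2))
```

For a single instruction nothing is gained, but for long runs of instructions the pipelined counts approach one cycle per instruction, and the two-pipeline count approaches half of that.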

Pentium On-Board Cache

The original Pentium series came with two 8-KB caches, one for data and one for program code, compared with the single 8-KB cache on the 486 (16 KB on the DX4). As described earlier, the program cache uses a technique called "branch prediction" to improve its ability to guess what program code will be required next by the processor.

Intel's Competitors

Competitors have moved away from simply making clones of the Intel processors. They are currently designing their own processors with unique features:

NexGen Nx586

AMD Am5x86

Cyrix 6x86

IBM 6x86

RISC (Reduced Instruction Set Computing)

Until recently, all the Intel processors had been based on a CISC (complex instruction set computing) architecture. Processors based on RISC (reduced instruction set computing) have been used in high-powered machines since the mid-1980s. Intel has produced its own version of a RISC-based processor that uses a much smaller and simpler set of instructions, greatly enhancing the speed of the processor.

Pentium Pro

Intel made CPU selection even more complex with the introduction of the Pentium Pro, which offered varied features, in different models, of the Pentium design. This processor was aimed at 32-bit server and workstation-level applications such as computer-aided design (CAD), mechanical engineering, and advanced scientific computation. The Pentium Pro was packaged with a second speed-enhancing cache memory chip, and boasted 5.5 million transistors. First available in November 1995, it incorporated an internal RISC architecture with a CISC-RISC translator, three-way superscalar execution, and dynamic execution. While compatible with all the previous software for the Intel line, the Pentium Pro is optimized to run 32-bit software. Its pin structure and mount differ from the basic Pentium, requiring a special ZIF socket. Some motherboards have sockets for both Pentium and Pentium Pro, but most machines use motherboards designed for one or the other. The package was a 2.46-inch by 2.66-inch, 387-pin PGA configuration that housed a Pentium Pro processor core and an on-board L2 cache. Although mounted on one PGA device, they are two ICs. A single, gold-plated copper/tungsten heat spreader gives them the appearance of a single chip.

The main CPU and 16-KB first-level (L1) cache consist of 5.5 million transistors; the second chip is a 256- or 512-KB second-level (L2) cache with 15 million transistors. A 133-MHz Pentium Pro processes data about twice as fast as a 100-MHz Pentium.

One reason for the better performance is a technology called dynamic execution. Before processing, the data flow is analyzed and sequenced for optimal execution. Then the system looks ahead in the program process and predicts where the next branch or group of instructions can be found in memory, then processes up to five instructions before they are needed. By using a technique known as data-flow analysis, the Pentium Pro can determine dependencies between data items so they can be processed as soon as their inputs are available, regardless of the program's order.
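The idea behind data-flow analysis can be illustrated with a toy scheduler that issues each instruction as soon as its inputs exist, rather than in program order. This is only a sketch under generous assumptions (unlimited execution units, one-cycle instructions, each result written once); the function `dataflow_schedule` and the instruction format are invented for the example and do not reflect the Pentium Pro's internal design.

```python
def dataflow_schedule(program, initial_values):
    """Group instructions into cycles by data readiness rather than program order.

    program: list of (name, destination, sources) tuples in program order.
    initial_values: names of values that are available before the program runs.
    Assumes unlimited execution units, one-cycle latency, and that each
    destination is written only once.
    """
    ready = set(initial_values)
    pending = list(program)
    cycles = []
    while pending:
        # An instruction may issue once every one of its inputs is available.
        issue = [ins for ins in pending if all(src in ready for src in ins[2])]
        if not issue:
            raise ValueError("some instruction waits on a value nobody produces")
        cycles.append([name for name, _, _ in issue])
        ready.update(dest for _, dest, _ in issue)
        pending = [ins for ins in pending if ins not in issue]
    return cycles

program = [
    ("i1", "a", ["x", "y"]),   # a is computed from x and y
    ("i2", "b", ["a"]),        # b depends on a
    ("i3", "c", ["z"]),        # c is independent of i1 and i2
    ("i4", "d", ["b", "c"]),   # d needs both b and c
]
print(dataflow_schedule(program, ["x", "y", "z"]))
```

In this example the third instruction runs alongside the first, ahead of the second, because nothing it needs is still being computed.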

Pentium MMX

Soon, more choices were on the way. About the time the 166-MHz Pentiums shipped, Intel introduced MMX (multimedia extension) technology, designed to enhance performance of data-hungry applications like graphics and games. With larger data and code caches, Pentiums with MMX technology can run non-MMX-enhanced software approximately 10 to 20 percent faster than a non-MMX CPU with the same clock speed.

To reap the full benefits of the new processor, MMX-enhanced software makes use of 57 special multimedia instructions. These new MMX operators use a technology called single instruction multiple data (SIMD) stream processing. SIMD allows different processing elements to perform the same operations on different data—a central controller broadcasts the instruction to all processing elements in the same way that a drill sergeant would tell a whole platoon to "about face," rather than instruct each soldier individually.

The MMX chips also take advantage of dynamic branch prediction using the branch target buffer (BTB) to predict the most likely set of instructions to be executed.
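Dynamic branch prediction means the processor learns from how a branch behaved in the past. One common textbook scheme, shown here purely as an illustration and not as a description of what Intel built into the BTB, keeps a small two-bit counter for each branch and nudges it up or down after every outcome. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Per-branch 2-bit saturating counters, keyed by branch address."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in 0..3

    def predict(self, address: int) -> bool:
        # Counters 2 and 3 mean "predict taken"; unseen branches start at 1.
        return self.counters.get(address, 1) >= 2

    def update(self, address: int, taken: bool) -> None:
        value = self.counters.get(address, 1)
        value = min(value + 1, 3) if taken else max(value - 1, 0)
        self.counters[address] = value

def accuracy(outcomes, address=0x400):
    """Fraction of outcomes predicted correctly for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(address) == taken
        predictor.update(address, taken)
    return correct / len(outcomes)

# A loop branch: taken nine times, then falls through, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))
```

A branch that usually goes the same way, such as the jump at the bottom of a loop, is predicted correctly most of the time after the first few passes.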

The MMX Pentium processor is also more compatible with older 16-bit software than is the Pentium Pro; consequently, it soon doomed the Pro to the backwaters of PC computing. All later versions of the Pentium have incorporated some variation of MMX and improved on it. The original Pentium desktop line ended with the 233-MHz MMX release in June of 1997.

Pentium II

By 1997, multimedia was becoming mainstream, and high performance in a graphical user environment was critical to CPU market success. Intel upped the ante with its competitors in 1997 with a radical redesign. The first 233-MHz, 7.5 million-transistor, Pentium II processor incorporated MMX technology and was packaged with a high-speed cache memory chip. Intel released Pentium II versions operating at speeds of up to 450 MHz.

The Pentium II incorporated the features of its older designs and added a number of enhancements. Among these are:

Multiple Branch Prediction: Predicts program execution through several branches, accelerating the flow of work to the processor.

Data-flow Analysis: Creates an optimized, reordered schedule of instructions by analyzing data dependencies between instructions.

Speculative Execution: Carries out instructions speculatively and, based on this optimized schedule, ensures that the processor's superscalar execution units remain busy, boosting overall performance.

Single-edge connector (SEC) cartridge packaging: Developed by Intel, this enables high-volume availability and offers improved handling protection and a common form factor for future high-performance processors. This development resolved problems caused when pins were accidentally bent during installation or removal of CPUs.

High-performance Dual Independent Bus (DIB) architecture (system bus and cache bus).

System bus that supports multiple outstanding transactions to increase bandwidth availability. It also provides "glueless" support for up to two processors. This enables low-cost, two-way symmetric multiprocessing, providing a significant performance boost for multitasking operating systems and multithreaded applications. Many inexpensive motherboards offer two Slot 1 sockets, making it easy to build a dual processor system for use with operating systems like Windows NT or 2000.

512-KB unified, nonblocking, L2 cache: Improves performance by reducing average memory access time and providing fast access to recently used instructions and data. Performance is enhanced through a dedicated 64-bit cache bus. The speed of the L2 cache scales with the processor core frequency. This processor also incorporates separate 16-KB, L1 caches: one for instructions and one for data.

Models available in 450, 400, and 350 MHz: Support memory caches for up to 4 GB of addressable memory space.

Error correction coding (ECC) functionality on the L2 cache bus: for applications in which data intensity and reliability are essential.

Pipelined floating-point unit (FPU): supports the 32-bit and 64-bit formats specified in IEEE (Institute of Electrical and Electronics Engineers) standard 754, as well as an 80-bit format.

Parity-protected address/request and response system bus signals, with a retry mechanism for high data integrity and reliability.
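The floating-point formats mentioned in the list above differ in how many bits they use, and therefore in how precisely they can store a number. As a small illustration, Python's standard `struct` module can round-trip a value through the 32-bit and 64-bit IEEE 754 formats (the 80-bit format has no `struct` code, so it is not shown). The variable names are ours.

```python
import struct

value = 0.1
# Pack the value into each format and read it back to see what was stored.
as_single = struct.unpack("<f", struct.pack("<f", value))[0]
as_double = struct.unpack("<d", struct.pack("<d", value))[0]

print(struct.calcsize("<f") * 8, "bits:", repr(as_single))
print(struct.calcsize("<d") * 8, "bits:", repr(as_double))
```

The 32-bit result drifts from 0.1 after about seven significant digits, while the 64-bit result matches the value Python started with.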

Variations on a Theme: The Intel Celeron CPUs

As it had in the past, Intel faced competitors who sold CPUs with similar performance at lower prices. Most high-priced desktop computers and servers were sold with a Pentium of one sort or another, but home and entry-level systems were another matter. Enter a variation of the SX concept: the Celeron.

Models available in 500, 466, 433, 400, 366, and 333 MHz have expanded Intel processing into the market for computers selling under $1,200.

All the Intel Celeron processors are available in PGA packages. The versions operating at 433, 400, 366, 333, and 300A MHz are also available in single-edge processor packages.

Key features include:

MMX media enhancement technology.

Dynamic Execution Technology.

A 32-KB (16-KB/16-KB) nonblocking, L1 cache for fast access to heavily used data.

Celerons operating at 500, 466, 433, 400, 366 and 333 MHz include integrated 128-KB L2 cache.

All Celeron processors use the Intel P6 microarchitecture's multitransaction system bus at 66 MHz. Processors at 500, 466, 433, 400, 366 and 333 MHz use the Intel P6 microarchitecture's multitransaction system bus with the addition of the L2 cache interface.

Like the Pentium family, the Celerons offer multiple branch prediction, data-flow analysis, and speculative execution.

Figure 4.12 Intel Pentium II in SEC Package

Xeon, the Premium Pentium

Intel has created a new CPU brand to denote high-end server and high-performance desktop use. First introduced in June 1998, the Xeon line commands a premium price and offers extra performance-enhancing technology. The Pentium II Xeon models incorporate 7.5 million transistors, clock speeds up to 450 MHz, bus speeds of 100 MHz, full-speed L2 caches in varying sizes up to 2 MB, new multiprocessing capabilities, and compatibility with previous Intel microprocessor generations. All models use the SEC package.

Pentium III Processor

The Intel Pentium III processor is the newest member of the P6 family. With 28 million transistors, speeds from 500 to 733 MHz, and system bus speeds of 100 to 133 MHz, it marks a significant jump in PC CPU technology. It employs the same dynamic execution microarchitecture as the PII: a combination of multiple branch prediction, data-flow analysis, and speculative execution. This provides improved performance over older Pentium designs, while maintaining binary compatibility with all previous Intel processors. The Pentium III processor, shown in Figure 4.13, also incorporates MMX technology, plus streaming SIMD extensions for enhanced floating-point and 3-D application performance. It also utilizes multiple low-power states, such as AutoHALT, Stop-Grant, Sleep, and Deep Sleep, to conserve power during idle times.

Figure 4.13 The Intel Pentium III Processor

Intel offers a Xeon version of the Pentium III processor at 550 MHz, aimed at high-performance workstations and servers.

Motorola

Motorola has been the mainstay CPU supplier for Apple computers. The 68000 processor was introduced in 1979 as a 32-bit chip with a 16-bit data path. At that time, the 68000 outperformed the Intel 8086. In 1982, the 68010 arrived, adding virtual memory support and a cache capable of holding three instructions.

1984 saw the advent of the 68020 processor, which was later used in the Macintosh II-series computers. The 68020 was the first full 32-bit chip in the family, with a 32-bit data path, math coprocessor, and the ability to access up to 4 GB of RAM. Although it ran faster than Intel's 80286 processor, it lacked the market share and third-party support to gain real marketplace momentum. PC clones offered more programs, and at lower cost, than the Apple offerings.

The 68030 chip, introduced in 1987, provided increased data and instruction speed. This was comparable to the 80386 chip. The 68040 processor was introduced (in the Macintosh Quadra) as a competitor to the 80486. It has internal caches for data and program code.

The PowerPC processor was developed jointly by IBM, Motorola, and Apple. The "Power" in the name stands for performance optimization with enhanced RISC. The chips in this family of processors are suitable for machines ranging from laptop computers to high-powered network servers. Because PowerPC chips do not execute Intel instructions directly, they can run MS-DOS software only through emulation.

Lesson Summary

The following points summarize the main elements of this lesson:

The microprocessor is the centerpiece of today's computers.

Understanding the development and progression of the processor is essential in understanding how to mix older technology with new technology.

The three key elements that go into measuring a CPU's performance are its speed, address bus, and external data bus.

The development of the 80286 processor introduced the concepts of real and protected modes and allowed the use of up to 16 MB of memory.

The development of the 80386 processor brought about 32-bit processing and allowed up to 4 GB of memory.

The 80486 processor is a souped-up version of the 80386 and introduced the use of cache memory.

The Pentium chip began a new line of processors and technology, incorporating RISC and true multithreading capabilities in an Intel microprocessor for the first time.

Pentium MMX technology was developed to meet the needs of today's multimedia world.

The Intel Pentium III further extended PC performance with advanced cache technology and streamlined code handling.

Several players are currently competing with Intel for the processor market (NexGen, AMD, Cyrix, IBM), but Intel has the largest market share.

Today's standard processor is the Pentium III, with processor speeds of 500 MHz and greater.