Hardware Basics

Regardless of the type of device being driven, there are several basic items that must be known.

How to use the device's control and status registers
What causes the device to generate an interrupt
How the device transfers data
Whether the device uses any dedicated memory
How the device announces its presence when attached
How the device is configured, preferably through software

The following sections discuss each of these topics in a general way.

Device Registers

Drivers communicate with a peripheral by reading and writing registers associated with the device. Each device register generally performs one of the following functions:

Command.
Bits in these registers control the device in some way (perhaps starting or aborting a data transfer, or configuring the device) and are typically written by the driver.
Status.
These registers are typically read by the driver to discover the current state of the device.
Data.
These registers are used to transfer data between device and driver. Output data registers are written by the driver, while input data registers are read by the driver.

Simple devices (like the parallel port interface in Table 2.1) have only a few associated registers, while complex hardware (like a graphics adapter) has many. The number and purpose of the registers are ultimately defined by the hardware designer and should be well documented in the Hardware Design Specification. Often, however, the driver author must resort to trial and error to determine the real behavior of bits in the various device registers. Further, experience shows that "reserved" bits in a register are not necessarily "don't care" bits. It is usually best to mask out these bits when reading, and to force them to zero when writing registers with reserved bits.
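The reserved-bit advice can be sketched in C using the parallel port layout from Table 2.1, where status bits 0 - 1 and control bits 5 - 7 are reserved. The mask and function names are hypothetical, and the helpers operate on plain byte values rather than on real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Reserved-bit masks taken from Table 2.1 (names are hypothetical). */
#define LPT_STATUS_RESERVED  0x03u  /* status bits 0-1 */
#define LPT_CONTROL_RESERVED 0xE0u  /* control bits 5-7 */

/* Mask out reserved bits from a raw status-register value so that
 * whatever the hardware returns there cannot influence driver logic. */
static uint8_t lpt_clean_status(uint8_t raw)
{
    return (uint8_t)(raw & ~LPT_STATUS_RESERVED);
}

/* Force reserved bits to zero before a value is written to the
 * control register. */
static uint8_t lpt_prepare_control(uint8_t value)
{
    uint8_t out = (uint8_t)(value & ~LPT_CONTROL_RESERVED);
    assert((out & LPT_CONTROL_RESERVED) == 0);
    return out;
}
```

For example, a raw status byte of 0xFF is reduced to 0xFC, and a control value of 0xFF is trimmed to 0x1F before it is written.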

Accessing Device Registers

Once the hardware functions are known, there is still the small matter of knowing how to programmatically reference the device's registers. To do this, two additional pieces of information are required.

The address of the device's first register
The address space where the registers live

Usually, device registers occupy consecutive locations. Therefore, the address of the first register is a necessary clue in gaining access to all others. Unfortunately, the term address has varied meanings in a virtual address space on different platforms, so a complete discussion of this topic will have to wait until chapter 8. In general terms, however, device registers are accessed by the CPU in one of two ways: through CPU-specific I/O instructions or through standard memory reference instructions. Figure 2.1 depicts the two methods. Each of these methods is explained briefly in the following sections.

Figure 2.1. CPU access of I/O vs. memory registers.
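Because registers usually occupy consecutive locations, knowing the first register's address is enough to reach the rest. The following sketch assumes a parallel port whose first register is at I/O port 0x378 (the address traditionally used by the first PC parallel port, chosen here only for illustration) and uses the offsets from Table 2.1; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets from Table 2.1. */
enum lpt_register { LPT_DATA = 0, LPT_STATUS = 1, LPT_CONTROL = 2 };

/* Because the registers are consecutive, each one's port address is the
 * base (first register) address plus its offset. */
static uint16_t lpt_register_port(uint16_t base, enum lpt_register reg)
{
    assert(reg <= LPT_CONTROL);
    return (uint16_t)(base + (uint16_t)reg);
}
```

With a base of 0x378, the status register is found at 0x379 and the control register at 0x37A.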

Table 2.1. Parallel Port Interface Registers
Offset	Register	Access	Description
0	Data	R/W	Data byte transferred through parallel port
1	Status	R/O	Current parallel port status
	Bits 0 - 1		Reserved
	Bit 2		0 - interrupt has been requested by port
	Bit 3		0 - an error has occurred
	Bit 4		1 - printer is selected
	Bit 5		1 - printer out of paper
	Bit 6		0 - acknowledge
	Bit 7		0 - printer busy
2	Control	R/W	Commands sent to parallel port
	Bit 0		1 - strobe data to/from parallel port
	Bit 1		1 - automatic line feed
	Bit 2		0 - initialize printer
	Bit 3		1 - select printer
	Bit 4		1 - enable interrupt
	Bits 5 - 7		Reserved
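Several status bits in Table 2.1 are active low (a 0 means the condition is true), which is an easy place to introduce bugs. The sketch below decodes a raw status byte into named flags. The struct, macros, and the definition of "ready" used by `lpt_ready` are assumptions made for illustration, not part of the table; the reserved bits 0 - 1 are simply never examined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bit positions from Table 2.1. _N marks active-low bits. */
#define LPT_ST_IRQ_N     (1u << 2)  /* 0 = interrupt requested */
#define LPT_ST_ERROR_N   (1u << 3)  /* 0 = error occurred */
#define LPT_ST_SELECTED  (1u << 4)  /* 1 = printer selected */
#define LPT_ST_PAPER_OUT (1u << 5)  /* 1 = out of paper */
#define LPT_ST_ACK_N     (1u << 6)  /* 0 = acknowledge */
#define LPT_ST_BUSY_N    (1u << 7)  /* 0 = printer busy */

struct lpt_status {
    bool irq_requested, error, selected, paper_out, ack, busy;
};

static struct lpt_status lpt_decode_status(uint8_t raw)
{
    struct lpt_status s;
    s.irq_requested = (raw & LPT_ST_IRQ_N) == 0;   /* active low */
    s.error         = (raw & LPT_ST_ERROR_N) == 0; /* active low */
    s.selected      = (raw & LPT_ST_SELECTED) != 0;
    s.paper_out     = (raw & LPT_ST_PAPER_OUT) != 0;
    s.ack           = (raw & LPT_ST_ACK_N) == 0;   /* active low */
    s.busy          = (raw & LPT_ST_BUSY_N) == 0;  /* active low */
    return s;
}

/* Assumed readiness rule: selected, not busy, no error, paper present. */
static bool lpt_ready(uint8_t raw)
{
    struct lpt_status s = lpt_decode_status(raw);
    return s.selected && !s.busy && !s.error && !s.paper_out;
}
```

Under these rules a status byte of 0xDC reports a ready printer, while 0x5C (bit 7 cleared) reports it busy.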

I/O SPACE REGISTERS

Some CPU architectures (notably Intel x86) reference device registers using I/O machine instructions. These special instructions reference a specific set of pins on the CPU and therefore define a separate bus and address space for I/O devices. Addresses on this bus are sometimes known as ports and are completely separate from any memory address. On the Intel x86 architecture, the I/O address space is 64 KB in size (16 bits), and the assembly language defines two instructions for reading and writing ports in this space: IN and OUT.

Of course, as discussed in the first chapter, driver code should be platform-independent, so references to the actual IN and OUT instructions should be avoided. Instead, one of the HAL macros listed in Table 2.2 should be used. (In these macro names, XXX stands for the access size: UCHAR, USHORT, or ULONG.)

MEMORY-MAPPED REGISTERS

Not all CPU architects see the need for a separate I/O address space, in which case device registers are mapped directly into the memory space of the CPU. Motorola processors are one such example. Similarly, it is possible (and common) to design hardware devices that interface to the memory address and data buses of a CPU even when that CPU supports separate I/O space. In some cases, devices (e.g., a video adapter) will touch both I/O and memory space.

Devices that expose large data buffers usually map into memory space. This allows fast and convenient access from high-level languages such as C. The simple and familiar dereferencing of a pointer permits direct access to a device's buffer.
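As a user-mode sketch of that pointer access, the function below copies bytes into a buffer through a `volatile` pointer, so the compiler cannot drop or merge the stores. The function name is hypothetical, and an ordinary array stands in for the device. In a real Windows 2000 driver the pointer would come from mapping the device's physical range (for example with MmMapIoSpace), and the HAL macros discussed next (such as WRITE_REGISTER_BUFFER_UCHAR) would normally be used instead of a raw loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy bytes into a device buffer that has been mapped into the CPU's
 * memory space. The volatile qualifier prevents the compiler from
 * caching, merging, or eliminating these stores. */
static void copy_to_device(volatile uint8_t *dev_buf, const uint8_t *src, size_t len)
{
    assert(dev_buf != NULL && src != NULL);
    for (size_t i = 0; i < len; i++)
        dev_buf[i] = src[i];   /* plain pointer dereference */
}
```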

As before, the HAL provides a set of macros for accessing memory-mapped device registers and these are listed in Table 2.3. Since these macros differ from the I/O space HAL macros, a device that must be supported on two different platforms (with different register access techniques) must be cleverly written. It is common to write a driver-specific macro that points to one of two HAL macros, depending on the presence of a unique compiler symbol. Techniques listed later in this book describe this process more fully.
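One way to write such a driver-specific macro is sketched below. To keep the example runnable outside the kernel, two hypothetical stand-in functions read from plain arrays. In a real driver the two branches would name READ_PORT_UCHAR and READ_REGISTER_UCHAR, which take pointer-typed addresses rather than the integers used here. The build symbol `DEV_USES_MMIO` is also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* User-mode stand-ins for the two HAL macros (hypothetical). In a real
 * driver these would be READ_PORT_UCHAR and READ_REGISTER_UCHAR. */
static uint8_t sim_io_space[0x400];    /* pretend I/O port space */
static uint8_t sim_mem_space[0x400];   /* pretend memory-mapped space */

static uint8_t sim_read_port_uchar(uint16_t port)
{
    assert(port < sizeof sim_io_space);
    return sim_io_space[port];
}

static uint8_t sim_read_register_uchar(uint16_t offset)
{
    assert(offset < sizeof sim_mem_space);
    return sim_mem_space[offset];
}

/* Driver-specific access macro. DEV_USES_MMIO is a hypothetical build
 * symbol defined only when compiling for the memory-mapped platform. */
#ifdef DEV_USES_MMIO
#define DEV_READ_UCHAR(addr) sim_read_register_uchar(addr)
#else
#define DEV_READ_UCHAR(addr) sim_read_port_uchar(addr)
#endif
```

The rest of the driver calls only `DEV_READ_UCHAR`, so switching platforms means changing a compiler flag rather than editing every register access.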

Table 2.2. HAL Macros to Access Ports in I/O Space
Function	Description
READ_PORT_XXX	Read a single value from an I/O port
WRITE_PORT_XXX	Write a single value to an I/O port
READ_PORT_BUFFER_XXX	Read an array of values from consecutive I/O ports
WRITE_PORT_BUFFER_XXX	Write an array of values to consecutive I/O ports

Table 2.3. HAL Memory-Mapped Register Macros
Function	Description
READ_REGISTER_XXX	Read a single value from an I/O register
WRITE_REGISTER_XXX	Write a single value to an I/O register
READ_REGISTER_BUFFER_XXX	Read an array of values from consecutive I/O registers
WRITE_REGISTER_BUFFER_XXX	Write an array of values to consecutive I/O registers

Device Interrupts

Since devices typically perform their hardware actions in parallel with, and asynchronously to, normal CPU operation, it is common for a device to generate an interrupt when it requires the driver's attention. Different CPUs have different mechanisms for being interrupted, but there is always at least one pin that a device can drive, or yank, when service is needed. It is then the responsibility of the CPU to save the state and context of the currently running code path before jumping into a driver-supplied Interrupt Service Routine.

Devices generate interrupts at strategic points in time, including:

When the device has completed a previously requested operation and is now ready for an additional request.
When a buffer or FIFO of the device is almost full (during input) or almost empty (during output). This interrupt allows the driver an opportunity to empty (input) or refill (output) the buffer to keep the device operating without pause.
When the device encounters an error condition during an operation. This is really just a special form of a completed operation.

Devices which do not generate interrupts can cause serious system performance degradation. Since the CPU is shared among many running threads on Windows 2000, it is not acceptable to allow a driver to steal precious cycles just waiting in a tight loop for a device operation to complete. Later chapters present some techniques that can be used when working with noninterrupting devices.

With the complex world of PC hardware, buses connect to other buses through an interface, or bridge. As a result, the source of an interrupt (e.g., a device) is often several hardware layers away from the CPU. The techniques for prioritization and signaling are therefore distorted along the real path to the CPU. Nevertheless, interrupts can be characterized as having several properties.

INTERRUPT PRIORITY

When several devices need attention at the same time, there needs to be a mechanism to describe which device is serviced first. Presumably the most important device or the device that can least afford to wait is given the highest priority. If a device can wait, it is assigned a lower interrupt priority. The assignment of an interrupt priority to a device is a configuration option. Hopefully, this priority can be assigned by software during device initialization.

Interrupt priority means that while the CPU is servicing a lower priority device (i.e., executing its Interrupt Service Routine), a higher priority device can still interrupt. In such a case, the CPU has taken, or accepted, two interrupts, the second on top of the first. Conversely, if a higher priority device is being serviced, lower priority interrupts are held off (and presumably not lost) until the higher priority interrupt service is completed and dismissed.
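The nesting and hold-off rules can be modeled in a few lines of C. This is a simplified sketch under stated assumptions: priority levels are small integers where a larger number means higher priority, level 0 is ordinary code, and all names are hypothetical; it does not reflect any particular interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LEVELS 16   /* priority levels 1..15; 0 = normal code */

struct cpu_model {
    int current_level;          /* level of the code now running */
    int saved[NUM_LEVELS];      /* levels of interrupted contexts */
    int depth;                  /* number of nested interrupts */
    bool pending[NUM_LEVELS];   /* held-off requests, by level */
};

/* A device at 'level' requests service. Accepted (nested on top of the
 * current context) only if it outranks the running code; otherwise it
 * is recorded as pending rather than lost. */
static bool request_interrupt(struct cpu_model *cpu, int level)
{
    assert(level > 0 && level < NUM_LEVELS);
    if (level <= cpu->current_level) {
        cpu->pending[level] = true;
        return false;
    }
    cpu->saved[cpu->depth++] = cpu->current_level;
    cpu->current_level = level;
    return true;
}

/* The running ISR dismisses its interrupt. The previous level is restored
 * and the highest pending request that now outranks it is accepted.
 * Returns that request's level, or 0 if none. */
static int dismiss_interrupt(struct cpu_model *cpu)
{
    assert(cpu->depth > 0);
    cpu->current_level = cpu->saved[--cpu->depth];
    for (int lvl = NUM_LEVELS - 1; lvl > cpu->current_level; lvl--) {
        if (cpu->pending[lvl]) {
            cpu->pending[lvl] = false;
            request_interrupt(cpu, lvl);
            return lvl;
        }
    }
    return 0;
}
```

In this model, a level 5 interrupt nests on top of a level 3 service routine, while a level 4 request arriving during the level 5 routine is held off and accepted as soon as the level 5 routine is dismissed.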

INTERRUPT VECTORS

Some devices and/or CPU architectures allow an interrupt to automatically dispatch (i.e., jump) to a software-defined function for servicing of the interrupt. Without interrupt vector capability, a common interrupt service routine must be supplied for all interrupt types. This common routine would then have to poll through a list of possible interrupting devices (in priority order) to determine the actual device requiring service. Since real systems handle tens to hundreds of interrupts per second, vectoring of interrupts can be considerably more efficient.
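The difference between vectored dispatch and polling can be sketched with a table of function pointers: the vector number indexes directly to the right handler, so no list of devices has to be walked. The table layout and function names below are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 16

typedef void (*isr_fn)(void *context);

/* Hypothetical vector table: one handler (plus its context) per vector. */
static struct {
    isr_fn isr;
    void *context;
} vector_table[NUM_VECTORS];

/* Install a handler for a vector. */
static void connect_vector(unsigned vector, isr_fn isr, void *context)
{
    assert(vector < NUM_VECTORS);
    vector_table[vector].isr = isr;
    vector_table[vector].context = context;
}

/* Vectored dispatch: the vector number indexes straight to the handler,
 * with no polling of other devices. Returns 0 for an unclaimed vector. */
static int dispatch_vector(unsigned vector)
{
    if (vector >= NUM_VECTORS || vector_table[vector].isr == NULL)
        return 0;
    vector_table[vector].isr(vector_table[vector].context);
    return 1;
}

/* Sample handler for the usage example: counts its invocations. */
static void counting_isr(void *context)
{
    ++*(int *)context;
}
```

A common, non-vectored routine would instead have to read each candidate device's status register in priority order on every interrupt, which is the per-interrupt cost that vectoring avoids.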

SIGNALING MECHANISMS

There are two basic strategies that devices use when generating an interrupt. An older, less desirable mechanism is known as edge-triggered or latched interrupts. Devices which generate edge-triggered interrupts signal their need for service by producing a transition on a hardware line, perhaps from 1 to 0. Once the transition has been generated, the device might release the line, restoring it to a logical 1 level. In other words, the interrupt line is pulsed by the device and it is the responsibility of the CPU to notice the pulse when it occurs.

Latched interrupts are subject to false signaling, since noise on the interrupt line may look like a pulse to the CPU. Much worse, however, is the problem that occurs when two devices attempt to share a single edge-triggered line. If the two devices signal simultaneously, the CPU recognizes only a single interrupt. Since the pulse occurs at an instant in time, the fact that two (instead of one) devices needed service is forever lost.

The classic example of lost edge-triggered interrupts occurred with old serial COM ports. Traditionally, COM1 and COM3 shared a single edge-triggered x86 interrupt, IRQ4. As a result, both ports could not be used simultaneously with interrupt-driven software. Attempts to use a mouse on COM1 with a modem on COM3 invariably led to a frozen situation for either the mouse or modem driver, which remained waiting for the lost interrupt to occur.

Such limitations do not occur when working with a level-sensitive (or level-triggered) signaling mechanism. Devices using this technique signal their intent to interrupt by keeping a hardware line driven until their need is met. The CPU can detect an interrupt at any time since the line remains yanked until serviced. Thus, two or more devices can safely share a level-sensitive interrupt. When two interrupts occur simultaneously, the higher priority device can be safely serviced, knowing that the other device is continuing to signal its intentions by continually driving the line.
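The lost-interrupt problem can be reproduced with a small simulation of two devices sharing one line. The model rests on a simplifying assumption: each interrupt the CPU takes services exactly one device (the highest-priority requester), which then stops driving the line. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 2   /* two devices sharing one interrupt line */

/* The line is active while any device is driving it. */
static bool line_active(const bool req[NDEV])
{
    for (int i = 0; i < NDEV; i++)
        if (req[i])
            return true;
    return false;
}

/* One interrupt taken: service the highest-priority (lowest-index)
 * requester, which then stops driving the line. */
static void take_interrupt(bool req[NDEV])
{
    for (int i = 0; i < NDEV; i++)
        if (req[i]) {
            req[i] = false;
            return;
        }
}

static int unserviced(const bool req[NDEV])
{
    int n = 0;
    for (int i = 0; i < NDEV; i++)
        n += req[i];
    return n;
}

/* Edge-triggered: simultaneous requests form a single transition, so the
 * CPU takes exactly one interrupt. Returns devices still waiting. */
static int run_edge_triggered(bool req[NDEV])
{
    if (line_active(req))
        take_interrupt(req);
    return unserviced(req);
}

/* Level-sensitive: the CPU keeps taking interrupts for as long as any
 * device holds the line. Returns devices still waiting. */
static int run_level_sensitive(bool req[NDEV])
{
    while (line_active(req))
        take_interrupt(req);
    return unserviced(req);
}
```

When both devices request service at the same instant, the edge-triggered run leaves the second device waiting forever (the COM1/COM3 situation), while the level-sensitive run services both.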

PROCESSOR AFFINITY

When a hardware system includes more than one processor, an issue of how interrupts are handled is raised. Is the device's interrupt line wired to only one CPU or to all? Usually, a special piece of hardware exists to allow for a driver's configuration and distribution of the interrupt signal. If a particular CPU can service a device's interrupt, those interrupts are said to have affinity to that CPU. Forcing interrupts to a specific CPU might be used as an attempt to control device load balancing among several CPUs and devices.
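Affinity is commonly expressed as a bitmask with one bit per processor; Windows uses a mask of this kind (the KAFFINITY type). The sketch below is hypothetical: the type and function names are invented, and the "lowest-numbered permitted CPU" policy is just one simple choice, not how any particular interrupt controller distributes interrupts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical affinity mask: bit n set means CPU n may take the interrupt. */
typedef uint32_t affinity_mask;

static affinity_mask affinity_of(unsigned cpu)
{
    assert(cpu < 32);
    return (affinity_mask)1u << cpu;
}

/* Choose the lowest-numbered CPU that is both permitted by the device's
 * affinity and currently online; -1 if there is none. */
static int choose_cpu(affinity_mask allowed, affinity_mask online)
{
    affinity_mask usable = allowed & online;
    for (int cpu = 0; cpu < 32; cpu++)
        if (usable & ((affinity_mask)1u << cpu))
            return cpu;
    return -1;
}
```

Restricting a busy device to a mask such as `affinity_of(2)` forces all of its interrupts onto CPU 2, which is the kind of manual load balancing described above.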

Data Transfer Mechanisms

There are three basic mechanisms that a device may use to move data to or from the CPU or memory.

Programmed I/O
Direct memory access (DMA)
Shared buffers

The transfer mechanism selected by a hardware designer is largely dictated by the device's speed and the average size of data transfer. Of course, a device may choose to use more than one mechanism to transfer its data.

The following sections describe the differences between the three techniques.

PROGRAMMED I/O

Programmed I/O (PIO) devices transfer data directly through data registers of the device. Driver code must issue an I/O instruction to read or write the data register for each byte of data. Software buffer addresses and byte counts must be kept as state of the driver for larger transfers.

Since the actual device transfer rate is probably much slower than the time required by the CPU to write or read a data register, a PIO device typically interrupts once for each byte (or word) of data transferred. Serial COM ports are an example of PIO devices. Better hardware includes a FIFO in front of the real hardware, thus allowing one interrupt for every 4 or 16 bytes transferred. Still, the ratio of interrupts to bytes transferred remains high for PIO devices, and the technique is suitable only for slow devices.
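The driver state and per-byte register writes of a PIO transfer can be sketched as follows, assuming a device with a 16-byte transmit FIFO that interrupts whenever the FIFO empties. The data register is a hypothetical stand-in variable, and for simplicity every refill, including the first, is counted as an interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16   /* assumed transmit FIFO size */

/* State a driver must keep across interrupts for a PIO transfer. */
struct pio_transfer {
    const uint8_t *next;   /* next byte to hand to the device */
    size_t remaining;      /* bytes not yet written */
    unsigned interrupts;   /* refill interrupts taken (for illustration) */
};

/* Hypothetical stand-in for the device's output data register. */
static uint8_t sim_data_register;

static void write_data_register(uint8_t value)
{
    sim_data_register = value;
}

/* Handler for a "FIFO empty" interrupt: one data-register write per byte,
 * up to FIFO_DEPTH bytes. Returns nonzero once the transfer is finished. */
static int pio_isr(struct pio_transfer *t)
{
    assert(t->next != NULL || t->remaining == 0);
    t->interrupts++;
    size_t n = t->remaining < FIFO_DEPTH ? t->remaining : FIFO_DEPTH;
    for (size_t i = 0; i < n; i++)
        write_data_register(*t->next++);
    t->remaining -= n;
    return t->remaining == 0;
}
```

Sending 100 bytes this way takes 7 interrupts (six full refills of 16 bytes plus one of 4), compared with 100 interrupts for a device without a FIFO.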

Clever software design techniques can minimize the performance impact of PIO devices. Such techniques are discussed in chapter 8.

DIRECT MEMORY ACCESS

Direct memory access (DMA) devices take advantage of a secondary processor called a DMA controller (DMAC). A DMAC is a very limited auxiliary processor with just enough intelligence (and state) to transfer a specified number of bytes between a device and memory. The DMAC operates in parallel with the main CPU(s), and its operations typically have little effect on overall system performance.

To initiate an I/O operation, the driver must set up or program the DMAC by supplying a starting buffer address for the transfer along with a byte transfer count. When the order to start is given by the driver, the DMAC operates without further software intervention, moving bytes between device and system RAM. When the DMAC completes the entire transfer, an interrupt is generated. Thus, driver code executes only at the beginning of a transfer and at the completion of a transfer, freeing the CPU to perform other tasks.
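The two driver steps (program the address and count, then give the order to start) can be mimicked with a hypothetical DMAC model. Here `memcpy` plays the role of the hardware, and a plain pointer stands in for the bus or physical address that a real controller would be programmed with. None of these names correspond to a real controller interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical DMAC model. Real controllers are programmed with bus or
 * physical addresses through their own registers; a plain pointer
 * stands in for that here. */
struct dmac {
    uint8_t *buffer;      /* starting buffer address */
    size_t count;         /* byte transfer count */
    unsigned interrupts;  /* completion interrupts raised */
};

/* Driver step 1: program the controller with address and count. */
static void dmac_program(struct dmac *d, uint8_t *buffer, size_t count)
{
    assert(buffer != NULL && count > 0);
    d->buffer = buffer;
    d->count = count;
}

/* Driver step 2: give the order to start. The "hardware" then moves the
 * whole block (device to memory here) with no per-byte driver code and
 * raises a single completion interrupt. */
static void dmac_start(struct dmac *d, const uint8_t *device_data)
{
    memcpy(d->buffer, device_data, d->count);
    d->count = 0;
    d->interrupts++;
}
```

A 512-byte transfer in this model raises one interrupt; a PIO device with a 16-byte FIFO would need 32 interrupts (512 / 16) to move the same data.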

High-speed devices that routinely need to transfer large blocks of data are well suited to DMA. Interrupt overhead and driver activity are significantly reduced compared to PIO operation. Disks, multimedia devices, and network cards are all examples of DMA devices.

It should be pointed out that the actual DMA transfer is not really transparent to other system operation. The DMAC secondary processor competes for memory bandwidth with the CPU(s) of the system. If the CPU is referencing main memory frequently, either the CPU or the DMAC must be held off while the previous memory cycle completes. Of course, with today's large CPU cache sizes, a CPU seldom places massive demand on memory bandwidth. A system with a large number of bus master DMA devices, however, may find that memory bandwidth is saturated as the devices compete with each other during simultaneous transfers.

DMA Mechanisms

Chapter 12 covers the details of DMA transfer, but to complete the overview of DMA now, there are two general types of DMA operation.

SYSTEM DMA

The original PC specification by IBM (and subsequent standards) included a mainboard (a.k.a. motherboard) with a set of shared DMACs. Each DMAC is known as a DMA channel, and a given device can be configured to use one (or more) of the available channels. There were originally four channels, which expanded to seven with the introduction of the AT. System DMA is also known as slave DMA.

The advantage of using system DMA is that the amount of DMA hardware logic on a device is reduced. The disadvantage is that when devices share a channel, only one at a time may actually participate in a DMA transfer. At any given time, the DMA channel is "owned" by a single device; others attempting to use the channel must wait their turn until the first device relinquishes ownership. This sharing arrangement would not work well for two high-speed, busy devices. The floppy controller in most PCs is an example of slave DMA operation.
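Channel ownership behaves much like a simple lock. The sketch below is a hypothetical, single-threaded model with invented names; it is not how Windows 2000 actually arbitrates system DMA channels, which later chapters cover.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical single-threaded model of one shared system DMA channel. */
struct dma_channel {
    int owner;   /* id of the owning device, or NO_OWNER */
};

/* Returns false if another device currently owns the channel. */
static bool channel_acquire(struct dma_channel *ch, int device)
{
    if (ch->owner != NO_OWNER && ch->owner != device)
        return false;
    ch->owner = device;
    return true;
}

/* Only the owner may relinquish the channel. */
static void channel_release(struct dma_channel *ch, int device)
{
    assert(ch->owner == device);
    ch->owner = NO_OWNER;
}
```

While device 1 holds the channel, device 2's request fails and must be retried once device 1 releases it, which is why two busy devices on one channel throttle each other.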

BUS MASTER DMA

More complicated devices that do not wish to share DMAC hardware include their own customized DMA hardware. Because the hardware to perform DMA is on-board the controller itself, ownership is always guaranteed and transfers occur at will. SCSI controllers are often DMA bus masters.

Device-Dedicated Memory

The third type of data transfer mechanism that a device may use is shared memory. There are two general reasons why a device may wish to borrow (or own) system memory address space.

RAM or ROM might be a resource that is device-resident. For convenient, high-speed access by the driver code, it might make sense to map a view of the device's memory into CPU memory space. As an example, the device might contain a ROM with startup code and data. In order for the CPU to execute this code, it first must be mapped into the visible address space of the CPU.

The device may contain a high-speed specialized processor that relies on system memory for its buffer needs. A video capture card, for example, might use system memory to record the video image being streamed into it. Note that this second reason for borrowed address space is really a kind of DMA operation. In this case, the secondary processor is more intelligent and capable of more complex operations than a simple DMAC.

Devices generally take one of two approaches to deal with dedicated memory. Some specify a hard-coded range of physical addresses for their use. A VGA video adapter card, for example, specifies a 128 KB range of addresses beginning at 0xA0000 for its video buffer.
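The arithmetic behind a hard-coded range is simple but easy to get wrong by one. This sketch represents a range as a base plus a length; the struct and function names are hypothetical. A 128 KB range starting at 0xA0000 ends at 0xBFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A physical address range covering [base, base + length). */
struct phys_range {
    uint32_t base;
    uint32_t length;
};

/* The VGA video buffer described in the text: 128 KB at 0xA0000. */
static const struct phys_range vga_buffer = { 0xA0000u, 128u * 1024u };

/* Last address occupied by a (non-empty) range. */
static uint32_t range_last(struct phys_range r)
{
    assert(r.length > 0);
    return r.base + r.length - 1;
}

static bool range_contains(struct phys_range r, uint32_t addr)
{
    return addr >= r.base && addr - r.base < r.length;
}
```

For example, 0xB8000 (where the color text-mode buffer conventionally lives) falls inside the VGA range, while 0xC0000 is the first address past it.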

Other devices allow an initialization routine to specify the base address of the dedicated memory with software. This latter technique is more flexible, though Windows 2000 works with either method.

Auto-recognition and Auto-configuration

Every hardware device consumes PC resources. These resources consist of an I/O address range, an IRQ, a DMA channel, and perhaps a range of dedicated memory addresses. Since different devices are made at different times by different vendors, the possibility of resource conflicts is high; inevitable, in fact.

The first PCs required that an intelligent owner configure each device by setting jumpers, or DIP switches, to assign unique resources to each card within a system. The installation of a new device required knowledge of what system resources were already assigned to existing devices. Errors in this manual configuration process were common, with the net result being an unbootable system, a system with intermittent crashes, or unusable devices.

New bus architectures have been introduced that deal directly with the problem of automatic recognition and configuration. Auto-recognition is necessary so that new devices added to a system report their presence. This could happen at boot/reset time or, better yet, as soon as the new hardware is inserted. Buses and hardware that support hot-pluggability allow software to safely add and remove hardware without a reboot.

Auto-configuration allows software to assign available resources to software-configurable hardware. This feature allows naive users to install new hardware without first setting jumpers. The operating system ultimately remains in charge of assignment of resources to various devices, whether installed prior to or post-boot.

Considerable effort is spent in subsequent chapters describing the protocol Windows 2000 uses to support auto-recognition and auto-configuration. It should be apparent, however, that regardless of the device and bus type, a well-behaved device must support several features.

DEVICE RESOURCE LISTS

At a minimum, a device must identify itself and provide the system with a list of resources that it consumes. The list should include:

Manufacturer ID
Device type ID
I/O space requirements
Interrupt requirements
DMA channel requirements
Device memory requirements
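A resource list like this lets software detect conflicts before they cause trouble. The struct below is a hypothetical layout with invented field names, not a real bus or Windows structure. Its conflict check deliberately treats any shared IRQ as a conflict, which is stricter than necessary for level-sensitive lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout of a device resource list; field names are
 * illustrative, not a real bus or Windows structure. */
struct resource_list {
    uint16_t manufacturer_id;
    uint16_t device_type_id;
    uint16_t io_base, io_length;      /* I/O space requirement */
    int irq;                          /* -1 if none */
    int dma_channel;                  /* -1 if none */
    uint32_t mem_base, mem_length;    /* device memory; length 0 if none */
};

static bool ranges_overlap(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len)
{
    return a_len != 0 && b_len != 0 && a < b + b_len && b < a + a_len;
}

/* True if two devices claim any of the same resources. Any shared IRQ is
 * treated as a conflict, which is stricter than a level-sensitive line needs. */
static bool resources_conflict(const struct resource_list *x,
                               const struct resource_list *y)
{
    if (ranges_overlap(x->io_base, x->io_length, y->io_base, y->io_length))
        return true;
    if (x->irq >= 0 && x->irq == y->irq)
        return true;
    if (x->dma_channel >= 0 && x->dma_channel == y->dma_channel)
        return true;
    return ranges_overlap(x->mem_base, x->mem_length, y->mem_base, y->mem_length);
}
```

With assumed settings modeled on the COM port example earlier (COM1 at 0x3F8 and COM3 at 0x3E8, both on IRQ4), the I/O ranges do not overlap, but the shared IRQ is flagged as a conflict.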

NO JUMPERS OR SWITCHES

To support auto-configuration, the device must allow authorized software to dynamically set and change port, interrupt, and DMA channel assignments. This permits Windows 2000 to arbitrate resource conflicts among competing devices.

CHANGE NOTIFICATION

The device, in conjunction with the bus to which it attaches, must generate a notification signal whenever the device is inserted or removed. Without this feature, it is not possible to support hot-pluggability or auto-recognition.