The Windows 2000 I/O system consists of several executive components as well as device drivers, which are shown in Figure 9-1.
Figure 9-1 I/O system components
Most I/O operations don't involve all the components just described. A typical I/O request starts with an application executing an I/O-related function (for example, reading data from a device) that is processed by the I/O manager, one or more device drivers, and the HAL.
In Windows 2000, threads perform I/O on virtual files. The operating system abstracts all I/O requests as operations on a virtual file, hiding the fact that the target of an I/O operation might not be a file-structured device. This abstraction generalizes an application's interface to devices. A virtual file refers to any source or destination for I/O that is treated as if it were a file (such as files, directories, pipes, and mailslots). All data that is read or written is regarded as a simple stream of bytes directed to these virtual files. User-mode applications (whether Win32, POSIX, or OS/2) call documented functions, which in turn call internal I/O system functions to read from a file, write to a file, and perform other operations. The I/O manager dynamically directs these virtual file requests to the appropriate device driver. Figure 9-2 illustrates the basic structure of a typical I/O request flow.
Figure 9-2 The flow of a typical I/O request
In the following sections, we'll look at these components more closely, examining the I/O manager in more detail and covering the various types of device drivers and the key I/O system data structures. Then we'll cover the operation and roles of the PnP manager and the power manager.
The I/O manager defines the orderly framework, or model, within which I/O requests are delivered to device drivers. The I/O system is packet driven. Most I/O requests are represented by an I/O request packet (IRP), which travels from one I/O system component to another. (As you'll discover in the section "Fast I/O," fast I/O is the exception; it doesn't use IRPs.) The design allows an individual application thread to manage multiple I/O requests concurrently. An IRP is a data structure that contains information completely describing an I/O request. (You'll find more information about IRPs in the section "I/O Request Packets" later in this chapter.)
The I/O manager creates an IRP that represents an I/O operation, passes a pointer to the IRP to the correct driver, and disposes of the packet when the I/O operation is complete. In contrast, a driver receives an IRP, performs the operation the IRP specifies, and passes the IRP back to the I/O manager, either for completion or to be passed on to another driver for further processing.
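The division of labor just described can be sketched in a few lines of Python. This is only a toy model, not the kernel's data structures: the Irp, EchoDriver, and IoManager classes, their fields, and the live_irps counter are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Irp:
    """Hypothetical stand-in for an I/O request packet."""
    operation: str                  # for example "read" or "write"
    length: int                     # number of bytes requested
    status: str = "pending"
    trace: list = field(default_factory=list)  # drivers that handled the packet


class EchoDriver:
    """Toy driver: performs the operation and hands the IRP back."""
    name = "echo"

    def dispatch(self, irp):
        irp.trace.append(self.name)
        irp.status = "success"
        return irp


class IoManager:
    """Toy I/O manager: owns the lifetime of every IRP."""

    def __init__(self):
        self.live_irps = 0          # IRPs created but not yet disposed of

    def request(self, driver, operation, length):
        irp = Irp(operation, length)            # 1. create the packet
        self.live_irps += 1
        driver.dispatch(irp)                    # 2. pass it to the driver
        result = (irp.status, tuple(irp.trace))
        self.live_irps -= 1                     # 3. dispose of it on completion
        return result
```

Calling IoManager().request(EchoDriver(), "read", 512) returns the completion status together with the list of drivers that touched the packet. The driver never creates or frees the packet itself, which mirrors the ownership rule in the text.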
In addition to creating and disposing of IRPs, the I/O manager supplies code that is common to different drivers and that the drivers call to carry out their I/O processing. Consolidating common tasks in the I/O manager makes individual drivers simpler and more compact. For example, the I/O manager provides a function that allows one driver to call other drivers. It also manages buffers for I/O requests, provides timeout support for drivers, and records which installable file systems are loaded into the operating system. There are close to a hundred different routines in the I/O manager that can be called by device drivers.
The I/O manager also provides flexible I/O services that allow environment subsystems, such as Win32 and POSIX, to implement their respective I/O functions. These services include sophisticated services for asynchronous I/O that allow developers to build scalable high-performance server applications.
The uniform, modular interface that drivers present allows the I/O manager to call any driver without requiring any special knowledge of its structure or internal details. As we stated earlier, the operating system treats all I/O requests as if they were directed at a file; the driver converts the requests from requests made to a virtual file to hardware-specific requests. Drivers can also call each other (using the I/O manager) to achieve layered, independent processing of an I/O request.
Besides the normal open, close, read, and write functions, the Windows 2000 I/O system provides several advanced features, such as asynchronous, direct, buffered, and scatter/gather I/O, which are described in the "Types of I/O" section later in this chapter.
To integrate with the I/O manager and other I/O system components, a device driver must conform to implementation guidelines specific to the type of device it manages and the role it plays in managing the device. In this section, we'll look at the types of device drivers Windows 2000 supports as well as the internal structure of a device driver.
Windows 2000 supports a wide range of different device driver types and programming environments. Even within a type of device driver, programming environments can differ, depending on the specific type of device for which a driver is intended. In this chapter, the focus is on kernel-mode device drivers. There are many different types of kernel-mode drivers, which can be divided into the following broad categories:
In WDM, no one driver is responsible for controlling all aspects of a particular device. The bus driver is responsible for detecting bus membership changes (device addition or removal), assisting the PnP manager in enumerating the devices on the bus, accessing bus-specific configuration registers, and in some cases, controlling power to devices on the bus. The function driver is generally the only driver that accesses the device's hardware.
The role of the HAL in Windows 2000 differs from the role it had in Windows NT. Prior to Windows 2000, third-party hardware vendors that wanted to add support for hardware buses not natively supported had to implement a custom HAL. Windows 2000 allows third parties to implement a bus driver to provide support for hardware buses not natively supported.
In addition to the above device driver types, Windows 2000 also supports several types of user-mode drivers:
Support for an individual piece of hardware is often divided among several drivers, each providing a part of the functionality required to make the device work properly. In addition to WDM bus drivers, function drivers, and filter drivers, hardware support might be split between the following components:
An example will help demonstrate how these device drivers work. A file system driver accepts a request to write data to a certain location within a particular file. It translates the request into a request to write a certain number of bytes to the disk at a particular "logical" location. It then passes this request (via the I/O manager) to a simple disk driver. The disk driver, in turn, translates the request into a physical location (cylinder/track/sector) on the disk and manipulates the disk heads to write the data. This layering is illustrated in Figure 9-3.
This figure illustrates the division of labor between two layered drivers. The I/O manager receives a write request that is relative to the beginning of a particular file. The I/O manager passes the request to the file system driver, which translates the write operation from a file-relative operation to a starting location (a sector boundary on the disk) and a number of bytes to write. The file system driver calls the I/O manager to pass the request to the disk driver, which translates the request to a physical disk location and transfers the data.
Figure 9-3 Layering of a file system driver and a disk driver
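The two translation steps can be sketched as a pair of functions. The sketch rests on several assumptions that aren't in the text: the file occupies contiguous sectors, sectors are 512 bytes, and the disk has 16 heads and 63 sectors per track. The function names are hypothetical, and a real file system would consult its on-disk allocation structures instead of a single starting sector.

```python
SECTOR_SIZE = 512          # assumed bytes per sector
HEADS = 16                 # assumed heads per cylinder
SECTORS_PER_TRACK = 63     # assumed sectors per track


def file_system_translate(file_start_sector, file_offset, length):
    """File-relative request -> (logical sector, byte count) on the disk."""
    if file_offset % SECTOR_SIZE:
        raise ValueError("sketch only handles sector-aligned offsets")
    return file_start_sector + file_offset // SECTOR_SIZE, length


def disk_translate(logical_sector):
    """Logical sector -> (cylinder, head, sector); sectors count from 1."""
    cylinder, rest = divmod(logical_sector, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector + 1


def write_to_file(file_start_sector, file_offset, length):
    """Chain the two layers the way the I/O manager chains the drivers."""
    logical_sector, byte_count = file_system_translate(
        file_start_sector, file_offset, length)
    return disk_translate(logical_sector), byte_count
```

Neither function needs to know how the other works. Each layer accepts a request in its own terms and emits one in the terms of the layer below.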
Because all drivers (both device drivers and file system drivers) present the same framework to the operating system, another driver can easily be inserted into the hierarchy without altering the existing drivers or the I/O system. For example, several disks can be made to seem like a very large single disk by adding a driver. Such a driver exists in Windows 2000 to provide fault tolerant disk support. (Whereas the driver is present on all versions of Windows 2000, fault tolerant disk support is available only on Windows 2000 Server versions.) This logical volume manager driver is located between the file system and the disk drivers, as shown in Figure 9-4.
Figure 9-4 Adding a layered driver
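As a rough illustration of how an inserted driver can make several disks look like one, the following sketch maps a sector number on the combined volume to a disk and a sector on that disk. It assumes simple concatenation of the disks. The span_translate name is hypothetical, and the sketch says nothing about how the Windows 2000 volume manager actually lays out data.

```python
def span_translate(disk_sizes, volume_sector):
    """Map a sector on the combined volume to (disk index, sector on that disk).

    disk_sizes lists the size of each member disk in sectors, in order.
    """
    if volume_sector < 0:
        raise ValueError("sector numbers start at 0")
    remaining = volume_sector
    for index, size in enumerate(disk_sizes):
        if remaining < size:
            return index, remaining
        remaining -= size           # skip past this disk entirely
    raise ValueError("sector lies beyond the end of the volume")
```

The file system above this layer keeps issuing requests against a single range of sectors, and the disk drivers below it keep seeing ordinary per-disk requests.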
Viewing the Loaded Driver List
You can see a list of registered drivers in the Drivers section of the Computer Management Microsoft Management Console (MMC) snap-in. To open the snap-in, right-click the My Computer icon on the desktop and select Manage from the context menu, or choose Computer Management from the Programs/Administrative Tools folder of the Start menu. Within Computer Management, expand System Tools, System Information, Software Environment and select Drivers, as shown here:
You can also obtain a list of loaded kernel-mode drivers with the Drivers utility in the Windows 2000 resource kits or with the Pstat utility (which ships in the Platform SDK and is available for download from the Windows 2000 Resource Kits Web site at www.microsoft.com/windows2000/library/resources/reskit). Pstat lists the drivers at the end of its display. (It first lists all the processes and threads in the system.) The only difference in the output of the two utilities is that Pstat shows the load address of the driver in system address space. The following output is a partial display of the driver information from Pstat:
C:\>pstat
ModuleName    Load Addr    Code    Data   Paged  LinkDate
-----------------------------------------------------------------------
ntoskrnl.exe  80400000   429184   96896  775360  Tue Dec 07 18:41:11 1999
hal.dll       80062000    25856    6016   16160  Tue Nov 02 20:14:22 1999
BOOTVID.DLL   EE010000     5664    2464       0  Wed Nov 03 20:24:33 1999
ACPI.sys      BFFD8000    92096    8960   43488  Wed Nov 10 20:06:04 1999
WMILIB.SYS    EE1C8000      512       0    1152  Sat Sep 25 14:36:47 1999
pci.sys       EDC00000    12704    1536   31264  Wed Oct 27 19:11:08 1999
isapnp.sys    EDC10000    14368     832   22944  Sat Oct 02 16:00:35 1999
compbatt.sys  EE014000     2496       0    2880  Fri Oct 22 18:32:49 1999
BATTC.SYS     EE100000      800       0    2976  Sun Oct 10 19:45:37 1999
intelide.sys  EE1C9000     1760      32       0  Thu Oct 28 19:20:03 1999
PCIIDEX.SYS   EDE80000     4544     480   10944  Wed Oct 27 19:02:19 1999
pcmcia.sys    BFFBD000    32800    8864   23680  Fri Oct 29 19:20:08 1999
ftdisk.sys    BFFA0000     4640      32   95072  Mon Nov 22 14:36:23 1999
Diskperf.sys  EE102000     1728      32    2016  Thu Sep 30 20:30:40 1999
dmio.sys      BFF7E000   104672   15168       0  Tue Nov 30 14:47:49 1999
If you're looking at a crash dump (or live system) with the kernel debugger, you can get a similar display with the kernel debugger !drivers command.
The I/O system drives the execution of device drivers. Device drivers consist of a set of routines that are called to process the various stages of an I/O request. Figure 9-5 illustrates the key driver-function routines, which are described below.
Figure 9-5 Primary device driver routines
Although the following routines aren't shown in Figure 9-5, they're found in many types of device drivers:
The PnP manager is the primary component involved in supporting the ability of Windows 2000 to recognize and adapt to changing hardware configurations. A user doesn't need to understand the intricacies of hardware or manual configuration in order to install and remove devices. For example, it's the PnP manager that enables a running Windows 2000 laptop that is placed on a docking station to automatically detect additional devices located in the docking station and make them available to the user.
Plug and Play support requires cooperation at the hardware, device driver, and operating system levels. Industry standards for the enumeration and identification of devices attached to buses are the foundation of Windows 2000 Plug and Play support. For example, the USB standard defines the way that devices on a USB bus identify themselves. With this foundation in place, Windows 2000 Plug and Play support provides the following capabilities:
Windows 2000 aims to provide full support for Plug and Play, but the level of support possible depends on the attached devices and installed drivers. If a single device or driver doesn't support Plug and Play, the extent of Plug and Play support for the system can be compromised. In addition, a driver that doesn't support Plug and Play might prevent other devices from being usable by the system. Table 9-1 shows the outcome of various combinations of devices and drivers that can and can't support Plug and Play.
Table 9-1 Device and Driver Plug and Play Capability
|Type of Device||Plug and Play Driver||Non-Plug and Play Driver|
|Plug and Play||Full Plug and Play||No Plug and Play|
|Non-Plug and Play||Possible partial Plug and Play||No Plug and Play|
A device that isn't Plug and Play compatible is one that doesn't support automatic detection, such as a legacy ISA sound card. Because the operating system doesn't know where the hardware physically lies, certain operations, such as laptop undocking, sleep, and hibernation, are disallowed. However, if a Plug and Play driver is manually installed for the device, the driver can at least implement PnP manager-directed resource assignment for the device.
Drivers that aren't Plug and Play compatible include legacy drivers, such as those that ran on Windows NT 4. Although these drivers continue to function on Windows 2000, the PnP manager can't reconfigure the resources assigned to such devices in the event that resource reallocation is necessary to accommodate the needs of a dynamically added device. For example, a device might be able to use I/O memory ranges A and B, and during the boot the PnP manager assigns it range A. If a device that can use only A is attached to the system later, the PnP manager can't direct the first device's driver to reconfigure itself to use range B. This prevents the second device from obtaining required resources, which results in the device being unavailable for use by the system. Legacy drivers also impair a machine's ability to sleep or hibernate. (See the section "The Power Manager" for more details.)
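The range A and range B scenario can be worked through with a small model of resource assignment. This is a simplified sketch and not the PnP manager's arbitration algorithm: the attach function, its parameters, and the single-move search strategy are assumptions made for illustration.

```python
def attach(assignments, new_device, options, reconfigurable, choices):
    """Try to give new_device one of its acceptable ranges.

    assignments     dict: device -> range it currently holds (updated in place)
    options         ranges that new_device is able to use
    reconfigurable  set of devices whose drivers accept reassignment
    choices         dict: device -> every range that device is able to use
    Returns True if new_device obtained a range.
    """
    in_use = set(assignments.values())
    for rng in options:
        if rng not in in_use:               # a free range: no conflict at all
            assignments[new_device] = rng
            return True
    # Every acceptable range is taken, so look for a holder that can move.
    for rng in options:
        holder = next(d for d, r in assignments.items() if r == rng)
        if holder not in reconfigurable:
            continue                        # legacy driver: it stays where it is
        for alternative in choices[holder]:
            if alternative not in in_use:
                assignments[holder] = alternative
                assignments[new_device] = rng
                return True
    return False
```

With the first device holding range A and driven by a legacy driver, attaching a second device that can use only A fails. If the first device's driver is listed in reconfigurable, the same call moves the first device to range B and succeeds.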
To support Plug and Play, a driver must implement a Plug and Play dispatch routine as well as an add-device routine. Bus drivers must support different types of Plug and Play requests than function or filter drivers do, however. For example, when the PnP manager is guiding device enumeration during the system boot (described in detail later in this chapter), it asks bus drivers for a description of the devices that they find on their respective buses. The description includes data that uniquely identifies each device as well as the resource requirements of the devices. The PnP manager takes this information and loads any function or filter drivers that have been installed for the detected devices. It then calls the add-device routine of each driver for every installed device the drivers are responsible for.
Function and filter drivers prepare to begin managing their devices in their add-device routines, but they don't actually communicate with the device hardware. Instead, they wait for the PnP manager to send a start-device command for the device to their Plug and Play dispatch routine. The start-device command includes the resource assignment that the PnP manager determined during resource arbitration. When a driver receives a start-device command, it can configure its device to use the specified resources.
After a device has started, the PnP manager can send the driver additional Plug and Play commands, including ones related to a device's removal from the system or to resource reassignment. For example, when the user invokes the remove/eject device utility, shown in Figure 9-6 (accessible by right-clicking on the PC card icon in the taskbar and selecting Unplug Or Eject Hardware), to tell Windows 2000 to eject a PCMCIA card, the PnP manager sends a query-remove notification to any applications that have registered for Plug and Play notifications for the device. Applications typically register for notification on their handles, which they close during a query-remove notification. If no applications veto the query-remove request, the PnP manager sends a query-remove command to the driver that owns the device being ejected. At that point, the driver has a chance to deny the removal or to ensure that any pending I/O operations involving the device have completed and to begin rejecting further I/O requests aimed at the device. If the driver agrees to the remove request and no open handles to the device remain, the PnP manager next sends a remove command to the driver to request that the driver discontinue accessing the device and release any resources the driver has allocated on behalf of the device.
Figure 9-6 PC card remove/eject utility
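The order of the checks in an eject request can be summarized in a short function. It is a sketch of the sequence described above and nothing more: the eject name, its parameters, and the wording of the recorded steps are hypothetical.

```python
def eject(app_votes, driver_agrees, open_handles):
    """Walk through an eject request.

    app_votes      list of booleans, one per registered application
                   (True means the application accepts the removal)
    driver_agrees  whether the owning driver accepts the query-remove command
    open_handles   number of handles to the device that remain open
    Returns (steps taken, whether the device was removed).
    """
    steps = ["query-remove notification to applications"]
    if not all(app_votes):
        return steps + ["vetoed by an application"], False
    steps.append("query-remove command to driver")
    if not driver_agrees:
        return steps + ["denied by driver"], False
    if open_handles > 0:
        return steps + ["handles still open"], False
    steps.append("remove command to driver")
    return steps, True
```

The driver is consulted only after every registered application has accepted, and the remove command is issued only when nothing has objected and no handles remain open.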
When the PnP manager needs to reassign a device's resources, it first asks the driver whether it can temporarily suspend further activity on the device by sending the driver a query-stop command. The driver either agrees to the request, if doing so wouldn't cause data loss or corruption, or denies the request. As with a query-remove command, if the driver agrees to the request, the driver completes pending I/O operations and won't initiate further I/O requests for the device that can't be aborted and subsequently restarted. The driver typically queues new I/O requests so that the resource reshuffling is transparent to applications currently accessing the device. The PnP manager then sends the driver a stop command. At that point, the PnP manager can direct the driver to assign different resources to the device and once again send the driver a start-device command for the device.
The various Plug and Play commands essentially guide a device through an assortment of operational states, forming a well-defined state-transition table, which is shown in simplified form in Figure 9-7. (Several possible transitions and Plug and Play commands have been omitted for clarity. Also, the state diagram depicted is that implemented by function drivers. Bus drivers implement a more complex state diagram.) A state shown in the figure that we haven't discussed is the one that results from the PnP manager's surprise-remove command. This command results when either a user removes a device without warning, as when the user ejects a PCMCIA card without using the remove/eject utility, or the device fails. The surprise-remove command tells the driver to immediately cease all interaction with the device because the device is no longer attached to the system and to cancel any pending I/O requests.
Figure 9-7 Device Plug and Play state transitions
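One way to make the state-transition idea concrete is a lookup table keyed by the current state and the incoming command. The sketch below includes only the transitions discussed in this section. The state names are our own labels and are not necessarily those used in the figure, and a real function driver handles more commands than these.

```python
# (current state, command) -> next state, limited to transitions in the text.
TRANSITIONS = {
    ("not started", "start-device"): "started",
    ("started", "query-stop"): "stop pending",
    ("stop pending", "stop"): "stopped",
    ("stopped", "start-device"): "started",
    ("started", "query-remove"): "remove pending",
    ("remove pending", "remove"): "removed",
    ("started", "surprise-remove"): "surprise removed",
}


def run(commands, state="not started"):
    """Apply Plug and Play commands in order and return the final state."""
    for command in commands:
        key = (state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"{command!r} is not valid in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

A resource reassignment is then the sequence start-device, query-stop, stop, start-device, which ends in the started state again. Sending a stop command to a device that was never queried raises an error instead of silently changing state.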
Just as Windows 2000 Plug and Play features require support from a system's hardware, its power-management capabilities require hardware that complies with the Advanced Configuration and Power Interface (ACPI) specification (available at www.teleport.com/~acpi/spec.htm). As a result of this requirement, the computer's BIOS (Basic Input Output System), the code that runs when the computer turns on, must also conform to the ACPI standard. Most x86 computers manufactured since the end of 1998 are ACPI compliant.
Some computers, especially ones more than a few years old, don't comply with the ACPI standard. Instead, they often conform to the older Advanced Power Management (APM) standard, which mandates fewer power-management capabilities than ACPI. Windows 2000 provides limited power management for APM systems, but we won't go into the details of that topic here. In this book, we focus on the behavior of Windows 2000 on ACPI computers.
The ACPI standard defines various power levels for a system and for devices. The six system power states are described in Table 9-2. They are referred to as S0 (fully on or working) through S5 (fully off). Each state has the following characteristics:
Table 9-2 System Power-State Definitions
|State||Power Consumption||Software Resumption||Hardware Latency|
|S0 (fully on)||Maximum||Not applicable||None|
|S1 (sleeping)||Less than S0, more than S2||System resumes where it left off (returns to S0)||Less than 2 seconds|
|S2 (sleeping)||Less than S1, more than S3||System resumes where it left off (returns to S0)||2 or more seconds|
|S3 (sleeping)||Less than S2; processor is off||System resumes where it left off (returns to S0)||Same as S2|
|S4 (hibernating)||Trickle current to power button and wake circuitry||System restarts from saved hibernate file and resumes where it left off prior to hibernation (returns to S0)||Long and undefined|
|S5 (fully off)||Trickle current to power button||System boot||Long and undefined|
States S1 through S4 are sleeping states, in which the computer appears to be off because of reduced power consumption. However, the computer retains enough information, either in memory or on disk, to move to S0. For states S1 through S3, enough power is required to preserve the contents of the computer's memory so that when the transition is made to S0 (when the user or a device wakes up the computer), the power manager continues executing where it left off before the suspend. When the system moves to S4, the power manager saves the compressed contents of memory to a hibernation file named Hiberfil.sys in the root directory of the boot volume. The file is large enough to hold the uncompressed contents of memory. (Compression is used to minimize disk I/O and to improve hibernation and resume-from-hibernation performance.) After it finishes saving memory, the power manager shuts off the computer. When a user subsequently turns on the computer, a normal boot process occurs except that Ntldr checks for a valid memory image stored in the hibernation file. If the hibernation file contains saved system state, Ntldr reads the contents of the file into memory and then resumes execution at the point in memory that is recorded in the hibernation file.
The computer never directly transitions between states S1 and S4; instead, it must move to state S0 first. As illustrated in Figure 9-8, when the system is moving from any of states S1 through S5 to state S0, it's said to be waking, and when it's transitioning from state S0 to any of states S1 through S5, it's said to be sleeping.
Figure 9-8 System power-state transitions
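The rule that transitions pivot on S0 can be expressed as a small path function. The sketch assumes that any move between two states other than S0 is routed through S0, which generalizes the statement above. Both function names are hypothetical.

```python
SYSTEM_STATES = ("S0", "S1", "S2", "S3", "S4", "S5")


def transition_path(current, target):
    """Return the sequence of system power states visited."""
    for state in (current, target):
        if state not in SYSTEM_STATES:
            raise ValueError(f"unknown system power state {state!r}")
    if current == target:
        return [current]
    if "S0" in (current, target):
        return [current, target]
    # Assumption: every move between two non-S0 states is routed through S0.
    return [current, "S0", target]


def direction(current, target):
    """Name a single step using the chapter's terms: waking or sleeping."""
    if current != "S0" and target == "S0":
        return "waking"
    if current == "S0" and target != "S0":
        return "sleeping"
    raise ValueError("each step must start or end at S0")
```

A request to go from S1 to S4 therefore becomes two steps: a waking step from S1 to S0 followed by a sleeping step from S0 to S4.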
Although the system can be in one of six power states, ACPI defines devices as being in one of four power states, D0 through D3. State D0 is fully on, and state D3 is fully off. The ACPI standard leaves it to individual drivers and devices to define the meanings of states D1 and D2, except that state D1 must consume an amount of power less than or equal to that consumed in state D0, and when the device is in state D2, it must consume power less than or equal to that consumed in D1. Microsoft, in conjunction with the major hardware OEMs, has defined a series of power management reference specifications (available on Microsoft's Web site at www.microsoft.com/hwdev/specs/pmref) that specify the device power states that are required for all devices in a particular class (for the major device classes: display, network, SCSI, and so on). For some devices, there's no intermediate power state between fully on and fully off, which results in these states being undefined.
Power management policy in Windows 2000 is split between the power manager and the individual device drivers. The power manager is the owner of the system power policy. This ownership means that the power manager decides which system power state is appropriate at any given point, and when a sleep, hibernation, or shutdown is required, the power manager instructs the power-capable devices in the system to perform appropriate system power-state transitions. The power manager decides when a system power-state transition is necessary by considering a number of factors:
When the PnP manager performs device enumeration, part of the information it receives about a device is its power-management capabilities. A driver reports whether or not its devices support device states D1 and D2 and, optionally, the latencies, or times required, to move from states D1 through D3 to D0. To help the power manager determine when to make system power-state transitions, bus drivers also return a table that implements a mapping between each of the system power states (S0 through S5) and the device power states that a device supports. The table lists the lowest possible device power state for each system state and directly reflects the state of various power planes when the machine sleeps or hibernates. For example, a bus that supports all four device power states might return the mapping table shown in Table 9-3. Most device drivers turn their devices completely off (D3) when leaving S0 to minimize power consumption when the machine isn't in use. Some devices, however, such as network adapter cards, support the ability to wake up the system from a sleeping state. This ability, along with the lowest device power state in which the capability is present, is also reported during device enumeration.
Table 9-3 Example System-to-Device Power Mappings
|System Power State||Device Power State|
|S0 (fully on)||D0 (fully on)|
|S4 (hibernating)||D3 (fully off)|
|S5 (fully off)||D3 (fully off)|
When the power manager decides to make a transition between system power states, it sends power commands to a driver's power dispatch routine. More than one driver can be responsible for managing a device, but only one of the drivers is designated as the device power-policy owner. This driver determines, based on the system state, a device's power state. For example, if the system transitions from state S0 to S1, a driver might decide to move a device's power state from D0 to D1. Instead of directly informing the other drivers that share the management of the device of its decision, the device power-policy owner asks the power manager, via the PoRequestPowerIrp function, to tell the other drivers by issuing a device power command to their power dispatch routines. This behavior allows the power manager to control the number of power commands that are active on a system at any given time. For example, some devices in the system might require a significant amount of current to power up. The power manager ensures that such devices aren't powered up simultaneously.
Many power commands have corresponding query commands. For example, when the system is moving to a sleep state, the power manager will first ask the devices on the system if the transition is acceptable. A device that is busy performing time-critical operations or interacting with device hardware might reject the command, which results in the system maintaining its current system power-state setting.
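The query-then-commit pattern can be sketched as follows. The request_sleep function and the idea of modeling each device as a callable that answers the query are assumptions made for illustration. The real power manager communicates with drivers through power commands sent to their dispatch routines.

```python
def request_sleep(current_state, target_state, devices):
    """Ask every device whether the transition is acceptable.

    devices maps a device name to a callable that takes the target system
    power state and returns True to accept or False to reject.
    Returns (resulting system state, names of the devices that rejected).
    """
    rejecting = [name for name, accepts in devices.items()
                 if not accepts(target_state)]
    if rejecting:
        return current_state, rejecting     # any rejection keeps the state
    return target_state, []
```

A single busy device is enough to keep the system in its current power state, which is why the function returns the list of rejecting devices along with the outcome.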
Viewing the System Power Capabilities and Policy
You can view a computer's system power capabilities by using the !pocaps kernel debugger command. Here's the output of the command when run on an ACPI-compliant laptop running Windows 2000 Professional:
kd> !pocaps
PopCapabilities @ 0x8046adc0
  Misc Supported Features: PwrButton SlpButton Lid S1 S3 S4 S5 HiberFile FullWake
  Processor Features: Thermal Throttle (MinThrottle = 03, Scale = 08)
  Disk Features: SpinDown
  Battery Features: BatteriesPresent
    Battery 0 - Capacity: 00000000 Granularity: 00000000
    Battery 1 - Capacity: 00000000 Granularity: 00000000
    Battery 2 - Capacity: 00000000 Granularity: 00000000
  Wake Caps
    Ac OnLine Wake: Sx
    Soft Lid Wake: Sx
    RTC Wake: S3
    Min Device Wake: Sx
    Default Wake: Sx
The Misc Supported Features line reports that, in addition to S0 (fully on), the system supports system power states S1, S3, S4, and S5 (it doesn't implement S2) and has a valid hibernation file to which it can save system memory when it hibernates (state S4).
The Power Options Properties dialog box, shown below (available by selecting Power Options in Control Panel), lets you configure various aspects of the system's power policy. The exact properties you can configure depend on the system's power capabilities, which we just examined.
Windows 2000 Professional on an ACPI-compliant laptop (such as the system on which we captured the following screen shot) generally provides the most power-management features. On such systems, you can set the idle detection timeouts that control when the system turns off the monitor, spins down hard disks, goes to standby mode (moves to system power state S1), and hibernates (moves the system to power state S4). In addition, the Advanced tab in Power Options lets you specify the power-related behavior of the system when you press the power or sleep buttons or close a laptop's lid.
The settings you configure in Power Options directly affect values in the system's power policy, which you can display with the !popolicy debugger command. Here's the output of the command on the same system:
kd> !popolicy
SYSTEM_POWER_POLICY (R.1) @ 0x80469180
  PowerButton: Off Flags: 00000003 Event: 00000000 Query UI
  SleepButton: Sleep Flags: 00000003 Event: 00000000 Query UI
  LidClose: Hibernate Flags: 00000001 Event: 00000000 Query
  Idle: None Flags: 00000001 Event: 00000000 Query
  OverThrottled: Sleep Flags: c0000004 Event: 00000000 Override NoWakes Critical
  IdleTimeout: 00000000 IdleSensitivity: 32
  MinSleep: S1 MaxSleep: S3
  LidOpenWake: S0 FastSleep: S1
  WinLogonFlags: 00000000 S4Timeout: 00000000
  VideoTimeout: 00000000 VideoDim: 6e
  SpinTimeout: 00000708 OptForPower: 01
  FanTolerance: 64 ForcedThrottle: 64
  MinThrottle: 19
The first lines of the display correspond to the button behaviors specified on the Advanced tab of Power Options, and on this system the power button is interpreted as an off switch, the sleep button moves the system to a sleep state, and the closing of the laptop lid causes the system to hibernate.
The timeout values shown at the end of the output are expressed in seconds and displayed in hexadecimal notation. The values reported here directly correspond to the settings you can see configured in the Power Options screen shot. (The laptop is plugged in.) For example, the video timeout is 0, meaning the monitor never turns off, and the hard disk spin-down timeout is 0x708, which corresponds to 1800 seconds, or 30 minutes.
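The conversion from the hexadecimal fields to readable durations is simple arithmetic, as this small helper shows. The function names are ours, and treating a value of 0 as "never" follows the video timeout example in the text.

```python
def timeout_seconds(hex_field):
    """Convert a hexadecimal timeout field such as '00000708' to seconds."""
    return int(hex_field, 16)


def describe(hex_field):
    """Render a timeout field as minutes and seconds, or 'never' for 0."""
    seconds = timeout_seconds(hex_field)
    if seconds == 0:
        return "never"
    minutes, rest = divmod(seconds, 60)
    return f"{minutes} min {rest} s"
```

For the SpinTimeout field, timeout_seconds("00000708") gives 1800 because 7 x 256 + 8 = 1800, and 1800 / 60 is the 30 minutes shown in Power Options.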
Besides responding to power manager commands related to system power-state transitions, a driver can unilaterally control the device power state of its devices. In some cases, a driver might want to reduce the power consumption of a device it controls when the device is left inactive for a period of time. A driver can either detect an idle device itself or use facilities provided by the power manager. If the driver uses the power manager's facilities, it registers the device with the power manager by calling the PoRegisterDeviceForIdleDetection function. This function informs the power manager of the timeout values to use to detect a device as idle and of the device power state that the power manager should apply when it detects the device as being idle. The driver specifies two timeouts: one to use when the user has configured the computer to conserve energy and the other to use when the user has configured the computer for optimum performance. After calling PoRegisterDeviceForIdleDetection, the driver must inform the power manager, by calling the PoSetDeviceBusy function, whenever the device is active.
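The idle-detection contract can be modeled with a simple counter. The IdleDetector class below is a hypothetical simulation and not the kernel interface: its constructor stands in for the registration call, set_busy stands in for the busy notification, and tick simulates the passage of time. How the device returns to D0 after it has been powered down is left out of the sketch.

```python
class IdleDetector:
    """Toy model of power manager idle detection for one device."""

    def __init__(self, conservation_timeout, performance_timeout, idle_state):
        self.conservation_timeout = conservation_timeout  # seconds
        self.performance_timeout = performance_timeout    # seconds
        self.idle_state = idle_state        # state to apply when idle
        self.idle_seconds = 0
        self.device_state = "D0"

    def set_busy(self):
        """The driver reports activity, so the idle count starts over."""
        self.idle_seconds = 0

    def tick(self, seconds, conserving):
        """Advance time; conserving selects which of the two timeouts applies."""
        self.idle_seconds += seconds
        timeout = (self.conservation_timeout if conserving
                   else self.performance_timeout)
        if self.idle_seconds >= timeout:
            self.device_state = self.idle_state
        return self.device_state
```

With timeouts of 10 and 60 seconds, ten idle seconds power the device down when the system is conserving energy but not when it is configured for performance. A call to set_busy in between restarts the count in either case.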