The Windows DMA architecture presents an abstract view of the underlying system hardware. This view, called the Windows DMA abstraction, is created by the I/O manager, hardware abstraction layer, bus drivers, and any bus filter drivers that may be present. OEMs can customize some of these components, if required, to support unique features of their hardware; however, most Windows computers use standard hardware designs that are supported by the standard Windows DMA implementation.
The DMA support in KMDF is based upon the Windows DMA abstraction. By using the Windows DMA abstraction, the framework is not required to deal with the unique capabilities and organization of the underlying hardware platform. KMDF and the abstraction define a stable hardware environment that drivers can rely on. As a result, drivers do not require conditional compilation or runtime checks to support different underlying hardware platforms. The Windows DMA abstraction describes a system with the following major characteristics:
DMA operations occur directly from system memory and cannot be assumed to be coordinated with the contents of the processor cache.
Thus, the framework is responsible for flushing any changed data residing in processor cache back to system memory before each DMA operation.
At the end of each DMA operation, part of the transferred data may remain cached either by Windows or by one of the supporting hardware chipsets.
As a result, the framework is responsible for flushing this internal adapter cache after each DMA operation.
The address space on any given device bus is separate from the address space of system memory.
Map registers convert between device bus logical addresses and system memory physical addresses. This mapping allows the Windows DMA abstraction to support software scatter/gather.
In addition, the use of map registers ensures that DMA operations on any device bus can reach any location in system memory. For example, the framework is not required to perform any special processing to ensure that DMA operations on a PCI bus with 32-bit addressing can reach data buffers that are located in system memory above the 4-GB (0xFFFFFFFF) physical address mark.
The Windows DMA abstraction involves several key areas:
DMA operations and processor cache
Completion of DMA transfers by flushing caches
System scatter/gather support
DMA transfer to any location in physical memory
The remainder of this section describes the Windows DMA abstraction more fully.
For KMDF drivers, most of the details for DMA support are handled by the framework, so the information in this section is not critical for your driver development tasks. However, although you are not required to master all of the details of the abstraction to write a KMDF driver that supports DMA, a good understanding of the basic concepts can help you in designing and debugging your driver.
In the Windows DMA abstraction, DMA operations bypass the system hardware cache and take place directly in system memory. Therefore, before performing each DMA operation, the framework is responsible for updating system memory by flushing any data back from the system hardware cache to system memory. When the driver requests a DMA transfer, the framework flushes the data before executing the transfer.
To implement the abstraction, Windows updates system memory only when necessary. On many systems, DMA operations properly reflect the contents of the system hardware cache when it is different from the contents in system memory. As a result, performing a write-back operation from the cache to system memory before a DMA operation is unnecessary. On these systems, the standard Windows DMA implementation does not actually perform any operation in response to a flush request before a DMA operation.
On machines or bus configurations in which the hardware does not ensure cache coherence for DMA operations, such as certain Intel Itanium systems, the standard Windows DMA implementation does the processor-specific work that is necessary to ensure such coherency when the driver calls WdfDmaTransactionExecute.
According to the Windows DMA abstraction, data can remain cached by Windows or by another system hardware component after a DMA transfer is complete. This caching is entirely transparent to both the framework and the driver. Therefore, after performing each DMA operation, the framework is responsible for flushing this data from the Windows internal cache to complete the DMA operation. The framework flushes this cache when the driver calls WdfDmaTransactionDmaCompleted.
To implement the abstraction, the framework notifies Windows when each DMA transfer is complete. This notification enables the operating system to properly complete any transfers that use map registers. The next section discusses map registers in detail.
One of the major features of the Windows DMA abstraction is the use of map registers to translate addresses between the system address space and the logical address space that a device bus uses. This section describes the conceptual abstraction of map registers that the Windows DMA model uses and briefly discusses how map registers are realized in the standard Windows DMA implementation.
Figure 17-1 shows how system memory and device buses are connected in the Windows DMA abstraction. The device buses in the diagram might be two PCI buses, for example. Remember that Figure 17-1 is a conceptual diagram and is not intended to depict the physical hardware layout of any specific machine.
Figure 17-1: Conceptually connecting device and memory bus by using map registers
The figure shows two device buses, each of which is separate from the memory bus. In the Windows DMA abstraction, each device bus has an address space that is separate from the address space of all other device buses and that is also separate from the address space of system memory. The address space for a device bus is referred to as the device-bus logical address space for that bus.
A box labeled "Map Registers" connects each device bus to the memory bus in Figure 17-1. Because device-bus logical address space and system memory address space are separate, some component is required to translate addresses between them. That component is the group of map registers. Map registers translate addresses between the memory bus and the device bus so that data can flow between those two buses.
Map registers translate addresses in much the same way that the processor's memory management registers (the page directory entries and page table entries) translate between processor virtual addresses and physical memory addresses. Each map register can translate up to a page of addresses in one direction. A page is 4 KB on x86 and x64 systems and 8 KB on Itanium systems and is defined as the PAGE_SIZE constant.
Map registers are a shared resource that the Windows operating system manages. The system reserves map registers based on the maximum transfer length that the driver specifies when it creates the DMA enabler object. The framework allocates the map registers immediately before executing the transaction and frees them after the transaction is complete. Whenever a driver for a DMA device transfers data to or from system memory, the framework allocates and programs the map registers for the transfer.
Because each map register can translate a page of addresses, the number of map registers that a transfer requires depends on the transfer size, as shown by the following equation:
Number of Required Map Registers = (Transfer Size / PAGE_SIZE) + 1
In the preceding equation, the additional map register added by the "+1" accounts for the case in which a transfer does not start at a page-aligned address.
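As a rough illustration of the preceding equation, the following C sketch computes the estimate and, for comparison, the exact number of pages that a particular buffer touches. The names SKETCH_PAGE_SIZE, map_registers_estimate, and pages_spanned are hypothetical and are not part of KMDF or Windows. The sketch assumes 4-KB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-KB pages, as on x86 and x64. Stands in for PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Estimate from the text: (transfer size / PAGE_SIZE) + 1. */
static size_t map_registers_estimate(size_t transfer_size)
{
    return (transfer_size / SKETCH_PAGE_SIZE) + 1;
}

/* Hypothetical helper: exact number of pages touched by a buffer that
 * starts at address 'start' and is 'length' bytes long. */
static size_t pages_spanned(uintptr_t start, size_t length)
{
    assert(length > 0);
    uintptr_t first_page = start / SKETCH_PAGE_SIZE;
    uintptr_t last_page = (start + length - 1) / SKETCH_PAGE_SIZE;
    return (size_t)(last_page - first_page + 1);
}
```

For an 8-KB transfer, the estimate is 3. A page-aligned 8-KB buffer touches only 2 pages, whereas the same buffer starting 1 byte into a page touches 3, which is the case that the extra register covers.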
Map registers are allocated in a contiguous block. Each block of map registers represents a contiguous set of device bus logical addresses that can translate an equally sized set of system memory physical addresses in either transfer direction. Windows represents a block of map registers to the framework by a map register base value and a length. These values are not available to the KMDF driver.
Because map registers are a conceptual abstraction that Windows uses, system hardware designers and the kernel-mode developers working with them can implement them in any way they choose. Over the history of Windows, a number of interesting implementations of hardware-based map registers have existed. Some of these designs used memory management logic to implement address mapping between the device bus and the memory bus in a manner similar to that shown in Figure 17-1.
Most modern computer systems running Windows use standard hardware designs and so use the standard Windows DMA implementation. In this implementation, map registers are realized entirely in software.
The standard Windows DMA components implement each map register as one PAGE_SIZE buffer located in physical memory below 4 GB. Depending on the bus being supported, the map registers might even be located in physical memory below the 16-MB physical address mark. When a driver prepares to perform a DMA transfer, Windows determines whether any map registers are required to support the transfer and, if so, how many. Map registers might be needed to support transfer of an entire data buffer, or they might be needed only to support the transfer of some fragments of a data buffer.
If no map registers are required to support the transfer, Windows provides the framework with the physical memory address of a buffer fragment as the fragment's device bus logical address.
If the transfer requires map registers, Windows allocates buffer space for the duration of the transfer. If map registers are required but not enough buffer space is available, the operating system delays the request until sufficient buffer space becomes free. When a DMA transfer that uses map registers is complete, Windows frees the map registers, thus making the buffer space available to support another transfer.
Conceptually, the Windows DMA abstraction always uses map registers to translate between device bus logical addresses and physical memory addresses.
In implementation, the standard Windows DMA components use map registers if either of the following is true:
The driver for the DMA device indicates that the device does not support hardware scatter/gather.
Any part of the buffer used for a given transfer exceeds the device's addressing capability.
For example, if a DMA device can perform only 32-bit transfers, but part of the buffer being transferred is located above the 4-GB physical address mark, the system uses map registers.
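The two conditions above can be sketched as a simple predicate. This is only an illustrative model, not the actual logic inside the Windows DMA components. The types device_caps_t and fragment_t and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one physically contiguous buffer fragment. */
typedef struct {
    uint64_t phys_addr;   /* physical address of the first byte */
    uint64_t length;      /* length in bytes, must be nonzero */
} fragment_t;

/* Hypothetical summary of what the driver reported about its device. */
typedef struct {
    bool hw_scatter_gather;   /* device supports hardware scatter/gather */
    uint64_t max_address;     /* highest physical address the device can present */
} device_caps_t;

/* True if any byte of the fragment lies beyond the device's addressing range. */
static bool fragment_out_of_range(const device_caps_t *dev, const fragment_t *f)
{
    assert(f->length > 0);
    return f->phys_addr + f->length - 1 > dev->max_address;
}

/* Models the two conditions in the text: no hardware scatter/gather, or
 * some part of the buffer exceeds the device's addressing capability. */
static bool transfer_needs_map_registers(const device_caps_t *dev,
                                         const fragment_t *frags, size_t count)
{
    if (!dev->hw_scatter_gather) {
        return true;
    }
    for (size_t i = 0; i < count; i++) {
        if (fragment_out_of_range(dev, &frags[i])) {
            return true;
        }
    }
    return false;
}
```

With max_address set to 0xFFFFFFFF, a fragment at physical address 0x100000000 makes the predicate return true even when the device supports hardware scatter/gather.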
Map registers make possible special support to drivers for devices that do not implement hardware scatter/gather. This section describes the concepts behind the Windows system scatter/gather support and how the standard Windows DMA components implement system scatter/gather support.
On Windows systems, data buffers are rarely contiguous. If the buffers are not contiguous, performing DMA for a device that has no hardware scatter/gather support could easily require a lot of extra processing because a separate DMA transfer would be required for each physical fragment of a user buffer. However, system scatter/gather support makes such extra processing unnecessary.
Map registers, which convert device bus logical addresses to system memory physical addresses and vice versa, map all data transfers between the device bus and the memory bus. To support scatter/gather in software, Windows allocates contiguous map registers for a given transfer. As a result, transfers that use map registers always appear to the device as a contiguous set of device bus logical addresses, even if the corresponding pages in system memory are not physically contiguous.
A picture might help clarify this concept. Figure 17-2 shows how a series of contiguous map registers describes a fragmented data buffer. The standard Windows DMA components have programmed the map registers to point to various locations in the host's physical memory. However, the device bus logical addresses represented by the map registers are contiguous, so the system can use a single device bus logical base address and length to describe them for the DMA device.
Figure 17-2: How a fragmented buffer is translated by using map registers
If this abstraction seems confusing, remember that map registers conceptually translate physical addresses in system memory to device bus logical addresses for DMA transfers in precisely the same way as the system's memory management hardware translates virtual addresses to physical addresses for program execution.
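To make the page-table analogy concrete, the following C sketch models a block of contiguous map registers as a small lookup table. It models the conceptual abstraction only, not the standard software implementation. The map_block_t type, the translate_logical function, and the SKETCH_PAGE_SIZE constant are hypothetical, and 4-KB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-KB pages. Stands in for PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical model of a contiguous block of map registers. Register i
 * covers logical addresses [logical_base + i * page, logical_base + (i + 1) * page). */
typedef struct {
    uint64_t logical_base;        /* first device bus logical address of the block */
    size_t count;                 /* number of map registers in the block */
    const uint64_t *phys_pages;   /* phys_pages[i]: physical page base for register i */
} map_block_t;

/* Translate a device bus logical address to a system memory physical address. */
static uint64_t translate_logical(const map_block_t *block, uint64_t logical)
{
    assert(logical >= block->logical_base);
    uint64_t offset = logical - block->logical_base;
    size_t index = (size_t)(offset / SKETCH_PAGE_SIZE);
    assert(index < block->count);
    return block->phys_pages[index] + (offset % SKETCH_PAGE_SIZE);
}
```

The device sees one contiguous range that starts at logical_base, while the entries in phys_pages can point to unrelated pages anywhere in physical memory.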
Map registers support system scatter/gather by intermediate buffering of DMA transfers. During initialization, if a driver indicates that its device does not support hardware scatter/gather, the Windows DMA implementation uses map registers for all DMA transfers to or from the device. Thus, for such devices, a DMA transfer proceeds as follows:
The Windows DMA implementation allocates enough contiguous map registers (that is, low-memory buffers) to contain the data for the entire transfer.
If adequate buffer space is not available, the system delays the request until sufficient buffer space becomes free.
For a write operation (a transfer from system memory to the device), the implementation copies the data from the original data buffer to the map register buffer.
The Windows DMA implementation provides the framework with the physical memory address of the map register buffer as the buffer's device bus logical address.
The framework then passes this address to the driver in a scatter/gather list when it calls the driver's EvtProgramDma callback function to program the device for DMA. Because the buffer is physically contiguous, the scatter/gather list contains only one element: the base address and length of the buffer. The driver uses that length and address, not the actual addresses of the data buffer fragments, to program the device for the DMA operation. In addition, because the map register buffers are located below the 4-GB physical address mark, such buffers are always within the addressing capability of any bus-master DMA device.
When the driver notifies the framework that the DMA transfer is complete, the framework in turn notifies the standard Windows DMA implementation, which determines whether map registers were used for the transfer and whether the operation was a read from the device to system memory. If so, Windows copies the contents of the map register buffers that were used to the original data buffer. It then frees the map registers, making them available for another DMA transfer.
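The steps above amount to bounce buffering. The following user-mode C sketch simulates them with ordinary memory. It is an illustration under stated assumptions, not the Windows implementation: the bounce array stands in for the contiguous low-memory map register buffers, and every type and function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { DMA_WRITE_TO_DEVICE, DMA_READ_FROM_DEVICE } dma_direction_t;

/* Hypothetical fragment of the caller's (noncontiguous) data buffer. */
typedef struct {
    unsigned char *data;
    size_t length;
} fragment_t;

/* Hypothetical record of one transfer that uses the bounce buffer. */
typedef struct {
    unsigned char *bounce;   /* stands in for contiguous map register buffers */
    size_t length;           /* total bytes in the transfer */
    dma_direction_t direction;
} bounce_transfer_t;

/* Steps 1 to 3: reserve space and, for a write, gather the fragments into
 * the bounce buffer. Returns false if the buffer is too small. The real
 * system would delay the request instead of failing. */
static bool bounce_begin(bounce_transfer_t *t, unsigned char *bounce,
                         size_t capacity, const fragment_t *frags,
                         size_t count, dma_direction_t direction)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += frags[i].length;
    }
    if (total > capacity) {
        return false;
    }
    t->bounce = bounce;
    t->length = total;
    t->direction = direction;
    if (direction == DMA_WRITE_TO_DEVICE) {
        size_t offset = 0;
        for (size_t i = 0; i < count; i++) {
            memcpy(bounce + offset, frags[i].data, frags[i].length);
            offset += frags[i].length;
        }
    }
    return true;
}

/* Final step: for a read, scatter the bounce buffer back to the fragments. */
static void bounce_complete(const bounce_transfer_t *t,
                            const fragment_t *frags, size_t count)
{
    if (t->direction != DMA_READ_FROM_DEVICE) {
        return;
    }
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(frags[i].data, t->bounce + offset, frags[i].length);
        offset += frags[i].length;
    }
}
```

Between bounce_begin and bounce_complete, the device would be programmed with a single base address and length that describe the bounce buffer, which is why the scatter/gather list in this case has only one element.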
By using map registers, the Windows DMA abstraction can support any transfer, regardless of the addressing capability of the device bus and the amount of system memory. As a result, even devices with only limited addressing capabilities can access all of physical memory under Windows.
This section describes how map registers make it possible for a DMA device to transfer data to any location in physical memory.
Suppose DMA Device 1 in Figure 17-1 is a 32-bit bus-master DMA device on Device Bus A. Because Device 1 is only 32-bit capable, it can present only addresses from 0x00000000 to 0xFFFFFFFF on the device bus. This represents 4 GB of addressing capability. If Device Bus A were directly connected to system memory, Device 1 could transfer data only to or from data buffers in the low 4 GB of the system's physical address space. Because of the way in which the Windows virtual memory system works, it is impossible to prevent a program that uses Device 1 from having its data buffers located above the 4-GB mark on a machine that has more than 4 GB of physical memory or, indeed, on any system that has memory located above the 4-GB physical address point. Without help from the system, every DMA transfer for Device 1 would have to either use buffers that are specifically allocated below the 4-GB mark or implement special processing for the physical addresses of any fragments of user data buffers that are located outside the device's addressing range.
The Windows DMA abstraction makes such special processing unnecessary. As part of setting up the DMA transfer, the Windows DMA functions allocate and program the map registers between Device Bus A and the memory bus to perform the required relocation. Map registers translate 32-bit device bus logical addresses to 36-bit memory bus physical addresses in the same way in which memory management registers in certain x86 machines translate 32-bit virtual addresses to 36-bit physical addresses.
In the standard Windows DMA implementation, map registers are implemented entirely in software as contiguous PAGE_SIZE buffers in system memory with a physical address that is less than 4 GB.
Each time a KMDF driver sets up a DMA transfer, the Windows DMA components determine whether map registers are required by checking the addressing capability of the device and the location of the data buffer that is being transferred. If any part of the data buffer is located outside the physical addressing capability of the device, the system uses map registers to perform the transfer. Thus, if a 32-bit DMA device sets up a transfer by using a data buffer that includes one or more fragments located above the 4-GB address mark, the transfer requires map registers.
Windows uses map registers only for the fragments of the data buffer that lie outside the device's addressing range. If the entire fragment resides in memory that the device can address, Windows provides the physical memory address of the fragment as that fragment's device bus logical address.
If a fragment does not lie within the device's addressing capability, the standard Windows DMA implementation performs the following actions:
Allocates a map register to contain the data within the fragment.
If sufficient buffer space is not available, the system delays the request until sufficient buffer space becomes free.
Copies the fragment's data from the original data buffer to the map register buffer if the transfer is a write operation (that is, a transfer from system memory to the device).
Provides the physical memory address of the map register buffer as the fragment's device bus logical address.
The framework uses the physical memory addresses that the system provides to build a scatter/gather list. Each element of the list thus represents the physical memory address of either a data buffer fragment or a map register. When the framework calls the driver's EvtProgramDma callback function, it passes the list as a parameter. The driver uses the elements of the scatter/gather list to program its device for the DMA operation.
When the driver and framework indicate that the DMA transfer is complete, the standard Windows DMA implementation determines whether it used map registers to support the transfer. If it did, and if the operation was a read from the device to system memory, the standard Windows DMA implementation copies the contents of any map register buffers that were used to the original data buffer. Windows then frees the map registers that were used for the transfer, making them available for another DMA transfer.
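The per-fragment decision described in this section can be sketched in C as follows. This is a simplified model, not the actual Windows or KMDF code: it assumes 4-KB pages, assumes that each fragment is no larger than one page, and omits the data copies. The sg_element_t and map_pool_t types and the build_sg_list function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-KB pages. Stands in for PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical scatter/gather element: an address and a length. */
typedef struct {
    uint64_t address;
    uint32_t length;
} sg_element_t;

/* Hypothetical pool of page-sized map register buffers in low memory. */
typedef struct {
    uint64_t base;     /* physical address of the first buffer */
    size_t capacity;   /* number of page-sized buffers in the pool */
    size_t used;       /* number of buffers currently allocated */
} map_pool_t;

/* For each fragment, emit either its own physical address (if the device
 * can reach it) or the address of a map register buffer. Returns false if
 * the pool is exhausted. The real system would delay the request instead. */
static bool build_sg_list(const sg_element_t *frags, size_t count,
                          uint64_t max_device_address, map_pool_t *pool,
                          sg_element_t *out)
{
    for (size_t i = 0; i < count; i++) {
        assert(frags[i].length > 0 && frags[i].length <= SKETCH_PAGE_SIZE);
        uint64_t last_byte = frags[i].address + frags[i].length - 1;
        out[i].length = frags[i].length;
        if (last_byte <= max_device_address) {
            out[i].address = frags[i].address;   /* device can reach it directly */
        } else {
            if (pool->used == pool->capacity) {
                return false;
            }
            out[i].address = pool->base + (uint64_t)pool->used * SKETCH_PAGE_SIZE;
            pool->used++;
        }
    }
    return true;
}
```

In this model, only the fragments above the device's limit consume a map register buffer. The other elements of the resulting list carry the original physical addresses unchanged.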