Supporting DMA in a KMDF driver requires code in several of the driver's event callback functions, as Figure 17-3 shows.
Figure 17-3: DMA implementation in KMDF drivers
As the figure shows, DMA-related processing takes place in four phases:
1. During driver initialization, typically in the EvtDriverDeviceAdd callback, the driver initializes and creates the DMA enabler object and common-buffer object that are required to support the DMA device.
2. When an I/O request arrives that requires DMA, the driver creates a transaction object if it has not already done so and initiates the DMA transaction. This code typically is in the EvtIoRead, EvtIoWrite, or other I/O event callback, but might be in a different driver function if the driver has set up its queue for manual dispatching.
3. When the framework has set up the buffers that are required for the transfer, it calls the driver's EvtProgramDma callback function. This function programs the device hardware to perform a DMA transfer.
4. Each time the hardware completes a DMA transfer, the driver determines whether the entire transaction is complete, typically during the EvtInterruptDpc callback function. If so, the driver completes the I/O request. If not, the framework prepares the next transfer and repeats phase 3.
The following sections describe each of these processing phases in detail, using sample code that is based on the PLX9x5x sample KMDF driver provided with the WDK. The sample driver supports a PCI device that has port, memory, interrupt and DMA resources. The device can be stopped and started at runtime and supports low-power states. The hardware has two DMA channels, so the driver uses one channel for reads and the other for writes. The driver configures two sequential queues: one for read requests and the other for write requests.
During initialization, typically in the EvtDriverDeviceAdd callback, the driver configures and creates the DMA enabler object and, if the device supports common-buffer DMA, the common buffer object.
If the driver has specified a packet-based DMA profile, it must serialize all of its DMA transactions because the framework allows only one packet-based DMA transaction to execute at any given time. To implement the serialization, the driver should either dispatch all of the I/O requests that require DMA from the same sequential queue or implement manual dispatching and call the framework to get the next I/O request only after the previous request is complete.
If the driver configures a sequential queue to dispatch requests for DMA I/O, it should also create the WDF DMA transaction object during initialization. In this case, only one such request is ever active at a given time, so the driver can create a single DMA transaction object during initialization and can reuse that object for each DMA request.
The driver uses the DMA enabler object (that is, WDFDMAENABLER) to communicate with the framework about DMA transfers for a specific device object. The DMA enabler object maintains information about the DMA capabilities of the device.
Before creating the object, the driver must first initialize a WDF_DMA_ENABLER_CONFIG structure by calling the WDF_DMA_ENABLER_CONFIG_INIT macro. The macro initializes the structure with a DMA profile and the device's maximum transfer length.
The DMA profile indicates the device's basic DMA characteristics, such as whether it supports 64-bit addressing and hardware scatter/gather. Most of the names of the available DMA profiles are self-explanatory, but for completeness, Table 17-2 lists all of the profiles and their associated attributes.
Profile name | Capable of 64-bit addressing | Supports hardware scatter/gather | Supports simultaneous read and write operations |
---|---|---|---|
WdfDmaProfilePacket | No | No | No |
WdfDmaProfileScatterGather | No | Yes | No |
WdfDmaProfileScatterGatherDuplex | No | Yes | Yes |
WdfDmaProfilePacket64 | Yes | No | No |
WdfDmaProfileScatterGather64 | Yes | Yes | No |
WdfDmaProfileScatterGather64Duplex | Yes | Yes | Yes |
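
The rows of Table 17-2 can also be read as a small lookup table. The following user-mode sketch is only an illustration of that reading: `profile_info`, `find_profile`, and `choose_profile` are hypothetical names, not WDF types or methods, and the profile names are kept as strings so the example compiles without the WDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for one row of Table 17-2; not a WDF type. */
typedef struct {
    const char *name;      /* profile name as listed in the table */
    bool addr64;           /* capable of 64-bit addressing */
    bool scatter_gather;   /* supports hardware scatter/gather */
    bool duplex;           /* supports simultaneous read and write */
} profile_info;

static const profile_info k_profiles[] = {
    { "WdfDmaProfilePacket",                false, false, false },
    { "WdfDmaProfileScatterGather",         false, true,  false },
    { "WdfDmaProfileScatterGatherDuplex",   false, true,  true  },
    { "WdfDmaProfilePacket64",              true,  false, false },
    { "WdfDmaProfileScatterGather64",       true,  true,  false },
    { "WdfDmaProfileScatterGather64Duplex", true,  true,  true  },
};

#define PROFILE_COUNT (sizeof(k_profiles) / sizeof(k_profiles[0]))

/* Returns the table row for a profile name, or NULL if it is not listed. */
static const profile_info *find_profile(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < PROFILE_COUNT; ++i) {
        if (strcmp(k_profiles[i].name, name) == 0) {
            return &k_profiles[i];
        }
    }
    return NULL;
}

/* Returns the name of the row whose attributes match exactly, or NULL
   if the table has no such combination (for example, duplex without
   scatter/gather). */
static const char *choose_profile(bool addr64, bool scatter_gather, bool duplex)
{
    for (size_t i = 0; i < PROFILE_COUNT; ++i) {
        const profile_info *p = &k_profiles[i];
        if (p->addr64 == addr64 &&
            p->scatter_gather == scatter_gather &&
            p->duplex == duplex) {
            return p->name;
        }
    }
    return NULL;
}
```

Under these assumptions, a device with 64-bit addressing, hardware scatter/gather, and independent read and write channels maps to the last row of the table.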
After initializing the structure, the driver creates the DMA enabler object by calling WdfDmaEnablerCreate.
If the device performs common-buffer DMA, the driver also must create a common-buffer object (that is, WDFCOMMONBUFFER) by calling WdfCommonBufferCreate or WdfCommonBufferCreateWithConfig. These methods are identical, except that WdfCommonBufferCreateWithConfig takes as an additional input parameter a WDF_COMMON_BUFFER_CONFIG structure, which includes the alignment requirement for the buffer.
Next, the driver must get the system virtual address and the device bus logical address of the common buffer by calling two additional WDF methods:
WdfCommonBufferGetAlignedVirtualAddress, which returns the system virtual address of the common buffer.
WdfCommonBufferGetAlignedLogicalAddress, which returns the device bus logical address of the common buffer.
These addresses are required to program the device in the EvtProgramDma callback function.
The I/O requests for which the sample driver performs DMA are dispatched from a sequential queue. Therefore, the driver creates a DMA transaction object during initialization. Instead of creating and deleting a DMA transaction object for each request, the driver can reuse this object for each additional DMA transaction.
To create the object, the driver calls WdfDmaTransactionCreate, passing in a handle to the previously created DMA enabler object and receiving a handle to the newly created DMA transaction object. Both the framework and the driver use the DMA transaction object to manage the DMA operations for a given request.
The example in Listing 17-1 supports a hybrid device, so it demonstrates the required initialization for both common-buffer and packet-based DMA support. This example is from the Sys\Init.c file in the PLX9x5x sample driver.
Listing 17-1: DMA initialization
    WDF_DMA_ENABLER_CONFIG dmaConfig;

    WdfDeviceSetAlignmentRequirement( DevExt->Device,
                                      PCI9656_DTE_ALIGNMENT_16 );

    WDF_DMA_ENABLER_CONFIG_INIT( &dmaConfig,
                                 WdfDmaProfileScatterGather64Duplex,
                                 DevExt->MaximumTransferLength );

    status = WdfDmaEnablerCreate( DevExt->Device,
                                  &dmaConfig,
                                  WDF_NO_OBJECT_ATTRIBUTES,
                                  &DevExt->DmaEnabler );
    if (!NT_SUCCESS (status)) {
        . . . //Error-handling code omitted
    }

    // Allocate common buffer for building writes
    DevExt->WriteCommonBufferSize =
        sizeof(DMA_TRANSFER_ELEMENT) * DevExt->WriteTransferElements;

    status = WdfCommonBufferCreate( DevExt->DmaEnabler,
                                    DevExt->WriteCommonBufferSize,
                                    WDF_NO_OBJECT_ATTRIBUTES,
                                    &DevExt->WriteCommonBuffer );
    if (!NT_SUCCESS(status)) {
        . . . //Error-handling code omitted
    }

    DevExt->WriteCommonBufferBase =
        WdfCommonBufferGetAlignedVirtualAddress(DevExt->WriteCommonBuffer);
    DevExt->WriteCommonBufferBaseLA =
        WdfCommonBufferGetAlignedLogicalAddress(DevExt->WriteCommonBuffer);

    RtlZeroMemory( DevExt->WriteCommonBufferBase,
                   DevExt->WriteCommonBufferSize);

    WDF_OBJECT_ATTRIBUTES_INIT_CONTEXT_TYPE(&attributes, TRANSACTION_CONTEXT);

    status = WdfDmaTransactionCreate( DevExt->DmaEnabler,
                                      &attributes,
                                      &DevExt->ReadDmaTransaction);
    if(!NT_SUCCESS(status)) {
        . . . //Error-handling code omitted
    }
The example in Listing 17-1 performs the following tasks:
Sets the required alignment for the device object.
Initializes and creates a DMA enabler object.
Creates a common buffer.
Gets the addresses of the common buffer.
Creates a DMA transaction object.
The sample driver starts by setting the required alignment for the device object. The framework uses this value as the alignment for DMA if the driver does not specify an alignment requirement when it creates the common buffer.
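To make the alignment value concrete, the following user-mode sketch shows the arithmetic behind an alignment requirement. It assumes the requirement is expressed as a mask of the form 2^n - 1 (for example, 0xF for 16-byte alignment); that convention and the helper names `is_alignment_mask`, `align_up`, and `is_aligned` are assumptions made for illustration, not framework code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if mask has the form 2^n - 1 (0, 0x1, 0x3, 0x7, 0xF, ...). */
static bool is_alignment_mask(uint64_t mask)
{
    return (mask & (mask + 1)) == 0;
}

/* True if addr already satisfies the alignment described by mask. */
static bool is_aligned(uint64_t addr, uint64_t mask)
{
    return (addr & mask) == 0;
}

/* Rounds addr up to the next address that satisfies mask. */
static uint64_t align_up(uint64_t addr, uint64_t mask)
{
    assert(is_alignment_mask(mask));
    return (addr + mask) & ~mask;
}
```

With a mask of 0xF, an address of 0x1001 rounds up to 0x1010, while 0x1000 is already aligned and is returned unchanged.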
Next, the driver initializes a WDF_DMA_ENABLER_CONFIG structure by calling WDF_DMA_ENABLER_CONFIG_INIT. It specifies the DMA profile that best describes the device and the maximum transfer length that the device supports in a single DMA operation. As Listing 17-1 shows, the driver selects the WdfDmaProfileScatterGather64Duplex profile to indicate that the device supports 64-bit DMA transfers, hardware scatter/gather, and simultaneous read and write operations.
With the WDF_DMA_ENABLER_CONFIG structure initialized, the driver creates a new DMA enabler object by calling WdfDmaEnablerCreate. The driver stores the handle to the created object for later use.
This device uses a hybrid design (that is, it supports a combination of packet-based and common-buffer DMA), so the sample driver now creates a common buffer. It does this by calling WdfCommonBufferCreate, passing the length in bytes of the required common buffer. The allocated common-buffer area is not necessarily physically contiguous. By default, the common buffer has the same alignment that was specified earlier in the call to WdfDeviceSetAlignmentRequirement. Alternatively, the driver could call WdfCommonBufferCreateWithConfig to create the buffer and set the alignment requirement.
In addition to allocating the common-buffer space, WdfCommonBufferCreate and WdfCommonBufferCreateWithConfig allocate enough contiguous map registers to translate the physical addresses spanned by the common buffer to device bus logical addresses. These methods also program those map registers to perform the necessary translations between physical addresses and device bus logical addresses.
Next, the driver calls WdfCommonBufferGetAlignedVirtualAddress to get the kernel virtual address of the common buffer that it just created. The driver uses this address to manipulate the data structures in the common-buffer area that it shares with the device. The driver then calls WdfCommonBufferGetAlignedLogicalAddress to get the device bus logical address of the common buffer.
Finally, the driver creates a DMA transaction object by calling WdfDmaTransactionCreate, passing a handle to the DMA enabler object. The driver uses this transaction object for all DMA read requests.
When the driver receives an I/O request that requires DMA, it initiates the DMA transaction. Typically, this code appears in the EvtIoRead, EvtIoWrite, or other I/O event callback, but if the driver manually retrieves I/O requests from a queue, the code might appear elsewhere.
Before the driver can initiate a DMA transaction, it must initialize the DMA transaction object with information about the requested transfer. The framework uses this information, along with the DMA profile that the driver previously supplied in the DMA enabler object, to calculate the number of required map registers for the transfer and to create the scatter/gather list that the driver uses to program the device.
If the driver has not already created a DMA transaction object to use for this transaction, it must first create a new DMA transaction object by calling WdfDmaTransactionCreate.
The driver can then initialize the transaction by calling either WdfDmaTransactionInitializeUsingRequest or WdfDmaTransactionInitialize.
If the driver has received an I/O request from the framework, it uses WdfDmaTransactionInitializeUsingRequest to initialize the transaction object with data from the request object. This method takes as input a handle to the WDFREQUEST object to be processed, an enumeration constant that indicates whether the transfer moves data to or from the device, and a pointer to the driver's EvtProgramDma callback.
If the driver performs common-buffer DMA or performs DMA transactions that are not based on an I/O request, it calls WdfDmaTransactionInitialize to initialize the transaction object. In addition to the direction of the transfer and a pointer to the EvtProgramDma callback, this method takes as input a pointer to an MDL that describes the buffer to use for the transfer, the virtual address of the buffer, and the buffer length. The driver calls WdfRequestRetrieveInputWdmMdl for a write request or WdfRequestRetrieveOutputWdmMdl for a read request to get a pointer to the MDL, and then calls kernel memory manager functions to get its virtual address and length.
After it initializes the DMA transaction object, the driver can start processing the DMA transaction by calling WdfDmaTransactionExecute. Before beginning the DMA transaction, this method flushes any changed data in the processor cache back to system memory. It then calls the driver's EvtProgramDma callback to request that the driver program the device for this DMA transfer.
The following example draws again from the PLX9x5x sample driver. The code in Listing 17-2 shows the steps that a typical KMDF driver performs to initiate a DMA transfer. This code appears in the Read.c file.
Listing 17-2: DMA initiation
    VOID
    PLxEvtIoRead(
        IN WDFQUEUE Queue,
        IN WDFREQUEST Request,
        IN size_t Length
        )
    {
        NTSTATUS status = STATUS_UNSUCCESSFUL;
        PDEVICE_EXTENSION devExt;

        // Get the DevExt from the Queue handle
        devExt = PLxGetDeviceContext(WdfIoQueueGetDevice(Queue));

        do {
            // Validate the Length parameter.
            if (Length > PCI9656_SRAM_SIZE) {
                status = STATUS_INVALID_BUFFER_SIZE;
                break;
            }

            // Initialize the DmaTransaction.
            status = WdfDmaTransactionInitializeUsingRequest(
                         devExt->ReadDmaTransaction,
                         Request,
                         PLxEvtProgramReadDma,
                         WdfDmaDirectionReadFromDevice );
            if(!NT_SUCCESS(status)) {
                . . . //Error-handling code omitted
                break;
            }

            // Execute this DmaTransaction.
            status = WdfDmaTransactionExecute( devExt->ReadDmaTransaction,
                                               WDF_NO_CONTEXT);
            if(!NT_SUCCESS(status)) {
                . . . //Error-handling code omitted
                break;
            }

            // Indicate that the DMA transaction started successfully.
            // The DPC routine will complete the request when the DMA
            // transaction is complete.
            status = STATUS_SUCCESS;

        } while (0);

        // If there are errors, clean up and complete the request.
        if (!NT_SUCCESS(status )) {
            WdfDmaTransactionRelease(devExt->ReadDmaTransaction);
            WdfRequestComplete(Request, status);
        }

        return;
    }
The example in Listing 17-2 shows the driver's EvtIoRead callback, which performs the following tasks to initiate a DMA transaction:
Gets a pointer to the device context area, which contains the handle to the DMA transaction object.
Validates the transfer length.
Initializes the transaction.
Starts the transaction.
The driver starts by calling the PLxGetDeviceContext accessor function to get a pointer to its WDFDEVICE context area. It then validates the length of the transfer.
The driver next calls WdfDmaTransactionInitializeUsingRequest to associate the request that the framework passed to its EvtIoRead callback with the DMA transaction object that it previously created. As input parameters, this function takes handles to both the I/O request object and the DMA transaction object. It also takes as input a transfer direction indicator (WdfDmaDirectionReadFromDevice or WdfDmaDirectionWriteToDevice) and a pointer to the driver's EvtProgramDma callback, which is named PLxEvtProgramReadDma. WdfDmaTransactionInitializeUsingRequest validates the parameters for the request and sets up as much of the internal infrastructure as possible.
If WdfDmaTransactionInitializeUsingRequest completed successfully, the driver calls WdfDmaTransactionExecute. This method:
Determines the length of the DMA transfer.
The length of the DMA transfer depends on whether the current I/O request can be satisfied with one transfer or whether, because of size constraints imposed by the device or constraints on the availability of map registers, the transaction must be divided into multiple transfers. If the framework can process the entire request in a single DMA transfer, then it does so. If not, the framework divides the transaction into multiple DMA transfers and processes them serially.
Requests that Windows make the processor cache coherent with system memory for the purposes of a DMA request.
Allocates and initializes the necessary resources to perform the transfer.
This step includes allocating and programming any necessary map registers and building the scatter/gather list that will be passed to the driver.
Calls the driver's EvtProgramDma callback, passing a pointer to the created list of device bus logical base address and length pairs, so that the driver can program the device to initiate the DMA operation.
If an error occurs during initiation, the driver calls WdfDmaTransactionRelease to free the resources that the framework set up for the transaction without deleting the transaction object. The driver then completes the I/O request with an error status in the usual way.
If WdfDmaTransactionExecute determines that multiple DMA transfers are necessary to fulfill the DMA transaction, the framework performs these transfers serially. That is, the framework determines the length for the first transfer and calls the driver's EvtProgramDma callback to program the device for that transfer. Later, after the first transfer is complete and the driver calls WdfDmaTransactionDmaCompleted, typically from its EvtInterruptDpc function, the framework determines whether the entire transaction has been completed. If not, the framework calculates the length of the next transfer and calls the driver's EvtProgramDma callback again to perform the transfer. This cycle repeats until the entire DMA transaction is complete. The driver can then complete the associated I/O request.
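The cycle described above can be modeled in a few lines of ordinary C. The following sketch is a user-mode simulation, not framework code: `sim_dma`, `sim_dma_execute`, and `sim_dma_completed` are hypothetical stand-ins for the transaction object, WdfDmaTransactionExecute, and WdfDmaTransactionDmaCompleted, and the model assumes that the only limit on a transfer is a fixed maximum length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sim_dma sim_dma;

/* Stand-in for the driver's EvtProgramDma callback. */
typedef bool (*sim_program_fn)(sim_dma *tx, size_t offset, size_t length);

struct sim_dma {
    size_t total;            /* bytes in the whole transaction */
    size_t max_transfer;     /* largest single transfer in this model */
    size_t done;             /* bytes completed so far */
    size_t current;          /* length of the transfer in flight */
    sim_program_fn program;  /* called once per transfer */
    unsigned program_calls;  /* how many transfers were programmed */
};

/* Sizes the next transfer and asks the callback to program it. */
static bool sim_stage_next(sim_dma *tx)
{
    size_t remaining = tx->total - tx->done;
    tx->current = remaining < tx->max_transfer ? remaining : tx->max_transfer;
    tx->program_calls++;
    return tx->program(tx, tx->done, tx->current);
}

/* Models starting the transaction: programs the first transfer. */
static bool sim_dma_execute(sim_dma *tx)
{
    assert(tx != NULL && tx->program != NULL);
    assert(tx->total > 0 && tx->max_transfer > 0);
    tx->done = 0;
    tx->program_calls = 0;
    return sim_stage_next(tx);
}

/* Models the completion call made after each transfer finishes.
   Returns true when the whole transaction is complete; otherwise it
   stages the next transfer and returns false. */
static bool sim_dma_completed(sim_dma *tx)
{
    tx->done += tx->current;
    tx->current = 0;
    if (tx->done == tx->total) {
        return true;
    }
    (void)sim_stage_next(tx);
    return false;
}

/* Trivial callback that accepts any nonempty transfer. */
static bool sim_accept_transfer(sim_dma *tx, size_t offset, size_t length)
{
    (void)tx;
    (void)offset;
    return length > 0;
}
```

In this model, a 10,000-byte transaction with a 4,096-byte limit is programmed three times (4,096, 4,096, and 1,808 bytes), and only the third completion call reports that the transaction is finished.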
A driver's EvtProgramDma callback function programs the DMA device to perform a transfer. The framework passes the driver a pointer to a scatter/gather list, which contains one or more pairs of a device bus logical address and a length that together describe the transfer. The driver uses these address/length pairs to program the device for the DMA transfer.
The following is the prototype for this callback function:
    typedef BOOLEAN
    (*PFN_WDF_PROGRAM_DMA) (
        IN WDFDMATRANSACTION    Transaction,
        IN WDFDEVICE            Device,
        IN WDFCONTEXT           Context,
        IN WDF_DMA_DIRECTION    Direction,
        IN PSCATTER_GATHER_LIST SgList
        );
where:
Transaction | A handle to the DMA transaction object that represents the current DMA transaction. |
Device | A handle to a framework device object. |
Context | The context pointer that the driver specified in a previous call to WdfDmaTransactionExecute. |
Direction | An enumeration constant of the WDF_DMA_DIRECTION type that indicates the direction of the DMA transfer operation. |
SgList | A pointer to a SCATTER_GATHER_LIST structure. |
The EvtProgramDma function should return TRUE if it successfully starts the DMA transfer, and FALSE otherwise.
After the framework sets up a transaction, it calls the driver's EvtProgramDma callback function to perform a DMA transfer. This function should perform the following steps:
Determine the offset into the buffer at which to start the transfer.
Set up the addresses and lengths to use in programming the device.
Program the device and start the transfer.
Release or delete the transaction if errors occur.
Remember that a single DMA transaction can involve more than one DMA transfer operation. This could happen if the I/O request involves a large amount of data, if the device has a limited capacity to transfer data, or if system resources are so constrained that the size of the buffer or the number of map registers is limited.
If this is the first transfer to be performed for this request, the driver programs the device to start transferring data from the beginning of the buffer. However, if one or more transfers have already been performed for this transaction, the transfer typically must start at some offset from the beginning of the buffer. To determine the offset and, by inference, find out whether this is the first transfer, the driver calls WdfDmaTransactionGetBytesTransferred, passing a handle to the current DMA transaction object. The method returns the number of bytes that have already been transferred for the transaction or, if no transfers have been performed yet, it returns zero. The driver can use the returned value as the offset into the buffer.
After it determines the offset, the driver should set up the required data structures to program the device. The specific details vary from one device to another, but a typical DMA device requires a base address and a length for each component of the transfer. The EvtProgramDma callback receives the base address/length pairs in the scatter/gather list parameter. The number of elements in the scatter/gather list depends on the type of DMA being performed and the type of device. For packet-based DMA, the list contains a single pair. For a device that supports hardware scatter/gather, the list can contain multiple pairs. The driver translates these pairs into a form that the device can understand.
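As an illustration of that translation step, the following user-mode sketch converts a list of address/length pairs into a chain of device descriptors. Everything here is hypothetical: `sg_pair` is a simplified stand-in for an element of the scatter/gather list, and the `dev_descriptor` layout and `DESC_LAST` flag are invented for the example, because real hardware defines its own descriptor format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one address/length pair. */
typedef struct {
    uint64_t address;   /* device bus logical address */
    uint32_t length;    /* length of this fragment in bytes */
} sg_pair;

/* Hypothetical device descriptor; real devices define their own layout. */
typedef struct {
    uint32_t addr_low;
    uint32_t addr_high;
    uint32_t length;
    uint32_t flags;
} dev_descriptor;

#define DESC_LAST 0x1u   /* hypothetical end-of-chain marker */

/* Fills out[] with one descriptor per pair and marks the final one.
   Returns the number of descriptors written, or 0 if the list is
   empty or does not fit in the descriptor array. */
static size_t build_descriptors(const sg_pair *pairs, size_t count,
                                dev_descriptor *out, size_t capacity)
{
    assert(pairs != NULL && out != NULL);
    if (count == 0 || count > capacity) {
        return 0;
    }
    for (size_t i = 0; i < count; ++i) {
        out[i].addr_low  = (uint32_t)(pairs[i].address & 0xFFFFFFFFu);
        out[i].addr_high = (uint32_t)(pairs[i].address >> 32);
        out[i].length    = pairs[i].length;
        out[i].flags     = (i + 1 == count) ? DESC_LAST : 0u;
    }
    return count;
}
```

A packet-based transfer would pass a single pair and produce a one-element chain; a scatter/gather transfer passes several pairs, and only the final descriptor carries the end-of-chain flag.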
Next, the driver programs the device and starts the transfer. Before accessing the device registers, the driver acquires the interrupt spin lock for the device. This spin lock raises IRQL to DIRQL for the device and thus ensures that the device does not attempt to interrupt while the driver is changing the register values. Because code that is protected by this lock runs at a high IRQL, the driver should hold the lock for a minimal length of time. The lock should only protect code that physically accesses the device registers. All calculations and setup should occur outside the lock.
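The division of work around the lock can be sketched as follows. This is a user-mode illustration only: `fake_device` stands in for both the hardware registers and the interrupt lock, and the register layout and the `CTRL_START` and `CTRL_READ` bits are hypothetical. The point is the shape of the code: every value is computed before the lock is taken, and only the register writes happen while it is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CTRL_START 0x1u   /* hypothetical: begin the transfer */
#define CTRL_READ  0x2u   /* hypothetical: device-to-memory direction */

/* Stand-in for device registers plus bookkeeping for the fake lock. */
typedef struct {
    uint32_t regs[3];              /* 0: address low, 1: address high, 2: control */
    bool lock_held;
    unsigned writes_under_lock;
    unsigned writes_without_lock;
} fake_device;

/* Register values prepared ahead of time, outside the lock. */
typedef struct {
    uint32_t addr_low;
    uint32_t addr_high;
    uint32_t control;
} reg_values;

/* All calculation happens here; no lock is needed. */
static reg_values compute_values(uint64_t logical_address, bool read)
{
    reg_values v;
    v.addr_low  = (uint32_t)(logical_address & 0xFFFFFFFFu);
    v.addr_high = (uint32_t)(logical_address >> 32);
    v.control   = CTRL_START | (read ? CTRL_READ : 0u);
    return v;
}

/* Writes one register and records whether the lock was held. */
static void write_reg(fake_device *dev, unsigned index, uint32_t value)
{
    assert(index < 3);
    dev->regs[index] = value;
    if (dev->lock_held) {
        dev->writes_under_lock++;
    } else {
        dev->writes_without_lock++;
    }
}

/* Holds the fake lock only for the three register writes. */
static void start_transfer(fake_device *dev, const reg_values *v)
{
    assert(!dev->lock_held);
    dev->lock_held = true;         /* models acquiring the interrupt lock */
    write_reg(dev, 0, v->addr_low);
    write_reg(dev, 1, v->addr_high);
    write_reg(dev, 2, v->control); /* the control write starts the transfer */
    dev->lock_held = false;        /* models releasing the interrupt lock */
}
```

Keeping `compute_values` outside `start_transfer` mirrors the guidance above: the time spent with the lock held is bounded by three register writes, regardless of how much setup the transfer needs.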
If the driver successfully starts the DMA transfer, the EvtProgramDma callback returns TRUE. If errors occur, the driver should cancel the current transaction and return FALSE.
The example in Listing 17-3 is from the PLX9x5x sample's EvtProgramDma callback function for read requests in the Read.c file. Much of this function is device-specific code that is not reproduced here.
Listing 17-3: Sample EvtProgramDma callback
    BOOLEAN
    PLxEvtProgramReadDma(
        IN WDFDMATRANSACTION    Transaction,
        IN WDFDEVICE            Device,
        IN WDFCONTEXT           Context,
        IN WDF_DMA_DIRECTION    Direction,
        IN PSCATTER_GATHER_LIST SgList
        )
    {
        PDEVICE_EXTENSION     devExt;
        size_t                offset;
        PDMA_TRANSFER_ELEMENT dteVA;
        ULONG_PTR             dteLA;
        BOOLEAN               errors;
        ULONG                 i;

        devExt = PLxGetDeviceContext(Device);
        errors = FALSE;

        // Get the number of bytes already transferred for this transaction.
        offset = WdfDmaTransactionGetBytesTransferred(Transaction);

        // Set up the addresses to use in programming the device
        ... //Device-specific code omitted

        // Acquire the interrupt spin lock for the device and start DMA.
        WdfInterruptAcquireLock( devExt->Interrupt );
        ... //Device-specific code that programs device registers for DMA.
        WdfInterruptReleaseLock( devExt->Interrupt );

        // If errors occur in the EvtProgramDma callback,
        // release the DMA transaction object and complete the request.
        if (errors) {
            NTSTATUS status;
            WDFREQUEST request;

            (VOID) WdfDmaTransactionDmaCompletedFinal(Transaction, offset, &status);

            // Get the associated request from the transaction.
            request = WdfDmaTransactionGetRequest(Transaction);
            WdfDmaTransactionRelease(Transaction);
            WdfRequestCompleteWithInformation(request,
                                              STATUS_INVALID_DEVICE_STATE, 0);
            return FALSE;
        }

        return TRUE;
    }
First, the driver initializes its local variables and then calls WdfDmaTransactionGetBytesTransferred to determine how many bytes of data have already been transferred for this transaction. It uses the returned value to determine the offset into the buffer at which to start the transfer.
Next, the driver translates the address/length pairs from the scatter/gather list into a form that the device can use and sets up the data structures that it requires to program the device. Then the driver calls WdfInterruptAcquireLock to acquire the interrupt spin lock for the device, accesses the device registers to program the device, and calls WdfInterruptReleaseLock to release the spin lock.
If errors occur during device programming, the driver calls WdfDmaTransactionDmaCompletedFinal, which indicates to the framework that the transaction is complete but that not all of the data was transferred. This method takes a handle to the transaction object, the number of bytes that were successfully transferred, and a pointer to a location to receive a status value. Although the method is defined as a Boolean, it always returns TRUE, so the sample driver casts it to VOID. The driver then releases the DMA transaction object for later reuse, completes the I/O request with the STATUS_INVALID_DEVICE_STATE failure status, and returns FALSE from the callback function.
If the driver programs the device successfully, the callback function returns TRUE. The device interrupts to indicate that the transfer is complete.
Typically, a DMA device signals an interrupt when it completes a transfer. The driver performs minimal processing in the EvtInterruptIsr callback function and queues an EvtInterruptDpc callback function. The EvtInterruptDpc callback is generally responsible for processing the completed DMA transfer.
When the device signals the completion of a DMA transfer, the driver must determine whether the entire transaction is complete. If so, it completes the I/O request and deletes or releases the DMA transaction object.
The framework sets up the individual DMA transfers and keeps track of the number of bytes of data that each transfer involves. When a transfer is complete, the driver notifies the framework by calling WdfDmaTransactionDmaCompleted or WdfDmaTransactionDmaCompletedWithLength. The driver passes a handle to the DMA transaction object and receives an NTSTATUS value from both methods. The only difference between the two methods is that WdfDmaTransactionDmaCompletedWithLength also takes an input parameter that supplies the number of bytes that the device transferred in the just-completed operation, which is useful for devices that report this information.
Both of these methods do the following:
Flush any remaining data from the Windows cache.
Free the shared resources, such as map registers, that the framework allocated to support the transfer.
Determine whether the completion of this transfer also completes the entire DMA transaction.
If so, the method returns TRUE; if not, the method returns FALSE and the STATUS_MORE_PROCESSING_REQUIRED status.
If the entire transaction is complete, the driver completes the associated I/O request.
If the entire DMA transaction is not complete, one or more additional transfers are required to complete the transaction. The framework then allocates the necessary resources for the next transfer and calls the EvtProgramDma callback again to perform another transfer.
Once again using the PLX9x5x sample driver as a general guide, the code example in Listing 17-4 illustrates the steps that a typical KMDF driver performs to complete a DMA transfer. The sample code is based on the read completion processing in the driver's EvtInterruptDpc callback in the Isrdpc.c file.
Listing 17-4: DMA completion processing
    // status (NTSTATUS) and request (WDFREQUEST) are declared
    // elsewhere in the callback and are not shown in this fragment.
    if (readComplete) {
        BOOLEAN           transactionComplete;
        WDFDMATRANSACTION dmaTransaction;
        size_t            bytesTransferred;

        // Get the current Read DmaTransaction.
        dmaTransaction = devExt->CurrentReadDmaTransaction;

        // Indicate that this DMA operation has completed:
        // This may start the transfer on the next packet if
        // there is still data to be transferred.
        transactionComplete =
            WdfDmaTransactionDmaCompleted( dmaTransaction, &status );

        if (transactionComplete) {
            // Complete the DmaTransaction and the request.
            devExt->CurrentReadDmaTransaction = NULL;
            bytesTransferred = ((NT_SUCCESS(status)) ?
                WdfDmaTransactionGetBytesTransferred(dmaTransaction): 0 );
            WdfDmaTransactionRelease(dmaTransaction);
            WdfRequestCompleteWithInformation(request, status, bytesTransferred);
        }
    }
This sample code fragment shows how a typical KMDF driver handles DMA transfer completion. The driver executes this code after the device interrupts to indicate that the read operation is complete.
The example begins by getting a handle to the DMA transaction object for the current read operation. The driver uses this handle to call WdfDmaTransactionDmaCompleted. This method notifies the framework that the current transfer for the DMA transaction is complete. It returns a Boolean value that indicates whether the entire transaction is now complete and an NTSTATUS value that indicates success or failure.
If WdfDmaTransactionDmaCompleted returns TRUE, the driver begins completing the current request by setting to NULL the location in its device object context area that holds the handle for the current DMA transaction object. If the transfer completed successfully, the driver retrieves the number of bytes that the DMA transaction transferred by calling WdfDmaTransactionGetBytesTransferred; otherwise, it reports zero bytes. Now that the entire transaction is complete, the driver releases the DMA transaction object by calling WdfDmaTransactionRelease. Finally, the driver completes the I/O request in the usual manner, by calling WdfRequestCompleteWithInformation and passing the status and number of bytes transferred.
If WdfDmaTransactionDmaCompleted returns FALSE, the EvtInterruptDpc callback performs no more processing for this DMA transaction because the framework immediately calls the driver's EvtProgramDma callback to process the next transfer associated with the transaction.