The most obvious way to provide RAID on Windows NT is to implement it in hardware, such as in the HBA or the storage device controller.
Several drivers in the Windows Server family implement (host-based) RAID, including the fault-tolerant FtDisk driver, the Logical Disk Manager (LDM) driver that ships with Windows 2000, and the VERITAS Logical Volume Manager (LVM) product available for Windows 2000 from VERITAS. All of these drivers were discussed in Chapter 1 and also in Chapter 6. The RAID support provided by these pieces of software is summarized in Table 9.1.
| RAID Level | FtDisk | Windows 2000 Logical Disk Manager | VERITAS Logical Volume Manager |
|---|---|---|---|
| RAID 0 | Yes | Yes | Yes |
| RAID 1 | Yes | Yes | Yes |
| RAID 5 | Yes | Yes | Yes |
| RAID 10 | No | No | Yes |
One way of ensuring high availability is literally to have two of everything. One can have a clustered server instead of a single server to ensure availability in case of a server failure.
Figure 9.8 shows a server with multiple HBAs. Each HBA is connected to a switch, and each switch has dual paths to the dual-ported storage device. The single point of failure in Figure 9.8 is the server. As pointed out earlier, clusters support the removal of this single point of failure. However, this section will consider a single server and explore the architecture in more detail. The focus is on the dual HBA within a single server, and once
Microsoft has announced native support for a multipath failover and load balancing solution for its Windows 2000 and Windows Server 2003 products. Microsoft provides a generic solution that the OEM or IHV needs to tune to take advantage of specific hardware features. The vendor needs to obtain a development kit that is available under a
Before getting into the details of the solution, it is worthwhile examining the situation to convince oneself that a problem indeed exists. Consider Figure 9.8 again, which shows a Windows NT server with two dual-ported HBAs. The HBAs all connect to dual-ported disks. To keep matters simple, assume that each storage disk is formatted as a single volume. The idea is that there are multiple I/O paths between the server and the disk storage units to provide fault tolerance. For the configuration shown in Figure 9.8, consider the device object hierarchy that the Windows storage driver stack will create.
Figure 9.9 shows the device object tree for the system configuration shown in Figure 9.8. Again, as explained in Chapter 1, note the pairs of physical device objects (PDOs) and functional device objects (FDOs) that cooperate to enable particular device functionality. Recall that PDOs, among other things, represent information required to use the device. For storage devices, that information would include the SCSI bus identifier, target identifier, and LUN. Recall that FDOs, among other things, represent information needed to access the device. For storage devices, a good example would be details of the disk organization.
Starting from the bottom of Figure 9.9, the PnP system finds the PCI bus, creates a PDO for the PCI bus, and loads the PCI bus driver. The PCI bus driver creates an FDO for the bus and attaches it to the PDO.
Next the SCSIPort or Storport driver behaves like a bus driver and enumerates devices on the SCSI bus. Because there are two (disk) devices on the bus, two (disk) devices are
Higher-level software entities will see four storage volumes, where in reality only two exist. Assuming that these two volumes are formatted for NTFS, the file system will assume there are four volumes, and NTFS will attempt to run on all four. Obviously the NTFS running on the volume residing on disk H1L1 and the NTFS running on volume H2L1 in Figure 9.9 would not be synchronized. So they would
To ensure proper operation, several
Microsoft designed the multipath architecture with several goals in mind, including but not limited to the following:
Coexisting with all other existing drivers and architecture, including Plug and Play (PnP) and power management. Indeed, the goal is not simply to coexist, but really to use the existing infrastructure; for example, to have device notifications flow using the existing PnP mechanism.
Providing dynamic discovery of devices and paths without requiring any special static configurations.
Providing a solution that allows for coexistence of multiple multipath solutions from different vendors, something that at this time is extremely difficult, if not altogether
Providing a generic solution, but one that
Providing a solution that allows up to 32 paths per LUN and works for both Fibre Channel and SCSI.
Figure 9.10 shows the Windows NT device tree in detail, with a multipath solution deployed, for the same configuration shown in Figure 9.9. The device driver tree shown also includes the various filter drivers and
The solution consists of four different pieces:
An upper filter driver called MPSPFLTR that is supplied by Microsoft.
A class driver called MPDEV supplied by Microsoft.
A pseudo bus driver called MPIO that is supplied by Microsoft.
A device-specific module (DSM) that needs to be supplied by the vendor that is building and selling the solution. The vendor licenses an MPIO development kit from Microsoft that contains the first three drivers already mentioned and provides all needed information (including header files and sample code) to build a DSM.
The first thing to note about Figure 9.10 is that there are actually two distinct device stacks: a logical device stack on the left of the figure and a physical device stack on the right of the figure. The MPIO software bridges the two.
The second thing to note is the similarity with respect to device trees for volumes on basic or dynamic disks (discussed in Chapter 6). This similarity should not be surprising, given that volumes are logical entities that may
In comparing Figures 9.9 and 9.10, notice that the MPSPFLTR driver is an upper filter driver on the adapter FDO. Another difference is the PDO/FDO pair created for the MPIO pseudo bus driver by PnP and the MPIO driver itself. Notice the private communication channel between the MPSPFLTR driver and the MPIO pseudo bus driver. Further, on the top left-hand side of Figure 9.10, note the two PDOs for the pseudo disks created by the MPIO bus driver. This ensures that the MPIO bus driver gets a chance to handle the I/O, and it, in
Attached to each PDO created by the MPIO are two DSM objects. One is in active use; the other is shown in a different box simply to
Mpdev.sys is a disk class replacement driver with some
Thus, MPDEV can handle requests from the MPIO stack (shown with the dot-dash line in Figure 9.10) because those requests come from the disk class driver (layered over the PDO created by MPIO) that
The device-specific module (DSM) is designed to provide some important functions, including the following:
Handling device-specific initialization.
Providing functionality to decide whether two LUNs accessed using two different paths are really the same LUN simply accessed in different ways. Microsoft expects to use a built-in identifier from storage and not a software-written signature on the media to allow the DSM to identify these LUNs. The generic DSM module provided by Microsoft accomplishes this using the serial number page (80h) or device identification page (83h) defined by the SCSI command set. Vendors are not limited to the use of just these two mechanisms.
Handling certain special SCSI commands, mostly related to device control and querying device capability, such as Read_Capacity, Reserve, Release, and Start_Stop_Unit, and deciding if the command will go down all paths or just a specific one.
Making routing decisions on I/O requests.
Handling errors.
Handling PnP- and power-related requests with the assistance of the library routines provided by Microsoft in the pseudo bus multipath driver.
Handling management-related requests that are delivered to a driver in the form of Windows Management Instrumentation (WMI; see Chapter 7) IRPs. The pseudo bus multipath driver locates the appropriate routines within the DSM and invokes them.
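The LUN-identification function in the list above can be illustrated with a minimal sketch. It assumes each path reports a device identifier string (such as one derived from page 80h or 83h) and groups the per-path views by that identifier. The `PathLun` record, the `group_by_device` helper, and the identifier values are hypothetical names chosen for illustration; they are not part of the MPIO kit.

```python
from collections import defaultdict
from typing import NamedTuple


class PathLun(NamedTuple):
    """One LUN as seen through one HBA path (hypothetical record)."""
    hba: str        # adapter the LUN was discovered through, e.g. "H1"
    lun: str        # LUN label on that path, e.g. "L1"
    device_id: str  # identifier reported by the device itself (assumed unique)


def group_by_device(views):
    """Map each unique device identifier to the HBAs that can reach it."""
    groups = defaultdict(list)
    for view in views:
        groups[view.device_id].append(view.hba)
    return dict(groups)


# Two HBAs each see the same two disks, so four path-level views exist.
views = [
    PathLun("H1", "L1", "serial-A"),
    PathLun("H1", "L2", "serial-B"),
    PathLun("H2", "L1", "serial-A"),
    PathLun("H2", "L2", "serial-B"),
]

# Grouping by identifier collapses the four views into two real devices.
pseudo_disks = group_by_device(views)
```

In this sketch the four views collapse into two entries, each listing both paths, which mirrors the two pseudo disks that sit above the physical paths in Figure 9.10.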
The DSM is implemented with the MPIO kit, which can be licensed from Microsoft. The DSM is implemented as a legacy driver that exports an interface for the benefit of the pseudo bus driver MPIO.
The pseudo bus multipath driver is loaded natively as part of the Windows NT operating system once the appropriate vendor package has been installed.
Upon initialization, the pseudo bus multipath driver
For all I/O requests, the pseudo bus multipath driver consults the DSM via a specified routine. The DSM has access to each IRP and can post a completion routine for the IRP if it so desires. For device control requests such as Reserve and Release, the DSM may direct the I/O to happen on all paths to the device. For regular I/O requests such as read or write, the DSM may direct I/O on any one path, depending on whether it is doing dynamic or static load balancing. On an I/O request completing with error, the pseudo bus multipath driver invokes the DSM at a specified entry point and the DSM may redirect the I/O to another path in an effort to perform failover.
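The routing behavior just described can be sketched in a few lines. This is a simplified model, not the DSM interface: the `dispatch` function, the `PathFailed` exception, and the `make_send` helper are hypothetical, and real requests are IRPs rather than command strings. The sketch sends Reserve and Release down every path, sends regular I/O down one path, and retries on the next path when a path reports an error.

```python
class PathFailed(Exception):
    """Raised by the hypothetical send function when a path returns an error."""


# Device control commands that this sketch sends down every path.
BROADCAST_COMMANDS = {"RESERVE", "RELEASE"}


def dispatch(command, paths, send, preferred=0):
    """Route one request and return the list of paths that carried it."""
    if command in BROADCAST_COMMANDS:
        for path in paths:
            send(path, command)
        return list(paths)
    # Regular I/O: start with the preferred path, then fail over in order.
    order = paths[preferred:] + paths[:preferred]
    for path in order:
        try:
            send(path, command)
            return [path]
        except PathFailed:
            continue
    raise PathFailed(f"all paths failed for {command}")


def make_send(failed_paths, log):
    """Build a send function that fails on the given paths and logs the rest."""
    def send(path, command):
        if path in failed_paths:
            raise PathFailed(path)
        log.append((path, command))
    return send


# A read is attempted on H1 first; H1 is down, so it fails over to H2.
log = []
read_paths = dispatch("READ", ["H1", "H2"], make_send({"H1"}, log))
```

A load-balancing policy would change only how `preferred` is chosen for each request; the failover loop stays the same.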
Several vendors offer a multipath solution that implements at the very least a failover solution, and some of them also offer failback, as well as load balancing.
These solutions do work. Now that they have been deployed for a while, however, two drawbacks have emerged:
The configuration can be cumbersome and somewhat confusing because the solution is not fully integrated with PnP, so dynamic discovery is not ensured.
The solution precludes interoperability with a solution from another vendor. This means that if a particular Windows server has a solution deployed from one vendor, the same Windows server cannot also have a solution from any other vendor deployed on it.
EMC has offered a failover and load balancing solution for Windows NT for quite a while. Figure 9.11 shows the architecture that EMC has implemented.
Unlike other architectures, EMC's architecture has a filter driver between the volume manager and SCSIPort or RAID port class drivers. For each logical volume that exists, the solution enumerates N logical volumes, where N is the number of independent ways in which the volume is accessed.
For each device with N different paths to access the device, Windows NT will see N logical devices. If I/O
The administrator may specify a policy for accomplishing load balancing. The possible policies are described as
The I/O requests are sprinkled to each path in turn, in
The next I/O is sent to the path that has the least number of pending requests.
The next I/O is sent to the path that has the least number of blocks pending.
An EMC Symmetrix optimization mode is used in which the next I/O is sent to the path that is estimated to have the least completion time.
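The first three policies above can be sketched as simple selection functions. This is an illustration only, not EMC's implementation: the `PathState` record and the function names are hypothetical, and the Symmetrix-specific mode is omitted because the text does not say how completion time is estimated.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class PathState:
    """Per-path counters that a load balancer might track (hypothetical)."""
    name: str
    pending_requests: int = 0
    pending_blocks: int = 0


def make_round_robin():
    """Return a picker that hands out paths in turn."""
    counter = count()

    def pick(paths):
        return paths[next(counter) % len(paths)]

    return pick


def least_requests(paths):
    """Pick the path with the fewest pending requests."""
    return min(paths, key=lambda p: p.pending_requests)


def least_blocks(paths):
    """Pick the path with the fewest pending blocks."""
    return min(paths, key=lambda p: p.pending_blocks)


paths = [
    PathState("H1", pending_requests=3, pending_blocks=24),
    PathState("H2", pending_requests=5, pending_blocks=8),
]
round_robin = make_round_robin()
rr_order = [round_robin(paths).name for _ in range(3)]
```

With these sample counters, the request-count policy and the block-count policy choose different paths, which is why the two are offered as separate options: a path with few requests may still have more data queued.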
HP (Compaq) offers a multipath failover and load balancing solution for Windows NT called SecurePath. The Compaq solution differs slightly between Windows NT 4.0 and Windows 2000.
Figure 9.12 shows the HP (Compaq) SecurePath architecture for Windows 2000. The solution consists of a block storage filter driver that is above the port (SCSIPort or Storport) class driver and below the disk class driver. A user mode service and user mode applications
On Windows NT 4.0, HP (Compaq) SecurePath requires the use of an HP-written disk class driver called HSZDisk (see Figure 9.13). The solution also involves a filter driver.
On Windows 2000, the failover and load balancing functionality is provided within a filter driver that HP calls Raidisk. On Windows 2000, the class driver provided by Microsoft is not
Failover
Load balancing (for nonclustered environments)
Failback when the fault is corrected
Path verification to the storage volumes
A user mode SecurePath Windows NT service provides administration capabilities and interacts with the SecurePath filter driver via private IOCTL control codes.
HP AutoPath offers dynamic load balancing and autofailover capabilities for Windows NT. As shown in Figure 9.14, HP implements AutoPath using a filter driver between the disk class driver and the port driver.
HP AutoPath load balancing performs according to a policy that the administrator sets. The choices of policy are:
Round-robin, in which I/O is distributed across the different paths in turn
No load balancing, in which all I/O to a particular storage device is statically sent down a path that the administrator can select
Shortest queue on the basis of outstanding requests, in which the I/O is sent to the path with the minimum number of outstanding requests
Shortest queue on the basis of outstanding bytes awaiting I/O
Shortest queue on the basis of service time, in which all outstanding requests queued to a path are summed up and the I/O is sent to the queue with the smallest total
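The difference between the request-count and service-time policies can be shown with a small sketch. It assumes each outstanding request carries an estimated service time in milliseconds; how such an estimate would be obtained is not stated in the text, so the numbers and function names here are purely illustrative.

```python
def shortest_by_count(queues):
    """Pick the path with the fewest outstanding requests."""
    return min(queues, key=lambda name: len(queues[name]))


def shortest_by_service_time(queues):
    """Pick the path whose outstanding requests sum to the least service time."""
    return min(queues, key=lambda name: sum(queues[name]))


# Each path maps to the assumed service times (ms) of its outstanding requests.
queues = {
    "H1": [2.0, 2.0, 2.0],  # three short requests, 6 ms in total
    "H2": [5.0, 4.0],       # two long requests, 9 ms in total
}

by_count = shortest_by_count(queues)
by_time = shortest_by_service_time(queues)
```

Here the count-based policy prefers H2 because it has fewer requests queued, while the service-time policy prefers H1 because its queue should drain sooner.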