9.3 High Availability Using Redundancy

One way of ensuring high availability is literally to have two of everything. One can have a clustered server instead of a single server to ensure availability in case of a server failure.

Figure 9.8 shows a server with multiple HBAs. Each HBA is connected to a switch, and each switch has dual paths to the dual-ported storage device. The single point of failure in Figure 9.8 is the server. As pointed out earlier, clusters support the removal of this single point of failure. However, this section considers a single server and explores the architecture in more detail. The focus is on the dual HBAs within a single server; once that architecture is understood, it can be applied to the other servers within a cluster as well. Simply having two HBAs in a Windows NT server is not enough; one also needs special software. Sections 9.3.1 and 9.3.2 explore details of that software.

Figure 9.8. High-Availability Configuration


9.3.1 Microsoft Multipath Support in Windows 2000 and Windows Server 2003

Microsoft has announced native support for a multipath failover and load balancing solution for its Windows 2000 and Windows Server 2003 products. Microsoft provides a generic solution that the OEM or IHV needs to tune to take advantage of specific hardware features. The vendor needs to obtain a development kit that is available under a nondisclosure agreement. The end user can obtain a complete solution only from the vendor and not directly from Microsoft.

Before getting into the details of the solution, it is worthwhile examining the situation to convince oneself that a problem indeed exists. Consider Figure 9.8 again, which shows a Windows NT server with two dual-ported HBAs. The HBAs all connect to dual-ported disks. To keep matters simple, assume that each storage disk is formatted as a single volume. The idea is that there are multiple I/O paths between the server and the disk storage units to provide fault tolerance. For the configuration shown in Figure 9.8, consider the device object hierarchy that the Windows storage driver stack will create.

Figure 9.9 shows the device object tree for the system configuration shown in Figure 9.8. Again, as explained in Chapter 1, note the pairs of physical device objects (PDOs) and functional device objects (FDOs) that cooperate to enable particular device functionality. Recall that PDOs, among other things, represent information required to use the device. For storage devices, that information would include the SCSI bus identifier, target identifier, and LUN. Recall that FDOs, among other things, represent information needed to access the device. For storage devices, a good example would be details of the disk organization.

Figure 9.9. Device Object Tree without Multipath


Starting from the bottom of Figure 9.9, the PnP system finds the PCI bus, creates a PDO for the PCI bus, and loads the PCI bus driver. The PCI bus driver creates an FDO for the bus and attaches it to the PDO. Next, devices on the PCI bus are enumerated, and as a result the two HBAs are located. The PCI bus driver creates two PDOs, one for each adapter. The PnP system locates the driver for these adapters and loads either the SCSIPort or the Storport driver, along with the vendor-written miniport. Either Storport or SCSIPort creates an FDO for each adapter and attaches it to the respective PDO.

Next the SCSIPort or Storport driver behaves like a bus driver and enumerates devices on the SCSI bus. Because there are two (disk) devices on the bus, two (disk) devices are reported. In addition, because there are two SCSI adapters and enumeration is done on both of them, each adapter reports two devices. Thus, SCSIPort or Storport, as the case may be, sees four disk devices. The PDOs for these four disk devices are created, the disk class driver is loaded, and the disk class driver creates four FDOs and attaches each one to its respective PDO. Without a multipath I/O solution, the partitions on the disk FDOs would be enumerated, and either FtDisk or the Logical Disk Manager (volume manager) would be loaded to handle the volumes that exist on those partitions. (For simplicity, assume that each disk has only one partition and that each partition constitutes an independent volume.)
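
The following is a minimal sketch of the generic WDM AddDevice pattern just described: the PnP manager hands the function driver the PDO that the bus driver created, and the function driver creates an FDO and attaches it on top of that PDO. The names, the extension layout, and the FILE_DEVICE_DISK device type are illustrative choices, not taken from any of the drivers discussed here.

```c
/*
 * Minimal sketch of the generic WDM AddDevice pattern: create an FDO and
 * attach it on top of the PDO that the bus driver reported.
 */
#include <ntddk.h>

typedef struct _SAMPLE_EXTENSION {
    PDEVICE_OBJECT LowerDeviceObject;  /* object the FDO was attached on top of */
    PDEVICE_OBJECT Pdo;                /* the PDO reported by the bus driver */
} SAMPLE_EXTENSION, *PSAMPLE_EXTENSION;

NTSTATUS
SampleAddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT PhysicalDeviceObject)
{
    PDEVICE_OBJECT fdo;
    PSAMPLE_EXTENSION ext;
    NTSTATUS status;

    /* Create the FDO, which represents how the device is accessed. */
    status = IoCreateDevice(DriverObject, sizeof(SAMPLE_EXTENSION), NULL,
                            FILE_DEVICE_DISK, 0, FALSE, &fdo);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    ext = (PSAMPLE_EXTENSION)fdo->DeviceExtension;
    ext->Pdo = PhysicalDeviceObject;

    /* Attach the FDO on top of the PDO; IRPs not handled here are sent down. */
    ext->LowerDeviceObject = IoAttachDeviceToDeviceStack(fdo, PhysicalDeviceObject);
    if (ext->LowerDeviceObject == NULL) {
        IoDeleteDevice(fdo);
        return STATUS_NO_SUCH_DEVICE;
    }

    fdo->Flags |= DO_DIRECT_IO;
    fdo->Flags &= ~DO_DEVICE_INITIALIZING;
    return STATUS_SUCCESS;
}
```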

Higher-level software entities will thus see four storage volumes, where in reality only two exist. Assuming that these two volumes are formatted for NTFS, the file system will see four volumes, and NTFS will attempt to mount all four. Obviously the NTFS instance running on the volume residing on disk H1L1 and the NTFS instance running on volume H2L1 in Figure 9.9 would not be synchronized, so they would certainly overwrite each other's data (for example, the data in the log file), and the result would be volume corruption.

To ensure proper operation, several vendors have implemented solutions that not only prevent this problem, but also provide a rich set of additional functionality, such as failover, failback, and load balancing. Failover refers to functionality that automatically moves I/O from a failed I/O path to a different I/O path. Failback refers to functionality in which a failed I/O path is repaired, enabling the system to go back to using the repaired path. Load balancing refers to the distribution of I/O across all available I/O paths using a particular algorithm. The algorithm could distribute I/O in round-robin fashion, on the basis of the outstanding I/O on each path, simply across all I/O paths, or in some other way. The Microsoft solution is described next, followed by a description of what some other vendors currently ship.
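
As a concrete illustration of one of the load balancing algorithms just mentioned, the sketch below selects a path in round-robin fashion among the paths that are still healthy. The PATH_SET structure and its fields are assumptions made for illustration; a real multipath driver would keep equivalent per-LUN state.

```c
/* Sketch of a round-robin path selection policy over the healthy paths. */
#include <ntddk.h>

#define MAX_PATHS 32    /* the Microsoft architecture allows up to 32 paths per LUN */

typedef struct _PATH_SET {
    ULONG   PathCount;              /* number of configured paths */
    BOOLEAN PathHealthy[MAX_PATHS]; /* FALSE once a path has failed */
    LONG    NextPath;               /* round-robin cursor */
} PATH_SET, *PPATH_SET;

/* Return the index of the next healthy path, or -1 if every path has failed. */
LONG
SelectPathRoundRobin(PPATH_SET Paths)
{
    ULONG attempts;

    for (attempts = 0; attempts < Paths->PathCount; attempts++) {
        /* InterlockedIncrement keeps the cursor consistent across CPUs. */
        ULONG candidate =
            (ULONG)InterlockedIncrement(&Paths->NextPath) % Paths->PathCount;
        if (Paths->PathHealthy[candidate]) {
            return (LONG)candidate;
        }
    }
    return -1;  /* no healthy path left: the I/O must fail back up the stack */
}
```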

Microsoft designed the multipath architecture with several goals in mind, including but not limited to the following:

  • Coexisting with all other existing drivers and architecture, including Plug and Play (PnP) and power management. Indeed, the goal is not simply to coexist, but to use the existing infrastructure; for example, device notifications flow through the existing PnP mechanism.

  • Providing dynamic discovery of devices and paths without requiring any special static configurations.

  • Providing a solution that allows for coexistence of multiple multipath solutions from different vendors, something that at this time is extremely difficult, if not altogether impossible.

  • Providing a generic solution, but one that leaves room for OEMs, IHVs, and ISVs to add value such as load balancing or failback. The sample device-specific module (DSM) provided by Microsoft does offer some load balancing features, but this load balancing is most efficient when used in a static way; for example, all I/O to LUN 1 uses path 1, and all I/O to LUN 2 uses path 2.

  • Providing a solution that allows up to 32 paths per LUN and works for both Fibre Channel and SCSI.

Figure 9.10 shows the Windows NT device tree in detail, with a multipath solution deployed, for the same configuration shown in Figure 9.9. The device driver tree shown also includes the various filter drivers and related device objects for the Microsoft multipath architecture.

Figure 9.10. Device Object Tree with Multipath Solution


The solution consists of four different pieces:

  1. An upper filter driver called MPSPFLTR that is supplied by Microsoft.

  2. A class driver called MPDEV supplied by Microsoft.

  3. A pseudo bus driver called MPIO that is supplied by Microsoft.

  4. A DSM that needs to be supplied by the vendor that is building and selling the solution. The vendor licenses an MPIO development kit from Microsoft that contains the first three drivers already mentioned and provides all needed information (including header files and sample code) to build a DSM.

The first thing to note about Figure 9.10 is that there are actually two distinct device stacks: a logical device stack on the left of the figure and a physical device stack on the right of the figure. The MPIO software bridges the two.

The second thing to note is the similarity with respect to device trees for volumes on basic or dynamic disks (discussed in Chapter 6). This similarity should not be surprising, given that volumes are logical entities that may encompass multiple LUNs or part of an individual LUN, and that the MPIO infrastructure is trying to map LUNs that are visible via multiple paths to one logical LUN. The functionality of the Partition Manager while handling partitions is fairly similar to the functionality of the MPSPFLTR driver. Both pay particular attention to IRP_MN_QUERY_DEVICE_RELATIONS IRPs and forward details of the objects reported to their respective partners: the volume manager in one case and the pseudo bus multipath driver MPIO in the other. Both the Partition Manager and the MPSPFLTR driver own responsibility for keeping their counterparts (volume manager and MPIO pseudo bus driver, respectively) informed about PnP and power events.
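
The actual communication channel between MPSPFLTR and MPIO is private to Microsoft's implementation. The sketch below shows only the generic WDM pattern such a filter can use: attach a completion routine to IRP_MN_QUERY_DEVICE_RELATIONS and inspect the DEVICE_RELATIONS list that the bus driver returns on the way back up. MpNotifyPartner and FILTER_EXTENSION are hypothetical stand-ins.

```c
/*
 * Sketch of a PnP filter watching IRP_MN_QUERY_DEVICE_RELATIONS complete so
 * that it can report the child PDOs to a partner driver.
 */
#include <ntddk.h>

typedef struct _FILTER_EXTENSION {
    PDEVICE_OBJECT LowerDeviceObject;  /* saved when the filter attached */
} FILTER_EXTENSION, *PFILTER_EXTENSION;

/* Hypothetical stand-in for the private MPSPFLTR-to-MPIO notification. */
VOID
MpNotifyPartner(PDEVICE_RELATIONS Relations)
{
    if (Relations != NULL) {
        DbgPrint("Bus driver reported %lu child PDOs\n", Relations->Count);
    }
}

NTSTATUS
FilterQdrCompletion(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Context);

    if (NT_SUCCESS(Irp->IoStatus.Status)) {
        /* On success the bus driver returns the DEVICE_RELATIONS list here. */
        MpNotifyPartner((PDEVICE_RELATIONS)Irp->IoStatus.Information);
    }
    if (Irp->PendingReturned) {
        IoMarkIrpPending(Irp);
    }
    return STATUS_SUCCESS;
}

NTSTATUS
FilterDispatchPnp(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PFILTER_EXTENSION ext = (PFILTER_EXTENSION)DeviceObject->DeviceExtension;
    PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);

    if (stack->MinorFunction == IRP_MN_QUERY_DEVICE_RELATIONS &&
        stack->Parameters.QueryDeviceRelations.Type == BusRelations) {
        IoCopyCurrentIrpStackLocationToNext(Irp);
        IoSetCompletionRoutine(Irp, FilterQdrCompletion, NULL, TRUE, TRUE, TRUE);
        return IoCallDriver(ext->LowerDeviceObject, Irp);
    }

    /* Every other PnP IRP is simply passed down unchanged. */
    IoSkipCurrentIrpStackLocation(Irp);
    return IoCallDriver(ext->LowerDeviceObject, Irp);
}
```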

In comparing Figures 9.9 and 9.10, notice that the MPSPFLTR driver is an upper filter driver on the adapter FDO. Another difference is the PDO/FDO pair created for the MPIO pseudo bus driver by PnP and the MPIO driver itself. Notice the private communication channel between the MPSPFLTR driver and the MPIO pseudo bus driver. Further, on the top left-hand side of Figure 9.10, note the two PDOs for the pseudo disks created by the MPIO bus driver. This ensures that the MPIO bus driver gets a chance to handle the I/O, and it, in turn, can invoke the help of the DSM as needed.

Attached to each PDO created by MPIO are two DSM objects. One is in active use; the other is shown in a different box simply to emphasize that MPIO allows DSMs from different vendors to coexist. Switching to the right-hand side of Figure 9.10, note the four disk PDOs that are created by the port driver (either SCSIPort or Storport) as usual. However, the device objects attached to these PDOs are created by the MPDEV class driver and not by the disk class driver.

Mpdev.sys is a disk class replacement driver with some twists. The MPDEV class driver understands only a limited set of IRP functionality, the most important of which is IRP_MJ_SCSI; in other words, it handles SCSI requests. MPDEV does not implement the classic IRP functionality, such as read and write IRPs (IRP_MJ_READ and IRP_MJ_WRITE). This means that user mode applications cannot access the physical device stack directly, because user mode applications can only send down IOCTL requests. Of course, kernel mode drivers can send MPDEV SCSI command data blocks (CDBs), and indeed this is exactly what the disk class driver does.

Thus, MPDEV can handle requests from the MPIO stack (shown with the dot-dash line in Figure 9.10) because those requests come from the disk class driver (layered over the PDO created by MPIO) that translates the IRP requests (such as read and write) into SCSI CDBs. Further, Microsoft has created a tight security ACL for the device objects owned by the MPDEV class driver.
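
The sketch below illustrates the dispatch behavior described for MPDEV: only IRP_MJ_SCSI (an alias for IRP_MJ_INTERNAL_DEVICE_CONTROL) is serviced, while read, write, and everything else is failed. This is not the actual MPDEV source; the names are illustrative, and a real driver would also register AddDevice, PnP, and power routines.

```c
/*
 * Sketch of a class driver that services SCSI requests only and rejects the
 * classic read/write IRPs.
 */
#include <ntddk.h>

typedef struct _MPDEV_EXTENSION {
    PDEVICE_OBJECT LowerDeviceObject;  /* port driver stack, set at attach time */
} MPDEV_EXTENSION, *PMPDEV_EXTENSION;

/* Fail anything this driver does not support, read and write included. */
NTSTATUS
MpDevUnsupported(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    Irp->IoStatus.Status = STATUS_INVALID_DEVICE_REQUEST;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_INVALID_DEVICE_REQUEST;
}

/* SCSI CDBs (arriving as SRBs) are simply passed down toward the port driver. */
NTSTATUS
MpDevScsi(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    PMPDEV_EXTENSION ext = (PMPDEV_EXTENSION)DeviceObject->DeviceExtension;
    IoSkipCurrentIrpStackLocation(Irp);
    return IoCallDriver(ext->LowerDeviceObject, Irp);
}

NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    ULONG i;
    UNREFERENCED_PARAMETER(RegistryPath);

    /* Default every major function to the "unsupported" handler... */
    for (i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++) {
        DriverObject->MajorFunction[i] = MpDevUnsupported;
    }
    /* ...and open up only the request type the driver really services. */
    DriverObject->MajorFunction[IRP_MJ_SCSI] = MpDevScsi;
    return STATUS_SUCCESS;
}
```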

9.3.1.1 Device-Specific Module

The device-specific module (DSM) is designed to provide some important functions, including the following:

  • Handling device-specific initialization.

  • Providing functionality to decide whether two LUNs accessed via two different paths are really the same LUN simply accessed in different ways. Microsoft expects to use a built-in identifier from the storage, not a software-written signature on the media, to allow the DSM to identify these LUNs. The generic DSM module provided by Microsoft accomplishes this using the serial number page (80h) or the device identification page (83h) defined by the SCSI command set; a sketch of such an inquiry appears after this list. Vendors are not limited to the use of just these two mechanisms.

  • Handling certain special SCSI commands, mostly related to device control and querying device capability, such as Read_Capacity, Reserve, Release, and Start_Stop_Unit, and deciding if the command will go down all paths or just a specific one.

  • Making routing decisions on I/O requests.

  • Handling errors.

  • Handling PnP- and power-related requests with the assistance of the library routines provided by Microsoft in the pseudo bus multipath driver.

  • Handling management-related requests that are delivered to a driver in the form of Windows Management Instrumentation (WMI; see Chapter 7) IRPs. The pseudo bus multipath driver locates the appropriate routines within the DSM and invokes them.
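
To make the LUN identification mechanism concrete, here is a sketch of the 6-byte SCSI INQUIRY CDB that requests the device identification VPD page (83h), together with the byte-for-byte comparison a DSM could use to decide that two paths lead to the same LUN. The SRB plumbing needed to actually issue the CDB is omitted, and the helper names are illustrative.

```c
/*
 * Sketch of building an INQUIRY CDB for the device identification VPD page
 * and comparing the identifiers returned on two paths.
 */
#include <ntddk.h>

#define SCSIOP_INQUIRY          0x12
#define VPD_DEVICE_IDENTIFIERS  0x83

VOID
BuildDeviceIdInquiryCdb(UCHAR Cdb[6], UCHAR AllocationLength)
{
    RtlZeroMemory(Cdb, 6);
    Cdb[0] = SCSIOP_INQUIRY;         /* INQUIRY opcode */
    Cdb[1] = 0x01;                   /* EVPD bit set: return a VPD page */
    Cdb[2] = VPD_DEVICE_IDENTIFIERS; /* page 83h (80h would be the serial number page) */
    Cdb[4] = AllocationLength;       /* size of the caller's data buffer */
}

/*
 * Two paths lead to the same LUN if the identifiers returned on both paths
 * match byte for byte.
 */
BOOLEAN
SameLun(const UCHAR *IdA, ULONG LenA, const UCHAR *IdB, ULONG LenB)
{
    return (LenA == LenB) &&
           (RtlCompareMemory(IdA, IdB, LenA) == LenA);
}
```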

The DSM is built with the MPIO development kit, which can be licensed from Microsoft, and is implemented as a legacy driver that exports an interface for the benefit of the pseudo bus driver MPIO.

9.3.1.2 Pseudo Bus Multipath Driver

The pseudo bus multipath driver is loaded natively as part of the Windows NT operating system once the appropriate vendor package has been installed.

Upon initialization, the pseudo bus multipath driver interacts with the MPSPFLTR filter driver that is layered over the SCSIPort FDO (see Figure 9.10) to create a pseudodevice for each logical device that has multiple paths for accessing the device. For each such pseudodevice, on a per-DSM basis, the pseudo bus multipath driver offers the DSM a chance to claim or reject ownership of the device.

For all I/O requests, the pseudo bus multipath driver consults the DSM via a specified routine. The DSM has access to each IRP and can post a completion routine for the IRP if it so desires. For device control requests such as Reserve and Release, the DSM may direct the I/O to happen on all paths to the device. For regular I/O requests such as read or write, the DSM may direct I/O on any one path, depending on whether it is doing dynamic or static load balancing. When an I/O request completes with an error, the pseudo bus multipath driver invokes the DSM at a specified entry point, and the DSM may redirect the I/O to another path in an effort to perform failover.
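
The following sketch illustrates the failover decision just described: when an I/O completes with a path-related error, the failed path is marked unusable and the request is retried on another healthy path; only when no path remains does the error propagate upward. The structures and the DsmIsPathError classification are hypothetical stand-ins, not the actual MPIO/DSM interface.

```c
/* Sketch of a failover decision made after an I/O error on the current path. */
#include <ntddk.h>

#define MAX_PATHS 32

typedef struct _MP_LUN {
    ULONG   PathCount;
    BOOLEAN PathHealthy[MAX_PATHS];
    ULONG   CurrentPath;
} MP_LUN, *PMP_LUN;

/* Hypothetical classification: treat device-unreachable style errors as path errors. */
BOOLEAN
DsmIsPathError(NTSTATUS IoStatus)
{
    return (IoStatus == STATUS_DEVICE_NOT_CONNECTED ||
            IoStatus == STATUS_IO_TIMEOUT);
}

/*
 * Called when an I/O issued on Lun->CurrentPath fails. Returns TRUE if the
 * caller should reissue the request on the (updated) current path, FALSE if
 * the error must be completed back to the originator.
 */
BOOLEAN
HandleIoError(PMP_LUN Lun, NTSTATUS IoStatus)
{
    ULONG next;

    if (!DsmIsPathError(IoStatus)) {
        return FALSE;               /* a media or device error, not a path failure */
    }

    Lun->PathHealthy[Lun->CurrentPath] = FALSE;   /* mark the failed path */

    for (next = 0; next < Lun->PathCount; next++) {
        if (Lun->PathHealthy[next]) {
            Lun->CurrentPath = next;              /* fail over to this path */
            return TRUE;                          /* caller retries the I/O */
        }
    }
    return FALSE;                                  /* every path has failed */
}
```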

9.3.2 Existing Multipath Solutions

Several vendors offer multipath solutions that implement at least failover; some also offer failback as well as load balancing.

These solutions do work. Now that they have been deployed for a while, however, two drawbacks have emerged:

  1. The configuration can be cumbersome and somewhat confusing because the solution is not fully integrated with PnP, so dynamic discovery is not ensured.

  2. The solution precludes interoperability with a solution from another vendor. This means that if a particular Windows server has a solution deployed from one vendor, the same Windows server cannot also have a solution from any other vendor deployed on it.

9.3.2.1 EMC PowerPath

EMC has offered a failover and load balancing solution for Windows NT for quite a while. Figure 9.11 shows the architecture that EMC has implemented.

Figure 9.11. EMC PowerPath Architecture


Unlike other architectures, EMC's architecture has a filter driver between the volume manager and SCSIPort or RAID port class drivers. For each logical volume that exists, the solution enumerates N logical volumes, where N is the number of independent ways in which the volume is accessed.

For each device with N different paths to access the device, Windows NT will see N logical devices. If I/O happened simultaneously down all these paths, data corruption might occur. Hence, PowerPath sets N - 1 of these devices to be in a disabled state in which no I/O or IOCTL activity can happen. The GUI administration utility shows one active device and N - 1 grayed-out devices corresponding to the disabled devices. The administrator needs to do a fair amount of configuration, especially if security is desired in terms of limiting which HBAs can access which devices. The EMC Symmetrix product allows an administrator to implement this security by specifying the World Wide Name of the HBA that can access a particular LUN within the EMC Symmetrix box.

The administrator may specify a policy for accomplishing load balancing. The possible policies are described as follows:

  • The I/O requests are sprinkled to each path in turn, in round-robin fashion.

  • The next I/O is sent to the path that has the least number of pending requests.

  • The next I/O is sent to the path that has the least number of blocks pending.

  • An EMC Symmetrix optimization mode is used in which the next I/O is sent to the path that is estimated to have the least completion time.
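
As an illustration of the second policy in this list, the sketch below picks the path with the fewest pending requests; tracking pending bytes instead of request counts would give the "least blocks pending" policy. The per-path counters are assumptions for illustration; a real driver would maintain them as I/Os are issued and completed.

```c
/* Sketch of a "fewest pending requests" path selection policy. */
#include <ntddk.h>

#define MAX_PATHS 32

typedef struct _PATH_STATS {
    ULONG PathCount;
    LONG  PendingRequests[MAX_PATHS];  /* incremented on issue, decremented on completion */
} PATH_STATS, *PPATH_STATS;

ULONG
SelectLeastBusyPath(PPATH_STATS Stats)
{
    ULONG best = 0;
    ULONG i;

    for (i = 1; i < Stats->PathCount; i++) {
        if (Stats->PendingRequests[i] < Stats->PendingRequests[best]) {
            best = i;
        }
    }
    return best;   /* the next I/O is issued on this path */
}
```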

9.3.2.2 HP (Compaq) SecurePath

HP (Compaq) offers a multipath failover and load balancing solution for Windows NT called SecurePath. The solution differs slightly between Windows NT 4.0 and Windows 2000.

Figure 9.12 shows the HP (Compaq) SecurePath architecture for Windows 2000. The solution consists of a block storage filter driver that is above the port (SCSIPort or Storport) class driver and below the disk class driver. A user mode service and user mode applications constitute the other pieces of the puzzle, and these play a role in administration and notification.

Figure 9.12. HP (Compaq) SecurePath Architecture for Windows 2000


On Windows NT 4.0, HP (Compaq) SecurePath requires the use of an HP-written disk class driver called HSZDisk (see Figure 9.13). The solution also involves a filter driver.

Figure 9.13. HP (Compaq) SecurePath Architecture for Windows NT 4.0


On Windows 2000, the failover and load balancing functionality is provided within a filter driver that HP calls Raidisk; the class driver provided by Microsoft is not replaced by any other driver. The Raidisk driver performs

  • Failover

  • Load balancing (for nonclustered environments)

  • Failback when the fault is corrected

  • Path verification to the storage volumes

A user mode SecurePath Windows NT service provides administration capabilities and interacts with the SecurePath filter driver via private IOCTL control codes.

9.3.2.3 HP AutoPath

HP AutoPath offers dynamic load balancing and autofailover capabilities for Windows NT. As shown in Figure 9.14, HP implements AutoPath using a filter driver between the disk class driver and the port driver.

Figure 9.14. HP AutoPath Architecture


HP AutoPath performs load balancing according to a policy that the administrator sets. The choices of policy are:

  • Round-robin, in which I/O is sprinkled across the various different paths

  • No load balancing, in which all I/O to a particular storage device is statically sent down a path that the administrator can select

  • Shortest queue on the basis of outstanding requests, in which the I/O is sent to the path with the minimum number of outstanding requests

  • Shortest queue on the basis of outstanding bytes awaiting I/O

  • Shortest queue on the basis of service time, in which the estimated service times of all outstanding requests queued to a path are summed and the I/O is sent to the path with the smallest total


   