Server Clusters, Failover Clustering, and Disks | Microsoft SQL Server 2000 High Availability

There are two types of storage I/O technologies supported in server clusters: parallel SCSI and Fibre Channel. For both Windows 2000 and Windows Server 2003, support is provided for SCSI interconnects and Fibre Channel arbitrated loops for two nodes only.

Important

For larger cluster configurations (more than two nodes), you need to use a switched Fibre Channel (fabric/fiber- optic , not copper ) environment.

If you are implementing SCSI, the following considerations must be taken into account:

It is only supported in Windows 2000 Advanced Server or Windows Server 2003 up to two-nodes.
SCSI adaptors and storage solutions need to be certified.
SCSI cards that are hosting the interconnect should have different SCSI IDs, normally 6 and 7. Ensure device access requirements are in line with SCSI IDs and priorities.
SCSI adaptor BIOS should be disabled.
If devices are daisy-chained, ensure that both ends of the shared bus are terminated .
Use physical terminating devices and do not use controller-based or device-based termination.
SCSI hubs are not supported.
Avoid the use of connector converters (for example, 68-pin to 50-pin).
Avoid combining multiple device types (single ended and differential, and so on).

If you are implementing Fibre Channel, the following considerations must be taken into account:

Fibre Channel Arbitrated Loops (FC-AL) support up to two nodes.
Fibre Channel Fabric (FC-SW) support all higher combinations.
Components and configuration need to be in the Microsoft Hardware Compatibility List (HCL).
You can use a multicluster environment.
Fault-tolerant drivers and components also need to be certified.
Virtualization engines need to be certified.

When you really think about it, clusters are networked storage configurations because of how clusters are set up. They are dependent on a shared storage infrastructure. SCSI-based commands are embedded in fiber at a low level. For example, clustering uses device reservations and bus resets, which can potentially be disruptive on a SAN. Systems coming and going also lead to potential disruptions. This behavior might change with Windows Server 2003 and SANs, as the Cluster service issues a command to break a reservation and the port driver can do a targeted or device reset for disks on Fibre Channel (not SCSI). The targeted resets require that the host bus adapter (HBA) drivers provided by the vendor for the SAN support this feature. If a targeted reset fails, the traditional entire buswide SCSI reset is performed. Clusters identify the logical volumes through disk signatures (as well as partition offset and partition length), which is why using and maintaining disk signatures is crucial.

Clusters have a disk arbitration process (sometimes known as the challenge/defense protocol), or the process to reserve or own a disk. With Microsoft Windows NT 4.0 Enterprise Edition, the process was as follows : for a node to reserve a disk, it used the SCSI protocol RESERVE (issued to gain control of a device; lost if a buswide reset is issued), RELEASE (freed a SCSI device for another host bus adapter to use), and RESET (bus reset) commands. The server cluster uses the semaphore on the disk drive to represent the SCSI-level reservation status in software; SCSI-III persistent reservations are not used. The current owner reissues disk reservations and renews the lease every 3 seconds on the semaphore. All other nodes, or challengers, try to reserve the drive as well. Before Windows Server 2003, the underlying SCSI port did a bus reset, which affected all targets and LUNs. With the new StorPort driver stack of Windows Server 2003, instead of the behavior just described, a targeted LUN reset occurs. After that, a wait happens for approximately 7 to 15 seconds (3 seconds for renewal plus 2 seconds bus settle time, repeated three times to give the current owner a chance to renew). If the reservation is still clear, the former owner loses the lease and the challenger issues a RESERVE to acquire disk ownership and lease on the semaphore.

With Windows 2000 and Windows Server 2003, the arbitration process is a bit different. Arbitration is done by reading and writing hidden sectors on the shared cluster disk using a mutual exclusion algorithm by Leslie Lamport. Despite this change, the Windows NT 4.0 reserve and reset process formerly used for arbitration still occurs with Windows 2000 and Windows Server 2003. However, the process is now used only for protecting the disk against stray I/Os, not for arbitration.

More Info

For more information on Leslie Lamport, including some of his writings, go to http://research.microsoft.com/users/lamport/ . The paper containing the fast mutual exclusion algorithm can be found at http://research.microsoft.com/users/lamport/pubs/pubs.html#fast-mutex.

As of Windows 2000 Service Pack 2 or later (including Windows Server 2003), Microsoft has a new multipath I/O (MPIO) driver stack against which vendors can code new drivers. The new driver stack enables targeted resets using device and LUN reset (that is, you do not have to reset the whole bus) so that things like failover are improved. Consult with your hardware vendor to see if their driver supports the new MPIO stack.

Warning

MPIO is not shipped as part of the operating system. It is a feature provided to vendors by Microsoft to customize their specific hardware and then use. That means that out of the box, Windows does not provide multipath support.

When using a SAN with a server cluster, make sure you take the following into consideration:

Ensure that the SAN configurations are in the Microsoft HCL (multicluster section).

When configuring your storage, the following must be implemented:

Zoning Zoning allows users to sandbox the logical volumes to be used by a cluster. Any interactions between nodes and storage volumes are isolated to the zone, and other members of the SAN are not affected by the same. This feature can be implemented at the controller or switch level and it is important that users have this implemented before installing clustering. Zoning can be implemented in hardware or firmware on controllers or using software on hosts . For clusters, hardware-based zoning is recommended, as there can be a uniform implementation of access policy that cannot be disrupted or compromised by a node failure or a failure of the software component.
LUN masking This feature allows users to express a specific relationship between a LUN and a host at the controller level. In theory, no other host should be able to see that LUN or manipulate it in any way. However, various implementations differ in functionality; as such, one cannot assume that LUN masking will always work. Therefore, it cannot be used instead of zoning. You can combine zoning and masking, however, to meet some specific configuration requirements. LUN masking can be done using hardware or software, and as with zoning, a hardware-based solution is recommended. If you use software- based masking, the software should be closely attached to storage. Software involved with the presentation of the storage to Windows needs to be certified. If you cannot guarantee the stability of the software, do not implement it.

Firmware and driver versions Some vendors implement specific functionality in drivers and firmware and users should pay close attention to what firmware and driver combinations are compatible with the installation they are running. This is valid not only when building a SAN and attaching a host to it, but also over the entire life span of the system (hosts and SAN components). Pay careful attention to issues arising out of applying service packs or vendor-specific patches and upgrades.

Warning

If you are going to be attaching multiple clusters to a single SAN, the SAN must appear on the both the cluster and the multicluster lists of the HCL. The storage subsystem must be configured (down to the driver level, fabric, and HBA) as described on the HCL. Switches are the only component not currently certified by Microsoft. You should secure guarantees in writing from your storage vendor before implementing the switch fabric technologies.

Tip

Adding a disk to a cluster can cause some downtime, and that amount of downtime is directly related to how much you can prepare for it. If you have the disk space, create some spare, formatted LUNs available to the cluster if you need to use them. This is covered in more depth in Chapter 5, Designing Highly Available Microsoft Windows Servers.