9.1 Simple Problem-Isolation Techniques

Most error conditions are encountered during changes to a network's topology. This is particularly true during initial installation, when new equipment is being connected for the first time. In complex SAN installations, you must install host adapters and load the appropriate device drivers, lay the cable plant, install and configure switches, position GBICs or transceivers, and properly deploy and cable disk arrays. To simplify installation, solutions providers typically preconfigure as many components as possible, but there is always an opportunity to cable devices together incorrectly or omit installation steps. Things happen.

After the various SAN components have been configured, cabled, and powered on, you usually verify operation by testing a server's ability to access storage. If a server cannot see part or all of the assigned disks, elementary troubleshooting begins. As outlined in Figure 9-1, this process typically begins with examination of the physical cable plant, port status, and transceivers and continues through verification of the host system, interconnection, and storage target.

Figure 9-1. Troubleshooting server-to-target connectivity

graphics/09fig01.gif

You can diagnose physical-layer problems by verifying insertion or bypass status at the switch port. Port status LEDs should indicate whether an inbound signal is recognized or whether it is intermittently making and breaking contact. It is difficult to troubleshoot signaling status when copper cabling is used, because there is no immediate way to verify signal presence. For optical, non-OFC connections, disconnecting the cable at the device allows visual verification of signal presence on the SC connector and thus speeds problem identification, as shown in Figure 9-2. Short exposure to non-OFC laser is not harmful, but it is always recommended that you hold a small mirror or translucent paper over a fiber-optic connection before looking at the laser source. If no light is present on the fiber-optic cable, the likely cause is a break along the length of the cable lead. If light is present, lack of communication may have a number of other causes, including incorrect cable length or type, poor quality of cabling, dirty connectors, excessive jitter along the link, or protocol-level problems at the transport or upper layers. Using a known good cable in place of the existing run is usually the fastest means to verify the cable plant and move on to the next step in troubleshooting.

Figure 9-2. Verifying signal presence on a non-OFC fiber-optic SC connector

graphics/09fig02.jpg

Because bypass mode can be triggered by loss of signal, disconnecting the cable at the switch port allows verification of laser presence on the port's receive lead. Lack of laser light may indicate a break on the cable lead from the attached node's transmit running to the port's receiver. It may also indicate a problem with the node's transceiver or internal controller logic, or it may simply mean that the attached device is powered down.

Depending on the server's operating system, failure to discover targets may be a matter of the SAN boot sequence. An operating system may need to have disk arrays up and operational before its own device drivers load. Disks that are powered on after the system has booted may not be recognized by the operating system, and in that case you may need to reboot the host to see the drives. The next logical step is to verify that the proper device drivers are installed for the host adapter cards. For preconfigured servers, correct driver installation can usually be assumed, and most vendors provide utilities for driver installation and card diagnostics. When you are provisioning multiple adapter cards in a single server, you must ensure that no resource conflicts occur and that each adapter is cabled to its appropriate SAN switch.

Although adapter card vendors generally abide by standards in implementing Fibre Channel or iSCSI device drivers, some have peculiar interpretations of the parameters allowed. Some Fibre Channel adapter cards, for example, allow only specified address ranges to be used. In a single-vendor environment, this may not cause problems, but in a multivendor environment default addressing assumptions may cause interoperability conflicts. The configuration utility supplied with the adapter card should indicate current and allowable addresses, as well as the microcode and device driver versions for the card.

For Fibre Channel fabrics, you can examine the switch SNS table to verify the successful connection of end devices to the switch. Vendors may provide a graphical interface or command line interface to interrogate the SNS. An SNS entry for a disk array, however, does not guarantee that a server will discover it. The server HBA may not have completed a process login to the array, or the fabric switch may be inadvertently zoned to exclude the array from the server's view. Similarly, in an IP SAN environment, a successful entry in the iSNS server does not guarantee server-to-target connectivity. The end devices themselves may have failed to negotiate a login, or the TCP connection between them may need to be reset.
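The diagnostic sequence above can be sketched as a small decision helper. This is an illustrative model only; the data structures and state names (SNS entries, zone membership, completed logins) are hypothetical and do not correspond to any vendor's actual switch API:

```python
# Hypothetical sketch: why an SNS entry alone does not prove connectivity.
# The fabric state below is an illustrative model, not a real switch API.

def diagnose(target_wwn, sns_entries, zone_members, logged_in_pairs, initiator_wwn):
    """Return the most likely reason an initiator cannot see a target."""
    if target_wwn not in sns_entries:
        return "target never registered with the switch (check link/port)"
    if target_wwn not in zone_members.get(initiator_wwn, set()):
        return "target registered but zoned out of the initiator's view"
    if (initiator_wwn, target_wwn) not in logged_in_pairs:
        return "zoning is correct but process login (PRLI) never completed"
    return "fabric-level connectivity looks correct; check LUN configuration"

# Example fabric state: the target is in the SNS and in the initiator's
# zone, but no process login has completed between the pair.
sns = {"50:00:00:00:00:00:00:01", "21:00:00:e0:8b:00:00:01"}
zones = {"21:00:00:e0:8b:00:00:01": {"50:00:00:00:00:00:00:01"}}
logins = set()

print(diagnose("50:00:00:00:00:00:00:01", sns, zones, logins,
               "21:00:00:e0:8b:00:00:01"))
# -> zoning is correct but process login (PRLI) never completed
```

The point of the ordering is that each check rules out one layer before you spend time on the next, mirroring the SNS-then-zoning-then-login progression described above.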

Zoning may be configured by port or World-Wide Name (WWN). Depending on vendor equipment, the default zoning configuration may exclude all devices; in other words, an administrator must first define zones before any communications can occur. It may also be possible to inadvertently create zone conflicts, with the result that resources that are zoned together by WWN are excluded by port zoning.
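The zone-conflict case described above can be made concrete with a short sketch. The data model is hypothetical (real switches store zoning in their own configuration databases), but it captures the rule that two devices can communicate only if every active zoning layer permits it:

```python
# Hypothetical sketch of a zone-conflict check: devices can talk only if
# *every* active zoning layer permits it.  Data model is illustrative.

def share_zone(a, b, zones):
    """True if a and b appear together in at least one zone."""
    return any(a in members and b in members for members in zones.values())

def can_communicate(dev_a, dev_b, wwn_zones, port_zones, port_of):
    # WWN zoning is evaluated on World-Wide Names...
    wwn_ok = share_zone(dev_a, dev_b, wwn_zones)
    # ...while port zoning is evaluated on the switch ports the devices occupy.
    port_ok = share_zone(port_of[dev_a], port_of[dev_b], port_zones)
    return wwn_ok and port_ok

wwn_zones = {"zone_db": {"wwn_host", "wwn_array"}}   # zoned together by WWN
port_zones = {"zone_p": {1, 2}}                      # but the array sits on port 3
port_of = {"wwn_host": 1, "wwn_array": 3}

print(can_communicate("wwn_host", "wwn_array", wwn_zones, port_zones, port_of))
# -> False: zoned together by WWN, excluded by port zoning
```

This is exactly the failure mode worth checking first when devices that "should" see each other do not: membership in one zone type can be silently negated by the other.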

Similar in impact to zoning, a switch may provide LUN masking so that only specified LUNs are visible within a zoned target device. Consequently, an initiator and a target may reside in the same zone (based, for example, on port zoning), and yet the LUNs on that target may not be visible. A careful review of switch zoning and LUN masking configurations may quickly resolve the problem without the need for further diagnostics.
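The interaction of zoning and LUN masking can be sketched as follows. Again the model is hypothetical; the point it illustrates is that sharing a zone with the target is necessary but not sufficient, because the LUN mask must also admit the initiator:

```python
# Hypothetical sketch of LUN visibility: being zoned with the target is
# necessary but not sufficient -- the LUN mask must also admit the initiator.

def visible_luns(initiator, target, zones, lun_masks):
    """Return the set of LUNs on `target` that `initiator` can actually see."""
    in_zone = any(initiator in m and target in m for m in zones.values())
    if not in_zone:
        return set()
    # lun_masks maps (target, lun) -> set of initiators allowed to see it.
    return {lun for (tgt, lun), allowed in lun_masks.items()
            if tgt == target and initiator in allowed}

zones = {"z1": {"host_a", "array_1"}}
lun_masks = {
    ("array_1", 0): {"host_a"},
    ("array_1", 1): {"host_b"},   # masked away from host_a
}

print(visible_luns("host_a", "array_1", zones, lun_masks))  # -> {0}
```

A host that sees some but not all of an array's LUNs, as in this example, points to the mask rather than to zoning or the physical plant.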

In multiswitch configurations, status of the interswitch links may also be an issue. Fibre Channel switches typically auto-sense E_Port connections in a single-vendor fabric, but they may need to be manually configured in multivendor fabrics. Some vendors support multiple E_Port compatibility modes, both for different microcode versions of their own products and for peculiar implementations of E_Port by other vendors.

E_Port connectivity may also require that you manually configure switch addresses or designate which switch will serve as principal switch for the fabric. Address conflicts between switches may send the fabric into a perpetual fabric reconfiguration or may automatically disable E_Ports in an attempt to recover. In more complex Fibre Channel fabrics, failure of some initiators to see designated targets may be caused by exceeding the recommended hop count between multiple switches. Initiators and targets separated by multiple hops may have the appropriate entries in their respective switch's SNS table but be unable to communicate because of excessive switch-to-switch latency or failure of SNS updates to propagate through the fabric.
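The hop-count concern above amounts to a shortest-path question over the interswitch links, which can be sketched with a breadth-first search. The fabric map and the three-hop ceiling used here are illustrative; the recommended maximum is a common vendor guideline rather than a protocol limit:

```python
# Hypothetical sketch: count ISL hops between two switches with a BFS and
# flag paths that exceed a recommended maximum (3 hops is a common vendor
# guideline, not a hard protocol limit).

from collections import deque

def hop_count(fabric, src, dst):
    """Shortest number of interswitch links between src and dst, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        switch, hops = queue.popleft()
        if switch == dst:
            return hops
        for neighbor in fabric.get(switch, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # no path: the switches are not in the same fabric

# A chain of five switches: sw1 - sw2 - sw3 - sw4 - sw5
fabric = {"sw1": ["sw2"], "sw2": ["sw1", "sw3"], "sw3": ["sw2", "sw4"],
          "sw4": ["sw3", "sw5"], "sw5": ["sw4"]}

hops = hop_count(fabric, "sw1", "sw5")
print(hops, "hops;", "exceeds guideline" if hops > 3 else "within guideline")
# -> 4 hops; exceeds guideline
```

Devices attached at opposite ends of such a chain may register correctly in their local SNS tables yet still fail to communicate reliably, for the latency and propagation reasons described above.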

At the receiving end, the storage target may have its own configuration utility for establishing and assigning LUNs. You should verify array-based LUN configurations and compare them with any additional LUN masking parameters supported by the SAN switch or host adapter card.

If the end-to-end SAN plumbing appears to be properly configured and operational, there is always the magic wand of system reboot. It may not be the weapon of choice for administrators in a production environment, but since the dawn of networking, all sorts of mysterious problems have been dispatched by the on/off switch.

Although SAN vendors have attempted to provide more advanced diagnostic capabilities in their products, the ability to crack frames and provide protocol decode at multigigabit speeds would add significant cost to any product. A full diagnostic of a storage network therefore requires protocol analyzers and either in-house expertise or a contracted support organization with trained personnel. In either case, low-level diagnostics are an expensive and often unadvertised component of an enterprise SAN.



Tom Clark, Designing Storage Area Networks: A Practical Reference for Implementing Fibre Channel and IP SANs, 2nd Edition. 2003. ISBN 0321136500. 171 pages.
