4.2 Arbitrated Loop

Compared with point-to-point configurations, arbitrated loops provide more flexibility and support for more devices. Although most SAN offerings use fabric switches for connectivity, arbitrated loop is still commonly used for JBODs (just a bunch of disks) and for mass storage to support NAS processors.

Arbitrated loop is a shared, gigabit transport. As with shared Ethernet or Token Ring segments, the functional bandwidth available to any individual loop device is determined by the total population on the segment and the level of activity of the other participants: more active talkers, less available bandwidth. A 1Gbps arbitrated loop with 50 equally active nodes, for example, would provide 100MBps/50, or only 2MBps functional bandwidth per node. Arbitrated loop would therefore not be a popular choice for SANs were it not for the fact that a typical storage network has relatively few active contenders for bandwidth. Although a single loop may have more than a hundred disk drives, there are usually no more than four to six initiators making requests to those drives. Thus, you can design large configurations on a single loop without dividing the bandwidth down to the level of ordinary Ethernet.
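The shared-bandwidth arithmetic is simple enough to express directly. The following minimal Python sketch (function name hypothetical) reproduces the division above and the more typical few-initiator case:

```python
def functional_bandwidth_mbps(link_mbps, active_nodes):
    """Evenly divide a shared loop's bandwidth among equally active nodes."""
    return link_mbps / active_nodes

# The worst case from the text: a 1Gbps loop (~100MBps) with 50 active talkers.
print(functional_bandwidth_mbps(100, 50))   # 2.0 MBps per node

# A more typical SAN loop: only 4 initiators actively contending.
print(functional_bandwidth_mbps(100, 4))    # 25.0 MBps per initiator
```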

Because the transport is shared, some means must be provided for orderly access to the media. In arbitrated loop, you gain media access through an arbitration protocol. After an NL_Port has arbitrated and won control of the transport, it has the full bandwidth available for its transaction. When the transaction is complete, the NL_Port closes the temporary connection, making the transport available to others.

4.2.1 Loop Physical Topology

Arbitrated loop is a true physical loop, or ring, created by tying the transmit lead of one NL_Port to the receive lead of its downstream neighbor. The neighbor's transmit is, in turn, connected to the receiver of yet another NL_Port, and so on, until the circle completes at the original NL_Port's receiver. In this way, a continuous data path exists through all the NL_Ports, allowing any device to access any other device on the loop, as illustrated in Figure 4-2.

Figure 4-2. A daisy chain arbitrated loop between a server and two storage arrays


The first arbitrated loops were built in a daisy chain configuration, using copper or fiber-optic cabling to create the loop of NL_Ports. Several problems quickly arose. Powering off or disconnecting a single node would break the chain and thus crash the loop. A break in cabling or a faulty transceiver anywhere along the loop would also halt loop traffic and entail tedious troubleshooting to locate the problem. As with the problems encountered in hardwired Token Ring topologies, the overhead and risks associated with dispersed loop cabling promoted the development of centralized arbitrated loop hubs.

Arbitrated loop hubs provide a physical star topology for a loop configuration, bringing each NL_Port's transmit and receive leads to a common location. The internal architecture of a hub completes the connections between transmitters and receivers on a port-by-port basis via mux circuitry, and it finishes the loop by connecting the transmitter of the last hub port (say, port 8) to the receiver of the first (say, port 1). One of the most useful features of a loop hub is bypass circuitry at each port, which allows the loop to circumvent a disabled or disconnected node while maintaining operation. Most unmanaged arbitrated loop hubs also validate proper gigabit signaling before allowing a device to insert into the loop, whereas managed hubs provide additional functionality. These features are described in more detail in Chapter 5.

Because an arbitrated loop hub supplies a limited number of ports, building larger loops may require linking multiple hubs. This is called hub cascading. As shown in Figure 4-3, a cascade is simply a normal cable connection between a port on one hub and a port on another. No special cable is required, although to minimize potential ground loop and noise problems, fiber-optic cabling is recommended instead of copper. Cascading consumes one port on the first and last hubs in a chain, and two ports on intervening ones. Cascading three 8-port hubs, for example, would yield 20 usable ports, with one-sixth of the total ports sacrificed to achieve the cascade. Depending on the vendor, hubs can be cascaded to the arbitrated loop maximum of 127 ports, although the advisability of doing so should be application-driven. Just because you can build some configurations does not mean that you should.
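A minimal Python sketch of the cascade arithmetic (names are hypothetical) makes the port cost explicit:

```python
def usable_ports(hub_count, ports_per_hub):
    """Device ports remaining after daisy-chain cascading.

    The first and last hubs each sacrifice one port to the cascade;
    every intervening hub sacrifices two.
    """
    if hub_count < 2:
        return hub_count * ports_per_hub
    sacrificed = 2 + 2 * (hub_count - 2)
    return hub_count * ports_per_hub - sacrificed

print(usable_ports(3, 8))   # 20 of 24 ports usable; one-sixth lost to cascading
```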

Figure 4-3. Cascaded arbitrated loop hubs


Cascading one hub to another extends the loop through the additional ports on the downstream hub. You can achieve a similar effect by inserting a JBOD into a hub port. Although the link between the hub and the JBOD consists of a single cable pair, the JBOD itself is composed of a series of arbitrated loop disks daisy chained (transmit to receive) together. The transmit and receive leads of the JBOD interface to the hub represent an entire cluster or loop segment of multiple NL_Ports and not just a single NL_Port. The loop is thus extended through the JBOD enclosure, and the loop population is increased by the number of drives in the JBOD chassis. This is an important consideration when you calculate hub port requirements and optimal loop size.

Arbitrated loop standards provide address space for up to 127 devices (126 NL_Ports and 1 FL_Port) on one loop. Fibre Channel specifications allow for 10km runs over single-mode cabling and longwave fiber-optic transceivers. You are advised not to combine these two concepts. Even a few 10km links on a single loop can severely impede loop performance, because each 10km link incurs a 50-microsecond propagation delay in each direction. Every transaction on the loop would have to traverse the extended links, multiplying the effect of each transit delay by the number of transactions. Long haul requirements for disaster recovery or campus networks are better served with dedicated fabric switch ports.

4.2.2 Loop Addressing

An NL_Port, like an N_Port, has a 24-bit port address. If no switch connection exists, the upper two bytes of this address are zeroed to x'00 00'. This arrangement is referred to as private loop because devices on the loop have no connection to the outside world. If the loop is attached to a fabric and if an NL_Port supports fabric login, the switch assigns a positive value to the upper two bytes (and possibly the last byte). This mode is called public loop because fabric-capable NL_Ports are members of both a local loop segment and a greater fabric community, and they need a full 24-bit address for identity in the network. In the case of public loop assignment, the value of the upper two bytes represents the loop identifier and is common to all NL_Ports on the same loop that performed login to the fabric.

In both public and private arbitrated loops, the last byte of the 24-bit port address is referred to as the arbitrated loop physical address, or AL_PA. The AL_PA is acquired during initialization of the loop; in the case of fabric-capable loop devices, the switch can modify the AL_PA during login. The 1-byte AL_PA provides a compact addressing scheme and lets you include a device's identity as part of a 4-byte ordered set. In fact, an ordered set can include two AL_PAs, identifying both source and destination devices. The ordered set for open full duplex, for example, is K28.5 D17.4 AL_PD AL_PS, with AL_PD representing the destination address and AL_PS representing the source address.
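A short sketch, assuming nothing beyond the byte layout described above, shows how a 24-bit port address splits into the loop identifier and the AL_PA, and how the zeroed upper bytes distinguish private from public loop:

```python
def parse_port_address(addr):
    """Split a 24-bit Fibre Channel port address into loop ID and AL_PA."""
    loop_id = (addr >> 8) & 0xFFFF   # upper two bytes; x'00 00' on a private loop
    al_pa = addr & 0xFF              # arbitrated loop physical address
    mode = "private loop" if loop_id == 0 else "public loop"
    return loop_id, al_pa, mode

print(parse_port_address(0x000001))   # (0, 1, 'private loop')
print(parse_port_address(0x0A13E4))   # (2579, 228, 'public loop')
```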

The total number of AL_PAs available for arbitrated loop addressing is 127. This number was not determined by rigorous performance testing on assorted loop topologies, nor was it calculated from theoretical throughput at various loop populations. Instead, it is based on the requirements of 8b/10b running disparity between frames.

As a frame terminates with an end of frame character, the EOF forces the current running disparity to be negative. By Fibre Channel standards, each transmission word between the end of one frame and the beginning of another frame should also leave the running disparity negative. This function is provided by the IDLE ordered set, which has a fixed format of K28.5 D21.4 D21.5 D21.5. The special K28.5 leaves running disparity positive. The D21.4 leaves the running disparity negative. The D21.5 characters used for the last two bytes are neutral disparity. The net result is a negative running disparity at the end of the IDLE transmission word.

Because the loop-specific ordered sets may include AL_PAs in the last two byte positions, negative running disparity is facilitated if these values are neutral. In the open full duplex ordered set just cited, for example, the D17.4 character following the special K28.5 would leave the running disparity negative. If the destination and source AL_PAs are neutral disparity, the Open transmission word will leave the running disparity negative. This satisfies the requirement for the next start of frame (SOF).

If all 256 possible 8-bit bytes are dispatched to the 8b/10b encoder, 134 of them will emerge with neutral disparity characters. Fibre Channel claims some of these for special purposes. The remaining 127 neutral disparity characters have been assigned as AL_PAs.

Thus, the number 127 is not a recommended load for a Fibre Channel loop transport. It is simply the maximum number (minus reserved values) of neutral disparity addresses that could be assigned for loop use. At higher Fibre Channel speeds, such as 200MBps, having 127 active loop participants may be quite reasonable or may even be considered inadequate for some needs.

Because the AL_PA values are determined on the basis of neutral disparity, a listing of hex values of AL_PAs seems to jump randomly over some byte values and not others. Listed sequentially, the hex value of AL_PAs would begin 00, 01, 02, 04, 08, 0F, 10, 17, 18, 1B, 1D,…. The gaps in the list represent byte values that, after 8b/10b encoding, result in nonneutral disparity characters. This is significant for some Fibre Channel disk drives, which allow the user to set jumpers or dip switches on a controller card to manually assign a fixed AL_PA. Unlike most network equipment, where switch settings map directly to address values, the jumper positions typically correspond only to an index into the list of valid AL_PAs, not to the actual hex values.
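The index-versus-value distinction can be shown in a few lines of Python. The table below holds only the first eleven valid AL_PAs from the listing above (the full table runs through x'EF'), and the function name is hypothetical:

```python
# Partial table of valid (neutral disparity) AL_PAs, from the listing above.
VALID_AL_PAS = [0x00, 0x01, 0x02, 0x04, 0x08, 0x0F, 0x10, 0x17, 0x18, 0x1B, 0x1D]

def al_pa_from_jumper(index):
    """Map a drive's jumper or dip-switch setting (an index) to its AL_PA."""
    return VALID_AL_PAS[index]

print(hex(al_pa_from_jumper(5)))   # 0xf -- index 5 does not mean AL_PA x'05'
```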

Arbitrated loop assigns priority to AL_PAs based on numeric value. The lower the numeric value, the higher the priority. AL_PA priority is used during arbitration to give advantage to initiators such as file servers and fabric loop ports. An FL_Port by default has the address x'00', which gives it the highest priority over all other NL_Ports. When arbitrating against other devices for access to the loop, the FL_Port will always win. This helps to ensure that a valuable resource such as a switch can quickly serve the loop and then return to fabric duties. During address selection, as shown in Figure 4-4, servers typically attempt to take the highest-priority, lowest-value AL_PAs, whereas disk arrays take lower-priority, higher-value AL_PAs. A server with an AL_PA of x'01' will have a statistically higher chance of winning arbitration against lower-priority contenders, although arbitrated loop also provides safeguards against starvation of any port.

Figure 4-4. AL_PA assignment on a small arbitrated loop


An NL_Port's AL_PA can change with every initialization of the loop or reset of the device. On the surface, this may seem disruptive, but dynamic address assignment by the topology itself greatly reduces administrative overhead. As anyone who has had to reconfigure an IP network can testify, offloading low-level address administration to the topology is highly desirable. Arbitrated loop initialization guarantees that each attached device will have a unique AL_PA. Potential addressing conflicts are possible only when you join two separate loops (for example, by cascading two active hubs) without initialization. Some hub vendors have responded to this problem by incorporating an initialization sequence whenever a cascade condition is sensed.

4.2.3 Loop Initialization

Loop initialization is an essential process for allowing new participants onto the loop, assigning AL_PAs, providing notification of topology changes, and recovering from loop failure. Following loop initialization, the loop enters a stable monitoring mode and begins (or resumes) normal activity. Depending on the number of NL_Ports attached to the loop, an entire loop initialization sequence may take only a few milliseconds. For Sun Solaris servers, a loop initialization may result in a message posted to the event log. For NT servers, loop initialization is largely ignored. In either case, a loop initialization on an active loop normally causes a brief suspension of activity, which resumes after initialization is complete.

A loop initialization can be triggered by a number of causes, the most common being the introduction of a new device. The new device could actually be a former participant that has been powered on, or an active device that has been moved from one hub port to another.

A number of ordered sets have been defined to cover the various conditions that an NL_Port may sense as it launches the initialization process. These ordered sets, called loop initialization primitive sequences, are referred to collectively as LIPs. An NL_Port issues at least 12 LIPs to start loop initialization. In the following examples, we assume a Fibre Channel host bus adapter installed in a file server (a sketch of the selection logic follows the list):

  • An HBA that is attached to an active loop and is power cycled will, upon bootup, start processing the incoming bit stream. The presence of a valid signal and protocol verifies that the server is on an active loop. Because the server was powered down, however, the HBA has lost the AL_PA that it was previously assigned. That previously assigned AL_PA was stored in a temporary register in the HBA, and the register was wiped clean by the power cycle. The HBA immediately begins transmitting LIP(F7, F7) onto the loop. The xF7 is a reserved, neutral disparity character. The first occurrence of xF7 indicates that the HBA recognizes that it is on an active loop. The second xF7 indicates that the HBA has no AL_PA.

  • An HBA attached to an active loop is moved from one hub port to another. When the cable is unplugged from the hub and moved to the other port, the HBA temporarily loses Fibre Channel signal. Upon reinsertion, the HBA sees valid signal return and begins processing the bit stream. In this instance, the HBA still has its previously assigned AL_PA, and so it begins transmitting LIP(F7, AL_PS) onto the loop. The xF7 indicates that the HBA sees the active loop. The AL_PS is the source AL_PA of the LIP, that is, the HBA's previously assigned AL_PA. In this example, the HBA is not issuing LIPs to acquire an address but to notify the loop that a topology change has occurred.

  • The receiver of the HBA is broken, or the receive cable is broken, and the server has been power cycled. In this instance, the HBA does not see valid signal on its receiver and assumes that a loop failure has occurred. It also does not recall its previously assigned AL_PA. The HBA therefore starts streaming LIP(F8, F7) onto the loop. The xF8 is another reserved, neutral disparity character that is used to indicate a loop down state. The xF7 indicates that the HBA has no AL_PA.

  • In the same scenario as the preceding, if the HBA still has a previously assigned AL_PA, it will issue a LIP(F8, AL_PS). The F8 indicates that the HBA senses loop failure. The AL_PS is the source AL_PA of the alert.
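A minimal sketch of the selection logic implied by these four cases, assuming only the xF7/xF8 conventions described above (the function and its arguments are hypothetical):

```python
def choose_lip(loop_signal_ok, remembered_al_pa=None):
    """Pick the two LIP data bytes an NL_Port would stream.

    xF7 in the first byte means an active loop is seen; xF8 means loop
    failure. xF7 in the second byte means no AL_PA is held; otherwise
    the port's previously assigned AL_PA (AL_PS) is substituted.
    """
    first = 0xF7 if loop_signal_ok else 0xF8
    second = remembered_al_pa if remembered_al_pa is not None else 0xF7
    return first, second

print([hex(b) for b in choose_lip(True)])         # LIP(F7, F7)
print([hex(b) for b in choose_lip(True, 0x01)])   # LIP(F7, AL_PS)
print([hex(b) for b in choose_lip(False)])        # LIP(F8, F7)
print([hex(b) for b in choose_lip(False, 0x01)])  # LIP(F8, AL_PS)
```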

Of the conditions in this list, the most insidious for arbitrated loop environments is the LIP(F8) stream. A node issuing a normal LIP(F7) will trigger, at most, a temporary suspension of loop operations until the initialization process is completed. A node issuing LIP(F8)s, however, will continue streaming "loop down" alarms as long as it cannot recognize loop activity on its receiver. If the node's transmitter is connected to an active loop, all NL_Ports will enter a suspended initialization state and will continue to forward the offender's LIP(F8) stream, as shown in Figure 4-5. Normal loop initialization cannot complete, and the loop in fact fails. This has been another challenge for vendors of arbitrated loop hubs. Some have responded with auto-recovery policies that automatically bypass a port that is streaming LIP(F8)s.

Figure 4-5. An NL_Port streaming LIP(F8)s onto an arbitrated loop


In addition to loss of signal, an NL_Port may issue LIP(F8) if no valid ordered sets are present on the loop. This may occur if an upstream node is corrupting the bit stream because of excessive jitter or a malfunction of processing logic. Other conditions may trigger LIPs, including a node's inability to successfully arbitrate for loop access. Arbitrated loop provides a fairness algorithm for media access, but if a participant is not playing fair, others on the loop may issue LIPs to reinitialize a level playing field. Arbitrated loop also provides a selective reset LIP that is directed by one NL_Port to another. How the reset is implemented is vendor-specific, but the selective reset LIP(AL_PD, AL_PS) may cause the target device to reboot. This allows one NL_Port to force a misbehaving NL_Port into a known good state.

The loop initialization process begins when an NL_Port streams at least 12 LIPs onto the loop. As each downstream device receives the LIP stream, it enters a state known as Open-Init, which suspends any current operations and prepares the device for the loop initialization procedure. The LIPs are forwarded along the loop until all NL_Ports, including the originator, are in an Open-Init condition.

At this point, the NL_Ports need someone to be in charge. Unlike Token Ring, arbitrated loop has no permanent master to monitor the topology. Loop initialization therefore provides a selection process to determine which device will be the temporary loop master. After it is selected, the loop master is responsible for conducting the rest of the initialization procedure and returning the loop to normal operation (see Figure 4-6).

Figure 4-6. Steps in the loop initialization sequence


The loop master is determined by a subroutine known as the loop initialization select master procedure, or LISM. Each loop device vies for the position of temporary master by continuously issuing LISM frames that contain a port type identifier (x'00' for FL_Port, x'EF' for NL_Port) and its 64-bit World-Wide Name (WWN). When a downstream device receives a LISM frame from an upstream partner, it first checks the identifier. If the identifier is x'00', a fabric is present and the device ceases its own LISM frame broadcast and begins issuing the FL_Port's LISM frame. If the identifier is a standard NL_Port, the downstream device compares the WWN in the LISM frame to its own. As with AL_PA priorities, the WWN with the lowest numeric value has higher priority. If the received WWN has higher priority, the device ceases its own LISM broadcast and begins transmitting the received LISM. If the received WWN has lower priority, the device throws away the received LISM and continues broadcasting its own LISM.

Eventually, a node will receive its own LISM frame, indicating that it has the highest priority and is therefore temporary loop master. It then begins transmitting a special ordered set, ARB(F0), to notify the others that a temporary master has been selected. When the ARB(F0) circles back to the loop master, the initialization can proceed to the next phase.
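The LISM comparison reduces to "FL_Port identifier first, then lowest WWN." A toy Python sketch of the outcome (not of the frame-by-frame circulation) might look like this, with made-up WWN values:

```python
def select_loop_master(ports):
    """Return the (port_type, wwn) pair that wins LISM selection.

    Port type x'00' (FL_Port) always beats x'EF' (NL_Port); among
    NL_Ports, the numerically lowest WWN has the highest priority.
    """
    return min(ports, key=lambda p: (p[0], p[1]))

loop = [
    (0xEF, 0x20000000C9123456),   # NL_Port, hypothetical WWN
    (0xEF, 0x10000000C9ABCDEF),   # NL_Port, lower WWN -- higher priority
]
print(hex(select_loop_master(loop)[1]))   # 0x10000000c9abcdef wins
```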

The first task of the temporary loop master is to issue a series of four frames that will allow each participant on the loop to select a unique AL_PA. The frame format contains a 128-bit field that represents an abbreviated mapping of all possible AL_PAs. The position of each bit in the AL_PA map corresponds to the sequential list of AL_PAs, beginning with x'00' and ending with x'EF'. When the first frame is issued, all bits are initialized to 0, indicating that no AL_PAs have been selected. As each device picks an AL_PA from the map, the corresponding bit is set to 1.

The header of each AL_PA map frame contains an identifier that defines which of the loop devices is allowed to select an AL_PA. The first frame issued has an identifier of loop initialization fabric address, or LIFA. As this frame circulates the loop, only public loop devices that have previously been assigned an address by the fabric have permission to select a bit corresponding to their original AL_PA. When this frame returns to the loop master, it is reissued with a loop initialization previous address, or LIPA, identifier. Now the private loop devices that remember their previously assigned AL_PAs have an opportunity to reselect them from the map. If by chance two devices previously had the same AL_PA (for example, if two separate, active loops were hot cascaded), the first device to see the frame would be able to reselect it. The second device would see the bit already set to 1 and would have to wait for the next frame.

When the frame is issued for the third time, the identifier is changed to loop initialization hard address, or LIHA. NL_Ports that have dip switch or jumpered addresses (for example, disk drive controllers) can now attempt to select AL_PAs from the map corresponding to their hardwired addresses. If the hard assigned AL_PA is already taken, however, a device must wait for the next addressing frame. The last frame issued has an identifier of loop initialization soft address, or LISA. This frame is for any NL_Ports that did not qualify for or were unsuccessful in the previous rounds. On a populous, previously operational loop, devices that must select from the LISA frame would find only leftovers. Typically, an initiator such as a server will attempt to select a bit corresponding to a higher-priority AL_PA; a target such as a disk may attempt to select a lower-priority AL_PA.
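The select-a-bit mechanics are the same for all four frames; only the eligibility rules change. A minimal sketch of one device's attempt against the 128-bit map (names hypothetical):

```python
def claim_al_pa(bitmap, index):
    """Try to claim the AL_PA at `index` in the 128-bit map.

    Returns True if the bit was clear and is now set; False means the
    address was already taken and the device must wait for a later frame.
    """
    if bitmap[index]:
        return False
    bitmap[index] = True
    return True

al_pa_map = [False] * 128          # the LIFA frame starts with every bit clear

print(claim_al_pa(al_pa_map, 1))   # a server reclaims its previous AL_PA: True
print(claim_al_pa(al_pa_map, 1))   # a duplicate must defer to a later frame: False
```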

After the LISA frame returns to the temporary loop master, each loop device will have a unique AL_PA. In the original Fibre Channel standard for arbitrated loop (FC-AL-1), the loop initialization process was closed after this phase and the loop returned to normal operation. As an option, vendors could implement an additional subroutine to provide positional mapping of devices along the loop. A positional map is useful for determining how AL_PAs are physically positioned in a loop topology. Knowing which AL_PAs are on which hub ports, for example, gives you the diagnostic capability and the opportunity to fine-tune a loop configuration for optimal performance.

One problem with the positional mapping subroutine, however, is that not all arbitrated loop devices support it. The HP Tachyon chip set, for example, was widely used in host bus adapters and disk controllers and was designed before positional mapping was developed. FC-AL-2 accommodates these devices by providing bits in the LISA frame header that can be set by Tachyon or other nonparticipants. If any device reports that it cannot support positional mapping, the subroutine is abandoned and the temporary loop master closes the initialization process. Otherwise, the loop master issues a frame called a loop initialization report position, or LIRP, frame, which may contain as many as 127 bytes. The temporary loop master inserts its own AL_PA into the first byte position, increments an offset by 1, and passes the frame downstream. As each loop device receives the LIRP, it inserts its AL_PA in the next byte position, increments the offset, and forwards the frame. Eventually the frame fills with a positional map that details how loop devices are physically positioned in relationship to one another. When the positional map is complete, the loop master distributes it in a loop initialization loop position, or LILP, frame, allowing each loop device to copy and process the contents.

As long as the positional map is used for diagnostic or optimization purposes, the failure of Tachyon or other Fibre Channel controllers to support it does not create interoperability problems. Some vendors, however, use the positional map to discover which devices are on the loop. Instead of polling the entire 127 AL_PA address space to discover targets, these implementations poll only those AL_PAs listed in the positional map. Consequently, interoperability has been an issue when older Tachyon devices are mixed with positional map-dependent devices on the same loop.

Following the LILP (if all devices support positional mapping) or LISA (if some do not), the temporary master finishes loop initialization by issuing a close (CLS) ordered set, followed by IDLEs. As each loop device receives the CLS, it leaves the open-init state and resumes normal operation. IDLEs continue to circulate around the loop until any previously suspended operations are resumed or new ones begun.

It is helpful to remember that although this discussion of the initialization process is somewhat lengthy, the actual process completes in mere milliseconds and generally is not disruptive to loop operations.

The variables introduced by loop initialization are important considerations for SAN design. In any network, failures occur most frequently during adds, moves, and changes. Proper selection of components, HBAs, disks, and arbitrated loop hubs will help to ensure that the topology change implied by loop initialization will not impact loop stability.

4.2.4 Port Login

Loop initialization allows each device to select a unique AL_PA and thus avoids addressing conflict on the loop. For example, a loop with a single server and 24 Fibre Channel disk drives will emerge from loop initialization with 25 distinct AL_PAs. Immediately following loop initialization, however, the server has no idea which other devices are on the loop. For an initiator (server) to discover targets (disks), an additional step is required. This function is provided by a port-to-port login process known as N_Port login, or PLOGI. A similar function is provided for fabric-capable devices, known as fabric login, or FLOGI. Both login processes are part of a set of extended link services that are used to exchange communication parameters and identities and thus establish logical sessions between devices on the topology.

In arbitrated loop, PLOGI is usually performed immediately following loop initialization. Because Fibre Channel disk drives do not normally communicate with one another (except in XOR drive configurations), there is no need for a disk to initiate a device discovery. A server, on the other hand, will need to discover all targets on the loop, even if the upper-layer application (such as NT Disk Administrator) assigns only a few for the server's use. In most implementations, the server attempts to establish login sessions with targets by issuing PLOGI frames addressed to each of the 126 possible NL_Port AL_PAs. The targets that accept the PLOGI from the server will return an ACC (accept) frame to the server, informing it of the target's WWN, buffer-to-buffer credit capability, maximum frame size support, and so on. By thus walking the address space, the server finds and establishes sessions with active participants on the loop.
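In outline, the discovery walk is a loop of PLOGI attempts across the candidate address space. The sketch below is a rough illustration, not an HBA driver: `plogi` stands in for the real frame exchange, returning ACC-style parameters for live targets and None otherwise.

```python
def discover_targets(plogi, candidate_al_pas):
    """Walk the AL_PA space, recording login parameters for responders."""
    sessions = {}
    for al_pa in candidate_al_pas:
        acc = plogi(al_pa)           # stand-in for sending a PLOGI frame
        if acc is not None:
            sessions[al_pa] = acc    # target's WWN, credit, max frame size
    return sessions

# A toy loop with one disk at AL_PA x'E0'; values are illustrative only.
fake_loop = {0xE0: {"wwn": 0x21000020371A2B3C, "bb_credit": 8, "max_frame": 2112}}
print(discover_targets(fake_loop.get, range(0x01, 0xF0)))
```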

The PLOGI login session is the first step in a series of interactions among loop devices that percolate up through the Fibre Channel hierarchy to FC-4's upper-layer protocol interface. At some point, an association must be made between the link-level AL_PAs and the application's logical definition of SCSI bus, target, and LUN (logical unit number) addressing. The extended link services that interface to FC-2 can determine which devices are out there; then the FC-4 protocol mapping (for example, Fibre Channel Protocol for SCSI-3) can determine what can be done with them. The responsibility for maintaining the association between lower-level AL_PAs and upper-level SCSI addressing is assumed by the device driver of the HBA or controller installed in an initiator.

4.2.5 Loop Port State Machine

Arbitrated loop introduces a new functional layer to Fibre Channel architecture. This layer, called the loop port state machine, resides between the FC-1 encoding/decoding function and FC-2's frame management and flow control functions. As shown in Figure 4-7, loop-specific functions are embodied in the loop port state machine, and the state machine in turn is embedded in silicon or microcode in Fibre Channel loop end nodes.

Figure 4-7. Loop port state machine logic


The loop port state machine monitors and performs the actions required for a device to become a loop participant: accessing the loop for transactions, being opened for transactions by other devices, and yielding control of the loop when transactions are complete. These processes are expressed in 11 different states and are controlled by a number of input and output variables. As requests are handed down by the FC-2 layer, the loop port state machine must determine its current state and decide which further actions are required to fulfill the pending request. If, for example, the upper layer has frames to send, the loop port state machine must transition from its normal monitoring state and enter an arbitrating state to gain access to the loop. When it wins arbitration, it changes to an arbitration won state and then changes to an open state as it notifies the target that a transaction is under way. When the frame transmission is complete, the loop port state machine enters a transmitted close or received close state, and finally it returns to monitoring mode.
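The transitions just described can be caricatured as a lookup table. This is a deliberately simplified subset (state and event names are invented; the real machine has 11 states and many more inputs):

```python
# Hypothetical, simplified subset of the loop port state machine.
TRANSITIONS = {
    ("MONITORING", "frames_pending"): "ARBITRATING",
    ("ARBITRATING", "won_arbitration"): "ARB_WON",
    ("ARB_WON", "open_sent"): "OPEN",
    ("OPEN", "transfer_done"): "XMITTED_CLOSE",
    ("XMITTED_CLOSE", "close_done"): "MONITORING",
    ("OPEN", "lip_received"): "OPEN_INIT",   # a LIP preempts current activity
}

def next_state(state, event):
    """Return the next state, or hold the current one for unlisted events."""
    return TRANSITIONS.get((state, event), state)

print(next_state("MONITORING", "frames_pending"))   # ARBITRATING
print(next_state("OPEN", "lip_received"))           # OPEN_INIT
```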

Depending on its current state, the loop port state machine may not be able to immediately serve upper-layer requests. FC-2 might have frames pending at the same time that the loop port state machine already is in an open state (that is, another loop device has it conditioned to receive frames). Or it might be in an open state and in the midst of frame transfer when a LIP is received, in which case the loop port state machine must suspend any current activity and enter the open-init state. The logical transition from one state to another based on current inputs and variables provides an efficient mechanism for orderly conduct of the loop. Early interoperability testing of various vendors' implementations of loop port state machine logic revealed a number of timing and transition issues, but most of those issues, at least at the hardware level, have been resolved.

4.2.6 Arbitration

Because arbitrated loop is a shared transport, gaining access to the loop is a central function of the loop port state machine. Contention for access involves two components: the priority of a loop device's AL_PA, and an access variable that is toggled when a device wins arbitration. An NL_Port that observes fairness in sharing the loop will, after winning arbitration, clear its access bit and refrain from arbitrating again until no other devices are arbitrating. This scheme allows even low-priority devices on the loop to win arbitration and thus prevents starvation of any port. Fabric loop ports, however, do not observe fairness. If an FL_Port honored fair access, the switch might become congested as frames queued up at the FL_Port for delivery to the loop. Because an FL_Port is unfair and has the highest-priority AL_PA, it is always assured of winning arbitration whenever it needs loop access.

An arbitrate primitive, ARB(x), is transmitted whenever a loop device needs access to the loop. The actual format of the primitive is K28.5 D20.4 AL_PA AL_PA but is referred to generically as ARB(x) to indicate that each of the last two bytes contains the AL_PA of the arbitrating device. An ARB(x) can be transmitted even if another NL_Port owns the loop, but if frames are traversing the loop the ARB(x) can be issued only between frames. You transmit the ARB(x) by substituting an ARB for each current fill word (CFW). When there is no frame traffic on the loop, the CFW is normally the IDLE primitive. As each IDLE passes through the arbitrating NL_Port, it is replaced with an ARB(x) containing the AL_PA of the arbitrator. If another device already possesses the loop, the CFW will be ARB(F0). The x'F0' in this primitive has the lowest priority and is used as an indicator for the fairness algorithm. As you will see, the arbitrating device will substitute its own ARB(x) for the ARB(F0) received.

If no other device is arbitrating, the NL_Port will transmit an ARB(x) with its own AL_PA value, and, as the ARB(x) circles the loop, it will be returned to the sender. The NL_Port then transitions to the ARB_WON state and proceeds to an OPEN condition to send frames. If two or more devices are arbitrating at the same time, the NL_Port with the higher-priority AL_PA will win. When an arbitrating device receives an ARB(x) from an upstream partner, it compares the value of the AL_PA to its own. If the received ARB(x) carries a higher-priority AL_PA, the device must forward it on. If the received ARB(x) contains a lower-priority AL_PA, the device replaces the received ARB(x) with its own. In this way, only the NL_Port with the highest-priority AL_PA will receive its own ARB(x) and thus win arbitration. The contending arbitrators will continue issuing their own ARB(x)s and will continue to substitute their ARB(x) for any received lower-priority ARB(x) until one of them, too, eventually wins arbitration.
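The per-port decision rule is a single numeric comparison, and because x'F0' is numerically above every valid AL_PA, the same rule handles ARB(F0). A minimal sketch (hypothetical function name):

```python
def forward_or_replace(received_al_pa, my_al_pa):
    """An arbitrating port's rule for a received ARB(x): lower value wins."""
    if received_al_pa < my_al_pa:   # received ARB has higher priority
        return received_al_pa       # forward it unchanged
    return my_al_pa                 # otherwise substitute my own ARB(x)

print(hex(forward_or_replace(0xF0, 0x23)))   # 0x23: any real AL_PA beats ARB(F0)
print(hex(forward_or_replace(0x01, 0x23)))   # 0x1: the higher-priority ARB passes
```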

Fairness is monitored by the presence of the ARB(F0) primitive on the loop. When a loop device wins arbitration, it sets its access variable to 0. As long as the access bit is 0, the device cannot arbitrate again. The winning device also begins substituting all current fill words it receives with ARB(F0). These ARB(F0) primitives are forwarded by nonarbitrating devices around the loop. If another NL_Port begins arbitrating, it will compare the AL_PA value in any received ARB(F0) to its own. The xF0 will always have lowest priority, and consequently the contending NL_Port will substitute its own ARB(x) for the stream of ARB(F0)s it receives. When the contender's ARB(x) is received by the current arbitration winner, it informs the current loop owner that another device is still arbitrating. The current loop owner discards the received ARB(x) and again substitutes ARB(F0).

When the current owner is finished with the loop, it yields control of the loop to others. Its access variable, however, is still set to 0 and cannot be reset as long as ARB(F0)s are circulating. The next highest-priority AL_PA that is arbitrating will immediately win and will begin the same process of ARB(F0) substitution. As long as active arbitrators are contending for loop access, ARB(F0)s will continue to stream from the current winner, and this, in turn, will keep the access bits of all previous winners set to 0. The duration of this activity is called the access fairness window. The window is closed only when a current winner receives an ARB(F0). The fact that no other loop device has substituted an ARB(x) for the ARB(F0) in transit back to the current winner confirms that no other devices are arbitrating. When the current winner closes its transaction and yields control of the loop, it ceases issuing ARB(F0)s and replaces the current fill word with IDLEs. As each previous winner receives IDLEs, its access bit is reset to 1 and it is now free to arbitrate at will.

The arbitration strategy has design implications for SANs because fairness can be manipulated to increase performance. Allowing certain file servers to be unfair will increase their access to disks vis-à-vis less critical servers.

4.2.7 The Nonbroadcast Nature of Arbitrated Loop

Shared LAN topologies such as Ethernet and Token Ring are broadcast transports in that data sent by any device on a segment is broadcast to all other devices on the same segment. Shared (nonswitched) Ethernet is an obvious example because an Ethernet transmission is sent along a common wire for all to hear. Each device along the wire processes the transmission to determine whether the destination MAC address matches its own. Token Ring employs a transmit-to-receive cabling scheme similar to arbitrated loop's. A Token Ring device waits for a free token to pass by, marks the token "busy" and appends its data, and forwards the frame downstream. As each Token Ring device receives the frame, it examines the destination MAC address for a match. If the frame is addressed to the device, it marks the frame "copied" and forwards it on to the ring. Eventually the original frame returns to the sender. The sender sees that the frame was copied, removes the frame from the ring, and issues a new free token.

Because transmissions are visible to any device on a shared Ethernet or Token Ring segment, capturing traffic for problem diagnosis is relatively straightforward. You can plug a Sniffer, made by Network Associates, into any hub port to capture all segment activity. Multiple conversations among multiple pairs of devices can be captured in a single trace without moving the Sniffer from port to port. After the capture is complete, decoding of frames and analysis of protocol or performance problems can proceed.

Arbitrated loop, unlike shared Ethernet or Token Ring, is a nonbroadcast transport. As shown in Figure 4-8, when an NL_Port wins arbitration and opens a target for frame transmission, a frame or series of frames can be sent from initiator to target. Intervening loop devices in the path between the two will see the frames and forward them on. The target or recipient, however, removes the frames from the loop and simply issues an R_RDY or CLS primitive to the sender. The loop devices downstream from the recipient will forward the primitive back to the initiator, but they have no visibility to the frame transaction that occurred.

Figure 4-8. Frame transit in an arbitrated loop


Because frames are not broadcast throughout the loop segment, it is more challenging to capture traffic for problem diagnosis in arbitrated loops. With most Fibre Channel analyzers, you can capture only the traffic going into and out of a single port. Instead of simply plugging analyzer probes into any available hub port, you must place them in ports immediately preceding and following the monitored port, or you must configure them inline on the monitored device's link. This requirement alone is disruptive to loop traffic and may in fact mask an intermittent problem by altering the topology. Some vendors of arbitrated loop hubs have responded to these difficulties by engineering analyzer functionality into the hub itself.

During normal loop operation, the nonbroadcast nature of arbitrated loop enhances performance by removing the overhead of frame handling from at least part of the loop.

4.2.8 Design Considerations for Arbitrated Loop

Before the development of stable fabric switches, arbitrated loop was the most commonly deployed Fibre Channel storage topology. Loop hubs costing less than a thousand dollars were used to join hundreds of thousands of dollars' worth of servers and storage arrays into a SAN. The transition to fabric switches was accompanied by marketing campaigns highlighting LIPs and the vulnerability of loops to disruption. By that time, however, Fibre Channel arbitrated loop had achieved a fairly high level of interoperability, and managed loop hubs provided safeguards against disruptive behavior. Now arbitrated loop has been relegated to lower-end SAN solutions, to JBODs, and to storage backplanes of some RAID and NAS devices. From a SAN design standpoint, though, loops should still be considered, especially for price-sensitive configurations.

Arbitrated loop is not suited to some application requirements, but it provides an economical solution for a variety of shared storage needs. The following are among the application-driven criteria that you should consider when implementing arbitrated loop:

  • Types of devices per loop segment

  • Private and public loop support

  • Total number of loop devices per segment

  • Bandwidth requirements

  • Distance requirements

  • Managed or unmanaged environments

  • High-availability requirements

Additional issues also factor into the design equation, especially because SANs must often support multiple, sometimes contending applications concurrently. Balancing the needs of each application on a common topology may be challenging, and that is why complex SANs are often constructed with a combination of shared loop segments and fabrics.

Types of Devices per Loop Segment

An arbitrated loop may support a variety of devices, including host bus adapters installed in servers, individual Fibre Channel disk drives, JBODs, Fibre Channel RAID arrays, native Fibre Channel tape subsystems, and Fibre Channel-to-SCSI bridges. At the link level, each device type simply appears as one or more AL_PAs, or as peers on a common transport. At the upper-layer protocol, the device types divide into a less egalitarian society of initiators and targets.

Because applications sit on top of the upper-layer protocol of initiators, applications typically determine the traffic patterns to and from targets. Read-intensive applications such as data mining, for example, create a traffic flow from target to initiator. Tape backup applications create a flow from initiator to target or, in the case of third-party copy backup, from target to hybrid target. For single-initiator loops, mixing multiple device types on the same topology has little impact, because the initiator is responsible for launching all transactions (such as reads and writes to disk or tape). The physical positioning of devices on the loop can therefore be optimized for the dominant application of that initiator. In multi-initiator environments, optimization via physical positioning of devices is more difficult.

Tape backup presents a special problem for multi-initiator loops. Tape subsystems connected through Fibre Channel-to-SCSI bridges or native Fibre Channel interfaces are not particularly fast (typically less than 15MBps) and so pose no bandwidth issues. A streaming tape backup, however, may not tolerate interruptions due to loop reconfiguration. Such disruptions may abort the entire process, defeating tape's prime directive of data integrity. In the more dynamic and populous loop environment implied by a multi-initiator configuration, the statistical occurrence of interruptions in the form of LIPs increases. Installing a new drive in a JBOD while a tape is backing up another array, for example, will initiate a LIP throughout the loop segment and potentially will abort the backup process. Therefore, it is strongly recommended that you segregate tape backup systems to fabric ports or restrict LIP propagation via intelligent hubs (provided that you manage the AL_PAs via hard addressing).

If multiple initiators and multiple targets share the same loop, can all servers access the same drives? The topology guarantees that they can but does not guarantee the consequences if they do. Arbitrated loop does not provide intrinsic file locking, file permission monitoring, volume ownership, or any other feature that would prevent two servers from overwriting data on the same drive. If data sharing among multiple initiators is required, you can do it using middleware, usually in the form of applications that sit between the operating system and SCSI-3. Otherwise, you must administer the servers manually to ensure that each target is owned by a single initiator.

Windows NT aggravates this problem because NT wants to own everything it sees. When an NT server boots on an arbitrated loop (or fabric), NT Disk Administrator queries and reports all Fibre Channel-attached drives. If the drive has not previously been accessed or formatted for NT, Disk Administrator immediately prompts the user to write a volume label on the disk so that it can be accessed. If the user consents, a volume label is written to the first sector of the disk, and if the disk actually belongs to a Solaris server, that will make the disk and its files unusable. This problem has been fixed with newer versions of Microsoft operating software, such as .NET.

Although the number of initiators on the same segment is important in terms of administering disk ownership, you should also consider heterogeneous operating systems and their appetites. Fibre Channel fabrics and switching hubs with port zoning (discussed later) are possible solutions for mixed OS environments.

Private and Public Loop Support

The bandwidth and population restrictions of private arbitrated loop can satisfy most requirements of applications used within departments. With one to four servers and several RAID or JBOD arrays, a customer service, engineering, or similar department would have adequate storage and response time for data access.

Support for fabric login and public loop becomes meaningful when requirements for bandwidth allocation, population, or distance exceed the capabilities of a single loop. Introducing a fabric opens new design possibilities for loop configurations, but it also mandates fabric support for HBAs, disks, and other loop devices. With a large installed base of legacy nonfabric, private loop devices in the market, some switch vendors have engineered support for both public and private loop devices on the same fabric.

For new SAN installations, it is preferable to select components that support public loop, even if the initial configuration is a stand-alone arbitrated loop. This is not a consideration for selecting arbitrated loop hubs, because they play a passive role in the loop and have no login functions. Selecting host bus adapters and storage devices with fabric login capability, however, will provide additional flexibility in evolving the SAN and will extend the life of the investment.

Total Number of Loop Devices per Segment

As discussed previously, the 126 NL_Port and 1 FL_Port capacity of a single arbitrated loop was not derived from performance calculations but is the result of neutral disparity requirements. What is a reasonable maximum number of devices that can be configured on a single loop? It depends on the application. Theoretically, a SAN design could specify a lone file server and 125 disks on a single arbitrated loop. Some current vendor implementations, in fact, have a single server with more than 90 Fibre Channel drives on a single loop. These large loop configurations are not unreasonable when only one or two file servers are present and when they are not aggressively contending for bandwidth. The obvious benefit of such large loops is storage capacity. More than a terabyte of storage can be provided by 125 ten-gigabyte Fibre Channel drives.

Arbitrated loops, especially ones with multiple initiators, normally do not exceed 20 to 30 devices. Most are much smaller, in the 2- to 10-device range, depending on whether Fibre Channel RAIDs or JBODs are used for storage. Host bus adapters are capable of driving more than 97MBps on a 1Gbps loop, or nearly the full Fibre Channel bandwidth. Fibre Channel drives can provide 15MBps to 18MBps throughput each. If a server uses software RAID to stripe data blocks across multiple drives, a JBOD of 8 drives can fulfill server requests at nearly the same throughput as the HBA. From a performance standpoint, adding more JBODs to this 9-node loop will not significantly affect total throughput. It will simply make additional storage available. Adding initiators, however, may degrade performance if the new servers are equally contending for access and if the application has high bandwidth requirements (for example, video).
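As a rough sizing sketch under the throughput figures just cited (the numbers are the text's; the function is hypothetical), a single-initiator loop is bounded by the lesser of the HBA's capability and the aggregate striped-drive rate:

```python
def loop_throughput_mbps(hba_limit=97, drive_rate=16, drives=8):
    """Crude ceiling for a single-initiator loop: HBA- or drive-bound."""
    return min(hba_limit, drive_rate * drives)

print(loop_throughput_mbps())           # 97: eight striped drives saturate the HBA
print(loop_throughput_mbps(drives=4))   # 64: drive-bound with fewer spindles
```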

Bandwidth Requirements

In the example just mentioned, a configuration with one server (equipped with an efficient HBA) and an 8-drive JBOD provides enough throughput to nearly saturate a 100MBps Fibre Channel pipe on a 1Gbps loop. For the pipe to be fully utilized, the application would have to drive a sustained access to disk. In reality, this occurs only with specialized applications such as multiple streams of full-motion video. At 100MBps, arbitrated loop can support several video streams, primarily because such streams utilize the maximum 2,112-byte frame payload and require less command overhead per transaction.

Most enterprise applications do not require sustained throughput. Radiology, geological resource mapping, prepress, and other applications that require very large file transfers, for example, tend to be bursty, with periods of extremely high utilization followed by periods of inactivity. It is difficult to determine the total bandwidth requirements of such applications because averaging the total data requirement over time does not address the bandwidth needs of random bursts of data. Other applications such as online transaction processing (OLTP), Internet service provider (ISP) Web servers, and relational database queries may have more predictable bandwidth requirements.

Distance Requirements

Arbitrated loop is a closed ring topology, with the total circumference determined by the distances between nodes. At gigabit speeds, signals propagate through copper media at 4 nanoseconds per meter, and through fiber-optic media at 5 nanoseconds per meter. You can easily calculate the total propagation delay incurred by the loop's circumference by multiplying the lengths (both transmit and receive) of copper and fiber-optic cabling deployed by the appropriate delay factor. A single 10km link to an NL_Port, for example, would cause a 50-microsecond propagation delay in each direction, or 100 microseconds total. This is the equivalent of 1MBps of bandwidth consumed to accommodate the link. In practice, the propagation delay penalty is negligible compared with the degradation incurred by long links at the protocol level. Because all transactions must traverse a larger circumference, performance may decline by as much as 40 percent.
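The delay calculation is straightforward. A minimal sketch using the propagation figures above (the function name is hypothetical; pass one-way cable lengths, and the loop doubles them for the return path):

```python
COPPER_NS_PER_M = 4   # gigabit signal propagation in copper, from the text
FIBER_NS_PER_M = 5    # and in fiber-optic media

def loop_delay_us(copper_m=0, fiber_m=0):
    """Round-trip propagation delay, in microseconds, for a loop's cabling."""
    one_way_ns = copper_m * COPPER_NS_PER_M + fiber_m * FIBER_NS_PER_M
    return 2 * one_way_ns / 1000.0

print(loop_delay_us(fiber_m=10_000))   # 100.0 us for a single 10km link
```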

Managed or Unmanaged Environments

Advances in the design of arbitrated loop hubs have resulted in products with various levels of management capability, from simple enclosure services to enhanced analyzer-type functionality. Managed hubs usually provide a graphical interface, typically written in Java or platform-specific programming languages. When selecting loop hubs, you should consider, in addition to port density and port cost, the management features available in various products.

Not all loop environments require management. Applications that support less critical business operations, loops supporting only a few devices, homogeneous (single-vendor) configurations, and so on, may function quite well without management. Mission-critical applications, populous loops, and heterogeneous (multivendor) configurations, however, almost demand the higher level of visibility that management provides. Fibre Channel hardware components, like networking products in general, are very stable and have a high mean time between failure (MTBF). Even so, eventually a cable will be pulled inadvertently or a host bus adapter will misbehave. When something does break, products with good management features can help to reduce downtime and restore the loop to operation.

High-Availability Requirements

With its parallel cabling scheme, the traditional SCSI architecture does not lend itself to high-availability configurations. The networking characteristics of SANs make it much easier to design and implement high availability to storage. A common configuration for arbitrated loop involves dual-provisioning HBAs in each server and installing two arbitrated loop hubs for redundant paths, as shown in Figure 4-9. Fibre Channel disk drives typically provide an "A" and a "B" channel for dual loop attachment. This configuration provides redundant data paths as well as redundant loop hubs, transceivers, cables, and power supplies. With the appropriate software running on the host, a failure of one loop automatically routes data to the standby loop.

Figure 4-9. Redundant loop configuration


High-availability configurations using dual loops are given additional reliability if managed hubs are used for each loop. Knowing the status of both loops (for example, knowing that the standby loop itself has not failed) provides a much higher level of stable operation.


