8.2 InfiniBand

InfiniBand is a new standard (and architecture) that defines a switched-fabric interconnect between a host and storage or network peripheral devices. The InfiniBand Trade Association (IBTA) drives the InfiniBand specification. The IBTA came into existence through the merger of two competing specification efforts: Next Generation I/O, led by Intel, and Future I/O, led by IBM, HP, and Compaq.

InfiniBand introduces several changes, the most fundamental being the replacement of an I/O bus (such as PCI) with a switched-fabric network. A switched serial fabric has several advantages over an I/O bus, the most notable being that a fabric can support more devices over vastly greater distances using a significantly lower number of electrical pins. In addition, a fabric can carry multiple simultaneous data transfers and provide fault tolerance. Before diving into the details of the InfiniBand architecture, it is worth recalling the limitations of the PCI (Peripheral Component Interconnect) bus.

Although PCI has its merits, most notably that it replaced a slew of competing standards (ISA, EISA, MCA) with one standard that provided additional functionality, the fact is that PCI has limitations, especially given the rapid advances made in CPU, memory, and peripheral technology. PCI's limitations can be briefly summarized as follows:

  • Although PCI was fast when it was newly introduced, it no longer meets bandwidth requirements: the front-side bus between the CPU and memory now runs at speeds of up to 1,066 MBps, and Gigabit Ethernet NICs and high-end storage devices (SCSI 3) can be choked by the PCI bus speed.

  • PCI has problems with manageability in general and failure detection in particular. A single bad PCI card can bring the whole system down, yet it is hard to determine which particular card is faulty.

  • There are physical limitations in terms of how long the bus can be, how fast it can run, and how many buses are allowed. At the fastest bus rate, one may be able to connect only a single peripheral device.

Note that although InfiniBand was initially also positioned as a PCI replacement, the advent of 3GIO (later renamed PCI Express) has reduced the importance of InfiniBand in that role.

8.2.1 InfiniBand Advantages

InfiniBand offers numerous advantages, including

  • Reduction in cabling complexity, because InfiniBand can replace three cables (Ethernet, storage, and an interprocess communication cable) with a single interconnect. The result can be considerable simplification in silicon on the blade, in backplane wiring density, and in overall complexity. Further, the blades connect to a couple of edge connectors, and only a couple of high-bandwidth interconnects come out of the rack.

  • Built-in fault detection, which makes it easy to quickly identify a failed component.

  • Reduced memory bandwidth consumption because of fewer memory-to-memory copy operations.

  • Reduced number of context switches, including user and kernel mode switches.

  • Reduced protocol overhead; for example, TCP/IP checksum calculations.

  • Provision for high availability through alternative routes in a fabric environment and redundant components such as routers and switches. The redundant paths also enable multipath and load-balancing solutions. Further, InfiniBand components can be hot plugged and unplugged.

  • InfiniBand allows a blade to boot from an external device without any loss of efficiency, thus reducing blade silicon even more.

8.2.2 InfiniBand Architecture

The InfiniBand architecture specifies a point-to-point logical connection topology made over a switched fabric. The switched fabric itself has the following primary components:

  • Host channel adapters

  • Target channel adapters

  • InfiniBand switches

  • Physical media

Figure 8.9 shows the InfiniBand I/O bus and the relationship of the various components to each other. The following text discusses some of the components depicted in this figure.

Figure 8.9. InfiniBand I/O Bus

A host channel adapter (HCA) is a device that acts as the connection point between a host CPU and the InfiniBand fabric. Host channel adapters are closely associated with servers and tend to be located close to them. An HCA has built-in intelligence that minimizes interrupts and operating system involvement and can accomplish transfers directly into memory. Each HCA has a unique IPv6-based identifier associated with it.

A target channel adapter (TCA) is the equivalent of an HCA, but for a peripheral device instead of for a host CPU. Target channel adapters are associated with storage peripherals and tend to be located close to them. Just like an HCA, each TCA has a unique IPv6-based identifier associated with it.

An InfiniBand switch provides a means to connect multiple TCAs to a single HCA. The fabric can also be divided into multiple logical divisions called subnets. Whereas InfiniBand switches provide connectivity within a subnet, InfiniBand routers provide connectivity between subnets. Routers are typically implemented as a superset of switch functionality.

InfiniBand allows for the use of off-the-shelf copper or optical cables. Links can be up to 17 meters long over copper wire and 100 meters over optical cable. InfiniBand specifies a connection consisting of a varying number of 2.5 Gbps links: 1X (one link), 4X (four links, for a 10 Gbps connection), and 12X (twelve links, for up to 30 Gbps).
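
As a rough illustration of this arithmetic, the sketch below (in C) computes the aggregate signaling rate for each link width. The assumption that 8b/10b encoding leaves about 80 percent of the signaling rate as usable data bandwidth is mine and is not stated in the text above.

    #include <stdio.h>

    /* Sketch: raw and approximate usable bandwidth per InfiniBand link width.
     * Assumes 2.5 Gbps signaling per link (as stated above) and 8b/10b
     * encoding overhead (an assumption not discussed in the text). */
    int main(void)
    {
        const double per_link_gbps = 2.5;       /* signaling rate per link    */
        const double encoding_efficiency = 0.8; /* 8b/10b: 8 data bits per 10 */
        const int widths[] = { 1, 4, 12 };      /* 1X, 4X, 12X                */

        for (int i = 0; i < 3; i++) {
            double raw  = widths[i] * per_link_gbps;
            double data = raw * encoding_efficiency;
            printf("%2dX: %5.1f Gbps raw, ~%4.1f Gbps usable\n",
                   widths[i], raw, data);
        }
        return 0;
    }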

The preceding paragraphs described the physical entities that collectively constitute the InfiniBand architecture, but the architecture also specifies a number of logical concepts and entities, described next.

8.2.2.1 InfiniBand Link and Network Layers

At the link layer, each InfiniBand link is subdivided into a minimum of 2 and a maximum of 16 virtual lanes. One of the lanes is always dedicated to fabric management. Lanes receive QoS (quality of service) priorities, and the management lane (virtual lane 15; lanes 0 to 14 are data lanes) has the highest priority.

Note that whereas a link is a physical entity, a lane is a logical entity. A lane allows two endpoints to communicate with each other. Two communicating endpoints can support different numbers of virtual lanes, and InfiniBand defines an algorithm to handle this situation. The communicating endpoints are called a queue pair. Queue pairs have send and receive buffers associated with them. Each virtual lane (VL) has its own credit-based flow control. Management packets provide for device enumeration, subnet management, and fault tolerance.
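
To make the queue-pair concept concrete, the fragment below sketches how a queue pair with its send and receive queues might be created using libibverbs, an open-source verbs API that postdates this chapter. It is shown purely as an illustration (the queue sizes are arbitrary, and error handling and cleanup are omitted), not as a Windows implementation.

    #include <infiniband/verbs.h>

    /* Sketch: create a queue pair (send queue + receive queue) on the first
     * HCA found. Uses libibverbs, a later open-source verbs API; shown only
     * to illustrate the queue-pair concept. Error handling omitted. */
    struct ibv_qp *create_example_qp(void)
    {
        struct ibv_device **devs = ibv_get_device_list(NULL);
        struct ibv_context *ctx  = ibv_open_device(devs[0]);
        struct ibv_pd *pd        = ibv_alloc_pd(ctx);        /* protection domain */
        struct ibv_cq *cq        = ibv_create_cq(ctx, 16, NULL, NULL, 0);

        struct ibv_qp_init_attr attr = {
            .send_cq = cq,                 /* completions for the send queue    */
            .recv_cq = cq,                 /* completions for the receive queue */
            .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                         .max_send_sge = 1, .max_recv_sge = 1 },
            .qp_type = IBV_QPT_RC,         /* reliable connection service       */
        };
        return ibv_create_qp(pd, &attr);   /* the queue pair itself             */
    }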

InfiniBand supports collision-free [2] simultaneous data transfer operations. To ensure reliable data transmission, InfiniBand uses end-to-end flow control and not one but two CRC checks. It accomplishes flow control by having the receiving node set aside multiple buffers for data reception and communicate that number to the sender. The number represents how many data buffers the sender can fill without waiting for acknowledgments. Two CRC values are calculated and included in each data transmission, regenerated by the receiver, and compared with the received values. A 32-bit CRC covers the end-to-end communication. Because data may also pass through intermediate nodes, a 16-bit CRC is computed per hop, between an intermediate node and an end node or between two intermediate nodes.

[2] Although InfiniBand does not have Ethernet-like collisions and the random transmission delays used to avoid them, it does have flow control mechanisms and may delay data transmission when required. To an application the effect is the same: a noticeable delay in data transmission and reception, and poor data throughput.
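
The credit mechanism just described can be pictured in a few lines of code. The sketch below is a conceptual model only; the type and function names are invented for illustration and do not correspond to any InfiniBand API.

    /* Conceptual sketch of credit-based flow control: the receiver grants one
     * credit per posted receive buffer, and the sender stops when credits run
     * out. Purely illustrative; not an InfiniBand API. */
    typedef struct {
        int credits;            /* buffers the receiver has advertised */
    } link_state;

    int try_send(link_state *ls)
    {
        if (ls->credits == 0)
            return 0;           /* must wait for the receiver to grant credits */
        ls->credits--;          /* one receive buffer consumed at the receiver */
        /* ... transmit one packet on the virtual lane ... */
        return 1;
    }

    void grant_credits(link_state *ls, int freed_buffers)
    {
        ls->credits += freed_buffers;   /* receiver freed buffers and says so */
    }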

InfiniBand defines the basic unit of communication to be a message. A message may be sent or received via an RDMA operation, a send or receive operation, or a multicast operation. RDMA (remote direct memory access) is the direct exchange of a message from one node's memory to another's without requiring operating system interrupts and services. InfiniBand defines six communication modes for data transfer (a brief code sketch follows the list):

  1. Reliable connection, wherein hardware is responsible for generating and checking packet sequence numbers to ensure a reliable connection; the hardware also detects duplicate and lost packets and ensures error recovery.

  2. Unreliable connection.

  3. Reliable datagram.

  4. Unreliable datagram.

  5. Multicast connection (implementation is optional).

  6. Packet transfer (implementation is optional).
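
As an illustration of how an RDMA operation over a reliable connection appears at the programming level, the fragment below posts a single RDMA write work request using libibverbs, the open-source verbs API mentioned earlier. The queue pair, registered memory region, remote address, and remote key are assumed to have been set up elsewhere; this is a sketch, not a Windows implementation.

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Sketch: post one RDMA write work request. Assumes qp, mr (a registered
     * local memory region), remote_addr, and rkey were prepared beforehand.
     * Illustrative only; uses the later open-source libibverbs API. */
    int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                        uint64_t remote_addr, uint32_t rkey, size_t len)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t) mr->addr,   /* local buffer         */
            .length = (uint32_t) len,
            .lkey   = mr->lkey,               /* local protection key */
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* remote DMA, no remote CPU */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion      */
        wr.wr.rdma.remote_addr = remote_addr;        /* target memory address     */
        wr.wr.rdma.rkey        = rkey;               /* remote access key         */

        return ibv_post_send(qp, &wr, &bad_wr);      /* 0 on success */
    }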

An InfiniBand message can consist of one or more packets. Packets can be up to 4,096 bytes long but may be smaller. Packets allow interleaving on a VL. Routing is performed at the packet level. Packets routed between subnets carry a global routing header that facilitates this routing.

The InfiniBand architecture also defines a network layer that provides for routing between different subnets. Subnets allow traffic to be localized; for example, broadcast and multicast packets stay inside the subnet. Subnets provide functionality similar to that of VLANs (virtual LANs) and can enforce security. Each device has a 16-bit identifier that is unique within its subnet. Each routed packet contains IPv6 addresses for the source and destination nodes.
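
Both levels of addressing (the 16-bit subnet-local identifier and the IPv6-style 128-bit address) are visible through libibverbs as well. The sketch below queries them for port 1 of an already opened HCA; the port number is an assumption, and the API is again the later open-source interface rather than anything Windows-specific.

    #include <stdio.h>
    #include <infiniband/verbs.h>

    /* Sketch: read the two InfiniBand address forms for port 1 of an opened
     * HCA: the 16-bit LID, unique within a subnet, and the 128-bit GID,
     * which is IPv6-style. Illustrative only; failed queries are skipped. */
    void print_addresses(struct ibv_context *ctx)
    {
        struct ibv_port_attr port;
        union ibv_gid gid;

        if (ibv_query_port(ctx, 1, &port) == 0)
            printf("Subnet-local identifier (16-bit LID): 0x%04x\n",
                   (unsigned) port.lid);

        if (ibv_query_gid(ctx, 1, 0, &gid) == 0) {
            printf("Global identifier (128-bit, IPv6-style GID): ");
            for (int i = 0; i < 16; i++)
                printf("%02x", (unsigned) gid.raw[i]);
            printf("\n");
        }
    }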

8.2.3 Microsoft and InfiniBand

Microsoft initially indicated that at an unspecified time in the future, the Windows Server family would natively support InfiniBand. Subsequently, in the third quarter of 2002, Microsoft and some other industry leaders indicated that they were rethinking their plans for InfiniBand support. Microsoft has indicated that it is now refocusing resources away from InfiniBand and on other areas, including IP storage via Gigabit Ethernet. Accordingly, this chapter may seem a little incomplete to some readers, given that there is no Microsoft implementation of InfiniBand to describe.


   