5.2 Interfaces

An interface defines the rules governing communication between the host computer system and a set of devices. These rules are usually implemented by a host adapter, which resides on one of the system's peripheral interconnects (see Section 3.5).

The modern disk market is largely driven by three disk protocols: IDE, SCSI, and Fibre Channel. We'll focus primarily on these protocols, as well as one older protocol, IPI, and some of the emerging storage interfaces (IEEE 1394/Firewire and USB). It is worth pointing out that there are no "native" IEEE 1394 or USB devices; they are all IDE drives with adapters attached, and as such have many of the same problems as a strictly IDE device does.

5.2.1 IDE

IDE, for Integrated Drive Electronics, or alternatively ATA, for AT Attachment, has historically been used in low-cost systems, particularly Intel-based personal computers. However, IDE disks are starting to become common in low-end workstations, such as the Sun Ultra 5 and Ultra 10, in order to keep prices as low as possible. The IDE specification was intended to reduce interface costs by placing the controller hardware on the drive itself, as well as by making firmware implementations as simple as possible.

IDE development is managed to some degree by a technical committee known as T13, which is part of the National Committee on Information Technology Standards (NCITS, pronounced "insights"). NCITS is chartered by and functions under rules that are approved by the American National Standards Institute (ANSI). These rules try to ensure that voluntary standards are developed by industry group consensus. NCITS develops Information Processing System standards, while ANSI approves the process under which they are developed and publishes them. The T13 Technical Committee web page is located at http://www.t13.org.

Following the success of the original ATA interface, the Small Form Factor Committee (an organization of drive manufacturers) created a backwards-compatible extension of the ATA interface called ATA-2 (alternatively Fast-ATA). This standard added faster PIO and DMA modes. Contrary to popular belief, the logical block addressing (LBA) scheme that was revamped in ATA-2 had nothing to do with surpassing the 504 MB barrier; even the original ATA specification had a capacity limit in excess of 100 GB.

Several years later, a further revision of the specification, called ATA-3, was released; it added more sophisticated power management, increased reliability, and failure prediction. It did not define any faster modes. One of the major disadvantages of ATA is that it was designed for disks only; with the explosive popularity of CD-ROM drives, the interface was extended by means of the ATA Packet Interface (ATAPI), which facilitates adding such devices. ATAPI was formally integrated into ATA with the ATA-4 standard.

Enhanced IDE (EIDE) is a marketing term referring to ATA-2 and ATAPI implementations, whereas Fast-ATA is a competing marketing term that builds on ATA-2 only.

One of the consequences of the low-cost goals of the IDE device is that the host processor ends up taking a substantial role in processing I/O requests. IDE drives have essentially two ways to transfer data to the host processor: programmed I/O (PIO), and direct memory access, or DMA, which can be performed on a single-word or multi-word level. A PIO transfer mode causes the host processor to handle every transaction from disk into memory itself, whereas in a DMA transfer mode the IDE controller can write directly into the host's memory. The drive communicates its ready status to the controller by asserting the IRQ line of the drive, which can be done in two cases (a sketch of the host's side of a PIO read follows the list):

  • A read command has been issued to the drive; that command has been serviced and the requested data is in the drive's buffer, ready to be transferred into memory.

  • A write command has been issued to the drive; the data for that command has been transferred to the drive's buffer (if the drive does not have write caching enabled, the line will not be asserted until the data has been physically committed to disk).
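To make the host processor's role under PIO concrete, the following is a minimal sketch of what a driver's interrupt handler does for a completed one-sector read on a legacy x86 primary ATA channel. The port addresses (0x1F0 for data, 0x1F7 for status) and status bits are the traditional ISA-era values; a real driver also checks the error bits, uses timeouts, and runs in kernel context, so treat this strictly as an illustration of the copy loop, not as production code.

#include <sys/io.h>    /* inb(), inw(), ioperm(): x86 Linux port I/O */
#include <stdint.h>
#include <stdio.h>

#define ATA_DATA   0x1F0   /* 16-bit data register, legacy primary channel */
#define ATA_STATUS 0x1F7   /* status register */
#define ATA_BSY    0x80    /* drive is busy */
#define ATA_DRQ    0x08    /* data request: drive has a word ready */

/* What the host does once the drive raises its IRQ for a completed read:
 * the CPU itself copies all 256 words of the sector out of the drive's
 * buffer, one port read at a time.  That copy loop is the cost of PIO;
 * with DMA, the controller writes the data into memory directly. */
static void pio_read_sector(uint16_t *buf)
{
    while (inb(ATA_STATUS) & ATA_BSY)
        ;                               /* wait for BSY to clear */
    for (int i = 0; i < 256; i++) {
        while (!(inb(ATA_STATUS) & ATA_DRQ))
            ;                           /* wait for the next word */
        buf[i] = inw(ATA_DATA);         /* CPU moves the data itself */
    }
}

int main(void)
{
    uint16_t sector[256];
    if (ioperm(ATA_DATA, 8, 1) != 0) {  /* needs root; illustration only */
        perror("ioperm");
        return 1;
    }
    pio_read_sector(sector);            /* assumes a read has just completed */
    printf("first word: 0x%04x\n", sector[0]);
    return 0;
}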

During normal reads and writes, handling this interrupt can continuously bother the CPU and cause long delays in servicing the drive. The Read/Write Multiple commands are drive-level commands that attempt to address this problem by allowing up to 128 sectors to be transferred without incurring any intervening interrupts, substantially lessening the overhead on the host processor.

Single-word DMA performs very poorly relative to multi-word DMA; in fact, "DMA" usually carries with it the implicit assumption of multi-word transfers. Single-word DMA support was dropped in the ATA-3 specification.

As hard disks continued to get faster, the fastest multi-word DMA mode (at 16.7 MB/s) quickly became insufficient. Unfortunately, the interface was really designed for slow data transfer (~5 MB/s). Simply increasing the clock rate of the interface caused all sorts of signaling interference problems; as a result, a new type of DMA transfer mode (called UltraDMA) was introduced in a standard called Ultra-ATA, which was developed to bridge the gap between ATA-3 and the then-forthcoming ATA-4. Now that ATA-4 has been released, the Ultra-ATA standard is integrated into ATA-4.

The key technological advance introduced in UltraDMA was double-transition clocking: data is transferred on both the rising and falling edges of the clock signal. This is exactly the same technology used to improve transfer rates in DDR SDRAM (see Section 4.1). In order to improve data integrity, UltraDMA modules also use a cyclical redundancy checking (CRC) algorithm to compute an error-detection value for each block of data transferred, which is sent along with the block. This allows the recipient of the data to determine whether the data was corrupted in transit, in which case the data is retransmitted.
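To give a flavor of what that check involves, here is a minimal bitwise CRC-16 over a block, using the common CRC-16-CCITT polynomial (0x1021). The polynomial and seed shown are illustrative; the exact values required for UltraDMA bursts are defined in the ATA-4 and later specifications, so consult the standard before treating this as anything more than a sketch of the idea.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bitwise CRC-16 over a buffer.  The polynomial 0x1021 (x^16+x^12+x^5+1)
 * is the CRC-16-CCITT form; the seed passed in below is illustrative. */
static uint16_t crc16(const uint8_t *data, size_t len, uint16_t crc)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

int main(void)
{
    uint8_t block[512] = { 0xde, 0xad, 0xbe, 0xef };   /* pretend sector */
    /* The sender computes the CRC and ships it along with the block; the
     * receiver recomputes it and asks for a retransmit on a mismatch. */
    printf("CRC = 0x%04x\n", (unsigned)crc16(block, sizeof block, 0xffff));
    return 0;
}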

Despite the throughput increase given by double-transition clocking, transferring data faster than 33 MB/s finally exhausted the capabilities of the standard 40-conductor IDE cable. To use UltraDMA modes over 2, a special 80-conductor cable is required. [9] This cable uses the same 40 pins, but interleaves 40 ground lines between the original 40 signal lines in order to isolate them and prevent interference.

[9] The cable was specified in the ATA-4 specification for UltraDMA modes 0, 1, and 2, but was optional.

These standards are reviewed in Table 5-2.

Table 5-2. IDE interfaces

Type                Mode            Cycle time (MHz) [10]   Data rate (MB/sec)   Supported in
PIO                 0               1.67                    3.3                  ATA
                    1               2.61                    5.2                  ATA
                    2               4.17                    8.3                  ATA
                    3               5.56                    11.1                 ATA-2
                    4               8.33                    16.6                 ATA-2
                    5               11.11                   22.2                 Vaporware!
DMA (single-word)   0               1.04                    2.1                  ATA (removed in ATA-3)
                    1               2.08                    4.2                  ATA (removed in ATA-3)
                    2               4.17                    8.3                  ATA (removed in ATA-3)
DMA (multi-word)    0               2.08                    4.2                  ATA
                    1               6.67                    13.3                 ATA-2
                    2               8.33                    16.6                 ATA-2
UltraDMA            0               4.17                    16.6                 ATA-4
                    1               6.25                    25.0                 ATA-4
                    2 (UDMA/33)     8.33                    33.3                 ATA-4
                    3               11.11                   44.4                 ATA-5
                    4 (UDMA/66)     16.67                   66.7                 ATA-5
                    5 (UDMA/100)    25                      100.0                ATA-6?

[10] ATA cycle times are usually represented in nanoseconds, but for the ease of comparison, they are shown here in MHz.

Table 5-3 provides a summary of the IDE implementations and the features each supports.

Table 5-3. Characteristics of IDE implementation

                          EIDE      Fast-ATA   Fast-ATA-2   Ultra-ATA
PIO Transfer Modes        Up to 3   Up to 3    Up to 4      All
DMA (Single-word) Modes   All       All        All          All
DMA (Multi-word) Modes    Up to 1   Up to 1    Up to 2      All
Read/Write Multiple       No        Yes        Yes          Yes
Data Checksumming         No        No         No           Yes

The implementations described here are supported in Linux kernels after 2.2.0, and support for the higher-performance Ultra-ATA can be patched into older kernels. The Linux IDE kernel driver, which was written by Mark Lord, has a particularly low overhead for setting up transactions; it is excellent for handling small data transfers.

5.2.1.1 Improving IDE performance in Linux

There are a few things the Linux kernel doesn't do by default that can increase IDE I/O performance, such as telling the IDE device driver to use 32-bit I/O and ensuring that the disk is performing DMA transfers; as of the 2.2.16 kernel, both are off by default. These settings are best adjusted via hdparm (see Section 5.5.2.1 later in this chapter).

To see whether your disk is using 32-bit IDE transfers, use the -c switch:

 # hdparm -c /dev/hda

 /dev/hda:
  I/O support = 0 (default 16-bit)

To enable 32-bit transfers, use -c 1:

 # hdparm -c 1 /dev/hda

 /dev/hda:
  setting 32-bit I/O support flag to 1
  I/O support = 1 (32-bit)

Similarly, you can check the status of the IDE DMA transfer mode via -d:

 # hdparm -d /dev/hda

 /dev/hda:
  using_dma = 0 (off)

You can enable IDE DMA by specifying -d 1:

 # hdparm -d 1 /dev/hda

 /dev/hda:
  setting using_dma to 1 (on)
  using_dma = 1 (on)

You may also see some performance increase by tweaking the number of sectors transferred per interrupt (set by the -m switch to hdparm). As always, benchmark the changes to make sure that you're actually improving performance. When you've found settings you'd like to keep, run hdparm -k 1 /dev/hdX to save your settings across an IDE reset.
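If you would rather apply these settings from a program of your own (say, something run early at boot), the knobs hdparm flips are exposed as ioctls in <linux/hdreg.h>. The sketch below assumes a 2.2-era kernel's HDIO_SET_* interface and is only a starting point; hdparm itself remains the better-tested tool.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>   /* HDIO_SET_* ioctls, the interface hdparm uses */

/* Roughly equivalent to: hdparm -c1 -d1 -m16 -k1 /dev/hda */
int main(void)
{
    int fd = open("/dev/hda", O_RDONLY | O_NONBLOCK);
    if (fd < 0) { perror("open /dev/hda"); return 1; }

    if (ioctl(fd, HDIO_SET_32BIT, 1L) < 0)         /* hdparm -c 1 */
        perror("HDIO_SET_32BIT");
    if (ioctl(fd, HDIO_SET_DMA, 1L) < 0)           /* hdparm -d 1 */
        perror("HDIO_SET_DMA");
    if (ioctl(fd, HDIO_SET_MULTCOUNT, 16L) < 0)    /* hdparm -m 16 */
        perror("HDIO_SET_MULTCOUNT");
    if (ioctl(fd, HDIO_SET_KEEPSETTINGS, 1L) < 0)  /* hdparm -k 1 */
        perror("HDIO_SET_KEEPSETTINGS");

    close(fd);
    return 0;
}

As with the hdparm switches themselves, benchmark before and after: not every drive and chipset combination benefits from every setting.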

5.2.1.2 Limitations of IDE drives

IDE is sufficient for small applications that are not very performance-sensitive, particularly if they do not issue many concurrent disk I/O operations and the overall disk subsystem load is relatively light.

IDE is an attractive, low-cost option; at the time of this writing (mid-2001), commodity IDE drives are approximately half the price and twice the capacity of equivalent-performance SCSI drives. Unfortunately, IDE has several limitations; chief amongst these are its very limited capacity for expansion per channel and the fact that it is entirely single-threaded on a per channel basis (command overlap is not allowed, even with two disks on the channel). In comparison, SCSI offers flexible device attachment, support for almost any peripheral type, and command overlap. Higher-performance devices also tend to be available for SCSI well in advance of their availability for IDE.

5.2.2 IPI

When the SCSI standard was defined, a second mass-storage standard aimed at high-performance, high-cost installations was defined, known as the Intelligent Peripheral Interface (IPI). It is clear now that IPI has no future: it has been usurped by its little sibling SCSI as the dominant high-performance disk interface. We discuss it here for historical reasons, as well as for those who have a substantial installed base of IPI devices. IPI, in addition to Fibre Channel, falls under the guidance of the T11 Technical Committee (http://www.t11.org).

The IPI standard was designed to facilitate optimization as much as possible, resulting in the highest possible performance; to that end, global optimization was built into the protocol. The fully implemented IPI model consists of four levels:

  • Level 0 defines mechanical and electrical considerations such as cables and connectors.

  • Level 1 defines primitive transfer details such as bus protocols.

  • Level 2 defines disk string protocols, which describe device-specific commands, timing, and volume addressing.

  • Level 3 defines I/O channel protocols, which include logical commands, command-queuing rules, buffering systems, etc.

A fully implemented IPI system consists of a number of IPI-3 host adapters (also called facilities), which connect to the host system. A host adapter is a complete I/O computer that is responsible for the management of a number of disk strings. Each string consists of one to eight disk drives, which are interfaced to the host adapter by an IPI-2 string controller. A disk can be connected to multiple strings, a condition known as multiporting. Up to sixteen string controllers interact with the host adapter over a high-performance IPI-3 bus. Like the host adapter, the string controller is vested with substantial intelligence; it is responsible for optimizing accesses along its string. Both the IPI-3 bus and IPI-2 string can extend up to fifty meters.

All of the intelligence in IPI is innate to the host adapter and the string controller; even primitive operations like bad-block mapping are done by the string controller. The disk drives are completely without intelligence and usually lack even local caches. As a result, an IPI disk must have a variety of conditions satisfied before it can perform a transfer:

  • The requesting string must not be otherwise occupied.

  • The string controller must have a free buffer to store the data.

  • The read/write head must be correctly positioned.

  • The data must be about to fly under the head.

Because the bus is used for essentially any transfer and is a resource that is shared between every device on the string, bus contention in IPI is a serious problem. For this reason, no more than three disks should reside on a single IPI-2 string; if the full eight disks are configured, severe performance problems arise.

In order to give the string controller enough information to optimize the request queue, IPI disks provide their rotational position and the radial position of the read/write heads. Each string controller then sorts a series of requests into an optimized order. Sometimes the string controller may be faced with a situation where it needs to retrieve data from two disks that are about to become ready to transfer. In such a case, the IPI-3 channel may be able to transfer the request to another string if the disk is connected to multiple strings, and if the operating system supports such a thing. [11] Incidentally, most IPI implementations do not fulfill the entire specification. For example, Sun did not fully implement the IPI-3 facility, instead treating each string controller as paired to exactly one host adapter. IPI performs best under massive load, when long queues form for each I/O device: if the queues are short or nonexistent, the intelligence in the IPI chain is wasted .

[11] Solaris does not.

5.2.3 SCSI

The Small Computer System Interface (SCSI, pronounced "scuzzy") [12] specification has been the mid-range workstation disk standard of the last fifteen years, and is clearly the dominant I/O interconnect in the modern workstation market. It is presently controlled by the T10 Technical Committee; see http://www.t10.org.

[12] This is something of a common joke amongst system administrators who have been fighting with a SCSI chain that is not operating properly. ("That scuzzy interface is screwed up again.")

The SCSI standard defines a peripheral bus model, which connects all the physical devices and is completely distinct from any other host bus, except for one intersection point. This intersection point is the host adapter, often called the SCSI controller. I refer to it as the host adapter, however, for reasons that will become obvious shortly.

Every device on a SCSI bus has a unique address that consists of two parts: a target identifier and a logical unit number (LUN). One analogy for this is mailboxes. Most mailboxes have a single identifying number (the target identifier), but some mailboxes have subunits, which are distinguished by the LUN (for example, a common set of mailboxes for an apartment building would have a single identifier but possibly many discrete mailboxes included within). For instance, the address "123 Anywhere Street, Apartment 918" would be equivalent to a target identifier of "123 Anywhere Street" and a LUN of "918." These two numbers show up directly in Solaris device names: /dev/dsk/c0t3d0s2 refers to target 3, LUN 0, on controller 0. A request's target is specified by asserting a signal on a single line; as a consequence, an 8-bit SCSI bus has eight possible target identifiers.

Since it originally targeted small systems, the SCSI protocol was designed with an eye towards providing optimum performance at a low cost. In contrast to designs such as IPI, where much of the intelligence is vested in the controller, SCSI places the intelligence in the peripheral device itself. This greatly simplifies peripheral communication. Generally, this is handled by a microprocessor located in each device, called an embedded controller. Since this scheme means that the peripheral device controls itself rather than having an external controller, it doesn't make much sense to call a host adapter a SCSI controller.

One of the places where peripheral intelligence has greatly simplified operational procedures is defect management. Disk drives are very complex electromechanical devices, and bad blocks are inevitable. With host-intelligent systems such as IPI, the host was required to recognize disk errors and manage the defect list for every physical disk. With SCSI, this monitoring is accomplished at the disk level. In addition, the disk can monitor certain internal details -- for example, when the defect management code is about to run out of free blocks -- and signal an impending failure to the host. One side effect, however, of peripheral-intelligent designs is that the intimate knowledge about a device resides with that device, rather than with a host controller; as a result, optimizations must be completed within the embedded controller, which can be complicated. From the host adapter's point of view, the resulting degree of abstraction allows a fantastically wide array of devices to interact over SCSI, but complicates optimization matters in large configurations.

If two devices wish to speak on the bus at the same time, arbitration is required. SCSI uses a simple scheme in which the highest-numbered target that requests the bus is given preference. For this reason, host adapters are almost always target 7. [13] It has also been common practice to assign slow devices such as tapes and CD-ROM drives high target numbers, so that they are not starved for bandwidth by fast devices. This is not generally a concern in the real world, however.

[13] This is even true (for backwards-compatibility reasons) in Wide SCSI implementations, which have 16 target identifiers.
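A minimal model of that priority rule follows: during arbitration, every contender asserts the data line corresponding to its ID, and the winner is simply the highest-priority ID present. On Wide SCSI, IDs 8 through 15 rank below ID 0, which is why keeping the host adapter at 7 remains safe (see footnote [13]). The bitmask representation here is an illustration of the rule only, not of the electrical handshake.

#include <stdio.h>

/* SCSI arbitration priority: ID 7 is highest, down to 0; on Wide SCSI,
 * IDs 15..8 rank below 0.  Given a bitmask of IDs currently asserting
 * the bus, return the winner. */
static int arbitration_winner(unsigned contenders)
{
    static const int priority_order[16] =
        { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 };

    for (int i = 0; i < 16; i++)
        if (contenders & (1u << priority_order[i]))
            return priority_order[i];
    return -1;                          /* nobody is arbitrating */
}

int main(void)
{
    /* Host adapter (ID 7), a disk at ID 3, and a tape at ID 5 all want
     * the bus at once: the host adapter wins. */
    unsigned mask = (1u << 7) | (1u << 3) | (1u << 5);
    printf("winner: target %d\n", arbitration_winner(mask));
    return 0;
}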

5.2.3.1 Multi-initiator SCSI

Although it is the standard, it is not necessary to have the host adapter configured at target 7. This leads to an interesting case called multi-initiator SCSI, where a SCSI bus is shared between several systems. This can be used to share a pool of disks between two systems, which is useful in high-availability implementations. Unfortunately, it has some drawbacks:

  • There is no provision within the SCSI standard for handling pending operations when a bus reset occurs (e.g., if one initiator decides to issue a bus reset, a system panics, or a transient electrical fault occurs). This means that the state of the disks becomes undefined and data could be corrupted.

  • Although the standard defines a way for host adapters to "own" devices (via the reserve and release SCSI commands), there is no standardized way to break a reservation.

It is also possible to use multi-initiator SCSI to enable fast system-to-system communication over short distances, which is much safer than the high-availability disk sharing that multi-initiator schemes are usually used for. This functionality is available in Linux as a kernel module.

5.2.3.2 Bus transactions

Operations on a SCSI bus are quite abstract. They are accomplished by means of several operations, all of which consume bandwidth. When an initiator [14] wants to issue a request, it must arbitrate for the bus. Once control is granted, the initiator selects the target and then specifies the command. The target now has to decide whether it can service the request immediately. If it can (for example, the request is for a read that can be satisfied from the disk's local cache), the data is transmitted back with a message-in or message-out. If the request can't be immediately serviced (for example, the request is for a read that must actually be read from the platters), the target disconnects. At some later point, the target arbitrates for the bus, reconnects to the old initiator, and transmits the data (if appropriate) via message-in or message-out. The target then transmits the completion with a status message, and then disconnects. This results in six to eight state changes on the bus, which is busy except for the time between the disconnect and reconnect. Generally, the nondata parts of the transaction take about 140 microseconds; by comparison, transferring 2 KB of data takes about 100 microseconds. The overhead is fairly high because only the data portion is transmitted at the full burst speed: commands and status are always transferred in the slower asynchronous mode.

[14] This is generally the host adapter or a device wishing to communicate with the host adapter, but this is not always true; some systems (mostly those intended for real-time use) allow intradevice operations.

5.2.3.3 Synchronous versus asynchronous transfers

The SCSI standard defines two means of transferring data: synchronous and asynchronous. This refers to the way that the host adapter and its targets coordinate data transfer. In asynchronous transfers, data can be transmitted at an arbitrary time; the receiver and the sender need not be clock-synchronized. This kind of transfer necessitates that each byte be acknowledged, which imposes a practical limit on transmission speed. In synchronous mode, transfers may only start at specific clock phases, but this alleviates the acknowledgement problem that asynchronous transfers suffer from. The host adapter tries to negotiate the best way to communicate with devices when the targets are set up. If for some reason a synchronous speed cannot be negotiated with a target, the host adapter falls back to asynchronous transfers. You can find the negotiated synchronous transfer speed in Solaris via prtconf -v and by looking for the SCSI device driver. The next example is from a Fast SCSI-2 drive connected to an Ultra 1 workstation:

 % prtconf -v
 ...
     esp, instance #0
         Driver properties:
             name <target0-sync-speed> length <4>
                 value <0x00002710>.
 ...

In this case, the device configured as target 0 on this controller has a negotiated synchronous transfer speed of 0x00002710 KB/second. This number is reported in hexadecimal; converting it to decimal (10,000) and dividing by a thousand gives us the negotiated speed, which is 10 MB/second.

This is often worth checking. Cables that aren't of sufficiently high quality, or improper termination, can often cause devices to fall back to a slower speed. If you aren't getting the performance out of your SCSI chain that you think you should be, take a look here first.

5.2.3.4 Termination

SCSI operates electrically via a straightforward bus mechanism. Traffic from one device is sent out on the bus; the receiving device is responsible for pulling traffic destined for it off the bus. However, data that goes past the last device on the SCSI chain can be "echoed" back down the bus. This reflection phenomenon causes severe problems that must be addressed by termination. In a nutshell, termination is the addition of a set of resistors to a device in order to prevent signal reflection by "absorbing" data that would otherwise echo. [15] The rules for termination are, in theory, quite simple: terminate the first and last devices on a SCSI bus. Many modern disks will automatically terminate themselves as necessary. Termination problems are generally characterized by extremely slow data rates or devices that don't appear reliably (if at all). Note that, because of their different electrical designs, differential and single-ended terminators are not compatible (to learn more about the difference, see Section 5.2.3.6 later in this chapter).

[15] Termination is a fairly complex topic, and a full discussion is beyond the scope of this text.

5.2.3.5 Command queuing

When the original SCSI specification was drafted, performance was a secondary concern: the market at which the standard was being aimed was more interested in low cost than high performance. By the time that the SCSI-2 revision was issued, performance had become a far more important issue. In order to handle the problem of request optimization (that is, in which order to service requests), something needed to be done.

However, since each SCSI device has a high degree of autonomy, the host adapter lacks sufficient information to make real optimization decisions. Performing global optimization, as was customary (and implemented) in other standards such as IPI, simply isn't possible. The SCSI-1 standard made this worse by permitting only one command to be pending to any given target; if only one request can be outstanding, no optimization can be performed!

This problem was addressed in SCSI-2 by adding command queuing, in which the target is able to accept multiple requests. It then processes them in the most advantageous order. Each request is assigned a tag (hence the synonymous term tag queuing), which serves as an identifier so that each entity can associate a given communication with a request. This process improved performance by allowing the overlap of bus transactions with physical activity, which saves a substantial amount of time, especially in applications that involve a lot of small disk writes. Unfortunately, command queuing is most effective when the disk is very heavily loaded, and generally you will get more out of spreading out the disk usage to reduce utilization than you will from the command queuing optimization. It has a limited impact on throughput for single-threaded tasks common to workstations.
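Here is a rough host-side model of the bookkeeping that tags make possible: several requests are outstanding at once, and completions are matched back to requests by tag, in whatever order the drive finds cheapest. This is an illustration of the idea only; the queue depth, structure names, and printout are invented for the example, and the actual message formats are defined by the SCSI-2 standard.

#include <stdio.h>
#include <stdbool.h>

#define MAX_TAGS 32                     /* illustrative queue depth */

/* Each outstanding request is remembered under its tag so that a later
 * reconnect and status can be matched back to the request it belongs to. */
struct pending {
    bool in_use;
    long lba;                           /* which block this request was for */
};

static struct pending queue[MAX_TAGS];

static int issue(long lba)              /* returns the tag, or -1 if full */
{
    for (int tag = 0; tag < MAX_TAGS; tag++) {
        if (!queue[tag].in_use) {
            queue[tag].in_use = true;
            queue[tag].lba = lba;
            printf("issued LBA %ld with tag %d\n", lba, tag);
            return tag;
        }
    }
    return -1;                          /* target's queue is full: must wait */
}

static void complete(int tag)           /* called when status arrives for a tag */
{
    printf("tag %d complete (LBA %ld)\n", tag, queue[tag].lba);
    queue[tag].in_use = false;
}

int main(void)
{
    int a = issue(1000), b = issue(52), c = issue(1001);
    /* The drive may finish these in whatever order is cheapest for it,
     * e.g. servicing the two adjacent blocks back to back: */
    complete(a); complete(c); complete(b);
    return 0;
}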

5.2.3.6 Differential signaling

Let's say that you and a friend are both standing on a strip of beach somewhere, and your friend wants to send you a message. He has a lantern, and he holds it near his feet if he wants to send a zero and near his head if he wants to send a one. This works well as long as you can clearly distinguish which is the "high" position and which is the "low" one. If he gets in a boat and sails a mile offshore, you will have a much harder time determining which lantern position is high and which is low. However, if he takes two lanterns with him, and holds them at the same height for a zero, and raises one and lowers the other for a one, you will have a much easier time. The first mode of signaling, with only one lantern, is called single-ended; the second is called differential.

Low-voltage differential (LVD) signaling is a new electrical standard, introduced with the SCSI-3 specification, that accomplishes essentially the same thing as differential signaling but at a lower cost.

Differential signaling arose to alleviate single-ended SCSI's limited transfer range: single-ended, 20 MB/sec implementations are limited to just a meter and a half. Differential implementations can send data as far as 25 meters. Since these distances refer to the entire signal path -- including the cables between units, most of the traces on the embedded controller, and about an extra foot for every connector [16] -- a given length of cable will not always go as far as you might think.

[16] This is due to impedance issues across the connectors.

Differential signaling does not provide any performance improvement in and of itself. If the only difference between two SCSI buses is that one uses single-ended and the other uses differential signaling, they will perform exactly the same.

5.2.3.7 Bus utilization

Traditional performance recommendations suggest that no disk string should ever have a utilization of more than about 40%. This is based on old mainframe disk structures, which generally lacked caching at the disk; therefore, for a transaction to complete, the read/write head must have access to the controller in order to perform any transfer at all. If the bus is busy when the data flies under the head, the system must wait for the next window to perform the I/O -- at least 8.3 milliseconds later on a 7200 rpm disk. This is known as a rotational position sense miss (RPS miss). Because SCSI targets always have local caches, however, much higher bus utilization can be obtained before performance starts to suffer. Remember that SCSI is a high overhead architecture; each request can take six to eight state changes. For small I/O-size operations, this overhead starts to dominate performance (approaching 60% of total time). The crucial issue then becomes how the high SCSI overhead shapes performance.

For sequential operation, seek and rotation times are not significant in determining performance -- bus utilization problems are the core issue. For 2 KB I/O operations on a 20 MB/second chain, performance plateaus at about 8 MB/second, no matter how many disks are configured. The bus is capable of far greater performance, and using 64 KB I/Os results in a 95% reduction in the number of commands issued and maximum performance of about 17 MB/second (a back-of-the-envelope model of where this plateau comes from follows the guidelines below). Here are some general guidelines for configuring SCSI disk subsystems for sequential access:

  • Configure 20 MB/second chains so that no more than 4-8 drives are active at any one time. Large (~64 KB) I/O sizes will saturate the bus with 4 drives; small (~2 KB) I/O sizes will take 8 drives. (Remember that you will see much higher throughput with 64 KB I/Os than with 2 KB I/Os: the amount of data the bus can transfer is the sum of both actual data and command overhead, and with 2 KB I/Os the majority of data bits on the bus are commands!)

  • A disk array consisting of a number of disk drives is equivalent to the same number of disk drives directly connected to the host adapter.

  • Expect sequential throughput to be substantially less than the rated bus speed, particularly for small I/O sizes.
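The sequential plateau can be reproduced with a back-of-the-envelope calculation from the numbers quoted earlier: roughly 140 microseconds of command and status traffic per request, with the data portion moving at the 20 MB/second burst rate. The figures below are illustrative rather than measured; the real-world maximum of about 17 MB/second for 64 KB I/Os falls a little short of this ideal ceiling once arbitration and disconnect gaps are accounted for.

#include <stdio.h>

/* Rough model of SCSI bus throughput for sequential I/O, using the figures
 * quoted in the text: ~140 us of command/status overhead per request, and
 * data moving at the 20 MB/s burst rate.  Illustrative, not measured. */
int main(void)
{
    const double burst_mb_s  = 20.0;      /* bus burst rate */
    const double overhead_us = 140.0;     /* non-data portion per request */
    const int    io_sizes_kb[] = { 2, 64 };

    for (int i = 0; i < 2; i++) {
        double kb      = io_sizes_kb[i];
        double data_us = kb / (burst_mb_s * 1024.0) * 1e6;  /* transfer time */
        double total   = data_us + overhead_us;
        printf("%3.0f KB I/O: %.0f us data + %.0f us overhead -> "
               "~%.1f MB/s bus ceiling (%.0f%% overhead)\n",
               kb, data_us, total - data_us,
               kb / 1024.0 / (total / 1e6),
               overhead_us / total * 100.0);
    }
    return 0;
}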

In random-access operation, a completely different phenomenon is observed. If accesses are truly random, the disks must seek before every I/O operation. Under these circumstances, the throughput of the disks drops dramatically (to something on the order of 250 KB/second). As would be expected, then, you can configure as many disks as you want: throughput scales linearly. Of course, at 250 KB/second per disk and fifteen disk drives, you are only sustaining about 3.7 MB/second, so it would be surprising if you saw any bus throughput limitations.

  • If data access is truly random, bus performance is not a limiting factor at all. You can safely configure ~80 devices on a single 20 MB/second bus.

  • For real-world random access conditions, configure 1 or 2 disk arrays (16-64 devices) per 20 MB/second bus.

5.2.3.8 Mixing different speed SCSI devices

One common misunderstanding about SCSI is how faster devices are affected by slower devices on the same bus. This question usually comes up when someone installs a new, fast SCSI disk on the same chain as an older, slower disk. The SCSI standard is carefully designed so that the management of an individual device is isolated from its environment as much as possible. The host adapter negotiates an appropriate burst transfer speed and mode with each device on the bus individually, and the bus is driven to the limits of each device. However, some complexity does arise; for example, the utilization of the bus varies with the speed of the transfer, particularly if asynchronous transfers are involved. Slower devices tend to hog the bus, since they cannot select fast enough burst transfer speeds to get off the bus expediently. This is almost never an issue in a workstation environment, but when many devices are present on a system, it is often a good idea to segregate low-speed devices onto a separate bus to maintain throughput elsewhere.

5.2.3.9 SCSI implementations

In the spring of 1979, Shugart Associates began defining the Shugart Associates System Interface (SASI), which was subsequently released to the public domain in order to encourage disk controller companies to use the protocol. In 1980, SASI was placed before the ANSI Committee X3T9.3, in order to have it accepted in place of IPI (see Section 5.2.2 earlier in this chapter).

In the summer of 1981, NCR began exploring the possibility of using SASI instead of a proprietary interface that they had recently defined. This idea fell through due to fears that technical improvements would induce market confusion. In the early fall of 1981, however, Optimem (a Shugart subsidiary) requested the same technical improvements that NCR had. In a move to accept the inevitable, Shugart reapproached NCR, and a series of meetings were held to more fully document SASI and add several features drawn from NCR's proprietary interface. In December of 1981, NCR and Shugart jointly requested that X3T9.3 grant time at their next meeting in February of 1982 to consider the joint proposal. X3T9.3 responded by breaking the meeting into two parallel sessions, one for IPI and one for SASI. At the conclusion of that meeting, the sessions rejoined and decided that SASI should be assigned to an existing task group, X3T9.2, which had no current project. In April of 1982, X3T9.2 met and drafted a formal project proposal for the Small Computer System Interface (SCSI), which was to be based on SASI. Throughout 1982 and 1984, X3T9.2 met regularly, fully documenting SCSI with contributions from NCR (which proposed command sets for tapes, processors, and printers) and Optimem (which proposed an optical write-once, read-many, or WORM, command set). In April of 1983, NCR announced the NCR-5385, the first SCSI protocol chip; it was followed in April 1984 by the NCR-5380, which added significant hardware support for inexpensive SCSI implementations. [17] Also in April of 1984, X3T9.2 voted to forward SCSI to its parent committee in order to begin the lengthy ANSI standards approval process. Approval was gained in June of 1986, and SCSI was formally ANSI X3.131-1986.

[17] The NCR-5380 was the chip initially used by Apple Computer for the SCSI bus on the Macintosh Plus.

In July of 1985, a special working group began a series of meetings to define enhancements to the SCSI command set for disks, later known as the Common Command Set (CCS). In June of 1986, X3T9.2 began the SCSI-2 project, which endeavoured to add support for the CCS and other enhancements into the SCSI protocol. 1986 to 1989 was a period of intense work, and the SCSI standard grew from 200 pages to nearly 600 pages. SCSI-2 was forwarded for ANSI approval in February of 1989, and approved in September of 1990.

Further development work in SCSI, specifically SCSI-3, involved splitting the effort into multiple task groups, each focusing on specific parts of the protocol.

Although SCSI-2 is completely backwards-compatible with SCSI-1, SCSI-3 is not completely compatible with SCSI-2 due to some aggressive hardware-oriented extensions. However, software is not affected.

The SCSI-3 standard has been adopted piecemeal; many vendors have chosen to implement some of the more compatible parts already. For an overview of the SCSI implementations, please consult Table 5-4.

Table 5-4. SCSI implementation overview

                   SCSI-1                     SCSI-2                      SCSI-3
Media              Copper                     Copper                      Copper, fibre
Topologies         Tree                       Tree, multi-initiator tree  Tree, multi-initiator tree, arbitrated loop, fabric
Signalling         Single-ended               Single-ended, differential  Single-ended, differential, LV differential
Command queuing    None                       Optional                    Optional
Transfer modes     Synchronous, asynchronous  Synchronous, asynchronous   Synchronous, asynchronous
Transfer clocking  5 MHz                      5 MHz, 10 MHz               5 MHz, 10 MHz, 20 MHz
Transfer width     8-bit                      8-bit, 16-bit               8-bit, 16-bit

I find myself answering a lot of questions about exactly how fast a given standard is. A lot of people I know have ended up building tables that look remarkably like Table 5-5 and Table 5-6. I hope you find these useful.

Table 5-5. Deciphering SCSI variants

                        Standard   Clock (MHz)   Width (bits)   Max. rate (MB/sec)   Max. devices   Max. length [18]
SCSI-1                  SCSI-1     5             8              5                    8              6.0
Fast                    SCSI-2     10            8              10                   8              3.0
Wide                    SCSI-2     5             16             10                   16             3.0
Fast/Wide               SCSI-2     10            16             20                   16             3.0
Ultra (fast-20)         SCSI-3     20            8              20                   8              SE: 1.5, Diff: 12.5
Ultra Wide (fast-40)    SCSI-3     20            16             40                   16             SE: 1.5, Diff: 12.5
Ultra2                  SCSI-3     40            8              40                   16             SE: 1.5, Diff: 12.5
Ultra2 Wide (fast-80)   SCSI-3     40            16             80                   16             SE: 1.5, Diff: 12.5
Ultra160                SCSI-3     80            16             160                  16             Diff: 12.5

[18] In meters, for synchronous operation. Unless specified, for single-ended (SE) configurations only.

Table 5-6. Deciphering SCSI connectors

Connector      Pins   Form factor                                       Used for
SCSI-1         50     D-shell connector, 3 rows of pins                 SCSI-1
DB-25          25     D-shell connector, 2 rows of pins                 SCSI-1, narrow SCSI-2 (primarily used on Macintosh)
Centronix 50   50     Centronix connector                               SCSI-1, narrow SCSI-2
Micro 50       50     Micro-type connector, 2 rows of pins              Narrow SCSI-2, narrow SCSI-3
Micro 68       68     Micro-type connector, 2 rows of pins              Wide SCSI-2, wide SCSI-3
SCA            80     Micro-type connector, 2 rows of pins              SCSI-2, SCSI-3 (internal attachment only; "universal" SCSI connector with integrated power and ID select)
VHDC           68     Very High Density connector                       Differential SCSI-3
HD-60          60     Micro-type connector with a bar instead of pins   Narrow SCSI-2 (old IBM SCSI adapters only)
HD-68          68     Micro-type connector with a bar instead of pins   Wide SCSI-2 (old IBM SCSI adapters only)

5.2.4 Fibre Channel

The Fibre Channel (FC) specification, defined by ANSI X3T9.3, uses SCSI-like commands, typically over fiberoptic cable rather than copper wires. Fibre Channel standards are the purview of the T11 Technical Committee (http://www.t11.org).

Fibre Channel is a full-duplex medium, meaning that it can support bidirectional data flow; it requires a fiber strand for each direction. Although it is defined by its own standard, it has been brought under the SCSI-3 umbrella. Fibre Channel has three very different topologies:

Point-to-point

One device connects to exactly one other device.

Arbitrated loop

Often called FC-AL, the devices and one or more hosts are connected in a ring using many point-to-point links.

Fabric

Switches and hubs are used to create a complex network in which there may be multiple paths from a given host to a given device. This is becoming known as storage-area networking, and is probably the direction enterprise-class storage will move in the future. Unfortunately, a discussion of the performance of storage area networks (SANs) is far too complicated for this text.

Fibre Channel devices are assigned a unique address, called the world wide name (WWN). Since devices can be connected into arbitrary fabrics, the common practice is to assign each device a completely unique WWN, much like Ethernet addresses are assigned. The Fibre Channel standard defines three classes of signaling, which are not interchangeable. They are usually described in terms of their data speed: 25 MB/second (FC-25), 50 MB/second (FC-50), or 100 MB/second (FC-100). The storage industry as a whole initially deferred acceptance of Fibre Channel until the arrival of FC-100 parts, but it is now in widespread use. In general, Fibre Channel disks should work just like SCSI disks. Be sure to follow any configuration rules provided by the vendor carefully.

Fibre Channel host adapters typically have two optical link modules, which enable the host adapter to connect to two independent Fibre Channel buses; the connection is managed by the serial-optical coupler, or SOC (typically called a GBIC). A typical SOC can handle up to 255 pending requests, which can occasionally become a limitation for single-connection interfaces and is usually a bottleneck for biported interfaces. In order to work around this restriction, it is vital that FC-interfaced devices be driven at large I/O sizes (8-64 KB). In light of this fact, the second port is probably most useful for attaching archival or other low-use devices.

One of Fibre Channel's most useful capabilities is the removal of distance restrictions imposed by the limitations of copper wires; the FC standard allows cable runs of up to ten kilometers. This allows the geographic dispersal of storage, which has remarkable implications for disaster recovery.

5.2.5 IEEE 1394 (FireWire)

Just like Fibre Channel, the IEEE 1394 specification has also been brought into the SCSI-3 fold. IEEE 1394, also known as FireWire , was originally developed by Apple Computer. It is rapidly entering widespread use in the midrange market for desktop systems.

By far the most important thing to consider in evaluating IEEE 1394 peripherals is that the bus speed is quoted in megabits per second -- not megabytes per second -- just like network interconnects. Currently, it is available in 100, 200, and 400 Mbits per second variants; even the fastest of these works out to only 50 MB/second. IEEE 1394 supports the interconnection of up to 63 devices to a single chain, and is hot-pluggable.

IEEE 1394 supports isochronous data transfers, which differ from the more typical asynchronous data transfer type by guaranteeing the timely delivery of data. When an isochronous data transfer device such as a digital video camera is connected to the IEEE 1394 bus, that device grabs the specific, allocated portion of the bus that it requires. The bus also reserves 20% of the total available bandwidth for serial command overhead. Once the entire capacity of the bus has been allocated, the bus will not recognize additional devices -- regardless of whether those devices are actively transferring data. In addition, a condition known as multiple device overhead starts to exist when more than a handful of devices are introduced onto the bus; it then increases in impact as the number of attached devices increases. The self-configuration and hot-pluggability aspects of the technology increase latency in communicating with each device as the number of devices increases.

In general, IEEE 1394 is an excellent means of attaching devices to workstations, particularly if they are time sensitive (e.g., digital video production), but it is not ideally suited to even a medium-sized storage environment, due to the increasing drive overhead that accompanies multiple devices on the bus.

5.2.6 Universal Serial Bus (USB)

Universal Serial Bus, or USB, was developed to be a serial and parallel port replacement. When viewed in this context, it represents a large performance increase (from serial-port speeds on the order of 250 Kbits/second to 12 Mbits/second), as well as an ease-of-use improvement: it is hot-pluggable, has robust power management, and is able to support up to 127 devices through a hub-based architecture.

Because of its relatively high peak data transfer rate (1.5 MB/second), USB became a reasonable connection method for relatively low-bandwidth devices such as printers, floppy drives, Zip drives, and small scanners. The inherently simple design that allows USB devices to be so affordable also forces them to be very host-centric; the microprocessor of the host system must initiate every transaction. As a result, USB does not scale well, and it is probably not a good idea to use it in anything beyond casual desktop peripheral connectivity.


