Direct Attach Storage

The term Direct Attach Storage, or DAS, did not exist before there was networked storage. Until then, it was simply storage, and all of it was attached directly to a mainframe or server. Only after Storage Area Networks became popular did the term DAS become common.

DAS refers to any storage that is attached locally to a computer. In the case of large arrays of devices, the computer may be connected to a controller. The controller then provides a single interface for many devices. The hard drive in a PC is an example of DAS. A multiterabyte disk array that is connected to a server via a SCSI cable is also DAS. Size does not determine whether storage is DAS; only its architecture does.

Most open-system DAS devices use the SCSI or ATA standards to communicate with the storage devices or with a controller. ATA is more commonly used for slower, less reliable storage such as desktop storage, whereas SCSI is used for high-performance, high-reliability systems. That said, SCSI has shown up in desktop computers and ATA in enterprise-class arrays. Mainframes use their own protocols and specifications, most of which are proprietary.

SCSI

The term SCSI (pronounced "scuzzy") stands for Small Computer System Interface. It defines a specification for both hardware and software protocols, used to transfer data between peripheral devices and the peripheral bus in a computer. Although SCSI is not used exclusively for data storage, the most common usage of the technology is for mass storage devices. More expensive and complex than many other methods of storing data, SCSI tends to be deployed in situations where high performance is necessary.

The SCSI standards (Table 2-3) define both a set of hardware specifications and a software protocol. The hardware specifications include how many wires are used to move data and control information, addressing, basic topology, voltage, clock speed, and error correction methods. The software protocol defines how requests for data are made, how devices respond, and how information about devices can be retrieved.

Table 2-3. Common SCSI Types
SCSI Type	Bits Transmitted	Maximum Data Transfer Speed (Mbytes/sec)	Number of Addresses	Maximum Cable Length (Meters)
Ultra SCSI	8	20	8	SE : 1.5 HVD : 25
Ultra Wide SCSI	16	40	16	HVD : 25
Ultra2 SCSI	8	40	8	LVD : 12 HVD : 25
Wide Ultra2 SCSI	16	80	16	LVD : 12 HVD : 25
Ultra3 SCSI/Ultra160 SCSI	16	160	16	LVD : 12
Ultra320 SCSI	16	320	16	LVD : 12
FCP (SCSI over Fibre Channel)	1	100	> 15 Million	NA
SAS	1	300	128	NA
iSCSI	1	Depends on network speed	IP address limits	NA

Parallel SCSI is the predominant form of SCSI today. It is called this because it transmits all of its data and control bits at the same time. The SCSI software protocol has also been adapted for use over Fibre Channel interfaces and designated as FCP. Other forms of the SCSI protocol are also available, although they are new and not yet widely deployed. iSCSI is a networked version of SCSI that transmits data over an IP network. Serial Attached SCSI (SAS) is used for Direct Attach Storage but sends information one bit at a time for faster, more reliable data transfers. The type of SCSI implementation is usually denoted by the width of the data path (normal or Wide) and error correction method (Single Ended or SE, Low Voltage Differential or LVD, or High Voltage Differential or HVD). There are also several types of serial SCSI.

Targets and Initiators

The SCSI command protocol is based on a client-server architecture. With SCSI, the device that will request data is the initiator, and the device that will return data is the target. Most often, the initiator is a host bus adapter (HBA). An HBA is a peripheral board or embedded processor, used to connect a host's peripheral bus to the SCSI bus. The target is the storage device or device controller, such as a RAID controller or tape drive.

In all cases, the initiator is the master, and the target is the slave. The initiator begins all conversations and requests all data. The target provides whatever the initiator requests unless there is an error. It is possible to be both a target and an initiator. This is unlikely in the DAS situation, but there are Fibre Channel and management devices that use this capability.

Targets and Devices

Even knowledgeable people in the storage industry use the terms device and target as though they were synonyms. Although it is often the case that a storage device is the target, it is not always so. This causes confusion, because initiators are devices as well. The terms should not be used interchangeably.

SCSI Addressing

Parallel SCSI allows for either 8 or 16 addresses. Each device has an address with an ID from 0 to 7 or 0 to 15, depending on the SCSI implementation. The number of addresses is related to the number of control lines available, which is the same as the size of the data path. This is not a very large address space.

To expand this limited address space, an additional addressing layer was added to SCSI. Each SCSI address can also be broken down into sub-addresses, called Logical Unit Numbers (LUNs). A LUN represents a logical, rather than a physical, address. LUNs can be assigned to portions of a physical device, and multiple LUNs can be assigned to the same device. In Parallel SCSI, there can be 16 LUNs for each SCSI address, for a total of 256 device addresses. Other implementations of SCSI use LUNs to provide a very large address space.
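The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the (target ID, LUN) tuple representation are assumptions made for this example and are not part of any SCSI standard or library.

```python
from itertools import product

def parallel_scsi_addresses(target_ids=16, luns_per_target=16):
    """Enumerate every (target ID, LUN) pair on a hypothetical Parallel SCSI bus."""
    return list(product(range(target_ids), range(luns_per_target)))

addresses = parallel_scsi_addresses()
print(len(addresses))   # 16 IDs x 16 LUNs per ID
print(addresses[:3])    # the first few (ID, LUN) pairs
```

Calling parallel_scsi_addresses(target_ids=8) models the narrower bus, which has only 8 IDs.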

The different Serial SCSI implementations maintain the same overall SCSI addressing scheme by mapping native addresses, such as Fibre Channel or IP addresses, to SCSI addresses and LUNs. Extensions to the SCSI addressing model allow for hierarchical addressing and a much larger address space. This is usually used to accommodate networked SCSI implementations.

Hierarchical Addressing

Even with the use of LUNs, the SCSI address space is very small. To compensate, the range of possible addresses was increased by use of hierarchical addressing. Hierarchical addressing adds sub-addresses to existing addresses and LUNs. Getting the Parallel SCSI hardware to cooperate with extended addressing requires LUN extenders. Many users of LUN extenders, those needing a large address space, have moved on to networked storage such as Fibre Channel or Serial Attached SCSI.

Parallel SCSI

Until there were serial implementations of SCSI, Parallel SCSI was simply SCSI. The "parallel" part was added to differentiate it from the new serial forms of SCSI, especially Fibre Channel SCSI (FCP). The name refers to the fact that data is transferred in parallel, on all wires in the cable at once. For Ultra and Ultra2 types of SCSI, 8 bits are sent at once, with 16 bits sent for Wide versions. Starting with Ultra3 SCSI, all implementations are Wide.

Note

Parallel SCSI is a hardware standard. It is separate from the SCSI software protocol, which operates over several different hardware architectures.

Parallel SCSI varies by three main characteristics:

Data transfer rate
Data path width
Hardware error correction method

The data transfer rate indicates how much data can be transferred in a given time period and is given in megabytes per second. This is a theoretical maximum rate based on the standardized signal speeds, not a measure of real throughput. What is most confusing about SCSI nomenclature is that most implementations do not actually state the data transfer rate numerically. Instead, Parallel SCSI uses the terms Ultra SCSI, Ultra2 SCSI, and Ultra3 SCSI. Only with the advent of Ultra160 and Ultra320 did the name carry a real data transfer rate. Many elements can affect the real throughput of a SCSI data transfer. These include software overhead in the SCSI host bus adapter and storage controller, the speed of the media itself (especially tape drives), and the condition and quality of the connector cables used.
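As a rough sketch of where the theoretical figures in Table 2-3 come from, the peak rate is the number of bytes moved per transfer multiplied by the number of transfers per second. The mega-transfer rates in the list below are not stated in the text; they are assumptions chosen to be consistent with the table, and the function name is hypothetical.

```python
def max_transfer_rate_mb(bus_width_bits, megatransfers_per_sec):
    """Theoretical peak in Mbytes/sec: bytes per transfer x transfers per second."""
    return (bus_width_bits // 8) * megatransfers_per_sec

# (name, bus width in bits, assumed mega-transfers per second)
SCSI_TYPES = [
    ("Ultra SCSI", 8, 20),
    ("Ultra Wide SCSI", 16, 20),
    ("Ultra2 SCSI", 8, 40),
    ("Wide Ultra2 SCSI", 16, 40),
    ("Ultra160 SCSI", 16, 80),
    ("Ultra320 SCSI", 16, 160),
]

for name, width, rate in SCSI_TYPES:
    print(f"{name}: {max_transfer_rate_mb(width, rate)} Mbytes/sec")
```

These are ceilings only. The overheads listed above mean that measured throughput will be lower.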

The second major characteristic is the data path width. As previously discussed, Parallel SCSI has two different options: an 8-bit path and a Wide (16-bit) path.

The last characteristic is the error correction method. Originally, SCSI had no hardware error correction. All bits were sent down a single set of wires. This type of SCSI is known as Single Ended, or SE for short. SE hardware does not detect and resend individual bits when they are corrupt. The upper-level protocols have to detect corrupt data. The target then has to resend all the data in the requested block. One corrupted bit could cause an entire block, which might be megabytes in length, to be resent. This reduces the actual throughput of the device.

One of the major causes of lost or corrupted data is noise. The effects of noise become more pronounced as cable lengths increase and signal rates go up. After a certain length and speed, bit errors due to noise will occur frequently. Because of this, SE SCSI requires very short cables. The cables, however, are cheaper, as is the rest of the hardware. Single Ended SCSI can be used in places where the cable lengths are very short and noise well managed, such as inside a server.

It was quickly realized that the short cable lengths were a real hindrance to external system implementations of SCSI. Although SE was fine for attaching a few hard drives inside a file server, it was often impractical for longer external connections to storage devices such as disk arrays. Moreover, as disk arrays and tape libraries became larger, internal cable lengths became a serious issue. The cable length limitations affect the entire data path, including the cabling inside the arrays. The answer was Differential SCSI. With Differential SCSI, two wires are used for each signal. The voltage on one wire carries the data, and the voltage on the other is the exact opposite. One is represented as -SIGNAL and the other as +SIGNAL. If you add the two voltages together, you should get 0. If not, something is wrong, and just the most recent byte or two of data needs to be resent. This allows SCSI to operate in much noisier environments without significant loss of throughput. The upshot is that much longer cables can be used. The downside of Differential SCSI is that the hardware and cables are more expensive.
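The sum-to-zero check described above can be modeled in a few lines. This is a toy illustration of the idea as the text states it, not a description of real transceiver hardware. The voltage values, the tolerance, and the function names are all assumptions made for the example.

```python
def differential_ok(plus_volts, minus_volts, tolerance=0.1):
    """Return True when the two wires still carry equal and opposite voltages."""
    return abs(plus_volts + minus_volts) <= tolerance

def find_corrupt_samples(samples, tolerance=0.1):
    """Return the indexes of (plus, minus) samples whose sum is not close to zero."""
    return [i for i, (plus, minus) in enumerate(samples)
            if not differential_ok(plus, minus, tolerance)]

# Hypothetical voltage pairs; the third pair has been disturbed by noise.
samples = [(2.5, -2.5), (-2.5, 2.5), (2.5, -1.2), (-2.5, 2.45)]
print(find_corrupt_samples(samples))  # [2]
```

Only the flagged sample needs attention, which mirrors the point that a small amount of recent data is resent rather than an entire block.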

To make matters more confusing, there are also two versions of Differential SCSI: High Voltage Differential (HVD) and Low Voltage Differential (LVD). The major difference between the two is obvious: the voltage. LVD uses lower voltage differences to make its comparisons and detect lost bits. With the introduction of the Ultra2 standard, SE was supplanted by LVD. LVD is less expensive than HVD. Its maximum cable lengths are somewhat shorter than those of HVD (though much longer than those of SE), and it can operate at data transfer rates that SE could never achieve. LVD worked well enough that it is the only method used for Ultra3 and Ultra320.

There has been considerable debate as to which is better: Serial implementations of SCSI (especially Fibre Channel) or the Parallel ones. It is a silly argument, because each is better in some ways and poorer in others. Parallel SCSI has the advantage of being very cheap and very fast. It is used extensively for connections inside disk arrays and tape libraries, because it is tried-and-true technology. The best place to use Parallel SCSI is in instances where distances are short, and speed and reliability are important.

Serial Attached SCSI (SAS)

Serial Attached SCSI, or SAS, is relatively new, but the premise is simple. Use the same SCSI protocol, but rethink the hardware layer. Instead of sending 8 or 16 bits at a time, send only one, like Ethernet. Connectors and cables are kept small, and reliability is kept high. This makes for a very inexpensive, yet very fast way of connecting storage devices within a computer or array.

SAS has been envisioned as an in-the-box technology. A typical way to make use of SAS is inside a disk array that has external Fibre Channel or Ethernet connections. SAS is also likely to become popular as a replacement for internal SCSI drives.

SAS and SATA

SAS hardware, interestingly enough, also supports Serial ATA, an entirely different serial storage protocol. Although this is very good from the system architect's point of view, it is unusual. Even in situations where low-level hardware components are the same, as is the case with Fibre Channel and Gigabit Ethernet, it is rare that two different storage protocols will be able to share the same wire.

This is fine in theory, but it remains to be seen whether all manufacturers will support this capability. Even more important, will system administrators and architects want it, and will this mix-and-match capability actually work?

The SCSI Protocol

No matter which hardware architecture is used, be it SAS, a flavor of Parallel SCSI, or Fibre Channel SCSI, all architectures use some form of the SCSI protocol. At a software level, they all are very similar, with differences due mostly to addressing. From the perspective of the applications and operating systems that interface with SCSI devices, they are all the same. The same software architecture and commands are used by all types of SCSI devices. This helps to account for the longevity of SCSI as a protocol. It has been adapted to a variety of platforms and architectures without causing major changes in applications and operating system interfaces.

SCSI uses a client-server architecture, though it doesn't use the terms client or server. The initiator sends commands to the target, which is then expected to respond. All commands follow a standard format called the Command Descriptor Block (CDB), which contains the command plus the target address. The target device responds with what was requested or an error block.
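To make the idea of a CDB concrete, here is a minimal sketch that packs one common command, READ(10), into its 10-byte form. The field layout (operation code 0x28, a flags byte, a 32-bit logical block address, a group byte, a 16-bit transfer length, and a control byte) follows the SCSI block command set as commonly documented. The function name is hypothetical, and the code only builds the bytes; it does not send them to any device.

```python
import struct

READ_10 = 0x28  # operation code for the READ(10) command

def build_read10_cdb(lba, num_blocks):
    """Pack a 10-byte READ(10) CDB for the given starting block and block count."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("logical block address must fit in 32 bits")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("transfer length must fit in 16 bits")
    # ">" selects big-endian order, which SCSI uses for multi-byte fields.
    # Fields: opcode, flags, LBA, group number, transfer length, control.
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, num_blocks, 0)

cdb = build_read10_cdb(lba=2048, num_blocks=8)
print(len(cdb))    # 10
print(cdb.hex())   # 28000000080000000800
```

Note that the address inside this particular CDB is a block address on the device. Which target and LUN receive the command is determined outside the CDB in this sketch.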

There are a number of phases to the protocol. The first few (Bus Free, Arbitration, Selection, and Reselection) are used to allow initiators to gain control of the SCSI bus so they can send a command. These phases do not necessarily apply to all types of SCSI. Next is the Command phase, where the target requests a command from the initiator. The Data phase occurs when the initiator sends its command and gets a response from the target. The final phases are used by the target to request messages and information from the initiator.
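The phases can be pictured as a small set of states. The sketch below is a simplification for illustration only. The names STATUS and MESSAGE for the final phases are taken from the Parallel SCSI standards rather than from the text above, and the ordering shown is one plausible sequence for a single command with no disconnect, not a complete model of the bus.

```python
from enum import Enum, auto

class BusPhase(Enum):
    BUS_FREE = auto()
    ARBITRATION = auto()
    SELECTION = auto()
    RESELECTION = auto()
    COMMAND = auto()
    DATA = auto()
    STATUS = auto()
    MESSAGE = auto()

# Phases used to gain control of the bus before a command can be sent.
BUS_ACCESS_PHASES = frozenset({
    BusPhase.BUS_FREE, BusPhase.ARBITRATION,
    BusPhase.SELECTION, BusPhase.RESELECTION,
})

def simple_command_sequence():
    """One simplified phase order for a single command, ending with the bus free again."""
    return [
        BusPhase.BUS_FREE, BusPhase.ARBITRATION, BusPhase.SELECTION,
        BusPhase.COMMAND, BusPhase.DATA, BusPhase.STATUS,
        BusPhase.MESSAGE, BusPhase.BUS_FREE,
    ]

for phase in simple_command_sequence():
    kind = "bus access" if phase in BUS_ACCESS_PHASES else "information transfer"
    print(f"{phase.name}: {kind}")
```

Reselection is left out of the sequence because it only occurs when a target reconnects to an initiator after a disconnect.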

SCSI has a lot of commands. The most common are used to read and write data to block devices, such as tapes and disks. Other commands exist that allow an initiator to request addressing information, error messages, and device configurations. Specialized commands exist for media changers on tape and CD-ROM libraries, and so do a host of other read and write commands.

LUN Masking

LUN masking is a technique that hides LUNs from certain initiators. When a target implements LUN masking, it will respond only to the SCSI Inquiry command from select initiators. Other initiators will believe that no device exists at that LUN address. In this way, only select initiators will know of the existence of a device at certain LUN addresses. There are advantages to this approach. In large disk arrays, which are shared among several different hosts, the array can be partitioned among hosts. It is often used in environments where the hosts have different operating systems and could corrupt other hosts' disks. It is even used as a crude form of security.

LUN masking only hides the LUN from the SCSI Inquiry command. If an initiator continues to send commands to that LUN anyway, the target will respond normally.
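The behavior described above, including its weakness, can be sketched with a toy target. Everything here is hypothetical: the class, its method names, and the host names are assumptions made for illustration, and real arrays implement masking in firmware with their own management interfaces.

```python
class MaskingTarget:
    """Toy target that hides LUNs from Inquiry for initiators not on the list."""

    def __init__(self, luns, visibility):
        self.luns = set(luns)
        # Maps a LUN to the set of initiator IDs that are allowed to see it.
        self.visibility = visibility

    def inquiry(self, initiator_id, lun):
        """Answer Inquiry only when the LUN is unmasked for this initiator."""
        if lun in self.luns and initiator_id in self.visibility.get(lun, set()):
            return f"LUN {lun} present"
        return None  # to this initiator, no device appears to exist here

    def read(self, initiator_id, lun):
        """Other commands are not filtered, so a masked LUN still answers."""
        if lun in self.luns:
            return f"data from LUN {lun}"
        return None

target = MaskingTarget(luns=[0, 1], visibility={0: {"hostA"}, 1: {"hostB"}})
print(target.inquiry("hostA", 0))  # LUN 0 present
print(target.inquiry("hostA", 1))  # None
print(target.read("hostA", 1))     # data from LUN 1
```

The last call shows why masking is only a crude form of security: hostA cannot discover LUN 1 through Inquiry, but a command sent to it anyway still succeeds.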

SCSI Standards

SCSI standards are developed and governed by the T10 technical committee of the InterNational Committee for Information Technology Standards (INCITS). The rules under which this committee operates are approved by the American National Standards Institute (ANSI), which also publishes the standards. For this reason, SCSI standards are often referred to as ANSI standards. The process is very rigorous.

The major complaint leveled at SCSI and similar storage standards is that they are "loose." Whereas many network standards, like those that govern Ethernet, begin with statements such as "It MUST," many SCSI standards say, "It MAY." This has led to compatibility problems with "standard" equipment from different vendors. The storage industry is famous for having to run plug-fests to find out which equipment that complies with ANSI standards has interpreted the standards differently.

The T10 committee's web site (www.t10.org) has documents that discuss all aspects of the standards that SCSI is based on, as well as work that is in progress by the committee.

ATA

ATA is the most popular mass storage device interface specification in use today. ATA stands for AT Attachment, as in the IBM PC AT, introduced in 1984. It is the interface of choice for desktop hard drives. It is also used extensively for CD-ROM/RWs in desktop and laptop PCs. Currently, ATA is used mostly inside computers and similar devices.

Parallel ATA, like Parallel SCSI, uses a bus architecture with a limited address space. Each ATA channel can only address two devices: a primary and secondary, also called the master and slave. Most desktop computers have controllers with two parallel ATA channels. It is not uncommon to have an ATA interface with a channel for two hard drives and another channel for the CD-ROM/RW and DVD-ROM/RW drives.

As is the case with Parallel SCSI, Parallel ATA transmits bits in parallel, 16 bits at a time. Data can be transferred to and from only one device at a time. Whichever device has control of the bus keeps it until it is finished with the transfer. This means that two high-usage devices will often be in contention for the bus, and performance will suffer. It is also why it is common to put the CD-ROM and other slow devices on a separate channel from the hard drives.

ATA also uses a protocol layer that is similar to SCSI, making it independent, in some respects, from the hardware specification. This is one of the reasons that it has been adapted for Serial ATA and that other protocols can run over the ATA hardware. ATAPI, an offshoot of ATA, is used mostly for removable media, such as CD-ROM and tape drives. It employs SCSI commands over the ATA hardware to communicate with these devices.

ATA Limitations Are Not Really Limitations

Much has been made about the so-called limitations of ATA compared with SCSI and Fibre Channel. ATA's hard drive performance, addressing, and reliability are inferior to those of SCSI and Fibre Channel. This is not a function of the drives themselves but of the interface. The drives are usually made with similar components, though the design is simpler.

ATA was meant to be inexpensive, and compromises were inevitable. It is an example of a technology that is good enough. It does the job splendidly, but the job is not to have server-class performance and reliability. Instead, it provides just what a desktop needs at a cost that is reasonable. If computer manufacturers had to rely on SCSI drives, only the highest-end users would have a hard drive bigger than 10 gigabytes. It simply would not be cost effective. Instead, it is common to find $1,000 home computers with 120-gigabyte drives.

ATA is being used successfully to design new types of drive arrays that are much less expensive than SCSI or Fibre Channel arrays, but with performance and reliability that are close to what they offer. By using a reliable, high-performance interface and enclosure, array manufacturers are able to take advantage of the lower cost of ATA drives and components. ATA is able to meet the requirements of a vast number of applications.

Different versions of ATA are usually referred to by their data transfer speed. Common implementations are ATA/33 (33 megabytes per second), ATA/66, ATA/100, and ATA/133.

Serial ATA (SATA)

For much the same reasons that Serial Attached SCSI was created, so was Serial ATA. They are, in fact, linked because the hardware layer is the same for both. The difference between SAS and SATA is the protocol that they use. This provides backward compatibility with the maximum number of applications and operating systems while keeping costs low. Like SAS, SATA is viewed best as an in-the-box technology, used to create very inexpensive, yet high-performance, disk arrays.

SATA works well for hot backups, staging data, and snapshots, and as storage for less important data. Performance should approximate that of SAS, and the reliability of SATA disk arrays should be nearly as good as that of a SCSI or Fibre Channel array.