A Quick History of Data Storage


Data storage used to come in only one type: Direct Attached Storage (DAS). It wasn't called that at the time; storage was simply storage. The main storage media were magnetic tape for backup and archive, floppy disks, and hard disks for primary storage. In the IBM mainframe world, hard disk systems were referred to as DASD (Direct Access Storage Devices). By today's standards, DASD was large, slow, and cumbersome.

In the early 1980s, the first small hard disk drives were developed and became ubiquitous. The hard drive as we know it grew alongside the PC and open enterprise systems that dominate computing today. This led to a close coupling of data storage to the individual computer. Drives were addressed only from the computer they were directly attached to.

The advent of the SCSI (Small Computer System Interface) and IDE/ATA storage protocols meant that drives were no longer proprietary. They did not have to be tied to a specific computer architecture or operating system. However, drives were still held captive behind the server. The introduction of the file server further reinforced this model. Hard disk drives were accessible to anyone on the network, but only by going through the network server's processor and network operating system (NOS). The file server was unique because the NOS allowed many users to connect to the computer simultaneously while maintaining the integrity of the data on the disk.

The landscape changed dramatically in the late 1990s, when products built around the concept of network storage appeared and reshaped the way storage architectures were designed. Although it was often argued that Network Attached Storage (NAS) was just a better form of file server, Storage Area Networks (SANs) were entirely different. With the SAN architecture, disks or RAID groups could be accessed directly, at the block level, from remote machines without the intervention of a network operating system. Raw access to a disk or tape could now be performed over a network. This allowed for distributed storage architectures that could support performance-sensitive applications, such as relational databases and tape backup.

SANs also led to better utilization of storage resources. By sharing resources, organizations can eliminate the problem of uneven allocation of disk and tape capacity. Rather than leaving some resources overburdened while others sit underused, networked storage spreads usage more evenly.
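To make the utilization argument concrete, here is a small arithmetic sketch in Python. The server names and capacities are hypothetical numbers chosen for illustration, not measurements: three servers each have 500 GB of direct-attached disk, and one of them is nearly full while the others have room to spare.

```python
# Hypothetical example: three servers, each with 500 GB of direct-attached disk.
# The figures are illustrative only, not measured data.
servers = {
    "server_a": {"capacity_gb": 500, "used_gb": 450},
    "server_b": {"capacity_gb": 500, "used_gb": 100},
    "server_c": {"capacity_gb": 500, "used_gb": 200},
}


def utilization(capacity_gb: int, used_gb: int) -> float:
    """Return the fraction of capacity that is in use."""
    return used_gb / capacity_gb


# Direct-attached: server_a can only grow into its own free space.
headroom_das_gb = servers["server_a"]["capacity_gb"] - servers["server_a"]["used_gb"]

# Networked pool: any server can draw on the combined free space.
pool_capacity_gb = sum(s["capacity_gb"] for s in servers.values())
pool_used_gb = sum(s["used_gb"] for s in servers.values())
headroom_pooled_gb = pool_capacity_gb - pool_used_gb

for name, s in servers.items():
    print(f"{name}: {utilization(s['capacity_gb'], s['used_gb']):.0%} used")
print(f"pooled: {utilization(pool_capacity_gb, pool_used_gb):.0%} used")
print(f"server_a headroom: {headroom_das_gb} GB direct-attached, "
      f"{headroom_pooled_gb} GB pooled")
```

In this example the combined capacity is only half used, yet with direct-attached disks server_a runs out of room after another 50 GB; drawing on a shared pool, it could use any of the 750 GB that are free.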

The Roles of Different Storage Devices

The purpose of a data storage system is to provide persistent or nonvolatile storage for data. It does not need to be as fast as random access memory, but when the power is turned off, the data has to continue to exist. Without persistent data storage devices, computing devices would need to have an uninterrupted power source, or else data stored on them would be lost. A variety of devices act as persistent data stores, each with its own specific role. Table 2-1 shows a partial list of these devices.

Table 2-1. Storage Media (Early 2005)

Device                          Role                                    Capacity

Hard drive                      Persistent online storage               Up to 400 gigabytes
                                Near-line backup
                                Temporary swap space

Floppy drives                   Small file transport                    1.44 megabytes
                                Software and data distribution

Magnetic tape (various styles)  Backup                                  Up to 1.3 terabytes (with compression)
                                Archive

CD-ROM/RW                       Archive                                 Up to 740 megabytes
                                Software and data distribution

DVD-ROM/RW                      Archive                                 Up to 9.1 gigabytes
                                Software and data distribution

Magneto-optical disk            Archive                                 Up to 50 gigabytes

Memory stick                    Large file transport                    Up to 4 gigabytes
Flash storage                   Individual backup
USB flash drives                Primary storage for small peripherals


Hard disks are fast devices with high storage capacity. Magnetic tape also offers high-capacity data storage, though it is slower than hard drives. Tape is still the medium of choice for backups, because tapes can be removed and moved off site. CD-ROM/RW and DVD-ROM/RW have moderate capacity, but high-quality media can last a very long time and, unlike tape, do not degrade quickly with use. They are also extremely inexpensive, which is why they are used extensively for software distribution as well as long-term storage of archival data. Floppy drives are used almost exclusively for transferring files between computers. The low capacity and relatively slow speed of the floppy drive have placed it on the road to extinction; many computers today do not even ship with one.

Magneto-optical disks, which combine the properties of magnetic and laser-based disks, have high capacity but are expensive and slow. They are sometimes used for long-term archive of large files.

Solid-state nonvolatile storage is a departure from the magnetic and laser media that dominate other data storage products. Built on flash memory, solid-state products have become an inexpensive way to transport moderate-size files. They are important storage devices for mobile consumer electronics such as digital cameras and MP3 music players. USB flash drives are quickly supplanting floppy drives as a way of transporting files between computers.

Arrays, Libraries, and Jukeboxes

Hard drives dominate primary storage, as tape does backup. These technologies are supplemented by CD-ROM/RW and DVD-ROM/RW for archive and software distribution. Storage devices are often aggregated into arrays, libraries, or jukeboxes of drives and media, which are the basic building blocks of the massive storage systems prevalent in large enterprises.

Arrays are large collections of hard drives tied together into a logical whole. A device called a controller provides the interface to the computer or network and manages the drive set. Advanced controller technology can provision the drives into a variety of configurations. In many cases, a large number of drives can appear to be one single large drive or many smaller drives.
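One simple way a controller can make several drives appear as a single large drive is concatenation: the logical block addresses (LBAs) of the combined drive are laid end to end across the physical drives. The Python sketch below assumes that layout and hypothetical drive sizes; real controllers may use other layouts (striping, for example) and keep this mapping in firmware. The hypothetical `carve` helper shows the opposite case, splitting the combined address space into several smaller logical drives.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """Presents several physical drives as one logical drive by laying
    their block ranges end to end (a simple concatenation layout)."""

    def __init__(self, drive_blocks: list[int]):
        self.drive_blocks = drive_blocks
        # starts[i] is the first logical block address stored on drive i.
        self.starts = [0] + list(accumulate(drive_blocks))[:-1]
        self.total_blocks = sum(drive_blocks)

    def locate(self, lba: int) -> tuple[int, int]:
        """Map a logical block address to (drive index, block on that drive)."""
        if not 0 <= lba < self.total_blocks:
            raise ValueError(f"LBA {lba} is outside the volume")
        drive = bisect_right(self.starts, lba) - 1
        return drive, lba - self.starts[drive]


def carve(total_blocks: int, unit_blocks: list[int]) -> list[range]:
    """Split one logical address space into several smaller logical drives."""
    if sum(unit_blocks) > total_blocks:
        raise ValueError("requested units exceed available capacity")
    units, start = [], 0
    for size in unit_blocks:
        units.append(range(start, start + size))
        start += size
    return units


# Hypothetical drives of 1000, 2000, and 1500 blocks.
volume = ConcatenatedVolume([1000, 2000, 1500])
print(volume.total_blocks)   # 4500
print(volume.locate(3500))   # (2, 500)
print(carve(volume.total_blocks, [1500, 1500, 1500]))
```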

There are several advantages to aggregating drives into arrays. An array provides more disk space than is possible with a single drive: when multiple small disks are combined into a single, large storage device, very large files and databases can be stored on one logical drive. Hard drives also have performance limitations. By streaming data to multiple drives simultaneously, storage systems can read and write data much more quickly than if only one massive drive were present. From a data protection point of view, parceling data out to many drives is a major advantage. The failure of a single disk won't destroy all the data stored in the array. In fact, all data may be recoverable after a single drive failure if copies are made to other disks.
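Both benefits can be seen in a toy model of an array. The Python sketch below splits data into fixed-size chunks, spreads them round-robin across drives (striping, so each drive holds only a share of the data and could serve its share in parallel), and writes a second copy of each chunk to the neighboring drive. This copy layout is an assumption chosen for illustration; it is similar in spirit to mirrored RAID levels but does not reproduce any specific controller's scheme, and it models data placement only, not concurrent I/O.

```python
def stripe_with_copies(data: bytes, num_drives: int, chunk_size: int):
    """Stripe data across drives and keep a copy of each chunk on the next drive.

    Returns (drives, num_chunks), where each drive is a dict mapping
    chunk index -> chunk bytes.
    """
    if num_drives < 2:
        raise ValueError("need at least two drives to keep copies on separate disks")
    drives: list[dict[int, bytes]] = [{} for _ in range(num_drives)]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        primary = index % num_drives        # round-robin striping
        mirror = (primary + 1) % num_drives  # copy lands on a different drive
        drives[primary][index] = chunk
        drives[mirror][index] = chunk
    return drives, len(chunks)


def read_back(drives, num_chunks: int, failed=frozenset()) -> bytes:
    """Reassemble the data, skipping any drive listed in `failed`."""
    parts = []
    for index in range(num_chunks):
        for drive_id, drive in enumerate(drives):
            if drive_id not in failed and index in drive:
                parts.append(drive[index])
                break
        else:
            raise OSError(f"chunk {index} lost: no surviving copy")
    return b"".join(parts)


data = b"The quick brown fox jumps over the lazy dog"
drives, n = stripe_with_copies(data, num_drives=4, chunk_size=4)
print(read_back(drives, n, failed={2}) == data)  # True: one failed drive is tolerated
```

Losing two neighboring drives (say 1 and 2) would lose every chunk whose primary and copy both sit on those drives, which is why a scheme like this protects against a single drive failure, not against arbitrary multiple failures.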

Controllers

Each disk, tape, and CD-ROM/RW drive has an interface that allows it to connect to a specific type of bus or network. This interface is called a drive controller, because it controls the movement of the mechanical parts of the drive.

Arrays, libraries, and jukeboxes are collections of single drives, so they need an interface that connects the entire collection to the bus or network. This is usually accomplished with a device also called a controller. A controller provides the interface to the individual drive controllers and a common interface to the bus or network. Controllers often have processors and memory that allow them to host embedded services specific to the storage device.


Tape libraries and CD-ROM/RW jukeboxes are designed to manage access to many individual tapes and optical discs. Because tapes, CDs, and DVDs are removable, they must be loaded into and out of drives; doing this by hand is unproductive and creates opportunities for human and computer error. The sophisticated robotics in these systems automate the movement of media in and out of drives. In the case of tape libraries, allowing multiple backup processes to stream data to multiple tapes at once makes the best use of a high-speed connection.

Like disk arrays, libraries and jukeboxes often have the advantage of containing more than one drive. If one tape or CD-ROM/RW drive fails, the others in the library or jukebox can still be used, so at least some servers can still be backed up, and the data on the tapes or CD-ROM/RWs remains accessible.
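The benefit of having several drives in a library can be sketched as a simple scheduling problem. The Python example below assumes a plain round-robin policy and uses hypothetical server and drive names: it spreads backup jobs across the working tape drives so several streams can run at once, and it leaves out any drive marked as failed. Real backup software uses its own, usually more elaborate, scheduling.

```python
from itertools import cycle


def assign_backups(servers: list[str], drives: list[str],
                   failed: frozenset[str] = frozenset()) -> dict[str, list[str]]:
    """Round-robin backup jobs across tape drives, skipping failed drives."""
    working = [d for d in drives if d not in failed]
    if not working:
        raise RuntimeError("no working tape drives: nothing can be backed up")
    plan: dict[str, list[str]] = {d: [] for d in working}
    for server, drive in zip(servers, cycle(working)):
        plan[drive].append(server)
    return plan


# Hypothetical servers and a three-drive library.
servers = ["mail", "db", "web", "files", "erp"]
drives = ["tape0", "tape1", "tape2"]

print(assign_backups(servers, drives))
# {'tape0': ['mail', 'files'], 'tape1': ['db', 'erp'], 'tape2': ['web']}
print(assign_backups(servers, drives, failed=frozenset({"tape1"})))
# {'tape0': ['mail', 'web', 'erp'], 'tape2': ['db', 'files']}
```

In this sketch every server still gets a drive, but the surviving drives each carry more jobs, so backups take longer; if the backup window is fixed, some jobs may not finish, which is the situation the text describes.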



    Data Protection and Information Lifecycle Management
    ISBN: 0131927574
    Year: 2005
    Pages: 122
