Tape is the most widely used form of storage today, with more data stored on tape than on any other medium. Although tape began as the earliest form of mass storage, it survives today because of its cost resiliency and its continued capability to store large amounts of system and application data for safekeeping.
The most important aspect of tape systems is their sequential nature. Even though a good deal of innovation has been applied to tape technology, write mechanisms, and optimization, the information remains sequentially bound. This matters because it limits both the throughput of write processes and the speed of access. Finding a specific location on tape media requires the transport system to move sequentially along the tape until it reaches a specific sector.
Once the location is found, the drive can read the information and reposition itself for additional operations. For a write operation, the tape must be positioned at the proper location and written sequentially. Although this has been enhanced through software utilities that provide tape catalogue information, which speeds the search for data that has been written to tape, the random access that is inherent in magnetic disks does not exist with tape.
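The cost of sequential positioning can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: the tape is a strip of numbered blocks, the cost of a request is the number of blocks the transport passes over, and the function names and request values are hypothetical. It also shows one way catalogue information could help, by letting software serve requests in tape order.

```python
def tape_seek_cost(head_position: int, target_block: int) -> int:
    """Blocks the transport must pass over to reach target_block (toy model)."""
    return abs(target_block - head_position)


def total_tape_cost(requests, start=0):
    """Total blocks traversed when requests are served in the given order."""
    position = start
    total = 0
    for block in requests:
        total += tape_seek_cost(position, block)
        position = block  # the head stays where the last request left it
    return total


# Hypothetical block addresses requested by an application.
requests = [900, 10, 850, 20]

# Served in arrival order, the transport shuttles back and forth.
unsorted_cost = total_tape_cost(requests)

# Served in tape order (possible when a catalogue knows where data lives),
# the transport makes a single forward pass.
sorted_cost = total_tape_cost(sorted(requests))

print(unsorted_cost, sorted_cost)
```

In this model the arrival-order schedule traverses several times as many blocks as the sorted schedule, which is the effect the sequential constraint has on access time.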
Another important operational note that should not be overlooked is the effect of tape controller functionality. The tape controller, as shown in Figure 6-14, is necessary for many control functions. Among the most important is the capability to address multiple drives within the tape system. Just as with disk controllers, tape controllers provide a level of transparency to the operating system and applications. In addition, and most importantly, the majority of controllers will have some form of error recovery and correction when dealing with data as it comes from the system bus, generally from the RAM buffers. If the tape system has error recovery code (ERC) in its controller, the system RAM will dump its contents to the tape controller buffers and refresh its buffer locations for another load. If not, there is a long wait as the RAM buffers slowly transfer their locations to the read/write head and wait for confirmation.
The effects of tape devices and operations within a storage network environment cannot be overlooked. We will discuss tape systems associated with SAN and NAS in Parts III, IV, and V. However, it's important to point out that fast system resources, such as RAM, become further decoupled from slower devices like tape media through the separation introduced by network devices. As a result, the timing and throughput differences can become magnified and problematic.
As with disks, a distinguishing characteristic of tape drives is the format used to write and read data to and from the media. Popular formats in today's data center environments range from Digital Linear Tape (DLT), the most popular for commercial enterprises, to newer formats such as Linear Tape Open (LTO). The general format specifies the type of media and the technique the tape heads use to write the data; most formats are supported by an open standard so that multiple vendors can provide an approximation of an interoperable solution.
The tape media is divided into parallel tracks. The number of tracks, the configuration of the read/write heads, and the mechanisms surrounding tape transport all make up the uniqueness of each format. Regardless of format, the tape media is formatted into blocks. As discussed, the format of the tape depends on the geometry of the media, the layout of the drive heads, and the transport control mechanisms, which together define the size of the block in bytes that can be written. Blocks make up segments that can be referenced by their location on the media for faster access. Each segment has certain blocks that contain error correction codes, and space for the directories used by software utilities is reserved within track 0 or a special directory track.
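The relationship between bytes, blocks, and segments can be sketched as simple arithmetic. The block size, the number of blocks per segment, and the directory contents below are hypothetical values chosen for illustration; real formats define their own figures.

```python
# Hypothetical format parameters; each real tape format defines its own.
BLOCK_SIZE_BYTES = 32 * 1024
BLOCKS_PER_SEGMENT = 64


def locate(byte_offset: int):
    """Map a byte offset to (segment number, block within that segment)."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    block_number = byte_offset // BLOCK_SIZE_BYTES
    return divmod(block_number, BLOCKS_PER_SEGMENT)


# A toy directory, as might be kept in a directory track:
# data set name -> segment where the data set starts (made-up entries).
directory = {"payroll": 0, "orders": 3}


def first_block_of(name: str) -> int:
    """Absolute block number at which a catalogued data set begins."""
    return directory[name] * BLOCKS_PER_SEGMENT


print(locate(5_000_000))
print(first_block_of("orders"))
```

With a directory like this, software can ask the drive to skip directly to a segment boundary instead of scanning every block from the start of the tape.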
The read/write head mechanisms determine not only the format of the tape (tracks, parity, density, and so on), but more importantly the throughput and density of the write process and the speed of the read process. In most cases, today's tape drives contain read/write heads that perform dual roles. With the write head surrounded by a read head, a write can take place while the read function verifies the operation, and vice versa when the tape transport changes direction. As mentioned earlier in our discussion of controllers, today's tape systems support increased levels of data buffering and, in some cases, caching, file systems, and fault-tolerant configurations similar to RAID technologies with disk systems.
The use of tape systems predates all other mass storage strategies used for commercial and scientific computing. The importance of tape in data centers, both present and future, cannot be overlooked. In understanding this legacy form of storage, there are two important points to consider. First is the natural extension to tape systems (which is unlike online disk media): the tape library. As data centers create sets of information for recovery, archival, and legal purposes, the tapes must be stored in environments that are secure, environmentally protected, and accessible through some form of catalogue structure. Second is the level of sophistication vendors have developed to support the offline, nearline, and online strategies that encompass tape usage.
Let's first examine the tape library. Tape libraries provide both logical and physical archival processes to support the data center. The most important aspect of the library is rapid access to archived information. This requires some type of index, catalogue, and structure whereby the tape hardware, working with the catalogue and archival software, can locate a tape, mount it in a drive, verify it (yes, it's very important to verify it's the right tape), and read it.
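The locate, mount, verify, and read sequence can be sketched in a few lines. This is a minimal in-memory illustration, not a description of any vendor's library software: the catalogue layout, the slot numbering, the tape labels, and the names `retrieve` and `WrongTapeError` are all assumptions made for the example.

```python
class WrongTapeError(Exception):
    """Raised when the mounted tape's label does not match the catalogue."""


def retrieve(dataset, catalogue, slots):
    """Locate a tape via the catalogue, mount it, verify its label, and read."""
    entry = catalogue[dataset]        # locate: which tape holds the data set
    tape = slots[entry["slot"]]       # mount: simulated by a slot lookup
    if tape["label"] != entry["label"]:  # verify: is this the right tape?
        raise WrongTapeError(
            f"expected {entry['label']}, found {tape['label']}"
        )
    return tape["data"][dataset]      # read


# Hypothetical catalogue and library contents.
catalogue = {
    "q1-ledger": {"label": "VOL001", "slot": 0},
    "q2-ledger": {"label": "VOL002", "slot": 1},
}
slots = {
    0: {"label": "VOL001", "data": {"q1-ledger": b"ledger bytes"}},
    1: {"label": "VOL999", "data": {}},  # misfiled tape sitting in slot 1
}

print(retrieve("q1-ledger", catalogue, slots))
```

The verify step is what turns a misfiled tape into an explicit error rather than a silent read of the wrong data, which is why the sequence above insists on it.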
Provisioning and preparation follow to support all the direct copying that is done. Generally during an off-shift, production information is written to tapes for a multitude of purposes. Although in many cases this will be the backup/recovery process, sophisticated application jobs will use several levels of backup to ensure that user data integrity is maintained. Also important is the type and form of copying performed in moving data to and from online media. Some processes copy all data from the disk, regardless of the application; others copy data to tape using specific file information; and yet others provide sophisticated update processes in which archived data is changed only if the online data has changed. Some of the more sophisticated processes take periodic snapshots of production data and copy these to other online or offline media.
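The three copying styles described above differ mainly in how they select what to write to tape. The sketch below illustrates only that selection step, assuming a toy inventory that maps file names to last-modified timestamps; the function names, file names, and timestamps are hypothetical.

```python
def full_copy(files):
    """Copy everything on the disk, regardless of application."""
    return sorted(files)


def selective_copy(files, wanted):
    """Copy only the files named in a specific file list."""
    return sorted(name for name in files if name in wanted)


def incremental_copy(files, last_backup_time):
    """Copy only files modified since the previous backup."""
    return sorted(
        name for name, modified in files.items() if modified > last_backup_time
    )


# Hypothetical online files: name -> last-modified time (arbitrary units).
online_files = {"orders.db": 105, "customers.db": 90, "app.log": 120}

print(full_copy(online_files))
print(selective_copy(online_files, {"customers.db"}))
print(incremental_copy(online_files, last_backup_time=100))
```

The incremental form writes the least data per run, which matters when the job has to fit inside a limited off-shift window.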
Auto-loader libraries are highly sophisticated mechanical hardware that automates much of the provisioning and preparation involved in administering tape media. Auto-loader development was driven largely by the need to provide a cost-effective way to eliminate the mundane, manual job of mounting tapes. Another major value is the ability to provide a larger number of pre-staged tapes for tasks like nightly backup and archival processing. Auto-loader libraries are integrated with tape drives and controller mechanisms and provide various additional levels of functionality in terms of buffering, error recovery, compression, and catalogue structure. Enterprise-level media libraries use mega-libraries, called silos, to provide an almost fully automated set of tape activities.
As you would suspect, many enterprise-level tape library systems are integrated into storage area networks. These SANs enable tape library systems to translate SCSI-based tape drive operations to the Fibre Channel-based SAN. NAS, on the other hand, has only recently integrated tape systems into its solutions. These can be more problematic given the direct-attached internal storage characteristics of the NAS device.
Managing these processes and resources requires a good deal of planning and execution within a generally tight time schedule. For this to happen, the correct amount of blank media must be available, formatted, and mounted in a timely fashion; the scheduling and processing of jobs must proceed according to a predefined schedule; and any exceptions must be handled in a timely manner.
Optical storage serves as a medium where laser light technologies write information in a fashion similar to disk architectures. Although logically the two share similar characteristics, such as random access to the stored data and head disk assemblies (HDAs) to read the media, these similarities disappear with the media used and the operation of the read/write assembly. Optical compact discs (CD-ROMs) and digital video discs (DVDs) are driven by laser technology. As we all know, CDs and DVDs are removable storage with much higher densities than traditional diskette technology. As such, they have become a very successful product distribution medium, replacing tapes.
Optical storage has gained prominence as a new distribution format in the fields of software, entertainment, and education. However, only recently has it been able to function as a reusable medium with digital read/write capabilities. Unfortunately, its speed and throughput have not met the requirements for use within the data center for commercial backup/recovery or as other archival media.
CD and DVD optical media have never been favored as a replacement for magnetic disks or tapes in larger enterprise data centers. However, they have been slowly gaining a niche in the support of special applications that deal with unstructured data, the distribution of data, and the delivery of information. Examples are data-centric objects like videos, integrated audio/video application data, and entertainment and educational content that integrates video, image, audio, and text information. The basic configurations for optical media are very much like tape drives and libraries. In commercial applications, they are supported through optical libraries, which include multiple drives and auto-loader slots for selecting media.
The major difference from magnetic disks and tape is the use of lasers to write information on the optical media. Writing requires modifying a type of plastic media with the laser beam, either by bleaching it, distorting it, or creating a bubble. Regardless of how the modification is formed, this is the method of writing data to the optical disk. The data is written in tracks, but with a different geometry that employs a spiraling technique to lay out the tracks from the center of the disk to its outer areas. When accessed for read operations, the optical device turns at a variable rate so that access throughout the disk is close to linear; that is, data passes the read assembly at a nearly uniform rate whether it lies near the center or near the outer edge.
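One common way to read "turns at a variable rate" is constant linear velocity: the disk spins faster when the laser is near the center and slower near the edge, so the track passes the laser at a steady speed. The relationship is rpm = 60 · v / (2πr). The sketch below works through it; the linear speed and the two radii are illustrative assumptions in the range commonly quoted for a single-speed CD, not figures taken from this chapter.

```python
import math


def rpm_for_radius(linear_speed_m_s: float, radius_m: float) -> float:
    """Rotation rate needed to keep a constant linear speed at a given radius."""
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    revolutions_per_second = linear_speed_m_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60


# Assumed values for illustration only.
LINEAR_SPEED = 1.3    # metres per second
INNER_RADIUS = 0.025  # metres, start of the spiral
OUTER_RADIUS = 0.058  # metres, end of the spiral

inner_rpm = rpm_for_radius(LINEAR_SPEED, INNER_RADIUS)
outer_rpm = rpm_for_radius(LINEAR_SPEED, OUTER_RADIUS)

print(f"inner track: {inner_rpm:.0f} rpm, outer track: {outer_rpm:.0f} rpm")
```

Under these assumptions the drive spins more than twice as fast at the inner track as at the outer one. Having to change rotation speed when moving between distant tracks is one reason optical random access is slower than that of a magnetic disk spinning at a fixed rate.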
However, even with optical attributes and increased data capacities, optical media has yet to achieve the throughput necessary for online transactional multiuser systems, or the controller sophistication necessary for data protection and recovery operations. Consequently, optical libraries will not proliferate within storage network installations even though they are heavily used in applications that favor imaging, videos, and audio. Systems that are SCSI-based can participate with SANs through a bridge/router device (see Chapter 14) and are available as NAS devices using a NAS microkernel front end with optical drives and auto-loader hardware attached.