Disk-Based Backup


The remainder of this chapter looks at ways disk device and subsystem technology are being used to augment tape technology in SAN backup and recovery systems. The last couple of years have seen a renewed interest in this type of solution, but like so many other storage techniques, it is considerably trickier than it first appears. Disk technology in the form of virtual tape has been attempted several times but has rarely worked as well as hoped, except in mainframe environments.

NOTE

Some old pros I know who worked on computer storage technology half a century ago tell me they were predicting the death of tape in the early 1960s. Ever since, there has been something supposedly far better than tape, and every time the new technology couldn't be developed or had some sort of pathological problem.

Millennia from now, when there is nothing left of our civilization, cockroaches will probably still be using tape.


Advantages and Disadvantages of Disk Technology for Backup

There are a number of reasons people want to replace tape with something better:

  • Tape is perceived to be too fragile and failure-prone.

  • Tape capacity is too small for server disk capacities.

  • Tape is not fast enough or flexible enough for both backups and restores.

Here are some reasons why people seem to think disk can finally work for backup:

  • ATA disks are finally cheaper than tape on a cost-per-GB basis.

  • New applications and techniques are being developed for disk.

  • Network communications methods like remote copy are replacing backup in some instances.

Here are some reasons why tapes will still be needed:

  • Tapes are better for maintaining historical versions of data.

  • Tapes are still the most convenient and least expensive way to get off-site disaster protection.

  • Virtual tape on disk creates difficult media management problems.

Now, we'll examine some facts, perceptions, and opinions.

Problems with Tape

The physical structure of magnetic tape is truly amazing. The number of layers, the chemistry involved, and the magnetic and protective coatings are all the result of advanced materials research. However, that does not mean that tape is always as reliable as we would like it to be. Oxidation is a constant threat, and we live with many environmental hazards, such as heat and humidity, that are very hard on tapes.

In addition to environmental variables, the process of using tape is hard on tape. Tapes are stretched under tension in order to track correctly past the tape heads. Unlike disk technology, where the media is never touched, the tape surface is usually in contact with something else, typically the back of another section of tape. The capstans and rollers in tape drives all collect particles of various sorts, which are constantly rubbing on the tape surface.

While individual tape cartridges for newer technologies such as Super DLT and LTO Ultrium can exceed 200 GB of uncompressed data, most tape equipment has capacities that are much smaller. The main issue is the size of tape compared to the amount of stored data a server has. The more tapes that are needed for a restore, the more likely it is that one of them will be bad; this is simple statistics. Tape storage does not scale incrementally; when you exceed the capacity of a cartridge, you have to use an additional cartridge. In some respects that's an easy solution for storage capacity, but it is a definite negative for reliability.
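The "simple statistics" can be made concrete with a short calculation. This is a minimal Python sketch that assumes each cartridge fails independently with the same probability; real tape failures are often correlated (a worn drive or a bad batch of media), so treat the numbers as illustrative only. The 1 percent per-tape failure rate used below is a hypothetical figure, not a measured one.

```python
def prob_restore_hits_bad_tape(per_tape_failure: float, tapes_needed: int) -> float:
    """Chance that at least one of `tapes_needed` cartridges is unreadable.

    Assumes (hypothetically) that every tape fails independently with the
    same probability `per_tape_failure`.
    """
    if not 0.0 <= per_tape_failure <= 1.0:
        raise ValueError("per_tape_failure must be between 0 and 1")
    if tapes_needed < 0:
        raise ValueError("tapes_needed must be non-negative")
    # P(all tapes good) = (1 - p)^n, so P(at least one bad) = 1 - (1 - p)^n
    return 1.0 - (1.0 - per_tape_failure) ** tapes_needed


if __name__ == "__main__":
    # Hypothetical 1% per-tape failure rate, purely for illustration
    for n in (1, 5, 10, 20):
        print(n, round(prob_restore_hits_bad_tape(0.01, n), 4))
```

Under that hypothetical 1 percent rate, a restore needing 20 tapes has roughly an 18 percent chance of running into at least one bad cartridge, compared with 1 percent for a single tape.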

The trend is for server storage to grow faster than tape capacity. As servers store more data, tape has an increasingly difficult time keeping up with the growth. Even when new tape technology is introduced, if the old tape equipment has to be replaced every three years or so just to keep pace, an alternative to tape may be a better option.

Another big problem for tape is its performance. While new tape drives can move at streaming speeds of 70 MBps (estimated with compressible data), that is still less than a high-speed disk subsystem that does not depend on whether data is compressible. With backup continuing to be a problem for some time to come, administrators will choose brute-force solutions to get the job done faster. Disk subsystems are the fastest storage options now and will likely continue to be.
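To see why throughput matters so much, the backup window can be estimated by dividing the amount of data by the sustained transfer rate. The sketch below assumes decimal units (1 GB = 1,000 MB) and a constant rate for the whole job. The 70 MBps figure comes from the text; the 1,000 GB data set and the 200 MBps disk subsystem rate are hypothetical values chosen for illustration.

```python
def backup_hours(data_gb: float, throughput_mbps: float) -> float:
    """Hours to move `data_gb` gigabytes at a constant `throughput_mbps` MB/s.

    Assumes decimal units (1 GB = 1000 MB) and a sustained, uninterrupted rate.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    seconds = data_gb * 1000 / throughput_mbps
    return seconds / 3600


if __name__ == "__main__":
    data_gb = 1000  # hypothetical 1 TB of server data
    # 70 MBps is the streaming tape figure from the text (compressible data);
    # 200 MBps is a hypothetical striped disk subsystem rate.
    for label, rate in (("tape", 70), ("disk", 200)):
        print(f"{label}: {backup_hours(data_gb, rate):.1f} hours")
```

Under these assumptions the tape job takes about 4 hours and the disk job about 1.4 hours. Because the 70 MBps tape rating assumes compressible data, poorly compressible data would widen the gap.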

Advantages of Disk Subsystems for Backup

In general, the main advantage of using disk for backup is that it is the same technology used for primary storage. Where capacity is concerned, disk subsystems for backup can scale as large as primary disk storage.

As for reliability, disk is far superior to tape. Disks have fewer media errors, and for all practical purposes, disks do not wear out like tape does. Furthermore, disk technology uses RAID, which accommodates disk failures automatically without stopping the works. There is no such thing as a graceful failure with tape.

As mentioned, disk can be faster than tape through the use of striping techniques; however, avoiding the RAID 5 write penalty by using RAID 10 is probably a good practice. It's also worth pointing out that without striping, a streaming tape drive is likely to be faster than a single disk drive. It is also important to understand that caching is virtually worthless for backup processing due to the large amounts of data being transferred: after the cache fills, it provides no benefit. Disk subsystems used for backup processes are better off turning off their caching functions. In general, writing to a disk subsystem is faster than writing to tape, but the configuration of the disk subsystem is an important variable.
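The point that a cache stops helping once it fills can be checked with a simple model. This sketch assumes the cache absorbs writes at the host's rate until it is full, after which the host can write no faster than the disks destage data. The 4,096 MB cache, 200 MBps ingest rate, and 150 MBps destage rate are hypothetical numbers, not figures from any particular subsystem.

```python
import math


def seconds_until_cache_full(cache_mb: float, ingest_mbps: float, drain_mbps: float) -> float:
    """Seconds until a write cache fills; infinite if the disks keep up."""
    surplus = ingest_mbps - drain_mbps
    if surplus <= 0:
        return math.inf  # disks destage as fast as data arrives, so the cache never fills
    return cache_mb / surplus


def effective_mbps(total_mb: float, cache_mb: float, ingest_mbps: float, drain_mbps: float) -> float:
    """Average write rate seen by the backup application over the whole job."""
    if ingest_mbps <= 0 or drain_mbps <= 0:
        raise ValueError("rates must be positive")
    t_fill = seconds_until_cache_full(cache_mb, ingest_mbps, drain_mbps)
    accepted_during_fill = ingest_mbps * t_fill
    if total_mb <= accepted_during_fill:  # job finishes before the cache fills
        return ingest_mbps
    # Once full, the cache accepts new data only as fast as the disks drain it.
    remaining = total_mb - accepted_during_fill
    return total_mb / (t_fill + remaining / drain_mbps)


if __name__ == "__main__":
    # Hypothetical subsystem: 4 GB cache, host sends 200 MBps, disks absorb 150 MBps
    print(seconds_until_cache_full(4096, 200, 150))
    print(round(effective_mbps(1_000_000, 4096, 200, 150), 1))
```

With these hypothetical numbers the cache fills in about 82 seconds, and a 1,000,000 MB backup averages roughly 150.6 MBps, barely above the 150 MBps the disks can sustain on their own.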

Despite the functional advantages of disk subsystems, a big reason disk is suddenly getting so much attention is its cost. ATA disk drives are now cheaper than tape storage for equivalent capacity. That's why many of the discussions about disk for backup center on ATA and SATA disk technologies.

Another interesting aspect of disk is that random access to disk is much better than sequential access to tape for working with backup metadata, an advantage that could be used to great effect. For example, virtual views of backup data are easier to create on disk than on tape. Although not technically backup, software snapshot technology built into file systems on disk storage provides instantaneous access to historical versions of files. The same kind of access with tape could take hours.

Finally, the removable aspects of tape that allow data to be kept outside a local geography for disaster recovery protection are being diminished by high-speed networking and store-and-forward remote copy technologies. Again, the retrieval of data using electronic methods is much faster than a delivery van in most cases.

Reasons for Tape's Continued Use

Despite its shortcomings, tape has been used a long time for disaster protection. Companies have significant operations and plans that revolve around tape. These infrastructures and methods will continue to be used for a long time.

Although snapshot technology is excellent for restoring recently changed data, tape is still very good for storing and recovering historically important data. As it happens, old data almost never needs to be restored, but when it is needed, the reasons and motivations to do it are usually cogent.

It is true that MANs and WANs are making it easier to transfer backup data over a network, but the cost of the network is still higher than the cost of transporting tapes. Most companies using remote copy software also physically move tapes to off-site storage for their less-critical systems.

But the main reason tape will continue to be used is because backup software packages were written to use it and the entire operation of backup has been developed around the use of tape.

For example, consider the following scenario: Assume you back up to disk with a tape backup program. The tape operation calls for a specific tape, let's say "Charlie-8." The backup operation writes to disk and updates its metadata to reflect that the data is on tape Charlie-8. You then start a process to make a copy to a real physical tape named Charlie-8, but the tape drive ejects the tape, telling you it has tape errors and cannot write to it. At this point there is a real problem: the data you just wrote may need to be restored from Charlie-8, but you can't throw the old Charlie-8 away and create a new one because it might still hold data you need to restore. (Just because you can't write to a tape does not mean you can't read from it.) You could create a second Charlie-8, but that is precisely the sort of thing that makes media management difficult, and it can also lead to lost data during restores. This is not an admin-friendly situation. Unfortunately, it is not necessarily a corner case either, because tape problems are a way of life.
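The heart of the Charlie-8 problem is that a backup catalog expects one label to identify one cartridge. The toy catalog below is a hypothetical sketch (`MediaCatalog` and `Volume` are invented names, not taken from any backup product) that enforces that rule: it records which backup sets are associated with each label, notes when a cartridge stops accepting writes, and refuses to issue a second volume under a label that still has data associated with it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    label: str
    writable: bool = True
    backup_sets: list[str] = field(default_factory=list)


class MediaCatalog:
    """Hypothetical toy catalog: each volume label maps to exactly one volume."""

    def __init__(self) -> None:
        self.volumes: dict[str, Volume] = {}

    def record_backup(self, label: str, backup_set: str) -> None:
        # The backup software believes this set lives on the labeled volume.
        vol = self.volumes.setdefault(label, Volume(label))
        vol.backup_sets.append(backup_set)

    def mark_write_failed(self, label: str) -> None:
        # The drive rejected writes, but older sets on the tape may still be readable.
        self.volumes[label].writable = False

    def register_volume(self, label: str) -> Volume:
        existing = self.volumes.get(label)
        if existing is not None and existing.backup_sets:
            # A second "Charlie-8" would leave restores unable to tell which
            # cartridge holds which backup set, so refuse rather than guess.
            raise ValueError(
                f"label {label!r} already holds {len(existing.backup_sets)} backup set(s)"
            )
        vol = Volume(label)
        self.volumes[label] = vol
        return vol
```

In this sketch, recording backup sets on Charlie-8, marking the cartridge unwritable, and then calling `register_volume("Charlie-8")` raises an error. That mirrors the dilemma in the scenario: the label cannot be reused safely, yet the catalog still points at it.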

One way to make disks work with backup is to use two different backup systems: one to go from primary disk to backup disk and the other to go from backup disk to tape. Obviously this is far from optimal, because it requires two separate backup-and-restore operations. A better solution allows data to be restored in the most direct or fastest way possible, skipping the disk as a middleman.

The best solution is backup software developed to use both disk and tape that can accommodate both of them flexibly.

Disk Backup Architecture

Three common architectures are used with disk-based backup, reflecting the way disk is integrated into backup operations:

  • Disk to disk

  • Disk to disk to tape

  • Integrated disk and tape

Each of these will be discussed briefly in the sections that follow. One thing to keep in mind with all disk-based backup solutions is the level of backup software integration that has been done; there are many possible designs within each generic architectural category. Just because a hardware solution provides data copy capabilities, do not assume that software understands what the hardware has done. Each solution needs to be evaluated for the consistency of information between backup software and all media used.

Disk to Disk

Disk to disk, otherwise referred to as D2D, was discussed briefly in the preceding sections as an architecture that backs up data to a disk subsystem. This architecture works well for full backups but does not work well if the goal is keeping historically important versions of files.

D2D backup can be coupled with tape-based backup so that the backup data on disk can be backed up to tape using separate operations, metadata, and media management. The idea is that disk backup is used for restoring recently backed-up files and for disaster recovery, and that the tape backup would be used for locating historically interesting files. However, restore processes necessarily take two steps: from tape to backup disk and from backup disk to system disk. This means there are twice as many restore processes to go wrong.

Disk to Disk to Tape

Disk to disk to tape (D2D2T) is an expansion of the scenario just described for D2D, but with an integrated second-stage backup operation for copying backup data from disk to tape. This approach overcomes the shortcomings of the D2D architecture for keeping historical versions and includes a single integrated metadata and media management system for locating data that has been backed up. Copying data from backup disk to tape is typically part of the same extended backup operation, as opposed to being two completely independent administrative operations. Restores with D2D2T are typically two-stage, although it is certainly possible to restore directly from tape to server storage using this architecture.
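The key difference from plain D2D is that one catalog knows about both copies. The following hypothetical sketch (class and method names are invented for illustration) records a backup set on the disk stage, adds the tape location when the second stage copies it, allows the disk copy to expire only after a tape copy exists, and chooses a restore source: the disk stage when the copy is still there, otherwise tape directly.

```python
from __future__ import annotations


class D2D2TCatalog:
    """Hypothetical single catalog tracking backup sets across disk and tape."""

    def __init__(self) -> None:
        self.locations: dict[str, set[str]] = {}

    def backup_to_disk(self, backup_set: str) -> None:
        # First stage: the backup lands on the disk stage.
        self.locations.setdefault(backup_set, set()).add("disk")

    def stage_to_tape(self, backup_set: str, tape_label: str) -> None:
        # Second stage of the same operation: copy to tape, record the location.
        if "disk" not in self.locations.get(backup_set, set()):
            raise KeyError(f"{backup_set!r} is not on the disk stage")
        self.locations[backup_set].add(f"tape:{tape_label}")

    def expire_from_disk(self, backup_set: str) -> None:
        # Free disk space only when a tape copy is already cataloged.
        locs = self.locations[backup_set]
        if not any(loc.startswith("tape:") for loc in locs):
            raise RuntimeError(f"{backup_set!r} has no tape copy yet")
        locs.discard("disk")

    def restore_source(self, backup_set: str) -> str:
        # Prefer the fast disk copy; otherwise restore straight from tape.
        locs = self.locations.get(backup_set, set())
        if "disk" in locs:
            return "disk"
        tapes = sorted(loc for loc in locs if loc.startswith("tape:"))
        if not tapes:
            raise KeyError(f"no copy of {backup_set!r}")
        return tapes[0]
```

Because both stages update the same catalog, a restore never has to consult a second, independent backup system to find the data.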

Integrated Disk and Tape

Another approach to using disk for backup is to create a single-level store for backup that integrates both disk and tape as common backup storage resources. The idea of a single-level store incorporates all backup storage, both disk and tape, as an abstract, virtual storage layer that can be used during backup. Policy-based management determines how the disk and tape storage resources are used.

In general, recent, high-priority backup data is stored on disk, and historical versions of data are stored on tape. The integrated disk and tape backup system automatically transfers data between disk and tape resources. Media is managed as a single integrated set of resources. Restores can be made from either disk or tape resources. Integrated solutions are more advanced and require a higher level of integration with backup software to be able to use and manage the virtualized resources provided by hardware.
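Policy-based placement can be illustrated with a simple age-and-priority rule. This is a sketch under assumed policy values (14 days on disk for ordinary backups, 30 days for high-priority ones); real products express their policies in their own terms, and every name here is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    disk_days: int = 14           # hypothetical: ordinary backups stay on disk two weeks
    priority_disk_days: int = 30  # hypothetical: high-priority backups stay a month


@dataclass(frozen=True)
class BackupSet:
    name: str
    taken: date
    high_priority: bool
    tier: str  # current location: "disk" or "tape"


def target_tier(bset: BackupSet, today: date, policy: Policy) -> str:
    """Where the policy says this backup set should live today."""
    limit = policy.priority_disk_days if bset.high_priority else policy.disk_days
    age_days = (today - bset.taken).days
    return "disk" if age_days <= limit else "tape"


def due_for_migration(sets: list[BackupSet], today: date, policy: Policy) -> list[str]:
    """Names of backup sets still on disk that the policy says belong on tape."""
    return [s.name for s in sets if s.tier == "disk" and target_tier(s, today, policy) == "tape"]
```

An integrated system would run a check like `due_for_migration` periodically and move the listed sets to tape automatically, keeping the catalog as the single record of where each set lives.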

Are Snapshots Backup?

Chapter 17 discusses a technology called point-in-time copy, also called data snapshots. Point-in-time copies of data provide both duplicate and delta redundancy and are used to restore individual files as well as for disaster recovery.

So the question is this: Should point-in-time copies be considered backup technologies since they can provide many of the same functions? In this book they are not, although in time they may be thought of as different variations of the same larger storage technology.

The processes and methods used by a technology do matter in differentiating between them. Considerably different processes and methods are used in legacy backup technologies and point-in-time snapshots. Backup is a process that runs over an extended period of time to make copies of data. Point-in-time snapshots are processed instantaneously to create redundancy.

The results also matter a great deal. Backup is very well suited to making historical copies of files, whereas point-in-time snapshot technology is mainly suited to disaster recovery. The two are often used together by IT organizations using a modified disk-to-disk-to-tape approach. First, point-in-time snapshots are used to make disk-to-disk copies, and then these snapshot images are backed up with a backup system.

It's clear that these two different functions can be integrated into a single data management solution that provides both instantaneous images of data as well as historical copies. These larger integrated solutions probably merit a new name or classification. The moniker ILM (Information Life Cycle Management) looks like it is the front-runner, although the ambiguity of this term could force the creation of a new term and classification.




Storage Networking Fundamentals: An Introduction to Storage Devices, Subsystems, Applications, Management, and File Systems (Vol 1)
ISBN: 1587051621
EAN: 2147483647
Year: 2006
Pages: 184
Authors: Marc Farley
