It might not happen today or tomorrow, but someday you will lose a hard drive containing essential network data. The drive might be stolen along with the computer, destroyed in a fire or other catastrophe, or simply fail. Whatever happens, the data is gone and it's up to you, as the network administrator, to get it back. The day this occurs is the day you will thank yourself for all the effort you took to set up a network backup strategy. If you don't have a backup strategy in place, it might be the day you start working on your résumé.
Backups are simply copies of your data that you make on a regular basis, so that if a storage device fails or is damaged and the data stored there is lost, you can restore it in a timely manner. A backup is the ultimate fault-tolerance measure. Even if you have other storage technologies in place that provide fault tolerance, such as mirrored disks or a redundant array of independent disks (RAID), you still need a backup solution. Networks both complicate and simplify the process of making regular backups. The process is more complicated because you have data stored on multiple devices that must be protected, and it is simpler because you can use the network to access those devices. A network backup strategy specifies what data you back up, how often you back it up, and what medium you use to store the backups. The decisions you make regarding the backup hardware, software, and administrative policies you will use depend on how much data you have to back up, how much time you have to back it up, and how much protection you want to provide.
You can perform backups using any type of storage device. One objective in developing an effective backup strategy, however, is to automate as much of the process as possible. Although you can back up 1 gigabyte (GB) of data onto 1.44-MB floppy disks, you probably don't want to be the person sitting around feeding 695 disks into a floppy drive. Therefore, you should select a device that is capable of storing all of your data without frequent media changes. This enables you to schedule backup jobs to run unattended. This doesn't mean, however, that you have to purchase a drive that can hold all of the data stored on all of your network's computers. You can be selective about which data you want to back up, so it's important to determine just how much of your data needs protecting before you decide on the capacity of your backup device.
Another important criterion to use when selecting a backup device is the speed at which the drive writes data to the medium. Backup drives are available in many different speeds, and, not surprisingly, the faster ones are generally more expensive. It is typical for backup jobs to run during periods when the network is not otherwise in use. This ensures that all of the data on the network is available for backup. The amount of time that you have to perform your backups is sometimes called the backup window. The backup device that you choose should depend in part on the amount of data you have to protect and the amount of time that you have to back it up. If, for example, you have 10 GB of data to back up and your company closes down from 5:00 P.M. until 9:00 A.M. the next morning, you have a 16-hour backup window—plenty of time to copy your data, using a medium-speed backup device. However, if your company operates three shifts and only leaves you one hour, from 7:00 A.M. to 8:00 A.M., to back up 100 GB of data, you will have to use a much faster device or, in this case, several devices.
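The backup-window arithmetic above can be checked with a short sketch. The function names are hypothetical, times are written in 24-hour form, and 1 GB is taken as 1000 MB; the result is the sustained rate a device (or group of devices) would need, not a measurement of any real drive.

```python
from datetime import datetime, timedelta


def window_hours(start: str, end: str) -> float:
    """Length of a backup window in hours, given 24-hour "HH:MM" times.

    An end time at or before the start time is treated as falling on the next day.
    """
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 <= t0:
        t1 += timedelta(days=1)
    return (t1 - t0).total_seconds() / 3600


def required_mb_per_min(data_gb: float, hours: float) -> float:
    """Sustained rate needed to copy data_gb within the window (assumes 1 GB = 1000 MB)."""
    return data_gb * 1000 / (hours * 60)


overnight = window_hours("17:00", "09:00")            # the 16-hour window from the text
print(round(required_mb_per_min(10, overnight), 1))   # 10 GB overnight
print(round(required_mb_per_min(100, window_hours("07:00", "08:00"))))  # 100 GB in one hour
```

The first case works out to roughly 10 MB per minute, while the second needs well over 1,600 MB per minute, which is why the one-hour window calls for a much faster device or several devices.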
Cost is always a factor in selecting a hardware product. You can purchase a low-end backup drive for $100 to $200, which is suitable for backing up a home computer where speed isn't a major factor. However, when you move up to drives with the speed and capacity needed for network backups, prices rise steeply. High-end backup drives can command prices that run into five figures. When you evaluate backup devices, you must be aware of the product's extended costs as well. Backup devices nearly always use a removable medium, such as a tape or disk cartridge. This enables you to store copies of your data off site, such as in a bank's safe deposit vault. If the building where your network is located is destroyed by a fire or other disaster, you still have your data, which you can use to restart operations elsewhere. Therefore, in addition to purchasing the drive, you must purchase storage media as well. Some products might seem at first to be economical because the drive is inexpensive, but in the long run they might not be, because the media are so expensive. One of the most common methods of evaluating backup devices is to determine the cost per megabyte of the storage they provide. Divide the price of the medium by the number of megabytes it can store, and use this figure to compare the relative cost of various devices. Of course, in some cases you might need to sacrifice economy for speed or capacity.
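As a minimal sketch of the cost-per-megabyte comparison, the function below divides the media price by its capacity. The function name is hypothetical, 1 GB is taken as 1000 MB, and the tape price is an invented figure for illustration only; the $125, 2-GB cartridge is the Jaz example quoted later in this chapter.

```python
def cost_per_mb_cents(media_price_usd: float, capacity_mb: float) -> float:
    """Media cost in cents per megabyte: price divided by capacity."""
    return media_price_usd * 100 / capacity_mb


# Jaz cartridge figures from this chapter: $125 for 2 GB.
jaz = cost_per_mb_cents(125, 2000)

# Hypothetical tape cartridge, assumed here to cost $40 and hold 20 GB.
tape = cost_per_mb_cents(40, 20000)

print(f"Jaz: {jaz:.2f} cents/MB, tape: {tape:.2f} cents/MB")
```

Comparing devices this way considers only the media; the price of the drive itself and the speed you need still have to be weighed separately.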
The most common hardware device used to back up data is a magnetic tape drive, like the one shown in Figure 16.1. Unlike hard disk, floppy disk, and CD-ROM drives, tape drives are not random access devices. This means that you can't simply move the drive heads to a particular file on a backup tape without spooling through all of the files before it. As with other types of tape drives, such as audio and video, the drive unwinds the tape from a spool and pulls it across the heads until it reaches the point in the tape where the data you want is located. As a result, you can't mount a tape drive in a computer's file system, assign it a drive letter, and copy files to it, as you can with a hard disk drive. A special software program is required to address the drive and send the data you select to it for storage. This also means that tape drives are useless for anything other than backups, whereas other media, such as writable CD-ROMs, can be used for other things.
Figure 16.1 An external magnetic tape drive
Magnetic tape drives are well suited for backups; they're fast, they can hold a lot of data, and their media cost per megabyte is low, often less than one-half cent per megabyte. There are many different types of magnetic tape drives that differ greatly in speed, capacity, and price. At the low end are quarter-inch cartridge (QIC) drives, which can cost as little as $200. A single QIC tape cartridge can hold anywhere from 150 MB to 20 GB. At the high end are digital linear tape (DLT) and linear tape-open (LTO) drives, which can cost many thousands of dollars and store as much as 100 GB on a single tape. The most common magnetic tape technologies used for backups are listed in Table 16.1.
Table 16.1 Magnetic Tape Technologies
|Type|Tape Width|Cartridge Size|Capacity (uncompressed)|Speed|
|---|---|---|---|---|
|Quarter-inch cartridge (QIC)|0.25 inch|4 × 6 × 0.625 inches (data cartridge); 3.25 × 2.5 × 0.6 inches (minicartridge)|Up to 20 GB|2 to 120 MB/min|
|Digital audio tape (DAT)|4 mm|2.875 × 2.0625 × 0.375 inches|Up to 20 GB|3 to 144 MB/min|
|8 mm|8 mm|3.7 × 2.44 × 0.59 inches|Up to 60 GB|Up to 180 MB/min|
|Digital linear tape (DLT)|0.5 inch|4.16 × 4.15 × 1 inches|Up to 40 GB|Up to 360 MB/min|
|LTO, Ultrium media|0.5 inch|4.0 × 4.16 × 0.87 inches|Up to 100 GB|Up to 1920 MB/min|
The capacities of magnetic tape drives are generally specified using two figures, such as 40 GB to 80 GB. These numbers refer to the capacity of a tape without compression and with compression. Most tape drives have hardware-based data compression capabilities built into them, but the additional capacity that you achieve when using compression is based on the type of data you are storing. The capacity figures assume an average compression ratio of 2:1. Some types of files, such as image files using uncompressed BMP or TIF formats, can compress at much higher ratios, as high as 8:1. Files that are already compressed, such as GIF or JPG image files or ZIP archives, cannot be compressed further and are stored at a 1:1 compression ratio.
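To see how the mix of data affects the compressed capacity figure, the sketch below estimates how much original data fits on a tape when different portions of the data compress at different ratios. The function name and the 50/50 mix are assumptions for illustration; real drives compress block by block, so actual results vary.

```python
def effective_capacity_gb(native_gb, mix):
    """Estimate how much original data fits on a tape of native_gb.

    mix is a list of (fraction_of_data, compression_ratio) pairs whose
    fractions sum to 1; a ratio of 2.0 means 2:1 compression.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    # Tape consumed by each gigabyte of original data.
    tape_per_gb = sum(fraction / ratio for fraction, ratio in mix)
    return native_gb / tape_per_gb


# Everything compresses at the assumed average of 2:1.
print(effective_capacity_gb(40, [(1.0, 2.0)]))

# Half the data is already compressed (1:1), half compresses at 2:1.
print(round(effective_capacity_gb(40, [(0.5, 1.0), (0.5, 2.0)]), 1))
```

With the uniform 2:1 ratio, a 40-GB tape holds 80 GB, matching the paired capacity figures; when half the data is already compressed, the same tape holds noticeably less.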
Backup devices can use any of the standard computer interfaces, such as Integrated Drive Electronics (IDE), universal serial bus (USB), and Small Computer System Interface (SCSI). Some backup drives even connect to the computer's parallel port, although this is just a form of SCSI that uses a different port. The most common interface used in high-end network backup solutions is SCSI.
SCSI devices operate more independently than IDE devices, which makes the backup process more efficient, because a backup often entails reading from one device while writing to another on the same interface. When multiple IDE devices share a channel, only one operates at a time. Each drive must receive, execute, and complete a command before the other drive can receive its next command. On the other hand, SCSI devices can maintain a queue of commands that they have received from the host adapter and execute them sequentially and independently.
Magnetic tape drives, in particular, require a consistent stream of data to write to the tape with maximum effectiveness. If there are constant interruptions in the data stream, as can be the case with the IDE interface, the tape drive must repeatedly stop and start the tape, which reduces its speed and its overall storage capacity. A SCSI drive can often operate continuously without pausing to wait for the other devices on the channel.
A SCSI backup device is always more expensive than a comparable IDE alternative, because the drive requires additional electronics and because you must have a SCSI host adapter installed in the computer. Most SCSI devices are available as internal or external units, the latter of which have their own power supplies, which also adds to the cost. However, the additional expense is worth it for a reliable network backup solution.
The popularity of writable CD drives, such as compact disc-recordable (CD-R) and compact disc-rewritable (CD-RW) drives, has led to their increasing use as backup devices. Although the capacity of a CD is limited to approximately 650 MB, the low cost of the media makes CDs an economical solution, even if the discs can be used only once, as is the case with CD-Rs. The biggest factor in favor of CDs for backup is that many computers already have writable CD drives installed for other purposes, eliminating the need to purchase a dedicated backup drive.
For network backups, however, CDs are usually inadequate. Most networks have multiple gigabytes of data to back up, which would require many disc changes. In addition, CD-R and CD-RW drives are usually not recognized by network backup software products. Although these drives often come with software that provides its own backup capabilities (intended for relatively small, single-system backups), this software usually does not provide the features needed for backing up a network effectively.
Another storage device commonly found in computers these days that can easily be used for backups is the removable cartridge drive. Products like Iomega's Zip and Jaz drives provide performance that approaches that of a hard disk drive, but they use removable cartridges. These drives mount into a computer's file system, meaning that you can assign them a drive letter and copy files to them just as with a hard drive.
Zip cartridges hold only 100 MB or 250 MB, which makes them less practical than CDs for backups. However, Jaz drives are available in 1-GB and 2-GB versions, which is sufficient for a backup device. The drawback of using this type of drive for backup purposes is the extremely high cost of the media. A 2-GB Jaz cartridge can cost $125 or more, which is more than 6 cents per megabyte, far more than virtually any other storage device.
In some cases, even the highest capacity drive isn't sufficient to back up a large network with constantly changing data. To create an automated backup solution with a greater capacity than that provided by a single drive, you can purchase a device called an autochanger. An autochanger, shown in Figure 16.2, is a unit that contains one or more drives (usually tape drives, but optical disk and CD-ROM autochangers are also available) and a robotic mechanism that swaps the media in and out of the drives. Sometimes these devices are called jukeboxes or tape libraries. When a backup job fills one tape (or other storage medium), the mechanism extracts it from the drive and inserts another, after which the job continues. The autochanger also retains a memory of which tapes are available, commonly called an index, and can load the appropriate tape to perform a restore job.
Figure 16.2 A tape autochanger
Some autochangers are small devices with a single drive and an array that holds four or five tapes, whereas others are enormous devices with as many as four drives and an array of 100 tapes or more. If you purchase a large enough autochanger, you can create a long-term backup strategy that enables backups to run completely unattended for weeks at a time. However, before you solidify your plans to get a refrigerator-sized autochanger and never load a tape into a drive again, be aware that the cost of these devices can be astonishingly high, reaching as much as six figures in some cases.
Apart from the hardware, the other primary component in a network backup solution is the software that you use to perform the backups. Storage devices designed for use as backup solutions are not treated like the other storage subsystems in a computer; a specialized software product is required to package the data that you want to back up and send it to the drive. Depending on the operating system you're using, you might already have a backup program that you can use with your drive, but in many cases an operating system's own backup program provides only basic functionality and lacks features that can be especially useful in a network environment.
The primary functions of a good backup software product are examined in the following sections.
The most basic function of a backup software program is to let you select what you want to back up, which is sometimes called the target. A good backup program enables you to do this in many ways. You can select entire computers to back up, specific drives on those computers, specific directories on the drives, or specific files in specific directories. Most backup programs provide a directory tree display that you can use to select the targets for a backup job. Figure 16.3 shows the interface that the Microsoft Windows 2000 Backup program uses to select backup targets.
Figure 16.3 The Backup dialog box in the Windows 2000 Backup program
In most cases, it isn't necessary to back up all of the data on a computer's drives. If a hard drive is completely erased or destroyed, you are likely to have to reinstall the operating system before you can restore files from a backup tape, so it might not be worthwhile to back up all of the operating system files each time you run a backup job. The same is true for applications. You can reinstall an application from the original distribution media, so you might want to back up only your data files and configuration settings for that application. In addition, most operating systems today create temporary files as they run, which you do not need to back up. Windows, for example, creates a temporary file for memory paging that can be several hundred megabytes in size. Because this file is re-created each time you start the computer, you can save space on your backup tapes by omitting files like this from your backup jobs. Judicious selection of backup targets can mean the difference between fitting an entire backup job onto one tape or staying late after work to insert a second tape into the drive.
Individually selecting the files, directories, and drives that you want to back up can be quite tedious, though, so many backup programs provide other ways to specify targets. One common method is to use filters that enable the software to evaluate each file and directory on a drive and decide whether to back it up. A good backup program provides a variety of filters that allow you to select targets based on file and directory names, extensions, sizes, dates, and attributes. For example, you can configure the software to back up a computer running Windows 2000 and use filters to exclude PAGEFILE.SYS, which is the memory paging file; the \Temporary Internet Files directories, which contain Microsoft Internet Explorer's browser cache; and all files with a .tmp extension, which are temporary files created by various applications. None of these files is needed when restoring the system from a backup tape, so saving them serves no purpose, and together they can consume a significant amount of tape.
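The exclusion filters described above might be sketched as follows. This is an illustration of the idea only, not how any particular backup product implements filtering; the function name, the constants, and the sample paths are all hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PureWindowsPath

# Exclusions taken from the example in the text (compared in lowercase).
EXCLUDED_NAMES = {"pagefile.sys"}
EXCLUDED_DIRS = {"temporary internet files"}
EXCLUDED_PATTERNS = ["*.tmp"]


def should_back_up(path: str) -> bool:
    """Return False for files that match any of the exclusion filters."""
    p = PureWindowsPath(path)
    name = p.name.lower()
    if name in EXCLUDED_NAMES:
        return False
    # Skip anything stored beneath an excluded directory.
    if any(part.lower() in EXCLUDED_DIRS for part in p.parts[:-1]):
        return False
    if any(fnmatch(name, pattern) for pattern in EXCLUDED_PATTERNS):
        return False
    return True


candidates = [
    r"C:\PAGEFILE.SYS",
    r"C:\Work\report.doc",
    r"C:\Work\~draft.TMP",
]
print([c for c in candidates if should_back_up(c)])
```

A real product would typically combine filters like these with size, date, and attribute tests before deciding which files to send to the drive.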
You can also use filters to limit your backups to only files that have changed recently, using either date or attribute filters. The most common type of filter used by backup programs is the one for the archive attribute, which enables the software to back up only the files that have changed since the last backup. This filter is the basis for incremental and differential backups.
The most basic type of backup job is a full backup, which copies the entire contents of a computer's drives either to tape or to another medium. You can perform a full backup every day, if you want to, or each time that you back up that particular computer. However, this practice can be wasteful, both in terms of time and tape. When you perform a full backup every day, the majority of the files you are writing to the tape are exactly the same as they were yesterday. The program files that make up the operating system and your applications do not change. The only files that change on a regular basis are your data files and perhaps the files that store configuration data, along with special resources like the Windows Registry and directory service databases.
To save on tape and shorten the backup time, many network administrators perform full backups only once a week, or even less frequently. In between the full backups, they perform special types of filtered jobs that back up only the files that have recently been modified. These types of jobs are called incremental backups and differential backups.
An incremental backup is a job that backs up only the files changed since the last backup job of any kind. A differential backup is a job that backs up only the files that have changed since the last full backup. The backup software filters the files for these jobs using a special file attribute called the archive bit, which every file on the computer possesses. File attributes are 1-bit flags, stored with each file on a drive, that perform various functions. For example, the read-only bit, when activated, prevents any application from modifying that particular file, and the hidden bit prevents most applications from displaying that file in a directory listing. The archive bit for a file is activated by any application that modifies that file. When the backup program scans the target drive during an incremental or differential job, it selects for backup only the files with active archive bits.
During a full backup, the software backs up the entire contents of a computer's drives, and also resets (that is, removes) the archive bit on all of the files. Immediately after the job is completed, you have a complete copy of the drives on tape, and none of the files on the target drive has an active archive bit. As work on the computer proceeds after the backup job is completed, applications and operating system processes modify various files on the computer, and when they do, they activate the archive bits for those files. The next day, you can run an incremental or differential backup job, which is also configured to back up the entire computer, except that it filters out all files that do not have an active archive bit. This means that all of the program files that make up the operating system and the applications are skipped, along with all data files that have not changed. When compared to a full backup, an incremental or differential backup job is usually much smaller, so it takes less time and less tape.
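The archive-bit mechanism can be modeled in a few lines. The sketch below represents each file's archive bit as a boolean in a dictionary; the function names and file names are hypothetical, and the `reset` flag simply controls whether a filtered job clears the bits of the files it copies.

```python
def full_backup(archive_bits):
    """Copy every file and clear every archive bit."""
    copied = sorted(archive_bits)
    for name in archive_bits:
        archive_bits[name] = False
    return copied


def filtered_backup(archive_bits, reset):
    """Copy only files whose archive bit is set; optionally clear those bits."""
    copied = sorted(name for name, bit in archive_bits.items() if bit)
    if reset:
        for name in copied:
            archive_bits[name] = False
    return copied


def modify(archive_bits, name):
    """Model an application changing a file, which sets its archive bit."""
    archive_bits[name] = True


files = {"winword.exe": True, "budget.xls": True, "memo.doc": True}
print(full_backup(files))                   # all three files are copied
modify(files, "budget.xls")                 # next day's work changes one file
print(filtered_backup(files, reset=False))  # only the changed file is copied
```

Because the program file and the untouched document no longer have active archive bits after the full job, the filtered job skips them.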
The difference between an incremental and a differential job lies in the behavior of the backup software when it either resets or does not reset the archive bits of the files it copies to tape. Incremental jobs reset the archive bits, and differential jobs don't. This means that when you run an incremental job, you're only backing up the files that have changed since the last backup, whether it was a full backup or an incremental backup. This uses the least amount of tape, but it also lengthens the restore process. If you should have to restore an entire computer, you must first perform a restore from the last full backup tape, and you must then restore each of the incremental jobs performed since the last full backup. For example, suppose that you run a full backup job on a particular computer every Monday evening and incremental jobs every evening from Tuesday through Friday. If the computer's hard drive fails on a Friday morning, you must restore the previous Monday's full backup, and you must then restore the incremental jobs from Tuesday, Wednesday, and Thursday, in that order. The order of the restore jobs is essential if you want the computer to have the latest version of every file.
Differential jobs do not reset the archive bit on the files they back up. This means that every differential job backs up all of the files that have changed since the last full backup. If you perform a full backup on Monday evening, Tuesday evening's differential job will back up all files changed on Tuesday, Wednesday evening's differential job will back up all files changed on Tuesday and Wednesday, and Thursday evening's differential backup will back up all files changed on Tuesday, Wednesday, and Thursday. Differential backups use more tape, because some of the same files are backed up each day, but differential backups also simplify the restore process. To completely restore the computer that failed on a Friday morning, you only have to restore Monday's full backup tape and the most recent differential backup, which was performed Thursday evening. Because the Thursday tape includes all of the files modified on Tuesday, Wednesday, and Thursday, no other tapes are needed. The archive bits for these changed files are not reset until the next full backup job is performed.
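The two restore procedures can be summarized as a small planning function. This is a sketch under a simplifying assumption: every job after the most recent full backup is of the same type. The function name and tape labels are hypothetical.

```python
def restore_plan(jobs):
    """Return the tape labels to restore, in order.

    jobs is a list of (label, kind) pairs, oldest first, where kind is
    "full", "incremental", or "differential". This sketch assumes that all
    jobs after the most recent full backup are of a single kind.
    """
    full_positions = [i for i, (_, kind) in enumerate(jobs) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup to restore from")
    start = full_positions[-1]
    plan = [jobs[start][0]]
    later = jobs[start + 1:]
    kinds = {kind for _, kind in later}
    if kinds == {"incremental"}:
        # Every incremental since the full backup, oldest first.
        plan += [label for label, _ in later]
    elif kinds == {"differential"}:
        # Only the most recent differential is needed.
        plan.append(later[-1][0])
    elif kinds:
        raise ValueError("mixed or unknown job types are outside this sketch")
    return plan


days = ["Tue", "Wed", "Thu"]
incremental_week = [("Mon", "full")] + [(d, "incremental") for d in days]
differential_week = [("Mon", "full")] + [(d, "differential") for d in days]
print(restore_plan(incremental_week))
print(restore_plan(differential_week))
```

For the Friday-morning failure in the examples, the incremental week requires four tapes restored in order, while the differential week requires only Monday's and Thursday's.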
Running incremental or differential jobs is often what makes it possible to automate your backup regimen without spending too much on hardware. If your full backup job totals 50 GB, for example, you might be able to purchase a 20-GB drive. You'll have to manually insert two additional tapes during your full backup jobs, once a week, but you should be able to run incremental or differential jobs the rest of the week using only one tape, which means that the jobs can run unattended.
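The tape count in this example is a ceiling division. In the sketch below, the function names are hypothetical, and the 6-GB nightly job size is an assumed figure chosen only to show a job that fits on one tape.

```python
import math


def tapes_needed(job_gb: float, tape_gb: float) -> int:
    """Number of tapes a job of job_gb occupies on tapes of tape_gb each."""
    return math.ceil(job_gb / tape_gb)


def manual_swaps(job_gb: float, tape_gb: float) -> int:
    """Tape changes someone must make during the job (none if it fits on one tape)."""
    return tapes_needed(job_gb, tape_gb) - 1


print(tapes_needed(50, 20), manual_swaps(50, 20))  # weekly full backup from the text
print(tapes_needed(6, 20), manual_swaps(6, 20))    # assumed 6-GB nightly job
```

A job can run unattended on a single drive only when the number of manual swaps is zero.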
Run the Backups video located in the Demos folder on the CD-ROM accompanying this book for a demonstration of incremental and differential backups.
When you have selected what you want to back up, the next step is to specify where to send the selected data. The backup software typically enables you to select a backup device (if you have more than one) and prepare to run the job by configuring the drive and the storage medium. For backup to a tape drive, this part of the process can include any of the following tasks:
All backup products enable you to create a backup job and execute it immediately, but the key to automating a backup routine is being able to schedule jobs to execute unattended. This way, you can configure your backup jobs to run when the office is closed and the network is idle, so that all resources are available for backup and user productivity is not compromised by a sudden surge of network traffic. Not all of the backup programs supplied with operating systems or designed for stand-alone computers support scheduling, but all network backup software products do.
Backup programs use various methods to automatically execute backup jobs. The Windows 2000 Backup program uses the operating system's Task Scheduler application, and other programs supply their own program or service that runs continuously and triggers the jobs at the appropriate times. Some of the higher end network backup products can use a directory service, such as Microsoft's Active Directory service or Novell Directory Services (NDS). These programs modify the directory schema (the code that specifies the types of objects that can exist in the directory) to create an object representing a queue of jobs waiting to be executed.
For more information about enterprise directory services such as Active Directory and NDS, see Lesson 3: "Directory Services," in Chapter 4, "Networking Software."
No matter which mechanism the backup software uses to launch jobs, the process of scheduling them is usually the same. You specify whether you want to execute the job once or repeatedly at a specified time each day, week, or month, using an interface like that shown in Figure 16.4. The idea of the scheduling feature is for the network administrator to create a logical sequence of backup jobs that execute by themselves at regular intervals. Once the schedule is in place, the only remaining task is changing the tape in the drive each day. If you have an autochanger, you can even eliminate this part of the job and create a backup job sequence that runs for weeks or months without any attention at all.
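As an illustration of what a recurring schedule amounts to, the sketch below lists the next run times for a job that executes at a fixed time on selected days of the week. It does not represent Task Scheduler or any backup product's scheduler; the function name, the 11:00 P.M. start time, and the dates are assumptions.

```python
from datetime import datetime, time, timedelta


def next_runs(after, run_at, weekdays, count):
    """Return the next `count` run times later than `after`.

    weekdays is a set of integers using Python's convention
    (0 = Monday ... 6 = Sunday); run_at is the time of day to start.
    """
    if not weekdays:
        raise ValueError("at least one weekday is required")
    runs = []
    day = after.date()
    while len(runs) < count:
        candidate = datetime.combine(day, run_at)
        if candidate > after and day.weekday() in weekdays:
            runs.append(candidate)
        day += timedelta(days=1)
    return runs


# Weeknight jobs at 11:00 P.M., starting from noon on Friday, January 5, 2024.
for run in next_runs(datetime(2024, 1, 5, 12, 0), time(23, 0), {0, 1, 2, 3, 4}, 3):
    print(run.isoformat())
```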
Figure 16.4 The Windows 2000 Backup program's Schedule Job dialog box
When a backup job runs, the software accesses the specified targets and feeds the data to the backup drive in the appropriate manner. Because of the nature of the media typically used for backups, it is important for the data to arrive at the storage device in a consistent manner and at the proper rate of speed. The software, therefore, must be designed to address specific drives in the manner appropriate for that device.
As the software feeds the data to the drive, it also keeps track of its own activities. Most backup software products can maintain a log of the backup process as it occurs. You can often specify a level of detail for the log, such as whether it should contain a complete list of every file backed up or just record the major events that occur during the job. Periodically checking the logs is an essential part of administering a network backup program. The logs tell you when selected files are skipped for any reason, for example, if the files are locked open by an application or if the computers on which they are stored are turned off. The logs also let you know when errors occur on either the backup drive or one of the computers involved in the backup process. Some software products can generate alerts when errors occur, notifying you by sending a status message to a network management console, by sending you an e-mail message, or by other methods.
It is also important to keep an eye on the size of your log files, particularly when you configure them to maintain a high level of detail. These files can grow huge very quickly, and can consume all of the available disk space on the drive on which they are stored.
In addition to logging their activities, backup software programs also catalog the files they back up, thus facilitating the process of restoring files later. The catalog is essentially a list of every file that the software has backed up during each job. To restore files from the backup medium, you browse through the catalog and select the files, directories, or drives that you want to restore. Different backup software products store the catalog in different ways. The Windows 2000 Backup program, for example, stores the catalog for each tape on the tape itself. The problem with this method is that you have to insert a tape into the drive to read the catalog and browse the files on that tape.
More elaborate network backup software programs take a different approach by maintaining a database of the catalogs for all of your backup tapes on the computer where the backup device is installed. This database enables you to browse through the catalogs for all of your tapes and select any version of any file or directory for restoration. In some cases, you can view the contents of the database in several different ways, such as by the computer, drive, and directory where the files were originally located, by the backup job, or by the tape or other media name. After you make your selection, the program specifies which tape contains the file or directory; insert it into the drive, and the job proceeds.
The database feature can use a lot of the computer's disk space and processor cycles, but it greatly enhances the usability of the software, particularly in a network environment.
Backup software products that rely on a database typically store a copy of the database on your tapes as well as on the computer's hard drive. This is done so that if the computer you use to run the backups should suffer a drive failure, you can restore the database later.
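A catalog database of this kind can be pictured as an index from each file path to the jobs and tapes that hold a copy of it. The class below is a minimal in-memory sketch; the class name, method names, tape labels, and paths are hypothetical, and a real product would persist this data on disk and on tape.

```python
from collections import defaultdict


class Catalog:
    """Index of which tape holds which version of each backed-up file."""

    def __init__(self):
        self._versions = defaultdict(list)

    def record_job(self, tape, job_date, paths):
        """Note that a job run on job_date (ISO format) wrote paths to tape."""
        for path in paths:
            self._versions[path.lower()].append((job_date, tape))

    def versions(self, path):
        """All stored versions of a file as (job_date, tape), oldest first."""
        return sorted(self._versions.get(path.lower(), []))

    def tape_for_latest(self, path):
        """Label of the tape holding the newest copy, or None if never backed up."""
        found = self.versions(path)
        return found[-1][1] if found else None


catalog = Catalog()
catalog.record_job("FULL-01", "2024-01-01", [r"C:\Data\budget.xls", r"C:\Data\memo.doc"])
catalog.record_job("INCR-02", "2024-01-02", [r"C:\Data\budget.xls"])
print(catalog.tape_for_latest(r"C:\Data\budget.xls"))
print(catalog.versions(r"C:\Data\memo.doc"))
```

Looking up a file this way tells you which tape to insert without having to read the catalog stored on each tape in turn.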
Some network administrators use new tapes for every backup job and store them all permanently. However, this can become extremely expensive. It's more common for administrators to reuse backup tapes. To do this properly, however, you must have a carefully wrought media rotation scheme, so that you don't inadvertently reuse a tape you'll need later. You can always create such a scheme yourself, but some backup software products do it for you. One of the most common media rotation schemes is called Grandfather-Father-Son, which refers to backup jobs that run monthly, weekly, and daily. You have one set of tapes for your daily jobs, which you reuse every week; a set of weekly tapes, which you reuse every month; and a set of monthly tapes, which you reuse each year. There are other schemes that vary in complexity and utility, depending on the software product.
When the software program implements the rotation scheme, it provides a basic schedule for the jobs (which you can modify to have the jobs execute at specific times of the day), tells you what name to write on each tape as you use it and, once you begin to reuse tapes, tells you which tape to put in the drive for each job. The end result is that you maintain a perpetual record of your data while using the minimum number of tapes without fear of overwriting a tape you need.
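A Grandfather-Father-Son rotation can be sketched as a function that maps a date to a tape label. The details below are assumptions, not part of any product: jobs run Monday through Friday only, the weekly (father) job runs on Friday, the monthly (grandfather) job runs on the last Friday of the month, and the label format is invented.

```python
from datetime import date, timedelta

DAILY_NAMES = ["Mon", "Tue", "Wed", "Thu"]


def gfs_tape(day: date):
    """Return the tape label to use on `day`, or None when no job runs."""
    weekday = day.weekday()          # 0 = Monday ... 6 = Sunday
    if weekday > 4:
        return None                  # assumed: no weekend jobs
    if weekday < 4:
        return f"Daily-{DAILY_NAMES[weekday]}"   # son: reused every week
    # Friday: the last Friday of the month gets the monthly tape.
    if (day + timedelta(days=7)).month != day.month:
        return f"Monthly-{day.month:02d}"        # grandfather: reused each year
    return f"Weekly-{(day.day - 1) // 7 + 1}"    # father: reused every month


for d in (date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 19), date(2024, 1, 26)):
    print(d.isoformat(), gfs_tape(d))
```

Under these assumptions the scheme needs four daily tapes, up to four weekly tapes, and twelve monthly tapes per job, provided each job fits on one tape.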
Restoring data from your backups is, of course, the sole reason for making them in the first place. The ease with which you can locate the files you need to restore is an important feature of any backup software product. It is absolutely essential that you perform periodic test restores from your backup tapes or other media to ensure that you can get back any data that is lost. Even if all your jobs complete successfully and your log files show that all of your data has been backed up, there is no better test of a backup system than an actual restore. There are plenty of horror stories of network administrators who dutifully perform their backups every day for a year, only to find out when disaster strikes that all their carefully labeled tapes are blank due to a malfunctioning drive.
Although making regular backups is usually thought of as protection against a disaster that causes you to lose an entire hard drive, the majority of the restore jobs you will perform in a network environment are of one or a few files that a user has inadvertently deleted. As mentioned earlier, the program's cataloging capability is a critical part of the restoration process. If a user needs to have one particular file restored and you have to insert tape after tape into the drive to locate it, everyone's time is wasted. A backup program with a database that lets you search for that particular file makes your job much easier and enables you to restore any file in minutes.
Restore jobs are similar to backup jobs, in that you typically select the files or directories that you want to restore, using an interface like that shown in Figure 16.5. You then specify whether you want to restore the files to the locations they originally came from or to another location. If you restore them to a different location, you can usually configure the software to place all of the restored files into one directory or re-create the directory structure from which the files were backed up.
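The three destination choices described above can be expressed as a path-mapping function. This is an illustrative sketch only; the function name, its parameters, and the sample paths are hypothetical.

```python
from pathlib import PureWindowsPath


def restore_target(original, alternate=None, keep_tree=True):
    """Work out where a restored file should be written.

    With no alternate location, the file returns to its original path.
    With an alternate location, keep_tree controls whether the original
    directory structure is re-created beneath it or all files are placed
    directly in that one directory.
    """
    src = PureWindowsPath(original)
    if alternate is None:
        return str(src)
    dest = PureWindowsPath(alternate)
    if keep_tree:
        # Drop the drive and root, keep the directories and file name.
        parts = src.parts[1:] if src.anchor else src.parts
        return str(dest.joinpath(*parts))
    return str(dest / src.name)


source = r"C:\Data\Reports\q1.xls"
print(restore_target(source))
print(restore_target(source, r"D:\Restore"))
print(restore_target(source, r"D:\Restore", keep_tree=False))
```

Placing every file in a single directory risks name collisions when files with the same name come from different directories, which is one reason to re-create the original structure.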
Figure 16.5 The Windows 2000 Backup program's Restore dialog box
One of the problems with the typical backup software product is that, like any application, it requires an operating system to run. What happens if the drive in the computer hosting the backup drive should fail? You may have a complete backup of the computer, but to restore it, you first have to reinstall the entire operating system and the backup software product, which can be a time-consuming task. To address this problem, many backup software products provide a disaster recovery feature that enables you to create a boot disk that loads just enough of the operating system and the backup application to perform a restore. A restore from a full backup will then provide all of the software needed to restart the computer in the normal manner.
It is particularly important that you choose a backup software product that is designed for network use. The primary difference between network backup software and an application designed for stand-alone systems is that the former can back up other computers on the network. This means you can purchase one backup drive and use it to protect your entire network. Many stand-alone backup products can access drives on networked computers that you have mapped to a drive letter, but a fully functional network backup product can also back up important operating system features on other computers, such as the Windows Registry and directory service databases. This type of remote backup may require you to install a software component on the target computer, as well as on the computer where the backup drive is located.
In many cases, network backup products also have optional add-on components that enable you to perform specialized backup tasks, such as backing up live databases or computers running other operating systems. These can be a critical part of your network backup solution. If, for example, you have database or e-mail servers that run around the clock, you might not be able to fully back them up using a standard software product because the database files are locked open. As a result, your backup job protects the program files for the database engine (the part that's easily replaceable) but leaves your actual data unprotected. To back up a database of this type, you either have to close it by shutting it down or use a specialized piece of software that creates temporary database files (called delta files) that the server can use while the database itself is closed for the duration of the backup process.