14.1. Backup Strategies
To the uninitiated, computer backup can be an intimidating topic, filled with its own list of things that must be learned. These include backup hardware, complete and incremental backups, local and network backups, and client- versus server-initiated network backups. These topics all require at least minimal description before you can make an informed decision about how to set up a network backup system.
14.1.1. Backup Hardware
The first choice you must make when putting together a network backup solution is what type of hardware to use. The choices can be baffling because there are so many. If you want to use Linux with an existing backup device, you must consider Linux's compatibility with your hardware. In any event, backup hardware falls into several broad classes, each of which has many specific models and subtypes:
Of these broad classes, tape is still the medium of choice for backing up entire networks, but the initial cost can be high. A high-capacity single-tape drive can cost over $1,000, and a tape changer, which automatically swaps tapes so that you can treat several as one, is even more expensive. However, high-end tape formats use tape media that are relatively inexpensive, typically about $1 per gigabyte, uncompressed. Individual tape capacities for current models range from 4 GB to 160 GB, uncompressed.
Removable hard disks have fallen in price enough that they're now competitive with tape, particularly for small sites. A typical removable disk system costs about $100, with extra trays going for another $50 or so. You'll need one tray for each hard disk you use, which is likely to raise the price for the media (tray plus disk) to $1 per gigabyte or thereabouts, at least in early 2005. Hard disk capacities, of course, compete with those of tapes.
Removable disks (other than hard disks) and optical media simply lack the capacity to be used for full network backups, or even for full backups of individual servers or desktop systems. You might still want to use them as part of your backup plan, however. For instance, if your desktop systems hold an OS but little or no user data (that is, if you store user data on a server), you can create CD-R or recordable DVD backups of your OS installations when you first set the systems up or when you perform major OS upgrades, then omit these computers from your normal backup schedules. If your OS installations are small enough, they might fit (with compression) on a single CD-R, and almost certainly on a recordable DVD. Because most desktop systems have CD-ROM drives, and many now have DVD-ROM drives, you can restore these backups without using the network, which greatly simplifies the restore process. You could also use this approach in conjunction with selective network backups of user data directories (such as /home on a Linux desktop system) to protect data stored on users' desktop systems.
If you elect to use tapes for some or all of your backup needs, you must choose a tape format. Quite a few exist, with varying capacities, prices, and speeds. Table 14-1 summarizes some of the more common tape formats, listing drives that are currently on the market and the tapes for those drives. Prices in this table were taken from Internet retailers in late summer 2004; they may change by the time you read this. Tapes for lower-capacity variants of these units are still available and may cost less than indicated here. The table shows prices for single-tape units; changers for many of these formats are also available, but cost more. Also, existing tape formats are often extended to support higher capacities, and new formats are periodically introduced, so you may find something better suited to your needs than anything described here.
One more consideration in your choice of backup hardware is how the hardware interacts with software. Removable disks and removable hard disks can be accessed like internal hard disks, by creating a filesystem on the disk and copying files to the disk. You can also compress files and store them in carrier archives, such as tarballs. Tapes must be accessed using special tape device files, which provide sequential access to the drive. Typically, files are backed up using a carrier archive file. Optical media are usually written using a special program, such as cdrecord, which writes the entire disc's contents at once. The disc usually holds a filesystem, though, so that it can be read as if it were an ordinary magnetic disk. Some software enables more direct read/write access to the drive, but it is still relatively new in Linux and may not be suitable for backup purposes. In all cases, using a carrier archive file can help preserve file permissions, time stamps, and so on, even if the carrier file isn't a strict requirement.
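As a rough illustration of the carrier archive idea, the following Python sketch packs a directory into a compressed tarball and then lists the permissions, time stamps, and sizes recorded inside it. It uses Python's standard tarfile module rather than any particular backup package, and the function names archive_directory and list_archive are hypothetical names chosen for this example.

```python
import os
import tarfile


def archive_directory(source_dir, archive_path):
    """Pack source_dir (recursively) into a gzip-compressed tarball."""
    # Store paths relative to the directory's own name, not its full path.
    top_name = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_name)


def list_archive(archive_path):
    """Return the metadata the tarball recorded for each member."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return {
            member.name: {
                "mode": member.mode & 0o7777,   # permission bits
                "mtime": int(member.mtime),     # modification time (epoch seconds)
                "size": member.size,            # size in bytes
            }
            for member in tar.getmembers()
        }
```

Because the archive itself carries the metadata, the same tarball could be written to a removable disk's filesystem, burned to optical media, or streamed to a tape device without losing that information.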
14.1.2. Complete Versus Incremental Backups
One of the difficult questions you must answer when designing a backup solution is how much to back up. Most computers hold gigabytes of data, but only some of that data changes frequently. For instance, most executable program files change infrequently. Even many user data files can go unchanged for extended periods of time. Thus, if you can identify the changed files and update them without updating unchanged files, you can save considerable time (and backup media space) on your backups. Doing this is called an incremental backup, which contrasts with a complete backup or full backup, in which every file is backed up.
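One simple way to identify changed files is to compare each file's modification time against the time of the previous backup. The Python sketch below does this; it is a minimal illustration, not how any specific backup program works, and the function name files_changed_since is hypothetical. It assumes that modification times are trustworthy, and it will not notice changes that touch only permissions or ownership.

```python
import os
from typing import List


def files_changed_since(root: str, since_epoch: float) -> List[str]:
    """Return files under root modified after since_epoch (seconds since 1970)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() examines a symbolic link itself, not its target.
                if os.lstat(path).st_mtime > since_epoch:
                    changed.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it in this sketch.
                continue
    return sorted(changed)
```

Passing the time of the last full backup selects everything changed since that full backup; passing the time of the most recent backup of any type selects only the files changed since then.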
Incremental backups sound like a great idea, but they do have a drawback: they complicate restores. Suppose for the sake of argument that you perform a complete backup on Monday and an incremental backup every day thereafter. If the hard disk dies on Friday, you need to restore Monday's full backup followed by either every intervening incremental backup (if each incremental backup copies only files that have changed since the last backup of any type) or just the most recent one (if each copies all files that have changed since the last full backup). What's more, your restored system will have files that might have been intentionally deleted during the week. This can cause serious problems if the system sees heavy turnover in large files, such as if users routinely create and then quickly destroy large multimedia files. (Some backup packages can spot such deletions and handle them automatically, but not all backup software can do this.) These problems become more severe the longer you go between full backups.
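The restore order in the Monday-to-Friday scenario can be sketched in a few lines of Python. The function restore_chain and its cumulative flag are hypothetical names for this illustration: cumulative=True means each incremental backup holds everything changed since the last full backup, while cumulative=False means each holds only what changed since the previous backup of any type.

```python
from typing import List, Tuple


def restore_chain(backups: List[Tuple[str, str]], cumulative: bool) -> List[str]:
    """Return the backup labels to restore, in order.

    backups is a chronological list of (label, kind) pairs, where kind is
    "full" or "incr".
    """
    full_positions = [i for i, (_label, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available to restore from")
    last_full = full_positions[-1]

    chain = [backups[last_full][0]]
    later_incrementals = [
        label for label, kind in backups[last_full + 1:] if kind == "incr"
    ]
    if cumulative:
        # Only the newest incremental is needed on top of the full backup.
        chain += later_incrementals[-1:]
    else:
        # Every incremental since the full backup must be replayed in order.
        chain += later_incrementals
    return chain


week = [("Mon", "full"), ("Tue", "incr"), ("Wed", "incr"), ("Thu", "incr")]
print(restore_chain(week, cumulative=False))  # ['Mon', 'Tue', 'Wed', 'Thu']
print(restore_chain(week, cumulative=True))   # ['Mon', 'Thu']
```

Neither variant accounts for files deleted during the week; handling those requires support from the backup software itself.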
Generally speaking, using a small number of incremental backups between full backups can be a great time-saver. For instance, on critical systems that see lots of activity, you might perform a weekly full backup and a daily incremental backup. A less busy or less critical system might manage with monthly full backups and weekly incremental backups.
Given these examples, you may be wondering just how often you need to perform backups. There's no easy answer to this question because it depends on your own needs. You should ask yourself how much trouble a complete system failure would cause and design a backup schedule from there. For instance, if losing a single day's work would be a major hassle, that system should be backed up daily; however, if losing even a week's worth of data would not be a major inconvenience, weekly or even less frequent backups might suffice. The answer to this question, of course, can vary from one system to another; a major file server might need daily backups, whereas desktop computers might need much less frequent backups, or even none at all if they just hold stock OS installations.
14.1.3. Local Versus Network Backups
Much of the preceding description has assumed that individual computers are being backed up. You can certainly back up computers one by one, equipping each one with its own backup hardware or using portable backup hardware that you can move between computers. This is likely to be tedious and expensive, though. When it comes to users' desktop systems, getting them to perform backups can be difficult. One solution to these problems is to perform network backups. These use network protocols to transfer data from the system being backed up (the backup client) to the computer that holds the backup hardware (the backup server).
The main advantages of performing network backups are reduced hardware cost and the potential for simplified backup administration. This second advantage has a corollary: because backups are likely to be less tedious, they're more likely to be done. On the other hand, network backups have certain disadvantages: they can consume a great deal of network bandwidth, they require larger backup storage devices than do individual backups, they require careful planning so as to operate smoothly, and they may require overcoming cross-platform differences (such as Linux versus Windows filename conventions).
Overall, network backups are worth doing on all but the smallest networks, or at least on any network with more than a tiny number of computers that are worth backing up. Typically, your first priority will be your servers, followed by workstations on which users store their data files. You may want to create your own priority list, though; knowing what's most important on your own network will help you plan what hardware to buy and what software will best back up the data.
The backup server computer itself can be fairly unassuming, aside from its backup device and a decent network connection. The computer most likely won't be running any RAM-intensive programs. (Some high-end backup software uses large RAM buffers, however.) If you compress your backups on the backup server, its CPU must be fast enough to keep up with the incoming data, but this task won't strain a CPU unless you've paired it with much more modern network and data storage systems. You might be tempted to equip a major file server with the backup hardware and make it your backup server, and this does have the advantage of simplifying the backup of this important server. On the other hand, it also imposes an extra load on the file server, both in terms of CPU (particularly if you use it to compress data) and network bandwidth. This might be acceptable if you expect to be able to fully complete backups in off hours, but if you expect your backups to occur partly when the network is in use, you might want to use a dedicated backup server. Also, a backup server may have increased vulnerability to certain types of attack, so placing it on its own computer can have security implications compared to having a file server do double duty.
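To judge whether a network backup is likely to finish in off hours, a back-of-the-envelope estimate of transfer time can help. The Python sketch below is only that: the function name backup_hours is hypothetical, and the default efficiency of 0.7 (the fraction of the nominal link speed actually achieved) is an assumed figure, not a measurement. Tape drive speed, disk speed, and compression are ignored.

```python
def backup_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move data_gb gigabytes over a network link.

    data_gb    -- amount of data in decimal gigabytes (10**9 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of nominal speed achieved (0 < efficiency <= 1)
    """
    if data_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("invalid data size, link speed, or efficiency")
    total_bits = data_gb * 1000**3 * 8
    effective_bits_per_second = link_mbps * 1_000_000 * efficiency
    return total_bits / effective_bits_per_second / 3600


# Example: 100 GB over a 100 Mbps link at the assumed 70% efficiency.
print(round(backup_hours(100, 100), 2))  # 3.17
```

If the estimate exceeds the idle period available overnight, that suggests either a faster network connection for the backup server, incremental backups, or a dedicated backup server whose traffic won't compete with a file server's normal load.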
14.1.4. Client- Versus Server-Initiated Backups
When doing network backups, one critical detail is which system controls the backup process: the backup server or the backup client. Both approaches have several consequences:
Both client- and server-initiated backups have their uses. Broadly speaking, client-initiated backups work best on small networks with few users and irregular backup schedules, such as in a business with half a dozen employees. As the number of computers grows, though, the scheduling hassles of client-initiated backups become virtually impossible to manage, so server-initiated backups become preferable. You might also prefer server-initiated backups even on a small network because of software features of specific packages or for other reasons; don't feel compelled to use a client-initiated backup strategy on a small network.
14.1.5. Backup Pitfalls
Backups don't always proceed as planned. Worse, restores don't always work the way you expect, and a backup is useless if you can't restore it. Some common problems, particularly in cross-platform network backups, include:
Unfortunately, backup pitfalls can be very site-specific because they often involve details of your own network, the systems you're backing up, your backup hardware, and the programs you use (both for backup and on the systems being backed up). You may need to rely on testing and experience to discover these problems, then try to find a solution on the Web or in some other way. This is why testing your backups is so critically important; it's far better to discover problems before you need to restore data than after such a restore is needed!
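One way to test a backup is to restore it to a scratch directory and compare the result against the original files. The Python sketch below compares two directory trees by SHA-256 checksum; the function names tree_digests and compare_trees are hypothetical, and the sketch checks file contents only, not permissions, ownership, or time stamps, which you would need to check separately in a cross-platform setting.

```python
import hashlib
import os


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken links, sockets, and other special files
            sha = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files don't exhaust memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    sha.update(chunk)
            digests[os.path.relpath(path, root)] = sha.hexdigest()
    return digests


def compare_trees(original, restored):
    """Report files missing from, extra in, or different in the restored tree."""
    orig, rest = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(set(orig) - set(rest)),
        "extra": sorted(set(rest) - set(orig)),
        "changed": sorted(k for k in set(orig) & set(rest) if orig[k] != rest[k]),
    }
```

An empty result in all three lists suggests the restore reproduced the file contents; files that legitimately changed after the backup was made will show up as changed, so the comparison is most meaningful on data that was idle in between.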