Backup Operations

While backups tend to run in an automated fashion, a surprising amount of detail is involved in day-to-day backup operations. This section discusses three related topics: the concept of the backup window, commonly used automated backup operations, and hot backups, including copy-on-write technology.

The Backup Window

The backup window is the amount of time available on any given day for performing backup operations without interfering with production data processing operations. The concept of the backup window assumes that systems will not be processing applications and updating data while the backup is running. A backup that runs when applications are not running is referred to as a cold backup.

The backup window coincides with a period of reduced data processing activity, allowing the backup operation to do its work without interfering with other data processing operations. For instance, a business may have a backup window of three hours between 1 a.m. and 4 a.m. Monday through Thursday mornings. Backup operations that do not complete within the time specified by the backup window may have to be aborted in order to allow normal data processing operations to resume. Aborted backups are of little use for restores and are generally treated as failed attempts to back up data.
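The question of whether a backup completes within its window comes down to simple arithmetic. The following is a minimal sketch, assuming a constant backup throughput; the function name fits_in_window, the data sizes, the throughput figure, and the date are all hypothetical values chosen for illustration. Only the 1 a.m. to 4 a.m. window comes from the example above.

```python
from datetime import datetime, timedelta


def fits_in_window(data_gb, throughput_gb_per_hour, window_start, window_end):
    """Estimate whether a backup started at window_start finishes by window_end.

    Assumes a constant throughput, which is a simplification.
    Returns a tuple (fits, estimated_finish_time).
    """
    hours_needed = data_gb / throughput_gb_per_hour
    finish = window_start + timedelta(hours=hours_needed)
    return finish <= window_end, finish


# Hypothetical weeknight window: 1 a.m. to 4 a.m. on a Monday.
start = datetime(2024, 1, 8, 1, 0)
end = datetime(2024, 1, 8, 4, 0)

# 500 GB at an assumed 200 GB/hour needs 2.5 hours and fits.
ok, finish = fits_in_window(500, 200, start, end)

# 900 GB at the same rate needs 4.5 hours and would have to be aborted.
too_big, _ = fits_in_window(900, 200, start, end)

print(ok, finish.time(), too_big)
```

In practice throughput varies with file sizes and system load, so an estimate like this is best treated as a rough planning aid.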

Most businesses work with two different backup windows. The first is on weeknights, between consecutive business days, and the other is on weekends or holidays when the business is closed and data processing operations are typically lighter. Weeknight backup windows are a few hours in length, usually not more than six hours. Weekend backup windows can last for more than 24 hours, sometimes allowing backups to run Friday night through Monday morning. The tape rotation schedule is usually selected to fit the opportunities reflected by these backup windows.

In general, the move to 24/7 Internet-based data processing operations puts backup windows under constant pressure to shrink. Combined with the increasing amount of data being stored and the increasing size and number of media-rich files, this makes it steadily more difficult to finish backup operations successfully within the backup window. Hot backup, discussed later in this chapter, is a generic technology that some companies use to circumvent the constraints imposed by backup windows.

Types of Automated Backup Operations

Automated tape rotation mechanisms do more than determine which tapes to use on any given day; they also automate the selection of the backup operation to perform. The type of backup operation determines generically how much data is backed up in any scheduled backup operation.

There are three basic automated backup operations:

Full backups
Incremental backups
Differential backups

Full Backup

Full backup operations copy all the files or objects of a storage volume to tape. Using the terminology of redundancy as used in this book, the goal of full backups is to create duplicate redundancy. A full backup on a server system copies all the files on all the storage volumes belonging to the server. A complete system can be restored from the tape or tapes that contain the contents of a full backup operation. However, a full backup that does not have all the data for whatever reason obviously cannot generate a complete restore.

Full backups can take many hours to perform and usually exceed the weeknight backup window. Therefore, they typically can be performed only during the weekend backup window or on holidays when the business is closed.

Incremental Backup

Incremental backups copy only files that are newly created or that were updated since the last backup operation ran. Whereas full backups are intended to create full redundancy at the time they are run, incremental backups create delta redundancy, which, combined with the last full backup, allows a new composite full redundancy image to be restored. Incremental backups are used for their efficiency: they back up less data and take less time to complete than either full or differential backups.

Files copied during incremental backup operations are typically selected for backup either using the archive bit or from file system date attributes. Incremental backups can be appended to an existing full backup tape, can be written to a single incremental backup tape that is used several consecutive days, or can be written to a separate tape. Appending incremental backups to the last full backup tape makes restore processes easier because they involve fewer tapes, but it also results in greater risk that data may be lost if something occurs that destroys the tape containing several days of backup data.
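Selection by file system date attributes can be sketched as follows. This is an illustration under stated assumptions, not how any particular backup product works: the function name select_incremental is hypothetical, the comparison uses only the modification time, and the archive bit approach is not shown. The demo creates two temporary files and sets their timestamps explicitly so that the result does not depend on the clock.

```python
import os
import tempfile


def select_incremental(root, last_backup_time):
    """Return files under root modified after last_backup_time (epoch seconds)."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime > last_backup_time:
                selected.append(path)
    return sorted(selected)


# Demo with two temporary files, one older and one newer than the last backup.
with tempfile.TemporaryDirectory() as root:
    old_path = os.path.join(root, "old.txt")
    new_path = os.path.join(root, "new.txt")
    for path in (old_path, new_path):
        with open(path, "w") as handle:
            handle.write("data")

    last_backup = 1_000_000_000  # arbitrary epoch timestamp for the demo
    os.utime(old_path, (last_backup - 60, last_backup - 60))
    os.utime(new_path, (last_backup + 60, last_backup + 60))

    selected = [os.path.basename(p) for p in select_incremental(root, last_backup)]

print(selected)
```

Only new.txt is selected, because its modification time is later than the recorded time of the last backup.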

Differential Backup

Differential backup operations copy files that are newly created or that were updated since the last full backup operation. They are another example of delta redundancy, one that attempts to make restore operations more efficient by reducing the number of tapes needed for a full restore. Another way to think of differential backups is that they aggregate all changed data onto a single tape. There is no need to keep a differential backup tape in a tape drive for the next day's backup operation, so the tape can be removed daily, reducing the risk that data could be lost in a disaster or accident of some type.

For example, assuming a full backup was done on the weekend, a differential backup on Monday night would copy the same files that an incremental backup would. However, on Tuesday night the differential backup would combine the data from a Tuesday incremental backup with Monday's data. Of course, if a file changed on both Monday and Tuesday, only the version changed on Tuesday would be backed up on Tuesday. On Thursday night the differential backup would include all the latest changes to files that occurred from Monday through Thursday. Table 13-2 compares incremental and differential operations used in the GFS rotation scheme.

Table 13-2. Incremental and Differential Operations in a GFS Tape Rotation Scheme
Scheme	Weekend	Mon	Tues	Wed	Thurs
Incremental	Full	Monday only	Tuesday only	Wednesday only	Thursday only
Differential	Full	Monday only	Monday + Tuesday newest changes	Monday through Wednesday newest changes	Monday through Thursday newest changes
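The difference between the two rows of Table 13-2 can be expressed as a short simulation. This is a sketch under assumptions: the file names and the daily change sets are invented for illustration, the function names are hypothetical, and each nightly backup is assumed to be written to its own tape or backup set.

```python
def nightly_copies(changes_by_day, scheme):
    """Map each day to the set of files copied that night.

    changes_by_day maps each day, in order, to the files changed on that day.
    A full backup is assumed to have run before the first day.
    """
    copied = {}
    changed_since_full = set()
    for day, changed in changes_by_day.items():
        changed_since_full |= changed
        if scheme == "incremental":
            copied[day] = set(changed)             # changes since the last backup
        elif scheme == "differential":
            copied[day] = set(changed_since_full)  # changes since the last full backup
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return copied


def backups_needed_for_restore(days_completed, scheme):
    """List the backup sets a full restore needs after the given days."""
    if scheme == "incremental":
        return ["Full"] + list(days_completed)   # every incremental since the full
    return ["Full"] + list(days_completed[-1:])  # only the latest differential


# Hypothetical change log for one week.
changes = {
    "Mon": {"a.txt", "b.txt"},
    "Tue": {"b.txt", "c.txt"},
    "Wed": {"d.txt"},
    "Thu": {"a.txt"},
}
days = list(changes)

incremental = nightly_copies(changes, "incremental")
differential = nightly_copies(changes, "differential")

print(sorted(incremental["Thu"]))
print(sorted(differential["Thu"]))
print(backups_needed_for_restore(days, "incremental"))
print(backups_needed_for_restore(days, "differential"))
```

On Thursday the incremental backup copies only a.txt, while the differential backup copies all four files. The restore lists show the trade-off: the incremental scheme needs the full backup plus all four nightly sets, whereas the differential scheme needs the full backup plus Thursday's set only.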

Special-Purpose Backup Operations

Administrators sometimes need to run special-purpose backup jobs for historical archiving, system or application migration, and data sharing. Usually these operations are done manually, where the administrator selects the specific files to back up and creates a tape name for the operation. In general, mixing special-purpose backup data with regular backup data on the same tapes is a bad idea.

Hot Backup and Copy-on-Write

To get around some of the problems with shrinking backup windows and growing data, a technology called hot backup was developed to allow backup operations to run outside the backup window during "normal" production data processing hours. In many cases today, especially in Internet server environments, hot backup is the only option for backup and recovery.

Cold Backup Operations

To start understanding hot backup, we'll first look at normal legacy backup operations, referred to as cold backup, which were introduced previously in this chapter in the section "The Backup Window." Cold backup does its work copying files when the system is not accepting any updates to files or the creation of new files. In other words, the system is operating in a read-only state. With cold backups there is virtually no chance that an update will occur to a file while backup is trying to copy it.

The assumption with cold backups is that the system will be unavailable for any applications that need to write data while backup is running. For backup operations taking many hours, this is a serious productivity problem.

Hot Backup Operations

Hot backups allow files to be backed up while applications are creating or updating data. The potential for problems with hot backup revolves around two basic areas:

Each file or group of files being backed up needs to have guaranteed integrity.
The load of running hot backups along with applications can strain processing resources in a system.

The concern over data integrity with hot backups is based on the probability that an update can occur to a file while it is being copied for backup. Chapter 2, "Establishing a Context for Understanding Storage Networks," discusses byte range file I/O, where it is possible to access data by its byte range location within the file. Often, updates occur in rapid succession to different byte ranges in the file. If the backup process has already copied one of those byte ranges and not the other, the file stored by backup will be inconsistent with itself. This inconsistency is similar to the inconsistency that can occur with remote copy applications, as discussed in Chapter 10, "Redundancy Over Distance with Remote Copy."

Figure 13-2 shows the multiple byte ranges of a file. The shaded byte ranges indicate which ones have already been copied by a backup program that is in process. An update to the file occurs, which updates two different byte ranges: one that has already been copied by the backup program and one that has not been copied yet. The backup copy of this file will be inconsistent with any version of the file that existed on disk. Data integrity in the backup version is lost.

Figure 13-2. The Creation of an Inconsistent Backup Copy
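The scenario in Figure 13-2 can be reproduced with a small simulation. This is a sketch only: the file is modeled as a list of seven byte ranges, the function name and the range contents are hypothetical, and the update is applied at a fixed point partway through the copy.

```python
def hot_copy_without_protection(file_ranges, update_before_index, updates):
    """Copy byte ranges in order while an update lands partway through the copy.

    file_ranges is modified in place to represent the live file on disk.
    updates maps range positions to new values and is applied just before
    the range at update_before_index is copied.
    """
    backup = []
    for index in range(len(file_ranges)):
        if index == update_before_index:
            for position, value in updates.items():
                file_ranges[position] = value
        backup.append(file_ranges[index])
    return backup


before = ["A0", "B0", "C0", "D0", "E0", "F0", "G0"]
live = list(before)

# The update touches range 1 (already copied) and range 5 (not yet copied).
backup = hot_copy_without_protection(live, 3, {1: "B1", 5: "F1"})
after = live

print(backup)
print(backup == before, backup == after)
```

The backup contains the old value of range 1 and the new value of range 5, so it matches neither the version of the file before the update nor the version after it.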

Copy-on-Write

The key technology providing data consistency for hot backups is copy-on-write (COW). COW is a generic concept that was originally developed as a memory management technique to reduce the amount of system memory needed for multiprocessing environments. It has since been adapted to work with file system technologies as a way to facilitate reliable hot backup and other data management processes.

The basic idea of COW is to place new data updates in a temporary storage location until another running process, such as backup, can finish its task. Subsequent accesses to the same data use the temporary data location until backup finishes running and the updated data is copied to its permanent location.

COW is typically implemented in conjunction with backup agent software. A COW-enabled backup agent monitors accesses to each file as it is being backed up. If updates occur to the file, they are redirected to a temporary storage location until the backup system finishes copying the file. Then the COW process writes the new data to its intended location.

Figure 13-3 illustrates the fundamental process of COW for backup. A backup process is copying a file with seven storage locations. During the backup process, an update is made to location 5 and is written by COW to temporary location 5'. After backup finishes, the updated data is copied from temporary location 5' to permanent location 5.

Figure 13-3. Copy-on-Write Process for Backup
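The process in Figure 13-3 can be modeled in a few lines. The class below is a simplified, in-memory sketch rather than a real backup agent: the class name CowFile and its methods are hypothetical, and storage locations are represented as list entries. Location 5 in the figure corresponds to list index 4.

```python
class CowFile:
    """Toy model of COW for backup: updates are held aside while backup runs."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # permanent storage locations
        self.pending = {}            # temporary locations, keyed by index
        self.backup_running = False

    def start_backup(self):
        self.backup_running = True

    def write(self, index, value):
        if self.backup_running:
            self.pending[index] = value   # redirect to the temporary location
        else:
            self.blocks[index] = value

    def read(self, index):
        # Applications see the newest data, including redirected updates.
        return self.pending.get(index, self.blocks[index])

    def backup_read(self, index):
        # Backup reads only the permanent locations, which stay unchanged.
        return self.blocks[index]

    def finish_backup(self):
        self.backup_running = False
        for index, value in self.pending.items():
            self.blocks[index] = value    # copy updates to permanent locations
        self.pending.clear()


original = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
cow_file = CowFile(original)

cow_file.start_backup()
backup = [cow_file.backup_read(i) for i in range(3)]      # locations 1 to 3 copied
cow_file.write(4, "d5-new")                               # update to location 5
seen_by_application = cow_file.read(4)
backup += [cow_file.backup_read(i) for i in range(3, 7)]  # locations 4 to 7 copied
cow_file.finish_backup()

print(backup == original, seen_by_application, cow_file.blocks[4])
```

The backup matches the file as it existed when the backup started, the application sees its update immediately, and the permanent location receives the new data once backup finishes.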

NOTE

COW can also be done where the updated data is written to the permanent storage location and the older, obsolete data is written to a temporary location. However, this method has possible disadvantages for serverless backups, as discussed in the section "Serverless Backup Agent Software."
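The alternative described in this note can be sketched in the same style. Again, this is a hypothetical in-memory model, and the class and method names are assumptions made for illustration. Here the update goes straight to the permanent location, and the older data is preserved in a temporary location for the backup process to read.

```python
class PreserveOldCow:
    """Toy model of the variant that saves old data aside during backup."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # permanent storage locations
        self.saved_old = {}          # temporary copies of overwritten data
        self.backup_running = False

    def start_backup(self):
        self.backup_running = True

    def write(self, index, value):
        # Save the old data once, the first time a location changes during backup.
        if self.backup_running and index not in self.saved_old:
            self.saved_old[index] = self.blocks[index]
        self.blocks[index] = value

    def backup_read(self, index):
        # Backup prefers the preserved old data where it exists.
        return self.saved_old.get(index, self.blocks[index])

    def finish_backup(self):
        self.backup_running = False
        self.saved_old.clear()       # the old data is no longer needed


old_contents = ["d1", "d2", "d3", "d4", "d5", "d6", "d7"]
variant = PreserveOldCow(old_contents)

variant.start_backup()
variant.write(4, "d5-new")                            # goes to permanent location 5
variant_backup = [variant.backup_read(i) for i in range(7)]
variant.finish_backup()

print(variant_backup == old_contents, variant.blocks[4])
```

In this variant the backup process has to consult the temporary location to obtain a consistent copy, while applications read and write the permanent locations directly.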

Continuous Backup

Copy-on-write allows IT organizations to run backup as often as they like. Some companies have used this to implement continuous backup, where as soon as one backup operation stops, another one begins. This uses more tapes, but it also gives these companies more complete backup protection. If the goal of the company is to not lose more than a few hours of data for mission-critical systems, continuous backup may be able to meet the requirement.

Other technologies such as remote copy (see Chapter 10) and point-in-time copy (see Chapter 17, "Data Management") can be used to achieve the same recovery goals.

Limitations of Hot Backup

On the surface it appears that hot backup would be the answer to everybody's problems. Unfortunately, it is not. The heavy processor and I/O loads that backup places on server systems can cause severe problems. Servers that are already heavily utilized can sink to unacceptable performance levels when attempting to perform backups along with their normal application load. Not only that, but backup processing can completely foul up normal caching patterns in storage subsystems that are needed for acceptable performance.