Various types of backup schemes exist, and they can be categorized in different ways. In an actual data center, one typically uses multiple types of backups. In short, the categorization of backups should not be taken to be
Sections 5.3.1 through 5.3.3 take a look at each of these types of classification.
One way of classifying backups is based on the architecture. That is, backups are classified in terms of the objects they deal with and the amount of awareness the backup application has of these objects. The available types of architecture-based backups, described in the subsections that follow, are
Image- or block-level backup
The backup application in this case deals with blocks of data. Typically, this kind of backup scheme needs all applications on the server to
The advantages of this kind of backup are that the backup and restore operations are very fast, and it can be a good disaster recovery solution. One disadvantage is that applications and even the operating system cannot access the disk while the backup or restore is happening. Another
Finally, it is hard to retrieve just a particular file or a few files rather than restore all the data to a disk. To do so, the restore software must understand the file system metadata as it exists on the tape, retrieve this metadata, and from there, compute the location on the tape where the data for the particular file resides. Some
The version of NTFS included with Windows 2000 already keeps all metadata in files (for example, the bitmap that represents logical block allocation). The restore application
Note that sometimes the choice of backup is limited. Consider the case in which a database uses a raw disk volume (without any kind of file system on that volume). In this case the only two choices are an image-level backup or an application-level backup (the latter is described later in this section).
With this type of backup, the backup software makes use of the server operating system and file system to back up files. One advantage is that a particular file or set of files can be restored relatively easily. Another is that the operating system and applications can continue to access files while the backup is being performed.
There are several disadvantages as well. The backup can take longer,
Another disadvantage is
In this case, backup and restore are done at the application level, typically an enterprise application level (for example, Microsoft SQL Server or Microsoft Exchange). The backup is accomplished via APIs provided by the application. Here the backup consists of a set of files and objects that together
Applications either use a raw disk that has no file system associated with the volume/partition or simply have a huge file allocated on disk and then lay down their own metadata within this file. A good example of an application that takes this approach is Microsoft Exchange. Windows XP and Windows Server 2003 introduce an important feature in NTFS to facilitate restore operations for such files. The file can be restored via logical blocks, and then the end of the file is
Yet another way of classifying backup applications is based on the functionality that is achieved in the backup process. Note that a data center typically uses at least two and very often all of the backup types described in the subsections that follow: full, differential, and incremental.
In a full backup, the complete set of files or objects and associated metadata is copied to the backup media. The advantage of having a full backup is that only one media set is needed to recover everything in a disaster situation. The disadvantage is that the backup operation takes a long time because everything needs to be copied. Full backups are very often accomplished with the image- or block-level backup architecture.
A differential backup archives all changes since the last full backup. Because differential backups can be either image block based or file based, this set of changes would represent either the set of changed disk blocks (for
With low-end storage deployed, file-based differential backups are used when the applications by nature tend to create multiple small files and change or create just a few of them since the last full backup. In addition, when low-end storage is deployed, file-based differential backups are not typically used with database applications, because database applications, by their very nature, tend to make changes in small parts of a huge database file. Hence a file-based backup would still have to copy the whole file. A good example here is Microsoft Exchange, which tends to make changes in small
With high-end storage deployed, image-based differential backup can be used in any situation, including with database applications. The reason for this flexibility is that the high-end storage units can track a lot of metadata and thus quickly identify which disk blocks have changed since the last full backup. Thus, only this small number of disk blocks needs to be archived, and the large number of unchanged disk blocks that are present in the same database file can be ignored. Even though the backup with high-end storage is more efficient, APIs that start the backup at a consistent point and allow the I/O to resume after the backup has been accomplished are still needed. The efficiency of high-end storage simply minimizes the time during which all I/O must be frozen while the backup is being made.
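The block-tracking idea can be illustrated with a minimal sketch. The `TrackedVolume` class and its method names are hypothetical and stand in for metadata that a real storage unit would maintain internally; the sketch also ignores the I/O freeze discussed above.

```python
class TrackedVolume:
    """Toy block device that records which blocks changed since the last full backup.

    Hypothetical illustration only; real storage units keep this metadata in firmware.
    """

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.dirty = set()  # indices of blocks written since the last full backup

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.dirty.add(index)

    def full_backup(self):
        """Copy every block and reset the change tracking."""
        self.dirty.clear()
        return dict(enumerate(self.blocks))

    def differential_backup(self):
        """Copy only blocks changed since the last full backup (tracking is not reset)."""
        return {i: self.blocks[i] for i in sorted(self.dirty)}


vol = TrackedVolume(num_blocks=1000)
full = vol.full_backup()            # all 1000 blocks
vol.write(7, b"\x01" * 512)
vol.write(42, b"\x02" * 512)
diff = vol.differential_backup()    # only blocks 7 and 42
```

Because the dirty set is cleared only by a full backup, each differential backup in this sketch carries every block changed since that full backup, however many differentials have been taken in between.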
An incremental backup archives only the changes since the last full or incremental backup. Again, the obvious advantage is that this backup takes less time because items not modified since the last full or incremental backup do not need to be copied to the backup media. The disadvantage is that a disaster recovery operation will take longer because restore operations must be done from multiple media sets, corresponding to the last full backup followed by the various incremental backups.
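The difference in restore effort among the three backup types can be sketched as a small function that works out which media sets a restore needs. The function name and the tuple layout are assumptions made for illustration, and the sketch assumes a schedule that uses either differential or incremental backups between full backups, not a mixture.

```python
def media_sets_for_restore(history):
    """Return the labels of the backups needed to restore the most recent state.

    `history` is a chronological list of (label, kind) tuples, where kind is
    "full", "differential", or "incremental". Hypothetical helper for illustration.
    """
    chain = []
    for label, kind in history:
        if kind == "full":
            chain = [label]                # a full backup stands on its own
        elif not chain:
            raise ValueError("history must start with a full backup")
        elif kind == "differential":
            chain = [chain[0], label]      # last full plus the newest differential
        elif kind == "incremental":
            chain.append(label)            # last full plus every incremental since
        else:
            raise ValueError(f"unknown backup kind: {kind}")
    return chain


incremental_week = [("sun", "full"), ("mon", "incremental"), ("tue", "incremental")]
differential_week = [("sun", "full"), ("mon", "differential"), ("tue", "differential")]

print(media_sets_for_restore(incremental_week))    # ['sun', 'mon', 'tue']
print(media_sets_for_restore(differential_week))   # ['sun', 'tue']
```

The incremental schedule needs every media set since the full backup, whereas the differential schedule never needs more than two.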
In the absence of high-end storage, file-based incremental backup is used only when a different set of files is typically created or modified. With high-end storage that can provide the required metadata tracking,
One way of classifying a backup scenario is based on the network topology used, and how that topology lends itself to achieving the best method for backing up the attached
Direct-attached backup was the first form of backup used, simply because it emerged in the era when storage devices were typically attached directly to servers. Despite the
The advantage of direct-attached backup is that it is
Tape devices are
The total cost of ownership is high because you need more administrators doing tape backups using multiple tape devices.
Storing multiple tapes can be confusing.
Because the data on different servers is often duplicated, but slightly out of sync, the tape media reflects duplication of data with enough seemingly similar data to cause confusion.
Last, but not least, the server must be able to handle the load of the read/write operations that it
As Chapter 3 discussed, the era of direct-attached storage was followed by the client/server era with a lot of
Figure 5.4 shows a typical deployment scenario for network-attached backup. The left side of the diagram shows a couple of servers. These could be application or file-and-print servers, and there may be more than just a couple. The right side of Figure 5.4 shows a backup server with a tape unit attached. This tape device can be used for backing up multiple file-and-print or application servers. Thus, network-attached backup allows a tape device to be shared for backing up multiple servers, which can reduce costs.
The problems that network-attached backup introduced are these:
The backup operation consumes LAN bandwidth, often requiring careful segmentation of the LAN to put the backup traffic on a separate LAN segment.
Host online hours (i.e., operating hours) increased; that is, the amount of time servers needed to be available for transactions and user access grew. In addition, the amount of data on the servers (that needed to be backed up) started increasing as well.
Increasingly, these problems led to the use of backup requirements as the sole basis for network design, determining the exact number of backup devices needed, and the selection and placement of backup devices.
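The pressure that growing data volumes and shrinking backup windows put on the LAN can be made concrete with a rough estimate. This is a back-of-the-envelope sketch: the function name, the decimal units, and the default 50 percent utilization figure are assumptions, not measurements.

```python
def backup_hours(data_gb, link_mbit_per_s, utilization=0.5):
    """Rough time, in hours, to move `data_gb` gigabytes over a network link.

    `utilization` is the assumed fraction of the raw link rate that the
    backup traffic actually achieves (an illustrative guess, not a measurement).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    total_bytes = data_gb * 10**9
    bytes_per_second = link_mbit_per_s * 10**6 / 8 * utilization
    return total_bytes / bytes_per_second / 3600


# 100 GB over a 100 Mbit/s LAN at 50% utilization: about 4.4 hours
print(round(backup_hours(100, 100), 1))
```

Doubling the data doubles the time, which is why a fixed LAN quickly stops fitting a backup window that is itself shrinking.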
The advent of storage area networks introduced new concepts for backup operations. The new functionality is based on the fact that a storage area network (SAN) can provide high bandwidth between any two devices and, depending on the topology, can also offer simultaneous high-bandwidth, low-latency connections between multiple pairs of devices. In contrast, a Fibre Channel loop topology with many devices (that is, more than approximately 30) cannot offer multiple simultaneous high-bandwidth connections with low latencies, because the total bandwidth of the loop must be shared among all attached devices.
Figure 5.5 shows a typical SAN-based backup application. Note the FC bridge device in the figure. Most tape devices are still non-FC based (using parallel SCSI), so a bridge device is typically used. In this figure, the Windows NT servers have a presence on both the LAN and the SAN.
The backup topology in Figure 5.5 has the following advantages:
The tape device can be located farther from the server being backed up. Tape devices are typically SCSI devices, although FC tape devices are now more readily available. This means that they can be attached to only a single SCSI bus and are not shared easily among servers. The FC SAN, with its connectivity capability, neatly
One solution is to use zoning, allowing one server at a time to access the tape device. The problem with this solution is that zoning depends on good citizen behavior; that is, it cannot ensure compliance. Another problem with zoning is that it will not ensure proper utilization of a tape changer or multitape device.
Another solution is to use the SCSI Reserve and Release commands.
Yet another solution is to have the tape device connected to a server, allowing for sharing of the tape pool by having special software on this server. Sharing of a tape pool is highly attractive because tape devices are fairly costly. IBM's Tivoli is one example of a vendor that provides solutions allowing the sharing of tape resources.
The backup is now what is often referred to as a LAN-free backup because the backup data transfer load is placed on the SAN, lightening the load on the LAN. Thus, applications do not get bogged down with network bandwidth problems while a backup is happening.
LAN-free backup provides more efficient use of resources by allowing tape
LAN-free backup and restore are more resilient to errors because backups can now be done to multiple devices if one device has problems. By the same token,
Finally, the backup and restore operations typically complete a lot more quickly, simply because of the SAN's higher network speed.
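Of the tape-sharing approaches mentioned above, the one based on the SCSI Reserve and Release commands can be illustrated with a toy model. The classes below are hypothetical and simulate only the basic idea (one initiator holds the device and others receive a reservation conflict); they do not issue real SCSI commands, and the handling of a release from a non-holder is a simplification.

```python
class ReservationConflict(Exception):
    """Raised when an initiator touches a device reserved by someone else."""


class TapeDevice:
    """Toy tape device modeling reserve/release semantics (illustration only)."""

    def __init__(self):
        self.holder = None   # initiator currently holding the reservation
        self.records = []    # (initiator, data) pairs written so far

    def reserve(self, initiator):
        if self.holder not in (None, initiator):
            raise ReservationConflict(f"device is reserved by {self.holder}")
        self.holder = initiator

    def release(self, initiator):
        # Simplification: a release from a non-holder is silently ignored.
        if self.holder == initiator:
            self.holder = None

    def write(self, initiator, data):
        if self.holder not in (None, initiator):
            raise ReservationConflict(f"device is reserved by {self.holder}")
        self.records.append((initiator, data))


tape = TapeDevice()
tape.reserve("serverA")
tape.write("serverA", b"backup of A")
tape.release("serverA")
tape.reserve("serverB")      # succeeds only after serverA has released
```

Unlike zoning, the reservation is enforced by the device itself, so a server that skips the reserve step still cannot write while another server holds the device.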
Server-free backup is also sometimes referred to as
In server-free backup, the backup server can
While appreciating the advantages of server-free backup, one should not forget that server-free restore is a very different issue. Server-free restore operations are still relatively rare; that is, backups made using server-free backup technology are very often restored via traditional restore technology that involves the use of a backup software server.
Server-free backup is illustrated in Figure 5.6. In the interest of simplicity, the figure shows the minimum number of elements needed to discuss server-free backup. In practice, however, SANs are much more complex. The figure shows a Windows server connected to an FC switch via an FC HBA. An FC-to-SCSI router is also present, to which are connected a SCSI tape subsystem and a disk device. The disk and tape devices need not be connected to the same router.
A backup server application on the Windows server discovers the data mover agent on the router through Plug and Play. The backup application determines the details of the backup that needs to be accomplished (disk device identifier, starting logical block, amount of data to be backed up, and so on). The backup server software first issues a series of commands to the tape device to reserve the tape device and ensure that the correct media is mounted and properly positioned. When that is done, the backup server software issues an Extended Copy command to the data mover, resident in the router, which then coordinates the movement of the required data. When the operation has been accomplished, the data mover agent
Several different entities play a role in server-free backup architecture, including the data source, data destination, data mover agent, and backup server.
The data source is the device containing the data that needs to be backed up. Typically a whole volume or disk partition needs to be backed up. The data source needs to be directly addressable by the data mover agent (described shortly). This means that storage devices connected directly to a server (or cases in which the server and the storage device have exclusive visibility) cannot be data sources for server-free backup because they cannot be addressed directly from outside the server.
The data destination is typically a tape device where the data is to be written. The device may also be a disk if one is backing up to disk instead of tape. Tape devices are typically connected to a fabric port to avoid disruption of the tape data traffic upon error conditions in other parts of the SAN. For example, if the tape were connected to an FC arbitrated loop, an error in another device or, for that matter, the occurrence of a device joining or leaving the loop, would cause loop reinitialization, resulting in disruption to the tape data traffic.
The data mover agent is typically implemented in the firmware of a storage router because the data mover agent must be able to act on the SCSI Extended Copy command, which is sent to the router in an FC packet. Switches and hubs that examine only the FC frame header are not readily suited to house data mover
The data mover agent is passive until it receives instructions from a backup server. Most tapes connected to SANs are SCSI devices, so a storage router (that converts between FC and SCSI) is typically required and provides a good location for housing the data mover agent. Fibre Channel tapes are now appearing on the scene, and some vendors, such as Exabyte, are including data mover agent firmware in the FC tape device itself. In addition, native FC tape libraries are usually built with embedded FC-to-SCSI routers, installed in the library, providing the ability for the library to have a data mover built in. Note that the data mover agent can also be implemented as software in a low-end workstation or even a server. Crossroads, Pathlight (now ADIC), and Chaparral are some examples of vendors that have shipped storage routers with data mover agents embedded in the firmware. A SAN can have multiple data mover agents from different vendors, and they can all coexist.
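The division of labor among the data source, data destination, data mover agent, and backup server can be sketched as a toy simulation. All class and method names here are hypothetical; the point is only that the backup server handles command and control while the data itself moves from disk to tape through the data mover.

```python
class Tape:
    """Toy data destination: an append-only list of blocks."""

    def __init__(self):
        self.reserved_by = None
        self.blocks = []


class DataMover:
    """Toy data mover agent: passive until it is told to copy."""

    def __init__(self):
        self.blocks_moved = 0

    def extended_copy(self, disk, start_block, count, tape):
        # Data flows from disk to tape without passing through the backup server.
        for lba in range(start_block, start_block + count):
            tape.blocks.append(disk[lba])
        self.blocks_moved += count
        return count


class BackupServer:
    """Toy backup server: issues commands but never touches the data."""

    def __init__(self, name, mover):
        self.name = name
        self.mover = mover
        self.blocks_handled = 0   # remains 0 in a server-free backup

    def backup(self, disk, start_block, count, tape):
        tape.reserved_by = self.name          # reserve and prepare the tape first
        try:
            return self.mover.extended_copy(disk, start_block, count, tape)
        finally:
            tape.reserved_by = None           # release the tape when done


disk = [bytes([i]) * 4 for i in range(10)]    # data source: ten 4-byte blocks
tape = Tape()
server = BackupServer("backup1", DataMover())
copied = server.backup(disk, start_block=2, count=5, tape=tape)
```

After the call, `tape.blocks` holds blocks 2 through 6 of the disk, while `server.blocks_handled` is still zero.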
Of course, to be usable, a data mover agent needs to be locatable (via the SCSI Report LUNs command) and addressable (the WWN is used for addressing) from the backup server software. The data mover agent can also make two simultaneous backups; for example, one to a
The backup server is responsible for all command and control operations. At the risk of being repetitious, it is worthwhile noting all the
Computer Associates, CommVault, LEGATO, and VERITAS are some examples of vendors that ship a server-free backup software solution. Storage router vendors that ship server-free functionality routinely work with backup independent software vendors (ISVs) to coordinate support because many of the
Note that although server-free backup has been around for a while, there is very little support for server-free restore.
A lot of the trade press and vendor marketing literature claims that a particular server-free backup solution is Windows 2000 compatible. It is worthwhile examining this claim in more detail to understand what it means. The following discussion examines each of the four
In most cases a data mover agent outside a Windows NT server will not be able to directly address data sources internal to the Windows NT server. The HBAs attached to servers usually work only as initiators, so they will not respond to the Report LUNs command. If the Windows NT server is using a storage device outside the server (say, a RAID array connected to an FC switch), that device will be visible to the data mover agent. So rather than saying that storage used by a Windows NT server cannot constitute the data source for a server-free backup, one needs to state that storage internal to a Windows NT server cannot constitute the data source.
Having the data destination internal to the Windows server is also not possible, because the data destination also needs to be directly addressable from outside the Windows box (by the data mover agent).
Having the backup software run on the Windows server is
The Windows NT SCSI pass-through (IOCTL) interface is capable of conveying the Extended Copy command to the data mover agent (from the Windows NT backup server). Windows NT does not have native support for data movers; Plug and Play can discover them, but drivers are required to log the data mover into the registry.
Note that in Windows NT, an application uses the SCSI pass-through interface (DeviceIoControl with an IoControlCode of IOCTL_SCSI_PASS_THROUGH or IOCTL_SCSI_PASS_THROUGH_DIRECT) to issue SCSI commands.
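As a rough illustration of what such a backup application hands to the pass-through interface, the sketch below builds only the 16-byte command descriptor block (CDB) for Extended Copy. The field layout (operation code 0x83, a 4-byte big-endian parameter list length in bytes 10 through 13, control in byte 15) is an assumption based on the SPC-2 definition and should be checked against the specification. The function name is hypothetical, and the sketch neither builds the parameter list nor calls DeviceIoControl.

```python
import struct

EXTENDED_COPY_OPCODE = 0x83  # assumed SPC-2 operation code for EXTENDED COPY


def build_extended_copy_cdb(parameter_list_length, control=0):
    """Build a 16-byte EXTENDED COPY CDB (layout assumed from SPC-2).

    The parameter list itself (target and segment descriptors) travels in the
    data-out buffer and is not constructed here.
    """
    if not 0 <= parameter_list_length <= 0xFFFFFFFF:
        raise ValueError("parameter list length must fit in 32 bits")
    # opcode, 9 reserved bytes, 4-byte length, 1 reserved byte, control
    return struct.pack(">B9xIxB", EXTENDED_COPY_OPCODE,
                       parameter_list_length, control)


cdb = build_extended_copy_cdb(parameter_list_length=256)
print(cdb.hex())
```

On Windows, a buffer like this would be copied into the CDB field of the pass-through request structure before the DeviceIoControl call; that step is platform specific and is omitted here.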