Storage systems must be designed with backup and recovery in mind. All too often, backup and recovery are afterthoughts, bolted onto primary storage systems after the fact. This has led to a number of problems.
Good system design can alleviate these problems. In the same way that products are designed for manufacture, storage and server systems must be designed for backup and restore from the start.
Recovery Time Objective and Recovery Point Objective

When designing backup systems, two important metrics must be considered: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). RTO is the amount of time within which a system must be back up and running after a disaster. In enterprise data centers, the RTO is a matter of a few hours to mere minutes. RPO is the point in time to which data must be restored. An RPO may require that data be restored up to one hour before the failure, for example. In other situations, it may be acceptable to restore data to the close of business on the previous day. Different systems and types of data often have dissimilar RTOs and RPOs. The recovery metrics set for a departmental file server are often different from those set for an enterprisewide database.

Internal DAS Backup

The simplest backup method is to mount a backup drive internally within an individual server or other computer. The drive is embedded in the computer, and software writes a copy of the primary storage drive's data to its media on a regular basis. The drive can use almost any media, including tape, CD-ROM/RW, another hard drive, or a flash memory device. Tape is the usual media for a server, whereas desktop PCs often use other types of media, such as CD-RW.

The advantage of internal DAS backup is that it is easy to build, configure, and manage. It also doesn't create network congestion problems, because no data is sent over the LAN. Internal backups are self-contained and have less chance of being interrupted by forces outside the server.

On the other hand, internal DAS backup places a heavy strain on the computer's memory, I/O bus, system bus, and CPU. It is not unusual for the load on the server to get so high that it can't function at all during the backup. Fine-tuning the backup software and setting a proper backup window are critical and worth the time.

Managing lots of separate media is also a chore and can lead to costly mistakes, a task that becomes especially difficult when servers are spread out over a wide area. There is a tendency to leave media in the drives and just let the backup happen automatically. This causes each previous day's data to be overwritten. Not changing media ensures that data can be restored only from the most recent backup. If it takes several days to discover that a critical file was accidentally deleted or corrupted, there will no longer be a credible backup of the file. Unless there is another version on a different computer, the data is lost forever. This defeats the purpose of doing backups.

A similar problem arises when the amount of data to be backed up exceeds the capacity of a single target medium. The backup will fail if there is no one around to change the media. It is important to have spare media available and equally important to have someone to load them into the drive. When it is impractical to run around and change media, look to network backups or DAS systems that can change media automatically.
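A minimal sketch of the internal DAS model, assuming the backup drive is a local disk mounted at /backup (the paths and schedule are illustrative, not a recommendation). Writing a date-stamped archive each night avoids the media-overwrite trap described above, because the previous day's copy is never reused.

    import datetime
    import pathlib
    import tarfile

    def nightly_internal_backup(source_dirs, backup_mount="/backup"):
        """Write a date-stamped archive of source_dirs to a locally mounted
        backup drive, so each night's run does not overwrite the previous one."""
        stamp = datetime.date.today().isoformat()
        target = pathlib.Path(backup_mount) / f"backup-{stamp}.tar.gz"
        with tarfile.open(target, "w:gz") as archive:
            for directory in source_dirs:
                archive.add(directory)
        return target

    # Example (assumed paths): back up home directories and configuration files.
    # nightly_internal_backup(["/home", "/etc"])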
External DAS Backup

When the amount of data becomes large or the backup resource load becomes too disruptive, it is time to look to external DAS solutions. With this type of architecture, the backup unit is mounted in its own chassis, external to the computer being backed up, and attached by a cable. The internal DAS model is extended past the computer chassis and into a separate unit. The most common technology used for this is parallel SCSI: a SCSI Host Bus Adapter is mounted in the server, one end of a cable is attached to it, and the far end is attached to the backup unit. The backup software sees the device as though it were mounted in the computer, and the system works the same way as internal DAS backup (Figure 3-1).

Figure 3-1. External Direct Attach Storage tape unit

There are several advantages to this type of architecture. To begin with, the backup unit has its own processor and memory. This immediately takes a portion of the load off the computer's resources, because some I/O is offloaded to the backup unit. By having a larger chassis, external units can accommodate multiple sets of media, loaded automatically by robotics or mechanical loading trays. This eliminates the problem of having to change media, and of making mistakes while changing media. External backup devices also allow multiple drives and SCSI connections to be used for backup, enhancing performance and availability. Media management software is often included with these units, which helps manage unattended backups. Features such as bar code readers keep track of which media belong to which backup set, and sensors can tell when media are wearing out.
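From the server's point of view, an external SCSI tape unit is just another device node presented by the operating system. The sketch below streams an archive to such a device; the /dev/nst0 path (the Linux no-rewind tape convention) and the source directories are assumptions for illustration and will differ by platform and driver.

    import tarfile

    def stream_to_tape(source_dirs, tape_device="/dev/nst0"):
        """Stream a tar archive directly to an attached SCSI tape device.
        The external unit appears to the software as a local device node."""
        with open(tape_device, "wb") as drive:
            # "w|" selects tarfile's non-seekable streaming mode, suited to tape.
            with tarfile.open(fileobj=drive, mode="w|") as archive:
                for directory in source_dirs:
                    archive.add(directory)

    # stream_to_tape(["/var/lib/data"])  # requires a tape drive and privileges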
LAN-Based Backup

At some point, managing individual backup units for each server, whether internal or external, becomes difficult and costly. It becomes more effective to centralize backup in one or more large backup units that can be used for many servers. When bandwidth to the backup unit is not an issue, but management and cost are, LAN-based backup should be considered.

LAN-based backup uses a server with a backup unit attached to it, usually by some form of parallel SCSI, to perform and manage backups. The backup server is in turn connected to the network via a standard Ethernet connection with TCP/IP support. This server, unlike an external DAS solution, is dedicated to backup and is used to back up the primary storage of other servers on the LAN (Figure 3-2).

Figure 3-2. LAN-enabled backup

A common variant of this model has a separate server control the backup process. The server to which the backup unit is attached only manages the I/O to the backup drives. This has the advantage of allowing one server to manage several backup units, centralizing management of the backup process.

The third variant uses a backup unit that is network enabled. Sometimes called NAS-enabled backup, it has an operating system and NIC embedded in it. The system administrator can then attach the backup unit directly to the network without the need for a server to host it. For small installations, this greatly simplifies installation and management, but at the expense of flexibility (Figure 3-2).

Note: NAS backup and NAS-enabled backup sound very similar. NAS backup is the practice of backing up NAS devices. NAS-enabled backup is LAN-enabled backup that uses an embedded operating system to eliminate the server to which the backup unit normally would have been attached. It is analogous to NAS disk arrays for primary storage.

In all cases, the backup software on the server reads and writes the blocks of data from the primary storage and sends the data across the LAN to the backup server. The backup server in turn manages the I/O to the backup devices. Often, several backups can be run simultaneously.

The advantages of LAN-based backup are scalability and management. Rather than dealing with many different backup drives in many hosts, backup is consolidated into a single unit capable of serving a collection of hosts. Adding a new server does not necessarily mean adding a new tape drive. Instead, excess capacity on existing units is utilized, saving money and time. Backup windows can also be scheduled centrally, with all the backups taken into account. Tape management is much easier when all the tapes are stored in a central location, rather than spread out over a building or data center. LAN-based backup also allows backup devices to be located in a different part of a building or data center from the servers they are backing up, providing flexibility in physical plant design.

The problem that most system administrators run into with LAN-based backup is the strain it places on network resources. Backups are bandwidth intensive and can quickly become a source of network congestion. This leaves the system administrator with the unattractive choice of either impacting overall network performance (and having angry end-users) or worrying that backups won't complete because of network-related problems. Worse, both can occur at the same time. At some point, the only way to alleviate the network problems is to create a separate network and keep the server-to-backup-unit ratio very low.
Either method takes away much of the advantage of deploying the LAN-enabled backup solution. Separate networks mean many additional and redundant network components, reducing cost savings and increasing management resource needs. Backing up only a few servers over the network reduces the scalability of the architecture until it eventually approaches that of the external DAS solution.

Some backup systems need to read and write data through the file system if they are to deliver it over the network. The backup server must mount the remote file system, open and read the files, and then write them as block data to tape or other media. As is often the case with productivity applications, the files are small but numerous. Backing up a file server in this way means opening and closing many small files, which is very inefficient and slow. Software that sidesteps the file system by placing agents on the servers can prevent this situation, though it may limit platform choices.

Another important problem is the issue of the backup window. Unless the backup design calls for a separate network, there will be some impact on the network when backups are being performed. The backup window has to be strictly adhered to, or end-users will feel its effects, even those who do not use the system being backed up. With many companies operating around the clock, the backup window may be very small or nonexistent when network effects are taken into account.
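A rough feasibility check shows how quickly LAN bandwidth constrains the backup window. The sketch below estimates how long a backup will take at a given effective network throughput and compares it with the available window; the data size, throughput, and window figures are assumptions for illustration, not measurements.

    def backup_window_check(data_gb, throughput_mb_per_s, window_hours):
        """Estimate backup duration over the LAN and compare it with the window.
        Effective throughput is usually well below the link's rated speed."""
        duration_hours = (data_gb * 1024 / throughput_mb_per_s) / 3600
        return duration_hours, duration_hours <= window_hours

    # Example (assumed figures): 2 TB of data over a Gigabit link delivering
    # roughly 60 MB/s of usable backup throughput, with a 6-hour nightly window.
    hours, fits = backup_window_check(data_gb=2048, throughput_mb_per_s=60, window_hours=6)
    print(f"Estimated backup time: {hours:.1f} hours; fits window: {fits}")

With these assumed numbers the backup needs nearly ten hours, well beyond the six-hour window, which is exactly the situation that pushes organizations toward separate backup networks or SANs.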
SAN Backup

The single most prevalent reason for the first deployments of Fibre Channel Storage Area Networks has been to alleviate the backup woes that large enterprises experience. By moving to a separate, high-capacity network tailor-made for storage applications, many issues with backup windows, network congestion, and management can be addressed. As is always the case, things have not worked out quite that way. There are many special challenges to performing backups across a Fibre Channel SAN, including the performance of systems on the SAN while backups are running. Still, the basic value proposition remains: FC SANs are an important step toward more reliable and less intrusive backups.

With SAN backup, the backup unit is connected via a Storage Area Network, usually Fibre Channel, to the servers and storage that will be backed up. Much of the benefit derived from SAN backup comes from the fact that it is performed on a separate high-speed network. It is arguable that switching backup to a separate Ethernet network, without changing anything else, provides much the same advantage as a SAN. In many cases, that is true: the problems with network congestion are fixed, and there is enough bandwidth to back up more data. SANs, however, have the advantage of being able to perform block I/O over a network. Unlike other network backup schemes, in SANs, blocks of data can be read from a disk and delivered directly to the backup unit, which writes the blocks as they are. There is no need for intermediate protocols or encapsulation of data. This makes even IP-based SANs, such as iSCSI, more efficient for backup. Fibre Channel SANs provide the additional benefit of a very efficient network stack, which again boosts performance.

The SAN also provides connectivity with performance. Many storage devices can be backed up at the same time without impact on the primary LAN or servers. The high bandwidth of Fibre Channel and Gigabit Ethernet relative to tape drives, the most common backup media, allows data from several storage units to stream to multiple drives in the same library at the same time.

It is important to remember that all networks enable the distribution of functions from a single entity to several entities. LANs allow us to break processing and memory resources into many computers, some of which have special purposes. This way, expensive resources can be shared for cost control, scalability, and increased performance. SANs do the same by distributing storage resources. This is a big part of the attraction of SAN-based backups. By distributing the resources needed to back up data across a network, greater performance, scalability, and cost savings are realized over the long term.

There are two common architectures for a SAN-based backup system. The first, and most common, is LAN-free backup; the other is called server-less (or server-free) backup. Both have certain advantages and disadvantages, though they share the overall advantages of SAN backup.

LAN-Free Backup

As the name suggests, LAN-free backup occurs off the LAN and on the SAN. The storage, servers, and backup units are connected to a network across which block I/O can occur. This is the basic SAN backup system. All backup I/O goes across the storage network, and none travels through the LAN. Management and control information may still rely on the presence of an IP network, making LAN-free backup not completely free of the LAN (Figure 3-3).

Figure 3-3. LAN-free backup
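To make the distinction between file I/O and block I/O concrete, here is a minimal sketch that copies raw blocks from a source device to a backup device in large sequential chunks, with no file system in the path. The device paths and chunk size are assumptions for illustration; on a real SAN the transfers would be SCSI or FCP operations handled by the HBAs and drivers rather than ordinary file reads.

    def copy_blocks(source_device="/dev/sdb", target_device="/dev/sdc",
                    chunk_bytes=1024 * 1024):
        """Copy a device block-for-block in large sequential reads and writes.
        No files are opened on the source; data moves as raw blocks."""
        copied = 0
        with open(source_device, "rb") as src, open(target_device, "wb") as dst:
            while True:
                chunk = src.read(chunk_bytes)
                if not chunk:
                    break
                dst.write(chunk)
                copied += len(chunk)
        return copied

    # copy_blocks()  # requires appropriate privileges and correct device names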
If the primary storage is also on the SAN, the backup server interacts with the application servers only to get system information, which it uses to maintain catalogs and ensure proper locks on objects. The actual data is copied by the backup server directly from the storage devices to the backup drives; the data path does not go through the application server. The immediate effect is to reduce the backup load on the application server, which no longer needs to read and write data during backups. This allows backups to be performed while the server continues to operate with reasonable efficiency.

LAN-free backup is a good method of relieving stress on servers and the primary LAN. As with all I/O-intensive applications, the performance of the SAN is affected, but this does not bother the end-user much. LAN congestion and slow servers are far more obvious to end-users, because they impair their ability to get their daily tasks done. Server I/O can run below peak performance without the majority of end-users feeling inconvenienced. Let network response time become too slow, however, and the help desk will be flooded with angry calls, and customers will abandon the e-commerce application the company spent millions to build and roll out. LAN-free backup provides welcome relief to networks and servers overstressed at backup time.

There is another form of LAN-free backup, in which the backup software resides on the individual servers rather than on a dedicated backup server. This is easier to deploy and less expensive than a dedicated backup server. The downside is that the server is still in the data path, and its resources are still taxed during backups. It is also not a particularly scalable architecture: as the system grows, each server will need additional backup software, which will have to be managed separately. If the drag on servers during backups is not all that onerous, and there aren't enough servers to warrant consolidation of the backup function, the argument for performing backup across a SAN is weak.

Server-less Backup

The ultimate backup architecture is server-less backup. There is no server in the data path at all. The data is moved from the primary storage to backup by an appliance or through software embedded in one of the storage devices. Software on a server tells the storage devices what to transfer to the backup unit, monitors the process, and tracks what was moved. Otherwise, it stays out of the way of the data. The data is moved from primary storage to the backup unit directly (Figure 3-4).

Figure 3-4. Server-less backup

This is superior to LAN-free backup because data moves only once and in one direction. Data does not have to be copied to a backup server before being written to the backup unit. An important performance bottleneck is eliminated, and more backup servers are not required to scale the system. Data travels through the SAN only once, reducing the amount of network bandwidth consumed.

From a system perspective, two things are needed for this design to work. First, there needs to be an appliance capable of handling the data transfer, called a data mover. The data mover has to be embedded either in one of the storage devices or in an appliance that sits in front of one of them. Second, there needs to be a protocol that tells the data mover which blocks of data to move from primary to backup storage while monitoring the results. This is provided by a set of extensions to the SCSI protocol called Extended Copy.
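The sketch below models the idea only; it is not the actual SCSI Extended Copy command format. The backup software builds a list of block-range descriptors and hands them to a data mover, which performs the copies itself. All names, structures, and device paths here are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class CopySegment:
        """One illustrative descriptor: which blocks to move, and where to."""
        source_device: str
        target_device: str
        start_block: int
        block_count: int

    class DataMover:
        """Stand-in for an appliance (or storage-embedded function) that moves
        the data itself, so the backup server never touches the blocks."""
        BLOCK_SIZE = 512

        def execute(self, segments):
            results = []
            for seg in segments:
                with open(seg.source_device, "rb") as src, \
                        open(seg.target_device, "r+b") as dst:
                    src.seek(seg.start_block * self.BLOCK_SIZE)
                    data = src.read(seg.block_count * self.BLOCK_SIZE)
                    dst.write(data)
                results.append((seg, len(data)))
            return results  # the backup software sees only status, not data

    # The backup software issues descriptors and monitors the outcome:
    # mover = DataMover()
    # status = mover.execute([CopySegment("/dev/sdb", "/dev/st0", 0, 2048)])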
By issuing commands to the data mover, the backup software causes the backup to commence and is able to monitor the results without being involved in moving the data itself.

Terminology Alert: Never have so many names been given to a protocol. Extended Copy is also commonly known as Third Party Copy, X-Copy, and even E-Copy. Many vendors refer to this SCSI extension as Third Party Copy because it enables a third party (the data mover) to perform the backup. Technically speaking, Third Party Copy should refer to the architecture, not the protocol.

Server-less backup has never been fully realized. Products claiming this capability have been finicky, exhibiting serious integration and compatibility issues. The performance gains did not outweigh the costs for many IT managers. Network and server bottlenecks also turned out to be less serious an issue than the throughput of the backup media. Disk-based backup is having a greater impact on backup systems than server-less backup.

Backing Up NAS

There are three models for backing up NAS devices. The first mimics the DAS backup mechanisms of servers. A backup device is embedded within or attached to a NAS disk array, which has specialized software to perform the backup. This provides a very fast and convenient method of backup and restore. The software can perform dedicated block I/O to the backup unit from the NAS disks, and it already understands the NAS file system, so it does not need to use a network protocol to transfer the data on the disks. This is important, because many high-performance NAS arrays have file systems that are proprietary or that are optimized versions of common file systems such as NTFS. Backup software companies typically design custom versions of their software for these environments, or the NAS vendor produces its own. Although this is a solution optimized for performance, the backup unit embedded in or attached to the NAS array is dedicated to it. It cannot be shared, even if it is underutilized, and using a shared backup unit for failover is not feasible. A robust system therefore requires duplicate backup units, which are rarely fully utilized.

The second common architecture for NAS backup is via LAN-enabled backup. As far as the LAN-enabled backup server is concerned, the NAS array is a file server or similar device using the CIFS or NFS protocol. The backup software queries the file system and reads each file to be copied. These files are then backed up through the backup server as usual. This approach has all the advantages and disadvantages of network backup. It is flexible, robust, and makes good use of resources. It is also slow if the NAS array holds many small files, which is often the case. Opening and closing each file produces overhead that can bog down data transfer and negatively affect network and NAS system performance.

NAS arrays can also be backed up over a SAN. Most backup software vendors have SAN agents specific to common NAS devices. SAN backup of NAS devices eliminates the network overhead associated with copying files over a LAN. What detracts from this design is cost: a SAN needs to be in place or built, and additional agent licenses cost more money as more NAS devices are added to the system.

NAS and SANs in Backup

Many NAS arrays use SANs for the back-end storage system. In most cases, this SAN is self-contained within the unit or rack in which the NAS system is mounted. There is a definite scalability advantage to this architecture. A backup system may be part of this back-end SAN.
If this is the case, it is no different from a DAS backup unit contained within the NAS system; to the outside world, the backup devices are embedded. The advantage is that as the NAS device scales, there is an opportunity to scale the backup system as well.

NAS-SAN hybrid systems are another story. With access to the data on the disk at both the file and block level, different configurations are possible. A NAS device that can be attached to a SAN may utilize the resources of the SAN for backup. This marries the ease of use and file I/O performance of NAS with excellent backup options.

NAS Backup Using NDMP

To back up files, a tape backup program has to access the data on the NAS array. The open protocols available (such as NFS and CIFS) allow backup software to see only the files, not the underlying blocks. The backup then has to be performed using file I/O: each file has to be opened, its contents read and streamed to a backup device over the network or via an internal SCSI bus, and then closed. This is not a big problem if only very large files or a small number of files need to be backed up. If many small files must be backed up, files are opened and closed constantly, and the associated overhead makes the system very slow.

The network model is often preferred for backing up NAS devices, yet going through the file system creates problems. Many NAS vendors have implemented their own protocols for streaming data to backup software. The downside of this approach is that the protocols are proprietary, requiring backup software vendors to support many different protocols. In response, vendors involved in NAS backup developed a standard protocol for managing NAS backup while using standard backup software running on a network. Called the Network Data Management Protocol (NDMP), it defines a bidirectional communication scheme, based on XDR (External Data Representation) and a client-server architecture, optimized for performing backup and restore. NDMP provides a standard way of backing up NAS arrays over a network while removing much of the complexity. NDMP requires that the backup software support the protocol and that the NAS array have an NDMP agent running on it. The agent is the server, and the backup software is considered to be the host.
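NDMP messages are encoded with XDR and exchanged over a TCP connection between the backup software and the agent on the NAS array. As a rough illustration of what that looks like at the byte level, the sketch below packs a simplified, NDMP-style request header using Python's legacy xdrlib module (removed from the newest Python releases); the field layout is paraphrased from memory of the protocol and should be checked against the NDMP specification rather than taken as authoritative.

    import xdrlib  # standard-library XDR encoder (removed in Python 3.13+)

    # Simplified, NDMP-style header: all fields are 32-bit unsigned integers.
    # Treat the field order as illustrative, not as the normative wire format.
    def pack_ndmp_style_header(sequence, time_stamp, message_type, message_code,
                               reply_sequence=0, error=0):
        packer = xdrlib.Packer()
        for field in (sequence, time_stamp, message_type,
                      message_code, reply_sequence, error):
            packer.pack_uint(field)
        return packer.get_buffer()

    # The backup software would send such headers over TCP to the NDMP agent
    # on the NAS array, which conventionally listens on port 10000.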
Backup and Restore Software

Software is the primary ingredient in backup and restore systems. Almost any type of backup unit can be used successfully, but without backup software it is as inert as a doorstop, except less useful. All backup software must perform three important functions.

First, it must copy the data to the backup media. Without the copy, there are no backups and no protection for the data.

Second, it must catalog the data objects so that they can be found later. To restore data, the software has to find it and copy it back to the primary storage media. The catalog provides the mechanism for identifying what is on the backup media, what the characteristics of the original primary storage were (including name, location, and assorted state conditions), and where on the backup media it is located. Catalogs give the backup software the ability to restore data objects to the state they were in at backup time.

Finally, it must be able to restore data to the exact state it was in when it was backed up. That may mean that an entire disk is re-created on a new disk or that a single file is restored to what it was last Tuesday.

Other common features that must exist in any enterprise backup and restore software are