

8.5. Aggressive Requirements

As mentioned in the "Switching Backup Products" sidebar, there's only one reason that you should consider changing your backup product: you have requirements that your current backup product cannot meet. These requirements include your recovery time objective, recovery point objective, consistency groups, and backup window.

Your recovery time objective, or RTO, is how quickly you want the system to be recovered. RTOs can range from zero seconds to many days, or even weeks. Each piece of information serves a business function, so the question is how long you can live without that function. If the answer is that you can't live without it for one second, you have an RTO of zero seconds. If the answer is that you can live without it for two weeks, you have an RTO of two weeks.

The recovery point objective, or RPO, is determined by how much data you can afford to lose. If you can lose three days' worth of a given set of data, that set of data has an RPO of three days. If it's real-time customer orders, however, you may decide you can't afford to lose any of them; you have an RPO of zero for that application.

There can also be an RPO for a group of machines. If you have several systems that are related to each other, you may need to recover them to the same point in time. They are referred to as a consistency group. To meet such a requirement, you have to back up all related systems at exactly the same time, or you have to give each system a very small RPO. Having an RPO for a group of machines basically makes the RPO for each machine in that group the same as the lowest RPO of any machine in that group.
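The rule that a consistency group forces every member down to the group's most demanding RPO can be sketched in a few lines. This is an illustrative sketch only; the system names and RPO values are hypothetical.

```python
# Per-system RPOs, in hours (hypothetical example values)
rpos = {"orders-db": 1, "app-server": 24, "reporting": 72}

def group_rpo(rpos):
    """Every member of a consistency group inherits the group's
    most demanding (smallest) RPO."""
    return min(rpos.values())

effective = group_rpo(rpos)
# Even though "reporting" alone could tolerate 72 hours of data loss,
# as part of this group it must be backed up at least every hour.
print(effective)  # → 1
```

The practical consequence is exactly what the text describes: one critical system in the group drags every related system's backup frequency up to its level.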

Once you've determined an RTO and RPO for each system and disaster type, you need to agree on when you can back up a system, how long you can take to back it up, and how much you are allowed to impact the production system while it is being backed up. These values are collectively and generally referred to as the backup window.

Once you've determined your requirements, you may find that they are too aggressive. We'll consider a requirement aggressive if a traditional LAN-based recovery method can't meet it. This typically means that the bandwidth is too small, the amount of data to move is too large, or the amount of time you're given isn't reasonable. This tends to happen in one of three scenarios:


Remote office backup

Remote offices have long been the elephant in the room at many data protection planning sessions. They're typically handled with remote tape drives or tape libraries that aren't being managed by skilled personnel, or they're not being managed at all. I recently met with an oil-and-gas company whose remote offices included off-shore drilling rigs. Imagine the fun they have getting a vaulting company to stop by!

However, an increasing number of people want to fix this problem by backing up their remote office across the network. How do you back up a remote office with hundreds of gigabytes of data if it's on the other side of a WAN connection? A typical backup and recovery system would not be able to meet any reasonable RTO, RPO, or backup window requirements.


Very large applications

I've recently seen a 150 TB Oracle database. Try backing that thing up! While you may not have a 150 TB application, it's highly possible that you have an application that's too large to back up and recover within an acceptable time. As of this writing, the problem cases seem to be individual servers that have grown beyond several terabytes. Anything a few terabytes or smaller can be backed up and recovered within a few hours, which is within range of most RTOs and RPOs. However, what do you do if you've got a 10 TB application and a one-hour RTO?


Very critical applications

Some applications are so critical to the business that their owners simply won't accept any downtime or loss of data. How do you meet a one-minute RTO or a zero-second RPO, regardless of how large your application is? Applications that fall into this category are the hardest applications to design for and require very advanced data protection systems.

The following technologies can help you meet aggressive requirements, and they are listed in order of their ability to meet such requirements. The farther down the list they appear, the more aggressive requirements they can meet.

Test Your Restores

At a large insurance company, a decision had been made to back up our NetApp filer by backing up the shares rather than purchasing an NDMP key and a dedicated tape drive. We used a utility from the Windows Resource Kit to mount the share at boot time. For an unknown reason, the share did not get mounted, and because we had specified soft in the mount configuration file, no notice or error message was given.

Since the share was never mounted, there was no notice given during backups. However, if the owner of the share logged on to the system, the share would be mounted at that time and dismounted when he signed off. This went on for several months before the owner of the share corrupted the data and asked for a restore from the previous night. It was then that we found out there had been no backups for three months, and that the older backups of the share had already expired.

Harry Tirrell


8.5.1. LAN-Free Backup

The first advanced backup and recovery technique is to send the backup data across a SAN instead of the LAN, thus earning the term LAN-free. Since LAN-free backups are faster than LAN-based backups, they can be used to help meet more aggressive RTOs and RPOs than you could meet with LAN-based backups. If you've got a large server that's having trouble meeting its RTO or RPO, you may consider making it a LAN-free backup client.

LAN-free backups require the backup device to be connected via a storage area network, or SAN, running either Fibre Channel or iSCSI. A Fibre Channel SAN would use Fibre Channel HBAs and a Fibre Channel switch. An iSCSI SAN would use regular NICs with iSCSI drivers (or perhaps iSCSI HBAs) and can route its traffic across any IP network, although it's a good idea to separate this traffic from regular IP traffic.

Figure 8-1. A basic storage area network (SAN)


As illustrated in Figure 8-1, backup servers connected to a SAN have virtual physical access to all peripherals in the SAN. Once the peripherals are attached and the SAN is configured, all SCSI/Fibre Channel/iSCSI traffic is routed through the SAN, and each server "thinks" that the library is locally attached to it. This allows the server to take advantage of the recent advancements in backup technology that allow backups to be sent across the SAN. They are much faster (and much easier on the CPU) than LAN-based backups.

When people first see a SAN drawing, they don't see much difference between it and a LAN, and the line between the two gets more blurred every day. Historically, a SAN used Fibre Channel, and a LAN used Ethernet and IP. Now with iSCSI, a SAN can also use IP as a transport mechanism. Just remember that a SAN is talking SCSI. That SCSI may be running on top of Fibre Channel or IP, but it's still SCSI.


Commercial backup applications can dynamically configure which servers have access to each peripheral. For example, when it is time for a particular server's full backup, the backup application can configure the router in such a way that it has access to every available backup drive. Of course, it can do this for a critical restore as well.

8.5.2. Server-Free (or Serverless) Backup

LAN-free backups do make backups go much faster, and they do require less CPU than LAN-based backups, but they also still require some CPU. If you have a very large server, even that reduced CPU load may be more than the server can handle. It may impact the application too much, or it may require more CPU than you have available. What if you could back up the application without actually sending the data through the server that is using the database? If you're willing to accept a very different type of backup design, this is now possible. Consider the drawing in Figure 8-2.

Figure 8-2. Serverless backup


At the top of Figure 8-2, there are two database servers that are connected to a large, multihost-attachable disk array. The backup server is also attached to this array. The databases actually sit on top of mirrored partitions inside the large disk array. To accomplish a serverless backup, the backup server tells the database server that it needs to do a backup of the database. The database server splits off one of the mirrors or makes a virtual snapshot of its volumes, then tells the backup server that it can back it up. The backup server then backs up the data via a path that doesn't include the original database server, hence the term serverless. (Of course, the backup server is still involved. It's just the original server that no longer has to move the data.) The entire application can then be backed up without transferring the data through the client that is using it. The data may take one of two paths:

  • It can back up the data via another dedicated server that can access the split-off mirror or snapshot.

  • It can back up the data via a SAN router that accepts the SCSI X-COPY command. This moves the data directly from the disk device to tape without going through any server.

If the data being backed up is a split mirror (instead of a virtual snapshot), this approach also offers an advantage that traditional backup methods cannot. The second mirror can be left disconnected until it is time to back it up again. At that point, it can be quickly resynced to the other side of the mirror. Leaving it disconnected like this gives you an instantly available backup of the entire database. If something were to happen to the production database, you could run a few commands and remount the database using the mirror. All you would need to do is replay your transaction logs from the point when the mirror was split off. Without this standby mirror, you would have to restore the database before you could replay your transaction logs. This technology can be used to meet very aggressive RTOs and RPOs.

There are a few disadvantages to this method, starting with the fact that it is extremely complex. If everything goes right, you're fine. If something goes wrong, you've got logs on the backup server, media server, client, storage array, and SAN router. It can take a really long time to figure out why it's not working. The second disadvantage is that most serverless backup products don't offer serverless restores. Make sure to look into that when investigating this option. Finally, this is also a very expensive option.

My previous book, Unix Backup & Recovery, spoke more highly of serverless backup. A lot of things have changed since then, starting with the three technologies covered next. Many people now consider them the preferred methods for meeting very aggressive RTOs and RPOs.


8.5.3. De-Duplication Backup Systems

The developers of de-duplication backup systems asked themselves a few questions. If only a few bytes change in a file, why are we backing up the entire file? If the same file resides in two places on the same system, why do we back it up twice? Why don't we just store a reference to the second file? Some even asked why we're backing up the same file across multiple systems. Doesn't that waste server and network resources?

The answers to all of these questions, of course, rest in the limitations of traditional backup systems. If we don't back up the file every time we see it, we're going to need to load a lot more tapes when we have to restore. In addition, if we only back up the changed bytes in a file, we might need multiple tapes just to restore a single file.

However, if you back up any given file only once, and back up the changed bytes only when a file changes, it is actually possible to meet more demanding backup window requirements. The tape issues mentioned here are mitigated by backing up to disk. Tape-based copies of the disk-based backups can be created at any time, depending on the requirements of the customer. Some de-duplication products can also meet aggressive RTO requirements by restoring only the blocks that have changed since the file was last backed up. The RPO abilities of these products are based on how often you back up, but it is common to use such products to back up hourly, allowing you to meet a one-hour RPO. The same is true for your consistency group requirements.

De-duplication backup systems use techniques similar to those used by disk targets with de-duplication features, which are discussed in Chapter 9. Where those systems de-duplicate at the target level, a complete de-duplication backup system eliminates the redundancy at the client level, reducing the amount of data that has to be sent from a remote office or laptop.

The biggest advantage to de-duplication products is that, from the user adoption perspective, they're the closest to what users already know. Their interfaces are similar, and they often have database agents similar to those found in traditional backup software. They're simply able to back up faster and more often, and they use much less bandwidth.
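The core idea behind de-duplication, storing a chunk of data once no matter how many times it appears, can be sketched with content hashing. This is a deliberately simplified illustration: real products use variable-size chunking, much larger chunks, and additional safeguards; the chunk size and function names here are hypothetical.

```python
import hashlib

CHUNK = 4  # unrealistically small chunk size, purely for demonstration

def backup(data, store):
    """Split data into chunks; store each unique chunk once, keyed by
    its hash. Return the 'recipe' of hashes needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:       # only new chunks cost space/bandwidth
            store[digest] = chunk
        recipe.append(digest)
    return recipe

def restore(recipe, store):
    """Rebuild the original data from stored chunks."""
    return b"".join(store[d] for d in recipe)

store = {}
recipe = backup(b"AAAABBBBAAAA", store)   # "AAAA" appears twice...
print(len(store))                          # ...but is stored only once: 2
print(restore(recipe, store))              # b'AAAABBBBAAAA'
```

A second backup of largely unchanged data would add only the chunks whose hashes are not already in the store, which is why these systems can run hourly over a thin WAN link.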

8.5.4. Snapshots

Another alternate backup method is a snapshot. The most common type of snapshot is a virtual copy of a device or filesystem that relies on the original volume to actually present the data to you.[] This reliance on the original volume is why snapshots must be backed up to provide recovery from physical failures. Snapshot functionality may be found in a number of places, including advanced filesystems and volume managers, enterprise storage arrays, NAS filers, and backup software.

[] Some vendors refer to split mirrors as snapshots. I prefer to reserve the term snapshot for virtual copies. I would call the split mirror a business continuance volume, or BCV.

Snapshots can help meet aggressive backup requirements. For example, some snapshots can meet an RTO of a few seconds by simply changing a pointer. You also can create several snapshots per day, allowing for an aggressive RPO. Since snapshots can be created in seconds, you can meet aggressive backup window requirements as well. You can create a stable, virtual backup of a multiterabyte database in seconds, reducing the impact on the application to potentially nothing. Then you've got hours to perform a backup of that snapshot. The next section discusses how replication is a great way to do that. Finally, creating synchronized snapshots on multiple systems is also relatively easy, so you can meet aggressive synchronicity requirements as well.

One interesting development in the snapshot world is the creation of APIs that allow other vendors to interface with snapshots. NDMP and Microsoft's VSS are examples. NDMP allows backup vendors to schedule the creation of a snapshot, as well as catalog and restore from its contents. Restores are performed using the same interface that you would use for "normal" backups, but they are actually performed by the filer using snapshot technology. VSS allows storage vendors with snapshot capabilities to have the files in those snapshots listed in and restored from the Previous Versions tab in Windows Server 2003 and later. Hopefully, this functionality will be added to workstation versions of Windows, and more NAS vendors will support it as well.

Another interesting development with snapshots is the creation of database agents that work with snapshots. The database agent communicates with the database so that it believes it's being backed up when all that's really happening is the creation of a snapshot. Recoveries are also sometimes integrated, allowing for incredibly fast recoveries that are controlled by the database application.
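The copy-on-write behavior that makes snapshot creation nearly instantaneous, and that makes a snapshot depend on the original volume, can be illustrated with a small sketch. This is a conceptual model only, not any vendor's implementation; all class and method names are hypothetical.

```python
class Volume:
    """Toy volume supporting copy-on-write snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []          # each snapshot: {block_index: old_value}

    def snapshot(self):
        snap = {}
        self.snapshots.append(snap)  # instant: nothing is copied at creation
        return snap

    def write(self, index, value):
        # Before overwriting a block, preserve its old value once
        # per snapshot that hasn't yet saved it.
        for snap in self.snapshots:
            if index not in snap:
                snap[index] = self.blocks[index]
        self.blocks[index] = value

    def read_snapshot(self, snap, index):
        # Unchanged blocks are served from the live volume -- the
        # dependence that makes a snapshot alone no protection
        # against physical failure of that volume.
        return snap.get(index, self.blocks[index])

vol = Volume(["a", "b", "c"])
snap = vol.snapshot()            # takes effect immediately
vol.write(1, "B")                # old block "b" is preserved first
print(vol.read_snapshot(snap, 1))   # → b   (stable point-in-time view)
print(vol.read_snapshot(snap, 0))   # → a   (read through to live volume)
```

Because only changed blocks are ever copied, the cost of a snapshot is proportional to the change rate, not the volume size, which is why a multiterabyte database can be "frozen" in seconds.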

8.5.5. Replication

Replication is the practice of continually copying from a source system to a target system all files or blocks that have changed on the source system. Replication used to be what companies implemented after everything was completely backed up and redundant, which meant that very few people used replication. However, many people are now using replication as their first line of defense for providing both backup and disaster recovery.

Replication by itself is not a good backup strategy; it copies everything, including bad things such as viruses and file deletions. Therefore, a replication-based backup system must be able to provide history, either by occasionally backing up the replicated destination or by using snapshots. It's usually preferable to take a snapshot on the source and replicate that snapshot to the destination. That way, you can prepare database applications for backup, take a snapshot, and then have that snapshot replicated.

8.5.6. Near-Continuous Data Protection Systems

Replication, when coupled with snapshots, is called near-continuous data protection (or near-CDP). Since you can take snapshots hourly (or even more often in some systems), and the replication is occurring continuously, snapshots and replication are closer to CDP than traditional backup, hence the term near-CDP. Some near-CDP products replicate first, then take snapshots on the target system. Others take snapshots on the source system and replicate those snapshots to the target system.

One advantage of near-CDP systems is that snapshots take just seconds to create, and replication is a very easy way to get the data to another device. You can also cascade replication to provide multiple copies, such as an on- and off-site copy, without touching a tape. If you then want to provide a tape copy of the replicated snapshot, you simply back up one of the destination devices.

The biggest disadvantage when compared to true CDP products is that when you cause logical corruption on the source system, such as deletion or corruption of a file, that corruption immediately overwrites the current backup, and you have to recover the file as of the most recent snapshot. A true CDP product would be able to recover the file to just before you fat-fingered it. Another disadvantage to near-CDP systems is that they often require you to change your primary storage system to support them because they are usually storage-array-based.
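A minimal sketch of the near-CDP cycle, snapshot on the source, replicate the snapshot, keep a limited history on the target, makes the recovery granularity limitation concrete. All names and the retention count here are illustrative assumptions, not any product's behavior.

```python
def near_cdp_cycle(source_state, target_history, keep=24):
    """One scheduled cycle: snapshot the source and 'replicate' it,
    retaining only the most recent `keep` snapshots on the target."""
    snapshot = dict(source_state)     # instant point-in-time copy
    target_history.append(snapshot)   # replication of the snapshot
    del target_history[:-keep]        # age out old snapshots
    return target_history

history = []
near_cdp_cycle({"file.txt": "v1"}, history)            # hourly cycle 1
near_cdp_cycle({"file.txt": "v2 (fat-fingered)"}, history)  # hourly cycle 2

# Recovery granularity equals the snapshot interval: we can get v1 back,
# but any change made *between* the two snapshots is unrecoverable.
print(history[-2]["file.txt"])   # → v1
```

This is exactly the gap the text describes: the corruption replicated in cycle 2 can only be rolled back to the previous snapshot, not to the instant before the mistake.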

8.5.7. Continuous Data Protection Systems

A true continuous data protection (CDP) system is fundamentally an asynchronous replication-based backup system that doesn't overwrite the target with the most recent data. The software is continuously running, and every time a file changes, the new bytes in that file are sent to the backup server within seconds or minutes. Unlike replication, however, a continuous data protection system stores the changes in a log instead of overwriting the target system with the most recent blocks; it is therefore able to roll back any changes at any time.

Different CDP products transfer data to the backup server in different ways. Some transfer changed blocks immediately, while others batch up changed blocks and send them every few minutes. They also differ in how they do recoveries. Some products do quick restores by restoring only the blocks that have changed since the point in time you are recovering from; others recover in a more traditional manner, recovering the entire file or filesystem that you asked to be recovered. Obviously, the first method allows you to meet much more aggressive RTOs and RPOs than the second method.
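The defining mechanism of true CDP, appending every write to a time-stamped journal instead of overwriting the previous copy, can be sketched briefly. This is a conceptual illustration under assumed names; real products journal at the block layer with far more machinery.

```python
journal = []   # (timestamp, block_offset, new_contents), appended in time order

def record_write(ts, offset, block):
    """Every write on the protected system is journaled, never discarded."""
    journal.append((ts, offset, block))

def state_at(ts, size):
    """Rebuild the protected volume exactly as it looked at time `ts`
    by replaying only the writes that happened at or before `ts`."""
    blocks = [None] * size
    for t, offset, block in journal:   # journal is already time-ordered
        if t > ts:
            break                      # ignore writes after the recovery point
        blocks[offset] = block
    return blocks

record_write(1, 0, "good data")
record_write(5, 0, "fat-fingered!")
print(state_at(4, 1))   # → ['good data']      (just before the mistake)
print(state_at(6, 1))   # → ['fat-fingered!']  (after it)
```

Because nothing is overwritten, any point in time between the first and last journal entry is a valid recovery point, which is the infinitely granular RPO discussed below.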

Either It's Continuous or It's Not!

Some vendors are referring to their near-CDP products as CDP. They're doing this to ride the market momentum that CDP has built. Some even argue that they're actually continuous because they are continually replicating. I heard one vendor say, "We're continuously copying all the data. We're just not keeping it all!" It reminds me of the Seinfeld episode on rental cars. "Oh, you're good at taking reservations... You're just not so good at holding reservations. And that's really the important part...the holding."

Yes, they're replicating continuously. That means if you fat-finger a document, your mistake could be immediately replicated onto the backup. The only backup that you will have is the last snapshot. That is not continuous protection; it's replication with snapshots, also referred to as near-CDP. A truly continuous product would restore your file right up to the point when you fat-fingered it.

They want to differentiate themselves from the "old" way of doing backups and draw attention to the fact that they're making these snapshots throughout the day, usually hourly. That's great; just don't call it continuous. It doesn't matter whether the snapshots are being done once an hour, once a minute, or even once a second. Each of those would be called a time period, and period is an antonym of continuous in the thesaurus I use.

You either save every change, or you don't. If you're taking snapshots, you're not saving every change. It's as simple as that.

This is not to say that I am not a fan of near-CDP products. Some of the coolest data protection things I've ever done have been with snapshots and replication. I also think that most people's requirements would easily be met by an hourly snapshot. I just don't want these products calling themselves CDP products.

It's like a Fibre Channel array calling itself NAS because it is storage attached to a network. Come on, people!


A CDP system has an unnoticeable backup window because it's copying only changed bytes as they change throughout the day. If they support the block-level recoveries discussed earlier, they also have incredibly fast RTOs. They likewise have infinitely granular RPOs because they can recover any file or filesystem to any point in time. This means that they can meet any type of synchronicity requirement, as they can recover 1, 10, or 100 systems to any synchronized point in time that you would like.

Different CDP products also back up different things. Some are filesystem-based, enabling you to back up and recover any files within that filesystem. Others are database-centric, providing CDP functionality only to a particular database, such as Exchange or SQL Server.

Unlike traditional backup products, file-based CDP products do not provide interfaces for your database applications; their vendors believe such interfaces are unnecessary. These vendors say that they copy blocks to the backup destination in the same order that they are changed on the client, so they can put the files back to literally any point in time that you want. Restarting your database after a CDP recovery causes it to go into the same mode that it would go into if the server were to crash. It examines the datafiles, figures out what's inconsistent, rolls backward or forward any necessary transactions or blocks, and your database is up. (By the way, if this crash-recovery process didn't work, your database vendor would be out of business. Servers crash, and databases have to be prepared for that.) If the CDP product puts the blocks back in the exact order in which they were changed, the database should be able to recover from any point in time. The vendors also say that if for some extreme reason the database can't perform crash recovery from the 12:03:57:01 p.m. image, you can recover to 12:03:57:00 or 12:03:56:59. Some products can even present a logical unit number (LUN) or volume to your database that it can mount and test before you do the recovery.

Your database vendor may have a different opinion about CDP. They may feel that if you're not using their supported backup method, you shouldn't call them for support if something goes wrong. If you're considering a CDP product to back up your database, you should have that conversation with your database vendor and then make your own decision. In addition, your DBAs have to be sold on CDP as well. Some may think it's revolutionary; others will think it's scary. If you like the idea, keep pushing both your database vendor and your DBAs to consider it. Times change. It wasn't that long ago that Oracle didn't support NAS and snapshots; now it loves them.

8.5.8. Remote Office Backup

Figure 8-3 shows a remote office backing up to a central office using de-duplication, near-CDP, or CDP software. If the clients on the left of the drawing are too large to meet their RTO when recovering from the central office, you can back up to a local recovery server that is used to facilitate nondisaster major recoveries. That server then replicates its backups to a centralized server.

Figure 8-3. Backing up a remote office





Backup & Recovery: Inexpensive Backup Solutions for Open Systems
ISBN: 0596102461
Year: 2006
Pages: 237
