Section 8.16. Protection of the Backup Index | Backup & Recovery: Inexpensive Backup Solutions for Open Systems

8.16. Protection of the Backup Index

The backup database, catalog, or index keeps track of which files were backed up to which volume. Since the backup system can't restore anything without this index, it becomes the single most important database in your environment. It is also the single point of failure in any backup system. As mentioned earlier, even if a volume is made with a format that is readable by a native utility, you still need the index to know what's on it. The backup index is the greatest invention since someone created volume labels, but if it goes bad, you are out of luck.

I Followed the Procedure

I had an experience at a gaming company that was a complete Windows shop, ran about 10,000 slot machines, and backed everything up with a commercial backup software product. Because of a known but unfixable firmware issue with the library's Fibre Channel cards, the library would lose a drive every now and then and corrupt the robot database. Early in the project, someone had written a document detailing a fix for this error that involved uninstalling and reinstalling the backup software. Unfortunately, they left out a step: saving a copy of the catalog. The procedure was followed twice within the datacenter's first year of operation before anyone caught the omission. Luckily, the restores we needed could be run from a disk-based backup taken outside of our commercial backup software.

Brian Sakovitch

Backup indexes usually are located on the central backup server, but they can be spread out to what are sometimes called media servers or device servers. (A media server or device server is a server that is allowed to have backup devices.) One of the first questions you might ask is, "How big will this thing get?" The typical answer is 0.5 percent to 1 percent of the amount of data that is being backed up. That answer is very misleading, completely wrong, and totally irrelevant. The total size of the data that is being backed up has absolutely nothing to do with the database size. Let me state that again.

The total size of the data that is being backed up has absolutely nothing to do with the size of the backup database. It is the number of files being backed up, not their size, that determines the size of the database.
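Since file count is the number that matters, it is worth measuring it directly. The following is a minimal sketch, assuming a POSIX-style filesystem you can walk; the function name `count_files` is hypothetical. It counts files (ignoring their sizes) and, as a rough proxy for daily volatility, counts how many were modified in the last 24 hours. It looks only at modification time, so it approximates what an incremental backup would pick up rather than matching any particular product's change-detection rules.

```python
import os
import time


def count_files(root, since_seconds=24 * 60 * 60):
    """Walk `root` and return (total_files, recently_changed_files).

    Hypothetical helper. File sizes are deliberately ignored: only the
    number of files matters for index sizing. A file counts as
    "recently changed" if its modification time is within the last
    `since_seconds` seconds (an approximation of incremental selection).
    """
    cutoff = time.time() - since_seconds
    total = 0
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: do not follow symbolic links.
                mtime = os.lstat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            total += 1
            if mtime >= cutoff:
                changed += 1
    return total, changed
```

Dividing the second number by the first, and repeating the measurement for a week or so, gives a real volatility rate to use in place of a guess.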

Each file that is backed up becomes a record in the index. That record will be the same size, regardless of how big the file is that was backed up.[*] The appropriate question, therefore, is, "How many bytes does each new file add to the index (a) the first time it is backed up and (b) any additional times it is backed up during incremental backups?" Multiply the first number by the number of files that are being backed up. This will show how big the index can get from one full backup. Multiply that number by the number of full backups the system is required to keep online. Using a daily volatility rate of 2 to 5 percent,[†] estimate how big the index will grow from each incremental backup. Multiply that number by the number of incremental backups that the system is required to keep online. Add that to the first number, then multiply by 2. The result will be a pretty realistic, albeit slightly exaggerated, estimate of how big the backup index will get.

[*] Some products do use a variable-length record, so things like the length of the pathname can slightly affect the size of the record, but the size of the file still has no bearing.

[†] This is actually a huge volatility rate, but most environments don't have any data on the number of files that change each day. Even in environments that monitor their backup software, most reports show only how much data was backed up, not how many files were backed up.
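The estimating procedure above is simple arithmetic, so it can be written down as a small function. This is a sketch of the steps as described, with a hypothetical function name; the per-record byte counts you pass in are assumptions you must supply, ideally measured by backing up a known number of files and observing how much your product's index grows. Note that the total size of the data is not a parameter at all.

```python
def estimate_index_bytes(
    num_files,
    bytes_first_backup,
    bytes_repeat_backup,
    fulls_kept,
    incrementals_kept,
    daily_volatility=0.05,
):
    """Rough, deliberately padded estimate of backup index size in bytes.

    Hypothetical helper following the procedure in the text. Data size
    is intentionally not an input: only the number of files matters.
    """
    # Growth from one full backup: every file gets a record.
    one_full = num_files * bytes_first_backup
    all_fulls = one_full * fulls_kept

    # Growth from one incremental: only the changed files get records.
    one_incremental = num_files * daily_volatility * bytes_repeat_backup
    all_incrementals = one_incremental * incrementals_kept

    # Add the two figures, then double the result as a safety margin.
    return round((all_fulls + all_incrementals) * 2)
```

With made-up inputs of one million files, 200 bytes per first-time record, 100 bytes per repeat record, 4 fulls and 24 incrementals kept online, and the default 5 percent volatility, `estimate_index_bytes(1_000_000, 200, 100, fulls_kept=4, incrementals_kept=24)` returns 1,840,000,000 bytes, or roughly 1.8 GB.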

Managing the growth of the index is also a big issue. Whatever database format a product uses, one of its index files may grow larger than the maximum file size the filesystem permits. If that happens, the index may be corrupted immediately. The backup product should have some method of dealing with this problem. The index as a whole may also grow larger than the largest filesystem allowed, so the product should be able to spread the index across multiple filesystems.
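One common way to handle both problems is to split the index into fixed-size segments. The sketch below is hypothetical (the class name, the `index.NNNNNN` file naming, and the round-robin placement are all illustrative assumptions, not any product's design): it starts a new segment file before a write would push the current one past a size cap, and it places successive segments in a list of directories that could each live on a different filesystem.

```python
import os


class SegmentedIndexWriter:
    """Append index records to a series of capped segment files.

    Hypothetical sketch. When the current segment would exceed
    `max_segment_bytes`, a new segment is started in the next directory
    of `directories` (round-robin), so no single file reaches the
    filesystem's file-size limit and the index can span filesystems.
    """

    def __init__(self, directories, max_segment_bytes):
        self.directories = list(directories)
        self.max_segment_bytes = max_segment_bytes
        self.segment_number = 0
        self.current_size = 0
        self.paths = [self._segment_path(0)]

    def _segment_path(self, n):
        directory = self.directories[n % len(self.directories)]
        return os.path.join(directory, f"index.{n:06d}")

    def append(self, record: bytes):
        if len(record) > self.max_segment_bytes:
            raise ValueError("record is larger than a whole segment")
        if self.current_size + len(record) > self.max_segment_bytes:
            # Roll over before this write would exceed the cap.
            self.segment_number += 1
            self.current_size = 0
            self.paths.append(self._segment_path(self.segment_number))
        # Reopening per record keeps the sketch simple, not fast.
        with open(self.paths[-1], "ab") as f:
            f.write(record)
        self.current_size += len(record)
```

A real product would also need a way to find which segment holds a given record; this sketch shows only the size-capping and placement side.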

Just as the volumes should be platform-independent, so should the backup index. You should be able to restore it to any system on which the server software runs and continue working. Some products' indexes are completely platform-independent, and some aren't. Some of those that aren't at least provide a utility to move the index to other platforms. One of the best tests of this is to attempt to recover a Unix server's backup index to a Windows server.
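Much of what makes an index file portable is that every field has an explicit width and byte order, rather than whatever the server's CPU and compiler happen to use. As an illustration only, here is a hypothetical record layout encoded with Python's `struct` module in network (big-endian) byte order, with the pathname stored as length-prefixed UTF-8. It also shows the variable-length case from the footnote: the pathname changes the record's length, and the file's size appears nowhere.

```python
import struct

# Hypothetical record layout, fixed-width and big-endian ("!"),
# so it decodes identically on any CPU or operating system:
#   8 bytes  backup time (seconds since the epoch, unsigned)
#   4 bytes  volume ID (unsigned)
#   8 bytes  offset of the file on the volume (unsigned)
#   2 bytes  length of the UTF-8 pathname that follows
_HEADER = struct.Struct("!QIQH")


def encode_record(backup_time, volume_id, offset, path):
    """Serialize one index record into portable bytes."""
    raw_path = path.encode("utf-8")
    return _HEADER.pack(backup_time, volume_id, offset, len(raw_path)) + raw_path


def decode_record(data):
    """Inverse of encode_record."""
    backup_time, volume_id, offset, path_len = _HEADER.unpack_from(data)
    start = _HEADER.size
    path = data[start:start + path_len].decode("utf-8")
    return backup_time, volume_id, offset, path
```

Each record is 22 bytes plus the length of its pathname, whether the file it describes is 1 KB or 1 TB.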

Before committing to buying a commercial backup product, test its index restore procedure. Some products can restore the index in a single step, while others require 20 pages of steps. Once you actually purchase a product, test that procedure again. Then test it on a regular basis so that you never hear yourself saying, "My whole world just crashed, including my backup server. Now what am I supposed to do again?"
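Part of any index restore test is confirming that what came back matches what was backed up. The sketch below is an assumption about how you might check that, not a feature of any product: it compares SHA-256 checksums of every file in a quiesced copy of the index directory against a test-restored copy. Because a live index changes with every backup, compare against a copy taken at the same moment as the index backup. Matching bytes are necessary but not sufficient; the real test is still whether the backup server can browse and restore using the restored index.

```python
import hashlib
import os


def tree_checksums(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MB chunks so large index files fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def compare_index_copies(original_dir, restored_dir):
    """Return (missing, extra, different) lists of relative paths."""
    orig = tree_checksums(original_dir)
    rest = tree_checksums(restored_dir)
    missing = sorted(set(orig) - set(rest))
    extra = sorted(set(rest) - set(orig))
    different = sorted(p for p in set(orig) & set(rest) if orig[p] != rest[p])
    return missing, extra, different
```

Three empty lists mean the restored copy is byte-for-byte identical to the original.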

A couple of minor (but nice) features are also helpful. The first is the ability to change a client's name within the index. If a backup client's hostname changes and the backup product does not support this feature, there are only two choices. The first is to give up all backup history for that host. The second is to keep the old client defined so its history stays available and pay for another license, because the software will recognize the new hostname as a new client.
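Conceptually, a rename just re-keys a client's history. The sketch below uses a hypothetical in-memory catalog (a dict mapping each client hostname to its list of backup records) purely to show the idea; a real product would do this inside its own database.

```python
def rename_client(index, old_name, new_name):
    """Move all backup history from old_name to new_name, in place.

    `index` is a hypothetical catalog: a dict mapping a client hostname
    to the list of backup records for that client.
    """
    if old_name not in index:
        raise KeyError(f"unknown client: {old_name}")
    if new_name in index:
        # Refuse to silently merge two clients' histories.
        raise ValueError(f"client already exists: {new_name}")
    index[new_name] = index.pop(old_name)
```

Afterward, restores requested under the new hostname can see everything backed up under the old one, and the product still counts a single client.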

Another very nice feature, seen as essential by some, is the ability to reread a volume back into the index. Suppose a volume has been set aside and has now expired out of the index completely. What if the only backup of a file that you need is on that volume? What if you don't know whether it's on that volume? Some products can perform the restore without having to reread the entire volume, while others can read the volume right back into the index, making it appear as if it were just backed up. Some products are not able to reread that volume at all! One factor in a product's ability to read a volume like that is whether the vendor puts a copy of the index information onto the volume: after every backup, the portion of the backup index created by that backup is written to the volume. It's an extra step, but I think it's well worth it. It is possible to reread a backup volume without the index on it, but having the index there makes rereading the volume from scratch much easier.
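To make the idea concrete, here is a hypothetical sketch in which an ordinary file stands in for a volume and holds one backup. After writing the data, it appends that backup's index entries (as JSON) followed by a fixed 8-byte trailer recording the entries' length, so a later reread can locate the index without scanning the data. The format, names, and seek-from-end approach are assumptions for illustration; a real tape format would more likely write the index as its own tape file and read it sequentially.

```python
import json
import os
import struct

# Hypothetical trailer: length of the index segment, 8 bytes, big-endian.
_TRAILER = struct.Struct("!Q")


def write_volume(path, files):
    """Write `files` (name -> bytes) to a volume file, then append the
    index entries for this backup and a trailer giving their length."""
    entries = []
    with open(path, "wb") as vol:
        for name, data in files.items():
            entries.append({"name": name, "offset": vol.tell(), "length": len(data)})
            vol.write(data)
        segment = json.dumps(entries).encode("utf-8")
        vol.write(segment)
        vol.write(_TRAILER.pack(len(segment)))


def reread_volume(path, volume_id, catalog):
    """Rebuild catalog entries from the index copy stored on the volume."""
    with open(path, "rb") as vol:
        vol.seek(-_TRAILER.size, os.SEEK_END)
        (segment_len,) = _TRAILER.unpack(vol.read(_TRAILER.size))
        vol.seek(-(_TRAILER.size + segment_len), os.SEEK_END)
        entries = json.loads(vol.read(segment_len))
    for e in entries:
        # Each file again points at its location on this volume.
        catalog[e["name"]] = (volume_id, e["offset"], e["length"])
    return catalog
```

Rereading then costs one small read at the end of the volume instead of a pass over all of its data.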

The importance of the backup index, and your familiarity with how it works, cannot be overemphasized. It is the lifeblood of any backup system and should be treated like gold.

Index Horror Stories

Most of my backup horror stories stem from problems with the backup index. My first bad experience involved backing up a large NFS server that was used to store home pages for a large online service's web servers. There were more than three million small files, which made the index so large that it would often become corrupted. Even after distributing the index over multiple slave servers, the size would still cause index corruption. As if regular index corruption weren't enough, we often did not catch the corruption until several days later, when the backup product started acting stranger than usual. Since we were foolish enough to keep only two days' worth of index backups, we could not recover a reliable index. Eventually, we ended up dumping the index into ASCII files daily and then backing up those files from a different server with the regular retention schedules.
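A plain-text dump like that is easy to produce once you can read the index's entries. The sketch below assumes a hypothetical catalog that maps each file name to a single (volume, offset, length) location (a real index keeps many versions per file) and writes it as tab-separated text with Python's `csv` module, which quotes any names containing tabs or newlines. The resulting file can then be backed up under normal retention, independently of the product's own database format.

```python
import csv


def dump_index_to_text(catalog, path):
    """Write catalog entries (name -> (volume_id, offset, length)) as
    tab-separated text, sorted by name, for backing up elsewhere."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["name", "volume", "offset", "length"])
        for name in sorted(catalog):
            volume_id, offset, length = catalog[name]
            writer.writerow([name, volume_id, offset, length])


def load_index_from_text(path):
    """Read a dump written by dump_index_to_text back into a catalog."""
    catalog = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for name, volume_id, offset, length in reader:
            catalog[name] = (volume_id, int(offset), int(length))
    return catalog
```

Keeping these dumps on the regular retention schedule means a corruption noticed days late can still be traced back to a good copy.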

My other index horror story comes from the same site. In another effort to keep our index small, we stored the index of a backup for only two weeks (even though the data was kept on a backup volume for two months). I had one user who on multiple occasions deleted data after he was done with it, only to determine two and a half weeks later that he really wasn't done with it. Since the records containing those files had expired out of the index, every volume that might have the data had to be reread. One of those restores had me rereading more than 40 DLT 4000 tapes (in a jukebox that held only 28 tapes) while still trying to do regular backups. It took me more than three long days to read the tapes; even then I was not able to retrieve all of the data. Fortunately for my job, it was not mission-critical data.

Bryce Wade