8.12. Support of a Standard or Custom Backup Format

A custom backup format is one that is readable only by a particular commercial backup utility.[||] Backups made in a custom backup format cannot be read by native backup utilities such as tar, cpio, or dump. Some backup products use a custom format that is published or freely available. Although their backups can't be read with native utilities, a programmer could, in theory, write a program that reads them. Other backup products are completely proprietary. In fact, some are so proprietary that even the product itself can't read a volume if the indexes for that volume have expired or been deleted. A standard backup format, by contrast, is one that is readable by a standard utility. There are two schools of thought on this subject.

[||] The word custom is more appropriate than proprietary because proprietary implies that you are not allowed to know the format.

There are those who feel that using proprietary or custom backup formats is dangerous. If the backup volumes can't be read by native utilities, what do you do when the commercial backup product is broken? They prefer to use utilities that back up using industry-standard backup formats such as cpio or tar. In their view, standard formats provide a sense of security that is just not possible with a custom backup format. Companies often switch backup products, and when that happens, their old volumes are not readable by the new product. If the volumes were readable by standard utilities, however, they could still be used for restores. With custom formats, what is commonly done in this scenario is to keep the old backup system running for restores only.

Vendors are on both sides of this debate. What follows is my best effort to explain the pros and cons of each side. You have to choose which pros and cons are most important to you.


Some say that old backup formats are just that: old. They served their purpose, but it is time to move on to more sophisticated utilities. There are two problems with native utilities. The first problem is that they are always changing. The dump command is filesystem-specific, and different versions of dump are incompatible. The tar and cpio commands also have changed their formats over time and are not always compatible between different operating systems. (ntbackup has remained the same over the years.) The second problem is that native utilities have significant limitations, such as pathname lengths and the inability to handle open files. The most significant limitation, though, is their inability to generate or receive multiple backup streams.

One of the open-source projects covered in this book, Bacula, chose to create its own custom format using this same logic.


8.12.1. Standard Backup Formats

On the other hand, longtime system administrators have learned how to use ditto, dump, ntbackup, tar, and cpio. There is a familiarity there that just isn't going to be possible with a unique format. Also, some people have been burned by commercial utilities that have come and gone. There have even been a few that have changed their own formats, making their old volumes unreadable by a new version of their own software! This means that concerns about custom and proprietary formats are valid. Since restricting yourself to native utilities significantly reduces the number of available products, make sure that you properly examine the limitations of these native utilities before doing so. The limitations of each of these utilities are discussed in the following sections.

8.12.1.1. The dump utility

The dump command is a Berkeley contribution and as such is not always included on pure System V Release 4 systems. (It sure surprised me the first time that happened!) dump backs up a filesystem via the raw device, not through the filesystem. It therefore must know the structure of the filesystem that it is backing up, so each new type of filesystem requires a new version of dump. Also, a dump backup of one filesystem type will not necessarily be readable by the restore utility of another filesystem type. There have been a number of new filesystem types over the years. Each new filesystem usually comes with its own version of dump, and many of the newer versions are not backward compatible with the older versions. (See the section "Different Versions of dump" in Chapter 3.) Some new filesystem types don't come with a traditional dump command at all.

A backup tool should not rely on a native utility that changes from filesystem to filesystem. Backup volumes written by dump are not compatible between platforms, or even between filesystem types on the same platform, as with (efs)dump and xfsdump on SGI. Also, dump is not always available.

8.12.1.2. The tar, ditto, and cpio utilities

Unlike dump, the ditto, tar, and cpio utilities access files through the filesystem just as a user does. Since they are not filesystem-dependent, they change much less over time than dump does. This may be their greatest advantage. However, there are different versions of ditto, tar, and cpio for each platform, and not all of them are compatible. It also should be noted that most of the commercial backup products that write in a tar- or cpio-compatible format do not use the actual tar or cpio command; they have their own command that writes in a format that is readable by tar or cpio. (This is the way ditto works.) That way, the commercial product can overcome some of tar's and cpio's limitations, such as cpio's 255-character limit and tar's 100-character limit on pathnames. (GNU tar also has overcome some of these limitations; it is covered in Chapter 3.)
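
The pathname limit in the tar format can be observed with Python's standard tarfile module. What follows is only a minimal sketch and is not taken from any backup product. The helper name fits_in_format is hypothetical. Note also that Python's USTAR_FORMAT is the POSIX ustar variant, which stores 100 bytes of name plus an optional directory prefix, so the example assumes a single long path component that cannot be split into prefix and name.

```python
import tarfile


def fits_in_format(path: str, fmt: int) -> bool:
    """Return True if a tar header for `path` can be built in format `fmt`."""
    try:
        # tobuf() builds the raw header block(s) without touching the disk.
        tarfile.TarInfo(path).tobuf(format=fmt)
        return True
    except ValueError:
        # tarfile raises ValueError when the name does not fit the format.
        return False


# One 150-character component: too long for the 100-byte ustar name field.
long_name = "x" * 150

for label, fmt in [
    ("ustar", tarfile.USTAR_FORMAT),
    ("gnu", tarfile.GNU_FORMAT),
    ("pax", tarfile.PAX_FORMAT),
]:
    print(label, fits_in_format(long_name, fmt))
```

The GNU and pax formats accept the long name because they add extra header records for it, which is the same general idea a commercial product can use when it writes a tar-compatible stream with its own command.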

8.12.2. Custom Backup Formats

As stated earlier, the camps are divided into those backup products that use a standard format and those products that do not. Further, products that do not use a standard format should themselves be divided into two groups: those that publish their format and those that do not. Theoretically, a programmer who knows the format of the volume could write a program to read it. Most products depend quite heavily on a database that tracks the location of each file or piece of file on the volume. If the database is corrupted or lost, they may not be able to read the volume at all.

Be sure to read the section "Content-awareness" in Chapter 9.


Should someone use a product that has a custom backup format? Before purchasing such a product, be sure to ask a few questions. Is the format of this volume completely proprietary, or is there a document explaining how it was written? Is there a standalone utility that allows me to read these volumes even if the catalog is down? If this product made a volume but then later did not know what was on it, could it reread the volume and determine the file sets that went to that volume?

Some backup programs that use custom formats come with a standalone utility that can read the volume without the use of the backup database, providing essentially the same functionality as a native command. This is a beautiful thing, but it's harder to come by than you might imagine.
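
To make the idea of reading a volume without the backup database concrete, here is a toy sketch in Python. The record layout is entirely hypothetical and does not describe any real product's format: each record is assumed to carry a small header (name length and data length) ahead of the name and the data, so a standalone reader can walk the volume and rebuild an index from the volume alone.

```python
import struct

# Hypothetical record header: 2-byte name length, 8-byte data length,
# both big-endian. This layout is an assumption made for illustration.
HEADER = struct.Struct(">HQ")


def write_record(name: str, data: bytes) -> bytes:
    """Serialize one file as header + name + data."""
    encoded = name.encode("utf-8")
    return HEADER.pack(len(encoded), len(data)) + encoded + data


def rebuild_index(volume: bytes) -> dict:
    """Walk the volume and recover name -> (data offset, data length)."""
    index = {}
    pos = 0
    while pos < len(volume):
        name_len, data_len = HEADER.unpack_from(volume, pos)
        pos += HEADER.size
        name = volume[pos:pos + name_len].decode("utf-8")
        pos += name_len
        index[name] = (pos, data_len)
        pos += data_len  # skip over the file data to the next record
    return index


volume = write_record("etc/hosts", b"127.0.0.1 localhost\n")
volume += write_record("etc/motd", b"hello\n")
print(sorted(rebuild_index(volume)))
```

If the same data were written as raw bytes with the offsets kept only in an external database, losing that database would leave no way to tell where one file ends and the next begins. That is the situation the questions above are meant to uncover.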

8.12.2.1. What happened to SIDF?

Some readers may remember the System Independent Data Format (SIDF) that was first proposed back in 1993 as an international volume-interchange format. It was used on a limited basis by a small number of backup products. If a product followed this format completely, not only would it have completely platform-independent volumes, but its volumes would be readable by other backup software products. The format barely gained acceptance. Any questions on the status of SIDF are answered by going to http://www.sidf.org: "www.sidf.org not found. Please check the name and try again."

8.12.3. A Reality Check

Suppose you had a bunch of volumes that were written in tar format, and your backup software has been keeping track of them all. If that software is not functioning properly, how will you know what is on the hundreds, or even thousands, of backup volumes that you have? I suppose you could do a tar tvf on all of them and create your own "minicatalog." That's not an easy task. Suppose you had 500 or so tapes. It would take you more than a month to read them all. This is just to get a table of contents of these volumes.
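
The "minicatalog" idea can be sketched with Python's standard tarfile module, which reads the same format that tar tvf lists. This is only an illustration under stated assumptions: the function names and the volume label TAPE001 are hypothetical, and an in-memory buffer stands in for a tape device. The stream mode "r|*" is used because a tape can only be read sequentially.

```python
import io
import tarfile


def make_demo_volume(files: dict) -> io.BytesIO:
    """Build a small in-memory tar archive that stands in for a tape."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def catalog_volume(label: str, fileobj) -> list:
    """Return (volume label, member name, size) rows for one tar volume."""
    rows = []
    # "r|*" reads the archive as a forward-only stream, as a tape would be.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            rows.append((label, member.name, member.size))
    return rows


demo = make_demo_volume({"etc/hosts": b"127.0.0.1 localhost\n"})
for row in catalog_volume("TAPE001", demo):
    print(row)
```

Even with a script like this, every volume still has to be mounted and read from beginning to end, which is where the time goes.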

A much better solution would be to get a backup system that you trust. Learn how to check the database for inconsistencies. Run those checks every day, and if any inconsistencies are found that can't be fixed, recover the database back to the point in time before it became corrupted. If the backup software allows it, you can then have it reread any volumes that have been written to since then.




Backup & Recovery: Inexpensive Backup Solutions for Open Systems
ISBN: 0596102461
Year: 2006
Pages: 237
