23.4. How Do I Read This Volume?

If you're a system administrator for long enough, someone eventually will hand you a volume and ask "Can you read this?" She doesn't know what the format is, or where the volume came from, but she wants you to read it. Or you may have a very old backup volume that you wish you could read but can't. How do you handle this? How do you figure out what format a volume is? How do you read a volume that was written on a different machine? These are all questions answered in this section.

There are about 10 factors to consider when trying to read an unknown or foreign volume, half of which have to do with the hardware itself: whether or not it is compatible. The other half have to do with the format of the data. If you are having trouble reading a volume, it could be caused by one or more of these problems.

23.4.1. Prepare in Advance

If you've just been handed a volume and need to read it right now, ignore this paragraph. If you work in a heterogeneous environment and might be reading volumes on different types of platforms, read it carefully now.

Reading a volume on a platform other than that on which it was created is always difficult. In fact, except for circumstances like a bad backup drive or data corruption, the only sure way to read a volume easily every time is to read it on the machine that made it. Do not assume that you will be able to read a volume on another system because the volume is the same size, because the operating system is the same, or even because the utility goes by the same name. In fact, don't assume anything.

If it is likely that you are eventually going to have to read a volume on another type of system or another type of drive, see if it works before you actually need to do it. Also, if you can keep one or two of the old systems and drives around, you will have something to use if the new system doesn't work. (I know of companies that have 10- or 15-year-old computers sitting around for just this purpose.)
If you test things up front, you might find out that you need to use a special option to make a backup that can be read on other platforms. You may find that it doesn't work at all. Of course, finding that out now is a lot better than finding it out two years from now when you really, really, really need that volume!

23.4.2. Wrong Media Type

Many media types look similar but really are not. DLT, LTO, AIT, and other drives all have different generations of media that work in different generations of drives. If the volume is a tape, and its drive has a media recognition system (MRS), it may even spit the tape back out if it is the wrong type. Sometimes MRS is not enabled or not present, so you assume that the tape should work because it fits in the drive. Certain types of media are made to work in certain types of drives, and if you've got the wrong media type for the drive that you are using, the drive will not be able to read it. Sometimes this is not initially obvious because the drive reports media errors.

Problems involving incompatible media types sometimes can be corrected by using the newest drive that you have available. That is because many newer drives are able to read older tapes created with previous generations of drives. However, this is not always the case and can cause problems.

23.4.3. Bad or Dirty Drive or Tape

If the drive types and media types are the same but one drive cannot read the other drive's tapes, then the drive could be defective or just dirty. Try a cleaning tape, if one is available. If that does not work, the drive could be defective. It also is possible that the drive that wrote the tape was defective. A drive with misaligned heads, for example, may write a backup image that can't be read by a good drive. For this reason, when you are making a backup volume that is going to be stored for a long time, you should verify right away that it can be read in another drive.
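One way to make that verification concrete is to checksum the volume as it is read back in each drive and compare the results. The following is only a minimal sketch, assuming a POSIX shell with dd and cksum; the function names volume_sum and same_volume are made up for illustration, and the default block size of 32768 is an assumption that you should replace with the block size the volume was actually written with.

```shell
#!/bin/sh
# Sketch only. The function names are hypothetical, and 32768 is just an
# assumed default block size; use the block size the volume was written with.

# volume_sum DEVICE_OR_FILE [BLOCK_SIZE]
# Read the whole volume (or an image file) and print its CRC and byte count.
volume_sum() {
    dd if="$1" bs="${2:-32768}" 2>/dev/null | cksum
}

# same_volume A B [BLOCK_SIZE]
# Succeed only if both reads produce the same checksum and length.
same_volume() {
    [ "$(volume_sum "$1" "${3:-}")" = "$(volume_sum "$2" "${3:-}")" ]
}
```

With a single tape you can't read from two drives at once, so in practice you would save the output of volume_sum from the drive that wrote the tape, move the tape to the second drive, run volume_sum again, and compare the two lines. A mismatch, or an I/O error partway through, tells you now rather than in two years that the second drive can't read what the first one wrote.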
Although less common, there are also tape cleaning machines. The machines look like tape drives. They load the tape and run the entire tape through a clean and vacuum process. Sometimes when a tape is unreadable in any drive, cleaning the tape like this can allow the tape to be read. It would be handy to have one of these machines to prepare for such a scenario.

23.4.4. Different Drive Types

This is related to the media-types problem. Not all drives that look alike are alike. For example, not all tapes are labeled with the type of drive they should go into. Not all drives that use hardware compression are labeled as such, either. The only way to know for sure is to check the model numbers of the two different drives. If they are from different manufacturers, you may have to consult their web pages or even call them to make sure that the two drive types are compatible.

23.4.5. Wrong Compression Setting/Type

Usually, drives of the same type use the same kind of compression. However, some value-added resellers (VARs) sell drives that have been enhanced with a proprietary compression algorithm. They can get more compression with their algorithm, thus allowing the drive to write faster and store more. If all of your drives are from the same manufacturer, this may not be a problem, as long as the vendor stays in business! But if all your drives aren't from the same manufacturer, you should consider using an alternate compression setting if they have one, such as IDRC or DCLZ. Again, this goes back to proper planning.

23.4.6. The Little Endian That Couldn't

Differences exist among machines of different architectures that may make moving volumes between them impossible. These differences include whether the machine is big-endian, little-endian, ones complement, or twos complement. For example, Intel-based machines are little-endian, and many RISC-based machines are big-endian. Moving volumes between these two types of platforms may be impossible.
Most big Unix machines are big-endian, but Intel x86 machines and older Digital machines are little-endian (see Table 23-1). That means that if you are trying to read a backup that was written on an NCR 3b2 (a big-endian machine), and you are using a backup drive on an NCR Intel SVr4 (little-endian) box, you may have a problem. There is also the issue of ones-complement and twos-complement machines, which are also different architectures. It is beyond the scope of this book to explain what is meant by big-endian, little-endian, ones complement, and twos complement. The purpose of this section is merely to point out that such differences exist and that if you have a volume written on one platform and are trying to read it on another, you may be running into this problem. Usually, the only way to solve it is to read the volume on its original platform.
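If you aren't sure which camp a given machine falls into, you can ask it. The following is a minimal sketch, assuming a POSIX shell and an od that supports the -t u2 type specifier; host_byte_order is a name I made up for illustration. It writes the two bytes 0x01 0x00 and lets od interpret them as one 2-byte integer in the machine's native byte order.

```shell
#!/bin/sh
# Sketch only; host_byte_order is a hypothetical helper name.

# Print "little-endian" or "big-endian" for the machine running this shell.
host_byte_order() {
    # Bytes 0x01 0x00 read as one unsigned 2-byte integer in host order:
    # a little-endian host sees 1, a big-endian host sees 256.
    v=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case "$v" in
        1)   echo little-endian ;;
        256) echo big-endian ;;
        *)   echo unknown; return 1 ;;
    esac
}
```

Run it on both the machine that wrote the volume (if you still have it) and the machine you are trying to read it on. If the answers differ and the backup format is not byte-order independent, you have probably found your problem.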
Most backup formats use an "endian-independent" format, which means that their header and data can be read on any machine that supports that format. Usually, tar and cpio can do this, especially if you use the GNU versions. I have read GNU tar volumes on an Intel Unix or Linux (i.e., little-endian) box that were written on HPs and Suns (i.e., big-endian machines). For example, it is quite common to ftp tar files from a Unix machine to a Windows machine, then use WinZip to read them. Again, your mileage may vary, and it helps if you test it out first.

Some people talk about reading a volume with dd and using its conv=swab feature to swap the byte order of a volume. This may make the header readable but may make the data itself worthless. This is because of different byte sizes (8 bits versus 16 bits) and other things that are beyond the scope of this book. Again, the only way to make sure that this is not preventing you from reading a volume is to make sure that you are reading the volume on the same architecture on which it was written.

23.4.7. Block Size (Tape Volumes Only)

Tape volumes are written in different block sizes, and you often need to know the block size of a tape before you can read it. This section describes how block sizes work, as well as how to determine your block size.

When a program reads or writes data to or from a device or memory, it is referred to as an I/O operation. How much data is transferred during that I/O operation is referred to as a block. Since the actual creation of each block consumes resources, a larger block usually results in faster I/O operations (i.e., faster backups). When an I/O operation writes data to a disk, the block size that was used for that operation does not affect how the data is physically recorded on the disk; it affects only the performance of the operation. However, when an I/O operation writes to a tape drive, each block of data becomes a tape block, and each tape block is separated by an interrecord gap.
This relationship is illustrated in Figure 23-1.

Figure 23-1. Tape blocks and interrecord gaps

All I/O operations that attempt to read from this tape must understand its block size, or they will be unsuccessful. If you use a different block size, three potential scenarios can occur:

Block size is a multiple of the original block size
    For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 2,048. This scenario is actually quite common and works just fine. Depending on a number of factors, the resulting read of the tape may be faster or slower than it would have been if it used the original block size. (Using a block size that is too large can actually slow down I/O operations.)

Block size is larger than the original block size (but not a multiple)
    For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 1,500. What happens here depends on your application, but most applications will return an I/O error. The read operation attempts to read a whole block of data, and when it reaches the end of the block that you told it to read, it does not find an interrecord gap. Most applications will complain and exit.

Block size is smaller than the original block size
    For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 512. This will almost always result in an I/O error. Again, the application attempts to read a block of 512 bytes, then looks for the interrecord gap. If it doesn't see it, it complains and exits.

Interrecord gaps actually take up space on the tape. If you use a block size that is too small, you will fill up a lot of your tape with these interrecord gaps, and the tape actually will hold less data.

Each tape drive on each server has an optimal block size that allows it to stream best. Your job is to find which block size gives you the best performance.
A block size that is too small decreases performance; a block size that is too large may decrease performance as well because the system may be paging or swapping to create that large block size. Some operating systems and platforms also limit the maximum block size.

23.4.8. Determine the Blocking Factor

Use the trick described in Chapter 3 in the section "Using dd to Determine the Block Size of a Tape" to determine your block size. If you're reading a tar or dump backup, you'll need to determine the blocking factor. If the backup utility is tar, the blocking factor usually is multiplied by 512. dump's blocking factor usually is multiplied by 1,024. Read the manpage for the command that you are using and determine the multiplier that it uses. Then, divide the block size by that multiplier. You now have your blocking factor.

For example, you read the tape with dd, and it says the block size is 32,768. The manpage for dump tells you that the blocking factor is multiplied by 1,024. If you divide 32,768 by 1,024, you will get a blocking factor of 32. You then can use this blocking factor with restore to read the tape.

23.4.9. AIX and Its 512-Byte Block Size

Some operating systems, such as AIX, allow you to hardcode the block size of a tape device. This means that no matter what block size you set with a backup utility, the device will always write using the hardcoded block size. During normal operations, most people set the block size to 0, allowing the device to write in any block size that you specify with your backup utility. (This is also known as variable block size.)

However, during certain operations, AIX automatically sets the block size to 512. This normally happens when performing a mksysb or sysback backup, and the reason this happens is that a block size of 512 makes the mksysb/sysback tape look like a disk. That way, the system can boot off the tape because it effectively looks like the root disk.
Most mksysb/sysback scripts set the block size back to 0 when they are done, but not all do so. You should check to make sure that your scripts do, to prevent you from unintentionally writing other tapes using this block size.

Why can't you read, on other systems, tapes that were written on AIX (with a block size of 512)? The reason is that AIX doesn't actually use a block size of 512. What AIX really does is write a block of 512 bytes and then pad it with 512 bytes of nulls. That means that it is really writing a block size of 1,024, and half of each block is being thrown away! Only the AIX tape drives understand this, which means that a tape written with a block size of 512 can be read only on another AIX system.

However, if you set the device's hardcoded block size to 0, you should have no problem on other systems, assuming the backup format is compatible. Setting it to 0 makes it work like every other tape drive. The block size you set with the backup utility is the block size the tape drive writes in. (If you want to check your AIX tape drive's block size now, start up smit and choose Devices, then Tape Drives, then Change Characteristics, and make sure that the block size of all your tape drives is set to 0!)

You can even set the block size of a device to 1,024 without causing a compatibility problem. Doing so will force the device to write using a block size of 1,024, regardless of what block size you specify with your backup utility. However, this is a "normal" block, unlike the unique type of block created by the 512-byte block size. Assuming that the backup format is compatible, you should be able to read such a tape on another platform. (I know of no reason why you would want to set the block size to 1,024, though.)

To set the block size of a device back to 0, run the following command:

    # chdev -l device_name -a block_size=0

23.4.10. Unknown Backup Format

Obviously, when you are handed a foreign volume, you have no idea what backup utility was used to make that volume. If this happens, start by finding out the block size; it will come in handy when trying to read an unknown format. Then, use that block size to try to read the volume using the various backup formats, such as tar, cpio, dump, and pax. I would try them in that order; foreign volumes are most likely going to be in tar format because it is the most interchangeable format.

One trick to finding the type of backup format is to take a block of data off of the volume and run the file command on it. This often will come back and say cpio or tar. If that happens, great! For example, if you used the block size-guessing command shown previously, you would have a file called /tmp/sizefile that you could use to determine the block size of the tape. If you haven't made this file, do so now, then enter this command:

    # file /tmp/sizefile

If it just says "data," you're out of luck. But you just might get lucky, especially if you download from the Internet a robust magic file and point file at it with the -m option:

    # file -m /etc/robust.magic /tmp/sizefile

In this case, file helps reveal the format for commands and utilities not native to the immediate platform.

23.4.11. Different Backup Format

Sometimes, two commands sound the same but really aren't. This can be as simple as incompatible versions of cpio, or at the worst, completely incompatible versions of dump. Format inconsistencies between tar and cpio usually can be overcome by the GNU versions because they automatically detect what format they are reading. However, if you are using an incompatible version of dump (such as xfsdump from IRIX), you are out of luck! You will need a system of that type to read the volume. Again, your mileage may vary. Make sure you test it up front.
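Two of the steps described above lend themselves to a few lines of shell: turning a block size into a blocking factor (Section 23.4.8), and checking the first block of a volume for a known signature when file is no help (Section 23.4.10). What follows is a minimal sketch, not a replacement for file. The function names blocking_factor and guess_format are made up for illustration, and the sketch assumes that the first block of the volume has already been copied to an ordinary file such as /tmp/sizefile. It recognizes only POSIX-style tar headers, which carry the string "ustar" at byte offset 257, and ASCII cpio headers, which begin with 070707 or 070701/070702. Very old tar archives, binary cpio, and dump have no such text signature, so "unknown" does not prove anything.

```shell
#!/bin/sh
# Sketch only. Both function names are hypothetical helpers.

# blocking_factor BLOCK_SIZE MULTIPLIER
# Divide the block size reported by dd by the utility's multiplier
# (usually 512 for tar and 1024 for dump; check your manpage).
blocking_factor() {
    if [ $(( $1 % $2 )) -ne 0 ]; then
        echo "block size $1 is not a multiple of $2" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# guess_format FILE
# Look for well-known text signatures in the first block of a volume.
guess_format() {
    # POSIX-style tar headers carry "ustar" at byte offset 257.
    magic=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | tr -d '\000')
    if [ "$magic" = "ustar" ]; then
        echo tar
        return 0
    fi
    # ASCII cpio headers start with 070707 (old) or 070701/070702 (new).
    magic=$(dd if="$1" bs=1 count=6 2>/dev/null | tr -d '\000')
    case "$magic" in
        070707|070701|070702) echo cpio; return 0 ;;
    esac
    echo unknown
    return 1
}
```

For the example in Section 23.4.8, blocking_factor 32768 1024 prints 32, which is the value you would hand to restore. Running guess_format /tmp/sizefile prints tar, cpio, or unknown; if it prints unknown, fall back to trying each utility in turn as described above.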
23.4.12. Damaged Volume

One of the most common questions I see on Usenet is, "I accidentally typed tar cvf when I meant to type tar xvf. Is there any way to read what's left on this volume?" The quick answer is no. Why is that? Each time a backup is written to a tape, an end-of-media (EOM) mark is made at the end of the backup. This mark tells the tape drive software, "There is no more data after this mark; no need to go any further." No matter what utility you try, it will always stop at the EOM mark because it thinks this is the last backup on the tape.

Of course, the tape could just be damaged or corrupted. One of the tricks I've seen used in this scenario is to use cat to read the corrupted tape:

    # cat device > /tmp/somefile

This just blindly reads the data into /tmp/somefile, so you can then try to read that file with tar, cpio, or dump.

23.4.13. Reading a "Flaky" Tape

One of the fun things about being a backup specialist is that everyone tells you their favorite backup and recovery horror stories. One day a friend told me that he was having a really hard time reading a particularly flaky tape. The system would read just so far into the tape and then quit with an I/O error. However, if he tried reading that same section of tape again, it would work! He really needed the data on this particular tape, so he refused to give up. He wrote a shell script that would read the tape until it got an error. Then it would rewind the tape, fast-forward (fsr) to where he got the error, and try again. This script ran for two or three days before he finally got what he needed. I had never heard of such dedication.

I told my friend Jim Donnellan that he had to let me put the shell script in the book. The shell script in Example 23-1 was called read-tape.sh and actually did the job. Maybe this script will come in handy for someone else.

Example 23-1. The read-tape.sh script
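The following is not Jim's script. It is only a minimal sketch of the approach described above (rewind, space forward to the record that failed, and try the read again), written under several assumptions that you should check against your own system: that TAPE names a no-rewind tape device (the default shown is hypothetical), that BS is the tape's block size, and that your mt accepts the -f option and the rewind and fsr commands. The helper names retry and read_record are made up for illustration.

```shell
#!/bin/sh
# Sketch only; this is NOT the original read-tape.sh.
# Assumed settings: a no-rewind tape device and the tape's block size.
TAPE=${TAPE:-/dev/nst0}
BS=${BS:-32768}

# retry MAX CMD [ARGS...]
# Run CMD until it succeeds, giving up after MAX attempts.
retry() {
    max=$1
    shift
    attempt=1
    until "$@"; do
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
    done
    return 0
}

# read_record N OUTFILE
# Rewind, skip N records, then read exactly one record into OUTFILE.
read_record() {
    mt -f "$TAPE" rewind || return 1
    if [ "$1" -gt 0 ]; then
        mt -f "$TAPE" fsr "$1" || return 1
    fi
    dd if="$TAPE" of="$2" bs="$BS" count=1 2>/dev/null
}
```

If a read died at, say, record 1234, something like retry 50 read_record 1234 /tmp/rec.1234 would keep rewinding and rereading that one record up to 50 times. Rewinding before every attempt is slow, which fits the story: a job like this can run for days.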
If you've got tips on how to read corrupted or damaged volumes, I want to hear them. If I use them in later editions of the book, I will credit your work! (I also will put any new ones I receive on the web site for everyone to use immediately.)

23.4.14. Multiple Partitions on a Tape

This one is more of a gotcha than anything else. Always remember that when a backup is sent to tape, it could have more than one partition on that tape. If you are reading an unknown tape, you might try issuing the following commands:

    # mt -t device rewind
    # mt -t device fsf 1

Then, try again to read this backup. If it fails with an I/O error, there are no more backups. (That's the EOM marker again.) If it doesn't fail, try the same commands that you tried at the beginning of the tape to read it. Do not assume that it is the same format as the first partition on the tape. Also understand that every time you issue a command to try to read the tape, you need to rewind it and fast-forward it again using the two preceding commands.

23.4.15. If at First You Don't Succeed...

Then perhaps failure is your style! That doesn't mean that you have to stop trying to read that volume. Remember that the early bird gets the worm, but the second mouse gets the cheese. The next time you're stuck with a volume you can't read, remember my friend Jim and his flaky tape.