23.4. How Do I Read This Volume?

If you're a system administrator for long enough, someone eventually will hand you a volume and ask, "Can you read this?" She doesn't know what the format is or where the volume came from, but she wants you to read it. Or you may have a very old backup volume that you wish you could read but can't. How do you handle this? How do you figure out what format a volume is? How do you read a volume that was written on a different machine? These are all questions answered in this section. There are about 10 factors to consider when trying to read an unknown or foreign volume, half of which have to do with the hardware itself: whether or not it is compatible. The other half have to do with the format of the data. If you are having trouble reading a volume, it could be caused by one or more of these problems.

23.4.1. Prepare in Advance

If you've just been handed a volume and need to read it right now, ignore this paragraph. If you work in a heterogeneous environment and might be reading volumes on different types of platforms, read it carefully now. Reading a volume on a platform other than that on which it was created is always difficult.

In fact, except for circumstances like a bad backup drive or data corruption, the only sure way to read a volume easily every time is to read it on the machine that made it. Do not assume that you will be able to read a volume on another system because the volume is the same size, because the operating system is the same, or even because the utility goes by the same name. In fact, don't assume anything.

If it is likely that you are eventually going to have to read a volume on another type of system or another type of drive, see if it works before you actually need to do it. Also, if you can keep one or two of the old systems and drives around, you will have something to use if the new system doesn't work. (I know of companies that have 10- or 15-year-old computers sitting around for just this purpose.) If you test things up front, you might find out that you need to use a special option to make a backup that can be read on other platforms. You may find that it doesn't work at all. Of course, finding that out now is a lot better than finding it out two years from now when you really, really, really need that volume!

23.4.2. Wrong Media Type

Many media types look similar but really are not. DLT, LTO, AIT, and other drives all have different generations of media that work in different generations of drives. If the volume is a tape, and its drive has a media recognition system (MRS), it may even spit the tape back out if it is the wrong type. Sometimes MRS is not enabled or not present, so you assume that the tape should work because it fits in the drive. Certain types of media are made to work in certain types of drives, and if you've got the wrong media type for the drive that you are using, the drive will not be able to read it. Sometimes this is not initially obvious because the drive reports media errors.

Problems involving incompatible media types sometimes can be corrected by using the newest drive that you have available. That is because many newer drives are able to read older tapes created with previous generations of drives. However, this is not always the case, so do not rely on it without testing.

23.4.3. Bad or Dirty Drive or Tape

If the drive types and media types are the same but one drive cannot read the other drive's tapes, then the drive could be defective or just dirty. Try a cleaning tape, if one is available. If that does not work, the drive could be defective. It also is possible that the drive that wrote the tape was defective. A drive with misaligned heads, for example, may write a backup image that can't be read by a good drive. For this reason, when you are making a backup volume that is going to be stored for a long time, you should verify right away that it can be read in another drive.

Try, Try Again

During a restore, I came across a tape that kept saying it was blank. Of course, the info on the tape was needed for someone who accidentally deleted an important file. After several tries in all four tape drive units in the jukebox, it finally was able to be recognized and read. The restore was done without any further complications. After a lot of breath holding, prayer, and profanity, I realized the moral was, if you at first don't succeed, try, try again!

Ed Lam

Although less common, there are also tape cleaning machines. The machines look like tape drives. They load the tape and run the entire tape through a clean and vacuum process. Sometimes when a tape is unreadable in any drive, cleaning the tape like this can allow the tape to be read. It would be handy to have one of these machines to prepare for such a scenario.

23.4.4. Different Drive Types

This is related to the media-types problem. Not all drives that look alike are alike. For example, not all tapes are labeled with the type of drive they should go into. Not all drives that use hardware compression are labeled as such, either. The only way to know for sure is to check the model numbers of the two different drives. If they are different manufacturers, you may have to consult their web pages or even call them to make sure that the two drive types are compatible.

23.4.5. Wrong Compression Setting/Type

Usually, drives of the same type use the same kind of compression. However, some value-added resellers (VARs) sell drives that have been enhanced with a proprietary compression algorithm. They can get more compression with their algorithm, thus allowing the drive to write faster and store more. If all of your drives are from the same manufacturer, this may not be a problem, as long as the vendor stays in business! But if all your drives aren't from the same manufacturer, you should consider using an alternate compression setting if they have one, such as IDRC or DCLZ. Again, this goes back to proper planning.

23.4.6. The Little Endian That Couldn't

Differences exist among machines of different architectures that may make moving volumes between them impossible. These differences include whether the machine is big-endian, little-endian, ones complement, or twos complement. For example, Intel-based machines are little-endian, and RISC-based machines are big-endian. Moving volumes between these two types of platforms may be impossible.

Most big Unix machines are big-endian, but Intel x86 machines and older Digital machines are little-endian (see Table 23-1). That means that if you are trying to read a backup that was written on an NCR 3b2 (a big-endian machine), and you are using a backup drive on an NCR Intel SVr4 (little-endian) box, you may have a problem. There is also the issue of ones-complement and twos-complement machines, which are also different architectures. It is beyond the scope of this book to explain what is meant by big-endian, little-endian, ones complement, and twos complement. The purpose of this section is merely to point out that such differences exist and that if you have a volume written on one platform and are trying to read it on another, you may be running into this problem. Usually, the only way to solve it is to read the volume on its original platform.

Table 23-1. Big- and little-endian platforms
Big-endian	Little-endian
SGI/MIPS, IBM/RS6000, HP/PA-RISC, Sparc/RISC, PowerPC, DG Aviion, HP/Apollo (400, DN3xxx, DN4xxx), NCR 3B2, TI 1500, Pre-Intel Macintosh	DECStations,^[1] VAX, Intel x86, Alpha^[2]

^[1] These are the older DEC 3x00 and 5x00 series machines that run Ultrix.

^[2] The Alpha architecture can support either byte order, but Digital Unix runs it little-endian, so yours will almost certainly be little-endian.

Most backup formats use an "endian-independent" format, which means that their header and data can be read on any machine that supports that format. Usually, tar and cpio can do this, especially if you use the GNU versions. I have read GNU tar volumes on an Intel Unix or Linux (i.e., little-endian) box that were written on HPs and Suns (i.e., big-endian machines). For example, it is quite common to ftp tar files from a Unix machine to a Windows machine, then use WinZip to read them. Again, your mileage may vary, and it helps if you test it out first.

Some people talk about reading a volume with dd and using its conv=swab feature to swap the byte order of a volume. This may make the header readable but may make the data itself worthless. This is because conv=swab simply exchanges each pair of adjacent bytes, which is correct only for 16-bit values; it does not fix larger values, and it scrambles anything (such as text) that was not byte-order dependent to begin with. Again, the only way to make sure that this is not preventing you from reading a volume is to make sure that you are reading the volume on the same architecture on which it was written.
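To see what conv=swab really does, it can help to run it over a few bytes of ordinary text rather than a tape. The following is only a small sketch; swab_demo is a made-up helper name, not part of any utility.

```shell
#!/bin/sh
# swab_demo is a hypothetical helper: it feeds a string through
# dd conv=swab so you can see exactly which bytes get exchanged.
swab_demo() {
    printf '%s' "$1" | dd conv=swab 2>/dev/null
}

# Each pair of adjacent bytes is exchanged; nothing else changes.
swab_demo "ABCD"    # prints BADC
echo
```

If ABCD were the four bytes of a 32-bit value, reversing its byte order would require DCBA, not BADC, which is one reason a swabbed volume can have a readable header and useless data.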

23.4.7. Block Size (Tape Volumes Only)

Tape volumes are written in different block sizes, and you often need to know the block size of a tape before you can read it. This section describes how block sizes work, as well as how to determine your block size.

When a program reads or writes data to or from a device or memory, it is referred to as an I/O operation. How much data is transferred during that I/O operation is referred to as a block. Since the actual creation of each block consumes resources, a larger block usually results in faster I/O operations (i.e., faster backups). When an I/O operation writes data to a disk, the block size that was used for that operation does not affect how the data is physically recorded on the disk; it affects only the performance of the operation. However, when an I/O operation writes to a tape drive, each block of data becomes a tape block, and each tape block is separated by an interrecord gap. This relationship is illustrated in Figure 23-1.

Figure 23-1. Tape blocks and interrecord gaps

All I/O operations that attempt to read from this tape must understand its block size, or they will be unsuccessful. If you use a different block size, three potential scenarios can occur:

Block size is a multiple of the original block size

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 2,048. This scenario is actually quite common and works just fine. Depending on a number of factors, the resulting read of the tape may be faster or slower than it would have been if it used the original block size. (Using a block size that is too large can actually slow down I/O operations.)

Block size is larger than the original block size (but not a multiple)

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 1,500. What happens here depends on your application, but most applications will return an I/O error. The read operation attempts to read a whole block of data, and when it reaches the end of the block that you told it to read, it does not find an interrecord gap. Most applications will complain and exit.

Block size is smaller than the original block size

For example, a tape was recorded with a block size of 1,024, and you are reading it with a block size of 512. This will almost always result in an I/O error. Again, the application attempts to read a block of 512 bytes, then looks for the interrecord gap. If it doesn't see it, it complains and exits.
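The three scenarios above come down to simple arithmetic on the two block sizes. The following sketch names the case for a given pair of values; classify_read_size is a hypothetical helper, and it only labels the scenario, it does not predict what a particular application or drive will do.

```shell
#!/bin/sh
# classify_read_size ORIGINAL REQUESTED
# Hypothetical helper that reports which of the three scenarios applies.
# An equal size counts as a multiple (1 x ORIGINAL).
classify_read_size() {
    orig=$1
    req=$2
    if [ "$req" -lt "$orig" ]; then
        echo "smaller"
    elif [ $((req % orig)) -eq 0 ]; then
        echo "multiple"
    else
        echo "larger, not a multiple"
    fi
}

classify_read_size 1024 2048   # the first example above
classify_read_size 1024 1500   # the second example above
classify_read_size 1024 512    # the third example above
```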

Interrecord gaps actually take up space on the tape. If you use a block size that is too small, you will fill up a lot of your tape with these interrecord gaps, and the tape actually will hold less data.

Each tape drive on each server has an optimal block size that allows it to stream best. Your job is to find which block size gives you the best performance. A block size that is too small decreases performance; a block size that is too large may decrease performance as well because the system may be paging or swapping to create that large block size. Some operating systems and platforms also limit the maximum block size.

23.4.8. Determine the Blocking Factor

Use the trick described in Chapter 3 in the section "Using dd to Determine the Block Size of a Tape" to determine your block size. If you're reading a tar or dump backup, you'll need to determine the blocking factor. If the backup utility is tar, the blocking factor usually is multiplied by 512. dump's blocking factor usually is multiplied by 1,024. Read the manpage for the command that you are using and determine the multiplier that it uses. Then, divide the block size by that multiplier. You now have your blocking factor.
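As a rough sketch of the idea behind that trick: read a single block with a read size larger than any block you expect, then see how many bytes actually came back. The helper name tape_block_size and the 128 KB upper bound are assumptions made for illustration; the exact command in Chapter 3 may differ, and a drive set to a fixed block size may not behave this way.

```shell
#!/bin/sh
# tape_block_size DEVICE [SAMPLE_FILE]
# Hypothetical helper: reads one block from DEVICE with an assumed
# upper bound of 128 KB and prints the number of bytes returned.
# The block that was read is left in SAMPLE_FILE (default /tmp/sizefile).
tape_block_size() {
    dev=$1
    sample=${2:-/tmp/sizefile}
    dd if="$dev" of="$sample" bs=128k count=1 2>/dev/null || return 1
    # wc pads its output on some systems; the arithmetic strips that.
    echo $(( $(wc -c <"$sample") ))
}

# Example (the device name is a placeholder; rewind the tape first):
#   tape_block_size /dev/rmt/0cbn
```

If the number printed equals the 128 KB upper bound, the real block size may be larger still, and the read should be repeated with a bigger value.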

For example, you read the tape with dd, and it says the block size is 32,768. The manpage for dump tells you that the blocking factor is multiplied by 1,024. If you divide 32,768 by 1,024, you get a blocking factor of 32. You then can use this blocking factor with restore to read the tape.
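The division is trivial, but a small sketch makes it easy to check that the block size really is a whole multiple of the utility's multiplier. blocking_factor is a hypothetical helper name; the multipliers are the usual ones mentioned above, so confirm them against your own manpages.

```shell
#!/bin/sh
# blocking_factor BLOCK_SIZE MULTIPLIER
# Hypothetical helper: prints BLOCK_SIZE / MULTIPLIER, or fails if the
# block size is not a whole multiple of the multiplier.
blocking_factor() {
    block_size=$1
    multiplier=$2
    if [ $((block_size % multiplier)) -ne 0 ]; then
        echo "block size $block_size is not a multiple of $multiplier" >&2
        return 1
    fi
    echo $((block_size / multiplier))
}

blocking_factor 32768 1024   # the dump example: prints 32
blocking_factor 32768 512    # the same block size for tar: prints 64
```

Most versions of tar and restore accept the result through a b option, but the exact syntax varies by platform, so check the manpage before relying on it.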

23.4.9. AIX and Its 512-Byte Block Size

Some operating systems, such as AIX, allow you to hardcode the block size of a tape device. This means that no matter what block size you set with a backup utility, the device will always write using the hardcoded block size. During normal operations, most people set the block size to 0, allowing the device to write in any block size that you specify with your backup utility. (This is also known as variable block size.) However, during certain operations, AIX automatically sets the block size to 512. This normally happens when performing a mksysb or sysback backup, and the reason this happens is that a block size of 512 makes the mksysb/sysback tape look like a disk. That way, the system can boot off the tape because it effectively looks like the root disk. Most mksysb/sysback scripts set the block size back to 0 when they are done, but not all do so. You should check to make sure that your scripts do, to prevent you from unintentionally writing other tapes using this block size.

Why can't you read, on other systems, tapes that were written on AIX (with a block size of 512)? The reason is that AIX doesn't actually use a block size of 512. What AIX really does is write a block of 512 bytes and then pad it with 512 bytes of nulls. That means that they're really writing a block size of 1024, and half of each block is being thrown away! Only the AIX tape drives understand this, which means that a tape written with a block size of 512 can be read only on another AIX system.

However, if you set the device's hardcoded block size to 0, you should have no problem on other systems, assuming the backup format is compatible. Setting it to 0 makes it work like every other tape drive. The block size you set with the backup utility is the block size the tape drive writes in. (If you want to check your AIX tape drive's block size now, start up smit and choose Devices, then Tape Drives, then Change Characteristics, and make sure that the block size of all your tape drives is set to 0!)

You can even set the block size of a device to 1,024 without causing a compatibility problem. Doing so will force the device to write using a block size of 1,024, regardless of what block size you specify with your backup utility. However, this is a "normal" block, unlike the unique type of block created by the 512-byte block size. Assuming that the backup format is compatible, you should be able to read such a tape on another platform. (I know of no reason why you would want to set the block size to 1,024, though.)

To set the block size of a device back to 0, run the following command:

# chdev -l device_name -a block_size=0
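If you would rather check the current setting from a script than from smit, something like the following sketch may help. It assumes the AIX lsattr command and its block_size attribute for tape devices; aix_tape_block_size is a hypothetical helper name and rmt0 is only a placeholder device name.

```shell
#!/bin/sh
# aix_tape_block_size DEVICE_NAME
# Hypothetical helper: asks lsattr for the device's block_size
# attribute and says whether the device is in variable (0) or
# fixed block mode.
aix_tape_block_size() {
    dev=$1
    size=$(lsattr -E -l "$dev" -a block_size -F value) || return 1
    if [ "$size" -eq 0 ]; then
        echo "$dev: variable block size"
    else
        echo "$dev: fixed block size $size"
    fi
}

# Example (run on the AIX system that owns the drive):
#   aix_tape_block_size rmt0
```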

23.4.10. Unknown Backup Format

Obviously, when you are handed a foreign volume, you have no idea what backup utility was used to make that volume. If this happens, start by finding out the block size; it will come in handy when trying to read an unknown format. Then, use that block size to try and read the volume using the various backup formats, such as tar, cpio, dump, and pax. I would try them in that order; foreign volumes are most likely going to be in tar format because it is the most interchangeable format.

One trick to finding the type of backup format is to take a block of data off of the volume and run the file command on it. This often will come back and say cpio or tar. If that happens, great! For example, if you used the block size-guessing command shown previously, you would have a file called /tmp/sizefile that you could use to determine the block size of the tape. If you haven't made this file, do so now, then enter this command:

# file /tmp/sizefile

If it just says "data," you're out of luck. But you just might get lucky, especially if you download from the Internet a robust magic file:

# file -f /etc/robust.magic /tmp/sizefile

In this case, file helps reveal the format for commands and utilities not native to the immediate platform.
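When file is no help, trying each utility in turn can be scripted. The sketch below is one way to do it under some loud assumptions: guess_format is a made-up name, the dump reader is assumed to be called restore (on some platforms it has another name, such as ufsrestore), and no block size options are passed, so you may need to add them for your tape. It simply tries each utility's table-of-contents mode and reports the first one that succeeds.

```shell
#!/bin/sh
# guess_format VOLUME
# Hypothetical helper: VOLUME is a tape device or a complete copy of
# the volume on disk. Utilities that are not installed are skipped.
guess_format() {
    vol=$1
    if command -v tar >/dev/null 2>&1 &&
       tar tf "$vol" >/dev/null 2>&1; then
        echo tar; return 0
    fi
    if command -v cpio >/dev/null 2>&1 &&
       cpio -it <"$vol" >/dev/null 2>&1; then
        echo cpio; return 0
    fi
    if command -v restore >/dev/null 2>&1 &&
       restore -t -f "$vol" </dev/null >/dev/null 2>&1; then
        echo dump; return 0
    fi
    if command -v pax >/dev/null 2>&1 &&
       pax -f "$vol" </dev/null >/dev/null 2>&1; then
        echo pax; return 0
    fi
    echo unknown
    return 1
}

# Example (use a rewinding device so each attempt starts at the
# beginning of the tape):
#   guess_format /dev/rmt/0
```

Because some versions of cpio and pax can also read tar archives, the order matters: the first success tells you a utility that can read the volume, not necessarily the one that wrote it. A single block such as /tmp/sizefile is usually too short for this test, since the utilities expect a complete archive.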

23.4.11. Different Backup Format

Sometimes, two commands sound the same but really aren't. This can be as simple as incompatible versions of cpio, or at the worst, completely incompatible versions of dump. Format inconsistencies between tar and cpio usually can be overcome by the GNU versions because they automatically detect what format they are reading. However, if you are using an incompatible version of dump (such as xfsdump from IRIX), you are out of luck! You will need a system of that type to read the volume. Again, your mileage may vary. Make sure you test it up front.

They Used What Kind of Compression?

One day we needed to restore from some older tapes and were having trouble reading them. The drives kept complaining about I/O errors every time we tried to read one of these tapes. After further research, we found out that the tapes had been made on a particular brand of tape drive using their proprietary compression algorithm. Unfortunately, this company no longer made the drive. Luckily, we were able to find some refurbished drives that could read the tapes. The first thing we did was to copy them to a tape drive that used a standard compression algorithm.

Mike Geringer

23.4.12. Damaged Volume

One of the most common questions I see on Usenet is, "I accidentally typed tar cvf when I meant to type tar xvf. Is there any way to read what's left on this volume?" The quick answer is no. Why is that?

Each time a backup is written to a tape, an end-of-media (EOM) mark is made at the end of the backup. This mark tells the tape drive software, "There is no more data after this mark; no need to go any further." No matter what utility you try, it will always stop at the EOM mark because it thinks this is the last backup on the tape. Of course, the tape could just be damaged or corrupted. One of the tricks I've seen used in this scenario is to use cat to read the corrupted tape:

# cat device > /tmp/somefile

This just blindly reads in the data into /tmp/somefile, so you can read it with tar, cpio, or dump.

23.4.13. Reading a "Flaky" Tape

One of the fun things about being a backup specialist is that everyone tells you their favorite backup and recovery horror stories. One day a friend told me that he was having a really hard time reading a particularly flaky tape. The system would read just so far into the tape and then quit with an I/O error. However, if he tried reading that same section of tape again, it would work! He really needed the data on this particular tape, so he refused to give up. He wrote a shell script that would read the tape until it got an error. Then it would rewind the tape, fast-forward (fsr) to where he got the error, and try again. This script ran for two or three days before he finally got what he needed. I had never heard of such dedication. I told my friend Jim Donnellan that he had to let me put the shell script in the book. The shell script in Example 23-1 was called read-tape.sh and actually did the job. Maybe this script will come in handy for someone else.

Example 23-1. The read-tape.sh script

#!/bin/sh
DEVICE=/dev/rmt/0cbn   # Set this to a non-rewinding tape device
touch rawfile          # The rawfile might already be there, but just in case
while true ; do
  size=`ls -l rawfile | awk '{print $5}'`   # Speaks for itself
  blocks=`expr "$size" / 512`
  full=`df -k . | grep <host> | awk '{print $6}'`
  # Unfortunately, this only gets checked once per glitch. Maybe a fork?
  echo $size     # Just so I know how it's going
  echo $blocks
  echo $full
  if [ $full -gt 90 ] ; then
    echo "filesystem is filling up"
    exit 1
  fi
  mt -f $DEVICE rewind        # Let's not take chances. Start at the beginning.
  sleep 60                    # The drive hates this tape as it is. Give it a rest.
  mt -f $DEVICE fsr $blocks   # However big rawfile is already, we can skip that on the tape
  dd if=$DEVICE bs=512 >> rawfile   # Let's get as much as we can
  if [ $? -eq 0 ] ; then
    # If dd got clipped by a tape error, there's still work to do,
    echo "dd exited cleanly"
    # if not, it must have gotten to the end of the file this time
    # without a hitch. We're done.
    exit 0
  fi
done

If you've got tips on how to read corrupted or damaged volumes, I want to hear them. If I use them in later editions of the book, I will credit your work! (I also will put any new ones I receive on the web site for everyone to use immediately.)

23.4.14. Multiple Partitions on a Tape

This one is more of a gotcha than anything else. Always remember that when a backup is sent to tape, it could have more than one partition on that tape. If you are reading an unknown tape, you might try issuing the following commands:

# mt -t device rewind
# mt -t device fsf 1

Then, try again to read this backup. If it fails with I/O error, there are no more backups. (That's the EOM marker again.) If it doesn't fail, try the same commands that you tried in the beginning of the tape to read it. Do not assume that it is the same format as the first partition on the tape. Also understand that every time you issue a command to try and read the tape, you need to rewind it and fast-forward it again using the two preceding commands.
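Since the rewind and fast-forward have to be repeated before every attempt, it may be worth wrapping them in a small function. This is only a sketch: position_tape is a hypothetical name, and it uses the mt -t form shown above, whereas many platforms expect mt -f instead.

```shell
#!/bin/sh
# position_tape DEVICE N
# Hypothetical helper: rewinds, then skips forward N backups so that
# the next read starts at backup number N+1. DEVICE should be the
# non-rewinding device, or the position is lost as soon as mt exits.
position_tape() {
    dev=$1
    n=$2
    mt -t "$dev" rewind || return 1
    if [ "$n" -gt 0 ]; then
        mt -t "$dev" fsf "$n" || return 1
    fi
}

# Example: position at the second backup, then try to list it as tar.
#   position_tape device 1 && tar tvf device
```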

23.4.15. If at First You Don't Succeed...

Then perhaps failure is your style! That doesn't mean that you have to stop trying to read that volume. Remember that the early bird gets the worm, but the second mouse gets the cheese. The next time you're stuck with a volume you can't read, remember my friend Jim and his flaky tape.