Create portable tar archives with pax.
Some POSIX operating systems ship with GNU tar as the default tar utility (NetBSD and QNX6, for example). This is problematic because archives that use GNU tar's format extensions are not compatible with other vendors' tar implementations. GNU is an acronym for "GNU's Not UNIX"; in this case, GNU's not POSIX either.
4.3.1 GNU Versus POSIX tar
For filenames or paths longer than 100 characters, GNU tar uses its own @LongLink format extension. Some vendors' tar utilities choke on the GNU extensions. Here is what Solaris's archivers say about such an archive:
% pax -r < gnu-archive.tar
pax: ././@LongLink : Unknown filetype
% tar xf gnu-archive.tar
tar: directory checksum error
Clearly, distributing non-POSIX archives has a real disadvantage. One solution is to use pax to create your tar archives in the POSIX format. I'll also offer some tips on using pax's features to make up for the parts of GNU tar's extended feature set that you give up.
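Before distributing an archive, you can check whether it carries GNU long-name headers, and you can write new archives in the POSIX ustar format instead. The sketch below assumes a POSIX pax (where -x ustar selects the ustar format, which is also pax's default when writing) and a grep that can search files containing NUL bytes; the function names make_ustar and has_gnu_longlink are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers; assumes pax and grep are on the PATH.

# make_ustar DIR ARCHIVE: write DIR to ARCHIVE in POSIX ustar format.
# Pass DIR as a relative path so that the archive is not rooted.
make_ustar() {
    pax -w -x ustar -f "$2" "$1"
}

# has_gnu_longlink ARCHIVE: succeed if ARCHIVE contains a GNU
# ././@LongLink pseudo-entry, i.e., it is not plain POSIX ustar.
# LC_ALL=C makes grep treat the archive as raw bytes.
has_gnu_longlink() {
    LC_ALL=C grep -q '@LongLink' "$1"
}
```

For example, `has_gnu_longlink gnu-archive.tar && echo "not portable"` would flag the archive from the Solaris session. Keep in mind that ustar has its own limits (a 100-byte name field plus a 155-byte prefix field), so very deep paths still deserve a check.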
4.3.2 Replacing tar with pax
The NetBSD and QNX6 pax utility supports a tar interface and can also read GNU tar's @LongLink format extension. You can use pax as your tar replacement, since it can read your existing GNU-format archives and create POSIX archives for future backups. Here's how to make the quick conversion.
First, move GNU tar out of /usr/bin. Rename it and save it in another directory, in case you ever need to restore it to its previous location:
# mv /usr/bin/tar /usr/local/bin/gtar
Next, create a symlink named tar that points to pax. When invoked under the name tar, the pax utility emulates the tar interface:
# ln -s /bin/pax /usr/bin/tar
Now when you use the tar utility, your archives will really be created by pax.
4.3.3 Compress Archives Without Using Intermediate Files
Let's say you're on a system that doesn't have issues with tar. Why else would you consider using pax as your backup solution?
For one, you can use pax in a pipeline to create compressed or encrypted archives without any intermediate files. Here's an example pipeline:
% find /home/kirk -name '*.[ch]' | pax -w | pgp -c
The pipeline's first stage uses find to generate the exact list of files to archive. With tar, you would often build the file list with command substitution instead. Unfortunately, that approach is unreliable, because the entire list must fit on a single command line. For example, this user has so much source code that the complete file list exceeds the system's argument-length limit:
% tar cf kirksrc.tar $(find /home/kirk -name '*.[ch]')
/bin/ksh: tar: Argument list too long
The pipeline approach has no such limit, because find writes the file list to pax's standard input rather than to its command line.
During the second stage, pax reads the list of files from stdin (pax -w does this when no file operands are given) and, since no -f archive is named, writes the archive to stdout. The pax found on all of the BSDs has built-in gzip support, so you can also compress the archive during this stage by adding the -z argument.
When creating archives, invoke pax without the -v (verbose) argument. This way, if there are any pax error messages, they won't get lost in the extra output.
The third stage compresses and/or encrypts the archive. No intermediate tar archive is required, because the utility reads its data straight from the pipeline. This example uses pgp, the Pretty Good Privacy encryption system, which can be found in the ports collection.
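The same three-stage idea works with gzip as the third stage, and reversing the pipeline restores the files, again without a temporary tar file. A minimal sketch, assuming pax and gzip are installed; archive_sources and restore_sources are hypothetical names, and the -type f test (not in the original pipeline) keeps find from listing directories whose names happen to end in .c or .h.

```sh
#!/bin/sh
# Hypothetical helpers; assumes pax and gzip are on the PATH.

# archive_sources DIR OUT: stage 1 (find) lists the C sources, stage 2
# (pax -w) reads that list on stdin and writes an archive to stdout,
# stage 3 (gzip) compresses the stream into OUT.
archive_sources() {
    find "$1" -type f -name '*.[ch]' | pax -w | gzip -9 > "$2"
}

# restore_sources IN: decompress IN and extract it into the current
# directory; pax -r creates any missing intermediate directories.
restore_sources() {
    gzip -dc "$1" | pax -r
}
```

A plain sh pipeline reports only the exit status of its last command (gzip here), so watch for pax's error messages on stderr, which is one more reason to leave out -v. On the BSDs, the -z argument can take over the separate gzip stage.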
4.3.4 Attribute-Preserving Copies
POSIX provides two utilities for copying file hierarchies: cp -R and pax -rw. For regular users, cp -R is the common method. For administrative use, though, pax -rw is better: it preserves more of the original file attributes, including hard-link counts and file access times, so the copy is a more faithful replica of the original file hierarchy.
As an example, let's back up three executables. egrep, fgrep, and grep are all hard links to the same executable: the link count is three, and all three names share one inode number. ls -il displays the inode number in column 1 and the link count in column 3:
# ls -il /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/egrep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/fgrep
31888 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 /usr/bin/grep
With pax -rw, we get a single executable, still with three links and the same date as the original:
# pax -rw /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/
# ls -il /tmp/usr/bin/
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 egrep
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 fgrep
47 -r-xr-xr-x 3 root wheel 73784 Sep 8 2002 grep
Can we do the same thing using cp -R? Nope. Instead, we create three new files, each with a unique inode number, a link count of one, and a new date:
# rm /tmp/usr/bin/*
# cp -R /usr/bin/egrep /usr/bin/fgrep /usr/bin/grep /tmp/usr/bin/
# ls -il /tmp/usr/bin/
49 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 egrep
48 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 fgrep
47 -r-xr-xr-x 1 root wheel 73784 Dec 19 11:26 grep
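To repeat this comparison without touching /usr/bin, a small script can build a throwaway hierarchy containing a hard link, copy it both ways, and print the inode numbers. This is a sketch; copy_both, inode_of, and links_of are hypothetical helpers that parse the output of ls -i and ls -l, so they assume filenames without newlines.

```sh
#!/bin/sh
# Hypothetical helpers for comparing pax -rw with cp -R.

# inode_of FILE: print FILE's inode number (first field of ls -i).
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# links_of FILE: print FILE's link count (second field of ls -l).
links_of() {
    ls -l "$1" | awk '{ print $2 }'
}

# copy_both SRC DEST: copy the relative directory SRC into DEST/pax
# with pax -rw and into DEST/cp with cp -R.
copy_both() {
    mkdir -p "$2/pax" "$2/cp"
    pax -rw "$1" "$2/pax"
    cp -R "$1" "$2/cp"
}
```

For example, after `mkdir demo; echo data > demo/grep; ln demo/grep demo/egrep; copy_both demo out`, compare `inode_of out/pax/demo/grep` with `inode_of out/pax/demo/egrep` (pax -rw should report the same inode for both), then run the same comparison under out/cp.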
4.3.5 Rooted Archives and the Substitution Argument
If you have ever used GNU tar and received this message:
tar: Removing leading `/' from absolute path names in the archive
then you were extracting a rooted tar archive, one in which the files all have absolute paths starting with a forward slash (/). Unintentionally clobbering existing files with foreign binaries is a bad idea, which is why GNU tar automatically strips the leading / for you.
To be safe, you want your unarchiver to create files relative to your current working directory. Rooted archives violate this rule by creating files relative to the root of the filesystem, regardless of the current working directory. If such an archive contained /etc/passwd, extracting it could replace your password file with a foreign copy. You may be surprised when you can no longer log in to your system!
You can use the pax substitution argument to remove the leading /. This will ensure that the unarchived files will be created relative to your current working directory, instead of at the root of your filesystem:
# pax -A -r -s '-^/--' < rootedarchive.tar
Here, the -A argument tells pax not to strip the leading / automatically, since we want to do that ourselves. This argument is required only to work around a bug in the NetBSD pax implementation that interferes with the -s argument. We also want pax to extract the archive, so we pass the -r argument.
The -s argument specifies an ed-style substitution expression to be performed on the destination pathname. In this example, the leading / will be stripped from the destination paths. See man ed for more information.
If we used the traditional / delimiter, the substitution expression would be /^\///. (The second / isn't a delimiter, so it has to be escaped with a \.) You will find that / is the worst delimiter, because you have to escape all the slashes found in the paths. Fortunately, you can choose another delimiter. Pick one that isn't present in the paths, to minimize the number of escape characters you have to add. In the example, we used the - character as the delimiter, and therefore no escapes were required.
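Because sed's s command uses the same old/new syntax and likewise accepts an arbitrary delimiter, you can preview what a substitution expression will do to a pathname before handing it to pax. This is only an approximation of pax -s (pax, for instance, ignores a file whose name substitutes to the empty string), and preview_subst is a hypothetical helper.

```sh
#!/bin/sh
# preview_subst EXPR PATH: apply the pax-style expression EXPR (for
# example '-^/--') to PATH with sed and print the result. The first
# character of EXPR is the delimiter, just as with pax -s.
preview_subst() {
    printf '%s\n' "$2" | sed "s$1"
}
```

With this helper, `preview_subst '-^/--' /etc/passwd` prints etc/passwd, and `preview_subst '/^\///' /etc/passwd` gives the same result using the escaped / delimiter.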
The substitution argument can be used to rename files for a beta software release, for example. Say you develop X11R6 software and have multiple development versions on your box:
/usr/X11R6.saturday
/usr/X11R6.working
/usr/X11R6.notworking
/usr/X11R6.released
and you want to install the /usr/X11R6.working directory as usr/X11R6 on the beta system:
# pax -A -w -s '-^/usr/X11R6.working-usr/X11R6-' /usr/X11R6.working \
    > /tmp/beta.tar
This time, the -s argument specifies a substitution expression that will replace the beginning of the path /usr/X11R6.working with usr/X11R6 in the archive.
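Before shipping /tmp/beta.tar, it is worth listing its members: with neither -r nor -w, pax runs in list mode and prints one member name per line. A minimal sketch; check_beta is a hypothetical helper that fails if any member is still rooted or still carries the old development-tree name.

```sh
#!/bin/sh
# check_beta ARCHIVE OLDNAME: list ARCHIVE and fail if any member name
# starts with / or still contains OLDNAME (matched as a fixed string).
check_beta() {
    names=$(pax -f "$1") || return 1
    # Rooted members would be extracted relative to /.
    if printf '%s\n' "$names" | grep '^/'; then
        return 1
    fi
    # Members the -s expression failed to rename.
    if printf '%s\n' "$names" | grep -F -e "$2"; then
        return 1
    fi
    return 0
}
```

Usage: `check_beta /tmp/beta.tar X11R6.working` prints any offending names and returns a nonzero status. Escaping the dot in the expression ('-^/usr/X11R6\.working-usr/X11R6-') makes it match only a literal period; unescaped, . matches any character, which is harmless here but less precise.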
4.3.6 Useful Resources for Multiple Volume Archives
POSIX does not specify the format of multivolume archive headers, meaning that every archiver may use a different intervolume header format. If you have a lot of multivolume tar archives and plan to switch to a different tar implementation, you should test whether you can still recover your old multivolume archives.
This practice was probably more common back when Minix/QNX4 users archived their 20 MB hard disks to a stack of floppy disks. Minix/QNX4 users had the vol utility to handle multiple volumes: instead of being built into the archiver itself, the multivolume functionality lived in a separate utility. Because vol, not the archiver, did the splitting, users could switch archiver implementations transparently.
The vol utility performs the following operations:
Unfortunately, the vol utility isn't part of the NetBSD package collection. If you create a lot of multivolume archives, you may want to look into porting one of the following utilities:
4.3.7 See Also