1.18 Archiving and Compressing Files

Now that you know all about files, permissions, and the errors that you might get, you need to master tar and gzip.

Start with gzip (GNU Zip), the current standard Unix compression program. A file that ends with .gz is a GNU Zip archive. Use gunzip file.gz to uncompress file.gz and remove the suffix; to compress it again, use gzip file.
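As a minimal sketch of that round trip, the following commands compress a file and then restore it. The scratch directory and the name notes.txt are made up for illustration.

```shell
# Hypothetical scratch directory and file name, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
printf 'hello, archive\n' > notes.txt

gzip notes.txt           # replaces notes.txt with notes.txt.gz
ls                       # list the directory to see the new suffix
gunzip notes.txt.gz      # restores notes.txt and removes the .gz file
roundtrip=$(cat notes.txt)
```

Notice that gzip and gunzip replace their input file rather than leaving a copy next to it.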

Unlike the ZIP programs for other operating systems, gzip does not perform the additional function of archiving; that is, it does not pack multiple files and directories into one file, thus creating an archive. To create an archive, use tar instead:

 tar cvf archive.tar file1 file2 ...

tar archives have a .tar suffix. In the preceding example, file1, file2, and so on are the names of the files and directories that you wish to place in the archive named archive.tar. The c flag denotes create mode. You will learn about the v and f flags later in this section.
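To try create mode on something concrete, here is a small sketch that builds a throwaway directory and archives it. The names project, a.txt, and b.txt are hypothetical.

```shell
# Hypothetical directory and file names, for illustration only.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'first' > project/a.txt
echo 'second' > project/b.txt

# c = create mode; naming a directory archives everything beneath it.
tar cvf project.tar project
```

Unlike gzip, tar leaves the original files in place after creating the archive.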

You can also unpack a .tar file with tar:

 tar xvf file.tar

Study this command. The xvf part specifies the mode and options; it is the same as -xvf. The meanings are as follows:

The x flag puts tar into extract (unpack) mode. You can extract individual parts of the archive by placing the names of the parts at the end of the command line, but you must know their exact names. To find out for sure, see the table of contents mode described shortly.
The v flag activates verbose diagnostic output, causing tar to print the names of the files and directories in the archive when it encounters them. Adding another v causes tar to print details such as file size and permissions.
The f flag denotes the file option. The next argument on the command line must be the file on which tar is to work (in the preceding example, it is file.tar). You must use this option and a filename at all times, with one exception that involves tape drives (see Section 13.6). To use standard input or output, use - instead of the filename.
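The following sketch exercises two of the points above: extracting a single named member, and giving - as the file so that tar reads the archive from standard input. All file and directory names here are hypothetical.

```shell
# Hypothetical names throughout; build a small archive to work on.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'first' > project/a.txt
echo 'second' > project/b.txt
tar cf project.tar project
rm -r project

# Extract only one member; its name must match the archive's listing exactly.
tar xvf project.tar project/b.txt

# Give - as the file so that tar reads the archive from standard input.
mkdir from_stdin
(cd from_stdin && tar xf -) < project.tar
```

After this runs, only project/b.txt exists in the first extraction, while from_stdin holds the complete tree.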

Note	When using extract mode, remember that tar does not remove the archived .tar file after extracting its contents.

Before unpacking, it's usually a good idea to check the contents of a .tar file with the t (table of contents) mode instead of x. This mode verifies the archive's basic integrity and prints the names of all files inside.

The most important reason for testing an archive is that sometimes unpacking an archive can dump a huge mess of files into the current directory. This is difficult to clean up. When you check an archive with the t mode, verify that everything is in a rational directory structure; that is, all file pathnames in the archive should start with the same directory. If you're not sure, create a temporary directory, change to that, and then extract. You can always use mv * .. if it turns out to be a false alarm.
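One way to put this check into practice is sketched below. The archive and directory names are hypothetical, and the cut and sort pipeline is just one possible way to summarize the top-level names in a listing; it is not part of tar itself.

```shell
# Hypothetical names throughout; build a small archive to inspect.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'first' > project/a.txt
tar cf project.tar project
rm -r project

# t = table of contents mode: list the members without extracting anything.
tar tvf project.tar

# Reduce the listing to its distinct top-level names; a tidy archive has one.
top=$(tar tf project.tar | cut -d/ -f1 | sort -u)
echo "$top"

# When in doubt, extract inside a scratch directory instead.
mkdir scratch
cd scratch
tar xvf ../project.tar
```

If the top-level summary shows many names, extracting in the scratch directory keeps them from mixing with your other files.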

One last significant option to tar is p, which preserves permissions. Use this in extract mode to override your umask and get the exact permissions specified in the archive. The p option is the default when working as the superuser. If you're having trouble with permissions and ownership when unpacking an archive as the superuser, make sure that you are waiting until the command terminates and you get the shell prompt back. Although you may only want to extract a small part of the archive, tar must run through the whole thing. You must not interrupt the process, because it sets the permissions only after checking the entire archive.
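To see the effect of p, you can archive a file with deliberately loose permissions and extract it under a restrictive umask. This is only a sketch; the file name shared.txt and the modes chosen are assumptions for the demonstration.

```shell
# Hypothetical file name and modes, chosen only for this demonstration.
workdir=$(mktemp -d)
cd "$workdir"
echo 'shared' > shared.txt
chmod 666 shared.txt          # store a world-writable mode in the archive
tar cf shared.tar shared.txt
rm shared.txt

umask 077                     # restrictive umask for the rest of this shell
tar xpf shared.tar            # p = use the modes recorded in the archive
perms=$(ls -l shared.txt | cut -c1-10)
echo "$perms"
```

Running the extraction again without p as a regular user should show the umask being applied to the stored mode instead.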

You should commit all of the tar options and modes in this section to memory; know them cold. If you're having trouble, make some flash cards. This may sound like a grade-school strategy, but it is very important to avoid careless mistakes with this command.

1.18.1 Compressed Archives (.tar.gz)

Many beginners find it confusing that archives normally come compressed, where the archive file ends in .tar.gz. To unpack a compressed archive, work from the right side to the left; get rid of the .gz first and then worry about the .tar. For example, these two commands decompress and unpack file.tar.gz:

 gunzip file.tar.gz
 tar xvf file.tar

When you're starting out, it's fine to do this one step at a time, first running gunzip to decompress and then tar to verify and unpack. When you do it enough, you soon memorize the entire archiving and compression process. However, this is not the fastest or most efficient way to invoke tar on a compressed archive. In particular, it wastes system resources: disk space and kernel I/O time.

You can combine archival and compression functions with a pipeline; for example, this command pipeline unpacks file.tar.gz:

 zcat file.tar.gz | tar xvf -

zcat is the same as gunzip -dc. The -d option decompresses and the -c option sends the result to standard output (in this case, to the tar command).
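Here is a self-contained sketch of the pipeline approach. It uses gzip -dc, which takes the same -d and -c options described above, in place of zcat; the archive and file names are hypothetical.

```shell
# Hypothetical names throughout; build a compressed archive to unpack.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'first' > project/a.txt
tar cf project.tar project
gzip project.tar              # leaves only project.tar.gz
rm -r project

# Decompress to standard output; tar reads the stream because of the -.
gzip -dc project.tar.gz | tar xvf -
```

The compressed archive stays on disk untouched, and no intermediate project.tar file is ever written.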

Because this is such a common operation, the version of tar that comes with Linux has a shortcut. You can use z as an option to automatically invoke gzip on the archive. For example, use tar ztvf file.tar.gz to verify a compressed archive. However, for the sake of learning, you should make an effort to master the longer form before taking the shortcut.
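The z option works in the create and extract modes as well as in table of contents mode. The following sketch, using hypothetical names, runs through all three:

```shell
# Hypothetical names throughout.
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
echo 'first' > project/a.txt

tar zcvf project.tar.gz project    # create and compress in one step
rm -r project
tar ztvf project.tar.gz            # list the compressed archive
tar zxvf project.tar.gz            # decompress and extract in one step
```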

Note	A .tgz file is the same as a .tar.gz file. The name is meant to fit into FAT (MS-DOS based) filesystems.

1.18.2 Other Compression Utilities

A newer compression program gaining some popularity in Unix is bzip2, where compressed files end with .bz2. Marginally slower than gzip, it often compacts text files a little more, and is therefore increasingly popular in the distribution of source code. The decompressing program is bunzip2, and the options of both programs are close enough to those of gzip that you don't need to learn anything new.

Most Linux distributions come with zip and unzip programs compatible with the ZIP archives on Windows systems. They work on the usual .zip files as well as self-extracting archives ending in .exe.

If you encounter a file that ends in .Z, you have found a relic created by the compress program, once the Unix standard. gunzip can unpack these files, but gzip will not create them.