Compressing and Archiving Files

Large files use a lot of disk space and take longer than smaller files to transfer from one system to another over a network. If you do not need to look at the contents of a large file very often, you may want to save it on a CD, DVD, or other medium and remove it from the hard disk. If you have a continuing need for the file, however, retrieving a copy from a CD may be inconvenient. To reduce the amount of disk space you use without removing the file entirely, you can compress the file without losing any of the information it holds. Similarly, a single archive that packs several files into one larger file is easier to manipulate, upload, download, and email than multiple files. You may frequently download compressed, archived files from the Internet. The utilities described in this section compress and decompress files and pack and unpack archives.

bzip2: Compresses a File

The bzip2 utility compresses a file by analyzing it and recoding it more efficiently. The new version of the file looks completely different. In fact, because the new file contains many nonprinting characters, you cannot view it directly. The bzip2 utility works particularly well on files that contain a lot of repeated information, such as text and image data, although most image data is already in a compressed format.

The following example shows a boring file. Each of the 8,000 lines of the letter_e file contains 72 e's and a NEWLINE character that marks the end of the line. The file occupies more than half a megabyte of disk storage.

$ ls -l
-rw-rw-r--  1 sam sam 584000 Mar  1 22:31 letter_e

The -l (long) option causes ls to display more information about a file. Here it shows that letter_e is 584,000 bytes long. The --verbose (or -v) option causes bzip2 to report how much it was able to reduce the size of the file. In this case, it shrank the file by 99.99 percent:

$ bzip2 -v letter_e
letter_e: 11680.00:1, 0.001 bits/byte, 99.99% saved, 584000 in, 50 out.
$ ls -l
-rw-rw-r--  1 sam sam 50 Mar  1 22:31 letter_e.bz2

.bz2 filename extension

Now the file is only 50 bytes long. The bzip2 utility also renamed the file, appending .bz2 to its name. This naming convention reminds you that the file is compressed; you would not want to display or print it, for example, without first decompressing it. The bzip2 utility does not change the modification date associated with the file, even though it completely changes the file's contents.
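The letter_e experiment is easy to reproduce. The following is a minimal sketch, assuming bzip2, awk, and mktemp are installed; the variable names (work, line, before, after) are chosen for illustration only. It builds the file in a scratch directory so that nothing in the working directory is touched. The compressed size it reports may not be exactly 50 bytes, because that figure can vary with the version of bzip2.

```shell
work=$(mktemp -d)                      # scratch directory; the name is arbitrary
line=$(printf '%072d' 0 | tr 0 e)      # one line of 72 e's

# Write 8,000 copies of that line, each ended by a NEWLINE.
awk -v l="$line" 'BEGIN { for (i = 0; i < 8000; i++) print l }' > "$work/letter_e"

before=$(wc -c < "$work/letter_e" | tr -d ' ')
bzip2 "$work/letter_e"                 # replaces letter_e with letter_e.bz2
after=$(wc -c < "$work/letter_e.bz2" | tr -d ' ')

echo "before: $before bytes, after: $after bytes"
rm -rf "$work"                         # remove the scratch directory
```

Each line holds 72 characters plus a NEWLINE, so the uncompressed file is 73 x 8,000 = 584,000 bytes, the size ls reported earlier.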

In the following, more realistic example, the file zach.jpg contains a computer graphics image:

$ ls -l
-rw-r--r-- 1 sam sam 33287 Mar  1 22:40 zach.jpg

The bzip2 utility can reduce the size of the file by only 28 percent because the image is already in a compressed format:

$ bzip2 -v zach.jpg
zach.jpg:  1.391:1,  5.749 bits/byte, 28.13% saved, 33287 in, 23922 out.
$ ls -l
-rw-r--r--  1 sam sam 23922 Mar  1 22:40 zach.jpg.bz2

Refer to page 668 for more information on bzip2.

bunzip2 and bzcat: Decompress a File

You can use the bunzip2 utility to restore a file that has been compressed with bzip2:

$ bunzip2 letter_e.bz2
$ ls -l
-rw-rw-r--  1 sam sam 584000 Mar  1 22:31 letter_e
$ bunzip2 zach.jpg.bz2
$ ls -l
-rw-r--r--  1 sam sam  33287 Mar  1 22:40 zach.jpg

The bzcat utility displays a file that has been compressed with bzip2. The equivalent of cat for .bz2 files, bzcat decompresses the compressed data and displays the contents of the decompressed file. Like cat, bzcat does not change the source file. The pipe in the following example redirects the output of bzcat so that instead of being displayed on the screen it becomes input to head, which displays the first two lines of the file:

$ bzcat letter_e.bz2 | head -2 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee 

After bzcat is run, the contents of letter_e.bz2 are unchanged; the file is still stored on the disk in compressed form.
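One way to convince yourself that bzcat leaves the compressed file alone is to compare a checksum taken before and after. The sketch below assumes bzip2, bzcat, and the standard cksum utility are installed; the file name notes and the variable names are hypothetical.

```shell
work=$(mktemp -d)                              # scratch directory
printf 'first line\nsecond line\nthird line\n' > "$work/notes"
bzip2 "$work/notes"                            # creates notes.bz2, removes notes

sum_before=$(cksum < "$work/notes.bz2")        # checksum of the compressed file
first_two=$(bzcat "$work/notes.bz2" | head -n 2)
sum_after=$(cksum < "$work/notes.bz2")

echo "$first_two"
if [ "$sum_before" = "$sum_after" ]; then
    echo "notes.bz2 is unchanged"
fi
rm -rf "$work"
```

The sketch writes head -n 2, which is the POSIX spelling of the head -2 used in the example above.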


The bzip2recover utility supports limited data recovery from media errors. Give the command bzip2recover followed by the name of the compressed, corrupted file from which you want to recover data.

gzip: Compresses a File

gunzip and zcat

The gzip (GNU zip) utility is older and less efficient than bzip2. Its flags and operation are very similar to those of bzip2. A file compressed by gzip is marked by a .gz filename extension. Files you download from the Internet are frequently in gzip format. Use gzip, gunzip, and zcat just as you would use bzip2, bunzip2, and bzcat. Refer to page 756 for more information on gzip.
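The parallel with bzip2 can be seen in a short round trip. This is a sketch only, assuming gzip, gunzip, and awk are installed; the file name sample and the variable names are made up for the illustration. It uses gunzip -c to send the decompressed data to standard output, in the same role zcat plays.

```shell
work=$(mktemp -d)                                  # scratch directory
awk 'BEGIN { for (i = 0; i < 1000; i++) print "the same line, repeated" }' \
    > "$work/sample"
orig_size=$(wc -c < "$work/sample" | tr -d ' ')

gzip "$work/sample"                                # creates sample.gz, removes sample
gz_size=$(wc -c < "$work/sample.gz" | tr -d ' ')

# Display the first line without changing sample.gz.
first_line=$(gunzip -c "$work/sample.gz" | head -n 1)

gunzip "$work/sample.gz"                           # restores sample, removes sample.gz
restored_size=$(wc -c < "$work/sample" | tr -d ' ')

echo "$orig_size -> $gz_size -> $restored_size bytes"
echo "first line: $first_line"
rm -rf "$work"
```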


The compress utility can also compress files, albeit not as well as gzip. This utility marks a file it has compressed by adding .Z to its name.

Tip: gzip versus zip

Do not confuse gzip and gunzip with the zip and unzip utilities. These last two are used to pack and unpack zip archives containing several files compressed into a single file that has been imported from or is being exported to a system running Windows. The zip utility constructs a zip archive, whereas unzip unpacks zip archives. The zip and unzip utilities are compatible with PKZIP, a Windows compress and archive program.

tar: Packs and Unpacks Archives

The tar utility performs many functions. Its name is short for tape archive, as its original function was to create and read archive and backup tapes. Today it is used to create a single file (called a tar file or archive) from multiple files or directory hierarchies and to extract files from a tar file. The cpio (page 693), ditto (page 715), and pax (page 809) utilities perform similar functions.

In the following example, the first ls shows the existence and sizes of the files g, b, and d. Next tar uses the -c (create), -v (verbose), and -f (write to or read from a file) options[2] to create an archive named all.tar from these files. Each line of the output from tar lists the name of a file that it is appending to the archive.

[2] Although the original UNIX tar did not use a leading hyphen to indicate an option on the command line, it now accepts hyphens. The GNU tar described here will accept tar commands with or without a leading hyphen. This book uses the hyphen for consistency with most other utilities.

The tar utility adds overhead when it creates an archive. The next command shows that the archive file all.tar occupies about 9,700 bytes, whereas the sum of the sizes of the three files is about 6,000 bytes. This overhead is more appreciable on smaller files, such as the ones in this example.

$ ls -l g b d
-rw-r--r--   1 jenny jenny 1302 Aug 20 14:16 g
-rw-r--r--   1 jenny other 1178 Aug 20 14:16 b
-rw-r--r--   1 jenny jenny 3783 Aug 20 14:17 d
$ tar -cvf all.tar g b d
g
b
d
$ ls -l all.tar
-rw-r--r--   1 jenny jenny 9728 Aug 20 14:17 all.tar
$ tar -tvf all.tar
-rw-r--r-- jenny/jenny    1302 2005-08-20 14:16 g
-rw-r--r-- jenny/other    1178 2005-08-20 14:16 b
-rw-r--r-- jenny/jenny    3783 2005-08-20 14:17 d

The final command in the preceding example uses the -t option to display a table of contents for the archive. Use -x instead of -t to extract files from a tar archive. Omit the -v option if you want tar to do its work silently.
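The 9,728-byte size of all.tar can be accounted for with a little arithmetic. The sketch below assumes the classic tar layout: the archive is a sequence of 512-byte blocks, each file contributes one header block plus its data rounded up to a whole number of blocks, and two empty blocks mark the end of the archive. The variable names are illustrative.

```shell
block=512
total=0
for size in 1302 1178 3783; do                     # sizes of g, b, and d
    data_blocks=$(( (size + block - 1) / block ))  # round up to whole blocks
    total=$(( total + 1 + data_blocks ))           # one header block plus data
done
total=$(( total + 2 ))                             # end-of-archive marker
archive_bytes=$(( total * block ))
echo "$total blocks, $archive_bytes bytes"
```

Under these assumptions the result is 19 blocks, or 9,728 bytes, which matches the ls output. Some versions of tar pad the archive further to a multiple of a larger record size, so the size you see on another system may be somewhat larger.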

You can use bzip2, compress, or gzip to compress tar files and make them easier to store and handle. Many files you download from the Internet are in one of these formats. Files that have been processed by tar and compressed by bzip2 frequently have a filename extension of .tar.bz2 or .tbz. Those processed by tar and gzip have an extension of .tar.gz or .tgz, while files processed by tar and compress use .tar.Z as the extension.

You can unpack a tarred and gzipped file in two steps. (Follow the same procedure if the file was compressed by bzip2, but use bunzip2 instead of gunzip.) The next example shows how to unpack the GNU make utility after it has been downloaded:

$ ls -l mak*
-rw-rw-r--  1 sam sam 1211924 Jan 20 11:49 make-3.80.tar.gz
$ gunzip mak*
$ ls -l mak*
-rw-rw-r--  1 sam sam 4823040 Jan 20 11:49 make-3.80.tar
$ tar -xvf mak*
make-3.80/
make-3.80/po/
make-3.80/po/
...
make-3.80/tests/
make-3.80/tests/

The first command lists the downloaded tarred and gzipped file: make-3.80.tar.gz (about 1.2 megabytes). The asterisk (*) in the filename matches any characters in any filenames (page 134), so you end up with a list of files whose names begin with mak; in this case there is only one. Using an asterisk saves typing and can improve accuracy with long filenames. The gunzip command decompresses the file and yields make-3.80.tar (no .gz extension), which is about 4.8 megabytes. The tar command creates the make-3.80 directory in the working directory and unpacks the files into it.

$ ls -ld mak*
drwxrwxr-x  8 sam sam    4096 Oct  3  2002 make-3.80
-rw-rw-r--  1 sam sam 4823040 Jan 20 11:49 make-3.80.tar
$ ls -l make-3.80
total 3536
-rw-r--r--  1 sam sam  24687 Oct  3  2002 ABOUT-NLS
-rw-r--r--  1 sam sam   1554 Jul  8  2002 AUTHORS
-rw-r--r--  1 sam sam  18043 Dec 10  1996 COPYING
-rw-r--r--  1 sam sam  32922 Oct  3  2002 ChangeLog
...
-rw-r--r--  1 sam sam  16520 Jan 21  2000 vmsify.c
-rw-r--r--  1 sam sam  16409 Aug  9  2002 vpath.c
drwxrwxr-x  5 sam sam   4096 Oct  3  2002 w32

After tar extracts the files from the archive, the working directory contains two files whose names start with mak: make-3.80.tar and make-3.80. The -d (directory) option causes ls to display only file and directory names, not the contents of directories as it normally does. The final ls command shows the files and directories in the make-3.80 directory. Refer to page 862 for more information on tar.

Caution: tar: the -x option may extract a lot of files

Some tar archives contain many files. Run tar with the -t option and the name of the tar file to list the files in the archive without unpacking them. In some cases you may want to create a new directory (mkdir [page 80]), move the tar file into that directory, and expand it there. That way the unpacked files do not mingle with existing files, and there is no confusion. This strategy also makes it easier to delete the extracted files. Some tar files automatically create a new directory and put the files into it. Refer to the preceding example.

Caution: tar: the -x option can overwrite files

The -x option to tar overwrites a file that has the same filename as a file you are extracting. Follow the suggestion in the preceding caution box to avoid overwriting files.
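The advice in the two caution boxes can be sketched as follows. Instead of moving the tar file into the new directory, this version uses the -C (change directory) option, which GNU tar and the tar supplied with Mac OS X both accept, to extract into that directory; treat it as one possible approach rather than the only one. All file and directory names here (demo.tar, report, unpack) are hypothetical.

```shell
work=$(mktemp -d)                                  # scratch directory
mkdir "$work/src" "$work/unpack"
echo "new contents" > "$work/src/report"
( cd "$work/src" && tar -cf ../demo.tar report )   # build a small demo archive

listing=$(tar -tf "$work/demo.tar")                # look before you unpack
echo "archive contains: $listing"

echo "old contents" > "$work/report"               # an existing file with the same name
tar -xf "$work/demo.tar" -C "$work/unpack"         # extract into the empty directory

existing=$(cat "$work/report")
extracted=$(cat "$work/unpack/report")
echo "existing file: $existing"
echo "extracted file: $extracted"
rm -rf "$work"
```

Because the archive is unpacked in its own directory, the existing file named report keeps its old contents, and removing the extracted files is a matter of removing one directory.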


You can combine the gunzip and tar commands on one command line with a pipe (|), which redirects the output of gunzip so that it becomes the input to tar:

$ gunzip -c make-3.80.tar.gz | tar -xvf - 

The -c option causes gunzip to send its output through the pipe instead of creating a file. Refer to "Pipes" (page 128), gzip (page 756), and tar (page 862) for more information about how this command line works.
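You can try the pipe on a small archive of your own before using it on a large download. The following sketch assumes gzip, gunzip, and tar are installed; the names proj, README, and out are invented for the example. It omits the -v option so that tar works silently.

```shell
work=$(mktemp -d)                                  # scratch directory
mkdir "$work/proj" "$work/out"
echo "hello" > "$work/proj/README"
( cd "$work" && tar -cf proj.tar proj && gzip proj.tar )   # creates proj.tar.gz

# gunzip -c writes the decompressed archive to standard output;
# the hyphen after -f tells tar to read the archive from standard input.
gunzip -c "$work/proj.tar.gz" | ( cd "$work/out" && tar -xf - )

readme=$(cat "$work/out/proj/README")
if [ -f "$work/proj.tar.gz" ]; then still_there=yes; else still_there=no; fi
echo "README contains: $readme"
echo "proj.tar.gz still present: $still_there"
rm -rf "$work"
```

Unlike the two-step procedure, the pipe never writes an uncompressed .tar file to the disk, and the compressed archive remains in place.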

A simpler solution is to use the -z option to tar. This option causes tar to call gunzip (or gzip when you are creating an archive) directly and simplifies the preceding command line to

$ tar -xvzf make-3.80.tar.gz 

In a similar manner, the -j option calls bzip2 or bunzip2.
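The -z option works in all three modes of tar covered in this section: create, list, and extract. The sketch below is illustrative and assumes a tar that supports -z, as GNU tar does; the directory and file names are made up.

```shell
work=$(mktemp -d)                                  # scratch directory
mkdir "$work/docs" "$work/out"
echo "chapter one" > "$work/docs/ch1"
echo "chapter two" > "$work/docs/ch2"

( cd "$work" && tar -czf docs.tar.gz docs )        # -c create, compressing with gzip
members=$(tar -tzf "$work/docs.tar.gz" | sort)     # -t list, decompressing on the fly
( cd "$work/out" && tar -xzf ../docs.tar.gz )      # -x extract, decompressing on the fly

ch2=$(cat "$work/out/docs/ch2")
echo "$members"
echo "ch2 contains: $ch2"
rm -rf "$work"
```

To work with bzip2 instead, substitute j for z in each command and use a .tar.bz2 filename.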

A Practical Guide to UNIX® for Mac OS® X Users
ISBN: 0131863339
Year: 2005
Pages: 234
