Compressing and Archiving Files


Large files use a lot of disk space and take longer than smaller files to transfer from one system to another over a network. If you do not need to look at the contents of a large file very often, you may want to save it on a CD, DVD, or other medium and remove it from the hard disk. If you have a continuing need for the file, retrieving a copy from a CD may be inconvenient. To reduce the amount of disk space you use without removing the file entirely, you can compress the file without losing any of the information it holds. Also you may frequently download compressed files from the Internet. The utilities described in this section compress and decompress files.

bzip2: Compresses a File

The bzip2 utility compresses a file by analyzing it and recoding it more efficiently. The new version of the file looks completely different. In fact, because the new file contains many nonprinting characters, you cannot view it directly. The bzip2 utility works particularly well on files that contain a lot of repeated information, such as text and image data, although most image data is already in a compressed format.

The following example shows a boring file. Each of the 8,000 lines of this file, named letter_e, contains 72 e's and a NEWLINE character that marks the end of the line. The file occupies more than half a megabyte of disk storage.

 $ ls -l
 -rw-rw-r--  1 sam sam 584000 Mar  1 22:31 letter_e

The l (long) option causes ls to display more information about a file. Here it shows that letter_e is 584,000 bytes long. The verbose (or v) option causes bzip2 to report how much it was able to reduce the size of the file. In this case, it shrank the file by 99.99 percent:

 $ bzip2 -v letter_e
 letter_e: 11680.00:1, 0.001 bits/byte, 99.99% saved, 584000 in, 50 out.
 $ ls -l
 -rw-rw-r--  1 sam sam 50 Mar  1 22:31 letter_e.bz2

.bz2 filename extension

Now the file is only 50 bytes long. The bzip2 utility also renamed the file, appending .bz2 to its name. This naming convention reminds you that the file is compressed; you would not want to display or print it, for example, without first decompressing it. The bzip2 utility does not change the modification date associated with the file, even though it completely changes the file's contents.
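
The letter_e experiment can be reproduced with the following minimal sketch. It assumes bzip2, awk, and mktemp are installed, and it works in a scratch directory (the name workdir is just for illustration). The exact compressed size may vary, so the sketch only prints the two sizes for comparison.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a file like letter_e: 8,000 lines, each 72 e's plus a NEWLINE.
awk 'BEGIN {
    line = sprintf("%072d", 0)   # 72 zeros ...
    gsub(/0/, "e", line)         # ... turned into 72 e characters
    for (i = 0; i < 8000; i++) print line
}' > letter_e

before=$(wc -c < letter_e)       # 8000 lines * 73 bytes = 584000 bytes

# -v reports the savings; bzip2 replaces letter_e with letter_e.bz2.
bzip2 -v letter_e

after=$(wc -c < letter_e.bz2)
echo "before: $((before)) bytes, after: $((after)) bytes"
```

After the script runs, only letter_e.bz2 remains in the scratch directory; the uncompressed letter_e is gone, as described above.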

In the following, more realistic example, the file zach.jpg contains a computer graphics image:

 $ ls -l
 -rw-r--r--  1 sam sam 33287 Mar  1 22:40 zach.jpg

The bzip2 utility can reduce the size of the file by only 28 percent because the image is already in a compressed format:

 $ bzip2 -v zach.jpg
 zach.jpg:  1.391:1,  5.749 bits/byte, 28.13% saved, 33287 in, 23922 out.
 $ ls -l
 -rw-r--r--  1 sam sam 23922 Mar  1 22:40 zach.jpg.bz2

Refer to page 596 and the Bzip2 mini-HOWTO (see page 34 for help finding it) for more information.

bunzip2 and bzcat: Decompress a File

You can use the bunzip2 utility to restore a file that has been compressed with bzip2:

 $ bunzip2 letter_e.bz2
 $ ls -l
 -rw-rw-r--  1 sam sam 584000 Mar  1 22:31 letter_e
 $ bunzip2 zach.jpg.bz2
 $ ls -l
 -rw-r--r--  1 sam sam  33287 Mar  1 22:40 zach.jpg

The bzcat utility displays a file that has been compressed with bzip2. The equivalent of cat for .bz2 files, bzcat decompresses the compressed data and displays the contents of the decompressed file. Like cat, bzcat does not change the source file. The pipe in the following example redirects the output of bzcat so that instead of being displayed on the screen it becomes the input to head, which displays the first two lines of the file:

 $ bzcat letter_e.bz2 | head -2 eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee 

After bzcat is run, the contents of letter_e.bz2 are unchanged; the file is still stored on the disk in compressed form.
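
One way to confirm that bzcat leaves the compressed file alone is to compare a checksum taken before and after viewing it. The sketch below assumes bzip2 and cksum are available; the file notes.txt and its contents are made up for the illustration, and head -n 2 is the same as the head -2 used above.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample data: three short lines, compressed in place.
printf 'first\nsecond\nthird\n' > notes.txt
bzip2 notes.txt                      # leaves only notes.txt.bz2

sum_before=$(cksum < notes.txt.bz2)  # checksum of the compressed file

# Decompress to standard output and keep only the first two lines.
first_two=$(bzcat notes.txt.bz2 | head -n 2)
echo "$first_two"

sum_after=$(cksum < notes.txt.bz2)   # checksum again, after viewing

if [ "$sum_before" = "$sum_after" ]; then
    echo "notes.txt.bz2 is unchanged"
fi
```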


The bzip2recover utility supports limited data recovery from media errors. Give the command bzip2recover followed by the name of the file from which you want to try to recover data.

gzip: Compresses a File

gunzip and zcat

The gzip (GNU zip) utility is older and less efficient than bzip2. Its flags and operation are very similar to those of bzip2. A file compressed by gzip is marked by a .gz filename extension. Linux stores manual pages in gzip format to save disk space; likewise, files you download from the Internet are frequently in gzip format. Use gzip, gunzip, and zcat just as you would use bzip2, bunzip2, and bzcat. Refer to page 688 for more information on gzip.
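
The following short sketch runs the three gzip utilities in the same way as their bzip2 counterparts. The file words.txt is hypothetical. The sketch assumes the zcat that comes with GNU gzip; on some systems zcat handles only .Z files, in which case gunzip -c does the same job.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > words.txt   # hypothetical sample file

gzip -v words.txt                           # replaces words.txt with words.txt.gz

# View the first line without decompressing the file on disk.
first=$(zcat words.txt.gz | head -n 1)
echo "$first"

gunzip words.txt.gz                         # restores words.txt, removes words.txt.gz
```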


The compress utility can also compress files, albeit not as well as gzip. This utility marks a file it has compressed by adding .Z to its name.

tip: gzip versus zip

Do not confuse gzip and gunzip with the zip and unzip utilities. These last two are used to pack and unpack zip archives containing several files compressed into a single file that has been imported from or is being exported to Windows. The zip utility constructs a zip archive, whereas unzip unpacks zip archives. The zip and unzip utilities are compatible with PKZIP, a Windows compress and archive program.

tar: Packs and Unpacks Files

The tar utility performs many functions. Its name is short for tape archive, as its original function was to create and read archive and backup tapes. Today it is used to create a single file (called a tar file) from multiple files or directory hierarchies and to extract files from a tar file.

In the following example, the first ls shows the existence and sizes of the files g, b, and d. Next tar uses c (create), v (verbose), and f (write to or read from a file) options[2] to create an archive named all.tar from these files. Each line of the output from tar starts with the letter a to indicate that it is appending to the archive. This letter is followed by the name of the file tar is appending.

[2] Although the original UNIX tar did not use a leading hyphen to indicate an option on the command line, it now accepts hyphens. The GNU tar described here will accept tar commands with or without a leading hyphen. This book uses the hyphen for consistency with most other utilities.

The tar utility does add overhead when it creates an archive. The next example shows that the archive file all.tar occupies 9,728 bytes, whereas the three files together occupy 6,263 bytes. This overhead is more appreciable on smaller files, such as the ones in this example.

 $ ls -l g b d
 -rw-r--r--   1 jenny jenny 1302 Aug 20 14:16 g
 -rw-r--r--   1 jenny other 1178 Aug 20 14:16 b
 -rw-r--r--   1 jenny jenny 3783 Aug 20 14:17 d

 $ tar -cvf all.tar g b d
 a g
 a b
 a d
 $ ls -l all.tar
 -rw-r--r--   1 jenny     jenny        9728 Aug 20 14:17 all.tar
 $ tar -tvf all.tar
 -rw-r--r-- jenny/jenny    1302 2003-08-20 14:16 2005 g
 -rw-r--r-- jenny/other    1178 2003-08-20 14:16 2005 b
 -rw-r--r-- jenny/jenny    3783 2003-08-20 14:17 2005 d

The final command in the preceding example uses the t option to display a table of contents for the archive. Use x instead of t to extract files from a tar archive. Omit the v option if you want tar to do its work silently.
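
The size of all.tar in the example can be accounted for with a little arithmetic. The sketch below assumes the usual tar layout: one 512-byte header per file, file data padded to a multiple of 512 bytes, and two 512-byte blocks of zeros marking the end of the archive. Some versions of tar round the total up further to a larger record size, so the size you see may differ.

```shell
# Assumed layout: 512-byte header per file, data padded to 512-byte
# blocks, and two 512-byte zero blocks at the end of the archive.
block=512
total=0
for size in 1302 1178 3783; do           # sizes of g, b, and d
    padded=$(( (size + block - 1) / block * block ))
    total=$(( total + block + padded ))  # header plus padded data
done
total=$(( total + 2 * block ))           # end-of-archive marker
echo "$total"                            # prints 9728
```

Under these assumptions the padding and headers explain the difference between the 6,263 bytes of data and the size of the archive.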

You can use bzip2, compress, or gzip to compress tar files and make them easier to store and handle. Many files you download from the Internet are in one of these formats. Files that have been processed by tar and compressed by bzip2 frequently have a filename extension of .tar.bz2. Those processed by tar and gzip have an extension of .tgz or .tar.gz, while files processed by tar and compress use .tar.Z as the extension.
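
As an illustration of these naming conventions, the following sketch maps a filename extension to the utility that decompresses the file. The function name decompressor is hypothetical, and uncompress is assumed to be the counterpart of compress on your system; .tgz is treated as the common short form of .tar.gz.

```shell
# decompressor is a hypothetical helper: it prints the name of the
# utility that undoes the compression implied by a filename extension.
decompressor() {
    case $1 in
        *.tar.bz2)      echo bunzip2 ;;
        *.tar.gz|*.tgz) echo gunzip ;;
        *.tar.Z)        echo uncompress ;;
        *)              return 1 ;;   # not a name this sketch recognizes
    esac
}

tool=$(decompressor make-3.80.tar.gz)
echo "$tool"                          # prints gunzip
```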

You can unpack a tarred and gzipped file in two steps. (Follow the same procedure if the file was compressed by bzip2, but use bunzip2 instead of gunzip.) The next example shows how to unpack the GNU make utility after it has been downloaded:

 $ ls -l mak*
 -rw-rw-r--  1 sam sam 1211924 Jan 20 11:49 make-3.80.tar.gz
 $ gunzip mak*
 $ ls -l mak*
 -rw-rw-r--  1 sam sam 4823040 Jan 20 11:49 make-3.80.tar
 $ tar -xvf mak*
 make-3.80/
 make-3.80/po/
 make-3.80/po/
 ...
 make-3.80/tests/
 make-3.80/tests/

The first command lists the downloaded tarred and gzipped file: make-3.80.tar.gz (about 1.2 megabytes). The asterisk (*) in the filename matches any characters in any filenames (page 129), so you end up with a list of files whose names begin with mak; in this case there is only one. Using an asterisk saves typing and can improve accuracy with long filenames. The gunzip command decompresses the file and yields make-3.80.tar (no .gz extension), which is about 4.8 megabytes. The tar command creates the make-3.80 directory in the working directory and unpacks the files into it.

 $ ls -ld mak*
 drwxrwxr-x  8 sam sam    4096 Oct  3  2002 make-3.80
 -rw-rw-r--  1 sam sam 4823040 Jan 20 11:49 make-3.80.tar
 $ ls -l make-3.80
 total 1816
 -rw-r--r--  1 sam sam  24687 Oct  3  2002 ABOUT-NLS
 -rw-r--r--  1 sam sam   1554 Jul  8  2002 AUTHORS
 -rw-r--r--  1 sam sam  18043 Dec 10  1996 COPYING
 ...
 -rw-r--r--  1 sam sam  16520 Jan 21  2000 vmsify.c
 -rw-r--r--  1 sam sam  16409 Aug  9  2002 vpath.c
 drwxrwxr-x  5 sam sam   4096 Oct  3  2002 w32

After tar extracts the files from the archive, the working directory contains two files whose names start with mak: make-3.80.tar and make-3.80. The d (directory) option causes ls to display only file and directory names, not the contents of directories as it normally does. The final ls command shows the files and directories in the make-3.80 directory. Refer to page 786 for more information on tar.

caution: tar: the x option may extract a lot of files

Some tar archives contain many files. Run tar with the t option and the name of the tar file to list the files in the archive without unpacking them. In some cases you may want to create a new directory (mkdir [page 80]), move the tar file into that directory, and expand it there. That way the unpacked files do not mingle with your existing files, and there is no confusion. This strategy also makes it easier to delete the extracted files. Some tar files automatically create a new directory and put the files into it. Refer to the preceding example.

caution: tar: the x option can overwrite files

The x option to tar overwrites a file that has the same filename as a file you are extracting. Follow the suggestion in the preceding caution box to avoid overwriting files.
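
The advice in the two caution boxes can be sketched as follows. The archive demo.tar and its two files are made up so that the example is self-contained; with a real archive you would start at the tar -tf step.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical archive with two small files, built here so the sketch runs.
echo one > a.txt
echo two > b.txt
tar -cf demo.tar a.txt b.txt

listing=$(tar -tf demo.tar)        # list the contents without unpacking
echo "$listing"

mkdir unpacked                     # fresh directory for the extracted files
mv demo.tar unpacked/
(cd unpacked && tar -xf demo.tar)  # extract there, away from existing files
```

Because the extraction happens inside unpacked, the original a.txt and b.txt in the working directory are not overwritten, and removing the extracted files is a matter of deleting one directory.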


You can combine the gunzip and tar commands on one command line with a pipe ( | ), which redirects the output of gunzip so that it becomes the input to tar:

 $ gunzip -c make-3.80.tar.gz | tar -xvf - 

The c option causes gunzip to send its output through the pipe instead of creating a file. Refer to "Pipes" (page 122), gzip (page 688), and tar (page 786) for more information about how this command line works.
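
The same idea works in the other direction when you create an archive. In the sketch below, which uses a hypothetical directory named project, tar writes the archive to standard output (the hyphen after f) and gzip compresses what it reads from the pipe. The archive is then unpacked in a separate directory with the pipeline shown above, without the v option so that tar works silently.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

mkdir project                      # hypothetical directory to archive
echo 'hello' > project/readme.txt

# Create: tar writes the archive to standard output, gzip compresses it.
tar -cf - project | gzip > project.tar.gz

# Unpack in another directory: gunzip -c feeds the archive to tar.
mkdir restore
gunzip -c project.tar.gz | (cd restore && tar -xf -)
```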

A simpler solution is to use the z option to tar. This option causes tar to call gunzip (or gzip when you are creating an archive) directly and simplifies the preceding command line to

 $ tar -xvzf make-3.80.tar.gz 

In a similar manner, the j option calls bzip2 or bunzip2.
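
The z and j options work with c, t, and x alike. The following sketch assumes a tar that supports both options (as GNU tar does) and that gzip and bzip2 are installed; the directory docs is hypothetical. The C option, which is not covered above, tells tar to change to the named directory before extracting.

```shell
workdir=$(mktemp -d) && cd "$workdir" || exit 1

mkdir docs                          # hypothetical directory to archive
echo 'sample text' > docs/note.txt

tar -czf docs.tar.gz docs           # c + z: create and compress with gzip
tar -cjf docs.tar.bz2 docs          # c + j: create and compress with bzip2

gz_list=$(tar -tzf docs.tar.gz)     # t + z: list a gzipped archive
bz_list=$(tar -tjf docs.tar.bz2)    # t + j: list a bzip2-compressed archive
echo "$gz_list"

mkdir out
tar -xjf docs.tar.bz2 -C out        # x + j: extract into the directory out
```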


    A Practical Guide to Linux® Commands, Editors, and Shell Programming
    ISBN: 131478230
    Year: 2005
    Pages: 213