Compressing Files in Unix


The main reason for compressing files is to save space. Think of closet space in your home: an organizer lets you condense what you already have so that there is room for more. Compression rests on the same idea. You take something, squeeze it down, and organize it so that you can store more.

You can also compress files to send them to others. This is common with digital photography and today's email clients. Many people have email and digital cameras, and they want to send photos through email. A message carrying large attachments is often rejected because the recipient may not have enough space on her system, or in her mailbox on the email server, to accept a file of that size. Compressing the files before you send them helps you stay under that limit.

Makes sense, right? Well, that's all you need to know about why to compress something. Now you need to know the actual mechanics of it.

If you happen to be using a system where disk space is restricted and you need to maximize available space, you can use the Unix commands you will learn here. These commands will reduce the amount of space your files occupy, and will allow you to store more files in the space you are allowed.

There are three major compression formats you will use when working with Unix:

  • Unix program compress makes compressed files

  • Personal or third-party Unix programs (such as the PKZIP program) make zipped files

  • Unix GNU program gzip makes gzipped files

We will cover both compressing and decompressing (or uncompressing) for each of these formats, because you will need to know how to decompress anything you compress. Each format has its own set of programs for the two tasks. For our first example, we shall look at the standard (and now rarely used) compress tool that comes with almost every distribution of Unix.

Why Compress? By applying an algorithm, you can compress files for the purpose of conserving space or speeding up file transfers.


The compress Command

To use this command, specify the file you want to compress: compress <filename>. The compress command is an older Unix command that uses an older compression algorithm. Better compression algorithms have since been developed, so compress is not commonly used anymore and has largely been replaced by tools such as gzip, but it still exists on just about every version of Unix.

Files created with the compress command have the file suffix .Z. The compressed file replaces the original in the same directory, and you can see it by using the ls -l command.

The uncompress command uncompresses the results of a compress command. To use the uncompress command, you issue the command as uncompress <filename.Z>.

Remember learning about how the cat command can be used to read files? The zcat command is a version of cat that reads compressed files rather than normal text files. Using zcat is similar to using compress and uncompress; issue the command as zcat <filename.Z>. Remember, since you already compressed a file, the file suffix is .Z.
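The three commands above can be tried together in a short session. This is only a sketch under a few assumptions: the file name notes.txt and its contents are made up for illustration, the work happens in a throwaway directory, and because compress is missing from some modern systems the commands are skipped when it is not installed.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file: 200 identical lines, which compress well.
i=0
while [ "$i" -lt 200 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > notes.txt
original_sum=$(cksum < notes.txt)

if command -v compress >/dev/null 2>&1; then
    compress notes.txt            # notes.txt is replaced by notes.txt.Z
    ls -l                         # the listing now shows notes.txt.Z
    zcat notes.txt.Z | wc -l      # read the compressed file without unpacking it
    uncompress notes.txt.Z        # notes.txt is restored, notes.txt.Z is gone
else
    echo "compress is not installed on this system; skipping"
fi

# The checksum of the restored file should match the original.
restored_sum=$(cksum < notes.txt)
echo "original: $original_sum"
echo "restored: $restored_sum"
```

Comparing the two checksums is a quick way to confirm that the round trip returned the file unchanged.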

May the GNU Be with You: It's common to use compression utilities if you are trying to save space. Don't get too hung up on using compress; instead consider using gzip.


The gzip Command

Using the compress command will get you the results you need, but again, the utility is older and does not compress as well as newer ones. Also, the Unix version of compress can differ slightly from distribution to distribution. Any variance is a problem, because a file compressed with one version of the utility may not decompress with another. To put it plainly, the usual reason to use compress is that it is the only tool you know or have: it is already on your local Unix system, ready for use.

What if you wanted to use something that was a little less likely to be proprietary? The gzip command (short for GNU zip) is the original file compression program for GNU/Linux and is available for all Unix systems under the GPL (GNU General Public License). This means that it is free to use and serves as a standard tool that almost everyone in Unix and Linux environments will use. Current versions of gzip produce files with a .gz extension.

The gzip command will work essentially identically to the compress/uncompress/zcat suite we just talked about. It is a better utility and less proprietary than the older tools in use such as compress.

To make your life a bit easier, GNU has included the capability to deal with compressed (.Z) files in their gunzip and gzcat utilities. You might find that gzip and gunzip exist on your system, but that gzcat is missing. Some distributions have renamed gzcat to zcat because it handles compressed files as well.
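A short gzip session might look like the following sketch. The file name app.log and its contents are hypothetical, and the work happens in a scratch directory. Because the name of the cat-style tool varies between systems (gzcat on some, zcat on others), the sketch uses gzip -dc, which writes the decompressed data to standard output in the same way.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file: a repetitive "log" that compresses well.
i=0
while [ "$i" -lt 200 ]; do
    echo "line $i of a hypothetical log file"
    i=$((i + 1))
done > app.log

size_before=$(wc -c < app.log)
gzip app.log                      # app.log is replaced by app.log.gz
size_after=$(wc -c < app.log.gz)
echo "before: $size_before bytes, after: $size_after bytes"

gzip -dc app.log.gz | wc -l       # read the contents without unpacking the file
gunzip app.log.gz                 # app.log is restored, app.log.gz is gone
size_restored=$(wc -c < app.log)
```

Printing the sizes before and after makes the space saving visible, which is the whole point of the exercise.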

When tar (which stands for Tape Archive and will be discussed later) is combined with gzip, the resulting file extension may be .tgz or .tar.gz; when tar is combined with compress, it is .tar.Z.
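As a brief preview of where the .tar.gz suffix comes from, here is a minimal sketch that bundles a directory with tar and then compresses the bundle with gzip. The directory name project and its files are invented for the example; tar itself is covered in more depth later.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical directory with two small files.
mkdir project
echo "first file" > project/a.txt
echo "second file" > project/b.txt

tar cf project.tar project        # bundle the directory into one archive file
gzip project.tar                  # project.tar is replaced by project.tar.gz

# List the archive's contents by decompressing to standard output
# and letting tar read the archive from standard input.
listing=$(gzip -dc project.tar.gz | tar tf -)
echo "$listing"
```

The two steps are kept separate here to show that tar does the bundling and gzip does the compressing.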

zip/unzip

As we wind down to the end of our compression utilities offerings that can be used with Unix, let's cover the last of the commonly seen utilities used for compression and decompression. Most PC users, whether familiar with Unix or not, know about Zip files. The zip command offers compression that is based on the algorithm from the PC standard PKZip program. The zip and unzip programs work much as you might expect them to: zip <archive.zip> <filename> to compress a file into a zip archive, and unzip <archive.zip> to extract the files.
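The following sketch shows one zip and unzip round trip. The names readme.txt, photos.zip, and extracted are made up for the example, and since zip and unzip are not installed everywhere, the commands are skipped when either one is missing. Note that the zip command takes the archive name first and then the files to add, and that it leaves the original file in place.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file.
echo "hello from Unix" > readme.txt

if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    zip photos.zip readme.txt         # archive name first, then the files to add
    unzip -l photos.zip               # list what the archive contains
    mkdir extracted
    unzip photos.zip -d extracted     # unpack into a separate directory
else
    echo "zip or unzip is not installed on this system; skipping"
fi

ls -l                                 # readme.txt is still present
```

Unpacking into a separate directory avoids the overwrite prompt that unzip shows when a file of the same name already exists.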

What Can bzip2 Do for You? Also appearing recently is the bzip2 compression utility, which despite being the newer kid on the block, looks very promising for tight compression. You can learn more about this tool at the bzip website at http://www.bzip.org/.

bzip2 is a freely available, high-quality data compressor. The current version is 1.0.3, released February 15, 2005, so it's still being updated as of the writing of this book.

It typically compresses files to within 10% to 15% of the best available techniques, while being around twice as fast at compression and six times faster at decompression than those techniques. If you need this tighter compression, bzip2 is worth using, although many users still faithfully use gzip.

The syntax and options for bzip2 have intentionally been made similar to gzip, so if you encounter this program as it grows in popularity, you won't have too much trouble figuring it out. Compression with bzip2 follows the gzip format bzip2 <filename>, which produces the compressed file <filename.bz2>. Decompression is simply bunzip2 <filename.bz2>.
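Because the syntax mirrors gzip, a bzip2 session looks almost identical. This is a sketch only: the file name data.txt and its contents are hypothetical, and the commands are skipped if bzip2 is not installed on your system.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file with repetitive content.
i=0
while [ "$i" -lt 200 ]; do
    echo "row $i of some hypothetical data"
    i=$((i + 1))
done > data.txt
sum_before=$(cksum < data.txt)

if command -v bzip2 >/dev/null 2>&1; then
    bzip2 data.txt                # data.txt is replaced by data.txt.bz2
    ls -l data.txt.bz2
    bunzip2 data.txt.bz2          # data.txt is restored
else
    echo "bzip2 is not installed on this system; skipping"
fi

sum_after=$(cksum < data.txt)
```

As with gzip, the compressed file takes the place of the original until you decompress it again.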

Because bzip2 is not yet commonly seen, if you encounter it and need to do more than trivial compressions or decompressions, consult your local man pages for more current information.


Creating files in the zip format (which uses the file suffix .zip) for distribution to other Unix users is generally not a good idea, as zip and unzip are not always available to Unix users. These utilities are freeware, so ask your system administrator to install them if you need access to them.

If your target, however, is users of Macintosh or Windows computers, zip is a file format that they can most likely read. Both the zip and unzip programs have a number of potentially useful options, a list of which can be displayed by issuing either command followed by the -h option.

In this section of the lesson, we have covered how to compress data, and we lightly touched on the use of the tar command. In the next section, we will dig deeper into the tar command and cover its use.



    Sams Teach Yourself Unix in 10 Minutes (2nd Edition)
    ISBN: 0672327643
    EAN: 2147483647
    Year: 2005
    Pages: 170
