Project 27. Compress Files

"How do I compress and uncompress files?"

This project shows you how to compress (or zip) files. It covers the commands gzip, gunzip, bzip2, bunzip2, zip, and unzip.

Compress and Uncompress

Zipping is a cross-platform way to compress files. The compression is lossless, meaning that the original file can be reconstructed verbatim from the compressed file. Specialized compression techniques, such as the JPEG image compression format, are lossy, meaning that some information from the original image is lost in the compressed image.
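To make "lossless" concrete, here is a minimal sketch that compresses a file, uncompresses it, and compares the result with an untouched copy. The scratch directory and the filenames sample.txt and original.txt are made up for the illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt is a made-up test file: 2,000 identical lines.
yes "the quick brown fox jumps over the lazy dog" | head -n 2000 > sample.txt
cp sample.txt original.txt

gzip sample.txt        # replaces sample.txt with sample.txt.gz
gunzip sample.txt.gz   # restores sample.txt

# cmp exits with status 0 only when the files are byte-for-byte identical.
if cmp -s original.txt sample.txt; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"
```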

Two compression formats are in widespread use:

  • The original Lempel-Ziv coding (LZ77), implemented by the commands zip and unzip, and the GNU equivalents gzip and gunzip. We'll concentrate on the GNU equivalents in this project.

  • The newer Burrows-Wheeler algorithm, implemented by the commands bzip2 and bunzip2. Compression generally is considerably better than that achieved by LZ77.

Many files can be compressed into a single archive file with command zip or the Unix "tape archiver" tar, which is covered in Project 28.

Let's simply compress and uncompress a file to demonstrate gzip and gunzip.

$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
$ gzip list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz


You'll notice three things: The compressed file is considerably smaller than the original, it has replaced the original, and it sports the extension .gz.

Tip

To view information about a compressed file, use

$ gzip -l --verbose list-all.txt.gz



Now let's uncompress the file (extension .gz is assumed if not given).

$ gunzip list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman  1M ... list-all.txt


You may want to keep the original file when, for example, you compress a file to email it. Use option -c, which sends the compressed file to standard out, and redirect standard out to an appropriately named file.

$ gzip -c list-all.txt > list-all.txt.gz
$ ls -lh
-rw-r--r-- 1 saruman saruman   1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz
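As a side note, many gzip versions also accept option -k (--keep), which compresses in place but leaves the original alone. GNU gzip has had it since version 1.6 and the BSD gzip has it too, but whether your installed gzip does is an assumption you should check. The sketch below runs both approaches on a stand-in file; via-c.gz is a made-up name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for list-all.txt: 5,000 lines of sample text.
yes "sample line of text" | head -n 5000 > list-all.txt

# Approach from the text: send the compressed data to standard output.
gzip -c list-all.txt > via-c.gz

# Alternative, assuming your gzip accepts -k: compress in place,
# but keep the original file.
gzip -k list-all.txt

ls -l
```

Both approaches leave list-all.txt untouched. The -c form has the advantage that you choose the name of the compressed file.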


Tip

The gzcat command is equivalent to (and a few characters shorter than) gunzip -c. Also, gzip -d is equivalent to gunzip, and gzip -dc is equivalent to gunzip -c and also gzcat.
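A quick way to convince yourself of these equivalences is to run two of the forms on the same file and compare what they write. This is only a sketch; words.txt, a.txt, and b.txt are made-up names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\nbeta\ngamma\n' > words.txt
gzip words.txt                   # leaves words.txt.gz

gunzip -c words.txt.gz > a.txt   # uncompress to standard output
gzip -dc words.txt.gz > b.txt    # the same request, spelled with gzip

# cmp prints nothing and exits with status 0 when the files match.
cmp a.txt b.txt && echo "same output"
```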


For the reverse case, in which you want to expand the compressed file and keep the original compressed copy, use gunzip with option -c. You must include the .gz extension in the filename when an uncompressed file with the same filename also exists.

$ gunzip -c list-all.txt > copy-of-list-all.txt
gunzip: list-all.txt: not in gzip format
$ gunzip -c list-all.txt.gz > copy-of-list-all.txt
$ ls -lh
-rw-r--r-- 1 saruman saruman   1M ... copy-of-list-all.txt
-rw-r--r-- 1 saruman saruman   1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 282K ... list-all.txt.gz


Options -1 through -9 are used to set compression levels in gzip. Higher settings yield smaller compressed files but also increase compression times. The default setting is -6, so specify an option in the range -7 to -9 for better but slower compression, or use a setting from -5 to -1 for faster compression but larger compressed files.

$ gzip -9 -c list-all.txt >best.gz
$ gzip -1 -c list-all.txt >worst.gz
$ ls -lh
-rw-r--r-- 1 saruman saruman 271K ... best.gz
-rw-r--r-- 1 saruman saruman   1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 345K ... worst.gz


Option --best is equivalent to -9, and --fast is equivalent to -1.
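To see how the levels compare on your own data, a small loop like the following may help. It is a sketch only: numbers.txt is a made-up stand-in file generated with seq, and the sizes you get will depend on the data and on your gzip version.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in data: the numbers 1 to 200000, one per line.
seq 1 200000 > numbers.txt

# Compress the same file at the fastest, default, and best levels.
for level in 1 6 9; do
    gzip -"$level" -c numbers.txt > "level-$level.gz"
    size=$(wc -c < "level-$level.gz")
    echo "level $level: $size bytes"
done
```

Prefix the gzip command with time if you also want to compare how long each level takes.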

Tip

The commands cat, more, grep, and diff have z-variants (gzcat, zmore, zgrep, and zdiff) that operate directly on gzipped files.
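For instance, zgrep can search a compressed log without uncompressing it to disk first. The sketch below assumes zgrep is installed alongside gzip; log.txt and its contents are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'error: disk full\ninfo: all good\nerror: timeout\n' > log.txt
gzip log.txt                     # leaves only log.txt.gz

# Count the matching lines in the compressed file. Option -c is handed
# on to grep, so it means "count" here.
matches=$(zgrep -c 'error' log.txt.gz)
echo "lines containing 'error': $matches"
```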


Create Compressed Archives

Many files can be compressed into a single file with a command like

$ gzip -c *.txt > all.gz


Be warned, however, that when all.gz is uncompressed, it will not be split back into its constituent files.
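The following sketch shows the effect on two tiny files. The names a.txt, b.txt, and all.txt are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first file\n'  > a.txt
printf 'second file\n' > b.txt

gzip -c a.txt b.txt > all.gz   # same idea as: gzip -c *.txt > all.gz
gunzip -c all.gz > all.txt     # one stream comes out

# all.txt holds the contents of both files run together, with nothing
# to mark where a.txt ended and b.txt began.
cat all.txt
```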

If you want to archive many files into a single compressed file and be able to recover them as individual files, either use zip and unzip, or archive them first by using the tar command (see Project 28).

Here's an example that uses zip to compress all the files in a directory called week1 into a single file. Command zip takes the name of the archive as its first argument, followed by a list of files to be deflated into the archive file. The wildcard pathname week1/* denotes every file in directory week1.

$ zip week1.zip week1/*
  adding: week1/friday.ws (deflated 48%)
  adding: week1/monday.ws (deflated 47%)
  adding: week1/thursday.ws (deflated 48%)
  adding: week1/tuesday.ws (deflated 46%)
  adding: week1/wednesday.ws (deflated 46%)


We can examine the contents of a zip file by giving option -l to unzip.

$ unzip -l week1.zip
Archive:  week1.zip
  Length     Date    Time   Name
  ------     ----    ----   ----
    1712  05-03-104  17:22  week1/friday.ws
    1593  05-03-104  17:22  week1/monday.ws
    1546  05-03-104  17:22  week1/thursday.ws
    1598  05-03-104  17:22  week1/tuesday.ws
    1545  05-03-104  17:22  week1/wednesday.ws
  ------                    -------
    7994                    5 files


(These files were apparently created in the year 104!)

To unzip the archive, use

$ unzip week1.zip
Archive:  week1.zip
  inflating: week1/friday.ws
  inflating: week1/monday.ws
  inflating: week1/thursday.ws
  inflating: week1/tuesday.ws
  inflating: week1/wednesday.ws
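If you want to check that an archive really does give back its individual files, you could unzip it into a separate directory and compare. This is a sketch under assumptions: the two .ws files and the directory name restored are invented, and it uses option -q (quiet) of zip and unzip and option -d (extraction directory) of unzip.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir week1
printf 'monday notes\n'  > week1/monday.ws
printf 'tuesday notes\n' > week1/tuesday.ws

zip -q week1.zip week1/*         # -q suppresses the "adding:" lines
unzip -q week1.zip -d restored   # -d extracts below directory restored

# diff -r compares the two directory trees file by file.
diff -r week1 restored/week1 && echo "archive restored intact"
```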


Other Formats

The gunzip command can uncompress files compressed with gzip, zip, and the older compress. The zip command appends the extension .zip, and compress appends the extension .Z.

gunzip cannot handle files compressed with zip that have more than one member. If gunzip gives an error message complaining about more than one entry, use unzip instead. You'll get such an error message when trying to gunzip an archive created by the Mac OS X Finder.

$ gunzip week1.gz
gunzip: week1.gz has more than one entry -- unchanged


To find out more about zip and unzip, run them without any arguments. Versions of Mac OS X older than 10.4 do not have man pages for either of them.

Use bzip2

The bzip2 and bunzip2 commands are very similar to gzip and gunzip but use the newer Burrows-Wheeler algorithm to provide better compression. They use the extension .bz2 or sometimes just .bz.

Here's a quick demonstration of gzip versus bzip2.

$ gzip -9 -c list-all.txt > list-all.txt.gz
$ bzip2 -9 -c list-all.txt > list-all.txt.bz2
$ ls -lh
-rw-r--r-- 1 saruman saruman 1M ... list-all.txt
-rw-r--r-- 1 saruman saruman 222K ... list-all.txt.bz2
-rw-r--r-- 1 saruman saruman 271K ... list-all.txt.gz
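To repeat the comparison on a file of your own, a script along these lines prints exact byte counts. It is a sketch: numbers.txt is a made-up stand-in generated with seq, and how the two commands rank will depend on your data.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in data: the numbers 1 to 200000, one per line.
seq 1 200000 > numbers.txt

gzip  -9 -c numbers.txt > numbers.txt.gz
bzip2 -9 -c numbers.txt > numbers.txt.bz2

# Print the size of the original and of each compressed copy.
for f in numbers.txt numbers.txt.gz numbers.txt.bz2; do
    echo "$f: $(wc -c < "$f") bytes"
done
```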


If you attempt to uncompress a damaged bzip2 file, bunzip2 will warn you of data corruption. There's a chance that you can recover at least part of the data by using the bzip2recover command, which extracts the undamaged blocks of the compressed file.
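You can check a file for damage without uncompressing it by giving option -t (test) to bzip2. The sketch below simulates damage by cutting a compressed file short. The filenames are made up, and truncation is just one convenient way to produce a broken file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 200000 > numbers.txt
bzip2 -k numbers.txt             # -k keeps numbers.txt alongside the .bz2

# Simulate damage: keep only the first 1000 bytes of the compressed file.
head -c 1000 numbers.txt.bz2 > damaged.bz2

# bzip2 -t tests integrity and writes no output files. Its exit status
# is nonzero when the test fails.
if bzip2 -t damaged.bz2 2>/dev/null; then
    status=ok
else
    status=damaged
fi
echo "damaged.bz2: $status"
```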

Tip

The commands cat, more, less, grep, egrep, fgrep, and diff have bz-variants (bzcat, bzmore, bzless, bzgrep, bzegrep, bzfgrep, and bzdiff) that operate directly on b-zipped files, without requiring decompression.
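As a brief illustration, the sketch below prints and searches a b-zipped file in place. It assumes bzcat and bzgrep are installed with bzip2; fruit.txt is an invented sample file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'apple\nbanana\ncherry\n' > fruit.txt
bzip2 fruit.txt                  # leaves only fruit.txt.bz2

bzcat fruit.txt.bz2              # print the uncompressed contents

# Count the lines that contain "an". Option -c is handed on to grep.
found=$(bzgrep -c 'an' fruit.txt.bz2)
echo "matching lines: $found"
```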





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
