Chapter 8. Archiving and Compression


Although the differences are sometimes made opaque in casual conversation, there is in fact a complete difference between archiving files and compressing them. Archiving means that you take 10 files and combine them into one file, with no difference in size. If you start with 10 100KB files and archive them, the resulting single file is 1000KB. On the other hand, if you compress those 10 files, you might find that the resulting files range from only a few kilobytes to close to the original size of 100KB, depending upon the original file type.

Note

In fact, you might end up with a bigger file during compression! If the file is already compressed, compressing it again adds extra overhead, resulting in a slightly bigger file.


All of the archive and compression formats in this chapterzip, gzip, bzip2, and tarare popular, but zip is probably the world's most widely used format. That's because of its almost universal use on Windows, but zip and unzip are well supported among all major (and most minor) operating systems, so things compressed using zip also work on Linux and Mac OS. If you're sending archives out to users and you don't know which operating systems they're using, zip is a safe choice to make.

gzip was designed as an open-source replacement for an older Unix program, compress. It's found on virtually every Unix-based system in the world, including Linux and Mac OS X, but it is much less common on Windows. If you're sending files back and forth to users of Unix-based machines, gzip is a safe choice.

The bzip2 command is the new kid on the block. Designed to supersede gzip, bzip2 creates smaller files, but at the cost of speed. That said, computers are so fast nowadays that most users won't notice much of a difference between the times it takes gzip or bzip2 to compress a group of files.

Note

Linux Magazine published a good article comparing several different compression formats, which you can find at http://www.linux-mag.com/content/view/1678/43/.


zip, gzip, and bzip2 are focused on compression (although zip also archives). The tar command does one thingarchiveand it has been doing it for a long time. It's found almost solely on Unix-based machines. You'll definitely run into tar files (also called tarballs) if you download source code, but almost every Linux user can expect to encounter a tarball some time in his career.



Linux Phrasebook
Linux Phrasebook
ISBN: 0672328380
EAN: 2147483647
Year: 2007
Pages: 288

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net