File Compression and Archiving

If you've been using a Mac, you have most likely seen files compressed using StuffIt, Aladdin Systems' popular file-compression utility.

In the Unix world there are a handful of commands, already part of the operating system, that are used for archiving and compressing files. It's likely that command-line software you download from the Internet will be compressed using these tools.

gzip

gzip is a program for compressing files. The g in gzip is from GNU (see the sidebar "What Is a GNU?"), and the zip is a reference to an earlier program called zip. gzip is not the same as zip, though. The older zip program not only compresses files but also combines multiple files and/or directories into a single file. The gzip program is more widely used than zip in the Unix world because gzip provides better compression (smaller files), and the Unix tar command is the commonly used program for combining multiple files into a single file. (The version of tar on Mac OS X can also compress its output with gzip. See tar below.)

To compress a file using gzip:

  • gzip filename

    gzip attempts to replace each file named in its arguments with a compressed file, adding the .gz extension to the filename. (gzip may not be able to replace a file if a file with the new name already exists and cannot be changed, in which case you get an error message.) So if the original filename is Picture 1.tiff, the compressed filename will be Picture 1.tiff.gz.

    Figure 4.2 shows an example of compressing all the .tiff files in the current directory. Notice how gzip did not change the modification times of the files, but it changed the group ownership (see "About Users and Groups" in Chapter 8, "Working with Permissions and Ownership," for more on group ownership of files).

    Figure 4.2. Compressing a set of files using gzip.
     localhost:~/Desktop vanilla$ ls -l *.tiff
     -rw-r--r--    1 vanilla        wheel          186590 Apr 14 20:49 Picture 1.tiff
     -rw-r--r--    1 vanilla        wheel          569553 Apr 15 10:41 Picture 2.tiff
     -rw-r--r--    1 vanilla        vanilla        517795 Apr 15 10:51 Picture 3.tiff
     localhost:~/Desktop vanilla$ gzip *.tiff
     localhost:~/Desktop vanilla$ ls -l *.tiff.gz
     -rw-r--r--    1 vanilla        vanilla        93648 Apr 14 20:49 Picture 1.tiff.gz
     -rw-r--r--    1 vanilla        vanilla        170010 Apr 15 10:41 Picture 2.tiff.gz
     -rw-r--r--    1 vanilla        vanilla        119743 Apr 15 10:51 Picture 3.tiff.gz
     localhost:~/Desktop vanilla$
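The replacement behavior described above can be tried safely in a scratch directory. This is a minimal sketch, assuming a POSIX shell with mktemp available; the filename "Sample 1.txt" is a made-up example, chosen because it contains a space like the Picture files in the figure.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small sample file (hypothetical name containing a space).
printf 'hello hello hello hello\n' > "Sample 1.txt"

# Quote the name so the shell passes it to gzip as a single argument.
gzip "Sample 1.txt"

# The original file has been replaced by the .gz version.
ls
```

Without the quotes, the shell would hand gzip two arguments, Sample and 1.txt, and gzip would report that neither file exists.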

Tips

  • You can use gzip to compress the output of other commands. Just pipe the output of the other commands into gzip and redirect the output to a file. (See Chapter 2 to review pipelines.) For example:

     grep root /var/log/mail.log | gzip > output.gz

    That command line runs the grep command to search for the string root in the file /var/log/mail.log. The output of grep (the lines containing root) is piped into the gzip program, and the output of gzip is redirected into the file output.gz.

  • If you use gzip to compress a Mac file that has a resource fork, the file loses both its icon and its association with the application that created it, and thus it may become useless. To safely compress traditional Mac files, use StuffIt.
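The pipeline tip can be tried without access to a real mail log. The sketch below is only an illustration: sample.log and its contents are invented stand-ins for /var/log/mail.log.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny stand-in log file (invented contents).
printf 'root login\nguest login\nroot logout\n' > sample.log

# grep writes the matching lines to the pipe; gzip compresses what it
# reads from the pipe; the shell redirects gzip's output into output.gz.
grep root sample.log | gzip > output.gz

# Check the result by uncompressing to the screen.
gunzip -c output.gz
```

Because gzip reads from the pipe rather than from a named file, sample.log itself is left untouched.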


gunzip

Of course, once you have a compressed file, you'll want to know how to uncompress it.

To uncompress a gzipped file:

  • gunzip filename.gz

    gunzip replaces the compressed file with the uncompressed version, removing the .gz filename extension. For example:

    gunzip "Picture 1.tiff.gz"

    replaces Picture 1.tiff.gz with Picture 1.tiff.

Tips

  • You can use gunzip in a pipeline. If you want the output of gunzip to go into a pipe, use the -c option:

    gunzip -c filename.gz | command

    When gunzip is used this way, it does not replace the compressed file.

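  • You can use the -t option to test the integrity of a compressed file. With -t, gunzip does not uncompress or remove the file. To uncompress a file while keeping the compressed version, use -c and redirect the output to a new file.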
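The following sketch exercises the -c and -t options in a scratch directory. It assumes the standard behavior documented in the gzip man page: -t only checks a compressed file, and -c writes the uncompressed data to standard output. The filename notes.txt is a made-up example.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create and compress a sample file (hypothetical name).
printf 'one\ntwo\nthree\n' > notes.txt
gzip notes.txt

# -t checks the compressed data and exits successfully if it is sound.
# Nothing is uncompressed or removed.
gunzip -t notes.txt.gz && echo "notes.txt.gz is intact"

# -c sends the uncompressed data into a pipe; notes.txt.gz stays in place.
gunzip -c notes.txt.gz | grep two

# Redirecting -c output recreates the original while keeping the .gz file.
gunzip -c notes.txt.gz > notes.txt
ls
```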


What Is a GNU?

GNU is a recursive acronym (an acronym that refers to itself) that stands for GNU's Not Unix. The GNU project is the domain of the Free Software Foundation, which coordinates a huge volunteer effort to create a complete Unix-like operating system whose software gives its users the following four freedoms:

  • Freedom to run the program, for any reason

  • Freedom to study the internal workings of the program, and to alter them to suit you

  • Freedom to redistribute copies so that you can help other people

  • Freedom to make the program better, and make your changes available to the public, so that the others benefit from your efforts

There are hundreds of GNU software packages. There is even a GNU-Darwin project: the GNU-Darwin Distribution (http://gnu-darwin.sourceforge.net).

There's more information on the GNU project and the Free Software Foundation at GNU's Not Unix! (www.gnu.org).


tar

In many cases you will want to compress an entire directory (folder). Unlike the StuffIt program, gzip does not compress directories. To deal with this, you use another command-line program, called tar, which creates a single file (a tar file) from a directory and all its contents. tar was originally used only for making backups to tape systems (hence its name, for tape archive), but is now used far more frequently to combine entire folders into a single tar file prior to compression.

The version of tar included with Mac OS X can simultaneously combine a directory full of files into one file and compress the result with gzip.

To archive a directory using tar:

  • tar -cvzf newfile.tar.gz directoryname

    The tar command needs the -c option to tell it to create a new archive.

    The v option tells tar to "be verbose" and show you the names of all the files it is processing. You can leave it out if you like.

    The z option tells tar to compress the result using gzip. Leave it out if you don't want compressed output.

    The f option tells tar that you are specifying a filename for the new archive. Without the f option, tar assumes that you are trying to write to an attached tape drive. The f option must be the last one in the group of options, immediately before the name of the new archive. The names of the files or directories to archive come after the archive name.

    Figure 4.3 shows a listing of the contents of the code directory and then output from tar when you create a tar file with

    tar -cvzf code.tar.gz code

    Figure 4.3. Listing the contents of a directory and then archiving it with tar.
     localhost:~/Desktop vanilla$ ls -l code
     total 16
     -rw-r--r--  1 matisse      staff      156 Nov 4 09:57 Changes
     -rw-r--r--  1 matisse      staff      1444 Nov 4 09:57 README
     drwxr-xr-x  4 matisse      staff      92 Apr 16 12:44 src
     [localhost:~/Desktop] vanilla% tar -cvzf code.tar.gz code
     code
     code/Changes
     code/README
     code/src
     code/src/syntax.c
     code/src/syntax.h
     localhost:~/Desktop vanilla$ ls -l code.tar.gz
     -rw-r--r--  1 matisse      staff      3203 Apr 16 12:56 code.tar.gz
     localhost:~/Desktop vanilla$

    tar does not replace the original directory.

    Notice that the output from tar in Figure 4.3 lists all the files and subdirectories that are included in the archive (a result of the -v for "verbose" option).
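To experiment with these options without touching real files, the steps can be repeated in a scratch directory. This is a sketch only: the directory layout loosely mirrors the code directory in the figure, with invented file contents. It also uses the -t option, which is not covered in the text above and which lists the contents of an archive without extracting anything.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small directory tree to archive (invented contents).
mkdir -p code/src
printf 'sample readme\n' > code/README
printf 'int x;\n' > code/src/syntax.c

# c = create, z = compress with gzip, f = write to the named file.
# v is left out here, so tar prints nothing while it works.
tar -czf code.tar.gz code

# t = list the archive's contents without extracting them.
tar -tzf code.tar.gz

# The original directory is still there alongside the new archive.
ls
```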

Tips

  • tar has a plethora of options. See man tar for the complete list.

  • If you want to use the tar command in a pipeline, use the special filename - (a hyphen). This is especially useful if you are working on a Unix system where the version of tar cannot compress its output. On those systems you might use

    tar -cvf - code | gzip > code.tar.gz
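The pipeline form can be checked in both directions. In this sketch the special filename - stands for standard output when creating an archive and for standard input when reading one; the directory name and file contents are invented for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small directory to archive (invented contents).
mkdir code
printf 'sample readme\n' > code/README

# tar writes the archive to standard output (-), gzip compresses it,
# and the shell redirects the compressed stream into code.tar.gz.
tar -cf - code | gzip > code.tar.gz

# Reverse direction: gunzip uncompresses to the pipe, and tar reads
# the archive from standard input (-) and lists its contents.
gunzip -c code.tar.gz | tar -tf -
```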


Of course, once you have a tar file, you want to be able to reverse the process and turn it back into a directory (this is called untarring or unpacking ).

To unpack a tar archive:

  • tar -xvzf code.tar.gz

    This produces output as shown in Figure 4.4. The -x option tells tar that you want to extract files from a tar file. The z option tells tar that the archive was compressed with gzip. The v and f options have the same meanings as before: v means "be verbose," and f means "extract from a file, not a tape drive."

    Figure 4.4. Extracting the contents of a compressed tar file. If the directory "code" already exists, the extracted files will be placed inside it, possibly overwriting existing files.
     localhost:~/Desktop vanilla$ tar -xvzf code.tar.gz
     code
     code/Changes
     code/README
     code/src
     code/src/syntax.c
     code/src/syntax.h
     localhost:~/Desktop vanilla$

    Omit the z option if the archive is not compressed.

    Unlike gunzip, tar does not replace the archive when you extract files from it.

    If there is already a directory called "code" in the directory where you do this, then the extracted files will be placed inside it, possibly overwriting existing files.
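One way to avoid overwriting an existing directory is to unpack somewhere else. The sketch below uses the -C option, which is not covered in the text above and which tells tar to change to the named directory before extracting. The directory name unpacked and the file contents are invented for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create a directory and archive it (invented contents).
mkdir code
printf 'original readme\n' > code/README
tar -czf code.tar.gz code

# Unpack into a separate, empty directory so that the existing
# code directory is not touched.
mkdir unpacked
tar -xzf code.tar.gz -C unpacked

# The extracted copy is in unpacked/code, and the archive is still present.
ls unpacked/code
ls
```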



Unix for Mac OS X 10.4 Tiger: Visual QuickPro Guide (2nd Edition)
ISBN: 0321246683
Year: 2004
Pages: 161
Authors: Matisse Enzer
