37 Working with Compressed Files



Throughout the years of Unix development, few programs have been reconsidered and redeveloped more times than compress. On most Linux systems there are three significantly different compression programs available: compress, gzip, and bzip2. Each uses a different suffix (.Z, .gz, and .bz2, respectively), and the degree of compression can vary among the three programs, depending on the layout of data within a file.

Regardless of the level of compression, and regardless of which compression programs are installed, working with compressed files on many Unix systems requires uncompressing them by hand, accomplishing the desired tasks, and recompressing them when finished. A perfect job for a shell script!

The Code

    #!/bin/sh

    # zcat, zmore, and zgrep - This script should be either symbolically
    #   linked or hard linked to all three names - it allows users to work with
    #   compressed files transparently.

    Z="compress" ; unZ="uncompress" ; Zlist=""
    gz="gzip"    ; ungz="gunzip"    ; gzlist=""
    bz="bzip2"   ; unbz="bunzip2"   ; bzlist=""

    # First step is to try and isolate the filenames in the command line.
    # We'll do this lazily by stepping through each argument, testing to
    # see if it's a filename or not. If it is, and it has a compression
    # suffix, we'll uncompress the file, rewrite the filename, and proceed.
    # When done, we'll recompress everything that was uncompressed.

    for arg
    do
      if [ -f "$arg" ] ; then
        case "$arg" in
           *.Z) $unZ "$arg"
                arg="$(echo "$arg" | sed 's/\.Z$//')"
                Zlist="$Zlist \"$arg\""
                ;;

          *.gz) $ungz "$arg"
                arg="$(echo "$arg" | sed 's/\.gz$//')"
                gzlist="$gzlist \"$arg\""
                ;;

         *.bz2) $unbz "$arg"
                arg="$(echo "$arg" | sed 's/\.bz2$//')"
                bzlist="$bzlist \"$arg\""
                ;;
        esac
      fi
      # Each argument is stored wrapped in quotes so eval can regroup it later.
      newargs="${newargs:-""} \"$arg\""
    done

    # Pick the underlying command based on the name the script was invoked as.
    case $0 in
       *zcat* ) eval cat  $newargs ;;
      *zmore* ) eval more $newargs ;;
      *zgrep* ) eval grep $newargs ;;
            * ) echo "$0: unknown base name. Can't proceed." >&2; exit 1
    esac

    # now recompress everything

    if [ ! -z "$Zlist" ] ; then
      eval $Z $Zlist
    fi
    if [ ! -z "$gzlist" ] ; then
      eval $gz $gzlist
    fi
    if [ ! -z "$bzlist" ] ; then
      eval $bz $bzlist
    fi

    # and done

    exit 0

How It Works

For any given suffix, three steps are necessary: uncompress the file, rewrite the filename without the suffix, and add it to the list of files to recompress at the end of the script. By keeping three separate lists, one for each compression program, this script also lets you easily grep across files compressed using multiple compression utilities.
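The filename rewrite in the second step can also be done without sed. The following is only a sketch of that one step, using POSIX parameter expansion in place of the script's echo-and-sed pipeline; strip_suffix is a hypothetical helper name, not something the original script defines.

```shell
#!/bin/sh
# strip_suffix: hypothetical helper, not part of the original script.
# Prints its argument with a known compression suffix removed.
strip_suffix()
{
  case "$1" in
    *.Z)   printf '%s\n' "${1%.Z}"   ;;
    *.gz)  printf '%s\n' "${1%.gz}"  ;;
    *.bz2) printf '%s\n' "${1%.bz2}" ;;
    *)     printf '%s\n' "$1"        ;;   # no known suffix: leave unchanged
  esac
}

strip_suffix "ragged.txt.Z"
strip_suffix "my notes.gz"
strip_suffix "archive.tar.bz2"
strip_suffix "plain.txt"
```

Because the argument stays quoted throughout, a name such as "my notes.gz" keeps its space intact.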

The most important trick is the use of the eval directive when recompressing the files. This is necessary to ensure that filenames with spaces are treated properly. When the Zlist, gzlist, and bzlist variables are built up, each filename is surrounded by quotes, so a typical value might be "sample.c" "test.pl" "penny.jar". Because the quotes are part of the value itself, invoking a command like cat $Zlist results in cat complaining that the file "sample.c" (quotes included) wasn't found. To force the shell to act as if the command were typed at a command line (where the quotes are stripped once they have been used for argument parsing), eval is used, and all works as desired.
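The effect of eval on a quoted list is easy to see in isolation. This small demonstration builds a list the same way the script does and counts how many arguments a command receives with and without eval; count_args and the sample filenames are made up for illustration.

```shell
#!/bin/sh
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

# Build the list the way the script does: each name wrapped in literal quotes.
list=""
for name in "sample.c" "my notes.txt" ; do
  list="$list \"$name\""
done

# Without eval, the embedded quotes are ordinary characters and the value
# is split on whitespace only, so "my notes.txt" becomes two words.
plain=$(count_args $list)

# With eval, the shell reparses the line, so the quotes group words again.
evaled=$(eval count_args $list)

echo "without eval: $plain arguments"
echo "with eval:    $evaled arguments"
```

One caution: eval reparses everything in the list, so a filename containing characters such as $, backquotes, or double quotes would be interpreted by the shell rather than passed through literally.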

Running the Script

To work properly, this script should have three names. How do you do that in Unix? Simple: links. You can use either symbolic links, which are special files that store the names of link destinations, or hard links, which are actually assigned the same inode as the linked file. I prefer symbolic links. These can easily be created (here the script is already called zcat):

    $ ln -s zcat zmore
    $ ln -s zcat zgrep

Once that's done, you have three new commands that have a shared code base, and each accepts a list of files to process as needed, uncompressing and then recompressing them when done.
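The way one file can act as three commands comes down to inspecting the name it was invoked as. The sketch below isolates that dispatch step; pick_tool is a hypothetical function, and unlike the script's *zcat* style patterns on $0 it matches the exact base name, which is an assumption of this example rather than the book's approach.

```shell
#!/bin/sh
# pick_tool: hypothetical helper mirroring the script's case statement.
# Given an invocation name (what $0 would hold), print the command to run.
pick_tool()
{
  case "$(basename "$1")" in
    zcat)  echo cat  ;;
    zmore) echo more ;;
    zgrep) echo grep ;;
    *)     echo "$1: unknown base name. Can't proceed." >&2 ; return 1 ;;
  esac
}

pick_tool /usr/local/bin/zgrep
pick_tool ./zmore
```

Matching on the base name avoids a false match when a directory in the path happens to contain one of the three names.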

The Results

The standard compress utility quickly shrinks down ragged.txt and gives it a .Z suffix:

    $ compress ragged.txt

With ragged.txt in its compressed state, we can view the file with zcat :

    $ zcat ragged.txt.Z
    So she sat on, with closed eyes, and half believed herself in Wonderland, though she knew she had but to open them again, and all would change to dull reality--the grass would be only rustling in the wind, and the pool rippling to the waving of the reeds--the rattling teacups would change to tinkling sheep-bells, and the Queen's shrill cries to the voice of the shepherd boy--and the sneeze of the baby, the shriek of the Gryphon, and all the other queer noises, would change (she knew) to the confused clamour of the busy farm-yard--while the lowing of the cattle in the distance would take the place of the Mock Turtle's heavy sobs.

And then search for "teacup" again:

    $ zgrep teacup ragged.txt.Z
    rattling teacups would change to tinkling sheep-bells, and the

All the while, the file starts and ends in its original compressed state:

    $ ls -l ragged.txt*
    -rw-r--r--  1 taylor  staff  443 Jul  7 16:07 ragged.txt.Z

Hacking the Script

Probably the biggest weakness of this script is that if it is canceled in midstream, the file isn't guaranteed to recompress. This can be fixed with a smart use of the trap capability and a recompress function that does error checking. That would be a nice addition.
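One possible shape for that addition is sketched here. It is an illustration under stated assumptions, not the book's code: the recompress function, the signal list, and the scratch-file demonstration are all invented for this example, and only gzip is handled to keep it short.

```shell
#!/bin/sh
# Sketch only: recompress() and the trap setup are illustrative assumptions.
gz="gzip" ; ungz="gunzip" ; gzlist=""

# Recompress every file recorded in gzlist, reporting any failure.
recompress()
{
  if [ -n "$gzlist" ] ; then
    if eval $gz $gzlist ; then
      gzlist=""          # nothing left to do if we are called again
    else
      echo "recompress: could not recompress:$gzlist" >&2
      return 1
    fi
  fi
}

# Run recompress whenever the script exits, and turn common signals into
# a normal exit so the EXIT trap fires for them too.
trap recompress EXIT
trap 'exit 1' HUP INT TERM

# Demonstration on a scratch file (the path is arbitrary).
demo="${TMPDIR:-/tmp}/zdemo.$$.txt"
echo "rattling teacups" > "$demo"
$gz "$demo"                        # creates $demo.gz
$ungz "$demo.gz"                   # restores $demo
gzlist="$gzlist \"$demo\""         # remember to recompress it

grep teacups "$demo"               # work on the uncompressed file
recompress                         # explicit call; the trap covers early exits

if [ -f "$demo.gz" ] && [ ! -f "$demo" ] ; then
  state="recompressed"
else
  state="left uncompressed"
fi
echo "$state"
rm -f "$demo" "$demo.gz"           # clean up the scratch file
```

Clearing gzlist after a successful run means the EXIT trap does no harm when the script finishes normally after already calling recompress itself.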




Wicked Cool Shell Scripts. 101 Scripts for Linux, Mac OS X, and Unix Systems
ISBN: 1593270127
Year: 2004
Pages: 150
Authors: Dave Taylor
