14.1 Creating Effective Shell Scripts


In this section, we'll consider several different routine system administration tasks as examples of creating and using administrative shell scripts. The discussions are meant to consider not only these tasks in themselves but also the process of writing scripts. Most of the shell script examples use the Bourne shell, but you can use any shell you choose; it's merely a Unix prejudice that "real shell programmers use the Bourne/Korn/zsh shell," however prevalent that attitude/article of faith may be.[1]

[1] Once upon a time, the C shell had bugs that made writing administrative C shell scripts somewhat dicey. Although the versions of the C shell in current operating systems have fixed these bugs, the attitude that the C shell is unreliable persists. In addition, the C shell is considered poorly designed by many scripting gurus.

14.1.1 Password File Security

We discussed the various security issues surrounding the password file in Section 7.8 and Section 6.1. The various commands used to check it and its contents could be combined easily in a shell script. Here is one version (named ckpwd):

#!/bin/sh
# ckpwd - check password file (run as root)
#
# requires a saved password file to compare against:
#     /usr/local/admin/old/opg
#
umask 077
PATH="/bin:/usr/bin"; export PATH
cd /usr/local/admin/old                 # stored passwd file location

echo ">>> Password file check for `date`"; echo ""

echo "*** Accounts without passwords:"
grep '^[^:]*::' /etc/passwd
if [ $? -eq 1 ]                         # grep found no matches
then
    echo "None found."
fi
echo ""

# Look for extra system accounts
echo "*** Non-root UID=0 or GID=0 accounts:"
grep ':00*:' /etc/passwd | \
awk -F: 'BEGIN       {n=0}
         $1!="root"  {print $0 ; n=1}
         END         {if (n==0) print "None found."}'
echo ""

sort </etc/passwd >tmp1
sort <opg >tmp2                         # opg is the previously saved copy
echo "*** Accounts added:"
comm -23 tmp[1-2]                       # lines only in /etc/passwd
echo ""
echo "*** Accounts deleted:"
comm -13 tmp[1-2]                       # lines only in ./opg
echo ""
rm -f tmp[1-2]

echo "*** Password file protection:"
echo "-rw-r--r--  1 root     wheel     >>> correct values"
ls -l /etc/passwd

echo ""; echo ">>> End of report."; echo ""

The script surrounds each checking operation with echo and other commands designed to make the output more readable so that it can be scanned quickly for problems. For example, the grep command that looks for non-root UID 0 accounts is preceded by an echo command that outputs a descriptive header. Similarly, the grep command's output is piped to an awk command that removes the root entry from its output and displays the remaining accounts or the string "None found" if no other UID or GID 0 accounts are present.

Instead of using diff to compare the current password file with the saved version, the script uses comm twice, to present the added and deleted lines separately (entries that have changed appear in both lists). The script ends with a simple ls command; the administrator must manually compare its output to the string displayed by the preceding echo command. However, this comparison also could be automated by piping ls's output to awk and explicitly comparing the relevant fields to their correct values. (I'll leave the implementation of the latter as an exercise for the reader.)
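A minimal sketch of that automated comparison might look like the following; the helper name check_prot and the expected mode and owner are assumptions to adjust for your site:

```shell
# check_prot - hypothetical sketch: compare the mode and owner fields
# of ls -l output against expected values instead of eyeballing them.
check_prot() {
    # $1 = expected mode, $2 = expected owner, $3 = file to check
    ls -l "$3" | awk -v mode="$1" -v owner="$2" '
        substr($1, 1, 10) != mode || $3 != owner {
            print "*** WARNING: check protection on " $NF; exit }
        { print "Protection OK." }'
}

check_prot -rw-r--r-- root /etc/passwd
```

The substr() comparison ignores the ACL/security-context marker that some ls versions append to the mode string.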

Here is some sample output from ckpwd:

>>> Password file check for Fri Jun 14 15:48:26 EDT 2002

*** Accounts without passwords:
None found.

*** Non-root UID=0 or GID=0 accounts:
badboy:lso9/.7sJUhhs:000:203:Bad Boy:/home/bb:/bin/csh

*** Accounts added:
chavez:9Sl.sd/i7snso:190:20:Rachel Chavez:/home/chavez:/bin/csh
wang:l9jsTHn7Hg./a:308:302:Rick Wang:/home/wang:/bin/sh

*** Accounts deleted:
chavez:Al9ddmL.3qX9o:190:20:Rachel Chavez:/home/chavez:/bin/csh

*** Password file protection:
-rw-r--r--  1 root     system     >>> correct values
-rw-r--r--  1 root     system     1847 Jun 11 22:38 /etc/passwd

>>> End of report.

If you don't like all the bells and whistles, the script needn't be this fancy. For example, its two sort, two comm, and five other commands in the section comparing the current and saved password files could easily be replaced by the diff command we looked at in Section 7.8 (and possibly one echo command to print a header). In the extreme case, the entire script could consist of just the four commands we looked at previously:

#!/bin/sh
# minimalist version of ckpwd
/usr/bin/grep '^[^:]*::' /etc/passwd
/usr/bin/grep ':00*:' /etc/passwd
/usr/bin/diff /etc/passwd /usr/local/admin/old/opg
/usr/bin/ls -l /etc/passwd

How much complexity to include depends on your own taste and free time; keep in mind that more complex scripts usually take longer to debug.

Whatever approach you take, ckpwd needs to be run regularly to be effective (probably by cron).
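For example, a crontab entry like the following would run ckpwd every night at 2:05 A.M., with cron mailing any output to the crontab's owner (the installation path is an assumption for illustration):

```
# Hypothetical crontab entry: nightly password file check.
5 2 * * * /usr/local/admin/bin/ckpwd
```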

14.1.2 Monitoring Disk Usage

It seems that no matter how much disk storage a system has, the users' needs (or wants) will eventually exceed it. As we discuss in Section 15.6, keeping an eye on disk space is a very important part of system management, and this monitoring task is well suited to automation via shell scripts.

The script we'll consider in this section ckdsk is designed to compare current disk use with what it was yesterday and to save today's data for comparison tomorrow. We'll build the script up gradually, starting with this simple version:

#!/bin/sh
# ckdsk - compares current and saved disk usage
# saved data is created with du_init script
#
PATH="/bin:/usr/bin"; export PATH
cd /usr/local/admin/ckdsk

if [ ! -s du.sav ] ; then
    echo "ckdsk: Can't find old data file du.sav."
    echo "       Recreate it with du_init and try again."
    exit 1
fi

du -k /iago/home/harvey > du.log
cat du.log | xargs -n2 ../bin/cmp_size 40 100 du.sav
mv -f du.log du.sav

After making sure yesterday's data is available, this script checks the disk usage under the directory /iago/home/harvey using du, saving the output to the file du.log. Each line of du.log is fed by xargs to another script, cmp_size[2], which does the actual comparison, passing it the arguments 40, 100, and du.sav, as well as the line from the du command. Thus, the first invocation of cmp_size would look something like this:

[2] On some systems, cmp_size could be a function defined in ckdsk; on others, however, xargs won't accept a function as the command to run.

cmp_size 40 100 du.sav 876 /iago/home/harvey/bin
                       (output from du begins with argument 4)

ckdsk ends by replacing the old data file with the saved output from today's du command, in preparation for being run again tomorrow.
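To see how xargs -n2 splits up the du output, here is a small demonstration with echo standing in for cmp_size (the sizes shown are made up):

```shell
# Demonstrate xargs -n2: each pair of fields from du-style output
# becomes arguments 4 and 5 of a separate cmp_size invocation.
# echo stands in for cmp_size here.
printf '876 /iago/home/harvey/bin\n54 /iago/home/harvey/src\n' |
    xargs -n2 echo cmp_size 40 100 du.sav
# prints:
#   cmp_size 40 100 du.sav 876 /iago/home/harvey/bin
#   cmp_size 40 100 du.sav 54 /iago/home/harvey/src
```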

This simple version of the ckdsk script is not very general because it works only on a single directory. After looking at cmp_size in detail, we'll consider ways of expanding ckdsk's usefulness. Here is cmp_size:

#!/bin/sh
# cmp_size - compare old and new directory size
#   $1 (limit)=min. size for new dirs to be included in report
#   $2 (dlimit)=min. size change for old dirs to be included
#   $3 (sfile)=pathname for file with yesterday's data
#   $4 (csize)=current directory size
#   $5 (file)=pathname of directory
#   osize=previous size (extracted from sfile)
#   diff=size difference between yesterday & today
#
PATH="/bin:/usr/bin"; export PATH
if [ $# -lt 5 ] ; then
    echo "Usage: cmp_size newlim oldlim data_file size dir"
    exit 1
fi

# save initial parameters
limit=$1; dlimit=$2; sfile=$3; csize=$4; file=$5;

# get yesterday's data
osize=`grep "$file\$" $sfile | awk '{print \$1}'`
if [ -z "$osize" ] ; then               # it's a new directory
    if [ $csize -ge $limit ] ; then     # report if size >= limit
        echo "new\t$csize\t$file"
    fi
    exit 0
fi

# compute the size change from yesterday
if [ $osize -eq $csize ]
then
    exit 0
elif [ $osize -gt $csize ]
then
    diff=`expr $osize - $csize`
else
    diff=`expr $csize - $osize`
fi

# report the size change if large enough
if [ $diff -ge $dlimit ] ; then
    echo "$osize\t$csize\t$file"
fi

cmp_size first checks to see that it was passed the right number of arguments. Then it assigns its arguments to shell variables for readability. The first two parameters are cutoff values for new and existing directories, respectively. These parameters allow you to tell cmp_size how much of a change is too small to be interesting (because you don't necessarily care about minor disk usage changes). If the size of the directory specified as the script's fifth parameter has changed by an amount greater than the cutoff value, cmp_size prints the directory name and old and new sizes; otherwise, cmp_size returns silently.

cmp_size finds yesterday's size by grepping for the directory name in the data file specified as its third parameter (du.sav is what ckdsk passes it). If grep didn't find the directory in the data file, it's a new one, and cmp_size then compares its size to the new directory cutoff (passed in as its first argument), displaying its name and size if it is large enough.

If grep returns anything, cmp_size then computes the size change for the directory by subtracting the smaller of the old size (from the file and stored in the variable osize) and the current size (passed in as the fourth parameter and stored in csize) from the larger. cmp_size then compares the size difference to the old directory cutoff (passed in as its second argument), and displays the old and new sizes if it is large enough.

cmp_size reports on directories that either increased or decreased in size by the amount of the cutoff. If you are only interested in size increases, you could replace the if statement that computes the value of the diff variable with a much simpler one:

if [ $osize -ge $csize ]
then
    exit 0                     # only care if it's bigger
else
    diff=`expr $csize - $osize`
fi

Unlike the simple version of ckdsk, cmp_size is fairly general; it could also be used, for example, to process output from the quot command.

One way to make ckdsk more useful is to enable it to check more than one starting directory, with different cutoffs for each one. Here is a version that can do that:

#!/bin/sh
# ckdsk2 - multiple directories & per-directory cutoffs
#
PATH="/bin:/usr/bin"; export PATH

du_it ()
{
# $1 = cutoff in blocks for new directories
# $2 = cutoff as block change for old directories
# $3 = starting directory
# $4 = flags to du
abin="/usr/local/admin/bin"
du $4 $3 > du.tmp
cat du.tmp | xargs -n2 $abin/cmp_size $1 $2 du.sav
cat du.tmp >> du.log; rm du.tmp
}

umask 077
cd /usr/local/admin/ckdsk
rm -f du.log du.tmp >/dev/null 2>&1
if [ ! -s du.sav ] ; then
    echo "ckdsk: can't find old data file; run du_init."
    exit 1
fi

echo "Daily disk usage report for `date`"; echo ''
df
echo ''; echo "Old\tNew"
echo "Size\tSize\tDirectory Name"
echo "------------------------------------------------------"
du_it 40  100 /iago/home/harvey
du_it  1    1 /usr/lib
du_it  1 1000 /home/\* -s
echo "------------------------------------------------------"
echo ''
mv -f du.log du.sav
exit 0

This script uses a function named du_it to perform the du command and pass its output to cmp_size using xargs. The function takes four arguments: the cutoffs for old and new directories (for cmp_size), the starting directory for the du command, and any additional flags to pass to du (optional).

du_it saves du's output into a temporary file, du.tmp, which it appends to the file du.log afterwards; du.log thus accumulates the data from multiple directory checks and eventually becomes the new saved data file, replacing yesterday's version.

The script proper begins by removing any old temporary files from previous runs and making sure its data file (still hardwired as du.sav) is available. It then runs df and prints some header lines for the output from cmp_size. This version of the script then calls du_it three times:

du_it 40  100 /iago/home/harvey
du_it  1    1 /usr/lib
du_it  1 1000 /home/\* -s

It will run du and compare its output to the saved data for the directories /iago/home/harvey, /usr/lib, and all of the subdirectories of /home, passing the du command the -s option in the last case. In the third command, the wildcard is passed through to the actual du command line by quoting it to du_it. Different cutoffs are used for each call. When checking /usr/lib, this version asks to be told about any change in the size of any directory (size or size change greater than or equal to one). In contrast, when checking the users' home directories under /home, the report includes new directories of any size but only existing directories that changed size by at least 1000 blocks.

ckdsk ends by moving the accumulated output file, du.log, on to the saved data file, du.sav, saving the current data for future comparisons.

Here is some sample output from ckdsk:

Daily disk usage report for Tue Jun 11 09:52:46 EDT 2002

File system       Kbytes     used   avail  capacity  Mounted-on
/dev/dsk/c1d1s0    81952    68848   13104     84%    /
/dev/dsk/c1d1s2   373568   354632   18936     94%    /home
/dev/dsk/c1d2s8   667883   438943  228940     66%    /genome

Old     New
Size    Size    Directory Name
------------------------------------------------------
348     48      /iago/home/harvey/g02
new     52      /iago/home/harvey/test
2000    1012    /iago/home/harvey
new     912     /usr/lib/acct/bio
355     356     /usr/lib/spell
34823   32797   /home/chavez
9834    3214    /home/ng
new     300     /home/park
------------------------------------------------------

The echo commands set off the output from cmp_size and make it easy to scan.

This version of ckdsk requires new du_it commands to be added by hand. The script could be refined further by allowing this information to be external as well, replacing the explicit du_it commands with a loop over the directories and parameters listed in a data file:

cat du.dirs |
while read dir old new opts; do
   # default old and new cutoffs to 1
   if [ "$old" = "" ]; then old=1; fi
   if [ "$new" = "" ]; then new=1; fi
   if [ -n "$dir" ]; then                # ignore blank lines
      du_it $new $old $dir $opts
   fi
done

This version also assigns default values to the cutoff parameters if they are omitted from an entry in the data file.

Similarly, the script currently checks all users' home directories. If only some of them need to be checked, the final du_it command could be replaced by a loop like this one:

for user in chavez havel harvey ng smith tedesco ; do
   du_it 1 1000 /home/$user -s
done

Alternatively, the user list could be read in from an external configuration file. We'll look at obtaining data from files in an upcoming example.
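A sketch of the configuration-file variant might look like this; the function name du_users, the file name du.users, and the cutoff values are all assumptions for illustration:

```shell
# du_users - hypothetical sketch: call du_it for each username listed
# in a configuration file, one name per line. The file name and the
# cutoff values are assumptions.
du_users() {
    # $1 = file holding the list of usernames
    while read user; do
        [ -n "$user" ] || continue       # ignore blank lines
        du_it 1 1000 /home/$user -s
    done < "$1"
}
```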

The cron facility is also the most sensible way to run ckdsk.

14.1.3 Root Filesystem Backups and System Snapshots

Backing up the root filesystem is a task for which the benefits don't always seem worth the trouble. Still, re-creating all of the changed system configuration files is also very time-consuming, and can be very frustrating when you don't immediately recall which files you changed.

An alternative to backing up the entire root filesystem and other separate system filesystems like /usr and /var is to write a script that copies only the few files that have actually changed since the operating system was installed to a user filesystem. The changed files can then be backed up as part of the regular system backup schedule without any further effort on your part. Creating such a script is also a good way to become thoroughly acquainted with all the configuration files on the system. When selecting files to copy, include anything you might ever conceivably change, and err on the side of too many rather than too few files.

Here is a C shell script that performs such a copy:

#!/bin/csh
# bkup_sys - backup changed files from system partitions
unset path; setenv PATH "/bin:/usr/bin"
umask 077

if ("$1" != "") then
   set SAVE_DIR="$1"
else
   set SAVE_DIR="/save/`hostname`/sys_save"
endif

set dir_list=`cat /etc/bkup_dirs`

foreach dir ($dir_list)
   echo "Working on $dir ..."
   if (! -d $SAVE_DIR/$dir) mkdir -p $SAVE_DIR/$dir
   set files=`file $dir/{,.[a-zA-Z]}* | \
             egrep 'text|data' | awk -F: '{print $1}'`
   if ("$files" != "") cp -p $files $SAVE_DIR/$dir
end

echo "Backing up individual files ..."
foreach file (`cat /usr/local/admin/sysback/bkup_files`)
   if ("$file:h" == "$file:t") continue     # not a full pathname
   if ("$file:t" == "") continue            # no filename present
   if (! -d $SAVE_DIR/$file:h) mkdir -p $SAVE_DIR/$file:h
   cp -p $file $SAVE_DIR/$file:h
end
echo "All done."

This script performs the backup in two parts. First, it copies all text and binary data files from a list of directories to a designated directory; file types are identified by the file command, and the grep command selects ones likely to be configuration files (some extra files will get copied, but this is better than missing something). The default destination location is named for the current host and has a form like /save/hamlet/sys_save; this location can be overridden by including an alternate location on the bkup_sys command line. The directory list comes from the file /etc/bkup_dirs, which would contain entries like /, /etc, /etc/defaults, /etc/mail, /var/cron, and so on.

The final section of the script copies the files listed in /usr/local/admin/sysback/bkup_files, which holds the names of individual files that need to be saved (residing in directories from which you don't want to save every text and data file). It uses the C shell :h and :t modifiers, which extract the head (directory portion) and tail (filename and extension), respectively, from the filename in the specified variable. The first two lines in this section make sure that the entry looks reasonable before the copy command is attempted.

In both cases, files are stored in the same relative location under the destination directory as they are in the real filesystem (this makes them easy to restore to their proper locations). Subdirectories are created as necessary under the destination directory. The script uses cp -p to copy the files, which reproduces file ownership, protections, and access and modification times.

Copying files in this way is a protection against serious damage to a system filesystem (as well as against accidentally deleting or otherwise losing one of them). However, in order to completely restore the system, in the worst case, you'll need to reproduce the structure as well as the contents of damaged filesystems. To do the latter, you will need to know what the original configuration was. You can write a script to document how a system is set up.

Here is an example from a FreeBSD system:

#!/bin/csh
# doc_sys - document system configuration--FreeBSD version
unset path; setenv PATH "/sbin:/usr/sbin:/bin:/usr/bin"

if ("$1" != "") then
    set outfile="$1"                                  # alternate output file
else
    set outfile="`hostname`_system.doc"
endif

echo "System Layout Documentation for `hostname`" > $outfile
date >> $outfile
echo "" >> $outfile

echo ">>>Physical Disks" >> $outfile
egrep "ata[0-9]+-" /var/run/dmesg.boot >> $outfile    # Assumes IDE disks.
echo "" >> $outfile

echo ">>>Paging Space Data" >> $outfile
pstat -s >> $outfile
echo "" >> $outfile

echo ">>>Links in /" >> $outfile
file /{,.[a-zA-Z]}* | grep link >> $outfile
echo "" >> $outfile

echo ">>>System Parameter Settings" >> $outfile
sysctl -a >> $outfile

The purpose of this script is to capture information that you would not otherwise have (or have easy access to). Thus, commands such as df, which give information easily obtained from configuration files, are not included (although they could be in your version if you would find such data helpful). You may want to consider periodically printing out the results from such a script for every system you administer and placing the resulting pages into a notebook.

As this script illustrates, the commands you need to include tend to be very operating-system-specific. Here is a version for an AIX system (the common sections have been replaced with comments):

#!/bin/csh
# doc_sys - document system configuration--AIX version
unset path; setenv PATH "/usr/sbin:/bin:/usr/bin"

# set output file and write header line

echo ">>>Physical Disks" >> $outfile
lspv >> $outfile
echo "" >> $outfile

echo ">>>Paging Space Data" >> $outfile
lsps -a >> $outfile
echo "" >> $outfile

echo ">>>Volume Group Info" >> $outfile
# loop over volume groups
foreach vg (`lsvg`)
    lsvg $vg >> $outfile
    echo "===Component logical volumes:" >> $outfile
    lsvg -l $vg | grep -v ":" >> $outfile
    echo "" >> $outfile
end
echo "" >> $outfile

echo ">>>Logical Volume Details" >> $outfile
# loop over volume groups and then over the component LVs
foreach vg (`lsvg`)
    foreach lv (`lsvg -l $vg | egrep -v ":|NAME" | awk '{print $1}'`)
        lslv $lv >> $outfile
        echo "===Physical Drive Placement" >> $outfile
        lslv -l $lv >> $outfile
        echo "" >> $outfile
    end
end
echo "" >> $outfile

echo ">>>Defined File Systems" >> $outfile
lsfs >> $outfile
echo "" >> $outfile

# links in / listed here

echo ">>>System Parameter Settings" >> $outfile
lsattr -E -H -l sys0 >> $outfile
lslicense >> $outfile                  # number of licensed users

This version of the script also provides information about the volume group and logical volume layout on the system.

Table 14-1 lists commands that will provide similar information for the Unix versions we are considering:

Table 14-1. System information commands

Version   Disk data                          Swap space data     System parameters
AIX       lspv                               lsps -a             lsattr -E -H -l sys0
FreeBSD   grep pattern /var/run/dmesg.boot   pstat -s            sysctl -a
HP-UX     ioscan -f -n -C disk               swapinfo -t -a -m   /usr/lbin/sysadm/system_prep -s system
Linux     fdisk -l                           cat /proc/swaps     cat /proc/sys/kernel/* (see script below)
Solaris   getdev                             swap -l             cat /etc/system
Tru64     dsfmgr -s                          swapon -s           sysconfig (see script below)

See Section 10.3 for the Logical Volume Manager commands for the various systems.

Sometimes more than just a simple command is needed to complete one of these tasks. For example, the following script displays all the system parameters under Tru64:

#!/bin/csh
foreach s ( `/sbin/sysconfig -m | /usr/bin/awk -F: '{print $1}'` )
   /sbin/sysconfig -q $s
   echo "--------------------------------------"
end
exit 0

Similarly, the following script records the current Linux system parameters.

#!/bin/csh
foreach f (`find /proc/sys/kernel -type f`)
  echo "$f":
  cat $f
  echo ""
end
exit 0

14.1.4 A Few More Tricks

The following script illustrates a couple of other useful tricks when writing shell scripts. It polls various sites with which the local system communicates to exchange mail and runs a few times a day via the cron facility:

#!/bin/sh
# mail.hourly
PATH="/usr/bin:/bin"
cd /usr/local/admin/mail

for sys in `cat ./mail_list`; do
   if [ ! -f /etc/.no_$sys ]; then
      echo polling $sys
      # ... commands to exchange mail with $sys ...
      touch last_$sys
   else
      echo skipping $sys
   fi
done
exit 0

This script loops over the list of hosts in the file mail_list in the current directory. Let's consider how it works when the current host is lucia. The if statement determines whether the file /etc/.no_lucia exists. If it does, the host lucia is not polled. Using a file in this way is a simple mechanism for creating script features that can be turned on or off without having to change the script itself, the way it is called from another script, any crontab entries using it, and so on. When I don't want lucia to be polled (usually because its owner has turned it off during an out-of-town trip, and I hate seeing dozens of failure messages piling up), I simply run the command touch /etc/.no_lucia. Deleting the same file reinstates polling on a regular basis.
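The flag-file test is easy to capture in a small helper. In this hypothetical sketch, the directory holding the flag files is a parameter so the idea can be tried outside /etc; the function name poll_ok is an assumption:

```shell
# poll_ok - hypothetical helper for the flag-file on/off technique:
# report that a host should be polled only when its "off switch"
# file does not exist.
poll_ok() {
    # $1 = host name, $2 = directory holding the flag files
    if [ ! -f "$2/.no_$1" ]; then
        echo "polling $1"
    else
        echo "skipping $1"
    fi
}
```

Running touch on the flag file flips the behavior; removing the file flips it back, with no change to the script itself.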

The second technique consists of using an empty file's modification time to store a date. In this script, the touch command inside the loop records when the most recent poll of system lucia took place. The date it occurred can be quickly determined by running:

$ ls -l /usr/local/admin/mail/last_lucia 

Such time-stamp files can be used in a variety of contexts:

Backups

If you create a time-stamp file at the beginning of a backup operation, you can use a -newer clause on a find command to find all files modified since then for a subsequent backup.

Testing

When you want to find out what files a particular program modifies, create a time-stamp file in /tmp, run the program, and then find files newer than the time-stamp file once the program finishes (this assumes you are on an otherwise idle system).

Files modified since an operating system installation or upgrade

Creating a time-stamp file at the end of an operating system installation or upgrade will enable you easily to determine which files you have modified with respect to the versions on the distribution media.
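All three uses rest on the same pattern, which can be sketched as follows (the file locations are arbitrary, and the second mktemp call merely stands in for files created by the backup, test run, or installation):

```shell
# Sketch of the time-stamp technique: create an empty stamp file,
# do some work, then use find -newer to list everything modified
# more recently than the stamp.
stamp=$(mktemp /tmp/stamp.XXXXXX)        # the time-stamp file
sleep 1                                  # ensure later mtimes differ
work=$(mktemp /tmp/work.XXXXXX)          # stands in for files the work creates
changed=$(find /tmp -maxdepth 1 -name "${work##*/}" -newer "$stamp")
echo "Modified since stamp: $changed"
rm -f "$stamp" "$work"
```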

14.1.5 Testing and Debugging Scripts

The following list describes strategies for testing and debugging scripts:

  • Build the script up gradually. Start by getting a simple version running without arguments and handling only the easiest case and then add the bells and whistles. We've seen this strategy in action several times in this chapter already.

  • Test and debug the logic independently of the functionality if possible. One way to do this is to place an "echo" in front of every substantive command in the script, as in this fragment:

    if [ some condition ]; then
        echo rm -rf /
    else
        echo cp /tmp/junk /unix
    fi

    This will allow you to see what the script does in various cases in a completely safe way. Similarly, you can replace entire functions with an echo command:

    go_on () {
        echo running function go_on
        return
    }

    In general, inserting an echo command is a good way to see where you are in a script, to track variable values, and so on. In some cases, a construct like the following will be helpful:

    echo "===${variable}==="

    This sort of technique is useful when you are having trouble with a variable that may contain internal white space.

  • Use the shell's -v option. This option displays each script line as it is read, which can sometimes indicate how the flow of a script is proceeding.

  • Perform testing and debugging on local copies of system files. The script will modify the copied files rather than the real ones. For example, if the script you are writing alters /etc/passwd, develop the script using a local copy of /etc/passwd rather than the real thing.

  • Use small cases for initial tests.

  • Operate on a single item at first, even if the script is designed to work on a large collection of items. Once that version is working, alter it to work for multiple items.

  • Don't forget to test boundary conditions. For example, if a script is designed to alter several user accounts, make sure it works for one user account, two user accounts, zero user accounts, and many, many user accounts.

  • Assume things will go wrong. In general, include as much error-checking code in the script as possible, making sure that the script does something reasonable when errors occur.

  • Write for the general case. Not only will this give you more powerful tools and meta-tools that you can use over and over, but it is also no harder than coming up with a solution for one specific problem. In fact, if you take a little time to step back from the specifics to consider the general task, it is often easier.
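The -v option mentioned above (and the related -x execution-trace option) is easy to try on a trivial throwaway script:

```shell
# Demonstrate sh -v and sh -x on a two-line throwaway script.
f=$(mktemp)
printf 'msg=hello\necho $msg\n' > "$f"
sh -v "$f"      # prints each line as it is read, then its output
sh -x "$f"      # prints each expanded command prefixed with "+"
rm -f "$f"
```

With -x, variable expansions are shown after substitution, which makes it the more useful of the two for tracking down quoting and variable problems.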



Essential System Administration, Third Edition
ISBN: 0596003439
Year: 2002