Project 83. Batch-Process Files

"How do I adapt my scripts to operate on multiple files?"

This project shows you how to write a script that parses a list of filenames and processes each file in the list. It also shows you how to develop wrapper scripts that feed each filename in a list, one at a time, to scripts that accept only single filenames. Projects 9 and 10 cover the basics of shell scripting.

Discover Loops and Wrappers

Suppose that we write a script called action that performs some specified action on a text file. The script takes two option flags: -v for verbose and -a for action followed by an action name. We are not concerned with what the script actually does; it's presented merely as a vehicle to illustrate how to write a script that processes a list of filenames passed on its command line.

We might call the script to process all the text files in the current directory by typing

$ action -v -a squeeze *.txt


We rely on the shell to expand *.txt into a list of all the .txt files in the current directory. To succeed, our action script must be written to accept any number of files and to act on each file in turn. The script must parse the options, save them, and then loop to process each file listed on the command line.

A second approach sees us writing a general-purpose wrapper script. The wrapper script accepts many filenames and calls a simpler action script a number of times, each time passing it the next filename from the list. Such a wrapper script can also be used on scripts and commands over which we have no control and that do not accept a list of files. Project 58 introduced this technique when it considered how we might batch-edit files. The solutions presented in that project are similar but take advantage of Bash functions. In this project, we write Bash shell scripts.

Process Multiple Files

Let's jump straight in with a sample script called action, which will take on the functionality described in the previous section.

$ cat action
#!/bin/bash

# This is our main function to process each file
Process () {
  echo "Processing $1, verbose: ${verbose:-n}, action: ${action:-none}"
}

# This while loop extracts and remembers each option setting
while getopts "va:" opt; do
  case $opt in
    v) verbose="y";;
    a) action=$OPTARG;;
    *) echo "Usage: ${0##*/} [-v] [-a action] filename..."; exit 1;;
  esac
done
shift $((OPTIND-1))

# This for loop processes each filename in turn
for filename in "$@"; do
  Process "$filename"
done
exit 0


Learn More

Project 52 explains Bash functions.


The script is written to demonstrate batch-processing techniques. The actual processing is performed in the function Process, appearing at the top of the script. In the example script, this function does nothing more than echo the name of the file it's supposed to process and its understanding of the options.

Although the script does not perform a real-world task, it serves as a template from which you can build your own scripts.
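
As one way to start building on the template, the sketch below swaps in a Process function that dispatches on the action name. The action names list and count are invented here purely for illustration; the text does not define what any action does, so treat this as an assumption rather than the script's intended behavior.

```shell
#!/bin/bash
# Hypothetical replacement for Process. The action names "list" and
# "count" are made up for this sketch and are not from the text.
Process () {
  local file=$1
  case ${action:-none} in
    list)  echo "$file" ;;
    count) echo "$file: $(wc -l < "$file" | tr -d ' ') lines" ;;
    none)  : ;;   # default action: do nothing
    *)     echo "Unknown action: $action" >&2; return 1 ;;
  esac
  # Report progress on standard error only when -v was given.
  if [ "${verbose:-n}" = "y" ]; then
    echo "Done with $file" >&2
  fi
  return 0
}
```

Because the function reads the same verbose and action variables that the option loop sets, it could replace the original Process without changes to the rest of the script.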

We assume that the script takes two optional parameters. The first is -v for verbose output; the second is -a for action, followed by an action type. The default values for these options are not verbose and an action of none.

Learn More

Project 76 covers parameter expansion. The function Process employs parameter expansion with default values when it echoes the options.
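
A minimal sketch of the default-value expansion used by Process follows. The helper name show_options is hypothetical; it simply echoes the two options the way Process does.

```shell
#!/bin/bash
# show_options is a hypothetical helper, not part of the action script.
# ${var:-default} expands to the default when var is unset or empty;
# it does not assign anything to var.
show_options () {
  echo "verbose: ${verbose:-n}, action: ${action:-none}"
}

unset verbose action
show_options        # prints: verbose: n, action: none

verbose="y"
action="list"
show_options        # prints: verbose: y, action: list
```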


Process the Options

A script of any significance will accept options, and it's not possible to process the list of filenames without knowing where the options end and the filenames start. To ensure that the example script is a useful template, we'll first show you how to process and save the list of options.

Tip

Consult the Bash man page or type

$ help getopts


to learn more about getopts.


The while loop processes the options. The code shown here may be used by any script that must parse a list of options. It takes advantage of the Bash built-in command getopts, which processes a script's positional parameters (the arguments passed on its command line), looking for options and their associated arguments. In our example, the string va: in

getopts "va:" opt


tells getopts that we allow the options -v and -a, but no others. The colon following a tells getopts to expect an argument to follow that option. getopts writes the next option it reads to the variable opt (or whatever variable is named in the command) and any associated argument to the variable OPTARG. We employ a case statement to process each option, setting the variables verbose and action as appropriate. getopts drives the while loop by returning TRUE when an option is found and FALSE when the list of options is exhausted. When the options are exhausted, we expect the list of filenames to follow. The shift statement immediately following the while loop discards the options we've just processed and shifts the remaining parameters down so that the first filename becomes positional parameter $1. getopts sets OPTIND to the index of the next parameter to be processed, so shifting by OPTIND-1 removes exactly the options and their arguments.
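
To watch OPTIND and shift at work, the following sketch runs the same option loop inside a function. The function name parse_demo is hypothetical; it declares OPTIND local and resets it to 1 so that the function can be called more than once in the same shell.

```shell
#!/bin/bash
# parse_demo is a hypothetical helper, not part of the action script.
# It parses its own arguments the same way the script parses "$@".
parse_demo () {
  local opt verbose="n" action="none"
  local OPTIND=1          # start scanning at the first argument
  while getopts "va:" opt; do
    case $opt in
      v) verbose="y" ;;
      a) action=$OPTARG ;;
      *) return 1 ;;
    esac
  done
  echo "OPTIND=$OPTIND verbose=$verbose action=$action"
  shift $((OPTIND-1))     # discard the options that were parsed
  echo "first file: ${1:-<none>}, files remaining: $#"
}

parse_demo -v -a list letter.txt notes.txt
```

Here the options occupy the first three parameters (-v, -a, and list), so getopts leaves OPTIND at 4 and the shift moves letter.txt into $1 with two filenames remaining.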

Process the Files

The for loop extracts each filename from the remaining positional parameters, expanding "$@" to be the list of quoted filenames.

Note that the for loop uses "$@", which expands to "$1" "$2" . . ., ensuring that our script is able to cope with filenames that include spaces. Had we used "$*", we'd have generated one long filename: "$1 $2 . . .".
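
The difference is easy to check by counting arguments. In this sketch, count_args is a hypothetical helper, and set -- stands in for a command line of two filenames, one of which contains a space.

```shell
#!/bin/bash
# count_args is a hypothetical helper that reports how many
# arguments it received.
count_args () {
  echo $#
}

# Simulate a command line of two filenames, one containing a space.
set -- "three one.txt" "notes.txt"

echo 'quoted $@:  ' "$(count_args "$@")"   # 2: one word per filename
echo 'quoted $*:  ' "$(count_args "$*")"   # 1: everything joined into one word
echo 'unquoted $@:' "$(count_args $@)"     # 3: the space splits a filename
```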

The variable filename is assigned the value of the next filename in the list each time around the loop. To process the file, we call the function Process. Remember, the point of this exercise is to write a script that processes a list of filenames; the actual processing performed on a file is incidental.

Here are some examples of what we might see when we run the script.

$ ./action -x
./action: illegal option -- x
Usage: action [-v] [-a action] filename...
$ ./action -v -a
./action: option requires an argument -- a
Usage: action [-v] [-a action] filename...
$ ./action -v -a list
$ ./action -v -a list *.txt
Processing letter.txt, verbose: y, action: list
Processing notes.txt, verbose: y, action: list
Processing three one.txt, verbose: y, action: list


Write a Wrapper Script

For scripts and commands that don't accept a list of filenames, and perhaps to avoid adding such functionality to your own scripts, write a wrapper script. The script, which we'll call each, accepts a wildcard pattern, such as *.txt, and a command to execute. It expands the wildcard into a list of filenames and applies the target command to each filename in turn. The target command, therefore, does not have to be written to process a list of filenames.

Here's our script, in which we assume that the first argument is a wildcard pattern; the remaining arguments form the command to execute and any options it requires.

$ cat each
#!/bin/bash
filetype=$1; shift
for file in $filetype; do
  $* "$file"
done


The first parameter (the wildcard pattern) is saved in the shell variable filetype for use later. The shift command discards the first parameter, shuffling the remainder down. The for loop processes each file in the expanded wildcard pattern held in filetype (the shell automatically expands the unquoted variable, just as it does a pattern typed on the command line) by setting the variable file to be the next filename in the list each time around the loop. The line that follows expands the remainder of the parameters into the target command and any arguments ($*), followed by the filename under consideration by the for loop ($file).

$* "$file"
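
Because $* is unquoted, any command argument that itself contains a space is split into separate words. The sketch below is one possible variant, written as a function with the hypothetical name each_quoted, that uses "$@" instead and also skips a pattern that matches nothing. It is an alternative to consider, not the script used in the rest of this project.

```shell
#!/bin/bash
# each_quoted is a hypothetical function form of the each script.
# Usage: each_quoted PATTERN COMMAND [ARGS...]
each_quoted () {
  local pattern=$1 file
  shift
  for file in $pattern; do      # left unquoted so the shell expands the wildcard
    # If nothing matched, the pattern is left as literal text; skip it.
    [ -e "$file" ] || continue
    "$@" "$file"                # each argument of the command stays one word
  done
}
```

For example, each_quoted "*.txt" printf '%s|%s\n' "my label" passes my label to printf as a single argument for every matching file.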


We'll try out our each script by using it with another script to rename all the text files in a directory, replacing their .txt extensions with .txt.bak. Recall that each simply feeds one file at a time to the target command (or script). The script that does the name-changing is named rename, and it contains just one command. It takes a filename as its only argument and changes the filename by tacking .bak onto the original filename.

mv "$1" "$1.bak"


To create a script that contains this command, simply echo it and redirect output to file rename (after making sure that no file of that name already exists in the working directory); then set execute permissions on the file.

$ echo 'mv "$1" "$1.bak"' > rename
$ chmod +x rename


Before we put each and rename to work, let's check the files in the current directory. Using wildcard pattern *.txt* with ls ensures that our list will include both normal text files (with extension .txt) and any that have been processed by rename (with extension .txt.bak).

$ ls *.txt*
letter.txt      notes.txt    three one.txt


Type the following to have each call and execute rename.

$ each "*.txt" ./rename


Note

You can't rename all .txt files to .bak by using a command such as

$ mv *.txt *.bak


because of the way the shell expands wildcard patterns on the command line.
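
To see what the shell would actually hand to mv, prefix the command with echo. This sketch assumes Bash's default globbing options and a directory that contains two .txt files and no .bak files; it works in a scratch directory created with mktemp so that no real files are touched.

```shell
#!/bin/bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch letter.txt notes.txt

# echo shows the command line that mv would receive.
expanded=$(echo mv *.txt *.bak)
echo "$expanded"    # prints: mv letter.txt notes.txt *.bak

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

The pattern *.txt becomes the list of existing files, while *.bak matches nothing and is passed through literally. mv therefore receives three arguments and treats the last one as a target directory, which does not exist.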


Now run ls again to check the results.

$ ls *.txt*
letter.txt.bak     notes.txt.bak      three one.txt.bak


For our next trick, we'll remove the extension we just added. This example pairs our each wrapper with a script called unrename, which uses "topping and tailing" of strings during parameter expansion, a technique discussed at length in Project 76. In short, the parameter expansion ${1%.*} expands $1 and removes the final dot, and everything that comes after it, from the filename.

$ echo 'mv "$1" "${1%.*}"' > unrename
$ chmod +x unrename
$ each "*.bak" ./unrename
$ ls *.txt*
letter.txt      notes.txt       three one.txt
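
The expansion can be tried on an ordinary variable without renaming anything. In this sketch the variable name and the .old extension are chosen only for illustration; the %% form, which removes the longest match rather than the shortest, is shown for contrast.

```shell
#!/bin/bash
name="three one.txt.bak"

echo "${name%.*}"        # three one.txt      (shortest match of .* removed)
echo "${name%%.*}"       # three one          (longest match of .* removed)
echo "${name%.*}.old"    # three one.txt.old  (strip last extension, append another)
```

A name that contains no dot is left unchanged by ${name%.*}.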


Finally, using each and a new script that applies techniques from rename and unrename, we'll change the extension of our .txt files to .bak.

$ echo 'mv "$1" "${1%.*}.bak"' > re-rename
$ chmod +x re-rename
$ each "*.txt" ./re-rename
$ ls *.txt*
ls: *.txt*: No such file or directory
$ ls *.bak
letter.bak      notes.bak      three one.bak


All the above are simply examples of what can be done. The each script can be customized to your own preferences and used from the command line or by another script.

Learn More

Project 18 shows what you can do with find and xargs.


Recursive Batch Processing

Here's a simple recursive version of each, which we call reach. It searches a whole directory hierarchy for matching filenames.

$ cat reach
#!/bin/bash
filetype=$1; shift
find . -name "$filetype" -print0 | xargs -0 -n1 $*


To rename all .txt files to .bak, we overwrite the rename script with the command we used in re-rename, and then use reach to apply the script to all .txt files in the current directory hierarchy.

$ echo 'mv "$1" "${1%.*}.bak"' > rename
$ reach "*.txt" ./rename
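
As with each, a function form that passes the command through "$@" keeps command arguments containing spaces intact. The name reach_quoted is hypothetical, and the sketch is a variant to experiment with rather than a replacement for the script above.

```shell
#!/bin/bash
# reach_quoted is a hypothetical function form of the reach script.
# Usage: reach_quoted PATTERN COMMAND [ARGS...]
reach_quoted () {
  local pattern=$1
  shift
  # -print0 and -0 separate filenames with NUL bytes, so names that
  # contain spaces reach the command intact; -n1 runs the command
  # once per filename.
  find . -name "$pattern" -print0 | xargs -0 -n1 "$@"
}
```

Running reach_quoted "*.txt" echo is a harmless way to preview which files a real command would be applied to.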





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
