Project 58. Batch-Edit Files

"How do I find and edit potentially hundreds of files by using a single command line?"

Tip

When you write your own shell scripts, by falling back on the techniques discussed in this project, you can avoid having to code your script to accept a list of files.


Learn More

Project 11 explains wildcard expansion.


This project gives some techniques for batch-editing files, that is, selecting many files and passing them one at a time to an editing utility. The solutions explored here are discussed in the context of the tr and sed commands but apply equally well to other commands. Project 57 covers the tr command, and Projects 59 and 61 cover the sed command.

Add Utility to the tr Command

Because the tr command doesn't accept a list of filenames and cannot write back to the file being processed, it makes a good subject around which we can build some batch-processing utilities. (It would be no challenge if tr simply took a list of files to edit.) We'll develop some techniques for editing batches of files that use tr and overcome its limitations.

Use a for Loop

A simple trick uses a shell for loop that selects each file to be processed and forms a command line to process it. In the following example, we take advantage of shell wildcard expansion, using the expression *.txt to select all text files in the current directory.

In our command, the shell variable file assumes the value of each filename in turn, and the tr command processes the file. The editing operation tr performs here (changing every a character it finds in a file to z) is frivolous, but it illustrates the technique.

$ for file in *.txt; do
>     echo "Processing $file..."; tr 'a' 'z' <"$file"
> done
Processing car.txt...
Sophie poured me into the czr, stzrted the engine, zbused some cogs, znd grzted her wzy out from the czr pzrk.
Processing jill black.txt...
Jillezn likes blzck. I'm not sure if she hzs zmbitions to be z Goth, or zn undertzker. Just z pzssing phzse I suspect - hezrse todzy, gone tomorrow.


For each new value of variable file, the echo expression displays a filename (within the phrase Processing [filename]. . .). For this solution to cope with filenames that contain spaces (such as jill black.txt), the use of double quotes around "$file" is essential.

We could extend the tr command to write back to the original file, using what we learned in Project 57.

$ tr 'z' 'a' <"$file" >tmp; mv tmp "$file"
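Putting the loop and the write-back step together might look like the following sketch. It is not the book's code: the scratch directory, the sample files, and the use of mktemp (instead of a fixed file named tmp) are our own assumptions, chosen so that the example can be run safely and so that a failed tr never overwrites the original.

```shell
# Work in a scratch directory so the sketch is safe to run anywhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample files (assumed for illustration), one with a space in its name.
printf 'a car park\n' > car.txt
printf 'jill likes black\n' > 'jill black.txt'

for file in *.txt; do
    tmpfile=$(mktemp) || exit 1      # unique temporary file for each input
    # Replace the original only if tr succeeded.
    if tr 'a' 'z' <"$file" >"$tmpfile"; then
        mv "$tmpfile" "$file"
    else
        rm -f "$tmpfile"
    fi
done

cat car.txt 'jill black.txt'
```

As in the loop above, the double quotes around "$file" are what let the file jill black.txt survive intact.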


Make It a Function

We can encapsulate the for loop in a function, which we'll call each, thereby saving a little typing whenever we wish to process a batch of files. Function each will allow us to issue a command such as

$ each txt tr 'a' 'z'


The each function is written to use its first argument as a filename extension and form the wildcard pattern *.extension. All other arguments are assumed to be part of the command to execute. Here's the function:

$ each () {
> filetype="$1"; shift
> for file in *.$filetype; do
>     echo "Processing $file..."; $* <"$file"
> done
> }


Learn More

Projects 9 and 10 introduce shell scripting. Refer to Project 10 if you are unfamiliar with for loops.


Examining the code: We save the extension (argument 1) in variable filetype and then shift down all other arguments so that 3 becomes 2, 2 becomes 1, and the original argument 1 drops off the end. (This enables the tr command to process subsequent arguments as it would normally.)

Learn More

Project 52 covers bash functions.


The for loop assigns each filename that matches the *.extension pattern to variable file, writes the Processing [filename]... notice onscreen, and passes the contents of the matching file as standard input, ready for tr to process. Recall that $* expands to tr 'a' 'z'.

Issue the command

$ each txt tr 'a' 'z'


and our each function will form the following command line for each filename.

tr 'a' 'z' <"filename.txt"
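One caveat is worth a sketch. An unquoted $* is split again on whitespace, so an argument that itself contains a space (such as the set 'a ') would reach tr as the wrong words. A variant that runs the command as "$@" keeps each argument intact. The function name each_q, the scratch directory, and the sample file are hypothetical, introduced only for this illustration.

```shell
# each_q: hypothetical variant of each that preserves argument boundaries.
each_q () {
    filetype="$1"; shift
    for file in *."$filetype"; do
        [ -e "$file" ] || continue   # skip the literal pattern if nothing matches
        echo "Processing $file..."
        "$@" <"$file"                # "$@" keeps 'a ' as a single argument
    done
}

# Demonstration in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a b c\n' > demo.txt

# Translate both the letter a and the space character.
result=$(each_q txt tr 'a ' 'z_')
echo "$result"
```

With the original each, the same call would hand tr three operands instead of two, because the space inside 'a ' would be treated as a word separator.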


Here's an alternate version, each2, that takes a filename specification, rather than an extension, as its direct argument. This gives us the option of using an individual filename or expressions that contain one or more wildcard characters, rather than just a file extension, to specify files to be processed by tr.

This time, we pass *.txt as the first argument, remembering to escape the star so that it won't be interpreted by the shell.

$ each2 () {
> filetype="$1"; shift
> for file in $filetype; do
>     echo "Processing $file..."; $* <"$file"
> done
> }
$ each2 \*.txt tr 'a' 'z'


Write a Generic each Function

The function examples so far have been very specific to the tr utility they serve. The main processing command in function each

$* <"$file"


specifically uses the input redirection required by tr. A more versatile technique uses a generic each function that expects the utility being called to sort out its own idiosyncrasies. Here's one possible solution.

$ geach () {
> command=$1; shift
> for file in "$@"; do
>     $command "$file"
> done
> }


We pass the whole command line to geach, which simply expands its first argument as the command to run and takes subsequent argument(s) as filename(s) to process one at a time.

This technique works nicely in this instance, but it's seldom so easy to specify a whole command as a simple parameter. Weaving "$file" into this command line works because the variable is used to place an argument at the end of the command line. If this required insertion of arguments into the middle of the command line, as many other commands and functions do, it wouldn't work.

A better solution is to write the command to be executed as a function, or as a mini shell script within its own file. Let's define a function called trx that does the usual a-to-z translation on the contents of a filename specified as argument 1 and writes back to the original file:

$ trx () { tr 'a' 'z' <"$1" >tmp; mv tmp "$1"; }


Function trx takes one argument, which is the filename to process. We combine this with the generic each function, geach, by typing

$ geach trx *.txt


For each file it finds, the geach function forms the command line

trx filename.txt


and the trx function in turn executes

tr 'a' 'z' <"filename.txt" >tmp; mv tmp "filename.txt"


(You'll notice that this lets us place "$file" in the middle of our command line, as well as at the end.)

Note that we must write a new trx function for each different editing task, but doing so does not involve much more than typing the raw command. When the generic geach function is defined, we have an easy way of sending a batch of files, one at a time, to any function or shell script.
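To illustrate how little a new editing task requires, here is a sketch that pairs geach with a second per-file function. The function upper (which converts a file to uppercase in place), the scratch directory, and the sample files are hypothetical; only geach itself comes from the text above.

```shell
# geach as defined earlier in this project.
geach () {
    command=$1; shift
    for file in "$@"; do
        $command "$file"
    done
}

# upper: hypothetical per-file function in the same mold as trx.
upper () { tr 'a-z' 'A-Z' <"$1" >tmp; mv tmp "$1"; }

# Demonstration in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > a.txt
printf 'two\n' > 'b c.txt'

geach upper *.txt
cat a.txt 'b c.txt'
```

Only the one-line upper function is new; the batching logic in geach is reused unchanged.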

Tip

Use the generic geach function to send a batch of files, one at a time, to any command or script, including scripts you have written yourself.


It's possible to make a generic trx too and to incorporate its functionality into geach, but it involves more serious scripting. See Project 83.

Tip

Take advantage of the xargs -t option (trace) to echo each command before it is executed.


Search with find

The examples in the previous section made use of shell wildcard expansion (or globbing) to drive a for loop. Sometimes, you might want to search a complete directory hierarchy, recursively, for files to process. To this end, we employ the find command (see Projects 15, 17, and 18 for detailed examples of using find). We can either take advantage of its primary -exec or pipe the output to xargs -n1, both of which will hand off matching files, one at a time, to the target command for editing or other processing.
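As a sketch of the -exec route, find can run a small inline sh script once for every matching file. The directory layout and file contents below are assumptions made for the demonstration, and the temporary file is written beside each original (as filename.tmp) rather than to a single file named tmp.

```shell
# Build a small hierarchy in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir backup
printf 'a\n' > car.txt
printf 'a\n' > backup/jan.txt

# find replaces {} with each matching pathname; \; ends the -exec command.
# The inline script receives that pathname as $1.
find . -name '*.txt' -exec sh -c \
    'tr "a" "z" <"$1" >"$1.tmp" && mv "$1.tmp" "$1"' sh {} \;

cat car.txt backup/jan.txt
```

Because find passes each pathname as a single argument, names that contain spaces need no special handling here.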

Here's a recursive file search similar to the geach function we wrote earlier, which we'll call rgeach.

$ rgeach () {
> command=$1; shift
> find . -iname "$*" -print0 | xargs -0 -t -n1 $command
> }


This time, we'll make trx a shell script instead of a function, partly to show an alternative approach to the previous examples, but mostly because the xargs command that we use here to invoke trx does not recognize Bash functions.

$ cat trx
tr 'z' 'a' <"$1" >tmp; mv tmp "$1"
$ chmod +x trx
$ rgeach trx "*.txt"
trx ./backup/jan.txt
trx ./car.txt
trx ./jill black.txt
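If you would rather keep trx as a function, one possible workaround is to read find's output in a while loop, which runs inside the shell and can therefore see its functions. This sketch assumes that no filename contains a newline character (a limitation the -print0 and xargs -0 pair does not share); the sample files are hypothetical, and this trx writes its temporary file beside each original instead of to tmp.

```shell
# trx as a function; the temporary file sits next to the file being edited.
trx () { tr 'a' 'z' <"$1" >"$1.tmp" && mv "$1.tmp" "$1"; }

# Build a small hierarchy in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir backup
printf 'jan\n' > backup/jan.txt
printf 'black\n' > 'jill black.txt'

# IFS= and -r keep leading spaces and backslashes in filenames intact.
find . -name '*.txt' | while IFS= read -r file; do
    echo "trx $file"                 # trace, in the spirit of xargs -t
    trx "$file"
done

cat backup/jan.txt 'jill black.txt'
```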





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo