Project 18. Use find, -exec, and xargs

"How do I delete all my .bak files and all the empty files too?"

This project shows how to process the files returned by the find command. It first uses the primary -exec and then the xargs command to process each file in the list. Project 15 uses locate and find to search for files by name. Project 17 shows how to find files by more complex criteria. Project 20 gives some handy find tips.

Do Something With What You Found

The find command is very powerful. It recursively searches an entire directory structure for files that match a given set of criteria. This project examines techniques for processing files identified in a find search, building on concepts discussed in previous projects.

In Project 15, we explored the use of pattern matching with the locate and find commands for searches based on filenames and pathnames.

Project 17 focused on the use of find with additional search criteria, including file type and size; time stamp for last access and modification times; permissions including file owner, associated group, and specific permission settings; and complex conditions that combine criteria using the AND, OR, and NOT operators.

The find command generates a list of files that match the specified criteria. Sometimes this is all you want, but often you'll need to process either the list itself or each file named in the list. Project 17 addresses the limited ability of the find command to process files named in such a list.

In this project, we'll look at file-processing techniques that invoke other commands to process the files identified by find. These methods include pairing find with the primary -exec and pipelining output from find to the xargs command.

Use the -exec Primary

When you've used the power of find to root out a group of files (all the HTML files in your Sites directory, for example), you can narrow your search further by using the primary -exec with the grep command to search the content of files. In the next example, find locates all HTML files; then -exec invokes grep to identify only those files that contain the text osxfaq.com. The two -name tests are grouped in escaped parentheses so that -exec applies to both; without the grouping, -exec would be attached only to the second test, and .htm files would never reach grep.

$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) ¬
    -exec grep -il "osxfaq.com" {} \;
/Users/saruman/Sites/calendar/data/index.html
/Users/saruman/Sites/mayo-family/frames/links-r.html
/Users/saruman/Sites/osxfaq/index.html
/Users/saruman/Sites/saruman/data-base/tipsandtricks.html
/Users/saruman/Sites/unix/index.html


Learn More

The grep command is discussed at greater length in Project 23.


How does this work? The portion of the command line preceding -exec tells find to search the directory ~/Sites for HTML files (which are denoted by filenames ending with .htm or .html). Each time find identifies a matching filename, the -exec primary hands it off to the grep command by creating this command line.

grep -il "osxfaq.com" filename


Note

Piping the list of filenames to grep without using the primary -exec is very different. The grep command sees the list of files as its standard input and processes this text. It is not invoked with the name of each file to process.


The command line tells grep to check the contents of filename and, if it finds the text string "osxfaq.com", to write the file's name to the screen.

The placeholder symbol {} tells -exec where to place the filename when it creates a command line. The semicolon at the end of the command is required by the -exec primary; it signals the end of an argument list. The backslash that precedes the semicolon escapes it from the shell.
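The mechanics described above can be tried safely in a throwaway directory. The following is a minimal sketch, not a definitive recipe: the sandbox directory and the three file names are hypothetical, and the search text is shortened to "osxfaq". It contrasts piping names to grep (which filters the names themselves) with -exec (which makes grep open each file), and it shows why -or alternatives placed before -exec should be grouped in escaped parentheses.

```shell
#!/bin/sh
# Sandbox: a throwaway directory with hypothetical file names.
demo=$(mktemp -d)
printf 'see OSXFAQ.com\n' > "$demo/links.htm"
printf 'see osxfaq.com\n' > "$demo/index.html"
printf 'plain page\n'     > "$demo/osxfaq.html"

# Piped straight to grep: grep reads the list of path names as text,
# so it reports the file whose NAME contains the pattern.
piped=$(find "$demo" -type f | grep -i "osxfaq")

# Ungrouped: the implicit AND binds tighter than -or, so -exec is attached
# only to the second -name test and links.htm never reaches grep.
ungrouped=$(find "$demo" -name "*.htm" -or -name "*.html" \
    -exec grep -il "osxfaq" {} \; | sort)

# Grouped: -exec now applies to files matching either pattern.
# {} marks where each filename goes; \; ends the -exec argument list.
grouped=$(find "$demo" \( -name "*.htm" -or -name "*.html" \) \
    -exec grep -il "osxfaq" {} \; | sort)

echo "piped to grep (matches names):"; echo "$piped"
echo "ungrouped -exec:";               echo "$ungrouped"
echo "grouped -exec:";                 echo "$grouped"

rm -rf "$demo"
```

The first listing names osxfaq.html, whose content never mentions the text, while the grouped -exec listing names the two files that actually contain it.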

The primary -ok does the same thing as -exec but asks for confirmation before processing each file. This is demonstrated by the following command line, which locates and removes backup (.bak) files after confirmation.

$ find ~/Sites -name "*.bak" -ok rm {} \;
"rm /Users/saruman/Sites/calendar/jan-index.bak"? y
"rm /Users/saruman/Sites/projects/index.bak"? y
...
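Coming back to the question that opens this project, here is one possible sketch for removing both .bak files and empty files. It rests on assumptions not taken from the text above: the -empty and -type primaries (available in the BSD and GNU versions of find), a hypothetical sandbox directory, and a non-interactive preview that puts echo in front of rm instead of prompting with -ok.

```shell
#!/bin/sh
# Sandbox with hypothetical file names; nothing outside it is touched.
demo=$(mktemp -d)
mkdir "$demo/site"
printf 'old copy\n' > "$demo/site/index.bak"
printf 'keep me\n'  > "$demo/site/index.html"
: > "$demo/site/empty.txt"          # a zero-length file

# Dry run: echo stands in front of rm, so the commands are only printed.
preview=$(find "$demo" -type f \( -name "*.bak" -or -empty \) \
    -exec echo rm {} \;)
echo "$preview"

# Real run: the same expression, this time actually invoking rm.
find "$demo" -type f \( -name "*.bak" -or -empty \) -exec rm {} \;

# Count what survived; only index.html should be left.
remaining=$(find "$demo" -type f | wc -l | tr -d ' ')
echo "files left: $remaining"

rm -rf "$demo"
```

Replacing -exec with -ok in the real run brings back the per-file confirmation shown above.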


Process Each File via xargs

Instead of using the primary -exec, we can process each file by piping the output from find to the xargs command. A Unix command in its own right, xargs builds a command line from the command named in its arguments plus the items it reads from its standard input, which in this case is the output from find.

You can see for yourself how it works by modifying the -exec-based command line we used to process HTML files. That command line, with the two -name tests grouped in escaped parentheses so that -exec applies to both, was

$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) ¬
    -exec grep -il "osxfaq.com" {} \;


We replace the -exec primary with a pipe to xargs. Note that xargs doesn't need the {} placeholder or the backslash-escaped closing semicolon. Run the command, and you'll see a familiar file list.

$ find ~/Sites -name "*.htm" -or -name "*.html" ¬
    | xargs grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/data/index.html
/Users/saruman/Sites/mayo-family/frames/links-r.html
/Users/saruman/Sites/osxfaq/index.html
/Users/saruman/Sites/saruman/data-base/tipsandtricks.html
/Users/saruman/Sites/unix/index.html


This represents a different technique but yields the same results.

xargs vs. -exec

So which should you use: xargs or -exec?

  • The xargs command wins on speed, as it's much faster than -exec. In the comparison above, xargs performed ten times as fast. You may think piping to xargs is less efficient than using the built-in primary. This is not so. Why? Because -exec will execute grep once for each file found. The xargs command, on the other hand, absorbs all the arguments and passes them to grep in one go. This is fine assuming that grep, or whatever command you are invoking, can handle multiple arguments and expects them to be at the end of the command line.

  • The xargs command wins on options. Use option -t (trace) to echo each command line before it is executed, for example.

  • The xargs command fails when too many files are found. Because xargs processes all files in one command line, the resulting command line can become too big, and you may see an error message about "too many arguments." This problem can be corrected, as shown later in this project.

  • The xargs command fails if filenames contain spaces. Again, this problem can be corrected, as shown later in this project.
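The speed difference comes down to how many times the target command is started. The sketch below counts invocations using a hypothetical helper script (count.sh) and a hypothetical COUNT_LOG variable, both invented for this illustration. It also tries the -exec ... {} + form, which is not covered in the text above; it is defined by POSIX and, like xargs, batches filenames into one command line.

```shell
#!/bin/sh
# Sandbox with five hypothetical files.
demo=$(mktemp -d)
mkdir "$demo/files"
for i in 1 2 3 4 5; do : > "$demo/files/f$i.txt"; done

# Hypothetical helper: appends one line to $COUNT_LOG per invocation.
cat > "$demo/count.sh" <<'EOF'
echo "called with $# file(s)" >> "$COUNT_LOG"
EOF

# -exec ... \; starts the helper once for every file found.
export COUNT_LOG="$demo/exec.log"
find "$demo/files" -name "*.txt" -exec sh "$demo/count.sh" {} \;
exec_calls=$(wc -l < "$demo/exec.log" | tr -d ' ')

# xargs gathers the names and starts the helper once for the whole batch.
COUNT_LOG="$demo/xargs.log"
find "$demo/files" -name "*.txt" | xargs sh "$demo/count.sh"
xargs_calls=$(wc -l < "$demo/xargs.log" | tr -d ' ')

# -exec ... {} + also batches the names (assumption: a POSIX-conforming find).
COUNT_LOG="$demo/plus.log"
find "$demo/files" -name "*.txt" -exec sh "$demo/count.sh" {} +
plus_calls=$(wc -l < "$demo/plus.log" | tr -d ' ')

echo "-exec \\; : $exec_calls invocations"
echo "xargs    : $xargs_calls invocation(s)"
echo "-exec +  : $plus_calls invocation(s)"

rm -rf "$demo"
```

With only five files the difference is invisible to the eye, but each extra invocation costs a process start, which is what adds up over thousands of files.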

Note

After some experimentation, I found that 1,000 is a good, safe, round number to choose as the maximum number of filenames. The length of the command line formed by xargs is obviously dependent on the average length of a filename. A higher number increases the risk of the command's failing; a lower number makes the command run slower. The choice is yours.


Cope with Too Many Files

When many files match, on the order of thousands, the command line formed by xargs gets big, causing an error to be thrown. To solve this problem, use option -n, which tells xargs the maximum number of arguments to include on a command line. It will issue the target command repeatedly until all filenames have been processed.

$ find ~/Sites -name "*.htm" -or -name "*.html" ¬
    | xargs -n1000 grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/data/index.html
/Users/saruman/Sites/mayo-family/frames/links-r.html
/Users/saruman/Sites/osxfaq/index.html
/Users/saruman/Sites/saruman/data-base/tipsandtricks.html
/Users/saruman/Sites/unix/index.html
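To watch option -n at work without needing thousands of files, the following sketch uses a deliberately tiny limit of 2 and lets echo stand in for the target command. The sandbox and its five file names are hypothetical.

```shell
#!/bin/sh
# Sandbox with five hypothetical files.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$demo/f$i.txt"; done

# With -n 2, xargs runs echo repeatedly, passing at most two names each
# time. Every output line therefore corresponds to one invocation.
batches=$(find "$demo" -name "*.txt" | sort | xargs -n 2 echo)
echo "$batches"

batch_count=$(echo "$batches" | wc -l | tr -d ' ')
echo "echo was run $batch_count times for 5 files"

rm -rf "$demo"
```

Five names at two per command line give three invocations: two full batches and one holding the leftover name.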


Handle Filenames with Spaces

When filenames include spaces, the find | xargs combination will fail. The xargs command cannot tell whether a space is part of a filename or an argument separator. In the following example, we encounter a file named a space.html.

$ find ~/Sites -name "*.htm" -or -name "*.html" ¬
    | xargs -n1000 grep -il "osxfaq.com"
grep: /Users/saruman/Sites/calendar/a: No such file or ...
grep: space.html: No such file or directory
...


Spaces in Filenames

Spaces in filenames are not a problem when you use the -exec primary, which passes each filename to the called utility as a single argument. When you use xargs, the filenames arrive on its standard input as one stream of text, and by default xargs splits that text into arguments at whitespace. A space inside a filename therefore looks exactly like a space between filenames.


The solution is to tell both find and xargs to use a different argument separator: the null character instead of whitespace. The null character can never be part of a filename. Specify the primary -print0 to find and the option -0 (number zero, not letter O, in both cases) to xargs. Because -print0 is an explicit action, the two -name tests are grouped in escaped parentheses so that it applies to both of them.

$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) -print0 ¬
    | xargs -0 -n1000 grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/a space.html
...
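The failure and the fix can both be reproduced in a sandbox. This is a minimal sketch; the temporary directory is hypothetical, and the single test file borrows the name a space.html from the example above.

```shell
#!/bin/sh
# Sandbox containing one file whose name includes a space.
demo=$(mktemp -d)
printf 'visit osxfaq.com\n' > "$demo/a space.html"

# Default separators: xargs splits the path at the space, grep is handed
# two names that do not exist, and nothing is found. Errors are discarded
# and the failure status is ignored so the script carries on.
broken=$(find "$demo" -name "*.html" \
    | xargs grep -il "osxfaq.com" 2>/dev/null || true)

# Null separators: -print0 and -0 keep the whole path together.
fixed=$(find "$demo" -name "*.html" -print0 \
    | xargs -0 grep -il "osxfaq.com")

echo "default separators found: '${broken}'"
echo "null separators found:    '${fixed}'"

rm -rf "$demo"
```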


Limit find's Scope

Command find is often plagued by an excess of success: If you're looking for one specific file, a search that returns hundreds of matches is as useless as one that turns up nothing. One remedy is to limit the extent of find searches, to prevent it from looking in places where you know your target file won't be found.

Limit Recursion

Suppose that we initiate a search for all index files in the directory Sites by typing

$ find Sites -name "index.*"
Sites/albums/index.html
Sites/calendar/data/index.html
Sites/calendar/index.php
Sites/osxfaq/index.html
Sites/osxfaq/index.ws
Sites/sqmail/sqmail/class/deliver/index.php
Sites/sqmail/sqmail/class/index.php
Sites/webdav/index.php


Perhaps there are unwanted results, and we wish to eliminate particular directories from the search: for example, those whose names match sq* and os*. The find command provides a way to prevent searching of (or recursion into) selected directories by using the primary -prune. Like all primaries, -prune is applied to each file as it is found, starting with the root of the search.

We require three expressions: one to match files named index.* as we already have above, and one for each of sq* and os* telling find not to search any directory that matches either of the patterns. These look like this:

-name "sq*" -prune
-name "os*" -prune


The complete command looks like this:

$ find Sites -name "index.*" -print -or -name "sq*" ¬
    -prune -or -name "os*" -prune
Sites/albums/index.html
Sites/calendar/data/index.html
Sites/calendar/index.php
Sites/webdav/index.php


You'll immediately notice three things: One, it worked; two, the expressions are ORed, not ANDed; and three, there's a -print primary after the name expression. Why?

Short answer: These are little tricks one learns.

Long answer: First, if you were to AND the terms instead of ORing them, you wouldn't get any results. No filename can match all three expressions (index.*, sq*, and os*) at the same time. Second, when no action is stated, the primary -print is implied for the whole expression and displays every filename for which the expression is true. So we'd see filenames that match sq* and os* as well as those that match index.*. The find command is matching them to -prune, but it still prints a filename when it matches. This is unfortunate, but it is a result of -print being implied. By stating -print explicitly, however, we can position it after the first -name primary but before the others. The result is that only matches from the first -name primary are printed.

If you don't follow this, don't worry. I had to spend a lot of time reading the man page for find to understand in detail how it worked. Accept the short answer. After all, this book is about how to do it, not necessarily about spending several hours studying theory.
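For readers who would rather see the difference than study the theory, here is a small sandbox comparison of the implied and the explicit -print. It is only a sketch: the directory tree is hypothetical and merely borrows a few names from the example above.

```shell
#!/bin/sh
# Hypothetical miniature of the Sites directory.
demo=$(mktemp -d)
mkdir -p "$demo/Sites/albums" "$demo/Sites/osxfaq" "$demo/Sites/sqmail"
: > "$demo/Sites/albums/index.html"
: > "$demo/Sites/osxfaq/index.html"
: > "$demo/Sites/sqmail/index.php"

# Implied -print: the pruned directories are not searched, but their own
# names are still printed because they satisfied the expression.
implied=$(cd "$demo" && find Sites -name "index.*" \
    -or -name "sq*" -prune -or -name "os*" -prune | sort)

# Explicit -print after the first -name: only index.* matches are printed.
explicit=$(cd "$demo" && find Sites -name "index.*" -print \
    -or -name "sq*" -prune -or -name "os*" -prune | sort)

echo "implied -print:";  echo "$implied"
echo "explicit -print:"; echo "$explicit"

rm -rf "$demo"
```

The first listing includes the directory names Sites/osxfaq and Sites/sqmail alongside the one index file; the second shows the index file alone.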

Set the Maximum Recursion Depth

Primary -maxdepth limits the number of directory levels covered by a recursive search. Place a number (1 or greater) after the -maxdepth primary to specify the number of search levels. A -maxdepth value of 1 limits the search to files at the search root level; -maxdepth 2 looks inside directories at that level (but not any directories they enclose), and so on.

$ find ~/Sites -maxdepth 1 -name "*.htm" -or -name "*.html"
/Users/saruman/Sites/index.html


Next, the depth of recursion is limited to two directory levels. Compare the results with those from earlier examples.

$ find ~/Sites -maxdepth 2 -name "*.htm" -or -name "*.html"
/Users/saruman/Sites/albums/albums.html
/Users/saruman/Sites/albums/index.html
/Users/saruman/Sites/index.html
/Users/saruman/Sites/webdav/index.html
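The same comparison can be run on a small, known tree, where it is easy to check which levels each -maxdepth value reaches. The directory layout below is hypothetical and exists only for this sketch.

```shell
#!/bin/sh
# Hypothetical tree with one index.html at each of three levels.
demo=$(mktemp -d)
mkdir -p "$demo/Sites/albums/2004"
: > "$demo/Sites/index.html"
: > "$demo/Sites/albums/index.html"
: > "$demo/Sites/albums/2004/index.html"

# Depth 1: only files directly inside the search root.
depth1=$(cd "$demo" && find Sites -maxdepth 1 -name "*.html" | sort)

# Depth 2: also files inside the root's immediate subdirectories.
depth2=$(cd "$demo" && find Sites -maxdepth 2 -name "*.html" | sort)

echo "maxdepth 1:"; echo "$depth1"
echo "maxdepth 2:"; echo "$depth2"

rm -rf "$demo"
```

The file in Sites/albums/2004 sits three levels down, so neither search reports it.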


Avoid Other File Systems

Use option -x (this is an option, not a primary) to stop find from looking inside mounted file systems. In this example, mounted file systems (usually in /Volumes) will not be searched.

$ sudo find -x / -name "index.html"
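The -x option belongs to the BSD-derived find supplied with Mac OS X; other versions, such as GNU find, may not accept it. The primary -xdev, which both the BSD and GNU versions document, has the same effect. The sketch below only confirms that -xdev is accepted and that it leaves a search within a single file system unchanged; demonstrating the skipping itself would need a second mounted file system. The sandbox directory is hypothetical.

```shell
#!/bin/sh
# Sandbox on a single file system.
demo=$(mktemp -d)
mkdir "$demo/site"
: > "$demo/site/index.html"

# -xdev tells find not to descend into directories that live on a
# different file system from the starting point.
found=$(find "$demo" -xdev -name "index.html")
echo "$found"

rm -rf "$demo"
```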





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo