Project 18. Use find, -
The grep command is discussed at greater length in Project 23.
How does this work? Consider this portion of the command line:
grep -il "osxfaq.com" filename
It tells grep to check the contents of filename and, if it finds the text string "osxfaq.com", to write the file's name to the screen.
Note
Piping the list of filenames to grep without using the primary -exec is very different. The grep command sees the list of files as its standard input and processes this text. It is not invoked with the name of each file to process.
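To see what the -i and -l options do on their own, here is a small sketch run against two throwaway files. The scratch directory and the file names match.html and other.html are made up for the illustration and do not come from the project.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
demo=$(mktemp -d)
echo 'Visit OSXFAQ.com for tips' > "$demo/match.html"
echo 'Nothing relevant here'     > "$demo/other.html"

# -i ignores case; -l prints only the names of files that contain a match.
matches=$(grep -il "osxfaq.com" "$demo/match.html" "$demo/other.html")
echo "$matches"

rm -rf "$demo"
```

Only match.html should be listed, even though its text is written in uppercase.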
The placeholder symbol {} tells -exec where to place the filename when it creates a command line. The semicolon at the end of the command is required by the -exec primary; it signals the end of an argument list. The backslash that precedes the semicolon escapes it, so that the shell passes it on to find instead of treating it as a command separator.
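One way to watch -exec build its command lines is to put echo in front of the command, so each line is printed rather than run. This is only a sketch: the scratch directory and the files one.html and two.html are invented for the demonstration.

```shell
# Scratch directory and file names are hypothetical, for illustration only.
demo=$(mktemp -d)
touch "$demo/one.html" "$demo/two.html"

# echo prints each command line that -exec assembles instead of running it,
# which shows where the filename is substituted for {}.
built=$(find "$demo" -name "*.html" -exec echo grep -il "osxfaq.com" {} \;)
echo "$built"

rm -rf "$demo"
```

You should see one grep command line per file, each ending with the path that replaced the placeholder.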
The primary -ok does the same thing as -exec but asks for confirmation before processing each file. This is demonstrated by the following command line, which locates and removes files whose names end in .bak.
$ find ~/Sites -name "*.bak" -ok rm {} \;
"rm /Users/saruman/Sites/calendar/jan-index.bak"? y
"rm /Users/saruman/Sites/projects/index.bak"? y
...
Instead of using the primary -exec , we can process each file by piping the output from find to the xargs command. A Unix command in its own right, xargs forms a command line from its parameters, which in this case includes the output from find .
You can see for yourself how it works by modifying the -exec -based command line we used to process HTML files. Our original command line was
$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) \
    -exec grep -il "osxfaq.com" {} \;
We replace the -exec primary with a pipe to xargs. Note that xargs doesn't need the {} placeholder or the closing semicolon-with-backslash. Run the command, and you'll see a familiar file list.
$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) \
    | xargs grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/data/index.html
/Users/saruman/Sites/mayo-family/frames/links-r.html
/Users/saruman/Sites/osxfaq/index.html
/Users/saruman/Sites/saruman/data-base/tipsandtricks.html
/Users/saruman/Sites/unix/index.html
This represents a different technique but yields the same results.
So which should you use: xargs or -exec?
The xargs command wins on speed, as it's much faster than -exec. In the comparison above, xargs performed ten times as fast. You may think piping to xargs is less efficient than using the built-in primary. This is not so. Why? Because -exec will execute grep once for each file found. The xargs command, on the other hand, absorbs all the arguments and passes them to grep on a single command line, so grep is executed just once.
The xargs command wins on options. Use option -t (trace) to echo each command line before it is executed, for example.
The xargs command fails when too many files are found. Because xargs processes all files in one command line, the resulting command line can become too big, and you may see an error message about "too many arguments." This problem can be corrected, as shown later in this project.
The xargs command fails if filenames contain spaces. Again, this problem can be corrected, as shown later in this project.
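The speed difference comes down to how many times the command is started. The following sketch counts invocations rather than measuring time; it assumes five throwaway files in a scratch directory and uses a tiny sh command that prints one line each time it starts, standing in for grep.

```shell
# Scratch directory and page1..page5 are hypothetical, for illustration only.
demo=$(mktemp -d)
for i in 1 2 3 4 5; do
    echo 'see osxfaq.com' > "$demo/page$i.html"
done

# Each started command prints one line, so counting lines counts invocations.
exec_runs=$(find "$demo" -name "*.html" -exec sh -c 'echo started' sh {} \; | wc -l | tr -d ' ')
xargs_runs=$(find "$demo" -name "*.html" | xargs sh -c 'echo started' sh | wc -l | tr -d ' ')

echo "-exec started $exec_runs commands; xargs started $xargs_runs"

rm -rf "$demo"
```

With five files, -exec starts five commands and xargs starts one.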
Note
After some experimentation, I found that 1,000 is a good, safe, round number to choose as the maximum number of filenames. The length of the command line
When many files match, on the order of thousands, the command line formed by xargs gets big, possibly too big to execute. Option -n sets the maximum number of filenames that xargs places on each command line; xargs then runs grep as many times as it takes to cover them all.
$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) \
    | xargs -n1000 grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/data/index.html
/Users/saruman/Sites/mayo-family/frames/links-r.html
/Users/saruman/Sites/osxfaq/index.html
/Users/saruman/Sites/saruman/data-base/tipsandtricks.html
/Users/saruman/Sites/unix/index.html
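The -n1000 in the command above caps each grep command line at 1,000 filenames. The batching is easier to see with small numbers, so this sketch feeds seven made-up names (f1 to f7, standing in for the output of find) to xargs with a limit of three, and uses echo to print the command lines instead of running them.

```shell
# f1..f7 are hypothetical names standing in for the output of find.
# echo prints each command line xargs would build, one per batch.
batches=$(printf '%s\n' f1 f2 f3 f4 f5 f6 f7 | xargs -n3 echo grep)
echo "$batches"
```

Seven names with a limit of three give three command lines: two full batches and one with the leftover name.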
When filenames include spaces, the find-xargs combination will fail. The xargs command cannot tell whether a space is part of a filename or an argument separator. In the following example, we encounter a file named a space.html.
$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) \
    | xargs -n1000 grep -il "osxfaq.com"
grep: /Users/saruman/Sites/calendar/a: No such file or ...
grep: space.html: No such file or directory
...
Spaces in Filenames
Spaces in filenames are not a problem when you use the -exec primary, which passes filenames one at a time to the utility being called. When you use xargs, the filenames are piped to xargs as a stream of text, and xargs splits that text into arguments at every space or newline. It therefore can't distinguish spaces within filenames from spaces between them.
The solution is to tell both find and xargs to use a different argument separator, such as the null character instead of a space. The null character should not be part of any filename. Specify the primary -print0 to find and the option -0 (number zero, not letter O, in both cases) to xargs .
$ find ~/Sites \( -name "*.htm" -or -name "*.html" \) -print0 \
    | xargs -0 -n1000 grep -il "osxfaq.com"
/Users/saruman/Sites/calendar/a space.html
...
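The effect of the null separator can be checked on a single throwaway file. In this sketch the scratch directory is an assumption, and the only file in it is named a space.html to mirror the example above; the first pipeline uses the default separators and the second uses -print0 with -0.

```shell
# Scratch directory is hypothetical; the file name mirrors the example above.
demo=$(mktemp -d)
echo 'see osxfaq.com' > "$demo/a space.html"

# Default separators: xargs splits the name in two and grep finds neither
# half. Errors are discarded, and || true keeps a strict script running.
plain=$(find "$demo" -name "*.html" | xargs grep -il "osxfaq.com" 2>/dev/null || true)

# Null separators: the filename reaches grep intact.
safe=$(find "$demo" -name "*.html" -print0 | xargs -0 grep -il "osxfaq.com")

echo "plain: '$plain'"
echo "safe:  '$safe'"

rm -rf "$demo"
```

The plain pipeline lists nothing, while the null-separated one lists the file with its space preserved.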
Command find is often plagued by an excess of success: If you're looking for one specific file, a search that returns hundreds of matches is as useless as one that turns up nothing. One remedy is to limit the extent of find searches, to prevent it from looking in places where you know your target file won't be found.
Suppose that we initiate a search for all index files in the directory Sites by typing
$ find Sites -name "index.*"
Sites/albums/index.html
Sites/calendar/data/index.html
Sites/calendar/index.php
Sites/osxfaq/index.html
Sites/osxfaq/index.ws
Sites/sqmail/sqmail/class/deliver/index.php
Sites/sqmail/sqmail/class/index.php
Sites/webdav/index.php
Perhaps there are unwanted results, and we wish to eliminate particular directories from the search, for example, those matching sq* and os*. The find command provides a way to prevent searching of (or recursion into) selected directories by using the primary -prune. Like all primaries, -prune is applied to each file as it is found, starting with the root of the search.
We require three expressions: one to match files named index.* as we already have above, and one for each of sq* and os* telling find not to search any directory that matches either of the patterns. These look like this:
-name "sq*" -prune
-name "os*" -prune
The complete command looks like this:
$ find Sites -name "index.*" -print -or -name "sq*" \
    -prune -or -name "os*" -prune
Sites/albums/index.html
Sites/calendar/data/index.html
Sites/calendar/index.php
Sites/webdav/index.php
You'll immediately notice three things: One, it worked; two, the expressions are ORed, not ANDed; and three, there's a -print primary after the name expression. Why?
Short answer: These are little tricks one learns.
Long answer: If you were to AND the terms instead of ORing them, you wouldn't get any results. No filename can match all three expressions index.*, sq*, and os* at the same time. Second, the primary -print is always
If you don't follow this, don't worry. I had to
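The role of the explicit -print can be tried out on a miniature directory tree. The sketch below is an illustration under assumptions: the tree is a made-up stand-in for Sites with one index file in each of three directories. It relies on standard find behavior: when an expression contains no action such as -print, find prints every file for which the whole expression is true, and that includes the directories matched by -prune.

```shell
# The miniature Sites tree is hypothetical, for illustration only.
demo=$(mktemp -d)
mkdir -p "$demo/Sites/albums" "$demo/Sites/sqmail" "$demo/Sites/osxfaq"
touch "$demo/Sites/albums/index.html" \
      "$demo/Sites/sqmail/index.php" \
      "$demo/Sites/osxfaq/index.html"

# Explicit -print: only the index files outside the pruned directories.
with_print=$(cd "$demo" && find Sites -name "index.*" -print \
    -or -name "sq*" -prune -or -name "os*" -prune | sort)

# No -print: the pruned directories themselves are listed as well.
without_print=$(cd "$demo" && find Sites -name "index.*" \
    -or -name "sq*" -prune -or -name "os*" -prune | sort)

echo "with -print:"
echo "$with_print"
echo "without -print:"
echo "$without_print"

rm -rf "$demo"
```

In both runs find never descends into sqmail or osxfaq, but only the first run keeps those directory names out of the results.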
Primary -maxdepth limits the number of directory levels covered by a recursive search. Place a number (1 or greater) after the -maxdepth primary to specify the number of search levels. A -maxdepth value of 1 limits the search to files at the search root level; -maxdepth 2 looks inside directories at that level (but not any directories they enclose), and so on.
$ find ~/Sites -maxdepth 1 -name "*.htm" -or -name "*.html"
/Users/saruman/Sites/index.html
Next, the depth of recursion is limited to two directory levels. Compare the results with those from earlier examples.
$ find ~/Sites -maxdepth 2 -name "*.htm" -or -name "*.html"
/Users/saruman/Sites/albums/albums.html
/Users/saruman/Sites/albums/index.html
/Users/saruman/Sites/index.html
/Users/saruman/Sites/webdav/index.html
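The way -maxdepth counts levels can be confirmed on a small tree. This sketch assumes a made-up scratch directory holding one index.html at each of three levels and counts how many files each depth setting reaches.

```shell
# Scratch tree is hypothetical: one index.html at each of three levels.
demo=$(mktemp -d)
mkdir -p "$demo/albums/2004"
touch "$demo/index.html" "$demo/albums/index.html" "$demo/albums/2004/index.html"

# Count the matches found at each depth limit.
depth1=$(find "$demo" -maxdepth 1 -name "*.html" | wc -l | tr -d ' ')
depth2=$(find "$demo" -maxdepth 2 -name "*.html" | wc -l | tr -d ' ')
unlimited=$(find "$demo" -name "*.html" | wc -l | tr -d ' ')

echo "maxdepth 1: $depth1, maxdepth 2: $depth2, no limit: $unlimited"

rm -rf "$demo"
```

Each extra level of depth brings one more file into view: one, then two, then all three.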
Use option -x (this is an option, not a primary) to stop find from looking inside mounted file systems. In this example, mounted file systems (usually in /Volumes ) will not be searched.
$ sudo find -x / -name "index.html"