
Searching for Files, Directories, and More

Unix traditionally has provided useful tools for searching for files by name and content, and Apple has expanded on these by making available a command-line interface into the same databases that the Finder uses to locate files. Unix's traditional tools don't work from a database like the Finder's file searching function does, so they run more slowly. On the other hand, they aren't hampered by needing a database to run or by being only as current in their results as the last database update.

Finding Files: locate, find, mdfind

Sometimes you want to find some files, but you are not sure where they are. Three tools are available to search for files: locate, find, and mdfind. Despite having the same purpose, these commands all behave a little bit differently. locate works from a database of filenames that is updated periodically, and is aware of your user permissions and the places where you'd typically look for files. It sometimes won't show you files that have been added recently, or that are in odd corners of the system, but it works quickly. find actually goes to look at everything on the system when you run it, so it can take a very, very long time to search, but it's always up to date, and always returns information on all files that you can see. mdfind uses Tiger's new Spotlight search facility, which also uses a database, but one that's supposed to be updated continuously for every change that is made to any file, and that is capable of searching file contents as well as filenames. Unfortunately (as of April 2005), it's too easy to accidentally break Spotlight's indexing, so that files that should match a search never show up in the results.

Using locate

If you know some of the name of a file, you can use the locate utility to try to find it.

For example, our user nermal looked earlier at a file called system.log. Does our machine have other files that have log in their name? You bet! The syntax for locate is

 locate <pattern> 

We encourage you to try the locate command for files with log in them (locate log) to see the output, but it is much too long to include here. locate searches a database of pathnames on the machine.


If you try locate log and produce no output, it's because your machine hasn't generated the database of paths yet. This database starts off empty and is automatically rebuilt once a week. If you're particularly adventurous, you will find what you need to know to build it by hand in the /etc/weekly script, but this is a bit more complex than a novice will want to face.

Further information on locate is shown in the command documentation table, Table 10.17.

Table 10.17. The Command Documentation Table for locate


Finds files.

locate <pattern>

Searches a database for all pathnames that match <pattern>. The database is rebuilt periodically and contains the names of all publicly accessible files.

Shell and wildcard (globbing) characters (*, ?, \, [, and ]) may be used in <pattern>, although they must be escaped to prevent the shell from interpreting and expanding them before handing them to the command. Preceding a character by \ eliminates any special meaning for it. No characters must be explicitly matched, including /.

As a special case, if you specify a pattern with no wildcard/globbing characters (such as a search for foo), the pattern actually is matched as though it was surrounded by * wildcard characters; that is, matched as *foo*.

Useful files:




Script to update database
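The matching rules in the table can be tried without a locate database. The following is a minimal sketch that imitates those rules with the shell's own case patterns. The locate_match function is a hypothetical helper written for this illustration, not part of locate, and it makes no claim about how locate is implemented internally.

```shell
#!/bin/sh
# locate_match is a hypothetical helper for illustration only. It imitates
# the documented pattern rules of locate with the shell's case patterns.
locate_match() {
    pat=$1
    candidate=$2
    case $pat in
        *\**|*\?*|*\[*|*\]*) ;;    # has a wildcard: must match the whole path
        *) pat="*$pat*" ;;         # plain string: treated as *string*
    esac
    case $candidate in
        $pat) return 0 ;;          # in a case pattern, * also matches /
        *) return 1 ;;
    esac
}

# Try a plain string and two wildcard patterns against one pathname.
for p in log '*.log' 'system.lo?'; do
    if locate_match "$p" /var/log/system.log; then
        echo "$p matches /var/log/system.log"
    else
        echo "$p does not match /var/log/system.log"
    fi
done
```

The last pattern fails because a pattern that contains a wildcard is not wrapped in * characters, so it has to describe the entire pathname rather than a fragment of it.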

Using find

A more powerful and more ubiquitous tool for finding files is find. It is much slower than the locate command because it actually searches the filesystem every time it's used instead of consulting a database, but that also means that it doesn't depend on a database for its information and the information is always completely up-to-date.

After running her search for files containing log in the name, our sample user nermal was overwhelmed by the results. However, she thinks that she might have heard that general system log files might be located in /usr or /var. To check whether what she recalls is correct, she decides to run find:

 brezup:nermal Documents $ find /var /usr -name \*log\* -print
 /usr/bin/grep-changelog
 /usr/bin/logger
 /usr/bin/login
 /usr/bin/logname
 /usr/bin/rcs2log
 /usr/bin/rlog
 /usr/bin/rlogin
 /usr/bin/slogin
 /usr/bin/xmlcatalog
 /usr/include/httpd/http_log.h
 /usr/include/libxml2/libxml/catalog.h
 /usr/include/php/ext/standard/php_ext_syslog.h
 /usr/include/php/main/logos.h
 /usr/include/php/main/php_logos.h
 /usr/include/php/main/php_syslog.h
 ... there's a mess o' files in the middle here,
 ... none of which are system.log, trust us!
 /usr/share/vim/vim62/syntax/prolog.vim
 /usr/share/vim/vim62/syntax/purifylog.vim
 /usr/share/vim/vim62/syntax/rcslog.vim
 /usr/share/vim/vim62/syntax/verilog.vim
 /usr/share/zsh/4.1.1/functions/_logical_volumes
 /usr/share/zsh/4.1.1/functions/_rlogin
 /usr/X11R6/bin/xlogo
 /usr/X11R6/include/X11/bitmaps/xlogo11
 /usr/X11R6/include/X11/bitmaps/xlogo16
 /usr/X11R6/include/X11/bitmaps/xlogo32
 /usr/X11R6/include/X11/bitmaps/xlogo64
 /usr/X11R6/lib/X11/doc/html/xlogo.1.html
 /usr/X11R6/lib/X11/xedit/lisp/progmodes/xlog.lsp
 /usr/X11R6/man/man1/xlogo.1
 brezup:nermal Documents $

In the preceding command, nermal searches /var and /usr. The results, though, do not include the system.log file that nermal knows user joray was looking at earlier. According to these results, many files in /usr contain log, but nothing in /var does. This seems slightly odd: so many files (some of them not even really log files, but just containing log in their names) in the filesystem below one directory, but nothing in the other. nermal is sure that /var is the other possibility she has heard for a location for the file, so perhaps there's something about /var that's different, and the reason nothing at all shows up isn't that nothing's there, but rather that it isn't being searched. A closer look shows why:

 brezup:nermal Documents $ ls -l /var
 lrwxrwxr-t  1 root  admin  11 Sep 17 23:47 /var -> private/var

/var is actually a symbolic link to another directory. If you read find's man page, you'll discover that in its default behavior, find won't traverse symbolic links. Adding -H as one of the options for find causes it to return information on the referenced file (the target of the link) rather than on the link itself, for any link named on the command line:

 brezup:nermal Documents $ find -H /var -name \*log\* -print
 find: /var/backups: Permission denied
 find: /var/cron: Permission denied
 find: /var/db/dhcpclient: Permission denied
 find: /var/db/netinfo/local.nidb: Permission denied
 find: /var/db/openldap/openldap-data: Permission denied
 find: /var/db/openldap/openldap-slurp: Permission denied
 find: /var/db/shadow: Permission denied
 /var/log
 /var/log/cups/access_log
 /var/log/cups/error_log
 /var/log/ftp.log
 /var/log/httpd/access_log
 /var/log/httpd/error_log
 /var/log/install.log
 /var/log/ipfw.log
 /var/log/lastlog
 /var/log/lookupd.log
 /var/log/lpr.log
 /var/log/mail.log
 /var/log/netinfo.log
 /var/log/secure.log
 /var/log/system.log
 /var/log/windowserver.log
 find: /var/root: Permission denied
 find: /var/run/sudo: Permission denied
 /var/run/syslog
 /var/run/
 find: /var/spool/cups: Permission denied
 find: /var/spool/mqueue: Permission denied
 find: /var/spool/postfix/active: Permission denied
 find: /var/spool/postfix/bounce: Permission denied
 find: /var/spool/postfix/corrupt: Permission denied
 find: /var/spool/postfix/defer: Permission denied
 find: /var/spool/postfix/deferred: Permission denied
 find: /var/spool/postfix/flush: Permission denied
 find: /var/spool/postfix/hold: Permission denied
 find: /var/spool/postfix/incoming: Permission denied
 find: /var/spool/postfix/maildrop: Permission denied
 find: /var/spool/postfix/private: Permission denied
 find: /var/spool/postfix/public: Permission denied
 find: /var/vm/app_profile: Permission denied

There, in the middle of that output, is the system.log file, along with the other files under /var that have log in their names. As we see from the output, nermal does not have permission to search everywhere, but find responds with information for the areas where permissions allow it. nermal was lucky that her machine's logs include log in their names. That is not the case on all systems.
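The effect of -H is easy to reproduce safely in a scratch directory. The following is a small sketch rather than anything specific to /var. The directory names real and link and the finddemo prefix are made up for the illustration.

```shell
#!/bin/sh
# Build a throwaway tree: real/ holds a file, and link is a symbolic link to real.
demo=$(mktemp -d "${TMPDIR:-/tmp}/finddemo.XXXXXX")
mkdir "$demo/real"
touch "$demo/real/system.log"
ln -s real "$demo/link"

# Default behavior: the symbolic link named on the command line is not followed.
without_h=$(find "$demo/link" -name '*log*' -print)

# With -H, a symbolic link named on the command line is followed.
with_h=$(find -H "$demo/link" -name '*log*' -print)

echo "without -H: ${without_h:-(nothing found)}"
echo "with -H:    $with_h"

rm -rf "$demo"
```

Without -H the search stops at the link itself, which is the same silence nermal saw for /var.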


If you think seeing all the errors regarding places that find isn't allowed to look is a space-consuming waste of time, keep reading. Because the Unix philosophy breaks functionality down into little indivisible actions, filtering the errors out of the output isn't rightly find's job. The next command we discuss in this chapter, plus a little Unix wizardry called a pipe (from Chapters 12 and 15), will enable you to construct a version that doesn't report the errors.
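For the impatient, here is a hedged preview of two common ways to silence those messages. It runs against a scratch directory whose names are invented for the example. The first variant relies on shell redirection rather than a pipe, and the second uses the pipe-plus-grep approach hinted at above. Note that if you run it as root, no Permission denied message is produced in the first place.

```shell
#!/bin/sh
# Scratch tree with one directory that find is not allowed to enter.
demo=$(mktemp -d "${TMPDIR:-/tmp}/finddemo.XXXXXX")
mkdir "$demo/log" "$demo/private"
touch "$demo/log/system.log"
chmod 000 "$demo/private"

# Variant 1: send the error stream to /dev/null; matches still arrive on
# standard output. find exits non-zero after a permission error, so the
# "|| true" keeps a strict shell from stopping here.
matches=$(find "$demo" -name '*.log' -print 2>/dev/null) || true
echo "$matches"

# Variant 2: merge errors into the output, then let grep -v drop those lines.
filtered=$(find "$demo" -name '*.log' -print 2>&1 | grep -v 'Permission denied')
echo "$filtered"

# Restore permissions so that the scratch tree can be removed.
chmod 700 "$demo/private"
rm -rf "$demo"
```

The redirection is the simpler of the two, because it cannot accidentally discard a real filename that happens to contain the words Permission denied.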

Numerous options are available in find. In addition to being able to search on a pattern, find can also run searches based on ownership, file modification times, file access times, and much more. Table 10.18 shows the complete syntax and some useful options for find.

Table 10.18. The Syntax and Primary Options for find


Finds files.

find [-H | -L | -P] [-EXdsx] [-f <file>] <file> ... <expression>

find recursively descends the directory tree for each <file> listed, evaluating an <expression> composed of primaries and operands.

-E

Causes find to interpret regular expression patterns specified with -regex or -iregex as extended (modern) regular expressions, rather than as basic regular expressions (BREs). See the re_format(7) manual page for a description of each format.

-P

Causes the file information and file type returned for each symbolic link to be those of the link itself. This is the default.

-d

Causes a depth-first traversal of the hierarchy. In other words, directory contents are visited before the directory itself. The default is for a directory to be visited before its contents.

-x

Prevents find from traversing directories that have a device number different from that of the file from which the descent began.

-f <file>

Specifies a file hierarchy for find to traverse. File hierarchies may also be specified as operands immediately following the options listing.

Primaries (in Expressions)

All primaries that can take a numeric argument allow the number to be preceded by +, -, or nothing. n takes on the following meanings:

+n

More than n

-n

Less than n

n

Exactly n

-atime n

True if the file was last accessed n days ago. Note that find itself changes the access time.

-ctime n

True if the file's status was changed n days ago.

-mtime n

True if the file was last modified n days ago.

-newerXY <file>

True if the current file has a more recent last access time (X=a), change time (X=c), or modification time (X=m) than the last access time (Y=a), change time (Y=c), or modification time (Y=m) of <file>. In addition, if Y=t, then <file> is instead interpreted as a direct date specification of the form understood by cvs(1).

-name <pattern>

True if the file or directory name matches <pattern>. The pattern may include standard shell globbing wildcards such as *, but the shell gets to expand these characters before find gets hold of them. This means that if you run find / -name *.log, the search will actually be for all files beneath the root directory that have the same name as files in your current directory that end in .log. If you want to find all files that end in .log, not just ones that also occur in your current directory, you need to escape the wildcard so that it isn't interpreted by the shell, and so that it gets passed on for find to work with when it searches. This can be accomplished as shown in the earlier examples by preceding characters that the shell would expand with a \ character (that is, -name \*.log would search for all files ending in .log).

It is also possible in recent versions of the bash shell and find to use double quotes around a wildcarded string to keep it from expanding in the shell (that is, -name "*.log" should be equivalent to -name \*.log).

-iname <pattern>

True if the filename or directory name matches <pattern> in a case-insensitive way.

-exec <command> [<argument> ...] ;

True if <command> returns a zero-value exit status. Optional arguments may be passed to <command>. The expression must be terminated by a semicolon, which itself must be escaped or quoted so that the shell passes it on to find. If {} appears anywhere in the command name or arguments, it is replaced by the current pathname.


-fstype <type>

True if the file is contained in a filesystem of type <type>.
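Several of the primaries in the table can be exercised together in a scratch directory. The sketch below is only an illustration: the filenames are invented, and -type f (which restricts matches to regular files) is a standard find primary that the table above does not list.

```shell
#!/bin/sh
# Scratch tree: two .log files at different depths and one upper-case name.
demo=$(mktemp -d "${TMPDIR:-/tmp}/finddemo.XXXXXX")
mkdir "$demo/sub"
touch "$demo/a.log" "$demo/sub/b.log" "$demo/sub/NOTES.TXT"

# Quoted wildcard: find, not the shell, interprets the pattern.
logs=$(find "$demo" -name '*.log' -print | sort)

# Unquoted wildcard: the shell first expands *.log to a.log, the only match
# in the current directory, so find never looks for b.log.
unquoted=$(cd "$demo" && find . -name *.log -print)

# -iname ignores case, so notes.txt matches NOTES.TXT.
notes=$(find "$demo" -iname 'notes.txt' -print)

# -mtime -1 selects files modified less than one day ago. -exec runs
# basename once per match; {} is the pathname and \; ends the command.
recent_count=$(find "$demo" -type f -mtime -1 -exec basename {} \; | wc -l | tr -d ' ')

echo "quoted:   $logs"
echo "unquoted: $unquoted"
echo "iname:    $notes"
echo "recent:   $recent_count file(s)"

rm -rf "$demo"
```

Comparing the quoted and unquoted results shows the pitfall described under -name: the unquoted search quietly returns fewer files instead of reporting an error.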

Using mdfind

The mdfind command searches a continuously updated database of "metadata" about files. This metadata database is the same one used by the Spotlight search facility in the Finder, so the types of things that you can search for and find with it are the same as those that can be found through the GUI. These include filename searches, file content searches on files of known (and indexable) types, and searches on other associated metadata (data about data) that Apple chooses to index.

The strength of mdfind is that the database is connected directly to the file access system at a low level, so as files are created or edited, the appropriate metadata changes are automatically and instantly inserted into the database. It therefore suffers from neither the long delays between index updates that affect locate, nor the long search times of find. The downside, however, is that if a file is not indexed for some reason (such as the underlying automated indexing crashing invisibly, a situation that is all too common with the latest developer release we have available, though it may also occur with removable media used on non-Tiger machines), it will never be indexed, because the system assumes that all files are indexed at creation. Presuming that Apple gets this situation straightened out, however, mdfind is both faster than find and more complete than locate.

The syntax for mdfind is simply:

 mdfind <what you want to find> 

<what you want to find> may be a filename, file contents, or more specifically constrained file metadata. If you'd like to constrain the search to look only in a particular directory (which with find is important for speed, but with mdfind is mostly useful for paring down the returned results), you can add the -onlyin <directory> option. System administrators may find some interesting uses for the -live option, which causes mdfind to keep running and display continuously updated results for the search. Keeping track of the number of files that match common metadata features, such as images or .mov files that live on your system, can help you monitor and respond to the things your users are doing to your system.
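As a rough sketch of how these options fit together, the script below searches one directory and falls back gracefully on machines without Spotlight. The query string and the choice of the home directory are placeholders picked for the example, and the output depends entirely on what Spotlight has indexed on your machine.

```shell
#!/bin/sh
# Placeholder values for illustration: change the query and directory to taste.
query="system.log"
search_dir="${HOME:-/tmp}"

if command -v mdfind >/dev/null 2>&1; then
    # -onlyin restricts the results to files beneath one directory.
    mdfind -onlyin "$search_dir" "$query"
    status="searched"
else
    # mdfind is available only where Spotlight is (Mac OS X 10.4 and later).
    echo "mdfind is not available on this system" >&2
    status="skipped"
fi

# For continuously updated results, add -live and interrupt with Control-C:
#   mdfind -live -onlyin "$search_dir" "$query"
```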

Finding Files with Specific Contents: grep

Trying to remember what you've named a file that you need can sometimes be a real chore, especially if you haven't used the file for a long time or its name is similar to many other files on your system. For situations such as these, it is useful to be able to search for files based on patterns contained within the contents of the files themselves, rather than just the filenames. The grep command does exactly that. The basic syntax for grep is

 grep <pattern> <files> 

Here is a sample of using grep:

 brezup:joray Documents $ grep me file*
 grep: file1: Permission denied
 file2:It's me.  Doing some
 file3:Yep, me again..
 file4:me again
 file5:Another test by me...

In the preceding output, we see that grep provides results as permissions permit. We also see that the default output lists only the filename and the lines containing the searched pattern. A number of options are available in grep (enough that entire books have been written on the subject). For example, we could ask grep to list the line numbers on which our pattern, me, appears in the files:

 brezup:joray Documents $ grep -n me file*
 grep: file1: Permission denied
 file2:2:It's me.  Doing some
 file3:2:Yep, me again..
 file4:6:me again
 file5:1:Another test by me...

Another available option is the recursive option, -r, which descends a directory tree and searches the contents of every file in it.

The grep command is even more powerful than might be immediately apparent because it is also useful for searching for patterns in the output of other commands. It could, for example, have been used to filter the rather verbose output from the preceding finds to print out only the specific lines containing exact matches to the filename of interest. Although we haven't gotten to the syntax of the more complex matter of chaining Unix commands together to make sophisticated commands, keep grep in mind as a building block, and consider its possible uses when you reach the end of Chapter 15.
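As a small taste of that building-block role, the following sketch pipes the output of find into grep and then runs a recursive search with line numbers. It works on a scratch directory. The file names echo the earlier /var/log listing, but the file contents are invented for the example.

```shell
#!/bin/sh
# Scratch tree that loosely imitates part of /var/log.
demo=$(mktemp -d "${TMPDIR:-/tmp}/grepdemo.XXXXXX")
mkdir -p "$demo/log/cups"
printf 'boot\nerror: disk full\n' > "$demo/log/system.log"
printf 'all quiet\n' > "$demo/log/cups/error_log"

# find prints every name containing log. grep keeps only the pathname that
# ends in exactly /system.log (\. is a literal dot, $ anchors the line end).
exact=$(find "$demo" -name '*log*' -print | grep '/system\.log$')
echo "$exact"

# -r descends the directory tree and -n prefixes each match with its line number.
hits=$(grep -rn 'error' "$demo/log")
echo "$hits"

rm -rf "$demo"
```

Note that the recursive search reports only system.log. The file named error_log is not listed, because grep looks at file contents, not filenames.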

Table 10.19 shows the syntax and primary options for grep.

Table 10.19. The Syntax and Primary Options for grep


Prints lines matching a pattern.



grep [options] <pattern> <file1> <file2> ...

grep [options] [-e <pattern> | -f <file>] <file1> <file2> ...

grep searches the list of files enumerated by <file1> <file2> ..., or standard input if no file is specified or if - is specified. By default, the matching lines are printed.

Two additional variants of the program are available as egrep (same as grep -E) or fgrep (same as grep -F).

-C <num>

Prints <num> lines of output context. Default is 2.




--binary-files=<type>

Assumes that a file is type <type> if the first few bytes of a file contain binary data.

Default <type> is binary, and grep normally outputs a one-line message indicating the file is binary, or nothing if there is no match. If <type> is without-match, it is assumed that a binary file does not match. Equivalent to -I option. If <type> is text, it processes the file as though it were a text file. Equivalent to -a option. Warning: Using this option could result in binary garbage being output to a terminal, some of which could be interpreted by the terminal as commands, resulting in unwanted side effects.

-c

Prints a count of matching lines for each file. Combined with -v, counts nonmatching lines.

-v

Inverts matching to select nonmatching lines.

-r

Recursively reads files under directories. Equivalent to -d recurse option.

-f <file>

Reads a list of patterns from <file>, which contains one pattern per line. An empty file has no patterns and matches nothing.

-e <pattern>

Uses <pattern> as the pattern. Useful for protecting patterns beginning with -.

-G

Interprets <pattern> as a basic regular expression. This is the default behavior.

-E

Interprets <pattern> as an extended regular expression. Equivalent to egrep.

-F

Interprets <pattern> as a list of fixed strings, separated by newlines, any of which is to be matched. Equivalent to fgrep.

-i

Ignores case in <pattern> and input files.

-n

Output includes the line number where the match occurs.

-s

Suppresses error messages about nonexistent or unreadable files.

-w

Selects only lines that have matches that form whole words.

-x

Selects only those matches that exactly match the whole line.
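To see how a few of these options change what counts as a match, here is a short sketch that runs them against a four-line scratch file. The file contents are made up for the example, loosely echoing joray's test files.

```shell
#!/bin/sh
# Four sample lines: two contain the word "me", one contains "Me", and one
# contains "me" only as part of the word "some".
sample=$(mktemp "${TMPDIR:-/tmp}/grepdemo.XXXXXX")
printf '%s\n' "It's me." 'Yep, Me again' 'some text' 'me' > "$sample"

plain=$(grep -c 'me' "$sample")        # lines containing me anywhere
nocase=$(grep -ci 'me' "$sample")      # -i: Me counts as well
words=$(grep -cw 'me' "$sample")       # -w: me must be a whole word
whole=$(grep -cx 'me' "$sample")       # -x: the entire line must be me
inverted=$(grep -cv 'me' "$sample")    # -v: lines that do not contain me

echo "plain=$plain nocase=$nocase words=$words whole=$whole inverted=$inverted"

rm -f "$sample"
```

Because -c is combined with each option, every command prints a single count instead of the matching lines themselves.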



    Mac OS X Tiger Unleashed
    ISBN: 0672327465
    EAN: 2147483647
    Year: 2005
    Pages: 251