19 Locating Files by Filename

One command that's quite useful on Linux systems, but isn't always present on other Unixes, is locate, which searches a prebuilt database of filenames for the specified pattern. Ever want to quickly find the location of the master .cshrc file? Here's how that's done with locate:

    $ locate .cshrc
    /.Trashes/501/Previous Systems/private/etc/csh.cshrc
    /OS9 Snapshot/Staging Archive/:home/taylor/.cshrc
    /private/etc/csh.cshrc
    /Users/taylor/.cshrc
    /Volumes/110GB/WEBSITES/staging.intuitive.com/home/mdella/.cshrc

You can see that the master .cshrc file is in the /private/etc directory on this Mac OS X system. The locate system sees every file on the disk when building its internal file index, whether the file is in the trash queue, is on a separate volume, or is even a hidden dot file. This is a plus and a minus, as I will discuss shortly.

This method of finding files is simple to implement and comes in two parts. The first part builds the database of all filenames by invoking find, and the second is a simple grep of the new database.

The Code

    #!/bin/sh

    # mklocatedb - Builds the locate database using find. Must be root
    #   to run this script.

    locatedb="/var/locate.db"

    if [ "$(whoami)" != "root" ] ; then
      echo "Must be root to run this command." >&2
      exit 1
    fi

    find / -print > $locatedb

    exit 0

The second script is even shorter:

    #!/bin/sh

    # locate - Searches the locate database for the specified pattern.

    locatedb="/var/locate.db"

    exec grep -i "$@" $locatedb

How It Works

The mklocatedb script must be run as the root user, something easily checked with a call to whoami, to ensure that it can see all the files in the entire system. Indexing the file system as root, however, creates a security problem: if a directory is closed to a specific user's access, the locate database shouldn't reveal any information about the directory or its contents to that user either, yet this database lists every filename for anyone who can read it. This issue will be addressed in the next chapter with a new secure locate script that takes privacy and security into account. For now, however, this script exactly emulates the behavior of the locate command in standard Linux, Mac OS X, and other distributions.

Don't be surprised if mklocatedb takes a few minutes or longer to run; it's traversing the entire file system, which can take a while on even a medium-sized system. The results can be quite large too. On my Mac OS X reference system, the locate.db file has over 380,000 entries and eats up 18.3MB of disk space. Once the database is built, the locate script itself is a breeze to write, as it's just a call to the grep command with whatever arguments are specified by the user.
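The two-part design can be tried out safely on a small scale. What follows is a minimal sketch, not the author's script: it builds a throwaway directory tree with mktemp, indexes it with find, and searches the index with grep -i, so no root access is needed. The sample files and the variable names demo_root, demo_db, and matches are made up for illustration.

```shell
#!/bin/sh
# Sketch only: the same two steps as mklocatedb and locate, confined to a
# scratch directory. All paths and names here are hypothetical.

demo_root=$(mktemp -d "${TMPDIR:-/tmp}/locdemo.XXXXXX")   # stands in for /
demo_db=$(mktemp "${TMPDIR:-/tmp}/locdb.XXXXXX")          # stands in for /var/locate.db

mkdir -p "$demo_root/home/taylor" "$demo_root/etc"
touch "$demo_root/home/taylor/.cshrc" \
      "$demo_root/etc/csh.cshrc" \
      "$demo_root/etc/passwd"

# Part 1: build the database, one pathname per line.
find "$demo_root" -print > "$demo_db"

# Part 2: search it case-insensitively, as the locate script does.
matches=$(grep -i 'cshrc' "$demo_db")
echo "$matches"

# Clean up the scratch files.
rm -rf "$demo_root" "$demo_db"
```

Only the two cshrc files should be listed; the passwd entry and the directory names are in the database but don't match the pattern.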

Running the Script

To run the locate script, it's first necessary to run the mklocatedb script. Once that's done (and it can take a while to complete), locate invocations will list every file on the system whose name matches the specified pattern.

The Results

The mklocatedb script takes no arguments and produces no output:

    $ sudo mklocatedb
    Password:
    $

You can see how large the database file is with a quick ls:

    $ ls -l /var/locate.db
    -rw-r--r--  1 root  wheel  42384678 Mar 26 10:02 /var/locate.db

To find files on the system now, use locate:

    $ locate -i gammon
    /OS9/Applications (Mac OS 9)/Palm/Users/Dave Taylor/Backups/Backgammon.prc
    /Users/taylor/Documents/Palm/Users/Dave Taylor/Backups/Backgammon.prc
    /Users/taylor/Library/Preferences/Dave's Backgammon Preferences
    /Volumes/110GB/Documents/Palm/Users/Dave Taylor/Backups/Backgammon.prc

This script also lets you ascertain other interesting statistics about your system, such as how many C source files you have:

    $ locate '.c' | wc -l
    381666

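That's quite a few! Bear in mind, though, that the count is inflated: grep treats the pattern as a regular expression, so '.c' matches any character followed by a c, not just a .c suffix, which is why the result is close to the size of the whole database. A pattern such as '\.c$' counts only names that end in .c. With a bit more work, I could feed each one of these C source files to the wc command to ascertain the total number of lines of C code on the box, but, um, that would be kinda daft, wouldn't it?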
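Because locate hands its pattern straight to grep, the difference between a loose and an anchored pattern is easy to see on a small example. This is a sketch under stated assumptions: the five sample paths and the names sample_db, loose, and strict are invented for illustration, and the anchored pattern '\.c$' is a suggested alternative, not something the original script uses.

```shell
#!/bin/sh
# Sketch: compare the loose pattern '.c' with the anchored '\.c$' against a
# small hand-made database. The sample paths are hypothetical.

sample_db=$(mktemp "${TMPDIR:-/tmp}/sampledb.XXXXXX")
cat > "$sample_db" <<'EOF'
/usr/src/hello.c
/usr/src/hello.h
/Users/taylor/.cshrc
/private/etc/csh.cshrc
/usr/include/stdio.h
EOF

# grep -c prints the number of matching lines instead of the lines themselves.
loose=$(grep -ic '.c' "$sample_db")      # any character followed by c or C
strict=$(grep -c '\.c$' "$sample_db")    # names ending in a literal ".c"

echo "loose: $loose  strict: $strict"

rm -f "$sample_db"
```

With this sample, the loose pattern matches all five lines (even /usr/include/stdio.h, through the "nc" in "include"), while the anchored pattern matches only /usr/src/hello.c.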

Hacking the Script

To keep the database reasonably current, it'd be easy to schedule an invocation of mklocatedb to run from cron in the wee hours of the night, or even more frequently based on local usage patterns. As with any script executed by the root user, care must be taken to ensure that the script itself isn't editable by nonroot users.
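One way to act on that warning is to check the script's permissions before trusting it. The following is only a sketch: the function name writable_by_others and the install path /usr/local/bin/mklocatedb are assumptions, the check looks only at the group and other write bits (it does not verify that root owns the file), and the crontab line in the final comment is an example schedule, not one taken from the original text.

```shell
#!/bin/sh
# Sketch: warn about a script that group or other users can modify.
# The function name and the install path below are hypothetical.

writable_by_others() {
  # Characters 6 and 9 of the ls -l mode string (for example -rwxr-xr-x)
  # are the group and other write bits.
  mode=$(ls -ld "$1" | cut -c1-10)
  case "$mode" in
    ?????w????|????????w?) return 0 ;;
  esac
  return 1
}

script="/usr/local/bin/mklocatedb"
if [ -e "$script" ] && writable_by_others "$script" ; then
  echo "Warning: $script is writable by nonroot users." >&2
fi

# A matching entry in root's crontab, rebuilding the database at 3:15
# each night, might look like this:
#   15 3 * * * /usr/local/bin/mklocatedb
```

If the warning appears, something like chmod 755 on the script, run as root, would remove the extra write permissions.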

The most obvious potential improvement to this script would cause locate to check its arguments and fail with a meaningful error message if no pattern is specified; as it's written now, it'll spit out a grep command error instead, which isn't that great. More importantly, as I discussed earlier, there's a significant security issue surrounding letting users have access to a listing of all filenames on the system, even those they wouldn't ordinarily be able to see. A security improvement to this script is addressed in Script #43, Implementing a Secure Locate.
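The argument check described above might look something like the following. Treat it as a sketch, not the author's revision: the function name locate_files, the wording of the messages, and the extra test for a missing database are all assumptions added for illustration.

```shell
#!/bin/sh
# locate - Sketch of the suggested argument check. The function name, the
#   messages, and the database-readability test are hypothetical additions.

locatedb="/var/locate.db"

locate_files() {
  # Fail with a usage message instead of a raw grep error.
  if [ $# -eq 0 ] ; then
    echo "Usage: locate pattern" >&2
    return 1
  fi

  # Point the user at mklocatedb if the database is missing or unreadable.
  if [ ! -r "$locatedb" ] ; then
    echo "locate: cannot read $locatedb; run mklocatedb first." >&2
    return 1
  fi

  grep -i "$@" "$locatedb"
}

# In the finished script, the last line would be:  locate_files "$@"
```

Wrapping the logic in a function keeps the check and the search together; the cost is losing the exec of the original one-liner, which matters little for a script this small.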


There are newer versions of the locate command that take security into consideration. These alternatives are available as part of the latest Red Hat Linux distribution, and as part of a new secure locate package called slocate, available for download from http://rpms.arvin.dk/slocate/.

Wicked Cool Shell Scripts. 101 Scripts for Linux, Mac OS X, and Unix Systems
ISBN: 1593270127
Year: 2004
Pages: 150
Authors: Dave Taylor
