Searching for Files

While grep and egrep are great for searching inside files, they don't look at the names of files or other information about the files, such as size or modification date. For that, Mac OS X provides other tools.

Spotlight and mdfind

In version 10.4 of Mac OS X, a powerful new search feature called Spotlight was added.

Spotlight searches two data stores that Mac OS X keeps up to date: the metadata store and the content index. Every time a file is saved, Mac OS X updates both, provided the file is recognized by one of the metadata importers installed on your machine. Mac OS X comes with numerous metadata importers that recognize files created with all the common Apple applications, such as iTunes, iChat, iCal, Pages, and Keynote, as well as most of the common file formats used by Microsoft Office documents, image formats, PDF files, and so on. Apple strongly encourages developers of applications that use new file formats to create metadata importers. (For more on Spotlight, check out the Apple Web site at http://developer.apple.com/macosx/spotlight.html.)

The mdfind command allows you to search the same data sources that Spotlight uses, but from the command line. However, mdfind will only return the names of files that the user has permission to see. (See Chapter 8 for more on Unix permissions.)

The metadata stored by Mac OS X includes obvious things like the filename, date and time it was last changed, and so forth, but it can also include a large number of other metadata attributes, such as keywords, copyright information, and number of pages. Table 4.3 is a partial list of metadata attributes. Apple's list of metadata attributes is available at http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/. The mdls command can be used to display the metadata attributes for a file.

Table 4.3. Some Metadata Attributes

Some attributes may be set from the command line on files converted to another format using the textutil command; the descriptions below name the relevant textutil option. For a longer list of attributes, see http://developer.apple.com/documentation/Carbon/Reference/MetadataAttributesRef/.

File-System Metadata Attribute Keys

Metadata attribute keys describe the file-system attributes for a file. These attributes are available for files on any mounted volume.

Attributes Common to Many File Types

 

ATTRIBUTE

DESCRIPTION

kMDItemFSCreationDate

The date when the contents of the file were created. This is different from the file-creation date. It can be used to store information on when the file contents were first created or first modified.

kMDItemAuthors

The author, or authors, of the contents of the file. Can be set with textutil using the -author option.

kMDItemContentModificationDate

Date and time when the content of this item was modified.

kMDItemCreator

Application used to create the document content (for example, Pages or Keynote).

kMDItemKeywords

Keywords associated with this file. For example, Birthday or Important. Can be set with textutil using the -keywords option.

kMDItemOrganizations

Companies or organizations that created the document. Can be set with textutil using the -company option.

kMDItemPageHeight

Height of the document page, in points (72 points per inch). For PDF files this indicates the height of the first page only.

kMDItemPageWidth

Width of the document page, in points (72 points per inch). For PDF files this indicates the width of the first page only.

kMDItemSubject

Subject of the item. Can be set with textutil using the -subject option.

kMDItemTextContent

Contains a text representation of the content of the document. Applications can search for values in this attribute but are not able to read the content of this attribute directly.

kMDItemTitle

The title of the item. For example, this could be the title of a document, the name of a song, or the subject of an e-mail message. Can be set with textutil using the -title option.

Image Metadata Attribute Keys

 

ATTRIBUTE

DESCRIPTION

kMDItemAlbum

Title for a collection of media. This is analogous to a record album or photo album.

kMDItemBitsPerSample

The number of bits per sample. For example, the bit depth of an image (8-bit, 16-bit, and so on).

kMDItemColorSpace

The colorspace model used by the document contents. For example, RGB, CMYK, YUV, or YCbCr.

kMDItemOrientation

The orientation of the document contents. Expected values: 0 (landscape), 1 (portrait).

Video Metadata Attribute Keys

 

ATTRIBUTE

DESCRIPTION

kMDItemCodecs

The codecs used to encode/decode the media.

kMDItemMediaTypes

The media types present in the content.

kMDItemTotalBitRate

The total bit rate, audio and video combined, of the media.

Audio Metadata Attribute Keys

 

ATTRIBUTE

DESCRIPTION

kMDItemAudioSampleRate

Sample rate of the audio data contained in the file. The sample rate is a float value representing Hz (audio frames per second). For example: 44100.0, 22254.54.

kMDItemKeySignature

Musical key of the song in the audio file. For example: C, Dm, F#m, Bb.


The mdfind and mdls commands are available only in Mac OS X/Darwin.

To search for files using mdfind:

  • mdfind query

    mdfind produces a list of files that match the query. The query is a string that is matched (without regard to upper/lowercase) against the central metadata store. In the simplest case, the query can be just a single word. For example,

    mdfind susie

    would produce a list of files whose metadata includes the string susie. The match could be in the file's name, in the content, in the kMDItemKeywords attribute, in the kMDItemOrganizations attribute, and so on.

Tips

  • Documentation for the query language used by mdfind is available at http://developer.apple.com/documentation/Carbon/Conceptual/SpotlightQuery/. Table 4.4 shows some examples of queries.

  • You can list the metadata attributes on a file with the mdls command:

    mdls file

  • To have mdfind ignore permissions on files it finds, do the search using the sudo command (covered in Chapter 11, "Introduction to System Administration"):

    sudo mdfind query

  • You can use the -live option to mdfind to have it continuously and instantly update its output to show the number of files that match the query. As files are added or removed, the count will change.

  • Consider using the -0 option to pipe the output of mdfind through the xargs command. For example, to back up every file that mentions peaches, you would use the following:

    mdfind -0 peaches | xargs -0 -J % cp % /Volumes/Backups/

    See man xargs for more; it is a very powerful command.

  • You may use the -onlyin option to limit mdfind results to files in a particular directory.

  • As of Mac OS X 10.4, there is only one command-line tool for setting or changing metadata attribute values. The textutil command is primarily designed for converting files from one format to another (for example, from Microsoft Word to HTML) but also allows setting a few metadata attributes (like kMDItemAuthors) on the converted files. See man textutil for more information.


Table 4.4. Complex Queries Using mdfind

Find files that mention jack, but only in your Documents folder:

mdfind -onlyin ~/Documents jack

Find all HTML docs that mention refactoring in their content:

mdfind "(kMDItemKind == 'HTML Document'c) && (kMDItemTextContent == 'refactoring'wc)"

Find all PDFs that mention dashboard:

mdfind "(kMDItemKind == 'PDF document'c) && (kMDItemTextContent == 'dashboard'wc)"

Find PDF documents where the height (of the first page) is less than or equal to 842 points:

mdfind "(kMDItemKind == 'PDF document'c) && (kMDItemPageHeight <= 842)"

Find files that use the Palatino and/or Arial fonts:

mdfind "(kMDItemFonts == Palatino) || (kMDItemFonts == Arial)"
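
Queries like those in Table 4.4 are ordinary strings, so a script can assemble one from shell variables before handing it to mdfind. The following is a minimal sketch, not a definitive recipe: the variable names (word, kind, query) and the choice of ~/Documents are assumptions made for illustration, and the script only prints the query on systems where mdfind is not installed.

```shell
#!/bin/sh
# Hypothetical search terms; change these to suit your own files.
word="dashboard"
kind="PDF document"

# Build the query string. Single quotes delimit the values inside the
# query; the trailing c and wc modifiers are copied from Table 4.4.
query="(kMDItemKind == '${kind}'c) && (kMDItemTextContent == '${word}'wc)"

if command -v mdfind >/dev/null 2>&1; then
    # Limit the search to the Documents folder with -onlyin.
    mdfind -onlyin "$HOME/Documents" "$query"
else
    # mdfind exists only on Mac OS X/Darwin.
    echo "mdfind not available; the query would be: $query"
fi
```

Keeping the whole query inside double quotes on the mdfind command line matters: it stops the shell from interpreting the parentheses and the && itself.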


locate

The locate command searches a database of files for filenames that match its argument. The locate database contains the names of almost all the files on the system (it omits some files for security reasons). The locate command is available on most versions of Unix, unlike mdfind, which is (currently) available only in Mac OS X/Darwin. Also, since locate searches only filenames, it will sometimes be easier to use than trying to restrict a different search tool to search only filenames.

This means that locate can perform a very fast search of practically the whole system, but it only finds files that existed as of the last database update. The database is rebuilt weekly (Sundays at 4:30 a.m. on Mac OS X). See the "Running Regularly Scheduled Commands" section in Chapter 11 to learn how to change that.

If you are looking for a file you believe has been around since before the last Sunday update, locate is a good way to look for it.

To use locate to search for a file:

  • locate string

    locate produces a list of file paths that include the string. Note that it is case sensitive:

    locate security

    and

    locate Security

    produce different results, as shown in Figure 4.31 (partial results shown).

    Figure 4.31. Comparing results for Security and security when using locate. (Your output will differ, but you get the idea.)
     localhost:~ vanilla$ locate security
     ...
     /Library/Documentation/Services/apache/misc/security_tips.html
     /System/Library/Frameworks/JavaVM.framework/Versions/1.3.1/Home/lib/security
     /System/Library/Frameworks/JavaVM.framework/Versions/1.3.1/Home/lib/security/cacerts
     /System/Library/Frameworks/JavaVM.framework/Versions/1.3.1/Home/lib/security/java.policy
     /System/Library/Frameworks/JavaVM.framework/Versions/1.3.1/Home/lib/security/java.security
     ...
     localhost:~ vanilla$ locate Security
     ...
     /Library/Receipts/SecurityUpdate10-19-01.pkg
     /Library/Receipts/SecurityUpdate10-19-01.pkg/Contents
     /Library/Receipts/SecurityUpdate10-19-01.pkg/Contents/Resources
     /Library/Receipts/SecurityUpdate10-19-01.pkg/Contents/Resources/BundleVersions.plist
     /Library/Receipts/SecurityUpdate10-19-01.pkg/Contents/Resources/da.lproj
     ...
     localhost:~ vanilla$

Tips

  • locate tends to produce voluminous output, so consider piping its output through grep to filter it or through less to see it one screen at a time. For example:

    locate security | grep Library

    or

    locate security | less

  • Want to count how many files locate located? Pipe it through wc -l:

    locate security | wc -l


find

While locate is fast and simple, find is more flexible, allowing you not only to search for patterns in filenames, but also to specify multiple criteria. You can search by type of file (directory versus plain file), modification date, size, and many more. See the Unix man page (man find) for a complete list and several good examples.

The first argument to find is always a directory name, which tells find where to start looking (it can be . for the current directory, / for the root directory, or any other path). You then specify options to tell find which files will match, and finally what to do with each matching filename. The default is to send each matching filename to stdout, but you can do other things, such as execute a command using each found filename as an argument.

To search for files based on name:

  • find dirname -name "pattern"

    For example,

    find ~ -name "Picture*"

    would show all the files in your home directory whose names begin with Picture (Figure 4.32). (Your shell interprets the ~ (tilde) character as "my home directory." See Chapter 5.) The quotes around Picture* are important. Without them the shell would interpret the * instead of passing it to find. We want the find command to get Picture* as the value for the -name option.

    Figure 4.32. Finding all the files in your home directory whose names start with Picture.
     localhost:~ vanilla$ find ~ -name "Picture*"
     /Users/vanilla/Documents/Picture of Susan
     /Users/vanilla/Pictures
     /Users/vanilla/Pictures/Picture 1.jpg
     /Users/vanilla/Pictures/Picture 2.jpg
     /Users/vanilla/Pictures/Picture 3.jpg
     /Users/vanilla/Pictures/Picture 4.jpg
     localhost:~ vanilla$

To search for files based on type:

  • Use the -type option to select the file type you want, such as directories.

    find ~/Documents -type d

    finds all the directories inside your Documents directory (Figure 4.33).

    Figure 4.33. Using the -type d option to find only directories.
     localhost:~ vanilla$ find ~/Documents -type d
     /Users/vanilla/Documents
     /Users/vanilla/Documents/Contracts
     /Users/vanilla/Documents/Misc.
     /Users/vanilla/Documents/PartyPlans
     localhost:~ vanilla$

    To find only regular files, use -type f. See man find for a complete list of available types.

Tip

  • Combine the -type option with the -name option to find only files that match both name and type.

    find ~ -type d -name "Picture*"

    looks in your home directory (~) and finds only directories whose names begin with the string Picture.
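
If you would like to try -name and -type without touching your real files, the following sketch builds a small scratch directory and runs the three variations against it. The directory layout and filenames are made up for illustration (they loosely mirror Figure 4.32), and mktemp is assumed to be available.

```shell
#!/bin/sh
# Build a throwaway directory tree (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/Pictures" "$demo/Documents"
touch "$demo/Pictures/Picture 1.jpg" "$demo/Documents/Picture of Susan"

# Any kind of file whose name begins with Picture.
all_matches=$(find "$demo" -name "Picture*" | wc -l | tr -d ' ')

# Directories only.
dir_matches=$(find "$demo" -type d -name "Picture*")

# Regular files only.
file_matches=$(find "$demo" -type f -name "Picture*" | wc -l | tr -d ' ')

echo "all matches:  $all_matches"
echo "directories:  $dir_matches"
echo "regular files: $file_matches"

# Clean up the scratch directory.
rm -rf "$demo"
```

The tr -d ' ' step only strips the padding that some versions of wc put in front of the count.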


A very cool feature of find is its ability to find every file that has been modified after some specific reference file, such as one with a particular date. One use of this feature would be in scripts that perform backups.

To find every file modified after a reference file:

  • find dirname -newer filename

    For example,

    find . -newer "Figure 4.27.doc"

    searches the current directory and all the subdirectories contained inside it for files that have been modified more recently than the file Figure 4.27.doc.

    Figure 4.34 shows what the output might look like. Notice how the current directory (.) showed up in the list. This is because a directory is considered "modified" whenever a file is added to or removed from it.

    Figure 4.34. Searching the current directory for files modified after a reference file.
     localhost:~ vanilla$ find . -newer "Figure 4.27.doc"
     .
     ./Chapter 4.doc
     ./Figure 4.28.doc
     ./Figure 4.29.doc
     ./Figure 4.30.doc
     ./Figure 4.31.doc
     ./Figure 4.32.doc
     ./Word Work File D 3702
     ./Word Work File D 4
     localhost:~ vanilla$
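
Here is a small sketch of the same idea that you can run safely in a scratch directory. The filenames and dates are invented for illustration; touch -t is used to give two of the files modification times in the past so that the comparison is predictable. Adding -type f is one way to keep the directory itself out of the results.

```shell
#!/bin/sh
# Scratch directory with files of known ages (hypothetical names).
demo=$(mktemp -d)
touch -t 202001010000 "$demo/old notes.txt"   # modified 1 Jan 2020
touch -t 202101010000 "$demo/reference.txt"   # modified 1 Jan 2021
touch "$demo/new notes.txt"                   # modified just now

# Everything newer than the reference file, directories included.
newer_all=$(find "$demo" -newer "$demo/reference.txt" | wc -l | tr -d ' ')

# Regular files only, so the directory itself is not listed.
newer_files=$(find "$demo" -type f -newer "$demo/reference.txt")

echo "entries newer than reference.txt: $newer_all"
echo "files newer than reference.txt:   $newer_files"

rm -rf "$demo"
```

A backup script could use the same pattern by touching a marker file after each run and then asking find for everything newer than that marker.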

Tip

  • find can also find files that have (or have not) been modified, accessed, or created in the past n 24-hour periods, where n is any whole number. For example, this will find files that have been modified less than one day ago:

    find . -mtime -1

    and this will find files modified more than two days ago:

    find . -mtime +2
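
The sketch below tries the same option with larger values of n on a scratch directory. The filenames, the 2020 date, and the choice of 7 and 30 days are assumptions made for illustration; touch -t sets an old modification time so that the "stale" search has something to find.

```shell
#!/bin/sh
# Scratch directory with one fresh file and one old file (hypothetical names).
demo=$(mktemp -d)
touch "$demo/today.txt"                    # modified just now
touch -t 202001010000 "$demo/old.txt"      # modified 1 Jan 2020

# Files modified within the past 7 24-hour periods.
recent=$(find "$demo" -type f -mtime -7)

# Files last modified more than 30 24-hour periods ago.
stale=$(find "$demo" -type f -mtime +30)

echo "recent: $recent"
echo "stale:  $stale"

rm -rf "$demo"
```

The minus sign means "less than n" and the plus sign means "more than n"; a bare number with no sign asks for exactly n periods.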


One last thing we will show you about find is how to apply a command line to every file that find discovers. That is, the find command produces a list of files, and you may want to use each of those filenames as an argument to a command, over and over. One reason to do this would be to use grep to search each of the found files. Another reason would be to move the found files to a new location, and a third reason would be to remove each of the found files. The possibilities are endless.

find provides a built-in option for executing a command on each file, the -exec option. However, this option does not handle files with spaces in their names very well. Also, if a large number of files are involved, the -exec option is noticeably slower than using a command called xargs.

The alternative is to pass the output of find, via a pipe, to xargs. The xargs command reads a list of filenames on stdin and runs the command given as its arguments with those filenames appended, fitting as many filenames into each invocation as it can. Normally xargs will put the filenames at the end of the command, but see the -J option, described in a tip below.

To apply a command to each file found:

  • find . -name "*.doc" -print0 | xargs -0 ls -sk

    The first part of the find command should be familiar by now: you are finding all the files in the current directory whose names end in .doc. The -print0 option tells find not to put a newline character at the end of each line of output, but instead to use a special character (called the null character), which never appears in filenames.

    You then pipe the output of find into xargs. The -0 option to xargs tells it to use the null character (instead of spaces) as the separator between filenames. xargs then executes the command line ls -sk with the filenames it gets from the pipe as arguments.

    The result will look like Figure 4.35, giving us the size (in kilobytes) of every .doc file in the current directory and all its subdirectories.

    Figure 4.35. Finding the sizes of all the .doc files in the current directory.
     localhost:~/Documents/OS X vanilla% find . -name "*.doc" -print0 | xargs -0 ls -sk
     204 ./Chapter 0 - Introduction/Introduction.doc
     136 ./Chapter 0 -TOC/Outline.doc
      32 ./Chapter 0 -TOC/Outline_Comments_me.doc
     180 ./Chapter 0 -TOC/Outline_v2.doc
     260 ./Chapter 0 -TOC/Outline_v2.hb.doc
     184 ./Chapter 0 -TOC/Outline_v3.doc
     448 ./Chapter 0 -TOC/Outline_v4.doc
      64 ./Chapter 0 -TOC/TOC.doc
      64 ./Chapter 0 -TOC/TOC2.doc
     188 ./Chapter 1/Chapter 1 v2.doc
     160 ./Chapter 1/Chapter 1 v2beta.doc
     204 ./Chapter 1/Chapter 1 v5a/Chapter_1_v5a.doc
      24 ./Chapter 1/Chapter 1 v5a/Figure 1.2.doc
     localhost:~/Documents/OS X vanilla$
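
To see why the null separator matters, here is a minimal sketch that counts how many arguments xargs produces with and without it. The scratch directory and filenames are invented for illustration, and -n 1 is used only so that each argument lands on its own line and can be counted.

```shell
#!/bin/sh
# Scratch directory with one filename that contains spaces (hypothetical names).
demo=$(mktemp -d)
touch "$demo/Chapter 1 v2.doc" "$demo/Outline.doc"

# Null-separated: each filename reaches the command as a single argument.
safe_count=$(find "$demo" -name "*.doc" -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

# Whitespace-separated: the name with spaces is split into several arguments.
split_count=$(find "$demo" -name "*.doc" | xargs -n 1 echo | wc -l | tr -d ' ')

echo "arguments with -print0 and -0: $safe_count"
echo "arguments without them:        $split_count"

rm -rf "$demo"
```

When the two counts differ, a command such as ls or rm run through plain xargs would be handed pieces of filenames rather than the files you meant.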

Tips

  • The xargs command normally puts each incoming argument at the end of the command line, but you can use the -J option to put the argument in the middle of a command line:

    mdfind -0 Apples | xargs -0 -J % cp % /Volumes/Backups/

    copies all the files found by mdfind into the directory /Volumes/Backups. The -J % option tells xargs to replace the % in the following command line with the argument it is processing.

  • You could get just the numbers with

    find . -name "*.doc" -print0 | xargs -0 ls -sk | awk '{print $1}'

    And if you wanted to add them up, you would create a Perl script called add containing

    #!/usr/bin/perl
    while (<>) {
        $sum += $_;
    }
    print "$sum\n";

    and then use the command line

    find . -name "*.doc" -print0 | xargs -0 ls -sk | awk '{print $1}' | add

    (If you are an experienced Unix user, we acknowledge that, yes, you could use an awk script instead of the Perl script.)
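
For the curious, the awk-only alternative mentioned above might look like the following sketch. It runs against a scratch directory with invented filenames so that it is safe to try; the awk program adds up the first column and prints the total once all input has been read.

```shell
#!/bin/sh
# Scratch directory with a couple of small .doc files (hypothetical names).
demo=$(mktemp -d)
printf 'some text\n' > "$demo/Chapter 1.doc"
printf 'more text\n' > "$demo/Outline.doc"

# Sum the size column (kilobytes) that ls -sk prints for each file.
total=$(find "$demo" -name "*.doc" -print0 | xargs -0 ls -sk | awk '{ sum += $1 } END { print sum }')

echo "Total size: ${total} KB"

rm -rf "$demo"
```

The exact total depends on how your file system allocates disk blocks, so the number you see may differ from one machine to another.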


The fact that a complicated command line like the one we've been using is just a grouping of simpler commands into a problem-solving entity shows the power, and also the simplicity, of the Unix command line. Compare the command line above with creating a Workflow in the Mac OS X Automator application (http://developer.apple.com/macosx/automator.html).

which

The which command is used specifically to find out which version of a command should be executed. When you enter a command line, Unix looks for the command in a series of directories named in a list called your PATH, and executes the first matching command it finds. (See Chapter 7, "Configuring Your Unix Environment," to learn how to alter your PATH.)

Sometimes there will be more than one command with the same name among the various directories in your PATH. The which command tells you which one will actually be executed.

To search your path for a command:

  • which command

    For example,

    which ls

    shows you that the ls command is /bin/ls, and which cd shows you that the cd command is /usr/bin/cd.
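
The sketch below illustrates the "first match in PATH wins" rule. It creates a stand-in command named ls in a scratch directory (a made-up script used purely for illustration) and puts that directory at the front of PATH for a single which invocation, so your real environment is left unchanged.

```shell
#!/bin/sh
# Scratch directory holding a stand-in command named ls (hypothetical script).
demo=$(mktemp -d)
printf '#!/bin/sh\necho "custom ls"\n' > "$demo/ls"
chmod +x "$demo/ls"

# With the normal PATH, which reports the system's ls.
before=$(which ls)

# With the scratch directory first in PATH, which reports the stand-in.
after=$(PATH="$demo:$PATH" which ls)

echo "normal PATH:   $before"
echo "modified PATH: $after"

rm -rf "$demo"
```

Because the PATH assignment appears on the same line as the command, it applies only to that one run of which.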



Unix for Mac OS X 10.4 Tiger: Visual QuickPro Guide (2nd Edition)
ISBN: 0321246683
EAN: 2147483647
Year: 2004
Pages: 161
Authors: Matisse Enzer
