Searching for Text in Files


Instead of searching for files by name, you can search the text contents of your documents. Find can look inside certain kinds of documents and search them for keywords, so you don’t have to laboriously open files one by one to find the information you want. You can also use Find to locate a file when you don’t remember its name but do remember what it contains.

What can Find search?

By necessity, Find can only search documents that contain text, and it can’t get into every kind of document that contains text. In general, you’ll have success searching the following kinds of documents:

  • Plain text documents, such as those you can edit in the TextEdit application

  • HTML documents, which are plain text documents with codes that are used to create Web pages

  • PDF files (Adobe’s Portable Document Format)

  • Microsoft Word documents

  • AppleWorks word processing documents

  • WordPerfect documents

  • Email stored by Eudora, Outlook Express, and some other email applications

In addition to searching document contents for keywords, Find also searches file, folder, and disk names for the same keywords.

Note

When searching the text contents of HTML and PDF documents, Find intelligently ignores the text-formatting commands that occur naturally throughout these documents. Special Text Extractor plug-in files make this possible. These plug-in files are in the PlugIns folder at /System/Library/Find/PlugIns/.
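To get a feel for what a text extractor does, here is a minimal Python sketch that pulls the readable text out of an HTML document and discards the tags. It is an illustration only, not the code in Find’s plug-ins; the names TextExtractor, extract_text, and sample are made up for this example, and a real extractor would also have to deal with scripts, styles, and character encodings.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # The parser calls this for text content, not for the tags around it.
        self.chunks.append(data)


def extract_text(html):
    """Return the text of an HTML string with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical sample document.
sample = "<h1>Quarterly Report</h1><p>Sales rose <b>sharply</b> in May</p>"
print(extract_text(sample))
```

Only the words a reader would see on the page survive, which is why a search for a word such as sharply is not thrown off by the formatting codes around it.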

Specifying contents to find

Generally, you get the best results when searching by contents if you specify the least common keywords that you think will be in the documents that you want to find. If instead you specify common words that occur in many of your documents, the results of your search by contents will be a long list of mostly extraneous documents. To find the fewest extraneous documents, try to think of one or two unusual words that occur in only the documents that you want to find, and have Find search for those unusual words.
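If you are curious why unusual words work better, the following Python sketch counts how many documents contain each word. It is only an illustration under assumed data: the file names and contents are invented, and document_frequency is a hypothetical helper, not part of Find.

```python
import re


def document_frequency(documents):
    """Map each word to the number of documents that contain it."""
    counts = {}
    for text in documents.values():
        # set() makes a word count once per document, however often it repeats.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            counts[word] = counts.get(word, 0) + 1
    return counts


# Hypothetical documents.
documents = {
    "budget.txt": "the report covers the annual budget",
    "trip.txt": "the itinerary for the Patagonia trip",
    "notes.txt": "notes on the report and the trip",
}
freq = document_frequency(documents)
print(freq["the"], freq["report"], freq["patagonia"])
```

A word that appears in every document, such as the, cannot narrow the search at all, whereas a word that appears in a single document picks out exactly that document.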

To search by contents, follow these steps:

  1. Open the Find window.

  2. In the Content includes text field, enter the keywords that you want to find, as shown in Figure 7-7. The text box must be selected before you can type in it; if it isn’t, click inside the box or press Tab until an insertion point appears in it. If the box already contains text from a previous search request, pressing Tab selects this text so that you can replace it simply by typing.

    Figure 7-7: The Find window can search contents of several types of files in volumes of the computer that have been indexed.

    It doesn’t matter how you capitalize keywords because Find is not case-sensitive. For example, iMac is the same as imac, Imac, or IMAC.

  3. Click the Search button or press Return to begin the search.

    Note

    Find can search document contents only in folders and volumes (disks) that have been indexed. Indexing is described later in this chapter.
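The case-insensitive matching mentioned in step 2 can be pictured with a few lines of Python. This is a sketch of the general idea, not Find’s implementation; same_keyword and contains_keyword are hypothetical names, and the whole-word matching on whitespace is a simplifying assumption that ignores punctuation.

```python
def same_keyword(a, b):
    """Compare two keywords while ignoring capitalization."""
    return a.casefold() == b.casefold()


def contains_keyword(text, keyword):
    """Return True if keyword occurs as a whole word in text, ignoring capitalization."""
    # Splitting on whitespace is a simplification; punctuation is not handled.
    return keyword.casefold() in text.casefold().split()


variants = ["iMac", "imac", "Imac", "IMAC"]
print(all(same_keyword("iMac", v) for v in variants))
print(contains_keyword("My new IMAC arrived today", "imac"))
```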

Looking at found documents

When Find finishes searching by contents, a list of the documents it has found is displayed. Find displays the number of documents it found below the list. If Find doesn’t find any documents that match the keywords you specify, it displays a message to that effect.

Ranking the relevance of found documents

Find lists the found documents according to their relevance. A document’s relevance is determined by how often the keywords occur in it and how close together they are in it. The more often the keywords occur and the closer their proximity, the higher the document’s relevance. This method of evaluating relevance is not always 100 percent accurate. The document you’re looking for may not have the very highest relevance. Even so, the document you want probably will be nearer the top of the list than the bottom. Find indicates the degree of each found document’s relevance with the length of a bar in the list of found documents. The longer the bar, the more relevant Find judges a document to be. Figure 7-8 shows how Find ranks found documents by relevance when it displays the results of searching by contents.

Figure 7-8: After searching by contents, Find ranks found documents by their relevance to the keywords.
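The idea of ranking by frequency and proximity can be sketched in Python. The exact formula Find uses is not documented here, so the scoring below is an assumption made for illustration: one point per keyword occurrence, plus a bonus of 1/gap for each pair of neighboring occurrences. The names tokenize, relevance, and rank, and the sample documents, are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def relevance(text, keywords):
    """Hypothetical score: keyword hits plus a bonus for hits that sit close together."""
    keys = {k.lower() for k in keywords}
    positions = [i for i, word in enumerate(tokenize(text)) if word in keys]
    frequency = len(positions)
    # Each pair of neighboring hits adds 1/gap, so adjacent hits add the most.
    proximity = sum(1.0 / (b - a) for a, b in zip(positions, positions[1:]))
    return frequency + proximity


def rank(documents, keywords):
    """Return (name, score) pairs for matching documents, most relevant first."""
    scored = [(name, relevance(text, keywords)) for name, text in documents.items()]
    return sorted((item for item in scored if item[1] > 0), key=lambda item: -item[1])


# Hypothetical documents.
documents = {
    "notes.txt": "panther notes about many other things and one index",
    "guide.txt": "panther index panther index",
    "misc.txt": "nothing relevant here",
}
for name, score in rank(documents, ["Panther", "index"]):
    print(name, score)
```

Here guide.txt outranks notes.txt because its keywords occur more often and sit next to each other, and misc.txt is left out because it contains neither keyword. A relevance bar could then be drawn with a length proportional to each score.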

Doing more with found documents

Aside from the relevance ranking, the list of found documents that results from a search by contents is similar to the list of found items that results from a search by name. If you click a listed document once, its folder location appears in the information area at the bottom of the Find window. Double-clicking a document opens it. Other things you can do with files and folders in the found documents list, such as finding similar documents, are described later in this chapter.

Indexing folders

Searching the contents of documents seems almost magical but isn’t. It requires some advance preparation. Before Find can search document contents, it must index them. The Finder can create an index for each folder that you add to its list of searchable sources (as described earlier in this chapter). It stores each folder’s index in an invisible file inside the folder. This file contains a database of words from the documents in the folder.

Given an index database, the Finder can determine whether the words in your search request exist in the indexed files by quickly searching the database. This search happens quickly because the database is much smaller than the aggregate length of the files it indexes. In addition, searching the index database is faster because the words in it are arranged in order. The first time you search a volume that is not indexed, Find begins by creating an index for that volume and then searching the index. Therefore, your first search of a new volume takes longer than any subsequent searches.
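A small Python sketch shows why an ordered word database is quick to search. It is a simplified model, not the Finder’s actual index format: build_index and lookup are hypothetical names, the documents are invented, and the index lives in memory rather than in an invisible file.

```python
import bisect
import re


def build_index(documents):
    """Return (sorted_words, postings) for a dict of {filename: text}."""
    postings = {}
    for name, text in documents.items():
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            postings.setdefault(word, set()).add(name)
    return sorted(postings), postings


def lookup(index, keyword):
    """Binary-search the sorted word list, then return the matching filenames."""
    sorted_words, postings = index
    key = keyword.lower()
    i = bisect.bisect_left(sorted_words, key)
    if i < len(sorted_words) and sorted_words[i] == key:
        return sorted(postings[key])
    return []


# Hypothetical documents.
documents = {
    "letter.txt": "Dear Ana, the invoice is attached",
    "memo.txt": "Invoice totals for the quarter",
}
index = build_index(documents)
print(lookup(index, "invoice"))
print(lookup(index, "zebra"))
```

Because the words are in order, each lookup halves the remaining list at every step instead of reading every document from start to finish. The index has to be built once, which mirrors why the first search of an unindexed volume takes longer than later ones.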

Note

The Finder indexes only the first 2,000 unique words of each document to keep the index database file from becoming too large and bogging down searching by contents. Therefore, the Finder does not index all the words in a document that contains more than 2,000 unique words. The closer a unique word is to the end of such a long document, the less likely the Finder is to include it in the index. If the Finder doesn’t index some words in a long document due to this limitation, you won’t be able to find that document by searching for those words.
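The effect of this limit can be modeled in a few lines of Python. The sketch assumes that words are taken in order of first appearance until 2,000 distinct words have been collected, which matches the description above but is not a statement of how ContentIndexing is actually written; indexed_words is a hypothetical name.

```python
import re


def indexed_words(text, limit=2000):
    """Return the first `limit` unique words of text, in order of first appearance."""
    seen = {}
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in seen:
            if len(seen) == limit:
                break  # The quota is full; later new words are never indexed.
            seen[word] = None
    return list(seen)


# A made-up document containing 2,500 different words: w0, w1, ..., w2499.
long_document = " ".join(f"w{i}" for i in range(2500))
words = indexed_words(long_document)
print(len(words), "w1999" in words, "w2000" in words)
```

In this model a search for w2000 would never find the document, even though the word is in it, because the word first appears after the quota of unique words has been filled.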

Creating indexes

The Finder initially indexes your home folder, and it automatically indexes other folders when you add them to the list of searchable sources. You can also have Find index some volumes, such as removable disks and network volumes, but it doesn’t index entire volumes automatically. The Finder cannot index all folders and volumes.

Note

Actually, the Finder does not prepare content indexes. It hands off this task to an application named ContentIndexing. ContentIndexing hides while it is operating and quits automatically when it finishes indexing. If you want to see ContentIndexing at work, open the Process Viewer application (in the Utilities folder) while the Finder is reporting that indexing is under way and look for ContentIndexing in Process Viewer’s list of running processes.

What can and can’t be indexed

The folders and volumes you index can be located on your computer, another computer on your network, or a network file server. To index a folder, you must have the privilege to save items in it.

Cross Reference

For more information about using folders from file servers and other computers on your network, see Chapter 10.

Find cannot index some types of folders and volumes because it cannot write (save) their index files. As you may expect, Find cannot write an index file on a write-protected disk, such as a CD-ROM or a locked Zip disk. What’s more, Find cannot index a folder for which you do not have Write privilege, which is the privilege to make changes. Many such folders are on your Mac OS X startup disk. It’s also common not to have Write privileges for folders from network file servers and other computers on your network.
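You can picture the Write privilege test with a short Python sketch that asks the operating system whether the current user may create files in a folder. This is an illustration of the general idea, not the check Find performs; can_store_index is a hypothetical name.

```python
import os
import tempfile


def can_store_index(folder):
    """Return True if the current user can create files inside folder."""
    # Creating a file needs both write and search (execute) permission on the folder.
    return os.path.isdir(folder) and os.access(folder, os.W_OK | os.X_OK)


with tempfile.TemporaryDirectory() as scratch:
    print(can_store_index(scratch))                           # a folder you own
    print(can_store_index(os.path.join(scratch, "missing")))  # not a folder at all
```

A folder that fails a test like this has nowhere to keep its index file, which is why such a folder can’t be indexed.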

Note

Although you can’t create an index on an existing CD-ROM, you may have CD-ROMs that are indexed. The index for such a CD-ROM was created in advance and recorded as part of the CD-ROM’s contents. If you have a CD-R or CD-RW recorder, also known as a CD burner, you can give a CD that you record a Find index by indexing the folder or disk whose contents will be recorded before you burn the CD.

Tip

You may be able to index a folder that Find says can’t be indexed. Try logging in as a user who has administrator privileges and indexing again. If you still can’t index the folder, log in as the root user (System Administrator) and try again. Note that Find maintains a separate list of searchable sources for each user, so you have to add again any folders that you added while logged in as a different user. We cover administrator and root user privileges in Chapter 14.

Updating indexes

The Finder updates the index of a folder or volume every time you search it. Updating an index generally takes much less time than creating the index initially.

Find determines which indexes to update by going through the list of searchable sources. An index becomes out of date when you change the contents of an indexed document, add documents to an indexed folder or volume, or remove documents from an indexed folder or volume. Find can’t search a folder accurately by contents if the folder’s index is out of date. The more outdated an index is, the less accurate the search will be.
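One way to detect the three kinds of change just described is to compare a record of the folder made at indexing time with the folder’s current state. The Python sketch below does that with file names and modification times. How Find really detects stale indexes is not described here, so treat this as an assumption-laden illustration; snapshot and changes_since are hypothetical names, and skipping names that begin with a period is an assumed way of ignoring invisible files such as the index itself.

```python
import os


def snapshot(folder):
    """Record each visible file's name and modification time."""
    return {
        entry.name: entry.stat().st_mtime_ns
        for entry in os.scandir(folder)
        # Assumption: invisible files (names starting with ".") are skipped.
        if entry.is_file() and not entry.name.startswith(".")
    }


def changes_since(old, new):
    """Return (added, removed, modified) file names between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
    return added, removed, modified


# Hypothetical snapshots: the first taken at indexing time, the second now.
at_index_time = {"plan.txt": 100, "budget.txt": 200}
now = {"budget.txt": 250, "memo.txt": 300}
print(changes_since(at_index_time, now))
```

Only the added and modified documents need to be read again, and the removed ones dropped, which is consistent with an update taking much less time than building the index from scratch.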

Manually updating or creating an index

You can manually update the index for any indexable folder or volume listed in the Files channel, or you can create the index for an indexable folder or volume that doesn’t have one. Follow these steps to start indexing manually:

  1. Select the item you want to index by clicking its icon or name in a Finder window.

  2. Choose File > Get Info. The Get Info window appears, as shown in Figure 7-9.


    Figure 7-9: Indexes can be updated using the Content index pane of the Get Info window.

  3. Click the disclosure triangle next to Content index in the Get Info window. The Content index pane displays.

  4. Click the Index Now button. The index for the volume, folder, or disk is updated.

Indexing in the background

Indexing a folder or volume that contains many documents may take many minutes or even hours. You can let the Finder continue indexing in the background while you use the computer for other tasks. This background indexing is usually unobtrusive thanks to Mac OS X’s preemptive multitasking.

Adjusting indexing speed and disk use

The speed at which the Finder creates and updates indexes depends on the number of languages it uses. The amount of disk space required for index files also depends on the number of languages. Fewer languages yield faster and smaller indexes.

To select which languages Find uses, follow these steps:

  1. Choose Finder > Preferences to display the Preferences window.

  2. Click the Advanced icon in the window’s toolbar.

  3. Click the Select... button under “Languages for searching file contents:” to display the Languages window. Figure 7-10 shows the Languages window.

    Figure 7-10: Make indexing faster and make indexes smaller by selecting fewer languages.

  4. Select the languages that you want the Finder to use when it creates and updates indexes.

Deleting indexes

If you want to create a completely new index for a folder that already has one, you can delete the existing index. It’s a good idea to delete a folder’s existing index and create a new one if you make major changes to the folder or if you notice that Find has become slower at searching the folder by contents.

Follow these steps to delete an index:

  1. Select the item whose index you want to delete by clicking its icon or name in a Finder window.

  2. Choose File > Get Info. The Get Info window appears, as shown earlier in Figure 7-9.

  3. Click the disclosure triangle next to Content index. The Content index pane displays.

  4. Click the Delete Index button.

  5. Click OK when asked to confirm that you really want to delete.

If the Delete Index button is grayed out (not available), the item has not yet been indexed. In this case, the Status line in the Content index pane reads Not Indexed.




Mac OS X Bible, Panther Edition
ISBN: 0764543997
Year: 2003
Pages: 290