< Day Day Up > |
Teach Windows XP or 2000 how to search the full text of your PDF along with your other documents. Or, use Adobe Reader to search PDF only.

Search is essential for making use of document archives, and it can also find things where you might not have thought to look. The problem is that, by default, Windows search doesn't know how to read PDF files. We present a couple of solutions.

2.7.1 Search PDF with Adobe Reader

The free Adobe Reader 6.0 provides the easiest solution. It enables you to search across your entire PDF collection (Edit → Search). Its detailed query results include links to individual PDF pages and snippets of the text surrounding your query, as shown in Figure 2-5. Its Fast Find setting, enabled by default, caches the results of your searches, so subsequent searches go much faster. View or change the Reader search preferences by selecting Edit → Preferences → Search.

Figure 2-5. Collection search results in Reader linking directly into the documents

The downside to Adobe Reader search is that it searches PDF documents only.

2.7.2 Index and Search PDF with Windows XP and 2000

It makes sense to search across all file types from a single interface. Newer versions of Windows let you extend their built-in search feature to include PDF documents. With Windows 2000, all you need to do is install the freely available PDF IFilter from Adobe. With Windows XP, you must also apply a couple of workarounds. In both cases, you can use the Windows Indexing Service to speed up searches.

The Windows Indexing Service is powerful but needs to be configured for best performance. The next section introduces you to the Indexing Service. We then discuss installing and troubleshooting Adobe's PDF IFilter.

2.7.3 Windows Indexing Service: Installation, Configuration, and Documentation

You don't need Indexing Service to search your computer, but it can be handy.
Queries run much faster, and you can use advanced search features such as Boolean operators (e.g., AND, OR, and NOT), metadata searches (e.g., @DocTitle Contains "pdf"), and pattern matching. The downside is that the Indexing Service always runs in the background, using resources to index new or updated documents. A little configuration ensures that you get the best performance.

First off, do you have Indexing Service? If not, how do you install it? The Windows Components Wizard answers both questions. In Windows XP or 2000, open this wizard by selecting Start → Settings → Control Panel → Add or Remove Programs and clicking the Add/Remove Windows Components button on the left. Find the Indexing Service component and, if its box is empty, place a check in it, as shown in Figure 2-6. Click Next and proceed through the wizard.

Figure 2-6. Adding the Indexing Service component to XP or 2000

Access Indexing Service configuration and documentation from the Computer Management window, shown in Figure 2-7. Right-click My Computer and select Manage. In the left pane, expand Services and Applications and then Indexing Service.

Figure 2-7. The Computer Management window, where you configure the Indexing Service

Sometimes you must stop or start the Indexing Service. Right-click the Indexing Service node and select Stop or Start from the context menu.

Under the Indexing Service node you'll find index catalogs, such as System. Add, delete, and configure these catalogs so that they index only the directories you need. For details on how to do this, I highly recommend the documentation under Help → Help Topics → Indexing Service. This documentation also details the advanced query language.
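To see why indexed queries run faster, here is a toy model in Python. It builds an inverted index (a map from each word to the documents that contain it) once, then answers AND, OR, and NOT queries with set operations instead of rescanning every file. This is a minimal sketch under our own assumptions: the function names are hypothetical, and it does not reproduce the Indexing Service's real storage format, query syntax, or ranking.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words (no stemming or noise-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def query_and(index, *words):
    """Documents containing every word."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


def query_or(index, *words):
    """Documents containing at least one of the words."""
    return set().union(*(index.get(w.lower(), set()) for w in words))


def query_not(index, all_names, word):
    """Documents that do not contain the word."""
    return set(all_names) - index.get(word.lower(), set())


if __name__ == "__main__":
    docs = {
        "a.pdf": "Quarterly earnings report",
        "b.pdf": "Earnings forecast",
        "c.doc": "Meeting notes",
    }
    idx = build_index(docs)
    print(sorted(query_and(idx, "earnings", "report")))
    print(sorted(query_or(idx, "report", "notes")))
    # "earnings AND NOT report" combines two lookups with a set intersection.
    print(sorted(query_and(idx, "earnings") & query_not(idx, docs, "report")))
```

The index costs time and space up front, which mirrors the trade-off described above: the background indexing work is what makes each later query cheap.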
You can still search the directories you do not index by selecting Start → Search → For Files or Folders, so don't feel compelled to index your entire computer.

Before installing the PDF IFilter, create a special catalog for testing purposes and put a few PDFs in its directory. Disable indexing on all other catalog directories by double-clicking each directory and selecting "Include in Index? No." This simplifies testing, because indexing many documents can take a long time.
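The effect of the "Include in Index?" setting can be sketched as a directory walk that skips excluded folders. This Python model is hypothetical: the `catalog` dictionary and function names are ours, and we assume that an excluded directory also hides its subdirectories even when an ancestor directory is included. It simply lists the files a small test catalog would have to process.

```python
import os
import tempfile


def _is_under(path, directory):
    """True if path is directory itself or lies somewhere beneath it."""
    path = os.path.normcase(os.path.abspath(path))
    directory = os.path.normcase(os.path.abspath(directory))
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:  # e.g., paths on different Windows drives
        return False


def files_to_index(catalog):
    """List the files a catalog would index.

    catalog maps a directory to True ("Include in Index? Yes") or
    False ("Include in Index? No"). Assumption: an excluded directory
    also hides its subdirectories, even when an ancestor is included.
    """
    included = [d for d, inc in catalog.items() if inc]
    excluded = [d for d, inc in catalog.items() if not inc]
    found = []
    for top in included:
        for root, _dirs, names in os.walk(top):
            for name in names:
                path = os.path.join(root, name)
                if not any(_is_under(path, ex) for ex in excluded):
                    found.append(path)
    return sorted(set(found))  # set() drops duplicates from overlapping includes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        test_dir = os.path.join(tmp, "pdftest")
        archive = os.path.join(tmp, "archive")
        for d in (test_dir, archive):
            os.makedirs(d)
        for d, name in ((test_dir, "sample.pdf"), (archive, "old.pdf")):
            open(os.path.join(d, name), "wb").close()
        # Include the parent folder but exclude the large archive.
        for path in files_to_index({tmp: True, archive: False}):
            print(os.path.relpath(path, tmp))
```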
2.7.4 Prepare to Install PDF IFilter 5.0

On Windows XP and 2000, you have two kinds of searches: indexed and unindexed. An indexed search relies on the Indexing Service, as we have discussed. An unindexed search takes a brute-force approach, scanning all files for your queried text, as shown in Figure 2-8. In both cases, the system uses filters to handle the numerous file types. These filters use the IFilter API to interface with the system.

Figure 2-8. An unindexed search

A PDF IFilter is freely available from Adobe. Visit http://www.adobe.com/support/salesdocs/1043a.htm and download ifilter50.exe. Adobe's web page states that this PDF IFilter works only on servers. In fact, it works on XP Home Edition, too.

If you run Windows 2000, you can install the PDF IFilter and it will work for both indexed and unindexed PDF searching. If you run Windows XP Home Edition and install the PDF IFilter (Version 5.0), you might need to disable the PDF IFilter for unindexed PDF searches. Unindexed searching of PDFs on XP Home Edition with the PDF IFilter can leave open file handles lying around, which will cause all sorts of problems.

Visit http://www.pdfhacks.com/ifilter/ and download PDFFilt_FileHandleLeakFix.reg. We will use it in our installation instructions, later in this hack. This registry hack ensures that only the Indexing Service uses the PDF IFilter. After you apply this hack, PDFs will be treated like plain-text files during unindexed searches. You can undo this registry hack with PDFFilt_FileHandleLeakFix.uninstall.reg.
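The filter mechanism can be modeled as a lookup from file extension to a text extractor, with the raw bytes as a fallback. In this Python sketch, `choose_filter` also models the effect of PDFFilt_FileHandleLeakFix.reg: when the PDF filter is disabled for unindexed searches, PDFs fall back to plain text, so text inside compressed PDF streams typically goes unseen. Everything here is hypothetical: the real IFilter interface is a COM API (with methods such as Init, GetChunk, and GetText), and `fake_pdf_filter` merely stands in for genuine PDF text extraction.

```python
from pathlib import PurePath


def plain_text_filter(data: bytes) -> str:
    """Fallback: read raw bytes as text (latin-1 decodes every byte)."""
    return data.decode("latin-1")


def fake_pdf_filter(data: bytes) -> str:
    """Hypothetical stand-in for real PDF text extraction; ignores its input."""
    return "quarterly earnings summary"


def choose_filter(filename, filters, indexed, pdf_filter_for_unindexed=True):
    """Pick the text extractor for a file, by extension."""
    ext = PurePath(filename).suffix.lower()
    if ext == ".pdf" and not indexed and not pdf_filter_for_unindexed:
        # Models the registry hack: unindexed searches read PDFs as plain text.
        return plain_text_filter
    return filters.get(ext, plain_text_filter)


def unindexed_search(files, query, filters, pdf_filter_for_unindexed=True):
    """Brute-force scan: run every file through its filter and look for query.

    files maps a file name to its contents; a real search reads them from disk.
    """
    q = query.lower()
    hits = set()
    for name, data in files.items():
        extract = choose_filter(name, filters, indexed=False,
                                pdf_filter_for_unindexed=pdf_filter_for_unindexed)
        if q in extract(data).lower():
            hits.add(name)
    return hits


if __name__ == "__main__":
    files = {
        "notes.txt": b"earnings call at 10",
        "report.pdf": b"%PDF-1.4 <compressed content stream>",
    }
    filters = {".pdf": fake_pdf_filter}
    print(sorted(unindexed_search(files, "earnings", filters)))
    print(sorted(unindexed_search(files, "earnings", filters,
                                  pdf_filter_for_unindexed=False)))
```

The second search shows the trade-off of the registry hack: unindexed searches stop calling the PDF filter (avoiding the handle leak), at the cost of missing text that is only visible after extraction.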
2.7.5 Install and Troubleshoot Adobe PDF IFilter 5.0

On XP, installing the PDF IFilter might require a couple of registry hacks. First we'll install it, then we'll troubleshoot.
To test your index, don't select Start → Search. Instead, in the Computer Management window, select the Query Catalog node listed under your test catalog. Submit a few queries that would work only on the full text of your PDFs; avoid using document headings or titles. Did it work? If so, you're done! If you get no results, as shown in Figure 2-9, work through the next section, which explains a common workaround for Windows XP.
PDF IFilter and Indexing Service don't see eye to eye on Windows XP. If querying indexed PDF yields empty sets, give this a try:
1. In the Computer Management window (right-click My Computer and select Manage), right-click Services and Applications → Indexing Service and select Stop.
2. Open the Registry Editor (Start → Run... → Open: regedit → OK).
3. Select HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex and double-click the DLLsToRegister value to edit it.
4. In the list of DLLs, delete the following line:

       C:\Program Files\Adobe\PDF IFilter 5.0\PDFFilt.dll

5. Click OK, and then close the Registry Editor.
6. Start the Indexing Service back up (right-click Services and Applications → Indexing Service and select Start).
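If you need to repeat this fix on several machines, you might script the same edit. The Python sketch below only previews the change. `read_dlls_to_register` uses the standard `winreg` module to read the DLLsToRegister value (Windows only, read-only); we assume it is stored as a REG_MULTI_SZ multi-string, which the one-DLL-per-line editor suggests. `without_pdf_filter` returns the list minus the PDFFilt.dll line, matching case-insensitively because Windows paths are case-insensitive. Both function names are our own. Writing the result back (for example with `winreg.SetValueEx` and `REG_MULTI_SZ`) requires administrator rights and the Indexing Service stopped first, so it is deliberately left out.

```python
import sys

# Path as given in the steps above; adjust if the IFilter was installed elsewhere.
PDF_FILTER_DLL = r"C:\Program Files\Adobe\PDF IFilter 5.0\PDFFilt.dll"
CONTENT_INDEX_KEY = r"SYSTEM\CurrentControlSet\Control\ContentIndex"


def without_pdf_filter(dll_list, target=PDF_FILTER_DLL):
    """Return the DLLsToRegister entries minus the PDF IFilter DLL."""
    wanted = target.strip().lower()
    return [d for d in dll_list if d.strip().lower() != wanted]


def read_dlls_to_register():
    """Read (never write) the DLLsToRegister value under HKEY_LOCAL_MACHINE."""
    import winreg  # standard library, available only on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CONTENT_INDEX_KEY) as key:
        value, value_type = winreg.QueryValueEx(key, "DLLsToRegister")
    if value_type != winreg.REG_MULTI_SZ:
        raise TypeError("DLLsToRegister is not a multi-string value")
    return list(value)


# The preview runs only on Windows; elsewhere the module just defines the helpers.
if __name__ == "__main__" and sys.platform == "win32":
    before = read_dlls_to_register()
    after = without_pdf_filter(before)
    print(f"{len(before) - len(after)} entry(ies) would be removed:")
    for entry in before:
        if entry not in after:
            print("  ", entry)
```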
When searching PDFs by selecting Start → Search → For Files or Folders, don't search for Documents. Search All Files and Folders instead; the Documents search overlooks PDFs.
If you indexed a specific folder instead of an entire drive, that folder (or one of its subfolders) must be given in the Look In: field when using Start → Search → For Files or Folders. Otherwise, the index won't be consulted, and an unindexed search will be performed instead, even for files inside the indexed folder. Set the Look In: field to a specific folder by clicking the drop-down box and selecting Browse..., as demonstrated in Figure 2-11.
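The rule above amounts to a path-containment check: the index helps only when the Look In: folder is an indexed folder or lies beneath one. This Python sketch, with the hypothetical helper `uses_index`, models that rule by comparing case-folded path components, so C:\Documents is not mistaken for a subfolder of C:\Docs. It is an illustration of the stated behavior, not Windows' own logic.

```python
from pathlib import PureWindowsPath


def _parts(path):
    """Split a Windows path into case-folded components."""
    return [part.casefold() for part in PureWindowsPath(path).parts]


def uses_index(look_in, indexed_folders):
    """True if look_in is an indexed folder or one of its subfolders."""
    target = _parts(look_in)
    for folder in indexed_folders:
        prefix = _parts(folder)
        if target[: len(prefix)] == prefix:
            return True
    return False


if __name__ == "__main__":
    indexed = [r"C:\Docs"]
    for look_in in (r"C:\Docs\Reports", "C:\\", r"C:\Documents"):
        mode = "indexed" if uses_index(look_in, indexed) else "unindexed"
        print(f"{look_in}: {mode} search")
```

Note that searching the whole drive (C:\) falls back to an unindexed search here, which matches the behavior described above when only a folder is indexed.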
When searching within an indexed folder, you can use advanced search terms (e.g., @DocTitle Contains "earnings"). Consult the Indexing Service online documentation, described earlier, for details.
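A property query such as @DocTitle Contains "earnings" restricts the match to one metadata field instead of the full text. The Python sketch below handles only that single form and treats several quoted words as "all words present" rather than an exact phrase; the real query language supports much more. The function names and the metadata dictionaries (standing in for properties a filter would extract) are hypothetical.

```python
import re

# Accepts only the single form: @Property Contains "words"
QUERY_RE = re.compile(r'@(\w+)\s+Contains\s+"([^"]*)"', re.IGNORECASE)


def parse_property_query(query):
    """Return (property, words) or raise ValueError for other query forms."""
    match = QUERY_RE.fullmatch(query.strip())
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    return match.group(1), match.group(2)


def run_property_query(query, docs):
    """Return names of docs whose property contains every query word."""
    prop, phrase = parse_property_query(query)
    wanted = phrase.lower().split()
    hits = []
    for name, props in docs.items():
        lowered = {k.lower(): v for k, v in props.items()}  # case-insensitive names
        words = re.findall(r"\w+", str(lowered.get(prop.lower(), "")).lower())
        if wanted and all(w in words for w in wanted):
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "q3.pdf": {"DocTitle": "Q3 Earnings Summary"},
        "memo.pdf": {"DocTitle": "Staff memo"},
        "untitled.pdf": {},
    }
    print(run_property_query('@DocTitle Contains "earnings"', docs))
```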
Using the older Windows search tool on PDF still can be useful, even if it doesn't access the full text of your document. If the PDF documents are not encrypted, their metadata (Title, Author, etc.) and bookmarks are visible to the search tool as plain text. PDF shortcut titles [Hack #17] also are searched.
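Why does a plain-text scan see metadata and bookmarks at all? In many unencrypted PDFs, the document information entries and outline (bookmark) items are written as uncompressed literal strings such as /Title (Annual Report). This Python sketch pulls out what a byte-level scan would find. It is a rough illustration under stated assumptions: it ignores hex and UTF-16 strings, escaped parentheses, encrypted files, and the compressed object streams that newer PDFs (1.5 and later) may use, any of which would hide these entries from a plain-text search.

```python
import re

# Literal-string entries such as /Title (Annual Report). Hex strings,
# escaped parentheses, and compressed object streams are not handled.
ENTRY_RE = re.compile(rb"/(Title|Author|Subject|Keywords)\s*\(([^()\\]*)\)")


def plain_text_entries(pdf_bytes):
    """Return (key, value) pairs a plain-text scan of the raw file would see.

    Outline (bookmark) items also carry /Title, so bookmark titles appear too.
    """
    return [
        (key.decode("ascii"), value.decode("latin-1"))
        for key, value in ENTRY_RE.findall(pdf_bytes)
    ]


if __name__ == "__main__":
    sample = (b"%PDF-1.4\n"
              b"1 0 obj\n<< /Title (Annual Report) /Author (J. Smith) >>\nendobj\n"
              b"2 0 obj\n<< /Title (Chapter 1) /Parent 3 0 R >>\nendobj\n")
    for key, value in plain_text_entries(sample):
        print(f"{key}: {value}")
```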