Searching Other Document Formats

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 19.  Managing Indexing


As discussed in Chapter 5, SharePoint Portal Server provides filters that retrieve text and properties from most common document formats. The list may not include every format you use, however, such as PDF, a common format for archived information. In such cases, check whether an IFilter is available for the format. If you cannot find a suitable IFilter, consider using the plain text filter, or index only the system properties such as name, creation date, and last modification date.

TIP

You can download an IFilter for PDF from Adobe. Check http://www.adobe.com/support/downloads/detail.jsp?ftpID=1276.

For PDFs that are saved as images, you are limited to searching the metadata only, not the text. To determine whether a PDF is an image or contains indexable text, open the PDF and try the Select/Find text option from the menu. If you can select or find text this way, the content should be indexable.


Document formats are typically determined from the filename extension. Therefore, to enable searching of a new document format, you need to make SharePoint Portal Server aware of the new extension:

  1. Open the Management/Content Sources Web folder.

  2. Select the Additional Settings icon.

  3. Select File Types.

  4. Add your new extension.

  5. When you close the dialog, you are prompted whether the index needs to be updated. If you plan to install a filter for this format, select No, because a full update is needed after the filter is installed anyway.

After you have added your file type, subsequent index updates should no longer report the message "URL is excluded because the URL extension is restricted as defined in the file type rules" in the gatherer log.

The next step is to improve user feedback by adding an icon that users associate with the new format. For this you need a 16x16 pixel GIF file. Perform the following steps:

  1. Name the GIF file <ext>16.gif. For PDF files, for example, name it pdf16.gif.

  2. Open a Web folder to your workspace and enable the display of hidden folders and items.

  3. Open the hidden Portal/Resources/DocTypeIcons folder.

  4. Drag the gif file into this folder.
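
Before you copy the icon, you may want to confirm that it has the expected name and size. The following Python sketch is only an illustration of the naming rule and the 16x16 requirement described above; the function names are hypothetical, and the size check simply reads the width and height fields that follow the six-byte signature of a GIF file.

```python
import struct
from typing import Tuple


def icon_name(extension: str) -> str:
    """Build the icon file name for an extension, e.g. 'pdf' -> 'pdf16.gif'."""
    return extension.lower().lstrip(".") + "16.gif"


def gif_size(data: bytes) -> Tuple[int, int]:
    """Read width and height from the start of a GIF file.

    A GIF file starts with the signature 'GIF87a' or 'GIF89a', followed by
    the width and height as two little-endian 16-bit integers.
    """
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def is_valid_icon(data: bytes) -> bool:
    """Return True if the GIF data describes a 16x16 pixel image."""
    return gif_size(data) == (16, 16)
```

To check a real file, read it in binary mode and pass the bytes to is_valid_icon, then save it under the name returned by icon_name for your extension.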

The last (optional) step is to install a filter for this document format. Install the filter according to the guidelines of the supplier. A filter is implemented as a Dynamic Link Library (DLL) that is loaded once specific registry entries are set. Typically the procedure is as follows:

  1. Stop the Microsoft Search service.

  2. Install the filter. If no instructions are given, you may just register the DLL using the regsvr32 command.

  3. Start the Microsoft Search service.

  4. If you know which content sources contain documents with that extension, for example a particular Web site, perform a full update just for those content sources. Otherwise, start a full update of all content sources.
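
If you install filters on several servers, steps 1 through 3 can be scripted. The following Python sketch rests on two assumptions that you should verify first: that the Microsoft Search service is registered under the service name MSSearch, and that the supplier's DLL can be registered with regsvr32. The DLL path and the function names are hypothetical. The full update in step 4 is not covered and still has to be started from the workspace.

```python
import subprocess
from typing import Callable, List

# Assumption: service name of the Microsoft Search service on your server.
SEARCH_SERVICE = "MSSearch"


def filter_install_commands(dll_path: str) -> List[List[str]]:
    """Return the commands for steps 1 to 3: stop, register, start."""
    return [
        ["net", "stop", SEARCH_SERVICE],
        ["regsvr32", "/s", dll_path],  # /s suppresses the message boxes
        ["net", "start", SEARCH_SERVICE],
    ]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run the commands in order and stop at the first one that fails."""
    for command in commands:
        runner(command, check=True)


# Print the commands for a hypothetical DLL path instead of running them.
for command in filter_install_commands(r"C:\filters\example.dll"):
    print(" ".join(command))
```

Calling run_all with the returned list executes the commands; because check=True is passed, a failed stop or registration raises an exception before the service is started again, so you can inspect the problem first.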

TIP

Depending on the document format, you may be able to reuse an existing filter, such as the "plain text" IFilter. To do so, open the registry editor and go to the key HKR\.<your extension> (HKR is used as an abbreviation for HKEY_CLASSES_ROOT) and create a new key with the name PersistentHandler. Set the default entry to {c1243ca0-bf96-11cd-b579-08002b30bfeb}, the Class ID of the Text IFilter.

Similar steps are necessary to index .mht files, single-file Web archives encoded in MIME. Although a MIME IFilter exists, Microsoft has excluded it from the release. To include support for these files, open the registry editor, go to the key HKR\CLSID\{3050F3D9-98B5-11CF-BB82-00AA00BDCE0B}, and create a new key with the name PersistentHandler. Set the default entry to {5645C8C1-E277-11CF-8FDA-00AA00A14F93}, the Class ID of the MIME IFilter.
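
Instead of editing the registry by hand, you can prepare a .reg file and import it with the registry editor. The following Python sketch only generates the text of such a file for the two PersistentHandler entries described above. The function name and the .log extension are hypothetical examples; the Class IDs are the ones quoted in the text. The header line is the one used by the registry editor in Windows 2000 and later. Review the generated text before you import it.

```python
# Class IDs as quoted in the text above.
TEXT_IFILTER = "{c1243ca0-bf96-11cd-b579-08002b30bfeb}"
MIME_IFILTER = "{5645C8C1-E277-11CF-8FDA-00AA00A14F93}"
MHT_KEY = "CLSID\\{3050F3D9-98B5-11CF-BB82-00AA00BDCE0B}"


def persistent_handler_reg(key_path: str, handler_clsid: str) -> str:
    """Return .reg file text that sets the default value of
    HKEY_CLASSES_ROOT\\<key_path>\\PersistentHandler."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[HKEY_CLASSES_ROOT\\" + key_path + "\\PersistentHandler]",
        '@="' + handler_clsid + '"',
        "",
    ]
    return "\n".join(lines)


# Plain text filter for a hypothetical .log extension.
print(persistent_handler_reg(".log", TEXT_IFILTER))

# MIME filter for single-file Web archives.
print(persistent_handler_reg(MHT_KEY, MIME_IFILTER))
```

Save each result to a file with the extension .reg and import it on the server. As with the manual procedure, restart the Microsoft Search service and run a full update afterward so that the change takes effect.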



                 
ISBN: 0789725703
Year: 2002
Pages: 286
