Checking for Errors While Indexing

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 19. Managing Indexing


There are numerous reasons why SharePoint Portal Server may be unable to index a particular document. The most common are:

  • Insufficient privileges to access a page. This may occur on sites where particular URLs are password-protected.

  • Network or remote server problems that make it impossible to access the site.

  • "Broken" links included on a Web page.

  • SharePoint Portal Server Site Path rules that exclude a particular URL.

  • The file type that is associated with the URL is not supported by SharePoint Portal Server.

  • URLs that include query parameters. Query parameters are appended to the URL after a question mark (?), and individual parameters are separated by ampersands (&). These constructs typically appear in the URLs of Active Server Pages.
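The last two items in this list can often be spotted before a crawl just by looking at the URL. The following Python sketch is an illustration, not part of SharePoint Portal Server: it splits a URL into its parts, reports any query parameters (everything after the ? that is separated by &), and flags file extensions missing from an allow-list. The SUPPORTED_EXTENSIONS set and the precheck function are hypothetical; the file types your server can actually index depend on the filters installed on it.

```python
from pathlib import PurePosixPath
from urllib.parse import parse_qsl, urlsplit

# Hypothetical allow-list for illustration only. The real set of indexable
# file types depends on the filters installed on your server.
SUPPORTED_EXTENSIONS = {".htm", ".html", ".asp", ".txt", ".doc", ".xls", ".ppt"}


def precheck(url):
    """Return a list of reasons (possibly empty) why a URL might not be indexed."""
    parts = urlsplit(url)
    reasons = []
    if parts.query:
        # The query string is everything after '?'; '&' separates parameters.
        params = parse_qsl(parts.query, keep_blank_values=True)
        names = ", ".join(name for name, _ in params)
        reasons.append(f"query parameters: {names}")
    # Look only at the path component, so the query string cannot hide the extension.
    suffix = PurePosixPath(parts.path).suffix.lower()
    if suffix and suffix not in SUPPORTED_EXTENSIONS:
        reasons.append(f"unsupported file type: {suffix}")
    return reasons


if __name__ == "__main__":
    for url in (
        "http://intranet/news/list.asp?year=2002&dept=sales",
        "http://intranet/images/logo.gif",
        "http://intranet/index.htm",
    ):
        print(url, precheck(url))
```

For example, precheck("http://intranet/news/list.asp?year=2002&dept=sales") returns ["query parameters: year, dept"], and precheck("http://intranet/images/logo.gif") returns ["unsupported file type: .gif"]. A check like this only hints at likely problems; the gatherer log remains the authoritative record of what was skipped.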

A first indication of whether a content source was indexed successfully is the "health" indicator, which appears in the left pane of your Content Sources Web Folder after you select that content source (see Figure 19.6).

Figure 19.6. This figure shows the Content Sources Web Folder with the Discussions content source selected. The left pane shows details about that content source, such as its health.


CAUTION

When indexing Web sites, you often will not be able to achieve 100% health. Web sites frequently contain "broken" links and are generally outside your control.


The best place to check for errors is the gatherer log, which is created for each crawl. The log file can report successful accesses, access errors, and accesses prohibited by rules. Logging of successful accesses and of accesses prohibited by rules must be enabled explicitly. These options come in handy when index restrictions, such as Site Path rules, need to be verified or debugged. To enable these options, do the following:

  1. Log in as Workspace Coordinator.

  2. Open the Management Web Folder of your workspace.

  3. Select Workspace Settings.

  4. Select the Logging tab.

  5. Select Log success, Log items excluded by rules, or both, as appropriate.

  6. Click OK.

NOTE

Enabling these options will produce significantly larger log files. They are global to the workspace and thus will affect crawls for other content sources. Turn these options off as soon as the crawl behaves as expected.
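When you use Log items excluded by rules to verify Site Path rules, it can also help to dry-run a candidate rule list against a few sample URLs first. The Python sketch below assumes that rules are (wildcard pattern, include) pairs evaluated top to bottom, that the first matching rule decides, and that a URL no rule matches is included. These precedence semantics, the RULES list, and the example.com URLs are assumptions for illustration, not SharePoint Portal Server's documented behavior; compare the results with what the gatherer log actually reports.

```python
from fnmatch import fnmatchcase

# Hypothetical rule list: (wildcard pattern, include?) evaluated top to bottom.
# Assumption: the first matching rule wins and unmatched URLs are included.
RULES = [
    ("http://www.example.com/private/*", False),
    ("*.gif", False),
    ("http://www.example.com/*", True),
]


def evaluate(url, rules=RULES):
    """Return (included, matching_pattern); the pattern is None if no rule matched."""
    for pattern, include in rules:
        # fnmatchcase is case-sensitive; '*' also matches '/' characters.
        if fnmatchcase(url, pattern):
            return include, pattern
    return True, None


if __name__ == "__main__":
    for url in (
        "http://www.example.com/private/plan.htm",
        "http://www.example.com/logo.gif",
        "http://www.example.com/index.htm",
        "http://other.example.org/",
    ):
        included, pattern = evaluate(url)
        verdict = "include" if included else "exclude"
        print(f"{verdict:8} {url}  (rule: {pattern})")
```

With these sample rules, evaluate("http://www.example.com/logo.gif") returns (False, "*.gif"), and a URL on another host returns (True, None) because no rule matches it. Note that fnmatchcase also treats ? and [ ] in a pattern as wildcards, so a pattern that must match a literal question mark needs it wrapped in brackets, as in [?].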


As an administrator, you can also change the logging settings from the Management Console. Open Programs, Administrative Tools, SharePoint Portal Server Administration, and select the workspace node in the tree on the left. Right-click the node, or choose Properties from the Actions menu, to open the same dialog box that appears in step 3 above.

Coordinators can view the gatherer log file (see Figure 19.7) through an Active Server Page that offers options for filtering the events. To open it, select the content source you are interested in, and then click Click here for detailed log in the left pane of your Web folder. This link is located in the lower-right corner, below the health indicator mentioned earlier.

Figure 19.7. This figure shows the gatherer log rendered as a Web page. The details show that .gif files will not be indexed.


The gatherer log is written in a binary format. An undocumented object model exists for retrieving its contents in readable form. Administrators, for example, can view the gatherer log by using gthrlog.vbs, a Visual Basic script file included on the SharePoint Portal Server CD in the Support\Tools directory.
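Once the log has been converted into readable text, for example with gthrlog.vbs, a short script can filter and tally the entries much as the Web page in Figure 19.7 does. The Python sketch below assumes a hypothetical tab-separated export with one entry per line in the form category, URL, message, where category is success, error, or excluded. This is not the documented output format of gthrlog.vbs, so adapt parse_entries to whatever your export actually produces; the sample entries are made up.

```python
from collections import Counter

# Hypothetical export format (NOT the documented output of gthrlog.vbs):
# one entry per line, "<category>\t<url>\t<message>", where category is
# "success", "error", or "excluded". The entries below are invented samples.
SAMPLE = """\
success\thttp://intranet/index.htm\tsample: crawled
error\thttp://intranet/old/page.htm\tsample: page not found
excluded\thttp://intranet/logo.gif\tsample: excluded by rule
error\thttp://intranet/secure/a.htm\tsample: access denied
"""


def parse_entries(text):
    """Yield (category, url, message) tuples, skipping blank or malformed lines."""
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            yield tuple(fields)


def summarize(text):
    """Count entries per category and collect the URLs that produced errors."""
    entries = list(parse_entries(text))
    counts = Counter(category for category, _, _ in entries)
    errors = [(url, message) for category, url, message in entries if category == "error"]
    return counts, errors


if __name__ == "__main__":
    counts, errors = summarize(SAMPLE)
    print(dict(counts))
    for url, message in errors:
        print(f"{url}: {message}")
```

Keeping parsing and summarizing in separate functions means only parse_entries has to change if your readable log uses a different layout.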




ISBN: 0789725703
Year: 2002
Pages: 286
