Troubleshooting

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 18.  Configuring SPS to Crawl Other Content Sources


This section covers troubleshooting crawling and index-creation-related issues.

For additional troubleshooting details, refer to the last chapter in this book, Chapter 23, "Troubleshooting," beginning on p. 615.

Viewing the Gatherer Log

I need to view the Gatherer Log after SPS updates an index.

A1:

Each time SharePoint Portal Server updates an index, it creates what is called a gatherer log file for the workspace. This file contains data about URLs that SharePoint Portal Server accesses while creating an index. The file records successful accesses, access errors, and accesses disallowed by rules (should the administrator or coordinator need to debug the index restrictions). This log may be viewed for up to five days from an ASP page in the workspace. After five days, SPS deletes the log. The most recent log can be determined from the file name: the name with the largest number is the most recent. Thus, workspacename.2.gthr is newer than workspacename.1.gthr.
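The naming rule above can be sketched in a few lines of Python. This is only an illustration, assuming the file names follow the workspacename.N.gthr pattern shown in the text; the helper name newest_gatherer_log is hypothetical and is not part of SPS.

```python
import re

# Assumed pattern: <workspace>.<number>.gthr, as in workspacename.2.gthr.
_GTHR_PATTERN = re.compile(r"^(?P<workspace>.+)\.(?P<seq>\d+)\.gthr$", re.IGNORECASE)


def newest_gatherer_log(file_names):
    """Return the file name with the largest sequence number, or None if none match."""
    best_name, best_seq = None, -1
    for name in file_names:
        match = _GTHR_PATTERN.match(name)
        if match is None:
            continue  # not a gatherer log file name; ignore it
        seq = int(match.group("seq"))  # compare as integers, not as strings
        if seq > best_seq:
            best_name, best_seq = name, seq
    return best_name
```

The sequence number is compared as an integer so that, for example, a log numbered 10 sorts after a log numbered 9; a plain string sort would get this wrong.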

To view the gatherer log:

  1. In the workspace, open the Management folder, and then open the Content Sources folder.

  2. To view the gatherer log for a specific content source or several specific content sources, click once to select the content source or sources. To view the log for the entire folder, do not select any content sources; simply go to the next step.

  3. In the Web view of the Content Sources folder, click the link named Click here for Detailed Log. The Web view is in the lower left corner of the folder view.

  4. The Gatherer Log Viewer page now opens in a browser window, and the following three sections are expanded: overview/statistics, detailed log entries, and filter criteria. Under Filter criteria, the following may be viewed:

    Content Source: This allows filtering by specific content sources. By default, All in Range is selected.

    Documents Added, Removed, or Updated: Note that the All the above option is selected by default.

  5. If you do not supply any additional filtering criteria, SharePoint Portal Server builds a log by default. According to Microsoft's SPS Resource Kit, this log loads from the most recent log entry to the least recent log entry, starting from the beginning of the log file, from the most recent crawl, or from the maximum number of entries allowed, whichever is fastest.

  6. Click Submit to include your new filter criteria, or click Reset to reload complete information from the logs and to recalculate all statistics.

  7. Wait for the gatherer log to compile a summary of scanned file statistics and information.

Fortunately, to assist us in tuning and configuring SPS, Microsoft has included a number of Performance Monitor counters that we can leverage. One counter in particular, tracked by the Microsoft Gatherer performance object, is Documents Delayed Retry, which is the number of documents that are retried after a time-out. When this number is greater than zero, the Web Storage System on the local server being crawled is shut down.

Another important crawling counter tracked by the Microsoft Gatherer performance object is Threads Accessing Network, which is the number of threads waiting for a response from the filter process. If no activity is occurring and this number equals the number of filtering threads, this may indicate a network problem or unavailability of the server being crawled.
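The two rules of thumb above can be written down as a small check. The following Python sketch is only an illustration: it assumes the counter values have already been read from Performance Monitor by some other means, and the function and parameter names are hypothetical.

```python
def diagnose_gatherer(documents_delayed_retry, threads_accessing_network,
                      filtering_threads, activity_observed):
    """Return a list of warnings derived from the two Microsoft Gatherer counters.

    All arguments are values the caller has already collected; this function
    does not read Performance Monitor itself.
    """
    warnings = []
    # Rule 1: any delayed retries suggest the crawled local server is shut down.
    if documents_delayed_retry > 0:
        warnings.append("Documents Delayed Retry is above zero")
    # Rule 2: no activity while every filtering thread waits on the network
    # may point to a network problem or an unavailable server.
    if (not activity_observed and filtering_threads > 0
            and threads_accessing_network == filtering_threads):
        warnings.append("All filtering threads are waiting on the network")
    return warnings
```

An empty list means neither rule fired; it does not prove the crawl is healthy.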

A host of crawl-based counters are tracked in another object as well, the Microsoft Gatherer Projects performance object. These are identified and described in Table 18.1:

Table 18.1. Microsoft Gatherer Counters

Counter Name | Counter Description
--- | ---
Adaptive Crawl Accepts | Documents accepted by adaptive update.
Adaptive Crawl Error Samples | Documents accessed for error sampling.
Adaptive Crawl Errors | Documents that adaptive update incorrectly rejects.
Adaptive Crawl Excludes | Documents that adaptive update excludes.
Adaptive Crawl False Positives | Number of false positives that occur when the adaptive update has predicted that a document has changed when it has not. If this number is high, the adaptive update algorithm is not modeling the changes in the documents correctly.
Adaptive Crawl Total | Documents to which adaptive update logic was applied.
Changed Documents | Documents that have changed since the last crawl.
Crawls in progress | Number of crawls in progress.
Incremental Crawls | Number of incremental crawls in progress.
Not Modified | Number of documents that were not filtered because no modification was detected since the last crawl.
Started Documents | Number of documents initiated into the Gatherer service. This includes the number of documents on hold, in the active queue, and currently filtered. When this number goes to zero during a crawl, the crawl will be completed.
URLs in History | Number of files (URLs) in the history list. This indicates the total number of URLs covered by the crawl, either successfully indexed or failed.
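Two of the counters in Table 18.1 lend themselves to simple arithmetic. The Python sketch below is a hedged illustration only: the function names are hypothetical, the values are assumed to have been collected already, and the text does not define what counts as a "high" false-positive number, so no threshold is built in.

```python
def adaptive_false_positive_ratio(false_positives, adaptive_total):
    """Share of adaptive-update decisions that were false positives (0.0 to 1.0)."""
    if adaptive_total <= 0:
        return 0.0  # adaptive update logic has not been applied to any document yet
    return false_positives / adaptive_total


def crawl_is_finishing(started_documents, crawls_in_progress):
    """True when a crawl is running and Started Documents has dropped to zero."""
    return crawls_in_progress > 0 and started_documents == 0
```

Expressing Adaptive Crawl False Positives as a ratio of Adaptive Crawl Total makes values comparable between small and large crawls; what ratio is acceptable is left to the administrator.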

TIFF Issues

I'm having problems dealing with TIFF files. How can I correct this?

A1:

As previously discussed, SharePoint Portal Server Setup automatically installs an IFilter for TIFF files. This filter handles both .tif and .tiff file extensions. When crawling TIFF files, SPS only looks at file properties, and this process is quite clean. If optical character recognition (OCR) is enabled, though, SharePoint Portal Server scans the TIFF document and attempts to recognize words and characters such that additional data may be gleaned and included in the index. This process is less than perfect, but quite valuable in many cases.

Should issues arise with TIFF files, a registry key may be updated so that TIFF-based error messages are written to the Application Log (one of three logs accessible via the Microsoft Windows 2000 Event Viewer, also referred to as the Windows 2000 event log). By default, SPS logs TIFF error messages in the gatherer log.

CAUTION

After editing any of the TIFF filter registry keys, the Microsoft Search (MSSearch) service must be restarted. If the SharePoint Portal Server being restarted also serves as an Exchange or SQL Server, keep in mind that restarting the MSSearch service will impact these applications as well.
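Restarting the service can be scripted. The Python sketch below is a minimal illustration, not a procedure from this book: it assumes the service short name is MSSearch (the abbreviation used above), that the standard Windows net stop and net start commands are available, and that the script runs with administrative rights. The function names are hypothetical, and the default is a dry run that only prints the commands.

```python
import subprocess

SERVICE_NAME = "MSSearch"  # assumed short name, taken from the abbreviation in the text


def restart_commands(service=SERVICE_NAME):
    """Build the stop and start commands without running them."""
    return [["net", "stop", service], ["net", "start", service]]


def restart_service(service=SERVICE_NAME, dry_run=True):
    """Print the commands (dry run) or execute them in order on a Windows server."""
    commands = restart_commands(service)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sequence if a command reports failure
            subprocess.run(command, check=True)
    return commands
```

Keeping dry_run as the default reflects the caution above: on a server that also runs Exchange or SQL Server, the restart should be a deliberate step.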


Troubleshooting Crawling a Content Source

I'm having problems crawling content sources.

A1:

If crawling a content source fails, verify the following:

  • Is access denied? If so, has the default content access account expired? If another content access account is being used, is this account still valid? If the account is valid and access is still denied, a permissions issue may exist in terms of accessing or reading the content.

  • Is the file not found? If this is the case, check the URL for the content source. Try accessing the URL from a standard Web browser while logged on as the specified access account, thus verifying at some level whether the URL is valid and the account information is good.

Also, be sure to review the gatherer log as previously discussed. The detailed information it records about the crawl may be of assistance in troubleshooting.
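For Web-based content sources, the two questions above roughly correspond to standard HTTP status codes. The Python sketch below is an illustration under that assumption only; it does not apply to file shares or other non-HTTP sources, and the function name is hypothetical.

```python
def classify_crawl_failure(status_code):
    """Map an HTTP status code to the troubleshooting question it suggests."""
    if status_code in (401, 403):
        return "access denied"   # check the content access account and permissions
    if status_code == 404:
        return "file not found"  # check the URL of the content source
    if 200 <= status_code < 300:
        return "ok"
    return "other"               # review the gatherer log for details
```

The status code itself would come from requesting the URL while logged on as the content access account, as suggested in the second bullet.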

Failure of Crawling a Web Site

I'm having problems crawling a Web site but I have done it in the past. What gives?

A1:

If crawling fails on an Internet Web site where it has worked previously, the time-out settings or the proxy settings may need to be reconfigured. Review these areas, and verify the entries.

Troubles Crawling a Lotus Notes Content Source

If crawling a Lotus Notes content source fails, start troubleshooting by confirming the following possibilities:

  • Has the Lotus Notes protocol handler been configured correctly? If running the Lotus Notes Index Setup Wizard fails for any reason, don't forget to restart MSSearch before running the wizard again!

  • Does the protocol handler need to be reconfigured? Reconfiguration of the protocol handler is required in the event that the Lotus Notes installation has changed. Reconfiguration is also required if Lotus Notes security changes (which may be likely, for example if users are added, changed, or removed in regard to the access ID).

  • If the security mapping has changed, the MSSearch service must be stopped and restarted for the changes to take effect.

  • Has the Lotus Notes administrator changed the port number that the Lotus Notes server uses? If this is the case, any content sources must be fully updated.

  • Does the Lotus Notes server name contain a space? This is a no-no, as SharePoint Portal Server cannot crawl a Lotus Notes server that contains a space in the computer name.

Exchange 5.5 Content Source Crawling Issues

My search results aren't what I expected when crawling Exchange 5.5.

A1:

If crawling an Exchange Server 5.5 content source fails or search results are not as expected, explore the following possibilities:

  • Is Outlook installed on the SharePoint Portal Server computer? If so, is the optional Collaboration Data Objects (CDO) feature (included with Outlook) installed on the server, too? Not only must CDO be installed on the SharePoint Portal Server computer, but it is also recommended that Outlook be the only installed mail client.

  • Does the administrator account specified on the Exchange 5.5 tab of the Properties page of the server node have permissions on the site of the server running Exchange? Ditto for the site configuration containers, too: the administrator account must have permissions on both the site and site configuration containers. SharePoint Portal Server uses the administrator account to verify access when a user searches from the dashboard site.

  • Has the administrator account changed? If this account or password is changed in Windows NT 4.0 or Windows 2000, the account in SharePoint Portal Server Administration must also be updated immediately. Otherwise, if a user executes a search query from the dashboard site that contains one or more Exchange Server 5.5 items in the results, the entire query fails. SPS then simply logs an error in the event log.

  • Do queries continue to fail after the account in SharePoint Portal Server Administration has been changed? Ensure that the MSSearch service has been stopped and restarted. This is required for the change to take effect, and for the queries to actually have a shot at completing correctly.

Proxy Server Issues

I have a proxy server in place and it is causing me problems configuring crawling.

A1:

When using a proxy server and crawling Internet sites, issues may arise that are described in the following scenarios:

  • Does the account being used for the crawl have privileges on the proxy server? The account used to crawl Internet sites must have privileges on the proxy server; otherwise, crawling Internet sites is impossible.

    TIP

    If the default content access account is being used, try to access the URL with Internet Explorer while simply logged on as the default content access account.


  • Does crawling content on an Internet site fail? When crawling Internet sites, SharePoint Portal Server first tries to use the default content access account. If that account is not configured, SharePoint Portal Server tries to use Anonymous. In either case, crawling fails unless the site allows access.

Troubleshooting the Impact of Power Failures During Crawling

We had a major power failure. Now I have to resume crawling and I'm having a lot of problems. What should I do?

A1:

If power to the server is interrupted during a crawl or update, the crawl or update continues after power is restored. First, though, the index is displayed in the "Initializing" state for a period of time (ranging from seconds to perhaps hours, depending on the size of the crawl). The crawl resumes after initialization finishes. The index remains available for queries during this time.

Specifically, the following status messages may be displayed for the update once power is restored to the server:

Table 18.2. Index Status Messages

Status | Definition
--- | ---
Compiling | MSSearch is assembling the index.
Flushing | MSSearch is assembling the index at the end of a search/run.
Idle | No update of the index is in progress.
Indexing | An update of the index is in progress. MSSearch searches content to update the index by following links contained in documents, or by following directory trees in a file system or other hierarchical storage systems such as Lotus Notes databases, Exchange servers, and other SharePoint Portal Server computers.
Initializing | The server is loading the index.
Paused | The update is paused.
Processing notifications | MSSearch has received one or more notifications. MSSearch receives one notification per document. When processing notifications, the server extracts the properties and contents of each document and adds them to the index. Processing notifications occurs when the notifications queue is not empty and no crawls are in progress.
Propagating | MSSearch is propagating the index to the server dedicated to searching.
Retrying propagation | MSSearch is trying to propagate the index after a failed attempt.
Shutdown | The index is being deleted, or there is a critical error that is preventing access to the index.

Crawl Issues After Performing an SPS Restore

I've restored my server from backup. Will I have to re-create scheduled content source crawls from Windows 2000 Scheduled Tasks?

A1:

Most SPS implementations leverage scheduled content source updates. However, if a server has been restored from backup, the SPS backup image does not include any scheduled content source crawls from Windows 2000 Scheduled Tasks. Thus, these must be re-created on the restored server.

In addition, any shortcuts to workspaces in My Network Places must also be restored.

For complete information on backup and restore of SPS workspaces, see "Restore Process", p. 333.

As the final step in this process, the SharePoint Portal Server restore process initiates an incremental crawl of the internal content of every workspace. By doing so, consistency between the Web Storage System and the index is guaranteed. Incremental crawls are also initiated when notifications of changes to content sources on file systems occur. This, too, guarantees consistency between the Web Storage System and the index.


                 
Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
EAN: 2147483647
Year: 2002
Pages: 286
