General Overview

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 18. Configuring SPS to Crawl Other Content Sources


One way that portals like SharePoint provide great value to their users is by facilitating search. Sometimes the documents being searched reside locally on the portal, and sometimes they reside outside it. Because it is impractical to physically copy every document an organization might use to the portal's "local" disk subsystem, the concept of crawling evolved. Crawling allows content stored both locally and on remote systems to be indexed locally, so that a search for a particular word, phrase, or metadata field can be answered simply by searching one or more indexes maintained on the local SharePoint server.
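The idea of indexing local and remote content in one place can be illustrated with a minimal Python sketch. This is not how SharePoint Portal Server builds its indexes internally; the document locations, the sample text, and the helper names (`tokenize`, `build_index`, `search`) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical content gathered by a crawl: one local document and two
# remote ones. The locations and text are invented for this sketch.
DOCUMENTS = {
    "local://portal/policies/travel.doc": "Travel policy for all employees",
    "file://fileserver/share/budget.xls": "Annual budget and travel forecast",
    "http://intranet.example.com/news.htm": "Company news and staff events",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of locations whose text contains it."""
    index = defaultdict(set)
    for location, text in documents.items():
        for word in tokenize(text):
            index[word].add(location)
    return index


def search(index, query):
    """Return the locations that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(DOCUMENTS)
for location in sorted(search(index, "travel")):
    print(location)
```

Once the index exists, a query touches only the locally held index, never the remote systems themselves, which is the point the paragraph above makes.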

Regardless of how "wide" or "deep" a portal crawls, the data indexed is only as valuable as the taxonomy, or structure, applied to that data. That is, the less detailed the taxonomy, the less valuable the results of a search become. Real-world examples abound. Anyone who has performed a search on one of the large Internet portals knows the frustration of getting 100,000 hits on even relatively detailed searches. The taxonomy supporting the search often fails to meet our needs, even though the crawled data and resulting indexes are quite detailed.

To read more about the role and importance of taxonomies, see "The Goal of Developing a Taxonomy," p. 248.

The importance of taxonomies is covered in greater detail in Chapter 9. Here, though, we will focus on how to configure and crawl various content sources. We will also address SPS functional considerations, performance considerations, troubleshooting, and more, from a crawling perspective. Thus, before we go further, it's important to note the following general crawl objectives:

  • To crawl specific sites to compile a full-text index of site content, and thus improve the searchability of the data

  • To tag content pages with metadata, and then index this data, to help further refine the results from a search
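How these two objectives work together can be sketched in Python: a full-text match finds candidate pages, and metadata tags narrow the result. The `IndexedPage` type, the sample pages, the metadata keys, and the `refine_search` helper are all hypothetical and are not part of any SharePoint Portal Server API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedPage:
    """A crawled page with its text and metadata tags (illustrative only)."""
    url: str
    text: str
    metadata: dict = field(default_factory=dict)


# Invented sample data standing in for crawled and tagged content pages.
PAGES = [
    IndexedPage("http://intranet.example.com/hr/leave.htm",
                "Leave policy and holiday schedule",
                {"department": "HR", "type": "policy"}),
    IndexedPage("http://intranet.example.com/it/security.htm",
                "Password policy and account lockout rules",
                {"department": "IT", "type": "policy"}),
    IndexedPage("http://intranet.example.com/it/news.htm",
                "New policy announcements for this quarter",
                {"department": "IT", "type": "news"}),
]


def refine_search(pages, term, **required_metadata):
    """Return URLs whose text contains term and whose metadata matches."""
    term = term.lower()
    hits = []
    for page in pages:
        if term not in page.text.lower():
            continue  # fails the full-text match
        if all(page.metadata.get(key) == value
               for key, value in required_metadata.items()):
            hits.append(page.url)
    return hits


# A full-text search alone returns every page that mentions "policy";
# adding metadata constraints narrows the result set.
print(refine_search(PAGES, "policy"))
print(refine_search(PAGES, "policy", department="IT", type="policy"))
```

In this sketch the text match alone returns all three pages, while the metadata constraints reduce the result to the single IT policy page.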

In the next few pages, we drill down into the process that Microsoft SharePoint Portal Server uses to crawl content.

For a basic review on crawling, see "How to Crawl a Web Server," p. 175.


                 


Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
Year: 2002
Pages: 286
