MSSearch Overview

MSSearch provides a set of components that work together to provide full-text access to collections of data. For a detailed overview of MSSearch architecture and technologies, see Chapter 5, Introducing Microsoft Full-Text Search Technologies.

The process for providing full-text access to content is as follows:

  • For a content source, the Gatherer instantiates a Filter Daemon that contains protocol handlers and filters.
  • The protocol handlers open the content source in its native format, making it readable by MSSearch. MSSearch then retrieves the content and makes it accessible to the Filter.
  • The Filter takes a unit of information, such as a document, and emits a stream of Unicode characters that represents both the properties and the content of the document. The Filter returns this stream to MSSearch, where word breakers then process it.
  • Word breakers apply language-specific linguistic rules to separate the Unicode stream emitted by the filters into individual words. These words are stored in the index.
  • After SharePoint Portal Server creates the index, users can retrieve information by issuing queries through the search engine.
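The steps above can be sketched as a toy Python pipeline. This is a minimal illustration under stated assumptions, not the MSSearch implementation: the function names (`protocol_handler`, `filter_document`, `word_breaker`, `build_index`, `search`) are hypothetical, the content source is an in-memory dictionary, the filter emits only a title property plus the body text, and the word breaker applies one simplistic rule instead of language-specific linguistics.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the MSSearch components; none of these
# functions are part of a real MSSearch or SharePoint API.

def protocol_handler(source):
    """Open the content source and yield (doc_id, document) pairs.
    Assumption: the source is an in-memory dict rather than a real store."""
    for doc_id, doc in source.items():
        yield doc_id, doc

def filter_document(doc):
    """Emit one Unicode string holding a property (the title) and the content."""
    return f"{doc['title']}\n{doc['body']}"

def word_breaker(text):
    """Split text into lowercase words using one simplistic rule:
    a word is a run of word characters or apostrophes."""
    return [w.lower() for w in re.findall(r"[\w']+", text)]

def build_index(source):
    """Build an inverted index mapping each word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in protocol_handler(source):
        for word in word_breaker(filter_document(doc)):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Break the query with the same word breaker and return the documents
    that contain every query word."""
    terms = word_breaker(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "doc1": {"title": "Budget", "body": "Quarterly sales report"},
    "doc2": {"title": "Memo", "body": "Notes from the sales meeting"},
}
index = build_index(docs)
print(sorted(search(index, "sales")))         # ['doc1', 'doc2']
print(sorted(search(index, "Sales report")))  # ['doc1']
```

Routing the query through the same `word_breaker` used at index time is what lets "Sales" in the query line up with "sales" stored in the index.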

SharePoint Portal Server also applies word breakers to query terms, breaking them into individual words. The search engine then expands the query to include the full list of terms that the word breaker produces. MSSearch must use the same word breaker at both index time and query time so that the same linguistic rules apply to the content and to the query. Otherwise, query terms may not match the words stored in the index, and relevant results can be missed.
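To see why the same word breaker matters, the following Python sketch indexes a sentence with one hypothetical rule (hyphens split words) and then queries it with both that rule and a different hypothetical rule (hyphenated words stay whole). Neither rule is taken from an actual MSSearch word breaker; the function names are illustrative only.

```python
import re

# Two hypothetical word-breaking rules that disagree about hyphens.
def break_on_hyphens(text):
    """Treat hyphens as separators: "e-mail" -> ["e", "mail"]."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keep_hyphens(text):
    """Keep hyphenated words whole: "e-mail" -> ["e-mail"]."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def matches(indexed_words, query, breaker):
    """True if every word the breaker produces from the query is in the index."""
    return all(term in indexed_words for term in breaker(query))

# Index time: the document is broken with break_on_hyphens.
indexed = set(break_on_hyphens("Send the e-mail to the team"))

print(matches(indexed, "e-mail", break_on_hyphens))  # True: same rules at both times
print(matches(indexed, "e-mail", keep_hyphens))      # False: "e-mail" was never indexed
```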

Stemming also occurs at query time. Stemming maps a linguistic stem to all of the words that share it, which increases the number of relevant results. For example, in English, the stem "buy" matches "bought," "buying," and "buys."
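Query-time stemming can be sketched in Python as expanding a query term to every form that shares its stem and then combining the index entries for those forms. This is a minimal illustration only: the hand-written `INFLECTIONS` table and the `expand_term` and `stemmed_search` functions are hypothetical and stand in for the language-specific stemmer that MSSearch actually uses.

```python
# Hypothetical inflection table. A real stemmer derives these forms from
# language-specific rules; it does not rely on a hand-written map like this.
INFLECTIONS = {
    "buy": {"buy", "bought", "buying", "buys"},
}

# Reverse map so that any listed form leads back to its stem.
STEM_OF = {form: stem for stem, forms in INFLECTIONS.items() for form in forms}

def expand_term(term):
    """Return every form that shares the term's stem, or the term alone if unknown."""
    word = term.lower()
    stem = STEM_OF.get(word)
    return set(INFLECTIONS[stem]) if stem is not None else {word}

def stemmed_search(index, term):
    """Union the index entries of every expanded form of the query term."""
    hits = set()
    for form in expand_term(term):
        hits |= index.get(form, set())
    return hits

# Toy inverted index: word -> IDs of documents containing it.
index = {"bought": {"doc1"}, "buys": {"doc2"}, "sell": {"doc3"}}
print(sorted(stemmed_search(index, "buying")))  # ['doc1', 'doc2']
print(sorted(stemmed_search(index, "sell")))    # ['doc3']
```

A query for "buying" finds documents that contain only "bought" or "buys", which is the increase in relevant results that stemming is meant to provide.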

SharePoint Portal Server provides noise word files and thesaurus files that you can customize to account for language usage specific to your organization. A noise word is a word, such as "the" or "an," that is statistically unimportant for full-text search queries and therefore not useful in searches. The noise words for each language are stored in that language's noise word file. SharePoint Portal Server provides noise word and thesaurus files for the following languages: Chinese-Simplified, Chinese-Traditional, Dutch, English-International, English-US, French, German, Italian, Japanese, Korean, Spanish, Swedish, and Thai. The thesaurus files are empty by default; you must populate them with words and concepts that are relevant to your deployment.
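The following Python sketch shows how a noise word list and a thesaurus might be applied to query terms. It assumes that the noise word file contains whitespace-separated words and represents the thesaurus as a plain dictionary of hypothetical synonyms; the actual SharePoint Portal Server file formats are not reproduced here, and the function names are illustrative only. Passing an empty dictionary models the default, unpopulated thesaurus.

```python
def load_noise_words(file_text):
    """Parse noise word file contents.
    Assumption: words are separated by whitespace (for example, one per line)."""
    return {word.lower() for word in file_text.split()}

def remove_noise(terms, noise_words):
    """Drop statistically unimportant words such as "the" or "an"."""
    return [term for term in terms if term.lower() not in noise_words]

def expand_with_thesaurus(terms, thesaurus):
    """Return one set of alternatives per term: the term plus any synonyms."""
    return [{term} | thesaurus.get(term, set()) for term in terms]

# Hypothetical deployment-specific entries; the shipped thesaurus is empty.
THESAURUS = {
    "invoice": {"bill"},
    "bill": {"invoice"},
}

noise = load_noise_words("a\nan\nthe\nof\n")
terms = remove_noise(["the", "invoice", "of", "march"], noise)
print(terms)                                    # ['invoice', 'march']
print(expand_with_thesaurus(terms, THESAURUS))
print(expand_with_thesaurus(terms, {}))         # [{'invoice'}, {'march'}]
```

With an empty thesaurus each term matches only itself, which is why the thesaurus has no effect until you add entries for your deployment.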

For more information about how to customize MSSearch, see Appendix B, For More Information.



Microsoft SharePoint Portal Server 2001 Resource Kit
ISBN: 0735615624
Year: 2001
Pages: 231
