MSSearch provides a series of components that work together to provide full-text access to collections of data. For a detailed overview of MSSearch architecture and technologies, see Chapter 5, Introducing Microsoft Full-Text Search Technologies.
The process for providing full-text access to content is as follows:
SharePoint Portal Server applies word breakers to query terms to break them into individual words. The search engine then expands the query to include the resulting list of terms. MSSearch must use the same word breaker at both index time and query time so that the same linguistic rules apply to documents and queries, and the query returns the most relevant results.
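The following sketch illustrates why index-time and query-time word breaking must match. It is not how MSSearch is implemented: the function names `break_words`, `break_on_whitespace`, `build_index`, and `search` are hypothetical, and both word breakers are deliberately simplistic stand-ins for the language-specific word breakers the product uses.

```python
import re

# Hypothetical, greatly simplified word breaker: lowercases the text and
# splits on any run of characters that are not ASCII letters or digits.
def break_words(text: str) -> list[str]:
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]

# Hypothetical alternative breaker that splits only on whitespace, used to
# show what happens when index-time and query-time rules differ.
def break_on_whitespace(text: str) -> list[str]:
    return text.lower().split()

def build_index(docs: dict[str, str], breaker) -> dict[str, set[str]]:
    """Build a toy inverted index that maps each word to a set of document ids."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in breaker(text):
            index.setdefault(word, set()).add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str, breaker) -> set[str]:
    """Return ids of documents that contain every query word (AND semantics)."""
    words = breaker(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {"d1": "Full-text search, for SharePoint.", "d2": "Text files only."}
index = build_index(docs, break_words)

# Same breaker at index and query time: "Full-Text" becomes ["full", "text"].
print(sorted(search(index, "Full-Text", break_words)))          # ['d1']
# Different breaker at query time: the token "full-text" was never indexed.
print(sorted(search(index, "Full-Text", break_on_whitespace)))  # []
```

The second query finds nothing even though document `d1` is clearly relevant, because the query produced a token that the index-time word breaker never emitted.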
Stemming also occurs at query time. Stemming is a method of mapping a linguistic stem to all words that share that stem, which increases the number of relevant results. For example, in English, the stem "buy" matches "bought," "buying," and "buys."
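A minimal sketch of query-time stem expansion follows. It assumes a small, hypothetical inflection table (`INFLECTIONS`) in place of the linguistic data a real stemmer relies on; a table is used because an irregular form such as "bought" cannot be derived from "buy" by simple suffix rules. The helper names are illustrative only.

```python
# Hypothetical inflection table standing in for a language-specific stemmer.
INFLECTIONS: dict[str, list[str]] = {
    "buy": ["buy", "bought", "buying", "buys"],
    "run": ["run", "ran", "running", "runs"],
}

# Reverse map so that any inflected form can be traced back to its stem.
STEM_OF: dict[str, str] = {
    form: stem for stem, forms in INFLECTIONS.items() for form in forms
}

def expand_term(term: str) -> list[str]:
    """Expand a query term to every word that shares its stem.

    Terms with no known stem are returned unchanged (lowercased).
    """
    stem = STEM_OF.get(term.lower())
    if stem is None:
        return [term.lower()]
    return list(INFLECTIONS[stem])

def matches(document: str, term: str) -> bool:
    """True if the document contains any word that shares the term's stem."""
    words = set(document.lower().split())
    return any(form in words for form in expand_term(term))

print(expand_term("buys"))                       # ['buy', 'bought', 'buying', 'buys']
print(expand_term("search"))                     # ['search']
print(matches("She bought a house", "buying"))   # True
```

A query for "buying" thus retrieves a document that only contains "bought", which is the increase in relevant results that stemming is meant to provide.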
SharePoint Portal Server provides noise word files and thesaurus files that can be customized to account for specific language differences within your organization. A noise word is a word, such as "the" or "an", that is statistically unimportant for full-text search queries and therefore not useful for searches. The list of noise words for a particular language is stored in the noise word file for that language. SharePoint Portal Server provides noise word and thesaurus files for the following languages: Chinese-Simplified, Chinese-Traditional, Dutch, English-International, English-US, French, German, Italian, Japanese, Korean, Spanish, Swedish, and Thai. The thesaurus file is empty by default; you must populate it with the words and concepts that are relevant to your deployment.
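The sketch below shows, in simplified form, how a noise word list and a thesaurus might be applied to a query. It assumes a one-word-per-line noise list embedded as a string; the actual file formats shipped with SharePoint Portal Server may differ, and the thesaurus entry and function names are hypothetical examples rather than product behavior.

```python
# Hypothetical noise word list, assumed to be one word per line. The real
# SharePoint Portal Server files may use a different format.
NOISE_WORDS_TEXT = """\
the
an
a
of
"""

def load_noise_words(text: str) -> set[str]:
    """Parse a one-word-per-line list, ignoring blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}

# The thesaurus starts empty, mirroring the default described above; an
# administrator adds entries relevant to the deployment. This entry is a
# hypothetical example.
thesaurus: dict[str, list[str]] = {}
thesaurus["sps"] = ["sharepoint portal server"]

def prepare_query(query: str, noise: set[str],
                  synonyms: dict[str, list[str]]) -> list[str]:
    """Drop noise words, then append thesaurus expansions for each remaining term."""
    terms: list[str] = []
    for word in query.lower().split():
        if word in noise:
            continue  # statistically unimportant, so excluded from the query
        terms.append(word)
        terms.extend(synonyms.get(word, []))
    return terms

noise = load_noise_words(NOISE_WORDS_TEXT)
print(prepare_query("the SPS architecture", noise, thesaurus))
# ['sps', 'sharepoint portal server', 'architecture']
```

With an empty thesaurus, the same function only removes noise words, which is why the thesaurus must be populated before it has any effect on query results.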
For more information about how to customize MSSearch, see Appendix B, For More Information.