When MSSearch crawls a document and includes it in an index, it goes through the following series of steps to apply the proper language resources to that document:
A single installation of MSSearch can crawl documents containing content in multiple languages. The resultant index is language independent and contains words in any language without differentiation.
However, to ensure that SharePoint Portal Server crawls content in multiple languages efficiently and successfully, you must consider the following topics:
These actions extend to all content included in the index. Although MSSearch can include content stored across multiple servers in the index, it does not use the default language of the server it is crawling. Instead, if a document does not carry LCID properties, MSSearch uses the language setting of the server on which MSSearch is installed. If your deployment includes a server dedicated to creating and maintaining indexes, this behavior can dramatically affect the content that MSSearch returns for a search query.
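The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the decision, not MSSearch code: the function name and its parameters are hypothetical, and the LCID values are the standard Windows identifiers for English (United States), French (France), and Japanese.

```python
from typing import Optional

EN_US = 1033  # LCID for English (United States)
FR_FR = 1036  # LCID for French (France)
JA_JP = 1041  # LCID for Japanese


def resolve_document_lcid(document_lcid: Optional[int],
                          index_server_lcid: int,
                          crawled_server_lcid: Optional[int] = None) -> int:
    """Return the LCID whose language resources apply to a document.

    Hypothetical helper. The language of the server being crawled is
    accepted as a parameter only to show that it is never consulted.
    """
    if document_lcid is not None:
        # The document carries its own LCID property, so it wins.
        return document_lcid
    # No LCID on the document: fall back to the server running MSSearch.
    return index_server_lcid


# A document without an LCID, stored on a Japanese server and crawled
# by an English index server, is treated as English.
print(resolve_document_lcid(None, EN_US, crawled_server_lcid=JA_JP))
```

In a deployment with a dedicated index server, this means that the language setting of that one server applies to every document that lacks LCID properties, regardless of where the document is stored.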
The language specified on the server hosting the index workspace determines how MSSearch includes content in the index. In addition, SharePoint Portal Server installs noise word lists for all languages when it creates an index. You can modify these lists to add and remove terms, but any changes take effect only for subsequent crawls. For the changes to take effect for all documents, you must reset the index and have MSSearch perform a full crawl.
Changes to the noise word list are not reflected in the index until the index is rebuilt. For example, if you add the word "Microsoft" to the noise word list, the search engine continues to return results containing "Microsoft" until SharePoint Portal Server performs a full update of the index. If you remove a term from the noise word list, you must follow the same steps.
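The following toy model illustrates why an edited noise word list has no visible effect until the index is rebuilt. It is a simplified sketch, not the way MSSearch stores its index: the ToyIndex class, its methods, and the sample documents are all hypothetical, and words are split on white space only.

```python
class ToyIndex:
    """Hypothetical in-memory index that filters noise words at crawl time."""

    def __init__(self, documents, noise_words):
        self.documents = dict(documents)
        self.noise_words = {w.lower() for w in noise_words}  # editable list
        self.postings = {}
        self.rebuild()

    def rebuild(self):
        """Stand-in for resetting the index and running a full crawl."""
        self.postings = {}
        for doc_id, text in self.documents.items():
            for word in text.lower().split():
                # Noise words are dropped only while the index is built.
                if word not in self.noise_words:
                    self.postings.setdefault(word, set()).add(doc_id)

    def search(self, term):
        """Return the sorted IDs of documents indexed under the term."""
        return sorted(self.postings.get(term.lower(), set()))


index = ToyIndex({"doc1": "Microsoft ships search", "doc2": "search tips"},
                 noise_words={"the"})

index.noise_words.add("microsoft")        # edit the noise word list
stale_hits = index.search("Microsoft")    # existing entries are untouched

index.rebuild()                           # full update of the index
fresh_hits = index.search("Microsoft")    # the term is now filtered out
```

In this sketch, stale_hits still contains doc1 because the entry was written before the list changed, while fresh_hits is empty after the rebuild.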
SharePoint Portal Server includes word breakers for the following languages: Japanese, Simplified Chinese, Traditional Chinese, Korean, Thai, English, Spanish, French, German, Italian, Swedish, and Dutch. If you install SharePoint Portal Server on a server whose language is not in this list, MSSearch uses the neutral word breaker. The neutral word breaker derives from the English word breaker. Therefore, the neutral word breaker works best when applied to documents written in western European languages.
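The selection rule above amounts to a lookup with a default. The sketch below assumes that languages are identified by the names used in the list; the function and constant names are hypothetical and do not correspond to any MSSearch interface.

```python
# Languages for which SharePoint Portal Server includes a word breaker,
# as listed above.
SUPPORTED_WORD_BREAKERS = frozenset({
    "Japanese", "Simplified Chinese", "Traditional Chinese", "Korean",
    "Thai", "English", "Spanish", "French", "German", "Italian",
    "Swedish", "Dutch",
})

NEUTRAL = "Neutral"  # label chosen here for the neutral word breaker


def select_word_breaker(server_language: str) -> str:
    """Return the word breaker used for the given server language."""
    if server_language in SUPPORTED_WORD_BREAKERS:
        return server_language
    # Any other language falls back to the neutral word breaker.
    return NEUTRAL


print(select_word_breaker("Thai"))
print(select_word_breaker("Polish"))
```

Polish is used here only as an example of a language that is not in the list.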
Shared Service: MSSearch is a server-based, shared service. This means that all installations of SharePoint Portal Server and Microsoft SQL Server on the same computer use the same version of MSSearch to create indexes and to perform search queries. However, MSSearch creates individual indexes for each application. The service is shared but the data is independent. You can categorize MSSearch and its resources into shared resources and index-specific resources:
Before you install any application that uses MSSearch on the same server as SharePoint Portal Server, read the documentation included with that application. This helps ensure that, when MSSearch is updated, each application uses the correct word breaker for creating indexes and for search queries.
If you install other applications that use MSSearch, you must ensure that each application uses the same word breaker for creating indexes and for search queries. If you install a newer version of a word breaker, you must reset the index and have MSSearch conduct a full crawl of your content.
SharePoint Portal Server contains the most current version of MSSearch. When you install MSSearch, setup checks the version of the word breakers and always keeps the latest version. So, if you install SQL Server 2000 on a computer running SharePoint Portal Server, MSSearch retains the word breakers from SharePoint Portal Server.
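The rule that setup always keeps the latest word breaker can be modeled as a comparison of version numbers. The version tuples below are invented for illustration and are not the actual version numbers of any word breaker; the function name is also hypothetical.

```python
def retained_version(installed, incoming):
    """Return the version that remains after setup runs.

    Versions are tuples of integers, which Python compares element by
    element, so the later version is simply the larger tuple.
    """
    return max(installed, incoming)


# Hypothetical versions: the copy already on the server is newer than
# the copy that the second product ships with.
sps_breaker = (2, 0)
sql_breaker = (1, 5)

# Installing the older copy leaves the newer one in place.
print(retained_version(sps_breaker, sql_breaker))
```

Because the comparison is symmetric, the order in which the two products are installed does not change which version is retained in this model.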