Optimizing Search Results

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 19.  Managing Indexing


SharePoint Portal Server ranks text search results using an algorithm known as probabilistic ranking. The rank depends on the frequency of words not only within a particular document, but also in the overall corpus. For this reason, it is possible to filter out frequently occurring words, so-called noise words. Another feature is the thesaurus, which lets you substitute or expand words so that more matching documents are found.
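To illustrate why corpus-wide frequency matters, here is a minimal sketch using a simple TF-IDF-style weighting. This is not the product's actual ranking formula, which the text does not spell out; the function names and the tiny corpus are hypothetical and serve only to show that a word present in every document contributes nothing to the score.

```python
import math

def idf(term, corpus):
    """Inverse document frequency: terms that are rare in the corpus weigh more."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log((len(corpus) + 1) / (containing + 1))

def score(query, doc, corpus):
    """Sum of (term count in the document) x (corpus-wide weight of the term)."""
    return sum(doc.count(term) * idf(term, corpus) for term in query)

# Hypothetical corpus: every document mentions "windows".
corpus = [
    ["windows", "server", "setup"],
    ["windows", "search", "index"],
    ["windows", "portal", "search"],
]

# "windows" occurs in all documents, so its weight is log(4/4) = 0.
# Only "index" contributes to the score of the second document.
ranked = sorted(corpus, key=lambda d: score(["windows", "index"], d, corpus), reverse=True)
```

In this toy model a word such as "windows" behaves like a noise word: it costs query time but cannot distinguish one document from another.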

CAUTION

Noise words and the thesaurus apply to all workspaces on a single server. If the index is propagated to another system, you should apply the same settings not only on the target server, but also on all the other servers that propagate their indices to that same target server. Otherwise you may notice inconsistencies in ranking.


Customizing Noise Words

Noise words are very common words that are filtered out of both the query and the index because they would match many documents. The list is clearly language dependent, but also subject dependent. On a Microsoft technologies portal, for example, the word "Windows" is likely to appear in almost every document. You can therefore edit the list of noise words to add or remove words. The thesaurus, by contrast, is initially empty, which is understandable because the substitution or expansion of words is highly dependent on the subject. On our technologies portal, MS would get expanded to Microsoft, whereas on a medical portal it would probably refer to the chronic disease multiple sclerosis.

To modify the list of noise words, do the following:

  1. Log in as Administrator on the server running SharePoint Portal Server.

  2. Go to the directory where the SharePoint Portal Server Property Store is located. By default this is in C:\Program Files\SharePoint Portal Server\Data\FTData\SharePointPortalServer\.

  3. Go to the Config subdirectory.

  4. Select the appropriate noise text file depending on your language. All files begin with "noise" and have a three-letter language postfix. For any language that is not supported out-of-the-box by SharePoint Portal Server, the neutral noise word file noiseneu.txt will be used.

  5. Make a backup copy of your selected noise file before you make any modifications.

  6. Open the file. By default, Notepad opens it, because the noise word files have the extension .txt.

  7. Make your changes, keeping each word on a single line.

  8. When you are done you need to re-index each workspace on the server. To do so, open Programs, Administrative Tools, SharePoint Portal Server Administration.

  9. Select each workspace node and click Action, All Tasks, Start Full Update.
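Steps 5 through 7 above can be sketched as a small script. This is a minimal sketch under stated assumptions: the function name add_noise_words is hypothetical, and the file encoding is an assumption (check how your noise file is actually encoded before writing to it). The comparison is deliberately case-sensitive. You still need to run the full update from steps 8 and 9 afterwards.

```python
import shutil
from pathlib import Path

def add_noise_words(noise_file, new_words, encoding="utf-8"):
    """Back up a noise word file, then append words that are not yet listed.

    The encoding default is an assumption; pass the encoding your file uses.
    Returns the list of words that were actually appended.
    """
    path = Path(noise_file)
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # step 5: back up before modifying

    text = path.read_text(encoding=encoding)
    known = set(text.split())   # case-sensitive set of existing words
    added = []
    for word in new_words:
        if word not in known:
            known.add(word)
            added.append(word)

    # Step 7: keep each word on a single line.
    lines = text.splitlines() + added
    path.write_text("\n".join(lines) + "\n", encoding=encoding)
    return added

# Example call (path taken from step 2; adjust to your installation):
# add_noise_words(r"C:\Program Files\SharePoint Portal Server\Data\FTData"
#                 r"\SharePointPortalServer\Config\noiseneu.txt", ["Windows"])
```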

NOTE

When you start a full index, all index information is removed and rebuilt to ensure that only properly linked information is stored. As a result, all users who have subscribed to changes are notified, even though nothing in the actual content changed. By changing some settings in the Windows registry, you can disable notifications while a full index is taking place. See later in this chapter for more information.


Customizing the Thesaurus

The thesaurus is another language-dependent feature available in SharePoint Portal Server. The thesaurus allows the substitution or expansion of words. This feature allows you, for example, to substitute the term "IE" with the term "Internet Explorer".

To modify the thesaurus, do the following:

  1. Log in as Administrator on the server running SharePoint Portal Server.

  2. Go to the directory where the SharePoint Portal Server Property Store is located. By default this is in C:\Program Files\SharePoint Portal Server\Data\FTData\SharePointPortalServer\.

  3. Go to the Config subdirectory.

  4. Select the appropriate thesaurus XML file depending on your language. All files begin with "ts" and have a three-letter language postfix. For any language that is not supported out-of-the-box by SharePoint Portal Server, the neutral thesaurus file tsneu.xml will be used.

  5. Open the file with your favorite XML editing tool, for example Notepad (see Figure 19.2).

    Figure 19.2. The figure shows an opened example thesaurus file.


  6. Make your changes. You will see a commented example (XML comments begin with <!-- and end with -->) of the expected XML syntax in the opened XML file.

  7. Verify that you have written valid XML, for example by opening the thesaurus file in Internet Explorer. An invalid XML file causes an error message in the Windows Application log when the thesaurus is loaded with the first query request.

  8. Restart the Microsoft Search Service.
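The check in step 7 can also be scripted. The sketch below only tests whether the text is well-formed XML; it does not validate it against any thesaurus schema. The function name check_well_formed is hypothetical, and the element names in SAMPLE (expansion, replacement, pat, sub) are an assumption about the thesaurus format, so treat the commented example inside your own thesaurus file as the authority.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

# Assumed layout, for illustration only: one expansion set and one replacement.
SAMPLE = """<XML ID="Microsoft Search Thesaurus">
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub>SPS</sub>
      <sub>SharePoint Portal Server</sub>
    </expansion>
    <replacement>
      <pat>IE</pat>
      <sub>Internet Explorer</sub>
    </replacement>
  </thesaurus>
</XML>"""

ok, error = check_well_formed(SAMPLE)
```

To check a file on disk, read it first and pass its contents to check_well_formed, then restart the Microsoft Search service only if the check succeeds.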

Figure 19.2 shows the US English thesaurus file (tsenu.xml), in which the terms "SPS" and "SharePoint Portal Server" can be used synonymously. Some people also refer to SharePoint Portal Server as "SPPS", a usage that Microsoft discourages. Documents should not contain SPPS; if they do, they will not be found unless the user encloses the term in double quotes.

NOTE

Thesaurus entries (and noise word entries) are case-sensitive and accent-sensitive, whereas words in the index are not stored with case or accent variations. For example, if you add the word SharePoint to the thesaurus and someone searches for Sharepoint (with a lowercase p), the thesaurus will not be applied. To get the expected results, add thesaurus (and noise word) list entries for all common case variations of a word.
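A small helper can generate the case variations this note asks for. This is only a sketch: the function name case_variants is hypothetical, and it covers just four common forms (as written, all lowercase, all uppercase, first letter capitalized). Accent variations are not handled.

```python
def case_variants(word):
    """Return common case variations of word, original first, without duplicates."""
    candidates = [word, word.lower(), word.upper(), word.capitalize()]
    variants = []
    for candidate in candidates:
        if candidate not in variants:
            variants.append(candidate)
    return variants

# One entry per line, ready to paste into a noise word file
# or to turn into thesaurus entries.
entries = "\n".join(case_variants("SharePoint"))
```

For "SharePoint" this yields SharePoint, sharepoint, SHAREPOINT, and Sharepoint, which includes the lowercase-p spelling mentioned above.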


TIP

To find out which thesaurus files are used, you can check the Windows Application Event log. To produce the log entry, restart the Microsoft Search service first. Then issue a search request from a browser whose language is set to the one you want to check. Look for informational entries from the MssCi source with the ID 4155.


Maximum Number of Search Results

Out of the box, SharePoint Portal Server returns at most the 200 most relevant documents that match a user's query. To improve query performance, you may wish to lower this limit, while in other deployment scenarios you might need to return more. The maximum number is maintained in the registry.

If you feel comfortable with registry updates, you may modify the value. Check the key HKLM\Software\Microsoft\Search\1.0\Applications\SharePoint Portal Server\Catalogs\<workspace name>, where you will find a DWORD value MaxResultRows. The value is set by default to 0xc8, which equals 200.
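Before changing the value, it can help to read it. The following is a read-only sketch that uses Python's standard winreg module, which is available on Windows only. The function names and the workspace name "Docs" are hypothetical; the key path and the value name MaxResultRows are taken from the text above.

```python
CATALOGS_KEY = (
    r"Software\Microsoft\Search\1.0\Applications"
    r"\SharePoint Portal Server\Catalogs"
)

def catalog_key(workspace):
    """Build the registry subkey path (below HKLM) for one workspace."""
    return CATALOGS_KEY + "\\" + workspace

def read_max_result_rows(workspace):
    """Read the MaxResultRows DWORD for a workspace. Works on Windows only."""
    import winreg  # imported here so the module still loads on other systems
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, catalog_key(workspace)) as key:
        value, _value_type = winreg.QueryValueEx(key, "MaxResultRows")
    return value

# The default mentioned in the text: hexadecimal 0xc8 is decimal 200.
DEFAULT_MAX_RESULT_ROWS = 0xC8

# Example call on the server ("Docs" is a hypothetical workspace name):
# print(read_max_result_rows("Docs"))
```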

NOTE

Generally this kind of update is only of interest if you write your own solution. If you want to retrieve more than 200 results in the portal, you need to increase some global variables used by the out-of-the-box Web Parts.


Adjusting the Query Time-Out

For each workspace, you can limit the time that a query may take. Such a limit releases the server processes consumed by unsuccessful queries, instead of letting them wait and keep the server busy.

To adjust these settings, do the following:

  1. Log in as Workspace Coordinator.

  2. Open the Management Web Folder of your workspace.

  3. Select Workspace Settings.

  4. Select the Index tab.

  5. Specify the query time-out in milliseconds. The default value is 20,000 milliseconds (20 seconds).




Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
Year: 2002
Pages: 286
