Configuring the Thesaurus and Noise Word Files


There are several ways to configure the result set that your end users receive. The thesaurus and the noise word file are the two features most commonly used for this purpose. This section covers the most common ways to configure them.

Configuring the Thesaurus

The thesaurus lets you manually force the expansion or replacement of query terms as the query is executed against the index. You define expansion sets and replacement sets, and you can also weight or stem the terms within those sets.

The thesaurus is configured via an XML file, which is located by default in the drive:\Program Files\Office SharePoint Server\Data\ directory. The file is named TS<XXX>.XML, where XXX is the standard three-letter code for a specific language. For English, the file name is Tsenu.xml.

The default contents of the file are as follows:

    <XML>
    <!--  Commented out
        <thesaurus xmlns="x-schema:tsSchema.xml">
        <diacritics_sensitive>0</diacritics_sensitive>
            <expansion>
                <sub>Internet Explorer</sub>
                <sub>IE</sub>
                <sub>IE5</sub>
            </expansion>
            <replacement>
                <pat>NT5</pat>
                <pat>W2K</pat>
                <sub>Windows 2000</sub>
            </replacement>
            <expansion>
                <sub>run</sub>
                <sub>jog</sub>
            </expansion>
        </thesaurus>
    -->
    </XML>
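To make the difference between the two kinds of sets concrete, the following Python sketch models how a single query term might be rewritten using the sets from the default file. It is a conceptual illustration only, not the search service's implementation: the names EXPANSIONS, REPLACEMENTS, and rewrite_term are hypothetical, and case-insensitive matching is an assumption.

```python
# Conceptual model only; names and matching rules are assumptions for illustration.

# Each inner list mirrors one <expansion> set from the default thesaurus file.
EXPANSIONS = [
    ["Internet Explorer", "IE", "IE5"],
    ["run", "jog"],
]

# Each key mirrors a <pat> entry; the value is the <sub> that replaces it.
REPLACEMENTS = {
    "NT5": "Windows 2000",
    "W2K": "Windows 2000",
}


def rewrite_term(term):
    """Return the list of terms searched for in place of one query term."""
    key = term.lower()
    # Replacement: a matching pattern is swapped out and the original term is dropped.
    for pattern, substitute in REPLACEMENTS.items():
        if pattern.lower() == key:
            return [substitute]
    # Expansion: a matching member pulls in every member of its set.
    for group in EXPANSIONS:
        if key in [member.lower() for member in group]:
            return list(group)
    # Terms that appear in no set are passed through unchanged.
    return [term]


print(rewrite_term("W2K"))
print(rewrite_term("jog"))
```

In this model a replacement discards the term the user typed, whereas an expansion keeps it and adds its synonyms.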

To create new expansion sets, perform the following steps:

  1. Open My Computer and go to the location of the thesaurus XML file.

  2. Open the XML file using Notepad or some other text editor.

  3. Enter your expansion terms within the tags using well-formed XML, as illustrated here:

          <expansion>
              <sub>term1</sub>
              <sub>term2</sub>
              <sub>term3</sub>
          </expansion>

  4. Save the file.

  5. Restart the Mssearch.exe service.
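Because the thesaurus entries must be well-formed XML, it can help to check an edited fragment before saving the file and restarting the service. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name is_well_formed is hypothetical, and the check covers well-formedness only, not whether the search service accepts the entries.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, otherwise False."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A correctly closed expansion set.
good = "<expansion><sub>term1</sub><sub>term2</sub></expansion>"
# The second <sub> element is never closed.
bad = "<expansion><sub>term1</sub><sub>term2</expansion>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

To check a saved file rather than a string, ET.parse(path) could be used in place of ET.fromstring; it raises the same ParseError on malformed input.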

To create new replacement sets, perform the following steps:

  1. Open My Computer and go to the location of the thesaurus XML file.

  2. Open the XML file using Notepad or some other text editor.

  3. Enter your replacement terms within the tags using well-formed XML. Note that the terms being replaced are in the <pat> elements, and the term that replaces them is in the <sub> element, as in the default file, where NT5 and W2K are replaced by Windows 2000. This is illustrated here:

          <replacement>
              <pat>term1</pat>
              <pat>term2</pat>
              <sub>term3</sub>
          </replacement>

  4. Save the file.

  5. Restart the Mssearch.exe service.

Configuring the Noise Word File

The noise word file is a text file that contains all of the words that you don't want to appear in the index. When a word is placed in the noise word file, the indexer removes that word during the indexing process so that the word itself doesn't appear in the index. Words that you would place in this file are those that have little or no discriminatory value in a search query in your environment. Such words often include the following:

  • Pronouns

  • Adverbs

  • Adjectives

  • Conjunctions

  • Prepositions

  • Articles

  • Single letters

  • Single numbers

  • Your organization's name

The noise word files are located in the same directory as the thesaurus files. To configure a noise word file, perform the following steps:

  1. Go to the noise word file and open it using Notepad or some other text editor.

  2. Enter the words you do not want to appear in the index.

  3. Save the file.

  4. Run full index builds to effect your changes in the index.
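The effect of the noise word file on indexing can be pictured with a short Python sketch. This is a simplified model rather than the indexer's actual behavior: the function names are hypothetical, and it assumes the file's entries are separated by whitespace and matched without regard to case.

```python
def load_noise_words(file_text):
    """Parse noise word file contents (assumed whitespace-separated) into a set."""
    return set(file_text.lower().split())


def index_terms(document_text, noise_words):
    """Return the words of a document that would be kept in the index."""
    return [word for word in document_text.lower().split()
            if word not in noise_words]


# Sample noise word entries: two articles/prepositions and some single letters.
noise = load_noise_words("the\nof\na b c\n")

print(index_terms("The history of SharePoint", noise))
```

In this model, filtering happens only when a document is indexed, which is consistent with the need to run a full index build after changing the file.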



Microsoft SharePoint Products and Technologies Administrator's Pocket Consultant
ISBN: 0735623821
EAN: 2147483647
Year: 2004
Pages: 110
Authors: Ben Curry
