The Indexing Service provides Web-type indexing and querying to corporate intranets, Internet sites, and more conventional networks without reformatting documents. With the click of a button, end users can index and query the contents of intranet or Internet sites on Windows 2000 Server with Internet Information Services (IIS). The Indexing Service does more than just index documents, however. It provides a system for publishing information on your intranet or on the Web. Because the Indexing Service indexes both the content and properties of formatted documents, you don't need to convert existing documents to HTML to make them available to your users. Instead, documents in a variety of formats, such as Microsoft Word or Microsoft Excel, are directly available.
Even though its primary function is the indexing of Web servers, the Indexing Service is useful on any network where searches for documents are common, and it is essential on any network with frequent searches through large numbers of files.
The Indexing Service functions much as one would expect: it catalogs a set of documents, enabling dynamic full-text searches through the search function, a query form, or Microsoft Internet Explorer. Just as an index in a book maps an important word to a page inside the book, content indexing on a computer takes a word within a document and maps it back to that document. Documents to be indexed are specified in catalogs, and the index can include document properties as well as the actual text in the document. Once the Indexing Service is set up, it needs no ongoing maintenance, and administration is required only if you need to change its basic configuration. If you didn't include the Indexing Service in your original installation of Windows 2000, you can add it through Add/Remove Programs in Control Panel.
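The book-index analogy above can be made concrete with a small sketch. This is only an illustration of the word-to-document mapping idea, not the Indexing Service's actual data structures; the function names `build_index` and `query` and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # Split the text into simple word tokens (letters, digits, apostrophes).
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def query(index, word):
    """Return the sorted names of documents that contain the given word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample documents, standing in for files in a catalog.
docs = {
    "budget.txt": "Quarterly budget figures for the sales team",
    "memo.txt": "Sales memo about the new intranet site",
}
index = build_index(docs)
print(query(index, "sales"))
```

A lookup is a single dictionary access regardless of how many documents were indexed, which is why a full-text query against an index is so much faster than scanning every file.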
When administering the Indexing Service, you'll encounter a number of terms that have a special meaning when used in the Indexing Service context. Here are some of the most common ones, with their definitions:
The Indexing Service uses filters that can read certain types of documents, extract the text and properties, and send that information to the indexing engine. The filters included with Windows 2000 will index the following kinds of documents: text, HTML, Microsoft Office 95 and later, and Internet Mail and News (provided IIS is installed). The Indexing Service can use other filters made available by software vendors. The vendor that supplies the filter will also supply installation instructions.
After extracting the text and properties, the Indexing Service determines the language the document is written in and removes words that are on the language's exception list. The exception list contains prepositions, pronouns, articles, and so forth and is appropriately named Noise.xxx, where xxx represents the language. Figure 26-1 shows a portion of the Noise.eng file, which contains the exception list for American English. You can add words to or remove words from the exception list using any text editor, such as Notepad.
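The effect of an exception list can be sketched in a few lines. The example below assumes the list is plain text with words separated by whitespace, and it uses a short made-up sample rather than the contents of the real Noise.eng file; `load_noise_words` and `remove_noise` are hypothetical helper names.

```python
def load_noise_words(text):
    """Parse exception-list text into a set (assumes whitespace-separated words)."""
    return {word.lower() for word in text.split()}


def remove_noise(words, noise):
    """Keep only the words that do not appear on the exception list."""
    return [word for word in words if word.lower() not in noise]


# Made-up sample list; the real file ships with Windows and is much longer.
sample_noise = "a about after all also an and the of in"
noise = load_noise_words(sample_noise)

words = "The index of all documents in the catalog".split()
kept = remove_noise(words, noise)
print(kept)
```

Only the words that carry meaning for a search survive the filter, which keeps the index smaller and is also why a query for a word on the exception list cannot match anything.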
After words from the exception list are removed, the remaining words are stored first in a word list in memory. At least once a day, the word lists are combined to form temporary saved indexes, and later the Indexing Service consolidates the temporary indexes into a single master index. All this is done automatically, although under certain circumstances you may need to intervene by initiating a merge manually, as described later in this chapter.
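The progression from word lists to temporary indexes to a master index can be pictured as repeated merging. The sketch below is a conceptual model only: it treats each index as a dictionary from word to document names and ignores everything the real service does about on-disk storage, compression, and deleted documents. The `merge` function and the sample data are hypothetical.

```python
from collections import defaultdict


def merge(indexes):
    """Combine several word-to-documents mappings into one."""
    merged = defaultdict(set)
    for index in indexes:
        for word, doc_names in index.items():
            # Union the document sets so no reference is lost.
            merged[word] |= doc_names
    return dict(merged)


# In-memory word lists built as documents are filtered.
word_lists = [
    {"budget": {"a.doc"}, "sales": {"a.doc"}},
    {"sales": {"b.doc"}},
]

# Word lists are combined into a temporary saved index...
temp_index_1 = merge(word_lists)
temp_index_2 = {"intranet": {"c.htm"}, "sales": {"c.htm"}}

# ...and the temporary indexes are later consolidated into one master index.
master = merge([temp_index_1, temp_index_2])
print(sorted(master["sales"]))
```

After the final merge a query consults one structure instead of several, which is the practical benefit of consolidating into a master index.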
Figure 26-1. A portion of the exception list for American English.