Indexing


The information store process creates and manages indexes for common key fields to enable faster lookups and searches of documents that reside in a store. An index allows Outlook users to search for items more easily. With full-text indexing, the index is built prior to the client search, thus enabling faster searches. Text attachments can be included in the full-text indexing. Each information store can be indexed individually for flexibility.

Property promotion allows for advanced searches on any document property, such as Author, Lines, or Document Subject (Figure 2-11). When Exchange stores a document in a supported file type, the document’s properties are automatically parsed and promoted to the information store. Hence, the properties become a part of the document’s record in the database. Searches can then be performed on these properties.

Figure 2-11: Advanced search for document properties that have been promoted to the information store.

The index is word-based, not character-based. If a user searches for the word “admin,” only documents that contain the whole word “admin” are returned; the word “administrator” is not identified as a match. Both the message body and text attachments can be indexed. Binary attachments and document properties are not included in the full-text index. Not all file types are indexed either; the following are the only types indexed by default:

  • Word documents (*.doc)

  • Excel documents (*.xls)

  • PowerPoint documents (*.ppt)

  • HTML documents (*.html, *.htm, *.asp)

  • Text files (*.txt)

  • Embedded MIME messages (*.eml)
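
The word-based matching and the default file-type list described above can be illustrated with a short Python sketch. This is not how Microsoft Search is implemented; the function names, the sample documents, and the tokenizing rule (runs of letters and digits) are assumptions made only for illustration.

```python
import re
from collections import defaultdict

# Extensions the text lists as indexed by default.
DEFAULT_INDEXED_EXTENSIONS = {
    ".doc", ".xls", ".ppt", ".html", ".htm", ".asp", ".txt", ".eml",
}

def is_indexed_by_default(filename):
    """Return True if the file's extension appears in the default list."""
    dot = filename.rfind(".")
    return dot != -1 and filename[dot:].lower() in DEFAULT_INDEXED_EXTENSIONS

def build_word_index(documents):
    """Map each whole word (lowercased) to the names of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        # Assumed tokenizer: a word is a run of letters or digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, term):
    """Look up one whole word; no prefix or substring matching is attempted."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical sample documents.
documents = {
    "memo.txt": "Ask the admin to reset the password.",
    "policy.doc": "Only an administrator may approve this request.",
}
index = build_word_index(documents)
print(search(index, "admin"))  # only memo.txt; "administrator" is a different word
```

Because the index keys are whole words, a lookup for “admin” never touches the entry for “administrator,” which mirrors the behavior the text describes.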

Indexing is provided by the Microsoft Search service. Both the information store service and the Search service must be running for the index to be created, updated, or deleted. Depending on the size of your store, completing a full index could take hours, so it’s best to schedule this activity for a time when your server is underutilized. Remember that an index consumes disk space equal to about 20 percent of the size of your database. Also, indexes cannot be backed up individually; they must be backed up at the server level. Finally, even though multiple instances of a message might be held in the database, the message is indexed only once. This single-instance message indexing results in smaller indexes that can be created more quickly.
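
As a rough planning aid, the 20 percent rule of thumb can be turned into a quick calculation. The sketch below is only an estimate under that assumption; the function names and the sample sizes are hypothetical, and actual index sizes will vary with the content of the store.

```python
# "About 20 percent" comes from the text; treat it as a rule of thumb.
INDEX_FRACTION = 0.20

def estimated_index_size_gb(database_size_gb, fraction=INDEX_FRACTION):
    """Estimate the disk space a full-text index needs for a given database size."""
    return database_size_gb * fraction

def has_room_for_index(database_size_gb, free_disk_gb, fraction=INDEX_FRACTION):
    """Return True if the free disk space covers the estimated index size."""
    return free_disk_gb >= estimated_index_size_gb(database_size_gb, fraction)

# Hypothetical example: a 50-GB database and 8 GB of free disk space.
print(estimated_index_size_gb(50))
print(has_room_for_index(50, 8))
```

For a 50-GB database the estimate is roughly 10 GB, so a volume with only 8 GB free would not be enough under this rule.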

The Indexing Process

Microsoft Search builds the initial index by processing the entire store one folder at a time. The Search process identifies and logs searchable text. During the indexing process, you will see heavy CPU utilization; depending on the size of the store, this process could take hours.

After the index is created, any change to a folder within the store causes a synchronization event to notify the Microsoft Search service of the change. Depending on how you have configured the Search service to run, the service will either wait for the scheduled time to regenerate the index so that the new change is included or update the index shortly after the change is made.
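
The difference between the two update modes can be pictured with a small Python model. It is a conceptual sketch only: the class, its method names, and the folder names are hypothetical and do not correspond to any Exchange or Microsoft Search interface.

```python
class IndexUpdater:
    """Toy model of how change notifications reach the index in each mode."""

    def __init__(self, mode):
        if mode not in ("immediate", "scheduled"):
            raise ValueError("mode must be 'immediate' or 'scheduled'")
        self.mode = mode
        self.pending = []   # changes reported but not yet in the index
        self.indexed = []   # changes already reflected in the index

    def notify_change(self, folder):
        """Called when a synchronization event reports a changed folder."""
        self.pending.append(folder)
        if self.mode == "immediate":
            self._apply_pending()

    def run_scheduled_update(self):
        """Called when the scheduled update time arrives."""
        self._apply_pending()

    def _apply_pending(self):
        self.indexed.extend(self.pending)
        self.pending.clear()

# Hypothetical usage: a scheduled updater holds changes until its run.
updater = IndexUpdater("scheduled")
updater.notify_change("Public Folders/Plans")
print(updater.pending)   # the change is waiting
updater.run_scheduled_update()
print(updater.indexed)   # the change is now in the index
```

In the scheduled mode, changes accumulate until the update runs, which is why the index can lag behind the store between runs.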

Updating the Index

The time delay for an immediate (automatic) update of the index will vary based on the current server load. You can optimize this setting on the Full-Text Indexing tab of the Information Store’s property sheet (Figure 2-12).

Figure 2-12: The Full-Text Indexing tab of the property sheet for a public folder store.

Scheduled updates allow granular control over when the index is updated. The advantage of scheduling the index update is that it can be planned for off-peak hours when the server is not heavily accessed by users. The disadvantage is that the index can become out-of-date over the course of a day. However, this might not be a big problem, since most users search for documents that were received and indexed more than 24 hours before the search. Try to schedule your updates to occur at least once each day.

Search Architecture

If you want to implement full-text searching widely in your organization, you’ll need to consider which messaging clients to deploy. Only online MAPI and IMAP4 clients are able to perform full-text searches on the server. POP3 and WebDAV clients do not have search capability.

Exchange Server 2003 can perform two types of searches. The first is a full-text query of the index that has been built by the Microsoft Search service. The second is a query based on document properties, which are not available in the full-text index.

When a user performs a search in the Outlook client by choosing Advanced Find from the Tools menu, several options are available (Figure 2-13). Once the user enters the desired variables, the query is sent to the Query Processor, which determines how the search should be conducted. If the search is based on both a string of text characters and a desired property variable, the Query Processor splits the query request into two parts. For instance, suppose that the request is for all documents that are larger than 5 MB and that have the phrase “building plan” in the subject line. The Query Processor splits this request and has the Microsoft Search service generate a list of documents that have “building plan” in the subject line. It then evaluates the size of each document that the Search service returned to find all those larger than 5 MB and generates a new list of documents that meet both criteria.

Finally, Exchange Server 2003 applies security restrictions to the remaining documents to ensure that the client does not receive a document that the client is not supposed to see. After this security enforcement, the matching results are returned to the client.

Figure 2-13: The Advanced Find dialog box in Outlook 2003.
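
The order of operations in the “building plan” example (full-text match, then property filter, then security check) can be sketched in Python. This is a simplified model rather than the Query Processor itself: the Document class, the function names, the reader lists, and the sample data are all hypothetical, and the full-text step is reduced to a plain substring test on the subject.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    name: str
    subject: str
    size_mb: float
    readers: set = field(default_factory=set)  # users allowed to see the document

def full_text_matches(documents, phrase):
    """Stand-in for the Search service: subject contains the phrase (simplified)."""
    phrase = phrase.lower()
    return [d for d in documents if phrase in d.subject.lower()]

def filter_by_min_size(documents, min_size_mb):
    """Property-based part of the query: keep documents larger than the limit."""
    return [d for d in documents if d.size_mb > min_size_mb]

def apply_security(documents, user):
    """Final step: drop anything the requesting user may not see."""
    return [d for d in documents if user in d.readers]

def run_query(documents, phrase, min_size_mb, user):
    """Split the request into its two parts, then enforce security last."""
    candidates = full_text_matches(documents, phrase)
    candidates = filter_by_min_size(candidates, min_size_mb)
    return apply_security(candidates, user)

# Hypothetical store contents.
store = [
    Document("a", "Building plan for HQ", 7.5, {"kim", "lee"}),
    Document("b", "Building plan draft", 1.2, {"kim"}),
    Document("c", "Budget review", 9.0, {"kim"}),
    Document("d", "Revised building plan", 6.0, {"lee"}),
]
print([d.name for d in run_query(store, "building plan", 5, "kim")])
```

Here document b is dropped by the size filter, c never matches the phrase, and d is removed by the security step, so only a is returned to the user kim.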

Gather Files

Gather files are created during each index process. They are located by default in the ExchangeServer_<servername>\Gatherlogs directory and end with the extension .GTHR. You can use these text files to identify every document and message that was not successfully indexed. For example, if a document is named with an extension indicating a supported file type, but it is not actually that type, the indexing component stops processing that document, marks it as failed, records the URL of the document in the gather file (Figure 2-14), and then continues with the next message or document.

Figure 2-14: A URL in a gather file.

In addition to the URL, the subject or filename and the error code are also recorded in the gather file. To decode the error number, use the Gthrlog.vbs utility in the \Program Files\Common Files\System\MsSearch\Bin directory. The syntax for this utility is as follows:

Gthrlog <filename>

Here, <filename> is the name of the gather file. The utility displays a series of dialog boxes, each containing the data found in one line of the gather file, as shown in Figure 2-15.

Figure 2-15: A dialog box containing a line from the gather file.
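
Before decoding a gather file, you first have to find the one you want, usually the most recent. The following Python sketch lists the gather files in a directory, newest first. It relies only on the .GTHR extension mentioned above; the function name is hypothetical, the directory you pass in is whatever your Gatherlogs path turns out to be, and nothing here parses the contents of the files.

```python
from pathlib import Path

def list_gather_files(gather_dir):
    """Return the .gthr files in gather_dir, most recently modified first."""
    directory = Path(gather_dir)
    if not directory.is_dir():
        return []
    files = [
        p for p in directory.iterdir()
        # Match the extension without regard to case (.GTHR or .gthr).
        if p.is_file() and p.suffix.lower() == ".gthr"
    ]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling list_gather_files with the path to your Gatherlogs directory returns a list of Path objects; the first entry is the newest gather file, whose name can then be supplied as <filename> to Gthrlog.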

Moving the Index When It Gets Too Big

If your catalogs become so large that you’re running out of disk space, you might want to move the index to another location, such as a different disk. To do this, stop the Search service and use the Catutil.exe utility located in the \Program Files\Common Files\System\MSSearch\Bin directory. For help in using this utility, type Catutil Movecat /? at the command prompt.




Microsoft Exchange Server 2003 Administrator's Companion
ISBN: 0735619794
Year: 2005
Pages: 254
