Indexing


The information store process creates and manages indexes for common key fields for faster lookups and searches of documents that reside in a store. An index allows Outlook users to search for documents more easily. With full-text indexing, the index is built prior to the client search, thus enabling faster searches. Text attachments can be included in the full-text indexing. As Figure 2-14 shows, each information store can be indexed individually for flexibility.

Property promotion allows for advanced searches on any document property, such as Author, Lines, or Document Subject (Figure 2-15). When Exchange stores a document in a supported file type, the document's properties are automatically parsed and promoted to the information store. Hence, the properties become a part of the document's record in the database. Searches can then be performed on these properties.

Figure 2-14. Scheduling indexing for a mailbox store.


Figure 2-15. Advanced search for document properties that have been promoted to the information store.

This feature offers outstanding flexibility. You can build your indexes around those attributes that are most important in your document management structure and expose those attributes for fast searches to your clients.

The index is word-based, not character-based. This means that if a user searches for the word "admin," only documents that contain the word "admin" are returned; the word "administrator" is not treated as a match. Both messages and their attachments can be indexed, but binary attachments and document properties are not. Not all file types are indexed, either. Only the following document types are indexed:

  • Word documents (*.doc)
  • Excel documents (*.xls)
  • PowerPoint documents (*.ppt)
  • HTML documents (*.html, *.htm, *.asp)
  • Text files (*.txt)
  • Embedded MIME messages (*.eml)
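The word-based matching described above can be illustrated with a small sketch. This is not how Microsoft Search is implemented; it is a minimal inverted index, and the function names (build_word_index, search) and the sample messages are hypothetical.

```python
import re
from collections import defaultdict


def build_word_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Split on anything that is not a letter or digit, so whole words are stored.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return ids of documents containing the exact word (no prefix matching)."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample data.
docs = {
    "msg1": "Contact the admin about the mailbox quota.",
    "msg2": "The administrator approved the request.",
}
index = build_word_index(docs)
print(search(index, "admin"))  # ['msg1']
```

Because the lookup is by whole word, the message that mentions only "administrator" is not returned for a search on "admin".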

Indexing is provided by the Microsoft Search service. Both the information store service and the Search service must be running for the index to be created, updated, or deleted. Depending on the size of your store, completing a full index could take hours, so it's best to schedule this activity for a time when your server is underutilized. Remember that the index consumes disk space equal to about 20 percent of the size of your database. Also, indexes cannot be backed up individually; they must be backed up at the server level. Finally, even though multiple instances of a message might be held in the database, the message is indexed only once. This single-instance message indexing results in smaller indexes that can be created more quickly.
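As a rough planning aid, the 20 percent figure can be turned into a quick estimate. The function name and the 50-GB database size below are illustrative assumptions, and the ratio is only the approximation quoted above, not a measured value.

```python
def estimate_index_size_gb(database_size_gb, ratio=0.20):
    """Estimate index disk usage as a fraction of the database size.

    The default ratio of 0.20 is the approximate figure from the text.
    """
    return database_size_gb * ratio


# Hypothetical 50-GB store: plan for roughly 10 GB of additional disk space.
estimated = estimate_index_size_gb(50)
print(f"Estimated index size: {estimated:.1f} GB")
```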

The Indexing Process

Microsoft Search builds the initial index by processing the entire store one folder at a time. The Search process identifies and logs searchable text. During the indexing process, you will see heavy CPU utilization; depending on the size of the store, this process could take hours.

After the index is created, any change to a folder within the store causes a synchronization event to notify the Microsoft Search service of the change. Depending on how you have configured the Search service to run, either it will wait for the scheduled time to regenerate the index so that the new change is included or it will update the index shortly after the change is made.

Updating the Index

The time delay for an immediate (automatic) update of the index will vary based on the current server load. You can optimize this setting on the Full-Text Indexing tab of the Information Store's property sheet (Figure 2-16).

Figure 2-16. The Full-Text Indexing tab of the property sheet for a public folder store.

Scheduled updates allow granular control over when the index is updated. The advantage of scheduling the index update is that it can be planned for off-peak hours when the server is not heavily accessed by users. The disadvantage is that the index can become out-of-date over the course of a day. However, this may not be a big problem, since most users search for documents that were received and indexed more than 24 hours before the search. Try to schedule your updates to occur at least once each day.

Automatic updates will keep the index up-to-date. Changes to documents are queued for a short period of time, and then the index is updated. All changes that are made during the wait period are incorporated into the index as a batch job. The disadvantage of automatic updates is that you cannot control when server resources are used to perform the index update. If your store is becoming increasingly busy—meaning that an increasing number of documents are being posted, deleted, or changed—the server will expend more resources to keep the index up-to-date. It's best to configure automatic indexing on stores in which documents change infrequently, or on servers that are not heavily used for purposes other than document management and indexing.
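The queue-then-batch behavior of automatic updates can be sketched as follows. The ChangeQueue class is hypothetical, and keeping only the most recent change per document is a simplifying assumption of this sketch; the text does not say how the Search service merges changes made during the wait period.

```python
class ChangeQueue:
    """Collect document changes and hand them over as one batch."""

    def __init__(self):
        # doc_id -> most recent change type ("posted", "changed", "deleted")
        self._pending = {}

    def record(self, doc_id, change):
        """Queue a change; a later change to the same document replaces the earlier one."""
        self._pending[doc_id] = change

    def flush(self):
        """Return everything queued during the wait period and reset the queue."""
        batch = dict(self._pending)
        self._pending.clear()
        return batch


# Hypothetical activity during one wait period.
queue = ChangeQueue()
queue.record("doc-1", "posted")
queue.record("doc-2", "changed")
queue.record("doc-1", "deleted")

batch = queue.flush()
print(batch)  # {'doc-1': 'deleted', 'doc-2': 'changed'}
```

The busier the store, the larger each batch becomes, which is why automatic updates cost more server resources as document activity increases.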

Search Architecture

If you want to implement full-text searching widely in your organization, you'll need to consider which messaging clients to deploy. Only online MAPI and IMAP4 clients are able to perform full-text searches on the server. POP3 and WebDAV clients do not have search capability.

Exchange 2000 Server can perform two types of searches. The first is a full-text query of the index that has been built by the Microsoft Search service. The second is a query based on document properties, which are not available in the full-text index.

When a user performs a search in the Outlook client by choosing Advanced Find from the Tools menu, several options are available (Figure 2-17). Once the user enters the desired variables, the query is sent to the Query Processor, which determines how the search should be conducted. If the search is based on both a string of text characters and a desired property variable, the Query Processor splits the query request into two parts. For instance, suppose that the request is for all documents that are larger than 5 MB and that have the phrase "building plan" in the subject line. The Query Processor splits this request and has the Microsoft Search service generate a list of documents that have "building plan" in the subject line. It then evaluates the size of each document that the Search service returned to find all those larger than 5 MB and generates a new list of documents that meet both criteria.


Figure 2-17. The Advanced Find dialog box in Outlook 2000.

Finally, Exchange 2000 Server applies security restrictions to the remaining documents to ensure that the client does not receive a document that the client is not supposed to see. After this security enforcement, the matching results are returned to the client.
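The three stages described in this section (full-text match, property evaluation, security filtering) can be sketched as a simple pipeline. This is an illustration of the flow only, not the Query Processor's actual implementation; the Document type, its fields, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    subject: str
    size_mb: float
    readers: set = field(default_factory=set)  # users allowed to see the document


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of the phrase appear consecutively in the text."""
    w, p = words(text), words(phrase)
    return any(w[i:i + len(p)] == p for i in range(len(w) - len(p) + 1))


def run_query(docs, phrase, min_size_mb, user):
    # Stage 1: full-text part of the request (stands in for the Search service).
    text_hits = [d for d in docs if contains_phrase(d.subject, phrase)]
    # Stage 2: property part of the request, evaluated only on the stage 1 results.
    property_hits = [d for d in text_hits if d.size_mb > min_size_mb]
    # Stage 3: security restrictions applied before anything is returned.
    return [d.doc_id for d in property_hits if user in d.readers]


# Hypothetical store contents.
docs = [
    Document("a", "Building plan for site A", 7.5, {"alice"}),
    Document("b", "Building plan summary", 1.2, {"alice"}),
    Document("c", "Revised building plan", 9.0, {"bob"}),
    Document("d", "Rebuilding plans", 8.0, {"alice"}),
]
print(run_query(docs, "building plan", 5, "alice"))  # ['a']
```

Document b fails the size test, c is hidden from alice by the security check, and d never matches because the index is word-based.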

Gather Files

If a file that cannot be indexed is attached to a message or placed in a public folder, it is recorded in a gather file. Gather files are created during each indexing run; by default they are located in the \Exchsrvr\ExchangeServer\Gatherlogs directory and have the extension .GTHR. You can use these text files to identify every document and message that was not successfully indexed. For example, if a document is named with an extension indicating a supported file type but is not actually that type, the indexing component stops processing the document, fails it, records its URL in the gather file (Figure 2-18), and then continues with the next message or document.


Figure 2-18. A URL in a gather file.

In addition to the URL, the subject or filename and the error code are also recorded in the gather file. To decode the error number, use the Gthrlog.vbs utility in the \Program Files\Common Files\System\MsSearch\Bin directory. The syntax for this utility is

 Gthrlog <filename> 

where <filename> is the name of the gather file. You will be prompted with a series of dialog boxes that contain the data found in each line of the gather file, as shown in Figure 2-19.


Figure 2-19. A dialog box containing a line from the gather file.
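Before decoding entries with Gthrlog, it can help to see which gather files exist. The sketch below only lists files with the .GTHR extension in a directory you supply; it does not parse their contents, since the file layout is not described here. The function name find_gather_files is hypothetical, and the demo uses a temporary directory with made-up file names so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def find_gather_files(directory):
    """Return the gather files (*.gthr, any letter case) in a directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".gthr"
    )


# Demo with a temporary directory standing in for the Gatherlogs directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("store1.GTHR", "store2.gthr", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder")
    found_names = [p.name for p in find_gather_files(tmp)]

print(found_names)  # ['store1.GTHR', 'store2.gthr']
```

On a real server you would pass the path of the Gatherlogs directory instead of the temporary one, then give each file name to Gthrlog in turn.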

Moving the Index When It Gets Too Big

If your catalogs become so large that you're running out of disk space, you may want to move the index to another server. To do this, stop the Search service and use the Catutil.exe utility located in the \Program Files\Common Files\System\MSSearch\Bin directory. For help in using this utility, type Catutil Movecat /? at the command prompt.



Microsoft Exchange 2000 Server Administrator's Companion
Year: 1999
Pages: 193
