When you design an indexing site, the first question is how much storage space you will need. Allocate disk space equal to at least 30 percent of the size of your corpus; 40 percent is better. During a master merge, the Indexing Service might temporarily need up to 45 percent of the corpus size.
Depending on the filters used to index a group of documents, the actual size of the indexes might be less than the standard 30 percent. For example, if you write a filter for indexing large documents (such as large image files), you can limit indexing to the first few hundred bytes (about all you'd need to get the header information), thus reducing the amount of space needed for the index.
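As a quick illustration of the sizing guidelines above, the following Python sketch turns the 30, 40, and 45 percent figures into concrete numbers. The function name and the 20-GB corpus are hypothetical, chosen only for the example; the percentages are the ones given in the text.

```python
# Rule-of-thumb percentages taken from the sizing guidelines in the text.
MINIMUM_PERCENT = 30      # minimum disk space for the indexes
RECOMMENDED_PERCENT = 40  # the more comfortable allocation
MERGE_PEAK_PERCENT = 45   # temporary peak during a master merge


def index_disk_estimate(corpus_gb):
    """Return (minimum, recommended, merge peak) index space in GB."""
    if corpus_gb < 0:
        raise ValueError("corpus size cannot be negative")
    return (
        corpus_gb * MINIMUM_PERCENT / 100,
        corpus_gb * RECOMMENDED_PERCENT / 100,
        corpus_gb * MERGE_PEAK_PERCENT / 100,
    )


# Hypothetical 20-GB corpus.
minimum, recommended, peak = index_disk_estimate(20)
print(f"minimum {minimum:.1f} GB, recommended {recommended:.1f} GB, "
      f"merge peak {peak:.1f} GB")
# prints: minimum 6.0 GB, recommended 8.0 GB, merge peak 9.0 GB
```

Because the merge peak exceeds the recommended allocation, it is safest to plan free space around the 45 percent figure.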
Because most Indexing Service operations are read requests (searching the indexes, returning the results, and then accessing the actual documents), disk striping is a good way to relieve disk-bound I/O bottlenecks. Disk striping is covered in detail in Chapter 15.
Planning for future site growth is essential. Moving documents to larger disks to overcome space limitations can cause query errors until you run a complete reindex, which can take many hours.
Another critical part of planning an Indexing Service site is making sure that plenty of memory is available on the indexing machine. Table 27-1 shows the minimum and recommended amounts of memory for different quantities of documents. As usual, the more memory you have available, the better (and with the price of memory as low as it is, consider 128 MB a minimum for any type of Windows 2000 server). With large numbers of documents, a faster CPU also speeds up indexing and searches.
Table 27-1. Memory requirements by number of documents indexed
Number of Documents | Minimum Memory | Recommended Memory |
---|---|---|
Fewer than 100,000 | 64 MB | 64 MB |
100,000 to 250,000 | 64 MB | 64 MB to 128 MB |
250,000 to 500,000 | 64 MB | 128 MB to 256 MB |
500,000 or more | 128 MB | 256 MB or more |
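For planning scripts, Table 27-1 can be expressed as a simple lookup. The sketch below is only an illustration: the function name is hypothetical, and because the ranges in the table share their boundary values, it assumes that a count exactly on a boundary (for example, 250,000) falls into the higher tier.

```python
# Rows of Table 27-1: (exclusive upper bound on documents, minimum, recommended).
# None as the bound means "no upper limit".
# Assumption: a count exactly on a boundary belongs to the higher tier.
MEMORY_TABLE = [
    (100_000, "64 MB", "64 MB"),
    (250_000, "64 MB", "64 MB to 128 MB"),
    (500_000, "64 MB", "128 MB to 256 MB"),
    (None, "128 MB", "256 MB or more"),
]


def memory_guidance(document_count):
    """Return (minimum, recommended) memory for a given number of documents."""
    if document_count < 0:
        raise ValueError("document count cannot be negative")
    for upper_bound, minimum, recommended in MEMORY_TABLE:
        if upper_bound is None or document_count < upper_bound:
            return minimum, recommended


print(memory_guidance(300_000))  # ('64 MB', '128 MB to 256 MB')
```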
The Indexing Service automatically combines memory-resident word lists into disk-resident temporary lists and, once a day, merges all temporary indexes into a master index. Depending on the number of temporary lists, merging can be a long process that uses much of the CPU's resources. Queries are slower during a merge, and other processes on the computer are slower still.
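Conceptually, a master merge folds many small indexes into one. The following Python sketch is a toy model only: it assumes each temporary index is a mapping from a word to the set of documents that contain it, which is not a description of the Indexing Service's actual on-disk format, and all names in it are hypothetical.

```python
from collections import defaultdict


def master_merge(temporary_indexes):
    """Combine several word -> document-ID mappings into a single index."""
    master = defaultdict(set)
    for index in temporary_indexes:
        for word, doc_ids in index.items():
            # A word seen in several temporary indexes ends up with one
            # combined set of document IDs.
            master[word].update(doc_ids)
    # Sorted output keeps the result deterministic and easy to read.
    return {word: sorted(ids) for word, ids in sorted(master.items())}


# Two hypothetical temporary indexes.
temp_a = {"merge": {1, 2}, "index": {1}}
temp_b = {"merge": {3}, "catalog": {2, 3}}

print(master_merge([temp_a, temp_b]))
# prints: {'catalog': [2, 3], 'index': [1], 'merge': [1, 2, 3]}
```

Even in this toy form, the work grows with the number and size of the temporary indexes, which is why a merge after many document changes takes longer.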
By default, merges are done at midnight local time. If this is unsuitable for your system, you can change when the master merge is performed. You can also initiate a merge manually when a large number of documents in a catalog are changed. This section describes how to perform these two tasks.
To change the operation's schedule from the default time, follow these steps:
Figure 27-2. Locating the setting for the master merge time.
MasterMergeTime is expressed in minutes after midnight and has a valid range of 0 to 1439, though no error is reported if you enter a larger value. The default is 0, which corresponds to midnight. When the specified number of minutes after midnight has passed, the Indexing Service initiates a master merge.
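Because an out-of-range value produces no error, it helps to calculate and check the number before you enter it. The following Python sketch converts a clock time to minutes after midnight; the helper names are hypothetical, and the script only does the arithmetic without touching any system setting.

```python
MAX_MINUTES = 1439  # 23:59, the top of the valid range given in the text


def master_merge_minutes(hour, minute=0):
    """Convert a 24-hour clock time to minutes after midnight."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("expected a time between 00:00 and 23:59")
    return hour * 60 + minute


def is_valid_master_merge_time(value):
    """Check that a value lies in the documented 0 to 1439 range."""
    return 0 <= value <= MAX_MINUTES


print(master_merge_minutes(2, 30))       # 150, that is, 2:30 A.M.
print(is_valid_master_merge_time(1440))  # False
```

For example, to move the merge from midnight to 2:30 A.M., you would enter 150.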
If a large number of documents change in a short period, you might want to perform a merge of the temporary indexes without waiting for the scheduled master merge. To initiate a merge, follow these steps:
Figure 27-3. Initiating a merge of temporary indexes.
If you use the Indexing Service often, set up a Microsoft Management Console (MMC) that contains the Indexing Service for easy access. To do so, follow these steps:
Figure 27-4. The Indexing Service in the MMC.
You can also administer the Indexing Service by launching Computer Management from the Administrative Tools menu. In the console tree, the Indexing Service is under the Services and Applications node. The illustrations and examples in the following sections use the Indexing Service in the MMC, but you can perform these tasks equally well through Computer Management.