13.1. The Indexing Service

Windows Server 2003 includes version 3.0 of the Indexing Service that catalogs files stored on network drives, corporate intranets, and Internet sites, and provides a web-based query form for easy search and retrieval of those cataloged resources. The service is part of Internet Information Services (see Chapter 8 for a complete and detailed walkthrough of IIS).

Part of the power behind the Indexing Service is its ability to catalog documents without needing them reformatted to a special, proprietary format. The service understands most Microsoft Office file formats, including Word and Excel documents. This makes the service very useful, even beyond its basic premise of indexing plain web sites.

The Indexing Service works by identifying the unique words within a document, establishing their locations within that document, and then reporting that information back to a central database (the "index," as it were). You, as the administrator, can specify certain documents to be either indexed or excluded from indexing. You also can include the properties of a document (typically its title, author, date of creation, date of last edit, and similar bits of information) in the catalog to expand the criteria on which your users can search.

Of course, in some instances you might not want the Indexing Service installed. For example, on regular client workstations with no special needs, there would be no reason to have this service installed, only to occupy resources needlessly and present an additional security risk (that's not to say that the service is insecure, but that you should reduce the surface of attack for a machine as much as possible). On fileservers, however, the Indexing Service adds value and provides a service for your user community.

You can install the Indexing Service through the Add/Remove Programs applet within the Control Panel. Click Add/Remove Windows Components, and then check the box next to Indexing Service and click Next. That's all it takes to install it; it's a very easy process.

To confuse you further: by default, the Indexing Service is already installed, but it's just not started. If you open the Services console from the Administrative Tools menu and select Indexing Service, set its startup type to Automatic, and click Start, the service starts and functions properly even though it still doesn't show up as installed in Add/Remove Programs.

However, the only way to fully uninstall the service is to simply uncheck the box within Add/Remove Programs and click Next.

13.1.1. How the Indexing Service Works

The Indexing Service uses filters to extract information from documents. The CiDaemon process, which is initiated by the Indexing Service, runs in the background and filters documents for later indexing. Filters are DLLs that extract information (either the words within a document or the properties of that document) from specific types of files, such as Word documents or HTML pages. The Indexing Service comes with a standard set of filters that can index text, HTML, Microsoft Office documents created in versions 95, 97, 2000, XP, and 2003, and Internet Mail and News posts. Filters are extensible and can be created by third-party vendors for their specific datatypes.

After using filters to extract data, the service compares the filtered data against an exception list, which mainly contains a list of commonly used prepositions, pronouns, articles, and other nonessential words. The exception list is called NOISE.XXX, where the XXX represents the language of the document being indexed. After the filtered data has had words that matched entries on the exception list removed, the remaining data is moved to word lists, which are small, temporary, and nonpersistent stores of index information that serve as holding bins. About once a day, a process called a shadow merge takes place to aggregate the information within shadow indexes and remove data from the "holding bin," to both free up memory occupied by nonpersistent word lists and make filtered data persistent by saving it on a disk. Shadow indexes are created when word lists and other shadow indexes are combined into a single index.
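The exception-list step described above can be pictured as a simple set lookup. Here is a minimal sketch in Python, assuming a tiny hand-written noise list (the real NOISE.XXX files are far longer) and a deliberately naive tokenizer. The names NOISE_WORDS and build_word_list are my own illustrations and are not part of the Indexing Service.

```python
import re

# Hypothetical, heavily abbreviated stand-in for a NOISE.XXX exception list.
NOISE_WORDS = {"a", "an", "and", "in", "it", "of", "the", "to"}


def build_word_list(text, noise_words=NOISE_WORDS):
    """Map each non-noise word to the positions where it occurs in the text."""
    word_list = {}
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    for position, word in enumerate(tokens):
        if word in noise_words:
            continue  # words on the exception list never reach the word list
        word_list.setdefault(word, []).append(position)
    return word_list


sample = build_word_list("The index of the catalog and the index of the corpus")
print(sample)  # {'index': [1, 7], 'catalog': [4], 'corpus': [10]}
```

Only three of the eleven words survive the filter, which gives a rough feel for how much the exception list shrinks what has to be stored.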

At a separate time, the Indexing Service initiates master merges, which take place when individual shadow indexes are aggregated and infused into a current master index to create a single master index. The master index is a permanent index of a larger collection of documents. In a truer sense, the master index is the only index, containing pointers to resources within the corpus (a technical term for the body of work that is being indexed), much like the index of this book points you to certain words and phrases at specific points within the body. Picture a set of indexes, each for a certain chapter of this book. One could take these individual indexes (the "shadow indexes") and combine them into a master index, which would be placed at the back of this book; this is the process of master merging. These indexes are stored in the catalog, a specific folder that contains all indexes, either temporary word lists or more permanent shadow and master indexes.
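To make the chain from word lists to shadow indexes to the master index more concrete, here is a rough sketch in Python. It assumes an index is nothing more than a dictionary mapping a word to (document, position) pairs. The real on-disk format is compressed and is not described here, so treat merge_indexes and the sample data as purely illustrative.

```python
def merge_indexes(*indexes):
    """Combine several word -> [(document, position), ...] indexes into one."""
    merged = {}
    for index in indexes:
        for word, postings in index.items():
            merged.setdefault(word, []).extend(postings)
    # Sort words and postings so the merged index has a stable order.
    return {word: sorted(postings) for word, postings in sorted(merged.items())}


# Two in-memory word lists, as produced by filtering two documents.
word_list_1 = {"catalog": [("a.doc", 4)], "index": [("a.doc", 1)]}
word_list_2 = {"index": [("b.doc", 0)], "merge": [("b.doc", 2)]}

# Shadow merge: word lists are combined into a shadow index.
shadow_index = merge_indexes(word_list_1, word_list_2)

# Master merge: shadow indexes are folded into the current master index.
old_master = {"corpus": [("c.doc", 7)]}
master_index = merge_indexes(old_master, shadow_index)
print(sorted(master_index))  # ['catalog', 'corpus', 'index', 'merge']
```

The same function serves both steps because, in this simplified picture, a shadow merge and a master merge differ only in what they take as input and where the result is kept.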

Here are some additional terms you might run across while administering the Indexing Service:

  • A query is simply a certain request to the Indexing Service to retrieve files or data that match certain criteria.

  • Saved indexes are just highly compressed indexes that are stored on disk media and not simply placed in memory; thus, saved indexes persist after reboots. Saved indexes come in two flavors: shadow indexes and master indexes.

  • A scan takes place when the Indexing Service wants to determine what files in a particular location have changed or otherwise been modified.

  • The scope of an index is simply the range of documents and files to be searched when satisfying a query.

  • A virtual root is an alias to a certain directory on a disk. The Indexing Service can index any location defined as a virtual root.

13.1.2. Performance Considerations

Obviously, the single largest requirement of any indexing service is disk space: the service needs room to store its index files. Microsoft recommends that you allocate about 35% of the size of your corpus for the Indexing Service; I would allocate about 45%, simply to give the service room to grow. As more electronic information hits your disks, you'll want to have ample space to index that data optimally. Master merges typically require large amounts of disk space on a temporary basis, as much as 50% of the corpus size.
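As a quick worked example of those percentages, the following Python sketch computes both figures for a corpus of a given size. The function name and the 200GB corpus are my own assumptions. The two numbers are reported separately because the guidance above does not say whether the temporary master-merge space comes on top of the steady-state allocation.

```python
def index_disk_budget(corpus_gb, index_percent=45, merge_percent=50):
    """Estimate index disk needs, in GB, from the rules of thumb above."""
    return {
        # Steady-state space for the index files (35% is Microsoft's figure).
        "index_gb": corpus_gb * index_percent / 100,
        # Temporary space a master merge might need, in the worst case.
        "master_merge_temp_gb": corpus_gb * merge_percent / 100,
    }


budget = index_disk_budget(200)
print(budget)  # {'index_gb': 90.0, 'master_merge_temp_gb': 100.0}
```

Passing index_percent=35 gives Microsoft's lower figure (70GB for the same corpus) if you want to compare the two recommendations.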

Memory is also an important consideration. Table 13-1 shows the Microsoft minimum memory amounts and my recommended memory amounts for certain corpus sizes. Keep in mind that these recommendations are in excess of the current amount of memory in a machine for Windows Server 2003's general use: add the amount of memory you have to the appropriate recommended amount from the table to obtain the correct total amount of memory for your machine.

Table 13-1. Corpus sizes and memory requirements

| Corpus size | Minimum memory size | Recommended memory size |
|---|---|---|
| 100,000 or fewer documents | 64MB | 64MB |
| Between 100,001 and 250,000 documents | 64MB | 192MB |
| Between 250,001 and 500,000 documents | 64MB | 256MB |
| More than 500,000 documents | 128MB | 512MB, or more if corpus size is considerably larger than 500,000 documents |
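If you need to size several servers, the lookup in Table 13-1 is easy to script. The Python sketch below encodes the table and adds the recommended figure to whatever the machine already has for general use, as described above. The function names and the 512MB base are hypothetical, and the top row is treated as a floor because the table says larger corpora might need more.

```python
# (upper document count, minimum MB, recommended MB), taken from Table 13-1.
MEMORY_TABLE = [
    (100_000, 64, 64),
    (250_000, 64, 192),
    (500_000, 64, 256),
]
LARGE_CORPUS = (128, 512)  # more than 500,000 documents; 512MB is a floor


def indexing_memory_mb(document_count):
    """Return (minimum, recommended) extra memory in MB for a corpus size."""
    for upper_bound, minimum, recommended in MEMORY_TABLE:
        if document_count <= upper_bound:
            return minimum, recommended
    return LARGE_CORPUS


def total_memory_mb(general_use_mb, document_count):
    """Add the recommended indexing memory to the general-use memory."""
    return general_use_mb + indexing_memory_mb(document_count)[1]


print(indexing_memory_mb(300_000))    # (64, 256)
print(total_memory_mb(512, 300_000))  # 768
```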


Perhaps the greatest demand on your machine's CPU from the Indexing Service comes from master merges, which are very intensive and require large amounts of CPU time. Because of this, the Indexing Service schedules master merges automatically for midnight local time. However, if there is a better time when your machine's CPU load is low, you can change the time at which master merges will begin by doing some Registry editing. The MasterMergeTime value, located in HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\ContentIndex, allows you to specify the number of minutes after midnight local time that the master merge should commence. For this value, you can enter any number between 0 and 1,439.

13.1.3. Common Administrative Tasks

In this section, I'll go through some common administrative tasks you will encounter with the Indexing Service. When performing most of these tasks, you'll find it easier to create a custom view within the MMC to access the Indexing Service controls because a clean default view of these options is not built into Windows Server 2003. To create a custom view, follow these steps:

  1. From the Start menu, select the Run option.

  2. Type mmc in the Run box and press Enter.

  3. The MMC starts with an empty console, as shown in Figure 13-1.

    Figure 13-1. Adding a node to a blank MMC window


  4. From the File menu, choose Add/Remove Snap-in, and then click Add.

  5. The Add Standalone Snap-in box appears. Select Indexing Service and click Add, as shown in Figure 13-2.

    Figure 13-2. Adding the Indexing Service to an MMC window


  6. Select Local Computer when prompted, unless you're installing this set of tools to administer the Indexing Service on a remote machine; in that case, enter the name of the remote computer.

  7. Click Close, and then click OK, and you'll be returned to the console with the Indexing Service node now added to the left pane.

Now you're ready to manage the service, as described in the next section.

13.1.3.1 Administering a catalog

As you read previously in this chapter, a catalog is a specific folder that contains all indexes, both temporary word lists and more permanent shadow and master indexes. When you first add the Indexing Service to a computer, the service creates a default catalog, named System, that includes all directories on all local drives attached to the system, and another default catalog, named Web, for any IIS-based web sites that might be running on that particular machine.

For security reasons, I recommend deleting both of these default catalogs. They are too encompassing, particularly for web servers. It's best to create your own catalogs that index only certain data on your disk and not every file the service can find. However, you might have a completely sanitized system and find that the defaults work well for you; if this is the case, by all means go for it. But for most people, I recommend deleting the default catalogs and enabling more specific, focused, and restrictive catalogs.

13.1.3.1.1 Creating a catalog

To create a custom catalog, use the custom MMC that you created with the Indexing Service snap-in and highlight the Indexing Service node in the left pane. Then, select the Action menu and choose Catalog from the New menu. The Add Catalog screen displays, as shown in Figure 13-3.

Figure 13-3. Creating a custom catalog


Enter a name for the new catalog in the Name box and then enter the path to the folder that will house the contents of the catalog in the Location box. You can use the Browse button to graphically navigate your directory structure. Click OK when you've entered this information. Keep in mind that if you're managing the service on a remote computer, that remote computer must have the default administrative shares (i.e., C$, D$, and the others, as discussed in Chapter 3) intact; otherwise, the operation will fail.

Avoid putting the catalog in the directory that you're cataloging. For example, if you're trying to catalog D:\DOCS\WINSERVERBK, do not put the catalog you create for that directory in D:\DOCS\WINSERVERBK. You'll create a near-perpetual loop because the catalog is always changing, and the service will attempt to index the changing catalog and recatalog the catalog, and so on.

Also, avoid putting a catalog in the WWWROOT directory where IIS web sites live. It's fine to catalog the web sites; just don't put the actual catalog there.


Before the new catalog will become active, you must restart the Indexing Service. To quickly restart the service, right-click the Indexing Service node within the Console window, and select Stop. Once the service has stopped, right-click in the same place again, and choose Start.

13.1.3.1.2 Configuring a catalog

After catalogs are added, they need to be configured to act as you want. Within the Indexing Service console, right-click the catalog to be configured and select Properties. The screen shown in Figure 13-4 appears.

Figure 13-4. Adjusting the properties of an individual catalog


A discussion of the features available on each tab follows.


General

On the General tab is information about the catalog that cannot be modified, including the name and location of the catalog, the number of documents in its corpus, and the size of the property cache.


Tracking

Figure 13-5 shows the Tracking tab. On the Tracking tab, you can elect to automatically add and remove aliases for shared network drives and whether to inherit the setting for that option from the overall Indexing Service configuration. Simply put, this means that if the service indexes data on a mapped drive, it will remember both the mapped drive and the full UNC path to the data. If you want to turn it off (to keep data private, or to control the indexing of network drives and the traffic resulting from that process), you'll need to disable the inheritance feature with the second checkbox and then disable the automatic alias function by unchecking the first box. A lot of administrators do turn this off to limit that traffic.

Figure 13-5. The Tracking tab


If you have IIS installed on the machine that is running the Indexing Service, you can select which web site to index from the drop-down list labeled WWW Server, and you can do the same for any news (NNTP) server running on the machine as well. If IIS is not installed on the machine, the option is grayed out and unavailable.


Generation

Figure 13-6 shows the Generation tab. On this tab, you can elect to index files that have extensions that aren't covered by the filters currently installed within the service. You also can specify whether the Indexing Service should generate abstracts for files returned from a query and present them on the results page. These two options are, by default, inherited from the overall Indexing Service configuration; to turn them off, uncheck the Inherit above settings from Service checkbox and then adjust the settings individually. Generally, administrators turn off the Index files with unknown extensions feature because of a marked increase in processor usage; it's just not worth it in a lot of situations.

Figure 13-6. The Generation tab


You also can adjust the size of abstracts returned to the results page; because more time is needed for a query to be returned as abstracts increase in size, it's best to leave this at the default 320 characters unless you have a specific business need to change it.

Aside from the abstract generation setting, all the options on these tabs require a restart of the catalog to be recognized. To restart the catalog, simply right-click the appropriate catalog within the Indexing Service snap-in and select Stop and then Start. If you change the abstract generation setting, you need to stop and restart the overall Indexing Service itself; see the previous instructions for a procedure to do that.

13.1.3.1.3 Selecting a directory and location

Upon adding a new catalog and configuring its properties, you also need to define the directories to be included or excluded from its indexing activities. Specifying included directories includes any subdirectories of that particular directory. You can choose to exclude individual directories within an included parent directory, but you cannot include individual directories within an excluded parent directory: the directory will appear to be included, but it will not be indexed.
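The inclusion rule above boils down to "an exclusion anywhere up the tree wins." Here is a minimal Python sketch of that rule, assuming case-insensitive Windows paths. The helper names and the D:\Docs layout are hypothetical, and the service's own matching logic may well differ in its details.

```python
from pathlib import PureWindowsPath


def is_under(path, directory):
    """True if path is the directory itself or lies somewhere beneath it."""
    path, directory = PureWindowsPath(path), PureWindowsPath(directory)
    return path == directory or directory in path.parents


def is_indexed(path, included, excluded):
    """Apply the rule above: any excluded ancestor overrides an inclusion."""
    if any(is_under(path, directory) for directory in excluded):
        return False
    return any(is_under(path, directory) for directory in included)


included = [r"D:\Docs", r"D:\Docs\Private\Public"]
excluded = [r"D:\Docs\Private"]

print(is_indexed(r"D:\Docs\Drafts\a.doc", included, excluded))          # True
print(is_indexed(r"D:\Docs\Private\b.doc", included, excluded))         # False
print(is_indexed(r"D:\Docs\Private\Public\c.doc", included, excluded))  # False
```

The last call is the trap described above: D:\Docs\Private\Public is listed as included, yet nothing in it is indexed because its parent is excluded.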

How does security play into the indexing process? The Indexing Service is completely compatible with any NTFS permissions you apply to files and folders; if a user's current security privileges won't allow him to see a file that is stored on a local NTFS volume, the Indexing Service won't return that file within the results of a query. If a catalog is configured to index a remote UNC share, it will show the protected files in the results of a search, but the user won't be able to access them. Additionally, encrypted files are not indexed at all. If a file included in a catalog is encrypted after it is indexed, it will be removed from the index.

You can block the service from indexing a particular file or folder by adjusting that object's attributes. Right-click the appropriate file or folder, choose Properties, and then click the Advanced button on the General tab. This opens the Advanced Attributes dialog box, as shown in Figure 13-7.

Figure 13-7. The Advanced Attributes dialog box


Under Archive and Index attributes, uncheck the second option, and the folder won't be indexed by the service.

Also, note that the operating system that hosts drives being indexed also affects the operation of the Indexing Service in the following ways:

  • FAT volumes hosted remotely on machines running operating systems other than Windows NT, 2000, or Server 2003 will need to be rescanned periodically to detect modified files.

  • Volumes on any filesystem hosted on Novell NetWare servers or Unix systems can be indexed, but there is no permission validation on files stored on those servers.

  • Novell NetWare volumes will need to be rescanned periodically to detect modified files.

To include or exclude directories from a catalog's indexing processes, follow these steps:

  1. Open the Indexing Service console.

  2. Right-click the appropriate catalog in the right pane, and then from the New menu, select Directory.

  3. The Add Directory dialog box appears, as shown in Figure 13-8. In the Path box, enter the location of the directory you're either including or excluding. This can be either a local path or a network (UNC) path.

  4. If you are specifying a path to a remote computer, supply a valid username and password in the Account Information section.

  5. Finally, select whether to include or exclude this particular directory in the index in the Include in Index? section to the right of the Account Information box.

    Figure 13-8. Specifying included and excluded directories


13.1.3.1.4 The property cache

The property cache is where the Indexing Service stores file property information for all documents and pages within each catalog. The cache is a dual-level cache, with the primary level containing property information accessed fairly regularly, and the secondary level holding information not accessed very often.

Table 13-2 shows the property values stored in the cache by default and their respective levels.

Table 13-2. Default property cache values

| Descriptive identifier | Value | Function | Resident cache level |
|---|---|---|---|
| (none) | 0x5 | Unique identifier assigned to all NTFS volumes | Primary |
| (none) | 0x6 | Work ID of current directory's parent | Primary |
| (none) | 0x7 | Secondary storage ID | Primary |
| File Index | 0x8 | Unique ID of a document housed on an NTFS volume | Primary |
| Attrib | 0xd | Attributes of a document | Primary |
| DocTitle | 0x2 | Document's title | Secondary |
| Path | 0xb | Path of a document | Secondary |
| Size | 0xc | Size of a document | Secondary |
| Write | 0xe | Date and time the document was last modified | Secondary |


You might find that you would like to track and include other properties within the index. For instance, your users might often search on the date a document was created, a property that is not tracked by default. You definitely can add properties to either level of the property cache and track them, but adding values to either level degrades the overall performance of the service, and the effect is even more pronounced if you add a value to the primary level. Adding properties of variable length also dramatically increases the size of the cache, something to be aware of if disk space is at a premium. Finally, after you've restarted the Indexing Service, the levels to which you assigned any new properties are finalized and cannot be changed.

You can see all the available properties to track by opening the Indexing Service console and, in the left pane, clicking the appropriate catalog. In the right pane, all the available properties will be listed.

To add a property to be saved in the property cache, follow these steps:

  1. Open the Indexing Service console.

  2. In the left pane, find the appropriate catalog, and then select the Properties folder underneath the node.

  3. In the right pane, click the property you would like to add to the property cache.

  4. Select Properties from the Action menu.

  5. The property's Properties screen will appear. To include the property in the cache, check the Cached checkbox and then select the appropriate storage level from the drop-down list. This is shown in Figure 13-9.

    Figure 13-9. Adding a property to the property cache


  6. Click OK when you're finished.

The property has been enabled for inclusion in the property cache. You will need to restart the Indexing Service for these changes to take effect. Also, only new documents added to the index will have these properties tracked and added to the cache; to include these specific properties of documents already in the index, you'll need to perform a full scan of the index (see the next section for details on that process).

To remove a property from being tracked, simply repeat the preceding process for the appropriate property, and on the Properties sheet, remove the check mark in the Cached checkbox. Then, restart the Indexing Service and again perform a full scan of the index to remove all traces of the property from the property cache.

13.1.3.1.5 Initiating scans

Full scans involve making a complete list of all documents contained in a catalog. When the Indexing Service is first installed, it of course conducts a full scan, but these types of scans also are conducted when directories are added to a catalog and as part of the error recovery process. On the other hand, incremental scans (which only look for changed documents within a catalog) are done automatically upon a restart of the Indexing Service to determine what documents have changed while it was inactive.
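The difference between the two scan types can be sketched in a few lines of Python. This is only an illustration: it assumes that a change is detected by comparing last-modified timestamps against a snapshot taken earlier, which is my simplification and not a description of how the service tracks changes internally. All names and sample values are hypothetical.

```python
def full_scan(current):
    """A full scan lists every document, whether or not it has changed."""
    return sorted(current)


def incremental_scan(previous, current):
    """An incremental scan reports only what differs from the last snapshot."""
    changed = sorted(
        path for path, modified in current.items()
        if previous.get(path) != modified  # new file, or a newer timestamp
    )
    removed = sorted(path for path in previous if path not in current)
    return changed, removed


# Snapshots map a path to a last-modified timestamp (arbitrary integers here).
before = {"a.doc": 100, "b.doc": 100, "c.doc": 100}
after = {"a.doc": 100, "b.doc": 250, "d.doc": 300}

print(full_scan(after))                 # ['a.doc', 'b.doc', 'd.doc']
print(incremental_scan(before, after))  # (['b.doc', 'd.doc'], ['c.doc'])
```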

If you have a heavy load on your server from a large number of modified files, you might want to manually initiate either a full or an incremental scan. Here are the steps:

  1. Open the Indexing Service console.

  2. In the left pane, select the appropriate catalog.

  3. In the right pane, double-click Directories.

  4. Select the directory for which you want to initiate a scan.

  5. From the Action menu, select All Tasks and then either Rescan (Full) or Rescan (Incremental), depending on which operation you want.

  6. Confirm your choice by clicking OK.

The scan will proceed.

13.1.3.1.6 Indexing new web sites

When you create a new web site with IIS, it isn't indexed automatically when you create a catalog for it. If you want the contents of the web site to be indexed, follow these steps:

  1. Open the Indexing Service console.

  2. Select the relevant catalog and right-click it in the left or right pane. Choose Properties.

  3. Navigate to the Tracking tab.

  4. In the drop-down box at the bottom of the screen, select the web site to index, and then click OK.

  5. Open the IIS Manager console (see Chapter 8 for detailed instructions on administering IIS).

  6. Right-click the relevant web site in the left pane, and then select Properties from the context menu.

  7. Navigate to the Home Directory tab.

  8. Check the Index This Resource box, and then click OK.

  9. Restart the Indexing Service.

The new catalog is active and will begin indexing the site you specified. I'll cover how to query this new catalog later in this chapter.

13.1.3.1.7 Indexing PDF files

Although the Indexing Service and Windows Server 2003 do not come bundled with a filter that can index the contents and properties of PDF files, Adobe (the manufacturer of Acrobat) has made available a free filter that you can install to enable that functionality. You can find this filter at http://www.adobe.com/support/salesdocs/1043a.htm, and you will need to have a login and password for the Adobe web site (both of which are free) to download it.

Adobe doesn't officially certify this plug-in for Windows Server 2003 (it guarantees it will work only on Windows NT and Windows 2000), but it definitely works on all of my test systems, and I have no reports that it fails on other administrators' systems.


To install the PDF filter, follow these steps:

  1. Open the Indexing Service console.

  2. Stop the Indexing Service.

  3. Double-click the ifilter50.exe file you downloaded from the Adobe web site.

  4. The installation process will commence. You can accept the default location to install the filter product, unless you have a reason to change it.

  5. Once the installation process finishes, start the Indexing Service.

  6. Initiate a new scan, as described in the previous section.

If, for some reason, PDF files still are not being indexed after that procedure, check the Registry to make sure the Indexing Service knows the PDF filter is present and where it can find it. Stop the Indexing Service, and then open the Registry Editor and navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex key. In the right pane, double-click the DLLsToRegister value. Look to see whether PDFFILT.DLL is present, and make sure the path is correct. (If you accepted the default entries during the filter installation process, the path is C:\Program Files\Adobe\PDF IFilter 5.0.)

13.1.3.2 Controlling merges

At some point within your organization a significant number of documents within your corpus might be modified. In this instance, it might be beneficial to initiate a master merge yourself, instead of waiting for the automatic master merge to occur in the evening.

To initiate a master merge manually, follow these steps:

  1. Open the Computer Management applet within the Control Panel.

  2. Expand the Indexing Service node in the left pane.

  3. Right-click the appropriate catalog where the changed documents are represented, and select Merge from the All Tasks menu. This is shown in Figure 13-10.

    Figure 13-10. Initiating a master merge manually


  4. Confirm your choice to merge the catalog by clicking Yes. Remember that this is a CPU- and time-intensive operation.

You also might find it convenient to change the scheduled time for master merges to occur. Perhaps your lowest CPU load occurs at 3:00 a.m. and not at midnight, as the service comes preconfigured. To change this time, you'll need to edit the Registry. Follow these steps:

  1. Open the Registry Editor (selecting the Run command from the Start menu and entering regedit is an easy way).

  2. Navigate to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex key.

  3. In the right pane, double-click the MasterMergeTime value.

  4. In the Data box in the resulting DWORD Editor window, enter a value that represents the number of minutes past midnight that the master merge process should begin. This should be a number between 0 and 1,439. For our example (to begin at 3:00 a.m. instead of midnight), enter 180. Be sure Decimal is selected.

  5. Click OK.
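If you'd rather not do the clock arithmetic in your head, the conversion is a one-liner. The Python sketch below turns a 24-hour local time into the minutes-after-midnight number that MasterMergeTime expects. The function name is my own, and the script only calculates the number; it does not touch the Registry.

```python
def master_merge_minutes(hour, minute=0):
    """Convert a 24-hour local time to minutes after midnight (0 to 1,439)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("expected an hour of 0-23 and a minute of 0-59")
    return hour * 60 + minute


print(master_merge_minutes(3))       # 180, the 3:00 a.m. example above
print(master_merge_minutes(23, 59))  # 1439, the latest allowed start time
```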

13.1.3.3 Running and configuring queries

The Indexing Service has several interfaces. Perhaps the easiest and most accessible is simply through the Search command off the Start Menu, as shown in Figure 13-11.

Figure 13-11. Accessing the Indexing Service via the Windows user interface


When using this interface, choose the option to search for files and folders, and then enter a filename, a word, a string of text from a file, or some other criterion in the box provided. Then the Indexing Service will work its magic, displaying results sometimes as much as 10 times faster than an ordinary search done with Windows without the Indexing Service present.

The Indexing Service console also contains a Query the Catalog interface, as shown in Figure 13-12.

Figure 13-12. Accessing the Indexing Service via the Query the Catalog page


The main advantage of the Query the Catalog form is the wider availability of search criteria. Using this page, you can search for words and phrases, search for words and phrases that are near other words and phrases, search for strings within text properties (such as a document summary in Microsoft Word), search within certain document formats, use operators such as <, <=, =, >=, >, and != against a fixed data point (useful for comparing against a date, a time, a size, or the like), use Boolean operators, use wildcard operators, use regular expressions, and rank results by how close the match is to the query. It's certainly quite a list.

If you want to create your own custom query form, that's simple to do as well. A basic form might consist of the following:

    <h1>Indexing Service Query</h1>
    <p>Enter the term for your search, and then press Submit.</p>
    <form method="POST" action="/scripts/querydemo.idq">
    <p><input type="text" name="CiRestriction" size="75">
    <input type="submit" value="Submit" name="B1">
    <input type="reset" value="Reset" name="B2"></p>
    </form>

A custom query form has one requirement: it must post back to the Internet data query (IDQ) file, which simply configures the correct query parameters for a search. (Head over to http://msdn.microsoft.com and search for "format IDQ" for a detailed reference on the formatting for these files.) The code shown next is a standard format for an IDQ file.

    [Query]
    # CiCatalog=d:\ <= COMMENTED OUT - default registry value used
    CiColumns=filename,size,rank,characterization,vpath,DocTitle,write
    CiRestriction=%CiRestriction%
    CiMaxRecordsInResultSet=200
    CiMaxRecordsPerPage=35
    CiScope=/
    CiFlags=DEEP
    CiTemplate=/iissamples/issamples/ixtourqy.htx
    CiSort=rank[d]
    CiForceUseCi=true

Let's take a closer look at each part of the IDQ file.


[Query]

Identifies the following information as a query restriction.


CiCatalog=d:\

Points to the index to use. In the previous case, the statement is commented out, so default is used.


CiColumns=filename,size,rank,characterization,vpath,DocTitle,write

Indicates the kind of information to return in the result set.


CiRestriction=%CiRestriction%

Indicates the query terms to search for. In this case, the CiRestriction form parameter is used, which matches the variable name of the text box used in the example form previously.


CiMaxRecordsInResultSet=200

Sets the maximum number of results to be returned; in this example, 200.


CiMaxRecordsPerPage=35

Determines how many results are shown on each web page returned. In this case, 35 results will be shown per web page.


CiScope=/

Tells where to start the query. In this example, the query starts at the root of the virtual directory space. You can list more than one virtual directory in your scope by separating the directories with a comma ( , ). For example: CiScope=/docs,/work,/school.


CiFlags=DEEP

Instructs the query to search all subdirectories within the scope. Change DEEP to SHALLOW to search only the directory shown in CiScope.


CiTemplate=/iissamples/issamples/ixtourqy.htx

Indicates which file to use to format the results; in this case, Ixtourqy.htx.


CiSort=rank[d]

Tells how to sort the results. This example calls for results to be listed in descending ([d]) rank order; that is, the results are listed in order from the file with the most hits to the file with the fewest hits.


CiForceUseCi=true

This is an optional variable that, when set to true, forces Indexing Service to search the content index even if it is out of date.

If you use a sort method other than rank descending, you receive only a subset of the total set of matching documents, and that subset might be different with every successive query. The only surefire, consistent rank sorting method is rank descending, as described earlier.
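When a query form misbehaves, it can help to sanity-check the IDQ file before blaming the catalog. The Python sketch below reads the sample IDQ settings discussed above with the standard configparser module and summarizes them. This is only a convenience check, resting on my assumption that the file is INI-like enough for configparser to read; IIS does its own parsing, and summarize_idq is a hypothetical helper.

```python
import configparser

# The sample IDQ file discussed above, embedded as text for the example.
IDQ_TEXT = """\
[Query]
# CiCatalog=d:\\ <= COMMENTED OUT - default registry value used
CiColumns=filename,size,rank,characterization,vpath,DocTitle,write
CiRestriction=%CiRestriction%
CiMaxRecordsInResultSet=200
CiMaxRecordsPerPage=35
CiScope=/
CiFlags=DEEP
CiTemplate=/iissamples/issamples/ixtourqy.htx
CiSort=rank[d]
CiForceUseCi=true
"""


def summarize_idq(text):
    """Pull the main query settings out of IDQ text for a quick review."""
    # interpolation=None keeps configparser from choking on %CiRestriction%.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of names such as CiScope
    parser.read_string(text)
    query = parser["Query"]
    return {
        "catalog": query.get("CiCatalog"),  # None: the default catalog is used
        "columns": [name.strip() for name in query["CiColumns"].split(",")],
        "max_results": query.getint("CiMaxRecordsInResultSet"),
        "per_page": query.getint("CiMaxRecordsPerPage"),
        "scopes": [scope.strip() for scope in query["CiScope"].split(",")],
        "deep": query["CiFlags"].upper() == "DEEP",
    }


summary = summarize_idq(IDQ_TEXT)
print(summary["catalog"], summary["max_results"], summary["scopes"])
```

A catalog value of None is the thing to look for when you meant to search a custom catalog: it means the CiCatalog line is missing or commented out.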


If you are receiving an error such as "No documents matched the query" when using a custom query form, you can try a few things. For one, check the .IDQ file that is being used for the query, and make sure the CiCatalog line points to the correct catalog location. If you are using a custom catalog, be sure this entry points to it; otherwise, you are searching the default catalog, which isn't what you want.
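The CiCatalog check described above is easy to automate. The sketch below is one hypothetical way to do it in Python: it scans the text of an .idq file and reports whether CiCatalog is set, commented out, or missing. It assumes that a leading # marks a commented-out line and that parameter names can be matched without regard to case; the function name catalog_setting and the sample text are illustrative only.

```python
def catalog_setting(idq_text):
    """Return (status, value); status is 'set', 'commented', or 'missing'."""
    status, value = "missing", None
    for raw in idq_text.splitlines():
        line = raw.strip()
        commented = line.startswith("#")  # assumption: '#' starts a comment
        body = line.lstrip("#").strip()
        key, sep, val = body.partition("=")
        if sep and key.strip().lower() == "cicatalog":
            if not commented:
                # An active entry wins over any commented-out one.
                return "set", val.strip()
            status, value = "commented", val.strip()
    return status, value


# Illustrative sample: the catalog line is present but commented out.
SAMPLE = "\n".join(["[Query]", "#CiCatalog=d:\\", "CiScope=/"])

if __name__ == "__main__":
    status, value = catalog_setting(SAMPLE)
    if status == "set":
        print(f"Query uses catalog at {value}")
    else:
        print(f"CiCatalog is {status}; the default catalog will be searched")
```

A result of "commented" or "missing" on a form that is meant to search a custom catalog is a likely cause of the "No documents matched the query" message.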

Also, if you are trying to search content on an IIS-hosted web site, make sure the Index This Resource checkbox is checked for that particular site. Open the IIS Manager console (see Chapter 8 for detailed instructions on administering IIS), right-click the relevant web site in the left pane, and select Properties from the context menu. Navigate to the Home Directory tab, check the Index This Resource box, and then click OK. Finally, restart the Indexing Service.

You also might be impeded from viewing some documents because of permissions. The Indexing Service scans and indexes using the local System account, which must have at least Read permissions on the files you want indexed; otherwise, the service can't read them and they're not indexed. The service also needs Full Control permissions for the root folder of the drive that houses the catalog, and it needs Full Control on the CATALOG.WCI directory, which is located within the catalog directory. Additionally, users who search for documents they are not allowed to access will not see those documents in the search results (if those documents are hosted on an NTFS volume).

13.1.3.4 Adjusting performance options

Adjusting performance for the Indexing Service and issuing recommendations is like aiming at a moving target: several variables significantly affect the performance of the service, including the obvious ones (corpus size, amount of memory available, and amount of physical disk space present). Informal testing has shown that indexes with 150,000 documents or fewer tend not to require special hardware: the stock hardware that runs Windows Server 2003 should be a sufficient base for such a small corpus. Above that "magic" number, however, you might need to look at expanding hardware on the machine running the Indexing Service to improve performance.
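To see roughly where a server stands against that informal 150,000-document figure, you can count the files under the directories you plan to index. The Python sketch below is a rough estimate only: it counts every file it finds, whereas the Indexing Service may skip files it cannot filter or has been told to exclude, so treat the result as an upper bound. The function names are hypothetical, and the threshold is simply the informal number quoted above.

```python
import os

# Informal figure from the text above, not a documented product limit.
INFORMAL_THRESHOLD = 150_000


def count_documents(root):
    """Count all files under root, including subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def needs_hardware_review(doc_count, threshold=INFORMAL_THRESHOLD):
    """True when the corpus is larger than the informal threshold."""
    return doc_count > threshold


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count = count_documents(root)
    print(f"{count} files under {root}")
    if needs_hardware_review(count):
        print("Corpus exceeds the informal threshold; review hardware.")
    else:
        print("Stock hardware is probably sufficient for this corpus.")
```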

13.1.3.4.1 Configuring performance within the Indexing Service

You can adjust a couple of knobs within the Indexing Service to configure a certain level of performance based on system load; these adjustments are sometimes a quick fix that avoids a hardware upgrade. However, it's important to realize that in the majority of cases, the service works in the background and configures itself to consume resources appropriately; these options will make a noticeable difference only in very high-load or very low-load situations.

With that disclaimer out of the way, let's turn to adjustments. For one, you can adjust the level at which the Indexing Service thinks it runs on the server; sometimes this can make the service a better player among the other processes jockeying for CPU time on your machine. To try this, do the following:

  1. Open the Indexing Service console.

  2. Right-click the Indexing Service node in the left pane, and select Stop.

  3. Now, from the Action menu, select All Tasks and then choose Tune Performance.

  4. The Indexing Service Usage screen appears, as shown in Figure 13-13.

    Figure 13-13. Adjusting Indexing Service usage


On this screen, simply select the options that adequately fit this machine's usage profile. Your options are as follows:


Dedicated server

Provides "instant" indexing and "high load" querying


Used often, but not dedicated to this service

Uses "lazy" indexing and "moderate load" querying


Used occasionally

Specifies "lazy" indexing and "low load" querying


Never used

Completely turns off the Indexing Service


Customize

Brings up a separate dialog, shown in Figure 13-14

Figure 13-14. Customizing Indexing Server performance


The Desired Performance screen allows you to adjust individually the indexing and querying settings for the service to use. You can choose between lazy, moderate, and instant indexing, and low load, moderate load, and high load querying.
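The relationship between the usage options and the underlying indexing and querying levels can be summarized as a small lookup table. The Python sketch below only restates the options listed above in data form; it does not change any setting on a server, and the names PRESETS and resolve_profile are hypothetical.

```python
# Levels offered on the Desired Performance screen, as listed in the text.
INDEXING_LEVELS = ("lazy", "moderate", "instant")
QUERYING_LEVELS = ("low load", "moderate load", "high load")

# Each preset maps to (indexing level, querying level).
PRESETS = {
    "Dedicated server": ("instant", "high load"),
    "Used often, but not dedicated to this service": ("lazy", "moderate load"),
    "Used occasionally": ("lazy", "low load"),
    "Never used": None,  # the service is turned off entirely
}


def resolve_profile(name, indexing=None, querying=None):
    """Return the (indexing, querying) pair for a usage option, or None if off."""
    if name == "Customize":
        if indexing not in INDEXING_LEVELS or querying not in QUERYING_LEVELS:
            raise ValueError("Customize needs a valid indexing and querying level")
        return indexing, querying
    if name not in PRESETS:
        raise KeyError(f"Unknown usage option: {name}")
    return PRESETS[name]


if __name__ == "__main__":
    print(resolve_profile("Dedicated server"))
    print(resolve_profile("Customize", indexing="moderate", querying="low load"))
```

Laid out this way, it is clear that Customize is the only route to moderate indexing, because none of the presets selects it.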

13.1.3.4.2 Monitoring performance using the Performance Monitor

The Performance Monitor bundled with Windows Server 2003 can show you how the Indexing Service is performing. Load it from the Administrative Tools menu off the Start menu, and then click the "+" icon in the middle of the toolbar in the right pane to open the Add Counters screen. Select the appropriate performance object and counters, using Table 13-3 as a guide.

Table 13-3. Performance Monitor counters relevant to the Indexing Service

| Object | Counter | Function |
| --- | --- | --- |
| Indexing Service | Number of documents indexed | A tally of the documents indexed in the current session |
| Indexing Service | Deferred for indexing | Number of documents currently in use that require indexing |
| Indexing Service | Documents to be indexed | Least number of documents requiring indexing |
| Indexing Service | Index size (MB) | Size of the index, in megabytes |
| Indexing Service | Merge progress | Percentage of current merge process complete |
| Indexing Service | Running queries | Number of queries being processed at the moment |
| Indexing Service | Saved indexes | Number of saved indexes |
| Indexing Service | Total number of documents | Number of documents familiar to the Indexing Service |
| Indexing Service | Total number of queries | Number of queries that have been conducted in the current session |
| Indexing Service | Unique keys | Number of unique keys (words, properties, and other search criteria) present in the index |
| Indexing Service | Word lists | Number of word lists |
| Indexing Service Filter | Binding time (msec) | Average time, in milliseconds, for filter binding |
| Indexing Service Filter | Indexing speed (MBph) | Speed of indexing the contents of a document, in megabytes per hour |
| Indexing Service Filter | Total indexing speed (MBph) | Speed of indexing the contents of a document and its properties, in megabytes per hour |
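If you prefer to sample these counters from a command prompt rather than the graphical console, the typeperf utility included with Windows can log them at a fixed interval. The Python sketch below only builds the command line; it does not run it. It assumes the object and counter names match Table 13-3 exactly, which may not hold on a given system (counter names as registered can differ from the descriptive names used here, and the objects may require an instance such as a catalog name), so confirm the real names first with typeperf -q. The helper names are hypothetical.

```python
def counter_path(obj, counter, instance=None):
    """Build a performance counter path such as \\Object(instance)\\Counter."""
    if instance is None:
        return f"\\{obj}\\{counter}"
    return f"\\{obj}({instance})\\{counter}"


def typeperf_command(paths, interval_seconds=5, samples=12):
    """Return an argument list for typeperf: -si is the interval, -sc the count."""
    return ["typeperf", *paths, "-si", str(interval_seconds), "-sc", str(samples)]


if __name__ == "__main__":
    # Names taken from Table 13-3; verify them before use.
    paths = [
        counter_path("Indexing Service", "Running queries", instance="*"),
        counter_path("Indexing Service", "Index size (MB)", instance="*"),
    ]
    print(" ".join(f'"{arg}"' if " " in arg else arg
                   for arg in typeperf_command(paths)))
```

Building the argument list separately from running it makes it easy to review the exact counter paths before handing them to the tool.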




