Specifying Location of the Index

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 19.  Managing Indexing


To improve performance or to ensure that enough disk space is available, the Administrator can specify the location of the index and its associated files. These files fall into the following categories:

  • Search Property Store: The indexed properties for all workspaces are kept together in a single file called SPS.EDB. Because the property store contains properties from every indexed document, the SPS.EDB file can become fairly large. For optimal performance, it is loaded into main memory; because it may be too large to cache entirely, parts of it may be paged into memory on demand. Related transaction log files are associated with this file. By default, these files are created in the C:\Program Files\SharePoint Portal Server\Data\FTData directory.

  • Search Index Catalogs: In addition to the property store, which spans all workspaces, full-text search data is stored separately for each individual workspace. These files are also referred to as the catalog. SharePoint Portal Server stores them in directories named after the workspace. By default, a subdirectory for each workspace is created under C:\Program Files\SharePoint Portal Server\Data\FTData\SharePointPortalServer\Projects.

    NOTE

    If you look at this directory, you will notice that two subdirectories are created for each regular workspace. One is named after the workspace; the other consists of the workspace name with _train$$$ appended. The latter directory is used to train the auto-categorization tool. If an index has been propagated to the server, you will also find a directory named after the originating workspace, but no matching directory with the _train$$$ suffix.


  • Search Temporary Files: During crawling, SharePoint Portal Server may create temporary files. By default, these files are stored in the folder specified by the system TMP variable (typically C:\WINNT\TEMP).

  • Search Gatherer Logs: Each time SharePoint Portal Server updates the index, it creates a log of the URLs that have been crawled, for example to record access errors.

NOTE

These log files are not related to the transaction log files. Both kinds of files are in a binary format. However, the Coordinator can view the gatherer log files through a user-friendly Active Server Pages (ASP) page, as you will see later in this chapter.
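
The catalog directory naming described in the earlier NOTE can be checked with a short script. The following is a minimal sketch, not a supported tool: the function name and the sample listing are hypothetical, and it assumes that a workspace directory without a matching _train$$$ directory belongs to a propagated index.

```python
TRAIN_SUFFIX = "_train$$$"  # suffix of the auto-categorization training directory


def classify_catalog_dirs(dir_names):
    """Split catalog subdirectory names into regular and propagated workspaces.

    Assumption: a workspace directory with a matching <name>_train$$$
    directory is a regular workspace; one without is a propagated index.
    """
    names = set(dir_names)
    # Workspace names that have a training directory.
    trained = {n[:-len(TRAIN_SUFFIX)] for n in names if n.endswith(TRAIN_SUFFIX)}
    # Directory names that are themselves workspaces.
    workspaces = {n for n in names if not n.endswith(TRAIN_SUFFIX)}
    regular = sorted(w for w in workspaces if w in trained)
    propagated = sorted(w for w in workspaces if w not in trained)
    return regular, propagated


# Hypothetical listing; on a real server you could pass the result of
# os.listdir() on the Projects directory instead.
listing = ["Sales", "Sales_train$$$", "HR", "HR_train$$$", "Research"]
regular, propagated = classify_catalog_dirs(listing)
print("regular:", regular)
print("propagated:", propagated)
```

The comparison is case-sensitive, which is a simplification, because Windows directory names are not.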


The location of any of these files can be changed. However, the SharePoint Portal Server Management Console can change file locations only for newly created workspaces. For existing workspaces, the location can be changed only with script-based support tools, which are found in the Support\Tools directory of the SharePoint Portal Server media kit. These tools are not installed on your server.

TIP

Plan where you want to locate the files. You can define the locations during the installation of SharePoint Portal Server. Consider disk space capacity and whether to spread the data across different disks to improve performance. For example, the temporary files location should point to a disk other than the system drive and other than the drive containing the index files.
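
The drive-separation advice in this tip can be expressed as a simple check. This is an illustrative sketch only: the function names and the example paths on drives D: and E: are hypothetical, and the system drive is assumed to be C: unless stated otherwise. It uses Python's ntpath module so that Windows paths are parsed the same way on any platform.

```python
import ntpath


def drive_of(path):
    """Return the drive part of a Windows path in upper case, e.g. 'C:'."""
    return ntpath.splitdrive(path)[0].upper()


def check_temp_location(temp_dir, index_dir, system_drive="C:"):
    """Return warnings if the temp directory shares a drive with the
    system drive or with the index files."""
    warnings = []
    temp_drive = drive_of(temp_dir)
    if temp_drive == system_drive.upper():
        warnings.append("temporary files are on the system drive")
    if temp_drive == drive_of(index_dir):
        warnings.append("temporary files share a drive with the index files")
    return warnings


# Default layout: both warnings apply.
print(check_temp_location(r"C:\WINNT\TEMP",
                          r"C:\Program Files\SharePoint Portal Server\Data\FTData"))
# Hypothetical layout with separate drives: no warnings.
print(check_temp_location(r"E:\SPSTemp", r"D:\SPSIndex"))
```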


Disk Space Requirements

There are three disk space components: the catalogs, the property store, and the gatherer logs.

The size of the catalog is largely determined by the textual information (the corpus) that gets indexed per workspace. This corpus includes not only the content stored within SharePoint Portal Server, but also all external information, such as Web sites, that gets indexed through content sources. The catalog itself needs about 15% of the corpus size.

The property store is a single file that is shared by all workspaces. It also contains information derived from the indexed documents. Consequently, the size of this single file is the sum of all catalog sizes plus some extra space for the non-text properties. To be on the safe side, reserve 10MB per workspace for these properties.
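
As a rough worked example of these two rules of thumb, the following sketch estimates the catalog and property store sizes. The function names and the sample corpus sizes are hypothetical; the 15% and 10MB figures are the approximations given above, not exact values.

```python
MB = 1024 * 1024
CATALOG_PERCENT = 15             # catalog needs about 15% of the corpus size
PROPERTY_RESERVE = 10 * MB       # safety reserve per workspace for non-text properties


def catalog_size(corpus_bytes):
    """Approximate catalog size for one workspace, in bytes."""
    return corpus_bytes * CATALOG_PERCENT // 100


def property_store_size(corpus_sizes):
    """Approximate size of the shared property store, in bytes.

    corpus_sizes holds one corpus size (in bytes) per workspace.
    """
    catalogs = sum(catalog_size(c) for c in corpus_sizes)
    return catalogs + PROPERTY_RESERVE * len(corpus_sizes)


# Hypothetical server with two workspaces: 2000MB and 1000MB of indexed text.
corpora = [2000 * MB, 1000 * MB]
for corpus in corpora:
    print(f"catalog: {catalog_size(corpus) // MB} MB")
print(f"property store: {property_store_size(corpora) // MB} MB")
```

For this hypothetical server, the catalogs come to about 300MB and 150MB, and the property store to about 470MB (450MB of catalog-derived data plus 2 x 10MB).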

The size of the gatherer log depends largely on the settings specified in SharePoint Portal Server Administration. If you chose to log successes, you can approximate the size of one log by allowing 100 bytes per URL. If items excluded by rules are logged as well, the log can be substantially larger (as much as 10 times). For example, in Web crawls each .GIF file found generates a 100-byte exclusion message. Multiply this size by the number of gatherer log files that are kept. You can set the number of log files kept, as well as the other logging options, in the Administration settings, as outlined later in this chapter.
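
The gatherer log estimate can be sketched in the same way. The function name and the sample numbers are hypothetical. The sketch assumes that successes are logged and treats the factor of 10 for logged exclusions as an upper bound rather than a measured value.

```python
BYTES_PER_URL = 100      # approximate log entry size per crawled URL
EXCLUSION_FACTOR = 10    # upper bound when excluded items are logged as well


def gatherer_log_bytes(urls_crawled, logs_kept, log_exclusions=False):
    """Approximate total disk space for all kept gatherer logs, in bytes."""
    per_log = urls_crawled * BYTES_PER_URL
    if log_exclusions:
        per_log *= EXCLUSION_FACTOR
    return per_log * logs_kept


# Hypothetical crawl of 500,000 URLs with 5 log files kept.
print(gatherer_log_bytes(500_000, 5))                       # 250000000
print(gatherer_log_bytes(500_000, 5, log_exclusions=True))  # 2500000000
```

In this example, logging exclusions raises the worst-case estimate from about 250 million bytes to about 2.5 billion bytes, which is why the logging options are worth reviewing before a large Web crawl.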

Changing the File Locations

To change the location of the indexes and gatherer logs for workspaces, do the following:

  1. Log in as Administrator on the server running SharePoint Portal Server.

  2. Open Programs, Administrative Tools, SharePoint Portal Server Administration.

  3. In the Microsoft Management Console (MMC) interface, select the Data tab (see Figure 19.1).

    Figure 19.1. The default locations of all SharePoint Portal Server files can be changed through the Microsoft Management Console.


  4. Click the browse button to change the appropriate setting.

  5. Click OK.

TIP

Although you can change the path, the existing indexes and gatherer logs are not moved to the new location. To move them, you must use the unsupported CATUTIL tool, which you can find in the Support\Tools directory on the SharePoint Portal Server CD. With this tool, you can also move the property store to a different directory.



                 


Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
EAN: 2147483647
Year: 2002
Pages: 286
