Factors in Query Latency

Query latency is the time it takes to execute a query and return a result set. Several factors affect query latency.

Corpus Size

Query latency grows with the size of the corpus: the number of documents in the document library, together with the content from external sources and index workspaces that is incorporated into the workspace.

Query Syntax and Search Terms

The structure of search queries and the search terms you use can affect query latency. This section describes how specific syntax and terms can affect query performance.

  • Choosing search terms. Searches over words with few occurrences in the index are faster than searches over words with many occurrences. Internet sites often include large documents with high hit counts for certain words. You can improve performance by identifying such files and using site path rules to exclude them from being crawled. The following query returns the documents that contain the word "software" more than 500 times, in descending order of hit count.
     SELECT HITCOUNT, PATH FROM SCOPE('DEEP TRAVERSAL OF "http://server01/workspace1"') WHERE CONTAINS('software') AND HITCOUNT > 500 ORDER BY HITCOUNT DESC 
  • Choosing query predicates. The choice of full-text predicate affects query latency. Use the CONTAINS predicate to search for exact words or phrases: a CONTAINS query for "SharePoint Portal Server" returns only exact matches on the phrase. In a FREETEXT query on the same search terms, the search engine searches for the individual words as well as the entire phrase, so the query also matches documents that contain "SharePoint" OR "Portal" OR "Server". FREETEXT also uses the thesaurus and invokes a stemmer on the query text, so the query runs for all permutations of the inflected forms of the individual words and phrases. In addition, FREETEXT uses a probabilistic ranking algorithm, whereas CONTAINS does not. As a result, FREETEXT queries typically return more hits than CONTAINS queries and can have higher latency.
  • Setting query traversal depth. Deep traversal queries are executed by MSSearch and generally perform better than shallow traversal queries, because deep traversal queries do not check the traversal depth of the folders and subfolders they traverse. Checking folder depth for a scope specified with the SCOPE operator increases query latency. Shallow traversal queries, such as Category browsing through the dashboard site, have low latency when they are executed by the SharePoint Portal Server store. For more information about how traversal affects query execution, see Appendix B, For More Information.
  • Querying over property values of metadata. Querying over property values of metadata can increase query latency because the value for the property must be read from the property store to confirm that it is an exact match.
  • Limiting result set size. Specifying a limit on the maximum number of rows returned by the search engine can significantly reduce query latency.
  • If you submit a search query through WebDAV, you can specify the maximum number of rows returned by that query by setting the MS-SEARCH-MAXROWS header in the WebDAV request. Setting the header to return 200 rows or fewer reduces query latency.

    MS-SEARCH-MAXROWS should not exceed 2000.

    For more information, see Appendix B.

  • Managing query time-outs. You can reduce query latency by specifying a limit on how long to wait before a query is timed out. For information on configuring query time-outs, see Appendix B.
  • Setting the ORDER BY clause. The ORDER BY clause sorts the query results by the values of one or more specified columns. Ordering queries by rank does not affect query latency, because the search engine calculates rank at query time. Ordering by any other column causes the query to access the property store to retrieve that column's value, which increases query latency.
  • Checking equality and regular expressions. Queries that require the search engine to retrieve property values from the property store have higher latency than other queries. The search engine retrieves property values from the property store when a query checks a property value for equality or uses a regular expression. You use regular expressions in non-full-text predicates, such as LIKE and MATCHES, to perform complex pattern matching on text columns; these predicates cause the search engine to access the property store to verify property values, which increases the latency of the query. Likewise, the property store is always accessed when you use the "=" operator to check the value of a property in a query. You can also perform basic pattern matching, such as prefix matching, by using wildcards with full-text predicates such as CONTAINS.
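As a sketch of the MS-SEARCH-MAXROWS guidance above, the following Python code builds and sends a WebDAV SEARCH request with a row limit. It assumes that the server accepts a DAV:searchrequest body that wraps the SQL query in a DAV:sql element, sent as text/xml. The function names, the 200-row default, and the clamping to 2000 rows are illustrative choices based on the limits stated above, and authentication is not handled.

```python
import http.client
from xml.sax.saxutils import escape

MAX_ROWS_CEILING = 2000  # MS-SEARCH-MAXROWS should not exceed 2000
DEFAULT_MAX_ROWS = 200   # 200 rows or fewer reduces query latency


def build_search_request(sql, max_rows=DEFAULT_MAX_ROWS):
    """Return (method, headers, body) for a WebDAV SEARCH request.

    The DAV:searchrequest/DAV:sql body layout is an assumption about what
    the server accepts; the MS-SEARCH-MAXROWS header comes from the text.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    rows = min(max_rows, MAX_ROWS_CEILING)  # clamp to the documented ceiling
    body = (
        '<?xml version="1.0"?>'
        '<D:searchrequest xmlns:D="DAV:">'
        f"<D:sql>{escape(sql)}</D:sql>"  # escape &, < and > in the SQL text
        "</D:searchrequest>"
    )
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "MS-SEARCH-MAXROWS": str(rows),
    }
    return "SEARCH", headers, body.encode("utf-8")


def send_search(host, path, sql, max_rows=DEFAULT_MAX_ROWS, timeout=60):
    """Send the request over HTTP and return (status, raw response body).

    No authentication is performed; a real deployment would need it.
    """
    method, headers, body = build_search_request(sql, max_rows)
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request(method, path, body=body, headers=headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

For example, send_search("server01", "/workspace1", query) would submit a query such as the hit-count query shown earlier with MS-SEARCH-MAXROWS set to 200. The timeout argument only limits how long the client waits for a response; it does not configure the server-side query time-out described above.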
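The latency factors in this list can also be checked mechanically before a query is put into use. The following Python function is a rough, hypothetical helper (not part of SharePoint Portal Server) that scans a query string for the constructs described above: FREETEXT, shallow traversal, LIKE, MATCHES, the "=" operator, and ORDER BY on anything other than rank. It is a text scan rather than a SQL parser, so a keyword inside a quoted search term can produce a false warning.

```python
import re

# Heuristic rules taken from the latency factors described above. This is a
# text scan, not a SQL parser: string literals are not skipped, so a keyword
# inside a quoted search term can produce a false positive.
RULES = [
    (r"\bFREETEXT\s*\(", "FREETEXT: thesaurus expansion, stemming and probabilistic ranking"),
    (r"\bSHALLOW\s+TRAVERSAL\b", "SHALLOW TRAVERSAL: folder depth is checked"),
    (r"\bLIKE\b", "LIKE: property values read from the property store"),
    (r"\bMATCHES\s*\(", "MATCHES: property values read from the property store"),
    (r"(?<![<>!])=", "'=': property values read from the property store"),
]


def order_by_columns(sql):
    """Return the column names listed in the ORDER BY clause, if any."""
    match = re.search(r"\bORDER\s+BY\b(.*)$", sql, re.I | re.S)
    if not match:
        return []
    columns = []
    for part in match.group(1).split(","):
        # Drop a trailing ASC/DESC keyword and surrounding whitespace.
        column = re.sub(r"\s+(ASC|DESC)\s*$", "", part.strip(), flags=re.I)
        if column:
            columns.append(column)
    return columns


def latency_warnings(sql):
    """List the constructs in `sql` that the text associates with higher latency."""
    warnings = [message for pattern, message in RULES if re.search(pattern, sql, re.I)]
    for column in order_by_columns(sql):
        # Per the text, only ordering by rank avoids a property-store lookup.
        if "rank" not in column.lower():
            warnings.append(f"ORDER BY {column}: value read from the property store")
    return warnings
```

A query that uses CONTAINS with deep traversal and orders by rank produces no warnings, while one that combines FREETEXT, shallow traversal, LIKE, and ORDER BY on a date property produces one warning for each of those constructs.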

Resource Limitations

Server capacity can also affect query latency. This section describes particular resources that can impact query performance.

Managing resource limitations. Query performance improves if components of Microsoft Search (MSSearch) do not compete for resources on the same Microsoft® SharePoint Portal Server 2001 computer. One way to improve query performance is to deploy resource-intensive tasks to different servers. For example, you can create an index workspace on a separate server dedicated to creating and maintaining indexes. For more information about configuring a server dedicated to creating and updating indexes, see Appendix B.

Property Store Configuration and Caching

Queries perform best when a high proportion of the property store is loaded into memory. You can control the size of the in-memory cache for the property store through the registry. Properties are loaded from the property store into this cache in one of two ways: as a side effect of index propagation, or by running queries that select those properties.

Registry settings for MSSearch place limits on how much of the property store is loaded and cached in-memory. You can increase the size of this cache by setting the following two registry keys:

Incorrectly editing the registry may severely damage your system. Back up the current version of the registry before making any changes. You should also back up any valued data on the computer.

  • Minimum cache size. \HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\Databases\CacheSizeMin
  • Maximum cache size. \HKEY_LOCAL_MACHINE\Software\Microsoft\Search\1.0\Databases\CacheSizeMax

Each value is the number of 4-KB pages of memory allocated for the cache, expressed in hexadecimal. For example, on a system with 2 gigabytes (GB) of RAM, setting the minimum cache size to 40000 and the maximum to 48000 gives a cache that uses between 1 GB and 1.125 GB of RAM (262,144 and 294,912 pages of 4 KB, respectively). You must restart MSSearch for the registry settings to take effect.
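The page arithmetic can be checked with a short Python sketch. It assumes only what the text states: each unit is a 4-KB page, and the registry value is the page count in hexadecimal. The function names are illustrative, and the sketch does not write anything to the registry.

```python
PAGE_SIZE = 4 * 1024  # one cache page is 4 KB


def pages_hex_for_bytes(size_bytes):
    """Hexadecimal page count for a cache of at least size_bytes bytes."""
    pages = -(-size_bytes // PAGE_SIZE)  # ceiling division: round up to whole pages
    return format(pages, "X")


def bytes_for_pages_hex(value):
    """Number of bytes represented by a hexadecimal page count such as "40000"."""
    return int(value, 16) * PAGE_SIZE


if __name__ == "__main__":
    gib = 1024 ** 3
    print(pages_hex_for_bytes(gib))            # 40000
    print(bytes_for_pages_hex("48000") / gib)  # 1.125
```

For example, pages_hex_for_bytes(1024 ** 3) returns "40000", the minimum value from the example above, and bytes_for_pages_hex("48000") returns 1207959552 bytes, or 1.125 GB.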

Setting a substantial difference between minimum and maximum cache size may degrade performance. Setting the maximum cache size to a large percentage of the available RAM degrades performance.

When a property appears in the SELECT clause of a query, the search engine retrieves that property from the property store and caches the property and its value in memory. For subsequent queries, the search engine retrieves the value from memory, which eliminates disk access to the property store. You can therefore improve query performance by running queries that select properties, so that the property store is loaded into memory before users query it. After these queries finish executing, the properties loaded from the property store remain in memory until the SharePoint Portal Server computer is rebooted or the MSSearch service is restarted.
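One way to generate such warm-up queries is sketched below in Python. The property names in the example, the batching of several properties per query, the double-quoting of property names, and the assumption that a SELECT without a WHERE clause is accepted and covers every document in the scope are all illustrative assumptions. Each generated query would still need to be submitted to the server, for example through WebDAV. Deep traversal is used because, as noted earlier, it generally performs better than shallow traversal.

```python
def warmup_queries(scope_url, properties, batch_size=10):
    """Build SELECT statements that name each property in `properties`.

    Running these queries causes the search engine to read the selected
    properties from the property store and cache them in memory.
    batch_size (how many properties go into one query) is an arbitrary choice.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Deep traversal avoids the folder-depth checks of a shallow traversal.
    scope = f"SCOPE('DEEP TRAVERSAL OF \"{scope_url}\"')"
    queries = []
    for start in range(0, len(properties), batch_size):
        batch = properties[start:start + batch_size]
        # Property names such as DAV:href are double-quoted (assumed convention).
        columns = ", ".join(f'"{name}"' for name in batch)
        queries.append(f"SELECT {columns} FROM {scope}")
    return queries
```

For example, warmup_queries("http://server01/workspace1", ["DAV:href", "DAV:displayname"]) returns a single query: SELECT "DAV:href", "DAV:displayname" FROM SCOPE('DEEP TRAVERSAL OF "http://server01/workspace1"').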

The property store for MSSearch is cached into memory as a side effect of propagating an index. For more information on how to propagate an index, see Appendix B.

Performance Monitoring

SharePoint Portal Server provides two performance objects to monitor searching on the dashboard site.

  • Microsoft Search object. You can use the Microsoft Search object to monitor the number and rate of the following for the server: failed queries, queries (all), results, and successful queries.
  • Microsoft Search Catalogs object. You can use the Microsoft Search Catalogs object to monitor the number and rate of the following for each workspace: failed queries, queries (all), results, and successful queries.

For more information about performance monitors, see Appendix B.



Microsoft SharePoint Portal Server 2001 Resource Kit
Microsoft SharePoint(TM) Portal Server 2001 Resource Kit (Examples & Explanations Series)
ISBN: 0735615624
Year: 2001
Pages: 231
