Generally speaking, relevance relates to how closely the search results returned to the user match what the user wanted to find. Ideally, the results on the first page are the most relevant, so users do not have to look through several pages of results to find the best result for their search.
The product team for SharePoint Server 2007 has added a number of new features that substantially improve relevance in the result set. The following sections detail each of these improvements.
Click distance refers to how far each content item in the result set is from an "authoritative" site. In this context, "sites" can be either Web sites or file shares. By default, all the root sites in each Web application are considered first-level authoritative.
You designate authoritative sites by entering the sites or file shares that your users visit most often, either to find information directly or to navigate to it. The logic is that the closer a site is, in number of clicks, to an authoritative site, the more relevant that site is considered to be in the result set. Stated another way, the more clicks it takes to get from an authoritative site to the content item, the less relevant that item is thought to be and the lower it will appear in the result set.
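The click-distance idea can be illustrated with a breadth-first walk over a link graph. This is only a minimal sketch, not how Search computes the value internally: the `click_distances` function, the `links` dictionary, and the `http://portal` URLs are all hypothetical names invented for the example.

```python
from collections import deque


def click_distances(links, authoritative):
    """Return the fewest clicks from any authoritative URL to each reachable URL.

    links: dict mapping a URL to the list of URLs it links to (hypothetical data).
    authoritative: iterable of URLs treated as distance 0.
    """
    distances = {url: 0 for url in authoritative}
    queue = deque(distances)
    while queue:
        current = queue.popleft()
        for target in links.get(current, []):
            # The first visit is the shortest path because BFS expands level by level.
            if target not in distances:
                distances[target] = distances[current] + 1
                queue.append(target)
    return distances


# Hypothetical link graph: each key links to the URLs in its list.
links = {
    "http://portal": ["http://portal/hr", "http://portal/it"],
    "http://portal/hr": ["http://portal/hr/policies"],
    "http://portal/hr/policies": ["http://portal/hr/policies/leave.doc"],
    "http://portal/it": [],
}

distances = click_distances(links, ["http://portal"])
for url, clicks in sorted(distances.items(), key=lambda item: item[1]):
    print(clicks, url)
```

In this sketch, an item with a smaller number would be treated as more relevant than an item with a larger one, all other factors being equal.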
You will want to evaluate your sites over time to ensure that you've appropriately ranked the sites that your users visit. When content items from more than one site appear in the result set, it is highly likely that some sites' content will be more relevant to the user than other sites' content. SharePoint Server 2007 addresses this with a three-tiered approach: you can explicitly set primary (first-level), secondary (second-level), and tertiary (third-level) sites, as well as sites that should never be considered authoritative. Determining which sites should be placed at which level is probably more art than science and will be a learning process over time.
To set authoritative sites, you'll need to first open the SSP in which you need to work, click the Search Settings link, and then scroll to the bottom of the page and click the Relevance Settings link. This will bring you to the Edit Relevance Settings page, as illustrated in Figure 16-1.
Figure 16-1: Edit Relevance Settings page
Note that on this page, you can input any URL or file share into any one of the three levels of importance. By default, all root URLs for each Web application that are associated with this SSP will be automatically listed as most authoritative. Secondary and tertiary sites can also be listed. Pages that are closer (in terms of number of clicks away from the URL you enter in each box) to second-level or third-level sites rather than to the first-level sites will be demoted in the result set accordingly. Pages that are closer to the URLs listed in the Sites To Demote pane will be ranked lower than all other results in the result set.
Anchor text is the descriptive, clickable text of a hyperlink. The hyperlink anchor text feature ties the query term or phrase to that descriptive text. If there is a match between the anchor text and the query term, the URL that the link points to is pushed up in the result set and treated as more relevant. Anchor text only influences rank and is not the determining factor for including a content item in the result set.
Search indexes the anchor text from the following elements:
HTML anchor elements
Windows SharePoint Services link lists
Office SharePoint Portal Server listings
Office Word 2007, Office Excel 2007, and Office PowerPoint 2007 hyperlinks
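For the first item in the list, HTML anchor elements, the following sketch shows one way to collect anchor text alongside the URL each link targets, using Python's standard `html.parser` module. The `AnchorTextParser` class and the sample markup are assumptions made for illustration and do not represent the crawler that Search actually uses.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # list of (href, text) tuples
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page fragment.
parser = AnchorTextParser()
parser.feed('<p>See the <a href="http://site1/boots.doc">muddy boots report</a>.</p>')
print(parser.anchors)
```

A ranking step could then compare the query term with the collected text for each target URL.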
Important or relevant content is often located closer to the top of a site's hierarchy, instead of in a location several levels deep in the site. As a result, the content has a shorter URL, so it's more easily remembered and accessed by the user. Search makes use of this fact by looking at URL depth, or how many levels deep within a site the content item is located. Search determines this level by looking at the number of slash (/) characters in the URL; the greater the number of slash characters in the URL path, the deeper the URL is for that content item. As a consequence, a large URL depth number lowers the relevance of that content item.
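The slash-counting rule described above is simple to sketch. The `url_depth` function below is a hypothetical helper, and it assumes that only the slashes in the path portion of the URL count (the two slashes after `http:` are excluded); the text does not specify how Search treats those.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path portion of a URL."""
    return urlparse(url).path.count("/")


# Hypothetical URLs: the deeper item would rank lower, all else being equal.
for url in (
    "http://site1/report.doc",
    "http://site1/library/muddyboots/report.doc",
):
    print(url_depth(url), url)
```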
If a query term matches a portion of the URL for a content item, that content item is considered to be of higher relevance than if the query term had not matched. For example, if the query term is "muddy boots" and the URL for a document is http://site1/library/muddyboots/report.doc, then "muddy boots" (with or without the space) is an exact match for part of the URL, and report.doc is raised in relevance for this particular query.
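A rough sketch of the "with or without the space" comparison follows. The `query_matches_url` function is a hypothetical helper, and the matching rules it applies (case-insensitive substring matching after decoding escapes such as `%20`) are assumptions for illustration rather than the documented behavior of Search.

```python
from urllib.parse import unquote


def query_matches_url(query, url):
    """Return True if the query appears in the URL, with or without its spaces."""
    decoded = unquote(url).lower()   # turn escapes such as %20 back into characters
    term = query.lower().strip()
    if not term:
        return False
    return term in decoded or term.replace(" ", "") in decoded


url = "http://site1/library/muddyboots/report.doc"
print(query_matches_url("muddy boots", url))
print(query_matches_url("clean shoes", url))
```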
Microsoft has built a number of classifiers that look for particular kinds of information in particular places within Microsoft documents. When that type of information is found in those locations and there is a query term match, the document is raised in relevance in the result set. A good example of this is the title slide in PowerPoint. Usually, the first slide in a PowerPoint deck is the title slide that includes the author's name. If "Judy Lew" is the query term and "Judy Lew" is the name on the title slide of a PowerPoint deck, that deck is considered more relevant to the user who is executing the query and will appear higher in the result set.
Documents that are written in the same language as the query are considered to be more relevant than documents written in other languages. Search determines the user's language based on Accept-Language headers from the browser in use. When calculating relevance, content that is retrieved in that language is considered more relevant. Because there is so much English language content and a large percentage of users speak English, English is also ranked higher in search relevance.
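To make the Accept-Language step concrete, the sketch below orders the language tags in a header value by their `q` weights, which is how browsers express preference. The `preferred_languages` function is a hypothetical helper that handles only the common `tag;q=value` form; how Search itself interprets the header is not described here.

```python
def preferred_languages(header):
    """Return the language tags from an Accept-Language value, most preferred first."""
    ranked = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0   # a tag without an explicit q value defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0   # treat a malformed weight as least preferred
        ranked.append((tag.strip().lower(), quality))
    # sorted() is stable, so tags with equal weight keep their header order.
    return [tag for tag, _ in sorted(ranked, key=lambda item: item[1], reverse=True)]


print(preferred_languages("fr-CA,fr;q=0.8,en;q=0.5"))
```

In a relevance calculation along the lines described above, documents written in the first language in this list would receive the boost.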
Certain document types are considered to be inherently more important than other document types. Because of this, Microsoft has hard-coded which documents will appear ahead of other documents based on their type, assuming all other factors are equal. File type relevance biasing does not supersede or override other relevance factors. Microsoft has not released the file type ordering that it uses when building the result set.