After a query is executed against the index, the browser is redirected automatically to the Search Results page. The results page can potentially contain the following types of results:
High confidence results
Keyword and Best Bet results
The layout of the information on the results page is a direct result of combining the search Web Parts for the results page. Because the results are passed from the index to the search results page in XML format, many of these Web Parts use XSLT to format the results. This is why you'll need to enter additional XSLT commands if you want to perform certain actions, such as adding another property to the Advanced Search drop-down list.
To add a property to the advanced search Web Part, perform these steps:
Navigate to the results page and then click Edit Page from the Site Actions menu.
Open the edit menu for the search Web Part, and select Modify Shared Web Part to open the Web Part property pane.
Expand the Miscellaneous section in the properties pane, and find the property called "properties." You'll find an XML string that defines which properties are displayed in the advanced search. A best practice here is to copy the string to Notepad for editing.
Edit the XML string, and save it back into the property. You can save the XML in the format shown below. This XML is copied directly from the Web Part. You'll need a profile property in the schema for the XML to hold any real value.
<Properties>
  <Property Name="Department" ManagedName="Department" ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Department"/>
  <Property Name="JobTitle" ManagedName="JobTitle" ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Title"/>
  <Property Name="Responsibility" ManagedName="Responsibility" ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Responsibility"/>
  <Property Name="Skills" ManagedName="Skills" ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Skills"/>
  <Property Name="QuickLinks" ManagedName="QuickLinks" ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:QuickLinks"/>
</Properties>
The elements that you'll need to pay attention to are as follows:
Profile URI (Uniform Resource Identifier)
If you look at the URN (Uniform Resource Name) string carefully, you'll see that the profile name is being pulled out of the profile URN. This is why you'll need a profile property in the schema before this XML will have any real effect.
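The editing step can also be scripted rather than done by hand in Notepad. The following is a minimal sketch, assuming you have already copied the XML string out of the Web Part property; the function name add_property and the "Office" property are hypothetical and used only for illustration. The property still has to exist as a profile property in the schema before it has any effect.

```python
import xml.etree.ElementTree as ET

# Prefix shared by every ProfileURI in the Web Part's XML string.
PROFILE_URN_PREFIX = "urn:schemas-microsoft-com:sharepoint:portal:profile:"


def add_property(properties_xml, name, profile_name):
    """Return the XML string with one more <Property> element appended.

    'name' is used for both Name and ManagedName (an assumption made for
    brevity); 'profile_name' is the last segment of the profile URN.
    """
    root = ET.fromstring(properties_xml)
    existing = {prop.get("Name") for prop in root.findall("Property")}
    if name not in existing:  # avoid adding the same property twice
        ET.SubElement(root, "Property", {
            "Name": name,
            "ManagedName": name,
            "ProfileURI": PROFILE_URN_PREFIX + profile_name,
        })
    return ET.tostring(root, encoding="unicode")


# A shortened copy of the string from the Web Part, for illustration.
current = (
    '<Properties>'
    '<Property Name="Department" ManagedName="Department" '
    'ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Department"/>'
    '</Properties>'
)

# "Office" is a hypothetical property chosen for this example.
updated = add_property(current, "Office", "Office")
print(updated)
```

Paste the printed string back into the property and save the Web Part as described in the steps above.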
URIs, URNs, and URLs play an important, yet quiet, role in SharePoint Server 2007. A Uniform Resource Identifier (URI) provides a simple and extensible means for identifying a resource uniquely on the Internet. Because the means of identifying each resource are unique, no other person, company, or organization can have identical identifiers of their resources on the Internet.
The identifier can be either a "locator" (URL) or a "name" (URN). The URI syntax is organized hierarchically, with components listed in order of increasing granularity from left to right. For example, referring back to the XML data for the advanced search Web Part, we found that Microsoft had at least this URN:
urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Responsibility
As you move from left to right, you move from more general to more specific, finally arriving at the name of the resource. No other resource on the Internet can be named exactly the same as the sps-responsibility resource in SharePoint. The hierarchical characteristic of the naming convention means that governance of the lower portions of the namespace is delegated to the registrant of the upper portion of the namespace. For example, the registered portion of the URN we're using in our running example is "schemas-microsoft-com." The rest of the URN is managed directly by Microsoft, not the Internet registering authority.
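To make the left-to-right hierarchy concrete, here is a small sketch that splits the SPS-Responsibility URN from the advanced search XML into its parts. The function name split_urn is hypothetical, and the split on colons is a simplification that fits this URN; it is not a general URN parser.

```python
def split_urn(urn):
    """Split a URN into its colon-separated components, dropping 'urn'."""
    scheme, _, rest = urn.partition(":")
    if scheme.lower() != "urn" or not rest:
        raise ValueError("not a URN: %r" % urn)
    return rest.split(":")


URN = "urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Responsibility"

# First component: the registered namespace. Last component: the resource
# name. Everything in between is managed by the namespace owner.
namespace, *path, resource = split_urn(URN)

print(namespace)  # schemas-microsoft-com
print(path)       # ['sharepoint', 'portal', 'profile']
print(resource)   # SPS-Responsibility
```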
You will find URIs, URLs, and URNs throughout SharePoint and other Microsoft products. Having a basic understanding of these elements will aid your administration of your SharePoint deployment.
When you modify any one of the search Web Parts, you'll notice that a publish toolbar appears. This toolbar enables you to modify this page without affecting the current live page. You can then publish this page as a draft for testing before going live.
Server name mappings are crawl settings you can configure to override how server names and URLs are displayed or accessed in the result set after content has been included in the index. For example, you can configure a content source to crawl a Web site via a file share path, and then create a server name mapping entry to map the file share to the Web site's URL. Another way to look at this feature is that it gives you the ability to mask internal file server names with external names so that your internal naming conventions are not revealed in the result set.
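Conceptually, a server name mapping is a prefix substitution applied to the address of each result. The sketch below illustrates the idea only; it is not how SharePoint implements the feature, and the server name, share, and Web site URL are hypothetical.

```python
def apply_server_name_mapping(address, mappings):
    """Rewrite a crawled address using the first matching prefix mapping.

    'mappings' is a list of (crawled_prefix, display_prefix) pairs.
    Addresses that match no mapping are returned unchanged.
    """
    for crawled_prefix, display_prefix in mappings:
        if address.lower().startswith(crawled_prefix.lower()):
            remainder = address[len(crawled_prefix):].replace("\\", "/")
            return display_prefix + remainder
    return address


# Hypothetical mapping: content crawled through a file share is shown
# in the result set under the Web site's URL.
mappings = [(r"\\fileserver01\webroot", "http://www.contoso.com")]

shown = apply_server_name_mapping(
    r"\\fileserver01\webroot\products\index.htm", mappings)
print(shown)  # http://www.contoso.com/products/index.htm
```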
The thesaurus is a way to manually force or deny certain types of query terms at the time the user enters a query in the Search box. Using the thesaurus, you can implement expansion sets, replacement sets, weighting, and stemming. This section focuses on the expansion and replacement sets.
The thesaurus is held in an XML file, which is located in the drive:\program files\office sharepoint server\data\ directory and has the format of TS<XXX>.XML, where XXX is the standard three-letter code for a specific language. For English, the file name is tsenu.xml.
Here are the contents of the file in its default form:
<XML>
<!--
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub weight="0.8">Internet Explorer</sub>
      <sub weight="0.2">IE</sub>
      <sub weight="0.9">IE5</sub>
    </expansion>
    <replacement>
      <pat>NT5</pat>
      <pat>W2K</pat>
      <sub weight="1.0">Windows 2000</sub>
    </replacement>
    <expansion>
      <sub weight="0.5">run**</sub>
      <sub weight="0.5">jog**</sub>
    </expansion>
  </thesaurus>
-->
</XML>
There are two parts to the code: an expansion set and a replacement set.
You use expansion sets to force the expansion of certain query terms to automatically include other query terms. For example, you could do this when a product name changes but the older documents are still relevant, if acronyms are commonly used as query terms, or when new terms arise that refer to other terms, such as slang or industry-specific use of individual words.
If a user enters a specified word, hits that match that word's configured synonyms will also be displayed. For instance, if a user searches on the word "car," you can configure the thesaurus to force a search on the word "sedan" as a synonym so that the result set includes content items that contain the word "car" and also content items that contain the word "sedan," whether or not those "sedan" content items also mention the word "car."
For example, to use the car illustration, you create the following code:
<XML>
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub>car</sub>
      <sub>sedan</sub>
    </expansion>
  </thesaurus>
</XML>
You can have more than two terms in the expansion set, and the use of any term in the set will invoke the expansion of the query to include all the other terms in the expansion set.
If you want multiple expansion sets (say, one for "car" and another for "truck"), your code would look like this:
<XML>
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub>car</sub>
      <sub>sedan</sub>
      <sub>automobile</sub>
    </expansion>
    <expansion>
      <sub>truck</sub>
      <sub>pickup truck</sub>
      <sub>SUV</sub>
    </expansion>
  </thesaurus>
</XML>
You can see how each expansion set is its own set of synonyms. This file can be as long as you want, and expansion sets need not be topically similar.
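The effect of an expansion set on a query can be illustrated with a short sketch. It assumes a thesaurus in the format shown above and only mimics the behavior described here; it ignores weights and is not the search engine's own implementation. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

THESAURUS = """<XML>
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <expansion>
      <sub>car</sub>
      <sub>sedan</sub>
      <sub>automobile</sub>
    </expansion>
    <expansion>
      <sub>truck</sub>
      <sub>pickup truck</sub>
      <sub>SUV</sub>
    </expansion>
  </thesaurus>
</XML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def load_expansion_sets(xml_text):
    """Return each <expansion> element as a list of its <sub> terms."""
    root = ET.fromstring(xml_text)
    sets = []
    for element in root.iter():
        if local_name(element.tag) == "expansion":
            sets.append([(sub.text or "").strip()
                         for sub in element
                         if local_name(sub.tag) == "sub"])
    return sets


def expand(term, sets):
    """Any term in a set expands to every term in that set."""
    for terms in sets:
        if term.lower() in (t.lower() for t in terms):
            return list(terms)
    return [term]  # no expansion set mentions this term


sets = load_expansion_sets(THESAURUS)
print(expand("sedan", sets))
print(expand("boat", sets))
```

Searching on "sedan" expands to all three terms in the first set, while a term that appears in no set is left alone.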
You can use the thesaurus to create a replacement set of words by specifying an initial word or pattern of query terms that will be replaced by a substitution set of one or more words. For example, you could create a replacement set that specifies the pattern "book writer" with the substitution "author" or "wordsmith." In this example, when a user executes a query against the phrase "book writer," the result set returns documents that have the words "author" and "wordsmith," but not documents that contain the phrase "book writer."
You can do this to ensure that commonly misspelled words are correctly spelled in the actual query. For example, the word "chrysanthemum" is easily misspelled, so placing various misspellings into a replacement set might help your users get the result set they're looking for even though they might not be able to reliably spell the query term. Replacement sets are also useful for product name changes where the documents for the old product line are no longer needed, or when a person's name has changed.
Your code would look like this:
<XML>
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <replacement>
      <pat>book writer</pat>
      <sub>author</sub>
      <sub>wordsmith</sub>
    </replacement>
  </thesaurus>
</XML>
Creating replacement sets for each misspelling is more time consuming, but it is also more accurate and helps those who are "spelling-challenged" to get a better result set.
So how can you use this? Let's assume that you have a product-line name change. Use an expansion set to expand searches on the old and new names if both sets of documents are relevant after the name change. If the documents referring to the old name are not relevant, use a replacement set to replace queries on the old name with the new name.
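The difference from an expansion set is that the original pattern is dropped from the query. The following sketch shows that behavior under simple assumptions: patterns match the whole query, case is ignored, and the two misspellings of "chrysanthemum" are made up for illustration. It is not the search engine's own logic.

```python
import xml.etree.ElementTree as ET

THESAURUS = """<XML>
  <thesaurus xmlns="x-schema:tsSchema.xml">
    <replacement>
      <pat>book writer</pat>
      <sub>author</sub>
      <sub>wordsmith</sub>
    </replacement>
    <replacement>
      <pat>chrysanthamum</pat>
      <pat>crysanthemum</pat>
      <sub>chrysanthemum</sub>
    </replacement>
  </thesaurus>
</XML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def load_replacements(xml_text):
    """Map each <pat> (lowercased) to the <sub> terms that replace it."""
    root = ET.fromstring(xml_text)
    table = {}
    for element in root.iter():
        if local_name(element.tag) != "replacement":
            continue
        patterns, substitutes = [], []
        for child in element:
            text = (child.text or "").strip()
            if local_name(child.tag) == "pat":
                patterns.append(text)
            elif local_name(child.tag) == "sub":
                substitutes.append(text)
        for pattern in patterns:
            table[pattern.lower()] = substitutes
    return table


def rewrite(query, table):
    """Return the terms actually searched; the pattern itself is dropped."""
    return table.get(query.strip().lower(), [query])


table = load_replacements(THESAURUS)
print(rewrite("book writer", table))
print(rewrite("crysanthemum", table))
```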
The noise word file, by default, contains prepositions, adjectives, adverbs, articles, personal pronouns, single letters, and single numbers. You'll want to add to the noise word file any other words that hold no discriminatory value in your environment. Examples of words that don't discriminate well between documents in your environment include your company name and the names of individuals who appear often in documents or Web pages.
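As a rough illustration of what a noise word list does, the sketch below drops noise words from a query before it is matched. The word list and the company name "Contoso" are hypothetical, and the real noise word file is a plain text file maintained on the server, not a Python set.

```python
# A tiny, hypothetical noise word list: a few articles and prepositions
# plus a company name that appears in nearly every document.
NOISE_WORDS = {"a", "an", "the", "of", "for", "and", "contoso"}


def strip_noise(query, noise_words):
    """Return the query terms that still discriminate between documents."""
    return [word for word in query.split()
            if word.lower() not in noise_words]


terms = strip_noise("the Contoso policy manual", NOISE_WORDS)
print(terms)  # ['policy', 'manual']
```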
Keywords are words or phrases that site administrators have identified as important. They provide a way to display additional information and recommended links on the initial results page that might not otherwise appear in the search results for a particular word or phrase. Keywords are a method to immediately elevate a content item to prominence in the result set simply by associating a keyword with the content item. The content item, in this context, is considered a Best Bet. Best Bets are items that you want to appear at the top of the result set regardless of what other content items appear in the result set. For example, you could make the URL to the human resource policy manual a Best Bet so that anytime a user queries "human resources", the link to the policy manual appears at the top of the result set.
Keywords are implemented at the site-collection level. You create a keyword, give it one or more synonyms, and associate at least one Best Bet with it as part of creating the keyword. Afterward, when you search on the keyword or one of its synonyms, the Best Bet appears in the Best Bet Web Part in the right-hand portion of the results page (by default).
Here is an example. Create a keyword by clicking the Search Keywords link in the Site Collection Administration menu. Then click the Add Keyword button. In the Keyword Phrase text box, type Green, and in the Synonym text box, type Color. Next, associate the keyword and the synonym with the Green folder in the Docs share as a Best Bet. (See Figure 16-23.)
Figure 16-23: Creating the Green keyword Best Bet
After doing this, when you search on the word "color," you see the Green folder appear in the Best Bet Web Part to the right of the core result set. (See Figure 16-24.)
Figure 16-24: Green folder appearing in the Best Bet Web Part on the results page
Remember, Best Bets are merely a link to the information that is especially relevant to the keyword or its synonym. Be sure to look through your reports to find terms that are being queried many times, with users going to the same location many times. This is an indication that these terms can be grouped into a keyword with synonyms and these destinations become the Best Bet.
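The relationship between a keyword, its synonyms, and its Best Bets can be modeled in a few lines. This is only a sketch of the lookup behavior described above, using the Green/Color example; the class, the function, and the folder address are hypothetical and do not reflect how SharePoint stores keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    phrase: str
    synonyms: list = field(default_factory=list)
    best_bets: list = field(default_factory=list)  # links shown as Best Bets


def best_bets_for(query, keywords):
    """Return the Best Bets whose keyword or synonym matches the query."""
    wanted = query.strip().lower()
    for keyword in keywords:
        terms = [keyword.phrase] + keyword.synonyms
        if wanted in (term.lower() for term in terms):
            return keyword.best_bets
    return []


# Hypothetical address for the Green folder in the Docs share.
keywords = [Keyword("Green", ["Color"], ["file://server/Docs/Green"])]

print(best_bets_for("color", keywords))
print(best_bets_for("blue", keywords))
```

A query on either "green" or "color" returns the same Best Bet, while a query that matches no keyword or synonym returns none.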
The result set can be modified and managed in a number of ways. It will be impossible to fully cover every aspect of each Web Part in the result set, so this section highlights some of the more important configurations.
First, it is possible that there will be times when a user queries to find a document and receives a separate listing in the result set for each document in the version history of a document lineage. If this happens, try crawling the document library using an account with read permissions to the document library that isn't the same as the application pool account.
The application pool account has pervasive access to content in a way that is not displayed in the user interface and is not configurable by the site administrator. Regardless of the type of versioning that is turned on, if you crawl that library using the application pool account, all versions in the history of that document will be crawled and indexed and may be displayed in the result set for your users.
Second, it is important for search administrators (and anyone modifying the results page) to grasp that a major portion of the configuration is pushed into the Web Part properties rather than being given links on an Administration menu page. This makes it a bit more difficult to remember where to go when trying to manage an individual element on the page. Just remember, you're really trying to manage the Web Part, not the page itself.
You modify Web Parts by clicking Edit Page under Site Actions. The page will immediately be placed into edit mode. Your actions from this point forward will not be seen by your users until you successfully publish the page.
The first thing that you'll notice you can do is add more tabs across the top of the page. If you click on the Add New Tab link (shown in Figure 16-25), you're taken to the Tabs In Search Results: New Item page. On this page, you can reference an existing page, enter a new tab name for that page, and enter a tooltip that will pop up when users hover their mouse over the tab (as shown in Figure 16-26). Remember, you're not creating a new Web page at this location. You're merely referencing a page that you've already created in the Pages library. You do this to map a new search results page with one or more search scopes.
Figure 16-25: Add New Tab link in the Edit Page screen for the Search Center Result Set
Figure 16-26: Tabs In Search Results: New Item page
When users receive a result set from Search, they have the ability to continue receiving notifications from search results based on their query. If they like the results of the query they've executed and expect to re-execute the query multiple times to stay informed about new or modified information that matches the query, users can choose to either create an alert based on their query or set up a Really Simple Syndication (RSS) feed to the query.
By using RSS, users can stay updated about new information in individual lists or libraries. The RSS feed will be automatically added to their Outlook 2007 client. (Earlier versions of Outlook do not support this feature.) In addition, to run the RSS client successfully, users will need to download the Microsoft desktop search engine. The desktop search engine will not be automatically installed: you need to install it manually.
If you need to remove the RSS link feature, first generate a result set and then, under the Site Actions menu, click Edit Page. Navigate to the Search Action Links Web Part, click Edit, and then click Modify Shared Web Part. Under the Search Results Action Links list, clear the Display "RSS" Link check box.
Best Practices
Clear the Display "RSS" Link check box until you have deployed the Microsoft Desktop Search Engine. There is no sense in giving your users an option in the interface if they can't use it.
By default, the following Web Parts are on the Search Center's results page:
Search Action Links
Search Best Bets
Search High Confidence Results
Search Core Results
This section covers the management of the more important Web Parts individually. All of these Web Parts are managed by clicking the Edit button in the Web Part (refer back to Figure 16-25) and then selecting Modify Shared Web Part in the drop-down menu list.
The Search Box Web Part lets you select whether or not scopes appear in the drop-down list when executing a query. The Dropdown Mode drop-down list in the Scope Dropdown section offers you several choices. You can completely turn off scopes or ensure that contextual scopes either appear or don't appear. (See Figure 16-27.)
Figure 16-27: Scope configuration options
The "s" parameter can be used when the scopes drop-down is hidden. Note that you can enter the scope in the URL, if needed, as follows:
But if the scopes drop-down is hidden, the "s" parameter lets you indicate whether you want to default to the "contextual" scope, for example, "this site" or re-use the scope specified by a search box on the originating page, which is passed in the 's' parameter.
This is the mode used within the search center, allowing the search scope to be carried across tabs. It allows you to construct a user interface where some tabs make use of the "s" parameter and others don't, but the parameter is preserved as the user navigates through the tabs.
For example, let's assume a user picks the scope "Northwest." After executing the query, the user is presented with the "northwest" results of the query in the search results page in the search center. On the search results page, the search box there has no scopes drop-down list displayed. So the user modifies the query slightly and re-executes the search. The search box re-uses the same scope for the second query, because it is specified in the 's' parameter.
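The way the scope travels with the query can be sketched as follows. Only the "s" parameter is named in the text; the "k" parameter for the query terms and the results page address are assumptions made for this illustration, as are the function names.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical address of a search center results page.
RESULTS_PAGE = "http://portal/SearchCenter/Pages/results.aspx"


def results_url(base, query, scope=None):
    """Build a results URL; 's' carries the scope when one is chosen."""
    params = {"k": query}  # 'k' as the query-term parameter is an assumption
    if scope:
        params["s"] = scope
    return base + "?" + urlencode(params)


def rerun_with_same_scope(previous_url, new_query):
    """Re-execute a modified query, reusing the scope from the 's' parameter."""
    previous = parse_qs(urlsplit(previous_url).query)
    scope = previous.get("s", [None])[0]
    base = previous_url.split("?", 1)[0]
    return results_url(base, new_query, scope)


first_url = results_url(RESULTS_PAGE, "sales forecast", "Northwest")
second_url = rerun_with_same_scope(first_url, "sales forecast 2007")
print(first_url)
print(second_url)
```

Because the second URL is built from the first, the "Northwest" scope is preserved even though the search box on the results page shows no scopes drop-down list.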
In addition, you can enter a label to the left of the scope drop-down list that explains what scopes are, how the drop-down list works, or what each scope is focused on.
The Search Core Results Web Part displays the result set. The rest of the Web Parts on the page can be considered helpful and supportive to the Search Core Results Web Part. In the configuration options of this Web Part, you can specify fixed keyword queries that are executed each time a query is run along with how to display the results and query results options.
Probably of most importance are the query results options. In this part of the configuration options (shown in Figure 16-28), you have the following choices to make:
Remove Duplicate Results Select this option if you want to ensure that different result items of the same content item are removed.
Enable Search Term Stemming By default, stemming in the result set is turned off, even though the indexer performs stemming on inbound words into the index. For example, if the crawler crawls the word "buy," the indexer will stem the word and include "buy," "buying," "buys," and "bought." But in the result set, by default, if the query term is "buy," the result set will display only content items that contain the word "buy." If you enable search term stemming, then, in this scenario, the result set will contain content items that include the stemmed words for "buy." You'll want to enable this if you want the result set to include content items that might have only stemmed forms of the query terms.
Permit Noise Word Queries This feature allows noise words in one language to be queried in another language when working in a cross-lingual environment. For example, "the" is a noise word in English, but its equivalent in another language might not be a noise word. Enabling Permit Noise Word Queries lets a user who searches on "the" obtain content items in other languages where "the" is not considered a noise word.
Enable URL Smashing In this feature, we take the query and "smash" the query terms together and then see if there is a URL that matches exactly the smashed query terms. For example, if someone searches on "campus maps" and there is an intranet Web site with the URL http://campusmaps, then this URL will become the first result in the result set. This is different from URL Matching in that the smashed query terms must match exactly the URL, whereas in URL matching, the query terms only need to match a portion of the URL.
Figure 16-28: Configuration options for the Search Core Results Web Part
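URL smashing, as described in the list above, can be illustrated with a short sketch. It assumes the smashed query is compared against the host name of each candidate URL; that comparison rule and the function names are assumptions for illustration, not the engine's documented algorithm.

```python
from urllib.parse import urlsplit


def smash(query):
    """Join the query terms into one lowercase string with no spaces."""
    return "".join(query.lower().split())


def url_smash_match(query, urls):
    """Return the first URL whose host name equals the smashed query."""
    target = smash(query)
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host.lower() == target:
            return url
    return None


known_urls = ["http://intranet/maps", "http://campusmaps"]

print(smash("campus maps"))                        # campusmaps
print(url_smash_match("campus maps", known_urls))  # http://campusmaps
```

Note that http://intranet/maps is not returned: the smashed terms must match exactly, which is what distinguishes URL smashing from URL matching.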
Search results frequently contain several items that are the same or very similar. If these duplicated or similar items are ranked highly and returned as the top items in the result set, other results that might be more relevant to the user appear much further down in the list. This can create a scenario where users have to page through several redundant results before finding what they are looking for.
Results collapsing can group similar results together so that they are displayed as one entry in the search result set. This entry includes a link to display the expanded results for that collapsed result set entry. Search administrators can collapse results for the following content item groups:
Duplicates and derivatives of documents
Windows SharePoint Services discussion messages for the same topic
Microsoft Exchange Server public folder messages for the same conversation topic
Current versions of the same document
Different language versions of the same document
Content from the same site
By default, results collapsing is turned on in SharePoint Server 2007 Search. Results collapsing has the following characteristics when turned on:
Duplicate and Derivative Results Collapsing When there are duplicates in the search results and these are collapsed, the result that is displayed in the main result set is the content item that is the most relevant to the user's search query. With duplicated documents, factors other than content will affect relevance, such as where the document is located or how many times it is linked to.
Site Results Collapsing When search results are collapsed by site, results from that site are collapsed and displayed in the main result set in one of two ways: content from the same site is grouped together, or content from the same folder within a particular site is grouped together, depending on the type of site (as described in the following sections).
SharePoint Sites All results from the same SharePoint site are collapsed. No more than two results from the same SharePoint site will be displayed in the main results set.
Sites Other than SharePoint Sites For Web sites that are not SharePoint sites, results are collapsed based on the folder. Results from the same site but from different folders within that site are not grouped together in the collapsed results. Only results from the same folder are collapsed together. No more than two results from the same folder for a particular site will be displayed in the main results set.
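The two site collapsing rules can be sketched as a grouping step over a ranked result list. This is a simplified model: it treats the host name as the "site" for SharePoint results, relies on a caller-supplied flag to say which results come from SharePoint sites, and uses made-up URLs. None of these details are taken from the product itself.

```python
import posixpath
from urllib.parse import urlsplit


def group_key(url, is_sharepoint):
    """SharePoint results group by site; other results group by folder."""
    parts = urlsplit(url)
    if is_sharepoint:
        return parts.netloc.lower()  # simplification: host stands in for site
    return (parts.netloc.lower(), posixpath.dirname(parts.path))


def collapse(results, max_per_group=2):
    """Split ranked (url, is_sharepoint) pairs into shown and collapsed."""
    shown, collapsed, counts = [], [], {}
    for url, is_sharepoint in results:
        key = group_key(url, is_sharepoint)
        counts[key] = counts.get(key, 0) + 1
        if counts[key] <= max_per_group:
            shown.append(url)
        else:
            collapsed.append(url)
    return shown, collapsed


ranked = [
    ("http://portal/sites/hr/doc1.docx", True),
    ("http://portal/sites/hr/doc2.docx", True),
    ("http://portal/sites/hr/doc3.docx", True),
    ("http://www.example.com/a/1.htm", False),
    ("http://www.example.com/a/2.htm", False),
    ("http://www.example.com/b/3.htm", False),
    ("http://www.example.com/a/4.htm", False),
]

shown, collapsed = collapse(ranked)
print(shown)
print(collapsed)
```

The third SharePoint result and the third result from folder /a are collapsed, while the result from folder /b is shown because it belongs to a different folder.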
Much of your organization's most important information is not found in documents or Web sites. Instead, it is found in people. As users in your organization learn to expose (or "surface") information about themselves, other users will be able to find them based on a number of metadata elements, such as department, title, skills, responsibilities, memberships, or a combination of those factors.
Use the People tab (shown in Figure 16-29) to execute queries for people. When you do this, the result set will list the public-facing My Sites for users whose names match the search query. In addition, the result set will default to sorting by social distance, so the My Colleagues Web Part will appear listing the users who fit the search query and who are also part of your colleague (or social) network.
Figure 16-29: People tab for executing queries for people
If you want to view these results by relevance rather than by social distance, click the View By Relevance link and the result set will be displayed in rank order. If you want to change the default ordering of people to display by rank rather than by social distance, change the Default Results View setting in the Modify Shared Web Part properties of the People Search Core Results Web Part.
Real World Best Practices for Search and Indexing
It is difficult to read a chapter and be expected to remember all the best practices for implementing and administrating a feature set like this. This sidebar summarizes some of what I've learned over the last three years regarding search and indexing.
I'll start by mentioning some planning elements that often crop up when I work with customers in the field. After customers learn about the breadth and depth of what search and indexing can do, they realize that they need to take a long step back and ask two important questions:
Where is our information?
Which information do we want to crawl and index?
These are not inconsequential questions. One of the main reasons for implementing a software package like SharePoint Server 2007 is to aggregate your content. If you don't know where your content is, how can you aggregate it? The reason, often, that we don't know where our content resides is because it resides in so many different locations. Think about it. In most organizations, content is held all over the place, such as in the following places:
Local hard drives
The exciting element about search and indexing in SharePoint Server 2007 is the existence of the Business Data Catalog and its ability to act as an abstraction layer between SharePoint and the different data interfaces so that we can finally aggregate content using SharePoint Server 2007.
But if you haven't taken the time to genuinely understand where your data resides, you'll have a hard time knowing what data should be aggregated using search and indexing.
However, once you know where your data resides and you've decided which data will be indexed, you can begin the process of building your content source structure in a way that makes sense. Most organizations have found that they can't "swerve" into success by creating content sources as the need arises. Most have found that it is very helpful to put some forethought and planning into the process of determining which content in their environment will be aggregated using search and indexing, as opposed to aggregating the content by either hosting it in SharePoint (which automatically gets indexed) or linking to the content (which might or might not index the content).
So, as you start your planning process, be sure to ask yourself some key questions:
Where does the information reside today that is mission critical to my organization's success?
Of that information set, what information should we index? (Presumably most, if not all, should be indexed.)
How many content sources will be needed to crawl this information?
Are there any crawl rules that will need to be established to help the content sources accurately crawl information?
What is the schedule on which this information should be crawled? How often the information changes, together with how urgently the updated information needs to appear in the index, dictates how often the content source should be crawled.
When should full index builds be run vs. incremental index builds?
What search scopes will be needed to help the users execute effective queries?
Are there any search results page modifications that we should configure?
Asking yourself questions like these will help you avoid mistakes when implementing a robust search and indexing topology. Some best practices to keep in mind include the following:
Crawl a content source only when the target server is not being backed up or when it is not running any other resource-intensive activity. Scheduling is key here.
Ensure that you've tested the crawls using a test server before implementing it in production. Nothing generates a help desk call more quickly than a user expecting to see a document in the result set and then not seeing it.
Ensure that you've trained your end users well on how to execute queries, what the search scopes mean in your environment, and how to use the result set to their advantage.
Don't crawl everything just because a user in your organization wants it crawled. Build a set of criteria that can act as a business-case analysis for when new content sources get created and when they don't.
Expand your horizons a bit and take a look at the range of information your users might need. For example, in a medical setting, don't be afraid to crawl and index medical journals or online research portals to help your staff stay up to date with their continuing education.
In larger organizations or in organizations with heavy search and indexing needs (say, 75 or 100 content sources or more), you might find that you need a full-time staff member just to manage this area. Consider your staffing needs as your content source topology grows and becomes more complex.