Using Properties to Enhance Documents

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 5. Overview of Indexing and Searching Content


Documents are much more than simple text files. Besides the text that end users search for, they carry properties such as "Creation Date" and "File Size." Microsoft Office products add a set of predefined properties such as "Author" or "Last Saved At." The index needs to understand properties because they let users formulate specific searches, such as "all documents written last week by Mr. Miles". The SharePoint Portal Server index does support properties, not only of type string but also of other common basic types such as dates and integers.

How are properties defined, and how are their values specified? Both are handled through Document Profiles, which let you associate existing properties with a document or define new ones. The actual values are entered whenever the Edit Profile dialog (see Figure 5.5) is invoked.

Figure 5.5. In the Edit Profile dialog, properties such as "Author" or "Company" can be set.


To drill down into Document Profiles, see "Introduction to the Workspace," p. 220.

TIP

In enhanced folders, the Edit Profile dialog is always invoked by the Check In command, which prompts authors to provide the additional properties. In standard folders, authors must invoke the dialog explicitly, a step that is easy to forget.


SharePoint Portal Server search does not rely only on the properties explicitly assigned through the Document Profile; all properties found in a document are placed into the index. The component that extracts them is the filter for the particular document format. For documents of an unknown format, only the properties assigned in the Document Profile, plus a few system properties such as the last modification date, are available.
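The way these three sources of properties combine can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filter function, the FILTERS table, and the rule that profile values override embedded ones are hypothetical, and a plain dictionary stands in for the document's contents. None of it is the SPS filter interface.

```python
import os


def word_filter(content):
    # Hypothetical stand-in for a format filter: returns the properties
    # embedded in the document ("content" is a dict standing in for the file).
    return {"Author": content.get("author"), "Last Saved At": content.get("saved")}


# Hypothetical registry of filters, keyed by file extension.
FILTERS = {".doc": word_filter}


def index_properties(filename, content, profile, system):
    """Collect the properties that would be indexed for one document."""
    props = dict(system)  # system properties are always available
    ext = os.path.splitext(filename)[1].lower()
    format_filter = FILTERS.get(ext)
    if format_filter is not None:
        # Known format: add every property the filter finds in the document.
        props.update(format_filter(content))
    # Assumption: explicitly assigned profile values win over embedded ones.
    props.update(profile)
    return props


content = {"author": "John Miles", "saved": "2002-03-12"}
profile = {"Company": "Contoso"}
system = {"Last Modified": "2002-03-12"}

known = index_properties("report.doc", content, profile, system)
unknown = index_properties("data.xyz", content, profile, system)
print(sorted(known))
print(sorted(unknown))
```

For the file with the unknown extension, only the profile and system properties survive, which mirrors the limitation described above.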

Let's take another look at the earlier example, searching for "all documents written last week by Mr. Miles". In the current release of SPS, a query like this can only be formulated through the Advanced Search Web Part (see Figure 5.6), where you specify the appropriate properties and conditions.

Figure 5.6. Notice the modified search condition in the Advanced Search Web Part.


The Advanced Search Web Part shows that the "Author" property should contain "Miles". An exact match is not used, because Mr. Miles may author documents with his first name included. The "written last week" condition is formulated as "Created in the last 7 days".
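The two conditions just described, "Author contains Miles" and "Created in the last 7 days", amount to a typed filter over document properties. The following Python sketch shows the idea on an in-memory list. The Document class, its field names, and the advanced_search function are hypothetical and are not part of SPS; the sample date is fixed only so the result is reproducible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Document:
    # Hypothetical record; the field names are illustrative only.
    title: str
    author: str
    created: datetime


def advanced_search(docs, author_contains, created_within_days, now):
    """Keep documents whose author contains the text and that were created recently."""
    cutoff = now - timedelta(days=created_within_days)
    needle = author_contains.lower()
    return [
        d for d in docs
        # "contains" rather than an exact match, so "John Miles" is found too
        if needle in d.author.lower() and d.created >= cutoff
    ]


now = datetime(2002, 3, 15)
docs = [
    Document("Cabling plan", "John Miles", datetime(2002, 3, 12)),
    Document("Budget", "John Miles", datetime(2002, 1, 5)),
    Document("Five miles of cabling", "A. Smith", datetime(2002, 3, 14)),
]
hits = advanced_search(docs, "Miles", 7, now)
print([d.title for d in hits])  # ['Cabling plan']
```

The third document mentions "miles" in its title but is not returned, because the condition is bound to the Author property and not to the full text.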

This little example also illustrates that properties may affect the relevance of a search result. If the user, being lazy, just enters "Miles" in the simple search dialog, any "Miles" in the index will be matched. The matches will include not only references made in other documents to Mr. Miles's work, but possibly also documents that refer, for example, to "five miles of cabling". Still, the simple search will return documents written by Mr. Miles early in the list, because SharePoint Portal Server treats a match in the Author property as more relevant than a match within the document text. This ability of SPS is referred to as rank coercion. The way rank coercion treats specific document properties (those likely to contain useful information) can be changed. For detailed information on how to modify rank coercion, refer to the SharePoint Portal Server SDK.
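The effect of rank coercion on the "Miles" example can be illustrated with a toy ranking function. This is a minimal sketch, not the SPS ranking algorithm: the AUTHOR_BOOST value, the scoring rule, and the sample documents are all assumptions chosen to show a property match outranking a body-only match.

```python
# Assumed boost: any Author-property match outranks a body-only match.
AUTHOR_BOOST = 10.0


def rank(docs, term):
    """Return titles of matching documents, Author-property matches first."""
    term = term.lower()
    scored = []
    for doc in docs:
        in_author = term in doc["author"].lower()
        in_body = term in doc["body"].lower()
        if not (in_author or in_body):
            continue  # no match anywhere, so the document is not a result
        score = (AUTHOR_BOOST if in_author else 0.0) + (1.0 if in_body else 0.0)
        scored.append((score, doc["title"]))
    # Stable sort: documents with equal scores keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return [title for _, title in scored]


docs = [
    {"title": "Network survey", "author": "A. Smith", "body": "We need five miles of cabling."},
    {"title": "Q1 report", "author": "John Miles", "body": "Quarterly figures."},
    {"title": "Review notes", "author": "B. Jones", "body": "Comments on the work of Mr. Miles."},
    {"title": "Minutes", "author": "C. Lee", "body": "Nothing relevant."},
]
ranked = rank(docs, "Miles")
print(ranked)  # ['Q1 report', 'Network survey', 'Review notes']
```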

Although a match in a string property often indicates a better result, this is not always true; some properties should not be included in text matching at all. One such property is the type of a document (defined as content-class). For example, when a user searches for the word "Calendar", items should not be returned merely because they are of type "Calendar". This feature is called property weighting or, more often, attribute weighting.

TIP

Property weighting only applies to text searches. Boolean or numeric conditions cannot be weighted; they are either true or false.
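Property weighting, together with the point made in the TIP, can be sketched as a table of per-property weights in which a weight of zero removes a property from text matching, while a numeric condition acts purely as a filter. The weights, property names, and sample items below are hypothetical and do not reflect actual SPS settings.

```python
# Hypothetical weights; 0.0 excludes a property from text matching.
PROPERTY_WEIGHTS = {"title": 3.0, "author": 5.0, "body": 1.0, "content_class": 0.0}


def text_score(item, term):
    """Sum the weights of all properties whose text contains the term."""
    term = term.lower()
    return sum(
        weight
        for prop, weight in PROPERTY_WEIGHTS.items()
        if term in str(item.get(prop, "")).lower()
    )


def search(items, term, max_size=None):
    results = []
    for item in items:
        # Numeric condition: either true or false, never weighted.
        if max_size is not None and item["size"] > max_size:
            continue
        score = text_score(item, term)
        if score > 0:
            results.append((score, item["title"]))
    results.sort(key=lambda pair: -pair[0])
    return results


items = [
    {"title": "Team meeting", "author": "A. Smith", "body": "Weekly sync",
     "content_class": "Calendar", "size": 2},
    {"title": "Holiday calendar 2002", "author": "B. Jones", "body": "Office closures",
     "content_class": "Document", "size": 40},
    {"title": "Planning guide", "author": "C. Lee", "body": "Add the dates to your calendar.",
     "content_class": "Document", "size": 900},
]
print(search(items, "Calendar"))
print(search(items, "Calendar", max_size=100))
```

The "Team meeting" item is of type "Calendar" but never appears in the results, because its only match is in the zero-weighted content-class property.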


Keywords

Keywords are a useful mechanism that lets authors classify their documents by adding one or more terms that characterize the content. For a Sales and Marketing department, the list of keywords could, for example, include all product names. Authors can then select the appropriate keywords for documents that refer to one or more products.

As with any list-type property, the Coordinator is responsible for defining the list of keywords. The Coordinator can also specify whether the list is fixed or extensible. In the first case, the author can only choose from the list of defined keywords; in the latter case, new keywords can be defined. These new keywords, however, will not show up for other documents, which can lead to similar keywords being used for the same purpose, for example "Sales Report" and "Sales-Report".

If you don't know a good set of keywords, allow authors to add their own. By monitoring the usage, you can populate the keyword dictionary with commonly used terms. Where almost identical keywords are in use, change them all to one term. Consider restricting usage to the list of defined keywords as soon as you feel comfortable with it. This will ensure consistent usage.
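Monitoring keyword usage and spotting near-identical spellings is easy to automate once the keywords have been exported from the workspace. The Python sketch below assumes such an export already exists as a list of keyword lists, one per document. The normalization rule (ignore case, hyphens, underscores, and extra spaces) is an assumption, and all function names are illustrative.

```python
import re
from collections import Counter, defaultdict


def normalize(keyword):
    """Collapse case, hyphens, underscores, and extra spaces (an assumed rule)."""
    return re.sub(r"[\s\-_]+", " ", keyword.strip()).lower()


def keyword_report(tagged_docs):
    """Count usage per normalized keyword and record each spelling seen."""
    usage = Counter()
    variants = defaultdict(Counter)
    for keywords in tagged_docs:
        for kw in keywords:
            key = normalize(kw)
            usage[key] += 1
            variants[key][kw] += 1
    return usage, variants


def merge_candidates(variants):
    """Normalized keywords that were entered with more than one spelling."""
    return {key: sorted(forms) for key, forms in variants.items() if len(forms) > 1}


tagged_docs = [
    ["Sales Report", "Product A"],
    ["Sales-Report"],
    ["sales report", "Product A"],
    ["Budget"],
]
usage, variants = keyword_report(tagged_docs)
print(usage.most_common(2))
print(merge_candidates(variants))
```

The most frequently used normalized terms are candidates for the fixed keyword list, and every entry reported by merge_candidates is a group of spellings to consolidate into one term.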

Obviously, a keyword match is of high relevance, as the author explicitly tagged the document with this information. Try to keep this list short and manageable so authors can select the correct keyword quickly.


                 
ISBN: 0789725703
Year: 2002
Pages: 286
