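After you create a category hierarchy, the next step is to categorize the content in the workspace. You can associate documents with categories either manually or automatically by using the Category Assistant.
There are two ways to assign individual documents to categories manually: by editing the properties of the document, or by using document profiles.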
In addition to these two methods, this section describes how to apply categories to shortcuts (links) to content stored outside the workspace.
You can manually categorize a document by editing the Search and Categories tab on the Properties page of the document. By using Windows Explorer, an author or coordinator can select one or more values from the checklist of workspace categories. If the document is stored in an enhanced folder, you must check out the document before you can change the document's category assignments. For a small number of documents, you can use this method of categorization exclusively.
You can also categorize a document by using document profiles. If the coordinator has configured the document profile to display categories, the author will be able to select categories when they check in the document. Adding the Categories property to document profiles provides a way to enforce category assignment when authors check in a document. It also distributes the task of document categorization among multiple authors. This method is particularly useful for bulk categorization scenarios, such as when a large set of documents is migrated into the workspace. Both methods are illustrated in the following figure.
Figure 17.2 Two ways to manually assign categories to a document
When stored in a SharePoint Portal Server folder, shortcuts provide the ability to annotate content stored outside the workspace with metadata. By using shortcuts, you can manually assign categories to information stored outside the workspace. SharePoint Portal Server includes a special document profile called the Web Link profile, which includes a property called Link, for this purpose.
When you add a .URL file to a workspace folder and apply the Web Link profile, SharePoint Portal Server uses the Link property to determine the target of a shortcut. SharePoint Portal Server automatically updates the Link property for a .URL file but not for a .LNK file. To fill in this property, right-click the shortcut and then select Edit Profile to open the profile form. When the form opens, SharePoint Portal Server populates the Link property. Click OK to close the form.
If you categorize multiple shortcuts at the same time (bulk edit), SharePoint Portal Server does not automatically update this property. The shortcuts do not display correctly until you open each shortcut individually.
When you add a shortcut to the workspace, the object created in the store is a file called a stub. When the Link property is set on a stub, two things happen:
In a folder associated with the Web Link document profile, add a shortcut to the content that you want to categorize and apply the Web Link document profile to the shortcut. Edit the document profile to apply the appropriate categories. Ensure that you fill in the Link property correctly before closing the profile form. To preserve the Link property of the shortcut when you drag and drop it in the workspace, ensure that the default document profile for the folder includes the Link property.
The Title property of the Web Link document profile overwrites the actual title of the document retrieved by the shortcut. This is true even if the Title property remains empty. To avoid this problem, create a document profile for the shortcuts that includes the Link and Categories properties but not the Title property.
If you crawl a large quantity of content outside the workspace, you can also apply categories automatically. The next section of this chapter describes this method of automatic categorization.
Efficiently categorizing documents presents a significant challenge to coordinators. A vast amount of information can be aggregated for the dashboard site, and this information typically lacks inherent structure, which makes it hard to organize sensibly. To solve this problem, SharePoint Portal Server includes technology that automatically categorizes crawled documents as well as documents published in the workspace. If you plan to use categories for a large number of files, the Category Assistant can efficiently assign categories from your category structure to existing documents and add them automatically to new documents. This reduces the time required to implement categories for your users.
The Category Assistant is based on an adaptive algorithm that can learn the "definition" of a topic if given sufficient training examples. Before using it, you must manually apply categories to a representative selection of documents for the Category Assistant to use as training examples. The Category Assistant compares documents assigned to one category with documents from other categories to identify the most characteristic features (words). Ultimately, the definition of a category is the list of words that best distinguish documents in one category from documents in other categories.
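The idea of a category definition as a list of distinguishing words can be illustrated with a small sketch. This is not the Category Assistant's actual algorithm, which is not documented here. The function name learn_definitions, the top_n parameter, and the scoring rule (share of a category's training documents that contain a word, minus the share of other categories' documents that contain it) are all assumptions chosen for illustration.

```python
from collections import Counter


def learn_definitions(training, top_n=3):
    """Illustrative only: derive a word list per category from training texts.

    training maps a category name to a list of document strings.
    Returns a dict mapping each category to its top_n distinguishing words.
    """
    # For each category, count how many training documents contain each word.
    doc_freq = {}
    for category, docs in training.items():
        counts = Counter()
        for text in docs:
            counts.update(set(text.lower().split()))
        doc_freq[category] = counts

    definitions = {}
    for category, counts in doc_freq.items():
        n_in = len(training[category])
        n_out = sum(len(d) for c, d in training.items() if c != category)
        scores = {}
        for word, hits in counts.items():
            hits_out = sum(doc_freq[c][word] for c in doc_freq if c != category)
            # Hypothetical score: how much more often the word appears in
            # this category's documents than in everyone else's.
            outside = hits_out / n_out if n_out else 0.0
            scores[word] = hits / n_in - outside
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        definitions[category] = ranked[:top_n]
    return definitions
```

In this sketch, a word such as "report" that appears equally often in two categories scores zero and drops out of both definitions, which mirrors the point that a definition consists of the words that separate one category from the others.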
When SharePoint Portal Server updates the index, the Category Assistant compares the category definition to the list of words contained in each new document encountered. More distinguishing words, such as those in the document's title, are given greater weight in the category definitions. The comparison of category definition to document yields a number that represents the confidence with which the Category Assistant would place the document in the given category. SharePoint Portal Server tags the document with the category only if this confidence number is above the precision level set by the coordinator. SharePoint Portal Server can and often does automatically categorize a single document into multiple categories.
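The thresholding step can be sketched as follows. This is a simplified model under stated assumptions, not the product's scoring formula: the functions confidence and assign_categories, the title_weight parameter, and the 0-to-1 confidence scale are hypothetical. The sketch only shows that a document is tagged with every category whose confidence is above the precision level, so one document can land in several categories.

```python
def confidence(definition, title, body, title_weight=2.0):
    """Illustrative only: score a document against one category definition.

    definition is a list of distinguishing words. Words found in the title
    count title_weight; words found only in the body count 1. The result is
    scaled to the range 0..1.
    """
    if not definition:
        return 0.0
    title_words = set(title.lower().split())
    body_words = set(body.lower().split())
    score = 0.0
    for word in definition:
        if word in title_words:
            score += title_weight
        elif word in body_words:
            score += 1.0
    return score / (title_weight * len(definition))


def assign_categories(definitions, title, body, precision=0.5):
    """Return every category whose confidence is above the precision level."""
    return sorted(
        category
        for category, definition in definitions.items()
        if confidence(definition, title, body) > precision
    )
```

Raising the precision level in this model produces fewer but more certain assignments; lowering it places documents in more categories at the cost of more mistakes.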
SharePoint Portal Server associates documents with categories when it updates the index. For this reason, there may be a delay before you see a document appear in the assigned category. The length of the delay depends on the index method you use and the amount of content that is included in the index.
The Category Assistant categorizes documents by stamping them with metadata. Specifically, there is a hidden property on the base document profile called Autocategories (urn:attributes:autocategories). The Category Assistant populates this property with the categories that best describe the document. This property is different from the Categories property, which users update manually. There are two reasons for this difference:
When you enable the Category Assistant, SharePoint Portal Server queries both properties to create category views in Web folders and the dashboard site. When you disable the Category Assistant, SharePoint Portal Server eliminates the query for Autocategories, leaving only the query for the Categories property. If the Category Assistant is not functioning as the coordinator expects, this makes it easy to turn it off and eliminate all automatically categorized documents from category views.
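The effect of the two-property design on category views can be modeled in a few lines. The sketch below is a simplified in-memory model, not the query that SharePoint Portal Server actually issues; the function category_view and the dictionary keys name, categories, and autocategories are hypothetical stand-ins for the document and its Categories and Autocategories properties.

```python
def category_view(documents, category, assistant_enabled):
    """Illustrative only: list the documents shown under one category.

    Each document is a dict with a 'name', a 'categories' list (assigned
    manually), and an 'autocategories' list (assigned automatically).
    """
    names = []
    for doc in documents:
        # Manual assignments always count.
        matched = category in doc.get("categories", ())
        # Automatic assignments count only while the assistant is enabled.
        if assistant_enabled and category in doc.get("autocategories", ()):
            matched = True
        if matched:
            names.append(doc["name"])
    return names
```

Because the automatic assignments live in their own property, turning the assistant off in this model hides them from the view without touching any manual assignment.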
You can access the Category Assistant from the Properties page of the top-level category folder. SharePoint Portal Server enables the feature by default, but the Category Assistant does not perform any categorization until you train it.
Consider the following points before training the Category Assistant:
Figure 17.3 Category Assistant property page
Training the Category Assistant is the most important step in categorizing documents automatically. The Category Assistant needs training examples for each category. Without good training examples, the accuracy of the Category Assistant is limited. It is recommended that you use a minimum of 10 documents per category to train the Category Assistant successfully.
Ideal training documents are
Good training examples for each category improve the accuracy of the Category Assistant. The more training examples you provide, the more precise the Category Assistant can be.
You can assign the task of training the Category Assistant to one person or several. Two training models are:
Note that SharePoint Portal Server treats any document that you manually categorize as a training example. Therefore, if contributors check in their documents and categorize them on a day-to-day basis, they are implicitly training the Category Assistant. The benefit of this design is that far more documents are treated as training examples, and the coordinator need not worry about managing a special set of training documents.
At times, you may want to override automatically chosen categories on individual documents. To support this, a property (urn:content-classes:item::issuggestedcategoryused) indicates whether the automatically selected categories are included in category views. If it is set to TRUE, the document appears under its suggested categories in category listings in the Web folder view and on the dashboard site. You set the property by selecting the Display document in suggested categories check box on the Search and Categories tab on the Properties page of a document.
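The per-document override can be pictured with a short sketch. It is a hypothetical model of the behavior described above, not product code: the function shown_categories and its parameters are invented for illustration, with use_suggested standing in for the value of the property set by the Display document in suggested categories check box.

```python
def shown_categories(manual, suggested, use_suggested):
    """Illustrative only: categories under which one document is listed.

    manual        -- categories assigned by a user
    suggested     -- categories chosen automatically
    use_suggested -- stand-in for the check box value on the document
    """
    shown = set(manual)
    if use_suggested:
        # Suggested categories are added only when the flag is TRUE.
        shown |= set(suggested)
    return sorted(shown)
```

In this model, setting the flag to false for one document removes it from its suggested categories while leaving its manual categories, and every other document, unaffected.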
If the Category Assistant does not select the appropriate categories for a document, a coordinator can override the Category Assistant by using the following methods:
It is difficult to return to an automatic categorization system after you override the Category Assistant for more than a few documents. There is no automated way to do this. If you override the Category Assistant, and then want to undo that action, you must manually update the Search and Categories tab on the Properties page of the document. Your changes will take effect at the next index update.