Categories: A Different View on Information

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 5.  Overview of Indexing and Searching Content


Users typically search for documents when they know what they are looking for but do not know where to find it. In other cases, users prefer to browse for information, or the designer of the portal wants to direct users to the information they seek through an organized hierarchy of topics. This calls for a mechanism that structures the information in a different, intuitive manner. An important aspect here is that these users are typically not the authors of the information; otherwise, they would know where it could be found in the first place. Rather, these users are typically readers.

This is where classification of information into categories can become very important, since categories allow consumers of content to browse for information rather than search. Categories are typically represented as a tree with main categories and subcategories. Setting up this tree requires knowledge of the type of content that is to be categorized, as well as knowledge of the target audience, the readers of the information. This kind of knowledge is not typical for an administrator with a strong IT background, but rather for a librarian who understands both the content and the principles of categorization. This is a profession that existed well before any computer; any organization found in a library is built around the same principles of categorization.

SharePoint Portal Server provides support for such a category tree (see Figure 5.7), which can be browsed by the end user through the categories dashboard or through dedicated Web Folders.

Figure 5.7. The Categories dashboard shows the category tree of an IT Research and Development portal.

graphics/05fig07.jpg

Defining a Category Tree

The setup of the category tree is done by the Coordinator of the workspace, not the administrator. To create categories, the Coordinator needs to do the following:

  1. Open the Categories Web Folder of the workspace.

  2. Right-click, point to New, and select Category.

  3. Wait a moment while the folder is created; it will be named "New Category."

  4. Rename that folder to reflect the desired category.

To create subcategories, navigate first to the correct parent folder before the new category is created.

CAUTION

Out of the box, SharePoint Portal Server installs a sample category tree with terms such as "Category 1." As this most certainly does not meet your categorization scheme, you may wish to delete these categories before you add any content, to avoid confusion.


The preceding steps will create a new category, but you should consider specifying a little more information to enhance the user experience. This can be done using the properties page of the category folder, as seen in Figure 5.8.

Figure 5.8. Fill in all details in the Categories Properties dialog to see a result as in Figure 5.9.

graphics/05fig08.jpg

Figure 5.9. The category dashboard with the details as specified in Figure 5.8.

graphics/05fig09.jpg

TIP

If you specify an image, make sure that it does not exceed 32 x 32 pixels. Otherwise it will overlap with the description on the Categories dashboard.


The description field is shown as a tool tip in the portal whenever you hover over the category. More importantly, if a category matches a specific query, the description is included in the results pane, so users who are not familiar with the category can see at a glance whether it is relevant. Associating an image with a category causes that image to be shown to users browsing the category tree. Specifying keywords for a category makes it possible for users to find a differently named category when a keyword matches. Finally, entering a contact with an email address allows the reader to comment, for example, on the usage or value of a particular category.

TIP

Start with a simple category tree. Keep the end user in mind when you design your categories; deeply nested categories, for example, are generally not appreciated. If you cannot resist implementing a complex tree, keep in mind that there is a limit of about 550 categories that can be defined for a given workspace.
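The advice in this tip can be checked mechanically before a tree is created. The following is a minimal sketch in Python, assuming the proposed tree has been written down as a nested dictionary outside the product; it does not call any SharePoint Portal Server API. The names check_tree, CATEGORY_LIMIT, and MAX_DEPTH are hypothetical, the depth threshold is an assumed house rule rather than a product limit, and "Document Management" and "Projects" are made-up category names.

```python
# Hypothetical in-memory plan of a category tree: each key is a category
# name and each value is a dict of its subcategories.
CATEGORY_LIMIT = 550  # approximate per-workspace limit cited in the tip
MAX_DEPTH = 3         # assumed house rule, not a product limit


def count_categories(tree):
    """Return the total number of categories in the tree."""
    return sum(1 + count_categories(children) for children in tree.values())


def max_depth(tree):
    """Return the number of levels in the deepest branch (0 if empty)."""
    if not tree:
        return 0
    return 1 + max(max_depth(children) for children in tree.values())


def check_tree(tree):
    """Return a list of human-readable warnings for a proposed tree."""
    warnings = []
    total = count_categories(tree)
    if total > CATEGORY_LIMIT:
        warnings.append(
            f"{total} categories exceeds the limit of about {CATEGORY_LIMIT}"
        )
    depth = max_depth(tree)
    if depth > MAX_DEPTH:
        warnings.append(f"tree is {depth} levels deep; consider flattening it")
    return warnings


proposed = {
    "Technologies": {"Index and Search": {}, "Document Management": {}},
    "Projects": {},
}
print(count_categories(proposed), max_depth(proposed), check_tree(proposed))
```

An empty warning list means the plan stays within both thresholds; a librarian or Coordinator would still need to judge whether the names make sense to readers.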


Categories are stored in SharePoint Portal Server in the order in which they are created. This may be confusing to the end user when he or she needs to select a category from this unsorted tree. Fortunately, the SharePoint Portal Server Resource Kit (http://www.microsoft.com/sharepoint/techinfo/reskit/category_sort.asp) includes a tool to sort categories alphabetically. One way to automate this is to schedule a task that re-sorts all categories daily or at another regular interval.
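To illustrate what alphabetical sorting of a creation-ordered tree means, here is a small Python sketch. It is not the Resource Kit tool and does not touch a workspace; it only assumes the tree is held as a nested dictionary whose insertion order stands in for creation order. The function name sort_categories and the category names "Web Parts" and "Administration" are hypothetical.

```python
def sort_categories(tree):
    """Return a copy of the tree with every level sorted case-insensitively."""
    return {
        name: sort_categories(children)
        for name, children in sorted(
            tree.items(), key=lambda item: item[0].lower()
        )
    }


# Insertion order of the dictionaries models the order of creation.
created_order = {
    "Technologies": {"Web Parts": {}, "Index and Search": {}},
    "Administration": {},
}

sorted_tree = sort_categories(created_order)
print(list(sorted_tree))
print(list(sorted_tree["Technologies"]))
```

The sort is applied at every level, so subcategories are reordered under their parent without moving between parents.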

Assigning Categories

Once a category tree is defined, you can assign an individual document to zero, one, or more categories. This is done by the Author through the Web Folder's Properties dialog of the document. To assign a category to a document, do the following:

  1. Ensure that the document is checked out. You can see if a document is checked out by looking for the little icon representing a pen. If the document is not checked out, right-click on the document and select the Check Out option.

  2. Right-click on the document and select the Properties option.

  3. Select the Search and Categories tab (see Figure 5.11).

    Figure 5.11. This figure shows the components that are involved with an SPS query. It complements the architecture displayed earlier in Figure 5.4.

    graphics/05fig11.gif

  4. Select the appropriate categories from the drop-down box.

Figure 5.10. When you click on the categories drop-down box, the selected and available categories will be shown.

graphics/05fig10.jpg

In the previous screenshot, you can also see that SharePoint Portal Server can suggest categories. This comes into play once you have enabled auto categorization, which is discussed next.

Categories are implemented as multi-valued properties, using the colon as a special separator such that a tree view can be built. In Figure 5.11, for example, you see ":Technologies:Index and Search", where "Index and Search" is a subcategory of "Technologies" as shown in Figure 5.8.
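The colon convention can be illustrated with a short Python sketch that turns a document's multi-valued category property into a tree. This is only an illustration of the convention described above, not the product's internal code; the function names split_category and build_tree and the value ":Projects" are hypothetical.

```python
def split_category(value):
    """Split ':Technologies:Index and Search' into its path segments."""
    return [part for part in value.split(":") if part]


def build_tree(values):
    """Merge colon-separated category values into one nested dictionary."""
    tree = {}
    for value in values:
        node = tree
        for part in split_category(value):
            # Create the level if it does not exist yet, then descend.
            node = node.setdefault(part, {})
    return tree


# A multi-valued property: one document assigned to several categories.
document_categories = [
    ":Technologies:Index and Search",
    ":Technologies",
    ":Projects",
]

print(split_category(document_categories[0]))
print(build_tree(document_categories))
```

Because each value carries its full path from the root, the tree can be rebuilt from the property values alone, and two subcategories with the same name under different parents remain distinct.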

TIP

If you are expecting frequent use of categories, place the category property on the Document Profile. This allows the author to define the categories during check in. This is also the only means by which you can specify categories through the browser.


Auto Categorization

For a useful implementation of categories, you need to be aware of two important aspects. First, as outlined above, the set of categories and subcategories must be defined. Once a categorization tree is defined, the second challenge arises: the actual categorization of content. Many years ago, this was done manually, by asking someone to read the information before applying the correct categories. This process is obviously easy to automate by simply defining a dedicated category property that gets filled in whenever necessary. But regardless of manual or automated processes, it is a tedious, time-consuming task to categorize a lot of information.

This is where auto categorization comes into play, another feature of SharePoint Portal Server that originated in Microsoft research. This feature allows you to define a sample set of categorized documents that is used to train the auto categorization engine. After training, SharePoint Portal Server automatically assigns categories to any documents that get indexed. For documents that are stored within the workspace, the Coordinator can choose to either propose categories or assign them automatically to documents that get published. Probably even more powerful is the ability to automatically assign categories to external documents. Proposing categories is not an option for external documents, because their publishing process is outside the control of SharePoint Portal Server.

The first step for auto categorization is the definition of a set of training documents, for which categories are assigned manually. Automatically assigned categories match best if as many examples as possible are provided, taking into account that documents can belong to multiple categories. Training ultimately results in a list of words that best distinguish documents in one category from documents in other categories. Consequently, the variety and sheer number of words within a document are important. Therefore, you should strive to use longer, primarily textual documents; spreadsheets, for example, typically don't contain much text and therefore serve as poor training documents.
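The idea of "words that best distinguish one category from the others" can be made concrete with a toy Python sketch. It is not the algorithm used by the SharePoint Portal Server engine, which the text does not describe; it simply ranks words by a smoothed log-odds of appearing in documents inside a category versus outside it. The function names, the sample texts, and the category ":Technologies:Document Management" are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase words that occur in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def distinguishing_words(training, category, top_n=3):
    """Rank words by how much more often they occur in documents of
    `category` than in documents outside it (smoothed log-odds)."""
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for text, categories in training:
        words = tokenize(text)
        if category in categories:
            inside.update(words)
            n_in += 1
        else:
            outside.update(words)
            n_out += 1
    scores = {}
    for word in set(inside) | set(outside):
        # Add-one smoothing keeps unseen words from producing log(0).
        p_in = (inside[word] + 1) / (n_in + 2)
        p_out = (outside[word] + 1) / (n_out + 2)
        scores[word] = math.log(p_in / p_out)
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


# Each training document is (text, set of manually assigned categories).
training = [
    ("The crawler builds the index so that search queries run fast",
     {":Technologies:Index and Search"}),
    ("Search ranking depends on the index and on query terms",
     {":Technologies:Index and Search"}),
    ("Check in the document and publish the approved version",
     {":Technologies:Document Management"}),
    ("Approval routing controls who may publish the document",
     {":Technologies:Document Management"}),
]

print(distinguishing_words(training, ":Technologies:Index and Search", top_n=2))
```

Even in this toy version, common words such as "the" score zero because they appear on both sides, while a document with very little text contributes few words to either count, which is why short or mostly numeric documents make poor training material.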

NOTE

If you want to include external content in the training exercise, you can use Web Links, so that the document itself does not need to get imported to SharePoint Portal Server.


In practice, you will probably need to train iteratively, in particular for external content.

For more information on working with auto categorization, see "Managing Categories," p. 319.

Categories Versus Keywords

Categories and keywords have some similar characteristics. Both are used for the classification of documents. Documents can be simultaneously assigned to several different categories and keywords, too.

One difference, though, is how both are presented to the end user. A user can browse through the category tree, both through the Web Folder interface and through the portal's dedicated categories dashboard. Categories thus allow presenting a folder structure for readers without changing the existing folder structure. Keywords are not shown that explicitly on the user interface. Moreover, they are not organized in a tree structure, even though one easily could define such a structure using conventions similar to the categories.

Categories are solely defined by the Coordinator. To create a new category, the Coordinator must be consulted. The list of keywords can be kept unrestricted, such that any author can add new keywords.

In larger deployment scenarios, categories will not be available on the enterprise dashboard if the content is propagated from dedicated indexing servers. Unlike keywords, categories will not be included in the propagated index. They are thus not available in an environment that almost exclusively consists of readers, an environment that is ideal for categories.


                 


Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
Year: 2002
Pages: 286
