Managing Critical Workspace Components

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 9.  Managing the Workspace



While we have already looked at many discrete workspace components with an eye toward security and holistic management, some SharePoint Portal Server components and resources are critical enough to warrant closer attention. Managing the following is detailed below:

  • Managing and tuning indexes

  • Index workspaces

  • User access

  • Taxonomies

  • Categories and the Category Assistant

Managing and Tuning Indexes

Microsoft SharePoint Portal Server can create indexes for published content stored on Web sites and pages, file systems, Lotus Notes databases, Microsoft Exchange Server 5.5/2000 servers, and other SharePoint Portal Servers. Managing indexes represents a special challenge to the Workspace Coordinator. Every workspace includes indexes that allow users to search for documents available from that workspace. These documents can be located in a different workspace on the same server, on another server on your intranet, or on the Internet.

Before we proceed, note again that SharePoint Portal Server automatically creates an index for a workspace during the installation process. When documents are added to the workspace or existing documents are modified, the portal updates the index to include the changes. When a content source is added or its settings are changed, however, the content source must be crawled to update the index. An index may also be updated manually by using SharePoint Portal Server Administration or Web folders. Finally, you can schedule SharePoint Portal Server to update indexes automatically.

Creating and updating an index can incur heavy processor and disk utilization. The indexing process can also take considerable time, depending on the amount of text in the content being crawled. Both factors are driven by the steps SharePoint Portal Server uses to break down a document and add its contents to an index:

  • Filtering the document. A filter removes formatting and extracts the text of the document, along with any properties defined in the file itself. SharePoint Portal Server filters at most 16 megabytes (MB) of text data from a single document (graphics data does not count toward this limit). If a document exceeds the limit, the remaining text is not filtered; SharePoint Portal Server enters a warning in the gatherer log, but the document is still considered successfully indexed. Filters are available for text files, Microsoft Office documents, HTML files, and Tagged Image File Format (TIFF) files.

  • Word-breaking the document. A word-breaker is a component that determines where the word boundaries fall in the stream of characters in a query or in a document being crawled. When SharePoint Portal Server crawls documents in multiple languages, a customized word-breaker for each language makes the resulting terms more accurate for that language. If no word-breaker is available for a particular language, the neutral word-breaker is used; it breaks words at neutral characters such as spaces and punctuation marks.
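The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SharePoint Portal Server's own filter or word-breaker code: `filter_text` and `neutral_word_break` are hypothetical names, the 16 MB limit is read as 16 * 1024 * 1024 bytes of UTF-8 text, and formatting removal and property extraction are not modeled.

```python
import re

# Assumption: "16 MB of text data" read as 16 * 1024 * 1024 bytes of UTF-8 text.
TEXT_LIMIT_BYTES = 16 * 1024 * 1024

def filter_text(text: str, limit: int = TEXT_LIMIT_BYTES) -> tuple[str, bool]:
    """Hypothetical filter: return (text kept for indexing, limit_exceeded).

    Real filters also strip formatting and extract file properties;
    only the text-size limit is modeled here.
    """
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text, False
    # Keep the first `limit` bytes; drop a multibyte character cut in half.
    # A real gatherer would also log a warning at this point.
    return data[:limit].decode("utf-8", errors="ignore"), True

def neutral_word_break(text: str) -> list[str]:
    """Hypothetical neutral word-breaker: split at spaces and punctuation."""
    return [word for word in re.split(r"[\W_]+", text) if word]
```

For example, `neutral_word_break("Portal Server, version 2001: index-ready.")` yields `['Portal', 'Server', 'version', '2001', 'index', 'ready']`. A language-specific word-breaker would replace the single regular expression with rules for that language.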

For more details on word breaking, see "Word Breaking," p. 114.

To tune indexes, the Workspace Coordinator should specify the Query Time-Out value, which sets how long a query against an index may run before it is stopped. This minimizes the risks posed by long-running, complex, or poorly constructed queries, freeing valuable computing resources for other users and processes. Of course, if this value is set too low, queries will time out before they complete.

TIP

For the fastest average response times, a good rule of thumb is to set the query time-out initially to 1,000 milliseconds (one second) and use that as the starting point for a baseline. By default, this time-out is 10,000 milliseconds (10 seconds).


To specify the Query Time-Out, perform the following:

  1. In the console tree, select the server that hosts the workspace for which you want to specify the query time-out.

  2. Expand the server, and select the workspace.

  3. From the Action menu, click Properties (or right-click the workspace name and click Properties).

  4. Select the Index tab.

  5. Type the number of milliseconds a query may run before timing out.

  6. Click Apply.

If the Query Time-Out value proves too low, repeat the procedure above to increase it until the workspace's typical "large" query completes successfully. One nice feature of SharePoint Portal Server search is that partial results may be presented; in this case, a message box is displayed indicating that not all results are being shown.
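The time-out and partial-results behavior can be illustrated with a small Python sketch. It is a hypothetical model, not the product's query engine: `run_query` and `QueryResult` are illustrative names, a linear scan over `(doc_id, text)` pairs stands in for an index lookup, and only the 1,000 ms and 10,000 ms values come from the text.

```python
import time
from dataclasses import dataclass

DEFAULT_TIMEOUT_MS = 10_000   # default query time-out stated in the text
STARTING_TIMEOUT_MS = 1_000   # suggested starting point for a baseline

@dataclass
class QueryResult:
    hits: list[str]
    complete: bool  # False means "not all results are being displayed"

def run_query(term, documents, timeout_ms=DEFAULT_TIMEOUT_MS, clock=time.monotonic):
    """Hypothetical query: scan (doc_id, text) pairs for `term` until the time-out.

    A linear scan stands in for a real index lookup.
    """
    deadline = clock() + timeout_ms / 1000.0
    hits = []
    for doc_id, text in documents:
        if clock() >= deadline:
            # Time-out reached: return what was found so far as partial results.
            return QueryResult(hits, complete=False)
        if term.lower() in text.lower():
            hits.append(doc_id)
    return QueryResult(hits, complete=True)
```

A caller would show a notice whenever `result.complete` is False, mirroring the message box described above. Tuning then amounts to starting at `STARTING_TIMEOUT_MS` and raising `timeout_ms` until the workspace's typical large query returns with `complete` set to True.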

The Index Workspace

For workspaces with multiple indexes or heavy index activity, you may create a dedicated index workspace: a workspace devoted to managing content sources by building and updating indexes. This workspace usually resides on its own server as well, one dedicated to indexing. Special considerations include:

  • Unlike other workspaces, index workspaces have no dashboard site.

  • Propagation distributes index resources. Indexes can be propagated only from index workspaces, and only to a single destination workspace on another server, typically one dedicated to searching.

  • A destination workspace can accept indexes from no more than four index workspaces.

As mentioned previously, creating a dedicated index workspace is simply a matter of running the New Workspace Wizard, clicking the Advanced button, and selecting the Configure as an index workspace option, as shown in Figure 9.9.

Figure 9.9. Creating a dedicated index workspace.


As an aid to workspace management, Microsoft recommends that the Workspace Coordinator maintain a list of the index workspace names, the server(s) on which they are stored, and the server(s) and workspace(s) to which they are propagated.
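The recommended list can double as a check against the propagation rules given earlier. The Python sketch below is a hypothetical record format and validator, not a SharePoint Portal Server tool: `IndexWorkspace` and `validate_propagation` are illustrative names, and server and workspace names in any usage are made up. The single-destination rule is enforced by the record shape itself (one `destination` field), and the code checks the other-server rule and the four-source limit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

MAX_SOURCES_PER_DESTINATION = 4  # limit stated in the text

@dataclass(frozen=True)
class IndexWorkspace:
    name: str
    server: str
    # (server, workspace) this index propagates to; None if not propagated.
    # One field means at most one destination per index workspace.
    destination: Optional[tuple[str, str]] = None

def validate_propagation(inventory: list[IndexWorkspace]) -> list[str]:
    """Return a list of rule violations found in a hypothetical inventory."""
    problems = []
    sources = defaultdict(list)  # destination -> index workspaces feeding it
    for ws in inventory:
        if ws.destination is None:
            continue
        dest_server, _ = ws.destination
        if dest_server == ws.server:
            problems.append(f"{ws.name}: destination must be on another server")
        sources[ws.destination].append(ws.name)
    for (server, workspace), names in sources.items():
        if len(names) > MAX_SOURCES_PER_DESTINATION:
            problems.append(
                f"{server}/{workspace}: {len(names)} index workspaces exceed "
                f"the limit of {MAX_SOURCES_PER_DESTINATION}"
            )
    return problems
```

An empty list means the inventory respects both rules; keeping the inventory itself in a simple structured form also makes it easy to report which servers hold which index workspaces.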

Using Server Administration to Manage Indexes

Much of the work of managing SharePoint Portal Server indexes is done with Server Administration or Web folders, and includes tasks such as:

  • Pausing, resuming, or stopping an index update

  • Starting a full, incremental, or adaptive update of an index

  • Starting or stopping index propagation

  • Resetting an index

  • Viewing index properties, such as the period of time to wait when querying an index
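One way to reason about the operations in this list is as a small state machine. The Python sketch below is a hypothetical model only; it does not reflect how SharePoint Portal Server implements index updates. `IndexController`, `UpdateState`, and `entries` are illustrative names, and the rules about which operation is allowed in which state (for example, resetting only when no update is running) are assumptions.

```python
from enum import Enum

class UpdateState(Enum):
    IDLE = "idle"
    UPDATING = "updating"
    PAUSED = "paused"

UPDATE_KINDS = {"full", "incremental", "adaptive"}  # update types named in the text

class IndexController:
    """Hypothetical model of the administrative index operations."""

    def __init__(self):
        self.state = UpdateState.IDLE
        self.kind = None            # kind of update in progress, if any
        self.propagating = False
        self.entries = set()        # hypothetical stand-in for index contents

    def _require(self, *allowed):
        if self.state not in allowed:
            raise RuntimeError(f"operation not allowed while {self.state.value}")

    def start_update(self, kind):
        if kind not in UPDATE_KINDS:
            raise ValueError(f"unknown update kind: {kind}")
        self._require(UpdateState.IDLE)
        self.state, self.kind = UpdateState.UPDATING, kind

    def pause(self):
        self._require(UpdateState.UPDATING)
        self.state = UpdateState.PAUSED

    def resume(self):
        self._require(UpdateState.PAUSED)
        self.state = UpdateState.UPDATING

    def stop(self):
        self._require(UpdateState.UPDATING, UpdateState.PAUSED)
        self.state, self.kind = UpdateState.IDLE, None

    def start_propagation(self):
        self.propagating = True

    def stop_propagation(self):
        self.propagating = False

    def reset(self):
        # Assumption: a reset empties the index and is allowed only when idle.
        self._require(UpdateState.IDLE)
        self.entries.clear()
```

Modeling the operations this way makes invalid sequences, such as resuming an update that was never paused, fail loudly instead of silently.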

To read more about creating and managing indexes and index workspaces, see "Infrastructure and Security Considerations," p. 512, and "Setting Up a Dedicated Indexing Server," p. 513.

Managing User Access to a Workspace

Before a user can do more than read selected content in a workspace, the user must be assigned a role, which grants permission to perform specific tasks in the workspace. Like many other management tasks, role assignment is done through Server Administration; it applies to the top level of the workspace (called the workspace node) and to any folders that inherit security from the node. To provide access to only a subset of folders in a workspace, however, you must use Web folders.

When a user is added to a workspace, SharePoint Portal Server automatically assigns the role of Reader to that user. In this way, the new user is instantly granted read permissions on all published documents in the workspace. To change or augment these permissions, a new role must be assigned to the user.
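The role rules in the two paragraphs above can be modeled in Python to show how inheritance from the workspace node works. This is a hypothetical sketch, not SharePoint's security model or API: `Folder`, `Workspace`, and `effective_role` are illustrative names, and a folder with `inherits=False` stands in for a folder whose security was set separately through Web folders.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Folder:
    name: str
    parent: Optional["Folder"] = None
    inherits: bool = True   # inherits security from its parent
    roles: dict[str, str] = field(default_factory=dict)  # user -> role

class Workspace:
    def __init__(self, name: str):
        self.node = Folder(name)  # the workspace node (top level)

    def add_user(self, user: str) -> None:
        # New users get the Reader role by default; existing roles are kept.
        self.node.roles.setdefault(user, "Reader")

    def assign_role(self, user: str, role: str) -> None:
        # Roles assigned at the node flow to every inheriting folder.
        self.node.roles[user] = role

def effective_role(folder: Optional[Folder], user: str) -> Optional[str]:
    """Walk up through inheriting folders until a role (or a break) is found."""
    while folder is not None:
        if user in folder.roles or not folder.inherits:
            return folder.roles.get(user)
        folder = folder.parent
    return None
```

For example, after `ws.add_user("alice")`, any inheriting folder reports `"Reader"` for alice, and `ws.assign_role("alice", "Author")` changes that everywhere inheritance is unbroken; a folder with its own security returns `None` unless alice is listed there.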

Leveraging Taxonomies to Categorize

Another significant challenge for Workspace Coordinators is how to categorize effectively and efficiently the vast array of documents (internal and external content) found in many workspaces. Doing so involves organizing folders into a hierarchy or structure that makes sense to the user community. This organization of folders is also termed a taxonomy; a good taxonomy helps produce a workspace that is both functional and familiar to end users.

Creating an effective taxonomy is especially challenging because much of the data is random: the content residing in the portal often has no inherent structure, which makes it difficult to organize. One of the Workspace Coordinator's most important management tasks is therefore working with the business units to co-develop a folder and category hierarchy for organizing portal content.

This section discusses some of the tools Microsoft SharePoint Portal Server provides for organizing information for delegated coordination, collaboration, and browsing. We also briefly cover a method of importing an existing folder hierarchy into a workspace.

Microsoft believes that one of the best ways to start building taxonomies is simply to jump in and learn as you go, working in a pilot or technical sandbox environment before moving on to development and production environments. Experimentation goes a long way toward understanding the impact that different taxonomy approaches have on workspace configuration and the resulting folder layout. Regardless of how you start, though, the first cut at a taxonomy is nearly guaranteed to evolve quickly into something new. Some of the reasons include:

  • New folders keep being created, though perhaps at a slower rate than in traditional file systems.

  • Business needs may require additional document profiles.

  • Properties accumulate new dictionary values.

  • Actually using a taxonomy drives an increase in the number of categories.

The following method for creating a workspace taxonomy came out of Microsoft's own project management group, which was tasked with deploying SharePoint Portal Server. A complex folder hierarchy already existed, but the group quickly determined that simply dragging and dropping it into the workspace would not meet its goals for optimizing and organizing the portal's data.

The Goal of Developing a Taxonomy

One of the initial goals in building a taxonomy is to adapt the current folder structure into a set of folders, document profiles, properties, and more. In doing so, we seek not only to capture the details of the original folder hierarchy but also to add "richer" data to it. The result should include:

  • Fewer folders than the original hierarchy.

  • A set of workspace document profiles, properties, and dictionaries.

  • Metadata that is identified and captured as explicit document properties.

It will quickly become apparent that building taxonomies requires "knowing" the content, so engaging the business units and their functional experts is key to developing a useful taxonomy. See the SharePoint Portal Server Software Development Kit (SDK), available at http://msdn.microsoft.com/, for more information about creating scripts to build a workspace taxonomy.

Managing Categories and the Category Assistant

Once the taxonomy (the category hierarchy or structure) is established, the next step is to categorize the content in the workspace. This may be done manually, by editing the document properties, or automatically, by using the Category Assistant. SharePoint Portal Server can automatically categorize published and crawled documents in the workspace, and this seemingly complex task is actually fairly straightforward: the Category Assistant assigns categories from your category structure to existing documents. How does this work? The Category Assistant uses an adaptive algorithm that learns how to organize documents by being "taught" through examples. This requires manually applying categories to a representative collection of documents, which the Category Assistant uses as training examples. It compares the documents assigned to one category with documents from other categories, and in this way identifies the most characteristic words. Eventually, each category is distinguished from the others by the list of words that best describes its content. Not surprisingly, more distinguishing words, such as those found in a document's title, are given greater weight in the category definitions.
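The training idea described above (compare each category's examples against the other categories, keep the most characteristic words, weight title words more heavily) can be illustrated with a small Python sketch. This is explicitly not Microsoft's Category Assistant algorithm, which the text does not specify: it substitutes a simple smoothed log-ratio word score, and `train`, `categorize`, `TITLE_WEIGHT = 3`, `top_n`, and `threshold` are all hypothetical choices.

```python
import math
import re
from collections import Counter

TITLE_WEIGHT = 3  # hypothetical: a title word counts three times as much as a body word

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def weighted_counts(doc):
    """Word counts for a {"title": ..., "body": ...} document, titles weighted up."""
    counts = Counter(tokens(doc["body"]))
    for word in tokens(doc["title"]):
        counts[word] += TITLE_WEIGHT
    return counts

def train(examples, top_n=10):
    """examples: {category: [doc, ...]} -> {category: {word: score}}."""
    per_cat = {c: sum((weighted_counts(d) for d in docs), Counter())
               for c, docs in examples.items()}
    vocab = set().union(*per_cat.values())
    profiles = {}
    for cat, counts in per_cat.items():
        # Pool every *other* category to compare against.
        others = Counter()
        for other, other_counts in per_cat.items():
            if other != cat:
                others.update(other_counts)
        total, other_total = sum(counts.values()), sum(others.values())
        scores = {}
        for w in vocab:
            # Add-one smoothing; positive score = more typical of this category.
            p_in = (counts[w] + 1) / (total + len(vocab))
            p_out = (others[w] + 1) / (other_total + len(vocab))
            scores[w] = math.log(p_in / p_out)
        best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        profiles[cat] = {w: s for w, s in best if s > 0}
    return profiles

def categorize(doc, profiles, threshold=0.0):
    """Return every category whose characteristic words score above threshold."""
    counts = weighted_counts(doc)
    return [cat for cat, words in profiles.items()
            if sum(counts[w] * s for w, s in words.items()) > threshold]
```

Because `categorize` returns every category that clears the threshold, a document can receive several categories, which matches the multiple-category behavior recommended below. Even in this toy version, more and better training examples produce sharper word lists.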


Because SharePoint Portal Server associates documents with categories when it updates the index, there may be a delay before a document appears in an assigned category. The length of the delay depends on the index method used and the amount of content included in the index. See "Index Method Causes Differing Delays" in the "Troubleshooting" section at the end of the chapter.

Accessing and Configuring the Category Assistant

The Category Assistant is accessed from the Properties page of the top-level category folder. SharePoint Portal Server enables the feature by default, but the Category Assistant does not categorize anything until it is trained. A Web Part called Category Management also exists; it allows a Workspace Coordinator to create, edit, and delete categories from the dashboard without SharePoint Portal Server's client components. To learn more about this Web Part, refer to Microsoft's SharePoint Portal Server Resource Kit.

However the Category Assistant is accessed, it makes a lot of sense from a management perspective to spend the time needed to train it well. Follow these management best practices for the best results:

  • Provide as many examples as possible. These examples should encompass as many facets of the category as possible.

  • Allow multiple categories to be applied to a document. This happens as a matter of course and should not be disabled.

  • When categorizing manually, assign a document to every category under which a user might search for or browse to it.

The Importance of Training the Category Assistant

Training the Category Assistant is the most important step in categorizing documents automatically, and it requires training examples for each category. Without good training examples, the accuracy of the Category Assistant is limited, as already discussed. The following are management rules of thumb for training the Category Assistant:

  • It is recommended that you use a minimum of 10 documents per category for training purposes.

  • Ideal training documents for a category all relate to that category's topic.

  • Training documents should also be mainly text-based. Word processing documents are excellent training examples. Documents such as spreadsheets do not offer as much text for the Category Assistant to use for categorization.

  • Good training documents are also long enough to contain sufficient text for the Category Assistant to analyze them and identify the keywords that define a category.
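These rules of thumb lend themselves to a quick pre-training review. The Python sketch below is a hypothetical helper, not part of SharePoint Portal Server: only the 10-documents-per-category minimum comes from the text, while `MIN_WORDS_PER_DOC = 200` and the set of "mainly text" file extensions are illustrative assumptions you would adjust.

```python
import os

MIN_DOCS_PER_CATEGORY = 10   # recommendation stated in the text
MIN_WORDS_PER_DOC = 200      # hypothetical threshold for "long enough"
TEXT_EXTENSIONS = {".doc", ".txt", ".htm", ".html"}  # assumption: mainly-text formats

def review_training_set(training):
    """training: {category: [(filename, text), ...]} -> list of warnings."""
    warnings = []
    for category, docs in training.items():
        if len(docs) < MIN_DOCS_PER_CATEGORY:
            warnings.append(
                f"{category}: only {len(docs)} training document(s) "
                f"(recommended minimum is {MIN_DOCS_PER_CATEGORY})"
            )
        for filename, text in docs:
            ext = os.path.splitext(filename)[1].lower()
            if ext not in TEXT_EXTENSIONS:
                # e.g. spreadsheets offer little text to learn from
                warnings.append(f"{category}/{filename}: may not be mainly text-based")
            if len(text.split()) < MIN_WORDS_PER_DOC:
                warnings.append(f"{category}/{filename}: fewer than {MIN_WORDS_PER_DOC} words")
    return warnings
```

An empty result does not guarantee good training data, since topical relevance still needs human judgment, but it catches the mechanical problems before a training run.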

To read more about actually configuring and training the Category Assistant, see "Auto Categorizing Documents," p. 308.

Managing or actually performing the training of the Category Assistant usually falls to the Workspace Coordinator. A couple of training options exist, however. In the first, Authors are allowed to categorize documents; this distributes training responsibilities nicely and yields a large pool of training documents. In the second, training responsibilities are assigned to a single individual.

Documents may also be categorized manually by using the Search and Categories tab on the Properties page of the document. Using Windows Explorer, an Author or Coordinator can select one or more values from the checklist of workspace categories. If the document is stored in an enhanced folder, it must be checked out before its category assignments can be changed. For a small number of documents, you can use this method of categorization exclusively.

A document may also be categorized by using document profiles. If the Coordinator has configured the document profile to display categories, Authors can select categories when they check in the document. Adding the Categories property to document profiles thus provides a way to enforce category assignment when Authors check in a document, and it distributes the task of document categorization among multiple Authors.
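The check-out and check-in rules above can be sketched as simple guards. This Python model is hypothetical and does not use any SharePoint API: `DocumentProfile`, `Document`, `set_categories`, `check_in`, and `CheckInError` are illustrative names, and `requires_categories` is an assumed flag standing in for a profile that enforces category assignment.

```python
from dataclasses import dataclass, field

class CheckInError(Exception):
    pass

@dataclass
class DocumentProfile:
    name: str
    # Hypothetical flag: the profile displays categories and requires one.
    requires_categories: bool = False

@dataclass
class Document:
    name: str
    profile: DocumentProfile
    in_enhanced_folder: bool = True
    checked_out: bool = False
    categories: set[str] = field(default_factory=set)

def set_categories(doc: Document, categories) -> None:
    # Documents in enhanced folders must be checked out before category changes.
    if doc.in_enhanced_folder and not doc.checked_out:
        raise CheckInError(f"{doc.name} must be checked out first")
    doc.categories = set(categories)

def check_in(doc: Document) -> None:
    if doc.in_enhanced_folder and not doc.checked_out:
        raise CheckInError(f"{doc.name} is not checked out")
    # Enforce category assignment at check-in when the profile calls for it.
    if doc.profile.requires_categories and not doc.categories:
        raise CheckInError(f"{doc.name}: select at least one category")
    doc.checked_out = False
```

Putting the guard at check-in, rather than relying on later cleanup, is what spreads the categorization work across Authors: each Author supplies categories as part of publishing the document.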

If a lot of content is crawled from outside the workspace, applying categories automatically may be preferable. Finally, if it becomes necessary, the Category Assistant can be disabled by clearing the Enable Category Assistant check box.


                 
Special Edition Using Microsoft SharePoint Portal Server
ISBN: 0789725703
Year: 2002
Pages: 286
