Indexing Engine

                 

 
Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 2. SharePoint Portal Server Features and Capabilities


SharePoint Portal Server makes data easier to manage and organize by allowing content to be indexed from a variety of information sources. Today, information exists in a variety of formats and languages. It also often resides in multiple locations, which is a challenge for traditional intranet tools because end users have to follow links and jump back and forth from site to site.

The indexing engine shipped with SharePoint Portal Server is a mature technology. It evolved from existing Microsoft products such as SQL Server, Index Server, and Commerce Server, and it provides the following benefits:

  • Protection against invalid and incorrectly formatted documents

  • Protection against network problems and faulty Web servers

  • Multithreaded data source access and document crawling

SharePoint Portal Server was designed to withstand crawling documents that are incorrectly formatted or even potentially corrupt. In addition, SharePoint Portal Server will not halt or crash when access to content is interrupted by local or wide area network issues or by problematic Web servers. Because SharePoint Portal Server is CPU bound rather than I/O bound, it performs efficiently even when network latency is an issue, and additional processors can be added to address CPU utilization. Lastly, because SharePoint Portal Server optimizes crawling throughput by indexing many documents at once, the effect of latency is reduced further.

To learn more about the crawling process, see "The Crawling Process," p. 455.
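The combination of multithreaded crawling and protection against faulty sources can be pictured with a short Python sketch. It is only an illustration of the idea, not the product's implementation: crawl_all, demo_fetch, and the source names are hypothetical, and the fetch step stands in for real network or file access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_all(sources, fetch, max_workers=4):
    """Fetch every source on a thread pool; one failure never stops the rest.

    `fetch` is a hypothetical callable that returns the text of one source
    and may raise on network errors or corrupt documents.
    """
    indexed, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, src): src for src in sources}
        for future in as_completed(futures):
            src = futures[future]
            try:
                indexed[src] = future.result()
            except Exception as exc:  # isolate the bad source, keep crawling
                failed[src] = str(exc)
    return indexed, failed


def demo_fetch(src):
    # Stand-in for real I/O: pretend one Web server is unreachable.
    if src == "http://broken.example":
        raise ConnectionError("server unreachable")
    return f"contents of {src}"
```

Calling crawl_all(["a.doc", "http://broken.example", "b.doc"], demo_fetch) indexes the two healthy sources and records the unreachable one separately, so a single bad source does not halt the crawl.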

Standard Crawling Technology

SharePoint Portal Server keeps indexes up to date by using existing industry-standard crawling technology. This technology crawls and updates the index in the background, so users do not experience the latency of refreshing an index in real time. SharePoint Portal Server updates and indexes as documents match search criteria, which ensures optimal search performance. SharePoint Portal Server includes the tools required to manage indexes for a variety of information sources and data repositories.

The crawling options allow for the following index updates:

  • Scheduled or on-demand: Crawls, managed by the Windows Task Scheduler, can be configured to run on a schedule or be initiated on demand.

  • Incremental: SharePoint Portal Server interrogates each data source to determine what has changed since the last crawl. This allows the index to be updated rapidly.

  • Notification-based: Servers can be configured to launch a crawl when a notification event occurs. Windows 2000, Exchange 2000 Public Folder servers, and other SharePoint Portal Servers are all examples of data sources that can be set up to notify a specific SharePoint Portal Server to execute a crawl. As a result of the notification and crawl, the index is updated for your end users.

  • Adaptive: Similar to the incremental index update, adaptive crawling only crawls content changed since the last update. The difference between incremental and adaptive is that adaptive uses complex historical analysis to target documents that are likely to have changed.
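To make the difference between the incremental and adaptive options concrete, here is a minimal Python sketch. It assumes each document carries a modification stamp and that a count of past changes is kept per document. The function names and the simple change-rate rule are hypothetical stand-ins; the text does not describe the actual historical analysis the product uses.

```python
def incremental_crawl(current, last_seen):
    """Return (changed, deleted) by comparing modification stamps.

    `current` and `last_seen` map a document path to a modification stamp
    (any comparable value). Both names are hypothetical.
    """
    changed = [path for path, stamp in current.items()
               if last_seen.get(path) != stamp]  # new or modified
    deleted = [path for path in last_seen if path not in current]
    return sorted(changed), sorted(deleted)


def adaptive_candidates(change_counts, crawls_so_far, min_rate=0.5):
    """Pick documents whose past change rate suggests they changed again.

    A deliberately simple heuristic: a document qualifies when it changed
    in at least `min_rate` of the crawls observed so far.
    """
    if crawls_so_far <= 0:
        return []
    return sorted(path for path, count in change_counts.items()
                  if count / crawls_so_far >= min_rate)
```

The incremental function still has to look at every document's stamp, whereas the adaptive function narrows the set of documents worth checking at all, which is the distinction the list above draws.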

Indexed File Types

SharePoint Portal Server allows a variety of file types to be included within an index. In addition, an IFilter can be used to include other file types. An IFilter is a component exposed by the Platform SDK and is used by Index Server.

For information on IFilters, see "Configuring IFilters," p. 213.

SharePoint Portal Server can index the following file types out of the box:

  • Office 95, Office 97, and Office 2000

  • HTML files

  • Text files

  • TIFF files

  • Adobe Acrobat files

  • Corel documents

  • Other custom formats

The index includes multi-file HTML-format documents as well as document discussions for Office 2000-specific documents. For HTML files, the index includes the files and meta tags. Text files can be indexed. For TIFF files, SharePoint Portal Server includes an optical character recognition (OCR) package that incorporates document images within an index based on the words the images contain.

NOTE

For Adobe Acrobat files and Corel documents, an IFilter must be installed. Both of these filters need to be obtained from their respective companies. You can access the manufacturer's Web site, such as http://www.adobe.com, and search for "ifilter".


For information on how to register TIFF files, see "Advanced Topics," p. 212.

Additional custom file formats are supported through the Windows 2000 IFilter interface. Third parties and customers will continue to develop and publicly share custom IFilters.

Content Source Types

We know that data resides in multiple locations and across multiple types of data sources. This section provides details on the types of data sources that can be accessed with SharePoint Portal Server. When we use the term content source, we refer to a location where data resides within or outside your environment. Figure 2.4 displays the content source types. The benefit of using SharePoint Portal Server is that administrators can leave information in its current location, while creating a robust search capability to provide a single search portal to search multiple data repositories. SharePoint Portal Server allows access to the following content sources:

  • Microsoft Windows NT 4.x/Windows 2000

  • Microsoft Exchange 5.x/2000 Public Folders

  • Lotus Notes databases

  • Intranet/Internet sites

Figure 2.4. Graphic showing available content sources within SharePoint Portal Server.


Indexing maintains existing security, so if an end user does not have the appropriate access through the Windows file system or Exchange public folder, they will not have access from within the workspace. In other words, depending on the end user's rights, a series of documents may be returned in their search, but they may not be able to access the documents.

It is important to note that the index includes attributes, attachments, and embedded documents, and SharePoint Portal Server allows both intranet and Internet servers to be accessed and indexed. When Lotus Notes databases are indexed, Notes database security is also maintained.

Adjusting Weight

Within a document, certain properties contain the most relevant information, for example Title and Subject. With SharePoint Portal Server, Coordinators can adjust the weight of these properties so that, when a search is run, documents that match on heavily weighted properties are treated as more relevant in the results.
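The effect of property weights on relevance can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the ranking formula SharePoint Portal Server uses: the weight values, the property names in DEFAULT_WEIGHTS, and the score and rank functions are all hypothetical.

```python
DEFAULT_WEIGHTS = {"title": 3.0, "subject": 2.0, "body": 1.0}  # illustrative values


def score(document, term, weights=DEFAULT_WEIGHTS):
    """Sum the weight of every property whose text contains the term."""
    term = term.lower()
    return sum(weight for prop, weight in weights.items()
               if term in document.get(prop, "").lower())


def rank(documents, term, weights=DEFAULT_WEIGHTS):
    """Return matching documents, highest score first."""
    scored = [(score(doc, term, weights), doc) for doc in documents]
    ordered = sorted(scored, key=lambda pair: -pair[0])  # stable sort on score only
    return [doc for s, doc in ordered if s > 0]
```

With these weights, a document whose title contains the search term outranks one that mentions the term only in its body. Raising or lowering a weight changes that ordering without touching the documents themselves.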

Preserving Existing Security

Even though several documents may be returned within a given search, SharePoint Portal Server ensures that only documents which an individual user has rights to open can be opened. This is a key security feature of SharePoint as it leverages the existing NT file system security model.
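The behavior described here, where a document can appear in results yet still be refused on open, can be modeled with a small Python sketch. It assumes a very simple access control list of user and group names. The real check is performed by the underlying file system or Exchange security, and can_open and search_with_access are hypothetical names used only for illustration.

```python
def can_open(user, acl):
    """True when the user, or one of the user's groups, appears on the ACL."""
    principals = {user["name"], *user.get("groups", ())}
    return not principals.isdisjoint(acl)


def search_with_access(results, user):
    """Pair each search hit with whether this user may open it."""
    return [(doc["title"], can_open(user, doc["acl"])) for doc in results]
```

For a user in the Sales group, a hit whose list names Sales is marked as openable, while a hit restricted to HR is still listed but marked as not openable.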

Integrating Third-Party Content

The SharePoint Portal Server search engine can be integrated with third-party content in the following ways:

  • IFilter: Using an IFilter allows end users and third-party providers to integrate with custom document formats.

  • Protocol Handler: End users and third-party solution providers can develop custom protocol handlers to integrate additional content sources with the existing search engine. An example of an integrated third-party solution is Lotus Notes.

  • SQL-based query language: Custom search and dashboard site applications are easier to develop because the SharePoint Portal Server query language is based on extensions to industry-standard SQL. This also minimizes your required investment and training.

Dashboard Site Scalability

Dashboards can be developed in a variety of sizes. SharePoint Portal Server will scale from small workgroup-based solutions to medium-sized solutions, from a business unit up to enterprise dashboard solutions for an entire corporation.

For more information on Dashboard Sites, see "Dashboards and Web Parts," p. 344.

Category Integration

Content within SharePoint Portal Server utilizes the concept of categories to enable users to navigate through a classification system to find desired information. The categories used are completely up to the Coordinator, and organizational taxonomies can be planned and integrated according to the business requirements. This classification system is closely coupled with the SharePoint Portal Server document management system to allow an efficient management process. Within the dashboard, end users can navigate through and search for categories.

When categories are established, it is a good idea to have a description, an owner, a picture, a set of best bets, and the associated documents. Managed through the Windows Explorer, categories are presented through the dashboard site.

It is important to get the design of the category taxonomy right, because moving subscriptions around is currently not an easy or efficient process.

For information on categories, see "Categories: A Different View on Information," p. 108.


Subscriptions

With SharePoint Portal Server, users can now utilize a key feature called subscriptions to subscribe to search criteria, and can request to be notified when changes occur or when specific information becomes available. This feature within SharePoint Portal Server expands on the current Office 2000 Server Extensions subscription feature, and the following additional subscription types are enabled:

  • Search-Based Subscriptions: From the dashboard site, users can subscribe to interest-based searches. For example, a user can search for all documents of type "Services" authored by "Robert Ferguson". If the search does not return the desired results, the end user can then set up a subscription to be notified if documents matching these criteria become available in the future. See Figures 2.5 and 2.6.

    Figure 2.5. In this example, we searched for "Services", and only for documents created by the author "Robert Ferguson".


    Figure 2.6. An end user could potentially click on the search summary to set up a subscription to the search, and request intervals for how often they want to be notified when new information is posted related to their search.


  • Documents and Directory-Based Subscriptions: SharePoint Portal Server allows users to subscribe to changes that may occur for documents and folders. Notifications can be provided through a personalized Web Part or via Simple Mail Transfer Protocol (SMTP) email. Furthermore, end users can choose from notification intervals such as when a change occurs, daily, or weekly.

  • Category-Based Subscriptions: The last subscription option allows users to subscribe to changes for document management categories. For example, a reader could navigate through a list of categories from the Categories menu option within the workspace. The user could then click on Subscribe to this category.

NOTE

Once the subscription is enabled, the end user will be notified when content is added to the category. The notification will occur through email or the personalized dashboard site, according to the notification route and interval that were specified during the subscription.


A default Web Part provides a counter for the pending notifications.
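The subscription model described in this section (criteria, a notification interval, and a count of pending notifications) can be sketched in Python as follows. This is a minimal illustration under assumed names: the Subscription class, its fields, and on_document_added are hypothetical and do not reflect the product's object model.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    user: str
    criteria: dict            # e.g. {"author": "Robert Ferguson"}
    interval: str = "daily"   # "immediate", "daily", or "weekly"
    pending: list = field(default_factory=list)

    def matches(self, document):
        """True when every criterion equals the document's property value."""
        return all(document.get(k) == v for k, v in self.criteria.items())


def on_document_added(document, subscriptions):
    """Queue a notification on every subscription the new document satisfies."""
    notified = []
    for sub in subscriptions:
        if sub.matches(document):
            sub.pending.append(document["title"])
            notified.append(sub.user)
    return notified
```

In this model, the length of a subscription's pending list plays the role of the notification counter, and the interval field would decide when the queued items are delivered by email or shown on the dashboard site.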


                 
ISBN: 0789725703
Year: 2002
Pages: 286
