Microsoft SharePoint Portal Server

SharePoint Portal Server is a flexible portal solution that lets you find, share, and publish information easily. With SharePoint Portal Server, you can use existing information effectively and capture information in new ways that are appropriate for your business. In addition, you can rapidly deploy a prepackaged dashboard site and use Web Parts technology to customize a Web-based view of your organization.

SharePoint Portal Server targets dashboard site solutions that range from the team portal to the enterprise portal.

SharePoint Portal Server presents the most current and the richest set of search and information discovery features. Figure 5.1 illustrates components of the SharePoint Portal Server Search architecture.

Figure 5.1.  SharePoint Portal Server content crawling and search architecture

The following list describes the components of the SharePoint Portal Server Search architecture.

  • Search Engine. Component of MSSearch that runs queries written in the SQL full-text extension syntax against the full-text index.
  • Index Engine. Component of MSSearch that processes chunks of text and properties filtered from content sources, and determines which properties are written to the full-text index.
  • Gatherer. Component of MSSearch that manages the content crawling process and that has rules that determine what content is crawled.
  • Word breakers. Components shared by the Search and Index engines that break up compound words and phrases.
  • Stemmers. Components shared by the Search and Index engines that generate inflected forms of a word.
  • Filter Daemon. Component that handles requests from the Gatherer. Uses protocol handlers to access content sources, and IFilters to filter files. Provides the Gatherer with a stream of data containing filtered chunks and properties.
  • Protocol Handlers. Open content sources in their native protocol and expose documents and other items to be filtered.
  • IFilters. Open documents and other content source items in their native format and filter into chunks of text and properties.
  • Content sources. Collection of data MSSearch must crawl, and specific rules for crawling items in that content source. Items in content sources are identified by URLs. The protocol portion of the URL is what distinguishes different types of content sources.
  • Data Access. SharePoint Portal Server uses protocol handlers and the Gatherer to crawl and provide search results over data from diverse content sources. Without modification, SharePoint Portal Server can crawl documents from file systems, Web sites, Exchange 2000 Server and Exchange Server 5.5 computers, Lotus Notes servers, and other SharePoint Portal Server workspaces.

Although it does not provide direct access to OLE DB, Open Database Connectivity (ODBC), or other relational data access standards, SharePoint Portal Server can crawl information from databases by using HTTP. To do this, you must create an Active Server Pages (ASP) page that renders information from each row in the database.

The Microsoft SharePoint Portal Server SDK describes the protocol handler interface. The protocol handler interface enables developers to write a protocol handler for document repositories with other, proprietary, data access methods, such as document management systems or archiving solutions. For more information about this interface, see Appendix B. The Microsoft SharePoint Portal Server Resource Kit CD-ROM includes protocol handlers that you can use to crawl File Transfer Protocol (FTP) sites and SharePoint Team Services sites. You can access these protocol handlers in the \Tools directory of the CD. For a complete listing of tools and Web Parts available on the CD, see Appendix A, Tools, Samples, eBooks, and More.
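
The database-over-HTTP approach described above can be sketched as follows. This is a minimal illustration, not the ASP page the text refers to: it uses Python and an in-memory SQLite database as stand-ins, and the table, the URL scheme (`row?id=...`), and all function names are hypothetical. The idea it shows is that each row becomes its own HTML page, and an index page links to every row page so that a Web crawler can discover them.

```python
import html
import sqlite3


def render_row_page(columns, row):
    """Return an HTML page exposing one database row as crawlable text.

    Assumes the first column is the row's key and uses it as the title.
    """
    title = html.escape(str(row[0]))
    items = "".join(
        f"<dt>{html.escape(col)}</dt><dd>{html.escape(str(val))}</dd>"
        for col, val in zip(columns, row)
    )
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><dl>{items}</dl></body></html>"
    )


def render_index_page(row_ids):
    """Return an HTML page that links to every row page."""
    links = "".join(
        f'<li><a href="row?id={rid}">Row {rid}</a></li>' for rid in row_ids
    )
    return f"<html><body><ul>{links}</ul></body></html>"


def build_pages(conn, table):
    """Map each (hypothetical) URL to its HTML. `table` must be trusted."""
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    pages = {f"row?id={row[0]}": render_row_page(columns, row) for row in rows}
    pages["index"] = render_index_page([row[0] for row in rows])
    return pages


def demo():
    """Build pages for a small sample table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, notes TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [(1, "Widget", "Blue & small"), (2, "Gadget", "Red")],
    )
    return build_pages(conn, "products")
```

In a real deployment the pages would be served over HTTP and the index page's URL would be registered as a Web content source. The dictionary returned here only stands in for that server.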

  • Filters. SharePoint Portal Server includes filters for Microsoft Office documents, HTML files, Tagged Image File Format (TIFF) files, and text files. The TIFF filter enables SharePoint Portal Server to crawl the textual content of saved fax data based on Optical Character Recognition (OCR) technology. When filtering messages from Exchange public folders, SharePoint Portal Server uses the Multipurpose Internet Mail Extensions (MIME) filter that is included with Windows 2000. SharePoint Portal Server also supports third-party and custom file types, such as the Adobe PDF filter. For more information about the Adobe PDF filter, see Appendix B.
  • Ranking. SharePoint Portal Server offers an advanced probabilistic ranking algorithm, which is based on information retrieval research carried out at Microsoft Research. This algorithm is designed to place the documents that are most relevant to a user's query at the top of the list of search results, providing increased user efficiency and satisfaction.

    Stephen Robertson, Microsoft researcher, City University professor, and winner of the prestigious Association for Computing Machinery Special Interest Group on Information Retrieval (ACM SIGIR) 2000 Salton Award, developed the formula for ranking. The ranking formula adopted and used by Microsoft full-text search is a direct result of this research. In computing the likely relevance of a document, the formula uses the following factors: the length of the document, the frequency of the query term in the entire collection of documents, the number of documents containing the query term, and the number of documents in the entire collection of documents.

  • Best Bets. This feature enables users with appropriate permissions to tag individual documents as most appropriate for specific queries or categories. Even in the most advanced probabilistic ranking environment, certain documents lack the textual information to be prominent in search results for particular terms. The Best Bets feature addresses this problem most effectively, either by advancing the specially tagged documents to the top of the results list or by displaying them prominently when browsing categories. The default query included with SharePoint Portal Server also nominates Best Bet documents when the rank of the document is very high. For more information about the default query, see Chapter 24, Analyzing the Default Query for the Dashboard Site.
  • Automatic Categorization. In addition to simple search, SharePoint Portal Server provides automatic categorization. This feature enables the user to define a category hierarchy and then use a sample set of documents within the hierarchy as a training sample. After training, SharePoint Portal Server automatically tags documents stored on the server and crawled documents. After they are tagged, these documents appear in the category hierarchy.
  • Schema Support. SharePoint Portal Server provides simplified schema management facilities that are compatible with Office by using promotion and demotion. Users define document profiles and associated properties. During promotion, SharePoint Portal Server copies property values in the Office document to the properties of a document profile. During demotion, SharePoint Portal Server copies property values found in a document profile to the Office document. SharePoint Portal Server tightly integrates full-text search with that schema. Advanced search uses properties and document profiles.
  • Extensibility and Programmability. The SharePoint Portal Server dashboard site uses Microsoft Digital Dashboard technology. Microsoft Digital Dashboard technology enables easy integration of business applications and custom content with the built-in search features of SharePoint Portal Server. SharePoint Portal Server provides query submission and search results as Web Parts, which can easily coexist on the dashboard site with custom Web Parts. However, the Web Parts for query submission and search results rely on each other for functionality and therefore must reside on a SharePoint Portal Server computer. You can manipulate search by using Microsoft ActiveX® Data Objects (ADO), OLE DB, or the Web-based Distributed Authoring and Versioning (WebDAV) protocol. SharePoint Portal Server does not provide automation interfaces for management of its search, document management, or dashboard site features. For more information about developing customized search solutions for SharePoint Portal Server, see Appendix B.
  • Query Languages. SharePoint Portal Server uses SQL full-text extensions. Queries are submitted using Distributed Authoring and Versioning Searching and Locating (DASL) requests, part of WebDAV, also called HTTPDAV.
  • Subscriptions. The SharePoint Portal Server subscriptions feature enables users to subscribe to changes in documents, folders, categories, and search results. SharePoint Portal Server maintains subscriptions as persistent queries. SharePoint Portal Server sends notifications to the subscriber whenever a change occurs. SharePoint Portal Server implements subscriptions by using Persistent Query Service (PQS) rules. PQS is a reverse-query processor. It evaluates a large set of queries against a single document to determine which queries match the document. This allows SharePoint Portal Server to identify matching subscriptions as each new document arrives in the document store. Subscriptions provide this "push" model to match the "pull" model of full-text search.
  • Adaptive Crawling. Site Server 3 introduced incremental crawling, which uses time stamp comparisons to include only documents that have changed since the previous update of the index. Incremental updates reduce the amount of time involved in repeated crawls. However, incremental updates do not eliminate the need to inspect the time stamp of each document previously crawled each time a crawl occurs. Adaptive crawling reduces the time required for crawling even further. During crawls, the algorithm for adaptive crawling compiles statistics about the rate of change for each document. In subsequent adaptive crawls, the algorithm targets only documents likely to have changed.
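
The reverse-query idea behind the Subscriptions item in the preceding list can be illustrated with a short sketch. It is not the Persistent Query Service itself: the `Subscription` type, the whole-word matching rule, and the function name are assumptions made for illustration. What it shows is the inversion the text describes, where many stored queries are evaluated against one newly arrived document rather than one query against many documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """A persistent query (hypothetical shape): all terms must appear."""
    subscriber: str
    required_terms: frozenset


def matching_subscribers(document_text, subscriptions):
    """Evaluate every stored query against a single incoming document."""
    words = set(document_text.lower().split())
    return [
        s.subscriber
        for s in subscriptions
        if s.required_terms <= words  # subset test: every term is present
    ]


subscriptions = [
    Subscription("alice", frozenset({"portal", "search"})),
    Subscription("bob", frozenset({"budget"})),
]
notify = matching_subscribers("New Portal search guide", subscriptions)
```

Here `notify` contains only the subscribers whose stored query matches the new document. A production reverse-query processor would index the queries themselves so that it does not have to test each one in turn.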

SharePoint Portal Server does not replace all of the functionality of Site Server, but the search technology used in SharePoint Portal Server is more recent than the search technology used in Site Server. In addition, SharePoint Portal Server uses an advanced ranking algorithm. You can use the advanced features of the algorithm to conduct a search from the dashboard site. These advanced features include Best Bets, categories, and Office schema integration.
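
The text lists the factors that the ranking formula uses but does not give the formula itself. As an illustration of how such factors can combine, the sketch below uses Okapi BM25, a well-known probabilistic ranking function associated with Stephen Robertson's research. Treat it as a stand-in: whether the product uses this exact form is an assumption, the parameter values `k1` and `b` are conventional defaults rather than product settings, and the sketch uses the frequency of the term within each document.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents containing each query term.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue  # term appears nowhere in the collection
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Length normalization: long documents need more occurrences.
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["portal", "search", "search"],
    ["portal", "dashboard"],
    ["web", "parts"],
]
search_scores = bm25_scores(["search"], docs)
portal_scores = bm25_scores(["portal"], docs)
```

For the query term "portal", which occurs once in each of the first two documents, the shorter document receives the higher score because of the length normalization.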

When creating indexes, SharePoint Portal Server offers significantly better performance than Site Server 3 by providing a multi-threaded index engine. The introduction of adaptive crawling also reduces the amount of time it takes to perform incremental crawling when updating indexes.
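
The adaptive crawling idea can be sketched as follows. The text does not say which statistics the product keeps, so the per-document counters, the smoothed change-rate estimate, and the threshold here are all assumptions chosen for illustration. The sketch records how often each document was found changed and then selects only the documents that are likely to have changed for the next crawl.

```python
from dataclasses import dataclass


@dataclass
class CrawlStats:
    """Per-document history (hypothetical): checks made, changes seen."""
    checks: int = 0
    changes: int = 0

    def record(self, changed):
        """Update the history after inspecting the document once."""
        self.checks += 1
        if changed:
            self.changes += 1

    def change_rate(self):
        """Smoothed estimate; documents with no history default to 0.5."""
        return (self.changes + 1) / (self.checks + 2)


def select_for_crawl(stats_by_url, threshold=0.3):
    """Return the URLs whose estimated change rate meets the threshold."""
    return sorted(
        url for url, s in stats_by_url.items() if s.change_rate() >= threshold
    )


stats = {"a": CrawlStats(), "b": CrawlStats(), "c": CrawlStats()}
for _ in range(10):
    stats["a"].record(changed=True)   # changes on every crawl
    stats["b"].record(changed=False)  # never changes
to_crawl = select_for_crawl(stats)    # "c" has no history yet
```

A practical scheme would still revisit rarely changing documents from time to time so that an unexpected change is not missed indefinitely.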



Microsoft Sharepoint Portal Server 2001 Resource Kit
Microsoft SharePoint(TM) Portal Server 2001 Resource Kit (Examples & Explanations Series)
ISBN: 0735615624
Year: 2001
Pages: 231