Excluding Content

Special Edition Using Microsoft SharePoint Portal Server
By Robert Ferguson

Chapter 19.  Managing Indexing


When you initially add a content source, you can specify whether to include just this page or the whole site in the index. The dialog also notes that you can use the Configuration tab of the Properties dialog for finer granularity. These options were discussed in Chapter 18. SharePoint Portal Server provides another mechanism, called Site Path rules, which is defined independently of individual content sources and lets you fine-tune how Web sites are accessed.

TIP

When your search results do not include documents that you would expect to be indexed, check whether any Site Path rules apply. The options on an individual content source allow only some basic settings, whereas Site Path rules provide a much richer set of options that apply to all content sources.


Excluding Some Areas of a Web Site

The options on an individual content source let you index either a single page or a whole site, but you may wish to index only some portions of a site. For example, how could you set up a content source that indexes just the SharePoint-specific information on the general Microsoft Web site? The URL of the content source would be http://www.microsoft.com/sharepoint/portalserver.asp, but because the page banner links to All Products, crawling the site will start indexing all of www.microsoft.com.

The answer can be found in Site Path rules (see Figure 19.5). Not only do they provide the option to define a dedicated account for a particular content source, they also allow the exclusion of specific URLs.

Figure 19.5. Site Path rules for the Microsoft SharePoint Portal Server Web site example discussed previously.


Site Path rules are evaluated in order, so the rule that includes the specific path must come before the rule that excludes the more generic path. If content is not indexed as you expected, enable logging of excluded URLs and check the gatherer logs, as discussed later in this chapter. Entries such as "URL is excluded because of restrictions defined in site path rules" may indicate, for example, that the rules are in the wrong order.
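To make the ordering rule concrete, here is a minimal Python sketch of ordered rule evaluation. It assumes that the first rule whose pattern matches a URL decides whether the URL is crawled, and that a URL matching no rule is included. The rule list, the is_included function, and that default are illustrative assumptions, not SharePoint Portal Server's actual implementation; the standard library's fnmatchcase stands in for the product's wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical model of ordered Site Path rules: each rule is (pattern, include?).
# Conceptual sketch only; this is not SharePoint Portal Server's own code.
Rule = tuple[str, bool]


def is_included(url: str, rules: list[Rule], default: bool = True) -> bool:
    """Return the decision of the first rule whose pattern matches url.

    Rules are checked in list order; once one matches, later rules are
    never consulted. If no rule matches, `default` applies (an assumption
    made for this sketch).
    """
    for pattern, include in rules:
        if fnmatchcase(url, pattern):
            return include
    return default


# Correct order: the specific include comes before the generic exclude.
good_rules = [
    ("http://www.microsoft.com/sharepoint/*", True),
    ("http://www.microsoft.com/*", False),
]

# Wrong order: the generic exclude matches first and hides everything.
bad_rules = list(reversed(good_rules))

page = "http://www.microsoft.com/sharepoint/portalserver.asp"
other = "http://www.microsoft.com/windows/default.asp"

print(is_included(page, good_rules))   # True
print(is_included(other, good_rules))  # False
print(is_included(page, bad_rules))    # False
```

Reversing the list reproduces the mistake that the gatherer log entry hints at: the generic exclusion matches first, so the SharePoint page is never reached.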

Enabling Complex Links

Active Server Pages that display the contents of a SQL database, for example, can use parameters to select specific content. Such a URL may look like http://myserver/Northwind/orders.asp?OrderID=12029. In this example, OrderID is a parameter appended to the actual URL after a question mark. This is the general convention for passing parameters; the part of the URL after the question mark is known as the query string. SharePoint Portal Server does not index these parameterized Web pages by default, but Site Path rules let you enable support for this type of complex link. To do so, follow these steps:

  1. Log in as Coordinator.

  2. Open the Management / Content Sources Web Folder of your workspace.

  3. Click Additional Settings.

  4. Select the Rules tab.

  5. Click Site Path to open the Site Paths dialog.

  6. Click New.

  7. Enter the path that corresponds to the content source. To apply the site path to all items of that content source, end the path with the * wildcard. If your content source points to a document, replace the document name with *. In our example you would use http://myserver/Northwind/*.

  8. Select Include this path; the Options button will be enabled.

  9. Click Options to open the Options dialog.

  10. Check Enable Complex Links.

  11. Click OK to leave the Options dialog.

  12. Click OK to leave the Site Paths dialog.
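Steps 7 and 10 rest on two simple URL rules: the site path is the content source URL with its document name replaced by *, and a link counts as "complex" when it carries a query string. The Python sketch below illustrates both with the standard library's urllib.parse. The helpers site_path_for and is_complex_link are hypothetical and not part of SharePoint Portal Server, where you type the path into the Site Paths dialog yourself.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def site_path_for(content_source_url: str) -> str:
    """Hypothetical helper: build a wildcard site path from a content source URL.

    Replaces the document name (the last path segment) with '*' and drops
    any query string or fragment, following the rule of thumb in step 7.
    """
    parts = urlsplit(content_source_url)
    # Keep everything up to and including the last '/', then add the wildcard.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory + "*", "", ""))


def is_complex_link(url: str) -> bool:
    """Hypothetical helper: True if the URL carries a query string."""
    return bool(urlsplit(url).query)


url = "http://myserver/Northwind/orders.asp?OrderID=12029"
print(site_path_for(url))             # http://myserver/Northwind/*
print(is_complex_link(url))           # True
print(parse_qs(urlsplit(url).query))  # {'OrderID': ['12029']}
```

A content source that already points to a folder, such as http://myserver/Northwind/, yields the same site path, because its last path segment is empty.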


ISBN: 0789725703
Year: 2002
Pages: 286