Creating and Managing Content Sources and Crawl Settings


Creating content sources is the first administrative task in building an aggregated search and indexing topology. This work is accomplished inside the Shared Services Provider interface.

Content Source Types

In SharePoint Server 2007, a content source can have multiple start addresses. A start address is the URL at which the crawler begins crawling. The crawl settings tell the crawler how deep and how wide to crawl from each start address.

The content source types include the following:

  • SharePoint sites

  • Web sites

  • File shares

  • Exchange public folders

  • Business Data Catalog (BDC)

Crawl Settings

The crawl settings are specific to the content source type: only the settings that apply to the selected type appear. For example, if you select the File Share content source type, you do not see the Crawl Everything Under The Hostname For This Start Address setting, because that setting doesn't apply to a file share start address.
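The idea of crawl depth ("how deep") and crawling across servers ("how wide") can be illustrated with a small model. The following Python sketch is not SharePoint code. It assumes a hypothetical in-memory link graph (`links`) and two hypothetical limits, `max_depth` (how many links to follow from a start address) and `max_server_hops` (how many times the crawl may move to a different host), which roughly mirror the page depth and server hop limits offered for Web site content sources.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(start, links, max_depth, max_server_hops):
    """Breadth-first crawl of a hypothetical link graph.

    links maps each URL to the URLs it links to (a stand-in for fetching pages).
    Returns the URLs in the order they would be visited.
    """
    seen = {start}
    visited = []
    queue = deque([(start, 0, 0)])  # (url, page depth, server hops so far)
    while queue:
        url, depth, hops = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # deep enough: do not follow this page's links
        host = urlsplit(url).hostname
        for target in links.get(url, []):
            # Moving to a different host costs one server hop.
            target_hops = hops + (urlsplit(target).hostname != host)
            if target_hops > max_server_hops or target in seen:
                continue
            seen.add(target)
            queue.append((target, depth + 1, target_hops))
    return visited


links = {
    "http://intranet/": ["http://intranet/a", "http://partner/"],
    "http://intranet/a": ["http://intranet/a/b"],
    "http://partner/": ["http://partner/x"],
}
print(crawl("http://intranet/", links, max_depth=1, max_server_hops=0))
# ['http://intranet/', 'http://intranet/a']
```

Raising either limit widens the crawl: with `max_depth=2` and `max_server_hops=1`, the sketch also visits `http://partner/` and the pages one level below each site. For simplicity it marks a URL as seen the first time it is queued.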

To reach the search configuration pages, open the Home page for your Shared Services Provider (SSP) and click the Search Settings link, as shown in Figure 12-1.

Figure 12-1: The Search Settings link in the Shared Services Provider interface.

On the Configure Search Settings page that appears, follow these steps to create a new content source:

  1. Click on the Content Source And Crawl Schedule link. The Manage Content Sources page then appears.

  2. Click on the New Content Source button. The Add Content Source page then appears.

  3. Enter a name for the content source.

  4. Select the content source type.

  5. Enter the start address(es). Note that you can enter multiple start addresses, but each needs to match the content source type selection for the content source to work properly.

  6. Select the crawl settings.

  7. Select the crawl schedule(s).

  8. Select whether you want to crawl the content immediately.

  9. Click OK.
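Step 5 warns that every start address must match the selected content source type. The following Python sketch shows one way to pre-check a list of addresses before entering them. The scheme table is a simplified, hypothetical assumption (for example, it treats UNC paths such as `\\fs01\docs` as file share addresses and leaves the Business Data Catalog type out); it is not SharePoint's own validation logic.

```python
from urllib.parse import urlsplit

# Hypothetical, simplified rules: which address schemes each content source
# type accepts. The Business Data Catalog type is left out of this sketch.
ALLOWED_SCHEMES = {
    "SharePoint sites": {"http", "https"},
    "Web sites": {"http", "https"},
    "File shares": {"file"},
    "Exchange public folders": {"http", "https"},
}


def mismatched_start_addresses(source_type, start_addresses):
    """Return the start addresses that do not fit the selected content source type."""
    allowed = ALLOWED_SCHEMES[source_type]
    mismatched = []
    for address in start_addresses:
        if address.startswith("\\\\"):
            scheme = "file"  # a UNC path such as \\fs01\docs counts as a file share
        else:
            scheme = urlsplit(address).scheme.lower()
        if scheme not in allowed:
            mismatched.append(address)
    return mismatched


print(mismatched_start_addresses("File shares", [r"\\fs01\docs", "file://fs02/plans", "http://intranet"]))
# ['http://intranet']
```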

Crawl Rules

Crawl rules, formerly known as site path rules, give the crawler additional instructions for crawling particular paths. Because a rule can be specific to an individual content source or global to a broad path such as http://*.com, crawl rules cannot always be applied at the content source level, so you manage them separately.

Crawl rules let you configure include/exclude behavior, a security context for crawling that differs from the default content access account, and the path to which the rule applies. Figure 12-2 illustrates the default interface with a common global rule configured. Because many Web sites use complex URLs (URLs that contain a question mark), consider implementing a global rule that instructs the crawler to crawl any complex URL it encounters.

Figure 12-2: The Crawl Rule interface.

To create a new crawl rule, follow these steps:

  1. In the Shared Services Provider, click on the Search Settings link. This brings you to the Configure Search Settings page.

  2. Click the Crawl Rules link. The Manage Crawl Rules page then appears.

  3. Click the New Crawl Rule button. The Add Crawl Rules page then appears.

  4. Enter a path URL in the Path input box.

  5. Configure the Crawl Rule settings as needed.

  6. Specify whether the rule needs its own security context or can use the default content access account.

  7. Click OK. Your new crawl rule should now appear on the Manage Crawl Rules page.
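To make the effect of path, include/exclude, complex-URL, and account settings concrete, the following Python sketch evaluates a URL against an ordered list of rules. It assumes that rules are checked in the order listed and the first matching rule decides, and that, when no rule matches, simple URLs are crawled and complex URLs are skipped. `CrawlRule`, `evaluate`, and the account name are hypothetical; wildcard matching uses Python's `fnmatch`, which only approximates SharePoint's own path matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


@dataclass
class CrawlRule:
    path: str                         # wildcard path, e.g. "http://*.contoso.com/*"
    include: bool = True              # False = exclude matching URLs
    crawl_complex_urls: bool = False  # follow URLs that contain "?"
    account: Optional[str] = None     # None = use the default content access account


def evaluate(url: str, rules: List[CrawlRule], default_account: str) -> Tuple[bool, Optional[str]]:
    """Return (crawl?, account to crawl with) for a URL."""
    is_complex = "?" in url
    for rule in rules:  # rules are checked in order; the first match decides
        if fnmatchcase(url.lower(), rule.path.lower()):
            if not rule.include or (is_complex and not rule.crawl_complex_urls):
                return False, None
            return True, rule.account or default_account
    # Assumed default when no rule matches: crawl simple URLs only.
    return (False, None) if is_complex else (True, default_account)


rules = [
    CrawlRule("http://*.contoso.com/archive/*", include=False),
    CrawlRule("http://*", crawl_complex_urls=True),  # global "crawl complex URLs" rule
]
print(evaluate("http://www.contoso.com/archive/2006.htm", rules, "svc_crawl"))  # (False, None)
print(evaluate("http://www.contoso.com/list.aspx?id=5", rules, "svc_crawl"))    # (True, 'svc_crawl')
```

Because the first match wins in this model, the narrower exclusion rule is listed before the broad global rule.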

Managing File Types

The file types feature controls, globally, which types of files the crawler crawls. The crawler crawls only the file types that appear on the Manage File Types screen, shown in Figure 12-3; if a file extension is not listed there, files with that extension are not crawled.

Figure 12-3: The Manage File Types screen.

It's easy enough to add a new file type. All you need to do is follow these steps:

  1. From your Shared Services Provider's Home page, click on the Search Settings link. This presents you with the Configure Search Settings page.

  2. Click on the File Types link. The Manage File Types page then appears.

  3. Click the New File Type link. The Add File Type page then appears.

  4. Enter the file's extension in the file extension input box, as seen in Figure 12-4.

  5. Click OK.

Figure 12-4: Add File Type page showing the file extension input box.

What the interface does not tell you is that adding the file type by itself accomplishes nothing. You also need to ensure that an iFilter for that file type is installed on your SharePoint server(s). Because there is no interface that shows which iFilters have been installed on your servers, you need to rely on your deployment documentation to learn whether the correct iFilter for that file type is already installed. SharePoint Server also offers no interface for installing a new iFilter; instead, follow the iFilter manufacturer's setup instructions to install and use it.
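The two conditions in the paragraph above (the file type is listed and an iFilter is installed) can be expressed as a simple audit. The following Python sketch assumes you have transcribed the Manage File Types list and the iFilters recorded in your deployment documentation into two hypothetical sets; it models the rule described here and does not query SharePoint.

```python
from pathlib import PurePath


def extension(path):
    """Lowercase extension without the dot, e.g. 'report.PDF' -> 'pdf'."""
    return PurePath(path).suffix.lstrip(".").lower()


def content_is_indexed(path, file_types, installed_ifilters):
    """Model: content is indexed only if the type is listed AND an iFilter can read it."""
    ext = extension(path)
    return ext in file_types and ext in installed_ifilters


def listed_without_ifilter(file_types, installed_ifilters):
    """File types added on the Manage File Types page that have no iFilter installed."""
    return sorted(set(file_types) - set(installed_ifilters))


file_types = {"doc", "xls", "txt", "pdf"}   # hypothetical Manage File Types list
installed_ifilters = {"doc", "xls", "txt"}  # hypothetical, from deployment documentation
print(listed_without_ifilter(file_types, installed_ifilters))            # ['pdf']
print(content_is_indexed("report.PDF", file_types, installed_ifilters))  # False
```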

Crawl Logs

The crawl logs let you view the successes, warnings, and errors recorded during crawls. By default, the log displays all three, as seen in Figure 12-5.

Figure 12-5: The default crawl log screen showing successes, warnings, and errors from completed crawls of content sources.

To view the Crawl Logs page, follow these steps:

  1. From your Shared Services Provider's Home page, click on the Search Settings link. This presents you with the Configure Search Settings page.

  2. Click the Crawl Log link. The Crawl Log page then appears.

You can view messages related to a specific content source, as shown in Figure 12-6. These messages describe what the crawler encountered and provide detailed troubleshooting information for unsuccessful crawls.

Figure 12-6: The Crawl Log page reveals crawl history messages related to a specific content source.

To view specific messages related to a specific content source, follow these steps:

  1. From the Crawl Log page, click on the link for the content source you wish to investigate. This presents you with a page similar to what is shown in Figure 12-6.

  2. Referring to Figure 12-6, note that you can sort and filter the list based on the following elements. (The filter button does not appear in Figure 12-6; it is on the right portion of the screen, which is not shown.)

    • Content source

    • Hostname/path

    • Status type

    • Date/time
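The four filter elements above combine naturally. The following Python sketch filters a list of exported log entries by content source, hostname/path prefix, status type, and date, then sorts newest first. `LogEntry` and `filter_log` are hypothetical names, and the entries are invented sample data, not output from SharePoint.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class LogEntry:
    content_source: str
    url: str
    status: str  # "Success", "Warning" or "Error"
    logged: datetime


def filter_log(entries: List[LogEntry],
               content_source: Optional[str] = None,
               url_prefix: Optional[str] = None,
               status: Optional[str] = None,
               since: Optional[datetime] = None) -> List[LogEntry]:
    """Apply each filter that was given, then sort the matches newest first."""
    matches = [
        e for e in entries
        if (content_source is None or e.content_source == content_source)
        and (url_prefix is None or e.url.lower().startswith(url_prefix.lower()))
        and (status is None or e.status == status)
        and (since is None or e.logged >= since)
    ]
    return sorted(matches, key=lambda e: e.logged, reverse=True)


log = [
    LogEntry("Intranet", "http://intranet/a.doc", "Success", datetime(2007, 3, 1, 2, 0)),
    LogEntry("File Shares", "file://fs01/docs/old.doc", "Error", datetime(2007, 3, 1, 3, 0)),
    LogEntry("File Shares", "file://fs01/docs/new.doc", "Error", datetime(2007, 3, 2, 3, 0)),
]
errors = filter_log(log, content_source="File Shares", status="Error")
print([e.url for e in errors])
# ['file://fs01/docs/new.doc', 'file://fs01/docs/old.doc']
```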

Metadata Property Mappings

Metadata fields are added to the crawled properties by the Archival plug-in. As content is crawled, this plug-in examines its metadata, and whenever it finds a new metadata field, it automatically adds that field to the crawled properties list. To access the crawled properties list, follow these steps:

  1. From the Shared Services Provider Home page, click on Search Settings. This presents you with the Configure Search Settings page.

  2. Click on the Metadata Property Mappings link. The Metadata Property Mappings page then appears, as illustrated in Figure 12-7. The mappings column details the metadata property's formal name along with the data type for that property.

  3. From the Metadata Property Mappings page, click on the Crawled Properties link in the left pane. The Crawled Properties View of the Metadata Property Mappings page then appears, as shown in Figure 12-8.

Figure 12-7: The Metadata Property Mappings page.

Figure 12-8: The Crawled Properties View of the Metadata Property Mappings page.

From here, you can drill down into the folders, view the metadata that have been added to the Shared Services Provider, and then open the individual metadata property and view its properties.

Within each property, you can configure managed property assignments as well as select the check box to allow that property's values to be indexed. These two configuration options are illustrated in Figure 12-9.

Figure 12-9: The configuration options for a crawled property.

The value of mapping crawled properties to managed properties is that it groups metadata into usable units: related crawled properties are grouped into a single logical unit, the managed property. Managed properties can then be used to create search scopes and to let your users search for metadata values without returning false positives from out-of-scope metadata values. Managed properties can also be included in the Advanced Search Web Part interface for precise queries against specific crawled properties.

Real World 

Grouping crawled properties into managed properties is one of the valuable new features in SharePoint Server 2007. For example, suppose that you have three document types: (a) document type A, which stores the author's name in the Author metadata field; (b) document type B, which stores it in the Creator metadata field; and (c) document type C, which stores it in the Originator metadata field. In this scenario, you have essentially the same metadata for three document types residing in three different metadata fields. When these documents are crawled, each metadata field is entered into the Property Store as a separate crawled property. However, you can group these three crawled properties into a single managed property so that you can use them as a single unit when querying for author names across all three document types.
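The author scenario above can be modeled in a few lines. In the following Python sketch, the documents, the managed property name `DocAuthor`, and the helper functions are hypothetical. The sketch simply takes the first mapped crawled property present on each document; it illustrates why one managed property lets a single query span all three fields, not how the Property Store is implemented.

```python
# Hypothetical crawled properties as they might be recorded for three documents.
documents = [
    {"url": "http://intranet/docs/a.doc", "crawled": {"Author": "Ann Lee"}},
    {"url": "http://intranet/docs/b.doc", "crawled": {"Creator": "Ann Lee"}},
    {"url": "http://intranet/docs/c.doc", "crawled": {"Originator": "Bo Chen"}},
]

# One managed property grouping the three crawled properties (name is hypothetical).
managed_properties = {"DocAuthor": ["Author", "Creator", "Originator"]}


def managed_value(document, managed_name):
    """Return the value of the first mapped crawled property present on the document."""
    for crawled_name in managed_properties[managed_name]:
        if crawled_name in document["crawled"]:
            return document["crawled"][crawled_name]
    return None


def query(managed_name, value):
    """Find documents whose managed property equals value, whichever field holds it."""
    return [d["url"] for d in documents if managed_value(d, managed_name) == value]


print(query("DocAuthor", "Ann Lee"))
# ['http://intranet/docs/a.doc', 'http://intranet/docs/b.doc']
```

Querying the Author crawled property alone would have found only the first document.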

By default, 127 managed properties are created automatically when the SSP is created. To create a new managed property, follow these steps:

  1. From the Home page of the Shared Services Provider, click on the Search Settings link. This presents you with the Configure Search Settings page.

  2. Click on the Metadata Property Mappings link. The Metadata Property Mappings page then appears.

  3. Click on the New Managed Property link. The New Managed Property page, as illustrated in Figure 12-10, then appears.

  4. Enter a name and description for the new managed property.

  5. Select the type of information this property will represent.

  6. Enter at least one crawled property that will be grouped by this managed property.

  7. Select the check box to allow the property to be used in a search scope.

  8. Click OK.
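The requirements in steps 4 through 7 can be summarized as a small definition with a validity check. In the following Python sketch, `ManagedProperty` and `validate` are hypothetical, and the set of data types is an assumption to be checked against the options on the New Managed Property page.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed data type names; confirm them against the New Managed Property page.
PROPERTY_TYPES = {"Text", "Integer", "Decimal", "Date and Time", "Yes/No"}


@dataclass
class ManagedProperty:
    name: str
    description: str
    property_type: str
    crawled_properties: List[str] = field(default_factory=list)
    allow_in_scopes: bool = False  # step 7: usable in search scopes

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the definition is complete."""
        problems = []
        if not self.name.strip():
            problems.append("a name is required")
        if self.property_type not in PROPERTY_TYPES:
            problems.append(f"unknown type {self.property_type!r}")
        if not self.crawled_properties:
            problems.append("map at least one crawled property")
        return problems


author = ManagedProperty("DocAuthor", "Author, Creator or Originator", "Text",
                         ["Author", "Creator", "Originator"], allow_in_scopes=True)
print(author.validate())                                 # []
print(ManagedProperty("Empty", "", "Text").validate())   # ['map at least one crawled property']
```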

Figure 12-10: The New Managed Property page.

To use a managed property within a search scope, follow these steps:

  1. From the Home page of the Shared Services Provider, click on the Search Settings link. This presents you with the Configure Search Settings page.

  2. Click on the View Scopes link. The View Scopes page, as shown in Figure 12-11, then appears.

  3. Click on the New Scope link. The Create Scope page, as illustrated in Figure 12-12, then appears.

  4. For purposes of this exercise, enter a title and then click OK. This action takes you back to the View Scopes page, and your new scope appears on this page, as shown in Figure 12-13.

  5. Click the Add Rules link. The Add Scope Rule page, as shown in Figure 12-14, then appears.

  6. In the Scope Rule Type section, select the Property Query option button. The screen then changes to look like the one shown in Figure 12-14. You can now select the managed property on which to base the scope and enter the property value that defines the scope. Note that only properties that are enabled for use in search scopes appear in the drop-down list.

  7. Click OK. You have now defined a search scope using a managed property.
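A Property Query scope rule amounts to a filter of the form "managed property = value." The following Python sketch models that rule and the restriction that only scope-enabled properties can be chosen. The function names, documents, and the Department value are hypothetical; this is an illustration of the concept, not the SharePoint scope engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ManagedProperty:
    name: str
    allow_in_scopes: bool


def property_query_rule(prop: ManagedProperty, value: str) -> Callable[[Dict[str, str]], bool]:
    """Build a scope rule of the form <managed property> = <value>."""
    if not prop.allow_in_scopes:
        # Mirrors the drop-down, which lists only properties enabled for scopes.
        raise ValueError(f"{prop.name} is not enabled for use in scopes")
    return lambda doc: doc.get(prop.name) == value


def items_in_scope(documents: List[Dict[str, str]], rule: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Return the URLs of the documents that satisfy the scope rule."""
    return [d["url"] for d in documents if rule(d)]


docs = [
    {"url": "http://intranet/hr/policy.doc", "Department": "HR"},
    {"url": "http://intranet/it/plan.doc", "Department": "IT"},
]
hr_rule = property_query_rule(ManagedProperty("Department", True), "HR")
print(items_in_scope(docs, hr_rule))  # ['http://intranet/hr/policy.doc']
```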

Figure 12-11: The View Scopes page.

Figure 12-12: The Create Scope page.

Figure 12-13: The View Scopes page with the new scope added.

Figure 12-14: The Add Scope Rule page with the Property Query option button selected.

To add a managed property to the advanced search Web Part, you need to work with the page that displays the result set for the end user. To do this, follow these steps:

  1. Go to the result page and click Edit Page on the Site Actions menu.

  2. Open the Edit menu for the search Web Part and select Modify Shared Web Part. This opens the Web Part properties pane.

  3. Expand the Miscellaneous section in the properties pane and look for the property named Properties. It contains an XML string that defines which properties are displayed in the advanced search. Because the string is long, it is easiest to copy it to Notepad for editing.

  4. Edit the XML string and save it back into the property. Use the following format, which is copied directly from the Web Part. For an entry in the XML to have any effect, the corresponding profile property must exist in the schema.

     <Properties>
       <Property Name="Department" ManagedName="Department"
         ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Department"/>
       <Property Name="JobTitle" ManagedName="JobTitle"
         ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Title"/>
       <Property Name="Responsibility" ManagedName="Responsibility"
         ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Responsibility"/>
       <Property Name="Skills" ManagedName="Skills"
         ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Skills"/>
       <Property Name="QuickLinks" ManagedName="QuickLinks"
         ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:QuickLinks"/>
     </Properties>

You need to pay attention to the following elements:

  • Property name

  • Managed property name

  • Profile URI (Uniform Resource Identifier)

If you look at the URN (Uniform Resource Name) string carefully, you will notice that the profile property name is taken from the end of the profile URN. This is why a profile property must exist in the schema before this XML has any real effect.
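Before pasting an edited string back into the Web Part, it can help to list what each entry refers to. The following Python sketch, using the standard `xml.etree.ElementTree` module, prints the Name, ManagedName, and profile property name (the last colon-separated segment of the URN) for each Property element. The helper `describe_properties` is hypothetical, and the sample is a two-entry excerpt of the XML shown above.

```python
import xml.etree.ElementTree as ET

PROPERTIES_XML = """
<Properties>
  <Property Name="JobTitle" ManagedName="JobTitle"
            ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:Title"/>
  <Property Name="Skills" ManagedName="Skills"
            ProfileURI="urn:schemas-microsoft-com:sharepoint:portal:profile:SPS-Skills"/>
</Properties>
"""


def describe_properties(xml_text):
    """Return (Name, ManagedName, profile property) for each <Property> element."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for prop in root.findall("Property"):
        urn = prop.get("ProfileURI", "")
        # The profile property name is the last colon-separated segment of the URN.
        rows.append((prop.get("Name"), prop.get("ManagedName"), urn.rsplit(":", 1)[-1]))
    return rows


for row in describe_properties(PROPERTIES_XML):
    print(row)
# ('JobTitle', 'JobTitle', 'Title')
# ('Skills', 'Skills', 'SPS-Skills')
```

The JobTitle entry shows that the Name attribute and the profile property name need not match; a well-formedness error in the edited string also surfaces here as an `ET.ParseError`.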

Server Name Mappings

Server name mappings allow you to hide the server's real name in the links for the result set by instructing SharePoint Server to replace the server's name with the alias you enter in the server name mappings area. For example, if your server name is something like FS01MSPNorth (File Server 01 in Minneapolis at the North Office), it might be better to give the user "Server01" in the result set. The alias hides your server-naming convention from the user and also spares the user from receiving a complicated or convoluted server name.

To create a new server name mapping, follow these steps:

  1. Click on the Server Name Mapping link from the Configure Search Settings page.

  2. Click the New Mapping link.

  3. Enter the name of the server in the Address In Index input box.

  4. Enter the alias you want to appear in the result set in the Address In Search Results input box.
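The effect of a server name mapping on a result link can be sketched as a host substitution. The following Python sketch uses the standard `urllib.parse` module and the FS01MSPNorth/Server01 example from this section. The helper name is hypothetical, and the sketch assumes a host-only replacement (it drops any user information in the URL); SharePoint's own substitution may work differently.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_server_name_mapping(url, mappings):
    """Replace the host in url with its alias, if one is mapped.

    mappings: {lowercase server name: alias}. urlsplit() already lowercases
    the hostname. Any user information in the URL is dropped.
    """
    parts = urlsplit(url)
    alias = mappings.get(parts.hostname or "")
    if alias is None:
        return url  # no mapping: show the address as indexed
    netloc = alias if parts.port is None else f"{alias}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


mappings = {"fs01mspnorth": "Server01"}
print(apply_server_name_mapping("file://FS01MSPNorth/docs/plan.doc", mappings))
# file://Server01/docs/plan.doc
```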

Search Results Removal

This feature is new to SharePoint Server 2007. It allows you to remove content from the index immediately by entering the URLs of the content you want removed. This can be helpful if objectionable material has accidentally appeared in the index.

The URLs To Remove input box is shown in Figure 12-15. After entering the URLs you wish to remove, click the Remove button.

Figure 12-15: The URLs To Remove input box allows you to instantly remove content from the index.
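Conceptually, removing a URL from the index means deleting every reference to that document so that no query can return it. The following Python sketch shows this on a toy inverted index (a dictionary from term to the set of URLs containing it). The index structure and `remove_from_index` are hypothetical simplifications and do not reflect how SharePoint stores its index.

```python
def remove_from_index(index, urls):
    """Drop every posting for the given URLs from a toy inverted index.

    index maps a term to the set of URLs that contain it; it is modified in place.
    """
    doomed = set(urls)
    for term in list(index):  # copy the keys so entries can be deleted while looping
        index[term] -= doomed
        if not index[term]:
            del index[term]  # no documents left for this term


index = {
    "budget": {"http://intranet/a.htm", "http://intranet/b.htm"},
    "joke": {"http://intranet/b.htm"},
}
remove_from_index(index, ["http://intranet/b.htm"])
print(index)  # {'budget': {'http://intranet/a.htm'}}
```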



Microsoft SharePoint Products and Technologies Administrator's Pocket Consultant
ISBN: 0735623821
Year: 2004
Pages: 110
Authors: Ben Curry
