Profile Mapping Code

Modify and apply the property mapping code to map meta tags and property tags from external content to the document profile properties in SharePoint Portal Server. Example code is provided later.

Sample Code

The following code fragment details the required actions to register the propagation of properties and metadata from the external content source into SharePoint Portal Server. This is Microsoft Visual Basic® 6 code, not Microsoft Visual Basic Scripting Edition (VBScript). For VBScript examples, see the included PropMap.wsf script in CrawlingMetadataPropmap.zip. If you add this code to a Visual Basic project, you must include the PKMCDO type library as a reference. The sample that follows makes liberal use of built-in constants for SharePoint Portal Server namespaces.

The following code sample shows the path to content sources as /Management/Content Sources/. If you are running on a non-English system, replace this path with the localized string that contains the name of the content sources folder.

    Dim objCS As PKMCDO.KnowledgeStartAddress
    Dim objSPR As PKMCDO.IKnowledgeCatalogSitePathRule
    Dim colSitePathRules As IKnowledgeCatalogSitePathRules
    Dim strUrlContentSource As String

    ' Change the following values to reflect the names of your server,
    ' workspace, content source, and target document profile.
    Const MYSERVER = "SharePoint_Portal_Server_computer"
    Const MYWORKSPACE = "SharePoint_Portal_Server_workspace"
    Const MYSOURCE = "FileContentSource"
    Const MYDOCPROFILE = "DocProfileName"

    ' Construct the URL to the Content Sources folder for your workspace.
    strUrlContentSource = "http://" & MYSERVER & "/" & MYWORKSPACE _
        & "/Management/Content Sources/" & MYSOURCE

    ' Open the KnowledgeStartAddress (Content Source) item.
    Set objCS = New PKMCDO.KnowledgeStartAddress
    objCS.DataSource.Open strUrlContentSource, , adModeReadWrite

    ' Indicate which content class (Document Profile)
    ' should be attributed to crawled files.
    objCS.Fields(PKMCDO.cdostrURI_TargetContentClass) = _
        PKMCDO.cdostrNS_ContentClasses & MYDOCPROFILE

    ' All tags will have namespaces prepended to them.
    ' PKMCDO provides built-in constants for most of these. Change
    ' the array elements below to the property names you wish to use,
    ' adding or deleting lines as needed. NOTE: it is important that
    ' all three of the following arrays match up in terms of number of
    ' elements and the ordering of property names.

    ' If you are crawling HTML documents, all properties will have a
    ' standard HTML namespace prepended to them. The source namespace
    ' may vary for other file types.
    objCS.Fields(PKMCDO.cdostrURI_SourceProperties) = Array( _
        PKMCDO.cdostrNS_HtmlMetaInfo & "ExternalTag1", _
        PKMCDO.cdostrNS_HtmlMetaInfo & "AnotherTag2", _
        PKMCDO.cdostrNS_HtmlMetaInfo & "TheLastTag")

    ' One data type entry is needed per source property.
    objCS.Fields(PKMCDO.cdostrURI_SourceTypes) = Array( _
        "string", _
        "string", _
        "string")

    ' All SharePoint Portal Server properties are prepended with the
    ' standard Office namespace. Ensure that they match up in order
    ' with their source properties.
    objCS.Fields(PKMCDO.cdostrURI_TargetProperties) = Array( _
        PKMCDO.cdostrNS_Office & "SharePoint_Portal_Server_Property1", _
        PKMCDO.cdostrNS_Office & "SharePoint_Portal_Server_Property2", _
        PKMCDO.cdostrNS_Office & "SharePoint_Portal_Server_Property3")

    ' Adding four properties to the content source definition item
    ' is not enough. You must also add a site path that corresponds to
    ' this content source. You can see these by going to the Content
    ' Sources management folder and opening the "Additional Settings"
    ' item, then clicking the "Site Paths" button on the resulting
    ' dialog box.
    Set colSitePathRules = objCS.Workspace.Catalog.SitePathRules

    ' NOTE: This code does NOT check to see if a matching site path
    ' already exists. Before you run this code, check to see if matching
    ' site paths are already present, and if so, delete them.
    Set objSPR = colSitePathRules.Add(objCS.Address & "/*", True)
    objSPR.ContentClass = objCS.Fields(PKMCDO.cdostrURI_TargetContentClass)
    objSPR.PropertyMappingUrl = strUrlContentSource

    ' Clean up object references and save everything. The site path rule
    ' item does not need to be explicitly saved, but the content source does.
    Set colSitePathRules = Nothing
    Set objSPR = Nothing
    objCS.Fields.Update
    objCS.DataSource.Save
    Set objCS = Nothing
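
The sample's note that the three arrays must agree in length and order can be illustrated outside PKMCDO. The following Python sketch is not part of the Resource Kit sample; it only shows one way to keep each mapping as a single record and derive the three parallel lists from it, so that they cannot drift out of step. The helper name build_parallel_arrays is hypothetical, and the namespace strings are the ones this chapter quotes for PropMap.wsf, assumed here to correspond to the PKMCDO constants.

```python
# Namespace strings quoted in this chapter for PropMap.wsf; assumed (not
# verified) to match PKMCDO.cdostrNS_HtmlMetaInfo and PKMCDO.cdostrNS_Office.
HTML_META_NS = "urn:schemas.microsoft.com:htmlinfo:metainfo:"
OFFICE_NS = "urn:schemas-microsoft-com:office:office#"

# One record per mapping: (source tag, source type, target property).
MAPPINGS = [
    ("ExternalTag1", "string", "SharePoint_Portal_Server_Property1"),
    ("AnotherTag2", "string", "SharePoint_Portal_Server_Property2"),
    ("TheLastTag", "string", "SharePoint_Portal_Server_Property3"),
]


def build_parallel_arrays(mappings):
    """Return (sources, types, targets) lists of equal length and order."""
    sources = [HTML_META_NS + source for source, _, _ in mappings]
    types = [source_type for _, source_type, _ in mappings]
    targets = [OFFICE_NS + target for _, _, target in mappings]
    return sources, types, targets


sources, types, targets = build_parallel_arrays(MAPPINGS)
```

Adding or removing a mapping then means editing one line of MAPPINGS rather than three separate arrays.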

Using Script Files for Property Mapping

The previous sample code illustrates the steps to take when using PKMCDO. The sample that follows, however, is a fully functional application that you can immediately use to configure property mapping. CrawlingMetadataPropmap.zip contains this code.

The CrawlingMetadataPropmap.zip file contains a Microsoft Windows® Scripting Host script file named PropMap.wsf. PropMap.wsf accepts as input an XML file that supplies server, workspace, and content source information, plus property mapping information. An example of the file format expected by this script is provided as PropMap.xml.

You can run the script on any Microsoft Windows 2000–based computer on which the SharePoint Portal Server (or client) software is installed (that is, a computer on which PKMCDO is installed). The script accepts a single parameter: the path name of the XML file containing the mapping instructions to be processed. If that parameter is missing, the script uses the file PropMap.xml in the same directory as the script file.
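
The parameter handling described above can be sketched in a few lines. This Python fragment is only an illustration of the fallback rule, not the PropMap.wsf code; the function name resolve_map_file is hypothetical.

```python
from pathlib import Path


def resolve_map_file(argv, script_path):
    """Pick the mapping file: the first argument if present, otherwise
    PropMap.xml in the same directory as the script."""
    if len(argv) > 1:
        return Path(argv[1])
    return Path(script_path).parent / "PropMap.xml"


# No argument supplied: fall back to PropMap.xml beside the script.
default_file = resolve_map_file(["PropMap.wsf"], "scripts/PropMap.wsf")
# Explicit argument supplied: use it unchanged.
explicit_file = resolve_map_file(["PropMap.wsf", "dogs.xml"], "scripts/PropMap.wsf")
```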

The code that follows illustrates the XML document expected by the PropMap.wsf script. The element names describe in detail the information needed to create a property map.

This format is not supported and is not suggested as a standard representation. Its scope is restricted to this chapter to add value to the sample custom code included here.

When examining the <targetContentClass> element, note that the script code prepends "urn:content-classes:" to any value that does not have a namespace prepended to it.

When examining the <sourceName> elements, note that the script code prepends "urn:schemas.microsoft.com:htmlinfo:metainfo:" to any value that does not have a namespace prepended to it.

When examining the <targetName> elements, note that the script code prepends "urn:schemas-microsoft-com:office:office#" to any value that does not have a namespace prepended to it.

    <?xml version="1.0"?>
    <propertyMap>
      <server>
        <name>server1</name>
        <workspace>
          <name>test1</name>
          <contentSource>
            <name>dogbreeds</name>
            <targetContentClass>DogBreed</targetContentClass>
            <property>
              <sourceName>breedOrigin</sourceName>
              <sourceType>string</sourceType>
              <targetName>breedOrigin</targetName>
            </property>
            <property>
              <sourceName>breedName</sourceName>
              <sourceType>string</sourceType>
              <targetName>breedName</targetName>
            </property>
            <property>
              <sourceName>breedFirstBred</sourceName>
              <sourceType>dateTime</sourceType>
              <targetName>breedFirstBred</targetName>
            </property>
            <property>
              <sourceName>breedWeight</sourceName>
              <sourceType>i4</sourceType>
              <targetName>breedWeight</targetName>
            </property>
            <property>
              <sourceName>Abstract</sourceName>
              <sourceType>string</sourceType>
              <targetName>Description</targetName>
            </property>
            <property>
              <sourceName>ContentClass</sourceName>
              <sourceType>string</sourceType>
              <targetName>DAV:contentclass</targetName>
            </property>
            <property>
              <sourceName>Categories</sourceName>
              <sourceType>string</sourceType>
              <targetName>urn:schemas-microsoft-com:publishing:Categories</targetName>
            </property>
          </contentSource>
        </workspace>
      </server>
    </propertyMap>
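
To make the namespace defaulting concrete, the following Python sketch reads a shortened copy of this document format and applies the three default namespaces quoted above. It is an illustration only and is not the PropMap.wsf implementation. In particular, the test for "has a namespace" (the value contains a colon) is an assumption; the chapter does not say how the script decides. The names qualify and read_property_map are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default namespaces as quoted in this chapter, keyed by element name.
DEFAULT_NS = {
    "targetContentClass": "urn:content-classes:",
    "sourceName": "urn:schemas.microsoft.com:htmlinfo:metainfo:",
    "targetName": "urn:schemas-microsoft-com:office:office#",
}

SAMPLE = """<?xml version="1.0"?>
<propertyMap><server><name>server1</name><workspace><name>test1</name>
<contentSource><name>dogbreeds</name>
<targetContentClass>DogBreed</targetContentClass>
<property><sourceName>breedWeight</sourceName><sourceType>i4</sourceType>
<targetName>breedWeight</targetName></property>
<property><sourceName>ContentClass</sourceName><sourceType>string</sourceType>
<targetName>DAV:contentclass</targetName></property>
</contentSource></workspace></server></propertyMap>"""


def qualify(kind, value):
    """Prepend the default namespace unless the value already has one.
    ASSUMPTION: 'already has one' means the value contains a colon."""
    value = value.strip()
    return value if ":" in value else DEFAULT_NS[kind] + value


def read_property_map(xml_text):
    """Return one dict per <contentSource>, with fully qualified names."""
    root = ET.fromstring(xml_text)
    results = []
    for server in root.findall("server"):
        for workspace in server.findall("workspace"):
            for source in workspace.findall("contentSource"):
                properties = [
                    (
                        qualify("sourceName", prop.findtext("sourceName")),
                        prop.findtext("sourceType").strip(),
                        qualify("targetName", prop.findtext("targetName")),
                    )
                    for prop in source.findall("property")
                ]
                results.append({
                    "server": server.findtext("name"),
                    "workspace": workspace.findtext("name"),
                    "contentSource": source.findtext("name"),
                    "targetContentClass": qualify(
                        "targetContentClass",
                        source.findtext("targetContentClass")),
                    "properties": properties,
                })
    return results


maps = read_property_map(SAMPLE)
```

Under the colon assumption, DogBreed becomes urn:content-classes:DogBreed, while DAV:contentclass is left unchanged because it already carries a namespace.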

Using the Modified HTML IFilter Wrapper

An add-on IFilter designed for Index Server is available on the Microsoft Developer Network (MSDN®). You can register this filter in place of the standard HTML IFilter. The filter works by loading the "true" HTML IFilter, intercepting the <META> tag values it returns, and converting selected values into numbers and/or dates as they are passed to the indexing service.
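
The wrapping idea can be shown in miniature. The real filter is a COM component, so the Python sketch below is only an analogy under stated assumptions: inner_filter is a stand-in for the standard HTML IFilter, and the property names and values are invented for illustration.

```python
def inner_filter():
    """Stand-in for the 'true' HTML IFilter: yields (name, value) pairs,
    with every META value delivered as a string. Hypothetical data."""
    yield ("breedName", "Beagle")
    yield ("breedWeight", "12")


# Which properties to convert, and how. Hypothetical configuration.
CONVERTERS = {"breedWeight": int}


def converting_filter(chunks, converters):
    """Pass every pair through, converting only the selected properties."""
    for name, value in chunks:
        convert = converters.get(name)
        yield (name, convert(value) if convert else value)


converted = list(converting_filter(inner_filter(), CONVERTERS))
```

Values for properties that are not listed in CONVERTERS pass through unchanged, which mirrors the description of converting only selected values.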

A series of tests in a SharePoint Portal Server environment showed this IFilter to work properly, with little if any discernible performance penalty.

The original IFilter code is available on MSDN at http://msdn.microsoft.com/library/default.asp?URL=/library/techart/msdn_ismeta.htm. It is strongly recommended that you read this article before proceeding further. It is also recommended that you use the modified copy of that IFilter that is included with this chapter, rather than the code supplied with the original article.

CrawlingMetadataHtmlprop.zip contains the modified version that includes the source. The modified version contains support for the additional date formats mentioned in Knowledge Base article Q240390, specifically:

  • Sun Nov 6 08:49:37 1994
  • Sun, 06 Nov 1994 08:49:37 GMT
  • Sunday, 06-Nov-94 08:49:37 GMT

It also supports the more XML and Web Storage System–centric storage format of:

  • 1994-11-06T08:49:37.000

The original source code supported a smaller number of formats, which were less standard for HTML content crawling purposes. The original source code required the administrator to indicate in an .ini file which properties to transform into different data types.
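
As a rough illustration of recognizing the date formats listed above, the following Python sketch tries each pattern in turn. It is not the IFilter's code. It assumes English day and month names, drops a GMT marker wherever it appears, and returns a naive timestamp; the function name parse_meta_date is hypothetical.

```python
from datetime import datetime

# One strptime pattern per listed format (GMT marker removed beforehand).
FORMATS = (
    "%a %b %d %H:%M:%S %Y",     # Sun Nov 6 08:49:37 1994
    "%a, %d %b %Y %H:%M:%S",    # Sun, 06 Nov 1994 08:49:37
    "%A, %d-%b-%y %H:%M:%S",    # Sunday, 06-Nov-94 08:49:37
    "%Y-%m-%dT%H:%M:%S.%f",     # 1994-11-06T08:49:37.000
)


def parse_meta_date(text):
    """Return a datetime for a recognized format, or None otherwise."""
    # Drop any GMT token and normalize runs of whitespace.
    cleaned = " ".join(tok for tok in text.split() if tok != "GMT")
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


parsed = parse_meta_date("Sun, 06 Nov 1994 08:49:37 GMT")
```

Returning None for unrecognized text lets a caller leave such a value as a plain string instead of failing.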

While full source code is included, the only files necessary to begin are HTMLProp.dll and HTMLProp.ini. The .ini file contains installation (and removal) information. A ReadMe.txt file provides background context, but a large amount of its content is specific to earlier versions of Index Server. HTMLProp.ini contains all the information needed to install and register HTMLProp.dll.



Microsoft SharePoint Portal Server 2001 Resource Kit
ISBN: 0735615624
EAN: 2147483647
Year: 2001
Pages: 231
