The deployment phase included installing hardware and software, modifying settings, and testing. After deploying the SharePoint Portal Server environment, ITG ran it in parallel with Site Server 3.
This section reviews the installation and configuration for the workspaces. In particular, it reviews the process for creating content sources.
The project team installed the hardware for the SharePoint Portal Server deployment in the same data center as the Site Server 3 environment, so network connectivity and other environmental variables remained the same.
Next, the team installed the operating system. For more information about installation requirements, see Chapter 11, Installing SharePoint Portal Server. You must deploy a server dedicated to searching before you deploy a server dedicated to index workspaces, because when you create an index workspace you must specify the destination workspace, as shown in Figure 27.5. The project team therefore configured the server dedicated to searching first and then configured the servers that would host index workspaces.
Figure 27.5. Creating an index workspace
The team specified the settings as detailed in the following table.
Table 27.6 Workspace configuration settings
Setting | Enterprise search | Index 1 | Index 2 |
---|---|---|---|
Catalog Name | All catalogs propagate to the Enterprise Search server | BestbetsCorpPortal, HumanResourcesWeb, corporate portal, WebCat2, WindowsUA | ITG portal, KBInt portal, corporate portal param, Product Group Portal, SAP portal, MSWordTest |
General | | | |
Indexing Resource Usage | 1 (Background) | 5 (Dedicated) | 5 (Dedicated) |
Search Resource Usage | 5 (Dedicated) | 1 (Background) | 1 (Background) |
Site Hit Frequency Rules | None | None | None |
Proxy Server | | | |
Do not connect using a proxy server | Disable | Enable | Enable |
Use the proxy server settings of the default content access account | Enable | Disable | Disable |
Use the proxy server specification below | Disable | Disable | Disable |
Default File Types in Site Server 3.0 removed from catalog | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch | asp, doc, htm, html, ppt, xls, txt, exch |
Removed from Enterprise Search | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht | nsf, xml, odc, tiff, eml, dot, tif, mht |
The team specified an Indexing Resource Usage of 5 (Dedicated) as the default for the servers hosting index workspaces. This allows full system resource usage when the server crawls content.
SharePoint Portal Server provides resource usage controls for searching and index creation, the two resource-intensive processes that are commonly performed on SharePoint Portal Server computers.
It is recommended that you balance resource usage to optimize performance depending on your server configuration. If you distribute searching and index creation across multiple servers, dedicate resources on each computer to the specific task that each computer performs. If you use one server to perform both index creation and searching, balance resource usage evenly between the two processes.
By design, this enterprise search solution does not crawl content outside the firewall. To restrict SharePoint Portal Server to internal sites without specifying many exclusion rules (for example, excluding all *.com, *.edu, and *.org addresses), the team disabled the proxy server on each of the servers that hosted index workspaces. Without a proxy server, the crawler could not reach anything outside the corporate environment.
To minimize unnecessary security changes, SharePoint Portal Server uses the same accounts to crawl and propagate content as Site Server 3. As with Site Server 3, SharePoint Portal Server respects Access Control Lists (ACLs). The use of ACLs maintains security as implemented in each of the original content sites.
The team created one workspace to correspond to each Site Server 3 catalog. After creating all the workspaces, the team created the content sources. Figure 27.6 shows an example of the content sources (called start addresses in Site Server 3) in one workspace.
Figure 27.6. Example of content sources
Most workspaces contained several content types. A single content source cannot refer to different content types, but you can refer to multiple content types in a workspace.
During testing, the team discovered the following tips for properly configuring hops and depth:
For tracking purposes, the project team created a matrix showing which workspaces and sites used complex URLs and which content sources used which protocols, as shown in the following table.
The team restricted the use of complex URLs to well-known parameterized URLs, to minimize the risk of crawling URLs that continued to generate additional links without end.
Table 27.7 Tracking Spreadsheet
Workspace name | Complex URL | File protocol | HTTP protocol | Exchange protocol |
---|---|---|---|---|
bestbetsCorporatePortal (Index 1) | Y | N | Y | N |
Corporate Portal Intranet (Index 1) | N | Y | Y | Y |
HumanResourcesWeb (Index 1) | Y | Y | Y | N |
WebCatalog2 (Index 1) | Y | N | Y | N |
WindowsUA (Index 1) | Y | Y | Y | Y |
Corporate Portal Param (Index 2) | Y | Y | Y | Y |
ITG portal (Index 2) | Y | Y | Y | Y |
KBInt portal (Index 2) | N | Y | N | N |
Product Group Portal (Index 2) | N | Y | Y | Y |
SAPWeb (Index 2) | Y | Y | Y | Y |
MSWordTest (Index 2) | Y | Y | Y | Y |
The team specified three additional settings when configuring content sources: site path rules, Access/Display mappings, and file types.
Figure 27.7 shows the properties page for modifying site path rules in a single workspace.
Figure 27.7. Example of site path rules
The spreadsheet of catalogs created during the Analysis and Design phase contained the site path rules and mappings. It is critically important that the site path rules be set exactly as intended. For more information about adding content sources, see Appendix B, For More Information.
The following principles can assist you when you need to create content sources:
File:*
http://*
Exch:*
SharePoint Portal Server crawls the text content of a Microsoft Office document and standard Office summary properties. If you want to include additional properties, you must create a document profile in the workspace with those properties. SharePoint Portal Server includes the metadata from the document profile in the index.
When SharePoint Portal Server propagates the indexes to a server dedicated to searching, the destination server must possess the same document profiles.
You must map properties of HTML documents or custom metadata of external documents to a document profile. This allows SharePoint Portal Server to crawl the additional properties. HTML files usually store custom properties in <META> tags. For more information about mapping custom properties, see Chapter 25, Crawling Custom Metadata.
To map properties between servers, the project team performed the following procedure.
The team created a document profile called "Search Custom Tags" for each index workspace. Each workspace included additional metadata, as shown in the following table.
Table 27.8 Example of property mapping for index workspaces
Workspace: bestbetsCorporatePortal | Workspace: CorporatePortal | Workspace: HumanResourcesWeb | Workspace: LibraryCatalog |
---|---|---|---|
META_Categories | META_Categories | META_Categories | META_MainAuthor |
META_PageURL | META_PageURL | META_PageURL | META_itemtype |
META_XMLTerms | META_XMLTerms | META_XMLTerms | META_pubdate |
META_Keyword | META_Keyword | META_Keyword | META_subtitle |
Keywords | Keywords | Keywords | Keywords |
Description | Description | Description | Description |
Title | Title | Title | Title |
Author | Author | Author | Author |
The team created a document profile with the same name used in step 1 on the server dedicated to searching. This document profile includes all the properties of the document profiles from each index workspace, as shown in the following table.
Table 27.9 Example of property mapping for server dedicated to searching
Server dedicated to searching | |
---|---|
META_Categories | META_PageURL |
META_XMLTerms | META_Keyword |
META_MainAuthor | META_itemtype |
META_pubdate | META_subtitle |
Keywords | Description |
Title | Author |
The document profile on the server dedicated to searching must contain the union of the properties of all the document profiles on the servers that host index workspaces. Any properties that are mapped and crawled on the server that maintains indexes, but are not present in the document profile on the server dedicated to searching, are not available in the index workspace that propagates to the search server.
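The union requirement can be checked mechanically before propagation. The following Python sketch is illustrative only: the property names are copied from Tables 27.8 and 27.9, but the dictionary layout and the function names are assumptions made for this example, not part of SharePoint Portal Server.

```python
# Property names copied from Table 27.8; the data layout is hypothetical.
COMMON = {"Keywords", "Description", "Title", "Author"}
PORTAL_META = {"META_Categories", "META_PageURL", "META_XMLTerms", "META_Keyword"}
LIBRARY_META = {"META_MainAuthor", "META_itemtype", "META_pubdate", "META_subtitle"}

INDEX_PROFILES = {
    "bestbetsCorporatePortal": PORTAL_META | COMMON,
    "CorporatePortal": PORTAL_META | COMMON,
    "HumanResourcesWeb": PORTAL_META | COMMON,
    "LibraryCatalog": LIBRARY_META | COMMON,
}

# Properties of the profile on the server dedicated to searching (Table 27.9).
SEARCH_PROFILE = PORTAL_META | LIBRARY_META | COMMON


def required_search_properties(index_profiles):
    """Return the union of the properties of all index workspace profiles."""
    required = set()
    for properties in index_profiles.values():
        required |= properties
    return required


def missing_from_search_profile(index_profiles, search_profile):
    """Return properties that are crawled on an index server but are absent
    from the search server profile."""
    return required_search_properties(index_profiles) - set(search_profile)


missing = missing_from_search_profile(INDEX_PROFILES, SEARCH_PROFILE)
print("Missing on search server:", sorted(missing))
```

An empty result means the search server profile covers every index workspace profile. Any name that the check reports would have to be added to the profile on the server dedicated to searching.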
The team ran the property mapping script for each index workspace. For more information about this script, see Chapter 25.
The account under which the property mapping script runs must have administrator rights on the server and the coordinator role on the workspace.
To flush the caches, the team restarted the following services on the servers hosting index workspaces:
After restarting the services, the team reset the index and began a full update.
The existing search solution allowed customized query and results sets for each portal. Because of this, the team chose not to use the default dashboard site provided as part of SharePoint Portal Server.
By contrast, many customers may have only a single centralized search page to which all internal sites link. These customers could simply replace the existing page with the Search dashboard from SharePoint Portal Server and avoid creating custom search pages.
From each portal, a user uses a search box to submit queries. After submission, the user is redirected to a hosted ASP page on the server dedicated to searching. Site Server 3 takes the following steps during this process:
The transition from Site Server 3 to SharePoint Portal Server required the project team to modify step 2 and step 4 of the preceding process. For step 2, the team changed the query so that it used the Structured Query Language (SQL) syntax with full-text extensions instead of native Site Server 3 COM objects.
The following example illustrates a SELECT statement using WebDAV in SharePoint Portal Server.
SELECT "urn:schemas-microsoft-com:office:office#Office", "DAV:parentname",
"DAV:href", "urn:schemas-microsoft-com:office:office#Title",
"urn:schemas.microsoft.com:fulltextqueryinfo:description",
"urn:schemas-microsoft-com:office:office#META_PageURL",
"urn:schemas-microsoft-com:office:office#META_Categories", rank,
"DAV:getcontentlength", "DAV:getcontenttype", "DAV:getlastmodified"
FROM TABLE corpportal..SCOPE()
WHERE WITH ("urn:schemas-microsoft-com:office:office#Title",
"urn:schemas.microsoft.com:fulltextqueryinfo:description",
"urn:schemas.microsoft.com:fulltextqueryinfo:contents") AS #DocDesc
(FREETEXT (#DocDesc, '401k') RANK BY COERCION (ABSOLUTE, 1000))
ORDER BY rank DESC
The SELECT list returns the mapped meta properties (in the Office namespace).
The team used the workspace-level scope to restrict results to one of the index workspaces. They also used group aliasing in addition to freetext and rank coercion. For more information about restricting search results, see Appendix B.
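To make the structure of such a request easier to see, here is a minimal Python sketch that assembles a WebDAV SEARCH request body with a workspace-level scope and a group alias. The function name `build_search_request`, the alphanumeric check on the workspace name, and the doubling of single quotes in the search term are assumptions made for this illustration; they are not taken from the team's code, and the quoting rule in particular should be verified against the product documentation.

```python
from xml.sax.saxutils import escape

TITLE = "urn:schemas-microsoft-com:office:office#Title"
DESCRIPTION = "urn:schemas.microsoft.com:fulltextqueryinfo:description"
CONTENTS = "urn:schemas.microsoft.com:fulltextqueryinfo:contents"


def build_search_request(workspace, term):
    """Build the XML body of a SEARCH request (hypothetical helper)."""
    # Assumption: workspace names are plain alphanumeric identifiers.
    if not workspace.isalnum():
        raise ValueError("unexpected characters in workspace name")
    # Assumption: a single quote inside a literal is escaped by doubling it.
    safe_term = term.replace("'", "''")
    sql = (
        f'SELECT "rank", "DAV:href", "{TITLE}", "{DESCRIPTION}" '
        f"FROM {workspace}..SCOPE() "
        f'WHERE WITH ("{TITLE}", "{DESCRIPTION}", "{CONTENTS}") AS #DocDesc '
        f"(FREETEXT (#DocDesc, '{safe_term}')) "
        'ORDER BY "rank" DESC'
    )
    # XML-escape the statement so characters such as & or < in the term
    # do not break the request document.
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<a:searchrequest xmlns:a="DAV:"><a:sql>'
        + escape(sql)
        + "</a:sql></a:searchrequest>"
    )


print(build_search_request("corpportal", "401k"))
```

Keeping the query construction in one function makes it easier to apply the same escaping to every portal's search page.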
To modify step 4 in the preceding process, the team modified the process for formatting results. Originally, the page used a custom routine to create XML from the results set for Site Server 3, but SharePoint Portal Server returns XML natively. This eliminated the need to convert results to XML. The team simply applied an Extensible Stylesheet Language (XSL) transformation to achieve the formatting they wanted.
Samples of the ASP pages for Site Server 3 and SharePoint Portal Server are provided in the following code.
This is a sample of the Site Server 3 ASP code.
<%@LANGUAGE="VBScript" %>
<% ' Copyright 1997-1998 Microsoft Corporation. All rights reserved. %>
<%
DisplayText=Request("q1")
RecordNum=Request("RecordNum")
if RecordNum= "" then RecordNum=1
%>
<html>
<head><title>Search Page</title>
<meta http-equiv=content-type content="text/html; charset=iso-8859-1">
<meta http-equiv="content-language" content="EN">
</head>
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=15 bgcolor="ffffff">
<form method=get>Search:
<input type=Text name="q1" value="<%=DisplayText%>" size="23">
<input type=submit name="Search" value="Go">
<input type=hidden name="ct" value="MyCatalog">
</form>
<% If DisplayText <> "" Then %>Searching for <b><%=DisplayText%></b>
<%
  ' Set query and utility objects, and define query object properties.
  set util = Server.CreateObject("MSSearch.util")
  set Q = Server.CreateObject("MSSearch.Query")
  Q.SetQueryFromURL(Request.QueryString)
  Q.MaxRecords = 25
  Q.SortBy = "Rank[d],DocTitle"
  Q.Columns = "DocTitle, DocAddress, FileWrite, Size, Description, " & _
    "FileName, DocSignature, Rank, DetectedLanguage, MimeType, SiteName, " & _
    "NNTP_MessageID"
  ' Create the recordset holding the search results.
  on error resume next
  set RS = Q.CreateRecordSet("sequential")
  if err then
    createerror = err.description
    createerrnumber = err.number
  end if
  ' Error description.
  if err then
    Response.write createerror
  ' Display results
  else
    Response.write "<table><tr><td><font size = 2>"
    ' Set up number found.
    NumberFound= RS.Properties("RowCount")
    if RS.Properties("RowLimitExceeded") = true then
      NumberFound = "More than " & NumberFound
    end if
    ' Set up loop to iterate through results.
    Do while not RS.EOF
      ' Set up title for links, providing an alternative if DocTitle is blank.
      if RS("DocTitle") <> "" then
        Title = RS("DocTitle")
      else
        Title = "No title: " & RS("DocAddress")
      end if
      ' Set up link itself.
      Link = RS("DocAddress")
      ' One table is used for each search result.
      Response.write "</font></td></tr><tr><td> </td></tr></table>"
      Response.write "<table cellpadding=0 cellspacing=0>"
      Response.write "<tr><td width=21><font size=2><p>"
      Response.write "<table cellpadding=1 cellspacing=1 border=0><tr><td align=top>"
      Response.Write "<font size='2'>" & RS("Rank") & "</font>"
%>
</td></tr></table>
<%
      Response.Write "</font></td>"
      Response.Write "<td bgcolor='#80BBDD'><font size=2>"
%>
<a <% = LinkTarget %> href='<% = Link %>'><% = Title %></a>
</font></td></tr><tr><td></td><td><font size=2>
<% Response.write util.TruncateToWhiteSpace(RS("Description"),250) %>
</font></td></tr>
<tr><td></td><td height=5></td></tr>
<tr><td></td><td>
<font color=808080 size=1>[<% = util.TruncateToWhiteSpace(RS("FileWrite"), 12 ) %>]
<% iSize = CInt(CLng(RS("Size"))/1024) %>
(<% = iSize %>k)
</font>
<%
      ' Increment the results.
      RS.MoveNext
      RecordNum = RecordNum + 1
    Loop
    Response.write "</font></td></tr></table>"
    ' If there are more results pages, set up the "More Results" link.
    if RS.Properties("MoreRows") = true then
      Q.StartHit = RS.Properties("NextStartHit")
      ' Repeat query with new start hit.
      L_MoreResults_link = "More Results"
      MoreLink = "<a href=?" & Q.QueryToURL & "&" _
        & "DisplayText=" & Server.URLEncode(DisplayText) & "&" _
        & "RecordNum=" & RecordNum _
        & ">" & L_MoreResults_link & "</a>"
    end if
%><% = MoreLink %>
</font></td>
</tr>
</table>
<%
  End if
End If
%>
This is a sample of the SharePoint Portal Server ASP code.
<%@LANGUAGE="VBScript" %>
<% ' Copyright 2001 Microsoft Corporation. All rights reserved. %>
<%
DisplayText=Request("q1")
ct=Request("ct")
If DisplayText = "" Then
%>
<html>
<head><title>Search Page</title>
<meta http-equiv=content-type content="text/html; charset=iso-8859-1">
<meta http-equiv="content-language" content="EN">
</head>
<body text="#000000" link="#000000" alink="#000000" vlink="#000000" topmargin=17 leftmargin=15 bgcolor="ffffff">
<form method=get>Search:
<input type=Text name="q1" value="<%=DisplayText%>" size="23">
<input type=submit name="Search" value="Go">
<input type=hidden name="ct" value="MyCatalog">
</form>
<%
Else
  Response.ContentType = "text/xml"
  Response.Write("<?xml version='1.0' encoding='ISO-8859-1'?>" & vbCRLF)
  Response.Write("<Results xmlns:dt='urn:schemas-microsoft-com:datatypes'>")
  set oProc = Application("StyleTransform").createProcessor
  Set xh = Server.CreateObject("Msxml2.SERVERXMLHTTP")
  ' Build the SEARCH request body; each fragment ends with a space or comma
  ' so that the concatenated SQL statement stays well formed.
  strQuery = "<?xml version=""1.0"" encoding=""utf-8""?><a:searchrequest xmlns:a=""DAV:""><a:sql>" & _
    "SELECT ""rank"", ""DAV:href"", ""urn:schemas-microsoft-com:office:office#Title"", " & _
    """urn:schemas.microsoft.com:fulltextqueryinfo:description"", " & _
    """DAV:getcontentlength"", ""DAV:getlastmodified"" " & _
    "FROM " & ct & "..SCOPE() " & _
    "WHERE WITH (""urn:schemas-microsoft-com:office:office#Title"", " & _
    """urn:schemas.microsoft.com:fulltextqueryinfo:description"", " & _
    """urn:schemas.microsoft.com:fulltextqueryinfo:contents"") AS #DocDesc " & _
    "(FREETEXT (#DocDesc, '" & DisplayText & "')) " & _
    "ORDER BY ""rank"" DESC</a:sql></a:searchrequest>"
  'Make DAV request
  xh.setTimeouts 0, 6000, 6000, 0
  xh.open "SEARCH", "http://myServer/myWorkspace", False
  xh.setRequestHeader "content-type", "text/xml"
  xh.setRequestHeader "range", "rows=0-9"
  xh.setRequestHeader "MS-Search-MaxRows", 200
  xh.setRequestHeader "MS-Search-UseContentIndex", "t"
  xh.send strQuery
  'Process DAV response
  if xh.Status <> 207 then
    Response.Write "<error>Status: " & xh.Status & ". Status Text: " & xh.statusText & "</error>"
    Response.Write "<errorReason><![CDATA[" & xh.responseText & "]]></errorReason>"
  else
    if xh.responseXML.parseError.errorCode <> 0 then
      Response.Write "<error>XML response error code = " & _
        xh.responseXML.parseError.errorCode & " " & _
        xh.responseXML.parseError.reason & "</error>"
    end if
    'Display results
    if xh.responseXML.selectSingleNode("a:multistatus").haschildnodes = false then
      Response.Write("<ResultSet totalhits='0'><error>No documents match your query.</error></ResultSet>")
    else
      oProc.input = xh.responseXML.documentElement
      oProc.transform
      Response.Write(oProc.output)
    end if
  end if
  Response.Write "</Results>"
End If
%>
Testing included two tasks. The project team verified that SharePoint Portal Server met the criteria for creating and maintaining indexes for the identified content. In addition, they verified that SharePoint Portal Server met the criteria for searching, including the criteria for workspace propagation process and speed, basic functionality, and the custom search page.
The team identified two goals for testing the process of creating an index:
The second goal verified the scalability of the SharePoint Portal Server search solution. ITG's goal was 6 million documents. That number was based on 3 million documents in the index at the beginning of the test, plus additional occasional sources, and an additional number used as a growth factor.
To measure crawl performance, the test team established several metrics. The following table shows these metrics according to source.
Table 27.10 Index Test Metrics
Data collection | Found at |
---|---|
Number of documents | Event viewer, SharePoint Portal Server Administration in Microsoft Management Console (MMC), ASP event log |
Crawl status | SharePoint Portal Server Administration in MMC, Web folders view |
Crawl start time | Event viewer application log |
Crawl end time | Event viewer application log |
Crawl duration | Manual calculation using the preceding data |
Catalog size | SharePoint Portal Server Administration in MMC |
Property store (Note: Property store size is applied at server level and not at catalog level) | Folder <...\SharePoint Portal Server \ \FTData\SharepointPortalServer\sps.edb>, by using Windows Explorer |
The team executed each crawl several times. They refined the rules until they were satisfied the proper content was actually being included in the index. They used the dashboard search on the server dedicated to searching to assist with this check.
The team used the event viewer and gatherer log viewer from SharePoint Portal Server to examine the system to ensure that the index was operating normally and without problems. Figure 27.8 shows an example of the event viewer entries for starting and stopping the index.
Figure 27.8. Example event viewer entries
The following table shows an example of the data collected to track crawls.
Table 27.11 Example Index Test Metrics
Catalog name | # of docs (full crawl) | Crawl duration | Prop. duration | Catalog size | Property store size (per server) |
---|---|---|---|---|---|
bestbetsCorpPortal (Index 1) | 851 | 1 min | 1 min | 1 MB | 4.61 GB (Index 1) |
Corporate Portal Intranet (Index 1) | 2,920,178 | 3,127 min | 65 min | 5,081 MB | |
HumanResourcesWeb (Index 1) | 3,927 | 24 min | 1 min | 4 MB | |
WebCatalog2 (Index 1) | 17,882 | 24 min | 1 min | 14 MB | |
WindowsUA (Index 1) | 14,198 | 8 min | 1 min | 14 MB | |
CorpPortal Param (Index 2) | 694 | 3 min | 1 min | 1 MB | 1.04 GB (Index 2) |
ITG portal (Index 2) | 13,250 | 37 min | 1 min | 13 MB | |
KBInt portal (Index 2) | 226,474 | 269 min | 15 min | 325 MB | |
Product Group Portal (Index 2) | 159,257 | 224 min | 19 min | 605 MB | |
SAPWeb (Index 2) | 3,609 | 47 min | 1 min | 3 MB | |
MSWordTest (Index 2) | 15,233 | 11 min | 1 min | 24 MB | |
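A crawl rate in documents per minute makes catalogs of very different sizes easier to compare. The Python sketch below derives that rate for the three largest catalogs in Table 27.11, reading the Corporate Portal Intranet document count as 2,920,178. The helper name `docs_per_minute` is an assumption for this illustration, and the rate is a derived figure rather than a metric the team reported.

```python
# (catalog name, number of documents, full crawl duration in minutes),
# copied from Table 27.11 for the three largest catalogs.
CRAWLS = [
    ("Corporate Portal Intranet", 2_920_178, 3_127),
    ("KBInt portal", 226_474, 269),
    ("Product Group Portal", 159_257, 224),
]


def docs_per_minute(documents, minutes):
    """Average crawl throughput; rejects non-positive durations."""
    if minutes <= 0:
        raise ValueError("crawl duration must be positive")
    return documents / minutes


for name, documents, minutes in CRAWLS:
    rate = docs_per_minute(documents, minutes)
    print(f"{name}: {rate:.0f} documents per minute")
```

The one-minute crawls in the table are left out because a duration rounded to the minute would distort the rate.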
SharePoint Portal Server completed the full crawls with satisfactory results at a volume of about 3 million documents. ITG added more content sources for scale testing. Eventually, SharePoint Portal Server crawled just over 6 million documents. Crawl performance did not drop off due to the size of the index.
Next, the team tested incremental updates on each of the catalogs. The incremental crawls took about half the time of the original full index and proved successful.
Finally, the team tested adaptive crawling on the largest catalogs in multiple passes until the number of documents modified converged. In doing so, the team discovered that convergence took about eight passes for the largest workspace. In these passes, crawl time was reduced from 51 hours for a full index to less than 8 hours for the shortest adaptive crawl, a nearly sevenfold improvement. Figure 27.9 shows the index times per pass.
Figure 27.9. Adaptive crawl times
The testing process involved the following steps:
When an index reaches a steady state of number of documents updated or crawl time, it has converged. After convergence, the crawl time remains approximately the same each night, unless SharePoint Portal Server detects a large change in content such as a new site coming online.
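One way to make "steady state" concrete is to treat the index as converged once crawl time changes by less than some tolerance for a few consecutive passes. The Python sketch below does that. The 10 percent tolerance, the two-pass requirement, and the sample crawl times are all hypothetical values chosen for illustration; only the 51-hour full crawl and the sub-8-hour final pass come from the text.

```python
def converged_at(crawl_hours, tolerance=0.10, stable_passes=2):
    """Return the 1-based pass number at which crawl time has changed by no
    more than `tolerance` (relative) for `stable_passes` consecutive passes,
    or None if that never happens."""
    stable = 0
    for i in range(1, len(crawl_hours)):
        previous, current = crawl_hours[i - 1], crawl_hours[i]
        if previous > 0 and abs(current - previous) / previous <= tolerance:
            stable += 1
            if stable >= stable_passes:
                return i + 1
        else:
            stable = 0
    return None


# Hypothetical crawl times in hours per pass (not measured data).
sample_passes = [51, 30, 20, 14, 10, 8.5, 8.0, 7.8]
print("Converged at pass:", converged_at(sample_passes))
```

The same check could be run on the number of documents updated per pass instead of crawl time, because the text treats either measure as a sign of convergence.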
ITG tested three additional features. First, they tested the workspace propagation process and times. Next, they tested the basic searching by using the dashboard site. Finally, they tested the custom search page.
When examining propagation, it is important to determine that propagation completes successfully. In addition, ITG needed an estimate of how long the propagation took to complete. The following table outlines the metrics and their sources.
You should measure the duration of propagation, from the start of the process on the server hosting the index workspace to the end of the process on the search server.
Table 27.12 Search Test Metrics
Data collection | Currently found at |
---|---|
Propagation status | SharePoint Portal Server Administration in MMC, Event Viewer |
Propagation start time | Event Viewer (on both servers) |
Propagation end time | Event Viewer (on search server) |
Propagation duration | Manual calculation from data collected |
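The manual calculation of propagation duration is a subtraction of two Event Viewer timestamps: the start on the server hosting the index workspace and the end on the search server. The Python sketch below shows it, assuming both clocks are synchronized and that the timestamps have been copied out in a `YYYY-MM-DD HH:MM:SS` format. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the copied timestamps


def propagation_minutes(start_on_index_server, end_on_search_server):
    """Minutes from the start of propagation on the index server to its
    completion on the search server."""
    start = datetime.strptime(start_on_index_server, TIMESTAMP_FORMAT)
    end = datetime.strptime(end_on_search_server, TIMESTAMP_FORMAT)
    if end < start:
        raise ValueError("end time precedes start time; check the server clocks")
    return (end - start).total_seconds() / 60


# Hypothetical timestamps, not values from the case study.
print(propagation_minutes("2001-03-01 02:00:00", "2001-03-01 03:05:00"))
```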
The ITG team tested the results for simple full-text queries that used SharePoint Portal Server. After performing queries, they compared the results seen in Site Server 3 queries with those in SharePoint Portal Server to ensure that crawling returned the proper documents and appropriately followed the rules.
Finally, after completing the custom ASP page modifications, they tested the ASP pages. The test involved both the query and results pages. Final tests measured performance and accuracy of the results sets.
For query latency, the ASP page recorded the exact time of the request and the exact time of the response in an SQL database, along with other relevant data used to track usage metrics. The team created a set of 47 queries, most of them from the top 100 queries run the previous month. This set included one-term and two-term phrases and some unusual queries. They ran this set of queries on Site Server 3.0 and then on SharePoint Portal Server. The data collected included the time of the first request of a query and then the results of the next four queries for the same term. These latency times, in seconds, are shown in the following table.
Table 27.13 ASP Page Performance Testing
Product | Initial | #2 | #3 | #4 | #5 |
---|---|---|---|---|---|
Site Server 3 | 1.11 | 0.84 | 0.81 | 0.81 | 0.86 |
SharePoint Portal Server | 4.28 | 0.65 | 0.65 | 0.65 | 0.65 |
ITG attributed the disparity in initial response times between SharePoint Portal Server and Site Server 3 to caching. Because Site Server 3 was already in use and taking queries, many queries and terms were already loaded into memory, which reduced its initial response time. SharePoint Portal Server had none of the terms in memory, so every initial query required reading from disk. Subsequent queries with SharePoint Portal Server were about 22 percent faster than with Site Server 3 (0.65 seconds compared with an average of 0.83 seconds).
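The 22 percent figure can be reproduced from Table 27.13 by averaging the four repeat queries for each product and comparing the averages. The Python sketch below shows that arithmetic; the helper names are illustrative only.

```python
from statistics import mean

# Latency in seconds from Table 27.13: initial query, then repeats #2 to #5.
LATENCY = {
    "Site Server 3": [1.11, 0.84, 0.81, 0.81, 0.86],
    "SharePoint Portal Server": [4.28, 0.65, 0.65, 0.65, 0.65],
}


def warm_average(samples):
    """Average latency of the repeat queries, skipping the initial request."""
    return mean(samples[1:])


def relative_speedup(baseline, candidate):
    """Fraction by which the candidate latency is lower than the baseline."""
    return (baseline - candidate) / baseline


site_server = warm_average(LATENCY["Site Server 3"])
sps = warm_average(LATENCY["SharePoint Portal Server"])
print(f"Site Server 3 repeat average: {site_server:.2f} s")
print(f"SharePoint Portal Server repeat average: {sps:.2f} s")
print(f"Relative improvement: {relative_speedup(site_server, sps):.1%}")
```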
In addition to faster queries with SharePoint Portal Server, tests determined that the server dedicated to searching could take advantage of additional memory. When the team increased RAM from 1 GB to 2 GB on this server, latency dropped. They allowed 1 GB of RAM for running the operating system and SharePoint Portal Server and 1 GB of RAM for caching the property store. Loading a large part of the property store into memory improved performance by speeding access to data used in search queries. The numbers in the previous table were measured after the team added the memory, but before they ran the "warm-up" script.
To facilitate this pre-loading or "warm up" of the cache, the team developed a script that runs immediately after crawling completes and propagates. This script loads the cache with data, so the cache is ready when the service enters production. For more information about this script, see Appendix B.
If you set the maximum cache size too high, you can leave insufficient memory for SharePoint Portal Server, the operating system, and any other applications on your server. A good rule of thumb is to leave at least 0.5 GB for use by SharePoint Portal Server and the operating system. For example, on a server with 2 GB of physical memory, set the minimum cache size to 1 GB and the maximum cache size to 1.5 GB (or less, if you have other applications running).
You must leave enough memory for other processes and for monitoring Microsoft Search objects in Performance Monitor.
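The rule of thumb for the maximum cache size reduces to a subtraction. The Python sketch below encodes it under stated assumptions: the function name and the `other_apps_gb` parameter are hypothetical, and the text gives no formula for the minimum cache size, so only the maximum is computed.

```python
def max_cache_gb(physical_gb, reserved_gb=0.5, other_apps_gb=0.0):
    """Upper bound for the maximum cache size, leaving `reserved_gb` for
    SharePoint Portal Server and the operating system (0.5 GB per the rule
    of thumb) plus any memory needed by other applications."""
    available = physical_gb - reserved_gb - other_apps_gb
    if available <= 0:
        raise ValueError("not enough physical memory for a cache")
    return available


# The 2 GB server from the example in the text.
print(max_cache_gb(2.0))
```

Passing a non-zero `other_apps_gb` lowers the bound, which matches the advice to use less than 1.5 GB when other applications run on the server.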
The search tests yielded the following results:
After developing the custom ASP pages, the team validated the search results through testing. They added a link to the results page for Site Server 3, asking users to try the new search page that relied on SharePoint Portal Server. From this process, the team monitored the following data: