In this section, you'll learn about the different types of Search topologies that you can implement using SharePoint Server 2007. You'll find that your choices are in some ways more limited and in others expanded when compared to Microsoft SharePoint Portal Server 2003. Those of you coming from the SharePoint Portal Server 2003 platform will likely be pleased with the changes, overall. However, as with all systems, you'll encounter some considerations that need to be discussed and understood for your environment before you embark on a robust Search implementation.
Search is not that difficult to design in smaller environments. Many of you will find that you can implement a single Index/Query server in your farm and that will be sufficient to meet both your indexing and query needs.
However, those with larger environments (over 50 to 75 content sources, multiple SSPs, or both) and those with unique security requirements will find that a more complex set of decisions needs to be melded into a single whole for your farm. In this section, you'll explore those decisions, and the server roles will be the starting point for that exploration.
There are essentially three server roles that work together to form the complete Search topology and that execute the actions necessary to deliver a robust, aggregated search experience. Those three roles are the Web front-end (WFE) server, the query server, and the index server. We'll start by looking at the index server role.
The index server is responsible for executing the crawl process (which is described in Chapter 16). The index server is handed the URL from the SSP and ensures that the target location hosting the content you want to include in your index is crawled according to the rules you specify in the content source configuration and crawl rules. The index server works with the WFE server to ensure that the hosting target is crawled efficiently.
Most administrators have little idea of how important the WFE server is in the execution of the crawl function. When the index server initiates a crawl process for a content source, the index server passes the URLs to the WFE server, which is then responsible for connecting to the location that hosts the content. It is the WFE server that actually crawls the content and retrieves both the metadata and content streams from the target hosting the content. The WFE then proxies those data streams back to the index server.
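The division of labor between the two roles can be sketched conceptually. The following is an illustrative model only, with entirely hypothetical class and method names; it is not the SharePoint object model. It simply shows the hand-off described above: the index server passes URLs to the WFE, which fetches the content and metadata and proxies both streams back.

```python
# Illustrative sketch of the index-server/WFE crawl hand-off described above.
# All names here are hypothetical; this is not a SharePoint API.

class WebFrontEnd:
    """Connects to the content host and retrieves data on behalf of the indexer."""

    def fetch(self, url):
        # In a real farm, the WFE opens the connection to the target host;
        # here the retrieval is stubbed with canned data.
        metadata = {"url": url, "title": f"Document at {url}"}
        content_stream = f"<body of {url}>"
        # Both streams are proxied back to the index server.
        return metadata, content_stream


class IndexServer:
    """Initiates the crawl and builds index entries from the proxied streams."""

    def __init__(self, wfe):
        self.wfe = wfe    # a single dedicated WFE, or a pool in a real farm
        self.index = {}

    def crawl(self, urls):
        for url in urls:  # URLs come from the SSP's content source configuration
            metadata, content = self.wfe.fetch(url)
            self.index[url] = (metadata, content)


indexer = IndexServer(WebFrontEnd())
indexer.crawl(["http://intranet/default.aspx"])
```

The point of the sketch is the direction of traffic: the index server never touches the content host directly; every retrieval flows through the WFE, which is why WFE placement matters so much to crawl performance.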
This is why you are given two choices about the incorporation of WFE servers into your Search topology. You can choose to dedicate a WFE server to be the only WFE server that the index server will use, or you can choose to have the index server use all the WFE servers in the farm. The design choice here is significant because you can assign only one index server to each SSP. The planning and design issues for a single SSP implementation are discussed briefly in the "Planning Your WFE Topology for Crawling Content Sources" sidebar.
Real World: Planning Your WFE Topology for Crawling Content Sources
Both the index and WFE servers that are configured to participate in the crawling process will have a heavy load placed on their memory, processor, and network subsystems during crawl times. You'll have only two basic choices:
Route all your crawling activities through a designated, single WFE server.
Route all your crawling activities through all the WFE servers in your farm.
One choice is not better than the other; each choice has associated pros and cons, as outlined here.
If you have a large number of content sources whose crawls consume the majority of a 24-hour period to complete, consider routing all the crawl processes through a single WFE server, provided that server has the resources necessary to avoid becoming a bottleneck in the crawling process. Best practice is to adhere to the following guidelines:
Do not allow the WFE server dedicated to crawling activities to be a member of any Network Load Balancing (NLB) cluster. This ensures that the WFE is fully dedicated to working with the index server to crawl the content sources, and it represents the most efficient use of this WFE server.
Ensure there is reliable, high-speed bandwidth between the index server and the dedicated WFE server, because all the crawling processes will be routed through this WFE server.
Configure the WFE server with hardware capacity similar to that of the index server, because the load placed on the index server will be repeated on the WFE server during the crawl process.
The upside of routing all the crawling through a single WFE server is that the other WFEs are free to service user requests for other farm services. The downside is that you now have a single point of failure in your crawling architecture, and this architecture is not scalable to a second WFE server. Remember, your choice is either a single WFE server or all WFE servers; you can't choose two out of five, or three out of eight, WFE servers for the index server to route calls through.
If you choose to route all your crawling activities through all your WFE servers, you need to understand that while those WFE servers are participating in helping the index server crawl a content source, they will also be participating in NLB clusters, helping users access data and other farm resources. The obvious, positive aspect of this configuration is that all the work (both user demand and crawling demand) is load balanced over the sum of the WFE servers in your farm. If you need additional resources, you can scale out with additional WFE servers.
How do you know whether you should devote a WFE to crawling activities rather than load balancing those activities over all your WFE servers? Here are some principles with which to work:
If the number of content sources to crawl is increasing, user demand for other farm resources is constant, and your WFE servers are not heavily taxed by current user demand, consider load balancing your crawling activities over all your WFE servers.
As client demand for other farm resources increases and the number of content sources remains constant, consider using a dedicated WFE server that will not participate in the NLB clusters but that can handle the crawling demand.
If both client demand for farm resources and crawling demand for additional content sources increases, consider load balancing crawling activities across all your WFE servers and scale out by adding WFE servers to meet both needs.
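The three principles above reduce to a small piece of decision logic. The sketch below is illustrative only; the function name and return strings are hypothetical and are not part of any SharePoint tool.

```python
# Illustrative decision sketch for the WFE crawl-topology principles above.
# All names are hypothetical; this is not part of any SharePoint API.

def choose_crawl_topology(content_sources_growing, user_demand_growing,
                          wfes_heavily_loaded):
    """Return a recommendation based on the three guidelines in the text."""
    if content_sources_growing and user_demand_growing:
        # Both demands are rising: load balance crawling across all WFEs
        # and scale out the WFE tier to meet both needs.
        return "all WFEs, scale out"
    if user_demand_growing and not content_sources_growing:
        # Client load is the pressure: isolate crawling on a dedicated WFE
        # that does not participate in the NLB clusters.
        return "dedicated WFE"
    if content_sources_growing and not wfes_heavily_loaded:
        # Crawl load is the pressure and the WFEs have headroom: share the work.
        return "all WFEs"
    return "review current load before deciding"

print(choose_crawl_topology(True, True, False))  # -> all WFEs, scale out
```

In practice you would revisit this decision as your farm grows, since the inputs (content source count, client demand, WFE headroom) all change over time.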
If you configure multiple SSPs in your farm, each with its own dedicated index server, then route all crawling processes through all of your WFE servers and scale out your WFE topology if you need additional throughput. It is not recommended to route multiple index servers through a single dedicated WFE server, as this will likely turn that dedicated WFE server into a bottleneck. You could give each index server its own dedicated WFE server, but since you're scaling out the WFE topology anyway, it is more logical to have all the index servers use the entire set of WFE servers.
If you're deploying a multi-SSP environment, the design choices for using a single WFE server, rather than all your WFE servers, for crawling activities become more complex. The complexity comes from the interplay of having multiple SSPs that might or might not crawl overlapping content and whose crawl schedules might or might not be aggressive. The way to address this is to start by determining the best content to crawl for each SSP, and then determine how much overlap, if any, there is between the SSPs. Ideally, you'll have no overlap, but if there is overlap, you'll need to coordinate crawl schedules between the SSPs.
For example, assume you have an SSP for your research division and another SSP for your sales and marketing division. Both divisions are likely to be interested in some common content, such as the corporate intranet site. To ensure that you don't overload the server (or servers) hosting the corporate intranet site, best practice is to stagger the crawl schedules of both SSPs so that they aren't crawling the intranet site at the same time.
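One simple way to stagger schedules is to offset the second SSP's start time so that it begins only after the first SSP's estimated crawl window has ended. The arithmetic below is an illustrative sketch with a hypothetical helper function; in MOSS 2007, crawl schedules are actually set per content source in the SSP administration pages.

```python
# Illustrative sketch: offset two SSPs' crawl start times so their estimated
# windows against a shared content source (such as the corporate intranet site)
# do not overlap. Hypothetical helper, not a SharePoint scheduling interface.

def staggered_start(start_hour_a, duration_hours_a):
    """Return the hour (0-23) at which SSP B's crawl can safely begin,
    given SSP A's start hour and estimated crawl duration in hours."""
    return (start_hour_a + duration_hours_a) % 24

# SSP A (research) crawls the intranet at 01:00 and takes about 3 hours,
# so schedule SSP B (sales and marketing) to begin no earlier than 04:00.
print(staggered_start(1, 3))  # -> 4
```

Note that this assumes you have a reasonable estimate of each crawl's duration; crawl durations vary with content volume, so pad the offset with some slack in practice.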
The query server fields user queries for the index. Users cannot query across multiple indexes (SSPs), so it is important to ensure that your users understand what content their SSP is crawling and that they have a way to communicate back to you other sources of content they would like crawled and included in the index.
Users can query the index by navigating to the Search Center in the portal or by using the simple search Web Part to execute queries across their team sites. In fact, any time they are presented with a search Web Part, they are given the opportunity to query the index of the SSP that the Web application is associated with.
The index is held on the file system of the index and query servers in the path you specify when the SSP is created. The metadata is held in the SQL Server database (named Search Database in the interface) for the SSP. Unlike in SharePoint Portal Server 2003, the index is no longer associated with a portal; it is created and managed by the SSP.