Understanding and Documenting Your Environment | Microsoft® Office SharePoint® Server 2007 Administrator's Companion

When you begin to design your disaster recovery plan, be sure to include all aspects of your implementation, not just the SharePoint Server 2007 servers. A medium-scale to large-scale SharePoint Server 2007 installation will have many infrastructure dependencies, along with the core components, such as Web front-ends, search servers, and database servers. It is nearly impossible to successfully execute a disaster recovery plan if you do not know all the dependencies in your environment. Your particular implementation might have dependencies in addition to those covered in the following section. Be sure to document and plan for all supporting components.

Documenting Your Infrastructure and Plan for Disaster

Begin your SharePoint Server 2007 disaster recovery plan by listing all hardware, software, and network components that support your installation. Talk to the administrators of these systems and identify any scheduled outages, such as maintenance windows, you need to take into account when planning. Unplanned outages can affect database mirroring or transaction log shipping, rendering your disaster recovery plan ineffective. Your disaster recovery plan is only as good as the weakest link, so involve the stakeholders early on and convince your peers that a good disaster recovery plan is a solid investment.

Network and System Administrators

Many disaster recovery plans adequately cover all hardware, software, and system components, but they leave out the most important part of the equation: you. For example, if the network administrator is on vacation when a disaster occurs, restoring your SharePoint Server 2007 server farm is of little help if the network is still down. Many organizations keep a spreadsheet of all administrators that includes vacation and shift schedules. A list or a Microsoft Office Excel spreadsheet viewed in the browser is a good way to solve this problem. Include home telephone numbers, cell phone numbers, and any other relevant contact information that might be required for restoration of service.
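The roster described above can also be kept in a form that is easy to query during an incident. The following is a minimal Python sketch, not part of SharePoint Server 2007; the `Administrator` record, the `available_for` helper, and all names and telephone numbers are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Administrator:
    name: str
    role: str                      # e.g. "network", "SQL Server", "SharePoint"
    cell_phone: str
    home_phone: str = ""
    # Each entry is an inclusive (first_day, last_day) vacation range.
    vacations: list[tuple[date, date]] = field(default_factory=list)

    def is_available(self, day: date) -> bool:
        """True when `day` falls outside every recorded vacation range."""
        return not any(start <= day <= end for start, end in self.vacations)


def available_for(roster: list[Administrator], role: str, day: date) -> list[Administrator]:
    """Return the administrators covering `role` who are not on vacation on `day`."""
    return [a for a in roster if a.role == role and a.is_available(day)]


# Hypothetical roster entries, for illustration only.
roster = [
    Administrator("Admin A", "network", "555-0100",
                  vacations=[(date(2007, 7, 2), date(2007, 7, 13))]),
    Administrator("Admin B", "network", "555-0101"),
    Administrator("Admin C", "SQL Server", "555-0102"),
]

on_call = available_for(roster, "network", date(2007, 7, 5))
print([a.name for a in on_call])
```

In this example only Admin B is returned for July 5, because Admin A's vacation range covers that date. Whatever form the roster takes, keep a copy somewhere that does not depend on the farm you are trying to restore.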

Operating System

The most obvious component to SharePoint Server 2007 is the Microsoft Windows Server operating system. Several versions are available, so documenting your specific installation and having the installation media available is a great place to start.

Many organizations create snapshot images of all non-database servers for restores, and this is a recommended solution. Service packs, patches, or custom code might have to be applied after a server is restored from a system image. If you have large content indexes, you should restore the search server from an image and restore the indexes using native tools. The recommended approach is to use a freshly installed Windows Server image with SharePoint Server installed, all customizations installed, and patches applied. Keep a different image for each server because changing the security identifier (SID) of a single image to create multiple SharePoint Server 2007 Web front-end (WFE) servers and application servers has been proven to be a bad idea. Re-create the image every time service packs, patches, and site definition changes occur. This safeguard allows for rapid restoration of the server while retaining your Office SharePoint Server 2007 farm consistency.

Note

The 12 hive is located at C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\.

You should also create a network share with all your system images, installation sources, patches, and third-party software additions. Doing this speeds recovery of systems by reducing media load times and version issues. You should back up this drive at least on a weekly basis.

Third-Party Software

Many organizations have third-party solutions running on their SharePoint Server 2007 server farms, including backup software, Web Parts, language packs, antivirus programs, and custom code. Document how these are installed, include installation keys, if necessary, and keep the installation media in a central location that is easily accessible. Remember to reinstall any third-party Web Parts and custom code before redeploying a WFE server. Not reinstalling all Web Parts on a load-balanced farm of WFE servers results in page errors and an inconsistent experience for the end user.

Network Components

Because SharePoint Server 2007 is primarily a Web service and is heavily network dependent, knowing all your connecting pieces is crucial to recovery or continuity of services. Include your network team early in your disaster recovery planning process and document all connected pieces. The following are examples of components to discuss with your network team:

Switches: Redundancy, virtual local area networks (VLANs), network interface card (NIC) teaming, port speed, duplex, dedicated backup LANs
Routers: Redundant paths, latency, hardware load balancing
Firewalls: Rules, redundancy, operating system version
Storage area network (SAN): Compatibility, capacity, speed, host bus adapters (HBAs)
Cabling and electrical topology: Redundant cabling, processes for working in your raised floor, redundant power, uninterruptible power supplies, generators

Note

If you are using an Internet Service Provider (ISP), be sure to get a service-level agreement (SLA) that defines their strategies and obligations regarding these and all provided services.

Documenting Your Server Farm Configuration

When developing a disaster recovery plan for your installation, use a ground-up approach. This approach helps you to avoid missing components and to prepare for disasters. Be very methodical in documenting your configuration, beginning with your operating systems and SharePoint Server 2007-related services. The components discussed in the following sections are a good starting point for documenting your configuration.

Central Administration

With the exception of the SQL server, the server that hosts the Central Administration Web application is the most important component when recovering a SharePoint Server 2007 installation. In a complete loss of service, you need to bring up this server first and re-establish connections to your databases. It is important to completely document the installation of all servers, but this is especially true for your Central Administration Web application server. If the Central Administration server is the only failed component in your implementation, you can simply promote another server in the farm to the Central Administration server. This is done using the SharePoint Products and Technologies Configuration Wizard, but be sure to document this change if it will be permanent.

You will use your Central Administration server Web application console to access the Backup and Restore user interface (UI), or optionally, the command-line tool. Restoration of this server can be from a system image or by using the Windows Server Backup Or Restore Wizard.
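The command-line tool in SharePoint Server 2007 is Stsadm.exe. As a rough sketch of how a farm backup command might be documented alongside your plan, the following Python snippet assembles the command line without running it. It assumes that Stsadm.exe sits in the BIN folder of the 12 hive and that the `backup` operation takes `-directory` and `-backupmethod` parameters; confirm both with `stsadm -help backup` on your own server. The share name is hypothetical, and the UNC-path check is an assumption made so that every server in the farm can reach the backup location.

```python
from pathlib import PureWindowsPath

# Location of the 12 hive as given in this chapter; Stsadm.exe is assumed
# to live in its BIN subfolder, as on a default installation.
TWELVE_HIVE = PureWindowsPath(
    r"C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12"
)
STSADM = TWELVE_HIVE / "BIN" / "stsadm.exe"


def farm_backup_command(backup_share: str, method: str = "full") -> list[str]:
    """Build (but do not run) an Stsadm farm backup command line."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    # Assumption: require a UNC path so all farm servers can reach the share.
    if not backup_share.startswith("\\\\"):
        raise ValueError("backup_share should be a UNC path such as \\\\server\\share")
    return [str(STSADM), "-o", "backup",
            "-directory", backup_share,
            "-backupmethod", method]


# Hypothetical share name, for illustration only.
command = farm_backup_command(r"\\backupsrv\SharePointBackups")

# Quote any argument containing spaces so the line can be pasted into a prompt.
print(" ".join(f'"{part}"' if " " in part else part for part in command))
```

Recording the exact command in your server documentation means that whoever performs the restore does not have to reconstruct it under pressure.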

WFE Servers

In an out-of-the-box SharePoint Server 2007 implementation, WFE servers are stateless servers, meaning they don't track client access and any WFE can serve SQL content data. This eases restoration of a WFE by allowing installation of the application binaries and then connecting to an existing SQL Server configuration database. The SQL Server configuration database populates any required information on the WFE to serve native SharePoint content. The exception to this is when you are customizing Web application content. As an example, many WFEs have branded images, custom pages, excluded managed paths, Web Parts, and specialized authentication mechanisms. All these components must be re-installed after a WFE system is rebuilt, reinforcing the importance of carefully documenting customized environments. The following is a list of items that must be documented to successfully back up and restore a customized WFE:

Internet Information Services (IIS) configuration
Customized authentication software
Transmission Control Protocol (TCP) ports on Web applications and extended Web applications
IIS-excluded managed paths and associated content
Centrally located repository for IIS Metabase backups
Secure Sockets Layer (SSL) certificate backups
IIS Logs at %systemroot%\system32\LogFiles\w3svc<IIS Virt Server ID>
Web Parts installed into the global assembly cache (GAC)
Customized code in the 12 hive
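
One way to keep this list honest is to treat it as a checklist and test each server's documentation against it. The following Python sketch does that; the `missing_items` helper and the sample record for a server called WFE01 are hypothetical, and the item names simply abbreviate the list above.

```python
# Items this section lists as required documentation for a customized WFE.
REQUIRED_WFE_ITEMS = [
    "IIS configuration",
    "Customized authentication software",
    "TCP ports on Web applications and extended Web applications",
    "IIS-excluded managed paths and associated content",
    "IIS Metabase backup repository",
    "SSL certificate backups",
    "IIS log location",
    "Web Parts installed into the GAC",
    "Customized code in the 12 hive",
]


def missing_items(documented: dict[str, str]) -> list[str]:
    """Return required items that are absent or blank in a server's record."""
    return [item for item in REQUIRED_WFE_ITEMS
            if not documented.get(item, "").strip()]


# Hypothetical, partially completed record for one WFE server.
wfe01 = {
    "IIS configuration": "documented",
    "Customized authentication software": "none installed",
    "TCP ports on Web applications and extended Web applications": "80, 443",
    "SSL certificate backups": "",
    "Web Parts installed into the GAC": "documented",
}

for item in missing_items(wfe01):
    print("Not yet documented:", item)
```

A blank entry counts as missing, so an item cannot be marked complete just by adding its heading to the server document.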

Search Server

If your indexes are not large, rebuilding an index after a system image restore is a quick and simple way to return up-to-date search and query functionality. Alternatively, you can reinstall SharePoint Server to an existing farm and enable it as a Search server in Central Administration. If your index sizes are measured in gigabytes or terabytes, you should back up your indexes so that they can be restored, providing a reasonably timed return to service. The native backup and restore tools support backup and restore of the Shared Services Provider, and that includes the content indexes on the Index and Query servers, as well as the SQL databases containing the correlating metadata. If you do not back up large content indexes, the result is incomplete search results for hours or days, depending on the size of your content sources and speed of your hardware. Using the Index server as a dedicated WFE for indexing is recommended for most implementations, but in large implementations you should consider using all WFE servers for crawling (default). Doing so reduces the impact to end users when crawling large content sources. Document your index locations, and routinely monitor the size of your content indexes because your content sources can grow quickly. Figure 30-1 shows the traffic flow when using a dedicated WFE server for indexing SharePoint content in the same server farm.

Figure 30-1: Using a dedicated WFE server for content indexing
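
To monitor index growth routinely, as recommended above, a small script can total the size of the folder that holds a content index. The following Python sketch is one possible approach, not a SharePoint tool. The 1-GB threshold is an assumed cutoff based on the guidance that indexes measured in gigabytes should be backed up, and the index folder path must come from your own documentation.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Index files can change during a crawl; skip any that vanish.
                continue
    return total


def needs_index_backup(size_bytes: int, threshold_gb: float = 1.0) -> bool:
    """Flag indexes at or above the threshold (1 GB here is an assumed cutoff)."""
    return size_bytes >= threshold_gb * 1024 ** 3
```

Pass the documented index location to `directory_size_bytes`, record the result on a schedule, and use `needs_index_backup` to decide when an index has grown large enough that rebuilding it after a restore is no longer practical.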

Shared Services Provider

A Shared Services Provider (SSP) is simply another Web application in SharePoint Server 2007. For the most part, SSPs are backed up and restored using the same methods as any other Web application. However, there are a couple of differences. The first difference arises when you use a dedicated Web application for My Sites. This Web application and the associated IIS virtual server must be documented carefully, including IP addresses, TCP ports, and DNS entries as required. It is imperative that the namespace be exactly as it was before or the application will fail. If you are consuming My Sites hosted by a different Shared Services Provider, verify that your Trusted My Site Host Locations entries in Shared Services Administration are present and correct. Figure 30-2 shows the page where you do this; refer to Chapter 18 for more information on My Site Trusted Host Locations.

Figure 30-2: My Site Trusted Host Locations page

The second difference between SSPs and other Web applications occurs when using multiple SSPs in the same farm (that is, when you are using Intra-Farm Shared Services). If you lose multiple Intra-Farm SSPs, take your time and restore the first (default) SSP, and then continue restoring after successful testing of the first. Note that you cannot delete the only SSP in a farm if you make a mistake during data restoration. You must add another SSP, make it the default, and then delete the original. If you use multiple SSPs, you can benefit by creating detailed Microsoft Office Visio diagrams. These diagrams are a valuable asset when troubleshooting complex implementations before, during, and after disasters.

Best Practices

When creating visual references of your farm structure, include the full names of your actual content databases and corresponding Web applications. This reduces confusion if you are forced to rebuild your farm from SQL backups. Give your databases meaningful names (for example, WSS_Content_SSP1 for the first SSP in a farm) during setup, because doing so aids in restoration.

Excel Calculation Services

Excel Calculation Services is backed up by a combination of a Shared Services Provider backup and a backup of the application server itself. The best method for backing up the Excel Calculation Services application server is creating a system image. Creating a new system image whenever major service packs or SharePoint Server 2007 application changes occur produces the fastest return to service. Remember that this image needs to be a freshly installed copy of Windows Server, without a SID change. You can also back up Excel Calculation Services using the Windows Server Backup Or Restore Wizard or third-party backup utilities. This method lengthens the return to service, but it works if insufficient disk space prevents you from creating full system images. Don't forget that the flexibility of SharePoint Server 2007 allows for easy reinstallation of an application server from scratch if one fails. If you must restore custom settings on the service, be sure to also restore the SSP under which Excel Calculation Services was hosted. If your organization relies heavily on Excel Calculation Services, you can benefit from having redundant application servers serving Excel Calculation Services.

Note

When restoring a SharePoint Server from an image, there is always the possibility the server account in Active Directory will not synchronize correctly. In this event, simply remove the server from Active Directory, reboot, and re-add the server to Active Directory. Remember to move the server to the correct organizational unit (OU) if required.

Documenting Your Farm Installation

Documenting your installation is a common theme throughout this chapter. A robust product such as SharePoint Server 2007 relies on many resources to provide its services. This section is an overview of how to document SharePoint Servers and related infrastructure.

Server Documentation

It is highly recommended to have detailed installation documentation defining every setting and keystroke required to build a server from scratch. This is a great way to document all the nuances with your servers, such as WFE 12 hive customizations, and it prevents you from missing configuration options and forgetting software when rebuilding servers. Create a separate document for each server that contains all relevant hardware information, such as the server name, BIOS and backplane versions, network interface cards, RAID controllers, and so on. Documenting your hardware configuration makes it easier to troubleshoot, download correct drivers, and effectively communicate with technical support in the event of failure. You should also document all service packs, hot fixes, antivirus, and any other software additions. When you have servers in a load-balanced cluster, it is very important that all machines have an identical configuration. If months have gone by since a server build without documentation, a Web Part or similar piece of software will definitely be forgotten when restoring the server. This creates an inconsistent, negative user experience that can be very difficult to troubleshoot.
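
If the software inventory from each server document is kept in a structured form, differences between load-balanced machines can be found mechanically rather than discovered by end users. The following Python sketch illustrates the idea; the `configuration_drift` helper, the server names, and the component names are all hypothetical.

```python
def configuration_drift(servers: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each server, list components present elsewhere in the pool but missing here."""
    # The union of every server's inventory is what an identical pool would run.
    everything = set().union(*servers.values())
    return {name: everything - installed
            for name, installed in servers.items()
            if everything - installed}


# Hypothetical inventories taken from two server documents.
pool = {
    "WFE01": {"Service Pack 1", "Antivirus", "Custom Web Part A"},
    "WFE02": {"Service Pack 1", "Antivirus"},
}

for server, missing in configuration_drift(pool).items():
    print(server, "is missing:", sorted(missing))
```

Here WFE02 is reported as missing Custom Web Part A, which is exactly the kind of omission that causes page errors on only some requests in a load-balanced farm. Servers that already match the rest of the pool are left out of the result.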

It is wise to have these documents backed up to a source that is easily restored in a disaster. If you have them stored only on your SharePoint site and SharePoint fails, you cannot retrieve this documentation. Storing hard copies of all your disaster recovery documents onsite and offsite is a good idea. In addition, versioning this server documentation is an invaluable method of rolling back changes when patches or third-party software affect usability and performance.

Post-Installation Changes

After you have thoroughly documented your farm installation, continually update your server documents. This practice creates a "living" document set that is always current and can be used when restoring services. Create an appendix in your server documentation with version history, and note the reason for changing your specific installation. If possible, verify any changes you make with your peers.

Testing Your Disaster Recovery Plan

After you create a disaster recovery plan, you must test your plan. Having a plan that won't work is of little use, so execute a simulation of your disaster recovery plan annually, making sure to coordinate with your colleagues and stakeholders. Executing a disaster recovery plan on a production farm is generally a bad idea, but it's a good idea to test secondary server farms and system image restores. Build a lab with a mock-up of your production environment, if possible, and test your disaster recovery plan. Try things such as Search server failures, SQL content database corruption, IIS Metabase corruption, network card failures, hard drive failures, and any other common issues you might face. This type of testing provides valuable knowledge about how to bring back a failed server farm.