General Troubleshooting Approach | Special Edition Using Microsoft SharePoint Portal Server

	Special Edition Using Microsoft SharePoint Portal Server By Robert Ferguson

	Chapter 23. Troubleshooting

The way in which we approach a problem, whether SharePoint Portal Server related or otherwise, is vital to actually solving it. The approach recommended here is nothing short of common sense, backed up by best practices, and includes the following:

Identify and document the problem or issue, sometimes termed the mode of failure, or MOF
Address the problem in terms of general troubleshooting before escalating
Compare the MOF with known problems/issues
Verify the solution stack(s) as appropriate, focusing on supported combinations of both hardware and software
Verify that changes to the solution stack have been promoted through appropriate "change control" throughout the system landscape (see note)
Verify the integration points as appropriate (see note)

NOTE

Change control refers to the practice of first testing a suggested or potential change to a production system in a technical sandbox or other SPS environment. Eventually, after a period of perhaps a few days to a few months, the change (such as the application of a new Service Pack for Windows 2000 Advanced Server) is promoted, or applied, to the next system in the system landscape, like a test, or development, or training system (see Figure 23.1).

Figure 23.1. A sample SharePoint Portal Server five-system landscape, depicting Development, Test, Training, Production, and a Disaster Recovery environment.


It should be noted that by the time a change is finally promoted through the entire landscape to the actual production environment, it has been thoroughly tested for an extended period of time in environments as near identical to the production environment as possible. This, combined with other best-practice approaches to managing change, allows for a world-class production SPS system in terms of stability and minimal unplanned downtime.

NOTE

Integration points simply refer to where the solution stack layers "touch" or communicate with each other. In fact, integration points are sometimes referred to as touch points.

A discussion around the SPS client components makes for a good example. The client components are extensions to Microsoft Windows Explorer and various Microsoft Office applications. As such, these extensions represent the integration points between SharePoint Portal Server and Windows Explorer and the MS Office applications. See Figure 23.2 for an illustration of a typical high-level solution stack, as well as some of the integration points common to SharePoint Portal Server.

Figure 23.2. SharePoint Portal Server and SPS clients exhibit a number of integration or touch points.


Documenting the Issue

An excellent starting point or approach to troubleshooting is simply to document the problem and ensure that the problem itself is clear to the technical team that will work toward resolving it. Only when a complex problem can be adequately and clearly explained is there hope for a resolution. Think about your experiences with various technical support organizations, and it will quickly become evident that identifying or documenting the issue is usually the first step taken to resolve it.

Perform General Troubleshooting

This section addresses potential problems associated with Microsoft SharePoint Portal Server 2001, as well as possible solutions. Use the following list of general troubleshooting steps as a standard approach to solving problems:

Maintain careful notes of the installation process and of any unexpected errors or attempts to correct problems that may have occurred.
Review your notes, and the notes of your colleagues performing similar tasks, whenever a new problem occurs, looking back to see if you have seen and resolved this issue in the past.
Even if the problem is new to you, the methods you employed to solve previous problems might still be of use.
Continue taking excellent notes as you persist in troubleshooting and taking steps to try to resolve the issue. Worst case, you can leverage these if a call to a technical support organization is required. Clear notes are also of great value in the middle of the night, after you have tried a multitude of fixes and are still struggling with a problem. In this case, your notes will be critical in keeping the team on the right path and helping to avoid repeating troubleshooting tasks that are of no value.

Further, the more complex your systems environment has become, the greater importance a good internal knowledge base plays in helping you to resolve a problem. We have all been in a position where we find ourselves saying "didn't we see a problem like this six months ago?" Maintaining a knowledge base helps us answer that question with a solution.
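As a rough illustration of how an internal knowledge base can answer the "didn't we see this six months ago?" question, the following Python sketch matches a new problem description against earlier notes by shared words. The KnowledgeEntry fields, the sample entries, and the word-overlap rule are all assumptions made for this example; a real team would more likely use whatever ticketing or documentation system it already has.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeEntry:
    """One resolved problem from the team's notes (hypothetical schema)."""
    date: str
    symptom: str
    resolution: str


def find_similar(entries, description, min_shared_words=2):
    """Return entries whose symptom shares enough words with the new problem."""
    wanted = set(description.lower().split())
    matches = []
    for entry in entries:
        shared = wanted & set(entry.symptom.lower().split())
        if len(shared) >= min_shared_words:
            matches.append((len(shared), entry))
    # Most similar entries first; sort on the count only.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in matches]


# Made-up sample notes, for illustration only.
notes = [
    KnowledgeEntry("2001-03-12",
                   "dashboard site returns access denied after proxy change",
                   "Re-ran proxycfg with the correct bypass list"),
    KnowledgeEntry("2001-05-02",
                   "crawl fails on network file server",
                   "Restored read permissions for the crawl account"),
]

hits = find_similar(notes, "users see access denied on the dashboard site")
for hit in hits:
    print(hit.date, "-", hit.resolution)
```

Simple word overlap is crude, but it shows why consistently worded symptom descriptions make old notes much easier to find.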

It is also often very helpful to step back at times, and start looking over the more obvious causes of system failures. For example, you might perform or check the following, so as to quickly rule out these easy fixes as potential culprits (see Figure 23.3 for an illustration of some of these common and easy "quick fixes"):

Verify you have power. Really, do it!
Check all other physical connections and then restart the server, verifying whether the problem still occurs.
Review the event log. Microsoft Windows 2000's Event Viewer can be extremely helpful in pinpointing problem areas, even if the events in the log are only symptoms of a bigger problem.
When unknown errors occur, check that the network connections between computers are actually functioning before proceeding with further troubleshooting. Use the ping utility to test whether client and server computers are connected (for example, ping 200.200.100.1 to ensure a response, and then ping the host name to ensure name resolution works). You might also run ping -a 200.200.100.1 to check reverse name lookup as well.
Refer to the SharePoint Portal Server Readme, other help files, and similar technical support resources to determine whether the problem you are experiencing is a known issue.
Refer to the Microsoft Product Support Services Web site at http://support.microsoft.com/. And leverage experienced consulting resources you may already have on staff or within your local organization.
Check your enterprise management application, like HP Openview or Compaq Insight Manager, for obvious hardware, software, or other failures.
Gather information from the Server Installation Logs, such as the errorlog.txt file and the eventlog.txt file, by viewing them with a text editor.
Review the setup.log file. This file is located in Program Files\Microsoft Integration\SharePoint Portal Server\Logs.
Review the spsclisrv.log file. This file is located in Program Files\Microsoft Integration\SharePoint Portal Server\Logs for successful server installations.
Review the Exchange Server Setup Progress.log file, if applicable. This file is located at the root of the operating system drive. This log clearly indicates whether the Web Storage System installed correctly, helping you identify some common problems such as the following:

A server name has an illegal character.

IIS 5 is not installed, so the Web Storage System cannot be installed.

The SMTP service is not installed.
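The name-resolution half of the ping checks in the list above can also be scripted, which is handy when the results need to go into your troubleshooting notes. The Python sketch below is only an approximation of those checks: it performs the forward and reverse lookups with the standard socket module but does not send ICMP echo requests, so it complements ping rather than replacing it. The function names are invented for this example.

```python
import socket


def forward_lookup(host_name):
    """Resolve a host name to an IPv4 address; return None if resolution fails."""
    try:
        return socket.gethostbyname(host_name)
    except OSError:  # covers socket.gaierror and socket.herror
        return None


def reverse_lookup(ip_address):
    """Resolve an address back to a host name; return None if no name is found."""
    try:
        host_name, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return host_name
    except OSError:
        return None


def name_resolution_report(host_name, ip_address):
    """Collect both lookups so the results can be pasted into troubleshooting notes."""
    return {
        "host_name": host_name,
        "forward": forward_lookup(host_name),
        "ip_address": ip_address,
        "reverse": reverse_lookup(ip_address),
    }
```

Calling name_resolution_report with your server's host name and address returns a small dictionary; a None in either the forward or reverse entry points at a name-resolution problem rather than a SharePoint Portal Server problem.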

Figure 23.3. Never forget to turn to and rule out the "quick fixes" before moving forward with more complex troubleshooting.


When you finally must call someone outside of your organization, whether it is a Product Support specialist, a technical support organization, your management team or other escalation team, or simply one of your remote colleagues, be prepared to answer the following questions:

What is the problem?
What is the last thing that you changed before you started seeing the problem?
How often is it recurring? Only when something else occurs first?
Can you reproduce the problem? Often the solution is found when you try to recreate the problem in a testing environment.
Is more than one server affected?
Is more than one user affected?
Can you characterize the problem? That is, does the problem seem to be related to authentication, content, access, or personalization? Try to isolate the problem.
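One way to make sure these questions are answered before the call is placed is to capture them in a small structured record. The Python sketch below is a hypothetical example: the ProblemReport class, its field names, and the ready_to_escalate rule are assumptions made for illustration, mapped one-to-one onto the questions above.

```python
from dataclasses import dataclass, fields
from typing import Optional

# The four problem areas named in the last question above.
CATEGORIES = ("authentication", "content", "access", "personalization")


@dataclass
class ProblemReport:
    """Answers to the escalation questions; None means not yet answered."""
    description: Optional[str] = None       # What is the problem?
    last_change: Optional[str] = None       # Last thing changed before it appeared
    frequency: Optional[str] = None         # How often, and after what trigger?
    reproducible: Optional[bool] = None     # Can you reproduce it?
    servers_affected: Optional[int] = None  # More than one server?
    users_affected: Optional[int] = None    # More than one user?
    category: Optional[str] = None          # One of CATEGORIES

    def unanswered(self):
        """Return the names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def ready_to_escalate(self):
        """True once every question is answered and the category is recognized."""
        return not self.unanswered() and self.category in CATEGORIES


report = ProblemReport(
    description="Users cannot check in documents",
    last_change="Applied a new Service Pack to the server",
)
print("Still to answer:", ", ".join(report.unanswered()))
```

Filling in the record as you troubleshoot also doubles as the note-taking habit recommended earlier in this section.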

Verify Known Issues

Next, leverage known issues or other pockets of information to determine whether the problem at hand (your MOF) is actually a common problem that has been seen and rectified in the past. As stated above, your own technical notes or the notes of your colleagues who perform similar tasks may be a good place to start.

An excellent example of one of these pockets of information, and a generally good place to start troubleshooting once the easy fixes are ruled out, is Microsoft's TechNet resource.

TechNet has been called by some the Microsoft Bible: like the real thing, it provides rules and gives us great examples that we can emulate, so as to keep our Microsoft solutions alive and well. A CD-based version has existed for years. More recently, though, TechNet has been made available as a Web-based resource. See http://www.microsoft.com/technet/sharepoint for help on deploying, supporting, and maintaining SharePoint Portal Server.

Other valuable pockets of information exist, too, and include similar or related online toolsets by various Microsoft hardware and software partners. Examples could be

Your server or disk subsystem hardware vendors' online toolsets
Your hardware vendor's technical support organizations
Microsoft's technical support organizations (specific to OS and product)
Your SharePoint Portal Server systems integrator
Other systems integrator toolsets and approaches available to you
Other large and capable systems integrators, such as Compaq Global Services, Microsoft Consulting Services, IBM Global Services, and so on.

It should be noted that if your technical team can justify the time and cost, one or more individuals should be made responsible for "keeping an eye out" for new SPS issues, patches, and other updates from the various vendors that play a supporting role in your particular SPS solution stacks.

Verify the Solution Stacks

Should the problem at hand not easily fall into the convenient categories of easy fixes or known issues, we find ourselves in new territory. At this point, it makes sense to thoroughly verify that the solution stack foundation itself is indeed sound. This is especially important if we find ourselves with a problem cropping up after changes, however minute, have been made to the solution. Any changes may be related to, and usually ultimately are, the root cause. Even if the change itself is seemingly benign, the impact that the change has in terms of integration points may be detrimental. For example, a tape drive OS driver may continue to function apparently well even when upgraded to a new version, but the performance of the tape drive might be horrendous if a firmware upgrade is not also performed. Note that this example represents a classic case of creating an unsupported configuration out of a stable solution. Sound change control practices minimize these kinds of problems in productive SPS implementations, but issues still manage to crop up from time to time. In our tape drive example, though, even if our change control process fails, we still have the hardware vendor(s), Microsoft, and any number of systems integrators to fall back on for support.
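The tape drive example can be expressed as a simple lookup against a support matrix. The Python sketch below assumes a hypothetical matrix of driver and firmware pairs; the component names and version strings are invented for illustration, and in practice the supported combinations would come from your hardware vendor's documentation.

```python
# Hypothetical support matrix: each pair is a driver/firmware combination
# assumed to be certified together. The version strings are made up.
SUPPORTED_COMBINATIONS = {
    ("tape_driver 1.0", "tape_firmware 1.0"),
    ("tape_driver 2.0", "tape_firmware 2.1"),
}


def is_supported(driver, firmware):
    """True if this exact driver/firmware pair appears in the support matrix."""
    return (driver, firmware) in SUPPORTED_COMBINATIONS


def supported_firmware_for(driver):
    """List the firmware versions that are supported with the given driver."""
    return sorted(fw for drv, fw in SUPPORTED_COMBINATIONS if drv == driver)


# The driver was upgraded but the firmware was left alone.
driver, firmware = "tape_driver 2.0", "tape_firmware 1.0"
if not is_supported(driver, firmware):
    print("Unsupported combination; supported firmware:",
          supported_firmware_for(driver))
```

The point of the sketch is that each component can look healthy on its own while the combination is unsupported, so the check has to be made on pairs (or larger sets) of components rather than one layer at a time.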

Verify Change Control

This brings us to our next approach: verifying that any changes to the environment performed after the initial installation, but prior to the solution going live, actually underwent sound change control processes. In many cases, troubles arise simply because a change never actually was tested in a customer's enterprise environment. The smallest of changes (a firmware upgrade to a system board, or a post-Service Pack driver update) may cause a seemingly robust solution to crater.

To read more about change control, also commonly referred to as change management, see "Addressing Change Management," p. 536.

Suffice it to say that not enough can be said about testing the impact of a change before implementing it into your production portal. This means first testing potential changes in your technical sandbox, or test system, or elsewhere; see Figure 23.4 for a simple three-system landscape "Promote to Production" change management process used by many organizations running SPS. Successful testing implies that the solution stack remains stable, and therefore testing is critical across the board. Only once we know we have a stable production-ready foundation (a sound solution stack) can we proceed into more detailed troubleshooting and analysis.
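The promotion rule itself is easy to state in code: a change may reach a given system only after it has tested stable on every system before it. The Python sketch below assumes a three-system landscape named sandbox, test, and production, in that order; the names and the two helper functions are illustrative assumptions, not part of any SharePoint Portal Server tooling.

```python
# Assumed promotion order for a three-system landscape.
LANDSCAPE = ["sandbox", "test", "production"]


def can_apply(system, tested_ok):
    """A change may be applied to a system only if it tested stable
    on every earlier system in the landscape."""
    position = LANDSCAPE.index(system)
    return all(earlier in tested_ok for earlier in LANDSCAPE[:position])


def next_system(tested_ok):
    """Return the next system the change should be promoted to,
    or None once it has tested stable everywhere."""
    for system in LANDSCAPE:
        if system not in tested_ok:
            return system
    return None


# A Service Pack that has so far only been proven in the sandbox.
tested_ok = ["sandbox"]
print("Next system:", next_system(tested_ok))
print("Ready for production?", can_apply("production", tested_ok))
```

When troubleshooting, the same check can be run in reverse: for each recent change found on the production server, ask whether there is any record of it having passed through the earlier systems first.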

Figure 23.4. This simple three-system landscape illustrates the "Promote to Production" change management process used by many organizations that have deployed SharePoint Portal Server.


Verify Integration Points

Another common problem area in a SharePoint Portal Server installation is the "touch points," or integration points, that many installations require simply to provide a business solution. That is, touch points from SharePoint Portal Server to Exchange Server 2000 public folders, SQL Server 2000 or Lotus Notes databases, network file servers, proxy servers, other Web servers, and more are often required to actually create a productive SPS solution, one that solves the business problems for which SPS is being implemented. No SharePoint Portal Server is an island.

Yet another layer of integration exists within the local SharePoint installation, too: SharePoint resides on top of the operating system and stores data in a Web-based database, for example. Additionally, a particular solution may have been architected to provide greater server availability or redundancy: two servers may be providing substantially the same services under the guise of one SharePoint solution. Similarly, servers dedicated to specific functions may exist in the environment, tied together both logically at an application layer, and physically from a network infrastructure perspective. An example might be a heavy-duty enterprise search solution, where multiple search servers and multiple crawl servers are configured for maximum performance, availability, and high levels of scalability, yet represent a single productive system.

Using the proxycfg.exe Tool

If you wish to use SharePoint Portal Server with fully qualified domain names, or FQDNs, you must run the proxycfg.exe tool. The tool is available in the SharePoint Portal Server \Bin directory. It is used to specify which addresses, such as an SPS server, are accessed without going through the proxy server. Simply type proxycfg at a command prompt to view the current proxy settings.

To configure the proxy settings, type

 proxycfg -d -p proxy_name:port_number "bypass_address"

where proxy_name is the name of your proxy server, and port_number is the port number used on the proxy server.

bypass_address is a mask for all of the addresses not using the specified proxy server. The bypass address is of the form *domain, such as *microsoft.com. You can use <local> as the bypass address to specify all intranet addresses. You can separate multiple bypass addresses with a semicolon.

If you are using fully qualified domain names (FQDNs) on your intranet, do not use <local>. Instead, you must use the form *domain to specify each address that the dashboard site accesses without going through the proxy server.
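To show how the pieces of the command line fit together, the following Python sketch assembles the proxycfg command string from a proxy name, a port, and a list of bypass addresses. It assumes the switches are -d and -p, and it only builds and prints the text of the command; it does not run proxycfg. The function name, the uses_fqdn flag, and the sample proxy and domain names are hypothetical.

```python
def build_proxycfg_command(proxy_name, port_number, bypass_addresses,
                           uses_fqdn=False):
    """Build the proxycfg command line as a string (assumes -d and -p switches).

    bypass_addresses is a list of masks such as "*example.com" or "<local>".
    """
    if uses_fqdn and "<local>" in bypass_addresses:
        # With FQDNs on the intranet, each address must be listed as *domain.
        raise ValueError("Do not use <local> with FQDNs; list each *domain instead")
    # Multiple bypass addresses are separated with semicolons.
    bypass = ";".join(bypass_addresses)
    return f'proxycfg -d -p {proxy_name}:{port_number} "{bypass}"'


# Hypothetical proxy server and intranet domains.
command = build_proxycfg_command(
    "proxy01", 80, ["*example.com", "*example.net"], uses_fqdn=True)
print(command)
```

Generating the command this way makes it harder to forget the quotation marks around the bypass list or to mix <local> into an FQDN-based configuration.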
