Problem-Solving Techniques


After you've got a well-documented network, all you have to do is sit back and wait for problems to occur. Flippant as that may sound, it's true. Sooner or later, when you least expect it, something out of the blue will knock a server offline, disable a printer, and so on. If you have good documentation, you can tackle the problem from a structured point of view.

The troubleshooting method known as the problem resolution cycle builds on accurate documentation for the network and uses a simple question/answer technique to determine what has changed to bring about the problem.

The Problem Resolution Cycle

The problem resolution cycle is a method designed to meet two needs: to solve the immediate problem that prevents the network (or a component of the network) from working, and to provide insights as to the cause of the problem so that it can be avoided or quickly solved in the future. The elements of a structured problem resolution cycle approach are as listed here:

  • Accurate and complete descriptions of the symptoms. Determine whether a problem really exists, or whether the user is using the computer or application improperly.

  • Understanding how the network functions from a logical and physical point of view.

  • Solving the problem instead of creating a makeshift fix.

  • Providing a follow-up mechanism for recording and distributing solutions to others who may have a need to know, such as staff at a help desk or a departmental supervisor.

  • Development of a solution-tracking system to keep you from having to solve the same problem over and over again.

In most cases, the more data you can collect about a problem, the easier the problem will be to solve. When selecting employees who will serve as help-desk personnel, for example, try to get someone with both good verbal and good listening skills. Although the initial problem report might be something like "I can't print this document," a good help-desk technician can usually walk the user through a series of questions to determine whether other symptoms are present. In the example just given, it would be prudent to ask whether the user can print other documents, or whether the problem is with just the one document. What about different types of documents? If the user can print a spreadsheet but not a word processor document, the problem may be with the application. Another good question would be to ask whether any other users of the printer are having a problem.

As you gather more data, you can focus your troubleshooting efforts on the local user PC or the printer. If the user can't print anything but no one else is having a problem, you can begin to troubleshoot the printer configuration (has the user made changes you are unaware of?). Or perhaps the user has lost network connectivity and it's a simple matter to try to ping the computer. You can use utilities such as ping or tracert to determine whether connectivity exists between the user and the printer or print server. After that, you could start investigating to be sure that the correct print driver is installed, and so on.

Utilities such as ping and tracert are covered in Chapter 28, "Troubleshooting Tools for TCP/IP Networks."
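
As a rough illustration of the connectivity check described above, here is a minimal Python sketch that wraps the operating system's ping utility. The function names and the default count and timeout are assumptions made for this example. The only ping options used are the echo-count flags (-n on Windows, -c on Unix-like systems).

```python
import platform
import subprocess


def build_ping_command(host, count=2, system=None):
    """Return the ping argument list for the given operating system."""
    system = system or platform.system()
    # Windows ping uses -n for the echo count; Unix-like systems use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def is_reachable(host, count=2, timeout=10):
    """Return True if ping exits with status 0, False otherwise."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing, could not be started, or took too long.
        return False
    return result.returncode == 0
```

You might call is_reachable() first with the user's computer and then with the print server to see which side of the path has lost connectivity. Treat an exit status of 0 as a hint rather than proof, because some ping implementations report success for replies that are not from the target host.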


This brings up the network maps mentioned earlier in this chapter. You can quickly locate what hub, switch, or other network device the user's computer is attached to by using a physical map of the network. Using a logical map, you can find other users or computers that make use of the same information flow through the network.
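
If you keep the physical map in machine-readable form, the lookup just described takes only a few lines. The sketch below assumes a simple Python dictionary that maps each computer to the hub or switch it is attached to. All host and device names are hypothetical.

```python
# Hypothetical physical map: each computer and the device it plugs into.
PHYSICAL_MAP = {
    "pc-accounting-01": "switch-2f-a",
    "pc-accounting-02": "switch-2f-a",
    "pc-sales-01": "switch-3f-b",
    "printer-2f": "switch-2f-a",
}


def attached_device(host, physical_map=PHYSICAL_MAP):
    """Return the hub or switch the host is attached to, or None if unmapped."""
    return physical_map.get(host)


def neighbors(host, physical_map=PHYSICAL_MAP):
    """Return the other hosts attached to the same device, sorted by name."""
    device = physical_map.get(host)
    if device is None:
        return []
    return sorted(h for h, d in physical_map.items() if d == device and h != host)
```

Asking the neighbors whether they have the same symptom is a quick way to tell a single-PC problem from a problem with the shared hub or switch.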

Sometimes things just fix themselves. For example, it may be that the user could not print because a router standing between the user and the printer was overloaded temporarily and was not able to route packets from the user's network segment to the printer. In these situations, don't let sleeping dogs lie. Instead, keep investigating (using your network maps) and try to determine what caused the problem. You can use performance and capacity reporting techniques for servers and network devices. In the next chapter we'll talk about the Simple Network Management Protocol (SNMP) and Remote Monitoring (RMON), which enable you to gather statistical information about network devices. Find out what caused a problem so that you can anticipate when it might happen again, and try to take measures to prevent it.

Keep track of all incidents in an orderly fashion, and make the information known to others who might encounter the same problem. A help desk should have a log of some sort so that every problem called in to the help desk is tracked from the time the call is placed until the problem is solved and the call is closed. Provide feedback to the user about how the problem was solved. This is especially important when you have problems that are self-induced, such as when users try to change the configuration of their computer although they know only enough to be dangerous to themselves!

Don't repeat past mistakes. By tracking problems and recording the troubleshooting effort and the solution to the problem, you make it easier to solve the same, or similar, problems in the future. Your help desk should have a database of some sort (such as a spreadsheet, or perhaps a Web site with documentation linked via HTML code) that can be used to see whether a problem with similar symptoms has been called in before.
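
To make the idea concrete, here is a minimal Python sketch of a help-desk log that tracks each call from open to close and can be searched by symptom. The Incident fields and the keyword-matching rule are assumptions chosen for illustration. A real help desk would more likely use a ticketing product or a database.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Incident:
    reporter: str
    symptoms: str
    opened: datetime = field(default_factory=datetime.now)
    closed: Optional[datetime] = None
    solution: str = ""

    def close(self, solution):
        """Record the solution and the time the call was closed."""
        self.solution = solution
        self.closed = datetime.now()


def find_similar(log: List[Incident], keywords):
    """Return closed incidents whose symptoms mention every keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        inc for inc in log
        if inc.closed is not None
        and all(k in inc.symptoms.lower() for k in wanted)
    ]
```

When a new call comes in, the technician can pass a few words from the user's description to find_similar() and read the recorded solutions before starting from scratch.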

Is There Really a Problem?

Sometimes, as noted in the preceding section, problems just fix themselves. There are times when you can't ever find the reason for a particular problem. In many cases, you'll find that sporadic problems are caused not by equipment or software failure, but by users who are not using the system correctly. When any new application is deployed on a network, you need to be sure that the end users receive adequate training for using the application or else you may find that user errors begin to account for many of your help-desk calls. For example, a user may have corrupted files on a hard disk. Should you replace the disk? Should you search for a virus or another harmful program? These sound like logical things to do.

Or you could simply ask whether the user is properly shutting down the computer or just "power cycling" it when he gets stuck in an application and can't find a way out. Some people find that just turning a computer off and back on again is a fine way to start anew, without realizing the problems they may encounter down the line. So, when troubleshooting, try to find out what has led up to the problem. It may be a simple case of user training that needs to be addressed.

I can't stress enough the importance of training new users in the workings of the environment in which they will be placed. If you have configured a desktop in a certain manner, you can't assume that a new employee will be able to make proper use of it. Although it's easy to check someone's résumé to determine what applications they are skilled at using, it's difficult to be sure what the configuration of the application was at their previous place of employment. The same goes for training classes offered by temp agencies and other similar organizations. Although they may have used a standard installation for training purposes, any customizations or configuration changes you make need to be explained to the new user. So, as a general rule, no matter how qualified a new employee may appear to be, it's just an appearance. You should have in place a structured training program and require each new employee to attend, or at least initiate a mentoring system so that one user can teach another.

Tip

Remember that training doesn't stop at new hire orientation. As the network, applications, and so on change over time, retraining should also be a requirement.

Has This Happened Before? What Is the Procedure to Follow?

Keeping track of how problems were solved will keep you from expending a lot of effort solving the same problem again and again. Using documentation that enables a quick lookup of information based on symptoms can help you find older problem reports or perhaps standard help-desk documentation that was written specifically because a particular problem frequently occurs. Indeed, when a problem does occur frequently, it's time to find a better solution to the problem. So by tracking problems and the methods used to troubleshoot and solve them, you not only make it easier to solve the current incident, but also create a feedback mechanism that tells you when a particular problem needs a better long-term solution.
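
One simple way to get that feedback is to count how often each kind of problem is logged. The Python sketch below assumes that every closed call has been given a short category label. The labels and the threshold of three calls are hypothetical and should be tuned to your own help desk.

```python
from collections import Counter


def recurring_problems(categories, threshold=3):
    """Return (category, count) pairs seen at least `threshold` times,
    most frequent first."""
    counts = Counter(categories)
    return [(cat, n) for cat, n in counts.most_common() if n >= threshold]
```

Anything this function returns is a candidate for a permanent fix, or at least for a written step-by-step procedure.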

For problems that occur on a frequent basis, but that you don't have a lot of control over (such as a user causing errors by not using an application or the network in the appropriate manner), you can at least create a step-by-step outline for solving the problem to make life at the help desk a little less frustrating.

First Things First: The Process of Elimination

If you understand how your network is put together, from both a logical and a physical point of view, then it is possible to use the process of elimination to narrow the focus of your troubleshooting efforts. Some things to think about when trying to pinpoint the cause of a network problem include the following:

  • What devices (computers, hubs, switches, cables, and so on) are involved? Can you use troubleshooting tools to narrow your search to a single device or a subset of the network?

  • If a single computer or device appears to be the only part of the network affected, what is unique about it? If another similar device is up and running, how do the devices differ in their configuration or location in the network?

  • If the problem is occurring on multiple systems, what do they all have in common? Are they all on the same network segment? Do they all share a common subnet address? Do they all use the same path through the network to access a device or service that now appears to be unreachable?

  • What task was the user performing when the problem occurred? Get specifics about exactly what the user was doing, both up to and when the event occurred. For example, was he using more than one application, printing to more than one printer, or perhaps doing something he should not (like opening an attachment from email that came from outside the local network)?

  • Can the problem be reproduced? Walk the user through the same set of steps again and see whether the problem recurs. Next try the same with another user to determine whether the problem is localized to only one computer or is a symptom of a bigger problem or configuration issue.
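
The "what do they all have in common?" question in the list above lends itself to a small script. The following Python sketch assumes that you can list the IPv4 address and a few attributes (switch, gateway, and so on) of each affected system. The attribute names and the /24 prefix length are assumptions made for this example.

```python
import ipaddress


def common_subnets(addresses, prefix_length=24):
    """Return the set of subnets (as strings) that the given IPv4
    addresses fall into, assuming a fixed prefix length."""
    return {
        str(ipaddress.ip_network(f"{addr}/{prefix_length}", strict=False))
        for addr in addresses
    }


def shared_attributes(hosts):
    """Return the attribute/value pairs that every affected host shares."""
    if not hosts:
        return {}
    common = set(hosts[0].items())
    for host in hosts[1:]:
        # Keep only the pairs that this host also has.
        common &= set(host.items())
    return dict(common)
```

If common_subnets() returns a single subnet, or shared_attributes() returns the same switch or gateway for every affected system, that shared component is a sensible place to look first.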

By narrowing your focus to only the section of the network that experiences the problem, you can more quickly look at the computers and other components of that part of the network to solve the problem. By reproducing the problem, you can be sure that you've isolated the cause. Eliminate the obvious ("Is it plugged in?") and get to the specifics as quickly as you can. Actually, silly as it may sound, asking whether a computer is plugged in is really a very good question. More than once I've come in to work to find that a monitor or another device was off. A quick glance at the power strip can indicate that someone, perhaps a housekeeping employee, may have accidentally unplugged the strip, or flipped the switch to turn off the power.

Auditing the Network to Locate Problem Sources

It is important to know how your network operates from a logical and physical point of view. It's also important to know the capacity of the components of the network, and the degree to which they are utilized. Sometimes problems are simply due to congestion on the network. You can detect these problems by using monitoring tools based on SNMP and RMON, and by baselining your network so that you know what the typical usage patterns are. Knowing when components of the network are stressed close to their usable capacity allows you to plan an upgrade to eliminate the bottleneck, or to reschedule user work habits to make more efficient use of the network.
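
As a sketch of what baselining can look like in practice, the Python code below summarizes a set of historical utilization readings and flags a new reading that falls well outside the usual pattern. The three-standard-deviation rule and the 80 percent capacity limit are assumptions chosen for illustration, not recommendations from any standard. How the readings are collected (for example, through SNMP polling) is outside the scope of the sketch.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical utilization samples (in percent)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_abnormal(current, baseline, sigmas=3.0):
    """Return True if the reading is more than `sigmas` standard
    deviations above the baseline mean."""
    return current > baseline["mean"] + sigmas * baseline["stdev"]


def near_capacity(current, limit=80.0):
    """Return True if the reading has reached the assumed capacity limit."""
    return current >= limit
```

A reading that is abnormal but not near capacity suggests a temporary event worth investigating. Readings that are routinely near capacity suggest that it is time to plan an upgrade.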



Upgrading and Repairing Networks (5th Edition)
ISBN: 078973530X
Year: 2003
Pages: 434
