Lesson 2: Troubleshooting a Network

One of the key elements of troubleshooting a network problem is having a plan of action. Many of the trouble calls you will receive are likely to be user issues involving things like the improper use of software. When you're faced with what appears to be a real problem, you should follow a set troubleshooting procedure, which should consist of a series of steps like the following:

  1. Establish the symptoms.
  2. Identify the affected area.
  3. Establish what has changed.
  4. Select the most probable cause.
  5. Implement a solution.
  6. Test the results.
  7. Recognize the potential effects of the solution.
  8. Document the solution.

The steps you follow can be slightly different, or you can perform the steps in a slightly different order, but the overall process should be similar. The following sections examine each of these steps.

After this lesson, you will be able to

  • Understand the steps involved in troubleshooting a network problem
  • List the rules for prioritizing problem calls
  • Describe the process of isolating the source of a network problem

Estimated lesson time: 15 minutes

Establishing the Symptoms

The first step in troubleshooting a network problem is to determine exactly what is going wrong, and to note the effect of the problem on the network so that you can assign a priority to the problem. In a large network environment, there are often many more calls for support than the network support staff can handle at any one time. Therefore, it is essential to establish a system of priorities that dictates which calls get addressed first. As in the emergency department of a hospital, the priorities should not necessarily be based on who is first in line. More often, it is the severity of the problem that determines who gets attention first, although it is usually not wise to ignore the political reality that senior management problems get addressed before those of the rank and file.

The following rules can help you to establish priorities:

  • Shared resources take precedence over individual resources. A problem with a server or other network component that prevents many users from working must take precedence over one that affects only a single user.
  • Network-wide problems take precedence over workgroup or departmental problems. Resources that provide services to the entire network, such as e-mail servers, should be considered before departmental resources, such as file and print servers.
  • Rate departmental issues according to the function of the department. Problems with resources belonging to a department that is critical to the organization, such as order entry or customer service call centers, should take precedence over problems in departments that can better tolerate a period of downtime, such as research and development.
  • System-wide problems take precedence over application problems. A problem that puts an entire computer out of commission, preventing a user from getting any work done, should take precedence over a problem a user is experiencing with a single device or application.
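The rules above can be sketched as a sort key for incoming calls. This is only an illustration: the Call class, its field names, and the sample calls are hypothetical, and treating the four rules as a strict ranking (each rule outranking the ones after it) is an assumption, since the lesson does not say how to weigh one rule against another.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Hypothetical record of one support call; all field names are assumptions."""
    description: str
    shared_resource: bool      # affects a shared resource, not just one user's
    network_wide: bool         # affects the entire network, not one department
    critical_department: bool  # belongs to a department critical to the organization
    system_wide: bool          # takes down a whole computer, not one application


def priority_key(call):
    # Tuples compare element by element, so earlier rules dominate later ones.
    # That strict ranking is an assumption made for this sketch.
    return (call.shared_resource, call.network_wide,
            call.critical_department, call.system_wide)


def prioritize(calls):
    # True sorts above False, so reverse=True puts the most urgent call first.
    return sorted(calls, key=priority_key, reverse=True)


calls = [
    Call("One user's word processor crashes", False, False, False, False),
    Call("Departmental print server is down", True, False, False, True),
    Call("E-mail server is down", True, True, False, True),
]
for call in prioritize(calls):
    print(call.description)
```

In practice, a support team would probably adjust the ranking to match its own organization, including the political considerations mentioned earlier.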

Sometimes it's difficult to determine the exact nature of the problem from the story told by a relatively inexperienced user, but part of the process of narrowing down the cause of a particular problem involves obtaining accurate information about what has occurred. Users are often vague about what they were doing when they experienced the problem, or even what the indications of the problem were. For example, in many cases, users call the help desk because they received an error message, but they neglect to write down the wording of the message. Gentle training of users in the proper procedures for documenting and reporting problems is part of the network technician's job as well. It might not be any help to you now, but it can help you the next time a user receives an error.

For now, you can begin by asking questions like the following:

  • What exactly were you doing when the problem occurred?
  • Have you had any other problems?
  • Was the computer behaving normally just before the problem occurred?
  • Has any hardware or software been installed, removed, or reconfigured recently?
  • Did you (or anyone else) do anything to try to resolve the problem?

Identifying the Affected Area

The next step in assessing the nature of the problem is to see if it can be duplicated. Network problems that you can easily duplicate are far easier to fix, primarily because you can easily test to see if your solution was successful. However, there are many types of network problems that are intermittent, or that might occur for only a short period of time. In these cases, you might have to leave the incident open until the problem occurs again. In some instances, having the user reproduce the problem can lead to the solution. User error is a common cause of problems that can seem to be hardware- or network-related to the inexperienced eye.

When you've determined that the problem can be duplicated, you can set about determining the actual source of the difficulty. If, for example, a user has trouble opening a file in a word processing application, the difficulty might lie in the application, in the user's computer, in the file server where the file is stored, or in any of the networking components in between. The process of isolating the location of the problem consists of eliminating the elements that are not the cause, in a logical and methodical manner.

If it's possible to duplicate the problem, you can begin to isolate the cause by reproducing the conditions under which the problem occurred, using a procedure like the following:

  1. Have the user reproduce the problem on the computer repeatedly, so that you can determine whether the user's actions are triggering the error.
  2. Sit at the computer yourself and perform the same task. If the problem does not occur, the cause might be in how the user is performing a particular task. Check the user's procedures carefully to see if he or she is doing something wrong. It's entirely possible that you and the user perform the same task in different ways, and that the user's method is exposing a problem that yours doesn't.
  3. If the problem reoccurs when you perform the task, log off from the user's account, log on using an account with administrative privileges, and repeat the task. If the problem does not reoccur, it is probably the result of the user not having the rights or permissions needed to perform the task.
  4. If the problem reoccurs, try to perform the same task on another, similarly equipped computer connected to the same network. If you can't reproduce the problem on another computer, you know that the cause lies in the user's computer or its connection to the network. If the problem does reoccur on another computer, you know that you're dealing with a network problem, either in the server that the computer was communicating with or the hardware that connects the two.
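The procedure above amounts to a short chain of eliminations, which can be sketched as a single function. The function name, parameter names, and returned strings are hypothetical; each argument simply records whether the problem reappeared at the corresponding step.

```python
def isolate(user_reproduces, technician_reproduces,
            admin_reproduces, other_computer_reproduces):
    """Return the probable location of the problem from four yes/no results."""
    if not user_reproduces:
        # The problem cannot be duplicated at this time.
        return "not reproducible: leave the incident open"
    if not technician_reproduces:
        # Same computer and account, different operator.
        return "the user's procedure"
    if not admin_reproduces:
        # Same computer, account with administrative privileges.
        return "the user's rights or permissions"
    if not other_computer_reproduces:
        # Same task on a similarly equipped computer on the same network.
        return "the user's computer or its network connection"
    return "the network: the server or the connecting hardware"


# The problem follows the user's account but disappears for an administrator.
print(isolate(True, True, False, False))
```

Each test is only meaningful if the previous one reproduced the problem, which is why the function stops at the first step where the problem disappears.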

If you determine that the problem lies somewhere in the network and not in the user's computer, you can then begin the process of isolating the area of the network that is the source of the problem. For example, if you are able to reproduce the problem on another nearby computer, you can then begin performing the same task on computers located elsewhere on the network. Again, proceed methodically and document your results. For example, you can try to reproduce the problem on another computer connected to the same hub, and then on a computer connected to a different hub on the same LAN. If the problem occurs throughout the LAN, try a computer on a different LAN. Eventually, you should be able to narrow down the source of the problem to a particular component, such as a server, router, hub, or cable.
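One way to keep track of this outward search is to record the result at each location and find the widest scope in which the problem still appears. The following is a minimal sketch; the scope labels and the function name are hypothetical, and interpreting the result (deciding which server, router, hub, or cable to suspect) is still left to the technician.

```python
# Test locations ordered from nearest to farthest from the original computer.
SCOPES = ["same hub", "different hub on the same LAN", "different LAN"]


def widest_affected_scope(results):
    """results maps a scope name to True if the problem was reproduced there.

    Returns the farthest scope reached before the problem stopped appearing,
    or None if it was not reproduced even on the same hub.
    """
    widest = None
    for scope in SCOPES:
        if not results.get(scope, False):
            break  # the problem does not extend this far
        widest = scope
    return widest


results = {
    "same hub": True,
    "different hub on the same LAN": True,
    "different LAN": False,
}
print(widest_affected_scope(results))
```

Keeping the results in a structure like this also serves as the documentation of what has already been tested.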

Establishing What Has Changed

When a computer or other network component that used to work properly now does not, it stands to reason that some change has occurred. When a user reports a problem, one of the most important pieces of information the network troubleshooter can gather is how the computing environment changed immediately prior to the malfunction. Unfortunately, getting this information from the user can often be difficult. The response to the question "Has anything changed on the computer recently?" is nearly always "No," and it's only some time later that the user remembers to mention that a major hardware or software upgrade was performed just prior to the problem occurrence. On a network with properly established maintenance and documentation procedures, it should be possible to determine if any upgrades or modifications to the user's computer have been made recently. Official records are the first place you should look for information like this.

Major changes, such as the installation of new hardware or software, are obvious possible causes of the problem that is occurring, but the network troubleshooter must be conscious of causes evidenced in more subtle changes as well. An increase in network traffic levels, for example, as disclosed by a protocol analyzer, can be a contributing cause of a reduction in network performance. Occasional problems noticed by several users of the same application, cable segment, or LAN can indicate the existence of a fault in a component of the network. Tracking down the source of a networking problem can often be a form of detective work, and learning to "interrogate" your "suspects" properly can be an important part of the troubleshooting process.

For more information about error messages and other indicators used to troubleshoot network problems, see Lesson 2: Logs and Indicators, in Chapter 18, "Network Troubleshooting Tools."

Selecting the Most Probable Cause

There's an old medical school axiom that says when you hear hoofbeats, think horses, not zebras. In the context of network troubleshooting, this means that when you look for possible causes of a problem, start with the obvious first. For example, if a workstation is unable to communicate with a file server, don't start by checking the routers between the two systems; check the simple things on the workstation first, such as whether the network cable is plugged into the computer. The other important part of the process is to work methodically and document everything you check, so that you don't duplicate your efforts.
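The idea of checking the obvious first while documenting every check can be sketched as a loop over an ordered list of checks. Everything here is hypothetical: the function name, the check descriptions, and the stand-in lambdas, which in real use would be replaced by whatever tests or observations the technician actually performs.

```python
def first_failing_check(checks):
    """Run checks in order, simplest first, and stop at the first failure.

    checks is a list of (description, function) pairs; each function returns
    True when the item it examines is working. Returns the description of the
    first failing check (or None) together with a log of everything checked.
    """
    log = []
    for description, check in checks:
        ok = check()
        log.append((description, ok))  # record every check to avoid repeating it
        if not ok:
            return description, log
    return None, log


# Stand-in results for illustration only.
checks = [
    ("network cable plugged into the computer", lambda: True),
    ("workstation network configuration", lambda: False),
    ("routers between workstation and server", lambda: True),
]
failed, log = first_failing_check(checks)
print(failed)
for description, ok in log:
    print(description, "OK" if ok else "FAILED")
```

Because the loop stops at the first failure, the routers in this example are never examined, which is the point of the axiom.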

Implementing a Solution

After you have isolated the problem to a particular piece of equipment, you can proceed to try to determine if it is caused by hardware or software. If it's a hardware problem, you might then proceed by replacing the unit that is at fault or by using an alternate. Communication problems, for example, might force you to try replacing network cables until you find one that is faulty. If the problem is in a server, you might need to replace components, such as hard drives, until you find the culprit. If you determine that the problem is caused by software, you might want to try running an application or storing data on a different computer, or reinstalling the software on the offending system.

In some cases, the process of isolating the source of a problem includes the resolution of the problem. If, for example, you end up replacing network patch cables until you find the one that is faulty, replacing the bad cable is the resolution of the problem. In other cases, however, the resolution might be more involved, such as having to reinstall a server application or operating system. Because other users might need to access that server, you might have to defer the resolution of the problem until a later time, when the network is not in use and after you've backed up the data stored on the server. In some cases, you might even have to bring in outside help, such as a contractor to pull new cables. This can require careful scheduling to avoid having the contractor's work conflict with your activities and those of your users. Sometimes, you might want to provide an interim solution, such as a substitute workstation or server, until you can definitively resolve the problem.

Testing the Results

When you have implemented your resolution to the problem, you should return to the very beginning of the process and repeat the task that originally caused the problem. If the problem no longer occurs, you should test the other functions related to the changes you've made to ensure that in fixing one problem, you haven't created another. It is at this point that the time you spent documenting the troubleshooting process becomes worthwhile. You should repeat the procedures you used to duplicate the problem exactly, to ensure that the problem the user originally experienced has been completely eliminated, and not just temporarily masked. If the problem was intermittent to begin with, it may take some time to ascertain if your solution has been effective. You might need to check with the user several times to make sure that the problem is not reoccurring.

Recognizing the Potential Effects of the Solution

It is important, throughout the troubleshooting process, to keep an eye on the big network picture, and not let yourself become too involved in the problems experienced by one user (or application, or LAN). It is sometimes possible, while implementing a solution to one problem, to create another that is more severe or that affects more users. For example, if users on one LAN are experiencing high traffic levels that diminish their workstation performance, you might be able to remedy the problem by connecting some of their computers to a different LAN. However, although this solution might help the users originally experiencing the problem, you might overload another LAN in the process, causing another problem that is more severe than the first one. You might want to consider a more far-reaching solution instead, such as creating an entirely new LAN and moving some of the affected users over to it.
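The LAN example above comes down to simple arithmetic: before moving computers, estimate what the move does to the LAN that receives them. The sketch below is only illustrative. The function names, the traffic figures, and the 50 percent ceiling are all assumptions chosen for the example, not recommended values; real figures would come from measurements on your own network.

```python
def utilization_after_move(target_load_mbps, moved_load_mbps, capacity_mbps):
    """Estimated utilization of the receiving LAN once the computers are moved."""
    return (target_load_mbps + moved_load_mbps) / capacity_mbps


def move_is_safe(target_load_mbps, moved_load_mbps, capacity_mbps, ceiling=0.5):
    # ceiling is a hypothetical acceptable utilization, not a standard figure.
    estimate = utilization_after_move(target_load_mbps, moved_load_mbps,
                                      capacity_mbps)
    return estimate <= ceiling


# Hypothetical figures: the receiving LAN carries 3.0 Mbps on a 10 Mbps
# network, and the computers being moved generate about 2.5 Mbps.
print(move_is_safe(3.0, 2.5, 10.0))
```

If the check fails, the more far-reaching option of creating an entirely new LAN for some of the affected users may be the better choice.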

Documenting the Solution

Although it is presented here as a separate step, the process of documenting your actions should begin as soon as the user calls for help. A well-organized network support organization should have a system in place in which each problem call is registered as a trouble ticket that eventually contains a complete record of the problem and the steps taken to isolate and resolve it. In many cases, a technical support organization operates using tiers, which are groups of technicians of different skill levels. Calls come in to the first tier, and if the problem is sufficiently complex or the first-tier technician is unable to resolve it, the call is escalated to the second tier, which is composed of senior technicians. As long as everyone involved in the process documents his or her activities, there should be no problem when one technician hands off the ticket to another. In addition, keeping careful notes prevents people from duplicating each other's efforts.
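A trouble ticket of the kind described above can be sketched as a small record that accumulates notes as it moves between tiers. The class, its fields, and its methods are hypothetical and do not represent any particular help desk product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TroubleTicket:
    """Hypothetical trouble ticket; every name here is an assumption."""
    user: str
    symptoms: str
    tier: int = 1                               # calls come in to the first tier
    notes: list = field(default_factory=list)   # complete record of actions taken
    resolved: bool = False

    def log(self, technician, action):
        # Every action is timestamped and attributed to a technician.
        self.notes.append((datetime.now(), technician, action))

    def escalate(self, technician, reason):
        # Record the handoff before moving the ticket to the next tier.
        self.log(technician, f"Escalated to tier {self.tier + 1}: {reason}")
        self.tier += 1

    def resolve(self, technician, solution):
        self.log(technician, f"Resolved: {solution}")
        self.resolved = True


ticket = TroubleTicket("jsmith", "Cannot open files stored on the file server")
ticket.log("first-tier tech", "Reproduced the problem at the user's computer")
ticket.escalate("first-tier tech", "Problem also occurs on other computers")
ticket.resolve("senior tech", "Replaced a faulty patch cable at the server")
for when, who, what in ticket.notes:
    print(who, "-", what)
```

Because the escalation itself is written to the notes, the second-tier technician receives the full history and does not have to repeat work already done.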

The final phase of the troubleshooting process is to explain to the user what happened and why. Of course, the average network user is probably not interested in hearing all the technical details, but it's a good idea to let users know whether their actions caused the problem, exacerbated it, or made it more difficult to resolve. This gradual education of the network's users can lead to a quicker resolution next time, or even prevent a problem from occurring altogether.

Exercise 1: Network Troubleshooting

Place the following steps of the problem isolation process in the proper logical order:

  1. Reproduce the problem using a different computer.
  2. Reproduce the problem yourself.
  3. Have the user reproduce the problem.
  4. Reproduce the problem using a different user account.

Lesson Review

  1. Which of the following problems would you assign the highest priority for your network support team? Explain why.
    1. The printer in the order entry department isn't working.
    2. The corporate e-mail server is down.
    3. A hub is malfunctioning in the sales department.
    4. The president of the company's workstation is locked up.
  2. In a two-tiered network support system, what do the tiers refer to?
    1. File servers storing network documentation
    2. Priorities for trouble tickets
    3. Problem call databases
    4. Technicians of different skill levels
  3. How does a UPS protect a network?

Lesson Summary

  • The process of troubleshooting a network should proceed through several steps, including identifying, duplicating, isolating, resolving, and documenting the problem.
  • Isolating a network problem is a matter of eliminating hardware and software components that are not possible causes.
  • Maintaining a carefully documented and methodically applied troubleshooting process is an essential part of maintaining a network.

Network+ Certification Training Kit
Self-Paced Training Kit Exam 70-642: Configuring Windows Server 2008 Network Infrastructure
ISBN: 0735651604
Year: 2001
Pages: 105
