Regardless of the problem, effective network troubleshooting follows some specific troubleshooting steps. These steps provide a framework in which to perform the troubleshooting process and, when followed, can reduce the time it takes to isolate and fix a problem. The following sections discuss the common troubleshooting steps and procedures as identified by the CompTIA Network+ objectives. CompTIA lists the troubleshooting steps as follows:
Identify the Symptoms and Potential CausesThe first step in the troubleshooting process is to establish exactly what the symptoms of the problem are. This stage of the troubleshooting process is all about information gathering. To get this information, we need a knowledge of the operating system used, good communication skills, and a little patience. It is very important to get as much information as possible about the problem. You can glean information from three key sources: the computer (in the form of logs and error messages), the computer user experiencing the problem, and your own observation. Once you have identified the symptoms, you can begin to formulate some of the potential causes of those symptoms.
Identifying the Affected AreaSome computer problems are isolated to a single user in a single location; others affect several thousand users spanning multiple locations. Establishing the affected area is an important part of the troubleshooting process, and it will often dictate the strategies you use in resolving the problem.
Problems that affect many users are often connectivity issues that disable access for many users. Such problems can often be isolated to wiring closets, network devices, and server rooms. The troubleshooting process for problems that are isolated to a single user will often begin and end at that user's workstation. The trail might indeed lead you to the wiring closet or server, but that is not likely where the troubleshooting process would begin. Understanding who is affected by a problem can provide you with the first clues about where the problem exists. Establishing What Has ChangedWhether there is a problem with a workstation's access to a database or an entire network, keep in mind that they were working at some point. Although many claim that the "computer just stopped working," it is unlikely. Far more likely is that there have been changes to the system or the network that caused the problem. Look for newly installed applications, applied patches or updates, new hardware, a physical move of the computer, or a new username and password. Establishing any recent changes to a system will often lead you in the right direction to isolate and troubleshoot a problem.
Selecting the Most Probable Cause of the ProblemThere can be many different causes for a single problem on a network, but with appropriate information gathering, it is possible to eliminate many of them. When looking for a probable cause, it is often best to look at the easiest solution first and then work from there. Even in the most complex of network designs, the easiest solution is often the right one. For instance, if a single user cannot log on to a network, it is best to confirm network settings before replacing the NIC. Remember, though, that at this point, you are only trying to determine the most probable cause, and your first guess might, in fact, be incorrect. It might take a few tries to determine the correct cause of the problem. Implement an Action Plan and Solution Including Potential EffectsAfter identifying a cause, but before implementing a solution, you should develop a plan for the solution. This is particularly a concern for server systems in which taking the server offline is a difficult and undesirable prospect. After identifying the cause of a problem on the server, it is absolutely necessary to plan for the solution. The plan must include details around when the server or network should be taken offline and for how long, what support services are in place, and who will be involved in correcting the problem. Planning is a very important part of the whole troubleshooting process and can involve formal or informal written procedures. Those who do not have experience troubleshooting servers might be wondering about all the formality, but this attention to detail ensures the least amount of network or server downtime and the maximum data availability. With the plan in place, you should be ready to implement a solutionthat is, apply the patch, replace the hardware, plug in a cable, or implement some other solution. In an ideal world, your first solution would fix the problem, although unfortunately this is not always the case. If your first solution does not fix the problem, you will need to retrace your steps and start again. It is important that you attempt only one solution at a time. Trying several solutions at once can make it very unclear which one actually corrected the problem.
Testing the ResultsAfter the corrective change has been made to the server, network, or workstation, it is necessary to test the resultsnever assume. This is when you find out if you were right and the remedy you applied actually worked. Don't forget that first impressions can be deceiving, and a fix that seems to work on first inspection might not actually have corrected the problem. The testing process is not always as easy as it sounds. If you are testing a connectivity problem, it is not difficult to ascertain whether your solution was successful. However, changes made to an application or to databases you are unfamiliar with are much more difficult to test. It might be necessary to have people who are familiar with the database or application run the tests with you in attendance. Identify the Results and Effects of the SolutionSometimes, you will apply a fix that corrects one problem but creates another problem. Many such circumstances are hard to predictbut not always. For instance, you might add a new network application, but the application requires more bandwidth than your current network infrastructure can support. The result would be that overall network performance would be compromised. Everything done to a network can have a ripple effect and negatively affect another area of the network. Actions such as adding clients, replacing hubs, and adding applications can all have unforeseen results. It is very difficult to always know how the changes you make to a network are going to affect the network's functioning. The safest thing to do is assume that the changes you make are going to affect the network in some way and realize that you just have to figure out how. This is when you might need to think outside the box and try to predict possible outcomes. Documenting the SolutionAlthough it is often neglected in the troubleshooting process, documentation is as important as any of the other troubleshooting procedures. Documenting a solution involves keeping a record of all the steps taken during the fixnot necessarily just the solution. For the documentation to be of use to other network administrators in the future, it must include several key pieces of information. When documenting a procedure, you should include the following information:
|