Troubleshooting Strategies


After you observe symptoms, check technical information sources, and review your system's history, you might be ready to test a possible solution based on the information that you have gathered. If you are unable to locate information that applies to your problem, or if you find more than one solution that applies, try to further isolate your problem by grouping observations into different categories such as software-related symptoms (due to a service or application), hardware-related symptoms (by hardware types), and error messages. Prioritize your list by frequency of occurrence and eliminate symptoms that you can attribute to user error. This enables you to methodically plan the diagnostic steps to take, or to select the next solution to try.
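The grouping and prioritizing described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the category names, and the sample symptoms are assumptions made for the example, not part of any Windows tool.

```python
from collections import Counter

# Hypothetical observation records:
# (category, symptom, attributed_to_user_error)
observations = [
    ("hardware", "display flickers", False),
    ("software", "application stops responding", False),
    ("software", "application stops responding", False),
    ("error message", "Stop 0x0000000A", False),
    ("software", "file saved to wrong folder", True),
]

def prioritize(observations):
    """Drop symptoms attributed to user error, then rank the rest by frequency."""
    counts = Counter(
        (category, symptom)
        for category, symptom, user_error in observations
        if not user_error
    )
    # most_common() returns the most frequent symptoms first.
    return counts.most_common()

for (category, symptom), count in prioritize(observations):
    print(f"{count}x  [{category}] {symptom}")
```

The symptom at the top of the ranked list is a reasonable candidate for the first diagnostic step.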

Isolate and Resolve Hardware Problems

When troubleshooting hardware, work toward the simplest configuration possible by disabling or removing devices. Then add or remove devices one at a time until you isolate the problem device. In safe mode, Windows XP Professional starts with only essential drivers, which makes it useful for diagnosing problems. For more information about safe-mode troubleshooting, see Windows XP Professional Help and Support Center, and Tools for Troubleshooting in this book.

Check Your Hardware

If your diagnostic efforts point to a hardware problem, you can run diagnostic software available from the manufacturer. These programs run self-tests that confirm if a piece of hardware has malfunctioned or failed and needs replacing. You can also install the device on different computers to verify that the problem is not due to system-specific configuration issues. Replacing defective hardware and diagnosing problems on a spare or test computer minimizes impact to the user due to the system being unavailable. If diagnostic software shows that the hardware is working, consider upgrading or rolling back device drivers.

Reverse Driver Changes

If a hardware problem causes a Stop error that prevents Windows XP Professional from starting in normal mode, you can use the Last Known Good Configuration startup option. The Last Known Good Configuration enables you to recover from problems by reverting driver and registry settings to those used during the last user session. If you are able to start Windows XP Professional in normal mode after using the Last Known Good Configuration, disable the problem driver or device. Restart the computer to verify that the Stop message does not recur. If the problem persists, repeat this procedure until you isolate the hardware that is causing the problem.

Another method to recover from problems that occur after updating a device driver is by using Device Driver Roll Back in safe or normal mode. If you updated a driver since installing Windows XP Professional, you can roll back the driver to determine if the older driver restores stability. If another driver is not available, disable the device by using Device Manager until you are able to locate an updated driver.

Using Device Manager to disable devices is always preferable to physically removing a part because using Device Manager does not risk damage to internal components. If you cannot disable a device by using Device Manager, uninstall the device driver, turn off the system, remove the part, and restart the computer. If this improves system stability, the part might be causing or contributing to the problem and you need to reconfigure it.

For more information about the Last Known Good Configuration startup option and Device Driver Roll Back, see Windows XP Professional Help and Support Center. Also, see Troubleshooting Startup and Tools for Troubleshooting in this book. For more information about disabling devices and drivers, see Managing Devices in this book. For more information about Stop messages, see Common Stop Messages for Troubleshooting in this book.

Isolate and Resolve Software Issues

If you suspect that a software problem or a recent change to system settings is preventing applications or services from functioning properly, use safe mode to help diagnose the problem. You can also use the Last Known Good Configuration startup option or System Restore to undo changes made by a recently installed application, driver, or service. You can isolate issues by using the following methods.

Closing applications and processes

Close applications one at a time, and then observe the results. A problem might occur only when a specific application is running. You can use Task Manager to end applications that have stopped responding. For more information about ending applications and processes using Task Manager, see Windows XP Professional Help and Support Center.

Temporarily disabling services

By using the Services snap-in (Services.msc) or the System Configuration Utility (Msconfig.exe), you can stop and start most system services. For some services, you might need to restart the computer for changes to take effect. For more information about disabling services by using the Services snap-in and the System Configuration Utility, see Windows XP Professional Help and Support Center and Troubleshooting Startup in this book.

To isolate a service-related problem, you can choose to do the following:

  • Disable services one at a time until the problem disappears. You can then enable all other services to verify that you found the cause of the problem.

  • Disable all non-safe-mode services and then re-enable them one at a time until the problem appears. Use the System Configuration Utility and boot logging to determine the services and drivers initialized in normal and safe mode. You can then disable all non-safe-mode drivers and re-enable them one at a time until the problem returns.
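As a rough illustration of comparing boot logs, the following Python sketch lists drivers that load in normal mode but not in safe mode. It assumes boot-log lines of the form "Loaded driver <path>" and "Did not load driver <path>". The sample log text and the driver names example1.sys and example2.sys are hypothetical.

```python
def loaded_drivers(log_text):
    """Return the set of driver paths reported as loaded in boot-log text."""
    prefix = "Loaded driver "
    drivers = set()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Lowercase the path so the comparison ignores case differences.
            drivers.add(line[len(prefix):].strip().lower())
    return drivers

def non_safe_mode_drivers(normal_log, safe_log):
    """Drivers loaded in normal mode but not in safe mode, sorted by path."""
    return sorted(loaded_drivers(normal_log) - loaded_drivers(safe_log))

# Hypothetical excerpts from a normal-mode and a safe-mode boot log.
NORMAL_LOG = "\n".join([
    r"Loaded driver \WINDOWS\system32\ntoskrnl.exe",
    r"Loaded driver \SystemRoot\System32\Drivers\example1.sys",
    r"Loaded driver \SystemRoot\System32\Drivers\example2.sys",
])
SAFE_LOG = "\n".join([
    r"Loaded driver \WINDOWS\system32\ntoskrnl.exe",
    r"Did not load driver \SystemRoot\System32\Drivers\example1.sys",
    r"Did not load driver \SystemRoot\System32\Drivers\example2.sys",
])

for driver in non_safe_mode_drivers(NORMAL_LOG, SAFE_LOG):
    print(driver)
```

The resulting list gives the candidates to disable and then re-enable one at a time.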

For more information about System Restore, the System Configuration Utility, and boot logging, see Windows XP Professional Help and Support Center and Tools for Troubleshooting in this book. For more information about disabling applications and services while troubleshooting startup problems, see Troubleshooting Startup in this book.

Avoid Common Pitfalls

You can complicate a problem or troubleshooting process unnecessarily by acting too quickly. Avoid the following common pitfalls that can hinder your efforts:

  • Not adequately identifying the problem before taking action

  • Not observing the effects of diagnostic changes

  • Not documenting changes while troubleshooting

  • Not restoring previous settings

  • Troubleshooting several problems at one time

  • Using incompatible or untested hardware

  • Using incompatible software

Not Identifying the Problem Adequately

If you fail to make essential observations before responding, you can miss important information in the critical moments when symptoms first appear. Here are some typical scenarios.

Failing to record information before acting

An error occurs and you start your research without recording important information such as the complete error message text and the applications running. During your research, you check technical information resources but find that you are unable to narrow the scope of your search due to insufficient information.

For more information about the types of information to record during troubleshooting, see Identify Problem Symptoms earlier in this chapter.

Restarting the computer too soon

In response to frequent random errors users experience with a certain application, you restart the affected computers without observing and recording the symptoms. Although users can resume work for the day, a call to technical support later that day is less effective because you cannot reproduce the problem. You must wait for the problem to recur before you can gather critical information needed to determine the root cause. For example, symptoms can be caused by power surges, faulty power supplies, excessive dust, or inadequate ventilation. Restarting the computer might be a temporary solution that does not prevent recurrence.

Failing to check for scheduled maintenance events or known service outages

A user comes to work early and finds that network resources or applications are not responding. You spend time troubleshooting the problem without success only to discover that both you and the user failed to read e-mail announcing that scheduled maintenance would cause temporary early morning outages.

Assuming that past solutions always work

Prior experience can shorten the time to solve a recurring problem because you already know the remedy. However, the same solution might not always solve a problem that looks familiar. Always verify the symptoms before acting. If your initial assumptions are incorrect, and you misdiagnose the problem, your actions might make the situation worse. Keep an open mind when troubleshooting. When in doubt, verify your information by searching technical information sources (including technical support) and obtain advice from experienced colleagues. Do not ignore new information, and be willing to question past procedures that seem inappropriate.

Neglecting to check the basics

A user cannot print to a new local inkjet printer. You verify cable and power connections, check the ink cartridge, and run the printer's built-in diagnostics, but find nothing wrong. Windows XP Professional cannot detect the printer, so you manually install the most recent drivers without success. Reinstalling Windows XP Professional does not solve the problem, and you later realize that you neglected to find out if printing to any local printer from this computer has ever been successful. You find that the user has never tried this, and a firmware check reveals that the parallel port is disabled. Enabling the parallel port resolves all printing problems.

Not Observing the Effects of Diagnostic Changes

System setting changes do not always take effect immediately. For example, when troubleshooting replication issues, you must wait to observe changes. If you do not allow adequate time to pass, you might prematurely conclude that the change was not effective. To avoid this situation, familiarize yourself with the feature that you are troubleshooting and thoroughly read the information provided by technical support before judging the effectiveness of a workaround or update.

Not Documenting Changes while Troubleshooting

Documenting the steps that you take while troubleshooting allows you to review your actions after you have resolved the problem. This is useful for very complex problems that require lengthy procedures to resolve. Documenting your steps allows you to verify that you are not duplicating or skipping steps and enables others to assist you with the problem. It also allows you to identify the exact steps to take if the problem recurs and enables you to evaluate the effectiveness of your efforts.

Not Restoring Previous Settings

If disabling a feature or changing a setting does not produce the results you want, restore the feature or setting before trying something else. For example, record firmware settings before changing them to diagnose problems. Not restoring settings can make it difficult to determine which of your actions resolved the problem. When verifying solutions that require you to make extensive changes or restart the computer multiple times, perform backups before troubleshooting so that you can restore the system if your actions are ineffective or cause startup problems.
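One way to keep changes documented and reversible is to record the previous value every time you change a setting. The sketch below is a minimal Python illustration. The ChangeJournal class and the setting names are hypothetical, and the settings are held in an ordinary dictionary rather than in firmware or the registry.

```python
_MISSING = object()  # marks a setting that did not exist before the change

class ChangeJournal:
    """Record each setting change so that it can be reviewed and undone."""

    def __init__(self, settings):
        self.settings = settings
        self.log = []          # permanent, human-readable record
        self._undo_stack = []  # (key, previous value) pairs, newest last

    def change(self, key, new_value, note=""):
        previous = self.settings.get(key, _MISSING)
        self._undo_stack.append((key, previous))
        self.settings[key] = new_value
        self.log.append(f"changed {key} to {new_value!r} ({note})")

    def restore_all(self):
        """Undo recorded changes in reverse order."""
        while self._undo_stack:
            key, previous = self._undo_stack.pop()
            if previous is _MISSING:
                self.settings.pop(key, None)
            else:
                self.settings[key] = previous
            self.log.append(f"restored {key}")

# Hypothetical firmware-style settings.
settings = {"parallel_port": "disabled"}
journal = ChangeJournal(settings)
journal.change("parallel_port", "enabled", "test local printing")
journal.change("boot_order", "cd,hdd", "try starting from CD")
journal.restore_all()
print(settings)
print("\n".join(journal.log))
```

Undoing in reverse order matters when a later change depended on an earlier one.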

Review backup procedures

Backups are essential for all computers, from personal systems to high-availability servers. If you suspect that your troubleshooting efforts might worsen the problem or risk important data, perform a backup. This enables you to restore your system if you experience data loss, Stop errors, or other startup problems. Backups allow you to partially or completely restore the system and continue where you left off. When you evaluate or create backup procedures, consider the following:

For more information about using Backup for troubleshooting, see Tools for Troubleshooting in this book. For more information about performing and planning backups, see Backup and Restore in this book.

Windows XP Professional also provides other ways to restore system settings such as System Restore and the Last Known Good Configuration startup option. For more information, see Windows XP Professional Help and Support Center and Tools for Troubleshooting in this book.

Troubleshooting Several Problems at One Time

If multiple problems affect your system, avoid troubleshooting them as a group. Instead, identify shared symptoms, and then isolate and treat each separately. For example, faulty video memory can cause Stop messages, corrupted screen images, and system instability. While diagnosing the symptoms, you might find that errors occur only with multimedia applications that use advanced three-dimensional rendering. When you attempt to rule out the possibility of failed video hardware by replacing the VGA adapter, you might find that this action also resolves the other issues.

Using Incompatible or Untested Hardware

For many organizations, standards for selecting hardware and purchasing new systems and replacement parts do not exist, are not fully defined, or are simply ignored. Standards that are well defined, refined, maintained, and followed can reduce hardware variability and optimize troubleshooting efforts.

If you need to replace hardware, record your troubleshooting actions as thoroughly as possible. Before installing a new device or replacement part, verify that it is on the Windows Hardware Compatibility List (HCL), that the firmware versions for the system motherboard and devices are current, and that any replacement part is pre-tested or burned-in before deployment.

Checking the Windows Hardware Compatibility List

Hardware problems can occur if you use devices that are not compatible with Windows XP Professional. The HCL is a Web-based searchable database, which is continuously updated as additional hardware is tested and approved. The HCL outlines the hardware components that have been tested for use with Windows XP Professional.

If several variations of a device are available from one manufacturer, it is best to select only models listed in the HCL.

Table 25-2 explains the differences between HCL logo designations.

Table 25-2: Logo Icons and Compatible Icon in the HCL

HCL Designation | Description
[icon not reproduced] | Indicates that this product has met all Windows Logo requirements.
[icon not reproduced] | Indicates that this product has met all Windows Logo requirements and that a driver is available for download.
[icon not reproduced] | Indicates that this product has met all Windows Logo requirements and that a driver is available on the Windows XP Professional operating system CD.
[icon not reproduced] | Indicates that this product might not meet all Windows Logo requirements, but has been deemed compatible with the operating system. A driver for the compatible device is available on the Windows XP Professional operating system CD.

When you upgrade to Windows XP Professional, device hardware resource settings are not migrated. Instead, all devices are redetected and enumerated during installation. Typically, upgrades to Windows XP Professional follow this migration path:

You might find after installation that devices that functioned before the upgrade behave differently or do not work after the upgrade. This problem might be due to the following:

Do not attempt to reinstall older drivers because doing so might cause system instability, Stop errors, or other startup problems. For more information about troubleshooting Stop errors and startup problems, see Common Stop Messages for Troubleshooting and Troubleshooting Startup in this book.

For best results, always use HCL-specified devices. It is especially important to refer to the HCL before purchasing modems, tape backup units, and SCSI adapters. If you must use non-HCL hardware, check the manufacturer's Web site for the latest device driver.

Note 

If your system has non-HCL hardware installed, uninstall drivers for these devices before installing Windows XP Professional. If you cannot complete setup, remove the hardware from your system temporarily and rerun Setup.

For more information about the HCL, see the Hardware Compatibility List link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources

Testing new and replacement parts

If you must replace or upgrade older parts with newer ones, first purchase a small number of new parts and conduct performance, compatibility, and configuration tests before doing a general deployment. The evaluation is especially important when a large number of systems are involved, and it might lead you to consider similar products from other manufacturers.

When replacing devices, use pre-tested or burned-in parts whenever possible. A burn-in involves installing an electronic component and observing it several days for signs of abnormal behavior. Typically, computer components fail early or not at all, and a burn-in period reveals manufacturing defects that lead to premature failure. You can choose to do additional testing by simulating worst-case conditions. For example, you might test a new hard disk by manually copying files or creating a batch file that repeatedly copies files, filling the disk to nearly full capacity.
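The text suggests a batch file for the copy test. The same idea is sketched here in Python as an illustration only. The function name, the file-name pattern, and the reserve_bytes safety margin are assumptions made for the example. Point target_dir at the disk under test, never at a disk that holds data you need.

```python
import os
import shutil

def fill_by_copying(source_file, target_dir, reserve_bytes, max_copies=10000):
    """Copy source_file into target_dir repeatedly.

    Stops when another copy would leave less than reserve_bytes free,
    or when max_copies copies have been made. Returns the paths created.
    """
    size = os.path.getsize(source_file)
    copies = []
    for i in range(max_copies):
        free = shutil.disk_usage(target_dir).free
        if free - size < reserve_bytes:
            break  # disk is as full as the test allows
        dest = os.path.join(target_dir, f"burnin_{i:05d}.dat")
        shutil.copyfile(source_file, dest)
        copies.append(dest)
    return copies

def remove_copies(copies):
    """Delete the test files after the burn-in run."""
    for path in copies:
        os.remove(path)
```

Keeping a reserve avoids filling the disk completely, which could interfere with the operating system if the disk under test also holds system files.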

Using Incompatible Software

Before installing software on multiple computers, test it for compatibility with existing applications in a realistic test environment. Observe how the software interacts with other programs and drivers in memory. If only the test application and the operating system are active, testing does not provide a realistic or valid indication of compatibility or performance. Testing is necessary even if a manufacturer guarantees full Windows XP Professional compatibility, because older programs might affect new software in unpredictable ways.

For large organizations, consider limited pre-deployment test rollouts to beta users who can provide real-world feedback. Select testers who have above-average computer skills to get technically accurate descriptions of problems they observe.

Setup and stability criteria are equally important in evaluating software and hardware for purchase. Testing is critical for upgrading systems from earlier versions of Windows such as Windows 98 or Windows NT 4.0. Software and drivers that were installable and stable on earlier versions of Windows might exhibit problems or not function in the Windows XP Professional environment. Video, sound, and related multimedia drivers and tools (such as audio, CD-ROM mastering, and DVD playback software) are especially sensitive to operating system upgrades.

For more information about application testing guidelines, see Planning Deployments in this book and the Windows Application Compatibility link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources. Also see Testing Applications for Compatibility with Windows 2000 in the Deployment Planning Guide of the Microsoft Windows 2000 Server Resource Kit and article Q244632, How to Test Programs for Compatibility with Windows 2000, in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.

Document and Evaluate the Results

You can increase the value of information collected during troubleshooting by keeping accurate and thorough records of all work done. You can use your records to reduce redundant effort and to avoid future problems by taking preventive action.

Create a configuration management database to record the history of changes, such as installed software and hardware, updated drivers, replaced hardware, and altered system settings. Periodically verify, update, and back up this data to prevent permanent loss. To maximize use of your database, note details such as:

When planning this database, keep in mind the need to balance scope and detail when deciding which items or attributes to track. For more information, see the Information Technology Infrastructure Library (ITIL) and Microsoft Operations Framework (MOF) Web site links provided in Check Technical Information Resources earlier in this chapter.
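A change-history database does not need to be elaborate to be useful. The following Python sketch uses the standard sqlite3 module with a single table. The table layout, column names, and sample entries are assumptions chosen for illustration, not a schema prescribed by ITIL or MOF.

```python
import sqlite3

def open_change_db(path=":memory:"):
    """Open (or create) a small change-history database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS changes (
               id          INTEGER PRIMARY KEY,
               changed_on  TEXT NOT NULL,  -- ISO date, for example 2003-04-28
               computer    TEXT NOT NULL,
               category    TEXT NOT NULL,  -- software, hardware, driver, setting
               description TEXT NOT NULL
           )"""
    )
    return conn

def record_change(conn, changed_on, computer, category, description):
    """Add one change record."""
    conn.execute(
        "INSERT INTO changes (changed_on, computer, category, description) "
        "VALUES (?, ?, ?, ?)",
        (changed_on, computer, category, description),
    )
    conn.commit()

def history_for(conn, computer):
    """Return the change history for one computer, oldest first."""
    rows = conn.execute(
        "SELECT changed_on, category, description FROM changes "
        "WHERE computer = ? ORDER BY changed_on, id",
        (computer,),
    )
    return rows.fetchall()

conn = open_change_db()
record_change(conn, "2003-04-28", "PC-017", "driver", "Updated video driver")
record_change(conn, "2003-04-02", "PC-017", "hardware", "Replaced network adapter")
for row in history_for(conn, "PC-017"):
    print(row)
```

Tracking only a few attributes at first keeps the database easy to maintain, and you can add columns later if the extra detail proves worthwhile.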

Update baseline information after installing new hardware or software to compare past and current behavior or performance levels. If previous baseline information is not available, use System Information, Device Manager, the Performance tool, or industry standard benchmarks to generate data.

Baselines combined with records kept over time enable you to organize experience gained, evaluate maintenance efforts, and judge troubleshooting effectiveness. Analysis of this data can form the basis of a troubleshooting manual or lead to changes in control policy for your organization.

A post-troubleshooting review, or post-mortem, can help you pinpoint troubleshooting areas that need improvement. Some questions you might consider during this self-evaluation period include:

Write an Action Plan

An action plan is a set of relevant troubleshooting objectives and strategies that fits within your organization's configuration and management strategies. After you identify the problem and find a potential solution or workaround that you have tested on one or more computers, you might need an action plan if the solution is to be deployed across your organization, possibly involving hundreds or thousands of computers. Coordinate your plan with supervisors and staff members in the affected areas to keep them informed well in advance and to verify that the schedule does not conflict with important activity. Include provisions for troubleshooting during non-peak work hours or dividing work into stages over a period of several days. Evaluate your plan, and as you uncover weaknesses, update it to increase its effectiveness and efficiency.

As the number of users grows, the potential loss of productivity due to disruption increases. Your plan must account for dependencies and allow last-minute changes. Factor in contingency plans for unforeseen circumstances.

For more information about creating a configuration management database, see the ITIL and MOF links listed in Table 25-1.

Take Proactive Measures

You can combine information gathered while troubleshooting major and chronic problems to create a proactive plan to prevent or minimize problems for the long term. When planning a maintenance or upgrade process for your organization, consider the following goals:

Improve the Computing Environment

External factors can have a major impact on the operation and lifespan of a computer. Some basic precautions include labeling connecting cables, periodically testing uninterruptible power supply (UPS) batteries, and placing computers far from high-traffic areas where they might be bumped or damaged. It is important to check environmental factors such as room temperature, humidity, and air circulation to prevent failures due to excessive heat. Dust can clog cooling equipment such as computer fans and cause them to fail. Install surge suppressors, dedicated power sources, and backup power devices to protect equipment from electrical current fluctuations, surges, and spikes that can cause data loss or damage equipment. Other precautions include:

Monitor System and Application Logs

Monitor your system to detect problems early and avoid having software or hardware failure be your first or only warning of a problem. When using a monitoring tool such as Performance (Perfmon.msc) to evaluate changes that might affect performance, compare baseline information to current performance. The resulting data helps you isolate bottlenecks and determine if actions such as upgrading hardware, updating applications, and installing new drivers are effective. You can also use the data to justify expenditures, such as additional CPUs, more RAM, and increased storage space. Checking the Event Viewer regularly helps you to identify chronic problems and detect potential failures. This allows you to take corrective action before a problem worsens. For more information about monitoring your system, see Overview of Performance Monitoring in the Operations Guide of the Microsoft Windows 2000 Server Resource Kit.
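Comparing baseline data to current measurements can be as simple as computing the relative change for each counter. The Python sketch below is a hypothetical illustration. The counter names, the sample values, and the 20 percent tolerance are assumptions, and the numbers would in practice come from data exported by a monitoring tool.

```python
def compare_to_baseline(baseline, current, tolerance=0.20):
    """Return counters whose relative change from baseline exceeds tolerance.

    The result maps each flagged counter name to its relative change,
    where 0.5 means 50 percent higher and -0.5 means 50 percent lower.
    """
    flagged = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # nothing to compare against
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            flagged[name] = change
    return flagged

# Hypothetical averages recorded before and after a driver update.
baseline = {"cpu_percent": 40.0, "available_mb": 256.0, "disk_queue": 1.0}
current = {"cpu_percent": 44.0, "available_mb": 128.0, "disk_queue": 3.0}

for name, change in compare_to_baseline(baseline, current).items():
    print(f"{name}: {change:+.0%} relative to baseline")
```

Counters that stay within the tolerance are left out, which keeps attention on the values that changed materially.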

Document Changes to Hardware and Software

In addition to recording computer-specific changes, do not neglect to record other factors that directly affect computer operation such as Group Policy and network infrastructure changes. For more information about developing and implementing a standard process for recording configuration changes, see Document and Evaluate the Results earlier in this chapter.

Plan for Hardware and Software Upgrades

Regardless of how advanced your system hardware or software is at the time of purchase, computer technologies have a limited lifespan. Your maintenance plan must account for the following factors that can make updates and upgrades necessary.

Increased demand for computing resources

When computing needs grow beyond the capability of your hardware, it makes sense to upgrade hardware components or entire systems. Performance degradation might be due to system bottlenecks caused by hardware that has reached maximum capacity. Optimizing drivers and updating applications can help in the short term, but user demand for computing resources eventually makes it necessary to upgrade to more powerful hardware.

Discontinued support for a device or software

Operating system or manufacturer support for a device or software might be discontinued, causing compatibility issues that can block upgrades to new operating systems or prevent full use of certain features in Windows XP Professional. To minimize effort when upgrading hardware and software for many computers, purchase similar computers and follow replacement standards for your organization. Failure to standardize applications and hardware can make upgrading more difficult and expensive, especially if technicians and users need retraining.

Added capabilities

Having a process for upgrading operating systems or installing application patches, hotfixes, and operating system Service Packs helps to maintain the stability, performance, and reliability of your equipment. Schedule time to stay current with new developments and product updates.




Microsoft Windows XP Professional Resource Kit 2003