After you observe symptoms, check technical information sources, and review your system's history, you might be ready to test a possible solution based on the information that you have gathered. If you are unable to locate information that applies to your problem or find more than one solution that applies, try to further isolate your problem by grouping observations into different categories such as software-related symptoms (due to a service or application), hardware-related symptoms (by hardware types), and error messages. Prioritize your list by frequency of occurrence and eliminate symptoms that you can attribute to user error. This enables you to methodically plan the diagnostic steps to take, or to select the next solution to try.
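The grouping and prioritizing described above can be sketched in a few lines of Python. This is only an illustration: the `Observation` record, its field names, and the category labels are assumptions made for the example, not part of any Windows tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One recorded symptom (hypothetical record layout)."""
    category: str             # e.g. "software", "hardware", "error message"
    symptom: str              # short description of what was observed
    user_error: bool = False  # True if the symptom is attributable to user error


def prioritize(observations):
    """Return (category, symptom, count) tuples, most frequent first.

    Observations attributed to user error are dropped before counting.
    """
    counts = Counter(
        (obs.category, obs.symptom)
        for obs in observations
        if not obs.user_error
    )
    # most_common() sorts by count, highest first.
    return [(cat, sym, n) for (cat, sym), n in counts.most_common()]
```

The resulting list gives you a starting order for diagnostic steps: work on the most frequent symptom first, and revisit the list as new observations arrive.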
When troubleshooting hardware, start by working toward the simplest configuration possible by disabling or removing devices. Then incrementally increase or decrease complexity until you isolate the problem device. In safe mode, Windows XP Professional starts with only essential drivers and is useful for diagnosing problems. For more information about safe-mode troubleshooting, see Windows XP Professional Help and Support Center, and Tools for Troubleshooting in this book.
If your diagnostic efforts point to a hardware problem, you can run diagnostic software available from the manufacturer. These programs run self-tests that confirm if a piece of hardware has malfunctioned or failed and needs replacing. You can also install the device on different computers to verify that the problem is not due to system-specific configuration issues. Replacing defective hardware and diagnosing problems on a spare or test computer minimizes impact to the user due to the system being unavailable. If diagnostic software shows that the hardware is working, consider upgrading or rolling back device drivers.
If a hardware problem causes a Stop error that prevents Windows XP Professional from starting in normal mode, you can use the Last Known Good Configuration startup option. The Last Known Good Configuration enables you to recover from problems by reverting driver and registry settings to those used during the last user session. If you are able to start Windows XP Professional in normal mode after using the Last Known Good Configuration, disable the problem driver or device. Restart the computer to verify that the Stop message does not recur. If the problem persists, repeat this procedure until you isolate the hardware that is causing the problem.
Another method to recover from problems that occur after updating a device driver is by using Device Driver Roll Back in safe or normal mode. If you updated a driver since installing Windows XP Professional, you can roll back the driver to determine if the older driver restores stability. If another driver is not available, disable the device by using Device Manager until you are able to locate an updated driver.
Using Device Manager to disable devices is always preferable to physically removing a part because using Device Manager does not risk damage to internal components. If you cannot disable a device by using Device Manager, uninstall the device driver, turn off the system, remove the part, and restart the computer. If this improves system stability, the part might be causing or contributing to the problem and you need to reconfigure it.
For more information about the Last Known Good Configuration startup option and Device Driver Roll Back, see Windows XP Professional Help and Support Center. Also, see Troubleshooting Startup and Tools for Troubleshooting in this book. For more information about disabling devices and drivers, see Managing Devices in this book. For more information about Stop messages, see Common Stop Messages for Troubleshooting in this book.
If you suspect that a software problem or a recent change to system settings is preventing applications or services from functioning properly, use safe mode to help diagnose the problem. You can also use the Last Known Good Configuration startup option or System Restore to undo changes made by a recently installed application, driver, or service. You can isolate issues by using the following methods.
Close applications one at a time, and then observe the results. A problem might occur only when a specific application is running. You can use Task Manager to end applications that have stopped responding. For more information about ending applications and processes using Task Manager, see Windows XP Professional Help and Support Center.
By using the Services snap-in (Services.msc) or the System Configuration Utility (Msconfig.exe), you can stop and start most system services. For some services, you might need to restart the computer for changes to take effect. For more information about disabling services by using the Services snap-in and the System Configuration Utility, see Windows XP Professional Help and Support Center and Troubleshooting Startup in this book.
To isolate a service-related problem, you can choose to do the following:
Disable services one at a time until the problem disappears. You can then enable all other services to verify that you found the cause of the problem.
Disable all non-safe-mode services and then re-enable them one at a time until the problem appears. Use the System Configuration Utility and boot logging to determine the services and drivers initialized in normal and safe mode. You can then disable all non-safe-mode drivers and re-enable them one at a time until the problem returns.
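The second method can be expressed as a simple loop. The sketch below is illustrative only: `problem_occurs` is a hypothetical callback that stands in for a person restarting the computer and checking whether the symptom is back, and the service names are whatever you recorded when you disabled them.

```python
def find_culprit(candidates, problem_occurs):
    """Re-enable candidates one at a time until the problem returns.

    candidates: names of the services or drivers that were disabled.
    problem_occurs: hypothetical callback that receives the set of
        currently re-enabled candidates and returns True if the problem
        reproduces.
    Returns the first candidate whose re-enabling brings the problem
    back, or None if the problem never reappears.
    """
    enabled = set()
    for name in candidates:
        enabled.add(name)
        if problem_occurs(enabled):
            return name
    return None
```

A `None` result suggests that the cause lies outside the candidate list, for example in a component that also runs in safe mode.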
For more information about System Restore, the System Configuration Utility, and boot logging, see Windows XP Professional Help and Support Center and Tools for Troubleshooting in this book. For more information about disabling applications and services while troubleshooting startup problems, see Troubleshooting Startup in this book.
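If you have saved the boot log text from a normal-mode start and from a safe-mode start, a short script can list the drivers that load only in normal mode. The sketch below assumes that each relevant line of the boot log (Ntbtlog.txt) begins with "Loaded driver" followed by the driver path. Reading and decoding the file, and separating the entries for each boot, are left to you.

```python
def loaded_drivers(boot_log_text):
    """Return the set of driver paths marked as loaded in boot log text.

    Assumes lines of the form "Loaded driver <path>"; all other lines,
    such as "Did not load driver <path>", are ignored.
    """
    prefix = "Loaded driver "
    drivers = set()
    for line in boot_log_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Lowercase the path because Windows paths are not case-sensitive.
            drivers.add(line[len(prefix):].strip().lower())
    return drivers


def normal_mode_only(normal_log_text, safe_log_text):
    """Return a sorted list of drivers loaded in normal mode but not safe mode."""
    return sorted(loaded_drivers(normal_log_text) - loaded_drivers(safe_log_text))
```

The drivers returned by `normal_mode_only` are the candidates to disable and re-enable one at a time.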
You can complicate a problem or troubleshooting process unnecessarily by acting too quickly. Avoid the following common pitfalls that can hinder your efforts:
Not adequately identifying the problem before taking action
Not observing the effects of diagnostic changes
Not documenting changes while troubleshooting
Not restoring previous settings
Troubleshooting several problems at one time
Using incompatible or untested hardware
Using incompatible software
If you fail to make essential observations before responding, you can miss important information in the critical moments when symptoms first appear. Here are some typical scenarios.
An error occurs and you start your research without recording important information such as the complete error message text and the applications running. During your research, you check technical information resources but find that you are unable to narrow the scope of your search due to insufficient information.
For more information about the types of information to record during troubleshooting, see Identify Problem Symptoms earlier in this chapter.
In response to frequent random errors users experience with a certain application, you restart the affected computers without observing and recording the symptoms. Although users can resume work for the day, a call to technical support later that day is less effective because you cannot reproduce the problem. You must wait for the problem to recur before you can gather critical information needed to determine the root cause. For example, symptoms can be caused by power surges, faulty power supplies, excessive dust, or inadequate ventilation. Restarting the computer might be a temporary solution that does not prevent recurrence.
A user comes to work early and finds that network resources or applications are not responding. You spend time troubleshooting the problem without success only to discover that both you and the user failed to read e-mail announcing that scheduled maintenance would cause temporary early morning outages.
Prior experience can shorten the time to solve a recurring problem because you already know the remedy. However, the same solution might not always solve a problem that looks familiar. Always verify the symptoms before acting. If your initial assumptions are incorrect, and you misdiagnose the problem, your actions might make the situation worse. Keep an open mind when troubleshooting. When in doubt, verify your information by searching technical information sources (including technical support) and obtain advice from experienced colleagues. Do not ignore new information, and question past procedures that seem inappropriate.
A user cannot print to a new local inkjet printer. You verify cable and power connections, check the ink cartridge, and run the printer's built-in diagnostics, but find nothing wrong. Windows XP Professional cannot detect the printer, so you manually install the most recent drivers without success. Reinstalling Windows XP Professional does not solve the problem, and you later realize that you neglected to find out if printing to any local printer from this computer has ever been successful. You find that the user has never tried this, and a firmware check reveals that the parallel port is disabled. Enabling the parallel port resolves all printing problems.
System setting changes do not always take effect immediately. For example, when troubleshooting replication issues, you must wait to observe changes. If you do not allow adequate time to pass, you might prematurely conclude that the change was not effective. To avoid this situation, familiarize yourself with the feature that you are troubleshooting and thoroughly read the information provided by technical support before judging the effectiveness of a workaround or update.
Documenting the steps that you take while troubleshooting allows you to review your actions after you have resolved the problem. This is useful for very complex problems that require lengthy procedures to resolve. Documenting your steps allows you to verify that you are not duplicating or skipping steps and enables others to assist you with the problem. It also allows you to identify the exact steps to take if the problem recurs and enables you to evaluate the effectiveness of your efforts.
If disabling a feature or changing a setting does not produce the results you want, restore the feature or setting before trying something else. For example, record firmware settings before changing them to diagnose problems. Not restoring settings can make it difficult to determine which of your actions resolved the problem. When verifying solutions that require you to make extensive changes or restart the computer multiple times, perform backups before troubleshooting so that you can restore the system if your actions are ineffective or cause startup problems.
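The habit of recording a setting, changing it, and restoring it if the change does not help can be modeled as follows. This is a sketch under stated assumptions: the settings live in an ordinary Python dictionary, and `trial_change` and the `keep` flag are hypothetical names invented for the example. Real firmware or system settings require their own tools.

```python
import copy
from contextlib import contextmanager


@contextmanager
def trial_change(settings, **changes):
    """Apply changes to a settings dict for the duration of a test.

    Set result["keep"] = True inside the block if the change resolved
    the problem. Otherwise the recorded values are restored on exit,
    even if an exception interrupts the test.
    """
    snapshot = copy.deepcopy(settings)  # record the settings before changing them
    settings.update(changes)
    result = {"keep": False}
    try:
        yield result
    finally:
        if not result["keep"]:
            settings.clear()
            settings.update(snapshot)
```

For example, `with trial_change(settings, parallel_port="enabled") as result:` applies the change, and the original value returns automatically unless you set `result["keep"] = True`.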
Backups are essential for all computers, from personal systems to high-availability servers. If you suspect that your troubleshooting efforts might worsen the problem or risk important data, perform a backup. This enables you to restore your system if you experience data loss, Stop errors, or other startup problems. Backups allow you to partially or completely restore the system and continue where you left off. When you evaluate or create backup procedures, consider the following:
Use the verification option of your backup software to check that your data is correctly written to backup media.
Routinely check the age and condition of backup media and follow the manufacturer's recommendations for using backup media.
Follow the hardware manufacturer's recommendations for maintaining the backup device.
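The idea behind verification is to read back what was written and compare it with the source. The sketch below illustrates this for a plain file-copy backup by comparing SHA-256 digests. It is not how the verification option of any particular backup program works, and the function names are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from or differ in the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        backup_copy = backup_dir / rel
        if not backup_copy.is_file() or file_digest(src) != file_digest(backup_copy):
            problems.append(str(rel))
    return problems
```

An empty list means that every source file has a matching copy. Any path in the list needs to be backed up again or investigated.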
For more information about using Backup for troubleshooting, see Tools for Troubleshooting in this book. For more information about performing and planning backups, see Backup and Restore in this book.
Windows XP Professional also provides other ways to restore system settings such as System Restore and the Last Known Good Configuration startup option. For more information, see Windows XP Professional Help and Support Center and Tools for Troubleshooting in this book.
If multiple problems affect your system, avoid troubleshooting them as a group. Instead, identify shared symptoms, and then isolate and treat each separately. For example, faulty video memory can cause Stop messages, corrupted screen images, and system instability. While diagnosing the symptoms, you might find that errors occur only with multimedia applications that use advanced three-dimensional rendering. When you attempt to rule out the possibility of failed video hardware by replacing the VGA adapter, you might find that this action also resolves the other issues.
For many organizations, standards for selecting hardware and purchasing new systems and replacement parts do not exist, are not fully defined, or are simply ignored. Standards that are well defined, refined, maintained, and followed can reduce hardware variability and optimize troubleshooting efforts.
If you need to replace hardware, record your troubleshooting actions as thoroughly as possible. Before installing a new device or replacement part, verify that it is on the Windows Hardware Compatibility List (HCL), that the firmware versions for the system motherboard and devices are current, and that any replacement part is pre-tested or burned-in before deployment.
Hardware problems can occur if you use devices that are not compatible with Windows XP Professional. The HCL is a Web-based searchable database, which is continuously updated as additional hardware is tested and approved. The HCL outlines the hardware components that have been tested for use with Windows XP Professional.
If several variations of a device are available from one manufacturer, it is best to select only models listed in the HCL.
Table 25-2 explains the differences between HCL logo designations.
HCL Designation | Description |
---|---|
(logo image) | Indicates that this product has met all Windows Logo requirements. |
(logo image) | Indicates that this product has met all Windows Logo requirements and that a driver is available for download. |
(logo image) | Indicates that this product has met all Windows Logo requirements and that a driver is available on the Windows XP Professional operating system CD. |
(logo image) | Indicates that this product might not meet all Windows Logo requirements, but has been deemed compatible with the operating system. A driver for the compatible device is available on the Windows XP Professional operating system CD. |
When you upgrade to Windows XP Professional, device hardware resource settings are not migrated. Instead, all devices are redetected and enumerated during installation. Typically, upgrades to Windows XP Professional follow this migration path:
An upgrade to Windows XP Professional from Windows 98, Microsoft Windows Millennium Edition (Me), Microsoft Windows NT 4.0 Workstation, or Windows 2000 Professional.
You might find after installation that devices that functioned before the upgrade behave differently or do not work after the upgrade. This problem might be due to the following:
A driver for the device is not on the Windows XP Professional operating system CD and Device Manager lists it as unknown or conflicting hardware.
Windows XP Professional Setup installed a generic driver that might be compatible with your device, but it does not fully support enhanced features. Many hardware manufacturers also provide tools that add value to their products, but they are not available in Windows XP Professional. Windows XP Professional Setup installs the basic feature set needed to enable your product to function. For additional software that enhances functionality or adds features, download the latest drivers and tools that are compatible with Windows XP Professional from the manufacturer's Web site.
Do not attempt to re-install older drivers because doing so might cause system instability, Stop errors, or other startup problems. For more information about troubleshooting Stop errors and startup problems, see Common Stop Messages for Troubleshooting and Troubleshooting Startup in this book.
For best results, always use HCL-specified devices. It is especially important to refer to the HCL before purchasing modems, tape backup units, and SCSI adapters. If you must use non-HCL hardware, check the manufacturer's Web site for the latest updated device driver.
Note | If your system has non-HCL hardware installed, uninstall drivers for these devices before installing Windows XP Professional. If you cannot complete setup, remove the hardware from your system temporarily and rerun Setup. |
For more information about the HCL, see the Hardware Compatibility List link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources
If you must replace or upgrade older parts with newer ones, first purchase a small number of new parts and conduct performance, compatibility, and configuration tests before doing a general deployment. The evaluation is especially important when a large number of systems are involved, and it might lead you to consider similar products from other manufacturers.
When replacing devices, use pre-tested or burned-in parts whenever possible. A burn-in involves installing an electronic component and observing it for several days for signs of abnormal behavior. Typically, computer components fail early or not at all, and a burn-in period reveals manufacturing defects that lead to premature failure. You can choose to do additional testing by simulating worst-case conditions. For example, you might test a new hard disk by manually copying files or creating a batch file that repeatedly copies files, filling the disk to nearly full capacity.
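The repeated-copy test mentioned above could be scripted along the following lines. This is a minimal sketch rather than a complete burn-in tool: `copy_stress` and its parameters are names chosen for the example, the byte limit is something you set to stay safely below the disk's capacity, and a copy that is read back immediately might be served from the operating system's cache rather than from the disk.

```python
import hashlib
import shutil
from pathlib import Path


def copy_stress(source_file, target_dir, max_bytes):
    """Copy source_file into target_dir repeatedly, checking each copy.

    Stops before the total size of the copies would exceed max_bytes.
    Returns (copies_made, mismatches), where a mismatch is a copy whose
    SHA-256 digest differs from that of the source file.
    """
    source_file, target_dir = Path(source_file), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    size = source_file.stat().st_size
    if size == 0:
        raise ValueError("source_file must not be empty")
    expected = hashlib.sha256(source_file.read_bytes()).hexdigest()
    copies, mismatches = 0, 0
    while (copies + 1) * size <= max_bytes:
        dest = target_dir / f"copy_{copies:06d}{source_file.suffix}"
        shutil.copyfile(source_file, dest)
        if hashlib.sha256(dest.read_bytes()).hexdigest() != expected:
            mismatches += 1
        copies += 1
    return copies, mismatches
```

Any nonzero mismatch count, or an I/O error raised during the run, is a reason to keep the part out of deployment until you have investigated further.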
Before installing software on multiple computers, test it for compatibility with existing applications in a realistic test environment. Observe how the software interacts with other programs and drivers in memory. If only the test application and the operating system are active, testing does not provide a realistic or valid indication of compatibility or performance. Testing is necessary even if a manufacturer guarantees full Windows XP Professional compatibility, because older programs might affect new software in unpredictable ways.
For large organizations, consider limited pre-deployment test rollouts to beta users who can provide real-world feedback. Select testers who have above-average computer skills to get technically accurate descriptions of problems they observe.
Setup and stability criteria are equally important in evaluating software and hardware for purchase. Testing is critical for upgrading systems from earlier versions of Windows such as Windows 98 or Windows NT 4.0. Software and drivers that were installable and stable on earlier versions of Windows might exhibit problems or not function in the Windows XP Professional environment. Video, sound, and related multimedia drivers and tools (such as audio, CD-ROM mastering, and DVD playback software) are especially sensitive to operating system upgrades.
For more information about application testing guidelines, see Planning Deployments in this book and the Windows Application Compatibility link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources. Also see Testing Applications for Compatibility with Windows 2000 in the Deployment Planning Guide of the Microsoft Windows 2000 Server Resource Kit and article Q244632, How to Test Programs for Compatibility with Windows 2000, in the Microsoft Knowledge Base. To find this article, see the Microsoft Knowledge Base link on the Web Resources page at http://www.microsoft.com/windows/reskits/webresources.
You can increase the value of information collected during troubleshooting by keeping accurate and thorough records of all work done. You can use your records to reduce redundant effort and to avoid future problems by taking preventive action.
Create a configuration management database to record the history of changes, such as installed software and hardware, updated drivers, replaced hardware, and altered system settings. Periodically verify, update, and back up this data to prevent permanent loss. To maximize use of your database, note details such as:
Changes made
Times and dates of changes
Reasons for the changes
Users who made the changes
Positive and negative effects the changes had on system stability or performance
Information provided by technical support
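As one possible starting point, the details listed above map naturally onto a single database table. The sketch below uses SQLite through the Python standard library. The table name, the column names, and the added `computer` column are assumptions for illustration, not a prescribed schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS change_log (
    id          INTEGER PRIMARY KEY,
    computer    TEXT NOT NULL,   -- assumed: which system was changed
    changed_at  TEXT NOT NULL,   -- time and date of the change (ISO 8601)
    changed_by  TEXT NOT NULL,   -- user who made the change
    description TEXT NOT NULL,   -- change made
    reason      TEXT NOT NULL,   -- reason for the change
    effect      TEXT,            -- positive or negative effects observed
    support_ref TEXT             -- information provided by technical support
)
"""


def open_db(path=":memory:"):
    """Open (or create) the change database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_change(conn, computer, changed_at, changed_by, description,
                  reason, effect=None, support_ref=None):
    """Insert one change record."""
    conn.execute(
        "INSERT INTO change_log (computer, changed_at, changed_by,"
        " description, reason, effect, support_ref)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (computer, changed_at, changed_by, description, reason,
         effect, support_ref),
    )
    conn.commit()


def history(conn, computer):
    """Return (changed_at, description, reason, effect) rows, oldest first."""
    cursor = conn.execute(
        "SELECT changed_at, description, reason, effect FROM change_log"
        " WHERE computer = ? ORDER BY changed_at",
        (computer,),
    )
    return cursor.fetchall()
```

Storing dates as ISO 8601 text keeps the chronological order and the sort order identical, which makes the history of a single computer easy to review.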
When planning this database, keep in mind the need to balance scope and detail when deciding which items or attributes to track. For more information, see the Information Technology Infrastructure Library (ITIL) and Microsoft Operations Framework (MOF) Web site links provided in Check Technical Information Resources earlier in this chapter.
Update baseline information after installing new hardware or software to compare past and current behavior or performance levels. If previous baseline information is not available, use System Information, Device Manager, the Performance tool, or industry standard benchmarks to generate data.
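Once you have baseline figures, comparing them with current measurements is simple arithmetic. The following sketch flags any metric that has moved by more than a chosen tolerance. The metric names and the 10 percent default are arbitrary assumptions, and the numbers themselves would come from whichever tool you used to collect them.

```python
def compare_to_baseline(baseline, current, tolerance=0.10):
    """Return metrics whose relative change from baseline exceeds tolerance.

    baseline, current: dicts that map a metric name to a numeric value.
    The result maps each flagged metric to its relative change, where
    0.25 means 25 percent higher than baseline and -0.25 means 25 percent
    lower. Metrics missing from current, or with a zero baseline, are
    skipped.
    """
    deviations = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            deviations[name] = change
    return deviations
```

Whether a flagged change is good or bad depends on the metric: a higher boot time is a regression, whereas more free disk space is not.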
Baselines combined with records kept over time enable you to organize experience gained, evaluate maintenance efforts, and judge troubleshooting effectiveness. Analysis of this data can form the basis of a troubleshooting manual or lead to changes in control policy for your organization.
A post-troubleshooting review, or post-mortem, can help you pinpoint troubleshooting areas that need improvement. Some questions you might consider during this self-evaluation period include:
What changes improved the situation?
What changes made the problem worse?
Was system performance restored to expected levels?
What work was redundant or unnecessary?
How effectively were technical support resources used?
What other tools or information not used might have helped?
What unresolved issues require further root-cause analysis?
An action plan is a set of relevant troubleshooting objectives and strategies that fits within your organization's configuration and management strategies. After you identify the problem and find a potential solution or workaround that you have tested on one or more computers, you might need an action plan if the solution is to be deployed across your organization, possibly involving hundreds or thousands of computers. Coordinate your plan with supervisors and staff members in the affected areas to keep them informed well in advance and to verify that the schedule does not conflict with important activity. Include provisions for troubleshooting during non-peak work hours or dividing work into stages over a period of several days. Evaluate your plan, and as you uncover weaknesses, update it to increase its effectiveness and efficiency.
As the number of users grows, the potential loss of productivity due to disruption increases. Your plan must account for dependencies and allow last-minute changes. Factor in contingency plans for unforeseen circumstances.
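Dividing a deployment into stages can be planned with a short helper like the one below. It is a sketch only: the idea of a small pilot group followed by fixed-size stages is one reasonable approach, and `plan_stages` and its parameters are hypothetical names.

```python
def plan_stages(computers, pilot_size, stage_size):
    """Split computers into a pilot group followed by fixed-size stages.

    Returns a list of lists. The first list is the pilot group, and each
    later list is one stage of the wider deployment, in the original order.
    """
    if pilot_size < 1 or stage_size < 1:
        raise ValueError("pilot_size and stage_size must be at least 1")
    computers = list(computers)
    stages = [computers[:pilot_size]]
    for start in range(pilot_size, len(computers), stage_size):
        stages.append(computers[start:start + stage_size])
    return stages
```

Scheduling one stage per day, outside peak hours, leaves room to pause the rollout if the pilot group or an early stage reveals a problem.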
For more information about creating a configuration management database, see the ITIL and MOF links listed in Table 25-1.
You can combine information gathered while troubleshooting major and chronic problems to create a proactive plan to prevent or minimize problems for the long term. When planning a maintenance or upgrade process for your organization, consider the following goals:
Improving the computing environment
Monitoring system and application logs
Documenting hardware and software changes
Anticipating hardware and software updates
External factors can have a major impact on the operation and lifespan of a computer. Some basic precautions include labeling connecting cables, periodically testing uninterruptible power supply (UPS) batteries, and placing computers far from high-traffic areas where they might be bumped or damaged. It is important to check environmental factors such as room temperature, humidity, and air circulation to prevent failures due to excessive heat. Dust can clog cooling equipment such as computer fans and cause them to fail. Install surge suppressors, dedicated power sources, and backup power devices to protect equipment from electrical current fluctuations, surges, and spikes that can cause data loss or damage equipment. Other precautions include:
Performing regular file and system state backups to prevent data loss. For more information about Windows Backup, see Windows XP Professional Help and Support Center and Backup and Restore in this book.
Using virus-scanning software that is compatible with Windows XP Professional and regularly downloading the latest virus signature updates. A virus signature data file contains information that enables virus-scanning software to identify infected files.
Monitor your system to detect problems early and avoid having software or hardware failure be your first or only warning of a problem. When using a monitoring tool such as Performance (Perfmon.msc) to evaluate changes that might affect performance, compare baseline information to current performance. The resulting data helps you isolate bottlenecks and determine if actions such as upgrading hardware, updating applications, and installing new drivers are effective. You can also use the data to justify expenditures, such as additional CPUs, more RAM, and increased storage space. Checking the Event Viewer regularly helps you to identify chronic problems and detect potential failures. This allows you to take corrective action before a problem worsens. For more information about monitoring your system, see Overview of Performance Monitoring in the Operations Guide of the Microsoft Windows 2000 Server Resource Kit.
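One way to spot chronic problems in event log data is to look for the same event recurring on several different days. The sketch below assumes that you have already exported the events as simple (date, source, event ID) tuples. That tuple layout and the three-day threshold are assumptions for illustration and do not reflect an Event Viewer export format.

```python
from collections import defaultdict


def chronic_events(events, min_days=3):
    """Find events that recur on at least min_days distinct days.

    events: iterable of (date_str, source, event_id) tuples, where
        date_str has the form "YYYY-MM-DD".
    Returns a list of ((source, event_id), day_count) pairs, ordered by
    the number of distinct days, highest first.
    """
    days_seen = defaultdict(set)
    for date_str, source, event_id in events:
        days_seen[(source, event_id)].add(date_str)
    recurring = [
        (key, len(days))
        for key, days in days_seen.items()
        if len(days) >= min_days
    ]
    # Sort by day count (descending), then by key for a stable order.
    recurring.sort(key=lambda item: (-item[1], item[0]))
    return recurring
```

Counting distinct days rather than raw occurrences keeps a single burst of repeated errors from looking like a long-running problem.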
In addition to recording computer-specific changes, do not neglect to record other factors that directly affect computer operation such as Group Policy and network infrastructure changes. For more information about developing and implementing a standard process for recording configuration changes, see Document and Evaluate the Results earlier in this chapter.
Regardless of how advanced your system hardware or software is at the time of purchase, computer technologies have a limited lifespan. Your maintenance plan must account for the following factors that can make updates and upgrades necessary.
When computing needs grow beyond the capability of your hardware, it makes sense to upgrade hardware components or entire systems. Performance degradation might be due to system bottlenecks caused by hardware that has reached maximum capacity. Optimizing drivers and updating applications can help in the short term, but user demand for computing resources eventually makes it necessary to upgrade to more powerful hardware.
Operating system or manufacturer support for a device or software might be discontinued, causing compatibility issues that can block upgrades to new operating systems or prevent full use of certain features in Windows XP Professional. To minimize effort when upgrading hardware and software for many computers, purchase similar computers and follow replacement standards for your organization. Failure to standardize applications and hardware can make upgrading more difficult and expensive, especially if technicians and users need retraining.
Having a process for upgrading operating systems or installing application patches, hotfixes, and operating system Service Packs helps to maintain the stability, performance, and reliability of your equipment. Schedule time to stay current with new developments and product updates.