Using Diagnostic Tools

Various support tools monitor errors and faults, report configuration information, and help troubleshoot hardware components, including the CPU, system memory, and tape devices. Some of these tools also monitor software configurations to track changes.

Support Tool Manager

HP's Support Tool Manager (STM) provides access to a set of tools for verifying and troubleshooting HP-UX system hardware. These online diagnostic tools can determine device status, gather configuration information, and diagnose hardware problems. They can be run from a GUI or from the command line, and can also be invoked automatically at periodic intervals.

STM discovers the hardware devices on a system and can diagnose memory errors, Low-Priority Machine Check (LPMC) errors, I/O driver errors, Logical Volume Manager (LVM) errors, and over-temperature events. Memory errors include single-bit errors and page deallocation events.

STM includes the Automatic Configuration Mapper, shown in Figure 4-8, which gives a graphical view of your hardware configuration using color-coded icons, showing device status as well as logical relationships, such as the peripherals connected to an I/O card. Each icon on the map represents a hardware device. These icons display the device type, device identifier, device path, last active tool, and test status (from last active tool). You can launch other STM tools from this view as well.
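The map can be thought of as a tree of devices in which each node carries the attributes an icon displays, with attached peripherals as children of their I/O card. The Python sketch below is illustrative only: the `Device` class, its field names, the hardware paths, and the device identifiers are all hypothetical and do not reflect STM's internal data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """One icon on a hypothetical configuration map."""
    device_type: str                  # e.g. "I/O card", "disk"
    identifier: str                   # device identifier (illustrative values)
    hw_path: str                      # hardware path (illustrative values)
    last_tool: Optional[str] = None   # last tool run against the device
    status: str = "unknown"           # test status reported by that tool
    children: List["Device"] = field(default_factory=list)  # attached devices


def render(dev: Device, depth: int = 0) -> List[str]:
    """Return an indented text view of the map, one line per device."""
    lines = [f"{'  ' * depth}{dev.hw_path} {dev.device_type} {dev.identifier} [{dev.status}]"]
    for child in dev.children:
        lines.extend(render(child, depth + 1))
    return lines


def failed_paths(dev: Device) -> List[str]:
    """Collect hardware paths of every device whose last test failed."""
    paths = [dev.hw_path] if dev.status == "failed" else []
    for child in dev.children:
        paths.extend(failed_paths(child))
    return paths


# Hypothetical system: one I/O card with a disk and a tape drive attached.
card = Device("I/O card", "scsi-card", "8/0", "Verifier", "passed", [
    Device("disk", "disk-a", "8/0.6.0", "Verifier", "failed"),
    Device("tape", "tape-b", "8/0.3.0"),
])
root = Device("system", "server", "root", children=[card])
print("\n".join(render(root)))
print(failed_paths(root))
```

Keeping the logical relationships as parent/child links is what makes questions such as "which peripherals hang off this card?" a simple traversal.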

Figure 4-8. STM Configuration Mapper showing the latest status of the CPU and memory.

The Information tool provides product identifier information, product description, hardware path, vendor name, firmware revision, and error log statistics, including read errors, which can be used to trend and anticipate problems. This tool also checks onboard log information, and can be used to track configuration changes.

Several other tools under STM perform varying levels of testing to stress a device or determine and diagnose problems:

  • Verifier tool: Can be invoked on a particular device to verify quickly that it is connected and functioning properly.

  • Exerciser tool: Stresses a device up to the maximum load expected in a customer environment, to help reproduce and troubleshoot intermittent problems.

  • Diagnose tools: Perform a complete test of the hardware, to help isolate failures down to the component or FRU level.

  • Expert tools: Are sophisticated troubleshooting tools for expert users.

  • Logtool tool: Helps you to format, filter, and extract error information from raw data contained in system logs. You can monitor recoverable errors detected by the computer, such as I/O device errors. This data can be used to troubleshoot and trend historical information, so that you can fix failures before they become critical. The errors that you see here are automatically forwarded to the EMS Hardware Monitors, which generate an event if an error is serious enough.

  • Firmware Update tools: Provide a customer-usable way to update the firmware on hardware devices.
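The core of what Logtool does, filtering raw log records and summarizing recoverable errors per device for trending, can be illustrated with a short Python sketch. The pipe-separated record layout, the severity values, and the function names are assumptions made for this example; real HP-UX system logs use their own formats.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    hw_path: str
    severity: str   # assumed values: "info", "recoverable", "fatal"
    message: str


def parse(lines: Iterable[str]) -> List[LogEntry]:
    """Parse records in an assumed 'time|path|severity|message' layout, skipping malformed lines."""
    entries = []
    for line in lines:
        parts = line.rstrip("\n").split("|", 3)
        if len(parts) == 4:
            entries.append(LogEntry(*parts))
    return entries


def recoverable_by_device(entries: Iterable[LogEntry]) -> Counter:
    """Count recoverable errors per hardware path so that trends can be tracked."""
    return Counter(e.hw_path for e in entries if e.severity == "recoverable")


# Hypothetical raw records, including one corrupted line.
raw = [
    "1999-03-01T02:10|8/0.6.0|recoverable|read retry succeeded",
    "1999-03-01T02:12|8/0.6.0|recoverable|read retry succeeded",
    "1999-03-01T03:00|8/0.3.0|info|media loaded",
    "corrupted record",
]
entries = parse(raw)
counts = recoverable_by_device(entries)
print(counts.most_common())
```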

STM enables an operator to run a module on several devices simultaneously. In addition, the operator can start diagnostic tests running on more than one system from within the user interface.
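Running one test module against several devices concurrently can be sketched with a thread pool. `run_on_devices` and `fake_verify` are hypothetical names, and `fake_verify` merely stands in for a real STM module; the sketch shows the idea of parallel dispatch, not how STM actually schedules its tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_on_devices(module: Callable[[str], str], hw_paths: List[str],
                   workers: int = 4) -> Dict[str, str]:
    """Run one test module against several devices at once; map each path to its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as hw_paths
        return dict(zip(hw_paths, pool.map(module, hw_paths)))


def fake_verify(hw_path: str) -> str:
    """Hypothetical stand-in for a verifier module; pretends one device does not respond."""
    return "failed" if hw_path == "8/0.6.0" else "passed"


results = run_on_devices(fake_verify, ["8/0.3.0", "8/0.6.0", "10/4.0.0"])
print(results)
```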

STM provides both configuration and fault monitoring capabilities for the system. STM tools detect the same errors as the EMS Hardware Monitors, but the EMS Hardware Monitors report them in real time. After getting an EMS event, you can run STM to further diagnose a problem.

STM can diagnose local or remote systems. It is available on HP-UX 10.01 and later releases, and it replaces the Sherlock diagnostics. The software (product number B4708AA) is distributed on the HP-UX Diagnostic/IPR Media.

HP Predictive Support

HP Predictive Support detects and predicts system-related faults. When problem conditions are detected, notification is sent to the HP Response Center. This level of care is meant for customers with special support contracts with Hewlett-Packard. The Predictive Support software proactively monitors the system and automatically reports information back to the HP Response Center via modem access. Because the HP Response Center is available 24 hours a day, 7 days a week, this procedure can lead to a quick response to problems.

The Predictive Support software focuses on system event information for memory and I/O devices. Error logs are analyzed daily to diagnose potential problems. By warning of these problems in advance, Predictive Support allows scheduled maintenance to replace the unplanned downtime associated with a failed component.

Predictive Support uses a set of rules on a managed node to determine when events should be sent to the HP Response Center. These conditions can be updated periodically by downloading new rules from HP. Event correlation ensures that duplicate messages are suppressed and that the Response Center is not repeatedly warned of the same root problem.
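A minimal sketch of rule-based event generation with duplicate suppression follows, assuming that a root problem can be identified by the pair (rule name, device path). The `Rule` class, the rule names, the record fields, and the threshold of 10 correctable errors are hypothetical; Predictive Support's actual rules and correlation logic are not described in this text.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]   # predicate over one analyzed log record


def correlate(records: List[dict], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (rule, device) events to report, sending each root problem only once."""
    seen: Set[Tuple[str, str]] = set()
    events: List[Tuple[str, str]] = []
    for rec in records:
        for rule in rules:
            key = (rule.name, rec["hw_path"])   # assumed identity of a root problem
            if key not in seen and rule.matches(rec):
                seen.add(key)
                events.append(key)
    return events


# Hypothetical rule set; replacing this list models downloading updated rules.
rules = [
    Rule("disk-correctable-errors", lambda r: r["kind"] == "correctable" and r["count"] >= 10),
    Rule("cpu-cache-lpmc", lambda r: r["kind"] == "lpmc"),
]
records = [
    {"hw_path": "8/0.6.0", "kind": "correctable", "count": 12},
    {"hw_path": "8/0.6.0", "kind": "correctable", "count": 15},  # same problem again
    {"hw_path": "cpu0", "kind": "lpmc", "count": 1},
    {"hw_path": "8/0.3.0", "kind": "correctable", "count": 2},   # below threshold
]
print(correlate(records, rules))
```

Because the rules are plain data, swapping in a new list changes what gets reported without touching `correlate`, which loosely mirrors the idea of periodically downloading new rules.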

Predictive Support analyzes on-board logs, system logs, and memory logs. The software can automatically dial the HP Response Center to transmit error data and logs, or the system administrator can initiate the modem transmission. Similarly, updates to the Predictive Support software, including new rules for generating predictive events, can be triggered automatically or controlled by the administrator. Configuration and administration are handled through a menu-driven interface.

System logs are scanned for I/O errors and LPMCs. Logged data is analyzed for trends associated with specific disk or tape devices, such as correctable errors. LPMC records are analyzed for internal cache errors. Memory logs are also scanned to look for error rates exceeding specified thresholds.
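Checking whether memory errors exceed a threshold amounts to computing a rate over a time window. The sketch below assumes a list of error timestamps, a trailing seven-day window, and a threshold expressed in errors per day; these values and the function names are illustrative, not actual Predictive Support settings.

```python
from datetime import datetime, timedelta
from typing import List


def errors_per_day(timestamps: List[datetime], now: datetime, window: timedelta) -> float:
    """Average logged errors per day over the trailing window ending at `now`."""
    recent = [t for t in timestamps if now - window <= t <= now]
    return len(recent) / (window / timedelta(days=1))


def exceeds_threshold(timestamps: List[datetime], now: datetime,
                      window: timedelta, max_per_day: float) -> bool:
    """True when the recent error rate is above the configured threshold."""
    return errors_per_day(timestamps, now, window) > max_per_day


now = datetime(1999, 3, 8)
# Hypothetical memory log: 21 single-bit errors over the past week, plus one old entry.
stamps = [now - timedelta(hours=8 * i) for i in range(21)] + [now - timedelta(days=30)]
week = timedelta(days=7)
print(errors_per_day(stamps, now, week))           # 21 errors / 7 days
print(exceeds_threshold(stamps, now, week, 2.0))
```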

The Response Center determines where a failed device is located, its model number, its manufacturer, and its serial number, so that repairs can be made. This information is sent in the failure notification messages.

HP Predictive Support does not help with other areas of system monitoring, such as resource and performance management. Also, the software runs only on HP-UX systems.

HA Observatory

HA Observatory is a suite of tools used to detect and quickly diagnose system problems. The products include HP Predictive Support, Network Node Manager, and the Configuration Tracker, which keeps track of the server's software configuration. A support system and a network router are also maintained at the customer site.

HA Observatory relies on HP Predictive Support to report hardware failures. In addition, configuration information collected by the Configuration Tracker is available. The Configuration Tracker generates and maintains a snapshot of the configuration so that it can detect software configuration changes.
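Detecting software configuration changes from snapshots reduces to comparing two mappings. This Python sketch assumes a snapshot is a dictionary from product name to installed revision; the product names and revisions are hypothetical, and the Configuration Tracker's real snapshot contents are not documented here.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]   # software product name -> installed revision (assumed form)


def diff_snapshots(old: Snapshot, new: Snapshot) -> Dict[str, list]:
    """Report products added, removed, or changed between two configuration snapshots."""
    added = sorted(name for name in new if name not in old)
    removed = sorted(name for name in old if name not in new)
    changed: List[Tuple[str, str, str]] = sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical product names and revisions.
yesterday = {"os-core": "1.0", "patch-a": "1.0", "net-tools": "2.1"}
today = {"os-core": "1.0", "patch-a": "1.1", "diag-tools": "3.0"}
print(diff_snapshots(yesterday, today))
```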

HA Observatory uses a secure network link to HP's High Availability Support Center from a special system at the customer site. This support system, an HP 9000 Series 700 workstation, collects system configuration information from key servers and can be used to view network status and topology information. Hardware failure notifications and configuration information can be sent to HP. When permitted by the customer, HP support engineers can access the customer servers over the secure link to gather additional information.

HA Observatory is supported only on HP-UX systems and is available only to customers with BCS and CCS support contracts.

UNIX Fault Management: A Guide for System Administrators
ISBN: 013026525X
Year: 1999
Pages: 90