This section describes how to troubleshoot ProLiant problems. There isn't sufficient space here to cover every issue, but the basics are covered along with some helpful Web sites. Whether you fix the machine yourself or call HP support, these steps will help you narrow down the problem for support personnel.

Proactive Troubleshooting

Becoming proactive in managing your environment is probably the most efficient way to avoid problems. Although HP endeavors to enhance system and application fault tolerance, problems and failures still occur, most often because of a lack of due diligence and a failure to apply recommended procedures. Assessing and acknowledging risk is a first step in recognizing the need to be proactive. The risk can be defined by asking some questions:
These examples demonstrate the point of avoiding risk, but might seem like exaggerations and risks that no one would take. However, these are real situations reported to HP's call center. HP includes system management, monitoring, troubleshooting tools, and utilities with each ProLiant server; however, none of these tools will prevent the risks just described. Using the tools as they were designed will go a very long way in avoiding risk. Common sense and due diligence are your greatest assets in avoiding and managing risk. The management tools included with ProLiant servers allow you to proactively monitor server health and keep systems software, drivers, and utilities up-to-date. This, along with a thorough backup and disaster recovery plan, will substantially ease management and reduce the risk of data loss.

Troubleshooting ProLiant Servers

ProLiant ML, DL, and BL servers include troubleshooting procedures in the Installation and Setup and User Guides shipped with each server. The guides provide hardware troubleshooting steps to assist in locating and fixing hardware configuration issues. The label on the underside of the hood in every ProLiant system will also assist with server configuration, as shown in Figure 10.58.

Figure 10.58. ProLiant server hood labels assist with server configuration.

The hood labeling provides supported information for configuration and adding processors, memory, and other components.

Configuring New Servers and Troubleshooting

An out-of-the-box system should be prepared according to the instructions included with the server. The fast start setup guide will assist in preparing the server for initial powerup. The following list covers basic steps for prepping a new server to install the OS:
Each ProLiant server is tested prior to shipping and should operate without error out of the box. However, during shipping, components might become unseated or loose. If problems are encountered during initial setup of a server, refer to the Servers Setup and Installation Guide for troubleshooting information. Some of the common conditions encountered are

Error: Loose Connections

Actions:
Problems Adding Options to a Server

Actions:
The basic troubleshooting steps for ProLiant hardware problems are
Problem: (Unknown Problem)

Actions:
Warning: Only authorized technicians trained by HP should attempt to remove the system board. If you believe the system board requires replacement, contact HP Technical Support (see "Contacting HP Technical Support or Authorized Reseller" in the HP ProLiant Server Troubleshooting Guide) before proceeding.
Third-Party Device Problems

Actions:

Testing the Device

Actions:
HP ProLiant Server Troubleshooting Guide

Setup and Installation Guides and User Guides are available for each ProLiant server and contain troubleshooting information specific to that server model. In addition to those guides, the HP ProLiant Server Troubleshooting Guide is available at the HP Web site at http://h20000.www2.hp.com/bc/docs/support/UCR/SupportManual/TPM_338615-2/TPM_338615-2.pdf. A new version of the guide, released in October 2003, contains information tailored to the ProLiant ML, DL, and BL servers that shipped with SmartStart 6.0 or later. The topics covered in the guide include
Previous versions have been used as a reference guide for the Array Diagnostics Utility (ADU) error messages, POST error messages and beep codes, and the event list error messages. The new guide contains additional features that assist in troubleshooting. Troubleshooting flowcharts are a new feature that will assist engineers in diagnosing problems. Sample flowcharts are shown in Figures 10.59 and 10.60.

Figure 10.59. General hardware installation and failure troubleshooting flowchart.

Figure 10.60. Troubleshooting flowchart for OS issues.

The chart shown in Figure 10.59 is a general troubleshooting flowchart for installing the hardware or detecting a hardware failure. The OS Boot Problems flowchart, shown in Figure 10.60, assists with the following:
Troubleshooting Utilities

There are several troubleshooting utilities that you should be familiar with.

Survey Utility

Each ProLiant 300, 500, and 700 series server comes with Web-enabled Management Agents called HP Insight Management Agents. They are included on the SmartStart CD in the PSP and are installed with a SmartStart assisted install or when applying the PSP. The Survey Utility is now included in the HP Insight Management Agent software and in the PSP. The Survey features are available by selecting the Tools tab in System Management Homepage. Survey sessions are now stored in XML files, displayed as HTML in the browser interface, and also used to perform session comparisons. The XML files can also be viewed by standard browsers. The survey.idi and survey.txt files used by the legacy Survey Utility are not used by Insight Diagnostics. The following steps describe how to use this utility:
Survey Utility Legacy Version

The Survey Utility is the legacy online information-gathering agent that runs on ProLiant servers under NetWare, Windows, and Linux. This utility was designed to facilitate the resolution of problems without taking the server offline. It gathers critical hardware and software information from various sources and saves it to the survey.txt file. A collection of the last 10 snapshots, or sessions, is saved in the survey.idi file. Sessions captured on Linux systems are saved in individual survey text files that include date and time stamps in the file name. The current configuration can be viewed by browsing to the Survey Utility Web page. To use this utility, follow these steps:
HP Insight Diagnostics Online Edition Maintenance Utility

This new utility replaces the Survey Utility. Deployed from the PSP, the HP Insight Diagnostics Online Edition maintenance utility displays information about your server's hardware configuration. It is a new Web-enabled Management Agent provided with the ProLiant Essentials suite of products. As of SmartStart 7.1, HP Insight Diagnostics Online Edition, featuring Survey and the Integrated Management Log (IML) Viewer, replaces the Survey Utility previously included with SmartStart. The online version of HP Insight Diagnostics acts like the Survey Utility it replaces and does not perform any hardware tests on the system. You must uninstall the Survey Utility before beginning the installation of HP Insight Diagnostics Online Edition. In online mode, Insight Diagnostics offers a Web browser interface in addition to the command-line interface. This enables remote control of the utility and makes it easy to transfer configuration information from remote machines to a service provider. It can be updated from VCRM and VCA, and with SIM it offers proactive notification when an updated version is available. You can use Insight Diagnostics Online Edition to
Integrated Management Log (IML)

The IML can be viewed either through the System Management Homepage (at the Logs tab in the Survey Utility) or through the IML Viewer, which is opened by clicking Start, Programs, HP System Tools. The IML utility allows you to view and manage the HP IML on both local and remote systems. The IML is a nonvolatile log used to record significant events that occur in a system and its components. The IML records system events, critical errors, power-on messages, and memory errors. It also records catastrophic hardware and software errors that typically cause a system to fail. The information contained in the log helps you quickly identify and correct problems, thus minimizing downtime. The displayed IML entries include the following information:
Additionally, a colored icon representing the severity of the event is displayed in the column with the event description.
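Because each entry carries a severity, a log that has been exported from the IML Viewer to a CSV file can be triaged with a short script. The following is a minimal sketch only: the column names (`Severity`, `Description`) and the sample rows are hypothetical and made up for illustration, so check the header row of an actual export before relying on them.

```python
import csv
import io

# Hypothetical column names; adjust to match the header row of a real export.
SEVERITY_COLUMN = "Severity"
DESCRIPTION_COLUMN = "Description"


def filter_by_severity(csv_text, wanted):
    """Return the rows whose severity is in `wanted` (case-insensitive)."""
    wanted_lower = {w.lower() for w in wanted}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if (row.get(SEVERITY_COLUMN) or "").strip().lower() in wanted_lower
    ]


# Made-up sample data standing in for an exported log.
SAMPLE = """Severity,Class,Description
Informational,Maintenance,Example informational event
Critical,Hardware,Example critical event
Caution,Hardware,Example caution event
"""

if __name__ == "__main__":
    for row in filter_by_severity(SAMPLE, {"Critical", "Caution"}):
        print(row[SEVERITY_COLUMN], "-", row[DESCRIPTION_COLUMN])
```

To process a real export, read the file's contents into a string and pass it to `filter_by_severity` in place of `SAMPLE`.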
The displayed log can be printed, sorted, filtered, saved to a disk file (for historical purposes), and exported to a CSV file (for import into third-party applications). Users with Administrator privileges can mark selected entries as "repaired" (after action has been taken to resolve the problem), and can clear all of the entries from a given machine's log. Logs that have been saved can also be viewed in the utility. These logs can be printed, sorted, and filtered just like the online logs; they can also be exported to a CSV file. The command functions (Mark Repaired and Clear All Entries), as well as Refresh, are disabled when viewing saved logs.

Note: You must have Administrator privileges to enable the command functions of this utility.

Insight Diagnostics (Offline)

Insight Diagnostics is a browser-based hardware testing application that can be run offline from the SmartStart CD. It replaces the Server Diagnostics Utility, which is a DOS-based, offline-only tool. It provides the following major features: three types of diagnostic testing, access to the IML, hardware data collection (Survey), and diagnostic test logs. Tests can be configured to run in time-based or loop-based modes and in interactive or unattended testing modes, and they can be customized to test any desired combination of hardware devices. Failures and other errors are gathered in the error log, and a report ticket can be generated that contains all diagnostic errors and IML records. The report ticket can be printed or saved for further troubleshooting. To use Insight Diagnostics:
BIOS Serial Console and EMS

BIOS Serial Console, which is the focus of Insight Diagnostics, can be enabled in RBSU. By default, BIOS Serial Console is disabled.

EMS Support Overview

Emergency Management Service (EMS) support is a Microsoft feature for the Windows Server 2003 OS, which is enabled by default in the OS, but which also must be enabled in the system ROM. Refer to "Operating System Support" in Chapter 2 of the HP BIOS Serial Console User Guide for more information about using supported OSs. By default, EMS support is disabled for ML and DL servers, and is enabled for BL servers.

Configuration in RBSU

As discussed, the BIOS Serial Console/EMS feature is enabled in RBSU. When Enable Local is selected, the OS redirects through the local serial port. When Enable Remote is selected, the OS redirects through iLO or RILOE II. Data becomes available through the browser configured for iLO instead of through the serial port. Enabling remotely requires iLO 1.10 firmware or later.

Emergency Management Services (EMS)

EMS support provides I/O support for all Windows kernel components: the loader, setup, recovery console, OS kernel, blue screens, and the Special Administration Console. The Special Administration Console is a text-mode management console available after the Windows Server 2003 OS has initialized. For more information on EMS support, go to http://www.microsoft.com/hwdev/headless. Microsoft enables EMS support in the OS, but EMS support also requires ROM support. EMS support, when enabled, assumes the serial port for redirection and can cause interference with other devices attached to the serial port. To avoid interference, EMS is disabled in the system ROM by default on ML and DL servers. To enable this feature, Enable Local or Enable Remote must be selected under the BIOS Serial Console/EMS Support menu in RBSU before installing Windows Server 2003.
If you install Windows Server 2003 with EMS disabled, and later decide to enable it, perform the following steps to update the boot.ini file:
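As a rough illustration of the kind of boot.ini change involved (not HP's documented procedure), EMS redirection on Windows Server 2003 is controlled by `redirect=` and `redirectbaudrate=` entries in the `[boot loader]` section and a `/redirect` switch on the OS entry. The sketch below applies those settings to boot.ini text held in a string. The function name `enable_ems`, the port `COM1`, the baud rate `115200`, and the sample file contents are all assumptions chosen for the example; use the values that match your RBSU configuration.

```python
def enable_ems(boot_ini, port="COM1", baud=115200):
    """Return boot.ini text with EMS redirection settings added.

    Adds redirect= and redirectbaudrate= to [boot loader] and appends
    /redirect to each entry under [operating systems] that lacks it.
    The port and baud defaults are illustrative assumptions.
    """
    out = []
    section = None
    for line in boot_ini.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped.lower()
            out.append(line)
            if section == "[boot loader]":
                # Write the redirection settings directly under the header.
                out.append(f"redirect={port}")
                out.append(f"redirectbaudrate={baud}")
            continue
        if section == "[boot loader]" and stripped.lower().startswith(
            ("redirect=", "redirectbaudrate=")
        ):
            continue  # drop any older settings; fresh ones were written above
        if (
            section == "[operating systems]"
            and "=" in stripped
            and "/redirect" not in stripped.lower()
        ):
            line = line.rstrip() + " /redirect"
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up sample boot.ini contents for demonstration only.
SAMPLE = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect
"""

if __name__ == "__main__":
    print(enable_ems(SAMPLE))
```

The function works on a string rather than on the file itself so the result can be reviewed before anything is written. On a live system, boot.ini is normally a protected system file, and Microsoft's `bootcfg` utility with its `/ems` option is the usual way to make the same change.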
Using iLO and RILOE II for Remote Troubleshooting

iLO-Based Diagnostics

iLO is an intelligent processor integrated into newer ProLiant servers that provides remote management and administration of a server through a standard browser. iLO provides the following reporting and diagnostic features: access to the IML, iLO event log, server POST results, iLO self-test results, graphical remote console, virtual power button, virtual floppy, and virtual CD. The Remote Console can be used to monitor the system for POST error messages. The events recorded in the IML and the iLO event log are useful for troubleshooting server issues. Virtual floppy and CD-ROM (if licensed) can be used to boot and run Server Diagnostics. A new feature with iLO is the capability to record server Port 84 POST codes as the system boots. These codes document the progress of the server through the bootstrap process. To access iLO or RILOE II, use any of the following methods:
Running Diagnostics on a Remote System Using the iLO or RILOE II

To run Insight Diagnostics from the iLO or RILOE II virtual CD, follow these steps:
To run Server Diagnostics from the iLO or RILOE II virtual floppy, follow these steps:
The tools and methods used in troubleshooting ProLiant servers provide powerful capabilities for administrators and managers to reduce downtime by assisting HP in locating the problem in a timely fashion.

Note: iLO now provides "Terminal Services Pass Through" for Windows remote console sessions. ProLiant servers with the iLO Advanced Pack enabled can leverage iLO's remote console function to provide Terminal Services pass-through of a Windows Remote Desktop Connection to Windows Server. See Chapter 15 for details.