If a hardware component detects an error, it is reported by OpenVMS. Information about the failure, whether it is recoverable or not, is recorded in SYS$ERRORLOG:ERRLOG.SYS. The primary command for examining this log file is SHOW ERROR, and its use in a cluster environment is as follows:
$ mcr sysman
SYSMAN> set environment/cluster
command environment:
        Clusterwide on local cluster
        Username SYSTEM will be used on nonlocal nodes
SYSMAN> do show error
command execution on node BEAVER
Device                  Error Count
PEA0:                        3
command execution on node LOON
Device                  Error Count
PEA0:                        2
command execution on node CSWWW
Device                  Error Count
PEA0:                        2
command execution on node EAGLE
%SHOW-S-NOERRORS, no device errors found
The previous report shows errors on the cluster communications device, PEA0. Clusters are discussed in Chapter 10.
SHOW ERROR displays a summary of hardware errors and thus alerts the manager to possible hardware failures. It is important to monitor all errors because hardware usually does not fail catastrophically without warning; rather, minor problems appear before total failure. If an error is observed in the SHOW ERROR display, the manager should use additional programs and commands to investigate it further.
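Routine monitoring of this kind is easier if the clusterwide SHOW ERROR output is captured to a file and scanned automatically. The following Python sketch is one possible way to do that; it is not an OpenVMS facility. It assumes the report format shown above (a "command execution on node NAME" line followed by "DEVICE: count" entries), and the function name device_errors is hypothetical.

```python
import re

# Matches either a node header ("on node BEAVER") or a device entry ("PEA0: 3").
# Assumption: device names are upper case and followed by a colon and a count.
_PATTERN = re.compile(r"on node (\S+)|\b([A-Z0-9$_]+):\s+(\d+)")


def device_errors(text):
    """Return {node: {device: error_count}} from captured SHOW ERROR output."""
    result = {}
    node = None
    for match in _PATTERN.finditer(text):
        if match.group(1):
            # A new node section begins; a node with no errors keeps an empty dict.
            node = match.group(1)
            result[node] = {}
        elif node is not None:
            result[node][match.group(2)] = int(match.group(3))
    return result


# Abridged sample taken from the SYSMAN session above.
sample = """\
command execution on node BEAVER
Device Error Count
PEA0: 3
command execution on node EAGLE
%SHOW-S-NOERRORS, no device errors found
"""

for node_name, devices in device_errors(sample).items():
    for device, count in devices.items():
        print(f"{node_name} {device} {count}")
```

Comparing the result with the one saved from the previous day would show which counts are still rising, which is usually more informative than the absolute numbers.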
Device and CPU errors are examined with DIAGNOSE (on OpenVMS/Alpha) and ANALYZE/ERROR. The latter command is illustrated below. As with many other commands, several reporting levels are generally available; /BRIEF (or /SUMMARY) commonly displays the least information.
$ ANALYZE/ERROR/BRIEF
Error Log Report Generator                          Version V7.1

******************************* ENTRY 1. *******************************
ERROR SEQUENCE 0.                          LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 00:05:35.93                 SYS_TYPE 04130002
SYSTEM UPTIME: 11 DAYS 14:23:12
SCS NODE: BEAVER                                  VAX/VMS V7.1

ERRLOG.SYS CREATED KA49 CPU Microcode Rev # 2. CONSOLE FW REV# 1.3
                   Standard Microcode Patch Patch Rev # 19.

******************************* ENTRY 2. *******************************
ERROR SEQUENCE 2343.                       LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 22:55:36.05                 SYS_TYPE 04130002
SYSTEM UPTIME: 12 DAYS 13:12:45
SCS NODE: BEAVER                                  VAX/VMS V7.1

TIME STAMP KA49 CPU Microcode Rev # 2. CONSOLE FW REV# 1.3
                Standard Microcode Patch Patch Rev # 19.

******************************* ENTRY 3. *******************************
ERROR SEQUENCE 2344.                       LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 22:57:12.85                 SYS_TYPE 04130002
SYSTEM UPTIME: 12 DAYS 13:14:22
SCS NODE: BEAVER                                  VAX/VMS V7.1

ERL$LOGMESSAGE KA49 CPU Microcode Rev # 2. CONSOLE FW REV# 1.3
                    Standard Microcode Patch Patch Rev # 19.

NI-SCS SUB-SYSTEM, _BEAVER$PEA0:
PORT HAS CLOSED VIRTUAL CIRCUIT

******************************* ENTRY 4. *******************************
ERROR SEQUENCE 2350.                       LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 23:55:36.05                 SYS_TYPE 04130002
SYSTEM UPTIME: 12 DAYS 14:12:45
SCS NODE: BEAVER                                  VAX/VMS V7.1

TIME STAMP KA49 CPU Microcode Rev # 2. CONSOLE FW REV# 1.3
                Standard Microcode Patch Patch Rev # 19.
On my system, ERRLOG.SYS is reinitialized every day to keep its size manageable. Entries 1, 2, and 4 in the previous display pertain to stopping and restarting the error logger at midnight and identifying the hardware. Entry 3 relates to the PEA0 error. More data about that error is available but not presented here.
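When the brief report is long, it can help to pick out only the entries that mention a particular device. The Python sketch below shows one way to do this on a captured report; it assumes the banner and header wording seen in the display above ("ENTRY n.", "ERROR SEQUENCE n.", "DATE/TIME date time"), and the function name entries_mentioning is hypothetical.

```python
import re

# Entry banners look like "**** ENTRY 3. ****"; spacing around them varies.
ENTRY_BANNER = re.compile(r"\*+\s*ENTRY\s+\d+\.\s*\*+")
# Sequence number and timestamp from the entry header.
ENTRY_HEADER = re.compile(
    r"ERROR SEQUENCE (\d+)\..*?DATE/TIME (\S+ \S+)", re.DOTALL
)


def entries_mentioning(report, device):
    """Return (sequence, timestamp) for each entry whose text names device."""
    hits = []
    # Element 0 is the report title that precedes the first banner.
    for chunk in ENTRY_BANNER.split(report)[1:]:
        header = ENTRY_HEADER.search(chunk)
        if header and device in chunk:
            hits.append((int(header.group(1)), header.group(2)))
    return hits


# Abridged copy of entries 2 and 3 from the report above.
sample_report = """\
Error Log Report Generator Version V7.1
******************************* ENTRY 2. *******************************
ERROR SEQUENCE 2343. LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 22:55:36.05 SYS_TYPE 04130002
TIME STAMP KA49
******************************* ENTRY 3. *******************************
ERROR SEQUENCE 2344. LOGGED ON: SID 13002602
DATE/TIME 16-SEP-2002 22:57:12.85 SYS_TYPE 04130002
ERL$LOGMESSAGE KA49
NI-SCS SUB-SYSTEM, _BEAVER$PEA0: PORT HAS CLOSED VIRTUAL CIRCUIT
"""

print(entries_mentioning(sample_report, "PEA0"))
```

The sequence numbers returned this way identify which entries to examine in a fuller report.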
In my experience, memory problems start with correctable errors that later may become permanent. In the case of memory errors, OpenVMS will automatically stop using memory pages that have reported errors during the boot memory scan. SHOW MEMORY is the primary program used to examine memory status, although ANALYZE/ERROR can also be used. SHOW MEMORY is illustrated as follows:
$ SHOW MEMORY
              System Memory Resources on 21-SEP-2002 18:51:31.05

Physical Memory Usage (pages):      Total        Free      In Use    Modified
  Main Memory (256.00Mb)            32768       28089        4357         322

Virtual I/O Cache (Kbytes):         Total        Free      In Use
  Cache Memory                       3200          16        3184

Granularity Hint Regions (pages):   Total        Free      In Use    Released
  Execlet code region                 512           0         506           6
  Execlet data region                  96           4          92           0
  S0/S1 Executive data region         477           0         477           0
  S2 Executive data region            160           0         160           0
  Resident image code region          512           0         319         193

Slot Usage (slots):                 Total        Free    Resident     Swapped
  Process Entry Slots                  35          15          20           0
  Balance Set Slots                    33          15          18           0

Dynamic Memory Usage (bytes):       Total        Free      In Use     Largest
  Nonpaged Dynamic Memory         4243456      373824     3869632        5952
  Paged Dynamic Memory            4661248     1119936     3541312     1069792

Buffer object Usage (pages):                   In Use        Peak
  32-bit System Space Windows (S0/S1)               0           0
  64-bit System Space Windows (S2)                  0           0

Memory Reservations (pages):     Reserved      In Use        Type
  Total (0 Mb reserved)                 0           0

Paging File Usage (blocks):                      Free  Reservable       Total
  DISK$ALPHASYS:[SYS0.SYSEXE]SWAPFILE.SYS        4480        4480        4480
  DISK$ALPHASYS:[SYS0.SYSEXE]PAGEFILE.SYS      532480      479472      532480

Of the physical pages in use, 2847 pages are permanently allocated to OpenVMS.
This command presents a plethora of statistics about various aspects of how the memory is used. This data can be used for performance tuning, described later. It will also include error reports, if there are any.
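As a small illustration of using these figures, the Python sketch below extracts the Main Memory line from captured SHOW MEMORY output and computes the fraction of pages that are free. It assumes the line has the form shown in the display above (total, free, in use, and modified page counts, in that order); the function names are hypothetical.

```python
import re

# "Main Memory (256.00Mb)  32768  28089  4357  322"
# Columns after the size: Total, Free, In Use, Modified (all in pages).
MAIN_MEMORY = re.compile(
    r"Main Memory \(([\d.]+)Mb\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def main_memory(text):
    """Return the page counts from the Main Memory line as a dict."""
    match = MAIN_MEMORY.search(text)
    if match is None:
        raise ValueError("no Main Memory line found")
    # Skip group 1 (the size in Mb); the remaining four groups are page counts.
    total, free, in_use, modified = (int(g) for g in match.groups()[1:])
    return {"total": total, "free": free, "in_use": in_use, "modified": modified}


def percent_free(pages):
    """Percentage of physical pages currently on the free list."""
    return 100.0 * pages["free"] / pages["total"]


line = "Main Memory (256.00Mb) 32768 28089 4357 322"
pages = main_memory(line)
print(f"{percent_free(pages):.1f}% of main memory is free")
```

In the display above, the free, in-use, and modified counts (28089 + 4357 + 322) add up to the 32768-page total, which is a convenient sanity check on any parsed values.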