This is a fundamental problem with our root/boot disk. It may have been caused by a disk head problem or a sudden system crash due to a power outage. When the system reboots and tries to boot from the root/boot disk, we may get a message of this form:
IPL error: bad LIF magic
This indicates that the boot header on the disk is inconsistent: the PDC code tried to initialize the LIF volume indicated by the Primary Boot Path stored in Stable Storage and could not. Without a valid LIF header and LIF volume, the system can boot no further, and we can forget all about single-user mode, maintenance mode, and other boot options. This should be our first troubleshooting diagnostic question: Is the primary boot path actually bootable?
The most acceptable solution to this problem is a simple one: have another bootable disk that mirrors all the logical volumes in the root volume group. This module assumes that you do not have that luxury or that your luxurious solution has been corrupted in a similar fashion.
I start this section by using an LVM root disk that has become corrupted in such a way that the entire LIF header, PVRA, BDRA, LIF volume, and my VGRA are missing. This is quite serious.
The /stand filesystem always starts at address 2912KB from the beginning of the disk. The ISL routines take this as read, which is what allows them to bypass all LVM structures and boot in maintenance mode. The problems with the disk in this demonstration corrupted the boot and LVM configuration significantly but stopped short of any serious corruption of /stand. Any further corruption involving /stand would normally require a reinstallation of the operating system to correct.
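The 2912KB offset is easier to work with once it is expressed in block units. The following is a minimal sketch of that arithmetic in POSIX shell; the helper name kb_to_blocks is my own invention, and nothing here reads from or writes to a disk.

```shell
#!/bin/sh
# kb_to_blocks is a hypothetical helper, not an HP-UX command.
# It converts an offset in KB into a count of blocks of a given size.
kb_to_blocks() {
    offset_kb=$1      # offset from the start of the disk, in KB
    block_bytes=$2    # block size in bytes, for example 512 or 1024
    echo $(( offset_kb * 1024 / block_bytes ))
}

# /stand is described above as starting 2912KB into the disk.
echo "512-byte blocks: $(kb_to_blocks 2912 512)"    # prints 5824
echo "1KB blocks:      $(kb_to_blocks 2912 1024)"   # prints 2912
```

The same function works for any other fixed offset you need to convert.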
After the system performed its power-on self-test, it tried to boot from the Primary Boot Path. I received this error on the console:
    Duplex Console IO Dependent Code (IODC) revision 1

    -----------------------------------------------------------------------------
       (c) Copyright 1995-2001, Hewlett-Packard Company, All rights reserved
    -----------------------------------------------------------------------------

      Processor   Speed     State                 CoProcessor State   Cache Size
      Number                                      State               Inst     Data
      ---------   --------  -------------------   -----------------   -----------
          0       750 MHz   Active                Functional          750 KB   1.5 MB

      Central Bus Speed (in MHz)  :        120
      Available Memory            :    1048576 KB
      Good Memory Required        :      31168 KB

       Primary boot path:    0/0/1/1.15
       Alternate boot path:  0/0/2/1.15
       Console path:         0/0/4/1.643
       Keyboard path:        0/0/4/0.0

    Processor is booting from first available device
    To discontinue, press any key within 10 seconds
    10 seconds expired
    Proceeding

    Trying Primary Boot Path
    ------------------------
    Booting
    Boot IO Dependent Code (IODC) revision 1

    IPL error: bad LIF magic

    Main Menu: Enter command or menu > display

    ---- Main Menu --------------------------------------------------------------

         Command                         Description
         -------                         -----------
         BOot [PRI|ALT|<path>]           Boot from specified path
         PAth [PRI|ALT] [<path>]         Display or modify a path
         SEArch [DIsplay|IPL] [<path>]   Search for boot devices

         COnfiguration menu              Displays or sets boot values
         INformation menu                Displays hardware information
         SERvice menu                    Displays service commands

         DIsplay                         Redisplay the current menu
         HElp [<menu>|<command>]         Display help for menu or command
         RESET                           Restart the system
    ----
    Main Menu: Enter command or menu >
The error message IPL error: bad LIF magic indicates to me that I have a corrupt LIF header on my boot/root disk. I could try to boot pri again, but I know I will receive the same error message. I have a CD/DVD-ROM attached with Disc 1 of my Core OS Install and Recovery media. If I knew the hardware address of this device, I could boot from it. As such, I will perform a search for any bootable devices:
    Main Menu: Enter command or menu > search ipl

    Searching for device(s) with bootable media
    This may take several minutes

    To discontinue search, press any key (termination may not be immediate)

    Path#  Device Path (dec)  Device Path (mnem)  Device Type and Utilities
    -----  -----------------  ------------------  -------------------------
    P0     0/0/2/0.2          extscsib.2          Random access media IPL

    Main Menu: Enter command or menu >
As you can see, the search has not found my boot/root disk, because it doesn't contain a valid LIF header. The one device listed is the only bootable device on my system. I can issue the boot command to boot from it; I don't need to interact with the ISL prompt in this instance because the install program now has an option to Run a Recovery Shell. Previously, I would have had to interact with the ISL prompt and use the command 800SUPPORT (or 700SUPPORT for maintaining a workstation). Don't use these commands now because they attempt to run a boot utility called ERECOVERY that doesn't exist for HP-UX 11.X. At this point, if you have set up an Ignite-UX server, you may consider using it to access the Install/Recovery command. This is sometimes more convenient than carrying a CD/DVD-ROM device around with you. Here, I am searching for an Ignite-UX server on my network:
    Main Menu: Enter command or menu > sea lan install

    Searching for potential boot device(s) - on Path 0/0/0/0
    This may take several minutes.

    To discontinue search, press any key (termination may not be immediate).

    Path#  Device Path (dec)  Device Path (mnem)  Device Type
    -----  -----------------  ------------------  -----------
    P0     0/0/0/0            lan.192.168.0.35    LAN Module

    Main Menu: Enter command or menu >
I will use this Ignite-UX depot because it is more up to date than the CD-ROM I have:
    Main Menu: Enter command or menu > bo P0
    Interact with IPL (Y, N, or Cancel)?> n

    Booting
    Network Station Address 00306e-5c3ff8
    System IP Address 192.168.0.45
    Server IP Address 192.168.0.35
    Boot IO Dependent Code (IODC) revision 2

    HARD Booted.

    ISL Revision A.00.43 Apr 12, 2000

    ISL booting hpux (;0)/boot/INSTALL

    Boot
    : lan(0/0/0/0;0)/boot/WINSTALL
    9777152 + 1687552 + 2602664 start 0x2012e8

    alloc_pdc_pages: Relocating PDC from 0xf0f0000000 to 0x3fb01000.
    gate64: sysvec_vaddr = 0xc0002000 for 2 pages
    NOTICE: nfs3_link(): File system was registered at index 4
    NOTICE: autofs_link(): File system was registered at index 6
    NOTICE: cachefs_link(): File system was registered at index 7
    td: claimed Tachyon XL2 Fibre Channel Mass Storage card at 0/4/0/0
    td: claimed Tachyon XL2 Fibre Channel Mass Storage card at 0/6/2/0
    asio0_init: unexpected SAS subsystem ID (1283)

    System Console is on the Built-In Serial Interface
    asio0_init: unexpected SAS subsystem ID (1283)
    Swap device table: (start & size given in 512-byte blocks)
            entry 0 - auto-configured on root device; ignored - no room
    WARNING: no swap device configured, so dump cannot be defaulted to primary swap
    WARNING: No dump devices are configured Dump is disabled
    Starting the STREAMS daemons-phase 1
    Create STCP device files
    <CR>
    $Revision: vmunix: vw: -proj selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108' Wed Nov 8 19:24:56 PST 2000 $
    Memory Information:
        physical page size = 4096 bytes, logical page size = 4096 bytes
        Physical: 1048576 Kbytes, lockable: 727900 Kbytes, available: 853684 Kbytes

    ======= 09/23/03 09:04:28 EDT HP-UX Installation Initialization (Tue Sep 23
            09:04:28 EDT 2003)
            @(#) Ignite-UX Revision B.3.8.201
            @(#) install/init (opt) $Revision: 10.268 $
        * Scanning system for IO devices
        * Querying disk device: 0/0/1/1.15.0
        * Querying disk device: 0/0/2/1.15.0
        * Setting keyboard language

    Welcome to the HP-UX installation/recovery process!
    Use the <tab> key to navigate between fields, and the arrow keys
    within fields. Use the <return/enter> key to select an item.
    Use the <return> or <spacebar> to pop-up a choices list. If the
    menus are not clear, select the "Help" item for more information.

    Hardware Summary:         System Model: 9000/800/A500-7X
    +-----------------------+----------------+-------------------+ [ Scan Again ]
      Disks: 2 ( 67.8GB)      Floppies: 0      LAN cards: 5
      CD/DVDs:                Tapes: 0         Memory: 1024Mb
      Graphics Ports: 0       IO Buses: 6      CPUs: 1             [ H/W Details]
    +-----------------------+----------------+-------------------+

                       [ Install HP-UX ]
                       [ Run a Recovery Shell ]
                       [ Advanced Options ]

          [ Reboot ]                                  [ Help ]
I would suggest that you use the most up-to-date Ignite-UX source you have, be it a CD/DVD-ROM or an Ignite-UX server on your network.
At this point, I don't want to Install HP-UX but Run a Recovery Shell. This will give me access to the automated and manual tasks I require:
Networking must be enabled in order to load a shell. (Press any key to continue.)
This makes sense because I am booting across the network from an Ignite-UX server:
    LAN Interface Selection

    More than one network interface was detected on the system.
    You will need to select the interface to enable. Only one
    interface can be enabled, and it must be the one connected to
    the network that can be used in contacting the install and/or
    SD servers.

    Use the <tab> and/or arrow keys to move to the desired LAN device
    to enable, then press <Return>.

      HW Path      Interface  Station Address  Description
    ----------------------------------------------------------
    [ 0/0/0/0      lan0       0x00306E5C3FF8   HP_PCI_10/100Base-TX_Core      ]
    [ 0/2/0/0/4/0  lan1       0x00306E467BF0   HP_A5506B_PCI_10/100Base-TX_4_ ]
    [ 0/2/0/0/5/0  lan2       0x00306E467BF1   HP_A5506B_PCI_10/100Base-TX_4_ ]
    [ 0/2/0/0/6/0  lan3       0x00306E467BF2   HP_A5506B_PCI_10/100Base-TX_4_ ]
    [ 0/2/0/0/7/0  lan4       0x00306E467BF3   HP_A5506B_PCI_10/100Base-TX_4_ ]
I will need to set up an IP address for one of my LAN interfaces:
    NETWORK CONFIGURATION

    This system's hostname: hpeos003

    Internet protocol address (e.g., 188.8.131.52) of this host: 192.168.0.33

    Default gateway routing internet protocol address: 192.168.0.33

    The subnet mask (e.g., 255.255.248.0 or 0xfffff800): 0xffffffe0

    IP address of the Ignite-UX server system: 192.168.0.35

    Is this networking information only temporary? [ No ]

    [ OK ]    [ Cancel ]    [ Help ]
This simply gives the interface an IP configuration. I am using the system's real IP configuration to ensure that I don't conflict with another IP address on the network.
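The subnet mask on this screen is entered in hex, which makes it harder to confirm at a glance that the client and the Ignite-UX server really are on the same subnet. A minimal sketch of that check in POSIX shell follows; the function names are my own, and the addresses are the ones typed into the screen above.

```shell
#!/bin/sh
# All three helpers are hypothetical names for illustration only.

# hex_to_dotted: turn a hex netmask such as 0xffffffe0 into dotted decimal.
hex_to_dotted() {
    m=$(( $1 ))
    echo "$(( (m >> 24) & 255 )).$(( (m >> 16) & 255 )).$(( (m >> 8) & 255 )).$(( m & 255 ))"
}

# ip_to_int: turn a dotted-decimal IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet: succeed if two addresses share a network under the given mask.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(( $3 ))
    [ $(( a & m )) -eq $(( b & m )) ]
}

hex_to_dotted 0xffffffe0      # prints 255.255.255.224
if same_subnet 192.168.0.33 192.168.0.35 0xffffffe0; then
    echo "client and Ignite-UX server share a subnet"
fi
```

With this mask, 192.168.0.33 and 192.168.0.35 both fall in the 192.168.0.32 network, so the server is reachable without a router.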
    * Releasing DHCP allocated IP address...
    * Bringing up Network (lan0)
    add net default: gateway 192.168.0.33
    * Reading configuration information from server...
    * Loading insf to create disk device files...

    Checking for required components on the Ignite Server........

    loading commands into memory....

    NOTE: Commands residing in the RAM-based file system are unsupported
          'mini' commands. These commands are only intended for recovery
          purposes.

    Loading minimal set of commands needed for recovery...

    WARNING: If ANYTHING is changed on a root(/) that is mirrored a
             'maintenance mode'(HPUX -lm) boot MUST be done in order to
             force the mirrored disk to be updated!!

    Press <return> to continue.

    HP-UX NETWORK SYSTEM RECOVERY
    MAIN MENU

        s. Search for a file
        b. Reboot
        l. Load a file
        r. Recover an unbootable HP-UX system
        x. Exit to shell
        c. Instructions on chrooting to an lvm /(root).

    This menu is for listing and loading the tools contained on the core media.
    Once a tool is loaded, it may be run from the shell. Some tools require other
    files to be present in order to successfully execute.

    Select one of the above:
From here, I want to choose option r. Recover an unbootable HP-UX system.
    Select one of the above: r

    HP-UX Recovery MENU

    Select one of the following:

        a. Rebuild the bootlif (ISL, HPUX, and the AUTO file) and install
           all files required to boot and recover HP-UX on the root file system.
        b. Do not rebuild the bootlif, but install files required to boot
           and recover HP-UX on the root file system.
        c. Rebuild only the bootlif.
        d. Replace only the kernel on the root file system.
        m. Return to 'HP-UX Recovery Media Main Menu'.
        x. Exit to the shell.

    Use this menu to select the level of recovery desired.

    Selection:
Do not be tempted to simply choose the first option on the list. At this time, all we know is that the root/boot disk is not bootable. To that end, all we want to do is c. Rebuild only the bootlif.
    Selection: c

    BOOTLIF PATH VERIFICATION MENU

    This menu must be used to determine the path to the bootlif
    (ISL, HPUX and the AUTO file).
    When the information is correct, select 'a'.

    INFORMATION to verify:
        Path to the bootlif is 0/0/1/1.15.0

    Select one of the following:

        a. The above information is correct.
        b. WRONG!! The path to bootlif is incorrect.
        m. Return to the 'HP-UX Recovery MENU.'
        x. Exit to the shell.

    Selection:
Here we need to ensure that the hardware path to our root/boot disk is correct. It is in this instance:
    Selection: a

    BOOT STRING VERIFICATION MENU

    This menu must be used to verify the system's boot string.
    When the information is correct, select 'a'.

    INFORMATION to verify:
        The system's boot string should be:
        'hpux -lm /stand/vmunix'

    Select one of the following:

        a. The above information is correct.
        b. WRONG!! Prompt the user for the system's boot string.
        m. Return to the 'HP-UX Recovery MENU.'
        x. Exit to the shell.

    NOTE: For an LVM '/'(ROOT) the '-lm' option MUST be specified
          (example: 'hpux -lm (2.3.4)/stand/vmunix' )

    Selection:
This is an interesting part of the procedure. The Recovery Shell wants to build an AUTO file in the LIF volume, and it suggests a boot string that boots the system in maintenance mode. I think this is a good idea. However, once we have made the necessary repairs, we will need to remember to change this back to the simple command "hpux". It's up to you what command you put in here. Remember these things:
I will choose an option here and make a note for myself to remember to change the boot string to be a normal hpux boot string after the repairs have taken place.
    Selection: a

    *********** Installing bootlif ***********

    mkboot -a hpux -lm /stand/vmunix /dev/rdsk/c1t15d0

    After bringing the system back online the SWAP, BOOT and DUMP
    volume information in the BOOTLIF should be restored. For example:

    BOOT: If '/stand' is on a separate logical volume, then use
          lvlnboot -b /dev/<rootvg>/<lv of '/stand'> .
          Otherwise BOOT and ROOT are considered identical.
    DUMP: lvlnboot -d /dev/<rootvg>/<lv0 of dump>
          lvlnboot -d /dev/<rootvg>/<lv2 of dump>
    SWAP: lvlnboot -s /dev/<rootvg>/<lv0 of swap>

    Refer to the lvlnboot man page for more information.

    <Press return to continue>

    RECOVERY COMPLETION MENU

    Use this menu after the recovery process has installed all
    requested files on your system.

    Select one of the following:

        a. REBOOT the customer's system and continue with recovery.
        b. Return to the HP-UX Recovery Media Main Menu.

    Selection:
This is important to remember!
We need to ensure that the LABEL file in the new LIF volume is complete and correct. Just because the lvlnboot -v command shows the correct information DOES NOT mean that this reflects the contents of the LABEL file. I think it is ALWAYS good practice in these instances to delete the boot, root, swap, and dump specifications and start again with the commands listed above. In this way, we will rebuild the LABEL file from scratch. An inconsistent LABEL file in the LIF volume can cause problems with subsequent boots; EVEN A MAINTENANCE MODE BOOT CAN FAIL! If a LABEL file exists, a maintenance mode boot will use its contents to help it locate the boot file system, so you need to ensure that any LABEL file in the LIF volume is completely consistent. You have been warned!
We are now in a position to reboot the system.
    Selection: a

    NOTE: System rebooting...

    NOTE: run_cmd: Process: 69 (/sbin/sh): killed by signal: 9.

    sync'ing disks (0 buffers to flush):
    0 buffers not flushed
    0 buffers still dirty
    Closing open logical volumes...
    Done
The system should now come back in maintenance mode due to the AUTO file we created earlier. We can continue with the repairs at that time. Here are some extracts from the boot process:
    Trying Primary Boot Path
    ------------------------
    Booting...
    Boot IO Dependent Code (IODC) revision 1

    HARD Booted.

    ISL Revision A.00.43 Apr 12, 2000

    ISL booting hpux -lm /stand/vmunix

    Boot
    : disk(0/0/1/1.15.0.0.0.0.0;0)/stand/vmunix
    10018816 + 1753088 + 1499968 start 0x1f41e8

    alloc_pdc_pages: Relocating PDC from 0xf0f0000000 to 0x3fb01000.
This looks promising because we now have a bootable boot/root disk and the system is booting in maintenance mode. I won't display all the output from the boot process, but here are the last few lines:
    sbin/ioinitrc:
    Can't open /dev/vg00/lvol1, errno = 6
    /dev/vg00/lvol1: CAN'T CHECK FILE SYSTEM.
    /dev/vg00/lvol1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
    /dev/vg00/lvol1: No such device or address
    Unable to mount /stand - please check entries in /etc/fstab
    Skipping KRS database initialization - /stand can't be mounted

    INITSH: /sbin/init.d/vxvm-startup2: not found

    INIT: Overriding default level with level 's'

    INIT: SINGLE-USER MODE

    INIT: Running /sbin/sh
    #
You can see that we can't check the consistency of /dev/vg00/lvol1 because LVM has not been started up yet. We are now in single-user mode and can make the necessary repairs to /dev/vg00.
    # vgcfgrestore -l -n /dev/vg00
    Volume Group Configuration information in "/etc/lvmconf/vg00.conf"
    VG Name /dev/vg00
     ---- Physical volumes : 1 ----
        /dev/rdsk/c1t15d0 (Bootable)
    #
I can see I have a backup of vg00. If I were to follow my principle of fixing only that which I know is broken, I could simply try to activate the volume group with the vgchange command and see whether the volume group structures are consistent. In this instance, I want to be certain that the LVM structures for vg00 are consistent, so I am going to restore them, which I can do safely because vg00 is not active.
    # vgcfgrestore -n /dev/vg00 /dev/rdsk/c1t15d0
    Volume Group configuration has been restored to /dev/rdsk/c1t15d0
    #
I can now try to activate the volume group. In lots of literature, it is suggested that you never activate vg00 in maintenance mode. In this instance, I want to see what damage has occurred to the BDRA.
    # vgchange -a y /dev/vg00
    Activated volume group
    Volume group "/dev/vg00" has been successfully changed.
    # vgdisplay vg00
    --- Volume groups ---
    VG Name                     /dev/vg00
    VG Write Access             read/write
    VG Status                   available
    Max LV                      255
    Cur LV                      8
    Open LV                     8
    Max PV                      16
    Cur PV                      1
    Act PV                      1
    Max PE per PV               4350
    VGDA                        2
    PE Size (Mbytes)            8
    Total PE                    4340
    Alloc PE                    935
    Free PE                     3405
    Total PVG                   0
    Total Spare PVs             0
    Total Spare PVs in use      0

    #
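One quick sanity check on a restored volume group is that the extent counts add up. The sketch below is a hypothetical awk-based helper of my own (vg_pe_summary is not an HP-UX command); it reads vgdisplay-style output on standard input and reports the sizes in megabytes. The sample input is copied from the output above.

```shell
#!/bin/sh
# vg_pe_summary: hypothetical helper that reads vgdisplay output on stdin,
# converts the extent counts to MB, and checks that Alloc + Free = Total.
vg_pe_summary() {
    awk '
        $1 == "PE"    && $2 == "Size" { pe_size  = $NF }
        $1 == "Total" && $2 == "PE"   { pe_total = $NF }
        $1 == "Alloc" && $2 == "PE"   { pe_alloc = $NF }
        $1 == "Free"  && $2 == "PE"   { pe_free  = $NF }
        END {
            verdict = (pe_alloc + pe_free == pe_total) ? "consistent" : "INCONSISTENT"
            printf "%d MB total, %d MB allocated, %d MB free (%s)\n",
                   pe_total * pe_size, pe_alloc * pe_size, pe_free * pe_size, verdict
        }'
}

# Sample input taken from the vgdisplay output above.
vg_pe_summary <<'EOF'
PE Size (Mbytes)            8
Total PE                    4340
Alloc PE                    935
Free PE                     3405
EOF
# prints: 34720 MB total, 7480 MB allocated, 27240 MB free (consistent)
```

On the system itself the equivalent would be to pipe the output of vgdisplay vg00 into the function.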
This looks promising. Let's see what's happening with the BDRA.
    # lvlnboot -v
    Boot Definitions for Volume Group /dev/vg00:
    Physical Volumes belonging in Root Volume Group:
            /dev/dsk/c1t15d0 (0/0/1/1.15.0) -- Boot Disk
    No Boot Logical Volume configured
    Root: lvol3     on:     /dev/dsk/c1t15d0
    Swap: lvol2     on:     /dev/dsk/c1t15d0
    Dump: lvol2     on:     /dev/dsk/c1t15d0, 0
    Dump: lvol1     on:     /dev/dsk/c1t15d0, 1
    #
This is a good example of where some of the BDRA information looks okay, with only the boot volume not being configured. The temptation would be to repair only the boot definition. I have tried this before, and it has caused me subsequent problems with later maintenance mode boots. To be consistent, I will always delete the BDRA definitions and start again in order to ensure that the BDRA and the LABEL file are consistent.
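Spotting a missing definition by eye is easy to get wrong on a stressful day. As a minimal sketch, the hypothetical helper below (bdra_missing is my own name) reads lvlnboot -v output on standard input and prints any of the Boot, Root, Swap, and Dump definitions that do not appear. It only reports; it does not replace the delete-and-rebuild approach described above.

```shell
#!/bin/sh
# bdra_missing: hypothetical helper that lists which of the four BDRA
# definitions are absent from lvlnboot -v output supplied on stdin.
bdra_missing() {
    awk '
        $1 == "Boot:" { seen["Boot"] = 1 }
        $1 == "Root:" { seen["Root"] = 1 }
        $1 == "Swap:" { seen["Swap"] = 1 }
        $1 == "Dump:" { seen["Dump"] = 1 }
        END {
            n = split("Boot Root Swap Dump", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen))
                    print want[i]
        }'
}

# Sample input copied from the lvlnboot -v output above.
bdra_missing <<'EOF'
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c1t15d0 (0/0/1/1.15.0) -- Boot Disk
No Boot Logical Volume configured
Root: lvol3     on:     /dev/dsk/c1t15d0
Swap: lvol2     on:     /dev/dsk/c1t15d0
Dump: lvol2     on:     /dev/dsk/c1t15d0, 0
Dump: lvol1     on:     /dev/dsk/c1t15d0, 1
EOF
# prints: Boot
```

An empty result means all four definitions were present in the text supplied; it says nothing about whether the LABEL file agrees with them.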
    # lvrmboot -r /dev/vg00
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    # lvlnboot -v /dev/vg00
    lvlnboot: The Boot Data Area is empty.
    Boot Definitions for Volume Group /dev/vg00:
    The Boot Data Area is empty.
    #
    # lvlnboot -b /dev/vg00/lvol1
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    # lvlnboot -r /dev/vg00/lvol3
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    # lvlnboot -s /dev/vg00/lvol2
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    # lvlnboot -d /dev/vg00/lvol2
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    #
    # lvlnboot -vR /dev/vg00
    Boot Definitions for Volume Group /dev/vg00:
    Physical Volumes belonging in Root Volume Group:
            /dev/dsk/c1t15d0 (0/0/1/1.15.0) -- Boot Disk
    Boot: lvol1     on:     /dev/dsk/c1t15d0
    Root: lvol3     on:     /dev/dsk/c1t15d0
    Swap: lvol2     on:     /dev/dsk/c1t15d0
    Dump: lvol2     on:     /dev/dsk/c1t15d0, 0
    Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
    #
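Because the order of these commands matters, it can help to write the sequence down before typing it. The sketch below is a hypothetical wrapper of my own (run and rebuild_bdra are not HP-UX commands) that prints the same commands used above for a given volume group and set of logical volumes. It defaults to a dry run, and I have not run it in live mode; treat it as a checklist generator under those assumptions.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it and
# stop at the first failure so later steps never run out of order.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@" || { echo "FAILED: $*" >&2; exit 1; }
    fi
}

# rebuild_bdra VG BOOT_LV ROOT_LV SWAP_LV DUMP_LV
# Mirrors the sequence shown above: clear the definitions, then
# re-create boot, root, swap, and dump, then display the result.
rebuild_bdra() {
    vg=$1; boot_lv=$2; root_lv=$3; swap_lv=$4; dump_lv=$5
    run lvrmboot -r "/dev/$vg"
    run lvlnboot -b "/dev/$vg/$boot_lv"
    run lvlnboot -r "/dev/$vg/$root_lv"
    run lvlnboot -s "/dev/$vg/$swap_lv"
    run lvlnboot -d "/dev/$vg/$dump_lv"
    run lvlnboot -vR "/dev/$vg"
}

# The layout used in this demonstration.
rebuild_bdra vg00 lvol1 lvol3 lvol2 lvol2
```

In dry-run mode the output is simply the six commands, one per line, which can be compared against the transcript before anything is executed.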
The last thing I want to ensure is that the AUTO file contains a valid boot string to boot the system in multi-user mode. In my case, I don't have mirroring in place, so I can simply use the boot string "hpux".
    #
    # mkboot -a "hpux" /dev/rdsk/c1t15d0
    #
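Since it is easy to forget that the AUTO file was left pointing at a maintenance mode boot, a small check on the boot string can act as a reminder. This is a minimal sketch; boot_mode is a hypothetical helper of my own, and it only looks for the -lm option, nothing else.

```shell
#!/bin/sh
# boot_mode: hypothetical helper that classifies a boot string as
# "maintenance" if it contains the -lm option, otherwise "normal".
boot_mode() {
    case " $1 " in
        *" -lm "*) echo "maintenance" ;;
        *)         echo "normal" ;;
    esac
}

boot_mode "hpux -lm /stand/vmunix"    # prints maintenance
boot_mode "hpux"                      # prints normal

# Assumption: on the HP-UX system the current AUTO file contents could be
# fed in with lifcp, along the lines of
#   boot_mode "$(lifcp /dev/rdsk/c1t15d0:AUTO -)"
# Check lifcp(1) on your release before relying on this.
```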
I can now attempt to reboot the system.
    # reboot
    Shutdown at 00:39 (in 0 minutes)
    System shutdown time has arrived
I never take the system to multi-user mode directly from a maintenance mode boot because LVM has been bypassed to get to this point. You should always ensure that the system performs a full startup after a maintenance mode boot.
    Start X print server(s) ............................................ N/A
    Start the HP SureStore E Disk Array XP Raid Manager ................ N/A
    Start Highly Available cluster ..................................... N/A
    Starting the Apache subsystem ...................................... N/A
    Start CDE login server .............................................. OK

    The system is ready.

    GenericSysName [HP Release B.11.11] (see /etc/issue)
    Console Login:
This looks promising. I can now log in and ascertain the state of the system.
    root@hpeos003 who -r
       .       run-level 3  Sep 24 05:43     3      0    S
    root@hpeos003 bdf
    Filesystem          kbytes    used   avail %used Mounted on
    /dev/vg00/lvol3    1335296 1054576  263205   80% /
    /dev/vg00/lvol1     111637   52664   47809   52% /stand
    /dev/vg00/lvol8    2048000  151267 1778197    8% /var
    /dev/vg00/lvol7    1122304  769301  330952   70% /usr
    /dev/vg00/lvol4      65536   31159   32294   49% /tmp
    /dev/vg00/lvol6     851968  685519  156067   81% /opt
    /dev/vg00/lvol5      24576    5765   17678   25% /home
    root@hpeos003
So far so good. In some earlier versions of HP-UX, I have seen the root filesystem listed as /dev/root. This is a throwback to the maintenance mode boot, and it can cause problems with Software Distributor not knowing where the root filesystem is. If you do see it, you can simply run rm /etc/mnttab followed by mount -a to recreate the file. Later releases of HP-UX don't have that feature.
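The /dev/root symptom is easy to test for mechanically. The sketch below uses two hypothetical helpers of my own that read bdf-style output on standard input and report which device is mounted on /. It assumes each filesystem occupies a single line of output; a long device name that wraps onto a second line would need extra handling.

```shell
#!/bin/sh
# root_device: hypothetical helper that prints the device mounted on "/"
# from bdf-style output supplied on stdin (one filesystem per line).
root_device() {
    awk '$NF == "/" { print $1 }'
}

# check_root_device: report whether the root entry looks stale.
check_root_device() {
    dev=$(root_device)
    if [ "$dev" = "/dev/root" ]; then
        echo "root is listed as /dev/root; the mount table looks stale"
        return 1
    fi
    echo "root is on $dev"
}

# Sample input copied from the bdf output above.
check_root_device <<'EOF'
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/lvol3    1335296 1054576  263205   80% /
/dev/vg00/lvol1     111637   52664   47809   52% /stand
EOF
# prints: root is on /dev/vg00/lvol3
```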
I can't stress enough that the problems we see here are unique and can be very serious. I have taught the HP-UX Troubleshooting class on many occasions and have gone through a similar problem with the class delegates. In nearly every class, at least one student has come to me with a system that was unbootable, and in the end we had to reinstall the OS. On investigation, they either did not perform the steps as specified or, more commonly, added steps of their own, such as trying to mount additional filesystems or running commands such as vi while in maintenance mode. Remember that maintenance mode is a specific boot option; all normal operations are null and void during a maintenance mode boot. The steps taken above are very detailed and specific to the solution. Omitting a step, changing the order of the steps, or adding unrelated steps may have catastrophic results. The solution above was performed as-is on a live A500 server running HP-UX 11i using the Install and Recovery media from March 2003. Be careful. In these situations, we are dealing with a very sick system; a single slip-up may make it even sicker, requiring a complete reinstall of the operating system.