14.3 Recovering from a Missing Critical Boot File: /stand/rootconf

This last scenario doesn't happen very often, but when it does, it can be difficult to recover from. The /stand/rootconf file is used by the kernel in a maintenance mode boot to locate the root filesystem when the root filesystem is separate from the boot filesystem (/stand); this is common in any version of HP-UX as of 11.0. If the file doesn't exist during a normal boot, it is recreated by the sysinit process /sbin/ioinitrc. During a normal boot, the kernel can locate the root filesystem via the LABEL file and the BDRA. During a maintenance mode boot, these structures are not necessarily there or may be corrupt. This means that, just when you need maintenance mode, it is going to fail if /stand/rootconf is not there.

Some people have commented to me that this is a bit of a corner-case scenario. I agree, because it doesn't happen often. The reason I am including it here is to demonstrate that we can get lots of functionality from the Recovery Shell by simply mounting the boot volume and performing some manual configuration changes (otherwise impossible) in order to recover an unbootable system. The premise here is that we are trying to boot the system in maintenance mode and even that is failing; it may be due to a corrupt or missing /stand/rootconf file.

The file doesn't contain much information; it's usually around 12 bytes in size. Essentially, the file contains the following:

  • A magic label of 0xdeadbeef

  • Starting block address of the root LV

  • Size of the root LV

As such, this shouldn't be too difficult to recreate, or is it?

14.3.1 A magic label of 0xdeadbeef

This is simply a hexadecimal magic number at the beginning of the file, which the kernel uses to identify that the file is not corrupt.

14.3.2 Start block address of the root LV

This is a little more difficult. To calculate this, we need some additional information relating to vg00:

  • The block address where the user data starts

  • The extent size of vg00

  • The size of lvol1 in extents

  • The size of lvol2 in extents

In my configuration the following values are:

  • The block address where the user data starts = 2912

    Unless HP makes a massive change to the structure of an LVM disk (which is unlikely), this will always remain the same at 2912KB. A block is regarded as being 1KB in size.

  • The extent size of vg00 = 8MB = 8192 blocks

  • The size of lvol1 = 112MB/8 = 14 extents

  • The size of lvol2 = 2048MB/8 = 256 extents

The sizes of lvol1 and lvol2 in extents allow us to work out the first physical extent of lvol3:

  • lvol1 = 14 extents = extents 0-13

  • lvol2 = 256 extents = extents 14-269

  • lvol3 starts at extent 270

You can see this if you run the command pvdisplay -v <boot/root disk>.

We can calculate the starting block address of lvol3:

(extent size) * (starting extent of lvol3) + (start of user data)

= 8192 * 270 + 2912

= 2214752

= 0x21CB60
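
The arithmetic above can be checked on any working system with shell arithmetic. This is a minimal sketch, assuming the values from my configuration and the 2912-block offset used in the calculation; the variable names are mine and are purely illustrative.

```shell
# Values from the worked example (1 block = 1 KB). All names are illustrative.
extent_size_kb=8192       # 8 MB extents in vg00
lvol1_extents=14          # size of lvol1 in extents
lvol2_extents=256         # size of lvol2 in extents
user_data_start_kb=2912   # offset used in the calculation above

# lvol3 begins at the first extent after lvol1 and lvol2.
lvol3_first_extent=$((lvol1_extents + lvol2_extents))

# Starting block address of lvol3.
root_start=$((extent_size_kb * lvol3_first_extent + user_data_start_kb))

printf 'first extent of lvol3: %d\n' "$lvol3_first_extent"
printf 'start block: %d = 0x%X\n' "$root_start" "$root_start"
```

Substitute the extent size and extent counts for your own vg00 before relying on the result.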


14.3.3 Size of the root LV

The size of my lvol3 = 1304MB/8 = 163 extents

Size of 163 extents in blocks = 163 * 8192 = 1335296 = 0x146000

14.3.4 Creating the /stand/rootconf file by hand

We have now established the three 32-bit words of information that we need to construct the /stand/rootconf file by hand:

  • A magic label of 0xdeadbeef

  • Starting block address of the root LV = 0x21CB60

  • Size of the root LV = 0x146000

The three words will look like this:

dead beef 0021 cb60 0014 6000
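
As a cross-check, the three words can be formatted with printf on a working system. This is only a sketch under the assumptions of my configuration (start block 2214752, 163 extents of 8192 blocks); the magic number is written as a literal string so that the example does not depend on the shell's integer width.

```shell
root_start=2214752           # starting block address of lvol3
root_size=$((163 * 8192))    # 163 extents of 8192 blocks each

# Print each value as a zero-padded 32-bit hex word, then group the
# 24 hex digits into blocks of four, as in the text.
words=$(printf 'deadbeef%08x%08x' "$root_start" "$root_size" |
        sed 's/..../& /g; s/ $//')

printf '%s\n' "$words"
```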

We must remember that in the Recovery Shell we don't have many utilities, and in particular no tool that will let us edit a file in hexadecimal. The easiest way I have found to recreate the rootconf file is to take the three hexadecimal words and convert them to octal, one byte at a time. We can then use the echo command to echo the octal values (each prefixed with \0, with a trailing \c to suppress the newline) and redirect the output to create the rootconf file. Here's how it works:

HEX     de   ad   be   ef   00   21   cb   60   00   14   60   00

OCTAL   336  255  276  357  00   41   313  140  00   24   140  00


This goes to make a command line look like:

 

 echo "\0336\0255\0276\0357\000\041\0313\0140\000\024\0140\000\c" > rootconf
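
The hex-to-octal conversion is easy to get wrong by hand, so it may be worth generating the echo argument on a working system beforehand. The following is a minimal sketch; hex_to_echo_arg is a hypothetical helper of my own, and it relies on printf, which I am not assuming to be available in the Recovery Shell.

```shell
# Hypothetical helper: turn a list of hex bytes into the argument for echo.
hex_to_echo_arg() {
    out=''
    for byte in "$@"; do
        # \0 prefix followed by the octal value, at least two digits wide
        out="$out$(printf '\\0%02o' "$((0x$byte))")"
    done
    # \c tells echo not to append a trailing newline
    printf '%s\\c\n' "$out"
}

arg=$(hex_to_echo_arg de ad be ef 00 21 cb 60 00 14 60 00)

# Show the command line to type in the Recovery Shell.
printf 'echo "%s" > rootconf\n' "$arg"
```

Pass the 12 bytes for your own system in place of the values shown here.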

If it makes you feel better, here is the echo command from above, run on my system, which currently does have a valid rootconf file:

 

 root@hpeos003[stand] echo "\0336\0255\0276\0357\000\041\0313\0140\000\024\0140\000\c" > rootconf.test
 root@hpeos003[stand] xd -c rootconf
 0000000 de ad be ef \0 ! cb ` \0 14 ` \0
 000000c
 root@hpeos003[stand] xd -c rootconf.test
 0000000 de ad be ef \0 ! cb ` \0 14 ` \0
 000000c
 root@hpeos003[stand]
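
On a system without xd, the same byte-level check can be sketched with the POSIX od command. This is an illustration only: it uses printf with plain \NNN octal escapes as a portable stand-in for the HP-UX echo form shown above, and the scratch file name is my own choice.

```shell
# Write the 12 bytes to a scratch file (the path is illustrative).
tmp=${TMPDIR:-/tmp}/rootconf.sketch.$$
printf '\336\255\276\357\000\041\313\140\000\024\140\000' > "$tmp"

# Dump as hex bytes and strip whitespace so the result is one string.
dump=$(od -An -tx1 "$tmp" | tr -d ' \t\n')
size=$(wc -c < "$tmp" | tr -d ' ')
rm -f "$tmp"

printf 'bytes: %s\nsize:  %s\n' "$dump" "$size"
[ "$dump" = "deadbeef0021cb6000146000" ] && echo "contents match"
```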

Now let's get to the actual problem itself: I will recreate this by removing the rootconf file and attempting a maintenance mode boot. You will typically receive one of two PANIC messages:

  • Up to HP-UX 11.0

     

     panic: (display==0xb800, flags==0x0) all VFS_MOUNTROOTs failed:        NEED DRIVERS ??? 

  • HP-UX 11i

     

     panic: Could not create /dev/ip 

I particularly like the message panic: Could not create /dev/ip. It gives no clue as to what the problem is, but now you know. The PANIC messages are not unique to this problem. They could be caused by a number of factors including the following:

  • Bad contents of the AUTO file

    If someone has been trying to be clever with your AUTO file (putting in hardware addresses of root/boot disks that subsequently get moved to a new hardware location), you can rectify it by replacing the AUTO file with a default command from the ISL prompt:

     

     ISL> hpux set autofile "hpux" 

  • Corrupt LVM Boot Data Reserved Area (BDRA)

    We have looked at rebuilding the BDRA from a maintenance mode boot.

  • Missing /dev/console on 10.20 with JFS for root filesystem

    This sounds bizarre, but we are talking about HP-UX 10.20! We can rectify by exiting to the shell from the Recovery Shell, performing a chroot_lvmdisk (more on that in a minute), and ensuring that the following device files exist:

     

     # mknod /dev/systty c 0 0x000000
     # mknod /dev/console c 0 0x000000
     # mknod /dev/tty c 207 0x000000
     # ln /dev/systty /dev/syscon

  • Root filesystem is corrupted beyond repair

    This is a nasty one. Anything could have happened; the problem could be anything from a suspect disk (get an HP Hardware Customer Engineer to diagnose a potential hardware problem) to someone destroying the root disk (accidentally, of course). We can try to repair this by exiting to the shell from the Recovery Shell and running fsck using an alternate super-block for the filesystem. The list of alternate super-blocks for an HFS filesystem can be found in the file /etc/sbtab. We can use any of them; if the corruption was severe, we may want to try a super-block deep inside the filesystem (a high number).

     

     # /sbin/fs/hfs/fsck -b <no. from sbtab> /dev/rdsk/cXtYdZs1lvm
     (i.e., if the root and boot logical volumes are the same)

     OR

     # /sbin/fs/hfs/fsck -b <no. from sbtab> /dev/rdsk/cXtYdZs2lvm
     (i.e., if the root and boot logical volumes are separate)

    Once we have repaired the primary super-block, we need to run fsck a number of times until we achieve a clean filesystem. It is always advisable to have HP look at the disk in these circumstances to rule out a hardware problem.

  • Missing driver (or Bad/Corrupted kernel)

    This is rare, but if we have a backup kernel, we can try booting from it.

  • Other problems

    I can't think of many other problems that could cause such a PANIC message, but you never know. It's always worthwhile to check on the state of the level of patching that your system has currently achieved. Keeping abreast of current patches can mean that you don't experience known problems that have already been resolved.

Now back to our original problem. We are in a position where we need to boot in maintenance mode but are prevented from doing so because of a missing or corrupt /stand/rootconf file. First, we need to boot our system from our Core OS Install and Recovery media and get to the Recovery Shell. I am assuming that you have already read section 14.1, so I will continue our discussion from that point.

 

 HP-UX NETWORK SYSTEM RECOVERY
        MAIN MENU

        s.  Search for a file
        b.  Reboot
        l.  Load a file
        r.  Recover an unbootable HP-UX system
        x.  Exit to shell
        c.  Instructions on chrooting to an LVM /(root).

 This menu is for listing and loading the tools contained on the core media.
 Once a tool is loaded, it may be run from the shell. Some tools require other
 files to be present in order to successfully execute.

 Select one of the above:

It is worth mentioning option c, Instructions on chrooting to an LVM /(root), at this point. If you have never seen the chroot command before, it allows you to run a command relative to a new root directory. FTP uses it for the anonymous ftp user: ftpd recognizes a user (usually) called ftp, with a shell of /usr/bin/false and a non-null password, and limits access to the system by making that user's home directory its root directory. See man ftpd for more details. In a similar way, we can run a shell from the Recovery Shell whose root directory is not the root directory created by the mini-kernel on the Recovery media. Our new root directory will be the mount point where the Recovery Shell mounts our boot/root disk. This makes traversing our disk-based boot/root filesystem easier; after the chroot, a command like cd /etc takes us to the /etc directory on disk, because our new root directory is the disk-based root filesystem, not the RAM-based Recovery Shell. It also means that our PATH will reference directories on disk when locating commands; the Recovery media has only a limited number of commands available. I will use this to effect the changes necessary for this problem.

 

 HP-UX NETWORK SYSTEM RECOVERY
        MAIN MENU

        s.  Search for a file
        b.  Reboot
        l.  Load a file
        r.  Recover an unbootable HP-UX system
        x.  Exit to shell
        c.  Instructions on chrooting to an LVM /(root).

 This menu is for listing and loading the tools contained on the core media.
 Once a tool is loaded, it may be run from the shell. Some tools require other
 files to be present in order to successfully execute.

 Select one of the above: c

 Exit to the shell and run 'chroot_lvmdisk'.

 Type <return> to return to the MAIN MENU.

Let's try the chroot_lvmdisk command. It will attempt to run the fsck command against the / and /stand filesystems via two special device files /dev/dsk/cXtYdZs[12]lvm and then give us further instructions:

 

 Select one of the above: x

 Type menu to return to the menu environment.
 #
 # chroot_lvmdisk
 Loading commands needed for recovery!

 Enter the hardware path associated with the '/'(ROOT) file system
  (example: 0/0/1/1.15.0)
 Is 0/0/1/1.15.0 the hardware path of the root/boot disk?[ynq]- y

 /sbin/fs/hfs/fsck -c 0 -y /dev/rdsk/c1t15d0s1lvm
 ** /dev/rdsk/c1t15d0s1lvm
 ** Last Mounted on /ROOT
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 64 files, 0 icont, 65777 used, 45860 free (108 frags, 5719 blocks)
 Mounting c1t15d0s1lvm to the Core Tape's /ROOT directory...

 /sbin/fs/vxfs/fsck -y /dev/rdsk/c1t15d0s2lvm
 file system is clean - log replay is not required
 /sbin/fs/vxfs/mount /dev/dsk/c1t15d0s2lvm /ROOT
 /sbin/fs/hfs/mount /dev/dsk/c1t15d0s1lvm /ROOT/stand
 loading /usr/sbin/chroot
 x ./usr/sbin/chroot, 12288 bytes, 24 tape blocks

 Enter 'cd /ROOT; chroot /ROOT /sbin/sh' at the shell prompt to chroot
 to the customer's /(root) disk.
 #

Our root filesystem has been mounted on /ROOT with /stand being mounted under /ROOT/stand. We just need to follow the instructions on screen to run the chroot command. After that, all our commands will be relative to the disk-based filesystem and not relative to the RAM-based mini-root filesystem created by the Recovery media. Here goes:

 

 # cd /ROOT
 # chroot /ROOT /sbin/sh
 # cd /stand
 # ls -al
 total 125580
 drwxrwxrwx  10 bin        bin           1024 Sep 24 05:59 .
 drwxr-xr-x  24 root       root          1024 Sep 24 06:25 ..
 -rw-------   1 root       root             0 Mar 10  2003 .kminstall_lock
 -rw-r--r--   1 root       sys             20 Sep 23 07:44 bootconf
 drwxr-xr-x   4 root       sys           2048 Sep 24 02:59 build
 drwxr-xr-x   5 root       root          1024 Sep 24 03:03 dlkm
 drwxr-xr-x   5 root       sys           1024 Sep 24 01:22 dlkm.vmunix.prev
 -rw-r--r--   1 root       sys           3440 Sep 24 05:57 ioconfig
 -r--r--r--   1 root       sys             82 Feb 18  2003 kernrel
 drwxr-xr-x   2 root       sys           1024 Sep 24 05:59 krs
 drwxr-xr-x   2 root       root          1024 Sep 24 05:57 krs_lkg
 drwxr-xr-x   2 root       root          1024 Sep 24 05:59 krs_tmp
 drwxr-xr-x   2 root       root          8192 Feb 18  2003 lost+found
 -rw-------   1 root       root            12 Sep 24 05:57 rootconf.test
 -rw-r--r--   1 root       root          1104 Sep 24 02:57 system
 drwxr-xr-x   2 root       sys           1024 Sep 23 07:49 system.d
 -r--r--r--   1 root       sys           1104 Sep 24 02:56 system.prev
 -rwxr-xr-x   1 root       root       25931352 Sep 24 02:57 vmunix
 -rw-r--r--   1 root       sys        12342712 Sep 24 02:17 vmunix.prev
 -rwxr-xr-x   1 root       sys        25927256 Sep 24 01:21 vmunixBK
 #

This certainly looks like my /stand filesystem; you can see my rootconf.test file from before. Now I can effect the changes necessary; I suppose I could just rename the rootconf.test file and reboot. I will run the echo command just for completeness:

 

 # echo "\0336\0255\0276\0357\000\041\0313\0140\000\024\0140\000\c" > rootconf
 # ls -al rootconf*
 -rw------- 1 root root 12 Sep 24 07:04 rootconf
 -rw-rw-rw- 1 root sys 12 Sep 24 05:39 rootconf.test
 #

It would be helpful to run my xd command to compare the content of the files and ensure that they are the same, but the xd command is located in the /usr filesystem, which is not mounted here. I suppose I could use the cat command and some shell tests:

 

 # x=$(cat -v rootconf)
 # y=$(cat -v rootconf.test)
 # [[ "$x" = "$y" ]]
 # echo $?
 0
 #

Using shell tests, we can say that the two files look the same. This looks as good as we can expect under the circumstances. Now I can try my maintenance mode boot. I could exit from this shell and return to the Recovery Shell. I am just going to reboot from here, because this is sufficient for what I need to do right now.

 

 # reboot
 Shutdown at 07:13 (in 0 minutes)
 System shutdown time has arrived

I will interact with the boot process to ensure that I can boot into maintenance mode:

 

 Processor is booting from first available device.

 To discontinue, press any key within 10 seconds.

 Boot terminated.

 ---- Main Menu --------------------------------------------------------------

      Command                           Description
      -------                           -----------
      BOot [PRI|ALT|<path>]             Boot from specified path
      PAth [PRI|ALT] [<path>]           Display or modify a path
      SEArch [DIsplay|IPL] [<path>]     Search for boot devices

      COnfiguration menu                Displays or sets boot values
      INformation menu                  Displays hardware information
      SERvice menu                      Displays service commands

      DIsplay                           Redisplay the current menu
      HElp [<menu>|<command>]           Display help for menu or command
      RESET                             Restart the system
 ----
 Main Menu: Enter command or menu > bo pri
 Interact with IPL (Y, N, or Cancel)?> y

 Booting...
 Boot IO Dependent Code (IODC) revision 1

 HARD Booted.

 ISL Revision A.00.43  Apr 12, 2000

 ISL> hpux -lm

 Boot
 : disk(0/0/1/1.15.0.0.0.0.0;0)/stand/vmunix
 10018816 + 1753088 + 1499968 start 0x1f41e8

 alloc_pdc_pages: Relocating PDC from 0xf0f0000000 to 0x3fb01000.
 ...
 /sbin/ioinitrc:
 fsck: /dev/vg00/lvol1: possible swap device (cannot determine)
 fsck SUSPENDED BY USER.
 /dev/vg00/lvol1: No such device or address
 Unable to mount /stand - please check entries in /etc/fstab
 Skipping KRS database initialization - /stand can't be mounted

 INITSH: /sbin/init.d/vxvm-startup2:  not found

 INIT: Overriding default level with level 's'

 INIT: SINGLE-USER MODE

 INIT: Running /sbin/sh
 #

As you can see, I am now in maintenance mode: vg00 has not been activated, which is why fsck reports /dev/vg00/lvol1: No such device or address. I can now continue with repairing whatever it is I need to repair while in maintenance mode. At this time I don't know if the LVM structures are consistent. As in the previous demonstration, I would proceed with a full recovery of this system on a step-by-step basis.



HP-UX CSE(c) Official Study Guide and Desk Reference
Year: 2006
Pages: 434
