LVM mirroring is the process of maintaining multiple Physical Extents per Logical Extent. Ideally, the additional Physical Extents are on a separate disk drive (strict allocation) and that disk is connected to a separate physical interface card (PVG-strict allocation can be set up to assist with this). The default write behavior to a mirrored volume (parallel IO) assumes that you have taken these steps to alleviate a bottleneck in the mirroring configuration.

The default mirror catch-up policy is somewhat different from a performance perspective. The Mirror Write Cache (MWC) introduces additional IO to disk whenever a write to a block is undertaken. In LVM-speak, a mirrored block is known as a Logical Track Group (LTG) and is 256KB in size. (This is handy because HP-UX can merge IO on consecutive disk blocks into a single merged-IO transaction, which happens to be 256KB in size.) The use of the MWC allows for quicker recoveries after a disk failure because the MWC can be used to quickly resync only the stale extents (LTGs, actually).

There is sometimes slight confusion when we talk about the MWC being written to disk. In order to perform a quick resync after a disk failure, we need to ensure that the MWC is written to disk in case the system is rebooted. When the MWC is written to disk, it is referred to as the Mirror Consistency Record (MCR). However, if we choose not to use the MWC but still want to offer some form of recovery, the no-MWC option is known as Mirror Consistency Recovery (MCR). Was that a bad choice of name? Possibly. The way to remember it is simple:
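As a back-of-the-envelope illustration of what the MWC has to track, the number of 256KB LTGs grows with volume size. The 256KB LTG size comes from the text above; the 1000MB volume size is just a hypothetical example:

```shell
# How many 256KB Logical Track Groups (LTGs) make up a mirrored volume?
# The 1000MB volume size is a hypothetical example.
ltg_kb=256
vol_mb=1000
echo $(( vol_mb * 1024 / ltg_kb ))
```

A 1000MB volume is made up of 4000 LTGs, each of which the MWC may have to mark stale or in-sync after a failure.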
MWC: Fast resync. The disk version of the MWC is simply that: a disk-based MWC. I don't refer to it as a Mirror Consistency Recovery because that gets confusing when you mention:
MCR: Slow resync. No disk-based MWC, so we need to resync the entire volume.
No MCR: No resync at all. The data is never resynchronized; it is probably totally transient data that will be rebuilt after a reboot. We use this simply to avoid an interruption to service due to a disk failure, e.g., swap space or a scratch area for an RDBMS.

6.2.1 PVG-strict

PVG-strict is a strictness policy relating to how the mirroring of logical volumes is performed. I have seen PVG-strict used in a variety of crazy ways. The idea behind PVG-strict is to allow you to put disks in a Volume Group into what is in effect a subgroup. These subgroups are intended to house disks connected to the same interface. By using the PVG-strict allocation policy, you force the additional Physical Extents to come from a disk in a different Physical Volume Group (PVG), which must mean a disk connected to a different interface card. This is good because we are not sending mirror IO down the same interface as the IO for the original Physical Extents (see Figure 6-1).

Figure 6-1. Physical Volume Groups.

The correct setup of PVGs is entirely the responsibility of the administrator. You know what you are doing, don't you?! The other consideration is that LVM allows you to explicitly specify which disk you want to mirror onto. I have known novice administrators to complain that a volume in PVG0 on disk 0, for example, was being mirrored to PVG1 (good) but on disk 1. Someone pointed out that "it didn't really matter which disk we mirrored to, as long as it is in a different PVG." The (common) response is that "it makes my diagrams look messy." Yes, but that's not the point of PVGs. If you want "nice diagrams," then explicitly specify at the command line on which disk you want your mirror volume to reside. In my experience, I prefer to explicitly specify on which disk my data and mirrored data are housed.
Consequently, I don't use PVGs except in specific situations (distributed volumes): I simply create a logical volume on a specified physical volume and then set up the mirror(s) on specified physical volumes. It's relatively simple and makes your diagrams easy to understand. I am not going to go through an example of this simplistic case, as I believe it is trivial. I will go through an example of Physical Volume Groups because I have been privy to some horrendous configurations and want to ensure that you don't do the same. I set up a single mirrored logical volume in the /dev/vgora1 group as depicted in Figure 6-1 using PVGs. I set up the logical volume in a good configuration as well as an oh-not-that configuration. Here's an example of how to do it correctly (thinking about high availability and performance):

root@hpeos003[] mkdir /dev/vgora1
root@hpeos003[] mknod /dev/vgora1/group c 64 0x010000
root@hpeos003[] pvcreate /dev/rdsk/c0t1d0
Physical volume "/dev/rdsk/c0t1d0" has been successfully created.
root@hpeos003[] pvcreate /dev/rdsk/c0t2d0
Physical volume "/dev/rdsk/c0t2d0" has been successfully created.
root@hpeos003[] pvcreate /dev/rdsk/c0t3d0
Physical volume "/dev/rdsk/c0t3d0" has been successfully created.
root@hpeos003[] pvcreate /dev/rdsk/c4t9d0
Physical volume "/dev/rdsk/c4t9d0" has been successfully created.
root@hpeos003[] pvcreate /dev/rdsk/c4t10d0
Physical volume "/dev/rdsk/c4t10d0" has been successfully created.
root@hpeos003[] pvcreate /dev/rdsk/c4t11d0
Physical volume "/dev/rdsk/c4t11d0" has been successfully created.
root@hpeos003[]
root@hpeos003[] vgcreate /dev/vgora1 /dev/dsk/c0t[123]d0 /dev/dsk/c4t9d0 /dev/dsk/c4t10d0 /dev/dsk/c4t11d0
Increased the number of physical extents per physical volume to 17501.
Volume group "/dev/vgora1" has been successfully created.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]

That's the volume group taken care of.
Now I can create the /etc/lvmpvg file by hand. This is where I need to be careful:

root@hpeos003[] vi /etc/lvmpvg
VG /dev/vgora1
PVG PVG0
/dev/dsk/c0t1d0
/dev/dsk/c0t2d0
/dev/dsk/c0t3d0
PVG PVG1
/dev/dsk/c4t9d0
/dev/dsk/c4t10d0
/dev/dsk/c4t11d0
root@hpeos003[]
root@hpeos003[] vgdisplay -v /dev/vgora1
--- Volume groups ---
VG Name                     /dev/vgora1
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      0
Open LV                     0
Max PV                      16
Cur PV                      6
Act PV                      6
Max PE per PV               17501
VGDA                        12
PE Size (Mbytes)            4
Total PE                    104994
Alloc PE                    0
Free PE                     104994
Total PVG                   2
Total Spare PVs             0
Total Spare PVs in use      0

--- Physical volumes ---
PV Name          /dev/dsk/c0t1d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c0t2d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c0t3d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t9d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t10d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t11d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

--- Physical volume groups ---
PVG Name         PVG0
PV Name          /dev/dsk/c0t1d0
PV Name          /dev/dsk/c0t2d0
PV Name          /dev/dsk/c0t3d0

PVG Name         PVG1
PV Name          /dev/dsk/c4t9d0
PV Name          /dev/dsk/c4t10d0
PV Name          /dev/dsk/c4t11d0
root@hpeos003[]

This looks fine; each PVG is made up of disks on separate interface cards. Let's create a PVG-strict logical volume of 1000MB on c0t1d0:

root@hpeos003[] lvcreate -s g -n db /dev/vgora1
Logical volume "/dev/vgora1/db" has been successfully created with character device "/dev/vgora1/rdb".
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvextend -l 1 /dev/vgora1/db /dev/dsk/c0t1d0
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]

I know I haven't created the full 1000MB yet. The reason is that I create the volume with one extent, set up the mirroring, and then extend the volume to its correct size. In this way, the initial mirroring has to mirror only one extent.

root@hpeos003[] lvextend -m 1 /dev/vgora1/db
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vgora1/db
--- Logical volumes ---
LV Name                /dev/vgora1/db
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       4
Current LE             1
Allocated PE           2
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             PVG-strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    1         1
/dev/dsk/c4t9d0    1         1

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00000  current   /dev/dsk/c4t9d0  00000  current
root@hpeos003[]

Subsequent allocation will be on both halves of the mirror simultaneously. There is an outside possibility that LVM could give me additional extents on a different disk (if the volume group had some previously allocated extents), but because there are no other volumes in this volume group yet, I am confident that my volume will grow on c0t1d0 and c4t9d0. This is due to the existence of PVGs and the use of PVG-strict allocation. If I simply used strict allocation, I would need to be very careful how I created and extended the volume.
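The pitfall with plain strict allocation can be sketched with a toy model (an illustration of the behavior, not LVM's actual allocator): strict only requires that a mirror extent live on a different physical volume from the primary, so the allocator is free to pick the first PV with free extents, whatever interface it happens to sit on:

```shell
# Toy model of strict allocation (not LVM's real allocator): the mirror
# extent may come from the first PV in the VG other than the primary's PV,
# regardless of which interface card that PV is connected to.
pvs="c0t1d0 c0t2d0 c0t3d0 c4t9d0 c4t10d0 c4t11d0"
primary=c0t1d0
for pv in $pvs; do
  if [ "$pv" != "$primary" ]; then
    echo "mirror extent -> $pv"
    break
  fi
done
```

In this toy model the mirror extent lands on c0t2d0, a disk on the same interface card as c0t1d0, which is exactly the kind of allocation the strict example that follows demonstrates.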
Here is an example where I use simple strict allocation:

root@hpeos003[] lvcreate -l 1 -n strict /dev/vgora1
Logical volume "/dev/vgora1/strict" has been successfully created with character device "/dev/vgora1/rstrict".
Logical volume "/dev/vgora1/strict" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvextend -m 1 /dev/vgora1/strict /dev/dsk/c4t9d0
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vgora1/strict" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vgora1/strict
--- Logical volumes ---
LV Name                /dev/vgora1/strict
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       4
Current LE             1
Allocated PE           2
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    1         1
/dev/dsk/c4t9d0    1         1

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00250  current   /dev/dsk/c4t9d0  00250  current
root@hpeos003[]

Now if I simply extend the volume to 1000MB, it is interesting where the additional extents are obtained:

root@hpeos003[] lvextend -L 1000 /dev/vgora1/strict
Logical volume "/dev/vgora1/strict" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vgora1/strict | more
--- Logical volumes ---
LV Name                /dev/vgora1/strict
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       1000
Current LE             250
Allocated PE           500
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    250       250
/dev/dsk/c0t2d0    249       249
/dev/dsk/c4t9d0    1         1

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00250  current   /dev/dsk/c4t9d0  00250  current
00001  /dev/dsk/c0t1d0  00251  current   /dev/dsk/c0t2d0  00000  current
00002  /dev/dsk/c0t1d0  00252  current   /dev/dsk/c0t2d0  00001  current
00003  /dev/dsk/c0t1d0  00253  current   /dev/dsk/c0t2d0  00002  current
...
00246  /dev/dsk/c0t1d0  00496  current   /dev/dsk/c0t2d0  00245  current
00247  /dev/dsk/c0t1d0  00497  current   /dev/dsk/c0t2d0  00246  current
00248  /dev/dsk/c0t1d0  00498  current   /dev/dsk/c0t2d0  00247  current
00249  /dev/dsk/c0t1d0  00499  current   /dev/dsk/c0t2d0  00248  current
root@hpeos003[]

You can see that the additional mirror extents are obtained from the first available physical volume in the volume group. In such a situation, you need to be precise about how you extend volumes. I will rectify this situation:

root@hpeos003[] lvreduce -l 1 /dev/vgora1/strict
When a logical volume is reduced useful data might get lost; do you really want the command to proceed (y/n) : y
Logical volume "/dev/vgora1/strict" has been successfully reduced.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvextend -L 1000 /dev/vgora1/strict /dev/dsk/c0t1d0 /dev/dsk/c4t9d0
Logical volume "/dev/vgora1/strict" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vgora1/strict | more
--- Logical volumes ---
LV Name                /dev/vgora1/strict
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       1000
Current LE             250
Allocated PE           500
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    250       250
/dev/dsk/c4t9d0    250       250

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00250  current   /dev/dsk/c4t9d0  00250  current
00001  /dev/dsk/c0t1d0  00251  current   /dev/dsk/c4t9d0  00251  current
00002  /dev/dsk/c0t1d0  00252  current   /dev/dsk/c4t9d0  00252  current
...
00247  /dev/dsk/c0t1d0  00497  current   /dev/dsk/c4t9d0  00497  current
00248  /dev/dsk/c0t1d0  00498  current   /dev/dsk/c4t9d0  00498  current
00249  /dev/dsk/c0t1d0  00499  current   /dev/dsk/c4t9d0  00499  current
root@hpeos003[]

We can see that this is now configured as we would expect. It is worth noting this behavior of LVM because it can lead to a less-than-optimal solution unless you are exceptionally careful. The other issue I wanted to point out is the way we create mirrored volumes. If you create the 1000MB volume first and then mirror it, LVM has lots of extents to mirror even though there isn't any data in there yet. In my examples above, I create a one-extent volume and then mirror only that one extent. When I come to extend this volume, LVM already knows it must assign additional extents on both sides of the mirror. This can save lots of setup time, especially if you have a large number of volumes to create. Obviously, this is not possible for existing volumes.
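The payoff of the one-extent-first approach can be put in numbers. The 4MB PE size comes from the vgdisplay output earlier; the comparison is simply how much data the initial mirror synchronization has to copy in each case:

```shell
# Initial mirror sync cost: mirror one extent first, vs. mirroring a
# fully sized 1000MB volume (PE Size of 4MB is from the vgdisplay output).
pe_mb=4
full_le=250   # a 1000MB volume at 4MB per extent = 250 logical extents
echo "one-extent-first syncs $(( 1 * pe_mb ))MB; mirror-after-sizing syncs $(( full_le * pe_mb ))MB"
```

Mirroring first and extending afterward synchronizes 4MB instead of 1000MB per volume, which is where the setup-time saving comes from.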
Let's extend the db volume to its correct size and view the results:

root@hpeos003[] lvextend -L 1000 /dev/vgora1/db
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvdisplay -v /dev/vgora1/db | more
--- Logical volumes ---
LV Name                /dev/vgora1/db
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       1000
Current LE             250
Allocated PE           500
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             PVG-strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    250       250
/dev/dsk/c4t9d0    250       250

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00000  current   /dev/dsk/c4t9d0  00000  current
00001  /dev/dsk/c0t1d0  00001  current   /dev/dsk/c4t9d0  00001  current
00002  /dev/dsk/c0t1d0  00002  current   /dev/dsk/c4t9d0  00002  current
00003  /dev/dsk/c0t1d0  00003  current   /dev/dsk/c4t9d0  00003  current
...
00248  /dev/dsk/c0t1d0  00248  current   /dev/dsk/c4t9d0  00248  current
00249  /dev/dsk/c0t1d0  00249  current   /dev/dsk/c4t9d0  00249  current
root@hpeos003[]

This looks okay because my mirror is situated on a disk in the other PVG, which is made up of disks on another interface. Now, let's delete this volume and rework our /etc/lvmpvg file to look like the wrong configuration in Figure 6-1:

root@hpeos003[] lvremove /dev/vgora1/db
The logical volume "/dev/vgora1/db" is not empty; do you really want to delete the logical volume (y/n) : y
Logical volume "/dev/vgora1/db" has been successfully removed.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] vi /etc/lvmpvg
VG /dev/vgora1
PVG PVG0
/dev/dsk/c0t1d0
/dev/dsk/c4t9d0
PVG PVG1
/dev/dsk/c0t2d0
/dev/dsk/c4t10d0
PVG PVG2
/dev/dsk/c0t3d0
/dev/dsk/c4t11d0
root@hpeos003[]
root@hpeos003[] vgdisplay -v /dev/vgora1
--- Volume groups ---
VG Name                     /dev/vgora1
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      0
Open LV                     0
Max PV                      16
Cur PV                      6
Act PV                      6
Max PE per PV               17501
VGDA                        12
PE Size (Mbytes)            4
Total PE                    104994
Alloc PE                    0
Free PE                     104994
Total PVG                   3
Total Spare PVs             0
Total Spare PVs in use      0

--- Physical volumes ---
PV Name          /dev/dsk/c0t1d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c0t2d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c0t3d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t9d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t10d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

PV Name          /dev/dsk/c4t11d0
PV Status        available
Total PE         17499
Free PE          17499
Autoswitch       On

--- Physical volume groups ---
PVG Name         PVG0
PV Name          /dev/dsk/c0t1d0
PV Name          /dev/dsk/c4t9d0

PVG Name         PVG1
PV Name          /dev/dsk/c0t2d0
PV Name          /dev/dsk/c4t10d0

PVG Name         PVG2
PV Name          /dev/dsk/c0t3d0
PV Name          /dev/dsk/c4t11d0
root@hpeos003[]

Now let's recreate our volume again:

root@hpeos003[] lvcreate -s g -n db /dev/vgora1
Logical volume "/dev/vgora1/db" has been successfully created with character device "/dev/vgora1/rdb".
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvextend -l 1 /dev/vgora1/db /dev/dsk/c0t1d0
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[] lvextend -m 1 /dev/vgora1/db
The newly allocated mirrors are now being synchronized.
This operation will take some time. Please wait ....
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvextend -L 1000 /dev/vgora1/db
Logical volume "/dev/vgora1/db" has been successfully extended.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vgora1/db | more
--- Logical volumes ---
LV Name                /dev/vgora1/db
VG Name                /dev/vgora1
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       1000
Current LE             250
Allocated PE           500
Stripes                0
Stripe Size (Kbytes)   0
Bad block              on
Allocation             PVG-strict
IO Timeout (Seconds)   default

--- Distribution of logical volume ---
PV Name            LE on PV  PE on PV
/dev/dsk/c0t1d0    250       250
/dev/dsk/c0t2d0    250       250

--- Logical extents ---
LE     PV1              PE1    Status 1  PV2              PE2    Status 2
00000  /dev/dsk/c0t1d0  00000  current   /dev/dsk/c0t2d0  00000  current
00001  /dev/dsk/c0t1d0  00001  current   /dev/dsk/c0t2d0  00001  current
00002  /dev/dsk/c0t1d0  00002  current   /dev/dsk/c0t2d0  00002  current
...
00247  /dev/dsk/c0t1d0  00247  current   /dev/dsk/c0t2d0  00247  current
00248  /dev/dsk/c0t1d0  00248  current   /dev/dsk/c0t2d0  00248  current
00249  /dev/dsk/c0t1d0  00249  current   /dev/dsk/c0t2d0  00249  current
root@hpeos003[]

As you can see, my mirror adheres to PVG-strict, but the disk the mirror is situated on is connected to the same interface as my original disk, which is not a good configuration! This shows that you need to fully understand device files and how they relate to the physical connections to your disks, especially when a technology such as a Fibre Channel SAN is involved, where decoding device files can be somewhat interesting.

6.2.2 Mirroring vg00

The first thing we need to remember is the physical layout of an LVM boot disk (Figure 6-2).

Figure 6-2. Layout of a bootable LVM disk.
With a non-bootable disk, the BDRA is missing, and the LIF header and LIF data would have to fit into the 8KB of space normally reserved just for the LIF header; consequently, it is of little use. When we set up a mirror of the logical volumes in vg00, we must ensure that the boot, root, and primary swap volumes are created in exactly the same order as they are on the original disk. The PDC/IODC makes certain assumptions regarding the order of volumes on the root/boot disk, especially in maintenance mode. Most administrators will duplicate the order of all volumes from the original disk on the disk(s) being used for the mirror(s). It is important to remember to make the other disk(s) bootable:

root@hpeos003[] pvcreate -B /dev/rdsk/c3t15d0
Physical volume "/dev/rdsk/c3t15d0" has been successfully created.
root@hpeos003[] vgextend /dev/vg00 /dev/dsk/c3t15d0
Volume group "/dev/vg00" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
root@hpeos003[]
root@hpeos003[] lifls /dev/rdsk/c3t15d0
lifls: Can't list /dev/rdsk/c3t15d0; not a LIF volume
root@hpeos003[] lifls -l /usr/lib/uxbootlf
volume ISL10 data size 19521 directory size 2
filename   type    start  size  implement  created
===============================================================
ISL        -12800  16     306   0          00/11/08 20:49:59
AUTO       -12289  328    1     0          00/11/08 20:49:59
HPUX       -12928  336    848   0          00/11/08 20:50:00
PAD        -12290  1184   1580  0          00/11/08 20:50:00
root@hpeos003[]

If you have installed the OnlineDiag product, your original root/boot disk will have a plethora of offline diagnostics tools that you may want included on the new root/boot disk mirror.

root@hpeos003[] lifls /dev/rdsk/c1t15d0
ODE        MAPFILE    SYSLIB     CONFIGDATA SLMOD2     SLDEV2
SLDRV2     SLSCSI2    MAPPER2    IOTEST2    PERFVER2   PVCU
SSINFO     ISL        AUTO       HPUX       LABEL
root@hpeos003[]

The default file for the mkboot command (/usr/lib/uxbootlf) does not contain the offline diagnostics tools.
root@hpeos003[] lifls /usr/lib/uxbootlf
ISL        AUTO       HPUX       PAD
root@hpeos003[]

One way of ensuring that the diagnostics tools are included on the new disk is to use your original disk as the source for the mkboot command:

root@hpeos003[] mkboot -b /dev/rdsk/c1t15d0 /dev/rdsk/c3t15d0
root@hpeos003[] lifls -l /dev/rdsk/c3t15d0
volume ISL10 data size 7984 directory size 8
filename   type    start  size  implement  created
===============================================================
ODE        -12960  584    848   0          02/07/08 12:33:46
MAPFILE    -12277  1432   128   0          02/07/08 12:33:46
SYSLIB     -12280  1560   353   0          02/07/08 12:33:46
CONFIGDATA -12278  1920   235   0          02/07/08 12:33:46
SLMOD2     -12276  2160   141   0          02/07/08 12:33:46
SLDEV2     -12276  2304   135   0          02/07/08 12:33:46
SLDRV2     -12276  2440   205   0          02/07/08 12:33:46
SLSCSI2    -12276  2648   131   0          02/07/08 12:33:46
MAPPER2    -12279  2784   142   0          02/07/08 12:33:46
IOTEST2    -12279  2928   411   0          02/07/08 12:33:46
PERFVER2   -12279  3344   124   0          02/07/08 12:33:46
PVCU       -12801  3472   64    0          02/07/08 12:33:46
SSINFO     -12286  3536   2     0          02/07/08 12:33:46
ISL        -12800  3544   306   0          00/11/08 20:49:59
AUTO       -12289  3856   1     0          00/11/08 20:49:59
HPUX       -12928  3864   848   0          00/11/08 20:50:00
LABEL      BIN     4712   8     0          03/10/01 15:59:56
root@hpeos003[]
root@hpeos003[] lifls -l /dev/rdsk/c1t15d0
volume ISL10 data size 7984 directory size 8
filename   type    start  size  implement  created
===============================================================
ODE        -12960  584    848   0          02/07/08 12:33:46
MAPFILE    -12277  1432   128   0          02/07/08 12:33:46
SYSLIB     -12280  1560   353   0          02/07/08 12:33:46
CONFIGDATA -12278  1920   235   0          02/07/08 12:33:46
SLMOD2     -12276  2160   141   0          02/07/08 12:33:46
SLDEV2     -12276  2304   135   0          02/07/08 12:33:46
SLDRV2     -12276  2440   205   0          02/07/08 12:33:46
SLSCSI2    -12276  2648   131   0          02/07/08 12:33:46
MAPPER2    -12279  2784   142   0          02/07/08 12:33:46
IOTEST2    -12279  2928   411   0          02/07/08 12:33:46
PERFVER2   -12279  3344   124   0          02/07/08 12:33:46
PVCU       -12801  3472   64    0          02/07/08 12:33:46
SSINFO     -12286  3536   2     0          02/07/08 12:33:46
ISL        -12800  3544   306   0          00/11/08 20:49:59
AUTO       -12289  3856   1     0          00/11/08 20:49:59
HPUX       -12928  3864   848   0          00/11/08 20:50:00
LABEL      BIN     4712   8     0          03/10/01 15:59:56
root@hpeos003[]

We just need to ensure that the LABEL and AUTO files are updated whenever we make changes to the root/boot configuration. Now we can begin mirroring the volumes in the correct order. The best way to know the correct order is to look at the minor numbers of the existing logical volumes:

root@hpeos003[] ll /dev/vg00/[!rg]*
brw-r-----   1 root   sys   64 0x000001 Oct 29 08:34 /dev/vg00/lvol1
brw-r-----   1 root   sys   64 0x000002 Oct  1 21:21 /dev/vg00/lvol2
brw-r-----   1 root   sys   64 0x000003 Oct  1 21:21 /dev/vg00/lvol3
brw-r-----   1 root   sys   64 0x000004 Oct  1 21:21 /dev/vg00/lvol4
brw-r-----   1 root   sys   64 0x000005 Oct  1 21:21 /dev/vg00/lvol5
brw-r-----   1 root   sys   64 0x000006 Oct  1 21:21 /dev/vg00/lvol6
brw-r-----   1 root   sys   64 0x000007 Oct  1 21:21 /dev/vg00/lvol7
brw-r-----   1 root   sys   64 0x000008 Oct  1 21:21 /dev/vg00/lvol8
brw-r-----   1 root   sys   64 0x000009 Oct 24 13:24 /dev/vg00/lvol9
root@hpeos003[]

Because I have only two disks in vg00, I don't need to specify the target Physical Volume on the lvextend command line. The mirroring will take a few minutes to complete. Time for coffee/tea/beer:

root@hpeos003[] for x in 1 2 3 4 5 6 7 8 9
> do
> lvextend -m 1 /dev/vg00/lvol${x}
> done
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol1" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol2" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol3" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol4" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol5" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol6" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol7" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol8" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
The newly allocated mirrors are now being synchronized. This operation will take some time. Please wait ....
Logical volume "/dev/vg00/lvol9" has been successfully extended.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
root@hpeos003[]

Now I can ensure that we override quorum requirements in the event that we lose a disk and a reboot occurs (ensure that you update the AUTO file on both disks):

root@hpeos003[] mkboot -a "hpux -lq" /dev/rdsk/c1t15d0
root@hpeos003[] mkboot -a "hpux -lq" /dev/rdsk/c3t15d0
root@hpeos003[] lifcp /dev/rdsk/c1t15d0:AUTO -
hpux -lq
root@hpeos003[] lifcp /dev/rdsk/c3t15d0:AUTO -
hpux -lq
root@hpeos003[]

We should ensure that the BDRA (and the LABEL file) is updated (it should already have been done) with the complete list of volumes in vg00:

root@hpeos003[] lvlnboot -vR /dev/vg00
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c1t15d0 (0/0/1/1.15.0) -- Boot Disk
        /dev/dsk/c3t15d0 (0/0/2/1.15.0) -- Boot Disk
Boot: lvol1     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Root: lvol3     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Swap: lvol2     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Dump: lvol2     on:     /dev/dsk/c1t15d0, 0
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
root@hpeos003[]

Now we can set our alternate boot path to be the hardware path of the mirror disk:

root@hpeos003[] lssf /dev/dsk/c3t15d0
sdisk card instance 3 SCSI target 15 SCSI LUN 0 section 0 at address 0/0/2/1.15.0 /dev/dsk/c3t15d0
root@hpeos003[] setboot -a 0/0/2/1.15.0
root@hpeos003[] setboot
Primary bootpath : 0/0/1/1.15.0
Alternate bootpath : 0/0/2/1.15.0
Autoboot is ON (enabled)
Autosearch is ON (enabled)
root@hpeos003[]

You can see that Autoboot and Autosearch are both ON. This looks good. On newer, partitioned servers, you have access to an additional boot device known as the High Availability Alternate (HAA). This was designed as the second boot device. In such a situation, the HAA would be set (using parmodify) to be the address of the disk containing the mirrored volumes.
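The minor numbers in the ll /dev/vg00 listing above can be decoded with a little arithmetic. As the earlier mknod /dev/vgora1/group c 64 0x010000 line suggests, the high byte of the minor number is the volume group number and the low 16 bits are the volume number; the following is a sketch of that interpretation:

```shell
# Decode an LVM device minor number: high byte = volume group number,
# low 16 bits = logical volume number (cf. mknod ... c 64 0x010000 for
# the vgora1 group file: VG 1, LV 0).
minor=0x010000
vg=$(( (minor >> 16) & 0xff ))
lv=$(( minor & 0xffff ))
echo "vg=$vg lv=$lv"
```

Applied to the vg00 listing, 0x000003 decodes to VG 0, lvol3, which is why sorting by minor number gives the creation order of the volumes.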
You could set the Alternate bootpath to be a third mirror if you had configured one. At the moment, all appears well with the system, and our mirroring appears to be in place and working. One small issue is the use of the MWC for our primary swap area:

root@hpeos003[] lvdisplay /dev/vg00/lvol2
--- Logical volumes ---
LV Name                /dev/vg00/lvol2
VG Name                /dev/vg00
LV Permission          read/write
LV Status              available/syncd
Mirror copies          1
Consistency Recovery   MWC
Schedule               parallel
LV Size (Mbytes)       2048
Current LE             256
Allocated PE           512
Stripes                0
Stripe Size (Kbytes)   0
Bad block              off
Allocation             strict/contiguous
IO Timeout (Seconds)   default
root@hpeos003[]

Swap space is one of those situations where we don't want LVM to worry about tracking and recovering changes in the volume. After a reboot, the data in a swap area is gone; it is completely transient. The only problem with changing the Consistency Recovery setting for a volume is that the volume cannot be open while we change the configuration. The only way this can be achieved for Primary Swap is to boot the system in LVM Maintenance Mode and activate vg00 without starting the resynchronization process. It is a lot of work to go through, but it will mean a reduction in IO to the root/boot disk, and, on reboot, it will reduce the time the system takes to resynchronize all volumes in vg00 after a failure.

Processor is booting from first available device.

To discontinue, press any key within 10 seconds.

Boot terminated.
---- Main Menu --------------------------------------------------------------

Command                        Description
-------                        -----------
BOot [PRI|ALT|<path>]          Boot from specified path
PAth [PRI|ALT] [<path>]        Display or modify a path
SEArch [DIsplay|IPL] [<path>]  Search for boot devices
COnfiguration menu             Displays or sets boot values
INformation menu               Displays hardware information
SERvice menu                   Displays service commands
DIsplay                        Redisplay the current menu
HElp [<menu>|<command>]        Display help for menu or command
RESET                          Restart the system
----
Main Menu: Enter command or menu > bo pri
Interact with IPL (Y, N, or Cancel)?> y
Booting...
Boot IO Dependent Code (IODC) revision 1

HARD Booted.

ISL Revision A.00.43  Apr 12, 2000

ISL> hpux -lm

Boot
: disk(0/0/1/1.15.0.0.0.0.0;0)/stand/vmunix
10485760 + 1781760 + 1515760 start 0x1f8fe8

alloc_pdc_pages: Relocating PDC from 0xf0f0000000 to 0x3fb01000.
gate64: sysvec_vaddr = 0xc0002000 for 2 pages
NOTICE: nfs3_link(): File system was registered at index 4.
NOTICE: autofs_link(): File system was registered at index 5.
NOTICE: cachefs_link(): File system was registered at index 6.
td: claimed Tachyon XL2 Fibre Channel Mass Storage card at 0/4/0/0
td: claimed Tachyon XL2 Fibre Channel Mass Storage card at 0/6/2/0
asio0_init: unexpected SAS subsystem ID (1283)
System Console is on the Built-In Serial Interface
asio0_init: unexpected SAS subsystem ID (1283)
Logical volume 64, 0x3 configured as ROOT
Logical volume 64, 0x2 configured as SWAP
Logical volume 64, 0x2 configured as DUMP
the kernel tunable maxswapchunks of size 1048576 is too big. please decrease max swapchunks to size 16384 or less and re-configure your system
Swap device table: (start & size given in 512-byte blocks)
entry 0 - major is 31, minor is 0x1f003; ...
/sbin/ioinitrc:
fsck: /dev/vg00/lvol1: possible swap device (cannot determine)
fsck SUSPENDED BY USER.
/dev/vg00/lvol1: No such device or address
Unable to mount /stand - please check entries in /etc/fstab
Skipping KRS database initialization - /stand can't be mounted
INIT: Overriding default level with level 's'
INIT: SINGLE USER MODE
INIT: Running /sbin/sh
#
# vgchange -a y -s /dev/vg00
Activated volume group
Volume group "/dev/vg00" has been successfully changed.
# lvchange -M n -c n /dev/vg00/lvol2
Logical volume "/dev/vg00/lvol2" has been successfully changed.
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
#
# lvdisplay /dev/vg00/lvol2
--- Logical volumes ---
LV Name                     /dev/vg00/lvol2
VG Name                     /dev/vg00
LV Permission               read/write
LV Status                   available/syncd
Mirror copies               1
Consistency Recovery        NONE
Schedule                    parallel
LV Size (Mbytes)            2048
Current LE                  256
Allocated PE                512
Stripes                     0
Stripe Size (Kbytes)        0
Bad block                   off
Allocation                  strict/contiguous
IO Timeout (Seconds)        default
#
# reboot
Shutdown at 10:56 (in 0 minutes)
System shutdown time has arrived

One of the last parts of the configuration we need to test is the ability of the system to sustain a real disk failure. If we have external disks, we can simply turn off one of them. If not, we may have to be a bit crueler: one test I was involved in was to destroy the data on the disk acting as our current Primary bootpath (using the dd command to overwrite the beginning of the disk) and then reboot the system. We go through two different tests:
- Lose a disk online, but have it replaced while the system is still running.
- Lose a disk, and sustain a reboot before the disk can be replaced.

These tests will destroy the data on the current disks. Some administrators are reluctant to perform such tests. If you are one of those reluctant administrators, ask yourself this question: How do you know your recovery procedures really work? Here goes.

6.2.3 Lose a disk online, but have it replaced while the system is still running

In this scenario, a disk fails while the system is up and running. Because we have mirroring in place, we should see no interruption to service. In our case, it will be the disk from which the system booted. The steps to initiate the recovery are similar for a root/boot volume group and a data volume group, the main difference being the reinstatement of the LIF/boot data. I am using vg00 because it is a more dramatic test and I want to be sure that my system can sustain the loss of a root/boot disk. Although both disks are viewed as the same, I am going to lose the disk I booted from. I can establish which disk I booted from by looking at the kernel-maintained variable boot_string:

root@hpeos003[] echo "boot_string/S" | adb /stand/vmunix /dev/kmem
boot_string:
boot_string: disk(0/0/1/1.15.0.0.0.0.0;0)/stand/vmunix
root@hpeos003[]

We can see which disk (and which kernel) we booted from. I am going to physically remove this disk from the system. I am using self-terminating SCSI cables, which allow me to remove SCSI disks from the system without generating noise on the SCSI interface, which could cause a system panic. This should have a minimal impact on the system. The impact should be limited to the length of the PV Timeout (the default is the value returned by the device driver for the specific device, normally 30 or 60 seconds for most disks), during which any outstanding IOs to the primary disk fail and are re-routed to the mirror disk. (Any current commands may pause slightly while the PV Timeout takes effect.
This is just about enough time for a user to look quizzically at her screen, look up the number of the internal Help Desk, and just as she's calling the number, the PV Timeout expires and the command breathes back into life.) We should see a SCSI lbolt error in syslog:

root@hpeos003[] more /var/adm/syslog/syslog.log
...
Oct 29 17:47:23 hpeos003 vmunix: SCSI: Reset detected -- lbolt: 644442, bus: 1
Oct 29 17:47:23 hpeos003 vmunix: lbp->state: 4060
Oct 29 17:47:23 hpeos003 vmunix: lbp->offset: ffffffff
Oct 29 17:47:23 hpeos003 vmunix: lbp->uPhysScript: f8040000
Oct 29 17:47:23 hpeos003 vmunix: From most recent interrupt:
Oct 29 17:47:23 hpeos003 vmunix: ISTAT: 02, SIST0: 02, SIST1: 00, DSTAT: 80, DSPS: f8040028
Oct 29 17:47:23 hpeos003 vmunix: lsp: 0000000000000000
Oct 29 17:47:23 hpeos003 vmunix: lbp->owner: 0000000000000000
Oct 29 17:47:23 hpeos003 vmunix: scratch_lsp: 0000000000000000
Oct 29 17:47:23 hpeos003 vmunix: Pre-DSP script dump [fffffffff80400e0]:
Oct 29 17:47:23 hpeos003 vmunix: e0340004 00000000 e0100004 00000000
Oct 29 17:47:23 hpeos003 vmunix: 48000000 00000000 78350000 00000000
Oct 29 17:47:23 hpeos003 vmunix: Script dump [fffffffff8040100]:
Oct 29 17:47:23 hpeos003 vmunix: 50000000 f8040028 80000000 0000000b
Oct 29 17:47:23 hpeos003 vmunix: 0f000001 f80405c0 60000040 00000000
Oct 29 17:48:02 hpeos003 vmunix: DIAGNOSTIC SYSTEM WARNING:
Oct 29 17:48:02 hpeos003 vmunix: The diagnostic logging facility has started receiving excessive
Oct 29 17:48:02 hpeos003 vmunix: errors from the I/O subsystem. I/O error entries will be lost
Oct 29 17:48:02 hpeos003 vmunix: until the cause of the excessive I/O logging is corrected.
Oct 29 17:48:02 hpeos003 vmunix: If the diaglogd daemon is not active, use the Daemon Startup command
Oct 29 17:48:02 hpeos003 vmunix: in stm to start it.
Oct 29 17:48:02 hpeos003 vmunix: If the diaglogd daemon is active, use the logtool utility in stm
Oct 29 17:48:02 hpeos003 vmunix: to determine which I/O subsystem is logging excessive errors.
Oct 29 17:48:02 hpeos003 vmunix:
Oct 29 17:48:02 hpeos003 vmunix: SCSI: Async write error -- dev: b 31 0x01f000, errno: 126, resid: 2048,
Oct 29 17:48:02 hpeos003 vmunix: blkno: 6230152, sectno: 12460304, offset: 2084708352, bcount: 2048.
Oct 29 17:48:02 hpeos003 vmunix: blkno: 5750354, sectno: 11500708, offset: 1593395200, bcount: 2048.
Oct 29 17:48:02 hpeos003 vmunix: blkno: 5625944, sectno: 11251888, offset: 1465999360, bcount: 2048.
Oct 29 17:48:02 hpeos003 vmunix: blkno: 5045018, sectno: 10090036, offset: 871131136, bcount: 2048.
Oct 29 17:48:02 hpeos003 vmunix: blkno: 4848930, sectno: 9697860, offset: 670337024, bcount: 2048.
Oct 29 17:48:02 hpeos003 vmunix: blkno: 4848962, sectno: 9697924, offset: 670369792, bcount: 2048.
...
Oct 29 17:48:02 hpeos003 vmunix: LVM: vg[0]: pvnum=0 (dev_t=0x1f01f000) is POWERFAILED
Oct 29 17:48:02 hpeos003 vmunix: SCSI: Write error -- dev: b 31 0x01f000, errno: 126, resid: 10240,
Oct 29 17:48:02 hpeos003 vmunix: blkno: 2438, sectno: 4876, offset: 2496512, bcount: 10240.
...
root@hpeos003[]

I have italicized and underlined the points of interest. We can see from the LVM POWERFAILED message that the dev_t (the block device as known by LVM) is 0x1f01f000. This can be decoded as follows:

Hexadecimal    Major Number            Minor Number
0x             1f (= 31 in decimal)    01f000

When we look in /dev/dsk, we see the major number is 31 as expected. When we look for the relevant minor number, we see the disk in question: /dev/dsk/c1t15d0.
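The decoding is easy to script. The following is a minimal sketch of my own (not an HP-UX tool): the top byte of the dev_t is the major number and the low 24 bits are the minor number, and the field layout inside an sdisk minor number (card instance byte, target nibble, LUN nibble) is an inference from the device files listed below (e.g., c4t10d0 has minor 0x04a000), so treat that layout as an assumption:

```shell
#!/bin/sh
# Decode an LVM dev_t such as 0x1f01f000. The top byte is the major
# number and the low 24 bits the minor number; within the minor, the
# layout (instance byte, target nibble, LUN nibble) is inferred from
# the /dev/dsk listing, e.g., c4t10d0 = minor 0x04a000.
dev_t=0x1f01f000
major=$((  dev_t >> 24 ))           # 0x1f = 31, the sdisk block driver
minor=$((  dev_t & 0xffffff ))      # 0x01f000
card=$((   (minor >> 16) & 0xff ))  # controller instance 1
target=$(( (minor >> 12) & 0xf ))   # SCSI target 15
lun=$((    (minor >> 8)  & 0xf ))   # LUN 0
printf "major=%d minor=0x%06x -> c%dt%dd%d\n" "$major" "$minor" "$card" "$target" "$lun"
```

Running it prints major=31 minor=0x01f000 -> c1t15d0, which agrees with the listing of /dev/dsk.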
root@hpeos003[] ll /dev/dsk
total 0
brw-r-----   1 bin   sys   31 0x000000 Oct 25 12:05 c0t0d0
brw-r-----   1 bin   sys   31 0x001000 Oct 25 12:05 c0t1d0
brw-r-----   1 bin   sys   31 0x002000 Oct 25 12:05 c0t2d0
brw-r-----   1 bin   sys   31 0x003000 Oct 29 11:40 c0t3d0
brw-r-----   1 bin   sys   31 0x004000 Oct 29 11:40 c0t4d0
brw-r-----   1 bin   sys   31 0x005000 Oct 29 11:40 c0t5d0
brw-r-----   1 bin   sys   31 0x006000 Oct 29 11:40 c0t6d0
brw-r-----   1 bin   sys   31 0x01f000 Oct  1 17:29 c1t15d0
brw-r-----   1 bin   sys   31 0x03f000 Oct 28 11:30 c3t15d0
brw-r-----   1 bin   sys   31 0x04a000 Oct 29 11:40 c4t10d0
brw-r-----   1 bin   sys   31 0x04b000 Oct 29 11:40 c4t11d0
brw-r-----   1 bin   sys   31 0x04c000 Oct 29 11:40 c4t12d0
brw-r-----   1 bin   sys   31 0x04d000 Oct 29 11:40 c4t13d0
brw-r-----   1 bin   sys   31 0x04e000 Oct 29 11:40 c4t14d0
brw-r-----   1 bin   sys   31 0x048000 Oct 29 11:40 c4t8d0
brw-r-----   1 bin   sys   31 0x049000 Oct 29 11:40 c4t9d0
root@hpeos003[] lssf /dev/dsk/c1t15d0
sdisk card instance 1 SCSI target 15 SCSI LUN 0 section 0 at address 0/0/1/1.15.0 /dev/dsk/c1t15d0
root@hpeos003[]

This is the disk from which we booted. The system is still up and running, but we are starting to accumulate stale extents.
root@hpeos003[] lvdisplay -v /dev/vg00/lvol4
--- Logical volumes ---
LV Name                     /dev/vg00/lvol4
VG Name                     /dev/vg00
LV Permission               read/write
LV Status                   available/stale
Mirror copies               1
Consistency Recovery        MWC
Schedule                    parallel
LV Size (Mbytes)            64
Current LE                  8
Allocated PE                16
Stripes                     0
Stripe Size (Kbytes)        0
Bad block                   on
Allocation                  strict
IO Timeout (Seconds)        default

--- Distribution of logical volume ---
PV Name                 LE on PV   PE on PV
/dev/dsk/c1t15d0        8          8
/dev/dsk/c3t15d0        8          8

--- Logical extents ---
LE      PV1                 PE1    Status 1   PV2                 PE2    Status 2
00000   /dev/dsk/c1t15d0    00433  stale      /dev/dsk/c3t15d0    00433  current
00001   /dev/dsk/c1t15d0    00434  current    /dev/dsk/c3t15d0    00434  current
00002   /dev/dsk/c1t15d0    00435  current    /dev/dsk/c3t15d0    00435  current
00003   /dev/dsk/c1t15d0    00436  current    /dev/dsk/c3t15d0    00436  current
00004   /dev/dsk/c1t15d0    00437  current    /dev/dsk/c3t15d0    00437  current
00005   /dev/dsk/c1t15d0    00438  current    /dev/dsk/c3t15d0    00438  current
00006   /dev/dsk/c1t15d0    00439  current    /dev/dsk/c3t15d0    00439  current
00007   /dev/dsk/c1t15d0    00440  current    /dev/dsk/c3t15d0    00440  current
root@hpeos003[]
root@hpeos003[] lvdisplay -v /dev/vg00/lvol* | grep stale | wc -l
46
root@hpeos003[]

If this disk were simply having an intermittent problem, it could come back online and LVM would recognize it, because it would still contain a full complement of LVM headers. LVM would resynchronize the stale extents with no further input from the administrator. However, if the disk has really failed, it will need replacing. In my case, I need to replace the failed disk with a new one. In this instance, if I try to reactivate the volume group, it will fail to recognize the LVM headers on the disk; they're missing, because it's a new disk!
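As an aside, the stale counts above simply count "stale" flags in the per-extent listing. To illustrate the counting on its own (using a captured two-extent fragment in place of the live lvdisplay output, so it runs anywhere):

```shell
#!/bin/sh
# Count stale mirror copies, as "lvdisplay -v ... | grep stale | wc -l"
# does above. A captured fragment of the per-extent listing stands in
# for the live command output here.
sample='00000 /dev/dsk/c1t15d0 00433 stale   /dev/dsk/c3t15d0 00433 current
00001 /dev/dsk/c1t15d0 00434 current /dev/dsk/c3t15d0 00434 current'
echo "$sample" | grep stale | wc -l
```

This reports one stale extent for the fragment; against the live system, the same pipeline gave us 46.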
Here's the message from LVM found in syslog:

root@hpeos003[] tail /var/adm/syslog/syslog.log
Oct 29 18:14:31 hpeos003 vmunix:
Oct 29 18:14:36 hpeos003 above message repeats 3 times
Oct 29 18:14:31 hpeos003 vmunix: SCSI: Read error -- dev: b 31 0x01f000, errno: 126, resid: 2048,
Oct 29 18:14:36 hpeos003 above message repeats 3 times
Oct 29 18:14:31 hpeos003 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Oct 29 18:14:37 hpeos003 above message repeats 3 times
Oct 29 18:14:41 hpeos003 vmunix:
Oct 29 18:14:41 hpeos003 vmunix: SCSI: Read error -- dev: b 31 0x01f000, errno: 126, resid: 2048,
Oct 29 18:14:41 hpeos003 vmunix: blkno: 8, sectno: 16, offset: 8192, bcount: 2048.
Oct 29 18:14:46 hpeos003 vmunix: LVM: Failed to restore PV 0 to VG 0! Identifier mismatch.
root@hpeos003[]
root@hpeos003[] vgchange -a y /dev/vg00
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c1t15d0":
Cross-device link
Volume group "/dev/vg00" has been successfully changed.
You have mail in /var/mail/root
root@hpeos003[]

I need to restore the LVM headers onto the disk using vgcfgrestore and then try to activate the volume group to allow the kernel to recognize the new disk as being part of this volume group:

root@hpeos003[] vgcfgrestore -l -n /dev/vg00
Volume Group Configuration information in "/etc/lvmconf/vg00.conf"
VG Name /dev/vg00
---- Physical volumes : 2 ----
/dev/rdsk/c1t15d0 (Bootable)
/dev/rdsk/c3t15d0 (Bootable)
root@hpeos003[] vgcfgrestore -n /dev/vg00 /dev/rdsk/c1t15d0
Volume Group configuration has been restored to /dev/rdsk/c1t15d0
root@hpeos003[] vgchange -a y /dev/vg00
Volume group "/dev/vg00" has been successfully changed.
root@hpeos003[]

Because this is a new disk, LVM will have marked all extents as being stale. We will need to resynchronize the entire volume group:

root@hpeos003[] lvdisplay -v /dev/vg00/lvol* | grep stale | wc -l
1114
root@hpeos003[]
root@hpeos003[] vgsync /dev/vg00
Resynchronized logical volume "/dev/vg00/lvol1".
Resynchronized logical volume "/dev/vg00/lvol2".
Resynchronized logical volume "/dev/vg00/lvol3".
Resynchronized logical volume "/dev/vg00/lvol4".
Resynchronized logical volume "/dev/vg00/lvol5".
Resynchronized logical volume "/dev/vg00/lvol6".
Resynchronized logical volume "/dev/vg00/lvol7".
Resynchronized logical volume "/dev/vg00/lvol8".
Resynchronized logical volume "/dev/vg00/lvol9".
Resynchronized volume group "/dev/vg00".
root@hpeos003[]

Because this can take some time to complete, it might have been a good idea to run vgsync in the background. If this were a non-boot disk, we would be finished; however, this is vg00, so our job is far from over. We need to reinstate the boot and LIF data:

root@hpeos003[] lifls -l /dev/rdsk/c1t15d0
lifls: Can't list /dev/rdsk/c1t15d0; not a LIF volume
root@hpeos003[] mkboot -b /dev/rdsk/c3t15d0 /dev/rdsk/c1t15d0
root@hpeos003[] lifls -l /dev/rdsk/c1t15d0
volume ISL10 data size 7984 directory size 8
filename    type    start  size  implement  created
===============================================================
ODE         -12960  584    848   0          02/07/08 12:33:46
MAPFILE     -12277  1432   128   0          02/07/08 12:33:46
SYSLIB      -12280  1560   353   0          02/07/08 12:33:46
CONFIGDATA  -12278  1920   235   0          02/07/08 12:33:46
SLMOD2      -12276  2160   141   0          02/07/08 12:33:46
SLDEV2      -12276  2304   135   0          02/07/08 12:33:46
SLDRV2      -12276  2440   205   0          02/07/08 12:33:46
SLSCSI2     -12276  2648   131   0          02/07/08 12:33:46
MAPPER2     -12279  2784   142   0          02/07/08 12:33:46
IOTEST2     -12279  2928   411   0          02/07/08 12:33:46
PERFVER2    -12279  3344   124   0          02/07/08 12:33:46
PVCU        -12801  3472   64    0          02/07/08 12:33:46
SSINFO      -12286  3536   2     0          02/07/08 12:33:46
ISL         -12800  3544   306   0          00/11/08 20:49:59
AUTO        -12289  3856   1     0          00/11/08 20:49:59
HPUX        -12928  3864   848   0          00/11/08 20:50:00
LABEL       BIN     4712   8     0          03/10/01 15:59:56
root@hpeos003[]
root@hpeos003[] mkboot -a "hpux -lq" /dev/rdsk/c1t15d0
root@hpeos003[] lifcp /dev/rdsk/c1t15d0:AUTO -
hpux -lq
root@hpeos003[]
root@hpeos003[] lvlnboot -vR /dev/vg00
Boot Definitions for Volume Group
/dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/dsk/c1t15d0 (0/0/1/1.15.0) -- Boot Disk
        /dev/dsk/c3t15d0 (0/0/2/1.15.0) -- Boot Disk
Boot: lvol1     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Root: lvol3     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Swap: lvol2     on:     /dev/dsk/c1t15d0
                        /dev/dsk/c3t15d0
Dump: lvol2     on:     /dev/dsk/c1t15d0, 0
Volume Group configuration for /dev/vg00 has been saved in /etc/lvmconf/vg00.conf
root@hpeos003[]

This now looks okay.

6.2.4 Lose a disk, and sustain a reboot before the disk can be replaced

There is essentially no difference in this scenario except that if the volume group in question is vg00, you might not realize that your system has rebooted: it could happen at 02:00, and unless you have some form of automated monitoring, you may not realize it has happened. Consequently, you may be running with a system that is lacking a fundamental high-availability feature: your root/boot disk has no mirroring functionality. If the volume group in question is a data volume group, you may well be in a situation where the volume group has not been activated; if you do not have a quorum (more than 50 percent of the disks online), the volume group will not be activated at boot time. Overriding quorum for a non-vg00 volume group requires that you manually modify the startup script /sbin/lvmrc to use the -q n option to vgchange. I would be very careful in such an instance. If you lose three out of six disks, it may be okay to override the quorum, i.e., you lost one half of your mirror configuration, but beyond that I think it is more than a little suspicious if I don't have quorum for a volume group. In our example with vg00 earlier, we only had two disks in vg00, the original and a mirror, so overriding quorum was acceptable. In this simple example, I want to show you the system attempting to boot from a broken Primary bootpath but succeeding because we have stipulated the -lq option to hpux in order to override quorum.
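The quorum rule is a strict majority: activation requires more than half of the volume group's disks to be online. A quick sketch of the arithmetic for our two-disk vg00 with one disk lost shows why the override is needed:

```shell
#!/bin/sh
# Quorum needs STRICTLY more than half the disks online, i.e.,
# online/total > 1/2, or equivalently 2*online > total. With vg00's
# two disks and one lost, 2*1 = 2 is not greater than 2, so quorum
# fails and an override (hpux -lq, or vgchange -q n) is required.
total=2
online=1
if [ $(( online * 2 )) -gt "$total" ]; then
    echo "quorum met"
else
    echo "quorum NOT met - override required"
fi
```

Swap in total=6 and online=3 and the result is the same, which is exactly the three-out-of-six case discussed above.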
We will also see LVM failing to activate a data volume group at boot time, because we do not have quorum.

Processor is booting from first available device.
To discontinue, press any key within 10 seconds.

10 seconds expired.
Proceeding...

Trying Primary Boot Path
------------------------
Booting...
Boot IO Dependent Code (IODC) revision 1

IPL error: bad LIF magic.

.... FAILED.

Trying Alternate Boot Path
--------------------------
Boot IO Dependent Code (IODC) revision 1

.... SUCCEEDED!

HARD Booted.

ISL Revision A.00.43  Apr 12, 2000

ISL booting hpux -lq

Boot : disk(0/0/2/1.15.0.0.0.0.0;0)/stand/vmunix
10485760 + 1781760 + 1515760 start 0x1f8fe8
alloc_pdc_pages: Relocating PDC from 0xf0f0000000 to 0x3fb01000.
gate64: sysvec_vaddr = 0xc0002000 for 2 pages
NOTICE: nfs3_link(): File system was registered at index 4.
NOTICE: autofs_link(): File system was registered at index 5.

root@hpeos003[] vgdisplay /dev/vgora1
vgdisplay: Volume group not activated.
vgdisplay: Cannot display volume group "/dev/vgora1".
root@hpeos003[]
root@hpeos003[] vgchange -a y /dev/vgora1
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c4t9d0":
Cross-device link
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c4t10d0":
Cross-device link
vgchange: Warning: Couldn't attach to the volume group physical volume "/dev/dsk/c4t11d0":
Cross-device link
vgchange: Warning: couldn't query physical volume "/dev/dsk/c4t9d0":
The specified path does not correspond to physical volume attached to this volume group
vgchange: Warning: couldn't query physical volume "/dev/dsk/c4t10d0":
The specified path does not correspond to physical volume attached to this volume group
vgchange: Warning: couldn't query physical volume "/dev/dsk/c4t11d0":
The specified path does not correspond to physical volume attached to this volume group
vgchange: Warning: couldn't query physical volume "/dev/dsk/c4t11d0":
The specified path does not correspond to physical volume attached to this volume group
vgchange: Warning: couldn't query all of the physical volumes.
vgchange: Couldn't activate volume group "/dev/vgora1":
Quorum not present, or some physical volume(s) are missing.
root@hpeos003[]

At this point, I would need to have the faulty disk replaced and instigate a recovery of vg00 and vgora1. The recovery of vg00 follows the same tasks detailed in the previous example. I will document the recovery of vgora1.
root@hpeos003[] vgcfgrestore -l -n /dev/vgora1
Volume Group Configuration information in "/etc/lvmconf/vgora1.conf"
VG Name /dev/vgora1
---- Physical volumes : 6 ----
/dev/rdsk/c0t1d0 (Non-bootable)
/dev/rdsk/c0t2d0 (Non-bootable)
/dev/rdsk/c0t3d0 (Non-bootable)
/dev/rdsk/c4t9d0 (Non-bootable)
/dev/rdsk/c4t10d0 (Non-bootable)
/dev/rdsk/c4t11d0 (Non-bootable)
root@hpeos003[]
root@hpeos003[] vgcfgrestore -n /dev/vgora1 /dev/rdsk/c4t9d0
Volume Group configuration has been restored to /dev/rdsk/c4t9d0
root@hpeos003[] vgcfgrestore -n /dev/vgora1 /dev/rdsk/c4t10d0
Volume Group configuration has been restored to /dev/rdsk/c4t10d0
root@hpeos003[] vgcfgrestore -n /dev/vgora1 /dev/rdsk/c4t11d0
Volume Group configuration has been restored to /dev/rdsk/c4t11d0
root@hpeos003[]
root@hpeos003[] vgchange -a y /dev/vgora1
Activated volume group
Volume group "/dev/vgora1" has been successfully changed.
root@hpeos003[]
root@hpeos003[] vgsync /dev/vgora1
Resynchronized logical volume "/dev/vgora1/db".
Resynchronized volume group "/dev/vgora1".
root@hpeos003[]

As you can see, the process follows a very similar sequence to the one we used when recovering vg00 earlier.

6.2.5 Spare volumes

The idea behind a spare volume is the same as the concept of a hot-standby. The disk is up and running, part of the volume group, sitting there waiting for the failure of a live disk. The spare volume is used in conjunction with mirroring. In the event of a live disk failing, the spare volume will automatically be written to in order to re-establish the mirror configuration. This maintains our high-availability configuration with no administrative input. We can then schedule the replacement of the failed disk. A major drawback with spare volumes is that you lose all other use of the spare volume. It cannot be used for anything other than being a spare volume; you cannot create any logical volumes on it.
The main advantage is the possibility of automatically maintaining a high-availability configuration in the event of a disk failure. Most high-availability disk arrays offer this as an option; whenever you enable a hot-standby, it is expected that you are going to lose some capacity and performance. (I know some disk arrays, such as the HP Virtual Array, spread the hot-standby capacity across all spindles, allowing you to utilize all disks in the array. That's a specific, and rather clever, solution.) If we take our current volume group /dev/vgora1, we are not currently using disk c4t10d0. We could configure this as a spare volume. In the event that we were to lose c4t9d0, all the mirrored volumes configured on c4t9d0 would be re-established on c4t10d0:

root@hpeos003[] pvchange -z y /dev/dsk/c4t10d0
Physical volume "/dev/dsk/c4t10d0" has been successfully changed.
Volume Group configuration for /dev/vgora1 has been saved in /etc/lvmconf/vgora1.conf
root@hpeos003[]
root@hpeos003[] vgdisplay /dev/vgora1
--- Volume groups ---
VG Name                     /dev/vgora1
VG Write Access             read/write
VG Status                   available
Max LV                      255
Cur LV                      4
Open LV                     4
Max PV                      16
Cur PV                      6
Act PV                      6
Max PE per PV               17501
VGDA                        12
PE Size (Mbytes)            4
Total PE                    87495
Alloc PE                    1700
Free PE                     85795
Total PVG                   2
Total Spare PVs             1
Total Spare PVs in use      0
root@hpeos003[]

All we need to do is wait for c4t9d0 to fail.

6.2.6 Conclusions on mirroring

Mirroring is one of the most common high-availability tasks undertaken by LVM or any advanced disk management software. Without mirroring, we are extremely vulnerable to a loss of access to our data and, hence, our applications, and consequently the ability of our organizations to function as normal (read: make money). There are still other tasks I want to perform concerning mirroring, such as splitting and merging mirrors. I will wait until we talk about filesystems, and especially VxFS snapshots.
I know it might sound a bit weird to split up a discussion regarding mirroring with a discussion regarding filesystems, but this whole book is about getting the job done. I have tried to approach subjects on a non-theoretical basis, trying to convey my hands-on experiences to you from an on-the-job mindset. I hope that makes sense.