Another way to guard against the loss of a disk is to add spare disks to the disk group. If a disk fails, the vxrelocd daemon (started at boot time) will try to relocate subdisks belonging to redundant volumes, using spare space in the disk group. If you have striped/mirrored/RAID 5 volumes, this may not be possible if you want to maintain the integrity of the layout policy you have adopted. In such situations, you will simply have to accept the drop in redundancy until you can have the disk replaced. This is where Spare Disks come to the fore. They work on the same principle as a Spare PV in LVM, in that they will be used only in the event of a disk failure (although you can override this in VxVM by explicitly stating the disk name on a vxmake/vxassist command line). The process is not too complicated and can improve your chances of surviving multiple disk failures:
Initialize a free disk.

root@hpeos003[] vxdisk init c4t14d0 nlog=2 nconfig=2
root@hpeos003[]
Add the disk to the disk group.

root@hpeos003[] vxdg -g ora1 adddisk ora_spare=c4t14d0
root@hpeos003[] vxprint -ht -g ora1 ora_spare
DM NAME         DEVICE       TYPE     PRIVLEN  PUBLEN   STATE
dm ora_spare    c4t14d0      simple   1024     71682048 -
root@hpeos003[]
Mark the disk as a spare disk.

root@hpeos003[] vxedit -g ora1 set spare=on ora_spare
root@hpeos003[]
root@hpeos003[] vxprint -g ora1
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
dg ora1         ora1         -        -        -        -        -       -

dm ora_disk1    c0t4d0       -        71682048 -        -        -       -
dm ora_disk2    c0t5d0       -        71682048 -        -        -       -
dm ora_disk3    c4t12d0      -        71682048 -        -        -       -
dm ora_disk4    c4t13d0      -        71682048 -        -        -       -
dm ora_spare    c4t14d0      -        71682048 -        SPARE    -       -

v  archive      RAID 5       ENABLED  4194304  -        ACTIVE   -       -
pl archive-01   archive      ENABLED  4194304  -        ACTIVE   -       -
sd ora_disk3-06 archive-01   ENABLED  2097152  0        -        -       -
sd ora_disk2-02 archive-01   ENABLED  2097152  0        -        -       -
sd ora_disk4-04 archive-01   ENABLED  2097152  0        -        -       -
pl archive-02   archive      ENABLED  1440     -        LOG      -       -
sd ora_disk1-04 archive-02   ENABLED  1440     0        -        -       -
...
root@hpeos003[]
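The three steps above can be gathered into a small wrapper. What follows is only a sketch, assuming a POSIX shell: the function names spare_disk_commands and is_spare are hypothetical and not part of VxVM. The first function prints the commands instead of running them, so they can be reviewed first; the commands and the nlog=2 nconfig=2 values are simply those used in this example. The second reads vxprint -g output in the format shown above and assumes STATE is the seventh column.

```shell
#!/bin/sh
# Hypothetical helper (not part of VxVM): print the three commands used
# above to turn a free disk into a spare. Nothing is executed here.
#   $1 = device (e.g. c4t14d0), $2 = disk group, $3 = disk media name
spare_disk_commands() {
    if [ $# -ne 3 ]; then
        echo "usage: spare_disk_commands device diskgroup medianame" >&2
        return 1
    fi
    dev=$1
    dg=$2
    dm=$3
    # Step 1: initialize the disk for VxVM use (values as in this example).
    echo "vxdisk init $dev nlog=2 nconfig=2"
    # Step 2: add it to the disk group under the chosen media name.
    echo "vxdg -g $dg adddisk $dm=$dev"
    # Step 3: flag it as a spare.
    echo "vxedit -g $dg set spare=on $dm"
}

# Hypothetical check: read `vxprint -g <dg>` output on stdin and succeed
# only if the named disk media record shows SPARE in the STATE column
# (assumed to be field 7, as in the listing above).
is_spare() {
    awk -v dm="$1" '
        $1 == "dm" && $2 == dm && $7 == "SPARE" { found = 1 }
        END { exit !found }
    '
}
```

For example, spare_disk_commands c4t14d0 ora1 ora_spare prints the three command lines from the steps above, and vxprint -g ora1 | is_spare ora_spare returns success once the disk is flagged as a spare.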
In this instance, I have pulled ora_disk3 from its cabinet. This will be seen as a disk failure. I received these errors in syslog:

Nov 11 16:50:54 hpeos003 vmunix: NOTICE: vxvm:vxdmp: disabled path 31/0x4c000 belonging to the dmpnode 0/0xc
Nov 11 16:50:54 hpeos003 vmunix:
Nov 11 16:50:54 hpeos003 vmunix: NOTICE: vxvm:vxdmp: disabled dmpnode 0/0xc
Nov 11 16:50:54 hpeos003 vmunix: WARNING: vxvm:vxio: Subdisk ora_disk3-01 block 0: Uncorrectable read error

I also received this email from the vxrelocd daemon:

root@hpeos003[] mail
From root@hpeos003 Tue Nov 11 16:51:14 GMT 2003
Received: (from root@localhost) by hpeos003 (8.11.1 (Revision 1.5) /8.9.3) id hABGpEE13268 for root; Tue, 11 Nov 2003 16:51:14 GMT
Date: Tue, 11 Nov 2003 16:51:14 GMT
From: root@hpeos003
Message-Id: <200311111651.hABGpEE13268@hpeos003>
To: root@hpeos003
Subject: Attempting VxVM relocation on host hpeos003
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit

Volume data2 Subdisk ora_disk3-02 relocated to ora_spare-02, but not yet recovered.

?

How long this process takes will depend on the number of subdisks that need to be relocated. When it completes, root will receive another email from vxrelocd in this form:

root@hpeos003[] mail
From root@hpeos003 Tue Nov 11 16:57:45 GMT 2003
Received: (from root@localhost) by hpeos003 (8.11.1 (Revision 1.5) /8.9.3) id hABGviV13409 for root; Tue, 11 Nov 2003 16:57:44 GMT
Date: Tue, 11 Nov 2003 16:57:44 GMT
From: root@hpeos003
Message-Id: <200311111657.hABGviV13409@hpeos003>
To: root@hpeos003
Subject: Attempting VxVM relocation on host hpeos003
Mime-Version: 1.0
Content-Type: text/plain; charset=X-roman8
Content-Transfer-Encoding: 7bit
Status: RO

Recovery complete for volume data2 in disk group ora1.

?

In the meantime, I can arrange a replacement for the failed disk.
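Because vxrelocd reports progress by email, it can be handy to reduce those messages to one line each. The sketch below assumes a POSIX shell with awk and recognizes only the two message forms shown above ("Subdisk ... relocated to ..." and "Recovery complete for volume ... in disk group ..."), each appearing on a single line; vxrelocd may well send other messages that it ignores. The function name vxreloc_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize vxrelocd mail text read from stdin.
# Only the two message forms shown in this example are recognized.
vxreloc_summary() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            # "Subdisk <old> relocated to <new>,"
            if ($i == "Subdisk" && $(i+2) == "relocated" && $(i+3) == "to") {
                new = $(i+4)
                sub(/,$/, "", new)          # drop the trailing comma
                print "relocated " $(i+1) " -> " new
            }
            # "Recovery complete for volume <vol> in disk group <dg>."
            if ($i == "Recovery" && $(i+1) == "complete" && $(i+3) == "volume") {
                dg = $(i+8)
                sub(/\.$/, "", dg)          # drop the trailing full stop
                print "recovered " $(i+4) " in " dg
            }
        }
    }'
}
```

Fed the text of the first message above, this prints a line naming ora_disk3-02 and ora_spare-02; fed the second, it reports that data2 in ora1 has been recovered. How the mail text is saved to a file or piped in is left to the administrator.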
As we can see from vxprint, the subdisks that have been relocated are now housed on the Spare Disk:

root@hpeos003[] vxprint -g ora1 data2
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
v  data2        fsgen        ENABLED  4194304  -        ACTIVE   -       -
pl data2-01     data2        ENABLED  4194304  -        ACTIVE   -       -
sd ora_disk1-03 data2-01     ENABLED  2097152  0        -        -       -
sd ora_spare-02 data2-01     ENABLED  2097152  0        -        -       -
pl data2-02     data2        ENABLED  4194304  -        ACTIVE   -       -
sd ora_disk2-03 data2-02     ENABLED  2097152  0        -        -       -
sd ora_disk4-03 data2-02     ENABLED  2097152  0        -        -       -
pl data2-03     data2        ENABLED  LOGONLY  -        ACTIVE   -       -
sd ora_disk1-02 data2-03     ENABLED  66       LOG      -        -       -
root@hpeos003[]

Once we have replaced the disk, we can un-relocate the subdisks if we choose. This is entirely up to the administrator, but it is a good idea: if our design had specific High Availability and/or performance attributes, un-relocating returns the configuration to its original state. The process is IO intensive because the subdisks are moved back to their original location, so we may decide to wait until a quiet, off-line time to run the vxunreloc command.

root@hpeos003[] /etc/vx/bin/vxunreloc -g ora1 ora_disk3 &
root@hpeos003[]

Finally, we should take care in choosing which disks are to be selected as spare disks. If they are located on an interface with other high-activity disks, overall IO performance may be significantly degraded during a hot relocation. However, where interfaces are scarce, most administrators will deem the benefits to High Availability worth that risk, given that we hope we will never need to use the Spare Disk.
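Before and after running vxunreloc, it can help to see which subdisks currently sit on the spare disk. The sketch below assumes a POSIX shell with awk, vxprint -g output in the format shown above, and default subdisk naming in which each subdisk name begins with its disk media name followed by a hyphen (as ora_spare-02 does here). The function name subdisks_on_disk is hypothetical and not part of VxVM.

```shell
#!/bin/sh
# Hypothetical helper: read `vxprint -g <dg>` output on stdin and list the
# subdisks (with their plexes) whose names begin with "<disk media name>-".
# Relies on default subdisk naming, as seen in the listing above.
subdisks_on_disk() {
    awk -v prefix="$1-" '
        $1 == "sd" && index($2, prefix) == 1 { print $2, $3 }
    '
}
```

Run as vxprint -g ora1 | subdisks_on_disk ora_spare, this lists each relocated subdisk alongside its plex. Under the naming assumption above, empty output after vxunreloc has finished suggests that nothing remains on the spare disk.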