Disk-Resident Data Structures


We start our examination of LVM data structures with the layout of a physical volume. Figure 11-4 shows an overview of the LVM metadata structures on a physical volume. Bootable LVM disks are created with the pvcreate -B option and have a logical interchange format (LIF) file system header located in the first 8 KB of the disk.

LIF is actually an old file system format used by the HP-UX boot loader. In the case of an LVM disk, the LIF header is a simple directory structure containing pointers to boot files stored in the boot disk reserved area (BDRA) on bootable disks.

Following the boot block is the physical volume reserved area (PVRA). This structure contains the LVM record, which stores information about this specific physical volume along with offset pointers to each of the LVM structures on the disk. In effect, you may think of the LVM record as a kind of LVM superblock for the disk. The PVRA also holds a bad block directory with relocation information for blocks that LVM has identified as needing replacement. As mentioned earlier, bootable disks also have a BDRA. Note that each of these structures is stored in duplicate.

Next is the volume group reserved area (VGRA), which in turn contains the volume group descriptor area (VGDA), the volume group status area (VGSA), and the mirror consistency record (MCR). The VGDA is critical to LVM's ability to map logical extents to physical extents: it holds most of the extent-mapping information, along with information about the volume group to which the disk belongs. The VGSA contains stale-extent and missing-physical-volume information. The MCR provides space on the drive for the mirror write cache (MWC) consistency data. As with the PVRA, each structure is duplicated. Figure 11-5 illustrates the PVRA and VGRA in greater detail.

Figure 11-5. PVRA and VGRA Components



After the VGRA comes the bulk of the disk's space: the area mapped as physical extents. Because LVM allocates only whole extents, any partial extent left at the end of this area remains unused. In the worst case, the wasted space is just under one extent. This is one consideration when choosing the extent size: the larger the extent, the more space may be wasted. Given that modern disk drives are measured in gigabytes, however, wasting less than 4, 8, or even 16 megabytes is hardly a concern.
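The whole-extent rule is plain integer division: the number of allocatable extents is the data area size divided by the extent size, and the waste is the remainder. The C sketch below illustrates that arithmetic only; `fit_extents` and `extent_fit` are hypothetical names, and sizes are taken in bytes for simplicity rather than in whatever units LVM actually records on disk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: whole extents that fit, plus leftover bytes. */
typedef struct {
    uint64_t extents;       /* whole extents LVM could allocate */
    uint64_t wasted_bytes;  /* trailing partial extent that stays unused */
} extent_fit;

/* Hypothetical helper: all sizes in bytes. */
static extent_fit fit_extents(uint64_t data_bytes, uint64_t extent_bytes)
{
    extent_fit r;
    assert(extent_bytes != 0);
    r.extents = data_bytes / extent_bytes;      /* integer division drops the partial extent */
    r.wasted_bytes = data_bytes % extent_bytes; /* always less than one extent */
    return r;
}
```

For a 1003 MB data area, 4 MB extents give 250 extents with 3 MB left over, while 16 MB extents give 62 extents with 11 MB left over, which shows how a larger extent size can raise the worst-case waste.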

Bringing up the rear is an area for bad block relocation, known as the bad block pool, and a small optional structure for cluster-locking information, used if the volume group is to operate in an HP Serviceguard cluster environment. LVM bad block relocation may be disabled during configuration. Because most modern disk drive controllers relocate bad blocks themselves, many administrators choose to disable bad block relocation within LVM.

The PVRA and VGRA

The kernel defines the locations of several key structures on an LVM physical volume (see Table 11-1). Disk drives are block-oriented devices, so their data must be accessed a sector at a time. The sector size is either 1 KB or, on newer drives, 2 KB. The primary LVM record is located at sector 8. The primary bad block directory is at sector 9 on 1 KB/sector drives and at sector 10 on 2 KB/sector drives. The secondary LVM record starts at sector 72. The secondary bad block directory is at sector 73 on 1 KB/sector drives and at sector 74 on 2 KB/sector drives. The PVRA as a whole occupies 128 sectors, and a bad block directory occupies 55 sectors.

Table 11-1. Kernel Parameters for LVM Disk-Based Structures

Structure                            Kernel #define      Sector # (or size in sectors)
Primary LVM record                   PVRA_LVM_REC_SN1    8
Primary bad block directory          PVRA_BBDIR_SN1      9 (or 10)
Secondary LVM record                 PVRA_LVM_REC_SN2    72
Secondary bad block directory        PVRA_BBDIR_SN2      73 (or 74)
Primary boot data record             BDRA_BDR_SN1        128
Secondary boot data record           BDRA_BDR_SN2        136
Overall size of the BDRA             BDRA_SIZE           16
Length of the boot disk record       BDRA_BDR_LENGTH     2
Length of the physical volume list   BDRA_PVL_LENGTH     6
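The values in Table 11-1 can be cross-checked with a little arithmetic. The C sketch below copies the table's #define names with their 1 KB/sector values, converts sector numbers to byte offsets with a hypothetical `sector_to_byte` helper, and uses compile-time assertions for relationships that follow from the numbers. The macro names PVRA_SECTORS and BBDIR_SECTORS are invented here for the sizes quoted in the text, and the last two checks assume, as the numbers suggest but the text does not state, that each BDRA copy is a boot data record immediately followed by its physical volume list.

```c
#include <assert.h>
#include <stdint.h>

/* Values from Table 11-1 for 1 KB/sector drives.
 * (On 2 KB/sector drives the bad block directories start at 10 and 74.) */
#define PVRA_LVM_REC_SN1  8
#define PVRA_BBDIR_SN1    9
#define PVRA_LVM_REC_SN2  72
#define PVRA_BBDIR_SN2    73
#define BDRA_BDR_SN1      128
#define BDRA_BDR_SN2      136
#define BDRA_SIZE         16
#define BDRA_BDR_LENGTH   2
#define BDRA_PVL_LENGTH   6

/* Sizes quoted in the text; these two macro names are hypothetical. */
#define PVRA_SECTORS      128
#define BBDIR_SECTORS     55

/* Hypothetical helper: byte offset of a sector for a given sector size. */
static uint64_t sector_to_byte(uint64_t sector, uint64_t sector_bytes)
{
    return sector * sector_bytes;
}

/* The secondary bad block directory (sectors 73..127) ends exactly
 * where the 128-sector PVRA ends. */
_Static_assert(PVRA_BBDIR_SN2 + BBDIR_SECTORS == PVRA_SECTORS,
               "secondary BBDIR fills the rest of the PVRA");

/* The BDRA starts right after the PVRA. */
_Static_assert(BDRA_BDR_SN1 == PVRA_SECTORS, "BDRA follows the PVRA");

/* Assumption: each BDRA copy = boot data record + PV list (2 + 6 sectors),
 * and the two copies together fill the 16-sector BDRA. */
_Static_assert(BDRA_BDR_SN2 - BDRA_BDR_SN1 == BDRA_BDR_LENGTH + BDRA_PVL_LENGTH,
               "second copy starts after the first");
_Static_assert(2 * (BDRA_BDR_LENGTH + BDRA_PVL_LENGTH) == BDRA_SIZE,
               "two copies fill the BDRA");
```

With 1 KB sectors, the primary LVM record begins at byte 8192, immediately after the 8 KB LIF area, and the BDRA begins at byte 131072, right where the 128-sector PVRA ends.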


Now let's examine the disk-resident structures in greater detail (Listings 11.1 and 11.2), using our friend q4 to display the fields of each structure. In each line of q4 output, the first and third numbers give the field's byte offset and its size in bytes. The listings have been annotated, and in some cases runs of repetitive array elements have been elided (marked by a dashed line). The PVRA begins with the lv_lvmrec structure.

Listing 11.1. q4> fields struct lv_lvmrec
The first element of this structure is the structure's magic ID, which is set to LVMREC01. It is followed by the double-word physical volume and volume group unique ID numbers.
    0 0 4 0 char[8] lvm_id
    8 0 4 0 u_int   pv_id.id1
   12 0 4 0 u_int   pv_id.id2
   16 0 4 0 u_int   vg_id.id1
   20 0 4 0 u_int   vg_id.id2

Next are the pointers to, and lengths of, the other structures on this disk.
   24 0 4 0 u_int   last_psn
   28 0 4 0 u_int   pv_num
   32 0 4 0 u_int   vgra_len
   36 0 4 0 u_int   vgra_psn
   40 0 4 0 u_int   vgda_len
   44 0 4 0 u_int   vgsa_len
   48 0 4 0 u_int   vgda_psn1
   52 0 4 0 u_int   vgda_psn2
   56 0 4 0 u_int   mcr_len
   60 0 4 0 u_int   mcr_psn1
   64 0 4 0 u_int   mcr_psn2
   68 0 4 0 u_int   data_len
   72 0 4 0 u_int   data_psn

We also see the physical extent size configured for this volume group and additional structure pointers.
   76 0 4 0 u_int   pxsize
   80 0 4 0 u_int   pxspace
   84 0 4 0 u_int   altpool_len
   88 0 4 0 u_int   altpool_psn
   92 0 4 0 u_int   maxdefects
   96 0 4 0 u_int   io_timeout
  100 0 4 0 u_int   bdra_len
  104 0 4 0 u_int   bdra_psn
  108 0 4 0 u_int   bdr_len
  112 0 4 0 u_int   bdr_psn1
  116 0 4 0 u_int   bdr_psn2
  120 0 4 0 u_int   pvl_len
  124 0 4 0 u_int   pvl_psn1
  128 0 4 0 u_int   pvl_psn2
  132 0 4 0 u_int   cl_lock_flags
  136 0 4 0 u_int   cl_lock_psn
  140 0 4 0 u_int   cluster_id
  144 0 4 0 int     conf_act_mode
  148 0 4 0 u_int   orig_pv.pv_id.id1
  152 0 4 0 u_int   orig_pv.pv_id.id2
  156 0 2 0 u_short orig_pv.pv_pxcount
  158 0 2 0 u_short orig_pv.pv_pxalloc
  160 0 1 0 u_char  orig_pv.pv_num

Following the bad block directory information is the BDRA for bootable disks.
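Returning briefly to Listing 11.1: because q4 reports each field's offset, the LVM record can be mirrored as a C structure and checked against those offsets at compile time. The sketch below is a hypothetical reconstruction, not HP-UX's own header. It maps u_int, u_short, and u_char to fixed-width types, introduces an invented `lvm_id_pair` type for the double-word IDs and an invented `lvmrec_magic_ok` helper, and assumes natural alignment. HP-UX runs big-endian, so integer fields read from a raw disk image on a little-endian host would presumably also need byte swapping, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type for the double-word unique IDs. */
struct lvm_id_pair { uint32_t id1, id2; };

/* Hypothetical reconstruction of lv_lvmrec from the q4 offsets. */
struct lv_lvmrec {
    char     lvm_id[8];                       /* magic "LVMREC01" */
    struct lvm_id_pair pv_id, vg_id;          /* offsets 8 and 16 */
    uint32_t last_psn, pv_num;
    uint32_t vgra_len, vgra_psn;
    uint32_t vgda_len, vgsa_len, vgda_psn1, vgda_psn2;
    uint32_t mcr_len, mcr_psn1, mcr_psn2;
    uint32_t data_len, data_psn;
    uint32_t pxsize, pxspace;                 /* offset 76: extent size */
    uint32_t altpool_len, altpool_psn;
    uint32_t maxdefects, io_timeout;
    uint32_t bdra_len, bdra_psn;
    uint32_t bdr_len, bdr_psn1, bdr_psn2;
    uint32_t pvl_len, pvl_psn1, pvl_psn2;
    uint32_t cl_lock_flags, cl_lock_psn, cluster_id;
    int32_t  conf_act_mode;                   /* offset 144 */
    struct {
        struct lvm_id_pair pv_id;
        uint16_t pv_pxcount, pv_pxalloc;
        uint8_t  pv_num;
    } orig_pv;                                /* offset 148 */
};

/* Spot-check the reconstruction against the offsets q4 printed. */
_Static_assert(offsetof(struct lv_lvmrec, vg_id) == 16, "vg_id");
_Static_assert(offsetof(struct lv_lvmrec, data_psn) == 72, "data_psn");
_Static_assert(offsetof(struct lv_lvmrec, pxsize) == 76, "pxsize");
_Static_assert(offsetof(struct lv_lvmrec, conf_act_mode) == 144, "conf_act_mode");
_Static_assert(offsetof(struct lv_lvmrec, orig_pv.pv_num) == 160, "orig_pv.pv_num");

/* Hypothetical helper: the magic check is byte-wise, so byte order is irrelevant. */
static int lvmrec_magic_ok(const struct lv_lvmrec *r)
{
    return memcmp(r->lvm_id, "LVMREC01", sizeof r->lvm_id) == 0;
}
```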

Listing 11.2. q4> fields struct lv_bootdata
Again we begin with a magic ID (HPLVMBDR), a timestamp, and a version number.
    0 0 8 0 char[8] bd_magic
    8 0 4 0 u_int   bd_timestamp
   12 0 2 0 short   bd_version

Next, the root volume group's boot, dump, and swap volumes are identified (note that some of these fields are not currently used but are in place for future enhancements).
   14 0 2 0 short   bd_numrootPVs
   16 0 2 0 short   bd_numswapPVs
   18 0 2 0 short   bd_numdumpPVs
   20 0 4 0 u_int   bd_rootVGID.id1
   24 0 4 0 u_int   bd_rootVGID.id2
   28 0 4 0 u_int   bd_swapVGID.id1
   32 0 4 0 u_int   bd_swapVGID.id2
   36 0 4 0 u_int   bd_dumpVGID.id1
   40 0 4 0 u_int   bd_dumpVGID.id2
   44 0 4 0 int     bd_rootvg
   48 0 4 0 int     bd_swapvg
   52 0 4 0 int     bd_dumpvg
   56 0 4 0 int     bd_rootlv[0]
   --------------------------------
  180 0 4 0 int     bd_rootlv[31]
  184 0 4 0 int     bd_swaplv[0]
   --------------------------------
  308 0 4 0 int     bd_swaplv[31]
  312 0 4 0 int     bd_dumplv[0]
   --------------------------------
  436 0 4 0 int     bd_dumplv[31]
  440 0 4 0 int     bd_rootPVs
  444 0 4 0 int     bd_swapPVs
  448 0 4 0 int     bd_dumpPVs
  452 0 4 0 u_int   bd_rootPVsize
  456 0 4 0 u_int   bd_swapPVsize
  460 0 4 0 u_int   bd_dumpPVsize
  464 0 4 0 int     bd_rootPVcksum
  468 0 4 0 int     bd_swapPVcksum
  472 0 4 0 int     bd_dumpPVcksum
  476 0 2 0 short   bd_boot[0]
   --------------------------------
  482 0 2 0 short   bd_boot[3]
  484 0 2 0 short   bd_rootdisks[0]
   --------------------------------
  546 0 2 0 short   bd_rootdisks[31]
  548 0 2 0 short   bd_swapdisks[0]
   --------------------------------
  610 0 2 0 short   bd_swapdisks[31]
  612 0 2 0 short   bd_dumpdisks[0]
   --------------------------------
  674 0 2 0 short   bd_dumpdisks[31]
  676 0 2 0 short   bd_rootmaint[0]
   --------------------------------
  738 0 2 0 short   bd_rootmaint[31]
  740 0 2 0 short   bd_swapmaint[0]
   --------------------------------
  802 0 2 0 short   bd_swapmaint[31]
  804 0 2 0 short   bd_dumpmaint[0]
   --------------------------------
  866 0 2 0 short   bd_dumpmaint[31]
  868 0 4 0 int     bd_flags
  872 0 4 0 int     bd_reserved[0]
   --------------------------------
 2040 0 4 0 int     bd_reserved[292]
 2044 0 4 0 int     bd_checksum
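The same offset-checking approach applies to Listing 11.2. In this hypothetical reconstruction (again substituting fixed-width types for short, int, and u_int, with an invented `lvm_id_pair` type), the 32-entry and 4-entry array sizes and the 293-entry bd_reserved array are inferred from the first and last indexes shown in the listing. The compile-time checks confirm the listed offsets and show that the record occupies exactly 2048 bytes, which matches the two-sector BDRA_BDR_LENGTH in Table 11-1 if that length is counted in 1 KB sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for the double-word volume group IDs. */
struct lvm_id_pair { uint32_t id1, id2; };

/* Hypothetical reconstruction of lv_bootdata from the q4 offsets. */
struct lv_bootdata {
    char     bd_magic[8];                                    /* "HPLVMBDR" */
    uint32_t bd_timestamp;
    int16_t  bd_version;
    int16_t  bd_numrootPVs, bd_numswapPVs, bd_numdumpPVs;
    struct lvm_id_pair bd_rootVGID, bd_swapVGID, bd_dumpVGID; /* offset 20 */
    int32_t  bd_rootvg, bd_swapvg, bd_dumpvg;
    int32_t  bd_rootlv[32], bd_swaplv[32], bd_dumplv[32];    /* offset 56 */
    int32_t  bd_rootPVs, bd_swapPVs, bd_dumpPVs;
    uint32_t bd_rootPVsize, bd_swapPVsize, bd_dumpPVsize;
    int32_t  bd_rootPVcksum, bd_swapPVcksum, bd_dumpPVcksum;
    int16_t  bd_boot[4];                                     /* offset 476 */
    int16_t  bd_rootdisks[32], bd_swapdisks[32], bd_dumpdisks[32];
    int16_t  bd_rootmaint[32], bd_swapmaint[32], bd_dumpmaint[32];
    int32_t  bd_flags;                                       /* offset 868 */
    int32_t  bd_reserved[293];                               /* [0] .. [292] */
    int32_t  bd_checksum;                                    /* offset 2044 */
};

_Static_assert(offsetof(struct lv_bootdata, bd_rootlv) == 56, "bd_rootlv");
_Static_assert(offsetof(struct lv_bootdata, bd_boot) == 476, "bd_boot");
_Static_assert(offsetof(struct lv_bootdata, bd_flags) == 868, "bd_flags");
_Static_assert(offsetof(struct lv_bootdata, bd_checksum) == 2044, "bd_checksum");
/* The whole record is 2048 bytes, i.e. two 1 KB sectors. */
_Static_assert(sizeof(struct lv_bootdata) == 2 * 1024, "two 1 KB sectors");
```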

Let's switch our focus to the VGRA and its components. The first part of the VGRA is the VGDA, which includes four main structures: VG_header, lvol[], pvol[], and VG_trailer. The configurable maximum numbers of logical volumes and physical volumes per volume group determine the sizes of the lvol[] and pvol[] arrays, respectively. These volume group tunables, max_lv and max_pv, may be set with the vgcreate command. Let's take a look at Listings 11.3 and 11.4.

Listing 11.3. q4> fields struct VG_header
First are the timestamp and the volume group identifier.
    0 0 4 0 int     vg_timestamp.tv_sec
    4 0 4 0 int     vg_timestamp.tv_usec
    8 0 4 0 u_int   vg_id.id1
   12 0 4 0 u_int   vg_id.id2

Next are the maximum number of logical volumes and physical volumes for the volume group.
   16 0 2 0 u_short maxlvs
   18 0 2 0 u_short numpvs

The maximum number of physical extents for the volume group and its status flags follow.
   20 0 2 0 u_short maxpxs
   22 0 2 0 u_short flags
   24 0 4 0 u_int   reserved2
   28 0 4 0 u_int   reserved3

Listing 11.4. q4> fields struct LV_entry
We start with the maximum size of the logical volume and its state flags.
    0 0 2 0 u_short maxlxs
    2 0 2 0 u_short lv_flags

   LVM_LVDEFINED      logical volume entry defined
   LVM_DISABLED       lvol unavailable
   LVM_RDONLY         lvol read-only
   LVM_NORELOC        bad blocks not relocated
   LVM_VERIFY         all writes to be verified
   LVM_STRICT         allocate mirrors on distinct pvols
   LVM_NOMWC          no mirror consistency checks for this lvol
   LVM_PVG_STRICT     allocate mirrors from distinct PVGs
   LVM_CONSISTENCY    mirror consistency recovery required
   LVM_CLEAN          lvol has no pending writes
   LVM_CONTIGUOUS     allocate contiguous physical extents for this lvol


Next is the configured scheduling strategy (sequential or parallel).
    4 0 1 0 u_char  sched_strat

The number of mirrors, the number of stripes, and the stripe size are recorded in the next three fields.
    5 0 1 0 u_char  maxmirrors
    6 0 2 0 u_short stripes
    8 0 2 0 u_short stripe_size
   10 0 2 0 u_short reserved2

The timeout is the number of seconds allowed before a scheduled LV I/O fails.
   12 0 4 0 u_int   lv_io_timeout

Each pvol[] entry consists of a PV_header and a PX_entry[] array. The PX_entry[] array is sized according to the maximum number of extents allowed per physical volume (max_pe, which can be set with the vgcreate command). This array has the final word on which logical extent is mapped to which physical extent: the index into PX_entry[] is the physical extent number, and the entry's data identifies the logical volume and logical extent to which that physical extent is mapped. See Listings 11.5, 11.6, and 11.7.
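Because the index into PX_entry[] is the physical extent number, going from a physical extent to its logical owner is a direct array lookup, while going the other way requires a search. The C sketch below shows both directions over one physical volume's map. `px_lookup` and `find_physical_extent` are hypothetical helpers, and the sketch does not model how unused extents are marked or how mirror copies on other physical volumes are located.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PX_entry as shown in Listing 11.6: one entry per physical extent. */
struct PX_entry {
    uint16_t lv_index;  /* logical volume the extent is mapped to */
    uint16_t lx_num;    /* logical extent number within that volume */
};

/* Forward direction: the array index is the physical extent number. */
static struct PX_entry px_lookup(const struct PX_entry *map, size_t px_count, size_t px)
{
    assert(px < px_count);
    return map[px];
}

/* Reverse direction (hypothetical helper): scan for the matching pair.
 * Returns the physical extent number, or -1 if this pvol has no match. */
static long find_physical_extent(const struct PX_entry *map, size_t px_count,
                                 uint16_t lv, uint16_t lx)
{
    for (size_t px = 0; px < px_count; px++)
        if (map[px].lv_index == lv && map[px].lx_num == lx)
            return (long)px;
    return -1;
}
```

For a four-extent map of { {1,0}, {2,0}, {1,1}, {2,1} }, physical extent 3 belongs to logical extent 1 of volume 2, and logical extent 1 of volume 1 lives in physical extent 2.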

Listing 11.5. q4> fields struct PV_header
The physical volume identifier, an extent count, and the pvol flags.
    0 0 4 0 u_int   pv_id.id1
    4 0 4 0 u_int   pv_id.id2
    8 0 2 0 u_short px_count
   10 0 2 0 u_short pv_flags

   LVM_PVDEFINED       this entry is used
   LVM_PVNOALLOC       no extent allocation is allowed
   LVM_NOVGDA          pvol does not contain a VGDA
   LVM_PVRORELOC       no new defects relocated
   LVM_PVMISSING       pvol is missing
   LVM_NOATTACHED      pvol not attached
   LVM_PVPOWERFAIL     pvol is power-failing
   LVM_PVNEEDSYNC      pvol needs re-sync
   LVM_PVALTLINK       pvol not the primary link
   LVM_PVINUSE         pvol is being configured
   LVM_PVCFGRSTORD     pvol had config data restored
   LVM_PVSWITCHLINK    pvol path requires switch
   LVM_PVMOSWBACK      don't switch links back
   LVM_PVSPARD         pvol is a spare
   LVM_PVDATA_SPARED   pvol failed, data has been spared


The number of entries in the pvol extent map:
   12 0 2 0 u_short pv_msize

The maximum number of defects that may be relocated:
   14 0 2 0 u_short pv_defectlim

Listing 11.6. q4> fields struct PX_entry
The physical extent table entries map to a logical volume and a logical extent number.
    0 0 2 0 u_short lv_index
    2 0 2 0 u_short lx_num

Listing 11.7. q4> fields struct VG_trailer
The trailer structure rounds out the VGDA.
    0 0 4 0 int     vg_timestamp.tv_sec
    4 0 4 0 int     vg_timestamp.tv_usec
    8 0 4 0 u_int   reserved1
   12 0 4 0 u_int   reserved2
   16 0 4 0 u_int   reserved3
   20 0 4 0 u_int   vgda_cksum
   24 0 8 0 char[8] vgda_magic
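With all the VGDA components listed, the statement that max_lv, max_pv, and max_pe size the arrays can be turned into a rough size estimate. The C code below defines minimal stand-ins for the structures in Listings 11.3 through 11.7 (field names simplified, types chosen so the sizes match the q4 offsets) and adds them up. `vgda_bytes_estimate` is a hypothetical name, and the estimate ignores any padding, sector rounding, or additional data LVM may place in the VGDA on disk.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-ins for Listings 11.3 to 11.7; sizes match the q4 offsets. */
struct VG_header {                 /* 32 bytes */
    int32_t  ts_sec, ts_usec;      /* vg_timestamp */
    uint32_t vg_id1, vg_id2;
    uint16_t maxlvs, numpvs, maxpxs, flags;
    uint32_t reserved2, reserved3;
};
struct LV_entry {                  /* 16 bytes */
    uint16_t maxlxs, lv_flags;
    uint8_t  sched_strat, maxmirrors;
    uint16_t stripes, stripe_size, reserved2;
    uint32_t lv_io_timeout;
};
struct PV_header {                 /* 16 bytes */
    uint32_t pv_id1, pv_id2;
    uint16_t px_count, pv_flags, pv_msize, pv_defectlim;
};
struct PX_entry { uint16_t lv_index, lx_num; };   /* 4 bytes */
struct VG_trailer {                /* 32 bytes */
    int32_t  ts_sec, ts_usec;      /* vg_timestamp */
    uint32_t reserved1, reserved2, reserved3, vgda_cksum;
    char     vgda_magic[8];
};

/* Hypothetical estimate: header + lvol[max_lv] + pvol[max_pv] + trailer,
 * where each pvol[] entry is a PV_header plus PX_entry[max_pe]. */
static uint64_t vgda_bytes_estimate(uint32_t max_lv, uint32_t max_pv, uint32_t max_pe)
{
    uint64_t pvol_entry = sizeof(struct PV_header)
                        + (uint64_t)max_pe * sizeof(struct PX_entry);
    return sizeof(struct VG_header)
         + (uint64_t)max_lv * sizeof(struct LV_entry)
         + (uint64_t)max_pv * pvol_entry
         + sizeof(struct VG_trailer);
}
```

With max_lv = 255, max_pv = 16, and max_pe = 1016, the estimate comes to 69,424 bytes, of which the pvol[] array accounts for 65,280, so max_pv and max_pe dominate the size of the VGDA.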

You may be wondering why there are duplicate copies of so many of the disk-resident data structures. When the LVM pseudo-driver updates a disk-based structure, it writes only one of the copies on each physical volume (except in a volume group with a single physical volume, where both copies are updated). When the LVM driver needs to read the data, it chooses the newest copy. As an additional sanity check, the timestamps in the structure's header and trailer are compared. If they match, we can assume that the last write to the disk was successful; if they don't, we try another copy. Remember also that, for redundancy, every disk in the volume group holds the same extent-mapping information.
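The copy-selection rule can be sketched in C as follows. This is a simplification under stated assumptions: `metadata_copy`, `lvm_timestamp`, and `pick_copy` are hypothetical names, each copy is reduced to its header and trailer timestamps, and the newest consistent copy wins in a single pass. The text does not say in what order the LVM driver actually applies these checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one on-disk copy: header and trailer timestamps. */
struct lvm_timestamp { long tv_sec; long tv_usec; };

struct metadata_copy {
    struct lvm_timestamp header_ts;
    struct lvm_timestamp trailer_ts;
};

static int ts_equal(struct lvm_timestamp a, struct lvm_timestamp b)
{
    return a.tv_sec == b.tv_sec && a.tv_usec == b.tv_usec;
}

static int ts_newer(struct lvm_timestamp a, struct lvm_timestamp b)
{
    return a.tv_sec > b.tv_sec || (a.tv_sec == b.tv_sec && a.tv_usec > b.tv_usec);
}

/* Return the index of the newest copy whose header and trailer timestamps
 * match (the last write completed), or -1 if no copy is consistent. */
static int pick_copy(const struct metadata_copy *copies, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!ts_equal(copies[i].header_ts, copies[i].trailer_ts))
            continue;                       /* torn write: try another copy */
        if (best < 0 || ts_newer(copies[i].header_ts, copies[best].header_ts))
            best = (int)i;
    }
    return best;
}
```

Given an older consistent copy, a newer copy whose trailer timestamp does not match its header, and a third consistent copy newer than the first, `pick_copy` returns the third; the torn copy is never chosen, even though its header timestamp is the newest.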

Following the VGDA is the VGSA, which consists of an SA_header and a trailer (Listing 11.8).

Listing 11.8. q4> fields struct SA_header
The first two fields hold the timestamp.
    0 0 4 0 int     sa_h_timestamp.tv_sec
    4 0 4 0 int     sa_h_timestamp.tv_usec

Next are the maximum number of physical extents per physical volume and the maximum number of physical volumes for the volume group.
    8 0 2 0 u_short sa_maxpxs
   10 0 2 0 u_short sa_maxpvs
   12 0 4 0 u_int   reserved1

The final component of the structure is a checksum.
   16 0 4 0 u_int   sa_checksum

The corresponding trailer is 16 bytes long and consists of the VGSA magic number ("VGSA0001") and a timestamp.

The MCR completes the primary structures in the VGRA. It holds the on-disk copy of the MWC data, the mirror consistency cache information, in the mwc_entry structure (Listing 11.9). We discuss how these records are used later in this chapter.

Listing 11.9. q4> fields struct mwc_entry
Timestamps surround 126 sets of logical volume number, track group shift, and logical track group number.
    0 0    4 0 int        b_tmstamp.tv_sec
    4 0    4 0 int        b_tmstamp.tv_usec
    8 0    2 0 u_short    ca_p1[0].lv_number
   10 0    2 0 u_short    ca_p1[0].ltgshift
   12 0    4 0 u_int      ca_p1[0].lv_ltg
   ---------------------------------------------
 1008 0    2 0 u_short    ca_p1[125].lv_number
 1010 0    2 0 u_short    ca_p1[125].ltgshift
 1012 0    4 0 u_int      ca_p1[125].lv_ltg
 1016 0    4 0 int        e_tmstamp.tv_sec
 1020 0    4 0 int        e_tmstamp.tv_usec

Finally, we have a 1024-character pad.
 1024 0 1024 0 char[1024] pad
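As with the earlier listings, the offsets in Listing 11.9 can be confirmed with a hypothetical C reconstruction. The listing does not name the element type of ca_p1[] or the timestamp type, so `mwc_cache_entry` and `mwc_ts` are invented names. The checks show that 126 eight-byte records plus the two 8-byte timestamps fill exactly 1024 bytes, and the pad brings the record to 2048 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mwc_ts { int32_t tv_sec, tv_usec; };   /* hypothetical name */

/* Hypothetical name for the ca_p1[] element type. */
struct mwc_cache_entry {
    uint16_t lv_number;  /* logical volume number */
    uint16_t ltgshift;   /* track group shift */
    uint32_t lv_ltg;     /* logical track group number */
};

struct mwc_entry {
    struct mwc_ts          b_tmstamp;    /* leading timestamp */
    struct mwc_cache_entry ca_p1[126];   /* 126 cache records */
    struct mwc_ts          e_tmstamp;    /* trailing timestamp */
    char                   pad[1024];
};

_Static_assert(offsetof(struct mwc_entry, e_tmstamp) == 1016, "e_tmstamp offset");
_Static_assert(offsetof(struct mwc_entry, pad) == 1024, "timestamps plus records fill 1 KB");
_Static_assert(sizeof(struct mwc_entry) == 2048, "record with pad is 2 KB");
```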



HP-UX 11i Internals
ISBN: 0130328618
Year: 2006
Pages: 167
