The additional administrative features of VxFS that we need to look at include the following:

- Upgrading an older VxFS filesystem.
- Converting an existing HFS filesystem to VxFS.
- Resizing a filesystem online.
- Defragmenting a filesystem online.
- Logging levels used by the intent log.
- Setting extent attributes for individual files.
- Tuning a VxFS filesystem.
- Additional mount options to affect IO performance.
- VxFS snapshots.

The additional features will require us to purchase a license for the Online JFS product. If we purchase a Mission Critical HP-UX 11i Operating Environment, the license to use Online JFS is included.

8.6.1 Upgrading an older VxFS filesystem

Some of the features we have looked at (ACLs being a case in point) are supported only with the most recent version of the VxFS filesystem layout. If we have upgraded an older operating system, we may have filesystems using an older version. To upgrade to the most recent layout version of VxFS, we simply use the vxupgrade command. This is performed on a mounted filesystem and hence requires a license for the OnlineJFS product.

root@hpeos003[] vxupgrade /applicX
/applicX: vxfs file system version 3 layout
root@hpeos003[]
root@hpeos003[] vxupgrade -n 4 /applicX
root@hpeos003[]
root@hpeos003[] vxupgrade /applicX
/applicX: vxfs file system version 4 layout
root@hpeos003[]

8.6.2 Converting an existing HFS filesystem to VxFS

This process allows us to take an existing HFS filesystem and convert it to VxFS. The reasons for doing this are numerous. In the previous section, we saw some of the options we can use to improve the IO performance of an HFS filesystem. To some people, this is nothing compared to the benefits that VxFS can offer in terms of High Availability and Performance. In order to convert an HFS filesystem to VxFS, approximately 10-15 percent of free space must be available in the filesystem:

root@hpeos003[] fstyp /dev/vg00/library
hfs
root@hpeos003[] bdf /library
Filesystem          kbytes    used   avail %used Mounted on
/dev/vg00/library   103637   26001   67272   28% /library
root@hpeos003[]

If we think about it, this is not unreasonable, because the vxfsconvert command needs space in the filesystem in order to rewrite all the underlying HFS filesystem structures in VxFS format. This includes structures such as ACLs.

root@hpeos003[] lsacl -l /library/data.finance
/library/data.finance:
rwx root.%
rwx fred.%
--x barney.%
r-x %.sys
r-x %.%
root@hpeos003[]

Before we use the vxfsconvert command, we should make a full backup of the filesystem. If the system should crash midway through the conversion, the mid-conversion filesystem is neither HFS nor VxFS and is completely unusable. When we are ready to start the conversion, we must make sure that the filesystem is unmounted:

root@hpeos003[] umount /library
root@hpeos003[] /sbin/fs/vxfs/vxfsconvert /dev/vg00/library
vxfs vxfsconvert: Do you wish to commit to conversion? (ynq) y
vxfs vxfsconvert: CONVERSION WAS SUCCESSFUL
root@hpeos003[]

Before we mount the filesystem, we must run a full fsck. This will update all data structures in the new VxFS filesystem superblock.

root@hpeos003[] fsck -F vxfs -y -o full,nolog /dev/vg00/rlibrary
pass0 - checking structural files
pass1 - checking inode sanity and blocks
pass2 - checking directory linkage
pass3 - checking reference counts
pass4 - checking resource maps
fileset 1 au 0 imap incorrect - fix (ynq)y
fileset 999 au 0 imap incorrect - fix (ynq)y
no CUT entry for fileset 1, fix? (ynq)y
no CUT entry for fileset 999, fix? (ynq)y
au 0 emap incorrect - fix? (ynq)y
au 0 summary incorrect - fix? (ynq)y
au 1 emap incorrect - fix? (ynq)y
au 1 summary incorrect - fix? (ynq)y
au 2 emap incorrect - fix? (ynq)y
au 2 summary incorrect - fix? (ynq)y
au 3 emap incorrect - fix? (ynq)y
au 3 summary incorrect - fix? (ynq)y
fileset 1 iau 0 summary incorrect - fix? (ynq)y
fileset 999 iau 0 summary incorrect - fix? (ynq)y
free block count incorrect 0 expected 79381 fix? (ynq)y
free extent vector incorrect fix? (ynq)y
OK to clear log? (ynq)y
set state to CLEAN? (ynq)y
root@hpeos003[]
root@hpeos003[] fstyp -v /dev/vg00/library
vxfs
version: 4
f_bsize: 8192
f_frsize: 1024
f_blocks: 106496
f_bfree: 79381
f_bavail: 74420
f_files: 19876
f_ffree: 19844
f_favail: 19844
f_fsid: 1073741836
f_basetype: vxfs
f_namemax: 254
f_magic: a501fcf5
f_featurebits: 0
f_flag: 0
f_fsindex: 7
f_size: 106496
root@hpeos003[]

We can now attempt to mount and use the filesystem.

root@hpeos003[] mount /dev/vg00/library /library
root@hpeos003[] ll /library/
total 51976
-rwxrwxr-x+  1 root  sys   26590752 Nov 14 01:03 data.finance
drwxr-xr-x   2 root  root     12288 Nov 14 01:02 lost+found
root@hpeos003[] getacl /library/data.finance
# file: /library/data.finance
# owner: root
# group: sys
user::rwx
user:fred:rwx
user:barney:--x
group::r-x
class:rwx
other:r-x
root@hpeos003[]

The last job is to ensure that we update the /etc/fstab file to reflect the new filesystem type.

8.6.3 Online resizing of a filesystem

Online resizing includes increasing as well as decreasing filesystem sizes. With layout version 4, the chances of successfully reducing the size have improved dramatically because of the way the filesystem uses Allocation Units. When we try to reduce the filesystem size, an attempt will be made to move data blocks if it is found that the current data blocks would stop the resizing process from completing. Increasing the size of the filesystem is ultimately easier because we do not need to move data blocks; we are adding space instead of reducing it. The thing to remember is the order of performing tasks. If the filesystem in question was contained within a VxVM volume, we could have performed these tasks in one step using the vxresize command. This command increases/decreases the size of the volume and the filesystem if both are using Veritas products.
If we have a license for the OnlineJFS product, this can be accomplished while the filesystem is mounted:

root@hpeos003[] bdf /logdata
Filesystem               kbytes     used     avail %used Mounted on
/dev/vx/dsk/ora1/logvol  31457280   393648   30578268   1% /logdata
root@hpeos003[] /etc/vx/bin/vxresize -g ora1 logvol 2G
root@hpeos003[] bdf /logdata
Filesystem               kbytes     used     avail %used Mounted on
/dev/vx/dsk/ora1/logvol  2097152    391852   1678660   19% /logdata
root@hpeos003[] vxprint -g ora1 logvol
TY NAME       ASSOC      KSTATE   LENGTH   PLOFFS  STATE   TUTIL0  PUTIL0
v  logvol     fsgen      ENABLED  2097152  -       ACTIVE  -       -
pl logvol-01  logvol     ENABLED  2097216  -       ACTIVE  -       -
sd oralog01   logvol-01  ENABLED  699072   0       -       -       -
sd oralog02   logvol-01  ENABLED  699072   0       -       -       -
sd oralog03   logvol-01  ENABLED  699072   0       -       -       -
root@hpeos003[]

8.6.4 Online de-fragmentation of a filesystem

In the lifetime of a filesystem, many files can be created, deleted, increased in size, and decreased in size. While the filesystem will try to maintain the best allocation policy for blocks and extents, it is quite possible that, over time, the blocks for files will become misaligned. When trying to access files (especially sequentially), it is best if we can align consecutive filesystem blocks. This also applies to directories; it's best if directory entries are ordered such that searches are most efficient. Both of these situations can be rectified with the fsadm command. The -e option defragments extents, while the -d option reorders directory entries. Here's a summary of the tasks each can perform:

The respective uppercase options (-E and -D) can be used to provide a report of the number of extents and directories that need to be defragmented. This process can produce a significant amount of IO in the filesystem. As a result, we should consider running this command during a quiescent time.
A good time to run the command is before a system backup because it will produce a defragmented filesystem, improving IO for online use, as well as improving the IO performance for the impending backup.

root@hpeos003[] fsadm -F vxfs -de -DE /logdata

  Directory Fragmentation Report
             Dirs        Total      Immed    Immeds   Dirs to   Blocks to
             Searched    Blocks     Dirs     to Add   Reduce    Reduce
  total           2          1         1         0         0           0

  Directory Fragmentation Report
             Dirs        Total      Immed    Immeds   Dirs to   Blocks to
             Searched    Blocks     Dirs     to Add   Reduce    Reduce
  total           2          1         1         0         0           0

  Extent Fragmentation Report
        Total    Average      Average     Total
        Files    File Blks    # Extents   Free Blks
           10       9738            3      426325
    blocks used for indirects: 2
    % Free blocks in extents smaller than 64 blks: 0.02
    % Free blocks in extents smaller than 8 blks: 0.00
    % blks allocated to extents 64 blks or larger: 99.97
    Free Extents By Size
             1: 5                2: 2                4: 1
             8: 1               16: 2               32: 1
            64: 2              128: 3              256: 1
           512: 1             1024: 1             2048: 1
          4096: 1             8192: 1            16384: 1
         32768: 2            65536: 1           131072: 2
        262144: 0           524288: 0          1048576: 0
       2097152: 0          4194304: 0          8388608: 0
      16777216: 0         33554432: 0         67108864: 0
     134217728: 0        268435456: 0        536870912: 0
    1073741824: 0       2147483648: 0

  Extent Fragmentation Report
        Total    Average      Average     Total
        Files    File Blks    # Extents   Free Blks
           10       9738            3      426325
    blocks used for indirects: 2
    % Free blocks in extents smaller than 64 blks: 0.02
    % Free blocks in extents smaller than 8 blks: 0.00
    % blks allocated to extents 64 blks or larger: 99.97
    Free Extents By Size
             1: 5                2: 2                4: 1
             8: 1               16: 2               32: 1
            64: 2              128: 3              256: 1
           512: 1             1024: 1             2048: 1
          4096: 1             8192: 1            16384: 1
         32768: 2            65536: 1           131072: 2
        262144: 0           524288: 0          1048576: 0
       2097152: 0          4194304: 0          8388608: 0
      16777216: 0         33554432: 0         67108864: 0
     134217728: 0        268435456: 0        536870912: 0
    1073741824: 0       2147483648: 0
root@hpeos003[]

8.6.5 Logging levels used by the intent log

The Intent Log will record transactions that are in flight during updates to filesystem structures such as inodes and the superblock. Transactions are recorded in the Intent Log before the update occurs.
Updates that are marked as COMPLETE have actually been written to disk. Should a transaction be incomplete at the time of a system crash, the fsck command will perform a log replay to ensure that the filesystem is up to date and complete.

An issue we should deal with here is the size of the Intent Log. By default, the Intent Log is 1MB (1024 blocks) when the block size is 1KB. The filesystem block size will determine the size of the Intent Log. The default sizes have been seen to be sufficient in most situations. If the application using the filesystem makes many updates to filesystem structures such as inodes, it may be worth considering increasing the size of the Intent Log. A filesystem that is NFS-exported will make significant use of filesystem structures. An NFS-exported filesystem will perform better if it has a bigger Intent Log than the default. If we have outstanding updates to make (in the Intent Log), subsequent updates may be blocked until a free slot is available in the Intent Log. The maximum size of the Intent Log is 16MB. Changing the size of the Intent Log will require you to rebuild the filesystem, i.e., run mkfs/newfs. This is obviously destructive and should be undertaken only after you have performed a full backup of the filesystem.

After we have decided on the size of our Intent Log, we should decide when the filesystem will record transactions in the Intent Log. In total, we have six options that affect the operation of the Intent Log via the mount command. The first four affect how much logging will occur via the Intent Log, while the last two options affect how/if data blocks are managed via the mount command.

- nolog: With this option, no attempt is made to maintain consistency in the filesystem. After a reboot, the filesystem will have to be recreated with the newfs command. As such, this option should be used only for filesystems used for completely transient information.
- tmplog: This option maintains a minimal level of consistency in the filesystem only to the extent that transactions are recorded but not necessarily written to disk until the filesystem is unmounted. At that time, the files with current transactions still in the filesystem will be updated. These updates may not reflect the true state of the filesystem. Like nolog, this option should be used only for filesystems that contain purely temporary information. In layout version 4, nolog is equivalent to tmplog.
- delaylog: This option behaves in a very similar fashion to how an HFS filesystem works in that updates to open files will be recorded in the buffer cache but flushed to disk only every few seconds when the sync daemon (syncer) runs. When a file is removed, renamed, or closed, outstanding operations are guaranteed to be written to disk.
- log: This is the default option. Every update to the filesystem is written to the Intent Log before control is returned to the calling function. This maintains complete integrity in the filesystem while sacrificing performance.
- blkclear: This mode is used where data security is of paramount importance. Increased security is provided by clearing filesystem extents before they are allocated to a file. In this way, there is no way that old data can ever appear in a file inadvertently. This requires a synchronous write in order to zero the necessary filesystem blocks. The increased security is at the cost of performance.
- datainlog | nodatainlog: A VxFS filesystem has the ability to store synchronous inode and data transactions in the intent log (datainlog). This would require one less write to the filesystem. This can be bad for the integrity of the data. The premise of datainlog is that most disks perform bad block re-vectoring; in other words, if we get a bad spot on the disk, the disk will re-vector a sector to somewhere else on the disk. With datainlog (and bad block re-vectoring), a synchronous write will store both the data and the inode update in the Intent Log. This requires one less IO. However, if the disk does fail, the application has been told that the synchronous data write was completed, when in fact it wasn't. This is dangerous and should be avoided in my opinion. The nodatainlog option is the default for good reason!

It would seem that most installations are happy to use the delaylog mount option. This is an acceptable compromise between integrity and performance. The appropriate option should be included in field 4 (the mount options field) of the /etc/fstab file.

8.6.6 Setting extent attributes for individual files

As we have seen, the dynamic nature of files can lead to a less-than-optimal allocation of extents for large data files. To alleviate this, we can reserve space for individual files before applications actually use them. By using a particular allocation policy, we can construct a file using particular attributes. The allocation policy flags we use will determine how the file is constructed on disk during the initial reservation and how disk space is used in the future. To set allocation policies, we use the setext command. A common task is to reserve aligned space using a fixed extent size for application files, e.g., database files and large video capture files, before the application has actually stored any data. In this way, we can ensure that an optimal allocation policy is applied to the extents allocated now and in the future.
The size of the reserve will be the initial amount of disk space allocated to the file. The alignment will ensure that we align extents on a fixed extent size boundary relative to the beginning of a device (prior to layout version 3, the alignment was based on the allocation unit). With this, we can marry the fixed extent size to the size of the underlying disk configuration, e.g., stripe size in a stripe/RAID set (see Figure 8-5), ensuring that subsequent extents are aligned in the filesystem and sector-aligned within our stripe set. This, along with our stripe size, has been geared toward performing IO in units that are compatible with our user application.

Figure 8-5. Setext and allocation policies.

Some extent attributes are persistent and stored in the inode (fields such as alignment, fixed extent size, and initial reservation). Other allocation attributes apply only to the current attempt to reserve space for the file. A common misconception concerns the contiguous flag. This applies only to the current attempt to reserve space for the file in the filesystem. Any future extents need not be contiguous or even aligned with any disk/stripe boundaries. The flags I mention are applied using the -f <flag> option to the setext command. In fact, the only flags that are persistent and influence future extent allocation are align and noextend (no more space can be allocated for the file; it's stuck with what it currently has). If further extent allocation contravenes the allocation policy, changes to the file's allocation will fail. For preexisting files or for pre-used filesystems, there is no guarantee that setext will work.

In this example, we will try to reserve 1GB of aligned space for a database file /db/db.finance, and we will sector-align fixed extents over the 16KB stripes in our stripe set.
First, we need to establish the filesystem block size:

root@hpeos003[] bdf /db
Filesystem              kbytes     used   avail     %used Mounted on
/dev/vx/dsk/ora1/dbvol  10485760   3430   10154766    0%  /db
root@hpeos003[] echo "8192B; p S" | fsdb -F vxfs /dev/vx/dsk/ora1/dbvol
super-block at 00000004.0000
magic a501fcf5  version 4
ctime 1068808191 186592 (Fri Nov 14 11:09:51 2003 BST)
log_version 9  logstart 0  logend 0
bsize 2048  size 5242880  dsize 5242880  ninode 0  nau 0
defiextsize 0  oilbsize 0  immedlen 96  ndaddr 10
aufirst 0  emap 0  imap 0  iextop 0  istart 0
bstart 0  femap 0  fimap 0  fiextop 0  fistart 0  fbstart 0
nindir 2048  aulen 32768  auimlen 0  auemlen 4
auilen 0  aupad 0  aublocks 32768  maxtier 15
inopb 8  inopau 0  ndiripau 0  iaddrlen 4  bshift 11
inoshift 3  bmask fffff800  boffmask 7ff  checksum e55a9e6e
free 5241165  ifree 0
efree 3 1 0 1 2 1 2 1 0 0 0 1 1 1 1 3 2 2 2 2 1 1 0 0 0 0 0 0 0 0 0 0
flags 0  mod 0  clean 3c
time 1068808193 530000 (Fri Nov 14 11:09:53 2003 BST)
oltext[0] 21  oltext[1] 1802  oltsize 1
iauimlen 1  iausize 4  dinosize 256
checksum2 827  checksum3 0
root@hpeos003[]

The block size is 2KB. We can now use the setext command. setext takes multiples of filesystem blocks as arguments to its options. Here, we will attempt the extent allocation policy mentioned above. In order to achieve this, we will need to:

- Reserve 524288 filesystem blocks (524288 x 2KB = 1GB)
- Make the extents 8 blocks in size = 16KB
- Ensure that the extents are aligned

root@hpeos003[] touch /db/db.finance
root@hpeos003[]
root@hpeos003[] setext -r 524288 -e 8 -f align /db/db.finance
root@hpeos003[] getext /db/db.finance
/db/db.finance: Bsize 2048 Reserve 524288 Extent Size 8 align
root@hpeos003[]
root@hpeos003[] ll /db/db.finance
-rw-rw-r--   1 root  sys   0 Nov 14 11:14 /db/db.finance
root@hpeos003[]

As you can see, we have set the extent allocation policy for this file, even though there appears to be no disk space allocated to the file. In fact, VxFS has allocated all the extents necessary to accommodate the current allocation policy. We can confirm this by using fsdb on the inode.

root@hpeos003[] ll -i /db/db.finance
4 -rw-rw-r-- 1 root sys Nov 14 11:23 /db/db.finance
root@hpeos003[]
root@hpeos003[] echo "4i" | fsdb -F vxfs /dev/vx/dsk/ora1/dbvol
inode structure at 0x0000069f.0400
type IFREG  mode 100664  nlink 1  uid 0  gid 3  size 0
atime 1068820230 0 (Fri Nov 14 14:30:30 2003 BST)
mtime 1068820230 0 (Fri Nov 14 14:30:30 2003 BST)
ctime 1068820257 730015 (Fri Nov 14 14:30:57 2003 BST)
aflags 4  orgtype 1  eopflags 0  eopdata 0
fixextsize/fsindex 8  rdev/reserve/dotdot/matchino 524288
blocks 524288  gen 11  version 0 917  iattrino 0
de:  524288 0 0 0 0 0 0 0 0 0
des: 524288 0 0 0 0 0 0 0 0 0
ie:  0 0
ies: 0
root@hpeos003[]

It should come as no surprise that the filesystem has allocated us one huge extent. Some people would say that this is contiguous allocation. It is, but only because the underlying allocation policy of the filesystem wants to make life as easy as possible. It is a consequence of having an empty filesystem that we have coincidentally been allocated a contiguous extent. The fact that the filesystem has given us one huge extent means that addressing this amount of data requires reading only one direct extent address from the inode. Even though we specified a fixed extent size, this only means that we will be allocated a multiple of 16KB when we need additional space in the file.
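
To make "a multiple of 16KB" concrete, here is a small sketch of the rounding involved. It is plain shell arithmetic and does not touch the filesystem; round_to_extent is a hypothetical helper name, and the 10-block (20KB) request is an invented figure used only for illustration.

```shell
#!/bin/sh
# Sketch only: round a space request up to a whole number of fixed extents.
# round_to_extent is a hypothetical helper, not a VxFS command.
round_to_extent() {
    # $1 = blocks requested, $2 = fixed extent size in blocks
    echo $(( ( ($1 + $2 - 1) / $2 ) * $2 ))
}

# Assumed request: 10 blocks (20KB with a 2KB block size), extent size 8.
round_to_extent 10 8    # prints 16, i.e. two 16KB extents
```

With our 8-block extent size, any request for additional space ends up as a whole number of 16KB units, which is what keeps later allocations in step with the stripe size.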
When this one huge extent is transposed into disk allocation, we have ensured, with a fixed extent size, that multiples of our stripe size are always allocated from the filesystem. With our db.finance file, we now have space allocated from the filesystem. The actual size of the data within the file will be determined when the database is populated with data. Some administrators find it weird to have space allocated for a file that isn't displayed by an ls -l command. If we wanted to make life easy for ourselves, we could have used the -f chgsize option to the initial setext command to change the size of the file to match the current reservation. If we want to apply this idea to our existing file, we would have to delete it and reallocate the extents.

root@hpeos003[] rm /db/db.finance
root@hpeos003[] touch /db/db.finance
root@hpeos003[] setext -r 524288 -e 8 -f align -f chgsize /db/db.finance
root@hpeos003[] ll -i /db
total 4558980
4 -rw-rw-r--   1 root  sys   1073741824 Nov 14 14:42 db.finance
3 drwxr-xr-x   2 root  root          96 Nov 14 11:09 lost+found
root@hpeos003[] echo "4i" | fsdb -F vxfs /dev/vx/dsk/ora1/dbvol
inode structure at 0x0000069f.0400
type IFREG  mode 100664  nlink 1  uid 0  gid 3  size 1073741824
atime 1068820961 0 (Fri Nov 14 14:42:41 2003 BST)
mtime 1068820961 0 (Fri Nov 14 14:42:41 2003 BST)
ctime 1068820975 60022 (Fri Nov 14 14:42:55 2003 BST)
aflags 4  orgtype 1  eopflags 0  eopdata 0
fixextsize/fsindex 8  rdev/reserve/dotdot/matchino 524288
blocks 524288  gen 12  version 0 923  iattrino 0
de:  524288 0 0 0 0 0 0 0 0 0
des: 524288 0 0 0 0 0 0 0 0 0
ie:  0 0
ies: 0
root@hpeos003[]

Remember to always use setext on newly created files in an empty filesystem, in order to obtain the optimal allocation policy.
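
Because setext takes its sizes in filesystem blocks, it is easy to get the arithmetic wrong when the block size is not 1KB. The following sketch derives the -r and -e arguments used in this section from the 2KB block size, the 1GB reservation, and the 16KB stripe size. The helper name to_blocks is hypothetical, and the script only prints the resulting command so that it can be checked before being run by hand.

```shell
#!/bin/sh
# Sketch only: convert byte sizes into the filesystem-block counts that
# setext expects. to_blocks is a hypothetical helper, not a VxFS command.
to_blocks() {
    # $1 = size in bytes, $2 = filesystem block size in bytes
    echo $(( $1 / $2 ))
}

bsize=2048                                              # bsize reported by fsdb
reserve=$(to_blocks $((1024 * 1024 * 1024)) "$bsize")   # 1GB reservation
extent=$(to_blocks $((16 * 1024)) "$bsize")             # 16KB stripe size

# Print the command instead of executing it, so it can be reviewed first.
echo "setext -r $reserve -e $extent -f align /db/db.finance"
```

For the values in this section the script prints setext -r 524288 -e 8 -f align /db/db.finance, matching the command we ran. On a filesystem with a different block size, only the bsize assignment would need to change.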