19.3 Common Files

Cluster-wide files are a distinguishing feature of the TruCluster Server product. Having common administration files is one of those good news, bad news situations, however. Sit back and consider whether you would like to have a cluster common /etc/rc.config file, for example. You can quickly see that there are configuration items (such as HOSTNAME) that should not be shared among members. Yet the sharing of files such as /etc/passwd provides the good news of having the cluster appear to the users as a single system.

19.3.1 The passwd and group Files

Adding a new user account is an example of a typical system administration activity that is made easier (or at least no harder) by the existence of cluster common files. The /etc/passwd file and the /etc/group file are both shared by all cluster members. Therefore any information added to these files through "sysman accounts", adduser, or dxaccounts (8) is immediately accessible to all cluster members. So rather than having to repeat the operation on each cluster member, do it once and you're done. Any activities served by a database file in a common directory will function similarly.
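
As a quick illustration (a sketch; the account name calvin, its passwd fields, and the member prompts are all made up), an account added on one member is immediately visible from any other member, because both are reading the very same file:

 member1# grep calvin /etc/passwd
 calvin:*:2001:15:Calvin:/usr/users/calvin:/bin/ksh

 member2# grep calvin /etc/passwd
 calvin:*:2001:15:Calvin:/usr/users/calvin:/bin/ksh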

19.3.2 Mounting File Systems

If you create and mount a file system, all cluster members immediately see it. Once again, if you think of the cluster as a single system, it makes sense that all members should see essentially the same files except for some configuration and support files. Needless to say, the /etc/fstab file is shared by all cluster members.
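
For example (a sketch; the domain#fileset name, mount point, and member prompts are hypothetical), a file system mounted on one member can be used from another member without any additional mount commands:

 member1# mount projects_domain#tools /tools
 member1# print "hello from member1" > /tools/README

 member2# cat /tools/README
 hello from member1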

19.3.3 Swap Space

What about swap space? That's not allocated from a file system, so it can't be handled by CFS activities. Swap space is allocated from one or more raw partitions. Since we are being coached to think of the cluster as a single system, does that mean that all members are served by a single swap space? No: the swap partition is indicated by the swapdevice entry in the vm subsystem portion of sysconfigtab (4), and the /etc/sysconfigtab file is a CDSL and thus is member-specific. So each member will have its own swap space. Note that if you are in the habit of reflecting your swap space in /etc/fstab, change your habit; it is no longer meaningful in that file. Also be aware that if you use the swapon (8) command to add more swap space on the fly, the addition will be member-specific.
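
To see where your own member's swap comes from, look at the vm stanza in the member-specific sysconfigtab and at swapon's status display (a sketch; the device name and the exact output format are illustrative):

 # sysconfigdb -l vm
 vm:
         swapdevice=/dev/disk/dsk2b

 # swapon -s
 Swap partition /dev/disk/dsk2b (default swap):
 ...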

If you are concerned with performance, HP recommends that the swap partition be on a disk local to the member that uses it. This offloads some of the traffic on the shared bus where your swap partition would normally be. We suggest that you think this over carefully. Everybody wants to squeeze the last possible drop of performance from his or her system. But in this case the performance gain comes at the cost of your reliability. If your non-shared bus adapter fails, your system can no longer access swap space and will hang or crash. If the swap partition is on a shared bus, and one adapter fails, access should still be available. Furthermore, the typical Tru64 UNIX system in the 21st century is usually chock full of memory. If this describes your system, you probably don't get into paging out or swapping activities anyway, so why not keep the swap space on the shared bus in that case? Note that either way the swap space is still member-specific.

Another reason to keep your swap space on a shared bus is that it may be useful after a member crashes and cannot come back up for some reason. In order to find out anything about the nasty event that caused the member's seemingly permanent demise, we'll need access to the crash dump. But the crash dump is written to the swap partition until the system comes back up, at which point it is copied (by the savecore (8) program) into /var/adm/crash (a CDSL and therefore member-specific). So we are between a rock and a hard place in the case where the swap partition is on a non-shared bus. But if the swap partition is on a shared bus, another cluster member can run savecore (and crashdc (8) also to create the crash-data.n file) and either examine the crash in-house or ftp it to the HP Customer Support Center.
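
For reference, this is roughly what a member's crash directory looks like after savecore has done its work (a sketch; the file names and sequence number are illustrative, and crashdc (8) is what adds the crash-data.n summary):

 # ls /var/adm/crash
 bounds   crash-data.0   vmunix.0   vmzcore.0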

19.3.4 Command Directories

Speaking of commands, isn't the code for ps (and all other commands) in a file system somewhere on disk? As you may have guessed, they are comfortably ensconced in some famous directories such as /usr/bin, /sbin, and others. Are these directories and files duplicated on each system in a cluster? Ask yourself if we need multiple copies of the code for ps (or any other command). No matter where we are in the cluster, the command code will be the same, so why waste the disk space making copies of common code? This section briefly discusses CFS and CDSLs, which are thoroughly covered in Chapters 13 and 6, respectively.

As you know, the Cluster File System (CFS) handles common code very nicely since it provides a single view of all storage to all cluster members. Even file systems on local (non-shared) disks can be seen by each cluster member. Incidentally, a feature of CFS is that a software product installation is usually necessary only once per cluster, instead of once per cluster member.
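
You can ask CFS which member is currently serving a particular file system with the cfsmgr (8) command (the output shown here is abbreviated and illustrative; the server name is a made-up member host name):

 # cfsmgr /usr
 Domain or filesystem name = /usr
 Server Name = member1
 Server Status : OK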

You may be thinking that you would prefer to have multiple copies of the commands (and other files) in case one of the disks goes bad or the system goes down. Several RAID options are available to help with your concerns, including CLSM (discussed in Chapter 14). If you are convinced that it makes sense for the commands to exist in one place on the cluster, can't we apply the same thinking to some of the system's configuration files as well?

19.3.5 Device and Kernel Files

But what about /vmunix? What about the /dev directory? What about /etc/sysconfigtab and all of the boot sequence files? When we boot a cluster member, we'll still be issuing the boot command from the system console, won't we? If so, then the console will look to the boot device and start working from there to bring up the system. If this device is not accessible because the device is not local, and this member is the first member being booted, it will be impossible for the member to finish booting until the cluster-wide root file system has been mounted. Note that device directories are discussed in Chapters 7, 12, and 15, and the boot sequence is examined more closely in Chapter 17.

So how does the cluster provide a single cluster-wide view of the directory hierarchy from root on down while still providing each system access to a member-specific version of the vmunix file (and others)? The answer is Context Dependent Symbolic Links (CDSLs). The vmunix file is a CDSL in a cluster that points to /.local../boot_partition/vmunix and thus is a member-specific file.
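
You can see this for yourself with the fln function from Chapter 6 (output is illustrative):

 # fln /vmunix
 /vmunix -> /.local../boot_partition/vmunix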

19.3.6 CDSLs

Remember learning about symbolic links when you got into UNIX? They are usually thought of as pretty impressive stuff. We remember using a symbolic link to free up space on a packed disk by simply moving a bunch of files to a less packed disk. We then created a symbolic link from the original directory over to the location of the files on the other disk so as not to foul up the software that was expecting the files to be in their original location. CDSLs take that capability to another level. They provide a mechanism whereby a reference to a symbolic link is translated, based on context, into a location in a member-specific directory (CDSLs were covered in Chapters 6 and 12). Keep in mind that CDSLs are not the only mechanism used by the cluster to create member-specific files. Sometimes the file name itself indicates which member owns and uses the file (for example, /etc/gated.conf.member*). The Virtual File System software (part of the Tru64 UNIX kernel) substitutes the string "member" followed by the member's own cluster memberid for the "{memb}" placeholder, providing completely transparent access to files that must contain member-specific information.
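
A quick way to convince yourself of that translation (a sketch; the inode number, file size, and the assumption that we are logged in on member 1 are all illustrative) is to list the path containing the literal {memb} string alongside the explicit member path and note that they resolve to the same file:

 # ls -il /cluster/members/{memb}/etc/rc.config /cluster/members/member1/etc/rc.config
 4711 -rwxr-xr-x  1 bin  bin  5252 Oct 31  2001 /cluster/members/{memb}/etc/rc.config
 4711 -rwxr-xr-x  1 bin  bin  5252 Oct 31  2001 /cluster/members/member1/etc/rc.config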

19.3.7 System Startup

CDSLs play a prominent role during system startup. This topic is covered more fully in Chapter 17, but we visit it here to remind you of its importance. As the system comes up, it starts the init process (formerly PID #1, now some larger PID whose rightmost 19 bits contain a 1; see the cvtpid script in section 19.2.4). The init daemon reads the contents of the /etc/inittab file to get its marching orders.
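
To see the idea behind that check (a sketch, not the actual cvtpid script; the PID value is made up), keep only the low 19 bits of the PID, which a modulo by 2^19 (524288) accomplishes:

 # pid=1572865                  # hypothetical clusterized init PID
 # print $(( pid % 524288 ))    # 524288 is 2^19; the low 19 bits are 1
 1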

19.3.7.1 The inittab File

The inittab file will contain entries to start getty (8) processes, if necessary, and to execute the rc0 (8), rc2 (8), and rc3 (8) scripts (among other things). Since each system may want to start up in a different manner, the /etc/inittab file is a CDSL. It is very likely, however, that the member-specific inittab files will be pretty much the same. How much of the init part of system startup will really vary from member to member? If a particular member has a unique local device (non-storage oriented), then certainly its processes, daemons, or other supporting software need to be started only on that particular member. If software must be forced to run on only one member, without failover, use a restricted CAA placement policy. If software must run on only one member at a time, with failover, use CAA. Basically, software should not be started from rc?.d unless it can run on every member, or unless you gate its startup on a variable in rc.config.
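
Incidentally, you can verify that /etc/inittab really is a CDSL with the fln function from Chapter 6 (output is illustrative):

 # fln /etc/inittab
 /etc/inittab -> ../cluster/members/{memb}/etc/inittab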

Furthermore, the /sbin/rc0, rc2, and rc3 scripts are all cluster common, as is the /sbin/init program itself (why would we need more than one copy of the init program?). The system administration implication is that if an administrator were to put something into the rc3 script that starts a particular piece of software, the software would ultimately be started on all members, since the rc3.d and rc3 files are cluster common.

It is important to note that if you have an application that installs its startup information in init.d/rc?.d, and that application cannot run more than one copy at a time, then the link from the rc?.d directory should be removed and the application should be managed by CAA. See Chapters 23 and 24 for more information on CAA.
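
A minimal sketch of that hand-off (the resource name myapp and the start link name are hypothetical, and the CAA resource profile is assumed to have been created beforehand with caa_profile (8) as described in Chapters 23 and 24):

 # rm /sbin/rc3.d/S99myapp     # keep init from starting a copy on every member
 # caa_register myapp          # make the resource known to CAA
 # caa_start myapp             # CAA starts it on exactly one member, with failover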

19.3.7.2 The rc.config Scripts

While the startup scripts are executing, the rc.config script is run. Most people are not aware that rc.config is a script. The common misconception is that it is a file containing a series of attributes that apply to various startup activities. It is actually a Bourne shell script that creates a series of exported variables. The exported variables stay in existence beyond the script in which they are created, because the startup scripts source rc.config rather than executing it in a separate process. Therefore the variables are available to subsequent scripts, such as those run by the rc scripts.

 # fln /etc/rc.config
 /etc/rc.config -> ../cluster/members/{memb}/etc/rc.config

Note: the fln Korn shell function was defined in Chapter 6.

 # ls -lL /etc/rc.config
 -rwxr-xr-x   1 bin      bin         5252 Oct 31  2001 /etc/rc.config

 # cat /etc/rc.config
 #!/bin/sh
 ...
 # Read in the cluster attributes before overriding them with the member
 # specific options.
 #
 . /etc/rc.config.common
 #
 # ...
 CLU_BOOT_FILESYSTEM="root1_domain#root"
 export CLU_BOOT_FILESYSTEM
 ...

Notice that rc.config is a CDSL and thus is member-specific. Also notice that rc.config sources a cluster common script named rc.config.common (about five lines into the listing above). Thus systems in a cluster can be fed a variety of system-specific configuration items through /etc/rc.config as well as cluster common attributes through /etc/rc.config.common. (Details can be found in Chapter 6.)

So when would we use this new rc.config.common file, and how do we add entries to it and otherwise interact with it? Traditionally, rc.config was altered through the rcmgr command. Yeah, we know. You probably just used vi (1) on the rc.config file. Technically you should be using the rcmgr command. This is especially important now that the file has been broken up into the cluster common and the member-specific parts. The rcmgr command has options that allow you to designate whether the changes are to be applied cluster wide (rc.config.common) or to a specific system (rc.config). The command options are "-c" (cluster wide) and "-h" (host specific). There is a third option on the rcmgr command that designates a site-specific rc.config file (rc.config.site). This file may contain variables describing characteristics of site-specific software.
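
For example (the variable name and values are made up, and molari is the example member host used later in this section; see rcmgr (8) for the full syntax), the -c and -h options steer the change to the cluster common or the member-specific file, and a member-specific setting overrides the cluster-wide one when both exist:

 # rcmgr -c set MYAPP_ENABLED 1           # lands in /etc/rc.config.common
 # rcmgr -h molari set MYAPP_ENABLED 0    # lands in molari's /etc/rc.config
 # rcmgr get MYAPP_ENABLED                # run on molari
 0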

19.3.7.3 The sysconfigtab File

There are other system configuration options that need to be addressed as well. Tru64 UNIX reads the contents of the /etc/sysconfigtab file to access system attributes and driver characteristics. This file is member-specific so that each member of the cluster may be tweaked as appropriate for the applications it will be running. Just as with rc.config, there will also be a cluster-wide version of sysconfigtab called /etc/sysconfigtab.cluster. So if all cluster members are exactly the same, the bulk of the system attributes can be reflected in the /etc/sysconfigtab.cluster file. Most system attributes placed in the cluster-wide file get merged into the member-specific /etc/sysconfigtab file upon boot (see /sbin/init.d/clu_min).

 # fln /etc/sysconfigtab
 /etc/sysconfigtab -> ../cluster/members/{memb}/boot_partition/etc/sysconfigtab

 # ls -Ll /etc/sysconfigtab
 -rwxr-xr-x   1 bin      bin        22756 Dec 31 20:18 /etc/sysconfigtab

 # sysconfigdb -l clubase ics_ll_tcp
 clubase:
         cluster_expected_votes=2
         cluster_name=babylon5
         cluster_node_name=molari
         cluster_node_inter_name=molari-ics0
         cluster_node_votes=1
         cluster_interconnect=tcp
         cluster_seqdisk_major=19
         cluster_seqdisk_minor=47
         cluster_qdisk_major=19
         cluster_qdisk_minor=63
         cluster_qdisk_votes=1
 ics_ll_tcp:
         ics_tcp_inetaddr0=10.1.0.1
         ics_tcp_netmask0=255.255.255.0
         ics_tcp_adapter0=tu0

 # ls -l /etc/sysconfigtab.cluster
 -rw-r--r--   1 root     system        38 Nov 15  2001 /etc/sysconfigtab.cluster

 # sysconfigdb -t /etc/sysconfigtab.cluster -l clubase
 clubase:
         cluster_expected_votes = 2

Upon booting into a cluster, a script is run (/sbin/init.d/clu_min) that checks for differences between the member-specific /etc/sysconfigtab file and the cluster-wide /etc/sysconfigtab.cluster file. If differences exist, the member-specific file is made to match any cluster-wide entries that differ. (This is not a copy of the common file; it is an analysis of the entries themselves.) Note that the /etc/sysconfigtab.cluster file is managed by utilities such as clu_quorum (8) but can also be used by an administrator to make cluster-wide additions to the member-specific sysconfigtab files. Sound contradictory? The following example shows how you can use this mechanism without having to issue individual sysconfigdb commands on each cluster member.

 # sysconfigdb -t /etc/sysconfigtab.cluster -l inet
 inet: Entry not found in /etc/sysconfigtab.cluster

 # sysconfigdb -t /etc/sysconfigtab.cluster -m -f inet.stanza inet
 <no output>

 # sysconfigdb -t /etc/sysconfigtab.cluster -l inet
 inet:
         ipport_userreserved = 65535

The next output shows that the cluster members' /etc/sysconfigtab files do not reflect the inet entry.

 # for i in 1 2
 > do
 > print "\nmember$i's sysconfigtab:"
 > sysconfigdb -t /.local../../member$i/boot_partition/etc/sysconfigtab -l inet
 > done

 member1's sysconfigtab:
 inet: Entry not found in /cluster/members/member1/boot_partition/etc/sysconfigtab

 member2's sysconfigtab:
 inet: Entry not found in /cluster/members/member2/boot_partition/etc/sysconfigtab

The next output shows the undocumented (no reference page) /usr/sbin/clu_update_sysconfig command that actually forces the dispersal of changes found in the /etc/sysconfigtab.cluster file to the member-specific /etc/sysconfigtab files and then uses a "for" loop to display the sysconfigtab files on each member.

 # /usr/sbin/clu_update_sysconfig /etc/sysconfigtab.cluster
 <no output>

 # for i in 1 2
 > do
 > print "\nmember$i's sysconfigtab:"
 > sysconfigdb -t /.local../../member$i/boot_partition/etc/sysconfigtab -l inet
 > done

 member1's sysconfigtab:
 inet:
         ipport_userreserved = 65535

 member2's sysconfigtab:
 inet:
         ipport_userreserved = 65535

Be aware that an unpatched V5.1A system will not allow the sysconfigdb command to use /etc/sysconfigtab.cluster as the target file.



