The following table lists administrative tasks and indicates whether each is cluster-wide or member-specific.
Certain commands are member-specific and return information only for the member on which the command executes (for example, fuser, volstat (8), ps, mailstats (8), uptime (1), vmstat, and who (1)). Details are available in the reference pages, in the TruCluster Server Cluster Administration Guide, and in the Tru64 UNIX System Administration Guide.
Other commands yield information that goes beyond the member on which the command is issued, but with limits. For example, iostat (1) can show statistics for local disks and for disks on a shared bus; however, the statistics reflect only the traffic generated to and from the local member.
Still other commands have gained new options (such as "-c") to indicate that the command should function cluster-wide (for example, "hwmgr -view devices -cluster" and "shutdown -c"), or include cluster-wide information in their output. For example, "dsfmgr -s" indicates local or cluster-wide visibility for device class directories in its 'scope' column (shown below).
```
# dsfmgr -s
dsfmgr: show all datum for system at /

Device Class Directory Default Database:
   #  scope  mode  name
  --  -----  ----  --------
   1    l    0755  .
   2    l    0755  none
   3    c    0755  cport
   4    c    0755  disk
   5    c    0755  rdisk
   6    c    0755  tape
   7    c    0755  ntape
   8    c    0755  changer
   9    c    0755  dmapix
  ...
```
The "c" in the "scope" field indicates the Device Class Directory is cluster-wide.
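The scope column lends itself to quick filtering in scripts. A minimal sketch of the idea (the embedded sample is an abbreviated copy of the output above; on a live cluster you would pipe `dsfmgr -s` itself through the awk filter instead):

```shell
# List the names of cluster-wide ("c" scope) device class directories.
# An abbreviated sample of the dsfmgr -s output is embedded here so the
# sketch is self-contained; the columns are: #, scope, mode, name.
sample='   1   l   0755  .
   2   l   0755  none
   3   c   0755  cport
   4   c   0755  disk
   5   c   0755  rdisk'
# Field 2 is the scope, field 4 the directory name.
clusterwide=$(printf '%s\n' "$sample" | awk '$2 == "c" { print $4 }')
printf '%s\n' "$clusterwide"
```

On a real cluster, `dsfmgr -s | awk '$2 == "c" { print $4 }'` would replace the embedded sample.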
System Administration Tasks

Task | Cluster-Wide | Member-Specific | Chapter or Section | Notes |
---|---|---|---|---|
Accounting | x | √ | 19 | Enable on a specific host. |
Auditing | √ | x | 19.4.4.1 | Cluster-wide Configuration, Member-Specific Audit Logs. |
Backups | √ | x | 22 | CDSLs are normal symbolic links. |
Cron | x | √ | 21 | /usr/spool/cron is a CDSL |
Dumps | x | √ | A | Use dumpsys to force dumps on each member. |
EVM Events | √ | √ | 8, 12 | Cluster_event attribute forces posting to all members. |
File Systems | √ | √ | 6, 13 | CFS allows cluster-wide file system. |
Kernel Builds | x | √ | 12 | Kernel configuration files are located in cluster_user, and can be built on any member, but the kernel itself is member-specific. |
Licensing | x | √ | 4, 5, 10, 11 | Every cluster member must be individually licensed. |
Loading Software | √ | x | 10, 19 | Installed once, supported by CAA |
LSM | √ | √ | 14 | Cannot be used for member boot partition |
Non-Storage Devices | x | √ | 19 | Member-specific |
O/S Install/Updates | √ | x | 5, 11, 19, 26 | Rolling Upgrade Supported |
Performance | √ | √ | 19, 21, A | Cluster-wide for Cluster Services, Member Specific for Local Services |
Printing | √ | x | 19.4.3 | New "on" printcap attribute |
Processes and Scheduling | x | √ | 6, 19.2.4 | PIDs cluster-wide, Scheduling member-specific |
Shutdown | √ | √ | 19.2.2 | Cluster-wide shutdown is supported |
Startup | x | √ | 17 | Member-specific startup files |
Storage Devices | √ | x | 7, 12, 15 | Supported through DRD and CFS |
System Time | x | √ | 20 | Must be Synchronized |
User Accounts | √ | x | 19.3.1 | The passwd and group files are cluster-wide. |
```
# hwmgr -view devices -cluster
 HWID:  Device Name         Mfg     Model         Hostname  Location
-------------------------------------------------------------------------------------
     3: /dev/dmapi/dmapi                          molari
     3: /dev/dmapi/dmapi                          sheridan
     4: /dev/scp_scsi                             molari
     5: /dev/kevm                                 molari
    35: /dev/disk/floppy0c          3.5in floppy  molari    fdi0-unit-0
    46: /dev/disk/cdrom0c   COMPAQ  CRD-8402B     molari    bus-1-targ-0-lun-0
    47: /dev/disk/dsk0c     COMPAQ  BD009734A3    molari    bus-2-targ-0-lun-0
    50: /dev/disk/dsk1c     COMPAQ  BD009635C3    molari    bus-3-targ-0-lun-0
    50: /dev/disk/dsk1c     COMPAQ  BD009635C3    sheridan  bus-3-targ-0-lun-0
    51: /dev/disk/dsk2c     COMPAQ  BD009635C3    molari    bus-3-targ-1-lun-0
    51: /dev/disk/dsk2c     COMPAQ  BD009635C3    sheridan  bus-3-targ-1-lun-0
    52: /dev/disk/dsk3c     COMPAQ  BD009635C3    molari    bus-3-targ-2-lun-0
    52: /dev/disk/dsk3c     COMPAQ  BD009635C3    sheridan  bus-3-targ-2-lun-0
    54: /dev/disk/dsk5c     COMPAQ  BD009635C3    molari    bus-3-targ-4-lun-0
    54: /dev/disk/dsk5c     COMPAQ  BD009635C3    sheridan  bus-3-targ-4-lun-0
    55: /dev/disk/dsk6c     COMPAQ  BD009635C3    molari    bus-3-targ-5-lun-0
    55: /dev/disk/dsk6c     COMPAQ  BD009635C3    sheridan  bus-3-targ-5-lun-0
    58: scp                                       sheridan
    59: kevm                                      sheridan
    89: /dev/disk/floppy1c          3.5in floppy  sheridan  fdi0-unit-0
   102: /dev/disk/cdrom1c   COMPAQ  CRD-8402B     sheridan  bus-0-targ-0-lun-0
   103: /dev/disk/dsk8c     COMPAQ  BB009235B6    sheridan  bus-2-targ-0-lun-0
   104: /dev/disk/dsk9c     COMPAQ  BB009235B6    sheridan  bus-2-targ-1-lun-0
   108: /dev/disk/dsk4c     COMPAQ  BD009635C3    molari    bus-3-targ-3-lun-0
   108: /dev/disk/dsk4c     COMPAQ  BD009635C3    sheridan  bus-3-targ-3-lun-0
```
The hwmgr command has several other cluster-oriented options, such as "-member", which focuses a command on a particular cluster member, and "-cluster", which forces the command to act cluster-wide. By default, the command works from the perspective of the issuing member.
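Notice that a device on a shared bus is reported once per member under the same HWID, so the duplicate entries themselves identify shared storage. A minimal sketch of that idea (the sample lines are abbreviated from the output above; on a live cluster you would feed the body of `hwmgr -view devices -cluster` through the filter):

```shell
# List device names reported by more than one host, i.e., devices on a
# shared bus. An abbreviated sample of hwmgr output is embedded so the
# sketch is self-contained; field 1 is the HWID, field 2 the device name.
sample='47: /dev/disk/dsk0c COMPAQ BD009734A3 molari bus-2-targ-0-lun-0
50: /dev/disk/dsk1c COMPAQ BD009635C3 molari bus-3-targ-0-lun-0
50: /dev/disk/dsk1c COMPAQ BD009635C3 sheridan bus-3-targ-0-lun-0'
# Count occurrences of each device name; more than one means shared.
shared=$(printf '%s\n' "$sample" |
    awk '{ seen[$2]++ } END { for (d in seen) if (seen[d] > 1) print d }')
printf '%s\n' "$shared"
```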
See Chapters 7 and 12 for more information on hwmgr and dsfmgr.
The Advanced File System (AdvFS) has grown in importance over the years. At first it was plagued with problems. Slowly but surely the problems were resolved so that the current release of AdvFS (version 4) is very robust. In fact, it is the default file system for Tru64 UNIX starting with version 5.0. Prior to that, the default file system was UFS.
In a cluster environment, AdvFS can be used to expand the cluster_root domain. Big deal, you say? Well normally, the root domain cannot be expanded, so it is a big deal! Generally, we discourage the use of multi-volume domains, but it sure is convenient to add a bigger volume to an AdvFS domain if and when necessary and then remove the smaller volume. Note that adding and removing volumes requires the ADVANCED-UTILITIES license.
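The swap itself is a two-command sequence, sketched here with example device names (dsk5c standing in for the new, larger volume and dsk3b for the volume being retired); rmvol migrates in-use storage off the old volume before removing it:

```
# addvol /dev/disk/dsk5c cluster_root
# rmvol /dev/disk/dsk3b cluster_root
```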
The verify (8) utility can be run on active domains using the "-a" option. This allows checking the cluster_root domain while it is up and running, which is big news because verify is normally run on a domain with no mounted filesets. The following example shows verify being run on the active cluster_root domain. Note that some extraneous errors (the messages attributing problems to file activity on the active domain) will be reported because the domain is in use.
Caution: The "verify -a" command should be run only on the member that is the CFS server for the domain.
The CFS server for a domain can be discerned from the output of the cfsmgr (8) command.
```
# cfsmgr /
Domain or filesystem name = /
Server Name = molari
Server Status : OK
```
```
# showfdmn cluster_root
               Id               Date Created  LogPgs  Version  Domain Name
3be01cb1.000be1e0  Wed Oct 31 10:45:53 2001      512        4  cluster_root

  Vol  512-Blks    Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L    401408  165040     59%     on    256    256  /dev/disk/dsk3b
```
```
# df /
Filesystem         512-blocks     Used  Available  Capacity  Mounted on
cluster_root#root      401408   215962     165040       57%  /
```
```
# /sbin/advfs/verify -a cluster_root
+++ Domain verification +++
Domain Id 3be01cb1.000be1e0

Checking disks ...
Checking storage allocated on disk /dev/disk/dsk3b
Checking mcell list ...
Checking that all in-use mcells are attached to a file's metadata mcell chain...
Checking tag directories ...
Found 2 references to files that cannot be found in any directory.
Most likely this is from file activity on the active domain.

+++ Fileset verification +++
+++++ Fileset root +++++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
100
Scanned 175 directories.
Scanning tags ...
2100
Scanned a total of 2146 tags.
Searching for lost files ...
Creating //lost+found
2100
Found 4 lost files out of 2146 checked.
Most likely this is from file activity on the active domain.
```
In the unlikely event that a cluster member fails while the verify command is active, leaving the filesets in an unmountable state, don't despair. Check for temporary mount points under /etc/fdmns/domain_name/fset[0-9]_verify_identifier, where "identifier" is a unique ID chosen by the verify utility. Unmount and delete these mount points and you should be in good shape again. These strange mount points were left behind because verify was interrupted in the middle of its operations and its temporary mounts failed over to another member. (This will not happen on a standalone system.)
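That cleanup can be sketched as a small loop. The domain name is an example, and a scratch directory stands in for the real /etc/fdmns so the sketch has no side effects:

```shell
# Remove leftover verify mount points for a domain. On a real cluster
# you would set fdmns=/etc/fdmns and run this as root; here a scratch
# directory with a fabricated leftover stands in for demonstration.
fdmns=$(mktemp -d)
mkdir -p "$fdmns/cluster_root/fset0_verify_12345"   # simulated leftover

for mp in "$fdmns"/cluster_root/fset[0-9]_verify_*; do
    umount "$mp" 2>/dev/null    # ignore the error if nothing is mounted
    rmdir "$mp"                 # remove the now-empty mount point
done
```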
When a formerly standalone node is added to an existing cluster, there is no magical way to get its domains recognized and its file systems mounted. Well, maybe the mechanism is a bit magical; it is always amazing to consider the jobs accomplished by commands such as verify, defragment (8), salvage (8), advscan (8), fixfdmn (8), and other effort-saving (sometimes job-saving) utilities. The new member's AdvFS file systems must be reflected in the /etc/fstab file, which is shared by all cluster members. The domains can be registered in either of two ways: by manually creating a directory under /etc/fdmns matching the new domain name and then creating symbolic links pointing to the domain's volumes, or by using the (somewhat magical) advscan command, which searches for AdvFS partitions and creates entries in /etc/fdmns as appropriate.
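The manual route can be sketched as follows. The domain name new_dom and the device dsk9c are illustrative only, and a scratch directory stands in for the real /etc/fdmns so the sketch is side-effect free:

```shell
# Manually register an AdvFS domain: create a directory named after the
# domain under /etc/fdmns, then symlink it to the domain's volume(s).
# A scratch directory stands in for /etc/fdmns in this sketch.
fdmns=$(mktemp -d)                       # stand-in for /etc/fdmns
mkdir "$fdmns/new_dom"                   # directory name = domain name
ln -s /dev/disk/dsk9c "$fdmns/new_dom/dsk9c"   # link to the volume
ls -l "$fdmns/new_dom"
```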
The cluster_root domain is treated in a special way by the verify utility. As we discussed, the utility can examine the integrity of the root file system while it is mounted if invoked with the "-a" option. The powerful fix-it-up "-f" and "-d" options, however, are not available together with "-a". So here you are with a wonderful report from verify indicating several metadata errors, and you would like the utility (which is smart enough to figure out that there are problems with the domain's metadata) to take the next step and do its darnedest to fix them. Under normal circumstances this is not a big deal, because the target domain will have no file systems mounted, so verify can have its unabashed way with the metadata. But when cluster_root is the problem domain, we have a sticky situation: the cluster_root file system is mounted and cannot be dismounted without losing access to just about everything, including the verify command itself.
In this case you will have to boot the "emergency repair" disk, which can be the system disk of the initial cluster member. Once booted, you can give verify a shot at repairing the cluster_root domain or restore the domain from backup storage.
The Event Manager (EVM) supports cluster-wide events. Certain events have a "cluster_event" attribute. If that attribute is set to "false", the event is posted only on the member generating it; otherwise it is posted on all cluster members. The CAA, CFS, CLUA, CNX, and DRD cluster subsystems have EVM templates.
The Event Manager is discussed in Chapter 8, and the cluster_event attribute is covered in Chapter 12.
A printer that is connected to a cluster member may be accessed by the other cluster members.
When using lprsetup (8), there is an additional "on" attribute, which indicates which cluster member has the physical connection to the printer. The /etc/printcap file is not a CDSL, so it is shared by all cluster members. The following output shows the new "on" printcap option in use and then displays the help available from within lprsetup by typing "?" when prompted for the "on" string.
```
# grep ':on' /etc/printcap
        :on=molari:\
```

```
# lprsetup
Tru64 UNIX Printer Setup Program

Command  < add modify delete exit view quit help >: m

Modifying a printer entry, type '?' for help.
Enter printer name to modify (or view to view printcap file): lp0

Enter the name of the symbol you wish to change. Enter 'p' to print
the current values, 'l' to list all printcap values or 'q' to quit.

Enter symbol name: on

There is 1 node in the babylon5 cluster.
Do you want a list of cluster-member nodenames (y|[n])? y

Member ID    Member Hostname
---------    ---------------
    1        molari
    2        sheridan
    3        ivanova

Enter a new value for symbol 'on'? [molari] ?

The 'on' parameter specifies the on-list, which is the list of one or
more cluster member nodenames which are authorized to run the
queue-daemon for the spool queue.

The format of the on-list string is illustrated by the following
examples:

        :on=localhost: \
        :on=node1: \
        :on=node1,node2,nodeN: \

If this parameter is not specified, 'localhost' is assumed by default.

The order of the nodes in the on-list, from left to right, specifies
the priority from highest to lowest which the member-node parent print
daemons will use to determine which member-node will run the
queue-daemon.

If localhost is specified, all member-nodes will be authorized to run
the queue-daemon. Which node will actually run it is determined by the
first node that submits a job to the queue while it is empty.

In a cluster, localhost or no value should be specified only for
printers that are connected using tcp. Printers that are connected to
a device specified in the /dev/ directory must specify an on-list if
the device is connected to a node that is part of a multi-node
cluster. It is recommended that an on-list be specified if the cluster
only contains one node.

For non-clustered, stand-alone hosts, use of an on-list specifying the
local hostname or 'localhost' is optional.

Enter a new value, or press RETURN to use the default.

Enter a new value for symbol 'on'? [molari] ...
```
There is also a new lock file used to coordinate lpd (8) activities from cluster members. The following is an excerpt from the lpd (8) reference page.
```
/usr/spool/lpd/lpd.lock
        On clustered systems, this transient file is created to
        contain the daemon status. Note that the /usr/spool/lpd
        directory is a Context Dependent Symbolic Link (CDSL) and
        should not be manually created or destroyed.
```
Note that the /usr/spool/lpd directory is a CDSL, so the spooling directory is member-specific. The /usr/spool/lpd/lpd.lock file is used to synchronize the activities of the lpd daemons running on each cluster member. The printer's log file (usually /usr/adm/lp0err or a similarly named file) is not a symbolic link, so all cluster members log printing activities to the same file. The reference pages warn that lpd does not purge its log files, so you may want to monitor their size periodically using a crontab entry (see the existing entries under /usr/var/spool/cron/crontabs).
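A minimal sketch of such size monitoring, suitable for adapting into a crontab entry or a small script run periodically (the 1 MB threshold is an arbitrary example, and a temporary file stands in for /usr/adm/lp0err so the sketch is self-contained):

```shell
# Truncate the shared printer log in place if it grows past a threshold.
# A temp file simulates /usr/adm/lp0err; on a real cluster, point $log
# at the actual log and run this from cron on one member.
log=$(mktemp)                               # e.g. /usr/adm/lp0err
limit=1048576                               # 1 MB threshold (example)
head -c $((limit + 1)) /dev/zero > "$log"   # simulate an oversized log

size=$(wc -c < "$log")
if [ "$size" -gt "$limit" ]; then
    cp /dev/null "$log"    # truncate in place, preserving the inode
fi
```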
Security is treated as a cluster-wide choice: either all of the members are running with enhanced security enabled, or none of them are. The cluster is therefore treated as a single security domain. HP recommends that Enhanced Security be enabled before the creation of a cluster; otherwise, the entire cluster will have to be shut down and rebooted as part of the enhanced security configuration process. So much for high availability! We recommend that you evaluate your site's security needs before leaping into the installation and configuration of your first cluster member. If security is set up on the first member, it will automatically be ready to function on all subsequently added cluster members.
Given the existence of CFS, yielding cluster-wide file systems, any file with an Access Control List (ACL) associated with it will be protected by its ACL cluster wide.
If you are using auditing, there will be an auditd (8) daemon running on each member, and each daemon writes to a member-specific audit log file. The audit log is the only security-related file that is member-specific. Administrators using auditing are usually paranoid about something (and sometimes for good reason); rest assured that if a cluster member goes down, auditing continues on the remaining members. Note that auditing is enabled or disabled on a cluster-wide basis, but it actually runs independently on each cluster member.
If the culture at your site is such that your users are inclined to use the "r" commands (rlogin (1), rcp (1), rsh (1)), you should be aware that the outgoing request will be identified as emanating from the cluster alias name (not the individual cluster member's name). This may have repercussions on any "trusts" set up between machines on your network using the /etc/hosts.equiv file or the ~/.rhosts files.
Caution: The cluster software currently uses rsh across the cluster interconnect. Thus there will be a /.rhosts file in your root directory and an /etc/hosts.equiv file in existence whether you want them or not. These files will contain an entry listing the cluster alias as well as several names created by the cluster software itself (e.g., hostname-ics0 and member1-icstcp0). In V5.1B of Tru64 UNIX, an ssh command (currently non-existent) will provide a more robust internode communication mechanism for the cluster software utilities, with less reliance on the /.rhosts and /etc/hosts.equiv files.