The following table lists administrative tasks and indicates whether each is cluster-wide or member-specific.
Certain commands are member-specific and return information only for the member on which the command executes (for example, fuser, volstat (8), ps, mailstats (8), uptime (1), vmstat, and who (1)). Details are available in the reference pages, in the TruCluster Server Cluster Administration Guide, and in the Tru64 UNIX System Administration Guide.
Other commands yield information that goes beyond the member on which the command is issued, but with limits. For example, iostat (1) can show statistics for local disks and for disks on a shared bus; however, the statistics reflect only the traffic generated to and from the local member.
Still other commands have gained new options (such as "-c") to indicate that the command should function cluster-wide (for example, "hwmgr -view devices -cluster" and "shutdown -c"), or include cluster-wide information in their output. For example, "dsfmgr -s" indicates local or cluster-wide visibility for device class directories in its 'scope' column (shown below).
```
# dsfmgr -s
dsfmgr: show all datum for system at /

Device Class Directory Default Database:
   #  scope  mode  name
  --  -----  ----  --------
   1    l    0755  .
   2    l    0755  none
   3    c    0755  cport
   4    c    0755  disk
   5    c    0755  rdisk
   6    c    0755  tape
   7    c    0755  ntape
   8    c    0755  changer
   9    c    0755  dmapix
  ...
```
The "c" in the "scope" field indicates the Device Class Directory is cluster-wide.
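The scope column lends itself to quick filtering in scripts. A minimal sketch of the idea (the embedded sample is an abbreviated copy of the output above; on a live cluster you would pipe `dsfmgr -s` itself through the awk filter instead):

```shell
# List the names of cluster-wide ("c" scope) device class directories.
# An abbreviated sample of the dsfmgr -s output is embedded here so the
# sketch is self-contained; the columns are: #, scope, mode, name.
sample='   1   l   0755  .
   2   l   0755  none
   3   c   0755  cport
   4   c   0755  disk
   5   c   0755  rdisk'
# Field 2 is the scope, field 4 the directory name.
clusterwide=$(printf '%s\n' "$sample" | awk '$2 == "c" { print $4 }')
printf '%s\n' "$clusterwide"
```

On a real cluster, `dsfmgr -s | awk '$2 == "c" { print $4 }'` would replace the embedded sample.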
System Administration Tasks

Task | Cluster-Wide | Member-Specific | Chapter or Section | Notes |
---|---|---|---|---|
Accounting | x | √ | 19 | Enable on a specific host. |
Auditing | √ | x | 19.4.4.1 | Cluster-wide Configuration, Member-Specific Audit Logs. |
Backups | √ | x | 22 | CDSLs are normal symbolic links. |
Cron | x | √ | 21 | /usr/spool/cron is a CDSL |
Dumps | x | √ | A | Use dumpsys to force dumps on each member. |
EVM Events | √ | √ | 8, 12 | Cluster_event attribute forces posting to all members. |
File Systems | √ | √ | 6, 13 | CFS allows cluster-wide file system. |
Kernel Builds | x | √ | 12 | Kernel configuration files are located in cluster_user, and can be built on any member, but the kernel itself is member-specific. |
Licensing | x | √ | 4, 5, 10, 11 | Every cluster member must be individually licensed. |
Loading Software | √ | x | 10, 19 | Installed once, supported by CAA |
LSM | √ | √ | 14 | Cannot be used for member boot partition |
Non-Storage Devices | x | √ | 19 | Member-specific |
O/S Install/Updates | √ | x | 5, 11, 19, 26 | Rolling Upgrade Supported |
Performance | √ | √ | 19, 21, A | Cluster-wide for Cluster Services, Member Specific for Local Services |
Printing | √ | x | 19.4.3 | New "on" printcap attribute |
Processes and Scheduling | x | √ | 6, 19.2.4 | PIDs cluster-wide, Scheduling member-specific |
Shutdown | √ | √ | 19.2.2 | Cluster-wide shutdown is supported |
Startup | x | √ | 17 | Member-specific startup files |
Storage Devices | √ | x | 7, 12, 15 | Supported through DRD and CFS |
System Time | x | √ | 20 | Must be Synchronized |
User Accounts | √ | x | 19.3.1 | The passwd and group files are cluster-wide. |
```
# hwmgr -view devices -cluster
 HWID:  Device Name         Mfg     Model         Hostname  Location
-------------------------------------------------------------------------------------
     3: /dev/dmapi/dmapi                          molari
     3: /dev/dmapi/dmapi                          sheridan
     4: /dev/scp_scsi                             molari
     5: /dev/kevm                                 molari
    35: /dev/disk/floppy0c          3.5in floppy  molari    fdi0-unit-0
    46: /dev/disk/cdrom0c   COMPAQ  CRD-8402B     molari    bus-1-targ-0-lun-0
    47: /dev/disk/dsk0c     COMPAQ  BD009734A3    molari    bus-2-targ-0-lun-0
    50: /dev/disk/dsk1c     COMPAQ  BD009635C3    molari    bus-3-targ-0-lun-0
    50: /dev/disk/dsk1c     COMPAQ  BD009635C3    sheridan  bus-3-targ-0-lun-0
    51: /dev/disk/dsk2c     COMPAQ  BD009635C3    molari    bus-3-targ-1-lun-0
    51: /dev/disk/dsk2c     COMPAQ  BD009635C3    sheridan  bus-3-targ-1-lun-0
    52: /dev/disk/dsk3c     COMPAQ  BD009635C3    molari    bus-3-targ-2-lun-0
    52: /dev/disk/dsk3c     COMPAQ  BD009635C3    sheridan  bus-3-targ-2-lun-0
    54: /dev/disk/dsk5c     COMPAQ  BD009635C3    molari    bus-3-targ-4-lun-0
    54: /dev/disk/dsk5c     COMPAQ  BD009635C3    sheridan  bus-3-targ-4-lun-0
    55: /dev/disk/dsk6c     COMPAQ  BD009635C3    molari    bus-3-targ-5-lun-0
    55: /dev/disk/dsk6c     COMPAQ  BD009635C3    sheridan  bus-3-targ-5-lun-0
    58: scp                                       sheridan
    59: kevm                                      sheridan
    89: /dev/disk/floppy1c          3.5in floppy  sheridan  fdi0-unit-0
   102: /dev/disk/cdrom1c   COMPAQ  CRD-8402B     sheridan  bus-0-targ-0-lun-0
   103: /dev/disk/dsk8c     COMPAQ  BB009235B6    sheridan  bus-2-targ-0-lun-0
   104: /dev/disk/dsk9c     COMPAQ  BB009235B6    sheridan  bus-2-targ-1-lun-0
   108: /dev/disk/dsk4c     COMPAQ  BD009635C3    molari    bus-3-targ-3-lun-0
   108: /dev/disk/dsk4c     COMPAQ  BD009635C3    sheridan  bus-3-targ-3-lun-0
```
The hwmgr command has several other cluster-oriented options, such as "-member", which focuses a command on a particular cluster member, and "-cluster", which forces the command to act cluster-wide. By default, the command works from the perspective of the issuing member.
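Notice that a device on a shared bus is reported once per member under the same HWID, so the duplicate entries themselves identify shared storage. A minimal sketch of that idea (the sample lines are abbreviated from the output above; on a live cluster you would feed the body of `hwmgr -view devices -cluster` through the filter):

```shell
# List device names reported by more than one host, i.e., devices on a
# shared bus. An abbreviated sample of hwmgr output is embedded so the
# sketch is self-contained; field 1 is the HWID, field 2 the device name.
sample='47: /dev/disk/dsk0c COMPAQ BD009734A3 molari bus-2-targ-0-lun-0
50: /dev/disk/dsk1c COMPAQ BD009635C3 molari bus-3-targ-0-lun-0
50: /dev/disk/dsk1c COMPAQ BD009635C3 sheridan bus-3-targ-0-lun-0'
# Count occurrences of each device name; more than one means shared.
shared=$(printf '%s\n' "$sample" |
    awk '{ seen[$2]++ } END { for (d in seen) if (seen[d] > 1) print d }')
printf '%s\n' "$shared"
```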
See Chapters 7 and 12 for more information on hwmgr and dsfmgr.
The Advanced File System (AdvFS) has grown in importance over the years. At first it was plagued with problems. Slowly but surely the problems were resolved so that the current release of AdvFS (version 4) is very robust. In fact, it is the default file system for Tru64 UNIX starting with version 5.0. Prior to that, the default file system was UFS.
In a cluster environment, AdvFS can be used to expand the cluster_root domain. Big deal, you say? Well normally, the root domain cannot be expanded, so it is a big deal! Generally, we discourage the use of multi-volume domains, but it sure is convenient to add a bigger volume to an AdvFS domain if and when necessary and then remove the smaller volume. Note that adding and removing volumes requires the ADVANCED-UTILITIES license.
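The swap itself is a two-command sequence, sketched here with example device names (dsk5c standing in for the new, larger volume and dsk3b for the volume being retired); rmvol migrates in-use storage off the old volume before removing it:

```
# addvol /dev/disk/dsk5c cluster_root
# rmvol /dev/disk/dsk3b cluster_root
```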
The verify (8) utility can be run on active domains using the "-a" option. This allows checking the cluster_root domain while it is up and running, which is big news because verify is normally run on a domain with no mounted filesets. The following example shows verify being run on the active cluster_root domain. Note that some extraneous errors (the messages attributing problems to file activity on the active domain) will be reported because the domain is in use.
Caution: The "verify -a" command should be run only on the member that is the CFS server for the domain.
The CFS server for a domain can be discerned from the output of the cfsmgr (8) command.
```
# cfsmgr /
Domain or filesystem name = /
Server Name = molari
Server Status : OK
```
```
# showfdmn cluster_root
               Id               Date Created  LogPgs  Version  Domain Name
3be01cb1.000be1e0  Wed Oct 31 10:45:53 2001      512        4  cluster_root

  Vol  512-Blks    Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L    401408  165040     59%     on    256    256  /dev/disk/dsk3b
```
```
# df /
Filesystem         512-blocks     Used  Available  Capacity  Mounted on
cluster_root#root      401408   215962     165040       57%  /
```
```
# /sbin/advfs/verify -a cluster_root
+++ Domain verification +++
Domain Id 3be01cb1.000be1e0

Checking disks ...
Checking storage allocated on disk /dev/disk/dsk3b
Checking mcell list ...
Checking that all in-use mcells are attached to a file's metadata mcell chain...
Checking tag directories ...
Found 2 references to files that cannot be found in any directory.
Most likely this is from file activity on the active domain.

+++ Fileset verification +++
+++++ Fileset root +++++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
100
Scanned 175 directories.
Scanning tags ...
2100
Scanned a total of 2146 tags.
Searching for lost files ...
Creating //lost+found
2100
Found 4 lost files out of 2146 checked.
Most likely this is from file activity on the active domain.
```
In the unlikely event that a cluster member fails while the verify command is active, leaving the filesets in an unmountable state, don't despair. Check for temporary mount points under /etc/fdmns/domain_name/fset[0-9]_verify_identifier, where "identifier" is a unique ID chosen by the verify utility. Unmount and delete these mount points and you should be in good shape again. These strange mount points were left behind because verify was interrupted in the middle of its operations and its temporary mounts failed over to another member. (This will not happen on a standalone system.)
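That cleanup can be sketched as a small loop. The domain name is an example, and a scratch directory stands in for the real /etc/fdmns so the sketch has no side effects:

```shell
# Remove leftover verify mount points for a domain. On a real cluster
# you would set fdmns=/etc/fdmns and run this as root; here a scratch
# directory with a fabricated leftover stands in for demonstration.
fdmns=$(mktemp -d)
mkdir -p "$fdmns/cluster_root/fset0_verify_12345"   # simulated leftover

for mp in "$fdmns"/cluster_root/fset[0-9]_verify_*; do
    umount "$mp" 2>/dev/null    # ignore the error if nothing is mounted
    rmdir "$mp"                 # remove the now-empty mount point
done
```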
When a formerly standalone node is added to an existing cluster, there is no magical way to get its domains recognized and its file systems mounted. Well, maybe the mechanism is a bit magical; it is always amazing to consider the jobs accomplished by commands such as verify, defragment (8), salvage (8), advscan (8), fixfdmn (8), and other effort-saving (sometimes job-saving) utilities. The new member's AdvFS file systems must be reflected in the /etc/fstab file, which is shared by all cluster members. The domains can be registered in either of two ways: by manually creating a directory under /etc/fdmns matching the new domain name and then creating symbolic links pointing to the domain's volumes, or by using the (somewhat magical) advscan command, which searches for AdvFS partitions and creates entries in /etc/fdmns as appropriate.
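The manual route can be sketched as follows. The domain name new_dom and the device dsk9c are illustrative only, and a scratch directory stands in for the real /etc/fdmns so the sketch is side-effect free:

```shell
# Manually register an AdvFS domain: create a directory named after the
# domain under /etc/fdmns, then symlink it to the domain's volume(s).
# A scratch directory stands in for /etc/fdmns in this sketch.
fdmns=$(mktemp -d)                       # stand-in for /etc/fdmns
mkdir "$fdmns/new_dom"                   # directory name = domain name
ln -s /dev/disk/dsk9c "$fdmns/new_dom/dsk9c"   # link to the volume
ls -l "$fdmns/new_dom"
```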
The cluster_root domain is treated in a special way by the verify utility. As we discussed, the utility can examine the integrity of the root file system while it is mounted if invoked with the "-a" option. The powerful fix-it-up "-f" and "-d" options, however, are not available together with "-a". So here you are with a wonderful report from verify indicating several metadata errors, and you would like the utility (which is smart enough to figure out that there are problems with the domain's metadata) to take the next step and do its darnedest to fix them. Under normal circumstances this is not a big deal, because the target domain will have no file systems mounted, so verify can have its unabashed way with the metadata. But when cluster_root is the problem domain, we have a sticky situation: the cluster_root file system is mounted and cannot be dismounted without losing access to just about everything, including the verify command itself.
In this case you will have to boot the "emergency repair" disk, which can be the system disk of the initial cluster member. Once booted, you can give verify a shot at repairing the cluster_root domain or restore the domain from backup storage.
The Event Manager (EVM) supports cluster-wide events. Certain events have a "cluster_event" attribute. If that attribute is set to "false", the event is posted only on the member generating it; otherwise it is posted on all cluster members. The CAA, CFS, CLUA, CNX, and DRD cluster subsystems have EVM templates.
The Event Manager is discussed in Chapter 8, and the cluster_event attribute is covered in Chapter 12.
A printer that is connected to a cluster member may be accessed by the other cluster members.
When using lprsetup (8), there is an additional "on" attribute, which indicates which cluster member has the physical connection to the printer. The /etc/printcap file is not a CDSL, so it is shared by all cluster members. The following output shows the new "on" printcap option in use and then displays the help available from within lprsetup by typing "?" when prompted for the "on" string.
```
# grep ':on' /etc/printcap
        :on=molari:\
```

```
# lprsetup
Tru64 UNIX Printer Setup Program

Command  < add modify delete exit view quit help >: m

Modifying a printer entry, type '?' for help.
Enter printer name to modify (or view to view printcap file): lp0

Enter the name of the symbol you wish to change. Enter 'p' to print
the current values, 'l' to list all printcap values or 'q' to quit.

Enter symbol name: on

There is 1 node in the babylon5 cluster.
Do you want a list of cluster-member nodenames (y|[n])? y

Member ID    Member Hostname
---------    ---------------
    1        molari
    2        sheridan
    3        ivanova

Enter a new value for symbol 'on'? [molari] ?

The 'on' parameter specifies the on-list, which is the list of one or
more cluster member nodenames which are authorized to run the
queue-daemon for the spool queue.

The format of the on-list string is illustrated by the following
examples:

        :on=localhost: \
        :on=node1: \
        :on=node1,node2,nodeN: \

If this parameter is not specified, 'localhost' is assumed by default.

The order of the nodes in the on-list, from left to right, specifies
the priority from highest to lowest which the member-node parent print
daemons will use to determine which member-node will run the
queue-daemon.

If localhost is specified, all member-nodes will be authorized to run
the queue-daemon. Which node will actually run it is determined by the
first node that submits a job to the queue while it is empty.

In a cluster, localhost or no value should be specified only for
printers that are connected using tcp. Printers that are connected to
a device specified in the /dev/ directory must specify an on-list if
the device is connected to a node that is part of a multi-node
cluster. It is recommended that an on-list be specified if the cluster
only contains one node.

For non-clustered, stand-alone hosts, use of an on-list specifying the
local hostname or 'localhost' is optional.

Enter a new value, or press RETURN to use the default.

Enter a new value for symbol 'on'? [molari] ...
```
There is also a new lock file used to coordinate lpd (8) activities from cluster members. The following is an excerpt from the lpd (8) reference page.
```
/usr/spool/lpd/lpd.lock
        On clustered systems, this transient file is created to
        contain the daemon status. Note that the /usr/spool/lpd
        directory is a Context Dependent Symbolic Link (CDSL) and
        should not be manually created or destroyed.
```
Note that the /usr/spool/lpd directory is a CDSL, so the spooling directory is member-specific. The /usr/spool/lpd/lpd.lock file is used to synchronize the activities of the lpd daemons running on each cluster member. The printer's log file (usually /usr/adm/lp0err or a similarly named file) is not a symbolic link, so all cluster members log printing activities to the same file. The reference pages warn that lpd does not purge its log files, so you may want to monitor their size periodically using a crontab entry (see the existing entries under /usr/var/spool/cron/crontabs).
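A minimal sketch of such size monitoring, suitable for adapting into a crontab entry or a small script run periodically (the 1 MB threshold is an arbitrary example, and a temporary file stands in for /usr/adm/lp0err so the sketch is self-contained):

```shell
# Truncate the shared printer log in place if it grows past a threshold.
# A temp file simulates /usr/adm/lp0err; on a real cluster, point $log
# at the actual log and run this from cron on one member.
log=$(mktemp)                               # e.g. /usr/adm/lp0err
limit=1048576                               # 1 MB threshold (example)
head -c $((limit + 1)) /dev/zero > "$log"   # simulate an oversized log

size=$(wc -c < "$log")
if [ "$size" -gt "$limit" ]; then
    cp /dev/null "$log"    # truncate in place, preserving the inode
fi
```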
Security is treated as a cluster-wide choice: either all of the members are running with enhanced security enabled, or none of them are. The cluster is therefore treated as a single security domain. HP recommends that Enhanced Security be enabled before the creation of a cluster; otherwise, the entire cluster will have to be shut down and rebooted as part of the enhanced security configuration process. So much for high availability! We recommend that you evaluate your site's security needs before leaping into the installation and configuration of your first cluster member. If security is set up on the first member, it will automatically be ready to function on all subsequently added cluster members.
Given the existence of CFS, yielding cluster-wide file systems, any file with an Access Control List (ACL) associated with it will be protected by its ACL cluster wide.
If you are using auditing, there will be an auditd (8) daemon running on each member, and each daemon writes to a member-specific audit log file. The audit log is the only security-related file that is member-specific. Administrators using auditing are usually paranoid about something (and sometimes for good reason); rest assured that if a cluster member goes down, auditing continues on the remaining members. Note that auditing is enabled or disabled on a cluster-wide basis, but it actually runs independently on each cluster member.
If the culture at your site is such that your users are inclined to use the "r" commands (rlogin (1), rcp (1), rsh (1)), you should be aware that the outgoing request will be identified as emanating from the cluster alias name (not the individual cluster member's name). This may have repercussions on any "trusts" set up between machines on your network using the /etc/hosts.equiv file or the ~/.rhosts files.
Caution: The cluster software currently uses rsh across the cluster interconnect. Thus there will be a /.rhosts file in your root directory and an /etc/hosts.equiv file in existence whether you want them or not. These files will contain an entry listing the cluster alias as well as several names created by the cluster software itself (e.g., hostname-ics0 and member1-icstcp0). In V5.1B of Tru64 UNIX, an ssh command (currently non-existent) will provide a more robust internode communication mechanism for the cluster software utilities, with less reliance on the /.rhosts and /etc/hosts.equiv files.