3.2 CSM architecture


This section describes some of the basic concepts and technologies of CSM. These concepts and technologies provide the building blocks for the rich function provided by CSM. Most of them are inspired by the AIX PSSP software.

At the base, CSM programs use the Resource Monitoring and Control subsystem. The Resource Monitoring and Control subsystem uses the concept of resources.

A resource is a collection of attributes that describe or measure a logical or physical entity. A system or cluster is composed of numerous resources of various types.

The term resource is used very broadly in this architecture to refer to software as well as hardware entities. Examples of resources could be an IP address, an Ethernet card on eth0, and so on.

A set of resources that have similar characteristics in terms of services provided, configuration parameters, and so on, is called a Resource Class. Examples of resource classes include Ethernet adapters, token-ring adapters, network interfaces, and logical volumes.
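The relationship between resources and resource classes can be sketched as a small data model. This is only an illustration of the concept, not the RMC implementation; the class name EthernetAdapter and the attributes speed_mbps and link_up are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A collection of attributes describing one logical or physical entity."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class ResourceClass:
    """A named set of resources that share similar characteristics."""
    name: str
    resources: list = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        self.resources.append(resource)

    def query(self, attribute: str) -> dict:
        """Return {resource name: value} for resources that define the attribute."""
        return {r.name: r.attributes[attribute]
                for r in self.resources if attribute in r.attributes}


# Hypothetical class and attribute names, for illustration only.
ethernet = ResourceClass("EthernetAdapter")
ethernet.add(Resource("eth0", {"speed_mbps": 1000, "link_up": True}))
ethernet.add(Resource("eth1", {"speed_mbps": 100}))

link_state = ethernet.query("link_up")
print(link_state)
```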

3.2.1 Resource Monitoring and Control subsystem

The Resource Monitoring and Control (RMC) subsystem provides a generalized framework for managing and manipulating resources within a system or cluster. The framework allows a process on any node of the cluster to perform an operation on one or more resources elsewhere in the cluster. A client program specifies an operation to be performed and the resources it is to apply to through a programming interface called the RMCAPI. The RMC subsystem then determines the node or nodes that contain the resources to be operated on, transmits the requested operation to those nodes, and then invokes the appropriate code on those nodes to perform the operation against the resource.

Note

Notice that RMC clients running in a cluster only need a single connection to the RMC subsystem and do not have to deal with connection management for each node of the cluster. This makes it much easier to write management applications.
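The routing behavior described above can be pictured with a minimal sketch. It assumes a simple table that maps each resource to the node hosting it; the class RmcSketch, its perform method, and the resource names are hypothetical and are not part of the RMCAPI.

```python
class RmcSketch:
    """Toy model: one client-facing object that routes work to the right nodes."""

    def __init__(self, placement: dict):
        # placement maps a resource name to the node that hosts it (assumed table)
        self.placement = placement
        self.dispatched = []  # record of (node, operation, resources) sent out

    def perform(self, operation: str, resources: list) -> dict:
        """Group the requested resources by hosting node and dispatch per node."""
        by_node = {}
        for res in resources:
            by_node.setdefault(self.placement[res], []).append(res)
        for node in sorted(by_node):
            # A real subsystem would transmit the request to the node here.
            self.dispatched.append((node, operation, by_node[node]))
        return by_node


# Hypothetical resources spread over two nodes.
rmc = RmcSketch({
    "eth0@node01": "node01",
    "hda@node01": "node01",
    "eth0@node02": "node02",
})
routed = rmc.perform("query", ["eth0@node01", "eth0@node02", "hda@node01"])
print(routed)
```

The client holds a single object and never opens a connection per node, which is the point made in the note above.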

The Resource Monitoring and Control subsystem, as the basis of the CSM cluster, installs a daemon on each node inside the cluster. This daemon ultimately controls the resources on the node.

The daemon started by the Resource Monitoring and Control subsystem on each node is the /usr/sbin/rsct/bin/rmcd executable file.

3.2.2 CSM components

All the CSM components work together to provide general cluster management functions. These functions are summarized in Figure 3-1 and explained in more detail in the text that follows.

Figure 3-1: CSM architecture

Distributed shell (dsh)

dsh provides the capability to execute a command across one or more cluster nodes. The individual responses from the nodes are then consolidated into a single output for the administrator who invoked the command.

The csm.gui.dcem package provides a graphical interface for this shell.
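The consolidation of per-node responses can be sketched as follows. This is an assumption about one reasonable output layout (each line prefixed with the node name), not a description of how dsh is implemented; the function consolidate and the node names are hypothetical.

```python
def consolidate(outputs: dict) -> str:
    """Merge per-node command output into one report, one 'node: line' per line."""
    lines = []
    for node in sorted(outputs):
        for line in outputs[node].splitlines():
            lines.append(f"{node}: {line}")
    return "\n".join(lines)


# Hypothetical node names and responses.
responses = {
    "node02": "up 3 days",
    "node01": "up 12 days\nload 0.10",
}
report = consolidate(responses)
print(report)
```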

Availability monitor

CSM uses the RMC subsystem to actively track the network availability of all the nodes in the cluster. Those nodes in the ManagedNode resource class are actively monitored, and their status attributes reflect the results.

Hardware control

The hardware control functions provide the ability to do some basic operations on cluster nodes, like power on, power off, reset, or shutdown. These operations can be performed on a single node or a group of nodes.

All of this is done via the out-of-band path connected directly to the service processor for xSeries servers.

Remote hardware console

CSM uses the remote hardware console to drive the operating system installation and other operations where the operating system or network is not available.

Either the MRV In-Reach Terminal Server or the Equinox Serial Provider gives the administrator direct access to any node in the cluster. While the current CSM release supports only these terminal servers, there are multiple hardware solutions that provide this functionality.

Configuration File Management

The Configuration File Manager (CFM) gives the administrator the capability to define many configuration files (with variances for some machines) in a central place. An agent on each managed node then retrieves these modified files.

For any particular configuration file, the administrator can set up one version for all nodes, or can specify additional versions that should be used for particular types of machines or for a specific host name.

Each managed node can be instructed to pull the new configuration files at boot time, when files change, and/or when the administrator runs a command on the managed node.

This feature is very useful to distribute common files such as the /etc/hosts table.
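The choice between a common version and more specific versions of a file can be sketched as a lookup from most specific to least specific. The precedence order (host name, then machine type, then all nodes) is an assumption made for this illustration, and the function select_version and the file names are hypothetical.

```python
from typing import Optional


def select_version(versions: dict, hostname: str, machine_type: str) -> Optional[str]:
    """Return the most specific version available for this node, or None."""
    # Assumed precedence: specific host, then machine type, then all nodes.
    for key in (("host", hostname), ("type", machine_type), ("all", None)):
        if key in versions:
            return versions[key]
    return None


# Hypothetical versions of one configuration file kept on the management server.
versions = {
    ("all", None): "hosts.common",
    ("type", "storage"): "hosts.storage",
    ("host", "node07"): "hosts.node07",
}

print(select_version(versions, "node07", "compute"))
print(select_version(versions, "node03", "storage"))
print(select_version(versions, "node01", "compute"))
```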

Database and Distributed Management Server (DMS)

CSM needs a central place to store global or persistent data, such as the list of machines in the cluster, information about each machine, and which machines make up a node group.

CSM uses a two-part approach to accomplish this: the Distributed Management Server with a set of Resource Managers that provide the central coordination point for all of the management information, and a standard Perl database interface so that other tools can get at the information in a consistent way.

All of the CSM tools utilize Resource Managers to access the database information. This allows CSM to aggregate information from several sources that is not necessarily in the database or in the same format as the CSM database information.

DMS consists of the following resource classes:

The ManagedNode resource class, which provides the central place to change and access the global node data. This includes periodically verifying node availability and caching the results for each node.
The NodeGroup resource class.
The PreManagedNode resource class, which is used internally to store information about nodes that are currently being installed or being prepared to join the cluster.

Event Response Resource Manager (ERRM)

The ERRM component provides CSM the ability to run commands or scripts in response to user-defined events. An important set of predefined conditions and responses is provided in the CSM package. Many system resources can be monitored, including file systems, adapters, programs, and node availability. If needed, specific resources or components that are not predefined can be defined by the system administrator.
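The pairing of a condition with one or more responses can be sketched as below. This is a simplified model, not the ERRM interface; the names Condition, Response, evaluate, FileSystemNearlyFull, and the 90 percent threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Condition:
    name: str
    event_test: Callable[[dict], bool]  # returns True when the event has occurred


@dataclass
class Response:
    name: str
    action: Callable[[str, dict], None]  # stand-in for a command or script


def evaluate(condition: Condition, responses: list, attributes: dict) -> bool:
    """Run every response if the condition's test matches the monitored values."""
    if not condition.event_test(attributes):
        return False
    for response in responses:
        response.action(condition.name, attributes)
    return True


log = []
fs_full = Condition("FileSystemNearlyFull", lambda a: a["percent_used"] > 90)
notify = Response("LogEvent",
                  lambda cond, a: log.append(f"{cond}: {a['percent_used']}% used"))

quiet = evaluate(fs_full, [notify], {"percent_used": 72})
fired = evaluate(fs_full, [notify], {"percent_used": 95})
print(log)
```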

Predefined event condition and response actions

To make the Resource Monitoring and Control subsystem and the Event Response Resource Manager useful to administrators, CSM supplies a set of event conditions and response actions that an administrator might typically want to use.

This gives the administrator a quick way to start monitoring the cluster right after its initial installation. The full list of predefined conditions can be found in 3.3.3, "Predefined conditions" on page 57, and the response list can be found in 3.3.4, "Responses" on page 59.

Utilities for initial set up

This component includes the packaging and commands necessary to support the customer when installing and setting up a Linux cluster on both the management and managed nodes.

For more information on the installation process, see Chapter 5, "Cluster installation and configuration with CSM" on page 99.

Node groups

CSM has been designed to install and manage large clusters. To do that task efficiently, the notion of node groups has been implemented in the product.

The CSM database keeps track of all nodes in a cluster. The CSM agents that are running on nodes have access to data that could be collected by the management server.

It is possible to create fixed groups, for example, a group that contains only the machines installed in the first rack. By using the CSM agents, it is also possible to create and work with dynamic groups. This capability can be very useful for an administrator.

For example, suppose we want to know which nodes are running the Apache Web server in order to update them. The dynamic group function gives the administrator the ability to find all nodes that are running the httpd daemon and, with the dsh command, send to those nodes the appropriate commands to run.

The dynamic groups can use all parameters generated by the RMC components. Because RMC allows you to define new resources, CSM can be used to follow process-based or proprietary applications by searching for application-specific resources such as semaphores.
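The difference between a fixed group and a dynamic group can be sketched as a static list versus a selection rule that is re-evaluated against current node data. The function dynamic_group, the node names, and the attribute layout are hypothetical and only illustrate the idea.

```python
def dynamic_group(nodes: dict, predicate) -> list:
    """Return the names of all nodes whose current attributes satisfy the rule."""
    return sorted(name for name, attrs in nodes.items() if predicate(attrs))


# Hypothetical node data as it might be reported by agents on the nodes.
nodes = {
    "node01": {"rack": 1, "processes": {"httpd", "sshd"}},
    "node02": {"rack": 1, "processes": {"sshd"}},
    "node03": {"rack": 2, "processes": {"httpd"}},
}

# A fixed group is just a list of names chosen by the administrator.
rack1 = ["node01", "node02"]

# A dynamic group is recomputed from a rule, here "nodes running httpd".
web_nodes = dynamic_group(nodes, lambda attrs: "httpd" in attrs["processes"])
print(web_nodes)
```

Because the rule is evaluated each time, a node that starts or stops httpd enters or leaves the group without any change to the group definition.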

Agent Resource Manager

A Resource Monitoring and Control resource manager runs on each node and handles coordination between the node and its management server.

Initially, it has one resource class called ManagementServer, which deals with the management server. When a node joins a CSM cluster, this resource class contacts the management server to move its entry from the PreManagedNode table to the ManagedNode table and to verify that all information about it is correct.
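The move from the PreManagedNode table to the ManagedNode table can be pictured with two dictionaries. This is a sketch under the assumption that verification means comparing the stored entry with what the node reports; the function join_cluster and the attribute names are hypothetical.

```python
def join_cluster(pre_managed: dict, managed: dict, hostname: str, reported: dict) -> dict:
    """Move a node's entry between tables and return any mismatched attributes."""
    entry = pre_managed.pop(hostname)  # raises KeyError for an unknown node
    # Assumed verification step: compare stored values with reported values.
    mismatches = {key: (entry.get(key), value)
                  for key, value in reported.items() if entry.get(key) != value}
    managed[hostname] = entry
    return mismatches


# Hypothetical tables and node data.
pre_managed = {"node05": {"ip": "10.0.0.5", "os": "Linux"}}
managed = {}

diff = join_cluster(pre_managed, managed, "node05", {"ip": "10.0.0.5", "os": "Linux"})
print(diff, sorted(managed))
```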

Sensors

Sensors are customer-written scripts that run and retrieve one variable value from the system and make it known to the Resource Monitoring and Control subsystem. RMC event registrations and queries can be made against these variables.

In the most common scenario, an administrator sets up one or more sensors to monitor variables of interest and then creates ERRM conditions and responses to perform particular actions when the variables have certain values. This allows the administrator to control both sides of the RMC event pipe (detecting events and responding to them) simply by writing scripts.

A sensor Resource Manager manages the sensor scripts, running them periodically according to a specified interval. Each resource in the sensor Resource Manager represents the definition of one sensor, with information like the script command, what user name to run it under, and how often it should be run. The output of a sensor causes a dynamic variable within the resource to be set. It is this dynamic variable that can be queried or subscribed to by Resource Monitoring and Control clients (including ERRM).
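The periodic execution of a sensor and the update of its dynamic variable can be sketched as follows. A Python callable stands in for the sensor script, and time is passed in explicitly so the example is deterministic; the class Sensor, its poll method, and the name QueueDepth are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sensor:
    name: str
    command: Callable[[], str]  # stand-in for the sensor script
    user: str                   # user name the script would run under
    interval: float             # seconds between runs
    value: Optional[str] = None       # the dynamic variable set from the output
    last_run: Optional[float] = None  # time of the most recent run

    def poll(self, now: float) -> bool:
        """Run the sensor if its interval has elapsed; return True if it ran."""
        if self.last_run is not None and now - self.last_run < self.interval:
            return False
        self.value = self.command().strip()
        self.last_run = now
        return True


# Hypothetical sensor whose "script" returns two successive readings.
readings = iter(["41\n", "57\n"])
sensor = Sensor("QueueDepth", lambda: next(readings), user="root", interval=60)

ran = [sensor.poll(t) for t in (0, 30, 60)]
print(ran, sensor.value)
```

A client such as ERRM would then query or subscribe to the value field rather than run the script itself.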

Logging

CSM centralizes problem and informational messages into a single log file used by all CSM software. CSM monitors the size of the log file and archives it when necessary.

3.2.3 Security in CSM

This section describes the mechanisms used by CSM to authenticate the nodes and authorize them to perform actions. They are:

Shell security
Authentication
Authorization

Shell security

CSM runs daemons on the cluster nodes under the root user. The shell used to communicate between all the cluster nodes by default is ssh.

This shell is used to establish connections from the management server to the nodes themselves. The SSH protocol and the related ssh command are becoming the standard secure shell environment on Linux platforms. However, a cluster administrator may elect to replace ssh with another technology. See 5.2.10, "Deciding which remote shell protocol to use" on page 125 for more details about changing the default shell.

Authentication

CSM takes advantage of the authentication mechanism used by RSCT. This mechanism is host-based authentication using private-public key pairs. Each node in the cluster has a unique private-public key pair.

The public key should be exchanged between the management server and the nodes for proper authentication of the requests. CSM automatically handles all the required key exchanges between the nodes of the cluster during installation and configuration. The public key is copied from each of the managed nodes to the management server, and the management server's public key is copied to each of the managed nodes.

A CSM system administrator has control over the public and private keys used for cluster node security. Public and private keys are generated by cluster security services and used by cluster security services exploiters. These keys cannot be used with rsh, OpenSSH, or any other remote command technology. They are installed by default in the following locations:

/var/ct/cfg/ct_has.qkf (private keys)
/var/ct/cfg/ct_has.pkf (public keys)
/var/ct/cfg/ct_has.thl (trusted host list)

Authorization

CSM provides authorization in the form of an access control list (ACL) file.

This control list is used by the Resource Monitoring and Control subsystem to enable (or prevent) a user from executing a command that can change values on a class and its instances.

You can create the /var/ct/cfg/ctrmc.acls ACL file to apply access control to resource classes. If you do not modify the provided ACL file, the system uses the default permissions, which give the root account read/write access to all resource classes and all other users read-only access, as shown in Example 3-1.

Example 3-1: Default CSM ACL in /var/ct/cfg/ctrmc.acls file

 # The following stanza contains default ACL entries. These entries are appended
 # to each ACL defined for a resource class and are examined after any entries
 # explicitly defined for a resource class by the stanzas in this file,
 # including the OTHER stanza.
 DEFAULT
     root@LOCALHOST       *       rw
     LOCALHOST            *       r

Read permission (r) allows you to register and unregister for events, to query attribute values, and to validate resource handles.
Write permission (w) allows you to run all other command interfaces.
No permissions are required to query the resource class and attribute definitions.

Important:

For any command issued against a resource class or its instances, the RMC subsystem examines the lines of the stanza matching the specified class in the order specified in the ACL file.

The first line that contains an identifier that matches the user issuing the command and an object type that matches the objects specified by the command is the line used to determine access permissions.

Therefore, lines containing more specific user identifiers and object types should be placed before lines containing less specific user identifiers and object types.
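The first-match rule described above can be sketched with a short function. This is a simplified model of the evaluation order only, not the RMC implementation: it assumes that the identifier LOCALHOST matches any local user and that an object type of * matches every object type. The function permissions_for, the user alice, and the object type R are hypothetical.

```python
def permissions_for(stanza: list, user: str, object_type: str) -> str:
    """Return the permissions from the first line matching user and object type."""
    for identifier, acl_type, perms in stanza:
        user_ok = identifier in (user, "LOCALHOST")  # assumed: LOCALHOST = any local user
        type_ok = acl_type in (object_type, "*")     # assumed: * = any object type
        if user_ok and type_ok:
            return perms
    return ""  # no matching line, so no permissions


# Lines of the DEFAULT stanza from Example 3-1, in file order.
stanza = [
    ("root@LOCALHOST", "*", "rw"),
    ("LOCALHOST", "*", "r"),
]

print(permissions_for(stanza, "root@LOCALHOST", "R"))
print(permissions_for(stanza, "alice@LOCALHOST", "R"))

# With the less specific line first, root would match it and lose write access.
reordered = list(reversed(stanza))
print(permissions_for(reordered, "root@LOCALHOST", "R"))
```

The last call shows why ordering matters: once the general line appears first, the more specific root line is never reached.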
