The purpose of the IBM Cluster Systems Management product (known as CSM throughout this book) is to provide broad management capabilities for clusters.
CSM provides many useful functions to manage a cluster from a single point of control. These include resource monitoring, automated monitoring and operation, remote hardware control, remote command execution, security, configuration file management, parallel network installation, and diagnostics. By consolidating these capabilities, CSM helps administrators make better use of their time and helps reduce the cost of managing the cluster.
CSM provides a variety of other benefits:
CSM helps administrators deploy their clusters rapidly by automating many configuration tasks and by leveraging existing open source products.
CSM provides efficient monitoring of cluster resources without overwhelming network bandwidth.
The automated error detection that CSM provides helps catch problems before they impact the environment, and assists with rapid resolution of, and recovery from, the problems that do occur.
CSM has a modular architecture that maximizes flexibility, so your cluster solution can evolve and grow as your needs change.
The concept of CSM came from IBM Parallel System Support Programs for AIX (also known as PSSP) and from other applications available as open source tools.
CSM is a collection of components that have been integrated to provide the basis for constructing and managing a cluster. Each of these components provides specific capabilities related to the management of the cluster. This component-based architecture provides flexibility for future expansion of the capabilities provided by CSM.
Each of the CSM components can be easily customized to help meet specific needs. For example, a cluster administrator can set up monitoring of application processes and take actions if those processes disappear.
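The monitoring example above follows a general condition-and-response pattern: a condition describes what to watch for, and one or more responses run when the condition is met. The following Python sketch illustrates that pattern only. It is not CSM or RMC code; the names ProcessCondition, Monitor, and make_log_response are hypothetical, as are the condition name and the list of running processes, which here is passed in by the caller rather than read from the system.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass
class ProcessCondition:
    """Hypothetical condition: met when a watched process is not running."""
    name: str
    watched: Set[str]

    def evaluate(self, running: Iterable[str]) -> List[str]:
        # Return the watched process names that are absent, in sorted order.
        return sorted(self.watched - set(running))


@dataclass
class Monitor:
    """Hypothetical monitor that links one condition to a list of responses."""
    condition: ProcessCondition
    responses: List[Callable[[str, str], None]] = field(default_factory=list)

    def poll(self, running: Iterable[str]) -> List[str]:
        # Evaluate the condition once and run every response for each
        # missing process. Returns the list of missing process names.
        missing = self.condition.evaluate(running)
        for process in missing:
            for respond in self.responses:
                respond(self.condition.name, process)
        return missing


def make_log_response(log: List[str]) -> Callable[[str, str], None]:
    """Build a response that records a message instead of restarting anything."""
    def respond(condition_name: str, process: str) -> None:
        log.append(f"{condition_name}: {process} is not running")
    return respond


if __name__ == "__main__":
    events: List[str] = []
    monitor = Monitor(
        ProcessCondition("app-alive", {"httpd", "mmfsd"}),
        [make_log_response(events)],
    )
    # The process list is supplied by hand here; a real monitor would
    # obtain it from the node being watched.
    monitor.poll(["httpd", "sshd"])
    print(events)
```

Keeping the condition separate from the responses means the same check can trigger different actions (logging, notification, restart) without changing how the check itself is defined.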
CSM builds on several underlying subsystems: Resource Monitoring and Control (RMC), Reliable Scalable Cluster Technology (RSCT), and System Resource Controller (SRC). The capabilities these subsystems provide allow CSM (as well as GPFS) to help administrators quickly deploy manageable clusters. On top of this base, CSM adds a set of programs that provide management functions for large Linux clusters, such as the Event Response Resource Manager, the Distributed Management Server, and the distributed shell. Both RSCT and SRC are described in detail in Appendix A, "SRC and RSCT" on page 253; RMC and the additional programs provided and used by CSM are covered in this chapter.