Linux Clusters

There are two distinct types of clusters: high-performance clusters and high-availability clusters. Both types consist of a set of independent, interconnected computers working on a common task. Each independent computer within a cluster is called a node.

The goal of a high-performance cluster is to perform large computational tasks, spreading the work across a large number of nodes. The goal of a high-availability cluster is for an application (typically a database) to be able to continue functioning even through the failure of one or more nodes. A high-performance cluster is not typically considered an enterprise server, but rather is dedicated to specific computationally intensive tasks. A high-availability cluster, on the other hand, usually operates as an enterprise server.

High-performance clusters (HPCs) tend to have a higher node count than high-availability clusters, with 100-node clusters being common. High-availability clusters tend to have smaller node counts, typically not exceeding 16 nodes, and more commonly only two to four nodes.

High-Performance Clusters

High-performance clusters are an inexpensive way of providing large computational power for problems that can be divided into multiple parallel components. The nodes are typically inexpensive single- or dual-processor computers. Because large numbers of nodes are involved, size is an important consideration; computers in a 1U form factor, which allows large quantities to be stacked in a single rack, are commonly used for high-performance clusters. Most major hardware vendors sell systems capable of being clustered. Nodes are headless; that is, they have no keyboard, monitor, or mouse. Larger clusters might include a separate management LAN and/or a terminal server network to provide console access to the nodes.

Each node in an HPC has its own local disk storage to maintain the operating system, provide swap space, store programs, and so on. Some clusters have an additional type of node, a storage server, to provide access to common disk storage for shared data. There is also a master node that provides overall control of the cluster, coordinating the work across nodes and providing the interface between the cluster and local networks.

The interconnect for nodes in an HPC can be Ethernet (10, 100, or 1000 Mbps), or it can be a specialty interconnect that delivers higher performance, such as Myrinet. The choice of interconnect technology is a trade-off between price and speed (latency and bandwidth), and the type of work a cluster is designed to do influences that choice.

Certain file systems are designed for use in cluster environments. These provide a global, parallel cluster file system; examples include GPFS from IBM and CXFS from SGI. Such file systems provide concurrent read/write access to files located on a shared disk file system.

Communication between HPC nodes often makes use of message-passing libraries such as MPI or PVM. These libraries are based on common standards and allow the easy porting of parallel applications to different cluster environments.
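
As a concrete illustration, here is a minimal MPI program in C that spreads a summation across cluster nodes and combines the partial results on the master rank. The problem (summing 1 through 1,000,000) and its slice-by-rank decomposition are arbitrary choices for this sketch, not anything prescribed by the MPI standard.

    /* A minimal MPI sketch: each rank sums a disjoint slice of the
       range, and rank 0 combines the partial sums with MPI_Reduce.
       Typically compiled with mpicc and launched with mpirun (the
       exact commands vary by MPI distribution). */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        long local = 0, total = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this node's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of ranks */

        /* Each rank sums every size-th number starting at rank+1. */
        for (long i = rank + 1; i <= 1000000; i += size)
            local += i;

        /* Combine the partial sums on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %ld\n", total);

        MPI_Finalize();
        return 0;
    }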

High-Availability Clusters

Some workloads are more sensitive to failure; that is, a failure of the workload can have expensive repercussions for a business. Sample workloads include customer relationship management (CRM), inventory control, messaging servers, databases, and file and print servers. Availability is critical to these workloads; availability requirements are often described as the five nines of availability (99.999%). Providing that level of availability allows about 5 minutes of outage per year.
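
That figure is simple arithmetic: a year contains roughly 525,960 minutes, and an allowed outage of 0.001% of that is about 5.26 minutes. A quick check in C:

    /* Compute the downtime budget per year for a given availability. */
    #include <stdio.h>

    int main(void)
    {
        double availability = 0.99999;               /* five nines */
        double minutes_per_year = 365.25 * 24 * 60;  /* ~525,960 */
        double allowed = (1.0 - availability) * minutes_per_year;

        printf("allowed downtime: %.2f minutes/year\n", allowed);
        return 0;                                    /* ~5.26 */
    }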

One method of preventing downtime caused by failure of a system running these critical workloads is the use of high-availability (HA) clusters. An HA cluster consists minimally of two independent computers with a "heartbeat" monitoring program that monitors the health of the other node(s) in the cluster. If one node fails, another node detects the failure and automatically picks up the work of the failed node.
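
The following C program is an illustrative sketch of the monitoring idea only, not the actual Linux-HA heartbeat daemon: a peer node sends a small UDP datagram every second, and if this node sees no datagram for several seconds, it declares the peer dead and would begin taking over its workload. The port number and the failover step are stand-ins chosen for this sketch.

    /* Illustrative heartbeat monitor (not the Linux-HA daemon). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/select.h>
    #include <netinet/in.h>

    #define HB_PORT    9999  /* arbitrary port for this sketch */
    #define DEAD_AFTER 5     /* seconds of silence before failover */

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in addr = { 0 };
        char buf[64];

        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(HB_PORT);
        bind(sock, (struct sockaddr *)&addr, sizeof(addr));

        for (;;) {
            fd_set fds;
            struct timeval tv = { DEAD_AFTER, 0 };

            FD_ZERO(&fds);
            FD_SET(sock, &fds);
            if (select(sock + 1, &fds, NULL, NULL, &tv) <= 0) {
                /* Timeout: no heartbeat arrived in DEAD_AFTER s. */
                printf("peer silent for %d seconds - taking over\n",
                       DEAD_AFTER);
                break;  /* a real cluster invokes failover here */
            }
            recv(sock, buf, sizeof(buf), 0);  /* heartbeat arrived */
        }
        close(sock);
        return 0;
    }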

It is common for HA clusters to be built from larger computers (four or more processors). Typical HA clusters have only a handful of nodes. Ideally, there is enough excess capacity on the nodes in a cluster to absorb the workload from one failed node. Thus, in a two-node cluster, each node should normally run at no more than 50% capacity so that there is headroom to absorb the load of the other node. In a four-node cluster, each node could run at 75% capacity and there would still be sufficient excess capacity to absorb the workload of a failed node. Thus, larger node counts are more efficient; however, that efficiency comes at the cost of additional complexity and administrative overhead.
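
The headroom rule generalizes: if each of n nodes runs at utilization u, a failed node's load u is divided among the n - 1 survivors, so u + u/(n - 1) must stay at or below 100%, which gives u <= (n - 1)/n. A small sketch of that arithmetic in C:

    /* Maximum safe per-node utilization for an n-node HA cluster:
       the n-1 surviving nodes must absorb one failed node's load. */
    #include <stdio.h>

    int main(void)
    {
        for (int n = 2; n <= 8; n *= 2) {
            double max_util = (double)(n - 1) / n;
            printf("%d-node cluster: run each node at <= %.0f%%\n",
                   n, max_util * 100.0);  /* 50%, 75%, 88% */
        }
        return 0;
    }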

For HA clusters, all nodes need access to the data used by the application, which is normally a database. Connecting more than a few nodes to a common disk storage array usually requires Fibre Channel adapters and switches, and this can often be the limiting factor on the number of nodes in an HA cluster.

Within the Linux community is an active group focused on high-availability clusters (see http://linux-ha.org). This site provides details on the design and configuration of Linux high-availability clusters and provides links to papers that describe various implementations and deployments of Linux HA clusters.

Clusters are a way of consolidating resources. In the following section, see what consolidation means in terms of the mainframe.
