What is a Cluster? | Professional JMS

The introduction described clustering and the basic philosophy behind it. Before getting into the details, though, let's set the scene by stating an explicit definition of clustering, and examining the benefits that we can expect from it.

Definitions

Important

For the purposes of this chapter, clustering is the use of multiple physical computers to provide a single logical service with more capacity and reliability than would be possible with a single physical computer.

The concept of providing more of something is central to the definition. If you are not getting more of something you need, be it connections, storage, throughput, or reliability, then all you have done is spend a lot of money on extra hardware that adds no value. This can be restated as the requirement that the cluster provide some degree of scalability, since scalability is the measure of how much additional service is provided by each additional machine.

The term scalable is often mistakenly interpreted as being synonymous with "high capacity." Actually, scalability together with efficiency determines total capacity. Efficiency refers to how much capacity you get for a given amount of resource, while scalability is the guarantee that you keep getting this efficiency as you add more resources.

Let's look at a concrete example using that most important of benchmarks, message throughput. Suppose two imaginary JMS providers can achieve the throughputs in the table below for clusters of two machines and clusters of ten machines. Provider A is more efficient because it has higher throughput for small clusters than Provider B. Provider B on the other hand is more scalable, because the throughput per machine does not decrease with a large cluster. Which is better? It depends on your needs. If you have a limited hardware budget, and 300 messages per second is sufficient throughput, then Provider A is a good deal. If you absolutely need 1000 messages per second, then Provider B is clearly the only one that can deliver:

Cluster Size (machines)	Throughput of JMS Provider A	Throughput of JMS Provider B
2	300 msg/s	200 msg/s
10	600 msg/s	1000 msg/s
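The comparison in the table can be worked through numerically. The following is a minimal sketch in Python, using only the four figures from the table; the names `THROUGHPUT`, `per_machine`, and `scaling_ratio` are hypothetical helpers introduced for illustration, and the "scaling ratio" is just one simple way to quantify the scalability described above.

```python
# Throughput figures (messages per second) copied from the table above.
THROUGHPUT = {
    "A": {2: 300, 10: 600},
    "B": {2: 200, 10: 1000},
}


def per_machine(provider: str, cluster_size: int) -> float:
    """Efficiency: throughput delivered per machine at a given cluster size."""
    return THROUGHPUT[provider][cluster_size] / cluster_size


def scaling_ratio(provider: str, small: int = 2, large: int = 10) -> float:
    """Per-machine throughput of the large cluster relative to the small one.

    A value of 1.0 means efficiency is fully preserved as machines are added;
    a value below 1.0 means each added machine contributes less.
    """
    return per_machine(provider, large) / per_machine(provider, small)


for name in THROUGHPUT:
    print(
        f"Provider {name}: "
        f"{per_machine(name, 2):.0f} msg/s per machine at 2 machines, "
        f"{per_machine(name, 10):.0f} msg/s per machine at 10 machines, "
        f"scaling ratio {scaling_ratio(name):.2f}"
    )
```

Under these assumptions, Provider A's per-machine throughput falls from 150 to 60 msg/s as the cluster grows, while Provider B holds steady at 100 msg/s per machine. That is the sense in which A is more efficient at small sizes and B is more scalable.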

Most clusters belong to one of two general categories, which I call service clusters and parallel computation clusters. Although these both fit the above definition of cluster, in practice there is a very dramatic difference in what they are used for, where they are used, and their architecture. Rather than try to define these categories precisely, the following table contrasts their basic characteristics:

Cluster Type	Applications	Provides	Used by	Nodes	Examples
Service Cluster	Data storage and retrieval, data transmission (for example JMS), implementing business logic	Services to clients	Business, service providers	< 10	Oracle Parallel Server, most application servers, many message servers
Parallel Computation	Mathematical computation, modeling, simulation	Results of numerical calculations	Research institutions, military (modeling atomic decay), weather bureau, oceanographers	< 1000	Linux Beowulf, PVM

Important

Clustered JMS providers fall cleanly into the category of service cluster, and in the remainder of this chapter the word cluster will be used to refer to this type of cluster exclusively.

There are other ways to divide clusters into subcategories. One that I feel is also important to point out is the distinction between application-level and system-level clustering:

In application-level clustering, clustering is an integral part of the application and does not make any assumptions about cluster support in the underlying operating system. The various application processes in the cluster communicate directly with each other via a network, or assume that they can share the same disk, or both. (Example: Oracle Parallel Server.)
In system-level clustering, the operating system, or a cluster enabling toolkit (which sits above the OS but below the application) provides generic functionality to support clustering. This is intended to make it easier to develop clustered servers, and even to adapt existing monolithic servers to act as a cluster. (Example: Microsoft Windows Cluster Server.)

I would tend to expect that pure Java JMS providers use application-level clustering, as this provides maximum portability. There is no requirement that the server part of a JMS provider be implemented in Java, and in the case that the server is tied to a specific platform, the provider might opt for system-level clustering.

Note

System-level clustering tends to use shared disk architectures. We will look at the pros and cons of this later in the chapter.

The following terms will be used to mean very specific things in the remainder of this chapter. For clarity, they are defined here:

Node

This is one element of a cluster. Typically this is used to refer to one machine in the cluster, but sometimes it is more precise to say that it refers to one process. With application-level clustering, the term node should refer to the process, as this is the basic unit of the cluster. It may be desirable to have multiple node processes on one machine, but these different nodes will usually interact in the same way whether they are on the same or different machines. In the case of system-level clustering (particularly at the OS level), one machine may host several different types of server, with redundant instances of those servers existing on other machines. In this case the term node would more often be used to refer to the machine, and not one of the processes. In the rest of this chapter, the term node will generally refer to a process node. When the distinction is important, I will use one of the specific terms process node or hardware node.

Note

There are known to be cases when multiple Java virtual machines sharing the resources of one physical machine perform better than a single virtual machine that tries to use the full resources of the machine. Thus, a cluster that allows multiple process nodes to coexist on one physical machine can take advantage of this performance boost, in addition to the other advantages of clustering discussed in this chapter.
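The distinction between process nodes and hardware nodes can be made concrete with a small data model. This is only an illustrative sketch in Python; the class `ProcessNode`, the function `hardware_nodes`, and the node and machine names are all hypothetical and do not correspond to any particular JMS provider.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessNode:
    name: str  # hypothetical identifier for one server process
    host: str  # the hardware node (physical machine) it runs on


def hardware_nodes(cluster):
    """Group process nodes by the machine that hosts them."""
    by_host = defaultdict(list)
    for node in cluster:
        by_host[node.host].append(node.name)
    return dict(by_host)


# Three process nodes, two of which share one physical machine.
cluster = [
    ProcessNode("jms-1", "machine-a"),
    ProcessNode("jms-2", "machine-a"),
    ProcessNode("jms-3", "machine-b"),
]

layout = hardware_nodes(cluster)
print(len(cluster), "process nodes on", len(layout), "hardware nodes")
```

In this sketch the cluster has three process nodes but only two hardware nodes, which is the situation the note above describes: several virtual machines sharing the resources of one physical machine.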

Monolithic server

This refers to a server that can only exist on one physical machine. That is to say that a monolithic server is a server that is not clustered.

Single logical server

I use this term to refer to a server that appears to the client as though it were a monolithic server, even though it could actually be either a cluster or a monolithic server. In this case, the client is indifferent as to which type of server it is, as long as it acts like a monolithic server.

RAID (Redundant Array of Inexpensive Disks)

This is clustering for disks. It is a general technique for combining multiple physical disks to provide the outward appearance of one disk that has more capacity, performance, and reliability than the individual disks that compose it.

LAN (Local Area Network)

A data network that is confined to a small area, such as a single data center. This usually implies high bandwidth and "cheap" communication.

WAN (Wide Area Network)

A data network that spans a large geographic region, or even the whole world. When communicating over a WAN it is often not possible to assume large bandwidth between two arbitrary hosts. There may also be more "costs" (monetary or otherwise) incurred by communicating over a WAN.