Why Clustering?

There are three main reasons for using a cluster instead of a monolithic server. Let's look at each of them in detail.

Increased Capacity

Increasing server capacity through clustering is commonly referred to as load balancing. By spreading the load over multiple machines, a cluster should be able to accommodate some combination of more messages per second, more bytes per second, more persistent storage, more non-persistent storage, more simultaneous connections, and more active subscriptions. Not all clustering solutions balance all of these; indeed, providing more persistent storage is usually a direct tradeoff against another important benefit of clustering, redundant storage (see "High Availability" below). If a JMS message server must accommodate huge numbers of clients, or clients that produce a huge message volume, any of the above-mentioned parameters could become a bottleneck.

The balancing part of load balancing is also critical. After all, it does no good to provide extra machines if there is no mechanism to guarantee that the load is distributed fairly. If all machines have the same capabilities, then "fairly" means "evenly", but if the machines in the cluster are dissimilar, then it takes a more sophisticated scheme to assure fairness.

Load balancing is usually done at the connection level, as each client is typically connected to one and only one node of the cluster. Other schemes are possible, but since the number of active connections is often one of the first limits reached, we'll omit any approaches that require multiple connections per client. A round-robin load-balancing scheme simply assigns each new connection to the next node in the list, as sketched below. More sophisticated schemes monitor the load of each node and assign new connections to those with the least load. Static load balancing schemes apply the balancing criteria only once per new connection. Dynamic load balancing schemes reevaluate the balancing criteria for existing connections and will redistribute live connections in order to better utilize lightly loaded machines.
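To make the round-robin case concrete, here is a minimal sketch of static round-robin balancing at the connection level, assuming a simple list of node addresses. The class name and node addresses are hypothetical; real JMS providers typically hide this selection inside their own ConnectionFactory implementation.

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// A minimal sketch of static round-robin connection balancing.
// The node addresses are hypothetical; production JMS providers
// usually perform this selection inside their own ConnectionFactory.
public class RoundRobinBalancer {
    private final List<String> nodeUrls;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> nodeUrls) {
        this.nodeUrls = nodeUrls;
    }

    // Each new connection is assigned the next node in the list,
    // wrapping around when the end is reached.
    public String nextNode() {
        int index = Math.floorMod(next.getAndIncrement(), nodeUrls.size());
        return nodeUrls.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer balancer = new RoundRobinBalancer(
                Arrays.asList("node1:7676", "node2:7676", "node3:7676"));
        for (int i = 0; i < 5; i++) {
            System.out.println(balancer.nextNode()); // node1, node2, node3, node1, node2
        }
    }
}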

Clusters that intend to provide increased storage capacity must balance message storage. This may be done with a simple approach: for example, each producer's messages are stored only on the machine to which that producer is connected. This will go a long way toward spreading message storage over the cluster, but some producers may generate much more volume than others, so an even better storage-balancing scheme would be independent of the connection-balancing scheme.
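As an illustration of a storage-balancing scheme that is decoupled from connection balancing, the following sketch picks a storage node by hashing the message ID. The node names and the class itself are assumptions for illustration, not any particular product's API.

import java.util.Arrays;
import java.util.List;

// A minimal sketch of connection-independent storage balancing: the
// node that stores a message is chosen by hashing the message ID, so
// a single high-volume producer's messages are spread across the
// cluster regardless of which node holds its connection.
public class HashedStorageBalancer {
    private final List<String> storageNodes;

    public HashedStorageBalancer(List<String> storageNodes) {
        this.storageNodes = storageNodes;
    }

    public String storageNodeFor(String messageId) {
        int index = Math.floorMod(messageId.hashCode(), storageNodes.size());
        return storageNodes.get(index);
    }

    public static void main(String[] args) {
        HashedStorageBalancer balancer = new HashedStorageBalancer(
                Arrays.asList("node1", "node2", "node3"));
        System.out.println(balancer.storageNodeFor("ID:producer-42-0001"));
    }
}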

It is important to note that clustered JMS servers that use a separate database server for persistent storage (via JDBC) will not provide increased storage capacity just because the number of message server (hardware) nodes increases. If all of those nodes access the same database server, then the total storage capacity of the cluster is the same no matter how many message server nodes there are. In this case, clustering can only increase storage capacity if an additional database server is added along with each additional message server node, or if the database itself is clustered independently of the JMS server. All of these configurations are covered in more detail later in the chapter.

High Availability

High availability is an important theme for message servers. One of the reasons for using inter-application messaging is to obviate the need for message producers to be concerned with whether the corresponding message consumers are available at a particular time. This is only possible when the message server is available at all times. This leads to the need to use clustering to guarantee continuous server availability even in cases where a monolithic server could satisfy the capacity requirements.

The terms high availability and fault tolerance are commonly used to mean the same thing. It is more precise, though, to use these to refer to two separate aspects of reliability:

Important 

A system is fault tolerant if unexpected failures never lead to lost or corrupt data, but this says nothing about the frequency of failures or the amount of time required to recover. High availability means that there is a guaranteed maximum amount of downtime that will not be exceeded.

Thus, high availability is not an absolute service level, but a spectrum of service levels. A system's availability is usually specified by the percent uptime it guarantees. 99.0% uptime is generally one of the least demanding values that one would see specified in practice, with each increasing level adding another 9 to the end of the decimal. This is illustrated in the following table:

Guaranteed Uptime   Average Downtime    Typical Implications

95%                 8 hours/week        System may be offline 1 hour each night for backup. No need to respond to failures until next business day.

99%                 1.5 hours/week      System may be offline 1 hour each week for backup or maintenance. Need to respond to failures within 1 hour (technician must be on call 24 hours).

99.9%               40 mins/month       Backups are done while online. Technician/sys admin required on premises 24 hours x 7 days.

99.99%              4 mins/month        All persistent storage on RAID devices. Cold standby machine can be connected to RAID and booted within minutes.

99.999%             5 mins/year         Full hardware redundancy with automatic fail-over. Hot standby has OS booted and backup application running at all times. All data replicated to hot standby machine(s) in real time.
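The downtime column follows directly from the uptime percentage: the allowed downtime is (1 - uptime) times the length of the period. A small calculation, sketched below, reproduces the table's figures.

// Reproduces the table's downtime figures from the uptime guarantee:
// allowed downtime = (1 - uptime fraction) * period length.
public class DowntimeCalculator {
    static double downtimeMinutes(double uptimePercent, double periodHours) {
        return (1.0 - uptimePercent / 100.0) * periodHours * 60.0;
    }

    public static void main(String[] args) {
        System.out.println(downtimeMinutes(99.0, 168));    // ~101 mins/week (about 1.5 hours)
        System.out.println(downtimeMinutes(99.9, 730));    // ~44 mins/month
        System.out.println(downtimeMinutes(99.999, 8766)); // ~5 mins/year
    }
}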

An implication of the table above is that the cost of maintaining a highly available system increases significantly as the guaranteed uptime increases. The actual costs depend on many site-specific parameters, but one generalization is easy to make: the higher levels of availability are more cost-effective for large installations than for small ones. Economies of scale come into play here.

The primary factor is the cost of having a system administrator on site nights and weekends. An installation of 100 machines may only require one person on site during off hours, but since we cannot divide people into parts, an installation of two machines requires the same amount of off-hours staffing. For large companies that have their own data centers, the cost-effectiveness issue is not a big deal. Small companies that require very high availability, however, must often have their systems located at an application service provider (ASP), where such costs can be spread over the ASP's many other customers.

Geographical Distribution

When I think of clustering, my first thought is generally of several computers sitting in the same rack, connected with a high speed LAN, and providing high availability and load balancing. Another variation, which fits the definition of a cluster above, is the case where nodes are located in different geographical regions and are connected by a WAN. This configuration can provide better service to local clients in each region than could be provided with a centralized cluster. It can provide higher availability because the local clients will still be able to connect to the local server, even when the WAN connection is broken. It can provide better performance because it will generally be optimized to transfer the least possible amount of data over WAN connections. It can also potentially protect your messaging system from the effects of regional power outages or natural disasters, which could cause the simultaneous failure of all of the nodes in a given region.

In theory, you could take a cluster designed to provide high availability and load balancing over a LAN and just move the nodes to different cities. In practice this would not be a good idea. WAN connections are usually slower and more costly than LAN connections, so you want to be smart about sending messages between servers. In particular, you want to avoid sending messages to a remote server unless you are sure that they will be consumed there. For this reason I prefer to refer to this case as message routing. The term clustering may or may not include the aspect of message routing, depending on the context in which it is used.

In Point-to-Point Messaging

In the case of Point-to-Point (PTP) messaging, described in Chapter 4, each message will be consumed by at most one receiver. If multiple copies of a queue message are sent to different local servers, it becomes very difficult to ensure that only one of these copies will be sent to a receiver. The message routing logic will usually not send a queue message to a remote server unless it is certain that it will be consumed there. When possible, the logic will try to favor a receiver connected to the same node to which the message sender is connected, in order to be sure that the message does not travel over a WAN unnecessarily.
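A sketch of that routing decision follows; the receiver registry and node names are assumptions for illustration, not a real product's interfaces.

import java.util.Map;
import java.util.Optional;
import java.util.Set;

// A minimal sketch of point-to-point routing logic. A queue message
// stays on the local node whenever a local receiver exists, and is
// forwarded over the WAN only to a node known to have a receiver for
// that queue. The registry shown here is hypothetical.
public class QueueRouter {
    private final String localNode;
    // queue name -> nodes with at least one active receiver
    private final Map<String, Set<String>> receiversByQueue;

    public QueueRouter(String localNode, Map<String, Set<String>> receiversByQueue) {
        this.localNode = localNode;
        this.receiversByQueue = receiversByQueue;
    }

    // Favors a receiver on the sender's own node so that the message
    // never crosses a WAN link unnecessarily.
    public Optional<String> routeFor(String queueName) {
        Set<String> nodes = receiversByQueue.getOrDefault(queueName, Set.of());
        if (nodes.contains(localNode)) {
            return Optional.of(localNode);
        }
        return nodes.stream().findFirst(); // empty: hold the message locally
    }
}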

In Publish/Subscribe Messaging

For Publish/Subscribe (Pub/Sub) messaging, described in Chapter 5, the message routing logic will be a bit different. There is always some efficiency to be gained, because at most one copy of each message must be moved to each local server, no matter how many subscribers are actually connected. It may be desirable to unconditionally send every message to every remote server, so that any subscriber has instant access to it as soon as it connects. Alternatively, it may be more efficient to send a message to a remote server only if you know in advance that there is a subscriber connected, or a durable subscriber registered, for the topic to which the message was published.
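The second policy amounts to a simple membership test against a subscription registry. Here is a hedged sketch, with the registry itself assumed rather than taken from any vendor's API.

import java.util.Map;
import java.util.Set;

// A minimal sketch of the pub/sub forwarding test: a published message
// is copied to a remote node only if that node has a connected
// subscriber, or a registered durable subscriber, for the topic.
// The registry is hypothetical.
public class TopicRouter {
    // topic name -> nodes with at least one active or durable subscription
    private final Map<String, Set<String>> subscriptionsByTopic;

    public TopicRouter(Map<String, Set<String>> subscriptionsByTopic) {
        this.subscriptionsByTopic = subscriptionsByTopic;
    }

    public boolean shouldForward(String topic, String remoteNode) {
        return subscriptionsByTopic
                .getOrDefault(topic, Set.of())
                .contains(remoteNode);
    }
}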

To some extent, a message routing configuration could provide load balancing and high availability as described above. This is not always a good idea, though. Load balancing would mean that if a server were too heavily loaded to accept an additional local client, that client could connect, via a WAN, to a remote server. This would work, but would not be particularly efficient. A client might also connect directly to a remote server if the local server crashes. In this case it is difficult to guarantee that all of the messages that would have been available on the local server are available on the remote one; the message routing logic may simply not have sent all the messages to the remote server.

In contrast, if the message routing logic tried to ensure full redundancy by guaranteeing that every message was copied to every machine before the message was considered to have been sent, the result would be a distributed server that is very slow. The point here is that if a wide area message routing cluster needs load balancing and high availability, then each of the local services should be implemented as a highly available, load-balanced cluster.

What Clustering Cannot Do

Having looked at the reasons for using clustering, it is important to be very clear about the following aspects of performance and reliability that cannot be improved through clustering.

Latency

You might notice that I consistently refer to messaging performance in this chapter in terms of throughput. This is intended to place emphasis on the most important performance scenario: large numbers of messages traveling from many producers to many consumers. Latency, in contrast, is best described as the minimum time it takes for one message to travel from producer to consumer in a system that otherwise has no load. Clustering cannot decrease latency. In fact, clustering will usually increase latency, since messages must typically visit several nodes in the process of being delivered. The performance benefit of clustering should instead be that latencies do not increase significantly when message throughput becomes very high.

As long as the latency of a message system is reasonable, there is seldom much benefit to be had by trying to decrease it. A securities trading application would usually require latencies to be much less than a second, but many messaging applications could actually tolerate quite high latencies. The important point to remember is that a latency of 1 second does not correspond to a throughput of one message per second; this is because the system does not need to wait for one message to be consumed before the next message may be produced. When writing benchmark programs to measure the performance of a messaging system, it is important to design the benchmark to measure throughput and not latency, as measuring latency will not be a good indicator of how the system performs under heavy load.
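To illustrate the difference, the following benchmark sketch measures throughput by timing a large batch of sends over the standard JMS API. How the ConnectionFactory is obtained (a JNDI lookup or a provider-specific class) is left to the caller, and the queue name here is made up.

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// A minimal throughput benchmark sketch: it reports messages per
// second over a large batch rather than the round-trip time of any
// single message.
public class ThroughputBenchmark {
    public static void run(ConnectionFactory factory) throws Exception {
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("BENCHMARK.QUEUE"); // hypothetical queue name
        MessageProducer producer = session.createProducer(queue);
        connection.start();

        int count = 100000;
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            producer.send(session.createTextMessage("payload " + i));
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println((count * 1000.0 / elapsed) + " messages/second");
        connection.close();
    }
}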

Systematic Failures

A cluster is often used to increase the reliability of a system. This reliability comes in the form of redundant nodes, so that when one node fails, another node can take over in its place. This provides protection from errors, such as hardware failures, that tend to occur at random times and are completely independent from one node to another. Systematic failures, on the other hand, are deterministic and reproducible. They are typically the result of programming errors or the failure to safeguard against unexpected occurrences. Clustering cannot provide protection against this type of failure.

Consider what happens if a node fails because it runs out of memory. A backup node could take over, but if the backup node has been mirroring the actions of the primary node that it replaces (and we assume it has the same amount of memory), then we can count on it running out of memory right away as well. Of course, there should be some means in place (other than clustering) to prevent the memory overflow; the point here is that clustering cannot compensate for programming deficiencies.

Byzantine Failures

Byzantine failures result when some process on the network runs amok and starts sending corrupt data throughout the system, or worse yet, when some agent maliciously tries to disrupt your system. The rule for systematic failure above holds here also: if one node in a cluster is susceptible to this type of failure, it is likely that all the nodes are, so the redundancy in the cluster will not provide much protection against this.


