11.2 Leveraging Clusters | Software Fortresses: Modeling Enterprise Architectures

In Chapter 9 (General Fortress Issues) I discussed the overall best approach to both reliability and scalability. The basic lesson was to use clusters to scale out everything except the data strongbox, to scale up the data strongbox, and to design for reliability around the clusters. The design of component-level methods will go a long way to allowing your business application fortress to leverage clusters.

The golden rule (the method is the transaction) tells us that the method will be the unit of error control. In other words, if anything in the method fails to commit, everything in the method will fail to commit. If you call a method and it fails, it is safe to invoke it again. This principle assumes that all of the method's workload updates either transactional resources (resources that will ultimately be tied together into a temporary transactional partnership) or ephemeral resources (resources that are about to evaporate anyway and that nobody cares much about).

The idea that it is always safe to reinvoke a failed method is not the same as idempotency, in the sense that I discussed it in Chapter 10 (Internet Fortresses). Idempotent means that it is always safe to reinvoke even a successful method. Here we have a much weaker notion of reinvoking. It is safe to reinvoke only a failed method. One could also design business application component methods to be idempotent, but the cost/benefit ratio of an idempotent intrafortress request to a business application component would probably not be favorable. Idempotency across the Internet is an entirely different matter.
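The difference between the two notions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `accounts` table, the `transfer` method, and the `fail_midway` switch are hypothetical, and an in-memory SQLite database stands in for the transactional resource.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Hypothetical component method: the whole method is one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure inside the method")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    transfer(conn, "a", "b", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # the failed call committed nothing

transfer(conn, "a", "b", 30)    # reinvoking the failed method is safe
after_retry = balances(conn)

transfer(conn, "a", "b", 30)    # reinvoking a successful method is not
after_repeat = balances(conn)
```

After the failed call the balances are unchanged, so the retry moves the money exactly once. The third call shows why this is weaker than idempotency: repeating a method that already succeeded moves the money a second time.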

Methods that follow the golden rule and the rule for transactional integrity are naturally cluster friendly. I already discussed the use of clustering indirectly in Chapter 6 (Asynchronous Drawbridges); that is, clustering through use of the drawbridge as a cluster controller. Another possible approach is to use the guard as a cluster controller, as shown in Figure 11.5.

Figure 11.5. The Guard as Cluster Controller

There are two ways the guard can serve as cluster controller. The first is through technology that works in conjunction with the COMWare system to automatically provide clustering on your behalf. An example of such a technology is Microsoft's Application Center Server component load-balancing capability. The other way is through writing your own clustering algorithms, which is easier than it sounds. The advantage of the first approach is that you get a rich set of cluster management tools along with the necessary algorithms. The advantage of the second approach is that you get algorithms tailored to your specific needs.

Let me take you through the general algorithm used by the guard. Everything here assumes that you have followed both the golden rule and the rule for transactional integrity. The algorithm is independent of the type of drawbridge associated with the guard. This description of the algorithm assumes that the method being called by the guard is the method that defines the transaction boundary, as discussed earlier in this chapter. In this discussion I'll assume the configuration shown in Figure 11.5:

1. The guard receives the infogram.
2. The guard chooses a cluster member.
3. The guard makes a component method invocation on the remote process on that machine.
4. If the invocation is successful (including the implicit transaction), then the guard is happy and everything is done.
5. If the invocation is not successful, then the guard chooses another machine and reinvokes the method on the remote process on that machine.
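The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: `guard_dispatch`, `MemberDown`, and the `invoke` callable are hypothetical names, and the sketch assumes that `invoke` runs the whole component method as one transaction, so a failed call leaves nothing committed.

```python
import random

class MemberDown(Exception):
    """Hypothetical error raised when a cluster member cannot complete a call."""

def guard_dispatch(infogram, members, invoke):
    """Try cluster members in random order until one invocation succeeds.

    `invoke(member, infogram)` is assumed to follow the golden rule: the
    method is the transaction, so a failure commits nothing and the same
    request can safely be sent to another machine.
    """
    candidates = list(members)
    random.shuffle(candidates)               # choose a cluster member
    for member in candidates:
        try:
            return invoke(member, infogram)  # invoke; success means we are done
        except MemberDown:
            continue                         # failure: choose another machine
    raise RuntimeError("no cluster member could process the infogram")
```

The retry loop is only correct because a failed invocation is known to have committed nothing. Without the golden rule, the guard could not tell whether resending the request would apply part of the work twice.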

Various optimizations are possible. Rather than just choosing a random machine to receive the method invocation, the guard can constantly poll the different machines on the cluster to see which is most available. The guard can note that the unsuccessful machine is down and request that corrective action be taken. The important point here is that proper management of transaction boundaries and state yields a close synergy of methods, transactions, reliability, and scalability in the business application fortress.
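Both optimizations can be sketched together. The names here are assumptions made for illustration: `PollingGuard` is hypothetical, `poll_load` stands for whatever availability probe the guard has, and `ConnectionError` is used as the signal that a machine is down.

```python
class PollingGuard:
    """Sketch of a guard that prefers the least-loaded live cluster member."""

    def __init__(self, members, poll_load):
        self.members = list(members)
        self.poll_load = poll_load  # hypothetical probe: member -> current load
        self.down = set()           # machines noted as needing corrective action

    def choose(self):
        """Return the live member reporting the lowest load."""
        live = [m for m in self.members if m not in self.down]
        if not live:
            raise RuntimeError("all cluster members are marked down")
        return min(live, key=self.poll_load)

    def dispatch(self, infogram, invoke):
        """Invoke on the most available member, failing over when one is down."""
        while True:
            member = self.choose()  # raises once every member is marked down
            try:
                return invoke(member, infogram)
            except ConnectionError:
                self.down.add(member)  # note the failure, then try another
```

A real guard would also need some way to return a repaired machine to service; this sketch leaves that out.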

One final issue is worth discussing before I leave clusters for good. This is the age-old question of how much business logic to put in the components and how much to put in the database. All of the major database vendors support the equivalent of stored procedures. A stored procedure is a way of putting some or all of your business logic in the database. Stored procedures allow you to build components that are a mere veneer on top of the database. According to the database vendors, this is the preferred architecture for enterprise systems.

The main problem with this architectural approach is that it is cluster unfriendly. As I discussed in Chapter 9 (General Fortress Issues), databases do not scale out well. Currently, the only algorithms we have for scaling out databases use partitioned databases. Partitioned databases are databases with tables that span machines. Any partitioned database must be very carefully designed for partitioning. Even slight errors in partitioning parameters can have huge implications for how the databases will scale. It is also very difficult to reconfigure a bad configuration later.

Because each of the multiple machines housing the partitioned database contains a different subset of the data, they are not a true cluster. A true cluster is made up of identically configured and loaded machines. Because they are not a true cluster, the machines housing a partitioned database offer absolutely nothing in the way of reliability. If one machine does go down, it can't call on its siblings for backup.

Even the scalability of partitioned databases is questionable. One of the reasons a cluster is so easy to scale out is that more machines can easily be added at the drop of a hat. Because every machine is identically configured and loaded, the overall system doesn't care if there are four or ten machines; it can dynamically adjust its workload management algorithms as needed. This is not the case with partitioned databases. If you need to spread the database over a larger group of machines, you have a major, major reconfiguration project on your hands.
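A small worked example shows the scale of the problem. It assumes the simplest possible partitioning rule, in which a row lives on machine `key % machine_count`. That rule is my assumption for illustration only; real products use other schemes, some of which move less data, but all of them must relocate rows when the machine count changes.

```python
def partition_for(key, machine_count):
    """Return the index of the machine holding `key` (naive modulo scheme)."""
    return key % machine_count

def moved_keys(keys, old_count, new_count):
    """Return the keys that land on a different machine after resizing."""
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]

keys = range(1000)
moved = moved_keys(keys, old_count=4, new_count=5)
print(f"{len(moved)} of {len(keys)} keys change machines")  # 800 of 1000
```

Under this rule, growing from four machines to five relocates 80 percent of the rows. Adding a fifth machine to a loosely coupled cluster, by contrast, moves no data at all.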

Given the difficulties in using and administering partitioned databases, I recommend that they be avoided wherever possible. They do not have any of the qualities that make cluster architectures attractive.

Some people might be confused by my statements that database machines cannot use clusters effectively. There is another use of the term clusters that database machines do support. This is the idea that two (or more) machines are tightly coupled, with one (or more) machines acting as a redundant machine for the other. I refer to this type of cluster as a tightly coupled cluster, as opposed to the loosely coupled cluster that I was discussing earlier.

Tightly coupled clusters have few of the advantages of loosely coupled clusters. In fact, the only advantage of tightly coupled clusters is that they improve overall reliability of the database. They do so, however, at a much higher cost than loosely coupled clusters do. They also do so without any of the benefits of scale-out workload management.

This does not mean you don't want to use tightly coupled clusters. In fact, if you want a highly reliable database, you have no choice but to use them. These are the only types of clusters that databases support and the only way to achieve high reliability in a database. The point here is that you generally want to base as much of your overall workload as possible on loosely coupled clusters rather than tightly coupled clusters. You do that by moving as much of your workload as possible from the database machines, which support only tightly coupled clusters, to the component machines, which support loosely coupled clusters quite nicely.

What does this discussion have to do with using stored procedures? Well, if you can't scale out the database, your only alternative is to scale it up. Remember from Chapter 9 (General Fortress Issues) that scaling up means replacing your small, cheap machine with a big, expensive machine. Replacing machines, especially for databases, which by their very nature contain huge collections of data, is a time-consuming and difficult process. You want to avoid having to go through this exercise as much as possible.

How do you avoid the time-consuming machine switches required for scaling up your database? Simple. You keep the load on the database machines as light as possible. One way you keep the load light is by avoiding having those machines do anything they don't absolutely need to do. And one thing they don't absolutely have to do is execute stored procedures. By moving the business logic into components, you allow the databases to do the one thing they do well: store data.

Of course, you will still have to scale your business logic, but once you have organized your business logic into well-designed components with proper transaction boundaries and well-managed state, you have a good scale-out architecture for at least that part of the fortress. You will still need to scale up your database, but now that the database is nothing but a data storage engine, its scalability requirements have been dramatically reduced.

One cautionary note here: I am not against any use of stored procedures. Occasionally you will run into situations in which the business logic needs to chug through large volumes of data. Then the alternative to running the business logic in the database is transferring all of that data to a component machine. If the cost to the database machine of that data transfer exceeds the cost of running the business logic, then for that specific bit of business logic you may actually decrease the overall load on the database machine by running the logic as stored procedures. The decision as to whether or not to use stored procedures should be based on minimizing overall database load. If you can move any of the workload off the database machine, do so.
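The decision rule can be written down as a rough cost comparison. The function names and the per-row cost model below are hypothetical; the costs are in arbitrary units of database machine effort, and real figures would have to come from measurement.

```python
def database_load(rows, transfer_cost_per_row, logic_cost_per_row,
                  use_stored_procedure):
    """Estimate the load placed on the database machine (hypothetical model)."""
    if use_stored_procedure:
        return rows * logic_cost_per_row   # database runs the logic itself
    return rows * transfer_cost_per_row    # database ships rows to a component

def prefer_stored_procedure(rows, transfer_cost_per_row, logic_cost_per_row):
    """Return True only when the stored procedure loads the database less."""
    in_database = database_load(rows, transfer_cost_per_row,
                                logic_cost_per_row, use_stored_procedure=True)
    in_component = database_load(rows, transfer_cost_per_row,
                                 logic_cost_per_row, use_stored_procedure=False)
    return in_database < in_component
```

On a tie the sketch keeps the logic in the component, which matches the advice to move workload off the database machine whenever you can.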