6.6 Advantages of Asynchronous Drawbridges

Asynchronous drawbridges have four important benefits over synchronous drawbridges:

They have nonblocking workflow.
They give pseudoreliability.
They provide workload averaging.
They can be used to implement a poor-man's cluster.

I'll describe each of these in Sections 6.6.1 through 6.6.4.

6.6.1 Nonblocking Workflow

The nonblocking characteristic of asynchronous drawbridges is probably the most familiar. Once an infogram has been placed in the asynchronous drawbridge, the donor fortress can go on its merry way. No waiting. If it takes two hours for the recipient fortress to get around to receiving the infogram and another two hours for the work to complete, it shouldn't matter. If it does matter, then you probably should have been using a synchronous drawbridge.
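Here is a minimal sketch of that nonblocking handoff. It assumes an in-memory Python queue.Queue standing in for the asynchronous drawbridge and a thread standing in for the recipient fortress; the names drawbridge, donor_fortress, and recipient_fortress are illustrative only, and a real drawbridge would normally sit on a message queue product rather than on anything like this.

```python
import queue
import threading
import time

# Assumption: an in-memory queue stands in for the asynchronous drawbridge.
drawbridge = queue.Queue()

def recipient_fortress(results):
    """Take infograms off the drawbridge whenever the recipient gets to them."""
    while True:
        infogram = drawbridge.get()
        if infogram is None:          # sentinel: no more infograms
            break
        time.sleep(0.05)              # simulate slow processing
        results.append(infogram.upper())

def donor_fortress(infograms):
    """Place infograms on the drawbridge and return without waiting."""
    start = time.monotonic()
    for infogram in infograms:
        drawbridge.put(infogram)      # does not wait for processing
    return time.monotonic() - start   # time the donor spent handing off

results = []
recipient = threading.Thread(target=recipient_fortress, args=(results,))
recipient.start()

elapsed = donor_fortress(["order-1", "order-2", "order-3"])
print(f"donor finished handing off in {elapsed:.4f} seconds")

drawbridge.put(None)                  # tell the recipient to stop
recipient.join()
print("recipient eventually produced:", results)
```

The donor's handoff time is unrelated to how long the recipient takes to do the work, which is the whole point.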

6.6.2 Pseudoreliability

Wouldn't it be great if you could take off in the middle of the day whenever you wanted? The problem, of course, is that people expect you to be working in the middle of the day. But suppose you had a cardboard cutout of yourself that you could leave at your desk, so that everybody would assume you were working hard while, in reality, you were floating on a river in your kayak.

That is essentially what an asynchronous drawbridge does for a recipient fortress. The fortress can take off whenever it feels like it. Nobody can really tell that it's gone. As long as the fortress comes back online often enough to process its infograms in a reasonable time period, nobody knows if the fortress is at its desk or floating in the middle of a lake. The result is that even the most shiftless, irresponsible fortress in the world can be made to look like a fine, upstanding citizen. All that's required is an asynchronous front end!

6.6.3 Workload Averaging

If the whole world were composed of nothing but software fortresses, life would be grand and predictable. Unfortunately, it isn't. The world also has people. People, as any software architect knows, are the big problem.

One of the problems with people is that they clump together, like lemmings. They wake up at the same time. They go to work at the same time. They take their lunch breaks together. If they are women, they even go to the bathroom at the same time!

The result of all this clumping is huge variances in workload over time. At some times everybody in the world will seem to be beating on your poor, defenseless fortress. At other times your fortress will wonder if it is the last living being on Earth.

For most fortresses, especially those involved with the Internet, the difference between average workload and peak workload is large. I frequently find a tenfold difference between the two, meaning that if a fortress is asked to process, on average, 10,000 requests per minute, at peak it will be asked to process 100,000 requests per minute.

This huge difference between average and peak request flow can seriously affect the overall cost of a system. If your fortress is going to receive 100,000 requests per minute, you must make sure it can process 100,000 requests per minute. How will you do this? If these requests come over a synchronous drawbridge, you have only one choice. You must build your fortress big, really big.

The way you build a fortress really big is simple: You spend money, lots of money. In the best case, hardware costs tend to be linear with workload. Suppose that the hardware necessary to process 10,000 requests per minute costs 1 million dollars. You can assume, then, that the hardware needed to process 100,000 requests per minute will cost 10 million dollars.

Nobody minds spending a lot of money on hardware if that hardware is being used effectively. After all, when you process requests, you are presumably making money. What is annoying is spending a lot of money on hardware that then sits around doing nothing for much of the day.

This is exactly what your hardware will be doing if it is supporting a fortress accessed through synchronous drawbridges. On average, it will be processing the average workload (that's why they call it average!). On average, your very expensive hardware system will be utilized at only 10 percent capacity, and probably even less, since you need to allow some buffer to handle peak peaks. This means that if you're using synchronous drawbridges, on average you will be wasting at least 9 out of every 10 hardware dollars you spend.
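The arithmetic behind that claim can be sketched in a few lines. The figures are the ones used in this section (10,000 requests per minute on average, 100,000 at peak, 1 million dollars per 10,000 requests per minute); the function name hardware_cost and the strictly linear cost model are simplifying assumptions for illustration.

```python
def hardware_cost(requests_per_minute, cost_per_10k=1_000_000):
    """Best-case linear model: cost grows in proportion to workload."""
    return requests_per_minute / 10_000 * cost_per_10k

average_load = 10_000    # requests per minute, on average
peak_load = 100_000      # requests per minute, at peak

# A synchronous drawbridge forces you to buy for the peak.
sync_cost = hardware_cost(peak_load)

# But on average the hardware only sees the average load.
utilization = average_load / peak_load
wasted_fraction = 1 - utilization

print(f"hardware cost sized for peak: ${sync_cost:,.0f}")
print(f"average utilization: {utilization:.0%}")
print(f"share of hardware dollars idle on average: {wasted_fraction:.0%}")
```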

Building an asynchronous drawbridge to your fortress is far more cost-effective. Now you just need to buy enough hardware to process your average request flow. When the peak load hits, you just put off some of the processing until you reach the next lull. Sooner or later you will get it all done.
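To see the "put it off until the next lull" idea in numbers, here is a rough simulation. It assumes a made-up arrival pattern of 19 one-minute slots (one peak of 100,000 requests followed by 18 quiet minutes of 5,000 each, which averages out to 10,000 per minute) and a fortress sized to process exactly the average. The function simulate_backlog and the arrival pattern are hypothetical, chosen only to illustrate the effect.

```python
def simulate_backlog(arrivals, capacity):
    """Return the queue backlog at the end of each time slot.

    arrivals: number of requests arriving in each slot
    capacity: number of requests the fortress can process per slot
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        # Whatever cannot be processed in this slot waits in the queue.
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history

# Hypothetical pattern: one peak minute, then a long lull.
arrivals = [100_000] + [5_000] * 18
average_load = sum(arrivals) // len(arrivals)   # 10,000 per minute

history = simulate_backlog(arrivals, capacity=average_load)

print("largest backlog:", max(history))
print("backlog at end of lull:", history[-1])
```

In this sketch the backlog climbs to 90,000 requests right after the peak and is worked off completely during the lull. The price you pay is latency: the last requests from the peak wait many minutes, which is acceptable only when nobody is blocked waiting on the answer.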

The result is a big difference in hardware costs between fortresses accessed with synchronous versus asynchronous drawbridges. Fortresses accessed with synchronous drawbridges require expensive hardware platforms that are used lightly most of the time. Fortresses accessed with asynchronous drawbridges can be built on cheap hardware platforms pushed to their limits.

Expensive hardware or cheap hardware . . . guess which one the bean counters are going to prefer!

6.6.4 Poor-Man's Clustering

When you build a large commerce system, you hope more and more people will use your system because this is where your profit comes from. But success also means your system must be able to scale gracefully to handle the increased workload. How will it do this?

There are generally three approaches to planning for increases in workload. First, you can build your system big enough from the beginning to handle the maximum workload it will ever face. I call this approach build-big. Second, you can plan on replacing your system with a bigger system once you exceed the capacity of your existing system. Most people call this approach scale-up. Third, you can build your system from the beginning using a clustered architecture and then plan on adding more systems to the cluster as your appetite for processing power increases. This last approach is known as scale-out.

Let me digress for a moment to give you a quick introduction to clusters. A cluster architecture is one in which workload can be evenly distributed among a collection, or cluster, of machines. If one machine goes down, the other machines can take over the remaining workload. If workload exceeds the capacity of the cluster, new machines can be added to the cluster without existing machines having to be taken offline.

There is a clear preference in how to choose among build-big, scale-up, and scale-out.

Scale-out is always the first choice. This approach allows you to add processing power to your system with little or no disruption to existing workflow. It allows you to use much cheaper hardware. And it dramatically increases your overall system reliability by providing built-in redundancy.

When you can't scale out, the second choice is scale-up. This approach still allows you to build small and handle increased workload if it actually materializes.

Your last choice is to build big from the beginning. The build-big approach forces you to try to predict your maximum system workload before you have processed your first work request, and it requires you to spend a lot of money on hardware at a time when you neither need nor can afford it.

In reality, you will often be using all three approaches: scale-up, scale-out, and build-big.

The database hosting machine is most compatible with build-big. The existing cluster algorithms for databases are not very good, so scale-out is out. Taking a database offline for the many days that will probably be required to migrate your database to new hardware is not very appealing, so scale-up is out. The only option you are left with is build-big. So if you have a fortress with high demands from its data strongbox, such as some business application fortresses, spend your extra money on the machine hosting the database. Mitigate the situation by limiting the build-big architecture to one machine and one machine only: the one housing the data strongbox.

Scaling out a legacy fortress is difficult. A scale-out architecture is easiest when the fortress is built from the beginning with a cluster architecture. So legacy fortresses typically scale up, not out.

New fortresses should be built from the beginning to scale out. The fortress architecture is especially amenable to a cluster architecture, so take advantage of it!

One way you can build a fortress to scale out is to plan on using asynchronous drawbridges (whew, finally, back to the topic of this chapter!). Because asynchronous drawbridges are typically built through the use of message queues, you can take advantage of the fact that multiple processes in multiple machines can all be pulling messages off of the same named message queue.
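The idea of several consumers pulling from one named queue can be sketched as follows. This is only an illustration under loud assumptions: threads in a single Python process stand in for guard processes on separate machines, and an in-memory queue.Queue stands in for the shared message queue. The names guard and run_cluster are invented for this example.

```python
import queue
import threading

def guard(machine_name, drawbridge, processed, lock):
    """One guard per machine, all pulling from the same queue."""
    while True:
        infogram = drawbridge.get()
        if infogram is None:               # sentinel: shut this guard down
            break
        with lock:                         # protect the shared result list
            processed.append((machine_name, infogram))

def run_cluster(infograms, machine_count):
    """Process infograms with machine_count guards sharing one queue."""
    drawbridge = queue.Queue()             # stand-in for the message queue
    processed = []
    lock = threading.Lock()
    guards = [
        threading.Thread(
            target=guard,
            args=(f"machine-{i}", drawbridge, processed, lock),
        )
        for i in range(machine_count)
    ]
    for g in guards:
        g.start()
    for infogram in infograms:
        drawbridge.put(infogram)
    for _ in guards:
        drawbridge.put(None)               # one sentinel per guard
    for g in guards:
        g.join()
    return processed

processed = run_cluster(list(range(20)), machine_count=3)
print(len(processed), "infograms processed")
```

Each infogram is taken by exactly one guard, and adding capacity is a matter of raising machine_count. Which guard handles which infogram is not predictable, so this style suits work that does not depend on ordering.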

One simple architectural approach to clustering a business application fortress via asynchronous drawbridges is illustrated in Figure 6.5. Here we have a guard process and a business application component process, both running on the same machine. When the guard receives an infogram from the drawbridge, it makes a remote method invocation on the business application component that is running on the guard's machine (albeit in a different process). This basic machine configuration is then duplicated as necessary to form the cluster. The clustered machines share a single data strongbox implemented as a shared database on a dedicated machine.

Figure 6.5. Clustering through Asynchronous Drawbridges

Using asynchronous drawbridges to implement a clustered architecture is not a complete cluster solution. The message queues that are the basis for the asynchronous drawbridges do not support many of the administrative tools one needs to manage a cluster environment, which is why I call clusters implemented with asynchronous drawbridges poor-man's clusters. Nevertheless, in many fortress environments, asynchronous drawbridges will provide all the clustering you need.