Clustering is the act of running multiple instances of a component or application that transparently work together to provide services. Clustering is an enterprise-wide phenomenon, not one limited to the Java world. When developing J2EE applications, for example, vendors provide the capability to cluster the application servers so that services such as EJB, JNDI, and Web Components can be made highly available. Then when a client or customer requests these services, they will be there.
This is exactly the same behavior that some users require of their Quartz applications. Users want to build and set up Quartz applications so that when a job absolutely needs to be executed, it gets executed. As the popularity of your Quartz application grows and an increasing demand is placed on it, a cluster of Quartz applications will provide better peace of mind that you'll be able to handle that demand and ensure that all goes as planned. And you get all of this with very little effort to set up and maintain.
The Benefits of Clustering Quartz Applications
Clustering Quartz applications provides two key benefits over nonclustered environments:
High Availability
A highly available application is one that can service clients a high percentage of the time. In some cases, this might mean 24 hours a day, 7 days a week. For other applications, it might just mean "most of the time." Availability is usually expressed as a percentage between 0 and 100. An application might fail often and still achieve high availability, whereas an application that goes down only once but stays down for a long time has low availability. What counts is not how many times the application goes down, but the total amount of downtime. Obviously, as developers, we hope our applications never fail. But failures do happen, and you must be prepared for them.
The level of availability for hardware and software is sometimes referred to in levels of nine, which indicate how many nines appear in the availability percentage. For example, 99.999 percent is said to have five levels of nine because there are five nines. Table 11.1 shows the approximate amount of downtime per year for each level.
| Availability | Approximate Hours of Downtime Per Year |
|---|---|
| 99% | 87.6 hours |
| 99.9% | 8.8 hours |
| 99.99% | 0.9 hours |
| 99.999% | 0.09 hours (about 5 minutes) |
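The numbers in Table 11.1 follow directly from the definition: downtime per year is (1 - availability) multiplied by the 8,760 hours in a year (365 x 24). A quick sketch of the arithmetic:

```java
public class DowntimeCalculator {
    public static void main(String[] args) {
        double hoursPerYear = 365 * 24; // 8,760 hours
        double[] levels = {0.99, 0.999, 0.9999, 0.99999};
        for (double availability : levels) {
            double downtimeHours = (1 - availability) * hoursPerYear;
            System.out.printf("%.3f%% available -> %.2f hours of downtime per year%n",
                    availability * 100, downtimeHours);
        }
    }
}
```

Four nines, for example, works out to (1 - 0.9999) x 8,760, or about 0.88 hours (roughly 53 minutes) per year, which the table rounds to 0.9 hours.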
Looking at Table 11.1, you might conclude that four levels of nine (about an hour of downtime per year) is an awesome amount of availability, and, in general, that's true. However, if the application was a Quartz application designed to send out invoices, and it was down for five minutes every day for 12 straight days when the invoices were supposed to go out, the business would lose a lot of revenue, and you would probably be looking for a new job (and I'm not talking about the Quartz kind of job, either). It's not just about the amount of downtime; it's also about when that downtime strikes.
Part of what makes high availability possible is the concept of failover. Failover ensures that even if a system failure occurs, other redundant components or services can handle the requests and insulate the clients (or jobs) from the failures. The capability to fail over from a failed component or service to another functioning one increases the availability of the application. The switch or failover should be transparent.
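In Quartz, job-level failover hinges on the recovery flag of JobDetail: when a Scheduler instance dies mid-execution, another instance in the cluster re-executes any of its in-progress jobs that were marked as recoverable. A minimal sketch in the Quartz 1.x style, using a hypothetical InvoiceJob class:

```java
import org.quartz.Job;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// Hypothetical job, used here only for illustration
public class InvoiceJob implements Job {
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // ... send out the invoices ...
    }

    public static JobDetail buildJobDetail() {
        JobDetail jobDetail = new JobDetail("invoiceJob", "billing", InvoiceJob.class);
        // If the node running this job fails mid-execution, a surviving
        // node in the cluster will re-execute (recover) the job.
        jobDetail.setRequestsRecovery(true);
        return jobDetail;
    }
}
```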
Scalability
Scalability is the capability to dynamically add new resources, such as hardware, to the application environment in order to increase the application's capacity. In a scalable application, achieving this increase in capacity does not involve changing code or the design.
Achieving scalability is not done with magic. An application must be designed properly from the beginning; supporting extra capacity usually takes administrative effort, such as adding new hardware (memory, for example) or starting more instances of the application.
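In Quartz, that administrative effort usually amounts to starting another Scheduler instance against the same database with clustering enabled in quartz.properties. A sketch of the relevant settings (the data source name myDS and the check-in interval are illustrative values):

```properties
# Every node in the cluster shares the same scheduler name
org.quartz.scheduler.instanceName = MyClusteredScheduler
# AUTO generates a unique instance ID for each node
org.quartz.scheduler.instanceId = AUTO

# Clustering requires a JDBC JobStore (the RAMJobStore cannot be clustered)
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.isClustered = true
# Milliseconds between cluster "check-ins"; used to detect failed nodes
org.quartz.jobStore.clusterCheckinInterval = 20000
```

Scaling out is then just a matter of starting one more JVM with this same configuration; no code or design changes are needed.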
Load Balancing
As part of achieving good scalability, the capability to distribute work across the nodes in the cluster is very important. Spreading out work ensures that each node in the cluster is carrying its share of the workload. Imagine if all the work were given to one node in the cluster while the other nodes remained idle: eventually, the overworked node would be unable to handle the increased load, and work would start to fail.
In the best scenario, work is spread evenly across all instances in the cluster. Several different algorithms can be used to distribute the work, including random, round-robin, and weighted round-robin, just to name a few.
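To make the idea concrete, here is a minimal sketch of a round-robin strategy; the node list is hypothetical, and this is not Quartz's own mechanism, which is described next:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector: hands out cluster nodes in strict
// rotation so that work is spread evenly across all instances.
public class RoundRobinBalancer {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> nodes) {
        this.nodes = nodes;
    }

    public String nextNode() {
        // Math.abs guards against the counter eventually wrapping negative
        int index = Math.abs(next.getAndIncrement() % nodes.size());
        return nodes.get(index);
    }
}
```

A random strategy simply replaces the rotating counter with a random index; a weighted round-robin gives faster nodes proportionally more turns in the rotation.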
Currently, Quartz provides a minimal load-balancing capability using a random algorithm. Each Scheduler instance in the cluster attempts to fire scheduled triggers as quickly as it can. The Scheduler instances compete (using database locks) for the right to execute a job by firing its trigger. Once a trigger has been fired, no other Scheduler will attempt to fire that particular trigger until its next scheduled firing time. This mechanism works better than its simplicity might suggest, because the Scheduler that is busiest is also the one least likely to win the race for the next trigger. Hence, it's possible to achieve something close to a true balancing of the load.