Availability is an attribute of an entire system and is always measured from the user's perspective. High availability is a difficult term to define precisely. Most people agree that it is an attribute of a system whereby the system is available to its target users almost all the time, starting at a minimum of 99.99% availability. That is, a particular server configuration is said to be "highly available" if the combined planned and unplanned outage time in a year is less than one hour. All the pieces of a high-availability solution must be simultaneously available, as shown in Figure 11-1.
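The arithmetic behind treating 99.99% as "less than one hour of outage a year" can be sketched as follows. This is a minimal illustration; the function name is ours, not part of any standard library or tool mentioned in this chapter:

```python
# Yearly downtime allowance implied by an availability percentage.
# 99.99% of a year leaves roughly 52.6 minutes for all planned and
# unplanned outages combined, which is where the "under one hour"
# rule of thumb comes from.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly outage time (planned + unplanned) for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_minutes_per_year(nines):.1f} minutes/year")
```

Running the loop makes the jump between each additional "nine" concrete: every extra digit of availability cuts the permitted outage time by a factor of ten.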
Figure 11-1. A typical two-tier application solution
High availability does not imply that there are never any failures within the system. Instead, the design of a high-availability system accepts that failures occur and focuses on providing continuing service in spite of them. So, besides preventing failures by using robust components, a design for high availability also requires that recovery from a failure be fast enough that the failure is not visible to the user as an outage.
Building a high-availability application server is a complex process that starts with the definition of the level of service required. Typically, the analysis includes a list of the sources of potential solution failure and evaluates the cost/benefit trade-offs of various recovery designs for those failures. The higher the availability required, the more it is likely to cost in development, deployment, and ongoing operation. Starting with the right components improves the likelihood of success at an acceptable cost.
A system is only as good as its weakest part. Linux and the mainframe provide only a strong foundation on which to construct a high-availability solution. Much of the effort in building such a solution goes into the application software, the middleware, and the management tools that are part of the system. Here we focus on what Linux and the mainframe can contribute to an overall high-availability solution.
zSeries and z/VM, having evolved over decades, are inherently reliable and come with a rich set of tools. Linux, despite being only about a decade old, has earned a reputation for availability and is vigorously growing a tool set that helps the rest of the software stack construct a high-availability environment. The current state of the art with Linux still falls short of allowing the easy configuration of application servers that meet a requirement as stringent as less than one hour of downtime a year. Even so, the existing Linux tool set, together with the capabilities of zSeries and z/VM, allows you to create application servers where users will not perceive a service outage even if a processor, a disk, or the Linux image itself were to fail.
The cost of configuring for high availability is directly related to which risks of failure you want your system to be able to survive. As you can well imagine, it is a lot harder to ensure availability across an earthquake, explosion, or other major catastrophe than across a hardware or application failure. Is it enough to have two independent connections to the Internet backbone? Our example ISPCompany did not think so. They installed four separate backbone connections from four different suppliers. They also saw two independent failure risks: the bankruptcy of a supplier and off-site damage to the fiber connection. Furthermore, they needed to ensure that they had the required bandwidth in case of failure.
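The value of such redundancy can be estimated with elementary probability, assuming the links fail independently. The sketch below uses made-up per-link availability figures purely for illustration; they are not ISPCompany's actual numbers:

```python
# Availability of redundant versus chained components, under the
# simplifying assumption that failures are independent.

def parallel_availability(per_link: float, n_links: int) -> float:
    """Service survives unless every redundant link fails at once."""
    return 1 - (1 - per_link) ** n_links

def serial_availability(*components: float) -> float:
    """All components of a multi-tier solution must be up simultaneously."""
    result = 1.0
    for a in components:
        result *= a
    return result

# Hypothetical 99% availability per backbone link:
print(f"2 links: {parallel_availability(0.99, 2):.8f}")
print(f"4 links: {parallel_availability(0.99, 4):.8f}")

# Conversely, chaining tiers multiplies their unavailability:
print(f"two 99.9% tiers in series: {serial_availability(0.999, 0.999):.6f}")
```

The two functions capture the two sides of the design problem: redundancy (parallel paths) drives availability up, while every additional component that must be simultaneously available (the serial case, as in a multi-tier application) drags it down. Note that the independence assumption is exactly what ISPCompany questioned: a single supplier bankruptcy or a shared fiber conduit would make two "redundant" links fail together.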
To provide a high-availability solution, your design team needs to clearly understand the risk-versus-cost trade-offs that are important to your business. An application solution design is complex and highly dependent on the particular situation. This chapter focuses on the unique contributions that Linux on the mainframe can make to your high-availability design.