13.1 Availability policy | Linux on the Mainframe

Most organizations have a service level agreement (SLA) with their end users, often in the form of an explicit contract. The intent of an SLA is to define what system resources will be provided, on what schedules and, possibly, even at what level of performance. For example, StoreCompany has the following SLA for its internal mail servers:

StoreCompany SLA

The mail server will be available from 7 a.m. to midnight, and no outage should last longer than 10 minutes.

These service level agreements become the availability policy that drives the availability management decisions for the various services.

However, because availability is now usually defined from the end users' perspective, there are some interesting secondary effects that can aid a system design based on Linux, z/VM, and the mainframe. The following examples show how both end users and the IT shop can benefit from properly crafted policy statements.

Based on the simple SLA for StoreCompany's mail server, even if that server is down between 3 a.m. and 5 a.m., service could still be considered 100% available because the scope of the availability is 7 a.m. to midnight. The system administrator can use the non-production time for change activities such as applying preventive maintenance to forestall problems. Not needing to apply changes during the defined availability window permits a less complicated system design, which in turn tends to improve the overall system reliability. Currently, most servers do not need to provide their service 100% of the time in order to meet the requirements of end users.
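The reasoning above can be checked with a little arithmetic. The following is a minimal sketch, not part of StoreCompany's actual tooling: it assumes the SLA is encoded as a daily service window of 07:00 to 24:00 plus a 10-minute limit on any single outage, and that each outage starts and ends on the same calendar day. The names `in_window_downtime` and `sla_report`, and the sample date, are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed encoding of the StoreCompany mail server SLA:
# service window 07:00 to midnight, no single outage longer than 10 minutes.
WINDOW_START_HOUR = 7
WINDOW_END_HOUR = 24
MAX_OUTAGE = timedelta(minutes=10)


def in_window_downtime(start, end):
    """Return the part of the outage [start, end) that falls inside the
    service window of the day on which the outage starts.

    Simplifying assumption: the outage does not cross midnight.
    """
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    window_start = day + timedelta(hours=WINDOW_START_HOUR)
    window_end = day + timedelta(hours=WINDOW_END_HOUR)
    overlap = min(end, window_end) - max(start, window_start)
    return max(overlap, timedelta(0))


def sla_report(outages):
    """Evaluate one day's outages, given as (start, end) datetime pairs.

    Returns (availability in percent over the service window,
             True if any single outage exceeded the 10-minute limit).
    """
    window = timedelta(hours=WINDOW_END_HOUR - WINDOW_START_HOUR)
    counted = [in_window_downtime(s, e) for s, e in outages]
    total = sum(counted, timedelta(0))
    availability = 100.0 * (1 - total / window)
    violated = any(d > MAX_OUTAGE for d in counted)
    return availability, violated


# Maintenance between 3 a.m. and 5 a.m. lies outside the service window.
night = (datetime(2024, 1, 8, 3, 0), datetime(2024, 1, 8, 5, 0))
print(sla_report([night]))

# A 15-minute outage at 9 a.m. counts against availability and breaks
# the 10-minute limit.
morning = (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 15))
print(sla_report([morning]))
```

Under these assumptions, the two-hour night outage leaves the measured availability untouched, while a much shorter outage during the day both lowers availability and violates the outage limit. What matters is when the server is down, not only for how long.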

How often have you entered a URL, seen some activity at the bottom of the Web browser, and then watched the activity stall? Most of us simply try again. We do not say that the server is unavailable if we manage to connect after a few tries. In truth, during that brief period the Web server might have been down. The proxy server, noticing that one server was down, then routed all new requests (for example, our second attempt) to other servers that were still functioning properly. Because the user does not require the service from a particular server, the IT shop can use this less expensive design involving the proxy server to meet its availability requirements (see also 11.4, "High availability for the ISPCompany example").
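The routing behavior described above can be illustrated with a small model. This is only a sketch under stated assumptions: it does no networking and does not represent any particular proxy product. The `Proxy` class, its methods, and the server names are hypothetical, and the round-robin selection is one possible policy chosen for simplicity.

```python
class Proxy:
    """Toy model of a proxy that routes new requests only to servers
    currently believed to be up (hypothetical, no real networking)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next round-robin candidate

    def mark_down(self, server):
        """Record that a health check found the server unavailable."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record that a known server is available again."""
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server in round-robin order,
        or None if every server is down."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.healthy:
                return candidate
        return None


proxy = Proxy(["web1", "web2", "web3"])
proxy.mark_down("web2")  # one Web server fails
print([proxy.route() for _ in range(4)])
# prints ['web1', 'web3', 'web1', 'web3']
```

From the user's point of view, a retry simply lands on a different server. The service stays available even though one server is not, which is why the SLA can be met without making every individual server highly available.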

Managing availability is a juggling act: providing service to end users at a level that satisfies them, as defined by the SLA, without overspending. Providing higher levels of availability than required may, in fact, be a waste of valuable corporate resources.