13.3 Automation

Availability management aims to respond to anomalous events quickly enough that the damage is contained and repaired before the SLA criteria are missed. Humans typically need at least a few seconds to observe, analyze, and react to a situation.[24] At the speed of modern processors, a lot of damage can occur in a few seconds.

[24] As today's systems continue to improve, these anomalous events happen less and less frequently (a good thing). As a result, we tend to forget the proper procedure for responding to them (a bad thing). There is usually a written recovery procedure, but a site is almost guaranteed to have a lengthy outage if the operator first has to read up on what to do.

Automation refers to a program suite that performs the "observe, analyze, and react" processes at machine speed. At the heart of an automation tool is the automation engine, which gathers events, correlates information from various sources, and manages system state changes with scripts. The key value of an automation tool is the speed with which it recognizes a new system state (for example, "the database manager is down") and triggers execution of the predefined transition policies[25]. At the time of writing, automation tools cannot provide the recovery itself. Instead, they provide a framework into which you plug your own recovery (state change) processes, typically in the form of program scripts.

[25] Policy is one of those overused words. So far in this book we have used it exclusively to refer to a set of corporate directives that guide how IT is practiced in the company. Here, policy means one or more scripts that implement the procedures for responding to an event. The idea is that someone has already thought through which procedures must be followed when a given event occurs; loosely speaking, a policy on how to respond has been developed.
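To make the "observe, analyze, and react" cycle concrete, here is a minimal sketch of an automation engine in Python. It is not any vendor's product or API: the class names, resource names, and state strings are hypothetical. Events report a resource's state, the engine compares that report with the state it already knows (a deliberately simple form of correlation that ignores duplicate reports from different sources), and on a real transition it runs whatever policy script has been plugged in for that transition.

```python
# Hypothetical automation-engine sketch: all names are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str    # where the event was observed, e.g. a guest console
    resource: str  # the managed resource, e.g. "dbmgr"
    state: str     # the state the event reports, e.g. "down"


# A policy is the pluggable recovery (state change) script for one transition.
Policy = Callable[[Event], None]


class AutomationEngine:
    def __init__(self) -> None:
        self.states: Dict[str, str] = {}  # last known state per resource
        self.policies: Dict[Tuple[str, str, str], Policy] = {}
        self.log: List[str] = []

    def register(self, resource: str, old: str, new: str, policy: Policy) -> None:
        """Plug in the policy to run when `resource` moves from old to new."""
        self.policies[(resource, old, new)] = policy

    def observe(self, event: Event) -> None:
        """Analyze the event against the known state, then react."""
        old = self.states.get(event.resource, "unknown")
        if old == event.state:
            return  # same state reported again (perhaps by another source)
        self.states[event.resource] = event.state
        self.log.append(f"{event.resource}: {old} -> {event.state} (via {event.source})")
        policy = self.policies.get((event.resource, old, event.state))
        if policy is not None:
            policy(event)  # react at machine speed


if __name__ == "__main__":
    engine = AutomationEngine()
    engine.register("dbmgr", "up", "down",
                    lambda e: print(f"policy: restart {e.resource}"))
    engine.observe(Event("console", "dbmgr", "up"))
    engine.observe(Event("console", "dbmgr", "down"))  # runs the policy
    engine.observe(Event("monitor", "dbmgr", "down"))  # duplicate, ignored
    print("\n".join(engine.log))
```

Keying policies on the (resource, old state, new state) triple mirrors the idea of predefined transition policies: the engine itself knows nothing about recovery, it only decides which plugged-in script to run and when.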

Linux on the mainframe offers an ideal environment for managing availability because several automation facilities are already available. For example:

  • In z/VM, both CP and CMS have a broad range of commands to control one, all, or a subset of the guests.

  • REXX is a very powerful scripting language that runs in the CP and CMS environments.

  • A programmable operator facility (PROP in z/VM) attached to each guest's console lets REXX interact with a guest directly, rather than over a LAN interface.
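The programmable operator idea (watch a guest's console and respond to specific messages) can be sketched as a routing table. The Python below only mimics that idea; the real PROP facility is configured with its own routing table and action routines in CMS, typically written in REXX. The console message text, guest names, and the "start the standby guests" action are hypothetical. XAUTOLOG is a real CP command that logs on a virtual machine, but issuing it requires the appropriate CP privilege class, so the sketch passes commands to an injected `issue` callable (on a Linux guest this could, for instance, wrap the `vmcp` tool from s390-tools) instead of executing anything itself.

```python
# Hypothetical PROP-style routing sketch; message formats and guests are made up.
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RoutingRule:
    pattern: re.Pattern[str]                      # console text to watch for
    action: Callable[[re.Match[str]], List[str]]  # builds the CP commands


def start_standbys(standbys: Dict[str, List[str]]) -> Callable[[re.Match[str]], List[str]]:
    """Return an action that logs on every standby guest of the failing guest."""
    def action(m: re.Match[str]) -> List[str]:
        return [f"XAUTOLOG {guest}" for guest in standbys.get(m.group("guest"), [])]
    return action


def route(message: str, rules: List[RoutingRule],
          issue: Callable[[str], None]) -> int:
    """Match one console line against the rules in order; the first match
    wins and its commands go to `issue`. Returns the number issued."""
    for rule in rules:
        m = rule.pattern.search(message)
        if m:
            commands = rule.action(m)
            for command in commands:
                issue(command)
            return len(commands)
    return 0  # no rule matched: the message needs no automated response


if __name__ == "__main__":
    rules = [RoutingRule(re.compile(r"^(?P<guest>\w+): database manager stopped"),
                         start_standbys({"LINUXDB1": ["LINUXDB2"]}))]
    route("LINUXDB1: database manager stopped", rules, print)
```

Because the action receives the regular-expression match, one rule can target one guest, all guests in a group, or any subset the standby table lists, which is the same flexibility the CP and CMS commands provide.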

A number of automation tools are also available from system management vendors. See Chapter 25, "Systems Management Tools," for some other options.



Linux on the Mainframe
ISBN: 0131014153
Year: 2005
Pages: 199
