Practical Service Level Management: Delivering High-Quality Web-Based Services


Summary

Real-time operations comprise a set of key functions that must operate within tight time constraints. Information flows into the real-time operations system from the instrumentation manager (the source of alert data) and from the SLA statistics modules, which provide time-sliced measurements of performance. The real-time operations system processes these inputs in an attempt to increase the mean time between failures (MTBF), possibly by using proactive techniques to predict failures. At the same time, it helps the operations staff decrease the mean time to repair (MTTR) when a failure does occur.
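Why both goals matter can be made concrete with the commonly used steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function name and the sample hour values are assumptions, not figures from the text.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up.

    Assumes failures and repairs alternate, with average uptime MTBF
    and average repair time MTTR.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: a failure every 500 hours, 4 hours to repair.
baseline = availability(500, 4)
# Proactive management raises MTBF; reactive management lowers MTTR.
longer_mtbf = availability(1000, 4)
shorter_mttr = availability(500, 1)

for label, value in [("baseline", baseline),
                     ("double MTBF", longer_mtbf),
                     ("MTTR 1h", shorter_mttr)]:
    print(f"{label:12s} {value:.4%}")
```

With these assumed numbers, cutting MTTR from 4 hours to 1 hour raises availability more than doubling MTBF does, which is one reason fast reactive management gets so much attention.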

Reactive management, used to decrease MTTR, relies on triage and root-cause analysis. Triage tries to identify the responsible organization very quickly, in the hope that its staff can apply their specialized tools and knowledge to fix the problem. Root-cause analysis is a more detailed, technically intensive process that assists in diagnosing the problem itself.
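Triage can be pictured as a fast lookup from the symptom to the group that owns the failing piece, deliberately stopping short of diagnosis. In this minimal sketch the component naming scheme, the ownership table, and the fallback group are all hypothetical.

```python
# Hypothetical ownership table: which organization is responsible
# for each class of managed component.
OWNERS = {
    "network": "Network Operations",
    "database": "Database Administration",
    "web": "Web Application Team",
    "storage": "Storage Team",
}

DEFAULT_OWNER = "Service Desk"  # hypothetical catch-all group


def triage(alert: dict) -> str:
    """Route an alert to a responsible group without diagnosing it.

    The alert is assumed to carry a 'component' field such as
    'database:orders-db'; only the prefix before ':' is used.
    """
    component = alert.get("component", "")
    kind = component.split(":", 1)[0]
    return OWNERS.get(kind, DEFAULT_OWNER)


print(triage({"component": "database:orders-db", "severity": "critical"}))
print(triage({"component": "dns:resolver-2"}))
```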

Root-cause analysis uses sophisticated methods of filtering and correlating input data, possibly combined with a model of the system being managed, to make reasonable suggestions about the cause of a performance problem.
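One common way to combine correlation with a system model is to walk a dependency graph: an alarming component that does not depend, directly or indirectly, on any other alarming component is a likely root cause, while alarms on components that depend on it are probably symptoms. The sketch below illustrates that idea under stated assumptions; the topology and component names are hypothetical, and real products typically add richer models and timing windows.

```python
from typing import Dict, List, Set

# Hypothetical dependency model: each component lists what it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-frontend": ["app-server"],
    "app-server": ["database", "auth-service"],
    "auth-service": ["database"],
    "database": ["storage-array"],
    "storage-array": [],
}


def depends_transitively(component: str, target: str) -> bool:
    """True if component depends on target directly or indirectly."""
    stack = list(DEPENDS_ON.get(component, []))
    seen: Set[str] = set()
    while stack:
        dep = stack.pop()
        if dep == target:
            return True
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, []))
    return False


def probable_root_causes(alarming: Set[str]) -> Set[str]:
    """Keep only alarming components that depend on no other alarming one."""
    return {
        c for c in alarming
        if not any(depends_transitively(c, other)
                   for other in alarming if other != c)
    }


alarms = {"web-frontend", "app-server", "auth-service", "database"}
print(sorted(probable_root_causes(alarms)))
```

Here four alarms collapse to a single suggestion, the database, because every other alarming component sits downstream of it.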

Active responses can then be used to handle routine problems or even predicted problems so that system operators can concentrate on more complex issues.
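A simple way to picture active responses is a table that maps routine, well-understood alert types (including predicted ones, such as a disk that is nearly full) to automated actions, and escalates everything else to a person. In this sketch the alert types and action functions are hypothetical placeholders that only report what a real action would do.

```python
from typing import Callable, Dict


def restart_service(alert: dict) -> str:
    # Placeholder: a real action would invoke the platform's restart mechanism.
    return f"restarted {alert['target']}"


def purge_temp_files(alert: dict) -> str:
    # Placeholder for a routine cleanup that heads off a predicted failure.
    return f"purged temp files on {alert['target']}"


# Hypothetical mapping of routine alert types to automated responses.
ACTIVE_RESPONSES: Dict[str, Callable[[dict], str]] = {
    "process_down": restart_service,
    "disk_nearly_full": purge_temp_files,
}


def respond(alert: dict) -> str:
    """Handle routine alerts automatically; escalate anything else."""
    action = ACTIVE_RESPONSES.get(alert["type"])
    if action is None:
        return f"escalated {alert['type']} on {alert['target']} to operators"
    return action(alert)


print(respond({"type": "disk_nearly_full", "target": "web-03"}))
print(respond({"type": "latency_spike", "target": "checkout"}))
```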



Chapter 7. Policy-Based Management

Managing services in compliance with a Service Level Agreement (SLA) places greater demands on both the management system and the staff. Stiffer penalties for noncompliance increase the pressure to respond quickly and accurately, even as the environment grows more dynamic and complex. Often, more sophisticated automation than that described in Chapter 6, "Real-Time Operations," is needed to relieve and supplement overworked staff members. Toward that end, this chapter covers the following:

  • Policy-based management

  • The need for policies

  • The policy architecture

  • Policy design

  • Examples of products



Policy-Based Management

Automation is a key attribute of an effective Service Level Management (SLM) system. Stringent SLA compliance criteria shrink the time cushion that administrators once had. One of the compliance criteria mentioned in Chapter 2, "Service Level Management," is a demand for higher availability. If management staff members are left to cope with high rates of change and growing complexity on their own, resolution times become unacceptable. Automated management tasks are the only way to gain speed and to cope with that complexity.

Note, however, that automated tasks also concern administrators, because they take actions and make changes faster than humans can follow. A policy-based management system is an attempt to leverage automation while constraining its actions.
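One way to constrain automation is to place a guard in front of every automated action, for example capping how many times it may run within a time window and handing control back to an administrator once the cap is reached. The class below is a hypothetical illustration of that idea, not a mechanism the text prescribes; the limit and window values are assumptions.

```python
import time
from collections import deque
from typing import Deque, Optional


class ActionGuard:
    """Allow at most `max_actions` automated actions per `window_s` seconds."""

    def __init__(self, max_actions: int, window_s: float) -> None:
        self.max_actions = max_actions
        self.window_s = window_s
        self._timestamps: Deque[float] = deque()

    def permit(self, now: Optional[float] = None) -> bool:
        """Return True and record the action if it is within the limit."""
        now = time.monotonic() if now is None else now
        # Forget actions that have fallen out of the sliding window.
        while self._timestamps and now - self._timestamps[0] >= self.window_s:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_actions:
            return False  # over the limit: defer to a human administrator
        self._timestamps.append(now)
        return True


# Hypothetical policy: at most 3 automated restarts per 10 minutes.
guard = ActionGuard(max_actions=3, window_s=600)
decisions = [guard.permit(now=t) for t in (0, 60, 120, 180, 700)]
print(decisions)
```

The fourth request is refused because three actions already fall inside the window; by the fifth, the older actions have aged out and automation may resume.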

Policies are sets of rules that define and constrain the actions the management system takes in different situations. Table 7-1 shows the various levels of rules that might be involved in a policy-based system. The rules are defined from the business level downward. Each rule level supports the goals of the levels above and depends on lower levels to achieve those goals.

Table 7-1. Multi-Level Rules

Level                      Focus
Business rules             Business goals, such as protecting revenue
Service rules              Defining service quality metrics for end-user services
Infrastructure rules       Defining service quality metrics for infrastructure services
Element rules              Defining quality metrics for elements
Management system rules    For internal tasks, such as monitoring
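The "supports the levels above, depends on the levels below" relationship can be modeled by giving each rule a link to the higher-level rule it serves, so that any low-level rule can be traced back to the business goal that justifies it. The class and the rule texts below are hypothetical examples loosely based on Table 7-1.

```python
from dataclasses import dataclass
from typing import List, Optional

# Levels from Table 7-1, ordered from the top (business) downward.
LEVELS = ["business", "service", "infrastructure", "element", "management"]


@dataclass
class Rule:
    level: str
    text: str
    supports: Optional["Rule"] = None  # higher-level rule this one serves

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level}")
        # A rule may only support a rule from a higher level.
        if self.supports is not None and (
            LEVELS.index(self.supports.level) >= LEVELS.index(self.level)
        ):
            raise ValueError("a rule must support a higher-level rule")

    def trace(self) -> List[str]:
        """List this rule and every rule above it, ending at the top."""
        chain: List[str] = []
        rule: Optional[Rule] = self
        while rule is not None:
            chain.append(f"{rule.level}: {rule.text}")
            rule = rule.supports
        return chain


business = Rule("business", "Protect online-order revenue")
service = Rule("service", "Keep checkout response time under 2 seconds", business)
infrastructure = Rule(
    "infrastructure",
    "Add servers behind the load-balancing switch when checkout slows",
    service,
)
element = Rule("element", "Configure each new server to join the checkout pool",
               infrastructure)

for line in element.trace():
    print(line)
```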


For example, infrastructure rules might involve establishing special routes for low-latency network traffic or allocating more servers behind a load-balancing switch. Those infrastructure rules in turn depend on proper element configurations.

Management system rules govern internal management processes, such as monitoring. Monitoring rules specify targets, polling and heartbeat frequencies, alert thresholds, and the steps to take when the instrumentation system itself fails.
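A management system rule for monitoring can be expressed as a small configuration record holding the items listed above: the target, polling and heartbeat intervals, alert thresholds, and what to do if the instrumentation fails. The field names, values, and the "two missed heartbeats" tolerance below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitoringRule:
    target: str
    poll_interval_s: int
    heartbeat_interval_s: int
    thresholds: Dict[str, float]  # metric name -> alert threshold
    on_instrumentation_failure: List[str] = field(default_factory=list)

    def check(self, sample: Dict[str, float]) -> List[str]:
        """Return the metrics in `sample` that exceed their thresholds."""
        return [m for m, limit in self.thresholds.items()
                if m in sample and sample[m] > limit]

    def heartbeat_missed(self, seconds_since_last: float, tolerance: int = 2) -> bool:
        """Treat the agent as failed after `tolerance` missed heartbeats."""
        return seconds_since_last > tolerance * self.heartbeat_interval_s


rule = MonitoringRule(
    target="checkout-web-pool",
    poll_interval_s=60,
    heartbeat_interval_s=30,
    thresholds={"response_time_ms": 2000, "error_rate_pct": 1.0},
    on_instrumentation_failure=["switch to backup agent", "notify operations"],
)

print(rule.check({"response_time_ms": 2500, "error_rate_pct": 0.2}))
if rule.heartbeat_missed(seconds_since_last=75):
    for step in rule.on_instrumentation_failure:
        print("instrumentation failure step:", step)
```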

Many policies are activated when a potential or actual service disruption is passed along by an alert from the real-time event manager. Other policies are activated in response to changes in the SLA statistics.
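Both kinds of activation fit the common event-condition-action pattern: a policy names the kind of event it listens for, a condition over the event's data, and an action to take. The sketch below is a hypothetical illustration; the event kinds, the 2,000 ms threshold, and the actions (including adding a server behind the load balancer, echoing the infrastructure example earlier) are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    kind: str  # "alert" from the real-time event manager, or "sla_stat"
    data: dict


@dataclass
class Policy:
    name: str
    trigger: str                       # event kind that activates the policy
    condition: Callable[[dict], bool]
    action: Callable[[dict], str]


POLICIES: List[Policy] = [
    Policy(
        name="failover-on-outage",
        trigger="alert",
        condition=lambda d: d.get("severity") == "critical",
        action=lambda d: f"fail over {d['service']} to standby site",
    ),
    Policy(
        name="scale-on-slow-response",
        trigger="sla_stat",
        condition=lambda d: d.get("p95_response_ms", 0) > 2000,
        action=lambda d: f"add a server behind the load balancer for {d['service']}",
    ),
]


def evaluate(event: Event) -> List[str]:
    """Run the action of every policy whose trigger and condition match."""
    return [p.action(event.data) for p in POLICIES
            if p.trigger == event.kind and p.condition(event.data)]


print(evaluate(Event("sla_stat", {"service": "checkout", "p95_response_ms": 2600})))
print(evaluate(Event("alert", {"service": "checkout", "severity": "warning"})))
```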

Policy-based management has a learning curve. Simple policies save staff time and effort and are usually implemented first. More sophisticated policies are implemented as the management team gains experience and learns how to extend policies more deeply into business processes and to more areas in the managed environment.

Policy-based management is the systematic creation of policies that drive the management system to maintain the highest service quality.