Different factors are driving service providers and enterprises to consider implementing policy-based management systems. These factors include the following:
Table 7-2 illustrates the major differences between the service and element policy domains. For example, high levels of redundancy can absorb element failures without substantially degrading service quality. More sophisticated policies are needed for managing a complex and dynamic service environment.
The next two sections discuss management policies for elements and for entire services.
Management Policies for Elements
Early management policies were associated with managing elements within the various infrastructures. They were vendor-specific for the most part and dealt with relatively simple situations. For example, a network switch could have a policy that says, "If any port has a utilization level greater than this threshold, send an alert to the element manager." A more complex policy might add local actions, such as, "If the broadcast traffic on any port exceeds the threshold, disable the port and send an alert."
Simple policies are not limited to network elements. A policy applied to servers, for example, could specify that if a process dies, the management system should send an alert and restart the process. If that fails, the management system should try three more times and then reboot the server while sending another alert.
Management policies for elements are good for speeding up many management responses and for preventing staff mistakes. Management staff involvement is needed only if the policy actions fail to restore service levels.
Most element management policies are configuration-centric because they define specific configuration information for each element to satisfy higher-level rules. Different vendors have their own unique ways of setting operational parameters, which makes this job even harder if staff members are forced to remember all the vendor-specific details. Some companies, such as MetaSolv Software, have created products that manage equipment from a range of vendors.
Policies give large environments with many devices, sites, and users a consistent way to handle element configuration. This approach scales gracefully as the environment grows. In addition, staff are freed from element-specific details and are involved only if a policy fails.
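The two element policies quoted above (disabling a port on excess broadcast traffic, and restarting a dead process before rebooting) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the callback parameters (`disable`, `restart`, `reboot`, `alert`), and the alert wording are hypothetical and do not correspond to any vendor's element manager.

```python
from typing import Callable


def check_port_broadcast(
    port: str,
    broadcast_pct: float,
    threshold_pct: float,
    disable: Callable[[str], None],
    alert: Callable[[str], None],
) -> bool:
    """Disable the port and alert if broadcast traffic exceeds the threshold."""
    if broadcast_pct > threshold_pct:
        disable(port)
        alert(f"port {port} disabled: broadcast traffic above threshold")
        return True
    return False


def handle_process_death(
    restart: Callable[[], bool],
    reboot: Callable[[], None],
    alert: Callable[[str], None],
    extra_retries: int = 3,
) -> str:
    """Alert, try to restart the process, and reboot if every attempt fails.

    `restart` is assumed to return True when the process comes back up.
    """
    alert("process died; attempting restart")
    # One initial attempt plus the three further attempts from the text.
    for _ in range(1 + extra_retries):
        if restart():
            return "restarted"
    # Every restart attempt failed: send another alert and reboot the server.
    alert("restart failed; rebooting server")
    reboot()
    return "rebooted"
```

Passing the actions in as callbacks keeps the policy logic separate from the vendor-specific commands that carry it out, which is the separation the text describes.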
While freeing administrators from a plethora of low-level decisions and reducing the likelihood of error is attractive, it is important to remember that the best results are obtained when policy management has unambiguous input. Elements that have very clear management instrumentation and a limited set of configuration options are the best candidates for applying automated policies. Conversely, elements such as high-end operating systems, application servers, and other parts of the service delivery architecture don't always expose their management information clearly. This situation makes automated decisions less clear-cut. The multiple layers of complexity inside some elements, such as servers, also make tuning them a challenge. The pressure is on policy designers to incorporate those subtleties to get the most from a policy-based approach.
Service-Centric Policies
This policy category deals with service-quality issues rather than element behavior. Such policies are inherently more complex, and they can span several infrastructures. Most importantly, service-centric policies are targeted as much toward achieving business aims as toward maintaining technical performance. For example, a policy might focus on minimizing penalties or on determining how the affected customers are treated.
Let's look at an example to clarify the differences between element- and service-centric policies. Consider a provider using a tiered server farm to speed transaction flows. The redundancy of the farm means that a single server failure does not immediately impact service availability, but it begins to expose the site to performance problems if the remaining servers are approaching their loading limits. A service-centric policy for this situation focuses on maintaining adequate server capacity rather than on responding in detail to the failure of any one server in the farm. The policy actions taken when a server fails can include the following:
Other information can be used to increase the intelligence of the response. For instance, there can be a check to see if there are imminent load changes. The alert could then provide more information, such as whether the remaining servers in the tier are operating under threshold now and whether the afternoon traffic surge is 30 minutes away. This gives the staff better information and indicates that attention is needed to avoid compounding the problems.
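The enriched alert described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `TierStatus` fields, the `build_failure_alert` function, the 60-minute look-ahead window, and the "urgent"/"routine" labels are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TierStatus:
    loads: List[float]                 # utilization (0.0 to 1.0) of each surviving server
    threshold: float                   # per-server loading limit
    minutes_to_surge: Optional[float]  # time until the next expected surge, if known


def build_failure_alert(
    failed_server: str,
    status: TierStatus,
    surge_window_min: float = 60.0,    # hypothetical look-ahead window
) -> str:
    """Build an alert that reports tier capacity, not just the failed element."""
    under_threshold = all(load < status.threshold for load in status.loads)
    surge_soon = (
        status.minutes_to_surge is not None
        and status.minutes_to_surge <= surge_window_min
    )

    parts = [f"server {failed_server} failed"]
    if under_threshold:
        parts.append("remaining servers under threshold")
    else:
        parts.append("remaining servers at or over threshold")
    if surge_soon:
        parts.append(f"traffic surge expected in {status.minutes_to_surge:g} minutes")

    # Attention is needed if capacity is already short or a surge is imminent.
    urgency = "urgent" if (not under_threshold or surge_soon) else "routine"
    return f"[{urgency}] " + "; ".join(parts)
```

With two surviving servers at 60% and 70% load, a limit of 80%, and a surge 30 minutes away, the alert is marked urgent even though the tier is currently under threshold, which reflects the point that the policy watches service capacity rather than the single failed element.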