Architectural Trends for Web Management Systems | Practical Service Level Management: Delivering High-Quality Web-Based Services

Managing service flows in a webbed environment is a multi-faceted challenge, as you have seen throughout the preceding chapters. The market is cluttered with offerings associated with SLM and its variations, using phrases such as Application Performance Management, Service Assurance, Quality of Service, and Quality of Experience. There are traditional SNMP management platforms attempting to maintain their positions in a fluid management world and vying with purpose-built Java services platforms. In addition, innovative startups offer new solutions, while more mature management software vendors are acquiring new companies to broaden their portfolios. This multitude of forces creates some uncertainty, but also opportunities for new solutions.

Most environments have sets of management tools that address the specific needs of element and infrastructure management. Other tools are oriented toward SLM, but they also tend to be designed to operate in isolation. Integration remains the critical concern; various emerging strategies were discussed in the preceding section of this chapter. In all cases, sets of tools must be integrated at the process level to support effective real-time management.

The traditional management environment was platform-based. An enterprise or a service provider went through a lengthy evaluation, selected a platform, and spent a long time and a lot of money customizing and configuring it (and frequently had disappointing results after implementation). Many tools were selected for their ability to integrate with the selected platform instead of for their functionality. Other management tools were acquired for their functionality or to address a hole in platform coverage.

A different strategy is needed for today's SLM systems; the Web with its supporting services is a good model of a strategy that can work. Such a strategy uses the same architecture as many of the services it manages. It can scale using the network, and it evolves with new breakthroughs in Web service design and deployment.

The webbed services environment can be the basis for building effective SLM solutions. Many of the pieces are now available and the benefits of a new approach are compelling.

The Web is based on exploiting the power of loosely coupled systems that interact in many different ways to create a variety of services. That same loosely coupled approach can be applied to the architecture of a services management system and to the processes performed by that architecture; proposals for those designs are discussed in the following subsections.

Loosely Coupled Service-Management Systems Architecture

Management tools and tool clusters address fundamental functions, such as root-cause analysis, correlation, traffic adjustments, and content management. Tools are single-function modules, whereas a tool cluster is an integrated set of related tools from a single vendor or a set of closely collaborating vendors. A tool cluster can be multifunctional; for example, it might have its own discovery, event management, instrumentation, root-cause analysis, and reporting functions already integrated into a functional package. Tool clusters may simplify some integration tasks because the vendor has (ideally) integrated its own products beyond the superficial integration levels discussed earlier in this chapter.

A loosely coupled service-management system depends on sets of process managers, which coordinate sets of tools and tool clusters. These process managers and their underlying tools may be organized in a loosely coupled web of clustered processes that communicate by using signaling and messaging, as discussed in the following sections.

Process Managers

Process managers oversee a management process by ensuring that all lower-level tools and processes carry out their tasks successfully. They organize information and oversee portions of the managed environment. The process managers are higher-level tools or functions that coordinate tools and tool clusters; they also communicate with other process managers. The process manager organizes the collected information and determines if its task is complete; if it is, the process manager reports to a higher-level process manager. When the task is not complete, the process manager initiates further activities or reports a failure.

The process manager needs logic to analyze the incoming information and make the appropriate decisions. It takes different steps depending upon the analysis; for example, the process manager might request further detailed measurements, access other information sources, or use different tools as its analysis dictates. Process managers may also have correlation, policy, and presentation functions.
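The decision loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference design: the names `ProcessManager`, `Tool`, `followups`, and `root_cause_score` are hypothetical, and the completion rule (the task is done once a root-cause score is available) is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical convention: a tool is any callable that returns its findings.
Tool = Callable[[], Dict[str, float]]


@dataclass
class ProcessManager:
    name: str
    tools: List[Tool]                                     # first-pass tools
    followups: List[Tool] = field(default_factory=list)   # used only if needed
    parent: Optional["ProcessManager"] = None             # higher-level manager
    reports: List[str] = field(default_factory=list)      # reports from children

    def is_complete(self, findings: Dict[str, float]) -> bool:
        # Assumed rule for this sketch: done once a root-cause score exists.
        return "root_cause_score" in findings

    def run(self) -> str:
        findings: Dict[str, float] = {}
        for tool in self.tools:
            findings.update(tool())
        if not self.is_complete(findings):
            # The analysis calls for more detailed measurements or other tools.
            for tool in self.followups:
                findings.update(tool())
        status = "complete" if self.is_complete(findings) else "failed"
        if self.parent is not None:
            # Report the outcome to the higher-level process manager.
            self.parent.reports.append(f"{self.name}: {status}")
        return status


top = ProcessManager("service-level", tools=[])
server_check = ProcessManager(
    "server-check",
    tools=[lambda: {"cpu_load": 0.4}],
    followups=[lambda: {"root_cause_score": 0.9}],
    parent=top,
)
status = server_check.run()
```

Here the first-pass tool does not settle the question, so the manager falls back on its follow-up tool and then reports the result upward. A real process manager would also need correlation, policy, and presentation logic, which this sketch omits.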

Correlation is important when determining a root cause or trying to understand the interactions among different parts of the managed environment. The ability to correlate across different infrastructures is a high-value capability.

The process manager might set or modify system policies while it collects information it needs; in addition, it may adjust other parts of the managed environment. Note that some process managers might focus primarily on overseeing policy-based operations.

Presentation is also a key function. A complex and dynamic environment is challenging to manage, and organizing and presenting useful information about it is a challenge in its own right. Information must be presented in a way that enables a human to understand the situation quickly. This function must be very flexible because different people respond to different presentation formats.

Clustering and the Webbed Architecture

A loosely coupled hierarchy enables administrators to add and change functional components without causing unnecessary readjustment of the rest of the management system. The flexibility of loosely coupled systems enables a range of functional management structures that suits each organization's needs.

Complexity and large quantities of data often slow down the analysis tools in service management systems. However, the webbed architecture offers easy scaling, performance tuning, high availability, and flexibility. New processors can be added as demands grow, and more speed can be gained through parallelism, that is, by activating all the tools simultaneously rather than serially. The environment can also adapt in the other direction; functions can be consolidated on the same hardware if demands shift. In addition, redundancy can be used to meet high availability goals.
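The difference between serial and parallel activation of tools can be illustrated with Python's standard `concurrent.futures` module. This is a minimal sketch: the helper names `run_serially`, `run_in_parallel`, and `make_tool` are hypothetical, and each "tool" is simulated by a short sleep rather than a real measurement.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Tool = Callable[[], Dict[str, str]]


def run_serially(tools: List[Tool]) -> List[Dict[str, str]]:
    # Each tool starts only after the previous one has finished.
    return [tool() for tool in tools]


def run_in_parallel(tools: List[Tool]) -> List[Dict[str, str]]:
    # One worker per tool, so all tools are active at the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = [pool.submit(tool) for tool in tools]
        # Collect results in submission order, regardless of finish order.
        return [future.result() for future in futures]


def make_tool(name: str, delay_s: float) -> Tool:
    def tool() -> Dict[str, str]:
        time.sleep(delay_s)  # stands in for a slow measurement or analysis
        return {"tool": name}
    return tool


tools = [make_tool(name, 0.05) for name in ("discovery", "events", "probes")]
serial_results = run_serially(tools)
parallel_results = run_in_parallel(tools)
```

Both calls return the same findings; the parallel version simply overlaps the waiting time of the individual tools. In a distributed deployment the workers would be separate servers rather than threads, but the coordination pattern is similar.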

A webbed environment means that resources (such as tools, tool clusters, and process managers) can be located anywhere, and they can be organized in a variety of ways. However, using multiple instances of tools, tool clusters, and process managers increases the burden of information management. Keeping information fresh at multiple locations can incur delays inherent in moving information across long distances. Trade-offs among simplicity, availability, and overhead must be evaluated as the service management system is constructed.

Integrating the Components with Signaling and Messaging

Data and event integration are needed to enable tools, tool clusters, and process managers to exchange information and to signal each other. Signaling controls the sequence of tool and tool cluster usage within a management process. Each tool signals its successor, thus ensuring that process steps are in the proper sequence.

Today's event managers can be extended into general signaling engines to integrate management system components. Events are generated and used to trigger other actions through the event manager itself. The event engine can be used by any component capable of generating the appropriate events to trigger further actions. Further flexibility comes from selecting the next tool according to the outcome of the current activity, which enables richer management processes.
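One way to picture an event manager acting as a signaling engine is a small publish-and-subscribe dispatcher. The sketch below is an assumption-laden illustration, not the interface of any actual event manager: the `EventEngine` class, the event names, and the 2 percent packet-loss threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, float]], None]


class EventEngine:
    """Routes each event to the components that subscribed to its type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Dict[str, float]) -> None:
        # Unknown event types are simply ignored.
        for handler in list(self._handlers.get(event_type, [])):
            handler(payload)


engine = EventEngine()
log: List[str] = []


def measure(payload: Dict[str, float]) -> None:
    log.append("measure")
    # The outcome of this step selects which tool is signaled next.
    high = payload["packet_loss"] > 0.02  # assumed threshold
    engine.emit("loss.high" if high else "loss.normal", payload)


engine.subscribe("alarm.raised", measure)
engine.subscribe("loss.high", lambda payload: log.append("root_cause"))
engine.subscribe("loss.normal", lambda payload: log.append("report"))

engine.emit("alarm.raised", {"packet_loss": 0.05})
```

Each tool signals its successor by emitting an event rather than by calling it directly, so tools can be added or replaced by changing subscriptions only.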

Instead of extending the signaling function of today's event managers, it is possible to use message queuing systems to integrate management system components. Message queuing software is offered by companies such as IBM (WebSphere MQ) and TIBCO (ActiveEnterprise). These messaging products provide the means for different applications on different computer systems to exchange information in a controlled way. Such messaging platforms have been implemented as "backbones" in complex inter-application environments, such as brokerages and other financial services organizations.

The information can be exchanged in the form of XML documents, which the parties are responsible for transforming into locally useful forms. The messaging software handles the other aspects of application-to-application communications. It handles synchronization, queuing, backpressure or flow control, status reports, and other matters that smooth the exchange of the XML documents.
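The exchange of XML documents over a message queue can be sketched with Python's standard library. This example does not use the API of WebSphere MQ or any TIBCO product; an in-process `queue.Queue` merely stands in for the messaging layer, and the document layout (`measurement` and `metric` elements) and the helper names `to_xml` and `from_xml` are assumptions made for illustration.

```python
import queue
import xml.etree.ElementTree as ET
from typing import Dict


def to_xml(source: str, metrics: Dict[str, float]) -> bytes:
    # Sender side: turn local data into an XML document.
    root = ET.Element("measurement", attrib={"source": source})
    for name, value in metrics.items():
        metric = ET.SubElement(root, "metric", attrib={"name": name})
        metric.text = str(value)
    return ET.tostring(root, encoding="utf-8")


def from_xml(document: bytes) -> Dict[str, float]:
    # Receiver side: transform the document into a locally useful form.
    root = ET.fromstring(document)
    return {m.get("name"): float(m.text) for m in root.findall("metric")}


# A bounded queue: put() blocks when the queue is full, which gives a
# crude form of the flow control a real messaging product would provide.
channel = queue.Queue(maxsize=100)

channel.put(to_xml("probe-7", {"delay_ms": 42.0, "packet_loss": 0.01}))
received = from_xml(channel.get())
```

The sender and receiver share only the document format, not each other's internal data structures, which is the point of pairing XML with a messaging layer.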

This combination of messaging and XML constitutes a strong foundation for integrating a set of management tools. The XML documents provide the information, and the messaging software handles efficient exchanges and signaling between applications. Management tools can now be distributed across several servers, if desired. This flexibility enables administrators to link their management tools into sequences that define management processes. This level of integration offers substantial value to the management teams.

XML will be the major means of data sharing between management tools, especially those from different vendors. XML has already achieved a strong foothold in many products, and vendors are using it as an internal integration tool for their own products. This trend will accelerate, especially because XML also makes it easier to absorb products from mergers and acquisitions.

Loosely Coupled Service-Management Processes

Traditional management architectures revolved around a single platform that determined the conventions and standards for all tools that could be integrated into that framework. Newer management architectures will be organized around management processes and tasks rather than around a single platform.

Triage provides an example of an alternative, process-centric structure. Triage is the process of determining the responsible infrastructure and organizational group as a first step in handling a service disruption.

In the case of poor performance at an end-user location, the triage process manager could initiate a set of subprocesses to determine which infrastructure is the likely cause of the problem. First, it would initiate subprocesses to determine whether the servers, content delivery, applications, or other infrastructures have a role in the service disruption. Each subprocess could, in turn, activate other tools to carry out its responsibilities. Instead of a central management platform needing to know all the details of each server element, the element managers themselves would concentrate on knowing the details of their particular subsystem and would respond to status requests from the triage process manager.

If the server infrastructures do not report any problems, the triage process manager could turn to the transport infrastructure. For instance, it could ask transport infrastructure service managers to initiate end-to-end measurements to determine basic delays, packet loss, or other relevant metrics. The end-to-end testing tool may need to access instrumentation information to locate the appropriate probes to activate for the measurements.

If the end-to-end measurements indicate that further investigation is warranted, additional tools could be brought into action. If the servers are multi-homed, one example of an investigation into network problems might be a check of ISP performance using measurement data accumulated by the routing optimization system that is managing the selection of ISPs. If an external network is determined to be the problem, synthetic transactions or another testing method could be initiated to probe the external network and determine when it resumes operating within a range that no longer threatens service quality.
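The triage sequence described above can be summarized in a short Python sketch. It is illustrative only: the function names `triage` and `transport_healthy`, the infrastructure labels, and the delay and loss thresholds are hypothetical, and each status check is reduced to a callable that reports whether its infrastructure looks healthy.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each check answers a status request: True means "no problem found here".
Check = Callable[[], bool]


def triage(checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Return the first infrastructure whose check reports a problem."""
    for infrastructure, is_healthy in checks:
        if not is_healthy():
            return infrastructure
    return None  # no infrastructure reported a problem


def transport_healthy(
    measure: Callable[[], Dict[str, float]],
    max_delay_ms: float = 200.0,   # assumed threshold
    max_packet_loss: float = 0.02,  # assumed threshold
) -> bool:
    # Run an end-to-end measurement and compare it with the thresholds.
    result = measure()
    return (
        result["delay_ms"] <= max_delay_ms
        and result["packet_loss"] <= max_packet_loss
    )


# Server-side infrastructures are asked first, then the transport.
checks: List[Tuple[str, Check]] = [
    ("servers", lambda: True),
    ("content delivery", lambda: True),
    ("applications", lambda: True),
    (
        "transport",
        lambda: transport_healthy(
            lambda: {"delay_ms": 350.0, "packet_loss": 0.001}
        ),
    ),
]
suspect = triage(checks)
```

In this run the server-side checks report no problems and the simulated end-to-end delay exceeds the assumed limit, so the transport infrastructure is identified as the likely cause. The triage manager needs no knowledge of how each check is carried out; that detail stays with the element managers.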

The loosely coupled, process-oriented approach enables administrators to focus on the steps they need to follow to achieve a management result, instead of trying to address management strictly in the context of a platform or element.

This approach will help IT groups manage the responsibilities and cooperation of each key process team. If, for example, a triage operation within the services group identifies the transport infrastructure as the likely cause of the disruption, the appropriate transport specialists and processes can be automatically signaled to resolve the problem and restore service levels. The technical means to integrate the support groups can come from the emerging messaging and signaling functions discussed previously in this chapter. They enable either team to signal the other and activate the appropriate processes.