Service Management Architecture: History and Design Factors


A service management architecture must be designed to handle the heterogeneous, geographically distributed subsystems used for Web service delivery, and it must also cope with the fact that many of these subsystems are owned and managed by different organizations with only tenuous links for instrumentation and control. To make matters more complex, suppliers are being challenged, by customer demand, to provide individually tailored Service Level Agreements (SLAs) with fine distinctions and pricing for specific services and for specific customers. Customers are also beginning to ask for the ability to alter their service mix and service quality levels on demand.

The management system must, therefore, be able to handle the complex interactions involved as services are provisioned, activated, used, and deactivated. Despite these difficulties, it must help speed deployment of new services while also quickly adapting to new service demands and changed relationships among service suppliers.

Further discussion of this topic is divided into the following subsections:

  • The evolution of the service management environment

  • Service management architectures for heterogeneous systems

  • Architectural design drivers for management of heterogeneous systems

The Evolution of the Service Management Environment

Today's service management environment bears scant resemblance to the one we had even a few years back. It's important to note that this shift is not just about new tools and technologies; it's equally about changes to the organizations and job definitions within IT shops.

An example should make the critical difference between traditional and Internet-based system structures clear. Consider a traditional system, such as a corporation's transaction-processing infrastructure based on IBM's Systems Network Architecture (SNA): the entire system was owned and operated by the corporation. Any externally owned telecommunications facilities were very simple; they were direct telephone lines between physical locations, with little variation and few value-added services. Data switching was performed inside the corporation's private IBM communications controllers, which were centrally configured and operated, with all cross-network routes preplanned and tightly controlled. Major portions of the network were regularly taken offline for reconfigurations; it wasn't unusual for an entire worldwide network to be completely unavailable for one or more days per quarter. The end users all had IBM terminals or terminal emulators running on early PCs; those terminals connected to one server at a time and usually presented only text.

If a problem occurred in the traditional IBM SNA-based system, the system operator had central, integrated control of the application, the network, and the end user's terminal. The network's System Services Control Points (SSCPs) could instantly locate and diagnose the complete end-to-end connection between a particular application and a particular terminal. Given clear visibility into underlying connections, other tools could diagnose application problems quickly. The entire system was tightly coupled. Configuration was extremely complex and could be error-prone, but a running system was under strong central control.

In contrast, Internet-based systems are loosely coupled and do not rely on massive, centralized configurations of servers, storage, and network hardware. However, these more flexible configurations are more difficult to operate at a given level of service and do not have any central management system. Instead of having that central authority, which is the keystone of a traditional system, Internet-based systems have a loose confederation of interacting, separately owned and controlled subsystems.

Of course, the flexibility of networked architectures is a mixed blessing. It does facilitate changes to keep pace with the changing demands of the business. However, such change can also introduce new complexities and vulnerabilities. When a problem occurs in an Internet-based system, finding the precise end-to-end path that the data flow is taking may be extremely difficult; there's no central switching or routing authority. Even if that path is found, it's unclear that the knowledge could be effectively used to fix any problems quickly: the responsible ISP might be one that isn't directly accountable to either end of the connection.

Further exacerbating the situation is that a problem as seen by the end user could have been caused by any of dozens of interacting subsystems and servers. The image on the end user's browser probably comes from multiple servers simultaneously (third-party suppliers provide stock charts, ads, and so on); each data flow may have been invisibly intercepted and possibly cached by devices unknown to server or end user; the server assigned to a particular end user may have been assigned only temporarily and cannot easily be traced at a later time or even while the error is occurring. Running a help desk in the complex Internet environment is much more technically difficult, and it takes more ongoing negotiating and interacting with external suppliers than running one in the traditional environment!

Service Management Architectures for Heterogeneous Systems

New architectures and platforms were created to manage Internet-based, heterogeneous systems. Managing services that span many infrastructures and organizations not only demands a set of management tools from a variety of vendors, but also requires that those tools be applied in the appropriate sequence to solve the problem. How the tools are organized has a significant impact on the effectiveness of your management efforts.

The traditional approach, still common today, is to use each tool in isolation. When several tools are needed for a task, a staff person takes the output from one tool and uses that information to drive the next tool. This approach demands additional staff attention, can consume large amounts of time, and risks introducing errors with each manual step. It also requires investment in extra equipment and physical space, adding significantly to the cost of monitoring and management. Integrating the tools appears to be a better solution.

If integration is good, you might wonder why there is so little of it. Consider the following reasons:

  • First, deep integration of the management structures of heterogeneous systems has been a significant, expensive technical challenge, especially when there were no accepted standards for guidance and the managed systems changed constantly.

  • Second, the market has been willing to settle for integration as defined by marketing departments: integration that seemed to correspond to needs, but that has failed to meet the test of practice.

The early management platforms touted themselves as integration points for a set of best-of-breed management tools. Unfortunately, the marketing hype exceeded their capacity for delivering any meaningful integration. Vendors competed on who had the longest list of third-party management tools sharing interfaces with their platform, regardless of any real integration effort. The market positioning suggested that commonality of interfaces was the key to making tools useful; that turns out not to be the case in practice.

The integration many early management platform vendors actually offered might be better characterized as consolidation and tool launching. Consolidation allows customers to use a single server for a set of management tools rather than a server for each one. Tools can be launched after an alert triggers a response. This is useful, but there is no integration: each tool still operates as a separate entity with its own commands, functions, data schema, and display formats.

Some management platforms added integration "on the glass": a consistent look and feel for a set of tools. This feature is useful because it simplifies usage and reduces staff training requirements. The platforms offered this common look and feel for their products and the overall console. However, each tool could, and often did, have its own conventions after being launched.

All the early platform vendors got away with these low-level integration features because the market was relatively unsophisticated, and systems management did not demand as much integration. However, today, this lighter level of integration is no longer adequate; management tools must now work in a webbed services environment.

A cynical view is that the early lack of deep integration also served vendors as they built substantial professional services organizations to finish the job. One vendor, in a moment of candor, admitted to me that his company made $10 in professional services for each $1 a customer spent on the actual software. Market studies generally showed that the consulting required to take such tools off the shelf and put them to use exceeded the licensing fees by a factor of 2:1 or more.

The relatively shallow integration left organizations with several other choices: they could find another integrator, undertake the effort themselves, or live with a set of disjointed management tools. Of course, using a systems integrator was expensive and time-consuming; it often meant that a company was dependent upon the integrator every time new management tools were acquired. The alternative of internal integration efforts was also expensive and time-consuming, as it diverted development resources from the core business initiatives.

As with much of technology, necessity is the mother of invention in management. The management industry has been responding to the need for better integration through consolidation: the big players buy up niche products and offer the suite as an integrated solution, while others forge strategic partnerships and integrate their products. Both trends offer some additional value for buyers of management solutions. However, it is still unusual for these efforts to produce a product suite that offers more than integration "on the glass." Such surface integration often relies too heavily on limited new software to glue the disparate pieces together.

NOTE

It's important for prospective purchasers of integrated management systems to remember this history of superficial integration when evaluating systems. Deep integration of management systems is difficult, even though new standards, discussed later in this section, promise some help.


Architectural Design Drivers

The key factors that drive the design of a service management architecture are as follows:

  • Demands for changing, expanding services

  • Multiple service providers and partners

  • Elastic boundaries among teams and providers

  • Demands for fast system management

  • Need for mutually understandable data item definitions and event signaling mechanisms

These are described in the following sections.

Demands for Changing, Expanding Services

The range of Web-based services continues to expand, with streaming, multimedia, and remote collaboration gaining interest. The range of network access devices in common use is also expanding, requiring services that can adapt to the inherent bandwidth, resolution, and screen size limitations of the access alternatives.

For example, feedback from the lowest network transport layers could be used to adjust the mixture of frame types in a streaming media presentation to improve end-user QoE. Many current systems conceal transport error rates from the application layer; the application layer therefore doesn't know that errors are occurring and that a change in the frame mixture might be helpful. As streaming increases in importance, new streaming-tuned services may appear. In such systems, the application layer that creates the video or audio stream can be told to increase the percentage of key (synchronization point) frames in the stream as the error rate in the transport layer increases. That increase assists the receiver in regaining lost synchronization quickly, at the expense of some instantaneous bandwidth use. Service levels delivered to the streaming media service in this situation could be adjusted quickly to provide the optimum mix of transport error rate and bandwidth, enabling an improved QoE for the end users.
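
To make this concrete, here is a minimal Python sketch of such a feedback loop. The function name, the interval bounds, and the simple linear mapping are illustrative assumptions, not part of any particular streaming system:

    # Hypothetical feedback loop: as the measured transport error rate
    # rises, the encoder is told to emit key (synchronization) frames
    # more often, trading some bandwidth for faster resynchronization.

    def keyframe_interval(error_rate, min_interval=10, max_interval=250):
        """Map a transport error rate (0.0-1.0) to a keyframe interval.

        A higher error rate yields a shorter interval, that is, a larger
        percentage of key frames in the stream.
        """
        error_rate = min(max(error_rate, 0.0), 1.0)
        span = max_interval - min_interval
        # Error-free -> sparse key frames; very lossy -> frequent ones.
        return max_interval - int(span * error_rate)

    # The management layer polls the transport and retunes the encoder.
    for measured_error_rate in (0.001, 0.02, 0.15):
        interval = keyframe_interval(measured_error_rate)
        print(f"error rate {measured_error_rate:.3f} -> "
              f"key frame every {interval} frames")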

Multiple Service Providers and Partners

An array of service providers and partners play their parts in the end-user experience, offering connectivity, value-added network services, hosting, content delivery, and back-office functions. Some partners use direct interaction for building their own supply chains and other online business processes, while other organizations use exchanges as a way of transacting business with a larger number of potential suppliers and partners.

The customer must consider how these various providers are held accountable for meeting their respective compliance criteria. Holding providers accountable requires the ability to monitor their service delivery with appropriate instrumentation, and the measurements from that instrumentation must be correlated with the service quality as seen by the end user.
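
As a rough illustration, the following Python sketch checks per-provider latency measurements against hypothetical SLA thresholds and relates them to the end-to-end time the user actually experienced. The provider names, the thresholds, and the simple additive latency model are all invented for the example:

    # Invented SLA thresholds, in milliseconds, per provider role.
    SLA_THRESHOLDS_MS = {"hosting": 200, "cdn": 150, "backbone_isp": 100}

    def out_of_sla(measured_ms):
        """Return the providers whose measured latency exceeds their SLA."""
        return [provider for provider, ms in measured_ms.items()
                if ms > SLA_THRESHOLDS_MS.get(provider, float("inf"))]

    # One synthetic sample: per-provider latency, plus the end-to-end time
    # the end user saw (modeled here as the simple sum of the parts).
    sample = {"hosting": 180.0, "cdn": 210.0, "backbone_isp": 95.0}
    end_user_ms = sum(sample.values())

    print(f"end-user response: {end_user_ms:.0f} ms; "
          f"out of SLA: {out_of_sla(sample) or 'none'}")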

Elastic Boundaries Among Teams and Providers

Boundaries in the service management environment are more elastic and fluid than they were in the older mainframe and client-server worlds. Service flows move among the infrastructures in many different ways, and managers must understand how the behavior of each infrastructure is affecting overall service quality. Service managers therefore need to understand issues that span multiple supporting infrastructures (networks, systems, and applications) and multiple organizations.

The contrast with traditional management organization strategies is stark. In the past, teams had isolated, well-bounded responsibilities; for instance, the network and application infrastructures managers had little reason to interact. Today, such specializations must be integrated with a structure for mutual responsibility and collaboration by specialists across these different layers. Infrastructure managers can be specialists, but service managers must also be generalists.

Boundaries between customers, their providers, and their business partners are also becoming more fluid. At any point, the constellation of providers and partners can change as the mix of services responds to business shifts. To keep pace with the changing mix, management systems must interact more frequently, and customers need to assume some of the management functions that have been the provider's domain.

Demands for Fast System Management

Despite the difficulties in managing an Internet-based system, competitive pressures drive fast services provisioning and configuration, along with fast problem detection and resolution. The fact that many of the critical underlying services are much more complex than in the days of traditional architectures, and that they are under only loose control, does little to soften the expectations of end users. They still want fast, effective support from the help desk.

Data Item Definition and Event Signaling

Products from different system management tool vendors generally use different names for management data, different ways of representing their values, and different ways of describing relationships among data elements. It comes as no surprise that each vendor's choices are incompatible with the others'. Especially in older designs, a tool from one vendor usually cannot access needed information from another vendor's tool without knowing the details of the latter's data definitions and creating translation software to transform the data into a usable form.

The Simple Network Management Protocol (SNMP) was the first effort to develop standards for exchanging management information, in this case between an agent on a network device and a management application running on a system management platform. As its name implies, it provides a very simple way of asking a remote device to send a formatted array of system management information (the Management Information Base [MIB]), which contains data such as packet counts and error rates. It also has features for sending asynchronous alert messages and for setting some remote parameters. SNMP was very successful and has been extended to a number of other elements, such as servers and applications. However, SNMP has a significant drawback: it focuses on syntax while paying far less attention to semantics.

The syntax (command and data structure) of SNMP can be used by a management application to determine that a variable included in an array of system management information is a 32-bit integer used as a counter. However, without the semantics (meaning) of the counter, the application cannot use the data. Missing information may include the following:

  • What does the counter count?

  • When is it incremented?

  • What are the maximum and minimum values?

  • When was it initialized?

  • What are the thresholds for generating an alert?
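
The gap can be illustrated with a small Python sketch. SNMP syntax says only that a Counter32 value is an unsigned 32-bit integer that wraps at 2^32; everything else in the comments below comes from the separately documented MIB semantics, and the sample values are invented:

    COUNTER32_MAX = 2**32  # Counter32 values wrap at 2^32 (pure syntax)

    def counter_delta(previous, current):
        """Difference between two Counter32 samples, allowing for one wrap."""
        return (current - previous) % COUNTER32_MAX

    # Two samples of some counter, taken 60 seconds apart. Syntax alone
    # tells us these are 32-bit counter values and nothing more.
    sample_t0, sample_t1, interval_s = 4_294_900_000, 120_000, 60
    delta = counter_delta(sample_t0, sample_t1)

    # Only the MIB semantics (e.g., ifInOctets: "the total number of
    # octets received on the interface") let us turn the delta into
    # something meaningful, such as inbound octets per second.
    print(f"rate: {delta / interval_s:.0f} octets/second")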

Many standard sets of SNMP syntax and semantics for network and application systems were defined to answer these questions. However, manufacturers quickly introduced proprietary extensions to the standard MIB data definitions, and the semantics of those extensions were often poorly specified, which stymied interoperability.

Proprietary extensions are inevitable, and they are a mixed blessing for customers. They are desirable because they enable vendors to innovate and offer unique value-added features beyond the standard SNMP management capabilities. They are a problem because management applications from other vendors often cannot use the "foreign" extensions to advantage. Without complete specifications, and without a financial incentive to do the integration work, vendors can't and won't incorporate other vendors' extensions into their management tools. As a result, customers with similar network devices from several vendors must use a different management tool for each product set, even though all the devices perform the same functions in almost identical ways.

This problem of data definition continues in current standards efforts, although there has been some improvement. The Extensible Markup Language (XML) standard is already being used extensively for exchanging structured information, and many vendors have adopted XML as a means of exchanging information between their own management products. XML takes a step forward by including methods for converting the format of a message's data into a format understood by the receiver, but the semantics of that message must still be defined elsewhere.
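
A small Python example makes the point. The message layout below is invented; the standard-library parser recovers the structure of the message without any prior schema, but the meaning of its elements must still be agreed on out of band:

    import xml.etree.ElementTree as ET

    # A hypothetical management event; the element names are illustrative.
    message = """
    <managementEvent>
      <source>edge-router-7</source>
      <metric name="ifInErrors">42</metric>
      <severity>minor</severity>
    </managementEvent>
    """

    root = ET.fromstring(message)
    metric = root.find("metric")
    print(metric.get("name"), "=", int(metric.text))
    # Nothing in the XML itself says what "ifInErrors" counts, when the
    # counter was last reset, or what a "minor" severity obliges the
    # receiver to do: that semantic contract lives outside the message.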

The Distributed Management Task Force, a standards body composed of industry players, has recently defined the Common Information Model (CIM), which is intended to complement XML by offering more complete definitions for all management tools. For example, CIM can be used to describe each managed object by the following:

  • Characteristics (attributes): Describe the specific parameters associated with each object. A server, for example, would have characteristics describing its manufacturer, model, memory capacity, disc storage, number of processors, and other attributes. An application would have characteristics describing its requirements for processing, storage, network resources, and service quality.

  • Methods: Describe the operations that can be performed on the object. For the server, there would be methods for rebooting, killing a process, creating a process, changing the number of active threads, and performing other operations.

  • Indications (alerts): Used by the object to communicate with external entities. A server would send indications when a process failed, memory was running low, or the disc system was clogged, for instance.

  • Associations: Describe the relationships among various managed objects, allowing a management system to construct logical groupings.
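
The following Python sketch maps these four concepts onto a toy managed object. The class and attribute names are purely illustrative; real CIM definitions are expressed in the DMTF's Managed Object Format (MOF), not in code like this:

    class ManagedServer:
        def __init__(self, manufacturer, model, memory_gb):
            # Characteristics (attributes): parameters describing the object.
            self.manufacturer = manufacturer
            self.model = model
            self.memory_gb = memory_gb
            # Associations: relationships to other managed objects.
            self.hosted_applications = []
            self.indication_handlers = []

        # Methods: operations that can be performed on the object.
        def reboot(self):
            self.emit_indication("server rebooting")

        # Indications (alerts): how the object signals external entities.
        def emit_indication(self, alert):
            for handler in self.indication_handlers:
                handler(f"{self.model}: {alert}")

    server = ManagedServer("ExampleCorp", "X100", memory_gb=64)
    server.hosted_applications.append("order-entry")  # an association
    server.indication_handlers.append(print)          # a management console
    server.reboot()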

CIM is still very young, and not yet widely used, but it points a way to the future.



