The capabilities defined in this section provide operations functionality to your application integration environment.
System Monitoring includes the monitoring of hardware, the operating system, and software applications to ensure that these various system components are functioning correctly and within the desired operating parameters or agreed-upon capability levels.
One of the most fundamental and often ignored aspects of monitoring is the concept of baselining. Every system behaves uniquely in terms of how quickly it responds to certain requests and the amount of resources it consumes to satisfy requests. It is very important to obtain baseline values for each of the metrics being monitored.
Baselining establishes a standard against which future comparisons can be made to detect anomalies.
There are several ways of viewing what a baseline really means, depending on the maturity of the system being monitored. For a newly commissioned system or a system receiving a major upgrade, it is useful to be able to obtain a baseline value based on a single transaction for each of the different transaction types. As the system matures, you may find it helpful to establish a different set of baselines based on other metrics that make sense for your application. For example, you may create a new baseline, measured only on working days to reflect the fact that systems exhibit certain load characteristics based on the working days of the week. Baselining can also be extended to the provisioning of capabilities and the associated time and cost involved. Baselines can help you to estimate the cost and time involved in extending the existing capability (through scaling up or scaling out) to handle larger loads.
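As a minimal illustration, a baseline for one transaction type can be reduced to the mean and standard deviation of historical measurements, against which new observations are compared. The sample values and the three-standard-deviation tolerance below are illustrative assumptions, not prescriptions:

```python
import statistics

def build_baseline(samples):
    """Derive a simple baseline (mean and standard deviation) from
    historical response-time samples for one transaction type."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
    }

def is_anomaly(baseline, observed, tolerance=3.0):
    """Flag an observation that deviates from the baseline mean by
    more than `tolerance` standard deviations."""
    if baseline["stdev"] == 0:
        return observed != baseline["mean"]
    return abs(observed - baseline["mean"]) > tolerance * baseline["stdev"]

# A baseline built from working-day measurements only, as the text suggests.
weekday_latencies_ms = [120, 132, 125, 118, 140, 127, 131]
baseline = build_baseline(weekday_latencies_ms)
print(is_anomaly(baseline, 129))   # within the normal range
print(is_anomaly(baseline, 450))   # well outside the baseline
```

The same structure applies to any monitored metric; a per-working-day or per-transaction-type baseline is just a separate set of samples.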
Event handling can be classified into two categories: passive and active.
Passive event handling is commonly found in applications. It simply receives the event and logs or displays the information; the system does not attempt to understand the event type or resolve the underlying problem.
Active event handling includes the capabilities of passive event handling, but adds the ability to process and attempt to resolve the issue that caused the event.
A resilient integration application provides some capabilities of active event handling. This allows minor exceptions to be handled automatically and allows requests to proceed as normal. A very simple example is a system that accesses a back-end system over an unreliable network connection and as a result may lose requests. The integration application can implement a timeout mechanism so that it automatically retries when the first request fails.
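The timeout-and-retry idea above can be sketched as follows. Here `request_fn` is a hypothetical callable wrapping the unreliable back-end call, and the attempt count and backoff values are illustrative:

```python
import time

def call_with_retry(request_fn, attempts=3, timeout_s=2.0, backoff_s=0.5):
    """Active event handling sketch: retry a call over an unreliable
    connection instead of merely logging the failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request_fn(timeout=timeout_s)
        except TimeoutError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff
    # The exception could not be resolved automatically; re-raise it so
    # passive handling (logging, alerting) can take over.
    raise last_error

calls = {"n": 0}

def flaky_backend(timeout):
    # Simulated unreliable connection: the first two requests are lost.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("no response")
    return "ok"

print(call_with_retry(flaky_backend, backoff_s=0.01))  # succeeds on the third attempt
```

Note that the function still raises after the final attempt, so minor exceptions are absorbed while persistent failures are escalated.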
Business Activity Management is an important aspect of any integration system. However, the term Business Activity Management is broad. In the context of this discussion, you should consider Business Activity Management as the monitoring, managing, and analysis of business transactions processed by the integration system.
Activity monitoring enables real-time tracking of business transactions. Alerting of business exceptions is an important feature of monitoring because it enables business analysts to react to any problems with business transactions as they occur. Fixing problems before users notice them, or being able to at least notify users in advance of potential delays, is a key part of proactive monitoring. Examples of problems can range from systems being unavailable to particular steps of the process exceeding the maximum allowed processing time. The latter is important for businesses that must adhere to business capability-level agreements with their customers or partners.
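A minimal sketch of such an alert on a step exceeding its maximum allowed processing time, assuming a hypothetical `alert_fn` callback that notifies the business analyst (e-mail, dashboard, and so on):

```python
def check_step(step_name, started_at, finished_at, max_allowed_s, alert_fn):
    """Raise a business alert when a process step exceeds its maximum
    allowed processing time."""
    elapsed = finished_at - started_at
    if elapsed > max_allowed_s:
        alert_fn(f"Step '{step_name}' took {elapsed:.0f}s "
                 f"(limit {max_allowed_s:.0f}s)")
    return elapsed

alerts = []
check_step("credit-check", started_at=0.0, finished_at=95.0,
           max_allowed_s=60.0, alert_fn=alerts.append)
print(alerts)
```

The step name and threshold are invented; in practice the limits would come from the agreed processing-time commitments.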
Activity management enables a business analyst to react to a business exception by modifying the transaction process path in real time, before the system rejects the transaction. Actual modification of predefined business processes is considered part of the development and application release process, due to the more permanent nature of the changes.
Activity analysis is about capturing all of the business transaction information that has passed through the system. All data that is relevant to the business transaction — such as process step response times, business data values, and data sources — is captured and is placed in a data warehouse. This allows the business analyst to perform historical analysis of the business process steps from numerous aspects. This ability is extremely important for process-oriented businesses that have capability-level agreements with customers or partners. Having this information allows the business to quickly verify whether it has breached any capability-level agreements. For example, imagine that a parts supplier has agreed with various business partners to process any orders and provide an invoice along with delivery tracking within three days from the time the order is received. If the integration system handles the whole process, it will be able to capture the duration of any process steps as well as the end-to-end processing time. Each step may involve other systems, or it may involve humans who approve or perform further processing. If the partners complain that the business constantly misses its processing-time commitments, the business can then perform an analysis and determine the reason for the delays. Similarly, having the right monitoring in place allows warnings to be generated if a particular step has exceeded its maximum processing time.
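The three-day commitment in the example above could be verified against captured transaction records along these lines; the order data is invented for illustration:

```python
from datetime import datetime, timedelta

THREE_DAYS = timedelta(days=3)

# Hypothetical order records captured by the integration system:
# (order id, time order received, time invoice and tracking provided).
orders = [
    ("A-100", datetime(2024, 3, 1, 9),  datetime(2024, 3, 3, 17)),
    ("A-101", datetime(2024, 3, 1, 10), datetime(2024, 3, 5, 8)),
    ("A-102", datetime(2024, 3, 2, 14), datetime(2024, 3, 4, 9)),
]

breaches = [order_id for order_id, received, completed in orders
            if completed - received > THREE_DAYS]
print(breaches)  # orders that missed the three-day commitment
```

In a real system the records would come from the data warehouse, and the same query could be broken down per process step to locate the source of the delay.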
For the purposes of this guide, configuration management refers to the tracking of hardware and software configurations. Although most configuration management tasks focus on process and documentation, a number of technology capabilities can assist in this area.
This guide does not cover asset management.
Configuration management is important for a number of reasons. One of the most important is that it helps you keep track of how each system is configured. This information is very advantageous to have for disaster recovery scenarios. Another benefit of configuration management is that it helps you to compare two systems and to verify them before and after you make modifications, such as application upgrades. It is quite common for systems to exhibit stability issues for no apparent reason, until investigation reveals that some files have been replaced or modified to different versions.
Being able to track dependencies is another important part of configuration management. Properly maintained dependency information can provide a real benefit in the area of change management, especially when determining possible impacts of a particular change and what regression testing might need to be performed.
Versioning forms an important part of configuration management. Versions allow you to quickly verify whether the application or operating system files are correct. They can also help you to determine whether your applications have backward or forward compatibility.
Effective versioning requires technology and policy support. It is important to ensure that version guidelines are defined and followed.
It is possible for applications to detect version information and to enforce policies that determine which versions of libraries can be used with the current version of the application. This mechanism ensures that the application either runs as expected or fails right away if the required version of the libraries is not available.
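One way to sketch such a fail-fast version policy check; the library name and version range below are invented for illustration:

```python
def parse_version(text):
    """Turn a dotted version string into a comparable tuple, e.g. '2.4.1' -> (2, 4, 1)."""
    return tuple(int(part) for part in text.split("."))

def require_version(library_name, actual, minimum, maximum=None):
    """Fail right away if a library version is outside the range the
    application was certified against."""
    v, lo = parse_version(actual), parse_version(minimum)
    hi = parse_version(maximum) if maximum else None
    if v < lo or (hi is not None and v > hi):
        raise RuntimeError(
            f"{library_name} {actual} is outside the supported "
            f"range {minimum}..{maximum or 'latest'}")

# Accepted: 2.4.1 falls inside the certified range.
require_version("messaging-lib", "2.4.1", "2.3.0", "2.9.9")
print("version check passed")
```

Running the same check with a version outside the range raises immediately at startup, rather than allowing the application to fail unpredictably later.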
Signing the actual binary images in conjunction with versioning provides an additional layer of protection and verification, which can be important when verifying whether files have been modified by virus infection or malicious acts.
Configuration data is among the most frequently changed information, and administrators often change it with a text editor. Signing the configuration file allows changes to the configuration information to be detected.
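A simple way to detect such edits is a detached signature over the file contents. The sketch below uses an HMAC; the key and configuration contents are invented, and in practice the signing key would be held where administrators cannot casually read or edit it:

```python
import hashlib
import hmac

SIGNING_KEY = b"example-signing-key"  # assumption: stored securely in practice

def sign_config(config_bytes):
    """Produce a detached HMAC signature for a configuration file."""
    return hmac.new(SIGNING_KEY, config_bytes, hashlib.sha256).hexdigest()

def verify_config(config_bytes, signature):
    """Detect whether the configuration was edited after signing."""
    return hmac.compare_digest(sign_config(config_bytes), signature)

original = b"max_connections = 50\ntimeout_s = 30\n"
sig = sign_config(original)
print(verify_config(original, sig))                     # unmodified file
print(verify_config(b"max_connections = 5000\n", sig))  # edited file detected
```

`hmac.compare_digest` is used instead of `==` so the comparison runs in constant time.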
Integration applications may also keep a recent history of the configuration file, or, upon loading a new configuration file, generate a log highlighting the changes between the newly loaded version and the version currently in use. This mechanism allows specific attributes to be tracked and easily identified, simplifying the troubleshooting of configuration-related problems.
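The change-highlighting log described above might be produced by diffing the two configuration versions. The attribute names here are invented:

```python
def diff_config(current, new):
    """Report added, removed, and changed attributes between the
    currently used configuration and a newly loaded one."""
    changes = {}
    for key in current.keys() | new.keys():
        if key not in new:
            changes[key] = (current[key], None)        # attribute removed
        elif key not in current:
            changes[key] = (None, new[key])            # attribute added
        elif current[key] != new[key]:
            changes[key] = (current[key], new[key])    # attribute modified
    return changes

current = {"timeout_s": 30, "retries": 3, "log_level": "INFO"}
new = {"timeout_s": 60, "retries": 3, "trace": True}
print(diff_config(current, new))
```

Each entry maps an attribute to its (old, new) values, which is exactly the information an operator needs when troubleshooting a configuration-related error.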
One of the most common issues with manual configuration is human error. It is the main reason that large organizations create images or automated system provisioning systems to reduce the amount of human involvement. Systems are usually composed of a variety of applications, software patches, and hardware. Even when the same applications are installed on two systems, if they were installed in a different sequence, some aspects of the configuration may be different. Such situations can produce problems that are difficult to discover. Automated provisioning of the base system and applications helps to ensure consistency between systems and platforms.
A system configuration snapshot captures the state of the system at the time changes are made. Such a snapshot essentially provides a mini backup capability, which allows the system to be rolled back when the changes produce undesired results. Without a system configuration snapshot, the usual course of action is to revert to a backup of the system.
Using the snapshot information, you can also compare system states to obtain a before-and-after view and to track and control the exact changes made.
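A before-and-after comparison can be sketched by hashing each tracked file at snapshot time; the file names and contents below are illustrative:

```python
import hashlib

def snapshot(files):
    """Capture a configuration snapshot: one hash per tracked file.
    `files` maps a path to its current contents (bytes)."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in files.items()}

def changed_files(before, after):
    """Before-and-after view: which tracked files were modified."""
    return sorted(path for path in before
                  if after.get(path) != before[path])

before = snapshot({"app.conf": b"pool=10", "hosts.conf": b"a,b,c"})
after = snapshot({"app.conf": b"pool=25", "hosts.conf": b"a,b,c"})
print(changed_files(before, after))
```

Storing only hashes keeps snapshots small; a fuller implementation would also retain the file contents so that the system can actually be rolled back.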
It is very important for you to be able to identify and verify the configuration of your systems. If you cannot, configuration of any system is at the mercy of the quality of provisioning and change management. When something goes wrong, it is difficult to check whether the configuration has been modified in any way.
For the purposes of this guide, the discussion of change management is limited to considerations for change management and does not specify the process itself. If you are interested in a detailed, process-oriented view of infrastructure management, refer to the IT Infrastructure Library (ITIL) and the Microsoft Operations Framework (MOF).
Change management is particularly important for effective application integration. Any integration application relies on a number of other applications, each of which is susceptible to changes from its respective owner. Changes to these applications are likely to have a significant impact on the integration application itself. Because the sheer volume of changes required can be large, having effective change management processes and tools helps to reduce the chance of errors when applying changes. Change management can also help to reduce the amount of time required for changes to be applied.
An important goal of having a change management system and process is to allow controlled changes to occur in the shortest period of time. Controlled change is an important criterion in maintaining a stable system. Without controlled changes, a system modification can produce undesirable results.
There are a number of items to consider when you establish your change management procedures:
Release management. Release management involves planning how the change will be released as well as determining rollback or roll forward plans. Rollback entails reversing the changes and ensuring that the system is restored to the exact same configuration it was in prior to the release. Roll forward is necessary when the changes are known in advance to be irreversible. In such cases, it is vital to have plans in place to mitigate the risk of the system not functioning correctly. Ultimately, because changes to an integration application can be frequent, providing as much automation as possible helps speed the implementation of changes and reduces the risk of human error.
Regression testing. Regression-testing the new application ensures that it does not compromise existing functionality. When the regression testing methodology and steps are well-defined, automating this process can help during release management and can reduce human-related errors.
Backward compatibility. Backward compatibility is especially relevant in the integration area. Backward compatibility allows the functionality of the integration application to be enhanced or modified without affecting existing systems that rely on the application.
The Directory capability is essential in the context of integration, because it stores much of the information required by application integration.
Generally available directory systems are highly scalable and often reside across multiple physical servers. As a result, normally a delay occurs before changes to data are consistently presented across all physical servers. Hence, if the data stored in the directory changes often — a few times a minute, for example — the directory may never present a consistent view of that data across all physical servers. For this reason, directories normally provide fairly static information that changes a maximum of a few times a day. Application interaction with the directory is largely restricted to read-only interactions.
The directories that are used in an application integration environment include:
The identity directory, which is primarily used for security as the repository for identities and related information.
The subscription directory, which is primarily used in a publish/subscribe scenario.
The application configuration directory, which stores configuration information and is primarily used by the applications themselves.
The capabilities directory, which provides a list of the capabilities offered.
The identity directory is the repository of all user and system identities. It may also contain other related information, such as user roles and profile information. Many organizations aim to have a unified identity directory but struggle to achieve one, due to the number of preexisting applications and the custom-developed security mechanisms they use. The problem is exacerbated by the number of different environments in which the applications run.
The subscription directory, as the name implies, keeps track of subscribers. This directory is typically used in publish/subscribe communication. A separate directory ensures that the subscriber base can be managed outside the application logic, which allows for either self-subscription or total management of subscribers, or for a combination of both.
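A minimal sketch of such a directory, supporting both self-subscription and administrator-managed entries; the topic and subscriber names are invented:

```python
class SubscriptionDirectory:
    """Minimal subscription directory: subscriber lists are managed
    outside the application logic, so either subscribers themselves or
    an administrator can maintain them."""

    def __init__(self):
        self._topics = {}

    def subscribe(self, topic, subscriber):
        self._topics.setdefault(topic, set()).add(subscriber)

    def unsubscribe(self, topic, subscriber):
        self._topics.get(topic, set()).discard(subscriber)

    def subscribers(self, topic):
        """Return the current subscribers for a topic, for the publisher to use."""
        return sorted(self._topics.get(topic, set()))

directory = SubscriptionDirectory()
directory.subscribe("orders.created", "billing-service")
directory.subscribe("orders.created", "warehouse-service")
print(directory.subscribers("orders.created"))
```

Because the publisher only queries the directory at publish time, subscribers can come and go without any change to the publishing application's logic.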
The application configuration directory is most useful in providing a consistent application configuration across a number of systems. For example, if an integration system currently requires 10 different servers and relies on a local configuration file, the operator must ensure that all 10 copies are consistent. Providing an application configuration directory allows these settings to be maintained in a single location. Although there are tools that can help maintain the files, using them introduces another level of complexity, and there may be limitations in how the tools can replicate the configuration files to the servers.
The capabilities directory provides a common place where capabilities can be listed and used. In the case of Web capabilities, Universal Description, Discovery, and Integration (UDDI) servers provide this functionality. Regardless of whether the capabilities directory is a UDDI server or a custom server, it is important to ensure that the information reflects the type of capabilities provided in a commonly understood format and protocol. Commonly understood, in this case, can mean organization-wide or even spanning partner organizations.