9.3. Availability and Scalability

It is mandatory for any enterprise architecture to provide functionality in the way it has been commissioned to do so. In this context, we consider a system as providing availability if it is operational for 100% of planned uptime. In other words, we consider a system to be 100% available if it prevents any unplanned downtime. Note that we do not include planned downtime in our concept of availability. For example, if an application becomes unavailable because a database is taken offline for backup every night, it does not reduce the availability of the application. Instead, it limits the service level that the application supports.

Scalability in the context of this book means that an enterprise architecture provides some way to increase its capacity beyond initial planning. Capacity designates the work a system can perform in a given amount of time, commonly measured in transactions per second (TPS). Other measures of capacity are the number of concurrent users a system can support and the amount of storage space it provides. Ideally, scalability should be linear, meaning that if the capacity of the system is doubled, the resources available (memory, CPUs, network, and management overhead) should at most need to be doubled. In practice, systems can scale linearly only to a given point. For most practical purposes, scalability means that a clean and defined path exists to increase the capacity of a system by a requested amount. In general, scalability is only considered up to a certain boundary. For example, a requirement might be that an application must be scalable from an initial load of 100 to 10,000 users.

A common source of confusion surrounding scalability and availability is that they are often not rigorously defined, or are defined based on insufficient information. The Service Level Agreement (SLA) lies at the heart of this matter. An SLA typically defines a set of performance figures that form an integral part of the contract between one organization that provides an IT service and another that consumes it.

The most common performance figure is the guaranteed operation time. The operation time (commonly referred to as uptime) states the time that the system must be available, along with an acceptable amount of unplanned downtime. The SLA also states the capacity of the system. This includes storage capacity, number of concurrent users, and TPS. Often, an SLA also states the response times that are acceptable for a certain percentage of requests.
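Such an uptime guarantee translates directly into an unplanned-downtime budget. The following sketch performs that conversion; the 99.9% figure and the class name are hypothetical illustrations, not values taken from any particular SLA:

```java
// Converts an SLA uptime guarantee into an unplanned-downtime budget.
// The 99.9% figure below is a hypothetical example.
public class DowntimeBudget {

    // Allowed downtime in minutes for a given uptime percentage
    // over a period expressed in minutes.
    static double allowedMinutes(double uptimePercent, double periodMinutes) {
        return periodMinutes * (1.0 - uptimePercent / 100.0);
    }

    public static void main(String[] args) {
        // A 30-day month has 30 * 24 * 60 = 43,200 minutes.
        double perMonth = allowedMinutes(99.9, 30 * 24 * 60);
        System.out.printf("99.9%% uptime allows %.1f min/month of downtime%n", perMonth);
    }
}
```

Running the numbers this way early in an SLA negotiation makes it obvious what each extra "nine" of availability actually buys.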

Care must be taken when the requirements for any IT system are defined. Further care should be taken for any system that relies on other systems to function properly. Far too many so-called "SLAs" in the industry merely state the wishful thinking of the people who originally "defined" them during a sales pitch. As a consequence, they state requirements that far exceed what the system actually needs to deliver in the worst-case scenario. For example, it is unreasonable to require an airline booking system to support a number of concurrent requests equal to all airline seats available for the entire year. Even if the booking system could fulfill this requirement, there would be no benefit for the business. Therefore, such requirements must be based on known values where possible, such as the current number of transactions and the actual number of users concurrently connected to the system. If such numbers are not available, extrapolation from currently known values is required. It is both valid and commonplace to add a safety margin, but you must take into consideration that the requirements for the system's availability ultimately impact engineering effort. At the same time, it is valid to question whether any business process needs to operate unaffected in the case of nuclear war or an alien invasion. Finally, you should be wary of the concept of guaranteed response times for as yet unspecified and unimplemented business functionality. It is fairly easy to plan for capacity and availability; it is almost impossible to plan for an unknown computational algorithm.

By now, it should be clear that availability and scalability come at a price. A number of applications exist in which you should be prepared to spend a large amount of money to guarantee availability. Consider air traffic control systems or a control system for a nuclear power plant. In these cases, not only the functioning of the economy but also human lives depend on these systems remaining permanently operational. In other cases, the systems that are affected by your IT system might be very expensive or very hard to replace, such as a multi-billion dollar spacecraft whose IT systems enter unplanned downtime just as it enters the atmosphere of one of the moons of Jupiter. However, for a large number of systems, some amount of unplanned downtime is inconvenient but acceptable. One example is an online banking system where most users might not even notice a downtime of a couple of hours every month. A second example is an airline check-in system. Although downtime is very inconvenient and large queues form in front of the check-in desk, fallback procedures exist that allow for a manual check-in process.

The stronger the requirements, the higher the price tag. Building an extremely fail-proof system that is scalable from initial hardware setup to software development to operating procedures is very expensive.

On a different note, you must never forget that the overall performance of the system is ultimately limited by the weakest link in the technology chain. This has an important effect on any integration or service-enabling effort because the impact that legacy systems and databases have on overall system performance is regularly underestimated. In principle, the service layer scales linearly to support unlimited capacity, regardless of the underlying technology. However, the supporting systems stop scaling at some point, such as when the number of connections a database server can support is exceeded, when the storage area is full, or when transactions occurring at a certain rate create concurrency problems at the database level. Often, this is not a technical but an administrative problem: hardware must be bought, database configurations must be changed, host computing power must be ordered, and so on. Because these things typically take time to move through the proper channels in an enterprise's IT operation, it is important to stress test the entire application stack as early as possible in the project to allow enough time for uncovering any backend scalability issues.

Stress Test Early

Often, simple measures at the IT system level will improve performance and scalability radically. However, time is required to implement these changes. To buy yourself this crucial time, do not be afraid to stress test early in the development process.


Finally, note that session state should be placed in the application frontend. If the application frontend is not suitable, you should create lean process-centric services, which should be carefully separated from non-conversational services.

9.3.1. SCALABILITY AND AVAILABILITY USING WEB SERVICES

Web services are generally built on one of the widely available technologies for delivering dynamic Web pages, most notably Microsoft .NET and J2EE. In principle, these technologies provide easy scalability: you can use a load balancer to forward requests to any number of identically configured framework instances or containers. If needed, you can add additional hardware to host more instances of the container in order to increase system capacity until the network is saturated.

Most off-the-shelf load balancers provide the concept of sticky sessions, where a request that originates from a certain IP address is always forwarded to the same container. This enables the container to preserve conversational state with the client. As we discussed in Chapter 8, limited resources such as open database connections or transaction contexts should never be part of this conversational state. Newer load balancers can even analyze the request for information about the server on which the session resides instead of relying only on the IP address.

In general, failure of the container will result in the loss of the conversational state. Strictly speaking, by the preceding definition, this does not limit availability: the load balancer will notice that a system has gone down and will forward the request to another machine. It might nevertheless prevent the uninterrupted operation of the system. Again, this presents additional motivation to make service interfaces stateless and idempotent wherever possible.

Most frameworks support some notion of preserving session state. The actual implementations vary widely: state might be stored in a database or in a shared file system, or it might be replicated to all members of a cluster using multicast or to a single dedicated server. Similarly, session state replication might or might not be transactional. In any case, preserving session state creates a noticeable overhead.
Therefore, state that is convenient to store but easily re-created should not be replicated. You might even consider abandoning conversational session state completely, even if this requires re-creating it with every service invocation. The simplicity and scalability gained can easily outweigh the extra cost of hardware.

However, the best solution is to store the session state in the application frontend. This makes most of the challenges of maintaining server-side state irrelevant.

As an example, consider checking in for a flight using a self-service terminal. Consider the stage of the check-in service where the seat is assigned (see Figure 9-14). The call signature can easily be crafted to be both idempotent and stateless. In this case, no session replication is required. If the first attempt to assign a seat fails because the system has gone down, the terminal retries the call. The load balancer has noticed that it has lost the connection to the first instance of the service and uses another instance to retry the call. This call subsequently returns the required information to print the boarding card.

Figure 9-14. Failover using a hardware load balancer and a service that uses no conversational state.
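The terminal-side retry logic for such an idempotent, stateless call can be sketched as follows. The SeatService interface and its assignSeat signature are hypothetical, and the load balancer's failover is modeled simply as a repeated call, which a new instance will serve:

```java
// Sketch of client-side retry against an idempotent, stateless service.
// Because all state travels in the call arguments, repeating the call
// after a failure is harmless.
public class CheckInClient {

    // Hypothetical service interface; in practice this would be a
    // generated Web service stub.
    interface SeatService {
        String assignSeat(String bookingRef, String preferredSeat);
    }

    // Retries the call up to maxAttempts times. Each retry goes back
    // through the load balancer, which routes around the failed instance.
    static String assignWithRetry(SeatService svc, String ref,
                                  String seat, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return svc.assignSeat(ref, seat);
            } catch (RuntimeException e) {
                last = e; // instance down: retry against another instance
            }
        }
        throw last; // all attempts exhausted
    }
}
```

Note that this pattern is only safe because the operation is idempotent; retrying a non-idempotent call could, for example, assign two seats.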


Interoperability with Off-the-Shelf Load Balancers

Always aim for a coarse-grained service interface. Use idempotent methods wherever possible. These measures will provide optimal interoperability with off-the-shelf load-balancing routers.


9.3.2. SCALABILITY AND AVAILABILITY USING EJBS

Creating scalable applications using EJBs is fairly simple because most EJB servers support clustering. This enables an administrator to add additional server instances and hardware until the capacity requirement is reached. Usually, the client's EJB home interface implementation decides, based on a configurable algorithm, where to create the required EJB. In the case of a stateless session bean, the remote stub itself might load balance on a call-by-call basis. Typical algorithms include random, load-based, and round robin. Some EJB containers also allow custom algorithms to be provided.

Most EJB containers will bind a specified remote client to a single server instance to limit the amount of transaction coordination needed if multiple EJBs take part in the same transaction. If they do not, you can easily emulate this behavior by using a façade pattern to push down the transaction boundary.
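As a minimal sketch of this façade idea (with hypothetical step names), one coarse-grained method wraps the fine-grained steps so that they all execute within a single call, and hence within a single transaction boundary on a single server. In a real EJB, the façade method would carry a container-managed transaction attribute such as Required:

```java
// Sketch of a session façade that pushes the transaction boundary down:
// the client makes one coarse-grained call, and all fine-grained steps
// run inside it. Step names are hypothetical placeholders.
public class ReservationFacade {

    // Counter standing in for real work, so the sketch is observable.
    static int stepsExecuted = 0;

    static void createReservation() { stepsExecuted++; }
    static void allocateInventory() { stepsExecuted++; }
    static void writeAuditRecord()  { stepsExecuted++; }

    // The single coarse-grained entry point: one remote call,
    // one transaction, one server instance.
    static void reserve() {
        createReservation();
        allocateInventory();
        writeAuditRecord();
    }
}
```

The design benefit is that the cluster never needs to coordinate a transaction spanning several remote calls to different instances.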

Regarding availability, the same concepts discussed in the previous section on Web services hold true. Stateless session bean stubs will detect the failure of a server and try to connect to a different server. Stateless session beans should be used wherever possible to avoid conversational state. If conversational state is a firm requirement, some EJB containers provide data replication and failover for stateful session beans. Likewise, entity beans that are not part of a transaction can be reloaded from a different server. Because the latter is quite specialized behavior, and because it is also highly vendor-specific, it is again best to try to maintain as little conversational state as possible. Some EJB containers also support the idea of an idempotent method in a bean. This is a powerful concept because it enables the stub to transparently re-create a call if the original call fails in flight. These features of EJB containers often lead to performance problems in a tightly coupled environment with a fine-grained object model. However, they are very useful in the face of the scalability and availability issues of an SOA.

Avoid Stateful Beans and Fine-Grained Interaction Patterns

Using stateless session beans with coarse-grained interfaces will enable you to make effective use of your application server cluster. Avoid fine-grained object interaction patterns. The striking simplicity of this concept will improve not only the availability and scalability of the system but also its robustness and maintenance-friendliness.


As an example, consider booking an airline ticket, as illustrated in Figure 9-15. This part of the booking process consists of the following three steps:

1. A reservation is entered into the system.

2. The customer's credit card is charged.

3. The reservation is marked paid and closed.

Figure 9-15. A stateful booking service that calls the reservation and the payment service. The reservation number, credit card data, and state of payment are replicated in order to support failover.


If the client notices that the call fails, it can retry the call using a replicated instance of the booking service. The key data to be replicated is the primary key of the reservation record and the credit card data, along with the state of the payment. If a reservation key exists and the state of payment is "charged," the implementation of the booking service can continue with closing the reservation. If there is no primary key for the reservation, it can retry the whole process. If the state of payment is "not attempted" and a reservation primary key exists, it continues by charging the credit card. Note that due to the transaction boundaries, a state might arise that cannot be handled automatically: the payment fails in flight, and the payment state in the replicated data is "attempted." In this case, the implementation can run checks using service methods of the payment service to determine the outcome of the attempted call.
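The resume logic just described amounts to a small state machine over the replicated data. A sketch, with hypothetical enum values and action labels standing in for the real service calls:

```java
// Sketch of the failover-resume decision described in the text. After a
// retry, the replicated data (reservation key and payment state) tells
// the booking service where to pick up. Names are hypothetical.
public class BookingResume {

    enum PaymentState { NOT_ATTEMPTED, ATTEMPTED, CHARGED }

    // Returns a label naming the next action so the logic is testable.
    static String nextStep(String reservationKey, PaymentState payment) {
        if (reservationKey == null) {
            return "RESTART"; // no reservation yet: retry the whole process
        }
        switch (payment) {
            case CHARGED:
                return "CLOSE_RESERVATION";     // only the final step remains
            case NOT_ATTEMPTED:
                return "CHARGE_CARD";           // reservation exists, not yet paid
            default:
                return "CHECK_PAYMENT_OUTCOME"; // in-flight failure: query the
                                                // payment service for the result
        }
    }
}
```

Only the "attempted" branch requires out-of-band checking, which is exactly the case the text flags as not automatically resolvable.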

9.3.3. SCALABILITY AND AVAILABILITY USING CORBA

With CORBA, most of the concepts discussed in the previous section hold true. The CORBA specification defines requirements for "fault-tolerant CORBA." Among other things, this specification is concerned with detecting communication faults, providing transparent failover, and replicating the object state of distributed object instances. Load balancing within CORBA is usually performed by the individual vendor implementations using low-level functions of the CORBA GIOP (General Inter-ORB Protocol).

Because CORBA fault tolerance is a relatively new specification, various vendor-specific implementations for availability and object state replication exist.

9.3.4. SCALABILITY AND AVAILABILITY USING CICS

Even though it might seem old-fashioned, IBM's Customer Information Control System (CICS) remains widespread for mission-critical services. Basically, CICS is a transaction server that provides very efficient facilities for loading, initializing, and executing programs.

While CICS servers usually run on mainframe or midrange computers, a variety of tools is available to connect to a CICS server. For example, the CICS Transaction Gateway (CTG) enables Java programs to act as a CICS client using the Java Connector Architecture (JCA). Other programming languages, such as COBOL, C++, and Visual Basic, can use the CICS universal client to access the CICS server. Connectivity to IBM's messaging product, MQSeries, is also available. In this way, any client of an MQSeries server can make use of existing CICS programs. Finally, CICS programs can be directly exposed as SOAP Web services using SOAP for CICS.

To operate seamlessly with existing CICS programs, a CICS-based service should hold no conversational state: one transaction should map to exactly one service call. CICS programs can run for an arbitrarily long time. If the service implementation breaks while the CICS call is in flight, it can be very hard to determine the result of the transaction. Therefore, extra care must be taken to ensure that CICS programs used in services are small and execute quickly.

Scalability and availability of CICS itself is provided by IBM's CICSPlex and SYSPlex technologies.

CICS can even participate in distributed transactions, for example using JCA. However, as discussed in Chapter 8, distributed transactions must be handled with care. Because CICS programs usually execute on systems with very high load and transaction density, using distributed transactions with CICS is often inefficient and might have a significant impact on other CICS jobs accessing shared resources.

9.3.5. SCALABILITY AND AVAILABILITY OF WRAPPED LEGACY APPLICATIONS

It is often desirable to use an existing terminal-based legacy application within a service. Just as often, it is not possible to make changes to the existing application. In these cases, one of the most popular and effective ways of using legacy applications in newly developed applications is screen scraping. The character stream that is used to build the terminal user screen, such as on a VT100 terminal, is analyzed. Effectively, the application mimics a VT100 terminal client.

This approach has several benefits. It is a fairly cheap and straightforward way to include some of the existing functionality in a service. Specifically, it is a low-impact solution because no changes to the original application are required. There are also downsides to this approach, mainly in relation to scalability and availability. First, many existing VT100 systems have daily maintenance windows (usually at night) when the application, and thus the service, is not available. This must be incorporated into the relevant SLAs for the service. Second, any changes to the application's terminal screens require maintenance work in the service application. Furthermore, a typical stateless service call can easily span multiple terminal screens, which leads to rather high latency when invoking such a service. On the other hand, a design that performs multiple fine-grained service calls requires a stateful service, which is generally not advisable, particularly because such systems normally work on a pooled resource, in this case terminal connections. Finally, the service can only scale to the number of transactions and sessions that the legacy application supports.

A business that aims for this type of reuse must be prepared to face the consequences of this decision. It will usually save a large amount of money, but this comes at the cost of somewhat limited scalability and robustness with increased maintenance [TDM2000].

9.3.6. SCALABILITY AND AVAILABILITY IN A HETEROGENEOUS SOA

Ultimately, most SOAs are likely to integrate services that run on different platforms. For example, the flight booking EJB might call a SOAP Web service to charge the customer's credit card. Because many SOA initiatives start out providing a whole new enterprise IT infrastructure, it is easy to lose sight of this fact. However, if properly deployed, an SOA supports the concept of heterogeneous service platforms working together. The individual services are only loosely coupled and share no common transaction contexts. Therefore, if all individual services are designed to be available and scalable, the overall system will, in principle, be available and scalable.

However, the situation is not quite so simple. For example, increasing scalability on the top of the service stack might not increase overall scalability of the system because services down the application stack might be overloaded. Thus, changing an SLA for a frontend service usually requires changing SLAs for backend services as well. As an analogy, imagine a call center that relies on a single database to operate at its maximum user load. Simply adding more agent seats and phone lines to the call center does not increase its capacity unless the database system is also upgraded.

The uptime of a service chain in a heterogeneous infrastructure is not simply the lowest uptime in the chain. If three chained services each provide an uptime of 98%, the resulting uptime is the product of the individual uptimes:

UT = 98% × 98% × 98% ≈ 94.1%
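This product can be computed for a chain of any length; a small sketch:

```java
// Composite uptime of a chain of services: each request must traverse
// every service, so the individual uptimes multiply.
public class ChainUptime {

    // Uptimes are given as fractions, e.g. 0.98 for 98%.
    static double composite(double... uptimes) {
        double product = 1.0;
        for (double u : uptimes) {
            product *= u;
        }
        return product;
    }
}
```

The compounding effect is why long service chains need each link to be held to a noticeably stricter SLA than the end-to-end target.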

However, the weakest link in the service chain still has the greatest impact on both availability and scalability. Therefore, project management must first focus on bringing the weakest technology in line with the SLA's requirements.



Enterprise SOA: Service-Oriented Architecture Best Practices
ISBN: 0131465759