Middleware is not a new concept in distributed computing. It was first developed during the 1990s and has evolved significantly since then as distributed systems have proliferated. Because of its changing nature, it is difficult to provide one generally accepted definition and scope for middleware. A workshop was held at the International Center for Advanced Internet Research in December 1998 to agree on a general definition of middleware and to identify essential services to be researched and developed. The workshop concluded (Aiken et al., 2000) that the definition of middleware depends on the subjective perspective of those trying to define it. It was accepted that:
Application environment users and programmers see everything below the API as middleware. Networking gurus see anything above IP as middleware. Those working on applications, tools, and mechanisms between these two extremes see it as somewhere between TCP and the API…
More generic definitions of middleware are given by Emmerich (2000):
"Middleware is a layer between network operating systems and application components." Middleware "facilitates communication and coordination of distributed components."
This can be visualized in Figure 1.
Figure 1: Middleware in Distributed System Construction
Middleware has been widely adopted in industry to simplify the construction of distributed systems. One of the classic application areas for middleware is enterprise application integration, for example following corporate mergers. Very often, the period allowed for integration is so short that building a new system is neither feasible nor cost-effective. In addition, the components to be integrated may run on incompatible hardware and operating system (OS) platforms. Building applications directly on network OS primitives is too time-consuming and expensive. Middleware resolves the heterogeneity between systems and provides higher-level primitives so that application engineers can focus on application requirements.
As middleware serves to facilitate communication and coordination between components in a distributed system, it should fulfill the requirements discussed next (Emmerich, 2000).
Communication between components in a distributed system is achieved by using network protocols. The Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) are examples of protocols used in the transport layer of the Open Systems Interconnection (OSI) model. The transport layer and the layers below it are managed by the network OS. Application engineers should not need to implement the session and presentation layers; these should be implemented by the middleware. Thus, the middleware should provide the ability to transform complex data structures into a format that can be transmitted using a transport protocol, that is, a sequence of bytes. This transformation is known as marshalling.
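Marshalling can be illustrated with a minimal sketch. The record layout (a request identifier, an operation name, and an amount) and the function names marshal and unmarshal are assumptions made for illustration; real middleware generates this code from interface definitions rather than having it written by hand.

```python
import struct


def marshal(request_id: int, operation: str, amount: float) -> bytes:
    """Flatten a hypothetical request record into a sequence of bytes."""
    name = operation.encode("utf-8")
    # "!" selects network (big-endian) byte order without padding:
    # I = 4-byte unsigned id, H = 2-byte name length, s = name bytes, d = 8-byte float.
    return struct.pack(f"!IH{len(name)}sd", request_id, len(name), name, amount)


def unmarshal(data: bytes) -> tuple:
    """Rebuild the record from the bytes produced by marshal()."""
    request_id, name_len = struct.unpack_from("!IH", data, 0)
    # The fixed-size header occupies 6 bytes, so the name starts at offset 6.
    name, amount = struct.unpack_from(f"!{name_len}sd", data, 6)
    return request_id, name.decode("utf-8"), amount


wire = marshal(7, "deposit", 12.5)   # bytes that could be handed to TCP or UDP
record = unmarshal(wire)             # the receiver recovers the original fields
```

Fixing the byte order in the wire format is what lets hosts with different native representations exchange the same record.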
Synchronization of components communicating with each other is important. This could be achieved in various ways. Synchronous communication refers to a component being blocked while waiting for another component to complete execution of a requested service. Synchronization that does not require either of the parties to block on a response from the other party is referred to as asynchronous communication.
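The difference between the two styles can be sketched with a thread pool standing in for the middleware. The function slow_service is a hypothetical remote service, and the sleep only simulates network and processing delay.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_service(x: int) -> int:
    """Hypothetical remote service; the sleep stands in for network delay."""
    time.sleep(0.05)
    return x * 2


def call_synchronously(x: int) -> int:
    # The caller is blocked until the service has finished.
    return slow_service(x)


def call_asynchronously(pool: ThreadPoolExecutor, x: int) -> Future:
    # The caller receives a Future immediately and can continue working.
    return pool.submit(slow_service, x)


with ThreadPoolExecutor(max_workers=1) as pool:
    sync_result = call_synchronously(21)
    future = call_asynchronously(pool, 21)
    other_work = sum(range(10))      # runs while the request is in flight
    async_result = future.result()   # collect the reply once it is needed
```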
Sometimes there may be more than two components involved in a service request. This is the case when more than one component is interested in events that occur in another component. This form of communication is then known as group requests. Additional support for this type of communication is required to achieve reliable delivery and marshalling of data.
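A group request can be sketched as a simple in-process event channel. The class name EventChannel and its methods are hypothetical, and the sketch deliberately ignores the reliability and marshalling concerns mentioned above.

```python
from collections import defaultdict
from typing import Any, Callable


class EventChannel:
    """Hypothetical channel that forwards one event to every interested component."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        # A component registers its interest in events on a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> int:
        # One request from the publisher reaches all subscribers of the topic.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

The publisher does not need to know how many components are listening; it only sees the number of deliveries that were made.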
Another coordination issue, the activation and deactivation of components, arises from the sheer number of components in a distributed system. Middleware should thus allow application programmers to determine the activation policies.
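One way to picture an activation policy is a registry that either starts a component eagerly or waits until the first request arrives. The class ComponentRegistry and the activate_on_demand flag are assumptions made for this sketch and do not correspond to any particular middleware product.

```python
from typing import Any, Callable, Dict


class ComponentRegistry:
    """Hypothetical registry that activates components according to a policy."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._active: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any],
                 activate_on_demand: bool = True) -> None:
        self._factories[name] = factory
        if not activate_on_demand:
            # Eager policy: the component is started as soon as it is registered.
            self._active[name] = factory()

    def resolve(self, name: str) -> Any:
        # On-demand policy: the component is started when first requested.
        if name not in self._active:
            self._active[name] = self._factories[name]()
        return self._active[name]

    def deactivate(self, name: str) -> None:
        # Release the component; a later request will activate it again.
        self._active.pop(name, None)

    def is_active(self, name: str) -> bool:
        return name in self._active
```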
Middleware should support threading policies, as a server component may receive requests from many client components simultaneously. The server component may be single-threaded, in which case it queues requests and processes them in the order of their arrival, or multithreaded, in which case it spawns a new thread to process each request.
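The two policies can be contrasted in a short sketch. The handler handle and both serve functions are hypothetical; a real server would receive requests from the network rather than from a list.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def handle(request: int) -> int:
    """Hypothetical request handler."""
    return request * request


def serve_single_threaded(requests: List[int]) -> List[int]:
    # One worker thread: requests are queued and processed in arrival order.
    with ThreadPoolExecutor(max_workers=1) as worker:
        return list(worker.map(handle, requests))


def serve_thread_per_request(requests: List[int]) -> List[int]:
    # Multithreaded policy: a new thread is spawned for every request.
    results: List[int] = [0] * len(requests)

    def run(index: int, request: int) -> None:
        results[index] = handle(request)

    threads = [threading.Thread(target=run, args=(i, r))
               for i, r in enumerate(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Both functions return the same results; they differ in whether requests wait in a queue or are processed concurrently.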
Network protocols have varying degrees of reliability. To ensure that the receiver receives every packet, error detection and correction mechanisms have to be incorporated to handle unreliability. There are four degrees of reliability in communication between components:
Best effort. The service request does not assure execution of the request.
At-most-once. Requests are guaranteed to execute no more than once; a request may fail to execute at all.
At-least-once. Requests are guaranteed to be executed, possibly more than once.
Exactly-once. Requests are serviced once and only once.
There is a trade-off between performance and reliability: a higher degree of reliability generally comes at the cost of lower performance.
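The degrees of reliability can be made concrete with a toy model in which replies are lost and the client resends. The Server class, the deduplicate flag, and the lost_replies parameter are all assumptions of this sketch. Resending alone gives at-least-once behavior; remembering request identifiers on the server prevents repeated execution, which in this crash-free model yields exactly one execution.

```python
from typing import Dict, Optional


class Server:
    """Hypothetical server that can remember which requests it has executed."""

    def __init__(self, deduplicate: bool) -> None:
        self.deduplicate = deduplicate
        self.executions = 0
        self._replies: Dict[int, str] = {}

    def handle(self, request_id: int, amount: int) -> str:
        if self.deduplicate and request_id in self._replies:
            # Replay the stored reply instead of executing the request again.
            return self._replies[request_id]
        self.executions += 1
        reply = f"ok:{amount}"
        self._replies[request_id] = reply
        return reply


def send_with_retries(server: Server, request_id: int, amount: int,
                      lost_replies: int, max_attempts: int = 5) -> Optional[str]:
    """Resend until a reply arrives; lost_replies simulates dropped replies."""
    for attempt in range(max_attempts):
        reply = server.handle(request_id, amount)
        if attempt >= lost_replies:
            return reply          # this reply reached the client
    return None                   # gave up: the client never saw a reply
```

The bookkeeping needed for deduplication (stored replies, request identifiers) is one source of the performance cost noted above.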
For group requests, there are three types of reliability mechanisms:
K-reliability. Indicates that at least K components receive the communication.
Time-outs. Specifies the time period after which no delivery of the request is attempted.
Totally-ordered. A request never overtakes a request of a previous group communication.
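K-reliability, the first of these mechanisms, can be sketched as follows. The function send_to_group and the use of ConnectionError to signal an unreachable receiver are assumptions for illustration; time-outs and total ordering are not modeled.

```python
from typing import Any, Callable, Iterable


def send_to_group(receivers: Iterable[Callable[[Any], None]],
                  message: Any, k: int) -> bool:
    """Deliver message to each receiver; succeed if at least k accept it."""
    delivered = 0
    for deliver in receivers:
        try:
            deliver(message)
            delivered += 1
        except ConnectionError:
            # An unreachable receiver does not abort the group request.
            continue
    return delivered >= k
```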
The above reliability mechanisms are applicable to individual service requests. When one considers reliability for more than one request, transactions are used. Transactions have the ACID properties: they are atomic, consistency-preserving, isolated, and durable. Atomicity means that a sequence of requests is performed completely or not at all. Every completed transaction should leave the system in a consistent state. Concurrent transactions are isolated from one another, and once a transaction is committed, its effects are durable and cannot be undone.
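Atomicity can be demonstrated with a small sketch that uses Python's built-in sqlite3 module as a stand-in for a transactional resource. The accounts table, the transfer function, and the sample balances are hypothetical.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move amount from src to dst as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                      (src,)).fetchone()
            if balance < 0:
                # Consistency rule violated: abort, undoing both updates.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

A failed transfer leaves both balances exactly as they were, which is the "completely or not at all" property described above.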
Reliability can be further enhanced by replicating components, in other words, making multiple copies of components available on different hosts, so that a replica could service the request should the original component fail.
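Failover to a replica can be sketched in a few lines. The function call_with_failover and the use of ConnectionError to represent a failed host are assumptions of this example; keeping the replicas' state consistent is a separate problem that the sketch does not address.

```python
from typing import Any, Callable, Sequence


def call_with_failover(replicas: Sequence[Callable[[Any], Any]], request: Any) -> Any:
    """Try each replica in turn and return the first successful reply."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as error:
            # This copy is unavailable; fall through to the next replica.
            last_error = error
    raise RuntimeError("all replicas failed") from last_error
```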
Middleware should be scalable to accommodate increasing load. In a distributed system, this could be achieved by load balancing, whereby the load is distributed across several hosts. Building a scalable distributed system to support changes in the allocation of components to hosts without changing the architecture of the system, or the design and code of any component, is challenging. The International Organization for Standardization (ISO) Open Distributed Processing (ODP) reference model defines, among others, the following two types of transparencies that middleware should support (ISO 7498-1, 1994):
Access transparency. The way a component accesses the services of another component is the same whether the other component is located locally or remotely.
Location transparency. Components need not know where the other components they interact with are physically situated.
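Both transparencies can be illustrated with a small sketch in which a client obtains a component by name and calls it through the same interface wherever it lives. All names here (LocalCalculator, RemoteCalculatorProxy, NameService, simulated_network) are hypothetical, and the "network" is simulated by an ordinary function call.

```python
from typing import Any, Callable, Dict


class LocalCalculator:
    def add(self, a: int, b: int) -> int:
        return a + b


def simulated_network(operation: str, args: tuple) -> Any:
    # Stand-in for a real transport: the "remote host" is just another object.
    return getattr(LocalCalculator(), operation)(*args)


class RemoteCalculatorProxy:
    """Offers the same interface as LocalCalculator but forwards each call."""

    def __init__(self, transport: Callable[[str, tuple], Any]) -> None:
        self._transport = transport

    def add(self, a: int, b: int) -> int:
        return self._transport("add", (a, b))


class NameService:
    """Maps logical names to components so clients never see host addresses."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Any] = {}

    def bind(self, name: str, component: Any) -> None:
        self._bindings[name] = component

    def resolve(self, name: str) -> Any:
        return self._bindings[name]


def client(names: NameService) -> int:
    # The client code is identical for local and remote components.
    return names.resolve("calculator").add(2, 3)
```

Rebinding the name "calculator" from a local object to a proxy moves the component without any change to the client, which is what makes reallocating components to hosts possible without redesign.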
Heterogeneity comes in many aspects for a distributed system. There may be differences in hardware and OS platforms, as well as different programming languages being used for various components. Middleware should be able to resolve this heterogeneity.