Common Integration Protocols and Systems

Various protocols and standards have emerged over the past several decades in computing to address the challenges of distributed computing. From EDI to SOAP, we will review how various technologies work, and you will be able to see how certain advances with standards and implementation languages have shaped this evolution. In Chapters 3 and 4, we began an early study of RMI and CORBA, and in this chapter we will expand our analysis of integration to include other approaches as well. We will then discuss how these protocols are being used in the P2P domain in some cases.

The Sturdy Veteran: EDI

Electronic Data Interchange (EDI) is commonly defined as the direct computer-to-computer transfer of business information. This information can include purchase orders, invoices, planning messages, shipping and receiving notices, payments, scheduling messages, and so on. EDI is a venerable technology (it's about 30 years old); it could be considered the first and most important (and maybe the most successful) integration technology. EDI is widely used across the globe in nearly every industry, including health care and insurance. The format and rules for EDI documents or transactions are governed by standards: ANSI X12, used mainly in North America, and UN/EDIFACT, used worldwide.

EDI came into use at a time when every byte sent across the network mattered because each was expensive. The documents that make up EDI transactions were thus made to be compact, concise, and machine-friendly. It is not pleasant to read an EDI document, but these documents were never really intended to be read by people. In practice, it is sometimes necessary to trace problems in EDI documents, and that might mean reading and understanding "raw" EDI.

EDI was intended to solve a number of the problems with "human integration"; that is, the exchange of business information and transactions via mail, fax, and phone. For the most part, it has been very successful: after thirty years, it is still in widespread use, while many companies promising "modern EDI" or "Internet EDI" technologies have tried and failed.

One of EDI's main strengths is also (strangely enough) one of its biggest problems. EDI documents and protocols (how the documents are exchanged) must adhere to a common standard. Standard formats and protocols are a good thing. Therein lies the EDI problem: It is the non-standard standard. Each pair of organizations that shares data via EDI does things a little differently. For example, a pair of trading partners might decide that the first address in an EDI purchase order will be used for the billing address, and the second will be used for the shipping address. This is a private agreement that may or may not adhere to recommendations for how that EDI document should be used. This is fine, but when a third trading partner comes into the picture, that same agreement must be negotiated yet again, perhaps with compromises made for the needs of the new partner.
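To make the private agreement concrete, here is a minimal Java sketch. It assumes the delimiters commonly seen in X12-style documents (`*` between elements, `~` at the end of each segment) and uses N1 as a name segment; the class, the sample data, and the "ZZ" qualifier are invented for illustration and are not taken from any real implementation guide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not a real EDI library, and the sample data is invented.
class EdiAddressSketch {

    /** Splits raw EDI text into segments, and each segment into its elements. */
    static List<String[]> parseSegments(String raw) {
        List<String[]> segments = new ArrayList<>();
        for (String segment : raw.split("~")) {            // assumed segment terminator
            String trimmed = segment.trim();
            if (!trimmed.isEmpty()) {
                segments.add(trimmed.split("\\*", -1));    // assumed element separator
            }
        }
        return segments;
    }

    /** Collects the name element of every N1 segment, in document order. */
    static List<String> partyNames(List<String[]> segments) {
        List<String> names = new ArrayList<>();
        for (String[] elements : segments) {
            if (elements.length > 2 && elements[0].equals("N1")) {
                names.add(elements[2]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "ZZ" is a placeholder qualifier; only the position of each N1 matters here.
        String raw = "N1*ZZ*Acme Accounts Payable~N1*ZZ*Acme Warehouse 7~";
        List<String> names = partyNames(parseSegments(raw));

        // The private agreement between these two partners: first = billing, second = shipping.
        System.out.println("Bill to: " + names.get(0));
        System.out.println("Ship to: " + names.get(1));
    }
}
```

Nothing in the document itself says which address is which; that knowledge lives only in the two lines at the end of main(), which is exactly the part that has to be renegotiated when a third partner arrives.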

This is not a unique occurrence; it's common practice. This is why EDI works well once all the bugs are worked out of the process, but getting to that point is…well, let's just say we're back to the social engineering problem. This problem has represented a wealth of opportunity for products and services focused around EDI data translation and communication: this is integration at its roots.

EDI predates widespread use of the Internet, and companies needed to get these electronic documents to each other somehow. The options available at the time are still active and viable transports today, although the Internet has since become one of the preferred EDI transport mechanisms for reasons of availability and cost.

Value Added Networks (VANs) are companies that provide communications services for EDI customers, typically message store-and-forward, electronic mailboxes, and so on. VANs are essentially "EDI ISPs." Leased lines are another common method of connecting EDI business partners. A leased line is a dedicated telephone line or cable leased directly from a telephone or telecommunications company. Leased lines provide dedicated, reliable bandwidth for communications, and are generally not subject to the kinds of outages that sometimes plague ISPs.

XML-ified/HTTP-ified EDI: Is This Web Services?

As mentioned previously, EDI is not pleasant to read. The document formats make sense, but they were designed to be concise, machine-readable, and easily parsed. When integrating newer, WWW-enabled applications with EDI systems, it is tempting to simply "XML-ify" the EDI document formats you want to send and receive. In such cases, the XML version of an EDI document will be received from another application, perhaps through a WWW server, where it is generated from an HTML form via CGI, Java Servlets, or JSP.

This is not necessarily a bad thing, but this type of interoperation is not really Web services in action. Often what happens is that once the XML is received and is parsed, a true EDI document (that is, non-XML) will be created and used for processing. Actually, this might be a good way to introduce new ideas and technologies to an integration partner that is used to doing things one way (in this case EDI) and only that way. Integrating with EDI can require a good dose of social engineering; as much time might be spent negotiating with and educating trading partners as is spent in the actual technical work.

Java RMI

Java Remote Method Invocation (RMI) is a way of invoking methods on Java objects in a different or remote process space, as we learned in Chapter 2. RMI is a distributed object system that enables networked or distributed applications to be built in Java quickly and relatively easily, and (mostly) enables programmers to use the same semantics they are used to with local or nondistributed applications. RMI handles all the nasty parts of dealing with network communication, enabling the programmer to just deal with the problems at hand. RMI is a Java-only technology; if one needs to integrate Java applications with non-Java applications, RMI alone will not do the job.

As in most distributed object schemes, the RMI networking subsystem uses stubs and skeletons to proxy clients and servers for each other. In RMI, each stub represents an actual object. This means that object reference semantics can extend between JVMs, permitting distributed garbage collection and leasing to take place. These very interesting topics are beyond the scope of this chapter, but they are recommended reading.
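The stub and skeleton idea can be imitated inside a single JVM, which makes it easier to see what each half does. The following is only a sketch under that assumption: it uses java.lang.reflect.Proxy to play the stub, a small dispatcher class to play the skeleton, and no network at all. The names Greeter, Skeleton, and stubFor are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical in-process imitation of the stub/skeleton pattern; no network is involved.
class StubSkeletonSketch {

    /** The "remote interface": the only thing the client knows about. */
    public interface Greeter {
        String greet(String name);
    }

    /** Server side: receives a method name plus arguments and calls the real object. */
    static class Skeleton {
        private final Greeter target;

        Skeleton(Greeter target) {
            this.target = target;
        }

        Object dispatch(String methodName, Class<?>[] paramTypes, Object[] args) throws Exception {
            // A real skeleton would first unmarshal these values from the wire.
            Method method = Greeter.class.getMethod(methodName, paramTypes);
            return method.invoke(target, args);
        }
    }

    /** Client side: looks like a Greeter, but forwards every call to the skeleton. */
    static Greeter stubFor(Skeleton skeleton) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(skeleton, args);   // toString() and friends stay local
            }
            // A real stub would marshal the call and send it across the network here.
            return skeleton.dispatch(method.getName(), method.getParameterTypes(), args);
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    }

    public static void main(String[] args) {
        Greeter real = name -> "Hello, " + name;
        Greeter stub = stubFor(new Skeleton(real));
        System.out.println(stub.greet("peer"));   // the client only ever touches the stub
    }
}
```

The client code in main() cannot tell whether it holds the real object or the proxy, which is the point of the pattern.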

Also as with other distributed object systems, RMI has the notion of remote interfaces, which are the sets of method signatures used to interact with remote objects. RMI also has a built-in lookup service and a naming service, but no automated discovery mechanism as in Jini and other service-oriented architectures.
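As a rough illustration of a remote interface and the built-in naming service working together, the sketch below starts an RMI registry inside the current JVM, binds an object under a name, looks it up again, and calls it through the stub. The Echo interface, the EchoImpl class, the binding name, and the port number are all assumptions made for the example, and the call travels over the loopback interface only.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical example: Echo, EchoImpl, and the port number are invented for illustration.
class RegistrySketch {

    /** The remote interface: the method signatures a client may call. */
    public interface Echo extends Remote {
        String echo(String text) throws RemoteException;
    }

    /** The implementation that lives in the "server" (here, the same JVM). */
    static class EchoImpl implements Echo {
        @Override
        public String echo(String text) {
            return "echo: " + text;
        }
    }

    /** Starts a registry, binds an Echo under a name, looks it up, and calls it. */
    static String roundTrip(int port, String text) {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");   // stay on loopback
        Registry registry = null;
        EchoImpl impl = new EchoImpl();
        try {
            registry = LocateRegistry.createRegistry(port);
            Echo stub = (Echo) UnicastRemoteObject.exportObject(impl, 0);
            registry.rebind("Echo", stub);                 // naming: name -> stub
            Echo found = (Echo) registry.lookup("Echo");   // lookup: name -> stub
            return found.echo(text);                       // remote call through the stub
        } catch (Exception e) {
            throw new IllegalStateException("RMI round trip failed", e);
        } finally {
            // Unexport everything so that the JVM is free to exit.
            quietlyUnexport(impl);
            quietlyUnexport(registry);
        }
    }

    private static void quietlyUnexport(Remote object) {
        if (object == null) {
            return;
        }
        try {
            UnicastRemoteObject.unexportObject(object, true);
        } catch (RemoteException ignored) {
            // the object was never exported, so there is nothing to undo
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(20990, "hello"));
    }
}
```

Note what is missing: the client has to know the registry's host, port, and binding name in advance. Nothing here discovers a server automatically.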

RMI has a feature that enables some exciting and otherwise very difficult things to be done: Object and class definitions can be passed across the network via RMI, essentially allowing code itself to be mobile. You can actually make your applications send and receive new types of objects, which enables things like dynamic library updates and mobile agent-like behavior.

However, you need to be aware of what kind of object you're passing in a remote method call: a remote-capable object (an object that objects in other JVMs can invoke remote methods on), or a local object that is accessible only within the same JVM. The semantics make a difference here. A local object will be passed by copy: a copy of the local object will be created and marshalled for sending to the remote server. In the remote JVM, the copy might be acted upon, but nothing that happens there will directly affect the original in the local JVM. When passing a remote-capable object, the remote representation of the object (that is, the stub) is actually being passed. The methods accessible on the remote object are RMI remote methods, so changes made to it in the remote JVM will possibly (depending on the way that object works) affect the object in the originating JVM. This is often the desired behavior.

Let's see a very simple example of Java mobile agents that illustrates what happens when an object is passed by copy in RMI. Agents are simple objects with a humble role in life: each one selects a RemoteAgentServer in its itinerary (the Agent's itinerary is a list of IP addresses or hostnames of RemoteAgentServers), gets a remote handle or reference to that RemoteAgentServer, and sends itself to the selected RemoteAgentServer. A RemoteAgentServer, on the other hand, maintains a list of Agents that have visited it, and when an Agent visits it, it adds the new Agent to its list and simply prints the list of hosts that Agent has visited.

Here is the RemoteAgentServer's rather simple interface:

 public interface RemoteAgentServer extends java.rmi.Remote {
     public void accept(Agent remoteAgent) throws java.rmi.RemoteException;
 }

The RemoteAgentServer has a single method: accept(), which takes an Agent object as a parameter. Here is the code for the accept() method in the RemoteAgentServerImpl, the concrete class that actually implements the RemoteAgentServer interface:

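 public void accept(Agent remoteAgent) {
     // remember the name of every Agent that has visited this server
     agentsAccepted.addElement(remoteAgent.getAgentName());
     // send the Agent on its way to the next host in its itinerary
     remoteAgent.go();
     // print every host this Agent has visited so far
     ArrayList hostsVisited = remoteAgent.getHostsVisited();
     for (int i = 0; i < hostsVisited.size(); i++) {
         System.out.println("Host " + (i + 1) + ":" + hostsVisited.get(i));
     }
 } // accept()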

Simple, eh? The RemoteAgentServerImpl simply adds the Agent to its list and calls go() (which sends the Agent on its way to the next host on its itinerary) on the Agent, then prints the hosts that the Agent has visited. Here is the Agent's go() method:

 public void go() {
     // nothing left in the itinerary: the journey ends here
     if (itinerary.isEmpty()) {
         return;
     }
     String nextHost = itinerary.get(0);
     hostsVisited.add(nextHost);
     itinerary.remove(0);
     try {
         /* get the remote reference to the next server;
          * this would be a good place for error handling. */
         RemoteAgentServer agentServer =
             (RemoteAgentServer) Naming.lookup("rmi://" + nextHost + "/RemoteAgentServer");
         /* send myself to the next RemoteAgentServer! */
         agentServer.accept(this);
     } catch (Exception e) {
         e.printStackTrace();
     }
 } // go()

This is nearly as simple as the RemoteAgentServer's accept() method.

We see the Agent do a bit of bookkeeping with its itinerary, and then access the next RemoteAgentServer in its itinerary via RMI's naming service. (Usage of RMI's naming service is an interesting topic in its own right, but is not detailed here.) The interesting part is the call to accept(), where we see that the Agent passes itself as a parameter using Java's this keyword. accept() is a remote method; however, Agent objects are not RMI remote objects. Thus a copy of the Agent is passed to the RemoteAgentServer (which is why the Agent does its bookkeeping before it calls accept()). If the RemoteAgentServer actually did some manipulation or modification, the Agent at the originating host would not be changed.
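The copy semantics can be observed without a network at all. RMI marshals non-remote arguments using Java object serialization, so a serialize-then-deserialize round trip is a reasonable stand-in for what happens to the Agent on its way to the next server. The sketch below assumes a stripped-down stand-in class, SketchAgent, that models only the list of visited hosts; it is not the Agent class used in this chapter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the chapter's Agent; only the field needed here is modeled.
class CopySemanticsSketch {

    static class SketchAgent implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> hostsVisited = new ArrayList<>();
    }

    /** Serializes and then deserializes an object, producing an independent copy. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copyViaSerialization(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not copy object", e);
        }
    }

    public static void main(String[] args) {
        SketchAgent local = new SketchAgent();
        local.hostsVisited.add("host-a");

        // What the "remote" side receives is a copy, not the original.
        SketchAgent remoteCopy = copyViaSerialization(local);
        remoteCopy.hostsVisited.add("host-b");

        System.out.println("local:  " + local.hostsVisited);       // prints "local:  [host-a]"
        System.out.println("remote: " + remoteCopy.hostsVisited);  // prints "remote: [host-a, host-b]"
    }
}
```

The change made on the "remote" side never reaches the original, which is why any bookkeeping has to happen before the object is sent.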

RMI is a nice Java-only distributed object technology that is part of the core of the Java platform. It's always available to the Java programmer, and enables distributed applications to be built quickly and easily. But, as with anything else, it's not a silver bullet. There are JVM and code versioning issues that sometimes must be taken into account, and there is also the problem of making RMI-based applications work across firewalls. As in other networking technologies, RMI has advantages, disadvantages, and caveats that must be understood in order to use it effectively. Builders of Java P2P systems, for example, might use RMI to get the network communication job done, but would need to create their own peer discovery system, as RMI does not have this facility.

CORBA

CORBA, the Common Object Request Broker Architecture (quite an acronym), is an open and vendor-independent architecture and infrastructure that enables language-neutral network communication, as we learned in an introductory manner in Chapter 5. Basically, CORBA is the granddaddy of middleware platforms. CORBA is a distributed object scheme that uses Interface Definition Language (IDL), a kind of "language proxy," to specify how remote objects should interact with each other. The Object Management Group (OMG) is a consortium of companies that oversees CORBA specifications and compliance (in addition to other interoperation initiatives or specifications, such as UML).

CORBA has been used for years with varying levels of success, and many of its ideas have influenced more recent distributed object systems, such as Java RMI. For example, the client and server proxy scheme (stubs and skeletons) is a concept used in CORBA. Remote method invocation and the marshalling/demarshalling of method arguments are also specified in CORBA.

The key to the CORBA architecture is the ORB, the Object Request Broker. The ORB is essentially an "object bus" through which remote CORBA clients interact with each other. Acting as a central controller, the ORB is responsible for locating a remote CORBA object's actual implementation, and brokering remote method calls between it and its clients.

IDL enables inter-language unification (a fancy way to say you're probably able to use your preferred language for CORBA). CORBA interfaces are described in IDL, and each language mapping translates the concepts laid out in IDL for a particular interface or service. The IDL compiler for each language takes IDL and produces language-specific constructs, that is, code that is then compiled, linked, and so on, depending on the language you're using.

Here's the classic Hello World, CORBA-style, done in IDL:

 module HelloWorld {
     interface HiThere {
         string helloWorld();
     };
 };

A CORBA module declaration maps to a Java package; thus, this example would correspond to the Java package HelloWorld. The interface defined would correspond to the Java interface HelloWorld.HiThere (fully qualified, with the package name). The HiThere interface has a single (remote!) method, helloWorld(), which returns a string (a String in Java), and does not raise or throw any exceptions.
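For a feel of what the Java side of this mapping looks like, here is a hand-written and heavily simplified approximation. An IDL-to-Java compiler would generate considerably more (helper, holder, stub, and skeleton classes, all in package HelloWorld); the package declaration and those classes are left out so that the sketch stays self-contained. The name HiThereOperations follows the convention of the standard IDL-to-Java mapping, while HiThereImpl and HiThereDemo are hypothetical.

```java
// Simplified approximation of the Java interface that corresponds to HelloWorld::HiThere.
// Real generated code lives in package HelloWorld and also extends CORBA base types.
interface HiThereOperations {
    String helloWorld();   // IDL: string helloWorld();
}

// Hypothetical implementation of the interface (the object a client ultimately reaches).
class HiThereImpl implements HiThereOperations {
    @Override
    public String helloWorld() {
        return "Hello World!";
    }
}

class HiThereDemo {
    public static void main(String[] args) {
        // In a real CORBA program this reference would be obtained through the ORB.
        HiThereOperations hiThere = new HiThereImpl();
        System.out.println(hiThere.helloWorld());
    }
}
```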

The following is a list (not necessarily comprehensive) of languages with IDL mappings:

Java
C
C++
Ada
COBOL
Smalltalk
Python

CORBA Services and Facilities

CORBA Services (COS) are a major part of CORBA's infrastructure and specification. They are a set of distributed object services built on top of the CORBA ORB to support the integration and interoperation of CORBA objects. CORBA Services are considered fundamental to constructing distributed applications with CORBA, and are thus viewed as independent of specific application semantics or domains. Some examples of CORBA Services include the following:

Naming Service: Provides context-oriented namespace services; similar naming/namespace facilities are a common characteristic of distributed object architectures.
Property Service: Provides a service-based approach to dealing with configuration or property information. It provides the capability to associate values with objects and create and/or manipulate sets of name-value pairs or name-value-mode tuples.
Event Service: Provides a channel or route for events or event-related data to be sent between event "producers" and event "consumers."
Time Service: Permits a CORBA object to obtain the "current time," along with an error estimation that might be associated with it.
(Object) Query Service: Provides query operations on collections of objects. This includes not only read-only queries, but also general manipulative operations such as insertion, deletion, and updating collections of objects.
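To give the phrase "context-oriented namespace" some shape, here is a small in-memory sketch in Java. It only imitates the idea of nested naming contexts in which objects are bound and resolved by path; it is not the CORBA Naming Service API, and the class name, method names, and path syntax are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory imitation of nested naming contexts; not the CORBA Naming Service API.
class NamingContextSketch {
    private final Map<String, Object> bindings = new HashMap<>();               // name -> object
    private final Map<String, NamingContextSketch> children = new HashMap<>();  // name -> sub-context

    /** Binds an object under a path such as "printers/floor2/laser", creating contexts as needed. */
    void bind(String path, Object object) {
        String[] parts = path.split("/");
        NamingContextSketch context = this;
        for (int i = 0; i < parts.length - 1; i++) {
            context = context.children.computeIfAbsent(parts[i], k -> new NamingContextSketch());
        }
        context.bindings.put(parts[parts.length - 1], object);
    }

    /** Resolves a path to the bound object, or returns null if nothing is bound there. */
    Object resolve(String path) {
        String[] parts = path.split("/");
        NamingContextSketch context = this;
        for (int i = 0; i < parts.length - 1 && context != null; i++) {
            context = context.children.get(parts[i]);
        }
        return context == null ? null : context.bindings.get(parts[parts.length - 1]);
    }

    public static void main(String[] args) {
        NamingContextSketch root = new NamingContextSketch();
        root.bind("printers/floor2/laser", "a printer object");
        System.out.println(root.resolve("printers/floor2/laser"));
        System.out.println(root.resolve("printers/floor3/laser"));   // nothing bound: null
    }
}
```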

The CORBA Facilities, or Common Facilities, are a group of services intended for common use by any CORBA objects or applications, but are not considered to be as fundamental to CORBA as COS.

P2P developers or architects might see CORBA as a solution for the hybrid-P2P models (such as Napster) because of its discovery, transactional, and strong interoperability features. Because of the complexity of the specification and the loose definition of certain components of the ORB, CORBA has not gained significant traction in the enterprise model. It is unlikely that CORBA will be leveraged in a significant number of P2P architectures.

DCOM

DCOM, or Distributed COM, is Microsoft's distributed component or distributed object system (COM stands for Component Object Model). Often called "COM on the wire," DCOM supports remote objects by running on a protocol called the Object Remote Procedure Call (ORPC). The ORPC layer is built on top of DCE RPC (RPC is a venerable remote procedure call system) and interacts with COM's runtime services. As long as COM is available on a platform, DCOM can be used there as well.

DCOM is built on top of COM, which has gone through various incarnations as Dynamic Data Exchange, Object Linking and Embedding, COM itself, and ActiveX, or Internet-oriented COM. DCOM features the usual suspects. Stubs and skeletons are present, or in DCOM parlance, proxy on the client side, and stub on the server- or component-side. DCOM also features object activation, which is the capability for a new remote object or component to be created as the result of client calls, a very useful feature that both RMI and CORBA also provide.

Like CORBA, DCOM claims language neutrality, and in practice, for the Java practitioner there are several Java-COM bridges that permit interoperability between Java objects and COM/DCOM objects. C++, Visual Basic, Delphi, and COBOL, among other languages, also work with COM/DCOM. Heavily used on Microsoft's platforms, DCOM is also available for Unix and Linux, and for mainframe platforms as well.

P2P developers or architects might see DCOM more or less in the same category as CORBA, perhaps with less interoperability. Java developers might be less interested than others due on the one hand to the wealth of more Java-friendly tools available, and on the other hand to Microsoft's openly hostile attitude towards Java.

Web Services, XML-RPC, and SOAP

Web services, SOAP, and XML-RPC are the buzzwords du jour. Actually, they might represent a major shift in software applications and interoperability, something of more than a little interest to anyone concerned with integration. We briefly introduced Web services and SOAP in Chapter 4, and they are discussed in greater depth in Part III of this book, so we'll just skim over the top a bit here; integrators should at least be aware of the concepts and options available for Web services.

First some basics: What are Web services? They are self-contained, modular applications or services that can be described, located, and activated or invoked across the Internet. Some consider Web services to be another attempt at distributed objects. This school of thought sometimes considers CORBA, DCOM, RMI, and Enterprise JavaBeans failures because they're too heavyweight, tightly coupled, and platform-, language-, or company-centric.

Usually, when discussing Web services, we mean platform- and language-neutral, lightweight, XML-based applications that use HTTP for a network transport, and generally enable the publish/find/bind cycle. Service builders or providers publish service definitions or interface descriptions (using a neutral description format such as an XML dialect) to a service registry or brokerage. Service consumers or users find the services using the service registry. They then bind them, meaning they map requests for specific data or functionality to the service interface they found via the service registry or brokerage.
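The publish/find/bind cycle can be sketched in a few lines of Java. The version below keeps everything in memory and represents a service as a plain function from String to String; a real registry would store XML service descriptions and the binding step would go over HTTP. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical in-memory registry; real Web services would exchange XML descriptions over HTTP.
class ServiceRegistrySketch {
    private final Map<String, Function<String, String>> services = new HashMap<>();

    /** Publish: a provider registers a service under a well-known name. */
    void publish(String name, Function<String, String> service) {
        services.put(name, service);
    }

    /** Find: a consumer asks the registry for a service by name. */
    Optional<Function<String, String>> find(String name) {
        return Optional.ofNullable(services.get(name));
    }

    public static void main(String[] args) {
        ServiceRegistrySketch registry = new ServiceRegistrySketch();

        registry.publish("uppercase", String::toUpperCase);            // publish

        Function<String, String> service = registry.find("uppercase")  // find
                .orElseThrow(() -> new IllegalStateException("no such service"));

        System.out.println(service.apply("hello peer"));               // bind and invoke
    }
}
```

The consumer never names the provider directly; it knows only the registry and the service name, which is what keeps the two sides loosely coupled.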

Web services can be viewed as the next step in an intelligent evolution of software architectures toward more modular, loosely coupled, technology-independent systems that are easy to build by composing discrete chunks of software…okay, maybe you've heard this one before. Perhaps it is another go at distributed objects. But for the first time, open and standards-based technologies are driving the evolution; and considering the widespread adoption among such movers and shakers as Microsoft, IBM, Sun, Oracle, et al., there is a good chance that much or most of the distributed application development over the next few years will use Web services. P2P systems are no exception: the publish/find/bind cycle is very useful for distributed peer computing systems.

XML-RPC is a remote procedure call technology that uses HTTP as the transport and XML as the encoding language. It's designed to be simple and lightweight, enabling programming language-independent method or function calls to be sent across the network.

From the XML-RPC specification at Userland's www.xmlrpc.com:

XML-RPC is a Remote Procedure Calling protocol that works over the Internet.

An XML-RPC message is an HTTP-POST request. The body of the request is in XML. A procedure executes on the server and the value it returns is also formatted in XML.

Procedure parameters can be scalars, numbers, strings, dates, and so on; and can also be complex record and list structures.
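To show how little is involved, the following Java sketch builds the XML body of an XML-RPC call and parses it back to check that it is well-formed. It handles only integer and string parameters and does not send anything over HTTP. The class and method names are hypothetical; the method name examples.getStateName used in main() is the one that appears in the XML-RPC specification's own example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: builds and inspects XML-RPC request bodies, but performs no HTTP.
class XmlRpcRequestSketch {

    /** Builds a methodCall body; Integer parameters become <int>, everything else <string>. */
    static String buildCall(String methodName, Object... params) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<methodCall>\n");
        xml.append("  <methodName>").append(escape(methodName)).append("</methodName>\n");
        xml.append("  <params>\n");
        for (Object param : params) {
            String value = (param instanceof Integer)
                    ? "<int>" + param + "</int>"
                    : "<string>" + escape(String.valueOf(param)) + "</string>";
            xml.append("    <param><value>").append(value).append("</value></param>\n");
        }
        return xml.append("  </params>\n</methodCall>\n").toString();
    }

    /** Escapes the characters that may not appear literally in XML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Parses a methodCall body and returns the method name it carries. */
    static String methodNameOf(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return document.getElementsByTagName("methodName").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not a well-formed methodCall", e);
        }
    }

    public static void main(String[] args) {
        String body = buildCall("examples.getStateName", 41);
        System.out.println(body);                  // this text would be the HTTP POST body
        System.out.println(methodNameOf(body));
    }
}
```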

SOAP is a more complex or comprehensive version of XML-RPC (okay, let the tarring and feathering start; some people bristle when they hear that definition).

SOAP probably has the upper hand because it's the preferred Web services protocol. XML-RPC represents a shorter path for the developer who wants to use a simple HTTP-based XML procedure call mechanism without dealing with the power or complexity of SOAP.
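Some of that extra structure is visible in the message itself. The Java sketch below wraps a single call in a SOAP 1.1 envelope and reads the operation name back out with a namespace-aware parser. Only the envelope namespace is standard; the operation name, the service namespace urn:example:states, and the argument name are invented for illustration, and nothing is sent over the network.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: shows the shape of a SOAP 1.1 message, but performs no HTTP.
class SoapEnvelopeSketch {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";   // SOAP 1.1

    /** Wraps one operation with one argument in an Envelope/Body pair. */
    static String wrap(String operation, String serviceNamespace, String argName, String argValue) {
        String safeValue = argValue.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<?xml version=\"1.0\"?>\n"
             + "<soap:Envelope xmlns:soap=\"" + SOAP_NS + "\">\n"
             + "  <soap:Body>\n"
             + "    <m:" + operation + " xmlns:m=\"" + serviceNamespace + "\">\n"
             + "      <" + argName + ">" + safeValue + "</" + argName + ">\n"
             + "    </m:" + operation + ">\n"
             + "  </soap:Body>\n"
             + "</soap:Envelope>\n";
    }

    /** Returns the local name of the first element inside the SOAP Body. */
    static String operationIn(String envelopeXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(envelopeXml.getBytes(StandardCharsets.UTF_8)));
            Element body = (Element) document.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            for (Node node = body.getFirstChild(); node != null; node = node.getNextSibling()) {
                if (node instanceof Element) {
                    return node.getLocalName();
                }
            }
            throw new IllegalStateException("SOAP Body is empty");
        } catch (Exception e) {
            throw new IllegalStateException("not a usable SOAP envelope", e);
        }
    }

    public static void main(String[] args) {
        String envelope = wrap("getStateName", "urn:example:states", "stateNumber", "41");
        System.out.println(envelope);
        System.out.println(operationIn(envelope));
    }
}
```

Compared with a bare XML-RPC call, the envelope, the body, and the namespaces are all overhead for a call this small; they start to pay off when headers, typed schemas, and multiple services enter the picture.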

If we return our focus to P2P architectures, we can imagine a peer that runs as either a client or a server, depending upon the role that it is playing during a conversation with another peer. In this case, servers are now talking with other servers as peers, but in different roles. We can see how some of the classic distributed computing protocols and newer ones like SOAP can be applied to these new software architectures.
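The dual role is easy to picture in code. In the Java sketch below, every peer has a server-side method (handle) and a client-side method (request), and which one is exercised depends only on who starts the conversation. The peers live in one JVM and call each other directly; in a real system the handle method would be exposed through RMI, SOAP, or a similar protocol. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-process peers; a real peer would expose handle() over RMI, SOAP, or similar.
class PeerSketch {
    private final String name;
    private final Map<String, String> sharedFiles = new HashMap<>();

    PeerSketch(String name) {
        this.name = name;
    }

    /** Makes a piece of content available to other peers. */
    void share(String fileName, String contents) {
        sharedFiles.put(fileName, contents);
    }

    /** Server role: answer a request that another peer sent to us. */
    String handle(String fileName) {
        String contents = sharedFiles.get(fileName);
        return contents == null ? name + ": not found" : contents;
    }

    /** Client role: ask another peer for something. */
    String request(PeerSketch other, String fileName) {
        return other.handle(fileName);
    }

    public static void main(String[] args) {
        PeerSketch alice = new PeerSketch("alice");
        PeerSketch bob = new PeerSketch("bob");
        alice.share("notes.txt", "alice's notes");
        bob.share("song.txt", "bob's song");

        System.out.println(bob.request(alice, "notes.txt"));   // bob is the client here
        System.out.println(alice.request(bob, "song.txt"));    // and the server here
    }
}
```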

Other Systems

Beyond the approaches that we have reviewed, we must also recognize that the IT landscape is quickly changing as vendors attempt to solve many of the distributed computing problems. It's likely that other systems and standards will emerge that will bridge the gaps in interoperability.