The simple example in the previous section is a one-directional map: data flows from left to right. A general-purpose architecture for exchanging information must let data flow in multiple directions. Figure 4-6 expands on our routing and transformation concept by replicating the process on both sides of the communication, allowing data in its multiple forms to flow in all directions.

Figure 4-6. Generic data exchange architecture using a canonical message format

4.4.1 Data Translation to and from a Canonical Format

In the middle of this generic data exchange architecture is a canonical, application-independent version of the data. A best-practice strategy in integration is to decide on a set of canonical XML formats as the means for expressing data in messages as it flows through an enterprise across the ESB. Companies often use their already-established enterprise data models, or adopt industry-standard message formats, as the basis for their own canonical representations. By doing so, companies standardize the definitions for common business entities such as addresses, purchase orders, and invoices.

The integration architect then works with the owner of each individual application to transform that application's content into and out of this canonical format. Because the data has an independent representation, all application teams can work in parallel without requiring advance agreement among them. When extensions to the canonical format are required, they can simply be added as extra XML attributes and elements, relying on the inherent flexibility of XML. As the applications change, the impact is limited to the transformation into and out of the canonical format. Beyond XML's inherent flexibility, application owners have a great deal of freedom to enhance their applications and take advantage of new information within the canonical format.
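To make the "one transformation into and out of the canonical format per application" idea concrete, here is a minimal Python sketch using only the standard library's ElementTree. Everything in it is hypothetical: the canonical `PurchaseOrder` element names, the two application formats "A" and "B", and the `exchange` helper are illustrative assumptions, not xCBL or any published schema. The point it shows is that each application owner supplies exactly one adapter pair, and any two applications exchange data by passing through the canonical form.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Tuple

# Hypothetical canonical purchase order ("MyCompanyML"-style fragment).
# Element names are illustrative assumptions, not a published schema.
def make_canonical_po(number: str, buyer: str, total: str) -> ET.Element:
    po = ET.Element("PurchaseOrder")
    ET.SubElement(po, "OrderNumber").text = number
    ET.SubElement(po, "Buyer").text = buyer
    ET.SubElement(po, "Total").text = total
    return po

# Hypothetical application A keeps everything in attributes:
#   <Order id="..." customer="..." amount="..."/>
def a_to_canonical(xml_text: str) -> ET.Element:
    root = ET.fromstring(xml_text)
    return make_canonical_po(root.get("id"), root.get("customer"), root.get("amount"))

def a_from_canonical(po: ET.Element) -> str:
    order = ET.Element("Order", {
        "id": po.findtext("OrderNumber"),
        "customer": po.findtext("Buyer"),
        "amount": po.findtext("Total"),
    })
    return ET.tostring(order, encoding="unicode")

# Hypothetical application B uses nested elements:
#   <PO><Num>...</Num><Party><Name>...</Name></Party><Amt>...</Amt></PO>
def b_to_canonical(xml_text: str) -> ET.Element:
    root = ET.fromstring(xml_text)
    return make_canonical_po(root.findtext("Num"), root.findtext("Party/Name"), root.findtext("Amt"))

def b_from_canonical(po: ET.Element) -> str:
    out = ET.Element("PO")
    ET.SubElement(out, "Num").text = po.findtext("OrderNumber")
    party = ET.SubElement(out, "Party")
    ET.SubElement(party, "Name").text = po.findtext("Buyer")
    ET.SubElement(out, "Amt").text = po.findtext("Total")
    return ET.tostring(out, encoding="unicode")

# Each application owner supplies exactly one (to, from) adapter pair.
Adapter = Tuple[Callable[[str], ET.Element], Callable[[ET.Element], str]]
ADAPTERS: Dict[str, Adapter] = {
    "A": (a_to_canonical, a_from_canonical),
    "B": (b_to_canonical, b_from_canonical),
}

def exchange(source: str, target: str, payload: str) -> str:
    """Translate source's native format into target's, always via the canonical form."""
    to_canonical, _ = ADAPTERS[source]
    _, from_canonical = ADAPTERS[target]
    return from_canonical(to_canonical(payload))

print(exchange("A", "B", '<Order id="42" customer="Acme" amount="99.50"/>'))
```

Adding a third application under this scheme means writing one more adapter pair and registering it; neither A's nor B's adapters change, which mirrors the claim that the impact of change is limited to each application's own transformation.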
There have been several attempts within the industry to provide a standardized representation of business data, and these attempts have achieved varying degrees of adoption. For example, in the heyday of electronic marketplaces, CommerceOne popularized the xCBL format, Ariba promoted cXML, and RosettaNet uses Partner Interface Processes (PIPs). Some businesses have gone ahead and created their own proprietary XML formats. These formats are defined at the business-object level; e.g., the definition of a purchase order or a shipping address.

What this means to you, the integration architect, is that there are a number of XML dialects and overlapping conventions for representing business data in XML. Your business partners may have already standardized on one of these dialects, but you can probably safely assume that at least a few of your business partners haven't standardized on XML at all yet. Those partners are probably limited to some kind of fixed format for exchanging data. For example, in financial services you have come to rely on FIX and SWIFT; in healthcare, you probably rely on HIPAA and HL7. Furthermore, the partners that have standardized on XML probably haven't all chosen the same dialect.

4.4.2 Adopting a Canonical Data Exchange

One scalable strategy is to use the routing and transformation technique explored previously and create a set of common XML formats for use in your ESB. These XML grammars become your company's internal standard and together comprise "MyCompanyML," a markup language for data interchange within your company. Many organizations choose to build upon an existing standard, such as xCBL, rather than starting from scratch and inventing their own. All applications that communicate through the bus can share data using these formats.

Does this mean that all applications need to be modified to speak this new dialect immediately? No.
The ESB can convert data flowing onto or off the bus, depending on the formats supported by the specific applications being integrated. The approach recommended in many ESB integrations today is to create transformation services that convert the data between the common XML format and the target data format of each application being plugged in, as illustrated in Figure 4-7.

Figure 4-7. Transformation and routing can facilitate a generic data exchange with a canonical XML format as the native data type of the ESB

Figure 4-7 shows the series of steps that occur when a document flows through an enterprise using a canonical XML data exchange. The space in the middle between the applications and services is intentionally depicted as nebulous so that we can focus on the exchange concepts rather than the physical details of the underlying ESB. The details of what's in that center (the core architecture of the ESB) will be addressed in the next three chapters.

The following are the processing steps illustrated in Figure 4-7, beginning from the top left and proceeding clockwise. Each step is treated as an event-driven service that receives a message asynchronously using reliable messaging and forwards the resulting message on to the next step after processing.

1. An external partner sends an XML or SOAP message over HTTP. Once inside the firewall, the message is assigned to a business process that controls the steps through the ESB.

2. The message (M1) is run through a transformation service, which converts the XML content from the data format used by the partner to the canonical XML format. The resulting message (MC) is then forwarded to a "splitter" service.

3. The splitter service has the sole purpose of making a copy of the message and routing it to an audit service. The audit service may add information to the message, such as contextual information about the business process and a timestamp of when the message arrived. The audit service itself could be implemented as a native XML persistence engine that allows the direct storage and retrieval of XML documents; this subject will be covered in Chapter 7.

4. The original message is forwarded on to application YC, which consumes and processes it. Conveniently, this particular application already knows how to consume messages in the canonical data format.

5. As part of the business process, the message now needs to be forwarded to application Y2, which understands only its own proprietary format. Therefore, before moving on to Y2, the MC message is routed to a transformation service that converts it to M2, the target data format required by application Y2.

6. The transformation service sends the message M2 to application Y2.

7. After being consumed and operated on by application Y2, the message needs to be routed to another business process. This other business process can be invoked in a variety of ways from a number of different places within the bus, so it can't assume that the message is already in the canonical format. Therefore, it has a CBR service that examines the message content and determines whether the message needs to be transformed.

8. In this case, the CBR service identifies the message as being in M2 format and routes it to a transformation service to be converted to MC format before being delivered to application ZC.

9. Application ZC is responsible for generating an invoice and sending it asynchronously back to the partner. Before doing this, however, the invoice needs to be converted into the format that the partner understands. Therefore, it is routed to another transformation service that converts it from message MC to message M1. In addition, this service constructs the SOAP envelope around the message (if that partner requires one).

10. The invoice message is delivered to the partner asynchronously using the protocol appropriate for that partner.

4.4.3 Alternate Approaches

The individual steps in the previous section could have been implemented in a number of different ways. Steps 2 and 3 could be combined: the transformation service in Step 2 could have split off the copy of the message itself, without a separate splitter service. An XSLT stylesheet can be written so that a "splitter" or "fan-out" operation is performed while the data transformation is occurring. For example, a single XSLT stylesheet can convert a purchase order from cXML to xCBL and also split off the line-item details into separate messages for the supply chain applications that need to process them.

The dual transformation from MC to M2 for processing by application Y2, and then back again into MC, could have been done another way under different circumstances. The M2 message could have preserved the original content in its MC format and appended the translated content at the end. That way, when the message moved on to the next step, it would still have its original MC content intact, avoiding an additional translation step. However, this method doesn't work in this particular case, because application Y2 also needs to enrich the message with content that the next application in the process requires.

Application ZC, which generates the invoice, could simply have generated it in the target M1 format that the partner expected. However, many partners have their own preferred formats and protocols. ZC is a generic invoicing application that needs to know only the canonical XML format.
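The content-based routing described above, where a business process cannot assume its input is canonical and therefore checks the content before deciding whether a transformation is needed, can be sketched in a few lines of Python. This is a minimal in-process illustration, not how any particular ESB implements a CBR: the `ContentBasedRouter` class, the `Y2Order` root element standing in for M2, the `m2_to_mc` transformation, and the `deliver_to_zc` stand-in are all hypothetical, and recognizing a format by its root element is a simplifying assumption (a real CBR might evaluate XPath expressions, namespaces, or schema checks). It also shows a generic service type being customized per instance purely through its rules.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Service = Callable[[ET.Element], ET.Element]
Rule = Tuple[Callable[[ET.Element], bool], Service]

class ContentBasedRouter:
    """Generic CBR service; each deployed instance is customized by its rules."""

    def __init__(self, rules: List[Rule], default: Service) -> None:
        self.rules = rules      # (predicate, next hop) pairs, evaluated in order
        self.default = default  # next hop for messages that match no rule

    def route(self, msg: ET.Element) -> ET.Element:
        for matches, next_hop in self.rules:
            if matches(msg):
                return next_hop(msg)
        return self.default(msg)

def m2_to_mc(msg: ET.Element) -> ET.Element:
    """Hypothetical transformation service: Y2's M2 format -> canonical MC."""
    po = ET.Element("PurchaseOrder")
    ET.SubElement(po, "OrderNumber").text = msg.get("ref")
    # Carry over the content Y2 added when it enriched the message.
    ET.SubElement(po, "ShipDate").text = msg.findtext("ship")
    return po

def deliver_to_zc(msg: ET.Element) -> ET.Element:
    """Stand-in for application ZC, which accepts only canonical (MC) messages."""
    if msg.tag != "PurchaseOrder":
        raise ValueError(f"ZC cannot consume <{msg.tag}>")
    return msg

# This instance recognizes M2 by its root element and transforms it first;
# anything else is assumed to be canonical already and goes straight to ZC.
router = ContentBasedRouter(
    rules=[(lambda m: m.tag == "Y2Order", lambda m: deliver_to_zc(m2_to_mc(m)))],
    default=deliver_to_zc,
)

m2 = ET.fromstring('<Y2Order ref="42"><ship>2024-05-01</ship></Y2Order>')
print(ET.tostring(router.route(m2), encoding="unicode"))
```

Keeping the rules as data rather than hard-coding them in the router is what lets the same service type be deployed several times with different behavior; a second instance could route on a different predicate without any change to `ContentBasedRouter` itself.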
The job of converting the canonical invoice message MC into each partner's specific format, such as EDI, PIP, or a proprietary flat-file format, can be taken on separately by specialized transformation services. Even the management of the multiple partners and their protocols and formats can be separated into its own service.

At first glance, it may seem excessive to require a transformation engine for each and every application that plugs into the ESB. However, in contrast with the point-to-point transformation solution, this method can reduce complexity over time as the number of applications on the bus increases and as changes are introduced. When using specific point-to-point transformations between each pair of applications, the number of transformations grows with the square of the number of applications: n applications can require up to n(n-1) one-way transformations. This is commonly referred to as the N-squared problem. With the canonical XML data exchange approach, the number of transformations grows only linearly, roughly two (one into and one out of the canonical format) for each new application brought into the integration.

Applying the canonical XML data exchange technique on a larger scale yields the following benefits:

- Each application needs to focus on only one type of transformation, to and from a common format.

  This illustrates an important philosophy of the bus that will be reinforced throughout the book. If you are the owner of an individual application, your only concern is that you plug into the bus, post data to the bus, and receive data from the bus. The bus is responsible for getting the data where it needs to go, in the target data formats it needs to be in, using the protocols, adapters, and transformations necessary to get there.

- New applications being written to plug into the bus can use the canonical XML format directly. And multiple applications not anticipated today can tap into the flow of messages on the bus to create heretofore unimagined uses.
- ESB services such as the CBR and the splitter can be written to use the canonical XML format. As we will see in Chapter 7, a service type can be written once and then customized on a per-instance basis by supplying different XSLT stylesheets, endpoint definitions, and routing rules.

- Having an agreed-upon format for common things such as address tags and purchase order numbers can be a tremendous advantage here. Standard stylesheet templates and libraries can be created and reused throughout the organization.

Does it matter which XML dialect you choose for the native format of the ESB? That depends on what the majority of your applications already speak. If much of your in-flight data is destined for a particular target format, and that format is sufficient for generically representing all forms of your business data, go ahead and standardize on it as the native datatype for the ESB. An example of such a format is xCBL, a standard for describing things such as addresses and purchase orders, which was jointly developed by SAP and CommerceOne during the early days of the dot-com era and public exchanges. Like many other standards efforts, xCBL has not gone on to dominate the XML industry. However, because it was codeveloped by SAP, its message schema has a high degree of affinity to SAP's IDoc elements and terminology.

The only basic recommendation here is that the format be XML. As you now know, many advantages can be realized when XML is the native datatype for in-flight documents within the ESB, and there are many more, as will be discussed throughout this book.
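Returning to the N-squared problem described earlier in this section, the comparison is simple arithmetic, and a few lines of Python make the growth rates visible. The sketch assumes the worst case for point-to-point integration, in which every application exchanges data with every other application in both directions (n(n-1) one-way transformations), and assumes the canonical approach needs exactly one transformation into and one out of the canonical format per application (2n). Real integrations fall somewhere below the point-to-point worst case, since not every pair of applications needs to communicate.

```python
def point_to_point(n: int) -> int:
    """Worst case: a one-way transformation for every ordered pair of applications."""
    return n * (n - 1)

def canonical(n: int) -> int:
    """One transformation into and one out of the canonical format per application."""
    return 2 * n

for n in (2, 3, 5, 10, 20, 50):
    print(f"{n:>3} apps: point-to-point={point_to_point(n):>5}  canonical={canonical(n):>4}")
```

For two applications the canonical approach actually needs more transformations (4 versus 2), and at three applications the two approaches tie at 6. Beyond that the gap widens quickly: 10 applications need up to 90 point-to-point transformations but only 20 canonical ones. This is why the per-application transformation engine can look excessive at first glance yet pays off as the number of applications on the bus grows.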