8.1. Data Versus Process Integrity

Process integrity is not necessarily a well-established or well-defined concept. The core building blocks of process integrity are based on the widely established concepts of data integrity. However, as we will see in the following, data integrity is insufficient for addressing all integrity requirements of complex business processes spanning multiple IT systems, which is why it is necessary to introduce process integrity as a concept that helps to address these more complex and demanding requirements.

8.1.1. DATA INTEGRITY

Data integrity is an umbrella term that refers to the consistency, accuracy, and correctness of data. The classical mechanisms for ensuring data integrity are closely tied to the concepts of relational databases. The primary types of data integrity are entity, domain, and referential integrity. Entity integrity requires that each row in a table be uniquely identified. Domain integrity requires that data values fall within a specific range (domain); for example, a birth date should not lie in the future. Referential integrity refers to the validity of the relationships between different data tuples. Finally, user-defined data integrity covers integrity rules that usually cannot be enforced by commercial database tools; it is typically enforced using a data access layer, triggers, and stored procedures. These are all fairly technical concepts, typically limited to a single database, and typical business requirements for data integrity go far beyond them. As you will see in the following, more flexible concepts for ensuring process integrity are required once you leave the domain of a single database or application system.
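
To make the idea of enforcing user-defined and domain integrity in a data access layer concrete, the following is a minimal Java sketch. The class and method names (CustomerDao, insertCustomer) and the rule itself are illustrative assumptions, not part of any specific product or database API.

    import java.time.LocalDate;

    // Hypothetical data access layer that enforces a domain rule
    // before the data ever reaches the database.
    public class CustomerDao {

        public void insertCustomer(String customerId, LocalDate birthDate) {
            // Domain integrity: a birth date must not lie in the future.
            if (birthDate.isAfter(LocalDate.now())) {
                throw new IllegalArgumentException(
                    "Domain integrity violation: birth date lies in the future");
            }
            // Entity integrity (unique customerId) and referential integrity
            // would typically be enforced by the database itself via primary
            // and foreign key constraints; only user-defined rules like the
            // one above need to live in this layer.
            // ... issue the actual INSERT here ...
        }
    }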

8.1.2. PROCESS INTEGRITY

The problem with complex business processes that span multiple IT systems goes beyond the issues of traditional data consistency. In these kinds of situations, we are not dealing with short-lived updates of data contained in a central repository, but instead with long-lived processes that cross multiple systems. These processes often do not have a well-defined state because it is not possible to obtain access to all the participants all the time, which would be necessary to determine the overall process state. This is particularly true for processes that span the boundaries of business units or enterprises. Take the example of a manufacturer who receives product parts from a supplier. If the manufacturer receives a last-minute cancellation for an order, there will be time intervals where the internal systems reflect this cancellation while the order for related parts is still with the supplier. This is what we refer to as a process inconsistency.
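
As a purely illustrative sketch of such an inconsistency window, the following Java fragment contrasts the state held by the manufacturer's internal order system with the state of the corresponding parts order at the supplier. All type and method names here are hypothetical.

    // Hypothetical view of the same logical process from two systems.
    public class ProcessIntegrityCheck {

        enum InternalOrderState { OPEN, CANCELLED, SHIPPED }
        enum SupplierOrderState { PLACED, CANCELLED, DELIVERED }

        // A process inconsistency: the internal order is already cancelled,
        // but the related parts order is still active at the supplier.
        static boolean isInconsistent(InternalOrderState internal,
                                      SupplierOrderState supplier) {
            return internal == InternalOrderState.CANCELLED
                && supplier == SupplierOrderState.PLACED;
        }
    }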

8.1.3. TECHNICAL FAILURES VERSUS BUSINESS EXCEPTIONS

The key to maintaining process integrity is to ensure that failures within or between the execution of the different steps that comprise a complex business process are captured and the necessary steps taken to resolve the problem. It is necessary to differentiate between technical failures on one hand and exceptions and special business cases on the other:

Technical failures. Technical failures include database crashes, network problems, and program logic violations, among many others. Often, these problems can be addressed in their respective context, for example through backup and recovery mechanisms (provided by the DBMS; e.g., transactions), retries (in the case of network problems), exception handlers (e.g., a Java catch clause), or process restarts (e.g., after a process terminated because of a C++ NULL pointer exception). However, in many cases, technical failures must be addressed using more complex, often custom-built solutions. For example, systems must cope with network problems by temporarily storing the process state locally until the subsystem to which they attempted to connect can be reached again (see the store-and-forward sketch after this list).

Business exceptions. Business exceptions can range from very simple exceptions to arbitrarily complex ones. An example of a simple exception is an attempt by a customer to book a flight on a date that lies in the past. Such a simple domain inconsistency (see the previous discussion on data integrity) can be addressed at the database level. For the sake of usability, it can also be handled directly in the user interface (e.g., JavaScript in the browser). However, simple domain inconsistencies have local impact only. An example of a more complex business exception, with a more far-reaching impact, is an out-of-stock exception, for example in an online Web shop. A straightforward solution is to tell the customer that the requested item is not available. However, in the real world, this is unacceptable. A better option is to trigger a process such as "reorder item" to ensure that the item is available the next time a customer wants to buy it. However, this might also be unacceptable because the current customer is still lost. The best solution might be to constantly monitor inventory and reorder in advance. This avoids, or at least minimizes, out-of-stock situations and leads to the discussion of special cases. A sketch of this kind of out-of-stock handling also follows this list.

Special cases. In almost all complex business processes, the complexity lies not in the happy path but in special cases that depend on the context of the process. For example, a trading system might choose completely different execution paths, depending on the type and size of the trade and the customer's risk profile. The business process under consideration could include a credit check of the customer. A negative result of the credit check could be either an exception (resulting in a refusal of the trade) or a special case requiring a different approach (e.g., resulting in another subprocess asking the customer to provide additional collateral). In many situations, the entire business process is a "special case" from a process integrity point of view. A good example is that of airline seat reservations: in order to ensure maximum utilization of airplanes, many airlines deliberately overbook flights, assuming that a certain percentage of bookings will be canceled or that some passengers will not show up. Such systems are constantly optimized to find the optimum level of overbooking, based on recent flight statistics. Although at first sight it appears inconsistent to overbook a flight, achieving process consistency in this case means finding the "right" level of overbooking.
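
As referenced under technical failures above, the following is a minimal Java sketch of temporarily storing process state locally and retrying delivery once the target subsystem is reachable again. The BillingClient and OutboxStore types, and the method names on them, are assumptions made for illustration only; they do not refer to any specific product or API.

    // Minimal store-and-forward sketch: if the remote call fails,
    // the message is persisted locally and retried later.
    public class ReliableSender {

        // Hypothetical collaborators, sketched as interfaces here.
        interface BillingClient { void submit(String message) throws Exception; }
        interface OutboxStore {
            void save(String message);
            java.util.List<String> loadPending();
            void markDelivered(String message);
        }

        private final BillingClient billing;
        private final OutboxStore outbox;

        public ReliableSender(BillingClient billing, OutboxStore outbox) {
            this.billing = billing;
            this.outbox = outbox;
        }

        public void send(String message) {
            try {
                billing.submit(message);      // remote call may fail
            } catch (Exception unreachable) {
                outbox.save(message);         // keep the process state locally
            }
        }

        // Called periodically (e.g., by a scheduler) to drain the outbox.
        public void retryPending() {
            for (String message : outbox.loadPending()) {
                try {
                    billing.submit(message);
                    outbox.markDelivered(message);
                } catch (Exception stillUnreachable) {
                    // subsystem still unreachable; try again on the next run
                }
            }
        }
    }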
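
Similarly, the out-of-stock exception above could be handled roughly as sketched below. The names (InventoryService, ReorderProcess, the reorder threshold) are illustrative assumptions; the point is only that the exception triggers a compensating business process, and that inventory is monitored so that the exception is avoided in advance, rather than simply being reported to the customer.

    // Illustrative handling of an out-of-stock business exception.
    public class InventoryService {

        interface ReorderProcess { void reorder(String itemId, int quantity); }

        private final java.util.Map<String, Integer> stock = new java.util.HashMap<>();
        private final ReorderProcess reorderProcess;
        private final int reorderThreshold;

        public InventoryService(ReorderProcess reorderProcess, int reorderThreshold) {
            this.reorderProcess = reorderProcess;
            this.reorderThreshold = reorderThreshold;
        }

        public boolean purchase(String itemId, int quantity) {
            int available = stock.getOrDefault(itemId, 0);
            if (available < quantity) {
                // Business exception: out of stock. Trigger a reorder process
                // instead of only telling the customer the item is unavailable.
                reorderProcess.reorder(itemId, quantity);
                return false;
            }
            stock.put(itemId, available - quantity);
            // Avoid the exception in the first place: monitor inventory
            // and reorder in advance when stock runs low.
            if (stock.get(itemId) < reorderThreshold) {
                reorderProcess.reorder(itemId, reorderThreshold);
            }
            return true;
        }
    }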

The boundaries between business exceptions, special cases, and complex processes such as a flight booking are not black and white. Often, each problem scenario requires its own specific solution. In this chapter, we concentrate mainly on problems that are on the exception or failure side of the equation and not so much on special cases. However, in many cases, problems (exceptions or failures) at the technical or business level lead to process inconsistencies that cannot be immediately addressed and that must be treated as special cases.

8.1.4. WHO OWNS THE PROCESS LOGIC?

Process logic is rarely centralized but instead is spread across different systems, which makes it hard to devise generic strategies for ensuring process integrity. Although in an ideal world all key process definitions would be managed by a central BPM (Business Process Management) or Workflow Management System (WMS), this is rarely the case. Although this lack of centralization is reasonable in the B2B world, because each company wants to retain control over its own processes, even a single company is unlikely to have centralized processes (or at least process implementations captured in a central system).

Take the example of a manufacturing company that has two systems, one for order processing and one for billing; this is quite a common scenario. The two systems are synchronized through nightly batch updates, in which the order processing system makes all the changes and additions that came in during the day available to the billing system using FTP (File Transfer Protocol). Assume that a customer places an order one day and cancels it the next day. The information about the order has been passed from the order processing system to the billing system during the nightly batch. When the customer cancels the order the next day, the order processing system is able to stop the delivery to the customer. However, the billing system cannot be notified until the following night, when the next batch run is executed. In the meantime, the billing system might already have debited the customer's account. Upon receiving the cancellation, the billing system must then provide the customer with a credit note or take alternative steps to undo the debit (a sketch of such a compensating step follows below). In the real world, the special cases caused by the decoupling of such systems are much more complex and have often led to individual systems growing to a tremendous size with huge internal complexity. Of course, we now have a wide range of middleware technologies that enable real-time data exchange between systems where process logic is split across different subsystems, but this type of batch scenario is still a reality for most companies.
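
As a sketch of the compensating step mentioned above, the following Java fragment shows how a billing system might react to a cancellation that arrives after the account has already been debited. The types and method names are hypothetical and only illustrate the idea of compensation (a credit note) rather than a transactional rollback.

    // Hypothetical compensation logic in the billing system: a cancellation
    // that arrives after the nightly batch cannot undo the debit, so a
    // compensating action (a credit note) is issued instead.
    public class BillingCancellationHandler {

        interface BillingLedger {
            boolean hasBeenDebited(String orderId);
            void cancelPendingDebit(String orderId);
            void issueCreditNote(String orderId);
        }

        private final BillingLedger ledger;

        public BillingCancellationHandler(BillingLedger ledger) {
            this.ledger = ledger;
        }

        public void onCancellation(String orderId) {
            if (ledger.hasBeenDebited(orderId)) {
                // Too late for a rollback: compensate the earlier debit.
                ledger.issueCreditNote(orderId);
            } else {
                // The debit has not happened yet; simply drop it.
                ledger.cancelPendingDebit(orderId);
            }
        }
    }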

By examining the many enterprise IT architectures that have grown over decades (the famous Gartner Integration Spaghetti comes to mind), we will see that the majority of enterprises have no centralized, enterprise-wide workflow systems but rather that workflows are deployed more implicitly. Instead of a clean separation of application logic and business rules, you will often find that the logic comprising a particular logical workflow or process is scattered across a multitude of systems and subsystems, buried in new (e.g., Java-based) and old (e.g., COBOL-based) applications, tied together using point-to-point integration or middleware hubs. Even within a single application system, business logic is likely to be spread across the presentation tier (fat clients or presentation servers such as ASP or JSP), the middle tier (EJBs, Web services), and the database tier (e.g., stored procedures).

All this makes it very hard to realize consistent processes. Regardless of how implicitly or explicitly processes are realized in a distributed system (and how deliberate the decision for or against a dedicated BPM or workflow product was), process integrity is one of the most difficult problems to solve.
