An heuristic outcome to a transaction is simply the worst thing that can happen in a 2PC-based transaction system. Such outcomes may occur where a participant in the transaction reneges on its promise to either commit or abort and instead does the exact opposite. This means that the ACIDity of the transaction is compromised, since changes to data in some participants will have been made durable, while changes to data managed by other participants will have been discarded. As was alluded to earlier, the chances of such an outcome occurring are significantly increased if the period of uncertainty (the gap between the prepare and commit phases) is too long, and we may arrive at a situation where participants in a transaction start to guess at outcomes for themselves (that is, they may make heuristic decisions).
There is a glimmer of hope with a potential heuristic outcome, since it's not always the case that mixed responses during the second phase will ultimately cause an actual heuristic end to the transaction. Even though some heuristic outcomes can be handled by the transaction manager, the best rule of thumb is simply to try to avoid them altogether. Sometimes, if the events leading up to a potential heuristic outcome happen early enough in the second phase, the remainder of the phase can be automatically brought into line. This is, however, a special case. More often than not an heuristic outcome cannot be avoided and it is left to mechanisms outside of the system (including good old human-to-human communication) to deal with the aftermath.
Given the gravity of heuristics, it's important to understand exactly how they can arise and be dealt with. Consider the example shown in Figure 7-16, which depicts the second phase of the two-phase protocol. The first participant (System A) responds to its commit message (Message 2) with an abort response. That response may have been caused by a failure, or perhaps the period of uncertainty was so long that System A decided to unilaterally abort the transaction to free up resources. This is clearly a violation of what it must originally have agreed to undertake in the first phase, or the transaction coordinator would not have issued a commit to it. However, in this case the transaction coordinator can leap into the breach knowing, as it does, that no other participants have yet been contacted, and that it can change the messages it would originally have sent from commit messages to abort messages, as shown by Message 4. In this case none of the other (hopefully better behaved) participants will be any the wiser about the close shave with the heuristic outcome, and will instead be instructed to discard any changes to their state.
Figure 7-16. Automatically avoiding a heuristic outcome.
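The recovery shown in Figure 7-16 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the behavior of any particular transaction manager: the Participant class, its renege flag, and the second_phase function are hypothetical names introduced here, and every participant is assumed to have already voted to commit in the first phase.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical participant that has already prepared (phase one is done)."""

    def __init__(self, name, renege=False):
        self.name = name
        self.renege = renege  # assumption: True means "do the opposite of what is asked"
        self.outcome = None

    def complete(self, instruction):
        # A well-behaved participant does as instructed; a reneging one
        # unilaterally does the exact opposite.
        if self.renege:
            self.outcome = Vote.ABORT if instruction is Vote.COMMIT else Vote.COMMIT
        else:
            self.outcome = instruction
        return self.outcome


def second_phase(participants, decision=Vote.COMMIT):
    """Deliver the decision, switching it if the very first participant reneges."""
    for index, participant in enumerate(participants):
        response = participant.complete(decision)
        if response is not decision:
            if index == 0:
                # No other participant has been contacted yet, so the
                # coordinator can still change the messages it sends.
                decision = response
            else:
                # Too late: earlier participants have already finished.
                raise RuntimeError(f"heuristic outcome caused by {participant.name}")
    return decision
```

If System A is the first participant contacted and reneges, the remaining participants are simply told to abort and never learn that a commit was originally intended.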
If the contradictory responses from a badly-behaved participant do not occur early enough in the second phase, then we inevitably end up with an heuristic outcome, and potentially a whole lot of work poring over logs trying to piece together what actually happened. The kinds of situation that will eventually lead to a heuristic outcome are seen in Figure 7-17 and Figure 7-18, where the heuristic commit and heuristic abort scenarios are shown.
Figure 7-17. Heuristic Commit.
Figure 7-18. Heuristic Abort.
The heuristic commit outcome shown in Figure 7-17 has arisen because System B has decided to unilaterally commit the changes to its data even though the transaction coordinator has specified otherwise. Figure 7-18 shows an almost identical situation where a participant that should have committed decided instead to unilaterally roll back. The ultimate cause of either heuristic outcome, from the point of view of the transaction coordinator, is Message 4, which is the contrary message to that which the transaction coordinator was expecting (commit vs. abort or vice versa). The coordinator cannot retroactively recover from this situation by re-contacting the participants it contacted earlier because as far as they are now concerned the two phases are complete, and thus the transaction itself has finished. It therefore has only one choice: to report the error back to the client application and let a third party resolve any inconsistencies that have arisen.
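To make the coordinator's one remaining choice concrete, the following Python sketch builds the kind of error report that might be handed back to the client application. All names here (OutcomeReport, report_second_phase, and the participant labels) are hypothetical, and the sketch assumes a coordinator that makes no attempt at the early switch of Figure 7-16: any contrary response is simply reported.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OutcomeReport:
    status: str                    # "commit", "abort", "heuristic commit" or "heuristic abort"
    culprit: Optional[str] = None  # participant that went against the decision
    finished: List[str] = field(default_factory=list)  # cannot be re-contacted


def report_second_phase(decision: str,
                        responses: List[Tuple[str, str]]) -> OutcomeReport:
    """Classify a second phase from (participant, action) pairs in contact order.

    `decision` and each action are the strings "commit" or "abort".
    """
    finished: List[str] = []
    for name, action in responses:
        if action != decision:
            # The participants in `finished` believe the transaction is
            # over, so all the coordinator can do is report the problem.
            return OutcomeReport(f"heuristic {action}", culprit=name,
                                 finished=finished)
        finished.append(name)
    return OutcomeReport(decision, finished=finished)
```

For the scenario of Figure 7-17, calling report_second_phase("abort", [("System A", "abort"), ("System B", "commit")]) yields a report whose status is "heuristic commit", with System B as the culprit and System A listed as already finished. Resolving the inconsistency is left to whoever receives that report.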
There is simply no way that the two-phase protocol can internalize the problems caused by heuristics: with only one chance per phase to communicate with participants, it does not have enough opportunities to let them know of any heuristic decisions that have arisen. This raises the question, "Why not simply add an extra phase to communicate any heuristic decisions to participants?" In theory this is a good idea insofar as it would solve the heuristic problems we've encountered while using the two-phase algorithm, because we would always have a phase where we could warn unwary participants about heuristics and let them take the appropriate restorative action. If we were using a three-phase commit protocol, an additional set of message exchanges would occur after the normal commit/abort messages to propagate the details of any heuristics. In practice, a three-phase approach is used in some transaction systems in large-scale and lossy environments. However, we have now introduced an opportunity for the third phase to cause an heuristic outcome, which in turn would have to be handled by a fourth phase. Of course, the fourth phase is not immune to mixed outcomes and so we would require a fifth phase, and so on.
To be entirely sure about the validity of the Nth phase of a transaction, we always need an (N+1)th phase; it's just a matter of how big we are realistically going to let N grow. There is clearly a trade-off between the number of phases required and the performance of our system (in terms of the number of message exchanges required), which is mediated by the architectural characteristics of the underlying system that requires transactional support. In general, two-phase commit is a good trade-off between reliability and performance, which is why it has been the fundamental algorithm of choice in most commercial transaction-processing software.
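A rough feel for the cost side of this trade-off can be had from a back-of-the-envelope calculation. The Python sketch below rests on a simplifying assumption that is not taken from any specific protocol definition: every phase costs exactly one request and one response per participant, with no retries or acknowledgments. The function name messages_required is hypothetical.

```python
def messages_required(phases: int, participants: int) -> int:
    """Estimate the messages exchanged by an N-phase protocol.

    Assumption: each phase is one request plus one response per participant.
    """
    return 2 * phases * participants


if __name__ == "__main__":
    for phases in (2, 3, 4, 5):
        print(phases, "phases:", messages_required(phases, 10), "messages")
```

Under that assumption, ten participants cost 40 messages with two phases and 60 with three, so each extra phase adds another 20 messages without ever removing the need for the phase after it.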
Advanced Topics: Nesting and Interposition
In our whistle-stop tour of transactions so far, we've tackled a majority of the fundamental concepts. However, this isn't yet the end of the transactions story. Before we can delve into how transactions can be applied in a Web service environment, we need to tackle two more advanced transaction concepts: nesting and interposition. Don't be too intimidated. Although these are considered advanced topics in terms of their deployments and the benefits they provide, they're actually not incredibly difficult to understand.
In a Web services world, we are generally trying to tie together coarsely-grained business processes that typically take significant periods of time to execute and, thus, there are plenty of opportunities for a piece of work to fail along the way. Nesting transactions is a useful means of structuring a piece of work to provide failure isolation. What this actually means is that a transaction can contain smaller subtransactions that can fail and be retried without having to necessarily abort the whole transaction. In a long-running piece of work this can be especially valuable, as is apparent in Figure 7-19, where an especially long-running piece of work has been split into a number of smaller units. While there is a cost associated with the management of each individual transaction (so more transactions means more cost), in this case that cost has been more than worthwhile since the final subtransaction has failed. Had this failure and subsequent abort caused the abortion of a week's, a day's, or even an hour's worth of work, then it would be annoying to say the least, and potentially quite costly. Once the failure that caused subtransaction E to abort has subsided, it can simply be replayed (avoiding the replay cost of the whole parent transaction) and allow the parent transaction itself to reach completion.
Figure 7-19. Nested transactions to provide failure isolation.
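The failure isolation illustrated in Figure 7-19 can be sketched as a parent that replays only the subtransaction that failed. This is a minimal Python sketch under stated assumptions: SubtransactionFailed, run_nested, and the max_attempts limit are hypothetical names, each subtransaction is modeled as a plain callable, and undoing the partial work of a failed attempt is left out entirely.

```python
class SubtransactionFailed(Exception):
    """Raised by a subtransaction that has aborted (hypothetical)."""


def run_nested(subtransactions, max_attempts=3):
    """Run (name, work) pairs in order, retrying only the one that fails.

    Returns the names of the completed subtransactions. If a subtransaction
    still fails after max_attempts tries, the failure propagates and the
    parent transaction must abort.
    """
    completed = []
    for name, work in subtransactions:
        for attempt in range(1, max_attempts + 1):
            try:
                work()
                completed.append(name)
                break  # this subtransaction is done; move on to the next
            except SubtransactionFailed:
                if attempt == max_attempts:
                    raise  # the failure did not subside
    return completed
```

Work completed by earlier subtransactions is never repeated when a later one is retried, which is exactly the saving described above for subtransaction E.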
On the other hand, interposition is a more subtle transaction structuring strategy through which we may improve performance and offer better encapsulation by allowing additional transaction coordinators to become involved in the management of a single transaction. This is shown in Figure 7-20, where Enterprise B, instead of enrolling its back end systems directly in the top-level transaction, has enrolled its own interposed coordinator instead.
Figure 7-20. Interposed transactions for performance and encapsulation.
At first glance it might appear that this interposed scheme complicates matters rather than provides benefits. However, there are a number of reasons why, in certain circumstances, it makes sense to utilize this scheme, such as:
In this case Enterprise B becomes a subcoordinator for Enterprise X (and for any of its own locally enlisted participants), and the messages exchanged between Enterprise B and Enterprise X will identify Enterprise B, not the top-level transaction manager, as the coordinator. The fact that the coordinator has changed is transparent to Enterprise X, and the top-level coordinator does not even see Enterprise X at all.
The work that Enterprise B must now do when it receives the termination message from the top-level coordinator is different from when it was just a "plain" participant. It must act as a coordinator, running its own local first- and second-phase message exchanges, and relaying the aggregated result of those exchanges to the top-level transaction coordinator.
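The dual role of an interposed coordinator can be sketched in Python as an object that looks like a participant to its parent and behaves like a coordinator to its children. The class names, the "commit"/"abort" string votes, and the method names are all assumptions made for illustration; failures and heuristics are deliberately ignored.

```python
class Participant:
    """Hypothetical plain participant with a fixed first-phase vote."""

    def __init__(self, name, vote="commit"):
        self.name = name
        self.vote = vote
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.vote == "commit" else "aborted"
        return self.vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class Coordinator:
    """Runs both phases over the participants it knows about."""

    def __init__(self, name, participants):
        self.name = name
        self.participants = participants

    def run(self):
        votes = [p.prepare() for p in self.participants]
        decision = "commit" if all(v == "commit" for v in votes) else "abort"
        for p in self.participants:
            if decision == "commit":
                p.commit()
            else:
                p.abort()
        return decision


class InterposedCoordinator(Coordinator):
    """A participant to its parent, a coordinator to its own children."""

    def prepare(self):
        # Run a local first phase and relay the aggregate vote upward.
        votes = [p.prepare() for p in self.participants]
        return "commit" if all(v == "commit" for v in votes) else "abort"

    def commit(self):
        for p in self.participants:
            p.commit()  # local second phase

    def abort(self):
        for p in self.participants:
            p.abort()
```

With Enterprise B constructed as InterposedCoordinator("Enterprise B", [enterprise_x, local_system]) and enrolled in a top-level Coordinator, the top-level coordinator only ever exchanges messages with Enterprise B, yet Enterprise X still ends up committed or aborted along with everyone else.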
As with nesting, interposition provides a really useful abstraction for transactions, especially in the Web services architecture. Since trust is fundamentally lacking between arbitrary hosts on the Web, the ability for a service to work with its own trusted coordinator is compelling, and where services are not mutually visible, it is vital.