Distributed Transactions and Two-Phase Commit | Developing Enterprise Web Services: An Architects Guide: An Architects Guide

The evolution of transactions from centralized to distributed systems has followed the evolution of computing from a centralized resource model to its modern day distributed and federated architectures. The underlying goals in distributed transaction processing are the same as in the traditional centralized model: to ensure a unit of work either successfully completes or logically appears to have never been run at all.

To achieve these goals, distributed transaction processing systems have attempted to keep the notion of ACID transactions as their dominant paradigm. While this might at first seem a tough goal to coordinate multiple systems through a transaction, the key algorithm which enables ACID across systems is well known: Two-Phase Commit.

Two-phase commit (or simply 2PC) is an algorithm that tries to achieve consensus between all parties that are participating in a transaction before allowing the transaction to fully terminate. 2PC does this by first gathering consensus among the distributed parties as to whether they should commit the transaction, and then relaying the group decision to all of those parties. Again the canonical example is the debit/credit problem, which in its distributed form can be extended across multiple systems. In this case we will assume that the creditor's and debtor's bank accounts are held by different divisions of the bank (e.g., personal and commercial sectors) and as such are hosted by different computing systems.

The Two-Phase Commit Approach

The two-phase commit protocol is the absolute bedrock of the transaction processing world, even, as we shall see, fundamental to executing transactions over Web services. Given its importance, it's a worthwhile exercise to understand the model and assumptions on which 2PC is based.^[1] The assumptions on which 2PC is founded are relatively straightforward and common sense, simply stating up front the standard operating conditions which we would expect for a transaction to be able to make headway. These assumptions are:

^[1] This is a much simplified and shortened description of the two-phase algorithm. For an in-depth explanation, see "Principles of Transaction Processing" by Philip Bernstein and Eric Newcomer (Morgan-Kaufmann, ISBN 1-55860-415-4).

During its work phase, a transaction accesses some underlying resources (such as databases, queues, or messaging systems) from time to time. If a fatal error occurs at any point while using these resources, the transaction is aborted immediately; otherwise it proceeds to run the two-phase protocol at the termination phase.
The underlying resources that the transaction touches have control only over the state to which they have direct access resources cannot coerce one another into taking a particular decision as to whether to commit or abort the transaction, and it is usually the case that the resources are mutually unaware of one another.
There is one and only one program that issues the instruction to commit the transaction.
All underlying resources that are participating in the transaction must have completed their work before the termination phase can begin.
Any failures in the system result in the failed component halting completely and remaining silent for the remainder of the transaction. This algorithm does not allow for so-called "Byzantine" failures where a component can fail in such a way that it sends out erroneous and misleading messages to other components rather than failing silently.

The protocol itself is straightforward. Each of the two phases has a specific task to perform. The first phase gathers consensus among the participating resources as to whether they agree to proceed with the transaction by asking each to cast a vote to indicate whether they are prepared to proceed. The second phase simply informs each of the participating resources whether the outcome of the vote was to commit (where every single resource signaled that it was prepared) or to abort (where at least one resource signaled that it was not prepared).

The question then arises, "How does this apply to our debit/credit example?" The answer is that from the point of view of the transaction's work, everything is business as usual with the caveat that the underlying resources may be less performant since they are no longer co-located with the transactional program itself with transaction coordination and application messages being sent across the network.

The high-level architecture of the distributed scenario is shown in Figure 7-6 and Figure 7-7. Figure 7-6 simply shows the work aspect of the transaction, where the debit and credit operations are delimited by a transaction begin/end pair. The application-level messages involved in performing the work now travel outside of the application that hosts the work and onto other systems, in this case System A and System B.

Figure 7-6. The application message paths in a distributed transaction.

graphics/07fig06.jpg

Figure 7-7. The transaction coordination message paths in a distributed transaction.

graphics/07fig07.jpg

Correspondingly, Figure 7-7 shows the transaction coordination messages that are exchanged to facilitate the desired consistent outcome. Notice that in this scenario, a (trusted) third party, in the form of a transaction coordinator, appears whose function is to mediate the exchange of transaction control messages in accordance with the two-phase commit protocol.

If we now re-examine the debit/credit scenario, we can see how the two-phase commit protocol can help to maintain the same ACID guarantees that we had in the centralized transaction processing engine. Starting with the client application (the party that performs the actual work associated with the transaction) initiating a transaction through a begin imperative (from the underlying transaction API) that, instead of being a local call, is sent across the network transaction coordinator which initializes a transaction on behalf of the client. This initialization also returns a message to the client application containing details that the client will pass on to other resources that will participate in the transaction. This information, known as the context, ensures that all parties have a reference to the transaction and the coordinator that is managing it. This start-up phase is shown in Figure 7-8.

Figure 7-8. Beginning a distributed transaction.

graphics/07fig08.jpg

Once the transaction has started and we have a context within which we can undertake the work, the client then interacts with the resources performing the necessary business logic. The application message sent from the client to System A that causes the debit to occur also carries the transaction context, which the transaction-aware resource knows how to deal with. In this case it is the first time that the resource has seen the context so it uses the information to enlist itself with the transaction coordinator, which is shown in Figure 7-9. From now on when System A receives messages about this transaction from the coordinator, it will be required to respond with messages indicating its intent or action. Once the enlist operation has finished, the remainder of the operations in our example follow the same pattern as the centralized example, whereby the debit operation locks the data and performs a shadow write (or equivalent) at the database, but it will not yet cause the online data itself to be updated.

Figure 7-9. The work phase: Debit.

graphics/07fig09.jpg

A similar set of operations happens with the creditor's account where System B also enlists itself with the transaction coordinator, as shown in Figure 7-10, and has updates written to some persistent store but has not yet merged with the main online data.

Figure 7-10. The work phase: Credit.

graphics/07fig10.jpg

Once the work phase of the transaction is complete, the termination phase is started. In this case the client reaches the end of its work and issues a commit message to the transaction coordinator to start the two-phase commit protocol, which is then completed entirely under the auspices of that coordinator. The first of the two phases is voting where each participating resource is asked whether it is prepared to proceed through to completion with the transaction. The message exchanges performed during the voting process are shown in Figure 7-11 and Figure 7-12.

Figure 7-11. The two-phase commit: Phase 1, Soliciting Votes.

graphics/07fig11.jpg

Figure 7-12. The two-phase commit: Phase 1, Reconciling Votes.

graphics/07fig12.jpg

Once participants have cast their votes, they are bound to follow their decision, even if they fail. So if System A fails after it has cast a positive vote but before it has made the changes to my account, it must ensure that when it recovers, it honors its pre-crash commitments in the same way that a centralized system would. In the case shown in Figure 7-12, both of the participants have voted positively (they have stated that they are prepared) and so the transaction manager relays the information to all participants that they should make any changes to their state durable by issuing a commit message to them. The second phase is shown in Figure 7-13 where the participants receive commit messages, on receipt of which they must update their online data and remove any locks held to allow the progression of other transactions.

Figure 7-13. The two-phase commit: Phase 2, Committing Resources

graphics/07fig13.jpg

The final step in the completion of the transaction is for the coordinator to report back to the client that the work has been successfully completed, thus allowing the client to safely terminate. The message exchange underpinning this simple last step is presented in Figure 7-14.

Figure 7-14. The two-phase commit: Phase 2, Returning Successfully to the Client.

graphics/07fig14.jpg

Up until now things have gone rather well, at least from the point of view of the banks that have suffered no system failures and the creditor who has received his money. However, transactions exist to mask system failures and we would feel cheated if we hadn't seen how a transaction-based infrastructure can maintain data consistency across systems even in the presence of failures. Let's revisit the same scenario, but this time add a sprinkling of mayhem into the system to create a few failures along the way.

The easiest of the failure cases to deal with is when a failure occurs during a transaction's work phase where no state changes have been made. In such cases the transaction is simply aborted, and any locks are released by live systems. Any locks still held by failed systems will be released on recovery. In such cases we simply try the work again, once all of the systems are up and stable.

The more devious and interesting scenarios crop up when failures occur during the termination phase of the transaction. Let's consider the scenario where one of the participants for some reason (usually a failure of some description) refuses to progress the transaction to completion, such as the example in Figure 7-15.

Figure 7-15. Inconsistent outcome to voting.

graphics/07fig15.jpg

In Figure 7-15, System B has suffered some kind of failure (or for some reason has made a unilateral decision to abort), which has resulted in it responding negatively to the voting request. In this case the transaction coordinator has no choice but to insist that both the parties in the transaction abort any work that they have done, which it communicates to the participants with an abort message. The participants then discard any partial updates to the data and release any locks. Having ensured that the data will be left in a consistent state, the coordinator returns a message to the client application to indicate that the transaction aborted.

It is easy to imagine variations on this theme such as when no response is forthcoming from a participant that has suffered a catastrophic failure during the first phase or when the transaction manager itself crashes (or even when the network connection between transaction manager and participant has been lost). In cases where a participant hangs indefinitely before the second phase, the transaction manager will not keep its patience and may choose to abort the transaction to release locks at some given time in the future, or to proceed to completion in those cases where assurances, in the form of positive votes, have been given. In any case, once the crash participant recovers it will have to ask the transaction coordinator for the outcome of the transaction. Where the transaction manager suffers a failure, the participants and client are simply left waiting for it to recover, at which point it will simply pick up where it left off. There is, however, a particular variant of this problem that exposes a negative aspect of two-phase commit where in between the first and second phases a participant is in effect left dangling, uncertain as to what the final outcome will be. The period of uncertainty is a nuisance since the longer it is, the longer resources will be locked, potentially delaying other work and increasing the likelihood that a participant will be forced to make a unilateral decision. Therefore, there is always a danger where a transaction coordinator crashes and does not recover in a timely fashion that the whole transaction may have to be aborted and rerun, with the risk of non-atomic behavior having occurred.

Nonetheless, transaction coordinator and participant failures, although irksome at times, are handled reasonably well by the two-phase commit approach. The two-phase algorithm is not, however, without its more serious drawbacks. In particular there is one aspect of the algorithm that can be of particular nuisance and that usually falls to a party outside of the transaction system to correct: heuristic outcomes.