16.1. Transaction Overview

Before delving into the details of how transactions work, let's define some basic terminology. A transaction is a unit of work, a sequence of operations that succeed or fail as a single unit. Transactions guarantee that this unit of work either fully completes or is fully rolled back. Transactions are essential to the successful and robust working of a critical class of enterprise applications: those that make use of shared networked resources like databases, messaging services, and the like.

Figure 16-1 shows a simple conceptual model of a transaction. A transaction has three main operations: begin( ), commit( ), and rollback( ). You start a transaction with the begin( ) operation, perform a series of steps, and successfully commit with the commit( ) operation. You roll back the transaction by invoking the rollback( ) operation if a failure occurs.

Figure 16-1. Conceptual model of a transaction


Consider the following example. You want to move money from your checking account into your savings account. There is an operation to debit the money from the checking account and another operation to credit the money into the savings account. Both the operations must either succeed together or fail together. You group the debit and credit operations together in a single transaction. The complete sequence for this transaction is begin, debit checking account, credit savings account, commit if successful or roll back if it fails. Therefore, the entire transaction either succeeds or fails as a single unit. Without a transaction, failure or success of just one operation in this two-operation transaction will leave the system and the accounts in an inconsistent state. The application is unreliable if transactions are not implemented correctly.
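
The transfer sequence above can be sketched with a toy in-memory resource. This is only an illustration of begin, commit, and rollback, not JTA or JDBC code; the ToyBank class and its methods are hypothetical names introduced here, and the sketch assumes a single thread with no durability.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory "resource" illustrating begin/commit/rollback.
// Hypothetical names; this is not JTA or JDBC.
class ToyBank {
    private final Map<String, Long> committed = new HashMap<>();
    private Map<String, Long> working; // non-null only while a transaction is active

    void open(String account, long cents) { committed.put(account, cents); }

    long balance(String account) { return committed.get(account); }

    void begin() {
        if (working != null) throw new IllegalStateException("transaction already active");
        working = new HashMap<>(committed); // private working copy of the balances
    }

    void debit(String account, long cents) {
        long current = working.get(account);
        if (current < cents) throw new IllegalStateException("insufficient funds");
        working.put(account, current - cents);
    }

    void credit(String account, long cents) {
        working.put(account, working.get(account) + cents);
    }

    void commit() {
        committed.clear();
        committed.putAll(working); // publish every change together
        working = null;
    }

    void rollback() { working = null; } // discard every change together

    // begin, debit, credit, then commit on success or roll back on any failure.
    static boolean transfer(ToyBank bank, String from, String to, long cents) {
        bank.begin();
        try {
            bank.debit(from, cents);
            bank.credit(to, cents);
            bank.commit();
            return true;
        } catch (RuntimeException failure) {
            bank.rollback();
            return false;
        }
    }

    public static void main(String[] args) {
        ToyBank bank = new ToyBank();
        bank.open("checking", 10_000);
        bank.open("savings", 0);
        System.out.println(transfer(bank, "checking", "savings", 2_500));  // prints true
        System.out.println(transfer(bank, "checking", "savings", 50_000)); // prints false
        System.out.println(bank.balance("checking") + " " + bank.balance("savings")); // prints 7500 2500
    }
}
```

Because the debit and credit are applied to a working copy that is either published or discarded as a whole, a failed credit never leaves a debit behind.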

Transactions enable a simplified programming model because developers do not need to anticipate and program for every error and exception condition that can occur when using shared networked resources. When these errors and exception conditions do occur, transactions help with failure recovery by restoring the shared resource to a consistent state. The application simply receives and deals with a corresponding exception generated from the underlying transactional service(s), in keeping with the general approach taken with other Java runtime services. This simplified model frees developers from dealing with complex concurrency issues across multiple users and across multiple, potentially distributed, applications accessing the same networked resource at the same time.

Transactions work closely with one or more resources, as we've already mentioned. Most often, the resource is a database. Popular databases include Oracle, IBM DB2, Sybase, and MySQL. Other types of resources such as message-oriented middleware (MOM) destinations can also be transactional. A transaction does not directly interact with a resource. A resource manager manages the resource; the resource manager is built into the software that manages the resource (the database or MOM). A transaction is coordinated with the resource managers using a transaction manager, also known as a transaction processing (TP) monitor. Traditional standalone TP monitors include BEA Tuxedo, IBM TXSeries, and IBM CICS. But transaction managers can also be found embedded in resource managers themselves, such as some relational databases and messaging services and within J2EE containers and application servers. Figure 16-2 depicts the relationships among resources, resource managers, and TP monitors.

The transaction manager works by passing the same transaction context to the different resource managers participating in the transaction. The transaction manager keeps transaction logs of the work it has asked the resource managers to do.

Figure 16-2. Transaction processing monitor


16.1.1. Transaction Properties

Transactions have four main properties, referred to as the ACID properties. Transactions are Atomic, Consistent, Isolated, and Durable.


Atomic

Implies that the transaction groups many operations into a single unit of work that succeeds or fails as a whole. In our bank transfer example, the debit operation and the credit operation are an atomic unit.


Consistent

Means that the effect of the transaction should leave the system in a known and stable state. The state of the system must be consistent regardless of whether the transaction succeeds or fails. In our example, the bank accounts must be in a consistent state whether the debit-credit transaction succeeds or fails. If the transaction succeeds, the accounts should reflect the adjusted amounts. If it fails, the transaction manager restores the accounts to their states before the transaction occurred.


Isolated

Implies that transactions do not intersperse intermediate updates with other transactions. The transactions behave as if they execute serially, even if clients perform them concurrently. Thus, concurrent transaction updates are isolated. In our example, if another transaction is acting on the same accounts while our debit-credit transaction is in progress, either the other transaction must complete (succeed or fail) before our transaction is executed, or our transaction's operations must be committed or rolled back before the other transaction runs. The operations of the two transactions must not be interspersed.


Durable

Means that the effects of a transaction persist. In our banking example, after the successful completion of the debit-credit transaction, the effect of the transaction on the accounts is permanent.

16.1.2. Transaction Isolation Problems and Levels

Isolation, the "I" in ACID, can have various levels. Some breaches in strict transaction isolation might be acceptable, or even necessary, in some contexts. Transaction isolation levels refer to the degree to which these isolation violations are allowed or prevented. The following sections describe transaction isolation violations and the isolation levels that define the allowed isolation violations in a given transaction context.

16.1.2.1. Transaction isolation violations

It's easier to understand the various levels if you first understand what could go wrong if we didn't use various isolation levels to protect the transaction. Consider the following potential isolation violations that can arise in some transaction contexts:


Dirty read

A dirty read occurs when a transaction reads data that another transaction has changed but not yet committed.


Nonrepeatable read

Nonrepeatable reads occur when the same transaction reads the same data twice with different results. Consider the following scenario. In transaction A, you read data item X. Before transaction A completes, transaction B updates and commits the data in item X. A later operation within transaction A reads the same data again and sees the updated value of the data. The first data read in transaction A is nonrepeatable because transaction B's update has been allowed to intrude into transaction A's context.


Phantom read

A phantom read occurs when you read data, look again, and it is gone, or when you read data, look again, and new data suddenly appears. Consider the following scenario. In transaction A, you execute a query and obtain a result set. Before transaction A completes, transaction B inserts a row that matches the query used in transaction A, and this insert is allowed to be committed before transaction A completes. If transaction A executes the same query again, it will obtain a different set of data. This is a phantom read situation because, despite the supposed isolation of transactions, new data not created by transaction A has appeared in the underlying resource.
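
As a rough illustration of the first two violations, the following sketch models a single data item with a committed value and an optional uncommitted value written by an in-flight transaction. The DirtyReadDemo class and its allowDirty flag are hypothetical; real resource managers use locking or versioning rather than anything this simple.

```java
// Toy single-item store contrasting a dirty read with a read of committed data.
// Hypothetical names; not how a real resource manager is implemented.
class DirtyReadDemo {
    private int committedValue;
    private Integer pendingValue; // written by an in-flight transaction, not yet committed

    DirtyReadDemo(int initial) { committedValue = initial; }

    void writeUncommitted(int value) { pendingValue = value; }

    void commit() {
        if (pendingValue != null) {
            committedValue = pendingValue;
            pendingValue = null;
        }
    }

    void rollback() { pendingValue = null; }

    // allowDirty = true models a reader that is permitted to see uncommitted data.
    int read(boolean allowDirty) {
        return (allowDirty && pendingValue != null) ? pendingValue : committedValue;
    }

    public static void main(String[] args) {
        DirtyReadDemo item = new DirtyReadDemo(100);

        item.writeUncommitted(150);            // transaction B updates X but has not committed
        System.out.println(item.read(true));   // dirty read: sees B's uncommitted value
        int first = item.read(false);          // transaction A reads only committed data

        item.commit();                         // B commits while A is still running
        int second = item.read(false);         // A reads X again
        System.out.println(first != second);   // nonrepeatable read: the two reads differ
    }
}
```

If transaction B had rolled back instead of committing, the dirty reader would have acted on a value that never officially existed.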

16.1.2.2. Transaction isolation levels

A transaction isolation level describes the degree to which the access to a resource manager by a transaction is isolated from the access to the resource manager by other concurrently executing transactions. This section describes the various transaction isolation levels, as well as the transaction isolation violations that they prevent or allow.


Read uncommitted

No isolation is provided when a transaction can read uncommitted data. This transaction isolation level is not useful for most business applications that require the ACID transaction properties. However, some data warehouse applications, especially those that principally read rather than update data, find this transaction isolation level useful. Dirty read, nonrepeatable read, and phantom read isolation violations are allowed at this isolation level.


Read committed

With this isolation level, a transaction cannot read uncommitted data and therefore dirty reads do not happen. Nevertheless, nonrepeatable reads and phantom reads can occur. This isolation level is the default in many popular databases such as Oracle.


Repeatable read

With this isolation level, dirty reads and nonrepeatable reads do not occur. Phantom reads can still occur.


Serializable

With this isolation level, no isolation violations are allowed. Dirty reads, nonrepeatable reads, and phantom reads do not occur.

Table 16-1 summarizes the relationship between transaction isolation violations and transaction isolation levels.

Table 16-1. Transaction isolation violations and levels

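Isolation level  | Dirty read   | Nonrepeatable read | Phantom read
-----------------+--------------+--------------------+-------------
Read uncommitted | Possible     | Possible           | Possible
Read committed   | Not possible | Possible           | Possible
Repeatable read  | Not possible | Not possible       | Possible
Serializable     | Not possible | Not possible       | Not possible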


Given its strict enforcement of transaction isolation, should you always select serializable as the transaction isolation level? The answer is no, because transaction isolation level and concurrency are inversely related. The higher the isolation level, the less concurrency you get. With less concurrency, the performance of the system suffers and the system supports fewer users. Therefore, you need to find a balance between the safety of higher isolation levels and the greater performance possible at lower isolation levels. This choice also has an impact on the complexity of your code. With the highest isolation levels, you need to worry less about concurrency issues in your code because the transaction manager is dealing with them for you. With lower isolation levels, you need to worry about the isolation violations that you're letting through, consider whether they might actually occur or not, and potentially deal with them yourself in your application code.

In many cases, the isolation level decision has been made for you. If your application interacts with a single relational database, for example, its transaction manager will have a default isolation level setting that has been optimized by the vendor for typical situations.
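
When you do need to choose a level yourself over JDBC, the four levels correspond to integer constants on java.sql.Connection. The sketch below encodes Table 16-1 against those constants and shows one way a client might request a level. The IsolationLevels class, its Violation enum, and the requestLevel helper are names made up for this example; whether a given driver honors a level is an assumption that the metadata check tries to verify.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.EnumSet;
import java.util.Set;

class IsolationLevels {
    enum Violation { DIRTY_READ, NONREPEATABLE_READ, PHANTOM_READ }

    // Violations that remain possible at each standard JDBC isolation level (Table 16-1).
    static Set<Violation> possibleViolations(int jdbcLevel) {
        switch (jdbcLevel) {
            case Connection.TRANSACTION_READ_UNCOMMITTED:
                return EnumSet.allOf(Violation.class);
            case Connection.TRANSACTION_READ_COMMITTED:
                return EnumSet.of(Violation.NONREPEATABLE_READ, Violation.PHANTOM_READ);
            case Connection.TRANSACTION_REPEATABLE_READ:
                return EnumSet.of(Violation.PHANTOM_READ);
            case Connection.TRANSACTION_SERIALIZABLE:
                return EnumSet.noneOf(Violation.class);
            default:
                throw new IllegalArgumentException("unknown isolation level: " + jdbcLevel);
        }
    }

    // Requests a level on an already open connection, but only if the driver reports support for it.
    static boolean requestLevel(Connection connection, int jdbcLevel) throws SQLException {
        if (!connection.getMetaData().supportsTransactionIsolationLevel(jdbcLevel)) {
            return false;
        }
        connection.setTransactionIsolation(jdbcLevel);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(possibleViolations(Connection.TRANSACTION_READ_COMMITTED));
    }
}
```

The requestLevel helper needs a live connection, so only possibleViolations is exercised in main.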

16.1.3. Transaction Models

A variety of different transaction models are available. To further complicate matters, each has some slight variations as well. The flat transaction model (FTM) is the simplest and most common. It specifies that only one transaction is active at any given time. Can you start another transaction at the same time? This can be handled in two ways. First, you can disallow starting another transaction. This is often too rigid, however. A second commonly implemented strategy is to suspend the current transaction and start a new one. Once the new transaction finishes, the original transaction resumes. In this scenario, the second transaction might commit while the first transaction rolls back. The outcome of the second transaction remains successfully committed.

Variations on the flat transaction model are quite common. One popular variation in the JDBC API is the use of savepoints. A savepoint marks an intermediate point in the work done within a transaction. The transaction can later be partially rolled back to a specific savepoint, undoing only the work performed after that point. For more details, see Chapter 8.
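
The idea of a partial rollback can be sketched without a database. In the toy model below, a transaction is a list of pending operations and a savepoint is simply the length of that list when the savepoint was set. The SavepointDemo class is hypothetical; in JDBC itself the corresponding calls are Connection.setSavepoint( ) and Connection.rollback(Savepoint).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of savepoints: pending work is a list, and a savepoint is a position in it.
// Hypothetical names; not the JDBC API.
class SavepointDemo {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    void execute(String operation) { pending.add(operation); }

    int setSavepoint() { return pending.size(); }

    // Undo everything done after the savepoint; earlier work stays pending.
    void rollbackTo(int savepoint) {
        pending.subList(savepoint, pending.size()).clear();
    }

    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    List<String> committed() { return committed; }

    public static void main(String[] args) {
        SavepointDemo tx = new SavepointDemo();
        tx.execute("insert order");
        int afterOrder = tx.setSavepoint();
        tx.execute("insert gift wrap");
        tx.rollbackTo(afterOrder);          // drop only the gift wrap
        tx.execute("insert invoice");
        tx.commit();
        System.out.println(tx.committed()); // prints [insert order, insert invoice]
    }
}
```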

Other transaction models, such as the nested transaction model (NTM) and saga, were created to address some of the shortcomings of the flat transaction model. In NTM, transactions are nested, forming a tree of transactions with multiple transactions active at the same time. You can emulate nested transactions with savepoints. A saga is a sequence of flat transactions that work as a single unit. In the event of a rollback, the partial work is undone by applying compensating transactions to effectively undo each transaction in the sequence. The flat transactions in a saga are not isolated from one another.
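
A saga's compensation logic can be sketched as follows. Each step pairs an action, standing in for one committed flat transaction, with a compensating action; when a step fails, the completed steps are compensated in reverse order. The Saga and Step names are assumptions made for this illustration, and the sketch ignores the case where a compensation itself fails.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class Saga {
    // One flat transaction in the saga plus the action that compensates for it.
    static final class Step {
        final Runnable action;
        final Runnable compensation;

        Step(Runnable action, Runnable compensation) {
            this.action = action;
            this.compensation = compensation;
        }
    }

    // Runs the steps in order. On the first failure, compensates the
    // already completed steps, newest first, and reports failure.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = run(Arrays.asList(
            new Step(() -> log.add("reserve seat"), () -> log.add("release seat")),
            new Step(() -> log.add("charge card"), () -> log.add("refund card")),
            new Step(() -> { throw new IllegalStateException("ticketing failed"); },
                     () -> log.add("void ticket"))));
        System.out.println(ok + " " + log);
    }
}
```

Note that between "charge card" and "refund card" other work can observe the charge, which is the lack of isolation the text describes.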

A slight variation on these transaction models is the use of chained transactions. Some resource managers can be configured to automatically begin a new transaction once the current one is either committed or rolled back. This ensures that all operations performed by a client happen within a transaction, even if the client doesn't explicitly start one. Both the JDBC and JMS APIs implicitly assume a chained transaction model since they expose commit operations in their APIs, but the start of a transaction is assumed in the connection process. In JMS, for example, the client makes a Connection and creates a Session from it, which implicitly begins a transaction with the underlying message service. A similar model exists in JDBC for creating Connections and Statements.
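
The chained behavior can be pictured with a small model in which a transaction is always active: creating the session begins the first one, and every commit or rollback ends the current transaction and implicitly begins the next. ChainedSession is a hypothetical class written for this sketch and is not the JMS Session interface.

```java
import java.util.ArrayList;
import java.util.List;

// Toy "chained" session: there is never a moment without an active transaction.
// Hypothetical names; not the JMS or JDBC API.
class ChainedSession {
    private final List<String> delivered = new ArrayList<>();
    private List<String> current = new ArrayList<>(); // the implicit first transaction
    private int transactionCount = 1;

    void send(String message) { current.add(message); }

    void commit() {
        delivered.addAll(current);
        startNext();
    }

    void rollback() { startNext(); }

    // Ending one transaction immediately begins the next; the client never calls begin.
    private void startNext() {
        current = new ArrayList<>();
        transactionCount++;
    }

    List<String> delivered() { return delivered; }

    int transactionCount() { return transactionCount; }

    public static void main(String[] args) {
        ChainedSession session = new ChainedSession();
        session.send("order placed");
        session.commit();
        session.send("duplicate order");
        session.rollback();
        System.out.println(session.delivered() + " " + session.transactionCount());
    }
}
```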

Having said all this, the JTA itself remains fairly agnostic about the transaction model implemented by the underlying transaction manager. JTA implementations can support flat or nested transaction models, for example. If an application attempts to use the JTA to create nested transactions and the underlying manager supports only a flat model, a suitable exception (javax.transaction.NotSupportedException) is thrown. In practice, most JTA implementations support only a flat transaction model, since this is sufficient for nearly all application contexts. In addition, the EJB specification requires only a flat transaction model for the EJB container's transaction manager. This decision was made in recognition of the fact that flat transactions are sufficient for the majority of applications and to allow vendors to more easily implement the transactional capabilities of an EJB container with existing transaction managers.

16.1.4. Distributed Transactions and the JTA

A local transaction is a transaction that communicates with just one resource with no transaction monitor involved. The resource manager manages a local transaction at the connection level. An application making a query to a single relational database, for example, can be managed by the RDBMS resource manager as a local transaction. On the other hand, a distributed or global transaction is a transaction that covers multiple resource managers and is coordinated by a transaction monitor. An application that wants to perform a transaction involving operations against several different database engines and a messaging service, for example, requires a distributed transaction.

Note that it's not the number or distribution of the clients that determines the need for a distributed transaction. When two different components in an application ask the application server to coordinate a transaction that accesses the same resource manager, a distributed transaction is not required. A single resource manager is involved in the transaction, so the application server can manage the request using a local transaction instead of a distributed transaction.

Various protocols and standards support, coordinate, and implement distributed transactions. The most popular standard is the Open Group's XA Protocol (X/Open XA). Others include the Open Systems Interconnection (OSI) Transaction Processing protocol defined by the International Organization for Standardization (ISO), and Systems Network Architecture (SNA) LU 6.2 defined by IBM.

The application or the application server leverages the JTA API programmatically or declaratively to manage distributed transactions. In addition to its general-purpose transaction API (found in the javax.transaction package), the JTA also provides a standard Java mapping of the X/Open XA protocol in its javax.transaction.xa package. This organization of the JTA interfaces allows the JTA to support both XA-based transactional systems as well as those that don't support X/Open XA.

A popular implementation of the JTA by application server vendors uses the Java Transaction Service (JTS). However, JTS usage is not mandatory in J2EE. JTS provides Java bindings for the CORBA Object Transaction Service (OTS). Since JTS is basically "under the covers" of the application server, you as a developer need know only about the JTA, which we describe in this chapter.

16.1.5. Distributed Transaction Scenarios

There are many common distributed transaction scenarios. One such scenario is a distributed transaction that spans a database and message queue. Others we discuss include those that use two or more databases or message servers. Yet another scenario is a distributed transaction initiated by a J2EE client that invokes different J2EE components.

16.1.5.1. Database and message queue

In the distributed transaction scenario depicted in Figure 16-3, a transaction spans a database and a message queue. This scenario is one of the most common distributed transactions in a J2EE application. For example, an application receives a message on a queue and processes the message. The application saves the outcome of processing the message to the database. Both the operations, one on the database and the other on the queue, are part of the same distributed transaction. If the processing of the message fails midstream, for example, we want both the relational database and the message queue to be returned to their states before the transaction was attempted.

Figure 16-3. Distributed transaction: Database and message queue


16.1.5.2. Multiple databases

In the scenario depicted in Figure 16-4, a transaction spans two different databases. An example of this scenario is a transaction that spans updates to a database maintaining the customer master as well as updates to another database maintaining the order master. Both the operations, one on the customer master and the other on the order master, are part of the same distributed transaction.

Figure 16-4. Distributed transaction: Multiple databases


16.1.5.3. Multiple application servers

In the scenario depicted in Figure 16-5, a transaction spans two different application server instances. An example of this scenario is a tax manager component that runs inside one application server that invokes methods of a tax service component running inside another server. The first application server passes along the transaction context to the other server so that the operations that run on the second application server participate in an overall distributed transaction. Both application servers follow the JTA specification, ensuring that the transaction is interoperable between the different server instances, even if the application servers are from different vendors.

16.1.5.4. Client demarcation

In the scenario depicted in Figure 16-6, the client starts a transaction. The client then invokes components (EJBs, web components, etc.) running on one or more application servers. The client container propagates the transaction to the various components that the client invokes. This scenario is similar to the multiple application server scenario, except that an external client initiates the transaction context rather than code running on one of the application servers.

Figure 16-5. Distributed transaction: Multiple application servers


Figure 16-6. Distributed transaction: Client demarcation


These distributed transaction scenarios illustrate the necessity of distributed transaction support in J2EE. The J2EE application server becomes the coordinator of distributed transactions. It coordinates the transaction across different resources, servers, and components. But how does a distributed transaction work under the covers? What is the protocol that the transaction coordinator uses to coordinate disparate resources? The protocol that the transaction coordinator uses is known as two-phase commit, which we discuss next.

16.1.6. Two-Phase Commit

Two-phase commit is the protocol used to implement distributed transactions between different resource managers. The two-phase commit protocol ensures the consistency and durability of distributed transactions by coordinating the states of the various resource managers involved in the transaction. The two-phase commit protocol kicks in when a commit or rollback is initiated on a distributed transaction. The commit or rollback can be requested by either the application itself or by the underlying transaction manager on behalf of the application.

As the name implies, there are two distinct phases in the two-phase commit protocol. The first phase is known as the prepare phase, and the second phase is known as the commit phase or rollback phase. In the prepare phase, the transaction manager sends a prepare (to commit) message to every resource participating in the transaction. Each resource replies with either a ready response or an abort response. In the second phase, if any resource responds with abort, the transaction manager rolls back the whole transaction as depicted in Figure 16-7.[*] If all the resources respond as ready, the transaction manager commits the entire transaction as depicted in Figure 16-8. Instead of issuing a commit, the application may issue a rollback as depicted in Figure 16-9.

[*] This figure and the two that follow use the UML sequence diagram format.

Figure 16-7. Rollback in the first phase of a two-phase commit
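
The two phases can be sketched as a small coordinator. This is a simplified model under stated assumptions: the Participant interface and Recording class are hypothetical, every participant answers immediately, and there are no timeouts, logs, or failures between the phases. Real XA resource managers expose a richer contract.

```java
import java.util.Arrays;
import java.util.List;

class TwoPhaseCommitDemo {
    // Hypothetical participant contract, much simpler than a real XA resource.
    interface Participant {
        boolean prepare();  // true = ready, false = abort
        void commit();
        void rollback();
    }

    // Returns true if the transaction committed, false if it was rolled back.
    static boolean commitTransaction(List<? extends Participant> participants) {
        // Phase 1: send prepare to every participant and collect the votes.
        boolean allReady = true;
        for (Participant participant : participants) {
            if (!participant.prepare()) {
                allReady = false;
            }
        }
        // Phase 2: commit everywhere if all voted ready, otherwise roll back everywhere.
        // For simplicity, this sketch also sends rollback to the participant that voted abort.
        for (Participant participant : participants) {
            if (allReady) {
                participant.commit();
            } else {
                participant.rollback();
            }
        }
        return allReady;
    }

    // Participant that votes as configured and records its final state.
    static final class Recording implements Participant {
        private final boolean vote;
        String state = "active";

        Recording(boolean vote) { this.vote = vote; }

        public boolean prepare() { state = vote ? "prepared" : "aborted"; return vote; }
        public void commit() { state = "committed"; }
        public void rollback() { state = "rolled back"; }
    }

    public static void main(String[] args) {
        Recording database = new Recording(true);
        Recording queue = new Recording(false);
        boolean committed = commitTransaction(Arrays.asList(database, queue));
        System.out.println(committed + " " + database.state + " " + queue.state);
    }
}
```

A single abort vote is enough to roll back every participant, including those that had already voted ready.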


What happens if the transaction manager never receives a response from one or more of the underlying resource managers? Every transaction has a timeout value, settable using the transactionManager.setTransactionTimeout( ) method. If a response is not received from a resource manager before this timeout expires, the transaction manager rolls back the entire transaction.

Even the protection of the two-phase commit protocol can't prevent unrecoverable issues during a transaction. One or more of the resource managers involved in a distributed transaction might lose communications with the transaction manager and decide to make a local decision to commit or roll back. A unilateral decision by a resource manager to commit or roll back is called a heuristic decision, and it can render a distributed transaction inconsistent. An even more catastrophic situation, like a system crash on the transaction manager or one of the resource managers, could prevent even a heuristic decision from being made. In these rare situations, you have to resort to the usual array of disaster recovery measures. Transaction logs from the resource managers and transaction manager can be reviewed to restore a consistent overall state, and/or data backups can be used to restore the distributed system to a previous stable state.

Figure 16-8. Successful commit in two-phase commit

16.1.6.1. Emulating two-phase commit

The resources participating in a two-phase commit must be XA resources; in other words, they must support the X/Open XA distributed transaction protocol. Can a non-XA resource participate safely in a distributed transaction? Normally, the answer is no, but there are workarounds.

Consider the following real-world example in which a non-XA resource needs to participate in a distributed transaction. You encounter a situation in which the program receives a message from a queue and updates data in a MySQL database. However, the MySQL version or storage engine in use does not support XA (and certain older versions of MySQL do not support transactions at all). For the data source that connects to the MySQL database, the transaction manager (in the container and/or application server) can emulate the two-phase commit protocol by responding on behalf of the underlying non-XA resource manager during two-phase commits. When emulating two-phase commit, a transaction manager may, for example, always return a ready response during the prepare phase on behalf of the resource manager. However, the guarantees provided by the two-phase commit protocol are diminished in this situation. The non-XA resource might be left in an inconsistent state with compromised data integrity (it might not have been ready when the transaction manager said it was). Obviously, if you are using a container or application server that supports two-phase commit emulation, you should use it with care. In addition, some emulating transaction managers put restrictions on their support for this, such as allowing one and only one non-XA resource to participate in a distributed transaction.

Figure 16-9. Transaction rollback in two-phase commit
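
The weakness of emulation can be seen in a short sketch. The adapter below lets a one-phase resource take part in a two-phase protocol by always voting ready, so the resource's real outcome is only discovered at commit time. OnePhaseResource and AlwaysReadyAdapter are hypothetical names, and actual containers may implement emulation quite differently.

```java
class EmulatedTwoPhaseDemo {
    // Hypothetical non-XA resource: it can only commit or roll back, with no prepare step.
    interface OnePhaseResource {
        void commit();   // may throw if the resource cannot commit
        void rollback();
    }

    // Adapter that answers the prepare request on the resource's behalf.
    static final class AlwaysReadyAdapter {
        private final OnePhaseResource resource;

        AlwaysReadyAdapter(OnePhaseResource resource) { this.resource = resource; }

        // Emulated vote: "ready" is returned without ever asking the resource.
        boolean prepare() { return true; }

        // The real outcome is known only here, possibly after other participants have committed.
        boolean commit() {
            try {
                resource.commit();
                return true;
            } catch (RuntimeException failure) {
                return false;
            }
        }

        void rollback() { resource.rollback(); }
    }

    public static void main(String[] args) {
        AlwaysReadyAdapter adapter = new AlwaysReadyAdapter(new OnePhaseResource() {
            public void commit() { throw new IllegalStateException("resource could not commit"); }
            public void rollback() { }
        });
        System.out.println("voted ready: " + adapter.prepare());
        System.out.println("commit succeeded: " + adapter.commit());
    }
}
```

A ready vote followed by a failed commit is exactly the case in which the distributed transaction can end up inconsistent, which is one reason some transaction managers allow only a single non-XA resource per transaction.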



Java Enterprise in a Nutshell
ISBN: 0596101422
Year: 2004
Pages: 269
