Common Data Access Issues

Regardless of which data access strategy - and type of database - we use, we will encounter some common issues. Those discussed below will have a crucial impact on scalability and performance.

Transaction Isolation

Transaction isolation levels control the degree of isolation between concurrent transactions accessing the same database. Although isolation levels determine database behavior, they are managed by the J2EE server, which is responsible for overall transaction co-ordination. SQL92 defines four transaction isolation levels, which are supported in the JDBC API. In decreasing order of isolation, these are:

  • TRANSACTION_SERIALIZABLE

  • TRANSACTION_REPEATABLE_READ

  • TRANSACTION_READ_COMMITTED

  • TRANSACTION_READ_UNCOMMITTED

The higher the level of isolation for a transaction, the greater the guarantee that its work will not be corrupted by other concurrent transactions, but the greater its impact on the total throughput of the resource. In the case of a relational database, the transaction isolation level will control the locking of rows or tables (see http://www.onjava.com/pub/a/onjava/2001/05/23/j2ee.html?page=2 for a good, concise summary of SQL92 transaction isolation levels).

Important 

Transaction isolation levels can have a dramatic effect on data integrity and performance. However, their exact meaning differs between databases. This is one reason that transparent persistence is elusive.

Most databases don't support all four transaction isolation levels, meaning that a J2EE server may be unable to guarantee the expected behavior. For example, Oracle does not support the TRANSACTION_READ_UNCOMMITTED isolation level. However, Oracle does guarantee non-blocking reads, which preserve the intent of the READ_UNCOMMITTED isolation level.

Outside the EJB container, we will need to use API-specific, programmatic control over transaction isolation, such as the setTransactionIsolation() method on the java.sql.Connection interface.
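
For example, a minimal sketch of programmatic isolation control with plain JDBC (assuming a DataSource, here called dataSource, obtained from JNDI; the variable name is illustrative):

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    public void doWorkWithReadCommitted(DataSource dataSource) throws SQLException {
        Connection con = dataSource.getConnection();
        try {
            // Must be set before any work is done in the transaction;
            // the driver may throw SQLException for levels the
            // underlying database doesn't support
            con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            // ... perform queries and updates with con ...
        } finally {
            con.close();
        }
    }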

Although the EJB specification doesn't specify how EJB containers should control transaction isolation, we should be able to set it declaratively for EJBs using CMT. This is a further advantage of using EJBs with CMT to co-ordinate data access. We can use the standard ejb-jar.xml deployment descriptor to tell any container when to begin and how to propagate transactions, but there is no standard way of telling the container exactly what a transaction should mean.

Usually this is specified in a container-specific deployment descriptor. For example, in WebLogic there's an element in the weblogic-ejb-jar.xml file that looks like this:

    <transaction-isolation>
      <isolation-level>TRANSACTION_SERIALIZABLE</isolation-level>
      <method>
        <ejb-name>Booking</ejb-name>
        <method-name>*</method-name>
      </method>
    </transaction-isolation>

Important 

Always specify the transaction isolation level for EJB methods, if your EJB container allows it. Otherwise you're relying on the default transaction isolation level of the target server, and have compromised portability unnecessarily.

Pessimistic and Optimistic Locking

There are two basic strategies to database concurrency: pessimistic and optimistic locking. Pessimistic locking takes the "pessimistic" view that users are highly likely to corrupt each other's data, and that the only safe option is to serialize data access, so at most one user has control of any piece of data at one time. This ensures data integrity, but can severely reduce the amount of concurrent activity the system can support.
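
In a relational database, pessimistic locking typically means issuing a SELECT ... FOR UPDATE (or the database's equivalent). The following is a minimal JDBC sketch, assuming a hypothetical booking table:

    // Acquire an exclusive lock on the row; other transactions trying
    // to lock it will block until this transaction commits or rolls back
    PreparedStatement ps = con.prepareStatement(
        "SELECT seats_booked FROM booking WHERE id = ? FOR UPDATE");
    ps.setLong(1, bookingId);
    ResultSet rs = ps.executeQuery();
    // ... read the row and perform updates; committing releases the lock ...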

Optimistic locking takes the "optimistic" view that such data collisions will occur rarely, so it's more important to allow concurrent access than to lock out concurrent updates. The catch is that we can't allow users to corrupt each other's data, so we have a problem if concurrent updates are attempted. We must be able to detect competing updates, and cause some updates to fail to preserve data integrity. An update must be rejected if the data to be updated has changed since the prospective updater read it. This means that we require a strategy for detecting such changes; often an optimistic locking attribute is used: a version number in the object or row that's incremented each time it is modified.

Transaction isolation levels can be used to enforce pessimistic locking. However, the EJB specification provides no support for optimistic locking, although particular implementations of entity bean CMP may, and many O/R mapping products do. To implement optimistic locking, we typically need to write code to check optimistic locking attributes and to roll back transactions if an inconsistency is detected.
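
As a sketch of the optimistic approach, again assuming a hypothetical booking table, this time with a version column, and with auto-commit disabled:

    // The UPDATE succeeds only if no other transaction has modified
    // the row (and hence its version number) since we read it
    PreparedStatement ps = con.prepareStatement(
        "UPDATE booking SET seats_booked = ?, version = version + 1 " +
        "WHERE id = ? AND version = ?");
    ps.setInt(1, newSeatsBooked);
    ps.setLong(2, bookingId);
    ps.setInt(3, versionWhenRead);
    if (ps.executeUpdate() == 0) {
        // A competing update won: roll back and report the conflict
        con.rollback();
        throw new SQLException("Optimistic locking failure: booking " + bookingId);
    }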

Whether to choose pessimistic or optimistic locking for a particular scenario depends on business requirements and even the underlying database.

Important 

Locking, like transaction isolation levels, works very differently in different databases. I strongly recommend reading a good reference on the database(s) you work with. For example, Expert One-on-One Oracle from Wrox Press (ISBN: 1-861004-82-6) explains these issues clearly where Oracle is concerned.

Primary Key Generation Strategies

Persistence stores require objects to have unique keys (primary keys, in RDBMS terminology). We may need to generate a new primary key each time we insert data.

The following discussion assumes the use of surrogate keys: primary keys without business meaning. If a primary key is a field with business meaning, such as an e-mail address, or a combination of fields with business meaning, it is called an intelligent or natural key, and there is no need to generate new keys - they're created as part of new data.

Surrogate keys are common in enterprise systems, although there are arguments for each type of key. These keys are easier to work with using entity beans, may lead to better performance and offer protection against changes in business rules (for example, if e-mail is used as a natural primary key for a User table, the table will need a new key if users are ever allowed to share e-mail addresses). Surrogate keys must never be exposed to manipulation; if they acquire business meaning they share the drawbacks of natural keys. See http://www.bcarter.com/intsurr1.htm for discussion of the pros and cons of intelligent and surrogate keys.

The problem of surrogate key generation is not unique to J2EE. It's often solved by auto-increment columns or other database-specific approaches. These aren't portable, and hence are distrusted by many J2EE developers. They can also be difficult to use from entity beans and JDBC.

EJB containers often provide a way to generate primary keys for CMP entity beans, although no standard approach is defined in the EJB specification (a disappointing omission). When such a mechanism is available, it's best to use it if using entity beans. Check your server's documentation on entity bean CMP.

The key generation problem is encountered when using entity beans with BMP, or when implementing persistence with JDBC or another low-level API. The contract for entity bean BMP requires the ejbCreate() method to return the new primary key on entity creation. This means that we can't rely on database functionality such as auto-increment key columns: if the database generates the key as the row is inserted, we have no way of knowing which row we just inserted. We can try to do a SELECT based on the data values we've just inserted, but this is inelegant and not guaranteed to work: there may be a valid reason for logically distinct rows to contain identical data apart from the primary key. The only alternative is to generate a unique primary key before the insert, and include the new primary key in the insert.
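
A minimal BMP sketch of this contract, assuming a hypothetical keyGenerator helper that produces the key before the insert (for example, using one of the database-specific approaches discussed below), and using the person table from the Oracle example later in this section:

    public Integer ejbCreate(String name) throws CreateException {
        try {
            // Generate the key BEFORE the INSERT, so that we can both
            // include it in the new row and return it to the container
            int id = keyGenerator.nextKey();   // hypothetical helper
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO person(id, name) VALUES(?, ?)");
                ps.setInt(1, id);
                ps.setString(2, name);
                ps.executeUpdate();
                ps.close();
            } finally {
                con.close();
            }
            this.id = id;
            this.name = name;
            return new Integer(id);
        } catch (SQLException ex) {
            throw new CreateException("Failed to insert person: " + ex.getMessage());
        }
    }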

Note 

Note that unless we are using entity beans, the problem of primary key generation doesn't always apply: it isn't always necessary to know the primary key of a newly inserted row.

There is much discussion on J2EE forums such as TheServerSide.com about how to generate primary keys in a way that's portable between databases and application servers. I feel that this is misguided: the value of simplicity outweighs portability.

Important 

Don't battle with increased complexity trying to generate unique primary keys in a way that's portable between databases. Instead, use the relevant features of your target database, isolating implementation-specific features so that the design remains portable.

It's better to face a simple reimplementation task if an application is ported to another database than to add additional complexity to ensure total code portability from the outset. The first alternative means there's always a simple, understandable, code base.

Three strategies are often suggested for primary key generation:

  • An entity bean that represents a sequence from the database

  • Generating a unique ID from within Java

  • Using a stored procedure or other database-specific means of generating primary keys

The first two solutions are portable. Let's consider each of these in turn.

Sequence Entity Bean

This approach uses numeric keys. An entity bean stores a single number in the database and increments it whenever a client requests a new key. For example, we might have an RDBMS table like this:

    KEYS
    -------------------------
    NAME      CURRENT_VALUE
    USER      109

An entity bean instance would wrap the key row for each entity type, and the value in the CURRENT_VALUE column would be updated each time a key was requested.

This approach illustrates the dangers of rating a portable solution above a good solution. If the entity is used every time a new key is requested, key generation will create a point of contention: generating a key requires the CURRENT_VALUE column to be updated, serializing access to the KEYS table. The approach can only be made to perform adequately by interposing a session bean that periodically requests a block of keys from the entity. This means that we need to deploy two EJBs purely to handle primary key generation; they contribute nothing towards implementing the application's business logic. This approach is unworkable.

Unique ID Generation in Java

In this approach, surrogate key generation is done entirely outside the database. We use an algorithm that guarantees that each time a key is generated, it must be unique. Such algorithms are normally based on a combination of a random number, the system time, and the IP address of the server. The challenges here are that we may be running in a cluster, so we must make sure that individual servers can never generate the same key; and that Java only enables us to find the system time to a millisecond (for this reason, Java isn't a great language in which to generate unique IDs).

Unique ID generation involves an algorithm, rather than state, so it doesn't run into the problems of using singletons in the EJB tier, and hence we don't need to use additional EJBs to implement it.
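
A minimal sketch of such a generator (the key format is illustrative, not a standard):

    import java.net.InetAddress;
    import java.net.UnknownHostException;
    import java.security.SecureRandom;

    public class UniqueIdGenerator {

        private static final SecureRandom rand = new SecureRandom();

        // Combines the server's IP address, the current time and a random
        // component. Distinct servers in a cluster differ in the IP
        // component; the random component guards against two keys being
        // generated in the same millisecond on one server.
        public static String nextId() throws UnknownHostException {
            String host = InetAddress.getLocalHost().getHostAddress();
            long now = System.currentTimeMillis();
            int random = rand.nextInt(Integer.MAX_VALUE);
            return host + "-" + now + "-" + random;
        }
    }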

Unique ID generation in Java is fast and portable. However, it has the following disadvantages:

  • It generates lengthy keys, regardless of the number of rows that the table will hold

  • Keys are strings, not numbers, which may complicate indexing in the database

  • Keys are not sequential

This approach ignores RDBMS capabilities and insists on using Java where it is weak.

Database-Specific ID Generation

In my opinion, the best way to generate primary keys is to use database-specific functionality. As primary key creation is a fundamental database problem, each database offers a fast solution that minimizes contention when multiple users create rows. Usually this will involve performing the insert by calling a stored procedure, rather than running a SQL INSERT directly. Remember that, since we need to know the ID of the new row, we can't simply rely on an auto-increment column or an insert trigger to create a new primary key as the INSERT runs.

The stored procedure must take input parameters for all the data values for the new row, and an output parameter that will enable it to return the generated primary key to the JDBC caller. It must insert the new data, returning the new primary key. Depending on the database, the primary key may be created automatically when the INSERT occurs (for example, if the primary key column is an auto-increment column), or the primary key may be created before the insert. This approach will give us portable JDBC code, but will require database-specific coding for the stored procedure.

The following example shows this approach with Oracle (simply type the code into SQL*Plus to try it). Let's look at generating keys as we add rows to the following simple table, with only a single data value besides the primary key:

    CREATE TABLE person (
      id NUMERIC PRIMARY KEY,
      name VARCHAR(32)
    );

Oracle uses sequences instead of auto-increment columns, so we need to define a sequence that we can use with this table:

    CREATE SEQUENCE person_seq
      START WITH 1
      INCREMENT BY 1
      NOMAXVALUE;

Now we need to write a stored procedure to do the INSERT; for Oracle, I used PL/SQL. Note that this stored procedure has an output parameter, which we use to return the new id to calling JDBC code:

    CREATE OR REPLACE
    PROCEDURE person_add (p_name IN varchar, p_id OUT number)
    AS
    BEGIN
      SELECT person_seq.nextval INTO p_id FROM dual;
      INSERT INTO person(id, name) VALUES(p_id, p_name);
    END;
    /

We've dealt with Oracle-specific code inside the database, but the JDBC code we use to call this is portable. Whatever RDBMS we use, all we need is to ensure that we have a stored procedure with the name person_add and the same parameters. In a real-world situation there would be many more input parameters, but always only one output parameter.

First, we must call the stored procedure. This is very similar to invoking a PreparedStatement:

    Connection con = cf.getConnection();
    CallableStatement call = con.prepareCall("{call person_add(?, ?)}");
    call.setString(1, "Frodo");
    call.registerOutParameter(2, java.sql.Types.INTEGER);
    call.execute();

Now we can extract the value of the output parameter:

    int pk = call.getInt(2);
    System.out.println("The primary key for the new row was " + pk);
    call.close();

This approach gives us simple, portable, Java code, but will require us to implement a new stored procedure if we ever switch database. However, as the stored procedure is so simple, this is not likely to be a problem. The stored procedure is effectively the implementation of an interface defined in our Java code.
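
To make that contract explicit, the calling code could be written against a simple Java interface (a sketch, not part of the original design; the names are illustrative):

    // The calling code depends only on this contract; each target
    // database must supply a person_add stored procedure honoring it
    public interface PersonInserter {

        // Inserts a new person and returns the generated primary key
        int addPerson(String name) throws java.sql.SQLException;
    }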

JDBC 3.0

Another option is available when using a database driver that supports JDBC 3.0. The JDBC 3.0 Statement interface allows us to obtain the keys created by an insert. The following is a simple example of using this functionality:

    Statement stmt = connection.createStatement();
    stmt.executeUpdate("INSERT INTO USERS (FIRST_NAME, LAST_NAME) " +
                       "VALUES ('Rod', 'Johnson')",
                       Statement.RETURN_GENERATED_KEYS);
    ResultSet rs = stmt.getGeneratedKeys();
    if (rs.next()) {
        int key = rs.getInt(1);
    }

This solves the primary key generation problem if the database has a trigger to create a new primary key or if the primary key is an auto-increment column. However, JDBC 3.0 support is not yet widely available, so this strategy is usually not an option. To use this in a J2EE application, both your database vendor and application server will need to support JDBC 3.0, as application code is likely to be given server-specific wrapper connections for the underlying database connections.
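
A JDBC 3.0 driver can be asked at run time whether it supports this feature, as this minimal sketch shows:

    // DatabaseMetaData gained this method in JDBC 3.0
    DatabaseMetaData meta = connection.getMetaData();
    if (meta.supportsGetGeneratedKeys()) {
        // Safe to use Statement.RETURN_GENERATED_KEYS
    }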

Note 

JDBC 3.0 is a required API in J2EE 1.4, so J2EE 1.4 applications using JDBC will be guaranteed the ability to learn the primary keys of newly inserted rows, allowing primary key generation to be concealed within the RDBMS.


