Entity Bean Caching

Entity bean performance hinges largely on the EJB container's entity bean caching strategy. Caching in turn depends on the locking strategy the container applies.

Important

In my opinion, the value of entity beans hinges on effective caching. Unfortunately, the effectiveness of caching differs widely between application scenarios and between EJB containers.

If you can achieve a high cache hit rate, whether by using read-only entity beans or because your container has an efficient cache, entity beans are a good choice and will perform well.

Entity Bean Locking Strategies

There are two main locking strategies for entity beans, both foreshadowed in the EJB specification (§10.5.9 and §10.5.10). The terminology used to describe them varies between containers, but I have chosen to use the WebLogic terminology, as it's clear and concise.

It's essential to understand how locking strategies are implemented by your EJB container before developing applications using entity beans. Entity beans do not allow us to ignore basic persistence issues.

Exclusive Locking

Exclusive locking was the default strategy used by WebLogic 5.1 and earlier generations of the WebLogic container. Many other EJB containers used this locking strategy, at least initially. Exclusive locking is described as "Commit Option A" in the EJB specification (§10.5.9), and JBoss 3.0 documentation uses this name for it.

With this locking strategy, the container will maintain a single instance of each entity in use. The state of the entity will usually be cached between transactions, which may minimize calls to the underlying database. The catch (and the reason for terming this "exclusive" locking) is that the container must serialize accesses to the entity, locking out users waiting to use it.

Exclusive locking has the following advantages:

Concurrent access will be handled in the same way across different underlying data stores. We won't be reliant on the behavior of the data store.
Genuinely serial access to a single entity (when successive accesses, perhaps resulting from actions from the same user, do not get locked out) will perform very well. This situation does occur in practice: for example if entities relate to individual users, and are accessed only by the users concerned.
If we're not running in a cluster and no other processes are updating the database, it's easy to cache data by holding the state of entity beans between transactions. The container can skip calls to the ejbLoad() method if it knows that entity state is up to date.

Exclusive locking has the following disadvantages:

Throughput will be limited if multiple users need to work with the same data.
Exclusive locking is unnecessary if multiple users merely need to read the same data, without updating it.
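
The behavior described above can be sketched in plain Java. This is an illustration only, not the implementation of any particular container: ExclusiveEntityCache, its loader function, and the loads counter are hypothetical names introduced here. The sketch keeps one cached state and one lock per primary key, serializes work on each key, and skips the simulated ejbLoad() when state is already cached.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of exclusive locking: one cached state and one lock per primary key.
class ExclusiveEntityCache<K, S> {
    private final Map<K, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Map<K, S> states = new ConcurrentHashMap<>();
    private final Function<K, S> loader;             // stands in for ejbLoad()
    final AtomicInteger loads = new AtomicInteger(); // counts simulated database reads

    ExclusiveEntityCache(Function<K, S> loader) {
        this.loader = loader;
    }

    // Runs one "transaction" against the entity; callers using the same key are serialized.
    S inTransaction(K key, UnaryOperator<S> work) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            S state = states.get(key);
            if (state == null) {          // only the first access reads the database
                state = loader.apply(key);
                loads.incrementAndGet();
            }
            S updated = work.apply(state);
            states.put(key, updated);     // state stays cached between transactions
            return updated;
        } finally {
            lock.unlock();
        }
    }
}
```

Two successive calls to inTransaction() for the same key trigger only one load, which is the caching benefit; two concurrent calls for the same key wait on the same lock, which is the throughput cost. The sketch assumes a single server and no other process updating the database.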

Database Locking

With the database locking strategy, the responsibility for resolving concurrency issues lies with the database. If multiple clients access the same logical entity, the EJB container simply instantiates multiple entity objects with the same primary key. The locking strategy is up to the database, and will be determined by the transaction isolation level on entity bean methods. Database locking is described in "Commit Options B and C" in the EJB specification (§10.5.9), and JBoss documentation follows this terminology.

Database locking has the following advantages:

It can support much greater concurrency if multiple users access the same entity. Concurrency control can be much smarter. The database may be able to tell which users are reading, and which are updating.
There is no duplication of locking infrastructure. Most database vendors have spent a decade or more working on their locking strategies, and have done a pretty good job.
The database is more likely to provide tools to help detect deadlocks than the EJB container vendor.
The database can preserve data integrity, even if processes other than the J2EE server are accessing and manipulating data.
We are allowed the choice of implementing optimistic locking in entity bean code. Exclusive locking is pessimistic locking enforced by the EJB container.

Database locking has the following disadvantages:

Portability between databases cannot be guaranteed. Concurrent access may be handled very differently by different databases, even when the same SQL is issued. While I'm skeptical of the achievability of portability across databases, it is one of the major promises of entity beans. Code that can run against different databases, but with varying behavior, is dangerous and worse than code that requires explicit porting.
The ejbLoad() method must always be invoked when a transaction begins. The state of an entity cannot be cached between transactions. This can reduce performance, in comparison to exclusive locking.
We are left with two caching options: a very smart cache, or no cache at all, whether or not we're running in a cluster.
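
For contrast with exclusive locking, here is a minimal sketch of the database locking model. The names (AccountTable, AccountInstance) are hypothetical, and the in-memory map only stands in for a table: real isolation and locking would be the database's job, which this sketch does not model. The point is simply that each transaction gets its own instance for the same primary key and reloads its state when the transaction begins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a database table keyed by primary key.
class AccountTable {
    final Map<Integer, Long> balances = new ConcurrentHashMap<>();
    final AtomicInteger reads = new AtomicInteger(); // counts simulated SELECTs
}

// One instance per transaction; several may exist for the same primary key.
class AccountInstance {
    private final AccountTable table;
    private final int primaryKey;
    private long balance;

    AccountInstance(AccountTable table, int primaryKey) {
        this.table = table;
        this.primaryKey = primaryKey;
    }

    // Stands in for ejbLoad(): called at the start of every transaction.
    void load() {
        balance = table.balances.get(primaryKey);
        table.reads.incrementAndGet();
    }

    // Stands in for ejbStore(): writes state back at commit.
    void store() {
        table.balances.put(primaryKey, balance);
    }

    long getBalance() {
        return balance;
    }

    void deposit(long amount) {
        balance += amount;
    }
}
```

Because nothing is held between transactions, every transaction costs a read, but an instance never works with state left over from an earlier transaction.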

WebLogic versions 6.0 and later support both exclusive and database locking, but default to using database locking. Other servers supporting database locking include JBoss, Sybase EAServer and Inprise Application Server.

Note

WebLogic 7.0 adds an "Optimistic Concurrency" strategy, in which no locks are held in EJB container or database, but a check for competing updates is made by the EJB container before committing a transaction. We discussed the advantages and disadvantages of optimistic locking in Chapter 7.
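
The commit-time check can be illustrated with a small in-memory sketch. I'm assuming a version-number scheme here; this is one common way to detect competing updates, not a description of WebLogic's internals. VersionedRow and OptimisticStore are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical versioned row: the version increases with every committed update.
final class VersionedRow {
    final String value;
    final int version;

    VersionedRow(String value, int version) {
        this.value = value;
        this.version = version;
    }
}

// Hypothetical in-memory store used to illustrate the commit-time check.
class OptimisticStore {
    private final Map<Integer, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(int key, String value) {
        rows.put(key, new VersionedRow(value, 0));
    }

    VersionedRow read(int key) {
        return rows.get(key);
    }

    // Commits only if nobody else has updated the row since 'seen' was read.
    boolean commit(int key, VersionedRow seen, String newValue) {
        VersionedRow next = new VersionedRow(newValue, seen.version + 1);
        // replace() is atomic and compares with equals(); VersionedRow does not
        // override equals(), so this checks that the stored row is still the one read.
        return rows.replace(key, seen, next);
    }
}
```

No lock is held while a transaction works with the row it read. If two transactions read the same row and both try to commit, the first succeeds and the second gets false, and would have to retry or report the conflict.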

Read-only and "Read-mostly" Entities

How data is accessed affects the locking strategy we should use. Accordingly, some containers offer special locking strategies for read-only data. Again, the following discussion reflects WebLogic terminology, although the concepts aren't unique to WebLogic.

WebLogic 6.0 and above provide a special locking strategy called read-only locking. A read-only entity bean is never updated by a client, but may periodically be updated (for example, to respond to changes in the underlying database). WebLogic never invokes the ejbStore() method of an entity bean with read-only locking. However, it invokes the ejbLoad() method at a regular interval set in the deployment descriptor. The deployment descriptor distinguishes between normal (read/write) and read-only entities. JBoss 3.0 provides similar functionality, terming this "Commit Option D".
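
The following sketch illustrates the idea of a read-only entity with a refresh interval. It is an assumption-laden simplification: ReadOnlyEntity is a hypothetical class, the clock is injected so that the behavior can be tested, and the refresh happens lazily on the next read after the interval has elapsed, whereas a container is free to schedule its ejbLoad() calls differently.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a read-only entity that is refreshed at a fixed interval.
class ReadOnlyEntity<S> {
    private final Supplier<S> loader;   // stands in for ejbLoad()
    private final LongSupplier clock;   // current time in milliseconds
    private final long refreshMillis;   // would come from the deployment descriptor
    private S state;
    private long loadedAt;
    private boolean loaded;
    int loads;                          // counts simulated database reads

    ReadOnlyEntity(Supplier<S> loader, LongSupplier clock, long refreshMillis) {
        this.loader = loader;
        this.clock = clock;
        this.refreshMillis = refreshMillis;
    }

    // There is deliberately no store method: state is never written back.
    synchronized S get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= refreshMillis) {
            state = loader.get();       // reload once the interval has elapsed
            loadedAt = now;
            loaded = true;
            loads++;
        }
        return state;
    }
}
```

Reads within the interval are served from memory, so clients may see data that is up to refreshMillis out of date. That is the trade-off this strategy accepts.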

WebLogic gives users control over the cache: the container-generated home implementation of a read-only bean also implements WebLogic's proprietary CachingHome interface, so the bean's home interface can be cast to CachingHome. This interface provides the ability to invalidate individual entities, or all entities. In WebLogic 6.1 and above, invalidation works in a cluster.

Read-only beans provide good performance if we know that data won't be modified by clients. They also make it possible to implement a "read mostly" pattern. This is achieved by mapping a read-only and a normal read-write entity to the same data. The two beans will have different JNDI names. Reads are performed through the read-only bean, while updates use the read/write bean. Updates can also use the CachingHome to invalidate the read-only entity.
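
The read-mostly pattern can be sketched without any EJB API. In this hypothetical ReadMostlyProducts class, a map stands in for the database, the read cache plays the role of the read-only bean, and rename() plays the role of the read/write bean followed by an invalidation call. None of these names come from WebLogic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the read-mostly pattern over a map standing in for the database.
class ReadMostlyProducts {
    private final Map<Integer, String> database;                              // underlying data
    private final Map<Integer, String> readCache = new ConcurrentHashMap<>(); // "read-only bean" state
    int databaseReads;                                                         // counts simulated SELECTs

    ReadMostlyProducts(Map<Integer, String> database) {
        this.database = database;
    }

    // Read path: served from the cache, loading from the database only on a miss.
    synchronized String findName(int id) {
        String cached = readCache.get(id);
        if (cached == null) {
            cached = database.get(id);
            databaseReads++;
            if (cached != null) {
                readCache.put(id, cached);
            }
        }
        return cached;
    }

    // Write path: update the database, then invalidate the cached copy.
    synchronized void rename(int id, String newName) {
        database.put(id, newName);
        readCache.remove(id);
    }
}
```

Repeated reads cost one database read in total, and a write costs one further read the next time the entity is needed. If updates become frequent, the invalidations erase most of the benefit.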

Dmitri Rakitine has proposed the "Seppuku" pattern, which achieves the same thing more portably. Seppuku requires only read-only beans (not proprietary invalidation support) to work. It invalidates read-only beans by relying on the container's obligation to discard a bean instance if a non-application exception is encountered (we'll discuss this mechanism in Chapter 10). One catch is that the EJB container is also obliged to log the error, meaning that server logs will soon fill with error messages resulting from "normal" activity. The Seppuku pattern, like the Fat Key pattern, is an inspired flight of invention, but one that suggests that it is preferable to find a workaround for the entire problem. See http://dima.dhs.org/misc/readOnlyUpdates.html for details.

Note

The name Seppuku was suggested by Cedric Beust of BEA, and refers to Japanese ritual disembowelment. It's certainly more memorable than prosaic names such as "Service-to-Worker"!
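
To make the Seppuku mechanism concrete, here is a toy simulation. It is not Rakitine's code and it uses no EJB API: TinyContainer is a hypothetical stand-in for the container's obligation to discard an instance that throws a non-application (runtime) exception, and PriceBean is a hypothetical read-only bean.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical read-only bean instance holding state that was loaded once.
class PriceBean {
    final String price;

    PriceBean(String price) {
        this.price = price;
    }

    // "Seppuku": a deliberate runtime exception asks the container to discard this instance.
    void invalidate() {
        throw new IllegalStateException("state is stale; discard this instance");
    }
}

// Hypothetical miniature container: discards the instance on any RuntimeException.
class TinyContainer {
    private final Supplier<PriceBean> loader;
    private PriceBean instance;
    int discards; // a real container would also log each of these as an error

    TinyContainer(Supplier<PriceBean> loader) {
        this.loader = loader;
    }

    synchronized <R> R invoke(Function<PriceBean, R> call) {
        if (instance == null) {
            instance = loader.get();  // a fresh instance loads current state
        }
        try {
            return call.apply(instance);
        } catch (RuntimeException e) {
            instance = null;          // discard the failed instance
            discards++;
            throw e;
        }
    }
}
```

The caller that triggers the invalidation must catch the resulting exception, and every invalidation increments the discards counter. In a real server, each of those would be an entry in the error log, which is the catch noted above.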

Tyler Jewell of BEA hails read-mostly entities as the savior of EJB performance (see his article in defense of entity beans at http://www.onjava.com/lpt/a//onjava/2001/12/19/eejbs.html). He argues that a "develop once, deploy n times" model for entity beans is necessary to unleash their "true power", and proposes criteria to determine how entity beans should be deployed based on usage patterns. He advocates a separate deployment for each entity for each usage pattern.

The multiple deployment approach has the potential to deliver significant performance improvements compared to traditional entity bean deployment. However, it has many disadvantages:

Relying on read-only beans to deliver adequate performance isn't portable. (Even in EJB 2.1, read-only entities with CMP are merely listed as a possible addition in future releases of the EJB specification, meaning that they will be non-standard until at least 2004.)
There's potential to waste memory on multiple copies of an entity.
Developer intervention is required to deploy and use the multiple entities. Users of the entity are responsible for supplying the correct JNDI name for their usage pattern (a session façade can conceal this from EJB clients, partially negating this objection).
Entity bean deployment is already complicated enough; adding multiple deployments of the same bean further complicates deployment descriptors and session bean code, and is unlikely to be supported by tools. Where container-managed relationships are involved, the size and complexity of deployment descriptors will skyrocket. There are also some tough design issues. For example, which of the several deployments of a related entity bean should a read-only bean link to in the deployment descriptor?

The performance benefits of multiple deployment apply only if data is read often and updated occasionally. Where static reference data is concerned, it will be better to cache closer to the user (such as in the web tier). Multiple deployment won't help in situations where we need aggregate operations and the simple O/R mapping provided by EJB CMP is inadequate.

Even disregarding these problems, the multiple deployment approach would only demonstrate the "true power" of entity beans if it weren't possible to achieve its goals in any other way. In fact, entity beans are not the only way to deliver such multiple caches. JDO and other O/R mapping solutions also enable us to maintain several caches to support different usage patterns.

Transactional Entity Caching

Using read-only entity beans and multiple deployment is a cumbersome form of caching that requires substantial developer effort to configure. It's unsatisfactory because it's not truly portable and requires the developer to resort to devious tricks, based on the assumption that out-of-the-box entity bean performance is inadequate. What if entity bean caching were good enough to work without the developer's help?

Persistence PowerTier (http://www.persistence.com/products/powertier/index.php) is an established product with a transactional and distributed entity bean cache. Persistence built its J2EE server around its C++ caching solution, rather than adding caching support to an EJB container.

PowerTier's support for entity beans is very different from that of most other vendors. PowerTier effectively creates an in-memory object database to lessen the load on the underlying RDBMS. PowerTier uses a shared transactional cache, which allows in-memory data access and relationship navigation. (Relationships are cached in memory as pointers, avoiding the need to run SQL joins whenever relationships are traversed.)

Each transaction is also given its own private cache. Committed changes to cached data are replicated to the shared cache and transparently synchronized to the underlying database to maintain data integrity. Persistence claims that this can boost performance up to 50 times for applications (such as many web applications) that are biased in favor of reads. PowerTier's performance optimizations include support for optimistic locking. Persistence promotes a fine-grained entity bean model, and provides tools to generate entities (including finder methods) from RDBMS tables. PowerTier also supports the generation of RDBMS tables from an entity bean model.
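
The relationship between the private and shared caches can be pictured with a small sketch. This is emphatically not PowerTier's design, which I have no inside knowledge of: SharedEntityCache and CacheTransaction are hypothetical names, the sketch runs in a single JVM, and it omits conflict detection, isolation guarantees, and distribution. It shows only the flow of data: uncommitted changes stay in the transaction's private cache, and commit pushes them to the shared cache and to a map standing in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared cache sitting in front of a map that stands in for the RDBMS.
class SharedEntityCache {
    final Map<String, String> database = new ConcurrentHashMap<>();
    private final Map<String, String> shared = new ConcurrentHashMap<>();

    // Serve from memory where possible; fall back to the database on a miss.
    String read(String key) {
        return shared.computeIfAbsent(key, database::get);
    }

    synchronized void apply(Map<String, String> changes) {
        database.putAll(changes); // write through to the database
        shared.putAll(changes);   // replicate committed state to the shared cache
    }

    CacheTransaction begin() {
        return new CacheTransaction(this);
    }
}

// Each transaction works against its own private cache until it commits.
class CacheTransaction {
    private final SharedEntityCache owner;
    private final Map<String, String> privateCache = new HashMap<>();

    CacheTransaction(SharedEntityCache owner) {
        this.owner = owner;
    }

    String get(String key) {
        String local = privateCache.get(key);
        return local != null ? local : owner.read(key);
    }

    void put(String key, String value) {
        privateCache.put(key, value);
    }

    void commit() {
        owner.apply(privateCache);
        privateCache.clear();
    }

    void rollback() {
        privateCache.clear();
    }
}
```

A transaction sees its own uncommitted writes, other transactions see them only after commit, and reads that hit the shared cache never touch the database. That last property is where a read-biased application would gain.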

Third-party EJB 2.0 persistence providers such as TopLink also claim to implement distributed caching. (Note that TopLink provides similar caching services without the need to use entity beans, through its proprietary O/R mapping APIs.)

I haven't worked with either of these products in production, so I can't verify the claims of their sales teams. However, Persistence boasts some very high volume, mission-critical, J2EE installations, such as the Reuters Instinet online trading system and FedEx's logistics system.

Important

A really good entity bean cache will greatly improve the performance of entity beans. However, remember that entity beans are not the only way to deliver caching. The JDO architecture allows JDO persistence managers to offer caching that's at least as sophisticated as any entity bean cache.