
Optimizing Entity Bean Persistence

If you've developed applications that work with relational databases, you know that database access can easily become the determining factor in how well a system performs. In a typical business application, the time spent executing your Java code is much smaller than the time spent reading from and writing to the database. With this in mind, any attention you pay to performance while you're designing a system needs to take the database into account. In the case of EJB, this mostly means looking at your use of entity beans. For some developers, of course, this means avoiding entity beans altogether. That's not the position taken here, but you do need to be careful when deciding whether an entity bean is an appropriate representation for a particular object. The accepted criteria for an entity bean's use were described back in Chapter 5. This chapter is more concerned with how to implement an entity bean once you've made the decision to use one.

Choosing a CMP Implementation

If you use CMP, the work the persistence framework does for you includes determining when to write an entity's state to the database and when to load it. The leading CMP implementations compete with each other based on the features they provide to optimize these parts of an entity object's lifecycle. For example, an ORM framework such as TOPLink used for CMP can detect changes to an entity's attributes and relationships and only write to the database when necessary. When a database update is needed, the SQL statements that are executed are limited to the attributes and associations that have actually changed. An implementation like this can also cache entity objects once they've been read to avoid unnecessary reads from the database later. If you're planning to deploy your application in a cluster, you need to find out whether the caching provided by your CMP implementation supports a clustered environment before you rely on it.

The EJB 2.0 Specification leaves nearly all the details of a CMP implementation up to the individual vendors. Even though the mapping of an entity bean to a database can be done declaratively, taking advantage of any of the value-added performance features offered by a particular implementation can result in programmatic dependencies on that product. The concept of a Persistence Manager, which will provide more of a pluggable persistence framework for CMP, has been deferred to a later version of the specification. For now, the most sophisticated CMP features tend to be nonportable. This doesn't mean that you should ignore what's available to you; you just need to be aware of the implications.

Only Execute `ejbStore` when Necessary

Chapter 6 covered the mechanics of how to implement an entity bean using BMP. That chapter showed you how to implement the various callback methods needed to keep an entity object in synch with its representation in the database. What that chapter didn't discuss in detail was how to improve the performance of a BMP entity bean.

The container calls an entity bean's ejbStore method when it needs to make sure that the database contents are in synch with the entity object in memory. This call is always made at the completion of a transaction that involves the entity. It also happens right before an entity is passivated. If you write to the database whenever a business method is called that modifies the entity, ejbStore isn't responsible for much. However, the opposite is typically true. Your business methods will usually update the state of an entity object but not write any changes to the database. It's in ejbStore that you most often execute the JDBC call (or calls) needed to update an entity's representation in the database. This simplifies your business method code and helps isolate your business logic from any knowledge of the persistence mechanism being used. The drawback to this approach is that a simple ejbStore method that blindly writes to the database even when an entity's state hasn't changed since the last update is inefficient.

Database access is an expensive operation in terms of performance. Because of this, it's usually worthwhile to perform a little extra processing within your entity beans so that you only update the data that has been modified since the last call to ejbStore. If you're using an ORM product for your BMP implementation, this might be taken care of for you. Some frameworks can compare the current state of a persistent object with its state last retrieved from the database and only update the attributes that have changed. It takes some work, but you can take a similar approach yourself and track changes to an object between ejbLoad and ejbStore calls. Without an ORM framework to manage this for you, you might not want the complexity of tracking individual attributes, but you can easily keep track of whether an entity object has changed at all. You can declare a transient boolean attribute that you set to true whenever a business method call results in a change to the persistent state of an entity object. This allows you to perform the database updates in ejbStore only when the indicator has been set. Both ejbStore and ejbLoad can reset the indicator to false before returning. The following fragment shows an example business method using this approach:

    public void setName(String newName) {
        boolean change = false;
        if (name == null) {
            // a change if current value is null and the new one isn't
            change = (newName != null);
        }
        else {
            // current value isn't null, so use equals method to detect a change
            change = (!name.equals(newName));
        }
        if (change) {
            name = newName;
            modified = true;
        }
    }

You could then modify this entity bean's ejbStore method like the following:

    public void ejbStore() {
        // exit if the entity state hasn't changed
        if (!modified) {
            return;
        }
        // obtain a database connection and update the database
        ...
        // entity state has been synchronized, so reset the indicator
        modified = false;
    }

Your individual business methods become slightly more complex when using this approach, but that's the price you pay for a performance optimization. You can clean up the business methods by pulling out the logic needed to check for changes and implementing it in a set of simple helper methods. The extra processing that has to take place to detect and keep up with changes is insignificant compared to the potential savings in database access time.
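One way the helper-method cleanup mentioned above might look is sketched here. The class name TrackedEntityState, the quantity attribute, and the differs, isModified, and markSynchronized methods are all hypothetical names introduced for illustration, and the sketch assumes Java 7 or later for java.util.Objects (on an older JDK you would write the null-safe comparison by hand, as setName does in the earlier fragment).

```java
import java.util.Objects;

// Hypothetical fragment of a BMP entity bean: only the change-tracking parts are shown.
class TrackedEntityState {
    private String name;
    private Integer quantity;                    // hypothetical second attribute
    private transient boolean modified = false;  // the "dirty" indicator

    // Null-safe comparison shared by every setter.
    private static boolean differs(Object current, Object candidate) {
        return !Objects.equals(current, candidate);
    }

    void setName(String newName) {
        if (differs(name, newName)) {
            name = newName;
            modified = true;
        }
    }

    void setQuantity(Integer newQuantity) {
        if (differs(quantity, newQuantity)) {
            quantity = newQuantity;
            modified = true;
        }
    }

    // ejbStore would check this before touching the database.
    boolean isModified() {
        return modified;
    }

    // ejbStore and ejbLoad would call this before returning.
    void markSynchronized() {
        modified = false;
    }
}
```

With the comparison in one place, each setter shrinks to a single conditional, and adding a new persistent attribute doesn't require repeating the null handling.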

Use Lazy Loading for BMP Dependent Objects

Just as paying attention to how you save data affects performance, so does the approach you use for loading an entity object. In particular, it's how you manage an entity's dependent objects when using BMP that can impact performance. As a general rule, you should wait until data is actually needed from the database before you retrieve it. Using a lazy loading approach avoids database accesses for values that are never used by an application.

In Chapter 6, the ejbLoad method for the EnglishAuctionBean loaded the various attributes that describe an auction but it only loaded the primary keys for the leading and winning bid dependent objects and the assigned item entity. The corresponding bid and item objects for these keys were not loaded. If you take this approach, you can maintain two references for each object with which an entity has a one-to-one relationship: one for the primary key and one for the actual object. The implementation for ejbLoad only needs to be responsible for loading the primary key reference. When the related object is needed, the fact that the reference to it is null and the primary key reference isn't can be used to trigger a database access to retrieve the necessary data and instantiate the object. Listing 17.2 shows an example of this approach.

Listing 17.2 `getLeadingBid`: A Method for Loading a Dependent Object on Demand

    protected Bid getLeadingBid() {
        // see if the leading bid has been loaded
        if ((leadingBid == null) && (leadingBidId != null)) {
            leadingBid = loadBid(leadingBidId.intValue());
            if (leadingBidId.equals(winningBidId)) {
                // winning bid is the same as the leading bid
                winningBid = leadingBid;
            }
        }
        return leadingBid;
    }

Listing 17.2 shows how to delay the loading of a dependent object until it's needed. Notice that the loadBid method is only called if the reference to the leadingBid is null and the leadingBidId isn't. Listing 17.3 shows the loadBid method that pulls in the rest of the information for the bid object when it's needed.

Listing 17.3 `loadBid`: A Method for Loading a Dependent Bid Object

    private Bid loadBid(int bidId) {
        Connection con = null;
        PreparedStatement stmt = null;
        ResultSet rs = null;
        try {
            con = BMPHelper.getConnection("auctionSource");
            stmt = con.prepareStatement("SELECT id, TransactionId, BidDateTime, " +
                "Amount, BidderId from bid where id = ?");
            stmt.setInt(1, bidId);
            rs = stmt.executeQuery();
            if (rs.next()) {
                return createBid(rs);
            }
        }
        catch (SQLException e) {
            // throw a system exception if a database access error occurs
            throw new EJBException(e);
        }
        finally {
            // close the connection
            BMPHelper.cleanup(stmt, con);
        }
        return null;
    }

As you can imagine, the code needed to do this can start to get out of hand if you have to implement this type of functionality for more than a few dependent objects. To prevent this problem, you need a framework that manages the lazy loading of an object generically. Such a framework needs to provide a class that encapsulates the details shown in Listing 17.2. Doing this in a generic form that can be reused for any dependent object class is not a simple task. You can do it, but before you start reinventing the wheel, you should consider using a third-party ORM framework that has already solved the same problem for you.
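To give a feel for what the core of such a class might look like, here is a minimal sketch of a reusable lazy reference. The LazyReference name and its methods are hypothetical, and the sketch assumes Java 8 or later for java.util.function.Function. It covers only the on-demand load from Listing 17.2. A real framework would also have to deal with writes, caching, and transactions.

```java
import java.util.function.Function;

// Hypothetical holder that pairs a primary key with an object loaded on demand.
final class LazyReference<K, V> {
    private final Function<K, V> loader;  // e.g. a call to a method like loadBid
    private K key;
    private V value;
    private boolean loaded;

    LazyReference(Function<K, V> loader) {
        this.loader = loader;
    }

    // Called from ejbLoad: remember the key and discard any cached object.
    void setKey(K newKey) {
        key = newKey;
        value = null;
        loaded = false;
    }

    K getKey() {
        return key;
    }

    // Runs the loader at most once per key, and only if a key is present.
    V get() {
        if (!loaded && key != null) {
            value = loader.apply(key);
            loaded = true;
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}
```

Unlike Listing 17.2, which tests the object reference for null, this sketch keeps a separate loaded flag, so a loader that legitimately returns null isn't invoked again on every access.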

Lazy loading is even more important for one-to-many relationships because you have to load more objects for each association. Rather than loading all of an auction's bids in ejbLoad, EnglishAuctionBean took a similar approach to what was done for the leading bid. In this case, nothing was read for the dependent bid objects in ejbLoad because the foreign keys are in the bid table. Instead, all retrieval from this table is done when the list of bids is first accessed. The comments about using a framework approach apply even more so here. Managing a list of dependent objects is more complex because you don't want to be inefficient in writing updates to the database when elements have been added to or removed from the list. Specifically, a BMP framework for managing a list of dependent objects needs to support lazy loading of the elements, tracking of modified entries, the addition of entries, and the removal of entries. This is the only way to ensure that interaction with the database is kept to a minimum.
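The following sketch illustrates three of those requirements: lazy loading, tracking of additions, and tracking of removals. It is not the implementation used by EnglishAuctionBean. LazyDependentList and its method names are hypothetical, it assumes Java 8 or later, and it deliberately leaves out tracking of modified entries, which would need cooperation from the dependent objects themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical lazily loaded list that records which rows need INSERT or DELETE.
final class LazyDependentList<T> {
    private final Supplier<List<T>> loader;  // e.g. a query against the bid table
    private List<T> elements;                // null until first access
    private final List<T> added = new ArrayList<>();
    private final List<T> removed = new ArrayList<>();

    LazyDependentList(Supplier<List<T>> loader) {
        this.loader = loader;
    }

    // Loads the elements from the database the first time they are needed.
    private List<T> elements() {
        if (elements == null) {
            elements = new ArrayList<>(loader.get());
        }
        return elements;
    }

    List<T> view() {
        return Collections.unmodifiableList(elements());
    }

    void add(T item) {
        elements().add(item);
        // Re-adding something removed earlier cancels the pending delete.
        if (!removed.remove(item)) {
            added.add(item);
        }
    }

    void remove(T item) {
        if (elements().remove(item)) {
            // Removing something that was never stored cancels the pending insert.
            if (!added.remove(item)) {
                removed.add(item);
            }
        }
    }

    boolean isLoaded() { return elements != null; }

    List<T> pendingInserts() { return Collections.unmodifiableList(added); }

    List<T> pendingDeletes() { return Collections.unmodifiableList(removed); }

    // Called once ejbStore has written the pending changes.
    void markSynchronized() {
        added.clear();
        removed.clear();
    }

    // Called from ejbLoad so that the next access reads fresh data.
    void reset() {
        elements = null;
        markSynchronized();
    }
}
```

An ejbStore method built on a class like this would issue one INSERT per entry in pendingInserts and one DELETE per entry in pendingDeletes, and nothing at all if the list was never touched. For simplicity, add forces the list to load. A more careful design could defer even that.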

Using Read-Only Entity Beans

The EJB specification defines entity beans to represent read-write objects. This means that regular calls to both ejbLoad and ejbStore are part of the normal management of an entity object's lifecycle. Sometimes, however, you might represent persistent data using an entity bean that's never modified by the application. As described earlier in the "Optimizing Entity Bean Persistence" section, you can avoid unnecessary database writes by only executing your ejbStore methods when necessary. This doesn't, however, help you if you're using CMP.

The EJB 2.0 Specification doesn't require the container to support the concept of a read-only CMP entity bean. The idea of using read-only CMP entity beans is to avoid unnecessary calls to ejbStore. Some vendors already offer a read-only entity bean as an option. Others are looking at taking this further by making it possible to bypass an ejbStore call when an entity bean's attributes have been accessed (through its get methods) during a transaction but no other operations have been performed on it.

As an example, WebLogic allows you to designate an entity bean as read-only. It's assumed that data represented by this type of bean is being updated externally (or there wouldn't be any reason for it to be stored in the database). To support detecting external changes to a read-only entity, you can specify how often the data should be updated from the database.

As it's been defined so far, the example auction site is responsible for maintaining auction data but the items offered for auction could be defined by another system responsible for managing a company's inventory. This would allow the item entity bean to be implemented as read-only. Because this is a vendor-specific option, you have to declare this in the WebLogic deployment information as shown in the following:

    <weblogic-ejb-jar>
      ...
      <weblogic-enterprise-bean>
        <ejb-name>Item</ejb-name>
        <entity-descriptor>
          <entity-cache>
            <read-timeout-seconds>600</read-timeout-seconds>
            <concurrency-strategy>ReadOnly</concurrency-strategy>
          </entity-cache>
        </entity-descriptor>
      </weblogic-enterprise-bean>
      ...
    </weblogic-ejb-jar>

This deployment directive informs the WebLogic EJB container that ejbStore calls aren't necessary for the item entity bean. The read-timeout-seconds entry allows you to specify how often the data should be refreshed from the database. You can use a value of 0, which is the default if the entry is omitted, to request that the entity only be read when it's first loaded into the entity cache. Using this deployment option is only valid if the application never attempts to update an entity object of this type and the NotSupported transaction attribute is assigned to the entity's methods.
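To make the timeout semantics concrete, here is a purely conceptual sketch of a cached value that is re-read once it's older than a configured number of seconds, with 0 meaning "read once and never refresh". It isn't WebLogic code and says nothing about how the container implements the option. TimedReadOnlyCache and its injected clock are hypothetical, and Java 8 or later is assumed.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical illustration of read-timeout behavior for read-only data.
final class TimedReadOnlyCache<T> {
    private final Supplier<T> loader;    // stands in for the database read
    private final long timeoutMillis;    // 0 means never refresh
    private final LongSupplier clock;    // injected so the behavior is testable
    private T value;
    private boolean loaded;
    private long loadedAt;

    TimedReadOnlyCache(Supplier<T> loader, long readTimeoutSeconds, LongSupplier clock) {
        this.loader = loader;
        this.timeoutMillis = readTimeoutSeconds * 1000L;
        this.clock = clock;
    }

    T get() {
        long now = clock.getAsLong();
        boolean expired = timeoutMillis > 0 && (now - loadedAt) >= timeoutMillis;
        if (!loaded || expired) {
            value = loader.get();  // the only place the "database" is read
            loaded = true;
            loadedAt = now;
        }
        return value;
    }
}
```

In production code the clock argument would typically be System::currentTimeMillis. The point of the sketch is simply that reads between refreshes never reach the database, and that there is no write path at all.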

Note

Some developers argue that an option like that just described violates the current EJB specification because ejbStore isn't called at the end of each transaction that involves the entity. The response in a case like this is that the intent of the specification is being upheld because all that really matters is that changes made to an entity object are synchronized with the database. If an application never changes the object, no outgoing synchronization is ever necessary. To eventually quiet the debate, EJB 2.0 lists read-only CMP entity beans as a desired feature that will be added in a later release of the specification.