Best Practices for Entity Beans

Entity beans are primarily used to represent an object view of data stored in persistent storage, such as a database. The entity bean's persistence can be implemented through the bean itself (Bean-Managed persistence, or BMP) or through the EJB container (Container-Managed persistence, or CMP), where the EJB container automatically manages the retrieval and storage of persistent data. The following sections discuss a few common best practices that relate to using both BMP and CMP entity beans in the context of WebLogic Server.

Consider Writing CMP Entity Beans Instead of BMP Entity Beans

The EJB 2.0 specification has greatly enhanced the capabilities of the persistence framework. In the EJB 1.1 specification, the age-old promise of automated mapping from the object domain to the database realm fell short of expectations. For example, Container Managed Persistence (CMP) was seriously lacking in various critical programming aspects such as multiplicity of relationships. The EJB 2.0 specification provides for Container Managed relationships, which let the container take care of the integrity of the relationships.

Abstract getters and setters are another valuable feature of CMP 2.0. The container is responsible for implementing these methods, so in CMP 2.0 it can track at a fine granularity which data you read and which data you write, and it can use that information to optimize your data access patterns. As a result, the container automatically handles many of the optimizations you had to code by hand in CMP 1.1. If you are only reading a bean, the container knows it; if you are not modifying all the fields, the container knows it; and if you are going to load certain data sets for certain methods, the container knows it because you can give hints to the CMP engine to help it optimize loading and storing. The improvements in the EJB 2.0 specification give bean developers a realistic way to resolve persistence issues with CMP, so they no longer have to rely on BMP for complex persistence needs. As more companies purchase or upgrade to EJB 2.0-compliant containers, the use of BMP for data access will decline in favor of CMP.

Debug Flags to Instrument BMP Code

It is easy to make typographical errors when developing BMP entity beans. BMP entity beans require a fair amount of Java Database Connectivity (JDBC) code, and although the database access code is much the same for almost all BMP beans, a mistyped SQL statement usually surfaces only as a cryptic database error at runtime. It is therefore good practice to include debug code that prints out the SQL statements being run; without it, such problems are hard to troubleshoot.
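One way to add such instrumentation is sketched below. The SqlDebug class, its method names, and the bmp.sql.debug system property are hypothetical names chosen for illustration; they are not part of WebLogic Server or the EJB API.

```java
import java.util.Arrays;

/** Hypothetical helper for tracing the SQL issued by a BMP bean. */
final class SqlDebug {
    // Assumed switch: enable with -Dbmp.sql.debug=true on the server command line.
    static boolean enabled = Boolean.getBoolean("bmp.sql.debug");

    /** Builds the trace line; kept separate from log() so it can be checked in isolation. */
    static String describe(String sql, Object... params) {
        return "SQL: " + sql + " | params: " + Arrays.toString(params);
    }

    /** Prints the statement and its bind values only when the flag is set. */
    static void log(String sql, Object... params) {
        if (enabled) {
            System.err.println(describe(sql, params));
        }
    }
}
```

A BMP callback such as ejbLoad would call SqlDebug.log(sql, employeeId) just before preparing the statement, so the exact text sent to the database appears next to any error it produces.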

Writing an Efficient Primary Key Class

Like a database row, an entity bean has an associated primary key that the container must be able to manipulate. Each entity bean class can define a different class for its primary key, but multiple entity beans can use the same primary key class. The primary key is specified in the entity bean's deployment descriptor. For an entity bean with CMP, the primary key can map to a single field or to multiple fields in the entity bean class. A custom primary key class is necessary for a compound primary key, that is, one that maps to multiple entity bean fields. With a custom primary key class, the bean writer must implement the hashCode and equals methods. Because the EJB container often uses the primary key class in its internal data structures, this class must implement hashCode and equals correctly and efficiently. Listing 23.2 shows an inefficient but workable implementation of the hashCode and equals methods, which the following sections improve.

Listing 23.2 An Inefficient, but Correct, Primary Key Class

    public class MyPk implements java.io.Serializable {
        public String str;
        public int i;
        public byte b;

        public MyPk() {}

        public int hashCode() { return -1; }

        public boolean equals(Object o) {
            if ((o != null) && (MyPk.class.equals(o.getClass()))) {
                MyPk other = (MyPk) o;
                return other.str.equals(str) && other.i == i && other.b == b;
            } else {
                return false;
            }
        }
    }

Implementing the `hashCode` Method

The hashCode method must return the same value for two objects that are equal, and it should attempt to distribute the hashCode values relatively evenly. The implementation in Listing 23.2 is cheap and correct, but it does not distribute the hashCode values at all: because every key hashes to -1, it turns any hash table into a list and forces linear searches, which defeats the whole purpose of an indexed data structure. The following implementation distributes the values much better:

    private int hash = -1;

    public int hashCode() {
        if (hash == -1) {
            hash = str.hashCode() ^ i ^ b;
        }
        return hash;
    }

This hashCode implementation computes the exclusive OR (XOR) of the string's hashCode and the primitive fields. XOR should be preferred to other bitwise operators, such as AND or OR, because it gives a better distribution of hash values. This implementation also caches the hashCode value in a member variable to avoid recomputing it.
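The claim about distribution can be checked with a small experiment. The sketch below is illustrative only: HashSpread and largestBucket are hypothetical names, the input range of 0 to 15 for both fields is an arbitrary assumption, and the code uses current Java syntax (lambdas) rather than the Java level of the EJB 2.0 era.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

/** Compares how evenly XOR, AND, and OR spread combined values. */
final class HashSpread {
    /** Size of the fullest bucket when every pair (i, b) in [0, 16) is combined with op. */
    static int largestBucket(IntBinaryOperator op) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int i = 0; i < 16; i++) {
            for (int b = 0; b < 16; b++) {
                counts.merge(op.applyAsInt(i, b), 1, Integer::sum);
            }
        }
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("XOR: " + largestBucket((i, b) -> i ^ b));
        System.out.println("AND: " + largestBucket((i, b) -> i & b));
        System.out.println("OR:  " + largestBucket((i, b) -> i | b));
    }
}
```

For these 256 input pairs, XOR spreads the results evenly across all 16 possible values (16 pairs each), whereas AND sends 81 pairs to 0 and OR sends 81 pairs to 15.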

Implementing the `equals()` Method

The equals() method compares the current object with the passed parameter and returns true if the objects have the same value. The default java.lang.Object.equals compares the reference (pointer) values and returns true if they are equal. For most primary key classes, this operation needs to be overridden to compare the values within the primary key class:

    // An efficient primary key class implementation
    public final class MyPk implements java.io.Serializable {
        // Fields, constructor, and the caching hashCode() are as shown earlier.

        public boolean equals(Object o) {
            if (o == this) return true;
            if (o instanceof MyPk) {
                MyPk other = (MyPk) o;
                return other.hashCode() == hashCode()
                    && other.i == i
                    && other.b == b
                    && other.str.equals(str);
            } else {
                return false;
            }
        }
    }

The first line of the optimized equals implementation compares the passed reference against the this keyword. Although this operation seems strange at first, it is a common case when the EJB container checks whether a primary key already exists in its data structures.

Next, getClass().equals is replaced with a much more efficient instanceof check. The instanceof operator returns true if the passed parameter's class is MyPk or one of its subclasses. Making the MyPk class final allows the equals method to safely use the instanceof operator because there cannot be a subclass.

Finally, the hashCode and member variables are compared. The && (conditional AND) operator short-circuits in Java: if the left operand is false, the right operand is not evaluated. The equals method takes advantage of this by ordering the comparisons with the cheapest first. The hashCodes are compared first because the implementation caches this value, and it is rare for two objects to have the same hashCode but not be equal. Next, the primitive fields are compared, and finally the more expensive java.lang.String.equals is called.
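Putting the two fragments together gives a complete key class. The following is a sketch that assumes the same three fields as Listing 23.2; the class name CompoundPk and the convenience constructor are hypothetical additions, and the class is left package-private only so that the example stands alone (a deployed primary key class would be declared public).

```java
import java.io.Serializable;

/** Sketch of a complete compound key assembled from the fragments above. */
final class CompoundPk implements Serializable {
    public String str;
    public int i;
    public byte b;

    // Cached hash value; -1 means "not computed yet", as in the snippet above.
    private int hash = -1;

    /** Public no-argument constructor, as in Listing 23.2. */
    public CompoundPk() {}

    /** Convenience constructor added for this example only. */
    CompoundPk(String str, int i, byte b) {
        this.str = str;
        this.i = i;
        this.b = b;
    }

    @Override
    public int hashCode() {
        if (hash == -1) {
            hash = str.hashCode() ^ i ^ b;
        }
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;                   // same reference: cheapest test
        if (!(o instanceof CompoundPk)) return false; // also rejects null
        CompoundPk other = (CompoundPk) o;
        return other.hashCode() == hashCode()         // cached, so nearly free
            && other.i == i
            && other.b == b
            && other.str.equals(str);                 // most expensive comparison last
    }
}
```

Because the hash is cached, the key's fields should not be changed after the key has been used in a hash-based collection; the cached value would then be stale.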

Choose the Right Concurrency Strategy

The concurrency strategy plays an important role in developing entity beans. It specifies how the EJB container should manage concurrent access to an entity bean. Although Database is the default concurrency strategy, you should choose the strategy that fits your requirements.

Four different concurrency strategies are available as of WebLogic Server 7:

Exclusive: Places an exclusive lock on cached entity EJB instances when the bean is associated with a transaction. Other requests for the EJB instance are blocked until the transaction completes. This option was the default locking behavior for WebLogic Server versions 3.1 through 5.1.
Database: Defers locking requests for an entity EJB to the underlying data store. WebLogic Server allocates a separate entity bean instance and allows the database to handle locking and caching. This option is the default. The Database concurrency strategy leverages the database's deadlock detection capabilities. When a database deadlock is detected, one of the deadlocked transactions is aborted. An SQLException is thrown to the client, and the EJB container processes the transaction rollback.
Optimistic: Holds no locks in the EJB container or database during a transaction. The EJB container verifies that none of the data updated by the transaction has changed before committing the transaction. If any updated data changed, the EJB container rolls back the transaction.
ReadOnly: Used only for read-only entity beans. This strategy activates a new instance for each transaction so that requests proceed in parallel. WebLogic Server calls ejbLoad() for ReadOnly beans, based on the read-timeout-seconds parameter.

Generally, entity beans should use the default Database concurrency strategy because it places much less burden on the EJB programmer. Choose a different strategy only when your application has a special requirement.

Optimize Database Access Calls

Entity bean performance depends largely on minimizing the number of round trips between the EJB container and the database. By default, the EJB container calls the entity bean's ejbLoad method at the beginning of a transaction to read the database's current state. When the transaction commits, the EJB container calls ejbStore to write the entity bean's contents to the database. Keep in mind that database access occurs on transaction boundaries, not on method-call boundaries.

When writing entity beans, less experienced programmers can incur performance penalties by making their transactions too fine-grained. For instance, a Web page might need to gather 10 attributes from an entity bean. If each method call runs in its own transaction, there are 10 database reads (ejbLoad calls) and 10 database writes (ejbStore calls). If the 10 method calls are wrapped in a single transaction, there is only one database read and one database write. For an operation such as populating a Web page, the database write is actually unnecessary because this transaction is read-only. The following section discusses how the EJB container avoids unnecessary database writes.

Loading Related CMP Fields

WebLogic Server's EJB 2.0 CMP container enables the EJB deployer to instruct the EJB container on how to optimize database access calls by grouping CMP fields that should be loaded together. For example, instead of loading only the primary key in the finder, the container could select the primary key plus some additional CMP fields that are used in the subsequent business method calls. In this case, only a single database access is needed to run the finder and all the related business methods.

For example, an entity bean could have Employee-ID as a primary key and an EmployeeName attribute. You could call findByPrimaryKey on a specific Employee-ID, and then call a business method that reads the EmployeeName attribute. By default, there would be one database access for the findByPrimaryKey call and another database access to load the employee name attribute. If the deployer had specified a group containing the employee name and employee ID fields, the findByPrimaryKey method and getEmployeeName() method could use the same group. In this case, only one database access is needed. If the getEmployeeName() method is called again within the same transaction, the database access is skipped and the name is retrieved from the cache. This feature is efficient because the container does not have to retrieve every field, which would be an expensive operation. Field groups thus let the bean developer design coarse-grained entity beans while the container still makes small, efficient data access calls. One of the limitations of the BMP programming model is that it cannot match the CMP container's cooperation between finders and subsequent business method calls.

Field groups are specified in the weblogic-cmp-rdbms-jar.xml file, as follows:

    <weblogic-rdbms-bean>
        <ejb-name>EmployeeBean</ejb-name>
        <field-group>
            <group-name>emp-data</group-name>
            <cmp-field>Employee-ID</cmp-field>
            <cmp-field>EmployeeName</cmp-field>
        </field-group>
    </weblogic-rdbms-bean>

Optimizing Finders

The general contract for the findByPrimaryKey method is to ensure that the primary key exists and then return the primary key. Usually, the bean implementation selects the primary key from the database, and if a row is returned, the implementation knows the key exists. Finder method invocations can be optimized if the primary key has already been loaded in the same transaction.

For example, in Transaction-1 you issue a find for the primary key denoted as PK1. Later in the same transaction, a second find is issued for PK1. The EJB container recognizes the earlier request and returns the cached primary key PK1. This is true for both CMP and BMP beans. In general, BMP finders simply access the database and return the associated primary keys. However, CMP finders can be better optimized, as discussed in the following section.

Optimizing CMP Entity Beans

It is a common misunderstanding among EJB developers that because BMP gives the bean writer explicit control over data access logic, BMP should outperform container-generated code. WebLogic Server's EJB 2.0 CMP engine achieves high performance by minimizing the number of round-trips between the EJB container and the database.

In a BMP entity bean, finder methods return primary keys from the database. The EJB container then creates a bean reference for each key and returns the references to the client. Note that the BMP finder can return only the primary key.

Optimizing BMP Entity Beans

If your BMP bean handles a large amount of data, it is possible to optimize database reads by implementing the ejbLoad method to read a subset of data or skip the read operation entirely. For instance, when a business method call is invoked, the entity bean needs to determine whether the data has already been read in this transaction. You can find this information by storing a bitmask, with one bit for each field, in the entity bean. Then ejbLoad sets a bit when its associated field is loaded, and the remaining bits are cleared. The business method checks the bitmask to determine whether it needs to load its data.

You should use the bitmask technique carefully. If every field is brought in only on demand, the entity bean can exhibit extremely poor performance. The bitmask approach is best used when entity bean fields are very large but seldom used. For example, an Employee entity bean might have a picture stored as a binary image in the bean. Assuming that many users of this entity bean do not display the picture, the bean limits the amount of data transferred from the database by loading the picture only when requested. If the remainder of the bean's fields are simple relational types or are frequently used, they should all be loaded in the ejbLoad call. The memory overhead of loading a few extra integers is minimal compared to the cost of extra trips to the database.
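The load-on-demand bitmask might look like the following sketch. It is deliberately simplified: the EJB callbacks are reduced to plain methods, and the EmployeeStore interface is a hypothetical stand-in for the bean's JDBC code, so none of these names come from the EJB API or WebLogic Server.

```java
/** Hypothetical data-access interface standing in for the bean's JDBC code. */
interface EmployeeStore {
    String loadName(int id);
    byte[] loadPicture(int id);
}

/** Sketch of a bean that loads its large picture field only on demand. */
final class LazyEmployee {
    private static final int NAME = 1;          // bit 0: name has been loaded
    private static final int PICTURE = 1 << 1;  // bit 1: picture has been loaded

    private final EmployeeStore store;
    private final int id;
    private int loadedMask;
    private String name;
    private byte[] picture;

    LazyEmployee(EmployeeStore store, int id) {
        this.store = store;
        this.id = id;
    }

    /** Invoked at the start of a transaction, as the container would invoke ejbLoad. */
    void ejbLoad() {
        name = store.loadName(id);  // small and frequently used: load eagerly
        picture = null;             // large and seldom used: defer
        loadedMask = NAME;          // set the NAME bit and clear all others
    }

    String getName() {
        return name;
    }

    /** Loads the picture on first use within the transaction, then serves the cached copy. */
    byte[] getPicture() {
        if ((loadedMask & PICTURE) == 0) {
            picture = store.loadPicture(id);
            loadedMask |= PICTURE;
        }
        return picture;
    }
}
```

Only the picture is deferred here; the name is read eagerly in ejbLoad, in line with the advice to load small or frequently used fields together.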

Optimizing Database Writes for BMP Entity Beans

Two database write optimizations can be performed with BMP entity beans:

Avoiding database writes in read-only transactions
Using tuned writes to store only modified data in the bean

Read-only transactions are common in real-world e-commerce applications because entity bean data is often used to populate Web pages. Tuned writes are used to write only modified fields, instead of every field in the entity bean. Both optimizations are implemented using the same technique and are generally combined.

The entity bean keeps a bitmask with one bit per BMP field. The bitmask is cleared in the ejbLoad callback. As with EJB 2.0 CMP beans, the bean should access its fields through get and set methods, which simplifies porting EJBs to CMP. With BMP beans, the bean writer implements the get and set methods. The get method simply returns the associated field. The set method sets the bit associated with the field and then assigns the value. The ejbStore implementation first checks the bitmask. If the bitmask is all zeros, ejbStore returns immediately without writing to the database. The ejbStore method can also perform tuned writes because the bitmask shows which fields were modified in the transaction.

A common error when implementing this pattern is to clear the associated bit in the get method. This implementation does not work because a value can be read again after it has been written. If the bean writer clears the bit in the get method, the earlier modification is never written to the database and is lost.
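The write-side bitmask can be sketched as follows. This is an illustration under simplifying assumptions: TunedEmployee, the employee table, and its column names are hypothetical, ejbLoad receives its values as arguments instead of reading them through JDBC, and ejbStore records the UPDATE statement it would issue rather than executing it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of dirty-field tracking for a BMP bean; SQL is collected, not executed. */
final class TunedEmployee {
    private static final int NAME = 1;         // bit 0: name was modified
    private static final int SALARY = 1 << 1;  // bit 1: salary was modified

    private int dirtyMask;
    private String name;
    private int salary;

    /** Stand-in for the database: the statements ejbStore would have run. */
    final List<String> writes = new ArrayList<>();

    /** State is passed in to keep the sketch database-free. */
    void ejbLoad(String name, int salary) {
        this.name = name;
        this.salary = salary;
        dirtyMask = 0;  // nothing has been modified yet in this transaction
    }

    // Getters only return the field; they never touch the mask.
    String getName() { return name; }
    int getSalary() { return salary; }

    // Setters mark the field dirty, then assign the value.
    void setName(String newName) { dirtyMask |= NAME; name = newName; }
    void setSalary(int newSalary) { dirtyMask |= SALARY; salary = newSalary; }

    void ejbStore() {
        if (dirtyMask == 0) {
            return;  // read-only transaction: skip the database write
        }
        List<String> columns = new ArrayList<>();
        if ((dirtyMask & NAME) != 0) columns.add("name = ?");
        if ((dirtyMask & SALARY) != 0) columns.add("salary = ?");
        // Tuned write: only the modified columns appear in the statement.
        writes.add("UPDATE employee SET " + String.join(", ", columns) + " WHERE id = ?");
    }
}
```

Following the description above, the mask is cleared only in ejbLoad, so reading a field after writing it leaves the pending update in place.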

Consider Using Tuned Updates for CMP 1.1 Beans

EJB CMP 2.0 automatically supports tuned updates because the container receives get and set callbacks when container-managed EJBs are read or written. WebLogic Server now supports tuned updates for EJB 1.1 CMP to improve performance. When ejbStore is called, the EJB container automatically determines which container-managed fields have been modified in the transaction. Only modified fields are written back to the database. If no fields are modified, no database updates occur.

With previous versions of WebLogic Server, you could write an isModified method that notified the container when the EJB 1.1 CMP bean had been modified. The isModified method is still supported in WebLogic Server, but we recommend that you no longer use isModified methods; instead, you should allow the container to determine the update fields.

This feature is enabled for EJB 2.0 CMP, by default. To enable tuned EJB 1.1 CMP updates, make sure you set the following deployment descriptor element in the weblogic-cmp-rdbms-jar.xml file to true:

 <enable-tuned-updates>true</enable-tuned-updates>

You can disable tuned CMP updates by setting this deployment descriptor element as follows:

 <enable-tuned-updates>false</enable-tuned-updates>

In this case, ejbStore always writes all fields to the database.

Consider Using Read-Only/Read-Mostly Entity Beans

Read-only entity beans are very powerful and effective in the following environments:

Database reads dominate, but there are very few database updates.
Your application can tolerate slightly stale data.

WebLogic Server's EJB container provides a ReadOnly concurrency strategy for read-only entity beans to improve performance for read-intensive applications. In addition to specifying the ReadOnly value for the <concurrency-strategy> tag in the weblogic-ejb-jar.xml deployment descriptor, the EJB deployer can specify a timeout value (<read-timeout-seconds>) for the entity bean's cached state, allowing the cache to be refreshed after the timeout has expired.

Note

Previously, read-only entity beans used an exclusive locking strategy to keep the distributed cache coherent.

Like any other entity bean, the bean state is refreshed with the ejbLoad method call. When a method call is made on a read-only entity bean, the EJB container checks whether the associated data is older than its timeout value. If the timeout period has elapsed, ejbLoad is called and the bean state is refreshed. Because read-only entity beans do not have database updates, ejbStore will never be called. Also, read-only entity beans never participate in transactions.
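The timeout check described above can be pictured with a toy model. The sketch below illustrates the decision only and is not WebLogic Server's implementation; ReadOnlyCacheEntry, beforeInvoke, and the injected clock are hypothetical names, and the code uses current Java syntax.

```java
import java.util.function.LongSupplier;

/** Toy model of the read-timeout check for one cached read-only bean. */
final class ReadOnlyCacheEntry {
    private final long timeoutMillis;
    private final LongSupplier clock;  // injected so the sketch can run without sleeping
    private long lastLoadMillis;
    private boolean loaded;

    /** Number of simulated ejbLoad calls so far. */
    int loads;

    ReadOnlyCacheEntry(long timeoutSeconds, LongSupplier clock) {
        this.timeoutMillis = timeoutSeconds * 1000L;
        this.clock = clock;
    }

    /** Runs before each business method: refreshes the state if the timeout has elapsed. */
    void beforeInvoke() {
        long now = clock.getAsLong();
        if (!loaded || now - lastLoadMillis >= timeoutMillis) {
            loads++;  // a container would call the bean's ejbLoad() here
            lastLoadMillis = now;
            loaded = true;
        }
    }
}
```

With a 60-second timeout, calls made within a minute of the last load reuse the cached state, which is why the application must be able to tolerate slightly stale data.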