Working with Relational Databases | Microsoft Office PowerPoint 2007 On Demand

As relational databases are used in most J2EE applications, let's examine some important RDBMS characteristics and capabilities, which we should take into account when formulating a data access strategy.

Referential Integrity

All RDBMSs offer sophisticated mechanisms for enforcing referential integrity, such as constraints (which prevent the addition of data that does not honor existing relationships) and cascade delete (in which related data is automatically deleted when its "parent" row is deleted).

In my experience, it's unusual for a J2EE application to have a database to itself in an enterprise system. As databases are a good means of communication between different applications, rather than extensions of J2EE applications, this makes sense. Hence RDBMS-based referential integrity mechanisms are essential in most enterprise applications, and we should not rely on our application'sjava code as the sole guardian of data integrity.

Note

EJB 2.0 offers a referential integrity mechanism for entity beans with CMP, as do many other 0/R mappings. This is useful, but it's only a supplement to referential integrity enforced by the RDBMS.

Stored Procedures, Triggers, and Views

Most RDBMSs offer stored procedures: operations on stored data that run within the database. Unfortunately, the language used differs between databases, although stored procedure languages tend to be SQL-oriented, such as Oracle's PL/SQL.

Many RDBMSs offer triggers: stored procedures associated with a particular table that is automatically invoked on an event such as insertion, and don't need to be called by application code.

Clearly, stored procedures and triggers can be abused. While an obvious use is to ensure referential integrity, what about the use of a trigger to enforce a business logic constraint? For example, what if a trigger was to veto an attempt to add a row to a order table based on the business rule that orders over $700 can only be accepted from Albania if the customer has previously made (and paid for) at least three orders over $50? This is business logic, and shouldn't be in the database in a J2EE application. Such business rales should be enforced by business objects.

Views – virtual tables, usually based on a query executed transparentiy as the view is accessed – can be useful to simplify queries and enable O/R mappings to joins. However, the level of support depends on the underlying database (for example, join views are partially updateable in Oracle 7.3 and later, but not in Cloudscape 3.6.). Usually, views are suitable only for backing read-only objects.

These implementation-specific features of RDBMSs have a place in J2EE applications, so long as the following criteria are satisfied:

They produce a real benefit, by reducing the amount and complexity of code or producing a worthwhile performance gain
They leave business logic injava business objects
They are hidden behind a portable abstraction layer

Stored procedures are particularly important, and deserve more detailed discussion.

J2EE developers (and Java developers in general) tend to hate stored procedures. There is some justification for this:

Stored procedures aren't object-oriented. Readers familiar with Oracle will counter that Oracle 8.1.5 introduced Java stored procedures. However, they don't solve the O/R impedance mismatch as much as move it inside the database, and they don't foster truly object-oriented use of Java.
Stored procedures aren't portable. Support for stored procedures varies much more than SQL dialects between RDBMSs. Nevertheless, it would be rare to lose the entire investment in a set of stored procedures on migrating from one RDBMS to another.
If stored procedures grow complex, they may reduce an application's maintainability.

Some other common objections have less validity:

"Using stored procedures puts business logic in the wrong place"
If we distinguish between persistence logic and business logic, the idea of putting persistence logic in a relational database makes perfect sense.
"Using stored procedures means that J2EE security may be compromised"
Security is a matter of business logic, not persistence logic: if we keep our business logic in J2EE, there is no need to restrict access to data.
"The database will become a performance bottleneck"
Especially, if a single database instance is serving a cluster of J2EE servers, the processing of stored procedures may limit overall performance. However, there are trade-offs to consider:
In my experience, it's much commoner to see network performance between application server and database limit the overall performance of a J2EE application than the performance of a well-designed database.
There's no reason to perform an operation in a stored procedure rather than a J2EE component unless the operation can be done more naturally and efficiently inside the database server than in Java. Thus if we've implemented an operation efficiently inside the RDBMS and it still eats the server's CPU, it probably indicates that the RDBMS is badly tuned or needs to run on better hardware. Performing the same heavily-requested operation less efficiently in the application server will probably result in a more severe problem, and the need for more additional hardware.

The use of stored procedures from J2EE applications is an area where we should be pragmatic, and avoid rigid positions. I feel that many J2EE developers' blanket rejection of stored procedures is a mistake. There are clear benefits in using stored procedures to implement persistence logic in some situations:

Stored procedures can handle updates spanning multiple database tables. Such updates are problematic with O/R mapping.
A more general form of the first point) Stored procedures can be used to hide the details of the RDBMS schema from Java code. Often there's no reason that Java business objects should know the structure of the database.
Round trips between the J2EE server and the database are likely to be slow. Using stored procedures can consolidate them in the same way in which we strive to consolidate remote calls in distributed J2EE applications to avoid network and invocation protocol overhead.
Stored procedures allow use of efficient RDBMS constructs. In some cases, this will lead to significantly higher performance and reduce load on the RDBMS.
Many data management problems can be solved much more easily using a database language such as PL/SQL than by issuing database commands from Java. It's a case of choosing the right tool for the job. I wouldn't consider using Perl in preference to Java to build a large application; neither would I waste my time and my employer's money by writing a text manipulation utility in Java if I could write it in Perl with a fraction of the effort.
There may be an investment in existing stored procedures that can be leveraged.
Stored procedures are easy to call from Java code, so using them tends to reduce, rather than increase, the complexity of J2EE applications.
Very few enterprises with existing IT shops have ported all their applications to J2EE, or are soon likely to. Hence persistence logic may be more useful in the RDBMS than in the J2EE server, if it can be used by other non J2EE applications (for example, custom reporting applications or in-house VB clients).

The danger in using stored procedures is the temptation to use them to implement business logic. This has many negative consequences, for example:

There is no single architectural tier that implements the application's business logic. Updates to business rules may involve changing both Java and database code.
The application's portability will reduce as stored procedures grow in complexity.
Two separate teams (J2EE and DBA) will share responsibility for business logic, raising the possibility of communication problems.

If we distinguish between persistence logic and business logic, using stored procedures will not break our architecture. Using a stored procedure is a good choice if it meets the following criteria:

The task cannot be accomplished simply using SQL (without a stored procedure). There is a higher overhead in invoking a stored procedure using JDBC than in running ordinary SQL, as well as greater complexity in the database.
The stored procedure can be viewed as a database-specific implementation of a simple Java interface.
It is concerned with persistence logic and not business logic and does not contain business rules that change frequently.
It produces a performance benefit.
The code of the stored procedure is not unduly complex. If a stored procedure is appropriate, part of the payoff will be a simpler implementation than could have been achieved in a Java object running within the J2EE server. Especially in an organization with DBA resources, 10 lines of PL/SQL will prove easier to maintain than 100 lines of Java, as such a size discrepancy would prove that PL/SQL was the right tool for the job.

Important

Do not use stored procedures to implement business logic. This should be done in Java business objects. However, stored procedures are a legitimate choice to implement some of the functionality of a DAO. There is no reason to reject use of stored procedures on design grounds.

Note

A case in point: In late 2001, Microsoft released a .NET version of Sun's Java Pet Store which they claimed to be 28 times faster. This performance gain appeared largely due to Microsoft's data access approach, which replaced entity beans with SQL Server stored procedures.

I found reaction to Microsoft's announcement in the J2EE community disappointing and worrying (see, for example http://www.theserverside.com/discussion/thread.jsp?thread_id=9797). J2EE purists reacted in horror at Microsoft's use of stored procedures, arguing that the Java Pet Store reflected afar superior design, with the benefits of "an object oriented domain model" and portability between databases. Most of all, the purists were concerned about the likely corrupting effect of the Microsoft benchmark on managers, who should clearly never be allowed to determine how fast an application should run.

I'm an enthusiastic advocate of OO design principles, as you know after digesting Chapter 4, but I read such responses with incredulity. Design is a tool to an end. Real applications must meet performance requirements, and design that impedes this is bad design.

Also, the panic and denial was unnecessary. The benchmark did not prove that J2EE is inherently less performant than .NET. The architectural approach Microsoft used could equally be implemented in J2EE (more easily than Sun's original Pet Shop example, in fact), but it did prove that J2EE orthodoxy can be dangerous.

Important

Isn't the use of stored procedures going back to the bad old days of two-tiered applications? No, it's not; two-tiered solutions went to the database to perform business logic. Stored procedures should only be used in a J2EE system to perform operations that will always use the database heavily, whether they're implemented in the database or in Java code that exchanges a lot of data with the database.

RDBMS Performance Issues

The heavier a J2EE application's use of an RDBMS, the more important it will be to ensure that the schema is efficient and the database is tuned.

RDBMS Performance Tuning

RDBMSs, like J2EE application servers, are complex pieces of software: much more complex than the vast majority of user applications. As with J2EE servers, application performance can be significantly affected by a host of tuning options that require expert knowledge. Hire a DBA.

Clearly, performance tuning is a losing battle if:

The schema cannot efficiently support common access requirements.
The queries used are inefficient.
The queries or the way in which they are run cause excessive locking in the database. Unfortunately, queries that are efficient in one RDBMS may prove to create contention in others: SQL is neither syntactically nor semantically anywhere near as portable as Java.

One potential quick win is the creation of indexes. Indexes, as the name implies, enable the RDBMS to locate a row very quickly, and based on values in one or more columns. These are automatically created on primary keys in most databases, but may need to be created to support some use cases regardless of what data access strategy we use in our Java code. For example, if we have several million users in our user table. Data for each user includes a numeric primary key (indexed by default), an e-mail address, and password. If the e-mail address is the user's login, we'll need to locate rows quickly by e-mail address and password when a user attempts to login. Without an index on these columns, this will require a full table scan. On a table this size this will take many seconds, and heavy disk access. With an index on e-mail and password, locating a user's row will be almost instant, and load on the RDBMS minimal.

Denormalization

Occasionally, certain queries remain slow regardless of query optimization and performance tuning, usually because they involve multi-table joins. In such cases, there is a last resort: denormalization, or the holding of redundant data in the database for performance reasons. Typically, this greatly reduces the complexity of joins required. I assume that the reader is familiar with the relational concept of normalization. This is essential knowledge for any J2EE developer, so please refer to a relational database primer if necessary.

Denormalization carries serious risks. It increases the size of the database, which may prove a problem if there is a lot of data. Most importantly, it can impair data integrity. Whenever something is stored more than once, there's the potential for the copies to get out of sync.

Sometimes it's possible to denormalize in Java, rather than in the database. This creates non-permanent redundancy, so is preferable in principle. This is usually only an option when we can cache a manageable amount of reference data in a partially read-only data structure: Java code may be able to navigate this structure more efficiently than it could be in the database.

Occasionally, denormalization is attempted simply to enable J2EE applications to work more easily with an RDBMS. This is seldom a good idea, because of the risk involved.

Important

Don't denormalize a relational database purely to support a J2EE application. The database schema may well outlive the J2EE application, in which case so will the costs of denormalization.