4.3 Code


In the previous section, I mentioned the principle of "design first and optimize later." Code is where optimization is realized. Most of the following coding best practices are aimed at improving database performance or getting around deficiencies in different JDBC drivers or database engines.

4.3.1 Use PreparedStatement

JDBC provides three kinds of statement classes: Statement , PreparedStatement , and CallableStatement . All too often, however, discussions of which kind of statement to use focus purely on performance. That's not to say that the choice of statement class doesn't impact performance. As a general rule, CallableStatement instances based on database-stored procedures provide the best performance, with PreparedStatement instances close behind. Finally, Statement instances generally perform significantly worse than the other kinds of statements. Focusing purely on performance, however, disguises two important facts:

  • The difference between CallableStatement and PreparedStatement is generally negligible.

  • There are nontrivial situations in which a Statement gives you optimal performance.

The primary difference in performance among the different statement types concerns how the SQL parsing occurs. With Statement -based calls, the driver sends the SQL to the database, which parses it every time you execute the statement. Calls through a PreparedStatement (as the name implies) are "prepared" before they are executed. In other words, the driver sends the SQL to the database for parsing when the statement is created but before it is executed. By the time you call execute( ) , the statement has been preparsed by the database. And if you're truly lucky, the same SQL has already been executed, and no parsing even needs to occur. Finally, a CallableStatement is "precompiled" in the database. The actual statement is stored in the database and referenced by your JDBC application by name. Consequently, there is no overhead for the initial call, and your application has the power to tweak the database configuration to support that call.

Sounds good, right? Wrong. Unfortunately, everything described here is the ideal. The truth is that every database engine handles these different statements in very different ways. Under mSQL, for example, a prepared statement will always perform a bit worse than a regular statement because the mSQL driver only simulates prepared statement behavior; mSQL does not inherently support prepared statements. Furthermore, many databases do not actually precompile stored procedures and thus provide no performance benefit at all.

In spite of the few odd situations in which a Statement does perform best, you should unconditionally avoid the use of Statement in production systems. Not only does it generally perform more slowly than other kinds of statements, but it also makes for ugly, error-prone JDBC. Consider the following SQL code fragment:

 conn = ds.getConnection();
 stmt = conn.createStatement();
 stmt.executeUpdate("UPDATE Person " +
                    "SET lastName = '" + lastName + "', " +
                    "firstName = '" + firstName + "', " +
                    "biography = '" + clean(biography) + "' " +
                    "WHERE personID = " + personID);

First and foremost, this code is a mess to read with all the string additions. Readability is just a start to the maintenance problems of this very simple example, however. In this example, the biography is assumed to be free-form text pulled from user-supplied data. You cannot trust it to fit validly within the single quotes that signify a SQL string literal. What if there is a possessive in the biography, such as "his wife's car"? In that case, your application breaks. You therefore have to include code, captured in a nonportable clean( ) method in this object, to clean out any special SQL symbols and escape codes. And to make matters worse, what if strings such as the last name and first name need cleaning that you did not consider ahead of time?
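The book does not show the body of clean( ), but a minimal sketch of what such a method might do is shown below. The only rule it implements is doubling single quotes (the standard SQL escape for a literal quote); real engines have other special characters and escape sequences, which is exactly why hand-rolled escaping is fragile and nonportable.

```java
public class SqlEscape {
    // A minimal, hypothetical clean() method: double any single quote so
    // the string survives inside a SQL string literal. This handles only
    // one of the many cases a real driver would handle for you.
    static String clean(String raw) {
        StringBuilder buf = new StringBuilder(raw.length());
        for( int i = 0; i < raw.length(); i++ ) {
            char c = raw.charAt(i);
            if( c == '\'' ) {
                buf.append("''"); // '' is the SQL escape for a literal quote
            }
            else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("his wife's car")); // his wife''s car
    }
}
```

Even this tiny method is a maintenance liability, which is the point: a prepared statement makes the whole problem disappear.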

Fortunately, you can do the same thing much more elegantly with a prepared statement:

 conn = ds.getConnection();
 stmt = conn.prepareStatement("UPDATE Person " +
                              "SET lastName = ?, firstName = ?, biography = ? " +
                              "WHERE personID = ?");
 stmt.setString(1, lastName);
 stmt.setString(2, firstName);
 stmt.setString(3, biography);
 stmt.setLong(4, personID);
 stmt.executeUpdate();

The only string concatenation in this code is done for readability. Because concatenation of string literals occurs at compile time, you also get a nice performance boost that has nothing to do with statement versus prepared statement processing. The real benefit here, however, is that your code elements are more compact, and there is a significant reduction in the risk of bugs. You no longer have to write your own database-specific routine to remove symbols from the raw strings; the driver or database handles that for you. In short, the prepared statement code has significantly increased:

  • Readability

  • Reliability

  • Portability

Not performance? Maybe the PreparedStatement performs better; maybe the Statement performs better. It depends on the driver and database you are using. To be honest, this exact JDBC code can yield wildly different results depending on the driver/database combination and how many times you call this prepared statement. The only thing you really know for sure is that the Statement will never significantly outperform the PreparedStatement . If you are making repeated calls, you can bet that the prepared statement will significantly outperform the statement.

Of course, if the elegant PreparedStatement code is such a good thing, shouldn't the more elegant stored procedure code be even better? For example:

 conn = ds.getConnection();
 stmt = conn.prepareCall("{call update_person(?, ?, ?, ?)}");
 stmt.setString(1, lastName);
 stmt.setString(2, firstName);
 stmt.setString(3, biography);
 stmt.setLong(4, personID);
 stmt.executeUpdate();

This code doesn't contain long strings requiring concatenation for readability's sake! In fact, this code almost turns database access into a function call, as if the database were some black box. In general, this level of abstraction is the desirable form of an interface between two separate technologies. Unfortunately, there are absolutely no clues in this code as to what you are doing to the database. In a persistence engine, this separation makes maintenance difficult. The reason for this problem is that JDBC is the proper demarcation between the database and the application. The SQL is part of the persistence code and therefore belongs with the rest of the persistence code, not with the data storage logic of the database engine.

In spite of my recommendations to avoid stored procedures, there are valid reasons for using them. Specifically, stored procedures can significantly speed up some very complex database operations and encapsulate complex transaction logic. The rule of thumb in these situations is to write the application using prepared statements first and identify the bottlenecks and complexities. Once you have identified them, manage them by moving special cases to the database in the form of stored procedures. If the changes are designed to improve performance and are successful, keep the changes just in those bottlenecks. Also, stored procedures allow the DBA to handle the database logic and allow the programmer to use just the exposed hooks. But, no matter what, never resort to placing business logic into stored procedures!

Beyond abstract platitudes about the proper separation of systems behind clean interfaces, however, stored procedures also have one serious drawback: poor portability. Though JDBC provides a database-independent form for stored procedure calls, different database engines do not support stored procedures in even remotely similar fashions. Access times for retrieving data from Oracle through PL/SQL can be wildly different from retrieving data from SQL Server through Transact-SQL. In Oracle, for example, you need to use out parameters. Transact-SQL, however, supports retrieving data as result sets. Finally, the underlying stored procedure languages are very different. You will consequently have to rewrite all stored procedures to port an application to a new database engine.

4.3.2 Leverage PreparedStatement Pooling

You are guaranteed a performance boost from prepared statements only when the underlying database supports prepared statements and you repeatedly use the same PreparedStatement instance. Repeatedly executing the same prepared SQL statement will probably yield a boost; using a prepared statement only once is very unlikely to.

To maximize the likelihood that you will see a performance boost in a cross-platform environment, you need to find ways to reuse your PreparedStatement instances. Unfortunately, pooling prepared statements is not quite as straightforward as pooling database connections. Because closing a connection closes any associated statements, you cannot pool statements across connection closures. The new JDBC 3.0 specification has stepped in to address the problem by supporting prepared statement pooling inside the specification.

The ability to use pooled prepared statements in JDBC 3.0 is hidden behind connection pooling. To take advantage of it, you must:

  • Use a JDBC 3.0 driver that supports pooled prepared statements.

  • Configure a data source for pooled connections that provide statement pooling.

Let's review how connection pooling works. When an application requests a database connection, it receives a logical connection that at some point becomes associated with a physical database connection. Note that the application never has direct access to the physical database connection. A call to close the connection closes only the logical connection; the physical database connection remains open and returns to the connection pool. The same principle works for prepared statements. The code that you have always used to execute prepared statements works with prepared statement pooling. Use of statement pooling simply depends on proper data source configuration.

This best practice is not an absolute, as the costs of prepared statement pooling can outweigh the benefits. Specifically, if your application uses a significant number of distinct statements with little repeated calling of those statements, you will burden the system with an inordinate number of open statements. To get around this, you need to balance the repetition of the same statement against the number of statements your system will use. If the number of repeated calls to the same statement is significant, you should favor statement pooling. If your application consists of many calls to distinct statements, however, you should disfavor statement pooling.
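The balance described above is typically struck through data source configuration rather than code. As a hedged illustration only (exact property names vary by vendor and connection pool product), JDBC 3.0 defines maxStatements as the standard pool property controlling how many statements the pool keeps open, with 0 disabling statement pooling:

```
# Hypothetical data source configuration fragment; property names vary
# by vendor. maxStatements is the JDBC 3.0 standard pool property;
# a value of 0 disables statement pooling entirely.
maxStatements=50
maxPoolSize=20
```

An application dominated by many distinct one-off statements would set maxStatements low or to 0; one that hammers a small set of statements would set it high.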

The use of JDBC 3.0 is, as of yet, not widespread. What do you do if you cannot use a JDBC 3.0 driver? One alternative is to write your own statement pooling behind a JDBC 2.0 façade! To accomplish this task, you need to write implementations of javax.sql.ConnectionPoolDataSource , java.sql.Connection , and java.sql.PreparedStatement . In fact, you will need two separate Connection implementations. O'Reilly's FTP and web sites provide an implementation, based on a combination of the JDBC 3.0 specification and ideas from Morgan Catlin, that supports statement pooling for JDBC 2.0 applications.

4.3.3 Question Using Nonstandard SQL

You should try to use only ANSI SQL in your JDBC calls. This best practice might seem obvious, but it is amazing how often people fall back on database-specific SQL simply because it is a nice crutch. On occasion, some behavior will demand proprietary SQL. For example, you may find that your project simply cannot live without the proprietary features of a specific database, and that is more important than portability. If this is the case, then so be it. Your first reaction, however, should always be to question your use of proprietary SQL and justify it to yourself and your team.

4.3.4 Delay Access to Metadata

Most people do not realize just how expensive metadata operations (that is, calls to DatabaseMetaData , ResultSetMetaData , and ParameterMetaData methods) can be. When you ask for a result set's metadata, for example, many database engines need to finish pulling all rows in the current fetch batch before they can provide the metadata for that result set. Consider the following code fragment:

 conn = ds.getConnection();
 stmt = conn.prepareStatement("SELECT * FROM Person");
 rs = stmt.executeQuery();
 md = rs.getMetaData();
 while( rs.next() ) {

For some database engines and drivers, you will not be able to read anything from the first row until the driver retrieves all the rows from the server. Of course, you cannot always avoid metadata calls. The solution is therefore to delay accessing metadata until you actually need it.

4.3.5 Reference Columns by Number

This best practice is related to the previous one. Column names are part of the metadata for a result set. A JDBC driver might not have access to a column's name when you process the first row. If you are retrieving the data by column name, the retrieval will wait until the name is available to the driver. Because retrieving columns by name could penalize some database engines over others in terms of performance, the ability to retrieve columns by name was not even part of the original JDBC specification.

Most programmers prefer to retrieve results by name instead of by number because names are easier to remember, and the order of the parameters in the SQL can change without impacting code. This objection, however, is based on the faulty practice of including naked literals in code. In other words:

 id = rs.getLong(1); 

and

 id = rs.getLong("personID"); 

are equally improper. You should instead use:

 static private final int PERSON_ID = 1;
 . . .
 id = rs.getLong(PERSON_ID);

With this format, it does not matter from a maintenance perspective whether you are using retrieval by name or number.

You can maximize the maintainability and performance of your code by placing the constants for the parameter values with a constant containing the SQL itself, as shown in Example 4-5.
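Example 4-5 is not reproduced in this excerpt, but the idea can be sketched as follows; the class and constant names here are illustrative, not the book's:

```java
public class PersonSql {
    // Keeping the column-index constants next to the SQL that defines
    // them makes retrieval by number as maintainable as retrieval by
    // name: a change to the column list and its indexes happens in one
    // place.
    static public final String SELECT_PERSON =
        "SELECT personID, lastName, firstName " +
        "FROM Person WHERE personID = ?";

    static public final int SEL_PERSON_ID  = 1;
    static public final int SEL_LAST_NAME  = 2;
    static public final int SEL_FIRST_NAME = 3;

    public static void main(String[] args) {
        System.out.println(SELECT_PERSON);
    }
}
```

Code that processes the result set then reads rs.getLong(PersonSql.SEL_PERSON_ID) and never touches a naked literal or a column name.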

4.3.6 Navigate a ResultSet Forward Only

This best practice is also related to the inconsistency in the way databases transfer results from the server to the client. Specifically, the only sure way to achieve the best database performance without optimizing your JDBC code for a specific database engine is to use only forward navigation with your results. In other words, you should ignore the support for scrollable result sets added in JDBC 2.0.

In fact, the need to do fancy navigation in a result set is generally reflective of an application design flaw. Except for batch programs, normal Java applications should simply grab data from the database and construct Java objects. Application processing should operate on those Java objects. Consequently, the application should not need to do anything except move forward through a result set instantiating Java objects from the data.

4.3.7 Close Statements and Result Sets

According to the JDBC specification, closing a database connection closes all open statement and result set resources associated with that connection. Theoretically, you should never have to close anything but connection instances unless you are intentionally keeping the connection open for a long time. Unfortunately, there are several drivers that do not appropriately clean up the underlying result set and statement resources unless you explicitly close them.

The following code can be used as a template for all JDBC access to ensure that all resources are properly cleaned up:

 PreparedStatement stmt = null;
 Connection conn = null;
 ResultSet rs = null;

 try {
     InitialContext ctx = new InitialContext();
     DataSource ds = (DataSource)ctx.lookup("dsn");

     conn = ds.getConnection();
     // Do your database access here!
 }
 catch( SQLException e ) {
     // Handle the error appropriately here.
 }
 catch( NamingException e ) {
     // Handle the error appropriately here.
 }
 finally {
     if( rs != null ) {
         try { rs.close(); }
         catch( SQLException e ) { /* Handle */ }
     }
     if( stmt != null ) {
         try { stmt.close(); }
         catch( SQLException e ) { /* Handle */ }
     }
     if( conn != null ) {
         try { conn.close(); }
         catch( SQLException e ) { /* Handle */ }
     }
 }

This code ensures that no matter what awful things occur in your application, you will never leave any database resources open! The only truly controversial element in this code lies in the eating of the SQL exceptions when the resources are closed. If you like, you can do something with those exceptions; however, it is unlikely you will encounter such exceptions in a situation in which you will care and be able to do something about it.

4.3.8 Thread JDBC Code in Swing Applications

Swing events are not made to support complex application processing. In fact, if you perform any extensive processing in the thread that dispatched a Swing event (the click of a button, for example), you will create odd behavior in your user interface. Windows will stop redrawing, and menu items will cease to behave as expected.

Any network calls, especially JDBC calls that trigger database activity, require too much processing to place into a Swing event thread. Your Swing applications should therefore disable the appropriate user interface elements and start a new thread when the user triggers some event that needs to go to the database. All database access should take place in the spawned thread. When that thread is done, it should notify the Swing event queue that it is time to reenable the appropriate user interface elements.
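The shape of this hand-off can be sketched as follows. The database work is stubbed out with a placeholder string, and the UI enable/disable calls are represented by comments, since the point is which thread does what; this is a sketch of the pattern, not code from the book.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import javax.swing.SwingUtilities;

public class SwingQueryPattern {
    // The JDBC work runs in a spawned thread; the result is posted back
    // to the Swing event queue via SwingUtilities.invokeLater().
    public static String runQuery() throws InterruptedException {
        // In a real application: disable the relevant UI elements first.
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>();
        new Thread(() -> {
            // All JDBC calls belong here, off the event thread.
            String rows = "42 rows"; // stand-in for real query results
            SwingUtilities.invokeLater(() -> {
                // Back on the event dispatch thread: update the UI and
                // reenable the elements disabled earlier.
                result.set(rows);
                done.countDown();
            });
        }).start();
        done.await(); // only the demo blocks; a real UI would not
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runQuery());
    }
}
```

The await() here exists only so the demo has a return value; in a real application the event thread simply receives the Runnable and carries on.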

The book Database Programming with JDBC and Java (O'Reilly) goes into more detail on the topic of database access from Swing applications than is appropriate for this chapter. You can see some example code, however, by downloading its examples from ftp://ftp.oreilly.com/pub/examples/java/jdbc2.

4.3.9 Minimize Database Calls and Perform Lazy-Loading

One of the nastiest things about the object/relational mismatch is that proper object-oriented coding implies making an inordinate number of database calls. The canonical object-oriented search looks like this:

  • Get all the primary keys from the database that match search criteria.

  • For each match, go to the database to load its primary attributes.

  • Also for each match, go to the database once for each object attribute.

This degree of database access is not only unnecessary, but it will also render your application totally unusable. Therefore, this area is where compromising on object-oriented principles is not only acceptable, but also a requirement.

As you've probably guessed, your Java application should do whatever it can to minimize the number of trips it makes to the database to support a single logical operation. Selecting the columns necessary to instantiate objects as part of the search will reduce the number of calls immensely. In the true object-oriented way, a search returning 100 rows results in at least 101 calls to the database. By selecting the necessary columns in your search, you can potentially drop this number to just the single search query.

Of course, selecting all the columns when you are probably going to use only one or two can itself be a performance problem. Your application needs to balance the need to minimize calls to the database with the reality of how many rows you will actually use.
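The contrast between the two approaches can be sketched with the SQL itself (table and column names here are illustrative):

```java
public class SearchQueries {
    // The object-by-object approach: one query for the matching keys,
    // then one query per match to load its attributes (1 + N calls for
    // N matches).
    static public final String KEY_SEARCH =
        "SELECT personID FROM Person WHERE lastName LIKE ?";
    static public final String LOAD_ONE =
        "SELECT lastName, firstName FROM Person WHERE personID = ?";

    // The single-trip approach: select the columns needed to instantiate
    // the objects as part of the search itself (one call total).
    static public final String EAGER_SEARCH =
        "SELECT personID, lastName, firstName " +
        "FROM Person WHERE lastName LIKE ?";
}
```

For a search returning 100 rows, the first form costs at least 101 round trips; the second costs exactly one.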

Another trick to create the illusion of good performance is to perform lazy-loading on any extra database calls that must happen. In other words, return from a JDBC search immediately with objects incompletely populated from whatever the database could offer from a single query. Then in a background thread (or perhaps on demand), the individual objects can populate the missing data without impacting the speed with which search results are returned, no matter how many database calls are necessary to populate the objects with data. A Person object that needs to make a second call to get all addresses associated with that person might have a getAddresses( ) method that looks like this:

 public synchronized Iterator getAddresses() {
     if( addresses == null ) {
         // This method will go to the database to load the addresses.
         addresses = Address.getAddressesFor(this);
     }
     return addresses.iterator();
 }
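The same idiom can be expressed generically. The sketch below uses a Supplier-based loader (my choice, not from the book) standing in for the JDBC query, and guarantees the expensive load runs at most once:

```java
import java.util.function.Supplier;

public class Lazy<T> {
    // Generic form of the getAddresses() idiom: the expensive load
    // (e.g., a JDBC query) runs at most once, on first access.
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    public Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    public synchronized T get() {
        if( !loaded ) {
            value = loader.get(); // the database trip happens here
            loaded = true;
        }
        return value;
    }
}
```

A Person would hold a Lazy<List> of addresses and simply delegate getAddresses( ) to it.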

4.3.10 Cache Lookup Tables

This practice is a kind of corollary of the last best practice. Lookup tables are tables that contain fairly static lists such as states, countries, type codes, etc. Applications generally need to reference this data frequently. Because the data does not change, you can avoid repeated lookups against these tables by caching the data in memory.
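A minimal sketch of such a cache is shown below; the loader function stands in for the actual JDBC query against the lookup table, and the names are mine:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class LookupCache {
    // Caches rarely changing lookup rows (states, countries, type codes)
    // in memory so each code hits the database at most once.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> loader;

    public LookupCache(Function<String, String> loader) {
        this.loader = loader;
    }

    public String lookup(String code) {
        // computeIfAbsent() calls the loader only on a cache miss.
        return cache.computeIfAbsent(code, loader);
    }
}
```

Because the data is static, no invalidation logic is needed; a restart picks up any rare changes.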

4.3.11 Use wasNull( ) Checks

SQL NULL is a very useful tool that, unfortunately, is frequently abused. A good database application uses NULL in situations in which a row lacks a value. For example, if one of the attributes of your Person table is numPets , you should use NULL to indicate that you do not know how many pets a person has. Without NULL , you would have to use a bogus value such as -1 .

Your JDBC code should always check for possible NULL values from nullable columns using the wasNull( ) method in ResultSet . The actual value that JDBC getter methods return for NULL columns is undefined. Your code to check the number of pets, for example, should look like the following fragment:

 conn = ds.getConnection();
 stmt = conn.prepareStatement("SELECT numPets FROM Person WHERE personID = ?");
 stmt.setLong(1, 13933L);
 rs = stmt.executeQuery();
 if( rs.next() ) {
     pets = rs.getInt(1);
     if( rs.wasNull() ) {
         pets = -1; // Java has no concept of null for numerics!
     }
 }

The O'Reilly Java Authors - Java™ Enterprise Best Practices
Year: 2002
Pages: 96
