4.4 General Database | The OReilly Java Authors - JavaT Enterprise Best Practices

I l @ ve RuBoard

You can do only so much in Java and JDBC. In the end, your application is at the mercy of the database supporting it. Many best practices are limited to specific database engines. A solid best practice as a database architect is to become intimately familiar with the strengths and quirks of your database of choice. With that in mind, however, I will close this chapter by providing some general best practices for approaching any database engine.

4.4.1 Always Optimize SQL Before Other Optimizations

The natural tendency in JDBC programming is to tweak your Java code. The truth is you will get the most bang for your buck in SQL optimizations. Small things such as indexing the columns in your WHERE clauses will do more for your application performance than anything you can do in Java. A good way to find out what is happening with your SQL code is to use your database engine's command-line utility and run the SQL through the EXPLAIN SELECT command.

The information the EXPLAIN SELECT command provides is database-dependent. Whatever your database, it should tell you some basic things about how it is trying to execute your query. Is the query utilizing indexes fully, or is it doing multiple table scans for what should be a simple query?

4.4.2 Do Not Store Binary Data in Database

Databases are horrible places for binary data. The database engine ends up being a nasty middleman for pulling large chunks of data. Furthermore, it is not particularly optimized for the unique needs of binary data. Unfortunately, moving the data to the filesystem and maintaining pointers to the filesystem in the database puts data integrity at risk. What if someone deletes the file, but the record pointing to the file remains in the database?

For most applications, the data integrity issue is not a huge problem. Digital asset collections, however, can become unmanageable when you store the binary data on the filesystem. These specialized applications should instead use a digital asset management system to manage the binary data.

4.4.3 Normalize First, Denormalize Later

This principle is the database equivalent of design first and optimize later. Unfortunately, many Java programmers who follow the design-first -and-optimize-later strategy fail to normalize first and denormalize later. Instead, the tendency is not to spend much time on the data model at all and just use what works with the current object model.

In data modeling, you will see the best results from taking your design to at least the third normal form until the model is complete. Only when you have a solid data model is it safe to begin denormalizing in places where overnormalization severely impacts performance.

4.4.4 Do Not Use Primary Keys with Real-World Meaning

We spent a lot of time earlier in the chapter discussing a scheme for generating unique identifiers. You would not have to go to all that trouble if you used something natural such as a social security number or an email address. The use of such identifiers as primary keys, however, is a spectacularly bad idea, and is therefore our final best practice.

A primary key identifier should identify something about the thing it represents that can never change. Email addresses and even social security numbers can change. In fact, just about anything with real-world meaning can one day change. The fact that social security numbers rarely change is irrelevant. If the attribute can ever change ”even under the most remote of circumstances ”it makes a poor primary key.

The best thing for an application to use is otherwise meaningless numeric identifiers as primary key identifiers. Because their sole raison d' tre is to identify a row in a specific database, such identifiers will never need to feel the pressure to change. Better yet, by using meaningless numbers, you can pick a data type that will work best in your database as a key. In relational databases, the best data type for a unique identifier is a numeric type. I recommend the use of 64-bit numbers to avoid running out of identifier space.

I l @ ve RuBoard