8.4 Data Access Layer | The OReilly Java Authors - JavaT Enterprise Best Practices

I l @ ve RuBoard

Information that your applications will be presenting to the end user ”such as page content, menu items, and error messages ”will not always be static. Dynamic data will need to be stored and retrieved from a persistent storage mechanism such as a relational database. There are things you should be aware of that might not be directly evident when setting up your application's data storage, or when operating with this data.

8.4.1 Configure Your Database for Internationalization

Localization support will vary depending on your database vendor. Typically, you will be interested in knowing which character encodings are supported by your database, as this will affect the type of data that you can store in the database. This setting is usually something that needs to be configured at the database level or upon the creation of tables in a particular database.

Oracle, for example, is typically configured to encode data in tables in UTF-8 encoding. This allows you to store a good number of character sets using this particular encoding. In preparing your database for localization, using UTF-8 as the encoding is, to quote Martha Stewart, "a good thing." In presenting the data to the user, the data can be converted from UTF-8 to another character encoding for display.

However, what if you happen to be using PostgreSQL? Out of the box, PostgreSQL does not support localization ”this must be enabled when the PostgreSQL server is built from the source distribution into compiled binaries. However, one of the limitations of this database is that, to quote the PostgreSQL documentation on localization, ^[12] "PostgreSQL uses the standard ISO C and POSIX-like locale facilities provided by the server operating system." Therefore, to achieve proper localization for your application running over a PostgreSQL database, you must involve the system administrator to install the proper language packages.

^[12] The localization documentation for PostgreSQL can be found online at http://www.postgresql.org/idocs/index.php?charset.html.

Another popular database is MySQL. However, by default MySQL uses ISO-8859-1 (Latin-1) and sorts according to the rules of Swedish/Finnish. ^[13] We will discuss collation at the business object or database level in the next section.

^[13] The localization documentation for MySQL can be found online at http://www.mysql.org/documentation/mysql/bychapter/ in Section 4.6, "MySQL Localisation and International Usage."

The localization options you choose for your database when it is installed and configured will affect operations such as:

The sort order when using the ORDER_BY clause
Operators such as LIKE that do pattern-matching -type operations
Character conversion or escaping functionality
Table indexes

If you change localization information about your database, you will typically have to reindex tables.

In summary, be sure to check with your database vendor and database administrator to determine what localization information is configured for the system and for the database on which you will be deploying your application.

8.4.2 Use Standard Java Classes (Instead of Database Constructs) for Collation and for Sorting Database Information

Suppose you've properly configured your database to handle the localizations required by your application. As we mentioned in the previous section, the localization information contained within a database can be used to determine things such as the collation or sort order for data returned when the SQL ORDER_BY clause is used to retrieve data.

You might or might not be familiar with the standard Java libraries for collating data. Earlier, in Section 8.3.1, we mentioned the java.text package, which contains the classes that will allow you to control locale-based collation. In applications that you designed to be internationalization-ready, you rely on objects in the business object layer to control the collation and sorting of data. There are a number of reasons why this is advantageous over relying on your database to handle these operations for you.

First, the rules for collation and sorting within the database are typically not configurable. You rely on the operating system or database vendor to investigate the locales it supports to develop the proper rules for collation. However, these rules are not exposed from the database. By allowing your business object layer to perform locale-based collation, you can take advantage of classes such as java.text.RuleBasedCollator for fine-grained control over the rules for sorting.

Another reason is that although most databases today offer the ability to do collation at the database level, some do not. Therefore, if you rely on the database to perform the correct sorting of data returned to the user, you might be in for a surprise. In one instance, your database might not be able to do collation within the database. This dictates that your business object layer must be responsible for performing this type of operation. In another instance, depending on the localizations offered by your database vendor, you might want a particular localization that is not supported by your database. This might require adding the proper language support to the operating system, or it can require a new installation of your database that is configured properly.

Finally, the internal collation operations might not be optimized for performance. For example, the PostgreSQL database documentation states, "The only severe drawback of using the locale support in PostgreSQL is its speed." A frequent performance bottleneck in enterprise applications is in accessing dynamic data from a database. Although, in many instances in which performance degradation in an application is due to the inadequate use of (or lack of) connection pooling mechanisms, issues with database performance such as collation might not be directly evident. Also, classes such as the java.text.CollationKey class allow for faster sorting with certain types of data (e.g., strings).

8.4.3 Design Your Database Schema to Be Internationalization-Ready

Dynamic data such as catalog items present another challenge when designing your application for internationalization. Databases are meant to store large amounts of data that change often, whereas resource bundles are not meant to collect such information. Therefore, it is imperative that you design for internationalization in the data model you choose for the database that will store data for your application.

The simplest example to understand with regard to data modeling is a product catalog. Imagine for a moment a very simplified version of Amazon.com or any web site that sells goods over the Internet. Products in the catalog contain a number of attributes that will vary depending on the locale, such as the product description or product image. If your application is internationalized, each product will have a one-to-many relationship with the localized attributes. In other words, a product will contain a description for each localization instance.

Designing your schema appropriately is a challenge in itself. Let's look at a number of alternatives for modeling items from a catalog. Tables Table 8-1 through Table 8-6 offer three different models for a product entity that contains three attributes: a product ID, the quantity on hand, and a description. In this example, the description will need to be localized for the locales supported by your application.

Table 8-1. Product entity data model alternative #1 (PRODUCT)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`QUANTITY_ON_HAND`	`INTEGER`
`DESCRIPTION_EN_US`	`VARCHAR2(5000)`
`DESCRIPTION_FR_FR`	`VARCHAR2(5000)`

Table 8-2. Product entity data model alternative #2 (PRODUCT)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`QUANTITY_ON_HAND`	`INTEGER`

Table 8-3. Product entity data model alternative #2 (PRODUCT_DESCRIPTION_EN_US)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`DESCRIPTION`	`VARCHAR2(5000)`

Table 8-4. Product entity data model alternative #2 (PRODUCT_DESCRIPTION_FR_FR)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`DESCRIPTION`	`VARCHAR2(5000)`

Table 8-5. Product entity data model alternative #3 (PRODUCT)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`QUANTITY_ON_HAND`	`INTEGER`

Table 8-6. Product entity data model alternative #3 (PRODUCT_DESCRIPTION)

Attribute	Data type and size
`PRODUCT_ID`	`INTEGER`
`LOCALE`	`CHAR(12)`
`DESCRIPTION`	`VARCHAR2(5000)`

Let's evaluate the alternatives outlined in the preceding tables. In Table 8-1, every attribute is named according to the localizations for the application. This is unattractive from the perspective that each time you want to add a localization to your application, your database schema will change. Also, your application code might change depending on how you retrieve information from the database. Even a sophisticated persistence engine might not accommodate such changes in a database schema. This also means that your schema becomes unwieldy as the number of attributes requiring localization increases .

In Tables Table 8-2 through Table 8-4, entities are divided into core attributes and localized attributes. In Table 8-2, the basic product information that will not change regardless of localization is stored. However, Tables Table 8-3 and Table 8-4 contain data that is appropriate for the product description in the English and French locales, respectively. This schema design presents many of the same issues already outlined for the previously mentioned alternative. Namely, you should avoid any changes to application code and/or schema design when adding a new localization to your application.

The last alternative for product entity data modeling is shown in Tables Table 8-5 and Table 8-6. This is our recommended approach for storing localized information in your database. Here, the product entity is again devided into its core attributes and a description table. However, the product description in Table 8-6 differs from that in Tables Table 8-3 and Table 8-4 in that LOCALE is now an attribute for the product description. This table attribute will be used in conjunction with the PRODUCT_ID attribute to form a primary key consisting of PRODUCT_ID , LOCALE . This means that accommodating another localization in your database should be as simple as adding the proper row to the PRODUCT_DESCRIPTION table in Table 8-6 with the appropriate locale identifier.

I l @ ve RuBoard