The Database Dilemma | Rapid J2EE™ Development: An Adaptive Foundation for Enterprise Applications

Enterprise-scale database servers are highly sophisticated software products capable of storing enormous amounts of data in a format that is fully optimized for blisteringly fast data access and retrieval. Given this level of sophistication, why do databases cause such frustration for the J2EE developer?

A number of factors, both cultural and technical, combine to make life harder for the developer:

Enterprise data is a valuable corporate asset, and access to it and its management are often carefully controlled.
Databases use relational rather than object technology, resulting in an object-relational impedance mismatch.
Databases are sensitive to change, and schema modifications can significantly affect any dependent systems.

To understand why databases present such barriers to delivering solutions rapidly and with agility, let's consider each of these factors in turn.

Enterprise Data Is Valuable

Enterprise data is a valuable company asset that is strategic to an organization's ability to conduct its core business. Consequently, companies go to great lengths to safeguard the integrity and security of such vital corporate resources. Having enterprise data prized so highly and treated so carefully has its implications for development teams:

In many organizations, development teams and database teams are operated as separate groups.
Enterprise databases are unlikely to be used exclusively by a single application but by other systems and reporting tools as well.
As new systems replace old systems, applications must deal with legacy data structures.
Access to information may be restricted if the data is commercially sensitive.

It is worth considering these points in more detail, as each has an impact on how a development project is conducted.

Separate Development and Database Teams

Due to the importance of company data, it is common for companies to run a dedicated team of database administrators (DBAs) and data architects charged with safeguarding and administering all enterprise-level data repositories. This enterprise data team is often independent of the application development project team but typically advises the project team on database design issues.

This separation between application and data teams means software architects do not have complete freedom to structure the data used by the application as they see fit. Instead, all database designs are likely to need approval in some capacity from the data architect, whose role is to ensure the needs of a single development project do not conflict with corporate data standards and policies.

This constraint may prevent the development team from adopting certain data access technologies that require data structure to be laid down according to a specific format. Moreover, the development team architects may find themselves having to work with a database structure that is not to their liking and may preclude the use of some data access technologies.

Although many architects might feel aggrieved not to have total control over the design of an application, it is reasonable that someone with specialized database design skills should be involved in the database design. The skill set for designing and maintaining a database is vastly different from that needed to design J2EE applications.

This issue points to a cultural difference between teams using object-oriented methods to develop software and those charged with the integrity of corporate data. In his book Agile Database Techniques [Ambler, 2003], Scott Ambler suggests appointing someone to the role of mediator between the development and database groups. Ambler defines this role as the Agile DBA.

The role of the Agile DBA is to bridge the gap between the J2EE development team working with object-based techniques and the database group whose focus is on data modeling. By mediating between the two groups, the Agile DBA should ensure both teams are working toward the same goal, regardless of paradigm.

Shared Data Between Systems

Data that is truly enterprise-level is unlikely to be the sole preserve of a single system. Such data is usually accessed, and possibly even updated, by other applications within the organization.

Shared access is likely to come from multiple directions. Batch processes running reconciliation or data-fix jobs are common. Most organizations use commercial software tools for accessing data in order to generate reports. Consequently, a J2EE application is likely to share a database with batch processes and commercial reporting tools.

Shared database access between applications has implications for the data architect and the J2EE architect. The data architect must design the schema of the database according to the best practices of database design to ensure efficient use of the database for all systems, not just those using object-oriented technologies. Thus, the data architect is reluctant to violate the principles of good database design to meet the needs of a single application unless the application in question is of significant strategic importance to the business.

For the J2EE architect, sharing a database with other systems presents design issues, especially when considering EJB caching, such as that offered by entity beans, to address performance concerns. Data held in an application-side cache can become stale when another system updates the underlying tables.
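
The caching concern can be illustrated with a minimal sketch. It is not an entity bean and uses no EJB APIs; the class StaleCacheSketch and its methods are hypothetical names, and two in-memory maps stand in for a shared table and an application-side cache.

```java
import java.util.HashMap;
import java.util.Map;

final class StaleCacheSketch {

    /** Stands in for a database table shared by several systems (id -> balance). */
    private final Map<Long, Integer> table = new HashMap<>();

    /** Application-side cache that only this application knows about. */
    private final Map<Long, Integer> cache = new HashMap<>();

    /** A write made by another system, such as a batch job; it bypasses the cache. */
    void externalUpdate(long id, int balance) {
        table.put(id, balance);
    }

    /** Cached read: the table is consulted only on a cache miss. */
    int cachedRead(long id) {
        Integer hit = cache.get(id);
        if (hit == null) {
            hit = directRead(id);
            cache.put(id, hit);
        }
        return hit;
    }

    /** Uncached read that always goes to the table. */
    int directRead(long id) {
        Integer value = table.get(id);
        if (value == null) {
            throw new IllegalArgumentException("No row with id " + id);
        }
        return value;
    }

    public static void main(String[] args) {
        StaleCacheSketch app = new StaleCacheSketch();
        app.externalUpdate(1L, 100);
        System.out.println("first cached read: " + app.cachedRead(1L));
        app.externalUpdate(1L, 250); // the batch job changes the row behind the cache
        System.out.println("cached read: " + app.cachedRead(1L));
        System.out.println("direct read: " + app.directRead(1L));
    }
}
```

After the second external update, the cached read still returns the old value while the direct read sees the new one. Any cache that assumes exclusive ownership of the data has to deal with this when the database is shared.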

Legacy Data Structures

Many new enterprise systems are either replacing older systems or being integrated with existing systems. Corporate data also tends to have a longer life span than software systems, so existing data must be migrated to newer systems as they come online.

Project teams therefore find themselves working with legacy data structures that are a hangover from an older system. In this situation, the team has no control of the structure of the data and must work with the design in place. Given the tendency for software to atrophy over time, legacy data structures often bear the scars of numerous enhancements, design changes, and emergency quick fixes. Such legacy data structures can result in data access code that is extremely difficult to write due to the convoluted nature of the data design upon which it must be based.

Data Security and Confidentiality

Where data is considered especially commercially sensitive or is of a personal nature, the development team may not be allowed access to a copy of the production data. This situation is most likely to arise when an existing database is being built upon.

A project team might be denied access to the very data it is expected to work with for a variety of reasons, including government laws regarding personal data.

This issue is of particular relevance where an organization employs the services of a separate software development company to undertake application development on its behalf. With this scenario, the commercially sensitive nature of the data may prevent the external development team from accessing any data representative of the production version. If this situation arises, additional tasks must be added to the project plan to cover the creation of suitable test data for the development and testing teams.

Note

Some companies have policies that mandate that all sensitive customer data, such as names, addresses, and phone numbers, be either stripped or obfuscated before being made available to development teams.

In addition, not being able to work with actual data and realistic data volumes presents some significant risks to the project. These risks relate to performance, since exploratory prototypes cannot be used to validate that the design will meet the performance criteria required of the system. Subtle differences between test data and actual data may also present problems when the system is released into a live environment.

All of the issues discussed so far take time to manage and thus may extend the timeframe of the project.

The Object-Relational Impedance Mismatch

Object-oriented and relational database technologies represent separate and distinct conceptual paradigms. The term object-relational impedance mismatch, or impedance mismatch, was coined in the early 1990s to formalize the problems endemic to moving between the object and the relational worlds.

The impedance mismatch problem occurs because object and relational techniques each work toward different objectives. Databases rely on the mathematical precision of relational algebra to structure business data in an efficient normalized form. Object-oriented design methods go beyond pure data modeling to define business processes as a collection of collaborating business components that have both state and behavior.

Given the impedance mismatch problem, the question arises: why are object and relational technologies so frequently used together for the development of enterprise systems? An alternative to the relational database does exist in the form of the object database management system (ODBMS). However, ODBMS technology is taking time to mature and has yet to prove itself at the enterprise level. For this reason, almost all enterprise software uses a relational database.

Relational databases are a mature and proven technology and can trace their origins back to 1970, when Dr. E. F. Codd, the father of the relational database, published his relational model of data. His famous twelve rules followed in 1985.

Contrast this history with object-oriented technologies such as J2EE, which have only emerged into the mainstream in the past decade. Despite the frustrations the impedance mismatch causes object-oriented practitioners, relational databases are likely to be the standard form of database technology for enterprise software for the foreseeable future. Therefore, it is important to understand the constraints imposed by impedance mismatch and why the problem causes such headaches.

To appreciate the problems, consider the ideal behavior a J2EE architect would like to see from a persistence mechanism. Most well-designed object-oriented systems are constructed around a domain model. The domain model describes the various relationships between each object involved in the problem domain. Typically, the objects and the relationships between them are represented using a UML class diagram.

Ideally, the architect would like object instances from this domain model to be transparently persisted to and from the underlying database, although a good design would see a persistence layer residing between the business objects of the domain and the data store for decoupling purposes.

The importance of layers in software architecture is covered in Chapter 4.

The key word here is transparency. True persistence transparency enables objects to be transferred between the database and the application without concern for the intricacies of how the state of an object is persisted to the data store.
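
As a rough illustration of a persistence layer sitting between business objects and the data store, the sketch below has business code depend only on an interface. All names (Customer, CustomerStore, InMemoryCustomerStore) are hypothetical, and the in-memory map is a stand-in for whatever mechanism actually talks to the database. It uses java.util.Optional, so it assumes a more recent Java than the J2EE platforms discussed here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class PersistenceLayerSketch {

    /** Hypothetical domain object; it carries no persistence code. */
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** The only view of persistence that business code sees. */
    interface CustomerStore {
        void save(Customer customer);

        Optional<Customer> findById(long id);
    }

    /** In-memory stand-in; a database-backed class could implement the same interface. */
    static final class InMemoryCustomerStore implements CustomerStore {
        private final Map<Long, Customer> rows = new HashMap<>();

        @Override
        public void save(Customer customer) {
            rows.put(customer.id, customer);
        }

        @Override
        public Optional<Customer> findById(long id) {
            return Optional.ofNullable(rows.get(id));
        }
    }

    /** Business logic written purely against the interface. */
    static String greeting(CustomerStore store, long id) {
        return store.findById(id)
                .map(c -> "Hello, " + c.name)
                .orElse("Unknown customer");
    }

    public static void main(String[] args) {
        CustomerStore store = new InMemoryCustomerStore();
        store.save(new Customer(1L, "Ada"));
        System.out.println(greeting(store, 1L));
    }
}
```

The greeting method never learns how a Customer is stored, which is the decoupling the layer is meant to provide. The hard part, hidden here by the map, is implementing the interface against relational tables.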

Unfortunately, impedance mismatch problems make true transparency difficult to achieve if a relational database is the target. Let's consider some of the reasons this should be the case.

Mapping Database Types to Java Types

The first problem is relatively straightforward. The properties of a persistent Java class must be mapped to columns in a database table. The Java language and relational databases support subtly different basic types. For persistence to occur, the types must be mapped correctly to ensure no loss of data results from, for example, long Java strings being truncated in VARCHAR(20) columns. The mapping of types is relatively easy to manage. Mapping relationships, however, is considerably more complex.
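
A minimal sketch of the type-mapping concern follows. The constants come from the standard java.sql.Types class, but the particular Java-to-SQL pairings, the class TypeMappingSketch, and the 20-character column width are assumptions chosen to match the VARCHAR(20) example above; a real mapping depends on the schema and the database product.

```java
import java.math.BigDecimal;
import java.sql.Types;

final class TypeMappingSketch {

    // Hypothetical column width, matching the VARCHAR(20) example in the text.
    static final int NAME_COLUMN_WIDTH = 20;

    /** Returns the java.sql.Types code this sketch assumes for a Java type. */
    static int sqlTypeFor(Class<?> javaType) {
        if (javaType == String.class) return Types.VARCHAR;
        if (javaType == Integer.class) return Types.INTEGER;
        if (javaType == Long.class) return Types.BIGINT;
        if (javaType == BigDecimal.class) return Types.DECIMAL;
        throw new IllegalArgumentException("No mapping assumed for " + javaType.getName());
    }

    /** Rejects a value that would not fit the column rather than risk truncation. */
    static String requireFits(String value, int columnWidth) {
        if (value != null && value.length() > columnWidth) {
            throw new IllegalArgumentException(
                    "Value of length " + value.length()
                            + " exceeds column width " + columnWidth);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(requireFits("Ada Lovelace", NAME_COLUMN_WIDTH));
        System.out.println(sqlTypeFor(String.class) == Types.VARCHAR);
    }
}
```

Checking the length before the value reaches the database turns a possible silent loss of data into an explicit error the application can handle.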

Mapping Relationships

On the surface, the differences between object and relational technologies appear to be only superficial. After all, relational databases enable relationships to be specified between entities, while the object-oriented model defines relationships between classes.

Database designers use entity-relationship (ER) diagrams to describe relationships between database entities. ER diagrams are not part of the UML but are a recognized modeling notation. Relationships between entities in a database are modeled based on cardinality and enforced using foreign keys. Three possible relationship types can be modeled with relational technology:

One-to-one
One-to-many
Many-to-many

Note

The many-to-many relationship, although it may be modeled, is not directly supported by relational databases. Common practice in this case is to use a link table, also called an association table, to split the many-to-many relationship into two one-to-many relationships.
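
The link-table idea can be sketched in plain Java, with a list of rows standing in for the table. The EMPLOYEE_PROJECT table and the class LinkTableSketch are hypothetical; each row holds two foreign keys, and the many-to-many relationship is recovered by scanning from either side.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class LinkTableSketch {

    /** One row of a hypothetical EMPLOYEE_PROJECT table: two foreign keys. */
    static final class Link {
        final long employeeId;
        final long projectId;

        Link(long employeeId, long projectId) {
            this.employeeId = employeeId;
            this.projectId = projectId;
        }
    }

    private final List<Link> rows = new ArrayList<>();

    /** Adds a link unless the pair already exists, as a composite primary key would enforce. */
    void assign(long employeeId, long projectId) {
        if (!projectsOf(employeeId).contains(projectId)) {
            rows.add(new Link(employeeId, projectId));
        }
    }

    /** One side of the relationship: all projects for an employee. */
    Set<Long> projectsOf(long employeeId) {
        Set<Long> result = new LinkedHashSet<>();
        for (Link row : rows) {
            if (row.employeeId == employeeId) {
                result.add(row.projectId);
            }
        }
        return result;
    }

    /** The other side: all employees on a project. */
    Set<Long> employeesOn(long projectId) {
        Set<Long> result = new LinkedHashSet<>();
        for (Link row : rows) {
            if (row.projectId == projectId) {
                result.add(row.employeeId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LinkTableSketch table = new LinkTableSketch();
        table.assign(1L, 10L);
        table.assign(1L, 11L);
        table.assign(2L, 10L);
        System.out.println("projects of employee 1: " + table.projectsOf(1L));
        System.out.println("employees on project 10: " + table.employeesOn(10L));
    }
}
```

In an object model the same relationship would typically appear as a collection on each class, with no third class at all. That difference is one small instance of the impedance mismatch.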

The object-oriented designer has a richer set of relationships to draw upon. Information that can be both modeled in the UML class diagram and implemented in the Java code includes:

Relationship cardinality
Association by both composition and aggregation
Inheritance
Unidirectional and bidirectional relationships

Coercing these relationships to fit those of the relational database model is not a trivial task, and a direct mapping of object model to database schema can result in a suboptimal database design. This gives rise to the argument as to which technology, object-oriented or relational, should be driving the design of the data model.
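
Inheritance is a useful example of the coercion involved. The sketch below shows one possible approach, assumed here for illustration: flattening a small class hierarchy into a single table with a discriminator column. The Account classes, the ACCOUNT table, and its column names are all hypothetical, and a map stands in for a table row.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InheritanceMappingSketch {

    abstract static class Account {
        final long id;

        Account(long id) {
            this.id = id;
        }
    }

    static final class SavingsAccount extends Account {
        final double interestRate;

        SavingsAccount(long id, double interestRate) {
            super(id);
            this.interestRate = interestRate;
        }
    }

    static final class CheckingAccount extends Account {
        final int overdraftLimit;

        CheckingAccount(long id, int overdraftLimit) {
            super(id);
            this.overdraftLimit = overdraftLimit;
        }
    }

    /** Flattens any Account subtype into one row of a hypothetical ACCOUNT table. */
    static Map<String, Object> toRow(Account account) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", account.id);
        // Columns that do not apply to a subtype stay null: the cost of this approach.
        row.put("ACCOUNT_TYPE", null);
        row.put("INTEREST_RATE", null);
        row.put("OVERDRAFT_LIMIT", null);
        if (account instanceof SavingsAccount) {
            row.put("ACCOUNT_TYPE", "SAVINGS");
            row.put("INTEREST_RATE", ((SavingsAccount) account).interestRate);
        } else if (account instanceof CheckingAccount) {
            row.put("ACCOUNT_TYPE", "CHECKING");
            row.put("OVERDRAFT_LIMIT", ((CheckingAccount) account).overdraftLimit);
        } else {
            throw new IllegalArgumentException(
                    "Unmapped subtype: " + account.getClass().getName());
        }
        return row;
    }

    /** Rebuilds the right subtype by reading the discriminator column. */
    static Account fromRow(Map<String, Object> row) {
        long id = (Long) row.get("ID");
        String type = (String) row.get("ACCOUNT_TYPE");
        if ("SAVINGS".equals(type)) {
            return new SavingsAccount(id, (Double) row.get("INTEREST_RATE"));
        }
        if ("CHECKING".equals(type)) {
            return new CheckingAccount(id, (Integer) row.get("OVERDRAFT_LIMIT"));
        }
        throw new IllegalArgumentException("Unknown ACCOUNT_TYPE: " + type);
    }

    public static void main(String[] args) {
        System.out.println(toRow(new SavingsAccount(7L, 0.02)));
        System.out.println(toRow(new CheckingAccount(8L, 500)));
    }
}
```

Every row carries columns that are null for some subtypes, which a data architect may regard as poor design. Alternatives such as one table per class trade the nulls for joins. Either way, the object model does not map onto the schema for free.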

Data Models Driving the Object Model

For the majority of enterprise systems, the data model drives the design of a system's object model. The reasons for taking this approach are as follows:

Object models tend to translate into inefficient database schemas.
Databases are often accessed by other enterprise systems not using object-oriented technology.
Database schemas are more rigid and harder to change than the object models, which are in the hands of the development teams.

Despite these reasons, life is considerably simpler for the development team if the object model translates directly into the underlying data architecture. This approach removes many of the headaches associated with mapping between the two paradigms and enables systems to be constructed swiftly.

Nevertheless, for the development of enterprise software, such arguments are likely to prove moot. As we discussed previously, enterprise data is a valuable commodity, and no data architect is likely to accept a data model from an object-oriented designer that does not comply with the best practices of data modeling techniques.