The Basics of Database Programming in Java

The recurring theme of Java-based technology is simplicity, portability, and robustness. Java addressed database programming with these goals in mind, and the result was JDBC. The JDBC API has been part of the core Java libraries since JDK 1.1. The simplicity of JDBC is evident in its small set of classes that are very intuitive to use. The portability allows Java programmers to be isolated from vendor-specific details of the database, meaning that EJBs can be created and reused irrespective of the database implementation. The robustness is provided by highly reliable JDBC drivers that translate JDBC API calls to native database calls.

Note

The JDBC API and drivers are explained in the "Understanding JDBC" section later in this chapter.

Database programming is always, at a fundamental level, a client/server model. The database server is a relational database, such as Microsoft SQL Server, Oracle, or Sybase. The Java client application uses JDBC as the interface to communicate with the database server. The messages sent to the database server are in a standard language called the Structured Query Language (SQL) . The syntax of SQL is a simple, structured, English format, which intuitively allows you to create, read, update, and delete the data held in the database server. Before you can use JDBC or SQL, you must have a conceptual understanding of the way data is stored in a relational database. When you have this knowledge, you can store and retrieve persistent data in the relational database using JDBC.

Note

All database vendors at a minimum support the ANSI SQL-92/SQL2 standard as a means of manipulating data stored in their databases. However, you will also find that each database vendor will provide a feature-rich environment for storing and manipulating data via enhanced SQL operators and functions, as well as their own specific SQL-type language for developing data-centric stored procedures. For this reason, unless you are developing for a specific type of database vendor, to ensure portability between databases your SQL statements should abide by the ANSI SQL-92/SQL2 standard.

Understanding Relational Databases

The structure of the way data is stored in a database is referred to as the schema . In other words, the schema is the abstract representation of the way the data is organized. The schema used by relational databases is very natural to object-oriented programmers. The relational database contains a set of tables. The columns in a table represent the data that is stored in that table. In Unified Modeling Language (UML) terminology, this would be a composition relationship . The database has tables, and each table has data. Using an analogy between JavaBeans and relational databases, the JavaBean stores data and provides getter and setter methods to access that data. With database programming, the relational database stores data and SQL commands are executed to perform the get and set on the stored data. The tables in a relational database may share a common key value that allows the tables to be joined for search operations. This simple data structure, a group of tables that share a common key field, provides all the flexibility necessary to store inventories, catalogs, business and customer information, as well as enterprise objects.

Designing the Data Model

The logical data model for a relational database looks very much like a UML class diagram. The entities on the data model diagram are the tables. Each table has a unique name and a list of column names . The columns can be thought of as attributes. Each row can be thought of as an instance holding values for each of the column "attributes." Typically, one column is used as a unique ID and is referred to as the primary key . Just as in UML class diagrams, the tables in the data model have relationships between each other. The relationship identifies a multiplicity level, which can be

One-to-One ” When one column value in a table relates to only one column value in another table, and vice versa. One-to-one relationships are rarely seen in "real-world" data models, as the relationships result in unnecessary division of data and the two tables are merged back together.
One-to-Many ” When one column value in a table relates to many column values in another table. One-to-many relationships are more common in "real-world" data models as they provide significant data reduction and allow for faster searching.
Many-to-Many ” When many column values in a table relate to many column values in another table. Many-to-many relationships are typically transformed into multiple one-to-many relationships, as they cannot be represented by means of a relational model. This transformation usually requires an assignment, also referred to as a link or pivot table.

An example of a one-to-one relationship is illustrated in Figure 13.1, where one author has authored exactly one book.

Figure 13.1. The one-to-one relationship is represented between entities where one column value in a table relates to only one column value in another table.

graphics/13fig01.gif

If, on the other hand, the author wrote many books, this relationship is represented by the one-to-many multiplicity, as illustrated in Figure 13.2.

Figure 13.2. The one-to-many relationship requires two tables that share a common key, in this case, the `author_id` .

graphics/13fig02.gif

The common author_id allows you to relate a row in the Books table to a row in the Authors table. Each of the "many" books contains a reference to the "one" author who wrote it. This relationship enables you to join the tables and search for all the books written by a particular author. It also gives the author_name of a particular book. Details of joining tables with the SQL WHERE clause will be discussed later in this chapter.

Finally, the database schema may require a many-to-many relationship between tables. For the many-to-many relationship, a cross-reference table is created. The contents of the cross-reference table are the key fields of the two tables that have a many-to-many relationship. For example, you can organize the books into various categories. Doing so requires a many-to-many relationship because you will allow a book to be included in multiple categories and each category has multiple books. The book you are now reading could be in the Enterprise, WebLogic, and Java categories. The Books and Categories have a many-to-many relationship. The Categories table also includes a rating column that will be used in the WHERE clause examples later in this chapter. A cross-reference table (also frequently called an association table), as illustrated in Figure 13.3, is required to contain this many-to-many relationship.

Figure 13.3. The cross-reference table contains the keys of the tables in a many-to-many relationship.

graphics/13fig03.gif

From these three tables, you can perform these searches:

Display all author names
Display all book names
Display all category names
Display all books for a particular author
Display all categories for a particular author
Display all books for a particular category
Display all categories for a particular book

In summary:

A primary key that consists of one column is likely to be the "one" in a one-to-many-relationship.
A table with a compound primary key of more than one column is likely to be the "many" in a one-to-many or a many-to-many relationship, where the join occurs on one of the columns of the primary key.
A foreign key is likely to be the "many" in a one-to-many relationship.

Understanding SQL

This introduction to SQL is intended to present the most commonly used features of the language. From the data model, you construct the relational database using SQL commands. The tables are created using the CREATE command and are populated with data using the INSERT and UPDATE commands. Data is removed from the tables using the DELETE command. You search the database by using the SELECT command. These are the five most common SQL commands and are described in the following sections. From this brief introduction, you will be able to use JDBC productively.

Note

Refer to other complete SQL references for more advanced features of the language.

SQL Datatypes

The schema for a relational database is represented by tables. Each table has a unique name and a list of columns. Each column has a name, type, and optional modifier. The SQL2 standard has a set of datatypes. Table 13.1 provides the correlation between SQL types and Java types. This information will help you choose the correct column types for your tables.

Table 13.1. The SQL Type “to “Java Type Mapping Table

SQL Type	Java Type
`BIT`	`Boolean`
`TINYINT`	`Byte`
`SMALLINT`	`short`
`INTEGER`	`Int`
`BIGINT`	`Long`
`REAL`	`float`
`DOUBLE`	`double`
`VARBINARY`	`byte[]`
`VARCHAR`	`java.lang.String`
`DATE`	`java.sql.Date`
`TIME`	`java.sql.Time`
`TIMESTAMP`	`java.sql.Timestamp`
`BLOB`	`java.sql.Blob`
`CLOB`	`java.sql.Clob`
`ARRAY`	`java.sql.Array`
`REF`	`java.sql.Ref`
`STRUCT`	`java.sql.Struct`
`NUMERIC`	`java.math.BigDecimal`

All the Java primitive types map to an SQL type, and the Java String maps to VARCHAR . The java.sql package provides additional classes to map the remaining SQL types to Java. The two types that are particularly interesting are BLOB and CLOB .

BLOB and CLOB are SQL datatypes for storing large amounts of data. BLOB , which stands for Binary Large OBject , references large amounts of binary data. CLOB , which stands for Character Large OBject , references large amounts of text data. These types have distinct advantages over VARBINARY and VARCHAR when large amounts of data are being stored:

BLOB and CLOB have a much larger maximum data size. The exact size is dependent on the database vendor implementing the BLOB or CLOB data types, for example Oracle CLOB and BLOB data types can be 4GB in size.
Clients can stream the data rather than receive it in one big chunk .

The streaming capability has a significant effect on performance. Any data field size greater than 100KB is much better served using BLOB or CLOB . Rather than having the query blocked while all the data is being passed across the network, the streaming aspect allows the client to do lazy evaluation and not get the data until it actually needs that data.

Note

Refer to the section "Working with BLOB and CLOB Data Types" later in this chapter for further information.

SQL Commands

You create the tables in the database using the standard SQL CREATE TABLE command. The data in the table is modified using INSERT , UPDATE , and DELETE . You can search the database by using the SELECT command. These are the five most commonly used commands in SQL. The UPDATE , DELETE , and SELECT commands introduce the WHERE clause. Again, this section provides only a very brief introduction to SQL, so you should refer to the documentation from your database vendor for more detailed information on SQL commands.

WHERE Clause The WHERE clause is a Boolean statement that compares column values with matching criteria. The keyword WHERE is followed by a matching pattern that is used to filter which rows are affected by the command. The simplest WHERE clause is to compare the values in a column against a specific value. For example, if you want to search for all the books by your favorite author, you use the WHERE clause with the SELECT command:

 WHERE author_name = 'MY_FAVORITE_AUTHOR'

Tables are joined by having a common key field as a column in multiple tables. The one-to-one, one-to-many, and many-to-many relationships are established this way. For example, the Books and Authors tables must be joined for you to be able to search for all the books by your favorite author. Joining tables effectively allows you to create a new temporary table that includes columns from multiple tables. The WHERE clause defines the matching criteria for which rows to select.

Note

Refer to the example shown with the SELECT command later in this chapter to see how to join tables.

The WHERE clause also supports much more complex pattern matching. Some additional features are the wildcard character % , which will match a string of characters . For example, the WHERE clause for finding all names that start with C would be stated as follows :

 WHERE name LIKE 'C%'

You can create extensive WHERE clauses using AND , OR , and IN . The AND keyword is equivalent to the && logical operator in Java, and OR is equivalent to the logical operator in Java. The IN keyword is a unique operator that allows you to perform subquery operations. This capability is necessary when no direct relationship exists between the tables; therefore, a simple join is not possible. For example, the database may also contain a table of publishers with a one-to-many relationship between publishers and books. There is still no direct relationship between authors and publishers. In this case, you must use a subquery. An example of a subquery using the IN keyword is provided in the Select section later in this chapter.

As you can see, there are endless possibilities with the patterns that can be matched with the WHERE clause. Additional examples are covered with the SELECT command.

CREATE You use the CREATE command to create tables in the database. Each table has a name and a list of columns. Each column has a name, type, and optional modifier. The name is of your choosing, and the type must be a valid SQL datatype. The two most common modifiers are shown in Table 13.2.

Table 13.2. Common Column Modifiers

Column Modifier	Description
`PRIMARY KEY`	Indicates this column is the primary key
`NOT NULL`	Indicates this column cannot have a null value

Note

You should always verify the column modifiers your database allows, for example Oracle also provides the DEFAULT , CHECK , and UNIQUE modifiers to enforce referential integrity and data validation for database tables.

The syntax of the CREATE command is

 CREATE TABLE  table_name  (  column_1_name column_1_type column_1_modifier  ,  column_n_name column_n_type column_n_modifier  )

Note

The column modifier is an optional specification. The column will only have a column modifier if it is the PRIMARY_KEY column, or the value for the column must be NOT_NULL .

INSERT You use the INSERT command to add data to an existing table. The table must have been created previously. The data values must match the datatype for the column. The command must be repeated for each row of data being inserted.

The syntax of the INSERT command is

 INSERT INTO  table_name  (  column_1_name, column_n_name  ) VALUES (  value_1  ,  value_n  )

UPDATE You use the UPDATE command to modify data that was inserted into the table. The WHERE clause matches which rows are modified. All rows that match the WHERE clause are updated. If the WHERE clause is not specified, every row in the table is updated, which is rarely what is desired.

The syntax of the UPDATE command is

 UPDATE  table_name  SET  column_1_name  =  value_1   column_n_name  =  value_n  WHERE  column_name  =  value

DELETE You use the DELETE command to delete rows from the table. The WHERE clause matches which rows are deleted. All rows that match the WHERE clause are deleted. If the WHERE clause is not specified, every row in the table is deleted, so you should be careful about using this command.

The syntax of the DELETE command is

 DELETE FROM  table_name  WHERE  column_name  =  value

SELECT You use the SQL commands described in the preceding sections to create and modify the database, but unless you can query the database, the data does not have much value. You use the SELECT command to search for rows in the database. The WHERE clause in the SELECT statement determines which rows are selected. If the WHERE clause is not specified, every row in the table is selected. In this case, that result may be exactly what you want.

The syntax of the SELECT command is

 SELECT  column_1_name  ,  column_n_name  FROM  table_name  WHERE  column_name  =  value

The topic of joining tables was introduced in the discussion of the WHERE clause. You can generate a search result that selects columns from joined tables. A key factor is that the tables share some common value that can be compared. The syntax of a SELECT command that joins tables is

 SELECT table_1.column_name, table_2.column_name FROM table_1, table_2 WHERE table_1.key_column = table_2.key_column

For example, to select books with the corresponding author, you must join the Books table to the Authors table using the author_id as the key column:

 SELECT books.book_name, authors.author_name FROM books, authors WHERE books.author_id = authors.author_id

The term subquery was also introduced in the discussion of the WHERE clause. Using a subquery is necessary when a search is being performed and there is no direct relationship between the tables. To search for all the authors of a particular publisher, you can use the following SELECT command:

 SELECT authors.author_name FROM authors, books WHERE books.author_id = authors.author_id AND books.book_id IN ( SELECT publishers.book_id FROM publishers, books WHERE publishers.publisher_name = 'MY_PUBLISHER' AND books.publisher_id = publisher.publisher_id )

Some additional keywords and clauses are used with the SELECT command. The two most common are the COUNT keyword and the ORDER BY clause. The COUNT keyword returns the number of rows that matched the search without returning the actual data. To find out how many books are in your database, you can use this SELECT command:

 SELECT count(*) FROM BOOKS

To have the results sorted, you use the ORDER BY clause with the SELECT command. To sort all the authors in the database by name, use the following SELECT command:

 SELECT * FROM authors ORDER BY author_name

The performance of your application is greatly influenced by how well you create your queries. You should make your WHERE clauses specific enough to return only the data in which you are interested. Do not select the entire table and parse through it on the client side; let the database server do some of the work for you. It is designed to be more than just a storage place for data. The performance issue is the primary reason to become knowledgeable about SQL database programming.

For a final example, you will perform a correlated subquery to select the five books from the database with the highest rating. With a correlated subquery, an entire SELECT statement is used in the WHERE clause of an outer SELECT . The difference here is that the subquery is now specific to each row in the outer query. This is a far more efficient way to search the database because much less data is being passed from the server to the client. Here, we will show how a well thought-out WHERE clause forces the database server to do most of the work, and your client receives only the data that it really wants.

The Categories table has the rating column, with additional columns to allow join relationships for the book and author. To select the book_name for the five highest rated books, you use the following:

 SELECT b.book_name, a.author_name FROM authors a, books b, categories c WHERE c.book_id = b.book_id AND c.author_id = a.author_id AND 5 >= ( SELECT count(*) FROM categories cc WHERE cc.rating >= c.rating );

SQL Transactions

The action to insert or update data into the database often requires multiple SQL statements. For example, you want to insert all the books for a particular author. The concept is referred to as a transaction , which is a group of SQL commands that either complete as a group or fail as a group. The SQL commands to control transactions are COMMIT and ROLLBACK .

By default, the database is in auto-commit mode, which means each SQL command is considered a transaction and is auto-committed. If auto-commit is disabled, the database server waits for an explicit COMMIT command before it modifies any entries in the database. If a ROLLBACK command is issued, all SQL commands since the last COMMIT are discarded. In JDBC, commit and rollback are performed with the Connection object.

The Basics of Database Programming in Java

Understanding Relational Databases

Designing the Data Model

Figure 13.1. The one-to-one relationship is represented between entities where one column value in a table relates to only one column value in another table.

Figure 13.2. The one-to-many relationship requires two tables that share a common key, in this case, the author_id .

Figure 13.3. The cross-reference table contains the keys of the tables in a many-to-many relationship.

Understanding SQL

SQL Datatypes

Table 13.1. The SQL Type “to “Java Type Mapping Table

SQL Commands

Table 13.2. Common Column Modifiers

SQL Transactions

Figure 13.2. The one-to-many relationship requires two tables that share a common key, in this case, the `author_id` .