Connections

The traditional and most popular way to connect via JDBC is with the DriverManager class, for example:

 Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); con = DriverManager.getConnection(       "jdbc:odbc:ocelot",                 // URL       "PETER",                            // user       "SESAME");                          // password

In fact, though, there is a lower accessible level. When you call get Connection , each registered driver is polled until the first one that responds for the given Uniform Resource Locator (URL) is found. This method is used in the Driver class:

 con = myDriver.connect(       "jdbc:odbc:ocelot",       "UID=PETER;PWD=SESAME");

You can bypass DriverManager , and call the Driver connect method directly. If there are 50 JDBC registered drivers and you want the last one registered, a direct call will be a bit faster.

The newerand recommendedway to connect via JDBC is with the DataSource class, for example:

 DataSource ds = (DataSource)ctx.lookup("jdbc/ocelot"); Connection con = ds.getConnection("PETER","SESAME");

For this to work, the name "jdbc/ocelot" must be registered with a Java Naming and Directory Interface (JNDI). Because JNDI naming is probably somebody else's job, and because it doesn't always work, try the DataSource method first, and if that throws an exception, then call the DriverManager method.

DataSource and DriverManager both deliver the same thing: a Connection . The difference between the two is that the DataSource method allows for connection pooling.

Connection Pooling

Assume you have a Java application and a service named DM . DM is always running. Here's what happens with connection pooling . ^[2]

^[2] Sometimes connection pooling also happens at the middle-tier level (for example, IBM WebSphere implements it)but middle- tier connection pooling isn't part of JDBC 2.0.

The application calls for a connection. DM receives the call. DM gets a Connection to the Data Source the usual way (by calling Driver . connect ), and returns it to the application (see Figure 13-1). But first DM stores a copy of the Connection in its own memory.

Figure 13-1. Application, DM, and data source linked
The application calls disconnect. DM receives the call, and returns an OK to the application. But DM doesn't disconnect from the Data Source (see Figure 13-2).

Figure 13-2. Application and DM connection broken
The application calls for a connection again. DM receives this call. DM sees that an extant Connection already is in its memory, so instead of calling Driver.connect , DM returns the existing Connection to the application.

To make it complicated, there's one more wrinkle: the application in the third step can be a different application. That is, Application #2 can reuse a Connection after Application #1 has stopped . In other words, DM keeps a "pool" of Connection s rather than a mere "cache" of Connection s.

The concept of connection pooling brings up two questions:

Does this mean that a Connection with the data source is immortal?

AnswerNo. In practice, a Connection is disconnected if it is dormant for a long time.
If there's more than one Connection , how will DM get the right one?

Answer DM stores the URL and the user strings along with the Connection .

Suppose connecting takes three secondswhich is awful but within the range of possibilities. In that case, connecting 100 times takes 300 seconds. But with connection pooling, connecting 100 times takes only 3 seconds, as long as there was no true disconnection.

You could implement connection pooling yourself, by caching Connection s in some persistent shared place (perhaps an Enterprise Bean ). Really, though, it's better to let the system handle the administrative details. So your sole job with connection pooling is:

Allow it to happen by using the right classes.
If you connect and disconnect and reconnect , use precisely the same parameter values every time.
If you know that a particular Connection will be needed only once, avoid connection pooling.

Pooling is good for Web use, but don't expect Connection settings to be preserved.

Connection Settings

A new Connection has some fields that are set to default values. These default values are often inappropriate for the sort of work planned, so JDBC provides two Connection methods that you can use to change inappropriate settings:

setAutoCommit

Changes the auto-commit flag.
setTransactionIsolation

Raises or lowers the transaction isolation level.

Auto-commit

Assume you have a Connection called con . Now consider this instruction:

 con.setAutoCommit(false);

In JDBC, the auto-commit flag is true by default. There is a simple explanation of what that means, and there is also an arcane explanation. While the simple explanation serves well as a first take, the arcane explanation is truer and contains extra information that you need to know.

The Simple Rules for Auto-Commit:

If auto-commit is true , COMMIT happens automatically after every SQL statement. If auto-commit is false , it doesn't.
The Arcane Rules for Auto-Commit:

The arcane explanation is shown as pseudocode in Listing 13-2. In the pseudocode, note that "method completion" means either successful or unsuccessful completion. In theory, this means COMMIT happens just be fore the return from executeUpdate ; in practice, some drivers issue an asynchronous COMMIT request and return before the request finishes.

Listing 13-2 The arcane rules for auto-commit pseudocode

 if (auto-commit is true) {   if (method is executeUpdate) {     COMMIT happens upon method completion }   if (method is executeQuery) {     // NB: "closed" is defined later in this chapter     COMMIT happens when result set is closed }   if (method is execute) {     action varies; depends on SQL statement(s) } } if (auto-commit is false) {   if (schema-change statement e.g. CREATE/ALTER/DROP) {     COMMIT happens upon statement completion, maybe }   else {     nothing happens } }

A default setting of true for auto-commit is not SQL Standard-compliant, and some vendors recommend that you set the flag to false . Here's their reasoning:

COMMIT causes transaction end, and with some DBMSs, a transaction end causes prepared statements to become unprepared. For example, this sequence won't work with PostgreSQL:

 PreparedStatement pstmt = con.prepareStatement(                           "UPDATE Table1 SET ..."); pstmt.executeUpdate();  // COMMIT happens automatically pstmt.executeUpdate();  // fails, pstmt no longer prepared

Portability

None of the Big Eight cancels prepared statements when you COMMIT.

COMMIT is expensive, so when you have two or more data-change statements it's always faster to delay the COMMIT until after the final statement finishes.
Auto-commit is inappropriate for distributed transactions. (Distributed transactions occur when two DBMSs are trying to work in concert.)

On the other hand, if auto-commit is false , you must explicitly call the Connection 's commit methodand that's significant because it causes a message to be sent over the network.

If you want to prevent commits from happening too soon, and yet get the performance advantage of auto-commit, here's how.

First, read the arcane rules (Listing 13-2) again and notice that the COMMIT happens on method completion. This means you should do multiple data changes with a single method call and a compound SQL statement, like this:
```
 Statement stmt = con.createStatement(                  "BEGIN UPDATE ... UPDATE ... END"); stmt.executeUpdate(); 
```

Second, remember that the setAutoCommit method can be turned off and on at any time, like this:

 con.setAutoCommit(false);  // turn auto-commit off stmt.executeUpdate();      // do statement, delay COMMIT con.setAutoCommit(true);   // turn auto-commit on stmt.executeUpdate();      // do statement, do COMMIT

The bottom line: People who don't understand the arcane rules should turn auto-commit off and use con.commit() . Now that you know the rules, though, you can control when commits happen without using con.commit() .

Isolation level

Again, assume you have a Connection called con . This time, consider these four variations of the same instruction:

 con.setTransactionIsolation(     TRANSACTION_READ_UNCOMMITTED); con.setTransactionIsolation(     TRANSACTION_READ_COMMITTED); con.setTransactionIsolation(     TRANSACTION_REPEATABLE_READ); con.setTransactionIsolation(     TRANSACTION_SERIALIZABLE);

The appropriate isolation setting for a transaction might be something higher or lower than READ COMMITTED, but there is one definite matter: You have some idea of what your program is going to do, but the DBMS has no idea. So the DBMS's default choice of isolation level is probably a poorer choice than the one you'd make. (The choice of isolation level, and the related matter of read-only transactionswhich is controllable with the con.setReadOnly methodis one we consider in Chapter 15,"Locks.")

Connections and DBMS Info

Given a Connection con , you can get an object dbmd with:

 DatabaseMetaData dbmd = con.getMetaData();

The term "database metadata" is misleading because your first need is static information about the DBMS not the database. For example, suppose that you want to create a table with a VARCHAR columnbut if the DBMS is Oracle, you want it to be a VARCHAR2 column instead. So you get the dbmd information before you pick the command, like this:

 String s = dbmd.getDatabaseProductName();       // get DBMS name   if (strcmp(s,"Oracle") {   Statement stmt = con.createStatement(                    "CREATE TABLE T1 col1 VARCHAR2(10) ... "); }   else {   Statement stmt = con.createStatement(                    "CREATE TABLE T1 col1 VARCHAR(10) ... "); }

Tip

Since this is static information, you can retrieve String s once, when you connect, and keep it in memory until you disconnect. Caching static information, so that you don't have to send more than one query to the DBMS for the same information, is always good.

There are well over 80 dbmd methods to retrieve DBMS information. That's daunting. To save you from worrying about all of them, we ran the most important methods on the Big Eight. The results are shown in Tables 13-1 through 13-5. We believe these DBMS information tables will be useful because they show you what the least common denominators/worst-case scenarios are for the dbmd methods we checked.

To keep the DBMS information tables compact, we abbreviated the headings.

Table 13-1 contains 26 "get" callsthat is, where the heading shows the call as CatalogSeparator , the actual name is getCatalogSeparator .
Table 13-2 contains four "nulls" callswhere the heading shows the call as AreSortedAtEnd , the actual name is nullsAreSortedAtEnd .
Table 13-3 contains six "stores" callswhere the heading shows the call as LowercaseIdentifiers , the actual name is storesLowercaseIdentifiers .
Table 13-4 contains 41 "supports" callswhere the heading shows the call as AlterTableWithAddColumn , the actual name is supportsAlterTableWithAddColumn .
And finally, Table 13-5 contains one "is" callthe actual name of the call is isCatalogAtStart , as shown.

In each table, a column is (a) "T" if the DBMS's response to the call is true , (b) "F" if the DBMS's response to the call is false , and (c) remains blank if the DBMS either returned an empty string or did not respond to the call. For Table 13-1, a "0" in a column means there is either no limit, or the DBMS did not return a known limit for that call.

Table 13-1. DBMS Information According to DatabaseMetaData Methods; "get" Calls

Actual Name Starts with "get"
get Call	IBM	Informix	Ingres	InterBase	Microsoft	MySQL	Oracle	Sybase
CatalogSeparator	.	.	.	.	.	.	@	.
CatalogTerm		D			d	d	DL	d
DefaultTransactionIsolationLevel	R-C	R-C	S	R-C	R-C	R-C	R-C	R-R
IdentifierQuoteString	"	"	"	"	"	'	"	"
MaxCatalogNameLength		128			128			30
MaxCharLiteralLength	4000				524288
MaxColumnNameLength	18	128	32	32	128	64	30	30
MaxColumnsInGroupBy	500		300					16
MaxColumnsInIndex	16	16	300		16	16		31
MaxColumnsInOrderBy	500		300					16
MaxColumnsInSelect	500		300		4096		1000
MaxColumnsInTable	500		300		1024		1000	8106
MaxConnections
MaxCursorNameLength	18	128	64	18	128	64	30	18
MaxIndexLength	255	255			900	120
MaxProcedureNameLength	256	128	32	32	134		30	30
MaxRowSize	4005	32767	2008		8060			8106
MaxSchemaNameLength	8	32	32		128		30	30
MaxStatementLength	32765		30000		524288	8192
MaxStatements			1		1			1
MaxTableNameLength	18	128	32	32	128	64	30	30
MaxTablesInSelect			30		32	32
MaxUserNameLength	128	32	32	128	128	16	30	30
ProcedureTerm	sp	Pr	dp	PR	sp	p	Pr	SP
SchemaTerm	s	O	U		o	o	O	o
SearchStringEscape	\	\	\	\	\	\	\	\

To use the tables, check for the appropriate call when you're constructing a SELECT statement. For example, if this is your SELECT:

 SELECT column1, SUM(column2), column3   FROM Table1   GROUP BY column1, column3   ORDER BY column1, column3, column2

check these three lines in the DBMS information tables:

 getMaxColumnsInGroupBy         // see Table 13-1 getMaxColumnsInOrderBy         // see Table 13-1 supportsGroupBy                // see Table 13-4

Checking the getMaxColumnsInGroupBy and GetMaxColumnsInOrderBy calls will reassure you that all DBMSs allow at least 16 columns in GROUP BY and ORDER BY clausesso you'll know the SELECT doesn't have too many. And checking the supportsGroupBy call will tell you that all DBMSs allow GROUP BYso GROUP BY in the SELECT is always legal. With this information, you can save yourself the bother of writing special code in case the DBMS rejects your syntax. Caveat: Sometimes a DBMS will return incorrect information. For example, here is a method to ask about standard SQL-92 conformance:

 boolean b = dbmd.supportsANSI92EntryLevelSQL();

MySQL will return true for this call, but the correct answer is false because MySQL doesn't support several SQL-92 Entry Level requirements. Here's another example: Sybase returns false for every one of the stores calls shown in Table 13-3. This is obviously incorrectSybase must store identifiers in some way!

Notes on Table 13-1:

getCatalogTerm row

This row is "d" if the response is " database ,""DL" if the response is " Database Link ," and "D" if the response is " Database ." A blank means the DBMS did not respond to the call.

getDefaultTransactionIsolationLevel row

The following abbreviations are used for the isolation levels:

 R=U for READ UNCOMMITTED / IsolationLevel(1) R=C for READ COMMITTED   / IsolationLevel(2) R=R for REPEATABLE READ  / IsolationLevel(4) S for   SERIALIZABLE     / IsolationLevel(8)

getMaxColumnsInGroupBy row
- For Informix, InterBase, Microsoft, MySQL, and Oracle, our tests showed it was possible to group at least 20 columns.
- For Sybase, our tests showed it was possible to group at least 20 columns. This differs from Sybase's response to getMaxColumnsInGroupBy , which returns 16 .
getMaxColumnsInOrderBy row
- For InterBase and Oracle, our tests showed it was possible to sort up to 254 columns.
- For Informix, Microsoft, and MySQL, our tests showed it was possible to sort at least 1,000 columns.
- For Sybase, our tests showed it was possible to sort up to 31 columns. This differs from Sybase's response to getMaxColumnsInOrderBy , which returns 16 .
getProcedureTerm row

This row is "sp" if the response is " stored procedure ," "dp" if the response is " database procedure ," "PR" if the response is " PROCEDURE ," "p" if the response is " procedure ," "Pr" if the response is " Procedure ," and "SP" if the response is " Stored Procedure ."
getSchemaTerm row

This row is "s" if the response is " schema ", "U" if the response is " Username ," "o" if the response is " owner ," and "O" if the response is " Owner ." A blank means the DBMS did not respond to the call.

Table 13-2. DBMS Information According to DatabaseMetaData Methods; "nulls" Calls

Actual Name Starts with "nulls"
nulls Call	IBM	Informix	Ingres	InterBase	Microsoft	MySQL	Oracle	Sybase
AreSortedAtEnd	F	F	F	F	F	F	F	F
AreSortedAtStart	F	F	F	F	F	T	F	F
AreSortedHigh	T	F	T	T	F	F	F	T
AreSortedLow	F	T	F	F	T	F	T	F

Notes on Table 13-2:

For InterBase, our tests showed that NULLs sort At Endthat is, NULLs come out at the end of a sorted list when you ORDER BY ... ASC as well as when you ORDER BY ... DESC . This differs from InterBase's response to (a) NullsAreSortedAtEnd , which returns false and (b) NullsAreSortedHigh , which returns true .
For MySQL and Sybase, our tests showed that NULLs sort Lowthat is, as if NULLs are less than all other values. In the first case, this differs from MySQL's response to (a) NullsAreSortedLow , which returns false and (b) NullsAreSortedAtStart , which returns true . In the second case, this differs from Sybase's response to (a) NullsAreSortedLow , which returns false and (b) NullsAreSortedHigh , which returns true .
For Oracle, our tests showed that NULLs sort Highthat is, as if NULLs are greater than all other values. This differs from Oracle's response to (a) NullsAreSortedHigh , which returns false and (b) NullsAreSortedLow , which returns true .

Table 13-3. DBMS Information According to DatabaseMetaData Methods; "stores" Calls

Actual Name Starts with "stores"
stores Call	IBM	Informix	Ingres	InterBase	Microsoft	MySQL	Oracle	Sybase
LowercaseIdentifiers	F	T	T	F	F	F	F	F
LowercaseQuotedIdentifiers	F	T	T	F	F	F	F	F
MixedCaseIdentifiers	F	F	F	F	T	T	F	F
MixedCaseQuotedIdentifiers	F	F	F	F	T	F	F	F
UppercaseIdentifiers	T	F	F	T	F	F	T	F
UppercaseQuotedIdentifiers	F	F	F	F	F	F	F	F

Table 13-4. DBMS Information According to DatabaseMetaData Methods; "supports" Calls

Actual Name Starts with "supports"
supports Call	IBM	Informix	Ingres	InterBase	Microsoft	MySQL	Oracle	Sybase
AlterTableWithAddColumn	T	T	F	T	T	T	T	T
AlterTableWithDropColumn	F	T	F	T	F	T	F	F
ANSI92EntryLevelSQL	T	T	T	T	T	T	T	T
ANSI92FullSQL	F	F	F	F	F	F	F	F
ANSI92IntermediateSQL	F	F	F	F	F	F	F	F
ColumnAliases	T	F	F	T	T	T	T	T
CoreSQLGrammar	T	T	T	T	T	T	T	F
CorrelatedSubqueries	T	T	T	T	T	F	T	T
DataDefinitionAndDataManipulationTransactions	T	F	T	T	T	T	T	F
DataManipulationTransactionsOnly	F	F	F	F	F	T	T	T
ExpressionsInOrderBy	T	F	F	F	T	T	T	T
ExtendedSQLGrammar	T	F	F	T	F	F	F	F
FullOuterJoins	F	F	F	F	F	T	F	F
GroupBy	T	T	T	T	T	T	T	T
IntegrityEnhancementFacility	T	F	F	T	T	F	F	T
LikeEscapeClause	T	T	T	T	T	T	T	F
MinimumSQLGrammar	T	T	T	T	T	T	T	T
MixedCaseIdentifiers	F	T	T	F	F	F	T	T
MixedCaseQuotedIdentifiers	T	T	T	T	F	T	F	F
nonNullableColumns	T	T	T	T	T	T	T	T
OpenCursorsAcrossCommit	T	F	F	T	F	T	T	F
OpenCursorsAcrossRollback	F	F	F	T	F	T	T	F
OrderByUnrelated	F	T	T	F	F	F	F	F
OuterJoins	T	T	T	T	T	T	T	T
PositionedDelete	T	T	T	T	T	T	T	T
PositionedUpdate	T	T	T	T	T	T	T	T
SelectForUpdate	T	T	T	T	T	F	F	T
StoredProcedures	T	T	T	T	T	F	T	T
SubqueriesInComparisons	T	T	T	T	T	F	T	T
SubqueriesInExists	T	T	T	T	T	F	T	T
SubqueriesInIns	T	T	T	T	T	F	T	T
SubqueriesInQuantifieds	T	T	T	T	T	F	T	T
TableCorrelationNames	T	T	T	T	T	T	T	T
TransactionIsolationLevel(0)	F	T	F	F	F	F	F	F
TransactionIsolationLevel(1)	T	T	T	F	T	F	F	T
TransactionIsolationLevel(2)	T	T	T	T	T	T	T	T
TransactionIsolationLevel(4)	T	F	T	T	T	F	F	T
TransactionIsolationLevel(8)	T	T	T	T	T	F	T	T
Transactions	T	F	T	T	T	F	T	T
Union	T	T	T	T	T	F	T	T
UnionAll	T	T	T	T	T	F	T	T

Table 13-5. DBMS Information According to DatabaseMetaData Methods; "is" Calls

Actual Name as Shown
is Call	IBM	Informix	Ingres	InterBase	Microsoft	MySQL	Oracle	Sybase
isCatalogAtStart	F	T	F	T	T	T	F	T

The Bottom Line: Connections

The traditional and most popular way to connect via JDBC is with the DriverManager class, but there is a lower accessible level: the Driver class. A direct call to the Driver class is faster if there are many registered drivers and you want one of the last ones registered.

The newerand recommendedway to connect via JDBC is with the DataSource class. DataSource and DriverManager both deliver the same thing: a Connection . The difference between the two is that DataSource allows for connection pooling.

If connecting takes 3 seconds, then connecting 100 times takes 300 seconds. With connection pooling, connecting 100 times takes only 3 seconds, as long as there was no true disconnection.

Your sole job with connection pooling is to allow it to happen by using the right classes. If you connect and disconnect and reconnect, use precisely the same parameter values every time.

If you know that a particular Connection will be needed only once, avoid connection pooling.

In JDBC, the auto-commit flag is true by default.

When auto-commit is true , COMMIT happens automatically after every SQL statement. When auto-commit is false , you must explicitly call the Connection 's commit methodand that's significant because it causes a message to be sent over the network.

The Arcane Rules for Auto-Commit are important. Remember them.

COMMIT is expensive, so when you have two or more data-change statements it's always faster to delay the COMMIT until after the final statement finishes.

Auto-commit is inappropriate for distributed transactions.

People who don't understand the Arcane Rules for Auto-Commit should turn auto-commit off and use con.commit() . Now that you know the rules, though, you can control when COMMIT happens without using con.commit() .

Driver Types

JDBC drivers are grouped into four types depending on their neutrality (a vendor-neutral driver can access more than one DBMS brand) and on their use of non-Java code. A driver's type is determined by the following criteria:

  Driver     Vendor-   Vendor-   Needs Non-Java     100%   Type       Neutral     Specific    DLL                Java  Type 1     Yes         -           Yes                - Type 2      -          Yes         Yes                - Type 3     Yes         -           -                  Yes Type 4      -          Yes         -                  Yes

A Type 1 driver, also called a JDBC-ODBC bridge, works with every DBMS. Its calls need to go through a conversion process: JDBC to ODBC to native API code. A famous example of a JDBC-ODBC bridge is Sun.jdbc.odbc.JdbcOdbcDriver.

A Type 2 driver is theoretically somewhat faster than a Type 1 driver because call conversion is more direct: JDBC to native API code. There are no well-known examples of a Type 2 driver.

A Type 3 driver, also called a net driver, comes in two parts : (a) a small Java driver that can be downloaded by applets and (b) a middleware package that receives from the driver, translates , and passes on to the server. Sometimes the middleware package uses a Type 1 or Type 2 driver. Famous examples of Type 3 drivers are Merant's SequeLink and IDS Software's IDS Driver.

A Type 4 driver, also called a two-tier driver, is locked into a particular DBMS brand, which the driver interfaces to at a low level by emulating what the DBMS's native client library passes in network messages. Famous examples of Type 4 drivers are Ashna's JTurbo and I-net's Sprinta 2000.

Different sources have noted that Type 1 involves a double conversionso its function overhead must be slow, while Type 4 is written entirely in Javaso it must be slower than native code, while Type 3 is smallestso its load time must be fast, and on and on. Such factors affect the client alone. Thus, while such factors could affect response time (the time elapsed between the user pressing Enter and the reply appearing on the screen), they do not affect throughput (the number of requests that the server handles per hour ).

Since throughput is what counts, it turns out that the driver type alone is not what performance depends on. For example, benchmark tests have found Sun's Type 1 JdbcOdbcDriver to be as fast and as reliable as some nonfree commercial drivers. However, performance does strongly depend on the brand name. The best JDBC driver runs up to three times faster than the worst. The reasons that one driver can beat another are (a) caching or pooling so that requests to the target DBMS are reduced and (b) good knowledge of the target DBMS and available commands for it. In other words, SQL performance tuning!

Cache static information, so that you don't have to send more than one query to the DBMS for the same information.