10.2 Batch Processing
The technique of combining many small operations together into one large batch has a proud heritage in computing, and dates all the way back to the time when jobs were submitted to computers in the form of punched cards. You will be pleased to know that SQLJ has similar facilities for batching database operations.
You can get SQLJ to buffer identical SQL INSERT, UPDATE, or DELETE statements in your SQLJ programs into a batch, which is then submitted to the database. This reduces the number of round trips made to the database over the network, and may lead to an improvement in performance.
| A SQLJ program may only take advantage of batching when the same SQL statement is run repeatedly. |
|
One candidate for batching is a SQL statement contained within a loop. Another is a method that is run repeatedly and contains a SQL statement. In this section, I explain how to control and run batches in your SQLJ programs.
10.2.1 Controlling Batching
Batching is a feature you control through an execution context. (Execution contexts were discussed in Chapter 8.) By default, batching is not enabled. To enable batching for an execution context, call the method setBatching( ). This method accepts a Boolean true or false value indicating whether or not batching should be enabled. The following statements first create an execution context named my_exec_context, and then enable batching for that context:
ExecutionContext my_exec_context = new ExecutionContext( ); my_exec_context.setBatching(true);
With batching enabled, SQLJ accumulates statements into groups and executes those statements a group at a time. In the following example, 500 rows are inserted into a table named batch_table using the execution context my_exec_context. The table batch_table contains a single integer column named value:
for (int count=0; count < 500; count++) { #sql [my_exec_context] { INSERT INTO batch_table (value) VALUES (:count) }; }
Because batching has been enabled for my_exec_context, the INSERT statements in this example are grouped in batches. As each batch fills, it is submitted to the database for execution. By default, each batch may contain up to 20 identical database operations. So for the example shown here, there will be 25 batches, with each batch containing 20 INSERT statements. The 25 batches are executed implicitly, one after another, after each batch reaches its limit of 20 identical statements. I will talk more about the execution of batches in the next section.
Batching may also be enabled for the default execution context. For example:
DefaultContext.getDefaultContext().getExecutionContext( ). setBatching(true);
To disable batching, simply call the setBatching( ) method passing the Boolean value false to the method. The following statement disables batching for my_exec_context:
my_exec_context.setBatching(false);
Similarly, the following statement disables batching for the default execution context:
DefaultContext.getDefaultContext().getExecutionContext( ). setBatching(false);
To determine if batching is enabled, use the isBatching( ) method. This method returns a Boolean true or false value indicating whether or not batching is enabled for the execution context. The following statement retrieves the status of batching for the execution context my_exec_context:
boolean is_batching = my_exec_context.isBatching( );
Similarly, the following statement retrieves the status of batching for the default execution context:
boolean is_batching = DefaultContext.getDefaultContext( ). getExecutionContext().isBatching( );
10.2.2 Running Batches
Batches of SQL statements may be run explicitly or implicitly. In the example in the previous section, the batches of INSERT statements were executed implicitly once each batch was filled. You can also run the current batch of statements explicitly by using the execution context's executeBatch( ) method; this submits the statements buffered so far in the batch to the database for execution. For example, the following statement explicitly runs the current batch for my_exec_context:
int [] batch_update_results = my_exec_context.executeBatch( );
The executeBatch( ) method returns an array of int values that indicate whether the statements in the batch were run successfully, as indicated by a value of -2. An array in Java starts with an index of 0, so the value at array index corresponds to the first SQL statement in the batch. The following example calls the executeBatch( ) method for the default execution context:
int [] batch_update_results = DefaultContext.getDefaultContext( ). getExecutionContext().executeBatch( );
As I mentioned before, a batch may consist only of identical SQL statements. If a batch has been started and a different SQL statement is encountered, then that batch is run implicitly. A batch is also run implicitly when the batch is full. By default, each batch may contain up to 20 identical SQL statements. However, you'll soon see how to change that batch size to match your requirements.
The following example shows how batches are run implicitly. Assume that batching is enabled for my_exec_context and that the default batch size of 20 statements is in effect. Also notice that the loop in the example contains an INSERT statement:
for (int count=0; count < 10; count++) { #sql [my_exec_context] { INSERT INTO batch_table (value) VALUES (:count) }; }
In the first iteration of the loop, the INSERT statement causes a new batch to be created. Each subsequent iteration of the loop results in an additional INSERT statement being added to the batch. The batched statements are not run until the entire batch is run. Therefore, at the end of the loop, because the batch has not been filled with 20 statements, there will be 10 INSERT statements in the batch waiting to be run.
When a SQL statement is encountered that is different from the SQL statements stored in the current batch, the current batch is implicitly executed, and a new batch created to store the new SQL statement. For example, if an UPDATE statement followed the loop in the example, the current batch containing the 10 INSERT statements would be run implicitly, and a new batch would be created to store the UPDATE statement.
10.2.3 Setting Batch Limits
When the number of SQL statements in a batch reaches the limit, the batch is implicitly executed. By default, the batch limit is set to 20 statements, but you may specify a different upper limit using the setBatchLimit( ) method. For example, the following statement sets the batch limit to 5 for the execution context my_exec_context:
my_exec_context.setBatchLimit(5);
In addition to a number specifying a batch limit, there are two constants that may also be passed to the setBatchLimit( ) method:
- AUTO_BATCH
-
Means that SQLJ determines the best upper limit.
- UNLIMITED_BATCH
-
Means that there is no batch limit.
These constants are defined in the class sqlj.runtime.ExecutionContext, which you must import if you want to use them. The following example uses the AUTO_BATCH constant in a call to setBatchLimit( ):
my_exec_context.setBatchLimit(ExecutionContext.AUTO_BATCH);
In addition to specifying a batch limit, you can use the getBatchLimit( ) method to return the current batch limit for an execution context. For example, the following statement gets the current batch limit for my_exec_context:
int batch_limit = my_exec_context.getBatchLimit( );
Similarly, this statement gets the batch limit for the default execution context:
int batch_limit = DefaultContext.getDefaultContext( ). getExecutionContext().getBatchLimit( );
10.2.4 Example Program: PerformanceExample2.sqlj
This section contains a complete program, PerformanceExample2.sqlj (Example 10-2), that illustrates how you can use batching. In your own programs, you can use both row prefetching and statement batching together to improve performance by reducing the number of round trips made to the database over the network. Remember that you gain an advantage from prefetching when retrieving large numbers of rows from the database, while batching improves performance when issuing identical SQL statements. This example program performs the following steps:
-
Connects over a network to a database running on another machine. You may need to change the name of the machine specified in the connect( ) method to get the program to run in your environment.
-
Creates a table, batch_table, containing a single integer column named value.
-
Displays the status of batching using the isBatching( ) method.
-
Inserts 2,000 rows into batch_table and displays the total time in milliseconds taken to insert the rows. This is done using the program's insertRows( ) method.
-
Enables batching using the setBatching( ) method.
-
Sets the batch limit to 10 using the setBatchLimit( ) method. The batch limit is then displayed using the getBatchLimit( ) method.
-
Inserts another 2,000 rows into batch_table using the insertRows( ) method. This time, the INSERT statements in the insertRows( ) method are batched together and sent to the database.
-
Explicitly executes the last batch using the executeBatch( ) method.
-
Displays the time in milliseconds taken to insert the rows when batching was enabled.
Example 10-2. PerformanceExample2.sqlj
/* The program PerformanceExample2.sqlj illustrates how to use SQL statement batching to improve performance. */ import java.sql.*; import oracle.sqlj.runtime.Oracle; import sqlj.runtime.ref.DefaultContext; import oracle.jdbc.driver.OracleConnection; import sqlj.runtime.ExecutionContext; public class PerformanceExample2 { public static final int ROWS = 2000; public static final int BATCH_SIZE = 10; public static void main(String [] args) { try { // connect to a database on a remote machine Oracle.connect( "jdbc:oracle:thin:@remotehost:1521:orcl", "fundamental_user", "fundamental_password" ); // create a temporary table #sql { CREATE TABLE batch_table ( value INTEGER ) }; // display the status of batching boolean is_batching = DefaultContext.getDefaultContext( ). getExecutionContext().isBatching( ); System.out.println("is_batching = " + is_batching); insertRows( ); // set batching to true for the default execution context DefaultContext.getDefaultContext().getExecutionContext( ). setBatching(true); // display the status of batching is_batching = DefaultContext.getDefaultContext( ). getExecutionContext().isBatching( ); System.out.println("is_batching = " + is_batching); // set the batch limit to BATCH_SIZE and display the batch limit System.out.println("Setting the batch limit to " + BATCH_SIZE + "."); DefaultContext.getDefaultContext( ). getExecutionContext( ).setBatchLimit(BATCH_SIZE); int batch_limit = DefaultContext.getDefaultContext( ). getExecutionContext().getBatchLimit( ); System.out.println("batch_limit = " + batch_limit); insertRows( ); // explicitly execute the last batch int [] batch_results = DefaultContext.getDefaultContext( ). getExecutionContext().executeBatch( ); System.out.println("batch_results.length = " + batch_results.length); for (int count = 0; count < batch_results.length; count++) { System.out.println("batch_results[" + count + "] = " + batch_results[count]); } // drop the batch table #sql { DROP TABLE batch_table }; Oracle.close( ); } catch (SQLException e) { System.err.println("SQLException " + e); System.exit(1); } } // end of main( ) private static void insertRows( ) throws SQLException { long start_time = System.currentTimeMillis( ); for (int count = 0; count < ROWS; count++) { #sql { INSERT INTO batch_table (value) VALUES (:count) }; } long end_time = System.currentTimeMillis( ); System.out.println("Total time for inserting " + ROWS + " rows was " + (end_time - start_time) + " milliseconds."); } // end of insertRows( ) }
The output from this program is as follows:
is_batching = false Total time for inserting 2000 rows was 4977 milliseconds. is_batching = true Setting the batch limit to 10. batch_limit = 10 Total time for inserting 2000 rows was 1312 milliseconds. batch_update_results.length = 10 batch_update_results[0] = -2 batch_update_results[1] = -2 batch_update_results[2] = -2 batch_update_results[3] = -2 batch_update_results[4] = -2 batch_update_results[5] = -2 batch_update_results[6] = -2 batch_update_results[7] = -2 batch_update_results[8] = -2 batch_update_results[9] = -2