JDBC Performance Techniques | Practical Java Game Programming (Charles River Media Game Development)

It’s hard to give specific advice on such a vast and complex topic such as databases. Things really depend on the particular project, the requirements, budget, and so on. Before specific techniques are discussed, a few general factors must be considered first, all of which can affect database performance dramatically. The following questions should be considered up front:

Frequency: How many requests per minute/hour will users make? How large is the user group accessing the database? As the number of users increases, so will the number of queries sent to the database. More queries means more compiles on the database server, more disk access, more network traffic, and so on.

Size: How big will the tables get? Large database tables mean slower query execution times, because there is more data to scan. Indexing will help and is highly recommended for tables that are frequently accessed or of significant size.

Complexity: How complex are the relationships between the data? In other words, how difficult will it be for the database to find a path to the data requested? Complexity in the data structure will obviously slow down queries. Indexing can help out here as well.

We should all proactively seek to take advantage of any features (in the database, in Java, in the operating system, or from anywhere) that might help out or speed up development time or runtime performance. Many databases will perform optimizations automatically, without any prompting or extra configuration, but relying solely on the database, the drivers, the garbage collector, and the like to keep your application running clean is not a good idea. Resources should be managed responsibly and efficiently.

Efficient Runtime JDBC Access

Database performance can be excellent, but it is always faster to get data if you know exactly where it is and how it is organized. Where databases really shine is in separating the data storage from the data access, which increases flexibility and ease of development and maintenance. Commercial database technology has been around a long time, and the optimization techniques are good, allowing efficient and high-performance database use without having to know the specifics of the database storage mechanism. You can expect a good commercial-quality database to be fast and support many concurrent users and countless queries. Then again, even the biggest, fastest database will perform dreadfully if it’s not accessed properly and efficiently. For example, submitting a large number of queries in a short amount of time could cripple the database. Equally, a small number of queries could do the same given that they force the database to scan through very large amounts of data. Every effort must be made to lift any extra burden off the database. Indexing is one of the most powerful means of reducing the time it takes to search through records. Using well-formed SQL statements, which efficiently access the data, is of utmost importance. Although it is counterproductive to form queries so complex that nobody can read them, submitting a single query for every piece of data is overkill. It’s probably best to find a happy medium. Minimize the number of trips to the database to reduce the number of tasks the database is dealing with at any given point in time. For example, it is always better from a performance point of view—although not always from a memory-conservation point of view—to retrieve larger amounts of data less often, rather than submitting a large number of queries for small amounts of data. For example, it’s almost always better to do the following:

SELECT * FROM TABLE

than this:

SELECT COL1 FROM TABLE SELECT COL2 FROM TABLE SELECT COL3 FROM TABLE SELECT COL4 FROM TABLE

Joins

Joins are operations that merge data from different tables into one context. This operation allows data from various tables to be combined into an efficient, single query and resultant data set. Doing so eliminates the need to query each table independently and allows more of the logic to be performed inside of the RDBMS engine, where it can be optimized for data access, rather than evaluate the values of the data in the Java code. From the application’s perspective, complex sets of data can be treated as if they were a single structure. Meanwhile, the SQL (or database access code) acts as a facade, making the interface to the database much simpler. To illustrate this, see Figure 9.3. The figure depicts two sets of data from two tables, one to the left, and one to the right.

image from book
Figure 9.3: Two sets of data from two tables.

There are a few different strategies for joining data. Using the previous diagram as a reference, the types of Joins can be summarized as follows:

Inner Joins: Return all the records from the left and right table that match the WHERE clause.

Left Outer Joins: Return all the records from the left, but only those on the right that match the WHERE clause.

Right Outer Joins: Return all the records from the right, but only those on the left that match the WHERE clause.

Full Joins: Return all records from both tables.

The syntax may differ among databases, but should be close to the following format:

SELECT column1, column2 FROM table1,table2 WHERE ...

Query Plan Caching

The database will always seek the quickest and most efficient path to the data because the RDBMS knows exactly how the data is organized. Caching query plans is one way the database can really save time. When a SQL statement is submitted, the database must:

Parse the SQL query
Build a plan/path to the data
Compile the plan
Execute the plan
Return the results

That’s a lot of work for the database! Fortunately, most queries are used repeatedly, and they can be cached. There’s no need for the database to keep compiling the same plan over and over. To take advantage of caching features, use the java.sql.PreparedStatement object. As the name implies, the statement is prepared and remains ready and set for repeated use, eliminating steps 1, 2, and 3 from the previous list.

String insertSQL = "INSERT INTO player VALUES(?,?,?)"; PreparedStatement pstmt = conn.prepareStatement(insertSQL);

The SQL INSERT has now been parameterized. Player information can now be inserted repeatedly by reusing the PreparedStatement pstmt object. The rest of the operation is done as follows:

pstmt.setInt(1, 4); pstmt.setString(2, "Joe"); pstmt.setInt(3, 100); pstmt.setInt(4, 2);

When all of the parameters have been set, the statement can be executed:

pstmt.execute();

As mentioned earlier, some caching may automatically be performed by the database, with or without your knowledge or intervention. For example, the MySQL Query Cache allows caching of query responses themselves. The database administrator can typically adjust the type (caching algorithm), size (memory usage), and limit (max result size) of the cache. When a client submits a cached query, the value in the cache, rather than the query execution, is returned. When new data is written to the tables, the cached queries are removed from cache. This behavior keeps the important, commonly accessed data in memory, allowing fast access to that data. SQL statements should also be executed in batches whenever possible. Doing so further minimizes trips to the database and excessive use of database resources, as well as disk access. The tradeoff for this improved performance is increased memory usage, so it is important to balance the two for your specific needs.

Stored Procedures

If a significant amount of logic must be applied to the data in the database, stored procedures can be of great help and improve performance of the overall system. Stored procedures are basically functions or logic that get executed on the database server. Therefore, certain tasks can be removed from the game servers and clients and moved to the database servers, freeing up valuable resources on the client and server machines. Unfortunately, not all databases support stored procedures. Stored procedures are similar to methods and functions in Java. They can accept parameters, execute arbitrary logic, and, if the database server supports it, return results. The major difference is that these functions are executed inside the database and not on the JVM.

The language and syntax used for writing stored procedures varies across databases. There is no standard, and depending on the database, stored procedures may be written in one of many languages. Thus, migrating stored procedures to another database may be difficult, depending on how they’re written. You should be committed to a database before developing a significant amount of stored procedures, because they are typically not compatible with other databases. Stored procedures may be written using Java, SQLJ, C, and other languages but are most often written using some form of SQL (TSQL, PL/SQL, and so on). SQL and TSQL are preferred because they are much more intuitive languages for operating on data.

MySQL supports stored procedures in version 5.0, which was not available at the time of this writing but is on the way. However, some documentation is already available to illustrate the syntax and use of stored procedures in MySQL:

CREATE PROCEDURE foo([IN] x INT, INOUT y INT, OUT z FLOAT) ...body... CREATE FUNCTION bar(x INT, y CHAR(8)) RETURNS CHAR(16) ...body with RETURN... CALL foo(1, 2, var) SELECT * FROM table WHERE x = bar(y, "foo") DROP PROCEDURE foo

Calling the stored procedure from Java will look something like the following:

CallableStatement callStmt = null; callStmt = conn.prepareCall("{ call function (?, ?) }"); callStmt.setInt(1, intVal); callStmt.setString(2, stringVal); callStmt.execute(); callStmt.close();

The DAO Pattern

The data access object (DAO) design pattern is popular and useful. At a high level, the DAO pattern is a framework or template used to decouple the business logic from persistence logic. DAOs abstract and encapsulate database access, the outcome of which is an object-relational mapping. This mapping (between Java objects and database tables) can be one-to-one, one-to-many, or many-to-many. Each DAO implements a common interface for accessing data, typically implementing read() and write() functions, or something to that effect. This wrapper layer of code allows the database-specific operations to be performed independently of the application logic.

Benefits of using DAO pattern follow:

Simplifies coding
Decouples business logic from persistence logic
Provides transparent access to data
Has a minimal schedule impact if migration is necessary

The Java 2 Enterprise Edition (J2EE) provides something similar to DAOs. Enterprise JavaBeans (EJB) achieves a similar end. In fact, there is tons of functionality in J2EE that could be leveraged by the game-development community. However, the J2EE platform is probably too much baggage for most games. It really depends on the project. EJBs are different in that they are managed in a container environment, which handles the creation and invocation of the beans, as well as the database access, transaction management, failover, clustering, and much more. All the developer must do is define the beans, configure and deploy them, and that’s it. This is great! However, the disadvantage of using EJBs is that in their container-managed form, they aren’t that extensible and may not perform as optimally as a custom solution. On the other hand, the J2EE is perfect for massive, distributed, online games that must manage vast amounts of resources reliably and securely. The J2EE has been proving itself every day in the enterprise for years in large-scale, mission-critical applications.

The Java Data Object (JDO) API is another example of a technology that uses the DAO pattern. Using JDO API is simpler than using EJBs, but it provides the foundation for using an abstract layer between application code and the database itself. Several commercial implementations of the JDO API exist.