Online steps in a business process that run in less than a second are unlikely to significantly slow the end user who is performing the process. Even steps that take well over a second can often be made convenient to the end user if those parts of the process are taken offline and made into batch processes. The only business-driven need for queries to run very fast ”under a half a second, for example ”comes about when a single step in a business process requires queries to run repeatedly. Some application designs repeat a query hundreds of times for a single online event, or up to millions of times for a single batch process. In these cases, clearly, a query would have to run in a few milliseconds or less to meet an end user's needs, and most queries are not that fast, even after perfect tuning. When you run into this problem, the real issue is the need for so many queries, not the runtime of each query. There are three basic solutions to the need for repeated queries:
Before I discuss these three solutions, consider a typical repeated query, which retrieves the pricing history for a given product: SELECT Product_Price FROM Price_List WHERE Product_ID = :1 AND Effective_Date <= :today ORDER BY Effective_Date DESC; To find the best solution to such a repeated query, you need to consider several questions:
The next few sections describe three possible solutions and show how the answers to these questions can lead to one of those solutions. 10.1.1 Caching to Avoid Repeated QueriesIf the set of possible Product_ID s is small compared to the number of times the example query repeats, you have a good case to cache the whole set. For example, we might have 10 Product_ID s while repeating the query 1,000 times, so it makes more sense to precache results across the whole set and read from the cache in place of repeated queries to the database. For example, you could issue this query: SELECT Product_Price FROM Price_List WHERE Effective_Date <= :today ORDER BY Product_ID ASC, Effective_Date DESC; The cache could then hold these results ”for example, in a structure in application memory using hash buckets or a binary tree that gives rapid (microseconds) access to the results for a particular Product_ID . Since individual reads from the first approach outnumber the Product_ID s, this approach offers two advantages in terms of the cost of database access:
In the current example, the repeated query likely returns multiple rows for each Product_ID , but it is quite likely that only the row with the highest Effective_Date returned (the first one in the sort order) is relevant to the application. An ideal caching algorithm would take this into account and simply discard the other rows, saving memory and other costs to fill the cache. Therefore, even if such a cache is used one set of times, to perform a single business task for a single end user, it can cost less than the repeated queries it makes unnecessary. If the cache continues to be useful for future tasks by the same end user, the benefit increases , even justifying a cache that is initially more expensive to populate than the repeated queries for a single task. If the cache resides in shared memory that is accessible to multiple end users, assuming its contents are useful to the whole community of end users, the benefits multiply still further.
10.1.2 Consolidated QueriesA blind query to read every row that a repeated query might potentially touch can be too expensive, and the result set might be too large to cache. However, that does not prevent the application from consolidating the multiple queries into a single query with one of two forms. In the current example, each pass through the loop would find a Product_ID to drive the initial repeated query. You could run through the loops without querying and simply compile a list of IDs to use in a single consolidated query. For the example, these forms would look like this: SELECT Product_Price FROM Price_List WHERE Product_ID in ( <long list of IDs> ) AND Effective_Date <= :today ORDER BY Product_ID ASC, Effective_Date DESC; or this: SELECT Product_Price FROM Price_List WHERE Product_ID in ( <subquery that returns the list of IDs> ) AND Effective_Date <= :today ORDER BY Product_ID ASC, Effective_Date DESC; This still has the advantages of eliminating the overhead of handling multiple queries, and it handles a case in which caching the set of all possible query results would not work, but it produces a result that yields far less row reuse. In all, this query-consolidation approach resolves a broad class of repeated-query performance problems. 10.1.3 Merging Repeated Queries into a Preexisting QueryUsually, the source of the variables in repeated queries is an earlier query. In the current example, :today surely comes from the clock/calendar, but :1 almost certainly comes from an earlier query that returned a long list of Product_ID s. The most likely reason behind the repeated query is to fill in a current (with the most recent Effective_Date ) product price for each of the rows the earlier query returned, based on the first row that the query returns each time it runs. Since the goal of the query is to find a matching price row for each of the earlier query's rows, this is functionally like a join, though not nearly so efficient. The technique in these cases is to convert these to actual joins. When these repeated queries are just simple single-row queries on a primary key, the conversion to a join is obvious. In cases like the current example, the solution is subtler. The query returned a price history sorted by Effective_Date to place the currently effective price at the top. This currently effective price record is the only record you actually want. To consolidate this repeated query with the query that provides the list of Product_ID s, you have to find a way to join with the first row a sorted query returns without reading the other rows. There are several approaches that resolve this requirement, and I will describe two. Each approach begins with a new column, Current_Price_Flag , which equals ' Y ' if and only if the price is the price currently in effect:
Given such a properly maintained Current_Price_Flag column, the join is simple. If the initial query looked like this: SELECT ... OD.Product_ID, ... FROM ... Order_Details OD, ... WHERE ... the new query that incorporates the join would be: SELECT ... OD.Product_ID, ..., PL.Product_Price FROM ... Order_Details OD, ..., Price_List PL WHERE ... AND OD.Product_ID=PL.Product_ID AND PL.Current_Price_Flag='Y' Not all procedural code is replaced by a join as easily as in this last example. However, SQL incorporates powerful logic expressions, such as CASE , that enable virtually any procedural data-lookup function to be converted to some series of joins combined with complex (often nested) SQL expressions. |