6.6 Using Subqueries for Efficiency | Oracle High Performance Tuning for 9i and 10g

< Day Day Up >

Tuning subqueries is a highly complex topic. Quite often subqueries can be used to partially replace subset parts of very large mutable joins, with possible enormous performance improvements.

6.6.1 Correlated versus Noncorrelated Subqueries

A correlated subquery allows a correlation between a calling query and a subquery. A value for each row in the calling query is passed into the subquery to be used as a constraint by the subquery. A noncorrelated or regular subquery does not contain a correlation between calling query and subquery and thus the subquery is executed in its entirety, independently of the calling query, for each row in the calling query. Tuning correlated subqueries is easier because values in subqueries can be precisely searched for in relation to each row of the calling query.

A correlated subquery will access a specified row or set of rows for each row in the calling query. Depending on circumstances a correlated subquery is not always faster than a noncorrelated subquery. Use of indexes or small tables inside a subquery, even for noncorrelated subqueries, does not necessarily make a subquery perform poorly.

6.6.2 IN versus EXISTS

We have already seen substantial use of IN and EXISTS in the section on comparison conditions. We know already that IN is best used for small tables or lists of literal values. EXISTS is best used to code queries in a correlated fashion, establishing a link between a calling query and a subquery. To reiterate it is important to remember that using EXISTS is not always faster than using IN.

6.6.3 Nested Subqueries

Subqueries can be nested where a subquery can call another subquery. The following example using the Employees schema shows a query calling a subquery, which in turn calls another subquery.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM division WHERE division_id IN       (SELECT division_id FROM department WHERE          department_id IN                (SELECT department_id FROM project));

Notice in the query plan how the largest table is scanned using an INDEX FAST FULL SCAN. The Optimizer is intelligent enough to analyze this nested query and discern that the Project table is much larger than both of the other two tables. The other two tables are so small that the only viable option is a full table scan.

Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                     14       10       590  HASH JOIN SEMI on                      14       10       590   TABLE ACCESS FULL on DIVISION          1       10       460   VIEW on VW_NSO_1                       8    10000    130000    HASH JOIN on                          8    10000     60000     TABLE ACCESS FULL on DEPARTMENT      1      100       400     INDEX FAST FULL SCAN on       XFKPROJECT_DEPT                    4    10000     20000

Nested subqueries can be difficult to tune but can often be a viable and sometimes highly effective tool for the tuning of mutable complex joins, with three and sometimes many more tables in a single join. There is a point when there are so many tables in a join that the Optimizer can become less effective.

6.6.4 Replacing Joins with Subqueries

For very large complex mutable joins it is often possible to replace joins or parts of joins with subqueries. Very large joins can benefit the most because they are difficult to decipher and tune. Some very large joins are even beyond the intelligence of the Optimizer to assess in the best possible way. Two ways in which subqueries can replace joins in complex mutable joins are as follows:

A table in the join not returning a column in the primary calling query can be removed from the join and checked using a subquery.
FROM clauses can contain nested subqueries to break up joins much in the way that PL/SQL would use nested looping cursors.

Certain aspects of SQL coding placed in subqueries can cause problems:

An ORDER BY clause is always applied to a final result and should not be included in subqueries if possible.
DISTINCT will always cause a sort and is not always necessary. Perhaps a parent table could be used where a unique value is present.
When testing against subqueries retrieve, filter, and aggregate on indexes not tables. Indexes usually offer better performance.

Do not be too concerned about full table scans on very small static tables.

Tip

Instances where joins can be replaced with subqueries often involve databases with heavy outer join requirements. Excessive use of SQL outer joins is possibly indicative of an over-granular data model structure. However, it could also indicate orphaned child table rows or the opposite: redundant static data. Cleaning out redundant or orphaned rows can sometimes help performance immensely by negating the need for outer joins.

Remove Tables Without Returned Columns Using EXISTS

Going back to the Accounts schema once again look at the following complex mutable join. We are joining four tables and selecting a column from only one of the tables. EXISTS comparisons can be placed into the WHERE clause to force index access, removing three tables from the join.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name       FROM customer c JOIN orders o USING(customer_id)             JOIN ordersline ol USING(order_id)                   JOIN transactions t USING(customer_id)                         JOIN transactionsline tl                            USING(transaction_id)       WHERE c.balance > 0;

Its query plan is a little scary. There are three full table scans and two full index scans. The objective is to remove full table scans and change as many index scans as possible into unique index scans.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                 ######  #######   #########  MERGE JOIN on                      ######  #######   #########   SORT JOIN on                      ######  #######   #########    MERGE JOIN on                      4988  #######   #########     SORT JOIN on                      3209   100136     3805168      HASH JOIN on                      762   100136     3805168       TABLE ACCESS FULL on          CUSTOMER                         9     2690      69940       MERGE JOIN on                    316   100237    1202844        INDEX FULL SCAN on          XFK_ORDERLINE_ORDER             26   540827    2704135        SORT JOIN on                    290    31935     223545         TABLE ACCESS FULL on           ORDERS                        112    31935     223545    SORT JOIN on                       1780   188185    1317295     TABLE ACCESS FULL on        TRANSACTIONS                     187   188185    1317295 SORT JOIN on                          5033   570175    2850875  INDEX FAST FULL SCAN on     XFK_TRANSLINE_TRANS                   4   570175    2850875

Only the Customer.NAME column is selected. This query is an extreme case but we can actually remove every table from the join except the Customer table. Let's show this in two stages. Firstly, I will remove the transaction tables from the join.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name       FROM customer c JOIN orders o ON(c.customer_id =          o.customer_id)             JOIN ordersline ol USING(order_id)       WHERE c.balance > 0       AND EXISTS(             SELECT t.transaction_id FROM transactions t             WHERE t.customer_id = c.customer_id             AND EXISTS(                   SELECT transaction_id FROM                      transactionsline                   WHERE transaction_id = t.transaction_id             )       );

We have now reduced the full table scans to two, have a single full index scan and most importantly index range scans on both of the transaction tables.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                    359     5007      190266  FILTER on    HASH JOIN on                         359     5007      190266    TABLE ACCESS FULL on CUSTOMER         9     135        3510    MERGE JOIN on                       316   100237     1202844     INDEX FULL SCAN on        XFK_ORDERLINE_ORDER               26   540827     2704135     SORT JOIN on                       290    31935      223545      TABLE ACCESS FULL on ORDERS       112    31935      223545   NESTED LOOPS on                       72      212        2544    TABLE ACCESS BY INDEX ROWID on       TRANSACTIONS                        2        70        490     INDEX RANGE SCAN on       XFX_TRANS_CUSTOMER                  1        70   INDEX RANGE SCAN on      XFK_TRANSLINE_TRANS                  1     570175 2850875

Now let's get completely ridiculous and remove every table from the join but the Customer table.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name FROM customer c       WHERE c.balance > 0       AND EXISTS(             SELECT o.order_id FROM orders o             WHERE o.customer_id = c.customer_id             AND EXISTS(                   SELECT order_id FROM ordersline                   WHERE order_id = o.order_id             )       )       AND EXISTS(              SELECT t.transaction_id FROM transactions t             WHERE t.customer_id = c.customer_id             AND EXISTS(                   SELECT transaction_id FROM                      transactionsline                   WHERE transaction_id = t.transaction_id             )       );

This is about the best that can be done with this query, now no longer a join. This final result has full table access on the Customer table only, along with four index range scans. We could possibly improve the query further by decreasing the number of Customer rows retrieved using filtering.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                      9        7         182  FILTER on   TABLE ACCESS FULL on CUSTOMER          9        7         182   NESTED LOOPS on                       66      201        2412    TABLE ACCESS BY INDEX ROWID on       ORDERS                              2       64         448     INDEX RANGE SCAN on        XFK_ORDERS_CUSTOMER                1       64    INDEX RANGE SCAN on       XFK_ORDERLINE_ORDER                 1   540827     2704135   NESTED LOOPS on                       72      212        2544    TABLE ACCESS BY INDEX ROWID on       TRANSACTIONS                        2       70         490     INDEX RANGE SCAN on        XFX_TRANS_CUSTOMER                 1       70    INDEX RANGE SCAN on       XFK_TRANSLINE_TRANS                 1   570175     2850875

FROM Clause Subquery Nesting

Now what we want to do is to retrieve columns from different tables. Columns cannot be retrieved from an EXISTS comparison in the WHERE clause. We have to use another method. Nested subqueries in the FROM clause allow retrieval of columns.

In this example I am adding extra filtering to the TransactionsLine table; at over 500,000 rows it is the largest table in the query. Since the TransactionsLine table is larger than the Customer table it is filtered first.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name, tl.amount FROM customer c          JOIN orders o USING(customer_id)             JOIN ordersline ol USING(order_id)                JOIN transactions t USING(customer_id)                   JOIN transactionsline tl USING(transaction_id)       WHERE tl.amount > 3170       AND c.balance > 0;

We start with three full table scans, one index range scan and a unique index hit.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                   1860      605       33880  NESTED LOOPS on                      1860      605       33880   HASH JOIN on                        1667      193        9843    TABLE ACCESS FULL on ORDERS         112    31935      223545    MERGE JOIN CARTESIAN on             436    43804     1927376     NESTED LOOPS on                    292       16         288      TABLE ACCESS FULL on        TRANSACTIONSLINE                276       16         176      TABLE ACCESS BY INDEX ROWID on        TRANSACTIO                        1   188185     1317295       INDEX UNIQUE SCAN on         XPKTRANSACTIONS                      188185     BUFFER SORT on                     435     2690       69940      TABLE ACCESS FULL on CUSTOMER       9     2690       69940   INDEX RANGE SCAN on     XFK_ORDERLINE_ORDER                  1   540827     2704135

Firstly, some appropriate simple tuning can be done. Since the TransactionsLine table is the largest table with the smallest relative filtered result, it should be selected from first.

Tip

The first table to be processed should be the largest table with the largest relative row reduction filter. In other words, the biggest table with the lowest number of rows retrieved from it. This applies to both the FROM clause and the WHERE clause. Always reduce rows to be joined first.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name, tl.amount FROM transactionsline tl          JOIN transactions t USING(transaction_id)             JOIN customer c USING(customer_id)                JOIN orders o USING(customer_id)                   JOIN ordersline ol ON(ol.order_id = o.order_id)       WHERE tl.amount > 3170       AND c.balance > 0;

Appropriate simple tuning yields one full table scan, two index range scans, and two unique index hits.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                   1381     3267      182952  NESTED LOOPS on                      1381     3267      182952   NESTED LOOPS on                      340     1041       53091    NESTED LOOPS on                     308       16         704     NESTED LOOPS on                    292       16         288      TABLE ACCESS FULL on        TRANSACTIONSLINE                276       16         176      TABLE ACCESS BY INDEX ROWID        on TRANSACTIO                     1    33142      231994       INDEX UNIQUE SCAN on         XPKTRANSACTIONS  33142     TABLE ACCESS BY INDEX ROWID on        CUSTOMER                           1     2690       69940      INDEX UNIQUE SCAN on         XPKCUSTOMER  2690    TABLE ACCESS BY INDEX ROWID on       ORDERS                              2   172304     1206128     INDEX RANGE SCAN on        XFK_ORDERS_CUSTOMER                1   172304   INDEX RANGE SCAN on      XFK_ORDERLINE_ORDER                  1   540827     2704135

Now let's use the FROM clause to create nested subqueries. The trick is to put the largest table with the most severe filter at the deepest nested level, forcing it to execute first. Thus we start with the TransactionsLine table.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name, b.amount       FROM customer c,       (             SELECT t.customer_id, a.amount             FROM transactions t,(                   SELECT transaction_id, amount FROM                      transactionsline                   WHERE amount > 3170             ) a             WHERE t.transaction_id = a.transaction_id       ) b, orders o, ordersline ol       WHERE c.balance > 0       AND c.customer_id = b.customer_id       AND o.customer_id = c.customer_id       AND ol.order_id = o.order_id;

The cost is reduced further with the same combination of scans because fewer rows are being joined.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                    533      605       33880  NESTED LOOPS on                       533      605       33880   NESTED LOOPS on                      340      193        9843    NESTED LOOPS on                     308       16         704     NESTED LOOPS on                    292       16         288      TABLE ACCESS FULL on         TRANSACTIONSLINE                276       16         176      TABLE ACCESS BY INDEX ROWID         on TRANSACTIO                     1    33142      231994       INDEX UNIQUE SCAN on          XPKTRANSACTIONS                       33142     TABLE ACCESS BY INDEX ROWID        on CUSTOMER                        1     2690       69940      INDEX UNIQUE SCAN on         XPKCUSTOMER                             2690    TABLE ACCESS BY INDEX ROWID on       ORDERS                              2    31935      223545     INDEX RANGE SCAN on        XFK_ORDERS_CUSTOMER                1    31935   INDEX RANGE SCAN on      XFK_ORDERLINE_ORDER                  1   540827     2704135

Now let's combine WHERE clause comparison subqueries and FROM clause embedded subqueries.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT c.name, b.amount       FROM customer c,       (             SELECT t.customer_id, a.amount             FROM transactions t,(                   SELECT transaction_id, amount FROM                      transactionsline                   WHERE amount > 3170             ) a             WHERE t.transaction_id = a.transaction_id       ) b       WHERE c.balance > 0       AND EXISTS(             SELECT o.order_id FROM orders o             WHERE o.customer_id = c.customer_id             AND EXISTS(                   SELECT order_id FROM ordersline                   WHERE order_id = o.order_id             )       );

Using EXISTS makes the query just that little bit faster with lower cost since the number of tables joined is reduced.

Query                                 Cost     Rows       Bytes ---------------------------------   ------   ------   --------- SELECT STATEMENT on                    420     2190       91980  FILTER on   NESTED LOOPS on                      420     2190       91980    MERGE JOIN CARTESIAN on             420     2190       81030     TABLE ACCESS FULL on       TRANSACTIONSLINE                 276       16         176     BUFFER SORT on 144 135 3510      TABLE ACCESS FULL on CUSTOMER       9      135        3510   INDEX UNIQUE SCAN on     XPKTRANSACTIONS                          188185      940925  NESTED LOOPS on                        66      201        2412   TABLE ACCESS BY INDEX ROWID on     ORDERS                               2       64         448    INDEX RANGE SCAN on      XFK_ORDERS_CUSTOMER                 1       64   INDEX RANGE SCAN on     XFK_ORDERLINE_ORDER                  1   540827     2704135

< Day Day Up >