| < Day Day Up > |
|
Different comparison conditions can have sometimes vastly different effects on the performance of SQL statements. Let's examine each in turn with various options and recommendations for potential improvement. The comparison conditions are listed here.
Equi, anti, and range
expr { [!]= | > | < | <= | >= } expr
expr [ NOT ] BETWEEN expr AND expr
LIKE pattern matching
expr [ NOT ] LIKE expr
Set membership
expr [ NOT ] IN expr
expr [ NOT ] EXISTS expr
Note | Oracle Database 10 Grid IN is now called an IN rather than a set membership condition in order to limit confusion with object collection MEMBER conditions. |
Groups
expr [ = | != | > | < | >= | <= ] [ ANY | SOME | ALL ] expr
Using an equals sign (equi) is the fastest comparison condition if a unique index exists. Any type of anti comparison such as != or NOT is looking for what is not in a table and thus must read the entire table; sometimes full index scans can be used. Range comparisons scan indexes for ranges of rows. Let's look at some examples.
This example does a unique index hit; using the equals sign an exact hit single row is found.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE generalledger_id = 100; Query Cost Rows Bytes ------------------------------------ ---- ------ ------- SELECT STATEMENT on 3 1 26 TABLE ACCESS BY INDEX ROWID on GENERALLEDGER 3 1 26 INDEX UNIQUE SCAN on XPKGENERALLEDGER 2 1
The anti (!=) comparison finds everything but the single row specified and thus must read the entire table.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE generalledger_id != 100; Query Pos Cost Rows Bytes ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 493 493 752739 19571214 TABLE ACCESS FULL on GENERAL 1 493 752739 19571214
In the next case using the range (<) comparison searches a range of index values rather than a single unique index value.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE generalledger_id < 10; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 4 1 26 TABLE ACCESS BY INDEX ROWID on GENERALLEDGE 4 1 26 INDEX RANGE SCAN on XPKGENERALLEDGER 3 1
In the next example the whole table is read rather than using an index range scan because most of the table will be read and thus the Optimizer considers reading the table as being faster.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE generalledger_id >= 100; Query Pos Cost Rows Bytes ------------------------------ ---- ---- ------- ------- SELECT STATEMENT on 493 493 752740 19571240 TABLE ACCESS FULL on GENERAL 1 493 752740 19571240
Here the BETWEEN comparison causes a range scan on an index because the range of rows is small enough to not warrant a full table scan.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE generalledger_id BETWEEN 100 AND 200; Query Cost Rows Bytes ------------------------------------ ---- ------ ------- SELECT STATEMENT on 4 1 26 TABLE ACCESS BY INDEX ROWID on GENERALLEDGE 4 1 26 INDEX RANGE SCAN on XPKGENERALLEDGER 3 1
The approach in the query plan used by the Optimizer will depend on how many rows are retrieved and how the pattern match is constructed.
This query finds one row.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE name like '24/7 Real Media, Inc.'; Query Cost Rows Bytes ------------------------------------ ---- ------ ------- SELECT STATEMENT on 2 1 142 TABLE ACCESS BY INDEX ROWID on SUPPLIER 2 1 142 INDEX UNIQUE SCAN on AK_SUPPLIER_NAME 1 1
This query also retrieves a single row but there is a wildcard pattern match and thus a full table scan is the result.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE name LIKE '21st%'; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 13 491 69722 TABLE ACCESS FULL on SUPPLIER 13 491 69722
The next query finds almost 3,000 rows and thus a full scan of the table results regardless of the exactness of the pattern match.
Tip | A pattern match using a % full wildcard pattern matching character anywhere in the pattern matching string will usually produce a full table scan. |
SQL> SELECT COUNT(*) FROM supplier WHERE name LIKE '%a%'; COUNT(*) -------- 2926 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE name LIKE '%a%'; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 13 194 27548 TABLE ACCESS FULL on SUPPLIER 13 194 27548
In general, since LIKE will match patterns which are in no way related to indexes, LIKE will usually read an entire table.
IN should be used to test against literal values and EXISTS is often used to create a correlation between a calling query and a subquery. IN is best used as a pre-constructed set of literal values. IN will cause a subquery to be executed in its entirety before passing the result back to the calling query. EXISTS will stop once a result is found.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM coa WHERE type IN ('A','L','I','E'); Query Cost Rows Bytes -------------------------- ---- ------ ------- SELECT STATEMENT on 1 38 950 TABLE ACCESS FULL on COA 1 38 950
There are two advantages to using EXISTS over using IN. The first advantage is the ability to pass values from a calling query to a subquery, never the other way around, creating a correlated query. The correlation allows EXISTS the use of indexes between calling query and subquery, particularly in the subquery. The second advantage of EXISTS is, unlike IN, which completes a subquery regardless, EXISTS will halt searching when a value is found. Thus the subquery can be partially executed, reading fewer rows.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM coa WHERE EXISTS (SELECT type FROM type WHERE type = coa.type); Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 1 55 1485 NESTED LOOPS SEMI on 1 55 1485 TABLE ACCESS FULL on COA 1 55 1375 INDEX UNIQUE SCAN on XPKTYPE 6 12
Now let's compare the use of IN versus the use of EXISTS. The next two examples both use indexes and have the same result. The reason why IN is the same cost as EXISTS is because the query contained within the IN subquery matches an index based on the single column it selects.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id FROM stock s WHERE EXISTS (SELECT stock_id FROM stockmovement WHERE stock_id = s.stock_id); Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 119 118 944 NESTED LOOPS SEMI on 119 118 944 INDEX FULL SCAN on XPKSTOCK 1 118 472 INDEX RANGE SCAN on XFK_SM_STOCK 1 570175 2280700 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id FROM stock WHERE stock_id IN (SELECT stock_id FROM stockmovement); Query Cost Rows Bytes ----------------------------------- ---- ------ ------- NESTED LOOPS SEMI on 119 118 944 SELECT STATEMENT on 119 118 944 INDEX FULL SCAN on XPKSTOCK 1 118 472 INDEX RANGE SCAN on XFK_SM_STOCK 1 570175 2280700
Now let's do some different queries to show a very distinct difference between IN and EXISTS. Note how the first example is much lower in cost than the second. This is because the second option cannot match indexes and executes two full table scans.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM stockmovement sm WHERE EXISTS (SELECT * FROM stockmovement WHERE stockmovement_id = sm.stockmovement_id); Query Cost Rows Bytes Sort ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 8593 570175 16535075 MERGE JOIN SEMI on 8593 570175 16535075 TABLE ACCESS BY INDEX ROWID on SM 3401 570175 13684200 INDEX FULL SCAN on XPKSTOCKMOVEMENT 1071 570175 SORT UNIQUE on 5192 570175 2850875 13755000 INDEX FAST FULL SCAN on XPKSTMOVE 163 570175 2850875 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM stockmovement sm WHERE qty IN (SELECT qty FROM stockmovement); Query Cost Rows Bytes Sort ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 16353 570175 15964900 MERGE JOIN SEMI on 16353 570175 15964900 SORT JOIN on 11979 570175 13684200 45802000 TABLE ACCESS FULL on STOCKMOVEMENT 355 570175 13684200 SORT UNIQUE on 4374 570175 2280700 13755000 TABLE ACCESS FULL on STOCKMOVEMENT 355 570175 2280700
Now let's go yet another step further and restrict the calling query to a single row result. What this will do is ensure that EXISTS has the best possible chance of passing a single row identifier into the subquery, thus ensuring a unique index hit in the subquery. The StockMovement table has been joined to itself to facilitate the demonstration of the difference between using EXISTS and IN. Note how the IN subquery executes a full table scan and the EXISTS subquery does not.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM stockmovement sm WHERE EXISTS( SELECT qty FROM stockmovement WHERE stockmovement_id = sm.stockmovement_id) AND stockmovement_id = 10; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- 1. SELECT STATEMENT on 2 1 29 2. NESTED LOOPS SEMI on 2 1 29 3. TABLE ACCESS BY INDEX ROWID on STOCKMOVEMENT 2 1 24 4. INDEX UNIQUE SCAN on XPK_STOCKMOVEMENT 1 570175 3. INDEX UNIQUE SCAN on XPK_STOCKMOVEMENT 1 5 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM stockmovement sm WHERE qty IN (SELECT qty FROM stockmovement) AND stockmovement_id = 10; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- 1. SELECT STATEMENT on 563 1 28 2. NESTED LOOPS SEMI on 563 1 28 3. TABLE ACCESS BY INDEX ROWID on STOCKMOVEMENT 2 1 24 4. INDEX UNIQUE SCAN on XPK_STOCKMOVEMENT 1 570175 3. TABLE ACCESS FULL on STOCKMOVEMENT 561 570175 2280700
The benefit of using EXISTS rather than IN for a subquery comparison is that EXISTS can potentially find much fewer rows than IN. IN is best used with literal values and EXISTS is best used as applying a fast access correlation between a calling and a subquery.
ANY, SOME, and ALL comparisons are generally not very conducive to SQL tuning. In some respects they are best not used.
| < Day Day Up > |
|