| < Day Day Up > |
|
The most relevant thing to say about functions is that they should not be used where you expect an SQL statement to use an index. There are function-based indexes of course. A function-based index contains the resulting value of an expression. An index search against that function-based index will search the index for the value of the expression.
Let's take a quick look at a few specific functions.
For older versions of Oracle Database the COUNT function has been recommended as performing better when used in different ways. Prior to Oracle9i Database the COUNT(*) function using the asterisk was the fastest form because the asterisk option was specifically tuned to avoid any sorting. Let's take a look at each of four different methods and show that they are all the same using both the EXPLAIN PLAN command and time testing. We will use the GeneralLedger table in the Accounts schema since it has the largest number of rows.
Notice how all the query plans for all the four following COUNT function options are identical. Additionally there is no sorting on anything but the resulting single row produced by the COUNT function, the sort on the aggregate.
Using the asterisk:
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT COUNT(*) FROM generalledger; Query Cost Rows ----------------------------------- ---- ------ 1. SELECT STATEMENT on 382 1 2. SORT AGGREGATE on 1 3. INDEX FAST FULL SCAN on XPK_GENERALLEDGER 382 752825
Forcing the use of a unique index:
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT COUNT(generalledger_id) FROM generalledger; Query Cost Rows --------------------------------------------- ---- ------ 1. SELECT STATEMENT on 382 1 2. SORT AGGREGATE on 1 3. INDEX FAST FULL SCAN on XPK_GENERALLEDGER 382 752825
Using a constant value:
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT COUNT(1) FROM generalledger; Query Cost Rows --------------------------------------------- ---- ------ 1. SELECT STATEMENT on 382 1 2. SORT AGGREGATE on 1 3. INDEX FAST FULL SCAN on XPK_GENERALLEDGER 382 752825
Using a nonindexed column:
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT COUNT(dr) FROM generalledger; Query Cost Rows -------------------------------------- ---- ------ 1. SELECT STATEMENT on 382 1 2. SORT AGGREGATE on 1 3. INDEX FAST FULL SCAN on 382 752825 XPK_GENERALLEDGER
Now with time testing, below I have simply executed the four COUNT function options with SET TIMING set to ON in SQL*Plus. Executing these four SQL statements twice will assure that all data is loaded into memory and that consistent results are obtained.
SQL> SELECT COUNT(*) FROM generalledger; COUNT(*) -------- 752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(generalledger_id) FROM generalledger; COUNT(GENERALLEDGER_ID) ----------------------- 752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(1) FROM generalledger; COUNT(1) -------- 752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(dr) FROM generalledger; COUNT(DR) --------- 752741 Elapsed: 00:00:01.01
As you can see from the time tests above, the COUNT function will perform the same no matter which method is used. In the latest version of Oracle Database different forms of the COUNT function will perform identically. No form of the COUNT function is better tuned than any other. All forms of the COUNT function perform the same; using an asterisk, a constant or a column, regardless of column indexing, the primary key index is always used.
DECODE can be used to replace composite SQL statements using a set operator such as UNION. The Accounts Stock table has a QTYONHAND column. This column denotes how many items of a particular stock item are currently in stock. Negative QTYONHAND values indicate that items have been ordered by customers but not yet received from suppliers.
The first example below uses four full reads of the Stock table and concatenates the results together using UNION set operators.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id||' Out of Stock' FROM stock WHERE qtyonhand <=0 UNION SELECT stock_id||' Under Stocked' FROM stock WHERE qtyonhand BETWEEN 1 AND min-1 UNION SELECT stock_id||' Stocked' FROM stock WHERE qtyonhand BETWEEN min AND max UNION SELECT stock_id||' Over Stocked' FROM stock WHERE qtyonhand > max; Query Pos Cost Rows Bytes ------------------------------ --- ---- ---- ------ SELECT STATEMENT on 12 12 123 1543 SORT UNIQUE on 1 12 123 1543 UNION-ALL on 1 TABLE ACCESS FULL on STOCK 1 1 4 32 TABLE ACCESS FULL on STOCK 2 1 1 11 TABLE ACCESS FULL on STOCK 3 1 28 420 TABLE ACCESS FULL on STOCK 4 1 90 1080
This second example replaces the UNION set operators and the four full table scan reads with a single full table scan using nested DECODE functions. DECODE can be used to improve performance.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id||' '|| DECODE(SIGN(qtyonhand) ,-1,'Out of Stock',0,'Out of Stock' ,1,DECODE(SIGN(qtyonhand-min) ,-1,'Under Stocked',0,'Stocked' ,1,DECODE(sign(qtyonhand-max) ,-1,'Stocked',0,'Stocked' ,1,'Over Stocked' ) ) ) FROM stock; Query Pos Cost Rows Bytes ------------------------------ --- ---- ---- ------ SELECT STATEMENT on 1 1 118 1770 TABLE ACCESS FULL on STOCK 1 1 118 1770
Using the DECODE function as a replacement for multiple query set operators is good for performance but should only be used in extreme cases such as the UNION clause joined SQL statements shown previously.
Datatype conversions are a problem and will conflict with existing indexes unless function-based indexes are available and can be created. Generally, if a function is executed in a WHERE clause, or anywhere else that can utilize an index, a full table scan is likely. This leads to inefficiency. There is some capability in Oracle SQL for implicit datatype conversion but often use of functions in SQL statements will cause the Optimizer to miss the use of indexes and perform poorly.
The most obvious datatype conversion concerns dates. Date fields in all the databases I have used are stored internally as a Julian number. A Julian number or date is an integer value from a database-specific date measured in seconds. When retrieving a date value in a tool such as SQL*Plus there is usually a default date format. The internal date value is converted to that default format. The conversion is implicit, automatic, and transparent.
SELECT SYSDATE, TO_CHAR(SYSDATE,'J') "Julian" FROM DUAL; SYSDATE Julian ------------ ---------- 03-MAR-03 2452702
Now for the sake of demonstration I will create an index on the GeneralLedger DTE column.
CREATE INDEX ak_gl_dte ON GENERALLEDGER(DTE);
Now obviously it is difficult to demonstrate an index hit with a key such as this because the date is a datestamp as well as a simple date. A simple date format such as MM/DD/YYYY excludes a timestamp. Simple dates and datestamps (timestamps) are almost impossible to match. Thus I will use SYSDATE in order to avoid a check against a simple formatted date. Both the GeneralLedger DTE column and SYSDATE are timestamps since the date column in the table was created using SYSDATE-generated values. We are only trying to show Optimizer query plans without finding rows.
The first example hits the new index I created and has a very low cost.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE dte = SYSDATE; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 2 593 15418 TABLE ACCESS BY INDEX ROWID on GENERALLEDGER 2 593 15418 INDEX RANGE SCAN on AK_GL_DTE 1 593 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger WHERE TO_CHAR(dte, 'YYYY/MM/DD') = '2002/08/21';
This second example does not hit the index because the TO_CHAR datatype conversion is completely inconsistent with the datatype of the index. As a result the cost is much higher.
Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 493 7527 195702 TABLE ACCESS FULL on GENERALLEDGER 493 7527 195702
Another factor to consider with datatype conversions is making sure that datatype conversions are not placed onto columns. Convert literal values not part of the database if possible. In order to demonstrate this I am going to add a zip code column to my Supplier table, create an index on that zip code column and regenerate statistics for the Supplier table. I do not need to add values to the zip code column to prove my point.
ALTER TABLE supplier ADD(zip NUMBER(5)); CREATE INDEX ak_sp_zip ON supplier(zip); ANALYZE TABLE supplier COMPUTE STATISTICS;
Now we can show two examples. The first uses an index because there is no datatype conversion on the column in the table and the second reads the entire table because the conversion is on the column.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE zip = TO_NUMBER('94002'); Query Cost Rows Bytes ------------------------------------- ---- ------ ------- SELECT STATEMENT on 1 1 142 TABLE ACCESS BY INDEX ROWID on SUPPLIER 1 1 142 INDEX RANGE SCAN on AK_SP_ZIP 1 1 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE TO_CHAR(zip) = '94002'; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 13 1 142 TABLE ACCESS FULL on SUPPLIER 13 1 142
Oracle SQL does not generally allow implicit type conversions but there is some capacity for automatic conversion of strings to integers, if a string contains an integer value. Using implicit type conversions is a very bad programming practice and is not recommended. A programmer should never rely on another tool to do their job for them. Explicit coding is less likely to meet with potential errors in the future.
It is better to be precise since the computer will always be precise and do exactly as you tell it to do. Implicit type conversion is included in Oracle SQL for ease of programming. Ease of program coding is a top-down application to database design approach, totally contradictory to database tuning. Using a database from the point of view of how the application can most easily be coded is not favorable to eventual production performance. Do not use implicit type conversions. As can be seen in the following examples implicit type conversions do not appear to make any difference to Optimizer costs.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE supplier_id = 3801; Query Cost Rows Bytes ---------------------------------------- ---- ------ ------- 1. SELECT STATEMENT on 2 1 142 2. TABLE ACCESS BY INDEX ROWID on SUPPLIER 2 1 142 3. INDEX UNIQUE SCAN on XPK_SUPPLIER 1 3874 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM supplier WHERE supplier_id = '3801'; Query Cost Rows Bytes ---------------------------------------- ---- ------ ------- 1. SELECT STATEMENT on 2 1 142 2. TABLE ACCESS BY INDEX ROWID on SUPPLIER 2 1 142 3. INDEX UNIQUE SCAN on XPK_SUPPLIER 1 3874
In short, try to avoid using any type of data conversion function in any part of an SQL statement which could potentially match an index, especially if you are trying to assist performance by matching appropriate indexes.
Now let's expand on the use of functions by examining their use in all of the clauses of a SELECT statement.
Firstly, let's put a datatype conversion into a SELECT statement, which uses an index. As we can see in the two examples below, use of the index is not affected by the datatype conversion placed into the SELECT statement.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT customer_id FROM customer; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 1 2694 10776 INDEX FAST FULL SCAN on XPKCUSTOMER 1 2694 10776 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT TO_CHAR(customer_id) FROM customer; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 1 2694 10776 INDEX FAST FULL SCAN on XPKCUSTOMER 1 2694 10776
Now let's examine the WHERE clause. In the two examples below the only difference is in the type of index scan utilized. Traditionally the unique index hit produces an exact match and it should be faster. A later chapter will examine the difference between these two types of index reads.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT customer_id FROM customer WHERE customer_id = 100; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 1 1 4 INDEX UNIQUE SCAN on XPKCUSTOMER 1 1 4 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT customer_id FROM customer WHERE TO_CHAR(customer_id) = '100'; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 1 1 4 INDEX FAST FULL SCAN on XPKCUSTOMER 1 1 4
The ORDER BY clause can utilize indexing well, as already seen in this chapter, as long as WHERE clause index matching is not compromised. Let's keep it simple. Looking at the following two examples it should suffice to say that it might be a bad idea to include functions in ORDER BY clauses. An index is not used in the second query and consequently the cost is much higher.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger ORDER BY coa#; Query Cost Rows Bytes Sort --------------------------------- ---- ------ ------- ------- SELECT STATEMENT on 826 752740 19571240 TABLE ACCESS BY INDEX ROWID on GL 826 752740 19571240 INDEX FULL SCAN on XFK_GL_COA# 26 752740 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM generalledger ORDER BY TO_CHAR(coa#); Query Cost Rows Bytes Sort ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 19070 752740 19571240 SORT ORDER BY on 19070 752740 19571240 60474000 TABLE ACCESS FULL on GENERALLEDGER 493 752740 19571240
Here is an interesting twist to using the same datatype conversion in the above two examples but with the conversion in the SELECT statement and setting the ORDER BY clause to sort by position rather than using the TO_CHAR(COA#) datatype conversion. The reason why this example is lower in cost than the second example is because the conversion is done on selection and ORDER BY resorting is executed after data retrieval. In other words, in this example the ORDER BY clause does not affect the data access method.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT TO_CHAR(coa#), dte, dr cr FROM generalledger ORDER BY 1; Query Cost Rows Bytes Sort ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 12937 752740 13549320 SORT ORDER BY on 12937 752740 13549320 42394000 TABLE ACCESS FULL on GENERALLEDGER 493 752740 13549320
Using functions in GROUP BY clauses will slow performance as shown in the following two examples.
EXPLAIN PLAN SET statement_id='TEST' FOR SELECT order_id, COUNT(order_id) FROM ordersline GROUP BY order_id; Query Cost Rows Bytes ----------------------------------- ---- ------ ------- SELECT STATEMENT on 26 172304 861520 SORT GROUP BY NOSORT on 26 172304 861520 INDEX FULL SCAN on XFK_ORDERLINE_ORDER 26 540827 2704135 EXPLAIN PLAN SET statement_id='TEST' FOR SELECT TO_CHAR(order_id), COUNT(order_id) FROM ordersline GROUP BY TO_CHAR(order_id); Query Cost Rows Bytes Sort ------------------------------ ---- ------ ------- ------- SELECT STATEMENT on 3708 172304 861520 SORT GROUP BY on 3708 172304 861520 8610000 INDEX FAST FULL SCAN on XFK_ORDERLINE_ORDER 4 540827 2704135
When using functions in SQL statements it is best to keep the functions away from any columns involving index matching.
| < Day Day Up > |
|