6.2 Using Functions | Oracle High Performance Tuning for 9i and 10g

< Day Day Up >

The most relevant thing to say about functions is that they should not be used where you expect an SQL statement to use an index. There are function-based indexes of course. A function-based index contains the resulting value of an expression. An index search against that function-based index will search the index for the value of the expression.

Let's take a quick look at a few specific functions.

6.2.1 The COUNT Function

For older versions of Oracle Database the COUNT function has been recommended as performing better when used in different ways. Prior to Oracle9i Database the COUNT(*) function using the asterisk was the fastest form because the asterisk option was specifically tuned to avoid any sorting. Let's take a look at each of four different methods and show that they are all the same using both the EXPLAIN PLAN command and time testing. We will use the GeneralLedger table in the Accounts schema since it has the largest number of rows.

Notice how all the query plans for all the four following COUNT function options are identical. Additionally there is no sorting on anything but the resulting single row produced by the COUNT function, the sort on the aggregate.

Using the asterisk:

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT COUNT(*) FROM generalledger;     Query                                 Cost     Rows -----------------------------------   ----   ------ 1. SELECT STATEMENT on                 382        1 2.  SORT AGGREGATE on                             1 3.   INDEX FAST FULL SCAN on        XPK_GENERALLEDGER                382   752825

Forcing the use of a unique index:

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT COUNT(generalledger_id) FROM          generalledger;     Query                                           Cost     Rows ---------------------------------------------   ----   ------ 1. SELECT STATEMENT on                           382        1 2.  SORT AGGREGATE on                                       1 3.   INDEX FAST FULL SCAN on XPK_GENERALLEDGER   382   752825

Using a constant value:

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT COUNT(1) FROM generalledger;     Query                                           Cost     Rows ---------------------------------------------   ----   ------ 1. SELECT STATEMENT on                           382        1 2.  SORT AGGREGATE on                                       1 3.   INDEX FAST FULL SCAN on XPK_GENERALLEDGER   382   752825

Using a nonindexed column:

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT COUNT(dr) FROM generalledger;     Query                                    Cost     Rows --------------------------------------   ----   ------ 1. SELECT STATEMENT on                    382        1 2.  SORT AGGREGATE on                                1 3.   INDEX FAST FULL SCAN on              382   752825       XPK_GENERALLEDGER

Now with time testing, below I have simply executed the four COUNT function options with SET TIMING set to ON in SQL*Plus. Executing these four SQL statements twice will assure that all data is loaded into memory and that consistent results are obtained.

SQL> SELECT COUNT(*) FROM generalledger;     COUNT(*) --------   752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(generalledger_id) FROM generalledger;     COUNT(GENERALLEDGER_ID) -----------------------                  752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(1) FROM generalledger;     COUNT(1) --------   752741 Elapsed: 00:00:01.01 SQL> SELECT COUNT(dr) FROM generalledger;     COUNT(DR) ---------    752741 Elapsed: 00:00:01.01

As you can see from the time tests above, the COUNT function will perform the same no matter which method is used. In the latest version of Oracle Database different forms of the COUNT function will perform identically. No form of the COUNT function is better tuned than any other. All forms of the COUNT function perform the same; using an asterisk, a constant or a column, regardless of column indexing, the primary key index is always used.

6.2.2 The DECODE Function

DECODE can be used to replace composite SQL statements using a set operator such as UNION. The Accounts Stock table has a QTYONHAND column. This column denotes how many items of a particular stock item are currently in stock. Negative QTYONHAND values indicate that items have been ordered by customers but not yet received from suppliers.

The first example below uses four full reads of the Stock table and concatenates the results together using UNION set operators.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT stock_id||' Out of Stock' FROM stock WHERE    qtyonhand <=0 UNION SELECT stock_id||' Under Stocked' FROM stock       WHERE qtyonhand BETWEEN 1 AND min-1 UNION SELECT stock_id||' Stocked' FROM stock WHERE qtyonhand BETWEEN min AND max UNION SELECT stock_id||' Over Stocked' FROM stock       WHERE qtyonhand > max;     Query                            Pos    Cost   Rows    Bytes ------------------------------   ---    ----   ----   ------ SELECT STATEMENT on               12      12    123     1543  SORT UNIQUE on                    1      12    123     1543   UNION-ALL on                     1    TABLE ACCESS FULL on STOCK      1       1      4       32    TABLE ACCESS FULL on STOCK      2       1      1       11    TABLE ACCESS FULL on STOCK      3       1     28      420    TABLE ACCESS FULL on STOCK      4       1     90     1080

This second example replaces the UNION set operators and the four full table scan reads with a single full table scan using nested DECODE functions. DECODE can be used to improve performance.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT stock_id||' '||       DECODE(SIGN(qtyonhand)             ,-1,'Out of Stock',0,'Out of Stock'             ,1,DECODE(SIGN(qtyonhand-min)                    ,-1,'Under Stocked',0,'Stocked'                    ,1,DECODE(sign(qtyonhand-max)                         ,-1,'Stocked',0,'Stocked'                           ,1,'Over Stocked'                    )             )       ) FROM stock;      Query                            Pos    Cost   Rows    Bytes ------------------------------   ---    ----   ----   ------ SELECT STATEMENT on                1       1    118     1770 TABLE ACCESS FULL on STOCK         1       1    118     1770

Using the DECODE function as a replacement for multiple query set operators is good for performance but should only be used in extreme cases such as the UNION clause joined SQL statements shown previously.

6.2.3 Datatype Conversions

Datatype conversions are a problem and will conflict with existing indexes unless function-based indexes are available and can be created. Generally, if a function is executed in a WHERE clause, or anywhere else that can utilize an index, a full table scan is likely. This leads to inefficiency. There is some capability in Oracle SQL for implicit datatype conversion but often use of functions in SQL statements will cause the Optimizer to miss the use of indexes and perform poorly.

The most obvious datatype conversion concerns dates. Date fields in all the databases I have used are stored internally as a Julian number. A Julian number or date is an integer value from a database-specific date measured in seconds. When retrieving a date value in a tool such as SQL*Plus there is usually a default date format. The internal date value is converted to that default format. The conversion is implicit, automatic, and transparent.

SELECT SYSDATE, TO_CHAR(SYSDATE,'J') "Julian" FROM DUAL;     SYSDATE            Julian ------------   ---------- 03-MAR-03         2452702

Now for the sake of demonstration I will create an index on the GeneralLedger DTE column.

CREATE INDEX ak_gl_dte ON GENERALLEDGER(DTE);

Now obviously it is difficult to demonstrate an index hit with a key such as this because the date is a datestamp as well as a simple date. A simple date format such as MM/DD/YYYY excludes a timestamp. Simple dates and datestamps (timestamps) are almost impossible to match. Thus I will use SYSDATE in order to avoid a check against a simple formatted date. Both the GeneralLedger DTE column and SYSDATE are timestamps since the date column in the table was created using SYSDATE-generated values. We are only trying to show Optimizer query plans without finding rows.

The first example hits the new index I created and has a very low cost.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM generalledger WHERE dte = SYSDATE;     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                      2      593     15418  TABLE ACCESS BY INDEX ROWID on     GENERALLEDGER                         2      593     15418   INDEX RANGE SCAN on AK_GL_DTE          1      593     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM generalledger       WHERE TO_CHAR(dte, 'YYYY/MM/DD') = '2002/08/21';

This second example does not hit the index because the TO_CHAR datatype conversion is completely inconsistent with the datatype of the index. As a result the cost is much higher.

Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                    493     7527    195702  TABLE ACCESS FULL on GENERALLEDGER    493     7527    195702

Another factor to consider with datatype conversions is making sure that datatype conversions are not placed onto columns. Convert literal values not part of the database if possible. In order to demonstrate this I am going to add a zip code column to my Supplier table, create an index on that zip code column and regenerate statistics for the Supplier table. I do not need to add values to the zip code column to prove my point.

ALTER TABLE supplier ADD(zip NUMBER(5)); CREATE INDEX ak_sp_zip ON supplier(zip); ANALYZE TABLE supplier COMPUTE STATISTICS;

Now we can show two examples. The first uses an index because there is no datatype conversion on the column in the table and the second reads the entire table because the conversion is on the column.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM supplier WHERE zip = TO_NUMBER('94002');     Query                                   Cost     Rows     Bytes -------------------------------------   ----   ------   ------- SELECT STATEMENT on                        1        1       142  TABLE ACCESS BY INDEX ROWID on SUPPLIER   1        1       142   INDEX RANGE SCAN on AK_SP_ZIP            1        1     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM supplier WHERE TO_CHAR(zip) = '94002';     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                     13        1       142  TABLE ACCESS FULL on SUPPLIER          13        1       142

Oracle SQL does not generally allow implicit type conversions but there is some capacity for automatic conversion of strings to integers, if a string contains an integer value. Using implicit type conversions is a very bad programming practice and is not recommended. A programmer should never rely on another tool to do their job for them. Explicit coding is less likely to meet with potential errors in the future.

It is better to be precise since the computer will always be precise and do exactly as you tell it to do. Implicit type conversion is included in Oracle SQL for ease of programming. Ease of program coding is a top-down application to database design approach, totally contradictory to database tuning. Using a database from the point of view of how the application can most easily be coded is not favorable to eventual production performance. Do not use implicit type conversions. As can be seen in the following examples implicit type conversions do not appear to make any difference to Optimizer costs.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM supplier WHERE supplier_id = 3801;     Query                                      Cost     Rows     Bytes ----------------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                        2        1       142 2.  TABLE ACCESS BY INDEX ROWID on SUPPLIER   2        1       142 3.   INDEX UNIQUE SCAN on XPK_SUPPLIER        1     3874     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM supplier WHERE supplier_id = '3801';     Query                                      Cost     Rows     Bytes ----------------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                        2        1       142 2.  TABLE ACCESS BY INDEX ROWID on SUPPLIER   2        1       142 3.   INDEX UNIQUE SCAN on XPK_SUPPLIER        1     3874

In short, try to avoid using any type of data conversion function in any part of an SQL statement which could potentially match an index, especially if you are trying to assist performance by matching appropriate indexes.

6.2.4 Using Functions in Queries

Now let's expand on the use of functions by examining their use in all of the clauses of a SELECT statement.

Functions in the SELECT Statement

Firstly, let's put a datatype conversion into a SELECT statement, which uses an index. As we can see in the two examples below, use of the index is not affected by the datatype conversion placed into the SELECT statement.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT customer_id FROM customer;     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                      1     2694     10776  INDEX FAST FULL SCAN on XPKCUSTOMER     1     2694     10776     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT TO_CHAR(customer_id) FROM customer;     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                      1     2694     10776  INDEX FAST FULL SCAN on XPKCUSTOMER     1     2694     10776

Functions in the WHERE Clause

Now let's examine the WHERE clause. In the two examples below the only difference is in the type of index scan utilized. Traditionally the unique index hit produces an exact match and it should be faster. A later chapter will examine the difference between these two types of index reads.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT customer_id FROM customer WHERE          customer_id = 100;     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                      1        1         4  INDEX UNIQUE SCAN on XPKCUSTOMER        1        1         4     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT customer_id FROM customer       WHERE TO_CHAR(customer_id) = '100';     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                      1        1         4  INDEX FAST FULL SCAN on XPKCUSTOMER     1        1         4

Functions in the ORDER BY Clause

The ORDER BY clause can utilize indexing well, as already seen in this chapter, as long as WHERE clause index matching is not compromised. Let's keep it simple. Looking at the following two examples it should suffice to say that it might be a bad idea to include functions in ORDER BY clauses. An index is not used in the second query and consequently the cost is much higher.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM generalledger ORDER BY coa#;     Query                               Cost     Rows     Bytes      Sort ---------------------------------   ----   ------   -------   ------- SELECT STATEMENT on                  826   752740  19571240  TABLE ACCESS BY INDEX ROWID on GL   826   752740  19571240   INDEX FULL SCAN on XFK_GL_COA#      26   752740     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM generalledger ORDER BY TO_CHAR(coa#);     Query                            Cost     Rows     Bytes      Sort ------------------------------   ----   ------   -------   ------- SELECT STATEMENT on             19070   752740  19571240  SORT ORDER BY on               19070   752740  19571240  60474000   TABLE ACCESS FULL on    GENERALLEDGER                  493   752740  19571240

Here is an interesting twist to using the same datatype conversion in the above two examples but with the conversion in the SELECT statement and setting the ORDER BY clause to sort by position rather than using the TO_CHAR(COA#) datatype conversion. The reason why this example is lower in cost than the second example is because the conversion is done on selection and ORDER BY resorting is executed after data retrieval. In other words, in this example the ORDER BY clause does not affect the data access method.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT TO_CHAR(coa#), dte, dr cr FROM generalledger          ORDER BY 1;     Query                            Cost     Rows     Bytes      Sort ------------------------------   ----   ------   -------   ------- SELECT STATEMENT on             12937   752740  13549320  SORT ORDER BY on               12937   752740  13549320  42394000   TABLE ACCESS FULL on     GENERALLEDGER                 493   752740  13549320

Functions in the GROUP BY Clause

Using functions in GROUP BY clauses will slow performance as shown in the following two examples.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT order_id, COUNT(order_id) FROM ordersline       GROUP BY order_id;     Query                                 Cost     Rows     Bytes -----------------------------------   ----   ------   ------- SELECT STATEMENT on                     26   172304    861520  SORT GROUP BY NOSORT on                26   172304    861520   INDEX FULL SCAN on    XFK_ORDERLINE_ORDER                  26   540827   2704135     EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT TO_CHAR(order_id), COUNT(order_id) FROM         ordersline       GROUP BY TO_CHAR(order_id);     Query                            Cost     Rows     Bytes      Sort ------------------------------   ----   ------   -------   ------- SELECT STATEMENT on              3708   172304    861520  SORT GROUP BY on                3708   172304    861520   8610000   INDEX FAST FULL SCAN on       XFK_ORDERLINE_ORDER             4   540827   2704135

When using functions in SQL statements it is best to keep the functions away from any columns involving index matching.

< Day Day Up >