Missing or stale optimizer statistics are often the cause of suboptimal query performance. The cost-based optimizer uses statistics such as the cardinality of the table, number of distinct values of a column, and the data distribution to determine the cost of an access path. The cost is a measure of how much I/O, CPU time, and memory will be required to execute the query. To use the cost-based optimizer effectively, statistics describing the cardinality and data distribution must be collected for each table, index, and materialized view.
Statistics should be collected using the DBMS_STATS package. (Note that previous releases of Oracle had a command called ANALYZE, which is now deprecated.) Statistics either may be gathered by reading all rows or estimated by reading only a sample of rows or blocks. The DBMS_STATS package provides procedures to collect statistics at a database, schema, or table level and also at the level of a table partition.
To collect statistics on all tables and indexes in the EASYDW schema issue the following query:
SQL> EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS ('EASYDW');
If you are just interested in a specific table-PURCHASES, for example-you can use the following statement. This will gather statistics on the table, its columns, and indexes.
SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS ('EASYDW', 'PURCHASES');
You can see the table and column statistics by querying the user_tables and user_tab_cols dictionary views, as follows:
-- table statistics SQL> SELECT NUM_ROWS, BLOCKS, AVG_ROW_LEN, LAST_ANALYZED FROM USER_TABLES WHERE TABLE_NAME = 'PURCHASES'; NUM_ROWS BLOCKS AVG_ROW_LEN LAST_ANAL ---------- ---------- ----------- --------- 94113 623 46 07-AUG-02 -- column statistics SQL> SELECT NUM_DISTINCT, NUM_NULLS, AVG_COL_LEN FROM USER_TAB_COLS WHERE TABLE_NAME = 'PURCHASES'; COLUMN_NAME NUM_DISTINCT NUM_NULLS AVG_COL_LEN ------------------------------ ------------ ---------- ----------- PRODUCT_ID 162 0 7 TIME_KEY 20 0 8 CUSTOMER_ID 40 0 9 PURCHASE_DATE 11 0 8 PURCHASE_TIME 492 0 4 PURCHASE_PRICE 1032 0 5 SHIPPING_CHARGE 3 0 4 TODAY_SPECIAL_OFFER 2 0 2
To collect statistics for the index, CUSTOMER_GENDER_INDEX, use the following query.
SQL> EXECUTE DBMS_STATS.GATHER_INDEX_STATS('easydw', 'customer_gender_index');
Statistics should be gathered after data is loaded and again whenever changes made to the data are likely to have altered the distribution. This ensures that the cost-based optimizer has up-to-date data to base its decision upon. When partitioned tables are used, only the newest partition or subpartition to which rows have been added needs to be analyzed for statistics. For example, if a new partition for July 2002 was added to the purchases table, statistics can be collected for the newly added partition, as follows:
SQL> ALTER TABLE purchases ADD PARTITION purchases_jul2002 values less than (TO_DATE('31-07-2002', 'DD-MM-YYYY')); SQL> INSERT /*+APPEND */ INTO purchases ... SQL> EXECUTE DBMS_STATS.GATHER_TABLE_STATS('easydw', 'purchases', 'purchases_jul2002');
To automate gathering statistics for a table enable monitoring of DML activity for that table using the MONITORING keyword in the CREATE or ALTER TABLE command. Then use the GATHER STALE option in the GATHER_DATABASE_STATS procedure to gather statistics for just those tables whose contents have changed significantly. This procedure can be called in a job that is scheduled periodically, so that statistics are kept up-to-date.
To turn on automatic DML monitoring on the purchases table, you would issue the following query.
SQL> ALTER TABLE purchases MONITORING;
Alternatively, monitoring can be turned on for an entire schema or database using the ALTER_SCHEMA_TAB_MONITORING or ALTER_DATABASE_TAB_MONITORING procedures. For instance, to turn DML monitoring on for the entire EASYDW schema:
SQL> EXECUTE DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING('easydw', TRUE);
To collect statistics on only the tables that have a significant number of changes issue the following:
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS ('easydw', options=>'GATHER STALE');
In Oracle 9i, the cost-based optimizer was enhanced to consider system statistics in determining the execution plan. System statistics such as CPU speed and number of I/Os per second give an indication of the resource availability in the system. This allows the optimizer to make smarter decisions when several users are contending for the same resources. System statistics can be collected using the DBMS_STATS.GET_SYSTEM_STATS procedure. As with all statistics, it is important that the system statistics give the optimizer an accurate picture of the system load. Statistics could be collected for different periods during the day-for example, during normal business hours and during after-hours reporting activities. Statistics can be saved using the EXPORT_SYSTEM_STATS procedure and later imported into a database using the IMPORT_SYSTEM_STATS procedure.
Oracle 9i Release 2 introduced a new feature known as dynamic sampling, which is used when statistics on a table are absent. In dynamic sampling, the optimizer automatically collects statistics by sampling the data before optimizing the query. Dynamic sampling is useful for queries whose tables do not have statistics collected or when the statistics are too old. To enable dynamic sampling, the initialization parameter, OPTIMIZER_DYNAMIC_SAMPLING, must be set to a value greater than 1. The optimizer will perform dynamic sampling if there is more than one table in the query and some of the tables have no statistics and no indexes. Note that dynamic sampling will incur some overhead during query optimization and, hence, must be used with care. In a data warehouse, it is always a good practice to collect statistics on all your tables as part of your housekeeping procedures and to keep them current.