Oracle currently offers two completely different optimizers, the rule-based optimizer (RBO) and the cost-based optimizer (CBO), and the methods for tuning on each differ . The RBO is Oracle's original automated optimizer, back from the days of Oracle Version 6 and earlier. By rule-based , Oracle means that the optimizer uses only fixed properties of the tables, indexes, and SQL to guess an optimum execution plan from a set of simple rules of thumb (or heuristics ) built into the automated optimizer. The RBO uses no data about the sizes of the tables or indexes, or about the distribution of data within those objects. It does use data on the fixed properties of the indexes: whether they are unique, which columns they cover, in which order, and how well those match up with the most selective-looking filter conditions and joins in the SQL. As tables grow and data distributions change, the RBO should go right on delivering the same plan indefinitely, as long as you don't alter the indexes (for example, from unique to nonunique ) or change the table structure (for example, from an ordinary table to a partitioned table). However, at some future time, perhaps even in Oracle Database 10 g , Oracle will drop all support for the rule-based optimizer, and cost-based optimization will become your only choice. Since Oracle7, the RBO has been even more stable than before, because Oracle chose to freeze the RBO code beginning with Oracle7, except for rare, slight changes necessary to deliver functionally correct (as opposed to necessarily optimum) results. Therefore, an execution plan that is correct on the RBO today will likely stay unchanged until Oracle drops the RBO altogether. This is appealing from the perspective of stability, although the dark side of this stability is that the execution plans never get any better either. Execution plans on the RBO never change to adapt to changing data distributions, and this is often cited as an argument to switch to the CBO. However, in my own experience, data-distribution change is the least of the reasons for cost-based optimization. In over 10 years , I have yet to find a single case in which it was important to use different execution plans for different real-world data distributions with the same SQL.
Another argument cited in favor of the CBO is that it can deliver parallel execution plans , plans that can bring multiple processors to bear on the SQL statement at once. I have not found this to be a compelling argument, since I have yet to find a real-world case in which the optimum SQL, with the optimum database design, required parallel execution for adequate performance. I expect some such cases exist in data-warehousing environments, which are not where most of my experience lies, I admit, but almost all cases in which parallel execution plans appear to shine are really covering up some mistake in database design, indexing, or application design, compensating for design deficiencies with horsepower. That, by itself, would not be such a bad thing; extra horsepower might be cheaper than fixing the application. However, parallel plans are usually in service of large batch processes, competing heavily for resources with online processes that are more critical to end users. Therefore, parallel plans often rob needed resources from other processes that are more critical. These are the strongest arguments against using the RBO:
That said, the RBO does a surprisingly good job; its heuristics are well designed to get along with the tiny amount of information that the RBO uses to guess the best plan. In Chapter 6, I will describe properties of what I call a robust execution plan, one that behaves well across a wide range of data distributions. The RBO almost always delivers a robust plan when the necessary indexes are available and when the developer has not prevented use of an index with some index-disabling expression, as discussed earlier in this chapter. Given the right indexes, you can almost always get the best robust plan on either optimizer, with manual tuning. With automated tuning, the biggest advantage of the CBO is that it is more resourceful when dealing with imperfect indexing and nonoptimally written SQL; more often, it delivers at least an adequate plan in these cases, without manual tuning. When more than one robust plan is possible, the CBO is also more likely to find the best robust plan, while the RBO will pick one without knowing relative costs, unless you manually tune the SQL . 4.2.1 Controlling the Choice of Oracle OptimizerIt is unrealistic to optimize Oracle queries simultaneously for both the rule-based and the cost-based optimizers. Therefore, you should understand the factors that lead Oracle to choose which optimizer it applies, so that you can control those factors and get the optimizer you choose. The RBO cannot handle certain object types and object properties that did not yet exist when Oracle froze the RBO code. However, rather than simply have its code error out, Oracle modified the RBO code just enough to let it recognize the cases it cannot handle and to have it pass those cases on to the CBO. Thus, even if you think you have set up your system for rule-based optimization, the following circumstances will absolutely force cost-based optimization:
If the tables and indexes involved in your SQL do not prevent using the RBO, Oracle chooses between the RBO and the CBO as follows :
In the last three steps of this decision cascade, Oracle chooses according to an optimizer_mode parameter, which you or your DBA sets. These are the four possible parameter values and how they affect the choice:
Here's a quick way to check the instance-level parameter for optimizer_mode : SELECT VALUE FROM V$PARAMETER WHERE NAME = 'optimizer_mode'; When you have an execution plan in PLAN_TABLE , a quick way to see whether it is cost-based is to run the following query: SELECT POSITION FROM PLAN_TABLE WHERE ID=0; This returns the cost of the entire execution plan, in arbitrary units, when the plan is cost-based. When cost is not null, you have a cost-based plan. 4.2.2 Controlling Oracle Rule-Based Execution PlansMost of the methods for controlling rule-based execution plans are the universal techniques of controlling plans, covered in the first section of this chapter. The primary Oracle-specific method of tuning under a rule-based default optimizer_mode is simply to switch modes to cost-based optimization, usually with a hint such as /*+ FIRST_ROWS */ . In other words, you can always control a plan via hints, and hints (with the exception of the /*+ RULE */ hint) in a statement cause Oracle to use the CBO for that statement. However, if you prefer not to use cost-based optimization, thus precluding the use of hints, one RBO-specific technique remains: in the FROM clause, list tables and their aliases in exactly the reverse order from the join order you want. This usually gives enough control of the join order, without using the techniques described earlier in Section 4.1.4. In particular, eligible, unique joins toward primary keys happen in the reverse order they are listed in the FROM clause, without changing the join conditions. For example, consider this query: SELECT /*+ RULE */ E.First_Name, E.Last_Name, E.Salary, LE.Description, M.First_Name, M.Last_Name, LM.Description FROM Locations LM, Employees M, Locations LE, Employees E WHERE E.Last_Name = 'Johnson' AND E.Manager_ID=M.Employee_ID AND E.Location_ID=LE.Location_ID AND M.Location_ID=LM.Location_ID AND LE.Description='Dallas'; Unlike the earlier version of this query in Chapter 3, which had the wrong order in the FROM clause, you now get the correct join order. In this correct execution plan, E joins to LE before joining to M or LM , as shown by the plan output: SQL> @ex PLAN ---------------------------------------------------------------------- SELECT STATEMENT NESTED LOOPS NESTED LOOPS NESTED LOOPS TABLE ACCESS BY INDEX ROWID 4*EMPLOYEES INDEX RANGE SCAN EMPLOYEE_LAST_NAME TABLE ACCESS BY INDEX ROWID 3*LOCATIONS INDEX UNIQUE SCAN LOCATION_PKEY TABLE ACCESS BY INDEX ROWID 2*EMPLOYEES INDEX UNIQUE SCAN EMPLOYEE_PKEY TABLE ACCESS BY INDEX ROWID 1*LOCATIONS INDEX UNIQUE SCAN LOCATION_PKEY When the RBO otherwise has no preference based on the conditions and indexes, the RBO joins tables by working from right to left in the FROM clause. However, this method offers only limited control by itself, because the RBO follows its other rules of thumb before considering the join order in the FROM clause. For example, the RBO always chooses to perform unique indexed reads and joins before doing indexed range scans , when it can. 4.2.3 Controlling Oracle Cost-Based Execution PlansThere are two main parts involved in tuning on the Oracle CBO:
4.2.3.1 Oracle cost-based optimizer prerequisitesProving that a little knowledge is a dangerous thing, cost-based optimizers often do a terrible job if they do not have statistics on all the tables and indexes involved in the query. It is therefore imperative to maintain statistics on tables and indexes reliably, including regenerating statistics whenever table volumes change much or tables or indexes are rebuilt. It is safest to regenerate statistics periodically, during times that load is relatively quiet, such as nightly or at least weekly. The best way to generate and update statistics is with Oracle's DBMS_STATS package, documented at length in Oracle8i Supplied PL/SQL Packages Reference and Oracle9i Supplied PL/SQL Packages and Types Reference . Here is a simple example of using DBMS_STATS to generate statistics for a whole schema, Appl_Prod , sampling 10% of the data in the larger tables and cascading statistics collection to the indexes: BEGIN DBMS_STATS.GATHER_SCHEMA_STATS ('Appl_Prod',10, CASCADE => TRUE); END; / Often, queries include conditions on highly skewed distributions, such as conditions on special types, codes, or flags, when these columns have only a few values. Normally, the CBO evaluates selectivity of a condition based on the assumption that all nonnull values of a column are equally selective. This assumption generally works well for foreign and primary keys that join business entities, but it is inaccurate when the columns have permanent special meanings and certain meanings apply much more rarely than others. For example, in an Orders table, you might have a Status_Code column with three possible values: ' CL ' for closed (i.e., fulfilled) orders, ' CA ' for cancelled orders, and ' OP ' for open orders. Most orders, by far, would be fulfilled, once the application has been running for a few months. A steady, significant fraction of orders would end up cancelled, so that value would also eventually point to a large list of orders. However, as long as the business keeps up with incoming orders, the number of open orders would remain moderate and steady, even as data accumulates for years. Quite early, a condition specifying Status_Code='OP ' would be selective enough to justify indexed access, if you had an index with that leading column, and it is important to enable the optimizer to realize this fact, preferably without a lot of manual tuning. Enabling the CBO to recognize when a column is selective requires two things:
Oracle stores special statistics on distribution when you request them, based on sorting the rows for a column and arranging the sorted list into a specified number of buckets that each contain the same number of rows. Since Oracle already knows that the range each bucket holds has the same number of rows, Oracle needs to know only the value-range endpoints in each bucket. In the current example, with 20 buckets, the first bucket might hold the range ' CA ' to ' CA ', and the second bucket might hold the range ' CA ' to ' CL '. The next 17 buckets would hold the most common range, ' CL ' to ' CL '. The last bucket would hold the range ' CL ' to ' OP ', which includes the rarest value. From this, Oracle can deduce that the selectivity of the column is 5-10% for the value ' CA ', 85-95% for the value ' CL ', and 0-5% for the value ' OP '. Since you want the optimizer to know more closely how selective the ' OP ' value is, you would choose more buckets than this, perhaps the maximum of 254. (Oracle compresses the bucket information when so few values apply, so the large number of buckets should be inexpensive.) To create 254 buckets for the example case, in the schema owned by Appl_Prod , use this: BEGIN DBMS_STATS.GATHER_TABLE_STATS ('Appl_Prod','Orders', METHOD_OPT => 'FOR COLUMNS SIZE 254 Status_Code'); END; / Generate the histogram statistics after you generate the general table statistics, because table-statistics generation deletes earlier histogram statistics. 4.2.3.2 General hint syntaxOracle uses hints for manual control of cost-based optimization. Syntactically, these hints take the form of comments, like /*+ <Hint_String> */ , immediately following the SELECT keyword. Oracle recognizes that this syntax encloses a hint, not a comment, by the + at the beginning and by the location of the hint, which must immediately follow SELECT . However, since these are comments from the point of view of standard SQL syntax, they do not interfere with parsing the SQL if the SQL is also to be executed on non-Oracle databases.
Each hint directly affects only the SELECT block that has the comment. Thus, to control the order of joins and index choices within a subquery, place the hint after the SELECT keyword that begins the subquery. But to affect the outer-query order of joins and index choices, place a hint immediately after the outer-query SELECT . 4.2.3.3 Approaches to tuning with hintsThere are two basic extremes involved in tuning with hints:
If you are confident in your chosen execution plan, as you should be if you apply the methods I describe later in this book, there is little reason to hold back from fully specifying that plan. I have yet to find a case where a well- chosen , robust execution plan needed to evolve to handle new data distributions or new database features. On the other hand, it is easy for SQL with a partially restricting set of hints to go wrong, especially if some table or index loses its statistics. When the CBO chooses incorrectly, the error that made the CBO choose incorrectly will likely propagate over the entire plan. For example, consider this query: SELECT E.First_Name, E.Last_Name, E.Salary, LE.Description, M.First_Name, M.Last_Name, LM.Description FROM Locations LM, Employees M, Locations LE, Employees E WHERE E.Hire_Date > :1 AND E.Manager_ID=M.Employee_ID AND E.Location_ID=LE.Location_ID AND M.Location_ID=LM.Location_ID At parse time, when the optimizer does its work, it cannot know that the bind variable :1 will likely be set to a value in the current week, so it makes a conservative assumption about the selectivity of that condition on Hire_Date . Having made that assumption, it might not only forego using an index on Hire_Date (depending on the data distribution), but it might also further calculate that it will hit most of the rows of all the joined tables as well, and the CBO might choose full table scans with hash joins on them. Even if you instruct the CBO to use the index on Hire_Date , it still retains its initial assumption that the driving condition is unselective, and will likely retain its poor choices for the other joins and table-access methods. This is really no flaw in the optimizer; it cannot know what the application developer knows about the likely values to be assigned to the bind variable. However, the consequence is that, if you need to be any more specific than just specifying ALL_ROWS or FIRST_ROWS , chances are relatively high that the optimizer will need help across the board, to correct for some incorrect assumption somewhere.
4.2.3.4 Table-access hintsThese are the main hints to control table-access methods:
The INDEX and FULL hints are common and easy to use. The INDEX_DESC hint is useful only rarely, but it is occasionally vital to use. For example, if you want to know all about the last employee hired in April, you might use this query: SELECT * FROM Employees E WHERE Hire_Date>=TO_DATE('2003-04-01','YYYY-MM-DD') AND Hire_Date< TO_DATE('2003-05-01','YYYY-MM-DD') ORDER BY Hire_Date DESC You'll find the most recently hired employee you want at the top of the list of rows returned by this query. To avoid reading all the data for other employees hired in April, you might think to add a condition AND ROWNUM=1 to the query. However, this sometimes will not yield the desired result, because (depending on the data) Oracle will sometimes apply that condition before performing the descending sort . If Oracle uses a full table scan, it will return the first employee hired in April it finds in the table, likely the least recently hired. If it uses a simple index range scan on an index on Hire_Date , it will begin, as range scans generally do by default, at the low end of the index range, returning the first employee hired in April. However, the INDEX_DESC hint, with the index Employee_Hire_Date on the Hire_Date column, neatly solves the problem, returning the desired row with just a single logical I/O to the table: SELECT /*+ INDEX_DESC(E Employee_Hire_Date) */ * FROM Employees E WHERE Hire_Date>=TO_DATE('2003-04-01','YYYY-MM-DD') AND Hire_Date< TO_DATE('2003-05-01','YYYY-MM-DD') AND ROWNUM=1 Note that I removed the explicit ORDER BY clause, since it gives the false impression that it has effect, given the condition on ROWNUM .
There are several other table-access hints that I have not described in this section, but I have never found them necessary. 4.2.3.5 Execution-order hintsThese are the main hints to control the order of execution for joins and subqueries:
The ORDERED and LEADING hints are common and straightforward to use. The PUSH_SUBQ hint is occasionally useful. When it comes to subqueries, Oracle offers hint-based control only at the two extremes: executing subqueries as early or as late as possible. However, you can gain full control of when subqueries execute if you combine the PUSH_SUBQ hint with the earlier methods of postponing correlated joins. For example, consider the earlier query: SELECT ... FROM Orders O, Customers C, Regions R WHERE O.Status_Code='OP' AND O.Customer_ID=C.Customer_ID AND C.Customer_Type_Code='GOV' AND C.Region_ID=R.Region_ID AND EXISTS (SELECT NULL FROM Order_Details OD WHERE O.Order_ID+0*C.Customer_ID=OD.Order_ID AND OD.Shipped_Flag='Y') Without a hint, Oracle would execute the EXISTS check after joining all three outer-query tables. The point of the expression O.Order_ID+0*C.Customer_ID was to delay the EXISTS check until after the join to C , but not after the join to R . However, without any hint, all EXISTS conditions are automatically delayed until after all outer-query joins. To force the EXISTS condition to execute between the joins to C and R , use both the hint and the correlating -join-postponing expression: SELECT /*+ PUSH_SUBQ */ ... FROM Orders O, Customers C, Regions R WHERE O.Status_Code='OP' AND O.Customer_ID=C.Customer_ID AND C.Customer_Type_Code='GOV' AND C.Region_ID=R.Region_ID AND EXISTS (SELECT NULL FROM Order_Details OD WHERE O.Order_ID+0*C.Customer_ID=OD.Order_ID AND OD.Shipped_Flag='Y') Now, the PUSH_SUBQ hint causes Oracle to execute the EXISTS condition as early as possible, and the expression O.Order_ID+0*C.Customer_ID ensures that "as early as possible" doesn't come until after the join to C . 4.2.3.6 Join-method hintsThese are the main hints to control the join methods:
4.2.3.7 ExampleHere's an example to illustrate the most frequently useful hints to yield complete control of an execution plan. I'll force the join order, the access method to every table, and the join method to every table. Consider the earlier example tuned for the RBO, shown at the end of Section 4.2.2. To fully force the same plan, but substitute a hash join for the first nested-loops join, with the employee locations read through the index on Description , use this query: SELECT /*+ ORDERED USE_NL(M LM) USE_HASH(LE) INDEX(E Employee_Last_Name) INDEX(LE Location_Description) INDEX(M Employee_Pkey) INDEX(LM Location_Pkey) */ E.First_Name, E.Last_Name, E.Salary, LE.Description, M.First_Name, M.Last_Name, LM.Description FROM Employees E, Locations LE, Employees M, Locations LM WHERE E.Last_Name = 'Johnson' AND E.Manager_ID=M.Employee_ID AND E.Location_ID=LE.Location_ID AND M.Location_ID=LM.Location_ID AND LE.Description='Dallas' This results in the execution plan, as shown here: SQL> @ex PLAN ---------------------------------------------------------------------- SELECT STATEMENT NESTED LOOPS NESTED LOOPS HASH JOIN TABLE ACCESS BY INDEX ROWID 1*EMPLOYEES INDEX RANGE SCAN EMPLOYEE_LAST_NAME TABLE ACCESS BY INDEX ROWID 2*LOCATIONS INDEX RANGE SCAN LOCATION_DESCRIPTION TABLE ACCESS BY INDEX ROWID 3*EMPLOYEES INDEX UNIQUE SCAN EMPLOYEE_PKEY TABLE ACCESS BY INDEX ROWID 4*LOCATIONS INDEX UNIQUE SCAN LOCATION_PKEY |