Supposedly, a rule-based optimizer (RBO) differs from a cost-based optimizer (CBO). Consider this SQL statement: SELECT * FROM Table1 WHERE column2 = 55 Assume that column2 is an indexed, non-unique column. A rule-based optimizer will find column2 in the system catalog and discover that it is indexed, but not uniquely indexed. The RBO then combines this data with the information that the query uses the equals operator. A common assumption in the field of optimization is that " = <literal> " search conditions will retrieve 5% of all rows. (In contrast, the assumption for greater-than conditions is that they will retrieve 25% of all rows.) That is a narrow search, and usually it's faster to perform a narrow search with a B-tree rather than scan all rows in the table. Therefore the rule-based optimizer makes a plan: find matching rows using the index on column2 . Notice that the rule-based optimizer is using a non-volatile datum (the existence of an index) and a fixed assumption (that equals searches are narrow). A cost-based optimizer can go further. Suppose the system catalog contains three additional pieces of information: (1) that there are 100 rows in Table1 , (2) that there are two pages in Table1 , and (3) that the value 55 appears 60 times in the index for column2 . Those facts change everything. The equals operation will match on 60% of the rows, so it's not a narrow search. And the whole table can be scanned using two page reads, whereas an index lookup would take three page reads (one to lookup in the index, two more to fetch the data later). Therefore the cost-based optimizer makes a different plan: find matching rows using a table scan. Notice that the cost-based optimizer is using volatile data (the row and column values that have been inserted) and an override (that the contents are more important than the fixed assumptions). In other words, a cost-based optimizer is a rule-based optimizer that has additional, volatile information available to it so that it can override the fixed assumptions that would otherwise govern its decisions. The terminology causes an impression that one optimizer type is based on rules while the other is based on cost. That's unfortunate because both optimizer types use rules and both optimizer types have the goal of calculating cost. The reality is that cost-based is an extension of rule-based, and a better term would have been something like "rule-based++." Most vendors claim that their DBMSs have cost-based optimizers, as you can see from Table 17-1. The claims don't mean much by themselves . What's important is whether the optimizer estimates cost correctly and how it acts on the estimate. In this chapter, we'll look at the actions DBMSs take to fulfill their claims. Table 17-1. Cost-Based Optimizers
Notes on Table 17-1:
|