Chapter 17. Cost-Based Optimizers | SQL Performance Tuning

Supposedly, a rule-based optimizer (RBO) differs from a cost-based optimizer (CBO). Consider this SQL statement:

 SELECT * FROM Table1   WHERE column2 = 55

Assume that column2 is an indexed, non-unique column. A rule-based optimizer will find column2 in the system catalog and discover that it is indexed, but not uniquely indexed. The RBO then combines this data with the information that the query uses the equals operator. A common assumption in the field of optimization is that " = <literal> " search conditions will retrieve 5% of all rows. (In contrast, the assumption for greater-than conditions is that they will retrieve 25% of all rows.) That is a narrow search, and usually it's faster to perform a narrow search with a B-tree rather than scan all rows in the table. Therefore the rule-based optimizer makes a plan: find matching rows using the index on column2 .

Notice that the rule-based optimizer is using a non-volatile datum (the existence of an index) and a fixed assumption (that equals searches are narrow).

A cost-based optimizer can go further. Suppose the system catalog contains three additional pieces of information: (1) that there are 100 rows in Table1 , (2) that there are two pages in Table1 , and (3) that the value 55 appears 60 times in the index for column2 . Those facts change everything. The equals operation will match on 60% of the rows, so it's not a narrow search. And the whole table can be scanned using two page reads, whereas an index lookup would take three page reads (one to lookup in the index, two more to fetch the data later). Therefore the cost-based optimizer makes a different plan: find matching rows using a table scan.

Notice that the cost-based optimizer is using volatile data (the row and column values that have been inserted) and an override (that the contents are more important than the fixed assumptions).

In other words, a cost-based optimizer is a rule-based optimizer that has additional, volatile information available to it so that it can override the fixed assumptions that would otherwise govern its decisions. The terminology causes an impression that one optimizer type is based on rules while the other is based on cost. That's unfortunate because both optimizer types use rules and both optimizer types have the goal of calculating cost. The reality is that cost-based is an extension of rule-based, and a better term would have been something like "rule-based++."

Most vendors claim that their DBMSs have cost-based optimizers, as you can see from Table 17-1. The claims don't mean much by themselves . What's important is whether the optimizer estimates cost correctly and how it acts on the estimate. In this chapter, we'll look at the actions DBMSs take to fulfill their claims.

Table 17-1. Cost-Based Optimizers

	Claims to be CBO	"Explains" the Access Plan	"Updates" Statistics for the Optimizer
IBM	Yes	EXPLAIN	RUNSTATS
Informix	Yes	SET EXPLAIN	UPDATE STATISTICS
Ingres	Yes	EXECUTE QEP	optimizedb utility
InterBase	Yes	SELECT PLAN	SET STATISTICS
Microsoft	Yes	EXPLAIN	UPDATE STATISTICS
MySQL	No	EXPLAIN	ANALYZE TABLE
Oracle	Yes	EXPLAIN PLAN FOR	ANALYZE
Sybase	Yes	SET SHOWPLAN ON	UPDATE STATISTICS

Notes on Table 17-1:

Claims to be CBO column

This column is "Yes" if the DBMS's documentation makes the claim that it operates with a cost-based optimizer.
"Explains" the Access Plan column

This column shows the non-standard statement provided by the DBMS so that you can examine the access plan the optimizer will use to resolve an SQL statement. For example, if you want to know how Oracle will resolve a specific SELECT statement, just execute an EXPLAIN PLAN FOR statement for the SELECT.
"Updates" Statistics for the Optimizer column

This column shows the non-standard statement or utility the DBMS provides so that you can update volatile information, or statistics, for the optimizer. For example, if your DBMS is MySQL and you've just added many rows to a table and want the optimizer to know about them, just execute an ANALYZE TABLE statement for the table.