Filter Factors | DB2 Developers Guide (5th Edition)

< Day Day Up >

Do you recall the discussion of filter factors from Chapter 1, "The Magic Words"? The optimizer calculates the filter factor for a query's predicates based on the number of rows that will be filtered out by the predicates.

The filter factor is a ratio (a number between 0 and 1) that estimates I/O costs. It is an estimate of the proportion of table rows for which a predicate is true. The formulas used by the optimizer to calculate the filter factor are proprietary IBM information, but Table 21.2 provides rough estimates for the formulas based on the type of predicate. These formulas assume uniform data distribution, so they should be used only when determining the filter factor for static SQL queries or queries on tables having no distribution statistics stored in the DB2 Catalog. The filter factor for dynamic SQL queries is calculated using the distribution statistics, in SYSCOLDIST , if available.

Table 21.2. Filter Factor Formulas

Predicate Type	Formula	Default FF
`COL = value`	1/ `FIRSTKEYCARDF [COL]`	.04
`COL = :host-var`	1/ `FIRSTKEYCARDF [COL]`	.04
`COL <> value`	1-(1/ `FIRSTKEYCARDF [COL]` )	.96
`COL <> :host-var`	1-(1/ `FIRSTKEYCARDF [COL]` )	.96
`COL IN (list of values)`	(list size )?(1/ `FIRSTKEYCARDF [COL]` )	.04?(list size)
`COL NOT IN` `(list of values)`	1-[(list size)?(1/ `FIRSTKEYCARDF [COL]` )]	1-[.04?(list size)]
`COL IS NULL`	1/ `FIRSTKEYCARDF [COL]`	.04
`COL IS NOT NULL`	1-(1/ `FIRSTKEYCARDF [COL]` )	.96
`COLA = COLB`	smaller of 1/ `FIRSTKEYCARDF [COLA]` 1/ `FIRSTKEYCARDF [COLB]`	.04
`COLA <> COLB`	smaller of 1/ `FIRSTKEYCARDF [COLA]` 1/ `FIRSTKEYCARDF [COLB]`	.96
`COL < value`	( `LOW2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL <= value`	( `LOW2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL > value`	( `LOW2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL > value`	( `HIGH2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL >= value`	( `HIGH2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL < value`	( `HIGH2KEY` -value)/( `HIGH2KEY` - `LOW2KEY` )	.33
`COL BETWEEN val1 AND val2`	( `val2-val1` )/( `HIGH2KEY` - `LOW2KEY` )	.01
`COL LIKE 'char%'`	( `val2-val1` )/( `HIGH2KEY` - `LOW2KEY` )	.01
`COL LIKE '%char'`	1	1
`COL LIKE '_char'`	1	1
`COL op ANY` `(non-corr. sub)`	- - -	.83
`COL op ALL (non-corr. sub)`	- - -	.16
`COL IN (non-corr. sub)`	FF( noncor . subquery)	.90
`COL NOT IN (non-corr. sub)`	1-FF(noncor. subquery)	.10
`predicate1 AND predicate2`	Multiply the filter factors of the two predicates, FF1?FF2
`predicate1 OR predicate2`	Add filter factors and subtract the product, FF1+FF2-(FF1?FF2)

For example, consider the following query:

 SELECT   EMPNO, LASTNAME, SEX FROM     DSN8810.EMP WHERE    WORKDEPT = 'A00';

The column has an index called DSN8810.XEMP2 . If this query were being optimized by DB2, the filter factor for the WORKDEPT predicate would be calculated to estimate the number of I/Os needed to satisfy this request.

Using the information in Table 21.2, you can see that the filter factor for this predicate is 1/ FIRSTKEYCARDF . So, if the value of the FIRSTKEYCARDF column in the SYSIBM.SYSINDEXES DB2 Catalog table is 9 , the filter factor for this query is 1/9, or .1111. In other words, DB2 assumes that approximately 11% of the rows from this table will satisfy this request.

You might be wondering how this information helps you. Well, with a bit of practical knowledge, you can begin to determine how your SQL statements will perform before executing them. If you remember nothing else about filter factors, remember this: The lower the filter factor, the lower the cost and, in general, the more efficient your query will be.

Therefore, you can easily see that as you further qualify a query with additional predicates, you make it more efficient because more data is filtered out and I/O requirements diminish.

< Day Day Up >