Bitmap Indexes | SQL Performance Tuning

All DBMSs use B-trees, but about 50% of them also offer an extra option: a bitmap index. Bitmap indexes are good for getting statistics out of large numbers of rows with static data. Specialist "warehouse database managers" like IBM Red Brick Warehouse are best in this field, but most of the large general DBMS vendors support bitmap indexes too. ^[5] In this section, we'll look at the simple bitmap indexes supported by Informix, Oracle, and Sybase (IBM bitmap indexes look quite different).

^[5] Except for Microsoft, but there are rumors that Microsoft will get into the act soon, taking some technology from Microsoft FoxPro's Rushmore technology.

A bitmap index consists merely of multiple lines of bits. A line of bits is called a bit vector (a vector is any one-dimensional array). Suppose you have a table, Table2 , with a column called sex and a bitmap index on sex . The sex column can have three possible values: M or F or NULL. Table2 's data is shown below:

Table2.sex
F
M
F
F

Because there are three possible values for the Table2.sex column, the bitmap index for the column contains three bit vectors. Each entry in a bit vector is one bit long. Because Table2 contains four rows, the length of each bit vector is four bits. A bit value of one means true . A bit value of zero means false . (If all vectors have a zero bit, the row in that position is a deleted row.) The bit vector is in order by row number, and the position of a bit in its vector matches the position of the row in the data table. Here is the bitmap index result for Table2 .

M vector	F vector	NULL vector
0 (M is false)	1 (F is true)	0 (NULL is false)
1 (M is true)	0 (F is false)	0 (NULL is false)
0 (M is false)	1 (F is true)	0 (NULL is false)
0 (M is false)	1 (F is true)	0 (NULL is false)

Thus for row #2 in Table2 the bitmap index stores three redundant statements: M is true , F is false , NULL is false . Despite the redundancy, the storage is efficient: Three 4-bit vectors are only 12 bits, or 1 ¹ ¼2 bytes. In theory, three 8,192-byte pages would be enough to store the "index entries" of 65,536 rows of data.

Suppose the size of Table2 expands to 65,536 rows, and you want to know whether any males are recorded in the table. So you do this search:

 SELECT * FROM Table2   WHERE EXISTS     (SELECT * FROM Table2        WHERE sex = 'M')

Listing 9-2 shows an assembly routine that scans the relevant index page and tells you (in theory) how long this search would take. On a Pentium, the loop shown in Listing 9-2 takes two cycles per row. A 1GHz Pentium can do one billion cycles per second. Assuming the worst casethat there are no malesa scan requires (4 cycles / 1 billion cycles per second) * (8192 bytes / 4 bytes per 32-bit word) that is, one quarter-millionth of a second, approximately. Well, anything that can scan 65,000 rows in one quarter-millionth of a second, plus the time for one page I/O, is fast. Even though no DBMS uses an optimized assembler, you can see that bitmap indexes are great for certain types of questions.

Listing 9-2 Assembly routine to scan bitmap index

 mov   ebx,offset start_of_page rept  8192/4 <       mov    edx,[ebx]        ;1 cycle       add    ebx,4            ;4 bytes tested at a time       test   edx,0ffffffffh   ;1 cycle       jnz    exists       > mov   eax,false               ;nothing found ret exists: mov   eax,true                ;something found ret

The Bottom Line: Bitmap Indexes

We think it's inevitable that every SQL DBMS will support bitmap indexes soon, so we're sure that the following tips will be handy.

Remember that B-trees are considered effective by optimizers if selectivity is greater than 0.1. If your queries use "key values" with lower selectivity than 0.1, the best way to speed them up is with bitmap indexes.

B-trees and bitmap indexes do not combine well. When you search via a B-tree, you get a list of ROWIDs in index-key order. When you search via a bitmap, you get a list of bits in ROWID order. Because the lists differ in structure, ANDs, ORs, and NOTs require extra conversion steps.

Assembly-language instructions for AND, OR, and NOT all take only two cycles for each 32-bit word in a bit vector. However, no fast assembly-language instruction tells you "how many bits are on in the word" or "which bit is on in the word." Therefore a DBMS is fast at Boolean logic but relatively slow at finding particular rows.

Bitmap indexes are particularly useful for queries that contain [NOT] EXISTS, UNIQUE, or GROUP BY.

Bitmap indexes are occasionally used for joins, but that's part of an internal process that the application programmer can't influence.

Bitmap indexes are particularly useless when a column can be any one of thousands of possibilities. Have you ever noticed that "income" reports or questionnaires contain no gross averages or individual figures but only boxes (e.g., [] 1500-3000 [] 3001-6000 [] more )? That kind of pregrouping is a big aid for bitmap index construction.

A common recommendation is that a column is a candidate for a bitmap if the number of possible distinct values is less than 1% of the total number of rows. For example, if the column is color and the value can be any one of the values in a 256-color display or NULL, the number of rows in the table should be at least 25,700 before you put color in a bitmap index.

Bitmap indexes are best for large (because a bit of storage is efficient), non-normalized (because there will be many repetitions), static (because updating one row means updating multiple bit vectors) tables.