Index Key Values

   

The general rule for indexes is that a key value is an exact copy of the column value in the data row. That is, if the data column has a 16-bit SMALLINT value, then so does the key. And if the data column is variable-length, then so is the key. The rule that copies are exact has just two common exceptions.

The first and most notorious exception to the rule that key and column value copies are exact is that Oracleand Oracle alonerefuses to store NULLs. Oh, the DBMS will put NULLs in compound keys if other columns are NOT NULL, but Oracle doesn't have much use for NULL keys. So if you have a table ( Table1 ), containing two columns ( column1 , column2 ), and column2 is indexed, then:

  • Oracle can do:

     INSERT INTO Table1 VALUES (5, NULL) 

    much more quickly, because the DBMS doesn't have to update the index at all.

  • Oracle can do:

     SELECT * FROM Table1 WHERE column2 < 5 

    a bit more quickly, because the index on column2 has fewer keys and therefore might have fewer levels.

  • Oracle cannot do:

     SELECT * FROM Table1 WHERE column2 IS NULL 

    quickly, because the DBMS can't use an index for any case where column2 IS NULL might be true .

This is a good trade off. Unfortunately, it causes an attitude difference between Oracle programmers and everybody else: Oracle programmers will think nullable columns are good, while programmers for other DBMSs will think that NOT NULL leads to performance improvements.

The second exception to the rule that key and column value copies are exact is that index keys might be truncatedsometimes silently, sometimes explicitly. [6] This is a reasonable thing, because it's good to keep key size small, and a few dozen bytes should be enough to differentiate two similar but different key values. The only effects you, the programmer, see are:

[6] For example, MySQL allows indexes to be defined on the left-most n characters of a column. PostgreSQL, on the other hand, has now dropped support for partial indexes.

  • The order of the keys isn't necessarily the order that should appear for an ORDER BY clause.

  • The value of using compound indexes is less because the right-most columns can disappear.

  • Some queries, especially those that contain column1 = column2 comparisons, will take longer because not everything is in the index.

On the other hand, accuracy is not affected. Usually truncation becomes an issue when the total index key size is more than 100 bytes, which should be rare.

Tip

It is a mistake, when calculating data-row size, to think thusly:

The free space in the page is 8,000 bytes, and the row size is 798 bytes, and one of the columns is CHAR(38), so I'll change that column to CHAR(40). After all, it's free becauseeven though the row size is two bytes biggerI'm still fitting the same number of rows (10) in the page.

What's forgotten is that every index key based on the column will also be bigger.


While we're on the subject of shortening keys, here's a brief note about compression that is, making index keys shorter by throwing bytes away from the front or from the back.

Front compression depends on the fact that keys are in order, and so the first n bytes of key #2 are probably the same as the first n bytes of key #1. The first n bytes of key #2 can thus be replaced with a single byte that means "the first n bytes are the same as those of the previous key." For example, here are four key values:

 Johnson   Johnson   Jonathan   Jones     (total 27 bytes) 

If each key begins with a byte meaning "number of bytes that are the same," these four keys can be stored like this:

 0Johnson  7         2nathan    3es       (total 19 bytes) 

Back compression depends on the fact that most keys are distinguishable after a few characters. That is, if two keys are different, then that will be clear after a small number of characters, so there's no need to store the bytes that don't establish the differences between key values. (This is called a lossy compression because information is lost, but what's lost can be recovered when the original data row is read in.) If the four example keys were back-compressed instead of front-compressed, this would be the result:

 Joh       Joh       Jona       Jone      (total 14 bytes) 

Both front compression and back compression can save lots of space. The downside is that compression can slow down index scansextra CPU time is often required to reconstruct the key column values. Compression also incurs additional storage overhead (usually four bytes). Because of this, not all DBMSs use compression. DBMS makers don't like the effects of compression on locking, CPU time, cascading changes, complexity, and difficulty doing binary searches.

At first glance, it may look like our claimthat there are only two exceptions to the rule that index keys are exactly like data keysis mistaken. The apparent error is this: keys are supposed to be in order. To maintain order, keys must be compared with each other frequently. In a comparison, some conversion is commonly necessary. For example, a case-insensitive index might have these keys: {a , B} . For comparison purposes, such an index must convert B to b that is, ask for LOWER(B) before checking if the value is greater than a . In that case, why doesn't the DBMS store b instead of B ? The reason is that it doesn't want to lose information that might be useful for a covering index.

The Bottom Line: Index Key Values

The general rule for indexes is that a key value is an exact copy of the column value in the data row. The two exceptions to this rule are (a) the case of NULLs (Oracle won't store NULL so Oracle programmers think nullable columns are good, while programmers for other DBMSs think that NOT NULL leads to performance improvements) and (b) the fact that index keys might be truncated or compressed.

Usually truncation becomes an issue when the total index key size is more than 100 bytes, which should be rare. In that case, (a) the order of the keys isn't necessarily the order that should appear for an ORDER BY clause, (b) the value of using compound indexes is less because the right-most columns can disappear, and (c) some queries will take longer because not everything is in the index. So keep the size of index keys small.

   


SQL Performance Tuning
SQL Performance Tuning
ISBN: 0201791692
EAN: 2147483647
Year: 2005
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net