Indexes

Index is another database physical structure that occupies disk space in a way similar to that of a table. The main difference is that indexes are hidden from users and are not mentioned in any DML statements, even though they are often used behind the scene.

Assume our PHONE table created previously in this chapter has been populated with records. Assume you issue the following query:

SELECT * FROM phone WHERE PHONE_CUSTID_FN = 152

The RDBMS first checks if the index exists on the condition column (PHONE_CUSTID_FN). If the answer is yes, the index is used to determine the physical location of the corresponding rows (i.e., the two rows with PHONE_CUSTID_FN = 152). If no index is found on the column, the whole table is scanned to find rows with appropriate values.

A database index is similar to an index at the end of a book — it stores pointers to the physical row locations on the disk in the same way a book's index points to the page numbers for the appropriate topics. From another viewpoint, it is similar to a database table with two or more columns: one for the row's physical address, and the rest for the indexed table columns. In other words, index tells the RDBMS where to look for a specific table row (or a group of rows) on the disk as illustrated on Figure 4-4.

click to expand
Figure 4-4: Index use

As you can see, the table column and the index have the same set of values; the main difference is that in the index, these values are sorted, so it takes much less time to find what you need.

In most databases indexes are implemented as B-Tree indexes, that is, they use the so called B-Tree algorithm that minimizes the number of times the hard disk must be accessed to locate a desired record, thereby speeding up the process. Because a disk drive has mechanical parts, which read and write data far more slowly than purely electronic media, it takes thousands of times longer to access a data element from a hard disk as compared with accessing it from RAM.

B-Trees save time by using nodes with many branches (called children). The simplest version of a B-Tree is called a binary tree because each node has only two children. Figure 4-5 illustrates a search for the value 100 using a binary tree. The algorithm is very simple. Starting at the top, if the top node value is less than what you are looking for, move to the left; if it's greater than 100, go to the right, until the value is found.

click to expand
Figure 4-5: B-Tree example

Indexes can be created to be either unique or nonunique. Unique indexes are implicitly created on columns for which a PRIMARY KEY or a UNIQUE constraint is specified. Duplicate values are not permitted. Nonunique indexes can be created on any column or combination of columns without any regard to duplicates.

Indexes can be created on one column or on multiple columns. The latter can be useful if the columns are often used together in WHERE clauses. For example, if some frequently used query looks for a certain customer's orders created on a certain date, you may create a nonunique index on the ORDHDR_CUSTID_FN and ORDHDR_CREATEDATE_D columns of the ORDER_HEADER table.

SQL99 does not specify any standards for indexes (or even require their existence at all), but practically all database vendors provide mechanisms to create indexes, because without them any production database would be unbearably slow.

There is no universal rule on when to create indexes, but some general recommendations can be given.

  • It usually does not make much sense to create indexes on small tables — they can degrade performance rather than improve it. If the table is only, say, 50 rows long, it might be faster to scan it than to use the B-Tree algorithm.

  • On large tables, indexes should be created only if the queries that involve indexed column(s) retrieve a small percentage of rows (usually under 15 percent).

  • Indexes are usually helpful on columns used in table joins. (Primary keys and unique columns are indexed by default; it is often not such a bad idea to index foreign key columns also.)

  • Indexes slow down DML operations that involve indexed columns — for example, if you update a value on such column, the index column has also to be updated; and if you insert a row, the corresponding index(es) may have to be re-sorted. So if a table is likely to be subjected to frequent updates, inserts, and/or deletes, it is recommended to have fewer indexes.

CREATE INDEX statement

The CREATE INDEX statement differs slightly for different implementations. The somewhat simplified syntax is below.

Oracle 9i

CREATE [UNIQUE | BITMAP]  INDEX [<schema>.]<index_name> ON  [<schema>.]<table_name> ({<column> |  <column_expression> }[ASC | DESC],...)  [<physical_parameters>];

This statement creates a unique index on the IDX_CUST_NAME on the CUST_NAME_S column of the CUSTOMER table with column values stored in descending order:

CREATE UNIQUE INDEX  idx_cust_name ON CUSTOMER(cust_name_s DESC)

Function-based indexes

Oracle 9i also lets you create so-called function-based indexes where instead of a column you can specify a column expression (a deterministic function). For example, you know that customers often query the CUSTOMER table using the LOWER function in the WHERE clause:

... WHERE LOWER(cust_name_s) = 'boswell designs corp.' ...

In this situation, a function-based index might make sense:

CREATE UNIQUE INDEX idx_cust_name ON CUSTOMER(LOWER(cust_name_s)) 

Bitmap indexes

The BITMAP keyword indicates that an index is to be created with a bitmap for each distinct key rather than indexing each row. Oracle recommends creating bitmap indexes on columns with low cardinality, that is, columns such as hypothetical GENDER and MARITAL_STATUS columns that would likely have few distinct values.

In the ACME database, order status can be a goof candidate for a bitmap index because there are only four possible statuses for an order (COMPLETE, INVOICED, SHIPPED, and CANCELLED). The example below creates bitmap index IDX_ORDHDR_STATUS on the ORDHDR_STATUSID_FN column of the ORDER_HEADER table:

CREATE BITMAP INDEX idx_ordhdr_status ON order_header (ordhdr_statusid_fn)
Note 

Function-based and bitmap indexes are available in Oracle Enterprise Edition only, so you will get an error if you try to execute the foregoing statement on the Personal or Standard versions.

Physical storage clause

As we mentioned before, you can specify separate tablespaces for table data and table indexes (and it does make sense from the database performance point of view). The syntax is the same as for CREATE TABLE (the example below assumes tablespace INDEX01 exists in your database):

CREATE UNIQUE INDEX idx_cust_name ON CUSTOMER(cust_name_s) TABLESPACE INDEX01

DB2 UDB 8.1

Here is the simplified DB2 syntax to create an index:

CREATE [UNIQUE] INDEX  [<schema>.]<index_name> ON [<schema>.]<table_name>  (<column_name> [ASC | DESC],...)

This statement creates unique index IDX_CUST_NAME_ALS on CUST_NAME_S and CUST_ALIAS_S columns of CUSTOMER table with column values stored in default (ascending) order:

CREATE UNIQUE INDEX  idx_cust_name_als ON CUSTOMER(cust_name_s, cust_alias_s)

start sidebar
Clustered Indexes

MS SQL Server allows you to create clustered indexes by specifying CLUSTERED keyword. (See the CREATE INDEX syntax described previously.)

The concept is very similar to one with index-organized tables in Oracle — the actual table rows are with the index and the physical order of rows is the same as their indexed order; that is, the rows are re-sorted every time a new one is inserted (or deleted). Only one clustered index is allowed on a table (or view) at a time; thus, you can create a clustered index on a column, drop it later, and create a clustered index on another column, but you can't create two clustered indexes on the same table simultaneously as data in the table can only be physically organized in one order.

These statements create the table SALESMAN and unique clustered index on its SALESMAN_CODE_S column.

CREATE TABLE SALESMAN ( SALESMAN_ID_N INTEGER NOT NULL, SALESMAN_CODE_S VARCHAR(2) NOT NULL, SALESMAN_NAME_S VARCHAR(50) NOT NULL, SALESMAN_STATUS_S CHAR(1) DEFAULT 'Y', CONSTRAINT CHK_SALESSTATUS CHECK (SALESMAN_STATUS_S in ('N', 'Y')) ) CREATE UNIQUE CLUSTERED INDEX idx_sales_code ON SALESMAN (salesman_code_s)

The default for a CREATE INDEX statement is NONCLUSTERED except for indexes on the primary key columns that are automatically created with CLUSTERED.

end sidebar

Note 

You cannot specify a physical location for the index in a CREATE INDEX statement in DB2; all indexes for a table will be created either in the default tablespace or in a tablespace specified in the CREATE TABLE or ALTER TABLE INDEX IN clause.

MS SQL Server 2000

To create an index in MS SQL Server, use this syntax:

CREATE  [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index_name> ON  <table_name> | <view_name> ( column [ ASC | DESC ],...) [ON  filegroup] 

The syntax is very similar to DB2. You can specify a filegroup to create the index on. This statement creates the unique index IDX_CUST_NAME_ALS on the CUST_NAME_S and CUST_ALIAS_S columns of the CUSTOMER table physically stored on filegroup INDEX01 (assuming it exists):

CREATE UNIQUE INDEX  idx_cust_name_als ON CUSTOMER(cust_name_s, cust_alias_s) ON  INDEX01

Tip 

In MS SQL Server you can create indexes on views. The view definition must be deterministic and schema bound.




SQL Bible
Microsoft SQL Server 2008 Bible
ISBN: 0470257040
EAN: 2147483647
Year: 2005
Pages: 208

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net