Hack 9. Filter on Indexed Columns

Table of contents:

Query filtering will improve performance. You can gain even better performance if your filtering criteria can use indexed columns.

Queries can return all the rows and all the columns from tables. But what if you want only a few columns? It would be a waste of system resources to send you columns you don't want. Similarly, if you want to see only one row of a table, you should be able to ask only for that row. By asking only for what you really want, you are filtering away rows and columns that you don't want. To filter columns, make sure that you explicitly ask for only the columns you want (e.g., don't just use * on the SELECT line). You can filter rows using WHERE rules, but you can also use other clauses, such as HAVING.

Defining a primary key creates an index on the columns involved. This allows the database to find data much faster than it could without the index. The database server uses this index to make sure the key is unique in the table, which is a requirement for a primary key to be valid. Joins that use primary key columns also benefit from this index.

Filtering on something that does not have an index can cause a significant performance problem. Not only is it faster to search with an index, but also query optimizers can use the index first to perform initial filtering, instead of using the actual table data being queried. It may even be possible for the optimizer to use the index for the entire operation, depending on the query being executed.

If the index is all that is required to retrieve the result set, and the database never needs to be looked at, the index is called a covering indexit "covers" everything in the query.

Now let's consider a database of pages. Say you have page contents stored against a pagename, with previous versions of your pages also recorded so that you can implement version control. Table 2-2 shows an example.

Table 2-2. The page table
Content	pagename	username	lastmod	versionnum
hello	index.htm l	gordon	2006-03-01	1
Hia	index.html	gordon	2006-10-10	2
page2	p2.html	andrew	2006-02-05	1
Index	contents.html	gordon	2006-02-05	1

Now you can support page changes that occur multiple times per day, by different users, and you can maintain a change log. Here is the query to use to extract the current version of the index.html page:

SELECT pagename,content
FROM page x
WHERE pagename = 'index.html'
AND versionnum = (
 SELECT MAX(y.versionnum) from page y
 WHERE y.pagename = 'index.html'
)

This query is reasonably efficient. An index on pagename allows the database system to find "index.html" quickly, without having to scan all the rows one at a time, looking for a match. This table has a primary key (pagename,versionnum) which, although it is not an index on pagename alone, should work even better because the index contains all the data needed by both query conditions. The database system can use composite indexes such as (pagename,versionnum) as long as it can find what it wants to index on by reading an index key from left to right, without having to use an unwanted column. This is known as partial index matching.

If the optimizer had an index (versionnum,pagename,lastmod) and needed an index for versionnum, this would be fine, but if it needed an index for pagename, this index would be ineffective because versionnum comes first in the list. It makes sense to think carefully about the order when creating a composite index or a composite primary key. Make sure that each of the commonly used columns comes first at least once.

If you had a choice of (pagename,versionnum) or (versionnum,pagename) and knew there were thousands of pagename records in the database with, in general, only a few versions per page, you would definitely want to put the highest discriminator first, which in this case would be (pagename,versionnum). It would be a bad idea to put a low-ranked discriminator first in an index, and a worse idea to create a separate index just for versionnum.

Join conditions, and WHERE clauses that include = and >, usually make good use of indexes. So, you should seriously consider having indexes for the columns involved. Consider a query on table t:

SELECT z
FROM t
WHERE x = 6 AND y > 7 ;

This query uses x and y for filtering. If you were executing this query a lot, and you wanted to make it run fast using indexes, you would need to create the right indexes. You could create an index on x and y like this:

CREATE INDEX ind_1 ON t (x) ;
CREATE INDEX ind_2 ON t (y) ;

But this would be missing a trick. If the indexes were needed only for this query, you should realize that the ideal lookup would be on x first (the most discriminating term) and then on y, so the ideal answer is:

CREATE INDEX ind_1 ON t (x,y) ;

Of course, the optimizer may simply ignore all your indexes and do it another way if it thought the result would be produced more efficiently without indexes. But in general, indexes will give noticeable performance improvement when used correctly.

SQL Fundamentals

Joins, Unions, and Views

Text Handling

Date Handling

Number Crunching

Online Applications

Organizing Data

Storing Small Amounts of Data

Locking and Performance

Reporting

Users and Administration

Wider Access

Index

Hia

Index