Hack 9. Filter on Indexed Columns

Query filtering will improve performance. You can gain even better performance if your filtering criteria can use indexed columns.

Queries can return all the rows and all the columns from tables. But what if you want only a few columns? It would be a waste of system resources to send you columns you don't want. Similarly, if you want to see only one row of a table, you should be able to ask only for that row. By asking only for what you really want, you are filtering away rows and columns that you don't want. To filter columns, make sure that you explicitly ask for only the columns you want (e.g., don't just use * on the SELECT line). You can filter rows using WHERE rules, but you can also use other clauses, such as HAVING.

Defining a primary key creates an index on the columns involved. This allows the database to find data much faster than it could without the index. The database server uses this index to make sure the key is unique in the table, which is a requirement for a primary key to be valid. Joins that use primary key columns also benefit from this index.

Filtering on something that does not have an index can cause a significant performance problem. Not only is it faster to search with an index, but also query optimizers can use the index first to perform initial filtering, instead of using the actual table data being queried. It may even be possible for the optimizer to use the index for the entire operation, depending on the query being executed.

If the index is all that is required to retrieve the result set, and the database never needs to be looked at, the index is called a covering indexit "covers" everything in the query.

Now let's consider a database of pages. Say you have page contents stored against a pagename, with previous versions of your pages also recorded so that you can implement version control. Table 2-2 shows an example.

Table 2-2. The page table

Content pagename username lastmod versionnum
hello index.htm l gordon 2006-03-01 1


index.html gordon 2006-10-10 2


p2.html andrew 2006-02-05 1


contents.html gordon 2006-02-05 1

Now you can support page changes that occur multiple times per day, by different users, and you can maintain a change log. Here is the query to use to extract the current version of the index.html page:

SELECT pagename,content
FROM page x
WHERE pagename = 'index.html'
AND versionnum = (
 SELECT MAX(y.versionnum) from page y
 WHERE y.pagename = 'index.html'

This query is reasonably efficient. An index on pagename allows the database system to find "index.html" quickly, without having to scan all the rows one at a time, looking for a match. This table has a primary key (pagename,versionnum) which, although it is not an index on pagename alone, should work even better because the index contains all the data needed by both query conditions. The database system can use composite indexes such as (pagename,versionnum) as long as it can find what it wants to index on by reading an index key from left to right, without having to use an unwanted column. This is known as partial index matching.

If the optimizer had an index (versionnum,pagename,lastmod) and needed an index for versionnum, this would be fine, but if it needed an index for pagename, this index would be ineffective because versionnum comes first in the list. It makes sense to think carefully about the order when creating a composite index or a composite primary key. Make sure that each of the commonly used columns comes first at least once.

If you had a choice of (pagename,versionnum) or (versionnum,pagename) and knew there were thousands of pagename records in the database with, in general, only a few versions per page, you would definitely want to put the highest discriminator first, which in this case would be (pagename,versionnum). It would be a bad idea to put a low-ranked discriminator first in an index, and a worse idea to create a separate index just for versionnum.

Join conditions, and WHERE clauses that include = and >, usually make good use of indexes. So, you should seriously consider having indexes for the columns involved. Consider a query on table t:

WHERE x = 6 AND y > 7 ;

This query uses x and y for filtering. If you were executing this query a lot, and you wanted to make it run fast using indexes, you would need to create the right indexes. You could create an index on x and y like this:

CREATE INDEX ind_1 ON t (x) ;
CREATE INDEX ind_2 ON t (y) ;

But this would be missing a trick. If the indexes were needed only for this query, you should realize that the ideal lookup would be on x first (the most discriminating term) and then on y, so the ideal answer is:

CREATE INDEX ind_1 ON t (x,y) ;

Of course, the optimizer may simply ignore all your indexes and do it another way if it thought the result would be produced more efficiently without indexes. But in general, indexes will give noticeable performance improvement when used correctly.

SQL Hacks
SQL Hacks
ISBN: 0596527993
EAN: 2147483647
Year: 2004
Pages: 147
Simiral book on Amazon

Flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net