Leaving off the ORDER BY clause and forcing the type of sort you want to use (usually via a collation) are two methods you can use to speed up sorts, but each involves depending on features that may not be supported by the DBMS. Let's look at three things you can do that don't involve DBMS vagaries: sort keys, encouraging index use, and preordering. Sort KeysIf you need secondary sorts or exotic collations, the speed goes down, and the DBMS support gets dicey. Both these problems are solvablejust add a column that contains a sort key to the table. A sort key is a string with a series of one-byte numbers that represent the relative ordering of characters . The string has three parts : the primary number, the secondary number, and the tertiary number. Parts are separated by 01 hexadecimal. For example, the sort keys for 'naive' and 'Na ve' look like this: Character Diacritic Case Weights Weights Weights Primary Secondary Tertiary naive 0E 70 0E 02 0E 32 0E A2 0E 21 01 01 01 01 Nave 0E 70 0E 02 0E 32 0E A2 0E 21 01 02 02 13 01 12 01 01 Notice that:
The MS Windows NT API has a function for converting a character string to a sort key. The function's name is LCMapString . The LCMapString input string can be in any of the character sets supported by Windows NT, and the function's output sort key has appropriate weights for the locale, which are roughly adequate for most common languages. Listing 3-1 shows a simplified version of a program for populating a table with sort keys. Listing 3-1 Populating a Table with Sort Keyslocale_id = an ID representing country/collation/etc. ... DECLARE Cursor1 CURSOR FOR SELECT character_column FROM Table1; OPEN Cursor1; for (;;) { FETCH Cursor1 INTO :character_string; if (NO_DATA_FOUND) break; LCMapString(locale_id, character_string, sort_key); UPDATE Table1 SET sort_key_column = :sort_key WHERE CURRENT OF Cursor1; } ... Note that, for the code in Listing 3-1 to work, you must define sort_key_column as a character column with a default sort type of "binary sort." Since it's binary, it works as quickly as any binary sort, which is to say, more quickly than a dictionary sort. Once you've populated a table with sort keys, you can use this SQL statement to get a sorted list: SELECT * FROM Table1 ORDER BY sort_key_column GAIN: 8/8 Encouraging Index UsePeople often associate sorting with indexing. Partly that's because at one time sorting was indexingdBASE II seems to have performed sorts by creating indexes. Nowadays, sorting is a separate activity, and only transient associations exist between the two. (For example, the DBMS may perform a sort while processing a CREATE INDEX statement.) If the DBMS uses an index on column1 , the natural result is that the rows will come out in order by column1 . So it's tempting to replace this SQL statement: SELECT * FROM Table1 ORDER BY column1 with this one: SELECT * FROM Table1 WHERE column1 >= '' ORDER BY column1 GAIN: 5/8 The idea here is that ORDER BY works a bit faster if some preordering occurs before the sort. And, in fact, a gain occurs when the WHERE clause is used. There are, however, three things to note about this trick:
There is a variation on the trick, though. Suppose you have a compound index on (column1 , column2) . You want to retrieve by column2 , but sort by column1 . Here's how: SELECT column1, column2 FROM Table1 WHERE column1 > 0 AND column2 = <result you want> ORDER BY column1 As we saw in Chapter 2, "Simple Searches," this is called using a redundant expression. It works particularly well if the DBMS chooses to use the index on (column1, column2) as a covering index (discussed in Chapter 9, "Indexes"). A second variation on the trick works if your DBMS supports clustered indexes. In that case, you can sort the columns of a table's cluster key in a way that supports your favorite ORDER BY option. When a clustered index exists, the rows are in order by the cluster keythis is guaranteed . This is not to say that you should try to force the DBMS to use an index merely because it helps with ORDER BY. The DBMS can choose another path. Exceptionally, you might want to override the DBMS's path choice if your main interest is to get the first rows onto the user screen as quickly as possible. Microsoft specially allows for that possibility with its FIRSTFASTROW hint. This is particularly important if you want to limit the number of rows. A question can arise whether you should change your ORDER BY clause to suit existing indexes or make new indexes to support your ORDER BY plans. In both cases, we would have to say "No." If you did either one, you'd be crossing the boundary between "taking advantage of a side effect" and "depending on a side effect." Anyway, recall from the discussion of secondary sorts earlier in this chapter, that a column's index keys may not be in the same order that you will need when you sort the column. DBMSs can use compound indexes. For example: SELECT column1, column2 FROM Table1 ORDER BY column1 will be faster if a compound index( column1 , column2 )is on Table1 (GAIN: 5/8 with compound index). DBMSs can misuse noncompound indexes. For example: SELECT column1, column2 FROM Table1 ORDER BY column1, column2 will be slower if a noncompound index( column1 )is on Table1 (GAIN: 3/8). To improve the sort speed, remove the index entirely or replace it with a compound index (GAIN: 3/8). PreorderingWe have touched a few times on the point that ORDER BY goes more quickly if the incoming data is presortedbut not a lot more quickly, so this section involves ideas that pay off in only a minority of cases. The obvious ways to preorder by column1 are either (a) declare that column1 is the clustered index key, or (b) export in sorted order and reimport . The primary advantage here is that fewer disk seeks should occur when the fetch direction is always forward. A further gain could occur if you split the sorted table into two, since some DBMSs can perform sorts in parallel if they're on two separate tables. The less obvious way to preorder is to add a "rough key" column to the table. The procedure to form rough keys is analogous to the one we showed you for sort keys in Listing 3-1. The difference is that the rough key must be a single integer that contains only some primary-sort information for the first few characters of a column. If you decide to use this idea, keep this tip in mind: A 32-bit integer can store up to five uppercase Latin characters, given six bits per character. Is the second of these two statements really faster than the first? SELECT * FROM Table1 ORDER BY real_column SELECT * FROM Table1 ORDER BY rough_key, real_column GAIN: 5/8 The answer is definitely "Yes" if duplication of column values is rare. Is the saved time worth the trouble of making the rough key column? The answer is "Maybe yes" if absolutely every SELECT on the table has an ORDER BY real_column clause tacked on to it. The Bottom Line: Other OptionsSecondary sorts and exotic collations are not universally supported. They also make sorts slower. The solution is to add a column containing a sort key to the table and use that in your ORDER BY clauses. A sort key is a string with a series of one-byte numbers that represent the relative ordering of characters. You can use the MS Windows NT API function LCMapString to convert a character string to a sort key. A sort key column should be a CHAR column with a default binary sort. ORDER BY works faster if some preordering occurs before the sort. Encourage preordering by using indexes, especially primary key and clustered indexes. One way to encourage the DBMS to use an index is to add redundant expressions to your queries. Another way to preorder is to add a rough key column to the table. Rough keys are analogous to sort keys except that rough keys are defined as a single integer that contains only some primary-sort information for the first few characters of a column. |