Transact-SQL Examples and Brainteasers | Inside Microsoft SQL Server 2000

In this section, we'll look at some relatively common programming tasks that aren't as simple as you might think. If you find a solution here for something you need to do, that's great. But the real goal is to give you some insight into the power and flexibility of Transact-SQL. For some examples, I'll offer multiple solutions, often with different performance characteristics. Keep in mind that there are often many ways to solve a problem, and you can find more examples similar to the ones I present here in some of the SQL programming books listed in the bibliography.

Generating Test Data

There is actually nothing particularly tricky to generating test data, but because generating test data is a common need and some of the later solutions depend on test data, we'll look at a few simple techniques. Of course, it's preferable to use an actual dataset of representative values for testing rather than generated test data before you deploy a production application. But sometimes it's more practical to generate test data when you need something "quick and dirty."

If you need a bunch of rows and you don't care about the distribution of values, you can easily create a table using default values for every column and the Identity property on a column that will act as the primary key. You can then set a counter to loop as many times as necessary to achieve the number of rows you want and execute the special INSERT…DEFAULT VALUES statement. This example creates 1000 rows in table xyz:

 -- Method 1. Simple DEFAULT values on table. CREATE TABLE xyz ( col1 int PRIMARY KEY IDENTITY(1, 1) NOT NULL, col2 int NOT NULL DEFAULT 999, col3 char(10) NOT NULL DEFAULT 'ABCEFGHIJK' ) GO SET NOCOUNT ON DECLARE @counter int SET @counter=1 WHILE (@counter <= 1000) BEGIN INSERT xyz DEFAULT VALUES SET @counter=@counter+1 END SELECT * FROM xyz col1 col2 col3 ---- ---- ----------- 1 999 ABCEFGHIJK 2 999 ABCEFGHIJK 3 999 ABCEFGHIJK 4 999 ABCEFGHIJK 5 999 ABCEFGHIJK  999 999 ABCEFGHIJK 1000 999 ABCEFGHIJK

Usually, you want some distribution in the data values; the RAND function, the modulo operator, and the functions CONVERT, CHAR, and REPLICATE come in handy for this purpose. RAND is a standard random number generator that's just like the function used in C, which RAND calls. Because RAND returns a float with a value between 0 and 1, you typically multiply it and convert the result to an integer so that you can use the modulo operator to indicate the range of values. For example, if you want a random integer from 0 through 9999, the following expression works nicely:

 (CONVERT(int, RAND() * 100000) % 10000)

If you want to include negative numbers (for example, a range from -9999 through 9999), you can use RAND to flip a coin and generate 0 or 1 (by doing modulo 2) and then use CASE to multiply half of the numbers by -1:

 CASE WHEN CONVERT(int, RAND() * 1000) % 2 = 1 THEN (CONVERT(int, RAND() * 100000) % 10000 * -1) ELSE CONVERT(int, RAND() * 100000) % 10000 END

To produce character data, you can generate a random number from 0 through 25 (since the English alphabet has 26 letters) and add that number to 65, which is the ASCII value for A. The result is the ASCII value for a character from A through Z. You can perform this operation a specified number of times or in a loop to generate as many characters as you want. Usually, after you indicate a few lead characters, you can use filler characters for the rest of a field. The REPLICATE function is a nice tool to use for creating the filler. If you want to be sure that different executions of the routine return different random numbers, you should seed the RAND function by including an integer value between the parentheses. You can use @@spid, the object_id of the newly created table, or any other "almost random" integer value.

The most common way to generate character data is to use local variables for the generated data inside a WHILE loop that does INSERT statements using the variables. Here's an example:

 -- Method 2. Generate random data in a loop. IF (ISNULL(OBJECT_ID('random_data'), 0)) > 0 DROP TABLE random_data GO CREATE TABLE random_data ( col1 int PRIMARY KEY, col2 int, col3 char(15) ) GO DECLARE @counter int, @col2 int, @col3 char(15) /* Insert 100 rows of data */ -- Seed random generator SELECT @counter=0, @col2=RAND(@@spid + cpu + physical_io) FROM master..sysprocesses where spid=@@spid WHILE (@counter < 1000) BEGIN SELECT @counter=@counter + 10, -- Sequence numbers by 10 @col2= CASE -- Random integer between -9999 and 9999 WHEN CONVERT(int, RAND() * 1000) % 2 = 1 THEN (CONVERT(int, RAND() * 100000) % 10000 * -1) ELSE CONVERT(int, RAND() * 100000) % 10000 END, @col3= -- Four random letters followed by random fill letter CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) -- 65 is 'A' + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + REPLICATE(CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65), 11) INSERT random_data VALUES (@counter, @col2, @col3) END GO SELECT * FROM random_data WHERE COL1 < 200 col1 col2 col3 ---- ----- --------------- 10 -5240 LXDSGGGGGGGGGGG 20 9814 TTPDOOOOOOOOOOO 30 3004 IEYXEEEEEEEEEEE 40 -9377 MITDAAAAAAAAAAA 50 -3823 ISGMUUUUUUUUUUU 60 -4249 DHZQQQQQQQQQQQQ 70 2335 XBJKEEEEEEEEEEE 80 -4570 ILYWNNNNNNNNNNN 90 4223 DHISDDDDDDDDDDD 100 -3332 THXLWWWWWWWWWWW 110 -9959 ALHFLLLLLLLLLLL 120 4580 BCZNGGGGGGGGGGG 130 6072 HRTJOOOOOOOOOOO 140 -8274 QPTKWWWWWWWWWWW 150 8212 FBQABBBBBBBBBBB 160 8223 YXAPLLLLLLLLLLL 170 -9469 LIHCAAAAAAAAAAA 180 -2959 GYKRZZZZZZZZZZZ 190 7677 KWWBJJJJJJJJJJJ

You can also set up the table with a DEFAULT that includes the random data expression and then use the DEFAULT VALUES statement. This method is a combination of the previous two methods. All the complexity of the random values is segregated to the CREATE TABLE command and the INSERT is again a simple loop that uses DEFAULT VALUES. In the following example, we use a CASE statement in a DEFAULT clause of a CREATE TABLE command:

 -- Method 3. Generate random values for DEFAULT. CREATE TABLE random_data ( col1 int PRIMARY KEY IDENTITY(10,10) NOT NULL, col2 int NOT NULL DEFAULT CASE -- Random integer between -9999 and 9999 WHEN CONVERT(int, RAND() * 1000) % 2 = 1 THEN (CONVERT(int, RAND() * 100000) % 10000 * -1 ) ELSE CONVERT(int, RAND() * 100000) % 10000 END, col3 char(15) NOT NULL DEFAULT CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) -- 65 is 'A' + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + CHAR((CONVERT(int, RAND() * 1000) % 26 ) + 65) + REPLICATE(CHAR((CONVERT(int, RAND() * 1000) % 26) + 65), 11) ) GO DECLARE @counter int SET @counter=1 WHILE (@counter <= 1000) BEGIN INSERT random_data DEFAULT VALUES SET @counter=@counter + 1 END -- Limit number of rows for illustration only SELECT * FROM random_data WHERE col1 <= 200 col1 col2 col3 ---- ----- --------------- 10 -6358 LCNLMMMMMMMMMMM 20 -2284 SSAITTTTTTTTTTT 30 -1498 NARJAAAAAAAAAAA 40 -1908 EINLZZZZZZZZZZZ 50 -716 KNIOFFFFFFFFFFF 60 -8331 WZPRYYYYYYYYYYY 70 -2571 TMUBEEEEEEEEEEE 80 -7965 LILNCCCCCCCCCCC 90 9728 IXLOBBBBBBBBBBB 100 878 IPMPPPPPPPPPPPP 110 -2649 QXPAPPPPPPPPPPP 120 -4443 EBVHKKKKKKKKKKK 130 6133 VRJWXXXXXXXXXXX 140 -5154 HMHXLLLLLLLLLLL 150 -480 RNLVQQQQQQQQQQQ 160 -2655 SEHXTTTTTTTTTTT 170 -8204 JVLHZZZZZZZZZZZ 180 -3201 PTWGBBBBBBBBBBB 190 -7529 TDCJXXXXXXXXXXX 200 2622 ANLDHHHHHHHHHHH

Getting Rankings

To get the five highest values or five lowest values in a dataset, you can use the TOP clause in a SQL Server SELECT statement. (I discussed TOP in Chapter 7.) The TOP clause also lets you specify WITH TIES. It's not always clear what that means, so an example might help. The following SELECT statement returns the rows with the five highest values for the ytd_sales column, including any ties:

 SELECT TOP 5 WITH TIES title_id, ytd_sales, title FROM titles ORDER BY ytd_sales DESC

Here's the result:

 title_id ytd_sales title -------- --------- ----- MC3021 22246 The Gourmet Microwave BU2075 18722 You Can Combat Computer Stress! TC4203 15096 Fifty Years in Buckingham Palace Kitchens PC1035 8780 But Is It User Friendly? BU1032 4095 The Busy Executive's Database Guide BU7832 4095 Straight Talk About Computers PC8888 4095 Secrets of Silicon Valley TC7777 4095 Sushi, Anyone? (8 row(s) affected)

You end up with more than the five rows you requested. SQL Server returned the first five rows after sorting the data, and because additional rows had the same value as the fifth row returned (4095), it kept returning rows as long as there were equivalent ytd_sales values. The WITH TIES clause requires that the data be sorted using ORDER BY. A more "formal" definition of TOP n WITH TIES and ORDER BY c DESC is "return all rows that have a value of c for which there are at most n-1 greater values of c." So, in the dataset above, n is 5 and c is ytd_sales. The value 22246 has 0 rows with a greater value of ytd_sales, and 18722 has only one value greater. The last value returned, 4095, has only four values that are greater.

Another way to think of this concept of "the number of distinct values greater than a given value" is as the rank of the value. A value for which there are no greater values has a rank of 1, and a value for which there are two greater values has a rank of 3. The previous query returns all the rows with a rank of 5 or less, using the TOP clause. However, it's not so simple to get a list of the rows with a rank of 5 or less along with an integer indicating the actual rank value. Let's look at several possible solutions to this problem.

Approach 1: The Standard SQL Approach Using a View

First, let's look again at the titles table. Suppose you want to assign a rank value to all rows in the table based on their ytd_sales values (with the highest value getting the top rank). You can do this by performing a ranking of the values by nesting a SELECT inside the select list. For each row in the outer SELECT, the inner SELECT returns the number of values in the table that are equal to or greater than the value being ranked. This "rows rank" value is then correlated back to the main query. You can query the view to obtain any rank.

Performance suffers on large tables because a table must be scanned for every row. Ties are assigned equal rank; unique numbers are not guaranteed unless the columns are known to be unique. In the next example, the next nontie value's rank is one higher, not lower. Notice that we completely disregard rows with NULL values for ytd_sales.

 CREATE VIEW ranked_sales (rank, title_id, ytd_sales, title) AS SELECT (SELECT COUNT(DISTINCT T2.ytd_sales) FROM titles AS T2 WHERE T2.ytd_sales >= T1.ytd_sales ) AS rank, title_id, ytd_sales, title FROM titles AS T1 WHERE ytd_sales IS NOT NULL GO SELECT * FROM ranked_sales WHERE rank <= 10 ORDER BY rank rank title_id ytd_sales title ---- -------- --------- --------------------------------------------- 1 MC3021 22246 The Gourmet Microwave 2 BU2075 18722 You Can Combat Computer Stress! 3 TC4203 15096 Fifty Years in Buckingham Palace Kitchens 4 PC1035 8780 But Is It User Friendly? 5 BU1032 4095 The Busy Executive's Database Guide 5 BU7832 4095 Straight Talk About Computers 5 PC8888 4095 Secrets of Silicon Valley 5 TC7777 4095 Sushi, Anyone? 6 PS3333 4072 Prolonged Data Deprivation: Four Case Studies 7 BU1111 3876 Cooking with Computers: Surreptitious Balance Sheets 8 PS7777 3336 Emotional Security: A New Algorithm 9 PS2091 2045 Is Anger the Enemy? 10 MC2222 2032 Silicon Valley Gastronomic Treats

Approach 2: The Standard SQL Approach Without Using a View

This approach is basically the same as the first one, but it saves you the step of creating the view, at the possible cost of being a bit harder to understand. Instead of using a view, you use a derived table (by adding a SELECT in the FROM clause). Performance is identical to that of the view approach, however. (That is, it's not too good.) The results are also identical, so I won't show them here.

 SELECT rank, title_id, ytd_sales, title FROM (SELECT T1.title_id, ytd_sales, T1.title, (SELECT COUNT(DISTINCT T2.ytd_sales) FROM titles AS T2 WHERE T1.ytd_sales <= T2.ytd_sales) AS rank FROM titles AS T1) AS X WHERE (ytd_sales IS NOT NULL) AND (rank <= 10) ORDER BY rank

Approach 3: Using a Temporary Table with an identity Column and No Ties

If you want to assign a unique number even in the case of ties, or if you know that ties will not occur, you can create a temporary table with an identity column and then SELECT into the temporary table in an ordered fashion. This gives you a materialized table with rankings. This approach is conceptually easy to understand, and it's fast. Its only downside is that it does not recognize tied values. It's not ANSI-standard SQL, but this approach takes advantage of SQL Server-specific features. This approach can be useful with other sequential operations, as you'll see later.

 CREATE TABLE #ranked_order ( rank int IDENTITY NOT NULL PRIMARY KEY, title_id char(6) NOT NULL, ytd_sales int NOT NULL, title varchar(80) NOT NULL ) GO INSERT #ranked_order SELECT title_id, ytd_sales, title FROM titles WHERE ytd_sales IS NOT NULL ORDER BY ytd_sales DESC SELECT * FROM #ranked_order DROP TABLE #ranked_order rank title_id ytd_sales title ---- -------- --------- --------------------------------------------- 1 MC3021 22246 The Gourmet Microwave 2 BU2075 18722 You Can Combat Computer Stress! 3 TC4203 15096 Fifty Years in Buckingham Palace Kitchens 4 PC1035 8780 But Is It User Friendly? 5 BU1032 4095 The Busy Executive's Database Guide 6 BU7832 4095 Straight Talk About Computers 7 PC8888 4095 Secrets of Silicon Valley 8 TC7777 4095 Sushi, Anyone? 9 PS3333 4072 Prolonged Data Deprivation: Four Case Studies 10 BU1111 3876 Cooking with Computers: Surreptitious Balance Sheets 11 PS7777 3336 Emotional Security: A New Algorithm 12 PS2091 2045 Is Anger the Enemy? 13 MC2222 2032 Silicon Valley Gastronomic Treats 14 PS1372 375 Computer Phobic AND Non-Phobic Individuals 15 TC3218 375 Onions, Leeks, and Garlic: Cooking Secrets of the Med 16 PS2106 111 Life Without Fear

Approach 4: Using a Temporary Table with an identity Column and Ties

With standard SQL solutions, if the query focuses on four rows with two rows tied for second rank, then one row has the rank of first, two rows are second, and one row is third. No row is ranked fourth. An alternative way to rank them is to make one row first, two rows second, no row third, and one row fourth. (This is how standings in a race are posted if two swimmers tie for second place.) After populating the temporary table, you can query the table for the lowest rank for a given value and correlate that value back to the main query with a nested SELECT.

 -- Approach 4A. Create a temp table with an identity, and then do -- an ordered select to populate it. -- Do a nested select correlated back to itself to find the lowest -- rank for a given value. CREATE TABLE #ranked_order ( rank int IDENTITY NOT NULL, title_id char(6) NOT NULL, ytd_sales int NOT NULL, title varchar(80) NOT NULL ) GO INSERT #ranked_order SELECT title_id, ytd_sales, title FROM titles WHERE ytd_sales IS NOT NULL ORDER BY ytd_sales DESC SELECT B.rank, A.title_id, B.ytd_sales , A.title FROM (SELECT MIN(T2.rank) AS rank, T2.ytd_sales FROM #ranked_order AS T2 GROUP BY T2.ytd_sales) AS B, #ranked_order AS A WHERE A.ytd_sales=B.ytd_sales ORDER BY B.rank DROP TABLE #ranked_order rank title_id ytd_sales title ---- -------- --------- --------------------------------------------- 1 MC3021 22246 The Gourmet Microwave 2 BU2075 18722 You Can Combat Computer Stress! 3 TC4203 15096 Fifty Years in Buckingham Palace Kitchens 4 PC1035 8780 But Is It User Friendly? 5 BU1032 4095 The Busy Executive's Database Guide 5 BU7832 4095 Straight Talk About Computers 5 PC8888 4095 Secrets of Silicon Valley 5 TC7777 4095 Sushi, Anyone? 9 PS3333 4072 Prolonged Data Deprivation: Four Case Studies 10 BU1111 3876 Cooking with Computers: Surreptitious Balance Sheets 11 PS7777 3336 Emotional Security: A New Algorithm 12 PS2091 2045 Is Anger the Enemy? 13 MC2222 2032 Silicon Valley Gastronomic Treats 14 PS1372 375 Computer Phobic AND Non-Phobic Individuals 14 TC3218 375 Onions, Leeks, and Garlic: Cooking Secrets of the Med 16 PS2106 111 Life Without Fear

You can slightly modify the SELECT on the temporary table and explicitly indicate where ties exist and how many values were tied. The creation and population of the temporary table are identical, so I'll show just the SELECT statement:

 -- Approach 4B. Same as 4A, explicitly noting the ties. CREATE TABLE #ranked_order ( rank int IDENTITY NOT NULL, title_id char(6) NOT NULL, ytd_sales int NOT NULL, title varchar(80) NOT NULL ) GO INSERT #ranked_order SELECT title_id, ytd_sales, title FROM titles WHERE ytd_sales IS NOT NULL ORDER BY ytd_sales DESC SELECT B.rank, CASE B.number_tied WHEN 1 THEN ' ' ELSE '('+ CONVERT(varchar, number_tied) + ' Way Tie)' END AS tie, A.title_id, B.ytd_sales, A.title FROM (SELECT MIN(T2.rank) AS rank, COUNT(*) AS number_tied, T2.ytd_sales FROM #ranked_order AS T2 GROUP BY T2.ytd_sales) AS B, #ranked_order AS A WHERE A.ytd_sales=B.ytd_sales ORDER BY B.rank DROP TABLE #ranked_order rank tie title_id ytd_sales title ---- ----------- -------- --------- ----------------------------------------- 1 MC3021 22246 The Gourmet Microwave 2 BU2075 18722 You Can Combat Computer Stress! 3 TC4203 15096 Fifty Years in Buckingham Palace Kitchens 4 PC1035 8780 But Is It User Friendly? 5 (4 Way Tie) BU1032 4095 The Busy Executive's Database Guide 5 (4 Way Tie) BU7832 4095 Straight Talk About Computers 5 (4 Way Tie) PC8888 4095 Secrets of Silicon Valley 5 (4 Way Tie) TC7777 4095 Sushi, Anyone? 9 PS3333 4072 Prolonged Data Deprivation 10 BU1111 3876 Cooking with Computers 11 PS7777 3336 Emotional Security: A New Algorithm 12 PS2091 2045 Is Anger the Enemy? 13 MC2222 2032 Silicon Valley Gastronomic Treats 14 (2 Way Tie) PS1372 375 Computer Phobic AND Non-Phobic Individuals 14 (2 Way Tie) TC3218 375 Onions, Leeks, and Garlic 16 PS2106 111 Life Without Fear

Finding Differences Between Intervals

Suppose we have a table with two columns that records data for measured temperatures. The first column is the datetime at which the measurement was taken, and the second column is the temperature. We want to find the change in temperature between one interval and the next, and we want it expressed as the absolute value of the number of degrees that changed per minute. There is no preexisting primary key, such as measurement_id. It's unlikely that duplicate measurements exist for any single time (that is, the datetime field is probably unique), but we don't know this for sure. And the intervals between data measurements are not consistent. In fact, the data was not inserted in order. Like the rankings problem, this problem can be solved using standard (but tricky) SQL. In fact, you can think of this as another type of ranking problem—but here, you want to see the differences between adjacent ranks.

First, I'll set up the table with some fairly random data. I really don't care whether the temperature values are realistic or how much they might fluctuate in even a few minutes. So if you run this example and see temperatures of -100º F or a 40-degree change in temperature in five minutes, don't worry. These readings might not be from Earth!

NOTE
It's worthwhile to set up the routine to generate test data even though in this case it would be quicker to manually insert 20 rows of data. Once the routine is working and you've checked your solution on the small table, you can easily change the constant in the WHILE loop to add much more data. This is important because many solutions perform well with small amounts of data but can degrade badly with large amounts of data. This can be especially true if a solution uses correlated subqueries or self-joins or if it does table scans.

 -- diff_intervals.sql IF NULLIF(OBJECT_ID('measurements'), 0) > 0 DROP TABLE measurements GO CREATE TABLE measurements ( when_taken datetime NOT NULL, temperature numeric(4, 1) -- (Fahrenheit) ) CREATE CLUSTERED INDEX measurements_idx01 ON measurements (when_taken) GO DECLARE @counter int, @whendate datetime, @val numeric(4, 1), @randdiff smallint, @randmins smallint SELECT @counter=1, @whendate=GETDATE(), @val=50.0 /* Insert 20 rows of data. Change constant if you want more. */ WHILE (@counter <= 20) BEGIN INSERT measurements VALUES (@whendate, @val) -- Get a random number between -20 and 20 for change in -- temperature. This will be added to the previous value, -- plus RAND again to give a fractional component. SELECT @randdiff=CASE WHEN CONVERT(int, RAND() * 100) % 2 = 1 THEN CONVERT(int, RAND() * 1000) % 21 * -1 ELSE CONVERT(int, RAND() * 1000) % 21 END, -- Get a random number between 0 and 10080 (the number of minutes -- in a week). This will be added to the current GETDATE -- value. Since GETDATE returns a value to the millisecond, -- it's unlikely there will ever be a duplicate, although it -- is possible if the result of the addition and the current -- GETDATE() value happen to collide with the addition in -- another row. (That's intentional; we are not assuming -- that duplicates are automatically prevented.) @randmins=CONVERT(int, RAND() * 100000) % 10080 SELECT @counter=@counter + 1, @whendate=DATEADD(mi, @randmins, GETDATE()), @val=@val + @randdiff + RAND() END SELECT * FROM measurements when_taken temperature --------------------------- ----------- 2000-07-17 12:31:41.203 50.0 2000-07-17 16:07:41.243 48.0 2000-07-17 23:56:41.233 55.7 2000-07-18 00:13:41.243 32.8 2000-07-18 13:40:41.273 -18.2 2000-07-18 20:35:41.283 27.2 2000-07-18 21:53:41.243 34.6 2000-07-19 01:21:41.283 -34.6 2000-07-19 22:52:41.253 16.8 2000-07-20 03:56:41.283 -27.8 2000-07-20 06:56:41.283 38.1 2000-07-21 08:28:41.243 67.6 2000-07-21 20:43:41.283 -7.0 2000-07-22 19:12:41.293 20.9 2000-07-23 05:59:41.273 .4 2000-07-23 07:33:41.243 58.1 2000-07-23 09:25:41.243 65.6 2000-07-23 16:04:41.283 7.0 2000-07-23 16:05:41.253 2.2 2000-07-23 20:59:41.253 30.9

Approach 1: Using Standard SQL

This approach is similar to the rankings solution. I assign a ranking to each row and then join the table back to itself on the ranking value less 1. If duplicate datetime values exist, this approach still computes the differential to the previous (nonduplicate) measurement, which might or might not be how you'd want to deal with it. As with the rankings solution, you can use this approach either as a view or as a derived table. Since a view is easier for most people to understand, I'll write it that way.

As in the top n and rankings problems, this approach is a good brainteaser, but it's not a good performer if the table is anything but very small. In addition, this approach is problematic if duplicate datetime values might exist.

 CREATE VIEW rankdates (when_taken, temperature, daterank) AS SELECT when_taken, temperature, (SELECT COUNT(DISTINCT when_taken) FROM measurements AS T1 WHERE T1.when_taken <= T0.when_taken) AS rank FROM measurements AS T0 GO SELECT * FROM rankdates ORDER BY daterank GO when_taken temperature daterank --------------------------- ----------- -------- 2000-07-17 12:31:41.203 50.0 1 2000-07-17 16:07:41.243 48.0 2 2000-07-17 23:56:41.233 55.7 3 2000-07-18 00:13:41.243 32.8 4 2000-07-18 13:40:41.273 -18.2 5 2000-07-18 20:35:41.283 27.2 6 2000-07-18 21:53:41.243 34.6 7 2000-07-19 01:21:41.283 -34.6 8 2000-07-19 22:52:41.253 16.8 9 2000-07-20 03:56:41.283 -27.8 10 2000-07-20 06:56:41.283 38.1 11 2000-07-21 08:28:41.243 67.6 12 2000-07-21 20:43:41.283 -7.0 13 2000-07-22 19:12:41.293 20.9 14 2000-07-23 05:59:41.273 .4 15 2000-07-23 07:33:41.243 58.1 16 2000-07-23 09:25:41.243 65.6 17 2000-07-23 16:04:41.283 7.0 18 2000-07-23 16:05:41.253 2.2 19 2000-07-23 20:59:41.253 30.9 20 -- Correlate each value with the one right before it SELECT P1_WHEN=V1.when_taken, P2_WHEN=V2.when_taken, P1=V1.temperature, P2=V2.temperature, DIFF=(V2.temperature - V1.temperature) FROM rankdates AS V1 LEFT OUTER JOIN rankdates AS V2 ON (V2.daterank=V1.daterank + 1) GO P1_WHEN P2_WHEN P1 P2 DIFF ----------------------- ----------------------- ----- ----- ---- 2000-07-17 12:31:41.203 2000-07-17 16:07:41.243 50.0 48.0 -2.0 2000-07-17 16:07:41.243 2000-07-17 23:56:41.233 48.0 55.7 7.7 2000-07-17 23:56:41.233 2000-07-18 00:13:41.243 55.7 32.8 -22.9 2000-07-18 00:13:41.243 2000-07-18 13:40:41.273 32.8 -18.2 -51.0 2000-07-18 13:40:41.273 2000-07-18 20:35:41.283 -18.2 27.2 45.4 2000-07-18 20:35:41.283 2000-07-18 21:53:41.243 27.2 34.6 7.4 2000-07-18 21:53:41.243 2000-07-19 01:21:41.283 34.6 -34.6 -69.2 2000-07-19 01:21:41.283 2000-07-19 22:52:41.253 -34.6 16.8 51.4 2000-07-19 22:52:41.253 2000-07-20 03:56:41.283 16.8 -27.8 -44.6 2000-07-20 03:56:41.283 2000-07-20 06:56:41.283 -27.8 38.1 65.9 2000-07-20 06:56:41.283 2000-07-21 08:28:41.243 38.1 67.6 29.5 2000-07-21 08:28:41.243 2000-07-21 20:43:41.283 67.6 -7.0 -74.6 2000-07-21 20:43:41.283 2000-07-22 19:12:41.293 -7.0 20.9 27.9 2000-07-22 19:12:41.293 2000-07-23 05:59:41.273 20.9 .4 -20.5 2000-07-23 05:59:41.273 2000-07-23 07:33:41.243 .4 58.1 57.7 2000-07-23 07:33:41.243 2000-07-23 09:25:41.243 58.1 65.6 7.5 2000-07-23 09:25:41.243 2000-07-23 16:04:41.283 65.6 7.0 -58.6 2000-07-23 16:04:41.283 2000-07-23 16:05:41.253 7.0 2.2 -4.8 2000-07-23 16:05:41.253 2000-07-23 20:59:41.253 2.2 30.9 28.7 2000-07-23 20:59:41.253 NULL 30.9 NULL NULL

NOTE
If you're interested in the derived table solution, go back to the rankings problem and note the difference between a view and a derived table; it should be easy to see how the derived table solution can be done here as well.

Approach 2: Materializing Rankings and Using a Self-Join

Conceptually, this solution is similar to the standard SQL approach. But rather than using a true view, we'll materialize the rankings as a temporary table, similar to the approach in one of the rankings solutions. Then we'll do the self-join with the temporary table. The advantage, of course, is that we create the rankings only once instead of many times. This is faster because the identity value simply assigns the rank in the order that the rows are presented.

In SQL Server 2000, this solution performs about 33 percent faster than the standard SQL solution. The results are the same, so I won't bother repeating the output. The standard SQL approach takes 122 logical reads with this dataset; the temporary table approach here takes 44 logical reads to create the temp table and 42 logical reads to access it. Because so little data is involved, no more than two physical reads occur in either of these solutions.

 CREATE TABLE #rankdates ( when_taken datetime, temperature numeric(4, 1), daterank int IDENTITY PRIMARY KEY) GO INSERT #rankdates (when_taken, temperature) SELECT when_taken, temperature FROM measurements ORDER BY when_taken ASC GO SELECT P1_WHEN=V1.when_taken, P2_WHEN=V2.when_taken, P1=V1.temperature, P2=V2.temperature, DIFF=(V2.temperature - V1.temperature) FROM #rankdates AS V1 LEFT OUTER JOIN #rankdates AS V2 ON (V2.daterank=V1.daterank + 1) GO DROP TABLE #rankdates

Selecting Instead of Iterating

One common need is to perform some operation for every value, from 1 to n. Earlier in this chapter, I used the trigonometry functions to produce a listing of sine/cosine/tangent for every 10 degrees between 0 and 180. I took the obvious route and wrote a simple loop. However, that approach results in every row being its own result set. As I've mentioned many times, sending many result sets when one will do is inefficient, and it is also more cumbersome to process in the client application. To refresh your memory, here's the solution I used earlier:

 DECLARE @degrees smallint DECLARE @radians float SELECT @degrees=0 SELECT @radians=0 WHILE (@degrees <= 180) BEGIN SELECT DEGREES=@degrees, RADIANS=STR(@radians, 8, 5), SINE=STR(SIN(@radians), 8, 5), COSINE=STR(COS(@radians), 8, 5), TANGENT=STR(TAN(@radians), 8, 5) SELECT @degrees=@degrees + 10 SELECT @radians=RADIANS(CONVERT(float, @degrees)) END

Since it's common to iterate in this way, a simple table of ordered numbers can come in handy:

 -- Create the seq_num table CREATE TABLE seq_num (seq_num INT PRIMARY KEY NOT NULL) -- Populate the seq_num table with values from -500 through 500 DECLARE @counter int SELECT @counter= -500 WHILE (@counter <= 500) BEGIN INSERT seq_num VALUES (@counter) SELECT @counter=@counter + 1 END -- If doing this for real, you might as well set FILLFACTOR to 100

I can select or join to this handy seq_num table rather than iterating. The following solution runs much faster than the earlier solution. It returns a single result set, and it's easier to write:

 SELECT ANGLE=seq_num * 10, SINE=STR(SIN(seq_num * 10), 7, 4), COSINE=STR(COS(seq_num * 10), 7, 4), TANGENT=STR(TAN(seq_num * 10), 7, 4) FROM seq_num WHERE seq_num BETWEEN 0 AND 18 ANGLE SINE COSINE TANGENT ----- ------- ------- ------- 0 0.0000 1.0000 0.0000 10 -0.5440 -0.8391 0.6484 20 0.9129 0.4081 2.2372 30 -0.9880 0.1543 -6.4053 40 0.7451 -0.6669 -1.1172 50 -0.2624 0.9650 -0.2719 60 -0.3048 -0.9524 0.3200 70 0.7739 0.6333 1.2220 80 -0.9939 -0.1104 9.0037 90 0.8940 -0.4481 -1.9952 100 -0.5064 0.8623 -0.5872 110 -0.0442 -0.9990 0.0443 120 0.5806 0.8142 0.7131 130 -0.9301 -0.3673 2.5323 140 0.9802 -0.1978 -4.9554 150 -0.7149 0.6993 -1.0223 160 0.2194 -0.9756 -0.2249 170 0.3466 0.9380 0.3696 180 -0.8012 -0.5985 1.3387