Recipe 12.17. Grouping Rows by Units of TimeProblemYou want to summarize data by some interval of time. For example, you have a transaction log and want to summarize transactions by 5-second intervals. The rows in table TRX_LOG are shown below: select trx_id, trx_date, trx_cnt from trx_log TRX_ID TRX_DATE TRX_CNT ------ -------------------- ---------- 1 28-JUL-2005 19:03:07 44 2 28-JUL-2005 19:03:08 18 3 28-JUL-2005 19:03:09 23 4 28-JUL-2005 19:03:10 29 5 28-JUL-2005 19:03:11 27 6 28-JUL-2005 19:03:12 45 7 28-JUL-2005 19:03:13 45 8 28-JUL-2005 19:03:14 32 9 28-JUL-2005 19:03:15 41 10 28-JUL-2005 19:03:16 15 11 28-JUL-2005 19:03:17 24 12 28-JUL-2005 19:03:18 47 13 28-JUL-2005 19:03:19 37 14 28-JUL-2005 19:03:20 48 15 28-JUL-2005 19:03:21 46 16 28-JUL-2005 19:03:22 44 17 28-JUL-2005 19:03:23 36 18 28-JUL-2005 19:03:24 41 19 28-JUL-2005 19:03:25 33 20 28-JUL-2005 19:03:26 19 You want to return the following result set: GRP TRX_START TRX_END TOTAL --- -------------------- -------------------- ---------- 1 28-JUL-2005 19:03:07 28-JUL-2005 19:03:11 141 2 28-JUL-2005 19:03:12 28-JUL-2005 19:03:16 178 3 28-JUL-2005 19:03:17 28-JUL-2005 19:03:21 202 4 28-JUL-2005 19:03:22 28-JUL-2005 19:03:26 173 SolutionGroup the entries into five row buckets. There are several ways to accomplish that logical grouping; this recipe does so by dividing the TRX_ID values by 5, using a technique shown earlier in "Creating Buckets of Data, of a Fixed Size." Once you've created the "groups," use the aggregate functions MIN, MAX, and SUM to find the start time, end time, and total number of transactions for each "group" (SQL Server users should use CEILING instead of CEIL): 1 select ceil(trx_id/5.0) as grp, 2 min(trx_date) as trx_start, 3 max(trx_date) as trx_end, 4 sum(trx_cnt) as total 5 from trx_log 6 group by ceil(trx_id/5.0) DiscussionThe first step, and the key to the whole solution, is to logically group the rows together. By dividing by 5 and taking the smallest whole number greater than the quotient, you can create logical groups. For example: select trx_id, trx_date, trx_cnt, trx_id/5.0 as val, ceil(trx_id/5.0) as grp from trx_log TRX_ID TRX_DATE TRX_CNT VAL GRP ------ -------------------- ------- ------ --- 1 28-JUL-2005 19:03:07 44 .20 1 2 28-JUL-2005 19:03:08 18 .40 1 3 28-JUL-2005 19:03:09 23 .60 1 4 28-JUL-2005 19:03:10 29 .80 1 5 28-JUL-2005 19:03:11 27 1.00 1 6 28-JUL-2005 19:03:12 45 1.20 2 7 28-JUL-2005 19:03:13 45 1.40 2 8 28-JUL-2005 19:03:14 32 1.60 2 9 28-JUL-2005 19:03:15 41 1.80 2 10 28-JUL-2005 19:03:16 15 2.00 2 11 28-JUL-2005 19:03:17 24 2.20 3 12 28-JUL-2005 19:03:18 47 2.40 3 13 28-JUL-2005 19:03:19 37 2.60 3 14 28-JUL-2005 19:03:20 48 2.80 3 15 28-JUL-2005 19:03:21 46 3.00 3 16 28-JUL-2005 19:03:22 44 3.20 4 17 28-JUL-2005 19:03:23 36 3.40 4 18 28-JUL-2005 19:03:24 41 3.60 4 19 28-JUL-2005 19:03:25 33 3.80 4 20 28-JUL-2005 19:03:26 19 4.00 4 The last step is to apply the appropriate aggregate functions to find the total number of transactions per 5 seconds along with the start and end times for each transaction: select ceil(trx_id/5.0) as grp, min(trx_date) as trx_start, max(trx_date) as trx_end, sum(trx_cnt) as total from trx_log group by ceil(trx_id/5.0) GRP TRX_START TRX_END TOTAL --- -------------------- -------------------- ---------- 1 28-JUL-2005 19:03:07 28-JUL-2005 19:03:11 141 2 28-JUL-2005 19:03:12 28-JUL-2005 19:03:16 178 3 28-JUL-2005 19:03:17 28-JUL-2005 19:03:21 202 4 28-JUL-2005 19:03:22 28-JUL-2005 19:03:26 173 If your data is slightly different (perhaps you don't have an ID for each row), you can always "group" by dividing the seconds of each TRX_DATE row by 5 to create a similar grouping. Then you can include the hour for each TRX_DATE and group by the actual hour and logical "grouping," GRP. Following is an example of this technique (using Oracle's TO_CHAR and TO_NUMBER functions, you would use the appropriate date and character formatting functions for your platform): select trx_date,trx_cnt, to_number(to_char(trx_date,'hh24')) hr, ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp from trx_log TRX_DATE TRX_CNT HR GRP -------------------- ---------- ---------- ---------- 28-JUL-2005 19:03:07 44 19 62 28-JUL-2005 19:03:08 18 19 62 28-JUL-2005 19:03:09 23 19 62 28-JUL-2005 19:03:10 29 19 62 28-JUL-2005 19:03:11 27 19 62 28-JUL-2005 19:03:12 45 19 63 28-JUL-2005 19:03:13 45 19 63 28-JUL-2005 19:03:14 32 19 63 28-JUL-2005 19:03:15 41 19 63 28-JUL-2005 19:03:16 15 19 63 28-JUL-2005 19:03:17 24 19 64 28-JUL-2005 19:03:18 47 19 64 28-JUL-2005 19:03:19 37 19 64 28-JUL-2005 19:03:20 48 19 64 28-JUL-2005 19:03:21 46 19 64 28-JUL-2005 19:03:22 44 19 65 28-JUL-2005 19:03:23 36 19 65 28-JUL-2005 19:03:24 41 19 65 28-JUL-2005 19:03:25 33 19 65 28-JUL-2005 19:03:26 19 19 65 Regardless of the actual values for GRP, the key here is that you are grouping for every 5 seconds. From there you can apply the aggregate functions in the same way as in the original solution: select hr,grp,sum(trx_cnt) total from ( select trx_date,trx_cnt, to_number(to_char(trx_date,'hh24')) hr, ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp from trx_log ) x group by hr,grp HR GRP TOTAL -- ---------- ---------- 19 62 141 19 63 178 19 64 202 19 65 173 Including the hour in the grouping is useful if your transaction log spans hours. In DB2 and Oracle, you can also use the window function SUM OVER to produce the same result. The following query returns all rows from TRX_LOG along with a running total for TRX_CNT by logical "group," and the TOTAL for TRX_CNT for each row in the "group": select trx_id, trx_date, trx_cnt, sum(trx_cnt)over(partition by ceil(trx_id/5.0) order by trx_date range between unbounded preceding and current row) runing_total, sum(trx_cnt)over(partition by ceil(trx_id/5.0)) total, case when mod(trx_id,5.0) = 0 then 'X' end grp_end from trx_log TRX_ID TRX_DATE TRX_CNT RUNING_TOTAL TOTAL GRP_END ------ -------------------- ---------- ------------ ---------- ------- 1 28-JUL-2005 19:03:07 44 44 141 2 28-JUL-2005 19:03:08 18 62 141 3 28-JUL-2005 19:03:09 23 85 141 4 28-JUL-2005 19:03:10 29 114 141 5 28-JUL-2005 19:03:11 27 141 141 X 6 28-JUL-2005 19:03:12 45 45 178 7 28-JUL-2005 19:03:13 45 90 178 8 28-JUL-2005 19:03:14 32 122 178 9 28-JUL-2005 19:03:15 41 163 178 10 28-JUL-2005 19:03:16 15 178 178 X 11 28-JUL-2005 19:03:17 24 24 202 12 28-JUL-2005 19:03:18 47 71 202 13 28-JUL-2005 19:03:19 37 108 202 14 28-JUL-2005 19:03:20 48 156 202 15 28-JUL-2005 19:03:21 46 202 202 X 16 28-JUL-2005 19:03:22 44 44 173 17 28-JUL-2005 19:03:23 36 80 173 18 28-JUL-2005 19:03:24 41 121 173 19 28-JUL-2005 19:03:25 33 154 173 20 28-JUL-2005 19:03:26 19 173 173 X |