Recipe12.17.Grouping Rows by Units of Time


Recipe 12.17. Grouping Rows by Units of Time

Problem

You want to summarize data by some interval of time. For example, you have a transaction log and want to summarize transactions by 5-second intervals. The rows in table TRX_LOG are shown below:

  select trx_id,        trx_date,        trx_cnt   from trx_log TRX_ID TRX_DATE                TRX_CNT ------ -------------------- ----------      1 28-JUL-2005 19:03:07         44      2 28-JUL-2005 19:03:08         18      3 28-JUL-2005 19:03:09         23      4 28-JUL-2005 19:03:10         29      5 28-JUL-2005 19:03:11         27      6 28-JUL-2005 19:03:12         45      7 28-JUL-2005 19:03:13         45      8 28-JUL-2005 19:03:14         32      9 28-JUL-2005 19:03:15         41     10 28-JUL-2005 19:03:16         15     11 28-JUL-2005 19:03:17         24     12 28-JUL-2005 19:03:18         47     13 28-JUL-2005 19:03:19         37     14 28-JUL-2005 19:03:20         48     15 28-JUL-2005 19:03:21         46     16 28-JUL-2005 19:03:22         44     17 28-JUL-2005 19:03:23         36     18 28-JUL-2005 19:03:24         41     19 28-JUL-2005 19:03:25         33     20 28-JUL-2005 19:03:26         19 

You want to return the following result set:

 GRP TRX_START            TRX_END                   TOTAL --- -------------------- -------------------- ----------   1 28-JUL-2005 19:03:07 28-JUL-2005 19:03:11        141   2 28-JUL-2005 19:03:12 28-JUL-2005 19:03:16        178   3 28-JUL-2005 19:03:17 28-JUL-2005 19:03:21        202   4 28-JUL-2005 19:03:22 28-JUL-2005 19:03:26        173 

Solution

Group the entries into five row buckets. There are several ways to accomplish that logical grouping; this recipe does so by dividing the TRX_ID values by 5, using a technique shown earlier in "Creating Buckets of Data, of a Fixed Size."

Once you've created the "groups," use the aggregate functions MIN, MAX, and SUM to find the start time, end time, and total number of transactions for each "group" (SQL Server users should use CEILING instead of CEIL):

  1 select ceil(trx_id/5.0) as grp,  2        min(trx_date)    as trx_start,  3        max(trx_date)    as trx_end,  4        sum(trx_cnt)     as total  5   from trx_log  6 group by ceil(trx_id/5.0) 

Discussion

The first step, and the key to the whole solution, is to logically group the rows together. By dividing by 5 and taking the smallest whole number greater than the quotient, you can create logical groups. For example:

  select trx_id,        trx_date,        trx_cnt,        trx_id/5.0 as val,        ceil(trx_id/5.0) as grp   from trx_log TRX_ID TRX_DATE             TRX_CNT     VAL GRP ------ -------------------- -------  ------ ---      1 28-JUL-2005 19:03:07      44     .20   1      2 28-JUL-2005 19:03:08      18     .40   1      3 28-JUL-2005 19:03:09      23     .60   1      4 28-JUL-2005 19:03:10      29     .80   1      5 28-JUL-2005 19:03:11      27    1.00   1      6 28-JUL-2005 19:03:12      45    1.20   2      7 28-JUL-2005 19:03:13      45    1.40   2      8 28-JUL-2005 19:03:14      32    1.60   2      9 28-JUL-2005 19:03:15      41    1.80   2     10 28-JUL-2005 19:03:16      15    2.00   2     11 28-JUL-2005 19:03:17      24    2.20   3     12 28-JUL-2005 19:03:18      47    2.40   3     13 28-JUL-2005 19:03:19      37    2.60   3     14 28-JUL-2005 19:03:20      48    2.80   3     15 28-JUL-2005 19:03:21      46    3.00   3     16 28-JUL-2005 19:03:22      44    3.20   4     17 28-JUL-2005 19:03:23      36    3.40   4     18 28-JUL-2005 19:03:24      41    3.60   4     19 28-JUL-2005 19:03:25      33    3.80   4     20 28-JUL-2005 19:03:26      19    4.00   4 

The last step is to apply the appropriate aggregate functions to find the total number of transactions per 5 seconds along with the start and end times for each transaction:

  select ceil(trx_id/5.0) as grp,        min(trx_date) as trx_start,        max(trx_date) as trx_end,        sum(trx_cnt) as total   from trx_log  group by ceil(trx_id/5.0) GRP TRX_START            TRX_END                   TOTAL --- -------------------- -------------------- ----------   1 28-JUL-2005 19:03:07 28-JUL-2005 19:03:11        141   2 28-JUL-2005 19:03:12 28-JUL-2005 19:03:16        178   3 28-JUL-2005 19:03:17 28-JUL-2005 19:03:21        202   4 28-JUL-2005 19:03:22 28-JUL-2005 19:03:26        173 

If your data is slightly different (perhaps you don't have an ID for each row), you can always "group" by dividing the seconds of each TRX_DATE row by 5 to create a similar grouping. Then you can include the hour for each TRX_DATE and group by the actual hour and logical "grouping," GRP. Following is an example of this technique (using Oracle's TO_CHAR and TO_NUMBER functions, you would use the appropriate date and character formatting functions for your platform):

  select trx_date,trx_cnt,        to_number(to_char(trx_date,'hh24')) hr,        ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp   from trx_log TRX_DATE                TRX_CNT         HR         GRP -------------------- ---------- ----------  ---------- 28-JUL-2005 19:03:07         44         19          62 28-JUL-2005 19:03:08         18         19          62 28-JUL-2005 19:03:09         23         19          62 28-JUL-2005 19:03:10         29         19          62 28-JUL-2005 19:03:11         27         19          62 28-JUL-2005 19:03:12         45         19          63 28-JUL-2005 19:03:13         45         19          63 28-JUL-2005 19:03:14         32         19          63 28-JUL-2005 19:03:15         41         19          63 28-JUL-2005 19:03:16         15         19          63 28-JUL-2005 19:03:17         24         19          64 28-JUL-2005 19:03:18         47         19          64 28-JUL-2005 19:03:19         37         19          64 28-JUL-2005 19:03:20         48         19          64 28-JUL-2005 19:03:21         46         19          64 28-JUL-2005 19:03:22         44         19          65 28-JUL-2005 19:03:23         36         19          65 28-JUL-2005 19:03:24         41         19          65 28-JUL-2005 19:03:25         33         19          65 28-JUL-2005 19:03:26         19         19          65 

Regardless of the actual values for GRP, the key here is that you are grouping for every 5 seconds. From there you can apply the aggregate functions in the same way as in the original solution:

  select hr,grp,sum(trx_cnt) total   from ( select trx_date,trx_cnt,        to_number(to_char(trx_date,'hh24')) hr,        ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp   from trx_log        ) x  group by hr,grp HR        GRP       TOTAL -- ----------  ---------- 19         62         141 19         63         178 19         64         202 19         65         173 

Including the hour in the grouping is useful if your transaction log spans hours. In DB2 and Oracle, you can also use the window function SUM OVER to produce the same result. The following query returns all rows from TRX_LOG along with a running total for TRX_CNT by logical "group," and the TOTAL for TRX_CNT for each row in the "group":

  select trx_id, trx_date, trx_cnt,        sum(trx_cnt)over(partition by ceil(trx_id/5.0)                         order by trx_date                         range between unbounded preceding                           and current row) runing_total,        sum(trx_cnt)over(partition by ceil(trx_id/5.0)) total,        case when mod(trx_id,5.0) = 0 then 'X' end grp_end   from trx_log TRX_ID TRX_DATE                TRX_CNT RUNING_TOTAL      TOTAL GRP_END ------ -------------------- ---------- ------------ ---------- -------      1 28-JUL-2005 19:03:07         44           44       141      2 28-JUL-2005 19:03:08         18           62       141      3 28-JUL-2005 19:03:09         23           85       141      4 28-JUL-2005 19:03:10         29          114       141      5 28-JUL-2005 19:03:11         27          141       141  X      6 28-JUL-2005 19:03:12         45           45       178      7 28-JUL-2005 19:03:13         45           90       178      8 28-JUL-2005 19:03:14         32          122       178      9 28-JUL-2005 19:03:15         41          163       178     10 28-JUL-2005 19:03:16         15          178       178  X     11 28-JUL-2005 19:03:17         24           24       202     12 28-JUL-2005 19:03:18         47           71       202     13 28-JUL-2005 19:03:19         37          108       202     14 28-JUL-2005 19:03:20         48          156       202     15 28-JUL-2005 19:03:21         46          202       202  X     16 28-JUL-2005 19:03:22         44           44       173     17 28-JUL-2005 19:03:23         36           80       173     18 28-JUL-2005 19:03:24         41          121       173     19 28-JUL-2005 19:03:25         33          154       173     20 28-JUL-2005 19:03:26         19          173       173  X 




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net