Recipe 8.15. Date-Based Summaries


Problem

You want to produce a summary based on date or time values.

Solution

Use GROUP BY to place temporal values into categories of the appropriate duration. Often this involves using expressions to extract the significant parts of dates or times.

Discussion

To put rows in time order, use an ORDER BY clause to sort a column that has a temporal type. If instead you want to summarize rows based on groupings into time intervals, you need to determine how to categorize each row into the proper interval and use GROUP BY to group them accordingly.

For example, to determine how many drivers were on the road and how many miles were driven each day, group the rows in the driver_log table by date:

mysql> SELECT trav_date,
    -> COUNT(*) AS 'number of drivers', SUM(miles) AS 'miles logged'
    -> FROM driver_log GROUP BY trav_date;
+------------+-------------------+--------------+
| trav_date  | number of drivers | miles logged |
+------------+-------------------+--------------+
| 2006-08-26 |                 1 | 115          |
| 2006-08-27 |                 1 | 96           |
| 2006-08-29 |                 3 | 822          |
| 2006-08-30 |                 2 | 355          |
| 2006-09-01 |                 1 | 197          |
| 2006-09-02 |                 2 | 581          |
+------------+-------------------+--------------+

However, this summary will grow lengthier as you add more rows to the table. At some point, the number of distinct dates likely will become so large that the summary fails to be useful, and you'd probably decide to change the category size from daily to weekly or monthly.
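As a sketch of that change, the following queries regroup the driver_log rows by month and by week. DATE_FORMAT() with '%Y-%m' maps every date in a month to the same string, and YEARWEEK() maps each date to a single number combining the year and week (how weeks are numbered depends on its optional mode argument, so check that it matches your notion of a week). The aliases trav_month and trav_week are illustrative names, not part of the original recipe.

```sql
-- Monthly summary: every trav_date in the same month maps to one 'YYYY-MM' key
SELECT DATE_FORMAT(trav_date, '%Y-%m') AS trav_month,
  COUNT(*) AS 'number of drivers', SUM(miles) AS 'miles logged'
FROM driver_log
GROUP BY trav_month
ORDER BY trav_month;

-- Weekly summary: YEARWEEK() returns one number per (year, week) pair;
-- MIN(trav_date) shows the earliest date actually logged in each week
SELECT YEARWEEK(trav_date) AS trav_week,
  MIN(trav_date) AS 'first date',
  COUNT(*) AS 'number of drivers', SUM(miles) AS 'miles logged'
FROM driver_log
GROUP BY trav_week
ORDER BY trav_week;
```

With the rows shown above, the monthly query should collapse the six daily rows into two: 2006-08 with a count of 7 and 1388 miles, and 2006-09 with a count of 3 and 778 miles.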

When a temporal column contains so many distinct values that it fails to categorize well, it's typical for a summary to group rows using expressions that map the relevant parts of the date or time values onto a smaller set of categories. For example, to produce a time-of-day summary for rows in the mail table, do this:[*]

[*] Note that the result includes an entry only for hours of the day actually represented in the data. To generate a summary with an entry for every hour, use a join to fill in the "missing" values. See Section 12.8.

mysql> SELECT HOUR(t) AS hour,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY hour;
+------+--------------------+----------------------+
| hour | number of messages | number of bytes sent |
+------+--------------------+----------------------+
|    7 |                  1 |                 3824 |
|    8 |                  1 |                  978 |
|    9 |                  2 |                 2904 |
|   10 |                  2 |              1056806 |
|   11 |                  1 |                 5781 |
|   12 |                  2 |               195798 |
|   13 |                  1 |                  271 |
|   14 |                  1 |                98151 |
|   15 |                  1 |                 1048 |
|   17 |                  2 |              2398338 |
|   22 |                  1 |                23992 |
|   23 |                  1 |                10294 |
+------+--------------------+----------------------+
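The join mentioned in the footnote could be sketched as follows, assuming a helper table that lists every hour of the day. The hour_list table and its h column are hypothetical (they are not part of the recipe's sample data); Section 12.8 covers the technique in full.

```sql
-- Hypothetical helper table holding every hour of the day, 0 through 23
CREATE TEMPORARY TABLE hour_list (h TINYINT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO hour_list (h) VALUES
  (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),
  (12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);

-- LEFT JOIN keeps hours that have no mail rows;
-- COUNT(mail.t) counts only matched rows, and IFNULL turns a NULL sum into 0
SELECT hour_list.h AS hour,
  COUNT(mail.t) AS 'number of messages',
  IFNULL(SUM(mail.size), 0) AS 'number of bytes sent'
FROM hour_list LEFT JOIN mail ON hour_list.h = HOUR(mail.t)
GROUP BY hour_list.h
ORDER BY hour_list.h;
```

Hours with no mail in the output above (0 through 6, 16, and 18 through 21) would then appear with zero counts instead of being omitted.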

To produce a day-of-week summary instead, use the DAYOFWEEK() function:

mysql> SELECT DAYOFWEEK(t) AS weekday,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY weekday;
+---------+--------------------+----------------------+
| weekday | number of messages | number of bytes sent |
+---------+--------------------+----------------------+
|       1 |                  1 |                  271 |
|       2 |                  4 |              2500705 |
|       3 |                  4 |              1007190 |
|       4 |                  2 |                10907 |
|       5 |                  1 |                  873 |
|       6 |                  1 |                58274 |
|       7 |                  3 |               219965 |
+---------+--------------------+----------------------+
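Because DAYOFWEEK() returns 1 for Sunday through 7 for Saturday, an even coarser category such as weekend versus weekday is a single expression away. This is a sketch; the day_type alias and the 'weekend'/'weekday' labels are illustrative choices.

```sql
-- Map DAYOFWEEK() values 1 (Sunday) and 7 (Saturday) to 'weekend',
-- and every other day to 'weekday'
SELECT IF(DAYOFWEEK(t) IN (1,7), 'weekend', 'weekday') AS day_type,
  COUNT(*) AS 'number of messages',
  SUM(size) AS 'number of bytes sent'
FROM mail
GROUP BY day_type
ORDER BY day_type;
```

Given the per-day figures shown above, this should report 12 weekday messages totaling 3577949 bytes and 4 weekend messages totaling 220236 bytes.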

To make the output more meaningful, you might want to use DAYNAME() to display weekday names instead. However, because day names sort lexically (for example, "Tuesday" sorts after "Friday"), use DAYNAME() only for display purposes. Continue to group based on the numeric day values so that output rows sort that way:

mysql> SELECT DAYNAME(t) AS weekday,
    -> COUNT(*) AS 'number of messages',
    -> SUM(size) AS 'number of bytes sent'
    -> FROM mail
    -> GROUP BY DAYOFWEEK(t);
+-----------+--------------------+----------------------+
| weekday   | number of messages | number of bytes sent |
+-----------+--------------------+----------------------+
| Sunday    |                  1 |                  271 |
| Monday    |                  4 |              2500705 |
| Tuesday   |                  4 |              1007190 |
| Wednesday |                  2 |                10907 |
| Thursday  |                  1 |                  873 |
| Friday    |                  1 |                58274 |
| Saturday  |                  3 |               219965 |
+-----------+--------------------+----------------------+

A similar technique can be used for summarizing month-of-year categories that are sorted by numeric value but displayed by month name.
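A sketch of the month-of-year version for the mail table follows, grouping on MONTH() (1 through 12) and displaying MONTHNAME(). Two details reflect newer MySQL releases rather than the original recipe: MONTHNAME(t) is also listed in GROUP BY because, with the ONLY_FULL_GROUP_BY SQL mode enabled (the default since MySQL 5.7.5), a selected expression that is not in the GROUP BY list can be rejected; and an explicit ORDER BY is included because MySQL 8.0 no longer sorts GROUP BY results implicitly. The same two adjustments apply to the DAYNAME() query shown earlier.

```sql
-- Group on the numeric month so categories sort January..December,
-- but display the month name; MONTHNAME(t) is added to GROUP BY for
-- compatibility with ONLY_FULL_GROUP_BY
SELECT MONTHNAME(t) AS month_name,
  COUNT(*) AS 'number of messages',
  SUM(size) AS 'number of bytes sent'
FROM mail
GROUP BY MONTH(t), MONTHNAME(t)
ORDER BY MONTH(t);
```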

Uses for temporal categorizations are numerous:

  • DATETIME or TIMESTAMP columns have the potential to contain many unique values. To produce daily summaries, strip off the time of day part to collapse all values occurring within a given day to the same value. Any of the following GROUP BY clauses will do this, although the last one is likely to be slowest:

    GROUP BY DATE(col_name)
    GROUP BY FROM_DAYS(TO_DAYS(col_name))
    GROUP BY YEAR(col_name), MONTH(col_name), DAYOFMONTH(col_name)
    GROUP BY DATE_FORMAT(col_name,'%Y-%m-%d')

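  • To produce monthly or quarterly sales reports, group by MONTH(col_name) or QUARTER(col_name) to place dates into the correct part of the year. If the data spans more than one year, group by YEAR(col_name) as well so that the same month or quarter of different years is not merged into one category.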

  • To summarize web server activity, store your server's logs in MySQL and run statements that collapse the rows into different time categories. Section 19.14 discusses how to do this for Apache.
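The daily and quarterly cases from the list above might look like the following sketches against the recipe's own tables. The first collapses the date-and-time values in the mail table's t column to dates with DATE(); the second groups driver_log by year as well as quarter. The aliases msg_date, yr, and qtr are illustrative, and these queries are not taken from the original recipe.

```sql
-- Daily mail summary: DATE(t) drops the time-of-day part of each value
SELECT DATE(t) AS msg_date,
  COUNT(*) AS 'number of messages',
  SUM(size) AS 'number of bytes sent'
FROM mail
GROUP BY msg_date
ORDER BY msg_date;

-- Quarterly driving summary, keeping different years apart
SELECT YEAR(trav_date) AS yr, QUARTER(trav_date) AS qtr,
  COUNT(*) AS 'number of drivers', SUM(miles) AS 'miles logged'
FROM driver_log
GROUP BY yr, qtr
ORDER BY yr, qtr;
```

With the driver_log rows shown earlier, every date falls in August or September 2006, so the quarterly query should return a single row: year 2006, quarter 3, a count of 10, and 2166 miles.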




MySQL Cookbook
ISBN: 059652708X
Year: 2004
Pages: 375
Authors: Paul DuBois
