Recipe9.10.Filling in Missing Dates | SQL Cookbook (Cookbooks (OReilly))

Recipe 9.10. Filling in Missing Dates

Problem

You need to generate a row for every date (or every month, week, or year) within a given range. Such rowsets are often used to generate summary reports. For example, you want to count the number of employees hired every month of every year in which any employee has been hired. Examining the dates of all the employees hired, there have been hirings from 1980 to 1983:

  select distinct        extract(year from hiredate) as year   from emp YEAR -----  1980  1981  1982  1983

You want to determine the number of employees hired each month from 1980 to 1983. A portion of the desired result set is shown below:

 MTH          NUM_HIRED ----------- ---------- 01-JAN-1981          0 01-FEB-1981          2 01-MAR-1981          0 01-APR-1981          1 01-MAY-1981          1 01-JUN-1981          1 01-JUL-1981          0 01-AUG-1981          0 01-SEP-1981          2 01-OCT-1981          0 01-NOV-1981          1 01-DEC-1981          2

Solution

The trick here is that you want to return a row for each month even if no employee was hired (i.e., the count would be zero). Because there isn't an employee hired every month between 1980 and 1983, you must generate those months yourself, and then outer join to table EMP on HIREDATE (truncating the actual HIREDATE to its month, so it can match the generated months when possible).

DB2

Use the recursive WITH clause to generate every month (the first day of each month from January 1, 1980, to December 1, 1983). Once you have all the months for the required range of dates, outer join to table EMP and use the aggregate function COUNT to count the number of hires for each month:

  1   with x (start_date,end_date)  2     as (  3 select (min(hiredate)  4          dayofyear(min(hiredate)) day +1 day) start_date,  5        (max(hiredate)  6          dayofyear(max(hiredate)) day +1 day) +1 year end_date  7   from emp  8  union all  9 select start_date +1 month, end_date 10   from x 11  where (start_date +1 month) < end_date 12 ) 13 select x.start_date mth, count(e.hiredate) num_hired 14   from x left join emp e 15     on (x.start_date = (e.hiredate-(day(hiredate)-1) day)) 16  group by x.start_date 17  order by 1

Oracle

Use the CONNECT BY clause to generate each month between 1980 and 1983. Then outer join to table EMP and use the aggregate function COUNT to count the number of employees hired in each month. If you are on Oracle8i Database and earlier, the ANSI outer join is not available to you, nor is the ability to use CONNECT BY as a row generator; a simple workaround is to use a traditional pivot table (like the one used in the MySQL solution). Following as an Oracle solution using Oracle's outer-join syntax:

  1   with x  2     as (  3 select add_months(start_date,level-1) start_date  4   from (  5 select min(trunc(hiredate,'y')) start_date,  6        add_months(max(trunc(hiredate,'y')),12) end_date  7   from emp  8        )  9  connect by level <= months_between(end_date,start_date) 10 ) 11 select x.start_date MTH, count(e.hiredate) num_hired 12   from x, emp e 13  where x.start_date = trunc(e.hiredate(+),'mm') 14  group by x.start_date 15  order by 1

and here is a second Oracle solution, this time using the ANSI syntax:

  1   with x  2     as (  3 select add_months(start_date,level-1) start_date  4   from (  5 select min(trunc(hiredate,'y')) start_date,  6        add_months(max(trunc(hiredate,'y')),12) end_date  7   from emp  8        )  9  connect by level <= months_between(end_date,start_date) 10 ) 11 select x.start_date MTH, count(e.hiredate) num_hired 12   from x left join emp e 13     on (x.start_date = trunc(e.hiredate,'mm')) 14  group by x.start_date 15  order by 1

PostgreSQL

To improve readability, this solution uses a view, named V, to return the number of months between the first day of the first month of the year the first employee was hired and the first day of the last month of the year the most recent employee was hired. Use the value returned by view V as the second value passed to the function GENERATE_SERIES, so that the correct number of months (rows) are generated. Once you have all the months for the required range of dates, outer join to table EMP and use the aggregate function COUNT to count the number of hires for each month:

 create view v as select cast(         extract(year from age(last_month,first_month))*12-1           as integer) as mths   from ( select cast(date_trunc('year',min(hiredate)) as date) as first_month,        cast(cast(date_trunc('year',max(hiredate))              as date) + interval '1 year'              as date) as last_month   from emp        ) x 1 select y.mth, count(e.hiredate) as num_hired 2   from ( 3 select cast(e.start_date + (x.id * interval '1 month') 4          as date) as mth 5   from generate_series (0,(select mths from v)) x(id), 6        ( select cast( 7                  date_trunc('year',min(hiredate)) 8                    as date) as start_date 9            from emp ) e 10        ) y left join emp e 11     on (y.mth = date_trunc('month',e.hiredate)) 12 group by y.mth 13 order by 1

MySQL

Use the pivot table T500 to generate each month between 1980 and 1983. Then outer join to table EMP and use the aggregate function COUNT to count the number of employees hired for each month:

 1 select z.mth, count(e.hiredate) num_hired 2   from ( 3 select date_add(min_hd,interval t500.id-1 month) mth 4   from ( 5 select min_hd, date_add(max_hd,interval 11 month) max_hd 6   from ( 7 select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd, 8        adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd 9   from emp 10        ) x 11        ) y, 12        t500 13 where date_add(min_hd,interval t500.id-1 month) <= max_hd 14       ) z left join emp e 15    on (z.mth = adddate( 16               date_add( 17               last_day(e.hiredate),interval -1 month),1)) 18 group by z.mth 19 order by 1

SQL Server

 1   with x (start_date,end_date) 2     as ( 3 select (min(hiredate) 4          datepart(dy,min(hiredate))+1) start_date, 5        dateadd(yy,1, 6         (max(hiredate) 7          datepart(dy,max(hiredate))+1)) end_date 8   from emp 9  union all 10 select dateadd(mm,1,start_date), end_date 11   from x 12  where dateadd(mm,1,start_date) < end_date 13 ) 14 select x.start_date mth, count(e.hiredate) num_hired 15   from x left join emp e 16     on (x.start_date = 17            dateadd(dd,-day(e.hiredate)+1,e.hiredate)) 18 group by x.start_date 19 order by 1

Discussion

DB2

The first step is to generate every month (actually the first day of each month) from 1980 to 1983. Start using the DAYOFYEAR function on the MIN and MAX HIREDATEs to find the boundary months:

  select (min(hiredate)          dayofyear(min(hiredate)) day +1 day) start_date,        (max(hiredate)          dayofyear(max(hiredate)) day +1 day) +1 year end_date   from emp START_DATE  END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984

Your next step is to repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be. This is OK. As you recursively add months to START_DATE, you can stop before you hit END_DATE. A portion of the months created is shown below:

  with x (start_date,end_date)   as ( select (min(hiredate)          dayofyear(min(hiredate)) day +1 day) start_date,        (max(hiredate)          dayofyear(max(hiredate)) day +1 day) +1 year end_date   from emp  union all select start_date +1 month, end_date   from x  where (start_date +1 month) < end_date ) select *   from x START_DATE  END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 01-FEB-1980 01-JAN-1984 01-MAR-1980 01-JAN-1984 … 01-OCT-1983 01-JAN-1984 01-NOV-1983 01-JAN-1984 01-DEC-1983 01-JAN-1984

At this point, you have all the months you need, and you can simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of its month. Finally, use the aggregate function COUNT on EMP.HIREDATE.

Oracle

The first step is to generate the first day of every for every month from 1980 to 1983. Start by using TRUNC and ADD_MONTHS together with the MIN and MAX HIREDATE values to find the boundary months:

  select min(trunc(hiredate,'y')) start_date,        add_months(max(trunc(hiredate,'y')),12) end_date   from emp START_DATE  END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984

Then repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be, which is OK. As you recursively add months to START_DATE, you can stop before you hit END_DATE. A portion of the months created is shown below:

  with x as ( select add_months(start_date,level-1) start_date   from ( select min(trunc(hiredate,'y')) start_date,        add_months(max(trunc(hiredate,'y')),12) end_date   from emp        )  connect by level <= months_between(end_date,start_date) ) select *   from x START_DATE ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983

At this point, you have all the months you need; simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of the month it is in. The final step is to use the aggregate function COUNT on EMP.HIREDATE.

PostgreSQL

This solution uses the function GENERATE_SERIES to return the months you need. If you do not have the GENERATE_SERIES function available, you can use a pivot table as in the MySQL solution. The first step is to understand view V. View V simply finds the number of months you'll need to generate by finding the boundary dates for the range. Inline view X in view V uses the MIN and MAX HIREDATEs to find the start and end boundary dates and is shown below:

  select cast(date_trunc('year',min(hiredate)) as date) as first_month,        cast(cast(date_trunc('year',max(hiredate))              as date) + interval '1 year'              as date) as last_month   from emp FIRST_MONTH LAST_MONTH ----------- ----------- 01-JAN-1980 01-JAN-1984

The value for LAST_MONTH is actually one day more than it should be. This is fine, as you can just subtract 1 when you calculate the months between these two dates. The next step is to use the AGE function to find the difference between the two dates in years, then multiply by 12 (and remember, subtract by 1!):

  select cast(         extract(year from age(last_month,first_month))*12-1           as integer) as mths   from ( select cast(date_trunc('year',min(hiredate)) as date) as first_month,        cast(cast(date_trunc('year',max(hiredate))              as date) + interval '1 year'              as date) as last_month   from emp        ) x MTHS ----   47

Use the value returned by view V as the second parameter of GENERATE_SERIES to return the number of months you need. Your next step is then to find your start date. You'll repeatedly add months to your start date to create your range of months. Inline view Y uses the DATE_TRUNC function on the MIN(HIREDATE) to find the start date, and uses the values returned by GENERATE_SERIES to add months. Partial results are shown below:

  select cast(e.start_date + (x.id * interval '1 month')         as date) as mth   from generate_series (0,(select mths from v)) x(id),        ( select cast(                  date_trunc('year',min(hiredate))                    as date) as start_date            from emp        )  e MTH ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983

Now that you have each month you need for the final result set, outer join to EMP. HIREDATE and use the aggregate function COUNT to count the number of hires for each month.

MySQL

First, find the boundary dates by using the aggregate functions MIN and MAX along with the DAYOFYEAR and ADDDATE functions. The result set shown below is from inline view X:

  select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd,        adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd   from emp MIN_HD      MAX_HD ----------- ----------- 01-JAN-1980 01-JAN-1983

Next, increment MAX_HD to the last month of the year:

  select min_hd, date_add(max_hd,interval 11 month) max_hd   from ( select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd,        adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd   from emp        ) x MIN_HD      MAX_HD ----------- ----------- 01-JAN-1980 01-DEC-1983

Now that you have the boundary dates, add months to MIN_HD up to and including MAX_HD by using pivot table T500 to generate the rows you need. A portion of the results is shown below:

  select date_add(min_hd,interval t500.id-1 month) mth   from ( select min_hd, date_add(max_hd,interval 11 month) max_hd   from ( select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd,        adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd   from emp        ) x        ) y,        t500 where date_add(min_hd,interval t500.id-1 month) <= max_hd MTH ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983

Now that you have all the months you need for the final result set, outer join to EMP.HIREDATE (be sure to truncate EMP.HIREDATE to the first day of the month) and use the aggregate function COUNT on EMP.HIREDATE to count the number of hires in each month.

SQL Server

Begin by generating every month (actually, the first day of each month) from 1980 to 1983. Then find the boundary months by applying the DAYOFYEAR function to the MIN and MAX HIREDATEs:

  select (min(hiredate) -          datepart(dy,min(hiredate))+1) start_date,          dateadd(yy,1,         (max(hiredate) -          datepart(dy,max(hiredate))+1)) end_date   from emp START_DATE  END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984

Your next step is to repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be, which is OK, as you can stop recursively adding months to START_DATE before you hit END_DATE. A portion of the months created is shown below:

  with x (start_date,end_date)   as ( select (min(hiredate) -          datepart(dy,min(hiredate))+1) start_date,        dateadd(yy,1,         (max(hiredate) -          datepart(dy,max(hiredate))+1)) end_date   from emp  union all select dateadd(mm,1,start_date), end_date   from x  where dateadd(mm,1,start_date) < end_date ) select *   from x START_DATE  END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 01-FEB-1980 01-JAN-1984 01-MAR-1980 01-JAN-1984 … 01-OCT-1983 01-JAN-1984 01-NOV-1983 01-JAN-1984 01-DEC-1983 01-JAN-1984

At this point, you have all the months you need. Simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of the month. The final step is to use the aggregate function COUNT on EMP.HIREDATE.