Recipe 9.10. Filling in Missing DatesProblemYou need to generate a row for every date (or every month, week, or year) within a given range. Such rowsets are often used to generate summary reports. For example, you want to count the number of employees hired every month of every year in which any employee has been hired. Examining the dates of all the employees hired, there have been hirings from 1980 to 1983: select distinct extract(year from hiredate) as year from emp YEAR ----- 1980 1981 1982 1983 You want to determine the number of employees hired each month from 1980 to 1983. A portion of the desired result set is shown below: MTH NUM_HIRED ----------- ---------- 01-JAN-1981 0 01-FEB-1981 2 01-MAR-1981 0 01-APR-1981 1 01-MAY-1981 1 01-JUN-1981 1 01-JUL-1981 0 01-AUG-1981 0 01-SEP-1981 2 01-OCT-1981 0 01-NOV-1981 1 01-DEC-1981 2 SolutionThe trick here is that you want to return a row for each month even if no employee was hired (i.e., the count would be zero). Because there isn't an employee hired every month between 1980 and 1983, you must generate those months yourself, and then outer join to table EMP on HIREDATE (truncating the actual HIREDATE to its month, so it can match the generated months when possible). DB2Use the recursive WITH clause to generate every month (the first day of each month from January 1, 1980, to December 1, 1983). Once you have all the months for the required range of dates, outer join to table EMP and use the aggregate function COUNT to count the number of hires for each month: 1 with x (start_date,end_date) 2 as ( 3 select (min(hiredate) 4 dayofyear(min(hiredate)) day +1 day) start_date, 5 (max(hiredate) 6 dayofyear(max(hiredate)) day +1 day) +1 year end_date 7 from emp 8 union all 9 select start_date +1 month, end_date 10 from x 11 where (start_date +1 month) < end_date 12 ) 13 select x.start_date mth, count(e.hiredate) num_hired 14 from x left join emp e 15 on (x.start_date = (e.hiredate-(day(hiredate)-1) day)) 16 group by x.start_date 17 order by 1 OracleUse the CONNECT BY clause to generate each month between 1980 and 1983. Then outer join to table EMP and use the aggregate function COUNT to count the number of employees hired in each month. If you are on Oracle8i Database and earlier, the ANSI outer join is not available to you, nor is the ability to use CONNECT BY as a row generator; a simple workaround is to use a traditional pivot table (like the one used in the MySQL solution). Following as an Oracle solution using Oracle's outer-join syntax: 1 with x 2 as ( 3 select add_months(start_date,level-1) start_date 4 from ( 5 select min(trunc(hiredate,'y')) start_date, 6 add_months(max(trunc(hiredate,'y')),12) end_date 7 from emp 8 ) 9 connect by level <= months_between(end_date,start_date) 10 ) 11 select x.start_date MTH, count(e.hiredate) num_hired 12 from x, emp e 13 where x.start_date = trunc(e.hiredate(+),'mm') 14 group by x.start_date 15 order by 1 and here is a second Oracle solution, this time using the ANSI syntax: 1 with x 2 as ( 3 select add_months(start_date,level-1) start_date 4 from ( 5 select min(trunc(hiredate,'y')) start_date, 6 add_months(max(trunc(hiredate,'y')),12) end_date 7 from emp 8 ) 9 connect by level <= months_between(end_date,start_date) 10 ) 11 select x.start_date MTH, count(e.hiredate) num_hired 12 from x left join emp e 13 on (x.start_date = trunc(e.hiredate,'mm')) 14 group by x.start_date 15 order by 1 PostgreSQLTo improve readability, this solution uses a view, named V, to return the number of months between the first day of the first month of the year the first employee was hired and the first day of the last month of the year the most recent employee was hired. Use the value returned by view V as the second value passed to the function GENERATE_SERIES, so that the correct number of months (rows) are generated. Once you have all the months for the required range of dates, outer join to table EMP and use the aggregate function COUNT to count the number of hires for each month: create view v as select cast( extract(year from age(last_month,first_month))*12-1 as integer) as mths from ( select cast(date_trunc('year',min(hiredate)) as date) as first_month, cast(cast(date_trunc('year',max(hiredate)) as date) + interval '1 year' as date) as last_month from emp ) x 1 select y.mth, count(e.hiredate) as num_hired 2 from ( 3 select cast(e.start_date + (x.id * interval '1 month') 4 as date) as mth 5 from generate_series (0,(select mths from v)) x(id), 6 ( select cast( 7 date_trunc('year',min(hiredate)) 8 as date) as start_date 9 from emp ) e 10 ) y left join emp e 11 on (y.mth = date_trunc('month',e.hiredate)) 12 group by y.mth 13 order by 1 MySQLUse the pivot table T500 to generate each month between 1980 and 1983. Then outer join to table EMP and use the aggregate function COUNT to count the number of employees hired for each month: 1 select z.mth, count(e.hiredate) num_hired 2 from ( 3 select date_add(min_hd,interval t500.id-1 month) mth 4 from ( 5 select min_hd, date_add(max_hd,interval 11 month) max_hd 6 from ( 7 select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd, 8 adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd 9 from emp 10 ) x 11 ) y, 12 t500 13 where date_add(min_hd,interval t500.id-1 month) <= max_hd 14 ) z left join emp e 15 on (z.mth = adddate( 16 date_add( 17 last_day(e.hiredate),interval -1 month),1)) 18 group by z.mth 19 order by 1 SQL ServerUse the recursive WITH clause to generate every month (the first day of each month from January 1, 1980, to December 1, 1983). Once you have all the months for the required range of dates, outer join to table EMP and use the aggregate function COUNT to count the number of hires for each month: 1 with x (start_date,end_date) 2 as ( 3 select (min(hiredate) 4 datepart(dy,min(hiredate))+1) start_date, 5 dateadd(yy,1, 6 (max(hiredate) 7 datepart(dy,max(hiredate))+1)) end_date 8 from emp 9 union all 10 select dateadd(mm,1,start_date), end_date 11 from x 12 where dateadd(mm,1,start_date) < end_date 13 ) 14 select x.start_date mth, count(e.hiredate) num_hired 15 from x left join emp e 16 on (x.start_date = 17 dateadd(dd,-day(e.hiredate)+1,e.hiredate)) 18 group by x.start_date 19 order by 1 DiscussionDB2The first step is to generate every month (actually the first day of each month) from 1980 to 1983. Start using the DAYOFYEAR function on the MIN and MAX HIREDATEs to find the boundary months: select (min(hiredate) dayofyear(min(hiredate)) day +1 day) start_date, (max(hiredate) dayofyear(max(hiredate)) day +1 day) +1 year end_date from emp START_DATE END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 Your next step is to repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be. This is OK. As you recursively add months to START_DATE, you can stop before you hit END_DATE. A portion of the months created is shown below: with x (start_date,end_date) as ( select (min(hiredate) dayofyear(min(hiredate)) day +1 day) start_date, (max(hiredate) dayofyear(max(hiredate)) day +1 day) +1 year end_date from emp union all select start_date +1 month, end_date from x where (start_date +1 month) < end_date ) select * from x START_DATE END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 01-FEB-1980 01-JAN-1984 01-MAR-1980 01-JAN-1984 … 01-OCT-1983 01-JAN-1984 01-NOV-1983 01-JAN-1984 01-DEC-1983 01-JAN-1984 At this point, you have all the months you need, and you can simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of its month. Finally, use the aggregate function COUNT on EMP.HIREDATE. OracleThe first step is to generate the first day of every for every month from 1980 to 1983. Start by using TRUNC and ADD_MONTHS together with the MIN and MAX HIREDATE values to find the boundary months: select min(trunc(hiredate,'y')) start_date, add_months(max(trunc(hiredate,'y')),12) end_date from emp START_DATE END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 Then repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be, which is OK. As you recursively add months to START_DATE, you can stop before you hit END_DATE. A portion of the months created is shown below: with x as ( select add_months(start_date,level-1) start_date from ( select min(trunc(hiredate,'y')) start_date, add_months(max(trunc(hiredate,'y')),12) end_date from emp ) connect by level <= months_between(end_date,start_date) ) select * from x START_DATE ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983 At this point, you have all the months you need; simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of the month it is in. The final step is to use the aggregate function COUNT on EMP.HIREDATE. PostgreSQLThis solution uses the function GENERATE_SERIES to return the months you need. If you do not have the GENERATE_SERIES function available, you can use a pivot table as in the MySQL solution. The first step is to understand view V. View V simply finds the number of months you'll need to generate by finding the boundary dates for the range. Inline view X in view V uses the MIN and MAX HIREDATEs to find the start and end boundary dates and is shown below: select cast(date_trunc('year',min(hiredate)) as date) as first_month, cast(cast(date_trunc('year',max(hiredate)) as date) + interval '1 year' as date) as last_month from emp FIRST_MONTH LAST_MONTH ----------- ----------- 01-JAN-1980 01-JAN-1984 The value for LAST_MONTH is actually one day more than it should be. This is fine, as you can just subtract 1 when you calculate the months between these two dates. The next step is to use the AGE function to find the difference between the two dates in years, then multiply by 12 (and remember, subtract by 1!): select cast( extract(year from age(last_month,first_month))*12-1 as integer) as mths from ( select cast(date_trunc('year',min(hiredate)) as date) as first_month, cast(cast(date_trunc('year',max(hiredate)) as date) + interval '1 year' as date) as last_month from emp ) x MTHS ---- 47 Use the value returned by view V as the second parameter of GENERATE_SERIES to return the number of months you need. Your next step is then to find your start date. You'll repeatedly add months to your start date to create your range of months. Inline view Y uses the DATE_TRUNC function on the MIN(HIREDATE) to find the start date, and uses the values returned by GENERATE_SERIES to add months. Partial results are shown below: select cast(e.start_date + (x.id * interval '1 month') as date) as mth from generate_series (0,(select mths from v)) x(id), ( select cast( date_trunc('year',min(hiredate)) as date) as start_date from emp ) e MTH ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983 Now that you have each month you need for the final result set, outer join to EMP. HIREDATE and use the aggregate function COUNT to count the number of hires for each month. MySQLFirst, find the boundary dates by using the aggregate functions MIN and MAX along with the DAYOFYEAR and ADDDATE functions. The result set shown below is from inline view X: select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd, adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd from emp MIN_HD MAX_HD ----------- ----------- 01-JAN-1980 01-JAN-1983 Next, increment MAX_HD to the last month of the year: select min_hd, date_add(max_hd,interval 11 month) max_hd from ( select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd, adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd from emp ) x MIN_HD MAX_HD ----------- ----------- 01-JAN-1980 01-DEC-1983 Now that you have the boundary dates, add months to MIN_HD up to and including MAX_HD by using pivot table T500 to generate the rows you need. A portion of the results is shown below: select date_add(min_hd,interval t500.id-1 month) mth from ( select min_hd, date_add(max_hd,interval 11 month) max_hd from ( select adddate(min(hiredate),-dayofyear(min(hiredate))+1) min_hd, adddate(max(hiredate),-dayofyear(max(hiredate))+1) max_hd from emp ) x ) y, t500 where date_add(min_hd,interval t500.id-1 month) <= max_hd MTH ----------- 01-JAN-1980 01-FEB-1980 01-MAR-1980 … 01-OCT-1983 01-NOV-1983 01-DEC-1983 Now that you have all the months you need for the final result set, outer join to EMP.HIREDATE (be sure to truncate EMP.HIREDATE to the first day of the month) and use the aggregate function COUNT on EMP.HIREDATE to count the number of hires in each month. SQL ServerBegin by generating every month (actually, the first day of each month) from 1980 to 1983. Then find the boundary months by applying the DAYOFYEAR function to the MIN and MAX HIREDATEs: select (min(hiredate) - datepart(dy,min(hiredate))+1) start_date, dateadd(yy,1, (max(hiredate) - datepart(dy,max(hiredate))+1)) end_date from emp START_DATE END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 Your next step is to repeatedly add months to START_DATE to return all the months necessary for the final result set. The value for END_DATE is one day more than it should be, which is OK, as you can stop recursively adding months to START_DATE before you hit END_DATE. A portion of the months created is shown below: with x (start_date,end_date) as ( select (min(hiredate) - datepart(dy,min(hiredate))+1) start_date, dateadd(yy,1, (max(hiredate) - datepart(dy,max(hiredate))+1)) end_date from emp union all select dateadd(mm,1,start_date), end_date from x where dateadd(mm,1,start_date) < end_date ) select * from x START_DATE END_DATE ----------- ----------- 01-JAN-1980 01-JAN-1984 01-FEB-1980 01-JAN-1984 01-MAR-1980 01-JAN-1984 … 01-OCT-1983 01-JAN-1984 01-NOV-1983 01-JAN-1984 01-DEC-1983 01-JAN-1984 At this point, you have all the months you need. Simply outer join to EMP.HIREDATE. Because the day for each START_DATE is the first of the month, truncate EMP.HIREDATE to the first day of the month. The final step is to use the aggregate function COUNT on EMP.HIREDATE. |