Recipe9.13.Identifying Overlapping Date Ranges


Recipe 9.13. Identifying Overlapping Date Ranges

Problem

You want to find all instances of an employee starting a new project before ending an existing project. Consider table EMP_PROJECT:

  select *   from emp_project EMPNO ENAME      PROJ_ID PROJ_START  PROJ_END ----- ---------- ------- ----------- ----------- 7782  CLARK            1 16-JUN-2005 18-JUN-2005 7782  CLARK            4 19-JUN-2005 24-JUN-2005 7782  CLARK            7 22-JUN-2005 25-JUN-2005 7782  CLARK           10 25-JUN-2005 28-JUN-2005 7782  CLARK           13 28-JUN-2005 02-JUL-2005 7839  KING             2 17-JUN-2005 21-JUN-2005 7839  KING             8 23-JUN-2005 25-JUN-2005 7839  KING            14 29-JUN-2005 30-JUN-2005 7839  KING            11 26-JUN-2005 27-JUN-2005 7839  KING             5 20-JUN-2005 24-JUN-2005 7934  MILLER           3 18-JUN-2005 22-JUN-2005 7934  MILLER          12 27-JUN-2005 28-JUN-2005 7934  MILLER          15 30-JUN-2005 03-JUL-2005 7934  MILLER           9 24-JUN-2005 27-JUN-2005 7934  MILLER           6 21-JUN-2005 23-JUN-2005 

Looking at the results for employee KING, you see that KING began PROJ_ID 8 before finishing PROJ_ID 5 and began PROJ_ID 5 before finishing PROJ_ID 2. You want to return the following result set:

 EMPNO ENAME      MSG ----- ---------- -------------------------------- 7782  CLARK      project 7 overlaps project 4 7782  CLARK      project 10 overlaps project 7 7782  CLARK      project 13 overlaps project 10 7839  KING       project 8 overlaps project 5 7839  KING       project 5 overlaps project 2 7934  MILLER     project 12 overlaps project 9 7934  MILLER     project 6 overlaps project 3 

Solution

The key here is to find rows where PROJ_START (the date the new project starts) occurs on or after another project's PROJ_START date and on or before that other project's PROJ_END date. To begin, you need to be able to compare each project with each other project (for the same employee). By self joining EMP_PROJECT on employee, you generate every possible combination of two projects for each employee. To find the overlaps, simply find the rows where PROJ_START for any PROJ_ID falls between PROJ_START and PROJ_END for another PROJ_ID by the same employee.

DB2, PostgreSQL, and Oracle

Self join EMP_PROJECT. Then use the concatenation operator "||" to construct the message that explains which projects overlap:

 1 select a.empno,a.ename, 2        'project '||b.proj_id|| 3        ' overlaps project '||a.proj_id as msg 4   from emp_project a, 5        emp_project b 6  where a.empno = b.empno 7    and b.proj_start >= a.proj_start 8    and b.proj_start <= a.proj_end 9    and a.proj_id != b.proj_id 

MySQL

Self join EMP_PROJECT. Then use the CONCAT function to construct the message that explains which projects overlap:

 1 select a.empno,a.ename, 2        concat('project ',b.proj_id, 3         ' overlaps project ',a.proj_id) as msg 4   from emp_project a, 5        emp_project b 6  where a.empno = b.empno 7    and b.proj_start >= a.proj_start 8    and b.proj_start <= a.proj_end 9    and a.proj_id != b.proj_id 

SQL Server

Self join EMP_PROJECT. Then use the concatenation operator "+" to construct the message that explains which projects overlap:

 1 select a.empno,a.ename, 2        'project '+b.proj_id+ 3        ' overlaps project '+a.proj_id as msg 4   from emp_project a, 5        emp_project b 6  where a.empno = b.empno 7    and b.proj_start >= a.proj_start 8    and b.proj_start <= a.proj_end 9    and a.proj_id != b.proj_id 

Discussion

The only difference between the solutions lies in the string concatenation, so one discussion using the DB2 syntax will cover all three solutions. The first step is a self join of EMP_PROJECT so that the PROJ_START dates can be compared amongst the different projects. The output of the self join for employee KING is shown below. You can observe how each project can "see" the other projects:

  select a.ename,        a.proj_id as a_id,        a.proj_start as a_start,        a.proj_end as a_end,        b.proj_id as b_id,        b.proj_start as b_start   from emp_project a,        emp_project b  where a.ename = 'KING'    and a.empno = b.empno    and a.proj_id != b.proj_id order by 2 ENAME  A_ID  A_START     A_END       B_ID  B_START ------ ----- ----------- ----------- ----- ----------- KING       2 17-JUN-2005 21-JUN-2005     8 23-JUN-2005 KING       2 17-JUN-2005 21-JUN-2005    14 29-JUN-2005 KING       2 17-JUN-2005 21-JUN-2005    11 26-JUN-2005 KING       2 17-JUN-2005 21-JUN-2005     5 20-JUN-2005 KING       5 20-JUN-2005 24-JUN-2005     2 17-JUN-2005 KING       5 20-JUN-2005 24-JUN-2005     8 23-JUN-2005 KING       5 20-JUN-2005 24-JUN-2005    11 26-JUN-2005 KING       5 20-JUN-2005 24-JUN-2005    14 29-JUN-2005 KING       8 23-JUN-2005 25-JUN-2005     2 17-JUN-2005 KING       8 23-JUN-2005 25-JUN-2005    14 29-JUN-2005 KING       8 23-JUN-2005 25-JUN-2005     5 20-JUN-2005 KING       8 23-JUN-2005 25-JUN-2005    11 26-JUN-2005 KING      11 26-JUN-2005 27-JUN-2005     2 17-JUN-2005 KING      11 26-JUN-2005 27-JUN-2005     8 23-JUN-2005 KING      11 26-JUN-2005 27-JUN-2005    14 29-JUN-2005 KING      11 26-JUN-2005 27-JUN-2005     5 20-JUN-2005 KING      14 29-JUN-2005 30-JUN-2005     2 17-JUN-2005 KING      14 29-JUN-2005 30-JUN-2005     8 23-JUN-2005 KING      14 29-JUN-2005 30-JUN-2005     5 20-JUN-2005 KING      14 29-JUN-2005 30-JUN-2005    11 26-JUN-2005 

As you can see from the result set above, the self join makes finding overlapping dates easy; simply return each row where B_START occurs between A_START and A_END. If you look at the WHERE clause on lines 7 and 8 of the solution:

 and b.proj_start >= a.proj_start and b.proj_start <= a.proj_end 

it is doing just that. Once you have the required rows, constructing the messages is just a matter of concatenating the return values.

Oracle users can use the window function LEAD OVER to avoid the self join, if the maximum number of projects per employee is fixed. This can come in handy if the self join is expensive for your particular results (if the self join requires more resources than the sorts needed for LEAD OVER). For example, consider the alternative for employee KING using LEAD OVER:

  select empno,        ename,        proj_id,        proj_start,        proj_end,        case        when lead(proj_start,1)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,2)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,3)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,4)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        end is_overlap   from emp_project  where ename = 'KING' EMPNO ENAME  PROJ_ID PROJ_START  PROJ_END    IS_OVERLAP ----- ------ ------- ----------- ----------- ---------- 7839  KING         2 17-JUN-2005 21-JUN-2005          5 7839  KING         5 20-JUN-2005 24-JUN-2005          8 7839  KING         8 23-JUN-2005 25-JUN-2005 7839  KING        11 26-JUN-2005 27-JUN-2005 7839  KING        14 29-JUN-2005 30-JUN-2005 

Because the number of projects is fixed at five for employee KING, you can use LEAD OVER to move examine the dates of all the projects without a self join. From here, producing the final result set is easy. Simply keep the rows where IS_OVERLAP is not NULL:

  select empno,ename,        'project '||is_overlap||        ' overlaps project '||proj_id msg   from ( select empno,        ename,        proj_id,        proj_start,        proj_end,        case        when lead(proj_start,1)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,2)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,3)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        when lead(proj_start,4)over(order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(order by proj_start)        end is_overlap   from emp_project  where ename = 'KING'        )  where is_overlap is not null EMPNO ENAME  MSG ----- ------ -------------------------------- 7839  KING   project 5 overlaps project 2 7839  KING   project 8 overlaps project 5 

To allow the solution to work for all employees (not just KING), partition by ENAME in the LEAD OVER function:

  select empno,ename,        'project '||is_overlap||        ' overlaps project '||proj_id msg   from ( select empno,        ename,        proj_id,        proj_start,        proj_end,        case        when lead(proj_start,1)over(partition by ename                                        order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(partition by ename                                   order by proj_start)        when lead(proj_start,2)over(partition by ename                                        order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(partition by ename                                   order by proj_start)        when lead(proj_start,3)over(partition by ename                                        order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(partition by ename                                   order by proj_start)        when lead(proj_start,4)over(partition by ename                                        order by proj_start)             between proj_start and proj_end        then lead(proj_id)over(partition by ename                                   order by proj_start)        end is_overlap  from emp_project       ) where is_overlap is not null EMPNO ENAME  MSG ----- ------ ------------------------------- 7782  CLARK  project 7 overlaps project 4 7782  CLARK  project 10 overlaps project 7 7782  CLARK  project 13 overlaps project 10 7839  KING   project 5 overlaps project 2 7839  KING   project 8 overlaps project 5 7934  MILLER project 6 overlaps project 3 7934  MILLER project 12 overlaps project 9 




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net