Recipe 10.1. Locating a Range of Consecutive ValuesProblemYou want to determine which rows represent a range of consecutive projects. Consider the following result set from view V, which contains data about a project and its start and end dates: select * from V PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 4 04-JAN-2005 05-JAN-2005 5 06-JAN-2005 07-JAN-2005 6 16-JAN-2005 17-JAN-2005 7 17-JAN-2005 18-JAN-2005 8 18-JAN-2005 19-JAN-2005 9 19-JAN-2005 20-JAN-2005 10 21-JAN-2005 22-JAN-2005 11 26-JAN-2005 27-JAN-2005 12 27-JAN-2005 28-JAN-2005 13 28-JAN-2005 29-JAN-2005 14 29-JAN-2005 30-JAN-2005 Excluding the first row, each row's PROJ_START should equal the PROJ_END of the row before it ("before" is defined as PROJ_ID1 for the current row). Examining the first five rows from view V, PROJ_IDs 1 through 3 are part of the same "group" as each PROJ_END equals the PROJ_START of the row after it. Because you want to find the range of dates for consecutive projects, you would like to return all rows where the current PROJ_END equals the next row's PROJ_START. If the first five rows comprised the entire result set, you would like to return only the first three rows. The final result set (using all 14 rows from view V) should be: PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 6 16-JAN-2005 17-JAN-2005 7 17-JAN-2005 18-JAN-2005 8 18-JAN-2005 19-JAN-2005 11 26-JAN-2005 27-JAN-2005 12 27-JAN-2005 28-JAN-2005 13 28-JAN-2005 29-JAN-2005 The rows with PROJ_IDs 4,5,9,10, and 14 are excluded from this result set because the PROJ_END of each of these rows does not match the PROJ_START of the row following it. SolutionDB2, MySQL, PostgreSQL, and SQL ServerUse a self join to find the rows with consecutive values: 1 select v1.proj_id, 2 v1.proj_start, 3 v1.proj_end 4 from V v1, V v2 5 where v1.proj_end = v2.proj_start OracleThe preceding solution will also work for Oracle. Alternatively, here is another solution that takes advantage of the window function LEAD OVER to look at the "next" row's BEGIN_DATE, thus avoiding the need to self join: 1 select proj_id, proj_start, proj_end 2 from ( 3 select proj_id, proj_start, proj_end, 4 lead(proj_start)over(order by proj_id) next_proj_start 5 from V 6 ) 7 where next_proj_start = proj_end DiscussionDB2, MySQL, PostgreSQL, and SQL ServerBy self joining the view to itself, each row can be compared to every other row returned. Consider a partial result set for IDs 1 and 4: select v1.proj_id as v1_id, v1.proj_end as v1_end, v2.proj_start as v2_begin, v2.proj_id as v2_id from v v1, v v2 where v1.proj_id in ( 1, 4 ) V1_ID V1_END V2_BEGIN V2_ID ----- ----------- ----------- ---------- 1 02-JAN-2005 01-JAN-2005 1 1 02-JAN-2005 02-JAN-2005 2 1 02-JAN-2005 03-JAN-2005 3 1 02-JAN-2005 04-JAN-2005 4 1 02-JAN-2005 06-JAN-2005 5 1 02-JAN-2005 16-JAN-2005 6 1 02-JAN-2005 17-JAN-2005 7 1 02-JAN-2005 18-JAN-2005 8 1 02-JAN-2005 19-JAN-2005 9 1 02-JAN-2005 21-JAN-2005 10 1 02-JAN-2005 26-JAN-2005 11 1 02-JAN-2005 27-JAN-2005 12 1 02-JAN-2005 28-JAN-2005 13 1 02-JAN-2005 29-JAN-2005 14 4 05-JAN-2005 01-JAN-2005 1 4 05-JAN-2005 02-JAN-2005 2 4 05-JAN-2005 03-JAN-2005 3 4 05-JAN-2005 04-JAN-2005 4 4 05-JAN-2005 06-JAN-2005 5 4 05-JAN-2005 16-JAN-2005 6 4 05-JAN-2005 17-JAN-2005 7 4 05-JAN-2005 18-JAN-2005 8 4 05-JAN-2005 19-JAN-2005 9 4 05-JAN-2005 21-JAN-2005 10 4 05-JAN-2005 26-JAN-2005 11 4 05-JAN-2005 27-JAN-2005 12 4 05-JAN-2005 28-JAN-2005 13 4 05-JAN-2005 29-JAN-2005 14 Examining this result set, you can see why PROJ_ID 1 is included in the final result set and PROJ_ID 4 is not: there is no corresponding V2_BEGIN value for the V1_ END value returned for V1_ID 4. Depending on how you view the data, PROJ_ID 4 can just as easily be considered contiguous. Consider the following result set: select * from V where proj_id <= 5 PROJ_ID PROJ_START PROJ_END ------- ---------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 4 04-JAN-2005 05-JAN-2005 5 06-JAN-2005 07-JAN-2005 If "contiguous" is defined as a project that starts the same day another project ends, then PROJ_ID 4 should be included in the result set. PROJ_ID 4 was originally eliminated because of the forward comparison (comparing its PROJ_END with the next PROJ_START), but if you do a backwards comparison (PROJ_START with the prior PROJ_END), then PROJ_ID 4 will be included in the result set. Modifying the solution to include PROJ_ID 4 is trivial: simply add an additional predicate to ensure that both PROJ_START and PROJ_END are checked for being contiguous, not just PROJ_END. The modification shown in the following query produces a result set that includes PROJ_ID 4 (DISTINCT is necessary because some rows satisfy both predicate conditions): select distinct v1.proj_id, v1.proj_start, v1.proj_end from V v1, V v2 where v1.proj_end = v2.proj_start or v1.proj_start = v2.proj_end PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 4 04-JAN-2005 05-JAN-2005 OracleWhile the self-join solution certainly works, the window function LEAD OVER is perfect for this type of problem. The function LEAD OVER allows you to examine other rows without performing a self join (though the function must impose order on the result set to do so). Consider the results of the inline view (lines 35) for IDs 1 and 4: select * from ( select proj_id, proj_start, proj_end, lead(proj_start)over(order by proj_id) next_proj_start from v ) where proj_id in ( 1, 4 ) PROJ_ID PROJ_START PROJ_END NEXT_PROJ_START ------- ----------- ----------- --------------- 1 01-JAN-2005 02-JAN-2005 02-JAN-2005 4 04-JAN-2005 05-JAN-2005 06-JAN-2005 Examining the above snippet of code and its result set, it is particularly easy to see why PROJ_ID 4 is excluded from the final result set of the complete solution. It's excluded because its PROJ_END date of 05-JAN-2005 does not match the "next" project's start date of 06-JAN-2005. The function LEAD OVER is extremely handy when it comes to problems such as this one, particularly when examining partial results. When working with window functions, keep in mind that they are evaluated after the FROM and WHERE clauses, so the LEAD OVER function in the preceding query must be embedded within an inline view. Otherwise the LEAD OVER function is applied to the result set after the WHERE clause has filtered out all rows except for PROJ_ID's 1 and 4. Now, depending on how you view the data, you may very well want to include PROJ_ID 4 in the final result set. Consider the first five rows from view V: select * from V where proj_id <= 5 PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 4 04-JAN-2005 05-JAN-2005 5 06-JAN-2005 07-JAN-2005 If your requirement is such that PROJ_ID 4 is in fact contiguous (because PROJ_ START for PROJ_ID 4 matches PROJ_END for PROJ_ID 3), and that only PROJ_ ID 5 should be discarded, the proposed solution for this recipe is incorrect (!), or at the very least, incomplete: select proj_id, proj_start, proj_end from ( select proj_id, proj_start, proj_end, lead(proj_start)over(order by proj_id) next_start from V where proj_id <= 5 ) where proj_end = next_start PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 If you believe PROJ_ID 4 should be included, simply add LAG OVER to the query and use an additional filter in the WHERE clause: select proj_id, proj_start, proj_end from ( select proj_id, proj_start, proj_end, lead(proj_start)over(order by proj_id) next_start, lag(proj_end)over(order by proj_id) last_end from V where proj_id <= 5 ) where proj_end = next_start or proj_start = last_end PROJ_ID PROJ_START PROJ_END ------- ----------- ----------- 1 01-JAN-2005 02-JAN-2005 2 02-JAN-2005 03-JAN-2005 3 03-JAN-2005 04-JAN-2005 4 04-JAN-2005 05-JAN-2005 Now PROJ_ID 4 is included in the final result set, and only the evil PROJ_ID 5 is excluded. Please consider your exact requirements when applying these recipes to your code. |