Recipe11.7.Investigating Future Rows

Recipe 11.7. Investigating Future Rows

Problem

You want to find any employees who earn less than the employee hired immediately after them. Based on the following result set:

 ENAME             SAL HIREDATE ---------- ---------- --------- SMITH             800 17-DEC-80 ALLEN            1600 20-FEB-81 WARD             1250 22-FEB-81 JONES            2975 02-APR-81 BLAKE            2850 01-MAY-81 CLARK            2450 09-JUN-81 TURNER           1500 08-SEP-81 MARTIN           1250 28-SEP-81 KING             5000 17-NOV-81 JAMES             950 03-DEC-81 FORD             3000 03-DEC-81 MILLER           1300 23-JAN-82 SCOTT            3000 09-DEC-82 ADAMS            1100 12-JAN-83

SMITH, WARD, MARTIN, JAMES, and MILLER earn less than the person hired immediately after they were hired, so those are the employees you wish to find with a query.

Solution

The first step is to define what "future" means. You must impose order on your result set to be able to define a row as having a value that is "later" than another.

DB2, MySQL, PostgreSQL, and SQL Server

Use subqueries to determine the following for each employee:

The date of the first person subsequently hired with a greater salary
The date of the next person to be hired

When the two dates match, you have what you are looking for:

  1  select ename, sal, hiredate  2    from (  3  select a.ename, a.sal, a.hiredate,  4        (select min(hiredate) from emp b  5          where b.hiredate > a.hiredate  6            and b.sal > a.sal ) as next_sal_grtr,  7        (select min(hiredate) from emp b  8          where b.hiredate > a.hiredate) as next_hire  9    from emp a 10        ) x 11   where next_sal_grtr = next_hire

Oracle

You can use the LEAD OVER window function to access the salary of the next employee that was hired. It's then a simple matter to check whether that salary is larger:

 1  select ename, sal, hiredate 2    from ( 3  select ename, sal, hiredate, 4         lead(sal)over(order by hiredate) next_sal 5    from emp 6         ) 7   where sal < next_sal

Discussion

DB2, MySQL, PostgreSQL, and SQL Server

The scalar subqueries return, for each employee, the HIREDATE of the very next employee hired and the HIREDATE of the first, subsequently hired employee who earns more than the current employee. Here's a look at the raw data:

  select a.ename, a.sal, a.hiredate,        (select min(hiredate) from emp b          where b.hiredate > a.hiredate            and b.sal > a.sal ) as next_sal_grtr,        (select min(hiredate) from emp b          where b.hiredate > a.hiredate) as next_hire  from emp a ENAME      SAL HIREDATE  NEXT_SAL_GRTR NEXT_HIRE ------- ------ --------- ------------- --------- SMITH      800 17-DEC-80 20-FEB-81     20-FEB-81 ALLEN     1600 20-FEB-81 02-APR-81     22-FEB-81 WARD      1250 22-FEB-81 02-APR-81     02-APR-81 JONES     2975 02-APR-81 17-NOV-81     01-MAY-81 MARTIN    1250 28-SEP-81 17-NOV-81     17-NOV-81 BLAKE     2850 01-MAY-81 17-NOV-81     09-JUN-81 CLARK     2450 09-JUN-81 17-NOV-81     08-SEP-81 SCOTT     3000 09-DEC-82               12-JAN-83 KING      5000 17-NOV-81               03-DEC-81 TURNER    1500 08-SEP-81 17-NOV-81     28-SEP-81 ADAMS     1100 12-JAN-83 JAMES      950 03-DEC-81 23-JAN-82     23-JAN-82 FORD      3000 03-DEC-81               23-JAN-82 MILLER    1300 23-JAN-82 09-DEC-82     09-DEC-82

Someone hired subsequently may or may not have been hired immediately after the current employee was hired. The next (and last) step then is to return only rows where NEXT_SAL_GRTR (the earliest HIREDATE of an employee who earns more than the current employee) equals NEXT_HIRE (the HIREDATE of the very next employee relative to the current employee's HIREDATE).

Oracle

The window function LEAD OVER is perfect for a problem such as this one. It not only makes for a more readable query than the solution for the other products, LEAD OVER also leads to a more flexible solution because an argument can be passed to it that will determine how many rows ahead it should look (by default 1). Being able to leap ahead more than one row is important in the case of duplicates in the column you are ordering by.

The following example shows how easy it is to use LEAD OVER to look at the salary of the "next" employee hired:

  select ename, sal, hiredate,        lead(sal)over(order by hiredate) next_sal   from emp ENAME      SAL HIREDATE    NEXT_SAL ------- ------ --------- ---------- SMITH      800 17-DEC-80       1600 ALLEN     1600 20-FEB-81       1250 WARD      1250 22-FEB-81       2975 JONES     2975 02-APR-81       2850 BLAKE     2850 01-MAY-81       2450 CLARK     2450 09-JUN-81       1500 TURNER    1500 08-SEP-81       1250 MARTIN    1250 28-SEP-81       5000 KING      5000 17-NOV-81        950 JAMES      950 03-DEC-81       3000 FORD      3000 03-DEC-81       1300 MILLER    1300 23-JAN-82       3000 SCOTT     3000 09-DEC-82       1100 ADAMS     1100 12-JAN-83

The final step is to return only rows where SAL is less than NEXT_SAL. Because of LEAD OVER's default range of one row, if there had been duplicates in table EMP, in particular, multiple employees hired on the same date, their SAL would be compared. This may or may not have been what you intended. If your goal is to compare the SAL of each employee with SAL of the next employee hired, excluding other employees hired on the same day, you can use the following solution as an alternative:

 select ename, sal, hiredate   from ( select ename, sal, hiredate,        lead(sal,cnt-rn+1)over(order by hiredate) next_sal   from ( select ename,sal,hiredate,        count(*)over(partition by hiredate) cnt,        row_number( )over(partition by hiredate order by empno) rn   from emp        )        )  where sal < next_sal

The idea behind this solution is to find the distance from the current row to the row it should be compared with. For example, if there are five duplicates, the first of the five needs to leap five rows to get to its correct LEAD OVER row. The value for CNT represents, for each employee with a duplicate HIREDATE, how many duplicates there are in total for their HIREDATE. The value for RN represents a ranking for the employees in DEPTNO 10. The rank is partitioned by HIREDATE so only employees with a HIREDATE that another employee has will have a value greater than one. The ranking is sorted by EMPNO (this is arbitrary). Now that you now how many total duplicates there are and you have a ranking of each duplicate, the distance to the next HIREDATE is simply the total number of duplicates minus the current rank plus one (CNT-RN+1).