Recipe 11.10. Suppressing DuplicatesProblemYou want to find the different job types in table EMP but do not want to see duplicates. The result set should be: JOB --------- ANALYST CLERK MANAGER PRESIDENT SALESMAN SolutionAll of the RDBMSs support the keyword DISTINCT, and it arguably is the easiest mechanism for suppressing duplicates from the result set. However, this recipe will also cover two additional methods for suppressing duplicates. DB2, Oracle, and SQL ServerThe traditional method of using DISTINCT and sometimes GROUP BY (as seen next in the MySQL/PostgreSQL solution) certainly works for these RDBMSs. The solution below is an alternative that makes use of the window function ROW_NUMBER OVER: 1 select job 2 from ( 3 select job, 4 row_number( )over(partition by job order by job) rn 5 from emp 6 ) x 7 where rn = 1 MySQL and PostgreSQLUse the DISTINCT keyword to suppress duplicates from the result set: select distinct job from emp Additionally, it is also possible to use GROUP BY to suppress duplicates: select job from emp group by job DiscussionDB2, Oracle, and SQL ServerThis solution depends on some outside-the-box thinking about partitioned window functions. By using PARTITION BY in the OVER clause of ROW_NUMBER, you can reset the value returned by ROW_NUMBER to 1 whenever a new job is encountered. The results below are from inline view X: select job, row_number()over(partition by job order by job) rn from emp JOB RN --------- ---------- ANALYST 1 ANALYST 2 CLERK 1 CLERK 2 CLERK 3 CLERK 4 MANAGER 1 MANAGER 2 MANAGER 3 PRESIDENT 1 SALESMAN 1 SALESMAN 2 SALESMAN 3 SALESMAN 4 Each row is given an increasing, sequential number, and that number is reset to 1 whenever the job changes. To filter out the duplicates, all you must do is keep the rows where RN is 1. An ORDER BY clause is mandatory when using ROW_NUMBER OVER (except in DB2) but doesn't affect the result. Which job is returned is irrelevant so long as you return one of each job. MySQL and PostgreSQLThe first solution shows how to use the keyword DISTINCT to suppress duplicates from a result set. Keep in mind that DISTINCT is applied to the whole SELECT list; additional columns can and will change the result set. Consider the difference between the two queries below: select distinct job select distinct job, deptno from emp from emp JOB JOB DEPTNO --------- --------- ---------- ANALYST ANALYST 20 CLERK CLERK 10 MANAGER CLERK 20 PRESIDENT CLERK 30 SALESMAN MANAGER 10 MANAGER 20 MANAGER 30 PRESIDENT 10 SALESMAN 30 By adding DEPTNO to the SELECT list, what you return is each DISTINCT pair of JOB/DEPTNO values from table EMP. The second solution uses GROUP BY to suppress duplicates. While using GROUP BY this way is not uncommon, keep in mind that GROUP BY and DISTINCT are two very different clauses that are not interchangeable. I've included GROUP BY in this solution for completeness, as you will no doubt come across it at some point. |