Recipe11.10.Suppressing Duplicates | SQL Cookbook (Cookbooks (OReilly))

Recipe 11.10. Suppressing Duplicates

Problem

You want to find the different job types in table EMP but do not want to see duplicates. The result set should be:

 JOB --------- ANALYST CLERK MANAGER PRESIDENT SALESMAN

Solution

All of the RDBMSs support the keyword DISTINCT, and it arguably is the easiest mechanism for suppressing duplicates from the result set. However, this recipe will also cover two additional methods for suppressing duplicates.

DB2, Oracle, and SQL Server

The traditional method of using DISTINCT and sometimes GROUP BY (as seen next in the MySQL/PostgreSQL solution) certainly works for these RDBMSs. The solution below is an alternative that makes use of the window function ROW_NUMBER OVER:

 1  select job 2    from ( 3  select job, 4         row_number( )over(partition by job order by job) rn 5    from emp 6         ) x 7   where rn = 1

MySQL and PostgreSQL

Use the DISTINCT keyword to suppress duplicates from the result set:

 select distinct job   from emp

Additionally, it is also possible to use GROUP BY to suppress duplicates:

 select job   from emp  group by job

Discussion

DB2, Oracle, and SQL Server

This solution depends on some outside-the-box thinking about partitioned window functions. By using PARTITION BY in the OVER clause of ROW_NUMBER, you can reset the value returned by ROW_NUMBER to 1 whenever a new job is encountered. The results below are from inline view X:

  select job,        row_number()over(partition by job order by job) rn   from emp  JOB               RN --------- ---------- ANALYST            1 ANALYST            2 CLERK              1 CLERK              2 CLERK              3 CLERK              4 MANAGER            1 MANAGER            2 MANAGER            3 PRESIDENT          1 SALESMAN           1 SALESMAN           2 SALESMAN           3 SALESMAN           4

Each row is given an increasing, sequential number, and that number is reset to 1 whenever the job changes. To filter out the duplicates, all you must do is keep the rows where RN is 1.

An ORDER BY clause is mandatory when using ROW_NUMBER OVER (except in DB2) but doesn't affect the result. Which job is returned is irrelevant so long as you return one of each job.

MySQL and PostgreSQL

The first solution shows how to use the keyword DISTINCT to suppress duplicates from a result set. Keep in mind that DISTINCT is applied to the whole SELECT list; additional columns can and will change the result set. Consider the difference between the two queries below:

 select distinct job           select distinct job, deptno   from emp                      from emp JOB                           JOB           DEPTNO ---------                     --------- ---------- ANALYST                       ANALYST           20 CLERK                         CLERK             10 MANAGER                       CLERK             20 PRESIDENT                     CLERK             30 SALESMAN                      MANAGER           10                               MANAGER           20                               MANAGER           30                               PRESIDENT         10                               SALESMAN          30

By adding DEPTNO to the SELECT list, what you return is each DISTINCT pair of JOB/DEPTNO values from table EMP.

The second solution uses GROUP BY to suppress duplicates. While using GROUP BY this way is not uncommon, keep in mind that GROUP BY and DISTINCT are two very different clauses that are not interchangeable. I've included GROUP BY in this solution for completeness, as you will no doubt come across it at some point.