Recipe 3.7. Determining Whether Two Tables Have the Same DataProblemYou want to know if two tables or views have the same data (cardinality and values). Consider the following view: create view V as select * from emp where deptno != 10 union all select * from emp where ename = 'WARD' select * from V EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ----- ---------- --------- ----- ----------- ----- ----- ------ 7369 SMITH CLERK 7902 17-DEC-1980 800 20 7499 ALLEN SALESMAN 7698 20-FEB-1981 1600 300 30 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 7566 JONES MANAGER 7839 02-APR-1981 2975 20 7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1300 30 7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30 7788 SCOTT ANALYST 7566 09-DEC-1982 3000 20 7844 TURNER SALESMAN 7698 08-SEP-1981 1500 0 30 7876 ADAMS CLERK 7788 12-JAN-1983 1100 20 7900 JAMES CLERK 7698 03-DEC-1981 950 30 7902 FORD ANALYST 7566 03-DEC-1981 3000 20 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 You want to determine whether or not this view has exactly the same data as table EMP. The row for employee "WARD" is duplicated to show that the solution will reveal not only different data but duplicates as well. Based on the rows in table EMP the difference will be the three rows for employees in department 10 and the two rows for employee "WARD". You want to return the following result set: EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2 7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1 7839 KING PRESIDENT 17-NOV-1981 5000 10 1 7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1 SolutionFunctions that perform SET difference (MINUS or EXCEPT, depending on your DBMS) make the problem of comparing tables a relatively easy one to solve. If your DBMS does not offer such functions, you can use a correlated subquery. DB2 and PostgreSQLUse the set operations EXCEPT and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V: 1 ( 2 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 3 count(*) as cnt 4 from V 5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 6 except 7 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 8 count(*) as cnt 9 from emp 10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 11 ) 12 union all 13 ( 14 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 15 count(*) as cnt 16 from emp 17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 18 except 19 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 20 count(*) as cnt 21 from v 22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 23 ) OracleUse the set operations MINUS and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V: 1 ( 2 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 3 count(*) as cnt 4 from V 5 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 6 minus 7 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 8 count(*) as cnt 9 from emp 10 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 11 ) 12 union all 13 ( 14 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 15 count(*) as cnt 16 from emp 17 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 18 minus 19 select empno,ename,job,mgr,hiredate,sal,comm,deptno, 20 count(*) as cnt 21 from v 22 group by empno,ename,job,mgr,hiredate,sal,comm,deptno 23 ) MySQL and SQL ServerUse a correlated subquery and UNION ALL to find the rows in view V and not in table EMP combined with the rows in table EMP and not in view V: 1 select * 2 from ( 3 select e.empno,e.ename,e.job,e.mgr,e.hiredate, 4 e.sal,e.comm,e.deptno, count(*) as cnt 5 from emp e 6 group by empno,ename,job,mgr,hiredate, 7 sal,comm,deptno 8 ) e 9 where not exists ( 10 select null 11 from ( 12 select v.empno,v.ename,v.job,v.mgr,v.hiredate, 13 v.sal,v.comm,v.deptno, count(*) as cnt 14 from v 15 group by empno,ename,job,mgr,hiredate, 16 sal,comm,deptno 17 ) v 18 where v.empno = e.empno 19 and v.ename = e.ename 20 and v.job = e.job 21 and v.mgr = e.mgr 22 and v.hiredate = e.hiredate 23 and v.sal = e.sal 24 and v.deptno = e.deptno 25 and v.cnt = e.cnt 26 and coalesce(v.comm,0) = coalesce(e.comm,0) 27 ) 28 union all 29 select * 30 from ( 31 select v.empno,v.ename,v.job,v.mgr,v.hiredate, 32 v.sal,v.comm,v.deptno, count(*) as cnt 33 from v 34 group by empno,ename,job,mgr,hiredate, 35 sal,comm,deptno 36 ) v 37 where not exists ( 38 select null 39 from ( 40 select e.empno,e.ename,e.job,e.mgr,e.hiredate, 41 e.sal,e.comm,e.deptno, count(*) as cnt 42 from emp e 43 group by empno,ename,job,mgr,hiredate, 44 sal,comm,deptno 45 ) e 46 where v.empno = e.empno 47 and v.ename = e.ename 48 and v.job = e.job 49 and v.mgr = e.mgr 50 and v.hiredate = e.hiredate 51 and v.sal = e.sal 52 and v.deptno = e.deptno 53 and v.cnt = e.cnt 54 and coalesce(v.comm,0) = coalesce(e.comm,0) 55 ) DiscussionDespite using different techniques, the concept is the same for all solutions:
If the tables in question are equal, then no rows are returned. If the tables are different, the rows causing the difference are returned. As an easy first step when comparing tables, you can compare the cardinalities alone rather than including them with the data comparison. The following query is a simple example of this and will work on all DBMSs: select count(*) from emp union select count(*) from dept COUNT(*) -------- 4 14 Because UNION will filter out duplicates, only one row will be returned if the tables' cardinalities are the same. Because two rows are returned in this example, you know that the tables do not contain identical rowsets. DB2, Oracle, and PostgreSQLMINUS and EXCEPT work in the same way, so I will use EXCEPT for this discussion. The queries before and after the UNION ALL are very similar. So, to understand how the solution works, simply execute the query prior to the UNION ALL by itself. The following result set is produced by executing lines 111 in the solution section: ( select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt from V group by empno,ename,job,mgr,hiredate,sal,comm,deptno except select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt from emp group by empno,ename,job,mgr,hiredate,sal,comm,deptno ) EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2 The result set represents a row found in view V that is either not in table EMP or has a different cardinality than that same row in table EMP. In this case, the duplicate row for employee "WARD" is found and returned. If you're still having trouble understanding how the result set is produced, run each query on either side of EXCEPT individually. You'll notice the only difference between the two result sets is the CNT for employee "WARD" returned by view V. The portion of the query after the UNION ALL does the opposite of the query preceding UNION ALL. The query returns rows in table EMP not in view V: ( select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt from emp group by empno,ename,job,mgr,hiredate,sal,comm,deptno minus select empno,ename,job,mgr,hiredate,sal,comm,deptno, count(*) as cnt from v group by empno,ename,job,mgr,hiredate,sal,comm,deptno ) EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1 7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1 7839 KING PRESIDENT 17-NOV-1981 5000 10 1 7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1 The results are then combined by UNION ALL to produce the final result set. MySQL and SQL ServerThe queries before and after the UNION ALL are very similar. To understand how the subquery-based solution works, simply execute the query prior to the UNION ALL by itself. The query below is from lines 127 in the solution: select * from ( select e.empno,e.ename,e.job,e.mgr,e.hiredate, e.sal,e.comm,e.deptno, count(*) as cnt from emp e group by empno,ename,job,mgr,hiredate, sal,comm,deptno ) e where not exists ( select null from ( select v.empno,v.ename,v.job,v.mgr,v.hiredate, v.sal,v.comm,v.deptno, count(*) as cnt from v group by empno,ename,job,mgr,hiredate, sal,comm,deptno ) v where v.empno = e.empno and v.ename = e.ename and v.job = e.job and v.mgr = e.mgr and v.hiredate = e.hiredate and v.sal = e.sal and v.deptno = e.deptno and v.cnt = e.cnt and coalesce(v.comm,0) = coalesce(e.comm,0) ) EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 1 7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 1 7839 KING PRESIDENT 17-NOV-1981 5000 10 1 7934 MILLER CLERK 7782 23-JAN-1982 1300 10 1 Notice that the comparison is not between table EMP and view V, but rather between inline view E and inline view V. The cardinality for each row is found and returned as an attribute for that row. You are comparing each row and its occurrence count. If you are having trouble understanding how the comparison works, run the subqueries independently. The next step is to find all rows (including CNT) in inline view E that do not exist in inline view V. The comparison uses a correlated subquery and NOT EXISTS. The joins will determine which rows are the same, and the result will be all rows from inline view E that are not the rows returned by the join. The query after the UNION ALL does the opposite; it finds all rows in inline view V that do not exist in inline view E: select * from ( select v.empno,v.ename,v.job,v.mgr,v.hiredate, v.sal,v.comm,v.deptno, count(*) as cnt from v group by empno,ename,job,mgr,hiredate, sal,comm,deptno ) v where not exists ( select null from ( select e.empno,e.ename,e.job,e.mgr,e.hiredate, e.sal,e.comm,e.deptno, count(*) as cnt from emp e group by empno,ename,job,mgr,hiredate, sal,comm,deptno ) e where v.empno = e.empno and v.ename = e.ename and v.job = e.job and v.mgr = e.mgr and v.hiredate = e.hiredate and v.sal = e.sal and v.deptno = e.deptno and v.cnt = e.cnt and coalesce(v.comm,0) = coalesce(e.comm,0) ) EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 2 The results are then combined by UNION ALL to produce the final result set.
|