Recipe3.7.Determining Whether Two Tables Have the Same Data


Recipe 3.7. Determining Whether Two Tables Have the Same Data

Problem

You want to know if two tables or views have the same data (cardinality and values). Consider the following view:

  create view V as select * from emp where deptno != 10  union all select * from emp where ename = 'WARD' select * from V EMPNO ENAME       JOB        MGR  HIREDATE      SAL  COMM DEPTNO ----- ----------  --------- ----- ----------- ----- ----- ------  7369 SMITH       CLERK      7902 17-DEC-1980   800           20  7499 ALLEN       SALESMAN   7698 20-FEB-1981  1600   300     30  7521 WARD        SALESMAN   7698 22-FEB-1981  1250   500     30  7566 JONES       MANAGER    7839 02-APR-1981  2975           20  7654 MARTIN      SALESMAN   7698 28-SEP-1981  1250  1300     30  7698 BLAKE       MANAGER    7839 01-MAY-1981  2850           30  7788 SCOTT       ANALYST    7566 09-DEC-1982  3000           20  7844 TURNER      SALESMAN   7698 08-SEP-1981  1500     0     30  7876 ADAMS       CLERK      7788 12-JAN-1983  1100           20  7900 JAMES       CLERK      7698 03-DEC-1981   950           30  7902 FORD        ANALYST    7566 03-DEC-1981  3000           20  7521 WARD        SALESMAN   7698 22-FEB-1981  1250   500     30 

You want to determine whether or not this view has exactly the same data as table EMP. The row for employee "WARD" is duplicated to show that the solution will reveal not only different data but duplicates as well. Based on the rows in table EMP the difference will be the three rows for employees in department 10 and the two rows for employee "WARD". You want to return the following result set:

 EMPNO ENAME       JOB         MGR  HIREDATE      SAL   COMM DEPTNO CNT ----- ----------  ---------  ----- ----------- -----  ----- ------ ---  7521 WARD        SALESMAN    7698 22-FEB-1981  1250    500     30   1  7521 WARD        SALESMAN    7698 22-FEB-1981  1250    500     30   2  7782 CLARK       MANAGER     7839 09-JUN-1981  2450            10   1  7839 KING        PRESIDENT        17-NOV-1981  5000            10   1  7934 MILLER      CLERK       7782 23-JAN-1982  1300            10   1 

Solution

Functions that perform SET difference (MINUS or EXCEPT, depending on your DBMS) make the problem of comparing tables a relatively easy one to solve. If your DBMS does not offer such functions, you can use a correlated subquery.

DB2 and PostgreSQL

Use the set operations EXCEPT and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (  2   select empno,ename,job,mgr,hiredate,sal,comm,deptno,  3          count(*) as cnt  4     from V  5    group by empno,ename,job,mgr,hiredate,sal,comm,deptno  6   except  7   select empno,ename,job,mgr,hiredate,sal,comm,deptno,  8          count(*) as cnt  9     from emp 10    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 11   ) 12    union all 13   ( 14   select empno,ename,job,mgr,hiredate,sal,comm,deptno, 15          count(*) as cnt 16     from emp 17    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 18   except 19   select empno,ename,job,mgr,hiredate,sal,comm,deptno, 20          count(*) as cnt 21     from v 22    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 23  ) 

Oracle

Use the set operations MINUS and UNION ALL to find the difference between view V and table EMP combined with the difference between table EMP and view V:

  1  (  2   select empno,ename,job,mgr,hiredate,sal,comm,deptno,  3          count(*) as cnt  4     from  V  5    group by empno,ename,job,mgr,hiredate,sal,comm,deptno  6    minus  7   select empno,ename,job,mgr,hiredate,sal,comm,deptno,  8          count(*) as cnt  9     from emp 10    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 11   ) 12     union all 13   ( 14   select empno,ename,job,mgr,hiredate,sal,comm,deptno, 15          count(*) as cnt 16     from emp 17    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 18    minus 19   select empno,ename,job,mgr,hiredate,sal,comm,deptno, 20          count(*) as cnt 21     from v 22    group by empno,ename,job,mgr,hiredate,sal,comm,deptno 23  ) 

MySQL and SQL Server

Use a correlated subquery and UNION ALL to find the rows in view V and not in table EMP combined with the rows in table EMP and not in view V:

  1  select *  2    from (  3  select e.empno,e.ename,e.job,e.mgr,e.hiredate,  4         e.sal,e.comm,e.deptno, count(*) as cnt  5    from emp e  6   group by empno,ename,job,mgr,hiredate,  7           sal,comm,deptno  8         ) e  9   where not exists ( 10  select null 11    from ( 12  select v.empno,v.ename,v.job,v.mgr,v.hiredate, 13         v.sal,v.comm,v.deptno, count(*) as cnt 14    from v 15   group by empno,ename,job,mgr,hiredate, 16            sal,comm,deptno 17         ) v 18    where v.empno     = e.empno 19       and v.ename    = e.ename 20       and v.job      = e.job 21       and v.mgr      = e.mgr 22       and v.hiredate = e.hiredate 23       and v.sal      = e.sal 24       and v.deptno   = e.deptno 25       and v.cnt      = e.cnt 26       and coalesce(v.comm,0) = coalesce(e.comm,0) 27  ) 28     union all 29    select * 30      from ( 31    select v.empno,v.ename,v.job,v.mgr,v.hiredate, 32           v.sal,v.comm,v.deptno, count(*) as cnt 33      from v 34    group by empno,ename,job,mgr,hiredate, 35             sal,comm,deptno 36          ) v 37    where not exists ( 38   select null 39     from ( 40   select e.empno,e.ename,e.job,e.mgr,e.hiredate, 41          e.sal,e.comm,e.deptno, count(*) as cnt 42     from emp e 43    group by empno,ename,job,mgr,hiredate, 44             sal,comm,deptno 45          ) e 46    where v.empno     = e.empno 47      and v.ename     = e.ename 48      and v.job       = e.job 49      and v.mgr       = e.mgr 50      and v.hiredate  = e.hiredate 51      and v.sal       = e.sal 52      and v.deptno    = e.deptno 53      and v.cnt       = e.cnt 54      and coalesce(v.comm,0) = coalesce(e.comm,0) 55  ) 

Discussion

Despite using different techniques, the concept is the same for all solutions:

  1. First, find rows in table EMP that do not exist in view V.

  2. Then combine (UNION ALL) those rows with rows from view V that do not exist in table EMP.

If the tables in question are equal, then no rows are returned. If the tables are different, the rows causing the difference are returned. As an easy first step when comparing tables, you can compare the cardinalities alone rather than including them with the data comparison. The following query is a simple example of this and will work on all DBMSs:

  select count(*)   from emp  union select count(*)   from dept COUNT(*) --------        4       14 

Because UNION will filter out duplicates, only one row will be returned if the tables' cardinalities are the same. Because two rows are returned in this example, you know that the tables do not contain identical rowsets.

DB2, Oracle, and PostgreSQL

MINUS and EXCEPT work in the same way, so I will use EXCEPT for this discussion. The queries before and after the UNION ALL are very similar. So, to understand how the solution works, simply execute the query prior to the UNION ALL by itself. The following result set is produced by executing lines 111 in the solution section:

  (   select empno,ename,job,mgr,hiredate,sal,comm,deptno,          count(*) as cnt     from V    group by empno,ename,job,mgr,hiredate,sal,comm,deptno   except   select empno,ename,job,mgr,hiredate,sal,comm,deptno,          count(*) as cnt     from emp   group by empno,ename,job,mgr,hiredate,sal,comm,deptno )  EMPNO ENAME      JOB        MGR  HIREDATE      SAL  COMM DEPTNO CNT  ----- ---------- --------- ----- ----------- ----- ----- ------ ---  7521  WARD       SALESMAN   7698 22-FEB-1981  1250   500     30   2 

The result set represents a row found in view V that is either not in table EMP or has a different cardinality than that same row in table EMP. In this case, the duplicate row for employee "WARD" is found and returned. If you're still having trouble understanding how the result set is produced, run each query on either side of EXCEPT individually. You'll notice the only difference between the two result sets is the CNT for employee "WARD" returned by view V.

The portion of the query after the UNION ALL does the opposite of the query preceding UNION ALL. The query returns rows in table EMP not in view V:

  (   select empno,ename,job,mgr,hiredate,sal,comm,deptno,          count(*) as cnt     from emp    group by empno,ename,job,mgr,hiredate,sal,comm,deptno    minus   select empno,ename,job,mgr,hiredate,sal,comm,deptno,          count(*) as cnt     from v   group by empno,ename,job,mgr,hiredate,sal,comm,deptno ) EMPNO ENAME      JOB         MGR HIREDATE      SAL  COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ --- 7521  WARD       SALESMAN   7698 22-FEB-1981  1250   500     30   1 7782  CLARK      MANAGER    7839 09-JUN-1981  2450           10   1 7839  KING       PRESIDENT       17-NOV-1981  5000           10   1 7934  MILLER     CLERK      7782 23-JAN-1982  1300           10   1 

The results are then combined by UNION ALL to produce the final result set.

MySQL and SQL Server

The queries before and after the UNION ALL are very similar. To understand how the subquery-based solution works, simply execute the query prior to the UNION ALL by itself. The query below is from lines 127 in the solution:

  select *    from (  select e.empno,e.ename,e.job,e.mgr,e.hiredate,         e.sal,e.comm,e.deptno, count(*) as cnt    from emp e   group by empno,ename,job,mgr,hiredate,            sal,comm,deptno         ) e   where not exists ( select null   from ( select v.empno,v.ename,v.job,v.mgr,v.hiredate,        v.sal,v.comm,v.deptno, count(*) as cnt   from v  group by empno,ename,job,mgr,hiredate,           sal,comm,deptno        ) v   where v.empno    = e.empno     and v.ename    = e.ename     and v.job      = e.job     and v.mgr      = e.mgr     and v.hiredate = e.hiredate     and v.sal      = e.sal     and v.deptno   = e.deptno     and v.cnt      = e.cnt     and coalesce(v.comm,0) = coalesce(e.comm,0)  )  EMPNO ENAME      JOB         MGR HIREDATE      SAL  COMM DEPTNO CNT  ----- ---------- --------- ----- ----------- ----- ----- ------ ---   7521 WARD       SALESMAN   7698 22-FEB-1981  1250   500     30   1   7782 CLARK      MANAGER    7839 09-JUN-1981  2450           10   1   7839 KING       PRESIDENT       17-NOV-1981  5000           10   1   7934 MILLER     CLERK      7782 23-JAN-1982  1300           10   1 

Notice that the comparison is not between table EMP and view V, but rather between inline view E and inline view V. The cardinality for each row is found and returned as an attribute for that row. You are comparing each row and its occurrence count. If you are having trouble understanding how the comparison works, run the subqueries independently. The next step is to find all rows (including CNT) in inline view E that do not exist in inline view V. The comparison uses a correlated subquery and NOT EXISTS. The joins will determine which rows are the same, and the result will be all rows from inline view E that are not the rows returned by the join. The query after the UNION ALL does the opposite; it finds all rows in inline view V that do not exist in inline view E:

  select *   from ( select v.empno,v.ename,v.job,v.mgr,v.hiredate,        v.sal,v.comm,v.deptno, count(*) as cnt   from v  group by empno,ename,job,mgr,hiredate,           sal,comm,deptno        ) v   where not exists ( select null   from (  select e.empno,e.ename,e.job,e.mgr,e.hiredate,         e.sal,e.comm,e.deptno, count(*) as cnt    from emp e   group by empno,ename,job,mgr,hiredate,            sal,comm,deptno        ) e   where v.empno    = e.empno     and v.ename    = e.ename     and v.job      = e.job     and v.mgr      = e.mgr     and v.hiredate = e.hiredate     and v.sal      = e.sal     and v.deptno   = e.deptno     and v.cnt      = e.cnt     and coalesce(v.comm,0) = coalesce(e.comm,0) ) EMPNO ENAME      JOB         MGR HIREDATE      SAL  COMM DEPTNO CNT ----- ---------- --------- ----- ----------- ----- ----- ------ ---  7521 WARD       SALESMAN   7698 22-FEB-1981  1250   500     30   2 

The results are then combined by UNION ALL to produce the final result set.

Ales Spectic and Jonathan Gennick give an alternate solution in their book Transact-SQL Cookbook (O'Reilly). See the section "Comparing Two Sets for Equality" in Chapter 2.





SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net