INTERSECT Function | Microsoft OLE DB 2.0 Programmers Reference and Data Access SDK (Microsoft Professional Editions)

OLE DB Programmer's Reference

This section explains how the INTERSECT function handles duplicates.

Removing Duplicates

Consider INTERSECT(S1, S2) where:

S1 = {Kansas, Buffalo, Buffalo, Topeka, Topeka, Wichita, Canada, BC}

S2 = {Buffalo, Topeka, Topeka, Canada, Quebec}

As shown in the section "Literal Sets," create two tables—S1 and S2—that contain the fully qualified member names in the sets S1 and S2, respectively.

The following query does the intersect:

SELECT Name, NewRank AS Rank FROM          (((SELECT Name, Rank FROM S1          WHERE S1.Name IN (SELECT Name FROM S2))       RANKTOP 1 LEVELS RANKORDER BY Rank RANKGROUP BY Name)    RANK ROWS AS NewRank RANKORDER BY Rank) ORDER BY Rank

In English, the query does the following:

Selects the rows of S1 that are also in S2.
The RANKTOP operation groups duplicates into separate groups by using the GROUP BY clause, orders each group by Rank, and then picks up the first one. This ensures that duplicates are eliminated from the tail.
The RANK operation orders the filtered rows by Rank and assigns a new rank to each row so that input ordering is preserved.

Note You cannot use the INTERSECT operation of SQL-92 because duplicate elimination is not guaranteed to remove duplicates from the tail.

Retaining Duplicates

Consider INTERSECT(S1, S2, [ALL]). As in the preceding section, create the tables S1 and S2. Then use the following query:

SELECT Name, NewRank AS Rank FROM       ((SELECT Name, Rank FROM S1       WHERE S1.Name IN (SELECT Name FROM S2))    RANK ROWS AS NewRank RANKORDER BY Rank) ORDER BY Rank

This is simpler because there is no need to remove duplicates. After the query that finds the intersect, the RANK operation generates a rank value for each of the rows of the resulting table. The outer SELECT produces this table, ordering it by the NewRank (renamed as Rank) column.

Note You cannot use INTERSECT ALL from SQL-92 because its semantics are as follows: If a row r appears n times in S1 and m times in S2, then r appears exactly p times in the result, where p is the lesser of m and n. Moreover, the order of these p rows is not preserved.

Generalizing for Sets with Arbitrary Dimensionality

Each tuple has more than one element for these sets. Replace the NAME column in the select list of each SELECT statement above with a list of columns (equivalent to the number of dimensions in the input sets). Add as many IN clauses in the WHERE as there are columns, one for each column.