Combining the Results of Multiple Queries | Microsoft SQL Server 2008 Bible

It is possible to produce a single result combining the results of two or more queries. The combined resultset might be a simple aggregation of all records from the queries; or some operation related to the theory of sets (see Appendix L) could be performed before the final resultset was returned.

The SQL99 standard supports UNION, INTERSECT, and EXCEPT clauses that could be used to combine the results of two or more queries.

UNION

The following query returns all the records containing some information about customers that do not yet have an assigned salesman:

SELECT   phone_custid_fn OWNER_ID,          'CUSTOMER PHONE' PHONE_TYPE,          phone_phonenum_s FROM     phone WHERE    phone_type_s = 'PHONE' AND      phone_salesmanid_fn IS NULL     OWNER_ID    PHONE_TYPE     phone_phonenum_s      ----------- -------------- --------------------  51          CUSTOMER PHONE (817) 555-5524 5           CUSTOMER PHONE (847) 555-2924 12          CUSTOMER PHONE (508) 555-5224 61          CUSTOMER PHONE (541) 555-3091 . . .       . . .          . . . 16          CUSTOMER PHONE (201) 555-9302 89          CUSTOMER PHONE (908) 555-3779 85          CUSTOMER PHONE (281) 555-2835 152         CUSTOMER PHONE (541) 555-8979     37 records selected

This query returns a total of 37 records. Now, assume that you also would like to include in the resultset the list of salesmen's phones who do not have a customer assigned to them yet. Here is the query to find these salesmen; it returns six records:

SELECT      phone_salesmanid_fn,             'SALESMAN PHONE',             phone_phonenum_s FROM        phone WHERE       phone_type_s = 'PHONE' AND         phone_custid_fn IS NULL     phone_salesmanid_fn                phone_phonenum_s      ------------------- -------------- --------------------  23                  SALESMAN PHONE (305) 555-8502 24                  SALESMAN PHONE (626) 555-4435 25                  SALESMAN PHONE (717) 555-5479 26                  SALESMAN PHONE (718) 555-7879 27                  SALESMAN PHONE (718) 555-5091 28                  SALESMAN PHONE (814) 555-0324     6 records selected

To combine these records into a single resultset, you would use the UNION statement:

SELECT             phone_custid_fn OWNER_ID,                    'CUSTOMER PHONE' PHONE_TYPE,                    phone_phonenum_s FROM               phone WHERE              phone_type_s = 'PHONE' AND                phone_salesmanid_fn IS NULL UNION SELECT             phone_salesmanid_fn,                    'SALESMAN PHONE',                    phone_phonenum_s FROM               phone WHERE              phone_type_s = 'PHONE' AND                phone_custid_fn IS NULL ORDER BY   2, 1       OWNER_ID PHONE_TYPE     PHONE_PHONENUM_S ---------- -------------- --------------------          1 CUSTOMER PHONE (909) 555-9957          2 CUSTOMER PHONE (501) 555-5702          3 CUSTOMER PHONE (520) 555-5513          4 CUSTOMER PHONE (802) 555-2091         ...         26 SALESMAN PHONE (718) 555-7879         27 SALESMAN PHONE (718) 555-5091         28 SALESMAN PHONE (814) 555-0324     43 rows selected.

Now you have a full list that includes all records from the query about customers, combined with the results brought by the query about salesmen. You may visualize this as two resultsets glued together. All queries in an SQL statement containing a UNION operator must have an equal number of expressions in their lists. In addition, these expressions (which could be columns, literals, results of functions, etc.) must be of compatible data types: For example, if the expression evaluates to a character string in one query, it cannot be a number in the second query that is joined to the first by the UNION operator.

The results of UNION could be ordered (as we can see in the UNION query above) but the ORDER BY clause could be used only with the final resultset — that is, it can refer to the result of the UNION, not to particular queries used in it.

If the queries potentially could bring duplicate records, you may want to filter the duplicates, or, conversely, make sure that they all are present. By default, the UNION operator excludes duplicate records; specifying UNION ALL makes sure that your final resultset has all the records returned by all the queries participating in the UNION.

Here is a simple query returning status code and description from the STATUS table of the ACME database:

SELECT status_code_s,         status_desc_s FROM   status     ST STATUS_DESC_S -- ------------------------------ 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED     4 rows selected.

You can use exactly the same query to produce a resultset containing the duplicate rows, and combine the resultsets together using UNION:

SELECT status_code_s,         status_desc_s FROM   status UNION SELECT status_code_s,         status_desc_s FROM   status     ST STATUS_DESC_S -- ------------------------------ 20 COMPLETE 60 SHIPPED 70 INVOICED 80 CANCELLED     4 rows selected.

As you can see, the duplicates (i.e., the record where each corresponding column matched in both queries) were excluded, and the final record count is still 4, which is what you could have gotten from the one query only. This behaviour is in full compliance with the Set Theory. Running the same queries combined with the UNION ALL operator (which overrides this behaviour) returns all records from both queries, no duplicates are excluded:

SELECT   status_code_s,           status_desc_s FROM     status UNION ALL SELECT   status_code_s,           status_desc_s FROM     status     ST STATUS_DESC_S -- ------------------------------ 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED 60 SHIPPED 20 COMPLETE 70 INVOICED 80 CANCELLED     8 records selected.

Note

The ALL keyword in the UNION operator brings the business world into our discussion: UNION ALL does not comply with set theory, where Idempotent Law A U A = A (see Appendix L) simply states that union of a set with itself is the same union. It is more intuitive from a business point of view to expect all records from constituent queries combined in the final resultset.

INTERSECT

The INTERSECT operator is used to evaluate results returned by two queries but includes only the records produced by the first query that have matching ones in the second. This operator is implemented in Oracle and IBM DB2 UDB but not in Microsoft SQL Server 2000, which uses EXISTS operator for this purpose.

Consider the query that selects customer IDs (field CUST_ID_N) from the CUSTOMER table of the ACME database and intersects them with results returned by a second query, producing a resultset of customer's IDs who placed an order:

SELECT  cust_id_n FROM    customer INTERSECT SELECT  ordhdr_custid_fn  FROM    order_header   CUST_ID_N ----------          1          7         55         63        152     5 records selected

The same results are achievable in a variety of ways. Here is an example, using a subquery and an IN operator:

SELECT cust_id_n FROM   customer WHERE  cust_id_n IN  (SELECT ordhdr_custid_fn   FROM   order_header)     CUST_ID_N ----------          1          7         55         63        152     5 records selected

MS SQL Server does not have INTERSECT, offering the EXISTS operator instead. The following query could be rewritten in SQL Sever syntax using this operator and a correlated query:

SELECT  cust_id_n FROM    customer WHERE EXISTS    (SELECT ordhdr_custid_fn      FROM   order_header     WHERE  ordhdr_custid_fn = cust_id_n)     cust_id_n -----------  1 7 55 63 152 (5 row(s) affected)

Note

The EXISTS keyword is common for all our "big three" databases, so this query would work for all of them.

While INTERSECT could be replaced with a combination of other SQL methods, it contributes to the clarity of the SQL code and speeds up its execution because it is more optimized than a subquery.

EXCEPT (MINUS)

When combining the results of two or more queries into a single resultset, you may want to exclude some records from the first query based on what was returned by the second. This keyword is implemented in IBM DB2 UDB only, whereas Oracle uses the MINUS keyword and Microsoft SQL Server 2000 uses EXISTS for the same purpose. The operation's functionality directly follows the rules of set theory, covered in Appendix L.

Consider the following two queries; both select some information about a customer. Here is the resultset returned by the first query:

SELECT cust_id_n FROM   customer     cust_id_n    -----------  61 64 14 65 . . . 152 . . . 6 7 1     37 rows selected

The first query returns 37 rows. And here are the results of the second query:

SELECT ordhdr_custid_fn  FROM   order_header     ordhdr_custid_fn  ----------------  NULL NULL 1 1 . . . 1 7 7 . . . 152     (51 row(s) affected)

The EXCEPT result will be all the records from the first minus those returned by the second.

SELECT  cust_id_n FROM    customer MINUS SELECT  ordhdr_custid_fn  FROM    order_header;     cust_id_n    -----------  61 . .  68 69 15 . . . 9 12 . . . 2 3     (32 row(s) affected)

And the MINUS operation brings 32 records. Five of the records from the second query were excepted from the records returned by the first query. We've shortened the output of all three queries to fit the page, but you may run the query yourself on the RDBMS of your choice; with any luck the results would match these described in the paragraph.

Oracle's name for this operator is MINUS, while DB2 UDB retains the SQL99 standard — EXCEPT.

The DB2 UDB syntax for this query would be identical save for the use of the EXCEPT operator:

SELECT cust_id_n FROM   customer EXCEPT SELECT ordhdr_custid_fn  FROM   order_header

Microsoft SQL Server 2000 offers the NOT EXISTS operator instead; surprisingly this syntax, while not part of the SQL99 standard, is common to all "big-three" databases. To get the results with the NOT EXISTS operator you need to use a correlated subquery. A correlated subquery differs from a regular subquery in that it accepts a parameter from the outer query as a criterion in the WHERE clause:

SELECT  cust_name_s,          cust_alias_s,          cust_status_s FROM    customer cust WHERE NOT EXISTS (SELECT *                   FROM order_header                   WHERE ordhdr_custid_fn = cust.cust_id_n)     cust_name_s                       cust_alias_s   cust_status_s  --------------------------------- --------------- -------------  MAGNETICS USA INC.                                 MNGA71398       Y MAGNETOMETRIC DEVICES INC.                         MNGA71400       Y FAIR PARK GARDENS                                  NULL            Y FAIR AND SONS AIR CONDTNG                          FRIA71406       Y . . .                                              . . .         ... INSULECTRO INC.                                    IUSN71521       Y INTEGRATED POWER DESIGNS                           IETN71523       Y EASTERN SATELLITE COMPANY                          ETSA71626       Y     (32 row(s) affected)

In this case, we used the CUST_ID_N field of the CUSTOMER table as a criterion for the subquery to limit the number of the potential records. If there is a record for a customer ID (ORDHDR_CUSTID_FN = CUST.CUST_ID_N) in the ORDER_HEADER table, the record will be excluded from the outer query final resultset; only records for customers that have not placed orders will be returned.