15.1 Know When to Use Specific Constructs | Mastering Oracle SQL, 2nd Edition

Depending on the circumstances, certain SQL constructs are preferable to others. For example, use of the EXISTS predicate is often preferable to DISTINCT. The next sections discuss the usage of such constructs.

15.1.1 EXISTS Is Preferable to DISTINCT

The DISTINCT keyword used in a SELECT clause eliminates duplicate rows in the result set. To eliminate those duplicates, Oracle performs a sort, and that sort requires time and disk space. Therefore, avoid using DISTINCT if you can tolerate having duplicate rows returned by a query. If you can't tolerate the duplicate rows, or your application can't handle them, use EXISTS in place of DISTINCT.

For example, assume you are trying to find the names of customers who have orders. Your query has to be based on two tables: customer and cust_order. Using DISTINCT, your query would be written as follows:

SELECT DISTINCT c.cust_nbr, c.name FROM customer c JOIN cust_order o ON c.cust_nbr = o.cust_nbr;

The corresponding execution plan for this query is as follows. Note the SORT operation, which is a result of DISTINCT being used.

Query Plan ----------------------------------------- SELECT STATEMENT   Cost = 3056   SORT UNIQUE     MERGE JOIN       INDEX FULL SCAN IND_ORD_CUST_NBR       SORT JOIN         TABLE ACCESS FULL CUSTOMER

To use EXISTS, the query needs to be rewritten as follows:

SELECT c.cust_nbr, c.name FROM customer c WHERE EXISTS (SELECT 1 FROM cust_order o WHERE c.cust_nbr = o.cust_nbr);

Here is the execution plan for the EXISTS version of the queries:

Query Plan --------------------------------------- SELECT STATEMENT   Cost = 320   FILTER     TABLE ACCESS FULL CUSTOMER     INDEX RANGE SCAN IND_ORD_CUST_NBR

Notice that the second query eliminates the overhead of the sort operation, and therefore runs faster.

15.1.2 WHERE Versus HAVING

We discussed the GROUP BY and HAVING clauses in Chapter 4. Sometimes, when writing a GROUP BY query, you have a condition that you can specify in either the WHERE or HAVING clause. In situations where you have a choice, you'll always get better performance if you specify the condition in the WHERE clause. The reason is that it's less expensive to eliminate rows before they are summarized than it is to eliminate results after summarization.

Let's look at an example illustrating the advantage of WHERE over HAVING. Here's a query with the HAVING clause that reports the number of orders in the year 2000:

SELECT year, COUNT(*) FROM orders GROUP BY year HAVING year = 2001;       YEAR   COUNT(*) ---------- ----------       2001       1440

The execution plan for this query is as follows:

Query Plan ------------------------------------------- SELECT STATEMENT   Cost = 6   FILTER     SORT GROUP BY       INDEX FAST FULL SCAN ORDERS_PK

Now, look at that same query, but with the year restriction in the WHERE clause:

SELECT year, COUNT(*) FROM orders WHERE year = 2001 GROUP BY year;      YEAR   COUNT(*) --------- ----------      2001       1440

The execution plan for this version of the query is:

Query Plan ------------------------------------- SELECT STATEMENT   Cost = 2   SORT GROUP BY NOSORT     INDEX FAST FULL SCAN ORDERS_PK

With the HAVING clause, the query performs the group operation first, and then filters the groups for the condition specified. The WHERE clause version of the query filters the rows before performing the group operation. The result of filtering with the WHERE clause is that there are fewer rows to summarize, and consequently, the query performs better.

However, you should note that not all types of filtering can be achieved using the WHERE clause. Sometimes, you may need to summarize the data first and then filter the summarized data based on the summarized values. In such situations, you have to filter using the HAVING clause, because only the HAVING clause can "see" summarized values. Moreover, there are situations when you may need to use the WHERE clause and the HAVING clause together in a query to filter the results the way you want. For details, see Chapter 4.

15.1.3 UNION Versus UNION ALL

We discussed UNION and UNION ALL in Chapter 7. UNION ALL combines the results of two SELECT statements. UNION combines the results of two SELECT statements, and then returns only distinct rows from the combination; duplicates are eliminated. It is, therefore, obvious that to remove the duplicates, UNION performs one extra step than UNION ALL. This extra step is a sort, which is costly in terms of performance. Therefore, whenever your application can handle duplicates or you are certain that no duplicates will result, consider using UNION ALL instead of UNION.

Let's look an example to understand this issue better. The following query uses UNION to return a list of orders where the sale price exceeds $50.00 or where the customer is located in region 5:

SELECT order_nbr, cust_nbr  FROM cust_order  WHERE sale_price > 50 UNION SELECT order_nbr, cust_nbr  FROM cust_order WHERE cust_nbr IN  (SELECT cust_nbr FROM customer WHERE region_id = 5);  ORDER_NBR   CUST_NBR ---------- ----------       1000          1       1001          1       1002          5       1003          4       1004          4       1005          8       1006          1       1007          5       1008          5       1009          1       1011          1       1012          1       1015          5       1017          4       1019          4       1021          8       1023          1       1025          5       1027          5       1029          1 20 rows selected.

The execution plan for this UNION query is:

Query Plan --------------------------------------------------------------------------- SELECT STATEMENT   Cost = 8   SORT UNIQUE     UNION-ALL       TABLE ACCESS FULL CUST_ORDER       HASH JOIN         TABLE ACCESS FULL CUSTOMER         TABLE ACCESS FULL CUST_ORDER

The following query uses UNION ALL instead of UNION to get the same information:

SELECT order_nbr, cust_nbr  FROM cust_order  WHERE sale_price > 50 UNION ALL SELECT order_nbr, cust_nbr  FROM cust_order WHERE cust_nbr IN  (SELECT cust_nbr FROM customer WHERE region_id = 5);  ORDER_NBR   CUST_NBR ---------- ----------       1001          1       1003          4       1005          8       1009          1       1012          1       1017          4       1021          8       1029          1       1001          1       1000          1       1002          5       1003          4       1004          4       1006          1       1007          5       1008          5       1009          1       1012          1       1011          1       1015          5       1017          4       1019          4       1023          1       1025          5       1027          5       1029          1 26 rows selected.

Note the duplicate rows in the output. However, note also that UNION ALL performs better than UNION, as you can see from the following execution plan:

Query Plan --------------------------------------------------------------------------- SELECT STATEMENT   Cost = 4   UNION-ALL     TABLE ACCESS FULL CUST_ORDER     HASH JOIN       TABLE ACCESS FULL CUSTOMER       TABLE ACCESS FULL CUST_ORDER

You can see that the extra operation (SORT UNIQUE) in the UNION makes it run slower than UNION ALL.

15.1.4 LEFT Versus RIGHT OUTER JOIN

As you have seen in Chapter 3, outer joins can be of type LEFT, RIGHT, or FULL. LEFT and RIGHT are really two ways of looking at the same operation. Mixing LEFT and RIGHT outer joins in the same application can cause confusion, as you and other programmers must constantly shift your point-of-view from one approach to the other. Use both LEFT and RIGHT outer joins in the same query, and you'll find your confusion greatly magnified. For example:

SELECT e.lname, j.function, d.name FROM job j LEFT OUTER JOIN employee e ON e.job_id = j.job_id            RIGHT OUTER JOIN department d ON e.dept_id = d.dept_id; LNAME                FUNCTION                       NAME -------------------- ------------------------------ ------------ MILLER               CLERK                          ACCOUNTING CLARK                MANAGER                        ACCOUNTING KING                 PRESIDENT                      ACCOUNTING SMITH                CLERK                          RESEARCH FORD                 ANALYST                        RESEARCH JONES                MANAGER                        RESEARCH SCOTT                ANALYST                        RESEARCH JAMES                CLERK                          SALES BLAKE                MANAGER                        SALES MARTIN               SALESPERSON                    SALES TURNER               SALESPERSON                    SALES ALLEN                SALESPERSON                    SALES                                                     OPERATIONS

Such confusion is unnecessary. Since both LEFT and RIGHT outer joins represent the same operation, but from differing points of view, you can simply pick one point of view and use it consistently. For example, many programmers write all outer joins as either FULL or LEFT, ignoring RIGHT.

The preceding query uses a LEFT and then a RIGHT outer join to do the following:

Connect an outer join from employee to job, with employee as the required table
Connect another outer join from department to the results from Step 1, with department as the required table

Using parentheses to explicitly state the above order of operations, you can rewrite the query using all LEFT outer joins, as follows:

SELECT e.lname, j.function, d.name FROM department d LEFT OUTER JOIN         (job j LEFT OUTER JOIN employee e          ON e.job_id = j.job_id)      ON e.dept_id = d.dept_id; LNAME                FUNCTION                       NAME -------------------- ------------------------------ ------------- MILLER               CLERK                          ACCOUNTING CLARK                MANAGER                        ACCOUNTING KING                 PRESIDENT                      ACCOUNTING SMITH                CLERK                          RESEARCH FORD                 ANALYST                        RESEARCH JONES                MANAGER                        RESEARCH SCOTT                ANALYST                        RESEARCH JAMES                CLERK                          SALES BLAKE                MANAGER                        SALES MARTIN               SALESPERSON                    SALES TURNER               SALESPERSON                    SALES ALLEN                SALESPERSON                    SALES                                                     OPERATIONS

The tradeoff here is between using parentheses and mixing RIGHT and LEFT outer joins. This second version of the query still joins employee to job, and then joins department to that result. The operations are exactly the same as in the previous version. This time, the parentheses make the order of operations clearer, and we personally find the second version of the query a bit easier to understand.

Table of content

Mastering Oracle SQL, 2nd Edition

ISBN: 0596006322
EAN: 2147483647

Year: 2003
Pages: 154

Authors: Sanjay Mishra, Alan Beaulieu

BUY ON AMAZON

Metrics and Models in Software Quality Engineering (2nd Edition)

Lotus Notes and Domino 6 Development (2nd Edition)

Java for RPG Programmers, 2nd Edition

Cisco CallManager Fundamentals (2nd Edition)

.NET System Management Services

FileMaker 8 Functions and Scripts Desk Reference