Set Operators


Set operators combine the results of two or more queries into a single query result. The set operators in Oracle are UNION, UNION ALL, INTERSECT, and MINUS.

All of the set operators have the same precedence. To override the default left-to-right evaluation, use parentheses to group SELECT statements that you want evaluated first.

UNION and UNION ALL

The UNION operator will combine two query result sets into a single result set, sorted by the first column of the SELECT clause for both queries. The syntax for using UNION is very straightforward: Two queries that can otherwise stand alone are combined with the keyword UNION. The first query does not need a semicolon; the entire SQL statement is terminated by a single semicolon, after the second query.

There are a few rules in force when writing a compound query using UNION. The number of columns in both queries must match, and the corresponding columns must also have the same datatypes. The names of the columns need not match though; the query result will use the column names from the first query.

A compound query using UNION removes duplicates by using a sort operation before returning the results of the query. The values of all columns must be equal for one of the rows to be removed from the query result. This is one of the few cases where a NULL value in one of the queries is considered to be equal to a corresponding NULL value in the other query.

UNION ALL operates in much the same way as UNION, except that duplicates are not removed. A row that exists in both queries will show up twice in the results. Because a UNION ALL does not need to remove duplicates, a sort operation does not occur. Therefore, a UNION ALL will usually return results faster than a UNION with the same queries. If you know ahead of time that the two queries do not have duplicates, use UNION ALL.

At Scott’s widget company, the database not only keeps track of an employee’s current information in the EMPLOYEES table, but it also keeps track of what jobs the employees have held in the past in the JOB_HISTORY table. The boss, King, wants to get a report that includes both the current and previous positions held by employees in the company, along with the beginning and ending dates for when the employee held that position. Janice realizes that she’ll need a UNION or UNION ALL operation, plus a sort operation. She is not sure how she will retrieve the employee names from the JOB_HISTORY table, since it has only the employee’s ID number.

Her first attempt at a query tries to combine the job history information with the current employment information, as follows:

select employee_id, last_name, hire_date, job_id, department_id from employees union select employee_id, start_date, end_date, job_id, department_id from job_history; select employee_id, last_name, hire_date, job_id, department_id                     * ERROR at line 1: ORA-01790: expression must have same datatype as      corresponding expression

The two queries have the same number of columns, but the datatypes of the corresponding columns don’t match. This is because the employee data doesn’t have an ending date, and the JOB_HISTORY table doesn’t have a column to store the employee name. To fix this problem, Janice changes the first query to include a NULL value for an ending date (since the EMPLOYEES file has only active employees):

select employee_id emp#, last_name, hire_date,    NULL end_date, job_id, department_id dept# from employees

She changes the second query to include a constant of an empty string to be a placeholder to match the name in the other query:

select employee_id, ‘’, start_date,    end_date, job_id, department_id from job_history;

The resultant query using the UNION operator looks like this:

select employee_id emp#, last_name, hire_date,    NULL end_date, job_id, department_id dept# from employees union select employee_id, ‘’, start_date,    end_date, job_id, department_id from job_history;  EMP# LAST_NAME     HIRE_DATE END_DATE  JOB_ID      DEPT# ----- ------------- --------- --------- ---------- ------   100 King          17-JUN-87           AD_PRES        90   101 Kochhar       21-SEP-89           AD_VP          90   101               21-SEP-89 27-OCT-93 AC_ACCOUNT    110   101               28-OCT-93 15-MAR-97 AC_MGR        110   102 De Haan       13-JAN-93           AD_VP          90   102               13-JAN-93 24-JUL-98 IT_PROG        60   103 Hunold        03-JAN-90           IT_PROG        60 ...   201 Hartstein     17-FEB-96           MK_MAN         20   201               17-FEB-96 19-DEC-99 MK_REP         20   202 Fay           17-AUG-97           MK_REP         20   203 Mavris        07-JUN-94           HR_REP         40   204 Baer          07-JUN-94           PR_REP         70   205 Higgins       07-JUN-94           AC_MGR        110   206 Gietz         07-JUN-94           AC_ACCOUNT    110 117 rows selected.

Since the UNION of the two queries will result in adjacent employee IDs due to the default sort behavior of the UNION operator, the report makes sense to King. From this report, he can see that Kochhar was employed as both an account representative and account manager, before becoming a vice president in her current position.

Also worth noting in this report is that the columns EMPLOYEE_ID and DEPARTMENT_ID were assigned column aliases in the first query, and so those aliases applied to the entire result.

But, of course, Janice is not satisfied with the results of the report. The HIRE_DATE column should really be a starting date for the employee in that department, but for the rows in the EMPLOYEE table, it is the employee’s starting date at the company. To make the column more accurate, Janice changes the column alias for the first query to STRT_DATE and makes it a correlated subquery, so that the date is actually the date the employees started in their current department:

select employee_id emp#, last_name,      coalesce(      (select max(end_date)+1       from job_history       where employee_id = emp.employee_id),       hire_date) strt_date,    NULL end_date, job_id, department_id dept# from employees emp union select employee_id, ‘’, start_date,    end_date, job_id, department_id from job_history order by emp# asc, strt_date desc;  EMP# LAST_NAME     STRT_DATE END_DATE  JOB_ID      DEPT# ----- ------------- --------- --------- ---------- ------   100 King          17-JUN-87           AD_PRES        90   101 Kochhar       16-MAR-97           AD_VP          90   101               28-OCT-93 15-MAR-97 AC_MGR        110   101               21-SEP-89 27-OCT-93 AC_ACCOUNT    110   102 De Haan       25-JUL-98           AD_VP          90   102               13-JAN-93 24-JUL-98 IT_PROG        60   103 Hunold        03-JAN-90           IT_PROG        60 ...   201 Hartstein     20-DEC-99           MK_MAN         20   201               17-FEB-96 19-DEC-99 MK_REP         20   202 Fay           17-AUG-97           MK_REP         20   203 Mavris        07-JUN-94           HR_REP         40   204 Baer          07-JUN-94           PR_REP         70   205 Higgins       07-JUN-94           AC_MGR        110   206 Gietz         07-JUN-94           AC_ACCOUNT    110 117 rows selected.

There are two differences between this query and the previous one. A minor difference is that the query result is sorted by employee number in ascending order and by the starting date in descending order. King wants to see the employee’s most recent job first.

The second difference is a bit more complex. Janice’s goal was to find out if the employee had any previous jobs, and if so, return the ending date for the last job that employee had. Remember that you can have the SQL text (in parentheses) of a correlated subquery in the SELECT, FROM, or WHERE clause of the parent query. In this case, the correlated subquery is as follows:

(select max(end_date)+1       from job_history       where employee_id = emp.employee_id)

For each row in the EMPLOYEE table, this subquery will find the last date that the employee worked in any department and adds one day, resulting in the first date that the employee started in their current position. But if the employee has never switched departments, there will be no rows in the JOB_HISTORY table, and therefore the subquery will return a NULL result. The solution is to wrap the COALESCE function around the query.

The COALESCE function will return the first non-NULL argument in the argument list. The HIRE_DATE column is specified as the second argument to COALESCE, so if the employee has never switched departments, the original hire date will be returned from this function:

coalesce(      (select max(end_date)+1       from job_history       where employee_id = emp.employee_id),       hire_date) strt_date,

To reiterate, the above section of SQL evaluates to either the first day employees started in their current department or their hiring date, if they have never switched departments. The column alias STRT_DATE is assigned to this derived column.

The next morning, Janice realizes that she could have used UNION ALL instead of UNION in this query. There will never be any duplicate records between the two queries in this compound query, mainly because the database does not store the employee’s current job position and starting date in the JOB_HISTORY table.

Tip

DBAs should be on the lookout for queries that use UNION when UNION ALL would produce the same desired results. Because UNION does a sort while removing duplicates, many UNION queries will have a much more noticeable performance impact on the system than the same queries that use UNION ALL.

INTERSECT

There are times when you need to know which rows two tables or queries have in common. The INTERSECT operator provides this functionality. As with the UNION operator, the number and types of the columns in the two queries to be compared must be the same, but the column names can be different. Rows are returned from an INTERSECT operation only if all columns in the two queries match.

In Scott’s widget database, the current employment information is kept in the EMPLOYEES table, and the previous employment information (when employees have changed jobs) is kept in the JOB_HISTORY table. The boss wants to find out which employees have changed departments multiple times and have come back to work in the department they worked in previously, with the same job title. Janice knows that she needs to use the EMPLOYEES and JOB_HISTORY tables, and decides to use the INTERSECT operator to see if there are current employees in a particular department and job title that are also in the JOB_HISTORY table. Janice realizes that a multicolumn join in a WHERE clause may produce similar results, but she thinks that the INTERSECT method is more straightforward and easier to use and maintain. Her first query looks like this:

select employee_id, job_id, department_id from employees intersect select employee_id, job_id, department_id from job_history; EMPLOYEE_ID JOB_ID     DEPARTMENT_ID ----------- ---------- -------------         176 SA_REP                80 1 row selected.

King looks at this report and thinks that something is amiss. He is sure that there was another employee besides employee number 176 who has changed job titles and came back to work with her original job title. Janice realizes that she is comparing too many columns, and she rewrites her query as follows:

select employee_id, job_id from employees intersect select employee_id, job_id from job_history; EMPLOYEE_ID JOB_ID ----------- ----------         176 SA_REP         200 AD_ASST 2 rows selected.

As King suspected, employee number 200 is back working with her old job title, after previously switching departments. Because one of the three columns was different in the previous query, employee number 200 did not show up in the results.

Now that Janice has the result set that King was looking for, she decides that it would be more readable if the employee’s last name and first name were in the report also. The problem is, she can’t add it to the EMPLOYEES query with the INTERSECT operator, since the JOB_HISTORY table does not have the employee last name, and as a result the compound INTERSECT query would not return any rows. Instead, she treats the last query as a subquery and joins it back to the EMPLOYEES table:

select e.employee_id, e.last_name, e.first_name,    e.job_id from employees e inner join     (select employee_id, job_id from employees      intersect      select employee_id, job_id from job_history) i on e.employee_id = i.employee_id; EMPLOYEE_ID LAST_NAME         FIRST_NAME     JOB_ID ----------- ----------------- -------------- ----------         176 Taylor            Jonathon       SA_REP         200 Whalen            Jennifer       AD_ASST 2 rows selected.

Notice that Janice is using Oracle9i’s new INNER JOIN syntax. The query in parentheses is treated just as if it were another table being joined in the new query.

MINUS

The MINUS compound-query operator returns rows from the first query only if they are not in a second query. In other words, the second query is subtracted from the first query. Any rows in the second query that are not in the first query are ignored and do not affect the results of the entire compound query. As with the UNION operator, the number and types of the columns in the two queries to be compared must be the same, but the column names can be different.

The boss wants to make sure that the company’s expansion plans are going well, and he wants to know which countries don’t yet have a department located in that country. Janice realizes that a MINUS operator might do the trick here. She can subtract the countries with departments from a query with the COUNTRIES table. The first part of her query is straightforward. It is a SELECT from the COUNTRIES table:

select country_id, country_name from countries; CO COUNTRY_NAME -- ---------------------------------------- AR Argentina AU Australia BE Belgium BR Brazil CA Canada CH Switzerland CN China DE Germany DK Denmark EG Egypt FR France HK HongKong IL Israel IN India IT Italy JP Japan KW Kuwait MX Mexico NG Nigeria NL Netherlands SG Singapore UK United Kingdom US United States of America ZM Zambia ZW Zimbabwe 25 rows selected.

The second part is a bit trickier. She needs to subtract the countries in which the departments reside. The DEPARTMENTS table does not have a COUNTRY_ID column, but it does have a LOCATION_ID column. The LOCATIONS table has a COUNTRY_ID column, so Janice will need to join the DEPARTMENTS and LOCATIONS table to get the list of countries with departments:

select distinct country_id from departments d, locations l where d.location_id = l.location_id; CO -- CA DE UK US 4 rows selected.

Janice realizes that she will also need the country name in the query for the INTERSECT operation to work, so this query needs to have the COUNTRIES table as part of the join:

select distinct c.country_id, country_name from departments d, locations l, countries c where d.location_id = l.location_id   and c.country_id = l.country_id; CO COUNTRY_NAME -- ---------------------------------------- CA Canada DE Germany UK United Kingdom US United States of America 4 rows selected.

Janice can now bring it all together by using the MINUS operator to subtract this query from the first query against the COUNTRIES table:

select country_id, country_name from countries minus select distinct c.country_id, country_name from departments d, locations l, countries c where d.location_id = l.location_id   and c.country_id = l.country_id; CO COUNTRY_NAME -- ---------------------------------------- AR Argentina AU Australia BE Belgium BR Brazil CH Switzerland CN China DK Denmark EG Egypt FR France HK HongKong IL Israel IN India IT Italy JP Japan KW Kuwait MX Mexico NG Nigeria NL Netherlands SG Singapore ZM Zambia ZW Zimbabwe 21 rows selected.

King now realizes that the company is a long way from having a significant presence in all of the countries where there are company employees.




Oracle9i DBA JumpStart
Oracle9i DBA JumpStart
ISBN: 0782141897
EAN: 2147483647
Year: 2006
Pages: 138
Authors: Bob Bryla

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net