As I described in Chapter 4, a union is an SQL construct that allows you to knit together the results of several SQL queries and treat those results as if they had been returned by just one query. I find them invaluable when writing queries. One of the more creative uses I've discovered involves using unions to produce reports that need to show data grouped by categories and that may need to show the same records in more than one of those categories.
7.6.1 A Typical Union Example
A good example of this type of report would be one that fulfills the following request:
Produce an employee turnover report that lists everyone employed at the beginning of the year, everyone hired during the year, everyone terminated during the year, and everyone employed at the end of the year. The report should be divided into four sections, one for each of those categories.
This is a common kind of request, for me at least. The interesting thing about this request, though, is that every employee will need to be listed in two categories. That means you would need to write a query that returned each employee record twice in the correct categories.
When you are faced with this type of query, it can be helpful to simplify the problem by thinking in terms of separate queries, one for each category. It's fairly easy to conceive of a query to bring back a list of employees that were on board at the beginning of the year. You need to make sure the first of the year is between the hire date and the termination date, and account for the termination date possibly being null. Here's the query to return a list of people employed as of January 1, 2004:
SELECT employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_hire_date < TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy'))
This gives you the first section of the report, which is those employed at the beginning of the year. Retrieving the data for the remaining sections is a matter of using a different WHERE clause for each section. Table 7-2 shows the selection criteria for each section of the report.
Table 7-2. Union query selection criteria
Report section |
WHERE clause |
---|---|
Employed at beginning of year |
WHERE employee_hire_date < TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2004', 'dd-mon-yyyy')) |
Hired during year |
WHERE employee_hire_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_hire_date < TO_DATE('1-Jan-2005','dd-mon-yyyy')) |
Terminated during year |
WHERE employee_termination_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date < TO_DATE('1-Jan-2005', 'dd-mon-yyyy')) |
Employed at end of year |
WHERE employee_hire_date < TO_DATE('1-Jan-2005','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2005', 'dd-mon-yyyy')) |
7.6.2 The UNION Query
After separately developing the four queries, one for each section of the report, you can use SQL's UNION operator to link those four queries together into one large query. There are four things to consider when doing this:
To be certain of getting all the records back from the query, use the UNION ALL operator to tie the queries together. Using UNION by itself causes SQL to filter out any duplicate rows in the result set. That unwanted filtering is done (usually) by a sorting operation, which consumes disk and CPU resources. Use UNION ALL to ensure that you get back all rows in each result set and without any unwanted sorts.
In order to properly group the records, you can add a numeric constant to each of the four queries. For example, the query to return the list of those employed at the beginning of the year could return an arbitrary value of 1:
SELECT 1 sort_column, employee_id, employee_name, . . .
The other queries would return values of 2, 3, and 4 in the sort column. Sorting the query results on these arbitrary numeric values serves two purposes. First, the records for each section of the report will be grouped together because they will all have the same constant. Second, the value of the sort column controls the order in which the sections print. Use a value of 1 for the section to be printed first, a value of 2 for the second section, and so on.
The final thing to worry about is identifying the results to the reader of the report. The values used in the sort column won't mean anything to the reader, so you need to add a column with some descriptive text. Here's how the final query for people employed at the beginning of the year looks with that text added:
SELECT 1 sort_column, 'Employed at Beginning of Year' employee_status_text, employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_hire_date < TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy'))
The first column returned by this query is used to sort these records to the top of the report, and the second column serves to identify those records for the reader. Example 7-7 shows the full-blown UNION query to produce all four sections of the report.
Example 7-7. A UNION query to generate a four-section report
SELECT 1 sort_column, 'Employed at Beginning of Year' employee_status_text, employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_hire_date < TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy')) UNION ALL SELECT 2 as sort_column, 'Hired During Year' as employee_status_text, employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_hire_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_hire_date < TO_DATE('1-Jan-2005','dd-mon-yyyy')) UNION ALL SELECT 3 as sort_column, 'Terminated During Year' as employee_status_text, employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_termination_date >= TO_DATE('1-Jan-2004','dd-mon-yyyy') AND (employee_termination_date < TO_DATE('1-Jan-2005','dd-mon-yyyy')) UNION ALL SELECT 4 as sort_column, 'Employed at End of Year' as employee_status_text, employee_id, employee_name, employee_hire_date, employee_termination_date FROM employee WHERE employee_hire_date < TO_DATE('1-Jan-2005','dd-mon-yyyy') AND (employee_termination_date IS NULL OR employee_termination_date >= TO_DATE('1-Jan-2005','dd-mon-yyyy')) ORDER BY sort_column, employee_id, employee_hire_date;
The four queries have been unioned together in the same order in which the report is to be printed. That's done for readability though. It's the ORDER BY clause at the bottom that ensures that the records are returned in the proper order.
7.6.3 The Final Report
Now that you've worked out the query, you only have to follow the remaining steps in the report development methodology to format and print the report. Example 7-8 shows a script to produce a reasonably formatted report. I have elided the SELECT statement to save space, but it's the same as in Example 7-7.
Example 7-8. A UNION report with four sections
--Set up pagesize parameters SET NEWPAGE 1 SET PAGESIZE 55 --Set the linesize, which must match the number of equal signs used --for the ruling lines in the headers and footers. SET LINESIZE 75 TTITLE CENTER 'The Fictional Company' SKIP 2 - CENTER 'Employee Turnover Report' SKIP 1 - LEFT '===================================' - '=======================================' - SKIP 3 --Format the columns CLEAR COLUMNS COLUMN sort_column NOPRINT COLUMN employee_id NOPRINT COLUMN employee_status_text HEADING 'Status' FORMAT A29 COLUMN employee_name HEADING 'Employee Name' FORMAT A20 COLUMN employee_hire_date HEADING 'Hire Date' FORMAT A11 COLUMN employee_termination_date HEADING 'Term Date' FORMAT A11 --Breaks and computations BREAK ON employee_status_text SKIP 2 NODUPLICATES CLEAR COMPUTES COMPUTE NUMBER LABEL 'Total Count' OF employee_name ON employee_status_text --Set the date format to use ALTER SESSION SET NLS_DATE_FORMAT = 'dd-Mon-yyyy'; SELECT 1 sort_column, 'Employed at Beginning of Year' employee_status_text, employee_id, ... EXIT
The output from Example 7-8 will look like this:
The Fictional Company Employee Turnover Report ========================================================================== Status Employee Name Hire Date Term Date ----------------------------- -------------------- ----------- ----------- Employed at Beginning of Year Marusia Churai 15-Nov-1961 Mykhailo Hrushevsky 16-Sep-1964 05-May-2004 Pavlo Virsky 29-Dec-1987 01-Apr-2004 Pavlo Chubynsky 01-Mar-1994 15-Nov-2004 Taras Shevchenko 23-Aug-1976 Igor Sikorsky 15-Nov-1961 04-Apr-2004 ***************************** -------------------- Total Count 6 Hired During Year Mykola Leontovych 15-Jun-2004 Lesia Ukrainka 02-Jan-2004 Ivan Mazepa 04-Apr-2004 30-Sep-2004 Mykhailo Verbytsky 03-Mar-2004 31-Oct-2004 Roxolana Lisovsky 03-Jun-2004 ***************************** -------------------- Total Count 5 Terminated During Year Mykhailo Hrushevsky 16-Sep-1964 05-May-2004 Pavlo Virsky 29-Dec-1987 01-Apr-2004 Pavlo Chubynsky 01-Mar-1994 15-Nov-2004 Ivan Mazepa 04-Apr-2004 30-Sep-2004 Igor Sikorsky 15-Nov-1961 04-Apr-2004 Mykhailo Verbytsky 03-Mar-2004 31-Oct-2004 ***************************** -------------------- Total Count 6 Employed at End of Year Marusia Churai 15-Nov-1961 Mykola Leontovych 15-Jun-2004 Lesia Ukrainka 02-Jan-2004 Taras Shevchenko 23-Aug-1976 Roxolana Lisovsky 03-Jun-2004 ***************************** -------------------- Total Count 5
That's all there is to it. It wouldn't be a big leap to turn this report into a master/detail report with each section starting on a new page. Using this technique, you can develop similar reports with any number of sections you need.