A Closer Look at the SELECT Statement and Its Clauses | DB2 9 Fundamentals: Certification Study Guide

We just saw that you can retrieve every value stored in a base table by executing a simple query that looks something like this:

 SELECT * FROM [TableName]

But what if you wanted to see only the values stored in two columns of a table? Or what if you wanted the data retrieved to be ordered alphabetically in ascending order? (Data is stored in a table in no particular order, and unless otherwise specified, a query only returns data in the order in which it is found.) How do you construct a query that retrieves only certain data values and returns those values in a very specific format? You do so by using a more advanced form of the SELECT SQL statement to construct your query. The syntax for the form of the SELECT statement that is used to construct more advanced queries is:

 SELECT <DISTINCT> [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName]|[ViewName]  <<AS> [CorrelationName]>, ...] <WhereClause> <GroupByClause> <HavingClause> <OrderByClause> <FetchFirstClause>

where:

Expression	Identifies one or more columns values that are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are the most common elements used.
NewColumnName	Specifies a new column name to be used in place of the table or view column name specified in the result data set returned by the query.
TableName	Identifies the name(s) assigned to one or more tables from which data is to be retrieved.
ViewName	Identifies the name(s) assigned to one or more views from which data is to be retrieved.
CorrelationName	Identifies a shorthand name that can be used when referencing the table or view the correlation name is associated with in any of the SELECT statement clauses.
WhereClause	Identifies a WHERE clause that is to be used with the SELECT statement.
GroupByClause	Identifies a GROUP BY clause that is to be used with the SELECT statement.
HavingClause	Identifies a HAVING clause that is to be used with the SELECT statement.
OrderByClause	Identifies an ORDER BY clause that is to be used with the SELECT statement.
FetchFirstClause	Identifies a FETCH FIRST clause that is to be used with the SELECT statement.

Thus, if you wanted to retrieve all values for the columns named WORKDEPT and JOB from a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:

 SELECT workdept, job FROM employee

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 WORKDEPT   JOB --------   ----- A00        PRES B01        MANAGER C01        MANAGER E01        MANAGER D11        MANAGER D21        MANAGER E11        MANAGER E21        MANAGER A00        SALESREP A00        CLERK C01        ANALYST C01        ANALYST D11        DESIGNER D11        DESIGNER D11        DESIGNER D11        DESIGNER D11        DESIGNER D11        DESIGNER D11        DESIGNER D11        DESIGNER D21        CLERK D21        CLERK D21        CLERK D21        CLERK D21        CLERK E11        OPERATOR E11        OPERATOR E11        OPERATOR E11        OPERATOR E21        FIELDREP E21        FIELDREPE 21         FIELDREP A00        SALESREP A00        CLERK C01        ANALYST D11        DESIGNER D11        DESIGNER D21        CLERK E11        OPERATOR E11        OPERATOR E21        FIELDREP E21        FIELDREP 42 record(s) selected.

If the DISTINCT clause is specified with a SELECT statement, duplicate rows are removed from the final result data set returned. Two rows are considered to be duplicates of one another if the value of every column of the first row is identical to the value of the corresponding column of the second row. For the purpose of determining whether or not two rows are identical, null values are considered equal. However, the DISTINCT clause cannot be used if the result data set produced contains LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, DBCLOB, or XML data.

Thus, if you were to execute the same SELECT statement that was executed earlier with the DISTINCT clause specified, the resulting statement would look something like this:

 SELECT DISTINCT workdept, job FROM employee

This time, when the query is executed, you should see a result data set that looks something like this:

 WORKDEPT   JOB ---        ------- C01        ANALYST A00        CLERK D21        CLERK D11        DESIGNER E21        FIELDREP B01        MANAGER C01        MANAGER D11        MANAGER D21        MANAGER E01        MANAGER E11        MANAGER E21        MANAGER E11        OPERATOR A00        PRES A00        SALESREP 15 record(s) selected.

Now suppose you wanted to retrieve every unique value (no duplicates) for a column named JOB in a table named EMPLOYEE, and you wanted to change the name of the JOB column in the result data set produced to TITLE. You could do so by executing a SELECT statement that looks something like this:

 SELECT DISTINCT job AS title FROM employee

This time, when the query is executed, you should see a result data set that looks something like this:.

 TITLE ------ ANALYST CLERK DESIGNER FIELDREP MANAGER OPERATOR PRES SALESREP  8 record(s) selected.

You could also produce this result data set by executing the same query and using the correlation name E for the table named EMPLOYEE. The only difference is that, this time, the SELECT statement would look like this:

 SELECT DISTINCT e.job AS title FROM employee AS e

Notice that the column named JOB is qualified with the correlation name E that was assigned to the table named EMPLOYEE. In this example, the use of a correlation name is not really necessary, because data is being retrieved from only one table and no two columns in a table can have the same name. However, if data was being retrieved from two or more tables and if some of the columns in those tables had the same name, a qualifier would be needed to tell the DB2 Database Manager which table to retrieve data from for a particular column.

If you were counting when we examined the syntax for the SELECT statement earlier, you may have noticed that a single SELECT statement can contain up to seven different clauses. These clauses are:

The DISTINCT clause
The FROM clause
The WHERE clause
The GROUP BY clause
The HAVING clause
The ORDER BY clause
The FETCH FIRST clause

(Incidentally, these clauses are processed in the order shown.) We just saw how the DISTINCT clause and the FROM clause are used. Now let's turn our attention to the other clauses the SELECT statement recognizes.

The WHERE Clause

The WHERE clause is used to tell the DB2 Database Manager how to select the rows that are to be returned in the result data set produced in response to a query. When specified, the WHERE clause is followed by a search condition, which is nothing more than a simple test that, when applied to a row of data, will evaluate to TRUE, FALSE, or Unknown. If this test evaluates to TRUE, the row is returned in the result data set produced; if the test evaluates to FALSE or Unknown, the row is ignored.

The search condition of a WHERE clause is made up of one or more predicates that are used to make comparisons. Six common types of WHERE clause predicates are recognized by DB2. They are:

Relational predicates
BETWEEN
LIKE
IN
EXISTS
IS NULL

Each of these predicates can be used alone, or they can be combined using parentheses or Boolean operators such as AND and OR.

Relational Predicates

The relational predicates (or comparison operators) consist of a set of special operators that are used to define a comparison relationship between the contents of a column and a constant value, the contents of two columns from the same table, or the contents of a column in one table with the contents of a column in another table. The following comparison operators are available:

< (Less than)
> (Greater than)
<= (Less than or equal to)
>= (Greater than or equal to)
= (Equal to)
<> (Not equal to)

Typically, relational predicates are used to include or exclude specific rows from the final result data set produced in response to a query. Thus, if you wanted to retrieve values from columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is greater than or equal to $70,000.00, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, salary FROM employee WHERE salary >= 70000.00

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO   SALARY ----    ------- 000010  152750.00 000020  94250.00 000030  98250.00 000050  80175.00 000060  72250.00 000070  96170.00 000090  89750.00 000100  86150.00 000130  73800.00  9 record(s) selected.

It is important to note that the data types of all items involved in a relational predicate comparison must be compatible, or the comparison will fail. If necessary, the built-in functions provided with DB2 can be used to make any conversions required. For example, if you wanted to retrieve values from columns named LASTNAME and HIREDATE in a table named EMPLOYEE for every employee who was hired in December, you could do so by executing a SELECT statement that looks something like this:

 SELECT lastname, hiredate FROM employee WHERE MONTHNAME(hiredate) = 'December'

In this example, the built-in function MONTHNAME() is used to produce a mixed-case character string containing the name of the month for the month portion of the date value stored in the HIREDATE column. This string value is then compared with the string 'December' to determine which rows are to be returned in a result data set. If you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:

 LASTNAME         HIREDATE ---------        --------- O'CONNELL        12/05/1993 NICHOLLS         12/15/2006 MARINO           12/05/2004 NATZ             12/15/2006 MONTEVERDE       12/05/2004  5 record(s) selected.

The BETWEEN Predicate

The BETWEEN predicate is used to define a comparison relationship in which the contents of a column are checked to see whether they fall within a specified range of values. As with relational predicates, the BETWEEN predicate is used to include or exclude specific rows from the result data set produced in response to a query.

Therefore, if you wanted to retrieve values for columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is greater than or equal to $60,000.00 and less than or equal to $70,000.00, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, salary FROM employee WHERE salary BETWEEN 60000.00 AND 70000.00

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO      SALARY ------     -------- 000110     66500.00 000140     68420.00 000160     62250.00 000210     68270.00 200140     68420.00 200170     64680.00 200220     69840.00  7 record(s) selected.

If the NOT (negation) operator is used in conjunction with the BETWEEN predicate (or with any other predicate, for that matter), the meaning of the predicate is reversed. In this case, contents of a column are checked, and only values that fall outside the range of values specified are returned to the final result data set produced. Thus, if you wanted to retrieve values for columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is less than $20,000.00 or more than $90,000.00, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, salary FROM employee WHERE salary NOT BETWEEN 20000.00 AND 90000.00

When this query is executed, you should see a result data set that looks something like this:

 EMPNO    SALARY -------  --------- 000010  152750.00 000020   94250.00 000030   98250.00 000070   96170.00  4 record(s) selected.

The LIKE Predicate

The LIKE predicate is used to define a comparison relationship in which a character string value is checked to see whether it contains a specific pattern of characters. The pattern of characters specified can consist of regular alphanumeric characters and/or special metacharacters that are interpreted as follows:

The underscore character (_) is treated as a wildcard character that stands for any single alphanumeric character.
The percent character (%) is treated as a wildcard character that stands for any sequence of alphanumeric characters.

Thus, if you wanted to retrieve values for columns named EMPNO and LASTNAME in a table named EMPLOYEE where the value for the LASTNAME column begins with the letter S, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, lastname FROM employee WHERE lastname LIKE 'S%'

If you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:

 EMPNO    LASTNAME ------   -------- 000060   STERN 000100   SPENSER 000180   SCOUTTEN 000250   SMITH 000280   SCHNEIDER 000300   SMITH 000310   SETRIGHT 200280   SCHWARTZ 200310   SPRINGER  9 record(s) selected.

When using wildcard characters, you must take care to ensure that they are placed in the appropriate location in the pattern string specified. Note that in the previous example, when the character string pattern 'S%' was specified, only records for employees whose last name begins with the letter S were returned. If the character string pattern used had been '%S%', records for employees whose last name contains the character S (anywhere in the name) would have been returned, and the result data set produced would have looked like this instead:

 EMPNO    LASTNAME ------   ------- 000010   HAAS 000020   THOMPSON 000060   STERN 000070   PULASKI 000090   HENDERSON 000100   SPENSER 000110   LUCCHESSI 000140   NICHOLLS 000150   ADAMSON 000170   YOSHIMURA 000180   SCOUTTEN 000210   JONES 000230   JEFFERSON 000250   SMITH 000260   JOHNSON 000280   SCHNEIDER 000300   SMITH 000310   SETRIGHT 200280   SCHWARTZ 200310   SPRINGER   20 record(s) selected.

Likewise, you must also be careful about using uppercase and lowercase characters in pattern strings; if the data being examined is stored in a case-sensitive manner, the characters used in a pattern string must match the case that was used to store the value in the column being searched, or no matching records will be found.

Tip

Although the LIKE predicate provides a relatively easy way to search for data values, it should be used with caution; the overhead involved in processing a LIKE predicate is very high and can be extremely resource-intensive.

The IN Predicate

The IN predicate is used to define a comparison relationship in which a value is checked to see whether it matches a value in a finite set of values. This finite set of values can consist of one or more literal values that are coded directly in the IN predicate, or it can be composed of the non-null values found in a result data set produced by a subquery.

Tip

A subquery may include search conditions of its own, and these search conditions may in turn include their own subqueries. When such "nested" subqueries are processed, the DB2 Database Manager executes the innermost query first and uses the results to execute the next outer query, and so on until all nested queries have been processed.

Thus, if you wanted to retrieve values for columns named EMPNO and WORKDEPT in a table named EMPLOYEE where the value for the WORKDEPT column matches a value in a list of predefined department code values, you could do so by executing a SELECT statement that looks something like this:

 SELECT lastname, workdept FROM employee WHERE workdept IN ('E11', 'E21')

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 LASTNAME      WORKDEPT ---------     -------- HENDERSON     E11 SCHNEIDER     E11 PARKER        E11 SMITH         E11 SETRIGHT      E11 SCHWARTZ      E11 SPRINGER      E11 SPENSER       E21 MEHTA         E21 LEE           E21 GOUNOT        E21 WONG          E21 ALONZO        E21  13 record(s) selected.

On the other hand, if you wanted to retrieve values for columns named LASTNAME and WORKDEPT in a table named EMPLOYEE where the value for the WORKDEPT column matches a value in a list of department codes for departments that are managed by department A00, you could do so by executing a SELECT statement that looks something like this:

 SELECT lastname, workdept FROM employee WHERE workdept IN  (SELECT deptno FROM department WHERE admrdept = 'A00')

When this query is executed, you should see a result data set that looks something like this:

 LASTNAME        WORKDEPT --------        -------- HAAS            A00 THOMPSON        B01 KWAN            C01 GEYER           E01 LUCCHESSI       A00 O'CONNELL       A00 QUINTANA        C01 NICHOLLS        C01 HEMMINGER       A00 ORLANDO         A00 NATZ            C01  11 record(s) selected.

In this case, the subquery SELECT deptno FROM department WHERE admrdept = 'A00' produces a result data set containing the values A00, B01, C01, D01, and E01, and the main query evaluates each value found in the WORKDEPT column of the EMPLOYEE table to determine whether it matches one of the values in the result data set produced by the subquery. If there is a match, the record is returned.

The EXISTS Predicate

The EXISTS predicate is used to determine whether a particular value exists in a given result data set. The EXISTS predicate is always followed by a subquery, and it returns either TRUE or FALSE to indicate whether a specific value is found in the result data set produced by the subquery. Thus, if you wanted to find out which values found in the column named DEPTNO in a table named DEPARTMENT are used in the column named WORKDEPT in a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:

 SELECT deptno, deptname FROM department WHERE EXISTS  (SELECT workdept FROM employee WHERE workdept = deptno)

And if you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:

 DEPTNO   DEPTNAME ----     --------------------------- A00      SPIFFY COMPUTER SERVICE DIV. B01      PLANNING C01      INFORMATION CENTER D11      MANUFACTURING SYSTEMS D21      ADMINISTRATION SYSTEMS E01      SUPPORT SERVICES E11      OPERATIONS E21      SOFTWARE SUPPORT  8 record(s) selected.

In most situations, the EXISTS predicate is ANDed with other predicates to determine final row selection.

The NULL Predicate

The NULL predicate is used to determine whether or not a particular value is a NULL value. So, if you wanted to retrieve values for columns named DEPTNO and DEPTNAME in a table named DEPARTMENT where the value for the MGRNO column is a NULL value, you could do so by executing a SELECT statement that looks something like this:

 SELECT deptno, deptname FROM department WHERE mgrno IS NULL

If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 DEPTNO   DEPTNAME -----    ------------ D01      DEVELOPMENT CENTER F22      BRANCH OFFICE F2 G22      BRANCH OFFICE G2 H22      BRANCH OFFICE H2 I22      BRANCH OFFICE I2 J22      BRANCH OFFICE J2  6 record(s) selected.

When using the NULL predicate, it is important to keep in mind that NULL, zero (0), and blank ("") are three different values. NULL is a special marker that is used to represent missing information, while zero and blank (empty string) are actual values that may be stored in a column to indicate a specific value or lack thereof. Moreover, some columns accept NULL values, while other columns do not, depending upon their definition. So, before writing WHERE clauses that check for NULL values, make sure that the NULL value is supported by the column(s) being queried.

The GROUP BY Clause

The GROUP BY clause is used to tell the DB2 Database Manager how to organize rows of data returned in the result data set produced in response to a query. In its simplest form, the GROUP BY clause is followed by a grouping expression that is usually one or more column names that identify columns found in the result data set to be organized. The GROUP BY clause can also be used to specify what columns are to be grouped together to provide input to aggregate functions such as SUM() and AVG().

Thus, if you wanted to obtain the average salary for all departments found in a column named WORKDEPT, using salary information stored in a column named SALARY in a table named EMPLOYEE, and you wanted to round the salary information retrieved to two decimal points and organize it by department, you could do so by executing a SELECT statement that looks something like this:

 SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY workdept

If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 WORKDEPT    AVG_SALARY ----        ---------- A00         70850.00 B01         94250.00 C01         77222.50 D11         58783.63 D21         51240.00 E01         80175.00 E11         45305.71 E21         47086.66  8 record(s) selected.

In this example, each row returned in the result data set produced contains the department code, along with the average salary for individuals who work in that department.

Tip

A common mistake that is often made when using the GROUP BY clause is the addition of nonaggregate columns to the list of columns that follow the GROUP BY clause. Since grouping is performed by combining all of the nonaggregate columns together into a single concatenated key and breaking whenever that key value changes, extraneous columns can cause unexpected breaks to occur.

The GROUP BY ROLLUP Clause

The GROUP BY ROLLUP clause is used to analyze a collection of data in a single dimension but at more than one level of detail. For example, you could group data by successively larger organizational units, such as team, department, and division, or by successively larger geographical units, such as city, county, state or province, country, and continent. Thus, if you were to change the GROUP BY clause in the previous SELECT statement to a GROUP BY ROLLUP clause, you would end up with a SELECT statement that looks something like this:

 SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY ROLLUP (workdept)

And if you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:

 WORKDEPT    AVG_SALARY --------    --------- -           58155.35 A00         70850.00 B01         94250.00 C01         77222.50 D11         58783.63 D21         51240.00 E01         80175.00 E11         45305.71 E21         47086.66  9 record(s) selected.

This result data set contains average salary information for all employees found in the table named EMPLOYEES regardless of which department they work in (the line in the result data set returned that has a null value assigned to the WORKDEPT column), as well as average salary information for each department available (the remaining lines in the result data set returned).

In this example, only one expression (known as the grouping expression) is specified in the GROUP BY ROLLUP clause (in this case, the grouping expression is WORKDEPT). However, one or more grouping expressions can be specified in a single GROUP BY ROLLUP clause (for example, GROUP BY ROLLUP (workdept, division)). When multiple grouping expressions are specified, the DB2 Database Manager groups the data by all grouping expressions used, then by all but the last grouping expression used, and so on. Then, it makes one final grouping that consists of the entire contents of the specified table. Therefore, when specifying multiple grouping expressions, it is important to ensure that they are listed in the appropriate order-if one kind of group is logically contained inside another (for example departments within a division), then that group should be listed after the group it is contained in (i.e., GROUP BY ROLLUP (department, division)), never before.

The GROUP BY CUBE Clause

The GROUP BY CUBE clause is used to analyze a collection of data by organizing it into groups in multiple dimensions. Thus, if you were to execute a SELECT statement that looks something like this:

 SELECT workdept, sex, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY CUBE (workdept, sex)

You might see a result data set that looks something like this:

 WORKDEPT         SEX           AVG_SALARY -----            ----          ---------- -                F             63243.68 -                M             53951.95 -                -             58155.35 A00              -             70850.00 B01              -             94250.00 C01              -             77222.50 D11              -             58783.63 D21              -             51240.00 E01              -             80175.00 E11              -             45305.71 E21              -             47086.66 A00              F             99625.00 A00              M             51666.66 B01              M             94250.00 C01              F             77222.50 D11              F             58317.50 D11              M             59050.00 D21              F             60266.66 D21              M             44470.00 E01              M             80175.00 E11              F             48810.00 E11              M             36545.00 E21              F             35370.00 E21              M             49430.00  24 record(s) selected.

This result set contains average salary information for each department found in the table named EMPLOYEE (the lines that contain a null value in the SEX column and a value in the WORKDEPT column of the result data set returned), average salary information for all employees found in the table named EMPLOYEE regardless of which department they work in (the line that contains a null value for both the SEX and the WORKDEPT column of the result data set returned), average salary information for each genter (the lines that contain values in the SEX column and a null value in the WORKDEPT column of the result data set returned), and average salary information for each genter in each department available (the remaining lines in the result data set returned).

In other words, the data in the result data set produced is grouped:

By department only
By gender only
By gender and department
As a single group that contains all genders and all departments.

The term CUBE is intended to suggest that data is being analyzed in more than one dimension, and as you can see in the previous example, data analysis was actually performed in two dimensions, which resulted in four types of groupings. Suppose the following SELECT statement had been used instead:

 SELECT workdept, sex, job,  DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY CUBE (workdept, sex, job)

Data analysis would have been performed in three dimensions, and the data would have been broken into eight types of groupings. Thus, the number of types of groups produced by a GROUP BY CUBE operation can be determined by the formula: 2ⁿ where n is the number of expressions used in the GROUP BY CUBE clause.

The HAVING Clause

The HAVING clause is used to apply further selection criteria to columns referenced in a GROUP BY clause. This clause behaves like the WHERE clause, except that it refers to data that has already been grouped (the HAVING clause is used to tell the DB2 Database Manager how to select the rows to be returned in a result data set from rows that have already been grouped). And, like the WHERE clause, the HAVING clause is followed by a search condition that acts as a simple test that, when applied to a row of data, will evaluate to TRUE, FALSE, or Unknown . If this test evaluates to TRUE, the row is returned in the result data set produced; if the test evaluates to FALSE or Unknown, the row is skipped. Because it behaves like a WHERE clause, the search condition of a HAVING clause can consist of the same predicates that are recognized by the WHERE clause.

Thus, if you wanted to obtain the average salary for all departments found in a column named WORKDEPT, using salary information stored in a column named SALARY in a table named EMPLOYEE, and you wanted to organize the data retrieved by department, but you are interested only in departments whose average salary is greater than $60,000.00, you could obtain this information by executing a SELECT statement that looks something like this:

 SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY work dept HAVING AVG(salary) > 60000

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 WORKDEPT      AVG_SALARY --------      ---------- A00           70850.00 B01           94250.00 C01           77222.50 E01           80175.00  4 record(s) selected.

As you can see from the output provided, each row in the result data set produced contains the department ID for every department whose average salary for individuals working in that department is greater than $60,000.00, along with the average salary found for each department.

The ORDER BY Clause

The ORDER BY clause is used to tell the DB2 Database Manager how to sort and order the rows that are to be returned in a result data set produced in response to a query. When specified, the ORDER BY clause is followed by the name of one or more column(s) whose data values are to be sorted and a keyword that indicates the desired sort order. If the keyword ASC follows the column's name, ascending order is used, and if the keyword DESC follows the column name, descending order is used. Multiple columns can be used for sorting, and each column used can be ordered in either ascending or descending order. Furthermore, when more than one column is identified in an ORDER BY clause, the corresponding result data set is sorted by the first column specified (the primary sort), then the sorted data is sorted again by the next column specified, and so on until the data has been sorted by each column specified.

Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME, and WORKDEPT in a table named EMPLOYEE and you wanted the information sorted by WORKDEPT in ascending order, followed by LASTNAME in descending order, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, lastname, workdept FROM employee ORDER BY workdept ASC, lastname DESC

And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO     LASTNAME     WORKDEPT ------    --------     -------- 200120    ORLANDO      A00 000120    O'CONNELL    A00 000110    LUCCHESSI    A00 200010    HEMMINGER    A00 000010    HAAS         A00 000020    THOMPSON     B01 000130    QUINTANA     C01 000140    NICHOLLS     C01 200140    NATZ         C01 000030    KWAN         C01 000170    YOSHIMURA    D11 200170    YAMAMOTO     D11 000190    WALKER       D11 000060    STERN        D11 000180    SCOUTTEN     D11 000160    PIANKA       D11 000220    LUTZ         D11 000210    JONES        D11 200220    JOHN         D11 000200    BROWN        D11 000150    ADAMSON      D11 000250    SMITH        D21 000070    PULASKI      D21 000270    PEREZ        D21 200240    MONTEVERDE   D21 000240    MARINO       D21 000260    JOHNSON      D21 000230    JEFFERSON    D21 000050    GEYER        E01 200310    SPRINGER     E11 000300    SMITH        E11 000310    SETRIGHT     E11 200280    SCHWARTZ     E11 000280    SCHNEIDER    E11 000290    PARKER       E11 000090    HENDERSON    E11 EMPNO     LASTNAME      WORKDEPT ------    --------      -------- 200330    WONG          E21 000100    SPENSER       E21 000320    MEHTA         E21 000330    LEE           E21 000340    GOUNOT        E21 200340    ALONZO        E21   42 record(s) selected.

Using the ORDER BY clause is easy if the result data set consists only of named columns. But what happens if the result data set produced needs to be ordered by a summary column or a result column that cannot be specified by name? Because these situations can exist, the ORDER BY clause allows an integer value that corresponds to a particular column's number to be used in place of the column name. When integer values are used, the first or leftmost column in the result data set produced is treated as column 1, the next is column 2, and so on. Therefore, the result data set produced by the previous query could have also been produced by executing a SELECT statement that looks like this:

 SELECT empno, lastname, workdept FROM employee ORDER BY 1 ASC, 2 DESC

It is important to note that even though integer values are primarily used in the ORDER BY clause to specify columns that cannot be specified by name, they can be used in place of any column name.

The FETCH FIRST Clause

The FETCH FIRST clause is used to limit the number of rows returned to the result data set produced in response to a query. When used, the FETCH FIRST clause is followed by a positive integer value and the words ROWS ONLY. This tells the DB2 Database Manager that the user/application executing the query does not want to see more than n number of rows, regardless of how many rows might exist in the result data set produced were the FETCH FIRST clause not specified.

Thus, if you wanted to retrieve just the first 10 values for columns named EMPNO, LASTNAME, and WORKDEPT from a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:

 SELECT empno, lastname, workdept FROM employee FETCH FIRST 10 ROWS ONLY

If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO      LASTNAME      WORKDEPT ------     --------      -------- 000010     HAAS          A00 000020     THOMPSON      B01 000030     KWAN          C01 000050     GEYER         E01 000060     STERN         D11 000070     PULASKI       D21 000090     HENDERSON     E11 000100     SPENSER       E21 000110     LUCCHESSI     A00 000120     O'CONNELL     A00  10 record(s) selected.

It is important to note that if you use other clauses to format the data before returning a specified number of rows, the actual values returned can be different. For example, suppose the previous SELECT statement was modified to include an ORDER BY clause like this:

 SELECT empno, lastname, workdept FROM employee ORDER BY lastname FETCH FIRST 10 ROWS ONLY

The result data set produced when the query is executed might look something like this:

 EMPNO       LASTNAME       WORKDEPT --------    --------       -------- 000150      ADAMSON        D11 200340      ALONZO         E21 000200      BROWN          D11 000050      GEYER          E01 000340      GOUNOT         E21 000010      HAAS           A00 200010      HEMMINGER      A00 000090      HENDERSON      E11 000230      JEFFERSON      D21 200220      JOHN           D11  10 record(s) selected.

A Word About Common Table Expressions

Common table expressions are mechanisms that are used to construct local temporary tables that reside in memory and exist only for the life of the SQL statement that defines them. (In fact, the table that is created in response to a common table expression can only be referenced by the SQL statement that created it.) Common table expressions are typically used:

In place of a view (when the creation of a view is undesirable, when general use of a view is not required, and when positioned update or delete operations are not used)
To enable grouping by a column that is derived from a subquery or a scalar function that performs some external action
When the desired result table is based on host variables
When the same result table needs to be used by several different queries
When the results of a query need to be derived using recursion

The syntax used to construct a common table expression is:

 WITH [TableName]<([ColumnName], ...])> AS([SELECTStatement])

where:

TableName	Specifies the name that is to be assigned to the temporary table to be created.
ColumnName	Specifies the name(s) to be assigned to one or more columns that are to be included in the temporary table to be created. Each column name specified must be unique and unqualified; if no column names are specified, the names derived from the result data set produced by the SELECTStatement specified will be used. If a list of column names is specified, the number of column names provided must match the number of columns that will be returned by the SELECT statement used to create the temporary table. If a common table expression is recursive, or if the result data set produced by the SELECT statement specified contains duplicate column names, column names must be specified.
SELECTStatement	Identifies a SELECT SQL statement that, when executed, will produce the data values to be added to the column(s) in the temporary table to be created.

Thus, if you wanted to retrieve all values for the columns named EMPNO,LASTNAME, HIREDATE, and SEX from a table named EMPLOYEE and store them in a common table that is then referenced in a query that is designed to obtain employee number and hire date information for all female employees working for the company, you could do so by executing an SQL statement that looks something like this:

 WITH  emp_info (empno, lastname, hiredate, sex) AS   (SELECT empno, lastname, hiredate, sex FROM employee) SELECT empno, hiredate FROM emp_info WHERE sex = 'F'

And if you executed this statement against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO     HIREDATE ------    ---------- 000010    01/01/1995 000030    04/05/2005 000070    09/30/2005 000090    08/15/2000 000130    07/28/2001 000140    12/15/2006 000160    10/11/2006 000180    07/07/2003 000220    08/29/1998 000260    09/11/2005 000270    09/30/2006 000280    03/24/1997 000310    09/12/1994 200010    01/01/1995 200140    12/15/2006 200220    08/29/2005 200280    03/24/1997 200310    09/12/1994 200330    02/23/2006  19 record(s) selected.

Multiple common table expressions can be specified following a single WITH keyword, and each common table expression specified can be referenced, by name, in the FROM clause of subsequent common table expressions. However, if multiple common table expressions are defined within the same WITH keyword, the table name assigned to each temporary table created must be unique from all other table names used in the SELECT statement that creates them. It is also important to note that the table name assigned to the temporary table created by a common table expression will take precedence over any existing table, view, or alias (in the system catalog) that has the same qualified name; if the SELECT SQL statement -references the original table, view, or alias, it will actually be working with the temporary table created. (Existing tables, views, and aliases whose names match that of the temporary table are not altered but are simply no longer accessible.)

A Word About CASE Expressions

One efficient and concise way to display compared values in a readable format is to use one or more CASE expressions in the selection list of a query. Each CASE operation evaluates a specified expression and supplies a different value, depending on whether a certain condition is met. A CASE expression can take one of two forms: simple or searched. The basic syntax used to create a simple CASE expression is:

 CASE [Expression1]   [WHEN [Expression2] THEN [Result1], ...] ELSE [Result2] <END>

where:

Expression1	Identifies an expression or value that is to be compared to one or more Expression2 expressions or values.
Expression2	Identifies one or more expressions or values that, when compared to Expression1, evaluate to TRUE or FALSE.
Result1	Identifies a value to be used when a search condition evaluates to TRUE.
Result2	Identifies a value to be used when a search condition evaluates to FALSE.

Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME, and DIVISION from a table named EMPLOYEE, and you knew that the first character of the work department code associated with an employee represents a division within the company, you could use a simple CASE expression to translate the codes and provide a complete division name to which each employee belongs by executing an SQL statement that looks something like this:

 SELECT empno, lastname,   CASE SUBSTR(workdept, 1, 1)    WHEN 'A' THEN 'ADMINISTRATION'    WHEN 'B' THEN 'HUMAN RESOURCES'    WHEN 'C' THEN 'DESIGN'    WHEN 'D' THEN 'OPERATIONS'    ELSE 'UNKNOWN DEPARTMENT'   END AS division FROM employee

If you executed this statement against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 EMPNO     LASTNAME    DIVISION ------    ----------  ------------------ 000010    HAAS        DMINISTRATION 000020    THOMPSON    HUMAN RESOURCES 000030    KWAN        DESIGN 000050    GEYER       UNKNOWN DEPARTMENT 000060    STERN       OPERATIONS 000070    PULASKI     OPERATIONS 000090    HENDERSON   UNKNOWN DEPARTMENT 000100    SPENSER     UNKNOWN DEPARTMENT 000110    LUCCHESSI   ADMINISTRATION 000120    O'CONNELL   ADMINISTRATION 000130    QUINTANA    DESIGN 000140    NICHOLLS    DESIGN 000150    ADAMSON     OPERATIONS 000160    PIANKA      OPERATIONS 000170    YOSHIMURA   OPERATIONS 000180    SCOUTTEN    OPERATIONS 000190    WALKER      OPERATIONS 000200    BROWN       OPERATIONS 000210    JONES       OPERATIONS 000220    LUTZ        OPERATIONS 000230    JEFFERSON   OPERATIONS 000240    MARINO      OPERATIONS 000250    SMITH       OPERATIONS 000260    JOHNSON     OPERATIONS 000270    PEREZ       OPERATIONS 000280    SCHNEIDER   UNKNOWN DEPARTMENT EMPNO     LASTNAME     DIVISION ------    ---------    ------------------- 000290    PARKER       UNKNOWN DEPARTMENT 000300    SMITH        UNKNOWN DEPARTMENT 000310    SETRIGHT     UNKNOWN DEPARTMENT 000320    MEHTA        UNKNOWN DEPARTMENT 000330    LEE          UNKNOWN DEPARTMENT 000340    GOUNOT       UNKNOWN DEPARTMENT 200010    HEMMINGER    ADMINISTRATION 200120    ORLANDO      ADMINISTRATION 200140    NATZ         DESIGN 200170    YAMAMOTO     OPERATIONS 200220    JOHN         OPERATIONS 200240    MONTEVERDE   OPERATIONS 200280    SCHWARTZ     UNKNOWN DEPARTMENT 200310    SPRINGER     UNKNOWN DEPARTMENT 200330    WONG         UNKNOWN DEPARTMENT 200340    ALONZO       UNKNOWN DEPARTMENT   42 record(s) selected.

The syntax used to create a searched CASE expression is:

 CASE   [WHEN [SearchCondition] THEN [Result1], ...]    ELSE [Result2] <END>

where:

SearchCondition	Identifies one or more logical conditions that evaluate to TRUE or FALSE.
Result1	Identifies a value to be used when a search condition evaluates to TRUE.
Result2	Identifies a value to be used when a search condition evaluates to FALSE.

Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME,JOB, and SALARY from a table named EMPLOYEE, and you wanted to calculate salary increases based on the job each employee performs, you could use a searched CASE expression to provide new salary values based on the job held by executing an SQL statement that looks something like this:

 SELECT empno, lastname, job, salary,   CASE    WHEN job IN ('MANAGER', 'SUPRVSR') THEN salary * 1.10    WHEN job IN ('DBA', 'SYS PROG') THEN salary * 1.08    WHEN job = 'PRGRMR' THEN salary * 1.05    ELSE salary * 1.035   END as new_salary FROM employee

And if you executed this statement, you should get a result data set that looks something like this:

 EMPNO    LASTNAME    JOB         SALARY       NEW_SALARY ------   --------    --------    ------       ---------- 000010   HAAS        PRES        152750.00    158096.25000 000020   THOMPSON    MANAGER     94250.00     103675.00000 000030   KWAN        MANAGER     98250.00     108075.00000 000050   GEYER       MANAGER     80175.00     88192.50000 000060   STERN       MANAGER     72250.00     79475.00000 000070   PULASKI     MANAGER     96170.00     105787.00000 000090   HENDERSON   MANAGER     89750.00     98725.00000 000100   SPENSER     MANAGER     86150.00     94765.00000 000110   LUCCHESSI   SALESREP    66500.00     68827.50000 000120   O'CONNELL   CLERK       49250.00     50973.75000 000130   QUINTANA    ANALYST     73800.00     76383.00000 000140   NICHOLLS    ANALYST     68420.00     70814.70000 000150   ADAMSON     DESIGNER    55280.00     57214.80000 000160   PIANKA      DESIGNER    62250.00     64428.75000 000170   YOSHIMURA   DESIGNER    44680.00     46243.80000 000180   SCOUTTEN    DESIGNER    51340.00     53136.90000 000190   WALKER      DESIGNER    50450.00     52215.75000 000200   BROWN       DESIGNER    57740.00     59760.90000 000210   JONES       DESIGNER    68270.00     70659.45000 000220   LUTZ        DESIGNER    49840.00     51584.40000 000230   JEFFERSON   CLERK       42180.00     43656.30000 000240   MARINO      CLERK       48760.00     50466.60000 000250   SMITH       CLERK       49180.00     50901.30000 000260   JOHNSON     CLERK       47250.00     48903.75000 000270   PEREZ       CLERK       37380.00     38688.30000 000280   SCHNEIDER   OPERATOR    36250.00     37518.75000 000290   PARKER      OPERATOR    35340.00     36576.90000 000300   SMITH       OPERATOR    37750.00     39071.25000 000310   SETRIGHT    OPERATOR    35900.00     37156.50000 000320   MEHTA       FIELDREP    39950.00     41348.25000 000330   LEE         FIELDREP    45370.00     46957.95000 000340   GOUNOT      FIELDREP    43840.00     45374.40000 200010   HEMMINGER   SALESREP    46500.00     48127.50000 200120   ORLANDO     CLERK       39250.00     40623.75000 200140   NATZ        ANALYST     68420.00     70814.70000 200170   YAMAMOTO    DESIGNER    64680.00     66943.80000 EMPNO    LASTNAME    JOB          SALARY       NEW_SALARY ------   --------    -------- -   -----        ---------- 200220   JOHN        DESIGNER     69840.00     72284.40000 200240   MONTEVERDE  CLERK        37760.00     39081.60000 200280   SCHWARTZ    OPERATOR     46250.00     47868.75000 200310   SPRINGER    OPERATOR     35900.00     37156.50000 200330   WONG        FIELDREP     35370.00     36607.95000 200340   ALONZO      FIELDREP     31840.00     32954.40000  42 record(s) selected.

As you can see from these examples, the value provided by a CASE expression is the value of the result-expression (Result1) that follows the first (leftmost) case that evaluates to TRUE. If no case evaluates to TRUE and the ELSE keyword is present, then the result is the value of the second result-expression (Result2). If no case evaluates to TRUE and the ELSE keyword is not present, then the result is null. It is important to note that when a case evaluates to Unknown (because of null values), the case is not true and is treated the same as a case that evaluates to FALSE. And it goes without saying that the data type of the search condition specified must be comparable to the data type of each result-expression used.

Joining Tables

So far, all the examples we have looked at have involved only one table. However, one of the more powerful features of the SELECT statement (and the element that makes data normalization possible) is the ability to retrieve data from two or more tables by performing what is known as a join operation. In its simplest form, the syntax for a SELECT statement that performs a join operation is:

 SELECT * FROM [ [TableName]|[ViewName], ...]

where:

TableName	Identifies the name assigned to each of two or more tables from which data is to be retrieved.
ViewName	Identifies the name assigned to each of two or more views from which data is to be retrieved.

Consequently, if you wanted to retrieve all values stored in two base tables named CL_SCHED and ORG, you could do so by executing a SELECT statement that looks something like this:

 SELECT * FROM cl_sched, org

When such a SELECT statement is executed, the result data set produced will contain all possible combinations of the rows found in each table specified (otherwise known as a Cartesian product). Every row in the result data set produced is a row from the first table referenced concatenated with a row from the second table referenced, concatenated in turn with a row from the third table referenced, and so on. The total number of rows found in the result data set produced is the product of the number of rows in all the individual table references. Thus, if the table named CL_SCHED in our previous example contains five rows and the table named ORG contains eight rows, the result data set produced by the statement "SELECT * FROM cl_sched, org" will consist of 40 rows (5 × 8 = 40).

Tip

A Cartesian product join operation should be used with extreme caution when working with large tables; the amount of resources required to perform such a join operation can have a serious negative impact on performance.

A more common join operation involves collecting data from two or more tables that have one specific column in common and combining the results to create a result data set. The syntax for a SELECT statement that performs this type of join operation is:

 SELECT [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName] <<AS> [CorrelationName]>, ...] [JoinCondition]

where:

Expression	Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used.
NewColumnName	Identifies a new column name that is to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement.
TableName	Identifies the name(s) assigned to one or more tables from which data is to be retrieved.
CorrelationName	Identifies a shorthand name that can be used when referencing the table name specified in the TableName parameter.
JoinCondition	Identifies the condition to be used to join the tables specified. Typically, this is a WHERE clause in which the values of a column in one table are compared with the values of a similar column in another table.

Thus, a simple join operation could be conducted by executing a SELECT statement that looks something like this:

 SELECT lastname, deptname FROM employee e, department d WHERE e.workdept = d.deptno

If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks like this:

 LASTNAME       DEPTNAME --------       --------------------------- HAAS           SPIFFY COMPUTER SERVICE DIV. THOMPSON       PLANNING KWAN           INFORMATION CENTER GEYER          SUPPORT SERVICES STERN          MANUFACTURING SYSTEMS PULASKI        ADMINISTRATION SYSTEMS HENDERSON      OPERATIONS SPENSER        SOFTWARE SUPPORT LUCCHESSI      SPIFFY COMPUTER SERVICE DIV. O'CONNELL      SPIFFY COMPUTER SERVICE DIV. LASTNAME       DEPTNAME --------       --------------------------- QUINTANA       INFORMATION CENTER NICHOLLS       INFORMATION CENTER ADAMSON        MANUFACTURING SYSTEMS PIANKA         MANUFACTURING SYSTEMS YOSHIMURA      MANUFACTURING SYSTEMS SCOUTTEN       MANUFACTURING SYSTEMS WALKER         MANUFACTURING SYSTEMS BROWN          MANUFACTURING SYSTEMS JONES          MANUFACTURING SYSTEMS LUTZ           MANUFACTURING SYSTEMS EFFERSON       ADMINISTRATION SYSTEMS MARINO         ADMINISTRATION SYSTEMS SMITH          ADMINISTRATION SYSTEMS JOHNSON        ADMINISTRATION SYSTEMS PEREZ          ADMINISTRATION SYSTEMS SCHNEIDER      OPERATIONS PARKER         OPERATIONS SMITH          OPERATIONS SETRIGHT       OPERATIONS MEHTA          SOFTWARE SUPPORT LEE            SOFTWARE SUPPORT GOUNOT         SOFTWARE SUPPORT HEMMINGER      SPIFFY COMPUTER SERVICE DIV. ORLANDO        SPIFFY COMPUTER SERVICE DIV. NATZ           INFORMATION CENTER YAMAMOTO       MANUFACTURING SYSTEMS JOHN           MANUFACTURING SYSTEMS MONTEVERDE     ADMINISTRATION SYSTEMS SCHWARTZ       OPERATIONS SPRINGER       OPERATIONS WONG           SOFTWARE SUPPORT ALONZO         SOFTWARE SUPPORT  42 record(s) selected.

This type of join is referred to as an inner join. Aside from a Cartesian product, only two types of joins can exist: inner joins and outer joins. And as you might imagine, there is a significant difference between the two.

Inner joins

After a Cartesian product, an inner join is the simplest type of join operation that can be performed. An inner join can be thought of as the cross product of two tables, in which every row in one table that has a corresponding row in another table is combined with that row to produce a new record. This type of join works well as long as every row in the first table has a corresponding row in the second table. However, if this is not the case, the result table produced may be missing rows found in either or both of the tables that were joined. In the last example, we saw the SELECT statement syntax that is commonly used to perform inner join operations. However, the following syntax can also be used to create a SELECT statement that performs an inner join operation:

 SELECT [*|[Expression] <<AS> [NewColumnName]> ,...] FROM [[TableName1] <<AS> [CorrelationName1]>] <INNER> JOIN [[TableName2] <<AS> [CorrelationName2]>] ON [JoinCondition]

where:

Expression	Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used.
NewColumnName	Identifies a new column name to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement.
TableName1	Identifies the name assigned to the first table from which data is to be retrieved.
CorrelationName1	Identifies a shorthand name that can be used when referencing the leftmost table of the join operation.
TableName2	Identifies the name assigned to the second table from which data is to be retrieved.
CorrelationName2	Identifies a shorthand name that can be used when referencing the rightmost table of the join operation.
JoinCondition	Identifies the condition to be used to join the two tables specified.

Consequently, the same inner join operation we looked at earlier could be conducted by executing a SELECT statement that looks something like this:

 SELECT lastname, deptname FROM employee e INNER JOIN department d ON e.workdept = d.deptno

Figure 5-1 illustrates how such an inner join operation would work.

image from book
Figure 5-1: A simple inner join operation.

Outer joins

Outer join operations are used when a join operation is required and any rows that would normally be eliminated by an inner join operation need to be preserved. With DB2, three types of outer joins are available:

Left outer join. When a left outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in the leftmost table of the join operation (i.e., the table listed first in the OUTER JOIN clause) that would have been eliminated by the inner join operation, are returned in the result data set produced.
Right outer join. When a right outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in the rightmost table of the join operation (i.e., the table listed last in the OUTER JOIN clause) that would have been eliminated by the inner join operation, are returned in the result data set produced.
Full outer join. When a full outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in both tables of the join operation that would have been eliminated by the inner join operation, are returned in the result data set produced.

To understand the basic principles behind an outer join operation, it helps to look at an example. Suppose Table A and Table B are joined by an ordinary inner join operation. Any row in either Table A or Table B that does not have a matching row in the other table (according to the rules of the join condition) is eliminated from the final result data set produced. By contrast, if Table A and Table B are joined by an outer join, any row in either Table A or Table B that does not contain a matching row in the other table is included in the result data set (exactly once), and columns in that row that would have contained matching values from the other table are empty. Thus, an outer join operation adds nonmatching rows to the final result data set produced whereas an inner join operation excludes them. A left outer join of Table A with Table B preserves all nonmatching rows found in Table A, a right outer join of Table A with Table B preserves all nonmatching rows found in Table B, and a full outer join preserves nonmatching rows found in both Table A and Table B.

Figure 5-2 illustrates how a left outer join operation works; Figure 5-3 illustrates how a right outer join operation works; and Figure 5-4 illustrates how a full join operation works.

image from book
Figure 5-2: A simple left outer join operation.

image from book
Figure 5-3: A simple right outer join operation.

image from book
Figure 5-4: A simple full outer join operation.

The basic syntax used to create a SELECT statement that performs an outer join operation is:

 SELECT [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName1] <<AS> [CorrelationName1]>] [LEFT | RIGHT | FULL] OUTER JOIN [[TableName2] <<AS> [CorrelationName2]>] ON [JoinCondition]

where:

Expression	Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used.
NewColumnName	Identifies a new column name that is to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement.
TableName1	Identifies the name assigned to the first table from which data is to be retrieved. This table is considered the "left" table in an outer join.
CorrelationName1	Identifies a shorthand name that can be used when referencing the leftmost table of the join operation.
TableName2	Identifies the name assigned to the second table from which data is to be retrieved. This table is considered the "right" table in an outer join.
CorrelationName2	Identifies a shorthand name that can be used when referencing the rightmost table of the join operation.
JoinCondition	Identifies the condition to be used to join the two tables specified.

Thus, a simple left outer join operation could be conducted by executing a SELECT statement that looks something like this:

 SELECT lastname, deptname FROM employee e LEFT OUTER JOIN department d ON e.workdept = d.deptno

A simple right outer join operation could be conducted by executing a SELECT statement that looks something like this:

 SELECT lastname, deptname FROM employee e RIGHT OUTER JOIN department d ON e.workdept = d.deptno

And finally, a simple full outer join operation could be conducted by executing a SELECT statement that looks something like this:

 SELECT lastname, deptname FROM employee e FULL OUTER JOIN department d ON e.workdept = d.deptno

If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:

 LASTNAME         DEPTNAME --------         --------------------------- HAAS             SPIFFY COMPUTER SERVICE DIV. THOMPSON         PLANNING KWAN             INFORMATION CENTER GEYER            SUPPORT SERVICES STERN            MANUFACTURING SYSTEMS PULASKI          ADMINISTRATION SYSTEMS HENDERSON        OPERATIONS SPENSER          SOFTWARE SUPPORT LUCCHESSI        SPIFFY COMPUTER SERVICE DIV. O'CONNELL        SPIFFY COMPUTER SERVICE DIV. QUINTANA         INFORMATION CENTER NICHOLLS         INFORMATION CENTER ADAMSON          MANUFACTURING SYSTEMS PIANKA           MANUFACTURING SYSTEMS YOSHIMURA        MANUFACTURING SYSTEMS SCOUTTEN         MANUFACTURING SYSTEMS WALKER           MANUFACTURING SYSTEMS BROWN            MANUFACTURING SYSTEMS JONES            MANUFACTURING SYSTEMS LUTZ             MANUFACTURING SYSTEMS JEFFERSON        ADMINISTRATION SYSTEMS LASTNAME         DEPTNAME --------         ------------------------- MARINO           ADMINISTRATION SYSTEMS SMITH            ADMINISTRATION SYSTEMS JOHNSON          ADMINISTRATION SYSTEMS PEREZ            ADMINISTRATION SYSTEMS SCHNEIDER        OPERATIONS PARKER           OPERATIONS SMITH            OPERATIONS SETRIGHT         OPERATIONS MEHTA            SOFTWARE SUPPORT LEE              SOFTWARE SUPPORT GOUNOT           SOFTWARE SUPPORT HEMMINGER        SPIFFY COMPUTER SERVICE DIV. ORLANDO          SPIFFY COMPUTER SERVICE DIV. NATZ             INFORMATION CENTER YAMAMOTO         MANUFACTURING SYSTEMS JOHN             MANUFACTURING SYSTEMS MONTEVERDE       ADMINISTRATION SYSTEMS SCHWARTZ         OPERATIONS SPRINGER         OPERATIONS WONG             SOFTWARE SUPPORT ALONZO           SOFTWARE SUPPORT -                BRANCH OFFICE H2 -                RANCH OFFICE I2 -                BRANCH OFFICE G2 -                DEVELOPMENT CENTER -                BRANCH OFFICE F2 -                BRANCH OFFICE J2 48 record(s) selected.

Records that have a null value assigned to one of the columns are records that did not have a matching value in the two tables.

Combining the Results of Two or More Queries Using a Set Operator

With DB2, it is possible to combine two or more queries into a single query by using a special operator known as a set operator. When a set operator is used, the results of each query executed are combined in a specific manner to produce a single result data set. The following set operators are available:

UNION. When the UNION set operator is used, the result data sets produced by each individual query are combined and all duplicate rows are eliminated. Figure 5-5 illustrates how a simple UNION operation works.
UNION ALL. When the UNION ALL set operator is used, the result data sets produced by each individual query are combined; all duplicate rows found are retained. Figure 5-6 illustrates how a simple UNION ALL operation would work.
INTERSECT. When the INTERSECT set operator is used, the result data sets produced by each individual query are combined, all duplicate rows found are eliminated, and all records found in the first result data set that do not have a corresponding record in the second result data set are eliminated, leaving just the records that are found in both result data sets. Figure 5-7 illustrates how a simple INTERSECT operation would work.
INTERSECT ALL: When the INTERSECT ALL set operator is used, the result data sets produced by each individual query are combined and all records found in the first result data set that do not have a corresponding record in the second result data set are eliminated, leaving just the records that are found in both result data sets; all duplicate rows found are retained. Figure 5-8 illustrates how a simple INTERSECT ALL operation would work.
EXCEPT: When the EXCEPT set operator is used, the result data sets produced by each individual query are combined, all duplicate rows found are eliminated, and all records found in the first result data set that have a corresponding record in the second result data set are eliminated, leaving just the records that are not found in both result data sets. Figure 5-9 illustrates how a simple EXCEPT operation would work.

It is important to note that, in contrast to the UNION, UNION ALL, INTERSECT, and INTERSECT ALL set operators, if the queries used with the EXCEPT and EXCEPT ALL set operators are reversed, a different result data set will be produced. Figure 5-10 illustrates how the same EXCEPT operation would work if the order of the queries specified were reversed.
EXCEPT ALL: When the EXCEPT ALL set operator is used, the result data sets produced by each individual query are combined and all records found in the first result data set that have a corresponding record in the second result data set are eliminated, leaving just the records that are not found in both result data sets; all duplicate rows found are retained. Figure 5-11 illustrates how a simple EXCEPT ALL operation would work.

image from book
Figure 5-5: A simple UNION set operation.

image from book
Figure 5-6: A simple UNION ALL set operation.

image from book
Figure 5-7: A simple INTERSECT set operation.

image from book
Figure 5-8: A simple INTERSECT ALL set operation.

image from book
Figure 5-9: A simple EXCEPT set operation.

image from book
Figure 5-10: A simple EXCEPT set operation with the queries reversed.

image from book
Figure 5-11: A simple EXCEPT ALL set operation.

In order for two result data sets to be combined with a set operator, both must have the same number of columns, and each of those columns must have the same data types assigned to it. So when would you want to combine the results of two queries using a set operator? Suppose your company keeps individual employee expense account information in a table whose contents are archived at the end of each fiscal year. When a new fiscal year begins, expenditures for that year are essentially recorded in a new table. Now suppose, for tax purposes, you need a record of all employees' expenses for the last two years. To obtain this information, each archived table must be queried, and the results must then be combined. Rather than do this by running individual queries against the archived tables and storing the results in some kind of temporary table, this operation could be performed simply by using the UNION set operator, along with two SELECT SQL statements. Such a set of SELECT statements might look something like this:

 SELECT * FROM emp_exp_02 UNION SELECT * FROM emp_exp_01