We just saw that you can retrieve every value stored in a base table by executing a simple query that looks something like this:
SELECT * FROM [TableName]
But what if you wanted to see only the values stored in two columns of a table? Or what if you wanted the data retrieved to be ordered alphabetically in ascending order? (Data is stored in a table in no particular order, and unless otherwise specified, a query only returns data in the order in which it is found.) How do you construct a query that retrieves only certain data values and returns those values in a very specific format? You do so by using a more advanced form of the SELECT SQL statement to construct your query. The syntax for the form of the SELECT statement that is used to construct more advanced queries is:
SELECT <DISTINCT> [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName]|[ViewName] <<AS> [CorrelationName]>, ...] <WhereClause> <GroupByClause> <HavingClause> <OrderByClause> <FetchFirstClause>
where:
Expression | Identifies one or more columns values that are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are the most common elements used. |
NewColumnName | Specifies a new column name to be used in place of the table or view column name specified in the result data set returned by the query. |
TableName | Identifies the name(s) assigned to one or more tables from which data is to be retrieved. |
ViewName | Identifies the name(s) assigned to one or more views from which data is to be retrieved. |
CorrelationName | Identifies a shorthand name that can be used when referencing the table or view the correlation name is associated with in any of the SELECT statement clauses. |
WhereClause | Identifies a WHERE clause that is to be used with the SELECT statement. |
GroupByClause | Identifies a GROUP BY clause that is to be used with the SELECT statement. |
HavingClause | Identifies a HAVING clause that is to be used with the SELECT statement. |
OrderByClause | Identifies an ORDER BY clause that is to be used with the SELECT statement. |
FetchFirstClause | Identifies a FETCH FIRST clause that is to be used with the SELECT statement. |
Thus, if you wanted to retrieve all values for the columns named WORKDEPT and JOB from a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:
SELECT workdept, job FROM employee
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
WORKDEPT JOB -------- ----- A00 PRES B01 MANAGER C01 MANAGER E01 MANAGER D11 MANAGER D21 MANAGER E11 MANAGER E21 MANAGER A00 SALESREP A00 CLERK C01 ANALYST C01 ANALYST D11 DESIGNER D11 DESIGNER D11 DESIGNER D11 DESIGNER D11 DESIGNER D11 DESIGNER D11 DESIGNER D11 DESIGNER D21 CLERK D21 CLERK D21 CLERK D21 CLERK D21 CLERK E11 OPERATOR E11 OPERATOR E11 OPERATOR E11 OPERATOR E21 FIELDREP E21 FIELDREPE 21 FIELDREP A00 SALESREP A00 CLERK C01 ANALYST D11 DESIGNER D11 DESIGNER D21 CLERK E11 OPERATOR E11 OPERATOR E21 FIELDREP E21 FIELDREP 42 record(s) selected.
If the DISTINCT clause is specified with a SELECT statement, duplicate rows are removed from the final result data set returned. Two rows are considered to be duplicates of one another if the value of every column of the first row is identical to the value of the corresponding column of the second row. For the purpose of determining whether or not two rows are identical, null values are considered equal. However, the DISTINCT clause cannot be used if the result data set produced contains LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, DBCLOB, or XML data.
Thus, if you were to execute the same SELECT statement that was executed earlier with the DISTINCT clause specified, the resulting statement would look something like this:
SELECT DISTINCT workdept, job FROM employee
This time, when the query is executed, you should see a result data set that looks something like this:
WORKDEPT JOB --- ------- C01 ANALYST A00 CLERK D21 CLERK D11 DESIGNER E21 FIELDREP B01 MANAGER C01 MANAGER D11 MANAGER D21 MANAGER E01 MANAGER E11 MANAGER E21 MANAGER E11 OPERATOR A00 PRES A00 SALESREP 15 record(s) selected.
Now suppose you wanted to retrieve every unique value (no duplicates) for a column named JOB in a table named EMPLOYEE, and you wanted to change the name of the JOB column in the result data set produced to TITLE. You could do so by executing a SELECT statement that looks something like this:
SELECT DISTINCT job AS title FROM employee
This time, when the query is executed, you should see a result data set that looks something like this:.
TITLE ------ ANALYST CLERK DESIGNER FIELDREP MANAGER OPERATOR PRES SALESREP 8 record(s) selected.
You could also produce this result data set by executing the same query and using the correlation name E for the table named EMPLOYEE. The only difference is that, this time, the SELECT statement would look like this:
SELECT DISTINCT e.job AS title FROM employee AS e
Notice that the column named JOB is qualified with the correlation name E that was assigned to the table named EMPLOYEE. In this example, the use of a correlation name is not really necessary, because data is being retrieved from only one table and no two columns in a table can have the same name. However, if data was being retrieved from two or more tables and if some of the columns in those tables had the same name, a qualifier would be needed to tell the DB2 Database Manager which table to retrieve data from for a particular column.
If you were counting when we examined the syntax for the SELECT statement earlier, you may have noticed that a single SELECT statement can contain up to seven different clauses. These clauses are:
The DISTINCT clause
The FROM clause
The WHERE clause
The GROUP BY clause
The HAVING clause
The ORDER BY clause
The FETCH FIRST clause
(Incidentally, these clauses are processed in the order shown.) We just saw how the DISTINCT clause and the FROM clause are used. Now let's turn our attention to the other clauses the SELECT statement recognizes.
The WHERE clause is used to tell the DB2 Database Manager how to select the rows that are to be returned in the result data set produced in response to a query. When specified, the WHERE clause is followed by a search condition, which is nothing more than a simple test that, when applied to a row of data, will evaluate to TRUE, FALSE, or Unknown. If this test evaluates to TRUE, the row is returned in the result data set produced; if the test evaluates to FALSE or Unknown, the row is ignored.
The search condition of a WHERE clause is made up of one or more predicates that are used to make comparisons. Six common types of WHERE clause predicates are recognized by DB2. They are:
Relational predicates
BETWEEN
LIKE
IN
EXISTS
IS NULL
Each of these predicates can be used alone, or they can be combined using parentheses or Boolean operators such as AND and OR.
The relational predicates (or comparison operators) consist of a set of special operators that are used to define a comparison relationship between the contents of a column and a constant value, the contents of two columns from the same table, or the contents of a column in one table with the contents of a column in another table. The following comparison operators are available:
< (Less than)
> (Greater than)
<= (Less than or equal to)
>= (Greater than or equal to)
= (Equal to)
<> (Not equal to)
Typically, relational predicates are used to include or exclude specific rows from the final result data set produced in response to a query. Thus, if you wanted to retrieve values from columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is greater than or equal to $70,000.00, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, salary FROM employee WHERE salary >= 70000.00
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO SALARY ---- ------- 000010 152750.00 000020 94250.00 000030 98250.00 000050 80175.00 000060 72250.00 000070 96170.00 000090 89750.00 000100 86150.00 000130 73800.00 9 record(s) selected.
It is important to note that the data types of all items involved in a relational predicate comparison must be compatible, or the comparison will fail. If necessary, the built-in functions provided with DB2 can be used to make any conversions required. For example, if you wanted to retrieve values from columns named LASTNAME and HIREDATE in a table named EMPLOYEE for every employee who was hired in December, you could do so by executing a SELECT statement that looks something like this:
SELECT lastname, hiredate FROM employee WHERE MONTHNAME(hiredate) = 'December'
In this example, the built-in function MONTHNAME() is used to produce a mixed-case character string containing the name of the month for the month portion of the date value stored in the HIREDATE column. This string value is then compared with the string 'December' to determine which rows are to be returned in a result data set. If you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:
LASTNAME HIREDATE --------- --------- O'CONNELL 12/05/1993 NICHOLLS 12/15/2006 MARINO 12/05/2004 NATZ 12/15/2006 MONTEVERDE 12/05/2004 5 record(s) selected.
The BETWEEN predicate is used to define a comparison relationship in which the contents of a column are checked to see whether they fall within a specified range of values. As with relational predicates, the BETWEEN predicate is used to include or exclude specific rows from the result data set produced in response to a query.
Therefore, if you wanted to retrieve values for columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is greater than or equal to $60,000.00 and less than or equal to $70,000.00, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, salary FROM employee WHERE salary BETWEEN 60000.00 AND 70000.00
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO SALARY ------ -------- 000110 66500.00 000140 68420.00 000160 62250.00 000210 68270.00 200140 68420.00 200170 64680.00 200220 69840.00 7 record(s) selected.
If the NOT (negation) operator is used in conjunction with the BETWEEN predicate (or with any other predicate, for that matter), the meaning of the predicate is reversed. In this case, contents of a column are checked, and only values that fall outside the range of values specified are returned to the final result data set produced. Thus, if you wanted to retrieve values for columns named EMPNO and SALARY in a table named EMPLOYEE where the value for the SALARY column is less than $20,000.00 or more than $90,000.00, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, salary FROM employee WHERE salary NOT BETWEEN 20000.00 AND 90000.00
When this query is executed, you should see a result data set that looks something like this:
EMPNO SALARY ------- --------- 000010 152750.00 000020 94250.00 000030 98250.00 000070 96170.00 4 record(s) selected.
The LIKE predicate is used to define a comparison relationship in which a character string value is checked to see whether it contains a specific pattern of characters. The pattern of characters specified can consist of regular alphanumeric characters and/or special metacharacters that are interpreted as follows:
The underscore character (_) is treated as a wildcard character that stands for any single alphanumeric character.
The percent character (%) is treated as a wildcard character that stands for any sequence of alphanumeric characters.
Thus, if you wanted to retrieve values for columns named EMPNO and LASTNAME in a table named EMPLOYEE where the value for the LASTNAME column begins with the letter S, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, lastname FROM employee WHERE lastname LIKE 'S%'
If you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:
EMPNO LASTNAME ------ -------- 000060 STERN 000100 SPENSER 000180 SCOUTTEN 000250 SMITH 000280 SCHNEIDER 000300 SMITH 000310 SETRIGHT 200280 SCHWARTZ 200310 SPRINGER 9 record(s) selected.
When using wildcard characters, you must take care to ensure that they are placed in the appropriate location in the pattern string specified. Note that in the previous example, when the character string pattern 'S%' was specified, only records for employees whose last name begins with the letter S were returned. If the character string pattern used had been '%S%', records for employees whose last name contains the character S (anywhere in the name) would have been returned, and the result data set produced would have looked like this instead:
EMPNO LASTNAME ------ ------- 000010 HAAS 000020 THOMPSON 000060 STERN 000070 PULASKI 000090 HENDERSON 000100 SPENSER 000110 LUCCHESSI 000140 NICHOLLS 000150 ADAMSON 000170 YOSHIMURA 000180 SCOUTTEN 000210 JONES 000230 JEFFERSON 000250 SMITH 000260 JOHNSON 000280 SCHNEIDER 000300 SMITH 000310 SETRIGHT 200280 SCHWARTZ 200310 SPRINGER 20 record(s) selected.
Likewise, you must also be careful about using uppercase and lowercase characters in pattern strings; if the data being examined is stored in a case-sensitive manner, the characters used in a pattern string must match the case that was used to store the value in the column being searched, or no matching records will be found.
Tip | Although the LIKE predicate provides a relatively easy way to search for data values, it should be used with caution; the overhead involved in processing a LIKE predicate is very high and can be extremely resource-intensive. |
The IN predicate is used to define a comparison relationship in which a value is checked to see whether it matches a value in a finite set of values. This finite set of values can consist of one or more literal values that are coded directly in the IN predicate, or it can be composed of the non-null values found in a result data set produced by a subquery.
Tip | A subquery may include search conditions of its own, and these search conditions may in turn include their own subqueries. When such "nested" subqueries are processed, the DB2 Database Manager executes the innermost query first and uses the results to execute the next outer query, and so on until all nested queries have been processed. |
Thus, if you wanted to retrieve values for columns named EMPNO and WORKDEPT in a table named EMPLOYEE where the value for the WORKDEPT column matches a value in a list of predefined department code values, you could do so by executing a SELECT statement that looks something like this:
SELECT lastname, workdept FROM employee WHERE workdept IN ('E11', 'E21')
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
LASTNAME WORKDEPT --------- -------- HENDERSON E11 SCHNEIDER E11 PARKER E11 SMITH E11 SETRIGHT E11 SCHWARTZ E11 SPRINGER E11 SPENSER E21 MEHTA E21 LEE E21 GOUNOT E21 WONG E21 ALONZO E21 13 record(s) selected.
On the other hand, if you wanted to retrieve values for columns named LASTNAME and WORKDEPT in a table named EMPLOYEE where the value for the WORKDEPT column matches a value in a list of department codes for departments that are managed by department A00, you could do so by executing a SELECT statement that looks something like this:
SELECT lastname, workdept FROM employee WHERE workdept IN (SELECT deptno FROM department WHERE admrdept = 'A00')
When this query is executed, you should see a result data set that looks something like this:
LASTNAME WORKDEPT -------- -------- HAAS A00 THOMPSON B01 KWAN C01 GEYER E01 LUCCHESSI A00 O'CONNELL A00 QUINTANA C01 NICHOLLS C01 HEMMINGER A00 ORLANDO A00 NATZ C01 11 record(s) selected.
In this case, the subquery SELECT deptno FROM department WHERE admrdept = 'A00' produces a result data set containing the values A00, B01, C01, D01, and E01, and the main query evaluates each value found in the WORKDEPT column of the EMPLOYEE table to determine whether it matches one of the values in the result data set produced by the subquery. If there is a match, the record is returned.
The EXISTS predicate is used to determine whether a particular value exists in a given result data set. The EXISTS predicate is always followed by a subquery, and it returns either TRUE or FALSE to indicate whether a specific value is found in the result data set produced by the subquery. Thus, if you wanted to find out which values found in the column named DEPTNO in a table named DEPARTMENT are used in the column named WORKDEPT in a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:
SELECT deptno, deptname FROM department WHERE EXISTS (SELECT workdept FROM employee WHERE workdept = deptno)
And if you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:
DEPTNO DEPTNAME ---- --------------------------- A00 SPIFFY COMPUTER SERVICE DIV. B01 PLANNING C01 INFORMATION CENTER D11 MANUFACTURING SYSTEMS D21 ADMINISTRATION SYSTEMS E01 SUPPORT SERVICES E11 OPERATIONS E21 SOFTWARE SUPPORT 8 record(s) selected.
In most situations, the EXISTS predicate is ANDed with other predicates to determine final row selection.
The NULL predicate is used to determine whether or not a particular value is a NULL value. So, if you wanted to retrieve values for columns named DEPTNO and DEPTNAME in a table named DEPARTMENT where the value for the MGRNO column is a NULL value, you could do so by executing a SELECT statement that looks something like this:
SELECT deptno, deptname FROM department WHERE mgrno IS NULL
If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
DEPTNO DEPTNAME ----- ------------ D01 DEVELOPMENT CENTER F22 BRANCH OFFICE F2 G22 BRANCH OFFICE G2 H22 BRANCH OFFICE H2 I22 BRANCH OFFICE I2 J22 BRANCH OFFICE J2 6 record(s) selected.
When using the NULL predicate, it is important to keep in mind that NULL, zero (0), and blank ("") are three different values. NULL is a special marker that is used to represent missing information, while zero and blank (empty string) are actual values that may be stored in a column to indicate a specific value or lack thereof. Moreover, some columns accept NULL values, while other columns do not, depending upon their definition. So, before writing WHERE clauses that check for NULL values, make sure that the NULL value is supported by the column(s) being queried.
The GROUP BY clause is used to tell the DB2 Database Manager how to organize rows of data returned in the result data set produced in response to a query. In its simplest form, the GROUP BY clause is followed by a grouping expression that is usually one or more column names that identify columns found in the result data set to be organized. The GROUP BY clause can also be used to specify what columns are to be grouped together to provide input to aggregate functions such as SUM() and AVG().
Thus, if you wanted to obtain the average salary for all departments found in a column named WORKDEPT, using salary information stored in a column named SALARY in a table named EMPLOYEE, and you wanted to round the salary information retrieved to two decimal points and organize it by department, you could do so by executing a SELECT statement that looks something like this:
SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY workdept
If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
WORKDEPT AVG_SALARY ---- ---------- A00 70850.00 B01 94250.00 C01 77222.50 D11 58783.63 D21 51240.00 E01 80175.00 E11 45305.71 E21 47086.66 8 record(s) selected.
In this example, each row returned in the result data set produced contains the department code, along with the average salary for individuals who work in that department.
Tip | A common mistake that is often made when using the GROUP BY clause is the addition of nonaggregate columns to the list of columns that follow the GROUP BY clause. Since grouping is performed by combining all of the nonaggregate columns together into a single concatenated key and breaking whenever that key value changes, extraneous columns can cause unexpected breaks to occur. |
The GROUP BY ROLLUP clause is used to analyze a collection of data in a single dimension but at more than one level of detail. For example, you could group data by successively larger organizational units, such as team, department, and division, or by successively larger geographical units, such as city, county, state or province, country, and continent. Thus, if you were to change the GROUP BY clause in the previous SELECT statement to a GROUP BY ROLLUP clause, you would end up with a SELECT statement that looks something like this:
SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY ROLLUP (workdept)
And if you executed this query against the SAMPLE database provided with DB2, you should get a result data set that looks something like this:
WORKDEPT AVG_SALARY -------- --------- - 58155.35 A00 70850.00 B01 94250.00 C01 77222.50 D11 58783.63 D21 51240.00 E01 80175.00 E11 45305.71 E21 47086.66 9 record(s) selected.
This result data set contains average salary information for all employees found in the table named EMPLOYEES regardless of which department they work in (the line in the result data set returned that has a null value assigned to the WORKDEPT column), as well as average salary information for each department available (the remaining lines in the result data set returned).
In this example, only one expression (known as the grouping expression) is specified in the GROUP BY ROLLUP clause (in this case, the grouping expression is WORKDEPT). However, one or more grouping expressions can be specified in a single GROUP BY ROLLUP clause (for example, GROUP BY ROLLUP (workdept, division)). When multiple grouping expressions are specified, the DB2 Database Manager groups the data by all grouping expressions used, then by all but the last grouping expression used, and so on. Then, it makes one final grouping that consists of the entire contents of the specified table. Therefore, when specifying multiple grouping expressions, it is important to ensure that they are listed in the appropriate order-if one kind of group is logically contained inside another (for example departments within a division), then that group should be listed after the group it is contained in (i.e., GROUP BY ROLLUP (department, division)), never before.
The GROUP BY CUBE clause is used to analyze a collection of data by organizing it into groups in multiple dimensions. Thus, if you were to execute a SELECT statement that looks something like this:
SELECT workdept, sex, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY CUBE (workdept, sex)
You might see a result data set that looks something like this:
WORKDEPT SEX AVG_SALARY ----- ---- ---------- - F 63243.68 - M 53951.95 - - 58155.35 A00 - 70850.00 B01 - 94250.00 C01 - 77222.50 D11 - 58783.63 D21 - 51240.00 E01 - 80175.00 E11 - 45305.71 E21 - 47086.66 A00 F 99625.00 A00 M 51666.66 B01 M 94250.00 C01 F 77222.50 D11 F 58317.50 D11 M 59050.00 D21 F 60266.66 D21 M 44470.00 E01 M 80175.00 E11 F 48810.00 E11 M 36545.00 E21 F 35370.00 E21 M 49430.00 24 record(s) selected.
This result set contains average salary information for each department found in the table named EMPLOYEE (the lines that contain a null value in the SEX column and a value in the WORKDEPT column of the result data set returned), average salary information for all employees found in the table named EMPLOYEE regardless of which department they work in (the line that contains a null value for both the SEX and the WORKDEPT column of the result data set returned), average salary information for each genter (the lines that contain values in the SEX column and a null value in the WORKDEPT column of the result data set returned), and average salary information for each genter in each department available (the remaining lines in the result data set returned).
In other words, the data in the result data set produced is grouped:
By department only
By gender only
By gender and department
As a single group that contains all genders and all departments.
The term CUBE is intended to suggest that data is being analyzed in more than one dimension, and as you can see in the previous example, data analysis was actually performed in two dimensions, which resulted in four types of groupings. Suppose the following SELECT statement had been used instead:
SELECT workdept, sex, job, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY CUBE (workdept, sex, job)
Data analysis would have been performed in three dimensions, and the data would have been broken into eight types of groupings. Thus, the number of types of groups produced by a GROUP BY CUBE operation can be determined by the formula: 2n where n is the number of expressions used in the GROUP BY CUBE clause.
The HAVING clause is used to apply further selection criteria to columns referenced in a GROUP BY clause. This clause behaves like the WHERE clause, except that it refers to data that has already been grouped (the HAVING clause is used to tell the DB2 Database Manager how to select the rows to be returned in a result data set from rows that have already been grouped). And, like the WHERE clause, the HAVING clause is followed by a search condition that acts as a simple test that, when applied to a row of data, will evaluate to TRUE, FALSE, or Unknown . If this test evaluates to TRUE, the row is returned in the result data set produced; if the test evaluates to FALSE or Unknown, the row is skipped. Because it behaves like a WHERE clause, the search condition of a HAVING clause can consist of the same predicates that are recognized by the WHERE clause.
Thus, if you wanted to obtain the average salary for all departments found in a column named WORKDEPT, using salary information stored in a column named SALARY in a table named EMPLOYEE, and you wanted to organize the data retrieved by department, but you are interested only in departments whose average salary is greater than $60,000.00, you could obtain this information by executing a SELECT statement that looks something like this:
SELECT workdept, DECIMAL(AVG(salary), 9, 2) AS avg_salary FROM employee GROUP BY work dept HAVING AVG(salary) > 60000
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
WORKDEPT AVG_SALARY -------- ---------- A00 70850.00 B01 94250.00 C01 77222.50 E01 80175.00 4 record(s) selected.
As you can see from the output provided, each row in the result data set produced contains the department ID for every department whose average salary for individuals working in that department is greater than $60,000.00, along with the average salary found for each department.
The ORDER BY clause is used to tell the DB2 Database Manager how to sort and order the rows that are to be returned in a result data set produced in response to a query. When specified, the ORDER BY clause is followed by the name of one or more column(s) whose data values are to be sorted and a keyword that indicates the desired sort order. If the keyword ASC follows the column's name, ascending order is used, and if the keyword DESC follows the column name, descending order is used. Multiple columns can be used for sorting, and each column used can be ordered in either ascending or descending order. Furthermore, when more than one column is identified in an ORDER BY clause, the corresponding result data set is sorted by the first column specified (the primary sort), then the sorted data is sorted again by the next column specified, and so on until the data has been sorted by each column specified.
Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME, and WORKDEPT in a table named EMPLOYEE and you wanted the information sorted by WORKDEPT in ascending order, followed by LASTNAME in descending order, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, lastname, workdept FROM employee ORDER BY workdept ASC, lastname DESC
And if you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO LASTNAME WORKDEPT ------ -------- -------- 200120 ORLANDO A00 000120 O'CONNELL A00 000110 LUCCHESSI A00 200010 HEMMINGER A00 000010 HAAS A00 000020 THOMPSON B01 000130 QUINTANA C01 000140 NICHOLLS C01 200140 NATZ C01 000030 KWAN C01 000170 YOSHIMURA D11 200170 YAMAMOTO D11 000190 WALKER D11 000060 STERN D11 000180 SCOUTTEN D11 000160 PIANKA D11 000220 LUTZ D11 000210 JONES D11 200220 JOHN D11 000200 BROWN D11 000150 ADAMSON D11 000250 SMITH D21 000070 PULASKI D21 000270 PEREZ D21 200240 MONTEVERDE D21 000240 MARINO D21 000260 JOHNSON D21 000230 JEFFERSON D21 000050 GEYER E01 200310 SPRINGER E11 000300 SMITH E11 000310 SETRIGHT E11 200280 SCHWARTZ E11 000280 SCHNEIDER E11 000290 PARKER E11 000090 HENDERSON E11 EMPNO LASTNAME WORKDEPT ------ -------- -------- 200330 WONG E21 000100 SPENSER E21 000320 MEHTA E21 000330 LEE E21 000340 GOUNOT E21 200340 ALONZO E21 42 record(s) selected.
Using the ORDER BY clause is easy if the result data set consists only of named columns. But what happens if the result data set produced needs to be ordered by a summary column or a result column that cannot be specified by name? Because these situations can exist, the ORDER BY clause allows an integer value that corresponds to a particular column's number to be used in place of the column name. When integer values are used, the first or leftmost column in the result data set produced is treated as column 1, the next is column 2, and so on. Therefore, the result data set produced by the previous query could have also been produced by executing a SELECT statement that looks like this:
SELECT empno, lastname, workdept FROM employee ORDER BY 1 ASC, 2 DESC
It is important to note that even though integer values are primarily used in the ORDER BY clause to specify columns that cannot be specified by name, they can be used in place of any column name.
The FETCH FIRST clause is used to limit the number of rows returned to the result data set produced in response to a query. When used, the FETCH FIRST clause is followed by a positive integer value and the words ROWS ONLY. This tells the DB2 Database Manager that the user/application executing the query does not want to see more than n number of rows, regardless of how many rows might exist in the result data set produced were the FETCH FIRST clause not specified.
Thus, if you wanted to retrieve just the first 10 values for columns named EMPNO, LASTNAME, and WORKDEPT from a table named EMPLOYEE, you could do so by executing a SELECT statement that looks something like this:
SELECT empno, lastname, workdept FROM employee FETCH FIRST 10 ROWS ONLY
If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO LASTNAME WORKDEPT ------ -------- -------- 000010 HAAS A00 000020 THOMPSON B01 000030 KWAN C01 000050 GEYER E01 000060 STERN D11 000070 PULASKI D21 000090 HENDERSON E11 000100 SPENSER E21 000110 LUCCHESSI A00 000120 O'CONNELL A00 10 record(s) selected.
It is important to note that if you use other clauses to format the data before returning a specified number of rows, the actual values returned can be different. For example, suppose the previous SELECT statement was modified to include an ORDER BY clause like this:
SELECT empno, lastname, workdept FROM employee ORDER BY lastname FETCH FIRST 10 ROWS ONLY
The result data set produced when the query is executed might look something like this:
EMPNO LASTNAME WORKDEPT -------- -------- -------- 000150 ADAMSON D11 200340 ALONZO E21 000200 BROWN D11 000050 GEYER E01 000340 GOUNOT E21 000010 HAAS A00 200010 HEMMINGER A00 000090 HENDERSON E11 000230 JEFFERSON D21 200220 JOHN D11 10 record(s) selected.
Common table expressions are mechanisms that are used to construct local temporary tables that reside in memory and exist only for the life of the SQL statement that defines them. (In fact, the table that is created in response to a common table expression can only be referenced by the SQL statement that created it.) Common table expressions are typically used:
In place of a view (when the creation of a view is undesirable, when general use of a view is not required, and when positioned update or delete operations are not used)
To enable grouping by a column that is derived from a subquery or a scalar function that performs some external action
When the desired result table is based on host variables
When the same result table needs to be used by several different queries
When the results of a query need to be derived using recursion
The syntax used to construct a common table expression is:
WITH [TableName]<([ColumnName], ...])> AS([SELECTStatement])
where:
TableName | Specifies the name that is to be assigned to the temporary table to be created. |
ColumnName | Specifies the name(s) to be assigned to one or more columns that are to be included in the temporary table to be created. Each column name specified must be unique and unqualified; if no column names are specified, the names derived from the result data set produced by the SELECTStatement specified will be used. If a list of column names is specified, the number of column names provided must match the number of columns that will be returned by the SELECT statement used to create the temporary table. If a common table expression is recursive, or if the result data set produced by the SELECT statement specified contains duplicate column names, column names must be specified. |
SELECTStatement | Identifies a SELECT SQL statement that, when executed, will produce the data values to be added to the column(s) in the temporary table to be created. |
Thus, if you wanted to retrieve all values for the columns named EMPNO,LASTNAME, HIREDATE, and SEX from a table named EMPLOYEE and store them in a common table that is then referenced in a query that is designed to obtain employee number and hire date information for all female employees working for the company, you could do so by executing an SQL statement that looks something like this:
WITH emp_info (empno, lastname, hiredate, sex) AS (SELECT empno, lastname, hiredate, sex FROM employee) SELECT empno, hiredate FROM emp_info WHERE sex = 'F'
And if you executed this statement against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO HIREDATE ------ ---------- 000010 01/01/1995 000030 04/05/2005 000070 09/30/2005 000090 08/15/2000 000130 07/28/2001 000140 12/15/2006 000160 10/11/2006 000180 07/07/2003 000220 08/29/1998 000260 09/11/2005 000270 09/30/2006 000280 03/24/1997 000310 09/12/1994 200010 01/01/1995 200140 12/15/2006 200220 08/29/2005 200280 03/24/1997 200310 09/12/1994 200330 02/23/2006 19 record(s) selected.
Multiple common table expressions can be specified following a single WITH keyword, and each common table expression specified can be referenced, by name, in the FROM clause of subsequent common table expressions. However, if multiple common table expressions are defined within the same WITH keyword, the table name assigned to each temporary table created must be unique from all other table names used in the SELECT statement that creates them. It is also important to note that the table name assigned to the temporary table created by a common table expression will take precedence over any existing table, view, or alias (in the system catalog) that has the same qualified name; if the SELECT SQL statement -references the original table, view, or alias, it will actually be working with the temporary table created. (Existing tables, views, and aliases whose names match that of the temporary table are not altered but are simply no longer accessible.)
One efficient and concise way to display compared values in a readable format is to use one or more CASE expressions in the selection list of a query. Each CASE operation evaluates a specified expression and supplies a different value, depending on whether a certain condition is met. A CASE expression can take one of two forms: simple or searched. The basic syntax used to create a simple CASE expression is:
CASE [Expression1] [WHEN [Expression2] THEN [Result1], ...] ELSE [Result2] <END>
where:
Expression1 | Identifies an expression or value that is to be compared to one or more Expression2 expressions or values. |
Expression2 | Identifies one or more expressions or values that, when compared to Expression1, evaluate to TRUE or FALSE. |
Result1 | Identifies a value to be used when a search condition evaluates to TRUE. |
Result2 | Identifies a value to be used when a search condition evaluates to FALSE. |
Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME, and DIVISION from a table named EMPLOYEE, and you knew that the first character of the work department code associated with an employee represents a division within the company, you could use a simple CASE expression to translate the codes and provide a complete division name to which each employee belongs by executing an SQL statement that looks something like this:
SELECT empno, lastname, CASE SUBSTR(workdept, 1, 1) WHEN 'A' THEN 'ADMINISTRATION' WHEN 'B' THEN 'HUMAN RESOURCES' WHEN 'C' THEN 'DESIGN' WHEN 'D' THEN 'OPERATIONS' ELSE 'UNKNOWN DEPARTMENT' END AS division FROM employee
If you executed this statement against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
EMPNO LASTNAME DIVISION ------ ---------- ------------------ 000010 HAAS DMINISTRATION 000020 THOMPSON HUMAN RESOURCES 000030 KWAN DESIGN 000050 GEYER UNKNOWN DEPARTMENT 000060 STERN OPERATIONS 000070 PULASKI OPERATIONS 000090 HENDERSON UNKNOWN DEPARTMENT 000100 SPENSER UNKNOWN DEPARTMENT 000110 LUCCHESSI ADMINISTRATION 000120 O'CONNELL ADMINISTRATION 000130 QUINTANA DESIGN 000140 NICHOLLS DESIGN 000150 ADAMSON OPERATIONS 000160 PIANKA OPERATIONS 000170 YOSHIMURA OPERATIONS 000180 SCOUTTEN OPERATIONS 000190 WALKER OPERATIONS 000200 BROWN OPERATIONS 000210 JONES OPERATIONS 000220 LUTZ OPERATIONS 000230 JEFFERSON OPERATIONS 000240 MARINO OPERATIONS 000250 SMITH OPERATIONS 000260 JOHNSON OPERATIONS 000270 PEREZ OPERATIONS 000280 SCHNEIDER UNKNOWN DEPARTMENT EMPNO LASTNAME DIVISION ------ --------- ------------------- 000290 PARKER UNKNOWN DEPARTMENT 000300 SMITH UNKNOWN DEPARTMENT 000310 SETRIGHT UNKNOWN DEPARTMENT 000320 MEHTA UNKNOWN DEPARTMENT 000330 LEE UNKNOWN DEPARTMENT 000340 GOUNOT UNKNOWN DEPARTMENT 200010 HEMMINGER ADMINISTRATION 200120 ORLANDO ADMINISTRATION 200140 NATZ DESIGN 200170 YAMAMOTO OPERATIONS 200220 JOHN OPERATIONS 200240 MONTEVERDE OPERATIONS 200280 SCHWARTZ UNKNOWN DEPARTMENT 200310 SPRINGER UNKNOWN DEPARTMENT 200330 WONG UNKNOWN DEPARTMENT 200340 ALONZO UNKNOWN DEPARTMENT 42 record(s) selected.
The syntax used to create a searched CASE expression is:
CASE [WHEN [SearchCondition] THEN [Result1], ...] ELSE [Result2] <END>
where:
SearchCondition | Identifies one or more logical conditions that evaluate to TRUE or FALSE. |
Result1 | Identifies a value to be used when a search condition evaluates to TRUE. |
Result2 | Identifies a value to be used when a search condition evaluates to FALSE. |
Thus, if you wanted to retrieve values for columns named EMPNO, LASTNAME,JOB, and SALARY from a table named EMPLOYEE, and you wanted to calculate salary increases based on the job each employee performs, you could use a searched CASE expression to provide new salary values based on the job held by executing an SQL statement that looks something like this:
SELECT empno, lastname, job, salary, CASE WHEN job IN ('MANAGER', 'SUPRVSR') THEN salary * 1.10 WHEN job IN ('DBA', 'SYS PROG') THEN salary * 1.08 WHEN job = 'PRGRMR' THEN salary * 1.05 ELSE salary * 1.035 END as new_salary FROM employee
And if you executed this statement, you should get a result data set that looks something like this:
EMPNO LASTNAME JOB SALARY NEW_SALARY ------ -------- -------- ------ ---------- 000010 HAAS PRES 152750.00 158096.25000 000020 THOMPSON MANAGER 94250.00 103675.00000 000030 KWAN MANAGER 98250.00 108075.00000 000050 GEYER MANAGER 80175.00 88192.50000 000060 STERN MANAGER 72250.00 79475.00000 000070 PULASKI MANAGER 96170.00 105787.00000 000090 HENDERSON MANAGER 89750.00 98725.00000 000100 SPENSER MANAGER 86150.00 94765.00000 000110 LUCCHESSI SALESREP 66500.00 68827.50000 000120 O'CONNELL CLERK 49250.00 50973.75000 000130 QUINTANA ANALYST 73800.00 76383.00000 000140 NICHOLLS ANALYST 68420.00 70814.70000 000150 ADAMSON DESIGNER 55280.00 57214.80000 000160 PIANKA DESIGNER 62250.00 64428.75000 000170 YOSHIMURA DESIGNER 44680.00 46243.80000 000180 SCOUTTEN DESIGNER 51340.00 53136.90000 000190 WALKER DESIGNER 50450.00 52215.75000 000200 BROWN DESIGNER 57740.00 59760.90000 000210 JONES DESIGNER 68270.00 70659.45000 000220 LUTZ DESIGNER 49840.00 51584.40000 000230 JEFFERSON CLERK 42180.00 43656.30000 000240 MARINO CLERK 48760.00 50466.60000 000250 SMITH CLERK 49180.00 50901.30000 000260 JOHNSON CLERK 47250.00 48903.75000 000270 PEREZ CLERK 37380.00 38688.30000 000280 SCHNEIDER OPERATOR 36250.00 37518.75000 000290 PARKER OPERATOR 35340.00 36576.90000 000300 SMITH OPERATOR 37750.00 39071.25000 000310 SETRIGHT OPERATOR 35900.00 37156.50000 000320 MEHTA FIELDREP 39950.00 41348.25000 000330 LEE FIELDREP 45370.00 46957.95000 000340 GOUNOT FIELDREP 43840.00 45374.40000 200010 HEMMINGER SALESREP 46500.00 48127.50000 200120 ORLANDO CLERK 39250.00 40623.75000 200140 NATZ ANALYST 68420.00 70814.70000 200170 YAMAMOTO DESIGNER 64680.00 66943.80000 EMPNO LASTNAME JOB SALARY NEW_SALARY ------ -------- -------- - ----- ---------- 200220 JOHN DESIGNER 69840.00 72284.40000 200240 MONTEVERDE CLERK 37760.00 39081.60000 200280 SCHWARTZ OPERATOR 46250.00 47868.75000 200310 SPRINGER OPERATOR 35900.00 37156.50000 200330 WONG FIELDREP 35370.00 36607.95000 200340 ALONZO FIELDREP 31840.00 32954.40000 42 record(s) selected.
As you can see from these examples, the value provided by a CASE expression is the value of the result-expression (Result1) that follows the first (leftmost) case that evaluates to TRUE. If no case evaluates to TRUE and the ELSE keyword is present, then the result is the value of the second result-expression (Result2). If no case evaluates to TRUE and the ELSE keyword is not present, then the result is null. It is important to note that when a case evaluates to Unknown (because of null values), the case is not true and is treated the same as a case that evaluates to FALSE. And it goes without saying that the data type of the search condition specified must be comparable to the data type of each result-expression used.
So far, all the examples we have looked at have involved only one table. However, one of the more powerful features of the SELECT statement (and the element that makes data normalization possible) is the ability to retrieve data from two or more tables by performing what is known as a join operation. In its simplest form, the syntax for a SELECT statement that performs a join operation is:
SELECT * FROM [ [TableName]|[ViewName], ...]
where:
TableName | Identifies the name assigned to each of two or more tables from which data is to be retrieved. |
ViewName | Identifies the name assigned to each of two or more views from which data is to be retrieved. |
Consequently, if you wanted to retrieve all values stored in two base tables named CL_SCHED and ORG, you could do so by executing a SELECT statement that looks something like this:
SELECT * FROM cl_sched, org
When such a SELECT statement is executed, the result data set produced will contain all possible combinations of the rows found in each table specified (otherwise known as a Cartesian product). Every row in the result data set produced is a row from the first table referenced concatenated with a row from the second table referenced, concatenated in turn with a row from the third table referenced, and so on. The total number of rows found in the result data set produced is the product of the number of rows in all the individual table references. Thus, if the table named CL_SCHED in our previous example contains five rows and the table named ORG contains eight rows, the result data set produced by the statement "SELECT * FROM cl_sched, org" will consist of 40 rows (5 × 8 = 40).
Tip | A Cartesian product join operation should be used with extreme caution when working with large tables; the amount of resources required to perform such a join operation can have a serious negative impact on performance. |
A more common join operation involves collecting data from two or more tables that have one specific column in common and combining the results to create a result data set. The syntax for a SELECT statement that performs this type of join operation is:
SELECT [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName] <<AS> [CorrelationName]>, ...] [JoinCondition]
where:
Expression | Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used. |
NewColumnName | Identifies a new column name that is to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement. |
TableName | Identifies the name(s) assigned to one or more tables from which data is to be retrieved. |
CorrelationName | Identifies a shorthand name that can be used when referencing the table name specified in the TableName parameter. |
JoinCondition | Identifies the condition to be used to join the tables specified. Typically, this is a WHERE clause in which the values of a column in one table are compared with the values of a similar column in another table. |
Thus, a simple join operation could be conducted by executing a SELECT statement that looks something like this:
SELECT lastname, deptname FROM employee e, department d WHERE e.workdept = d.deptno
If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks like this:
LASTNAME DEPTNAME -------- --------------------------- HAAS SPIFFY COMPUTER SERVICE DIV. THOMPSON PLANNING KWAN INFORMATION CENTER GEYER SUPPORT SERVICES STERN MANUFACTURING SYSTEMS PULASKI ADMINISTRATION SYSTEMS HENDERSON OPERATIONS SPENSER SOFTWARE SUPPORT LUCCHESSI SPIFFY COMPUTER SERVICE DIV. O'CONNELL SPIFFY COMPUTER SERVICE DIV. LASTNAME DEPTNAME -------- --------------------------- QUINTANA INFORMATION CENTER NICHOLLS INFORMATION CENTER ADAMSON MANUFACTURING SYSTEMS PIANKA MANUFACTURING SYSTEMS YOSHIMURA MANUFACTURING SYSTEMS SCOUTTEN MANUFACTURING SYSTEMS WALKER MANUFACTURING SYSTEMS BROWN MANUFACTURING SYSTEMS JONES MANUFACTURING SYSTEMS LUTZ MANUFACTURING SYSTEMS EFFERSON ADMINISTRATION SYSTEMS MARINO ADMINISTRATION SYSTEMS SMITH ADMINISTRATION SYSTEMS JOHNSON ADMINISTRATION SYSTEMS PEREZ ADMINISTRATION SYSTEMS SCHNEIDER OPERATIONS PARKER OPERATIONS SMITH OPERATIONS SETRIGHT OPERATIONS MEHTA SOFTWARE SUPPORT LEE SOFTWARE SUPPORT GOUNOT SOFTWARE SUPPORT HEMMINGER SPIFFY COMPUTER SERVICE DIV. ORLANDO SPIFFY COMPUTER SERVICE DIV. NATZ INFORMATION CENTER YAMAMOTO MANUFACTURING SYSTEMS JOHN MANUFACTURING SYSTEMS MONTEVERDE ADMINISTRATION SYSTEMS SCHWARTZ OPERATIONS SPRINGER OPERATIONS WONG SOFTWARE SUPPORT ALONZO SOFTWARE SUPPORT 42 record(s) selected.
This type of join is referred to as an inner join. Aside from a Cartesian product, only two types of joins can exist: inner joins and outer joins. And as you might imagine, there is a significant difference between the two.
After a Cartesian product, an inner join is the simplest type of join operation that can be performed. An inner join can be thought of as the cross product of two tables, in which every row in one table that has a corresponding row in another table is combined with that row to produce a new record. This type of join works well as long as every row in the first table has a corresponding row in the second table. However, if this is not the case, the result table produced may be missing rows found in either or both of the tables that were joined. In the last example, we saw the SELECT statement syntax that is commonly used to perform inner join operations. However, the following syntax can also be used to create a SELECT statement that performs an inner join operation:
SELECT [*|[Expression] <<AS> [NewColumnName]> ,...] FROM [[TableName1] <<AS> [CorrelationName1]>] <INNER> JOIN [[TableName2] <<AS> [CorrelationName2]>] ON [JoinCondition]
where:
Expression | Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used. |
NewColumnName | Identifies a new column name to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement. |
TableName1 | Identifies the name assigned to the first table from which data is to be retrieved. |
CorrelationName1 | Identifies a shorthand name that can be used when referencing the leftmost table of the join operation. |
TableName2 | Identifies the name assigned to the second table from which data is to be retrieved. |
CorrelationName2 | Identifies a shorthand name that can be used when referencing the rightmost table of the join operation. |
JoinCondition | Identifies the condition to be used to join the two tables specified. |
Consequently, the same inner join operation we looked at earlier could be conducted by executing a SELECT statement that looks something like this:
SELECT lastname, deptname FROM employee e INNER JOIN department d ON e.workdept = d.deptno
Figure 5-1 illustrates how such an inner join operation would work.
Figure 5-1: A simple inner join operation.
Outer join operations are used when a join operation is required and any rows that would normally be eliminated by an inner join operation need to be preserved. With DB2, three types of outer joins are available:
Left outer join. When a left outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in the leftmost table of the join operation (i.e., the table listed first in the OUTER JOIN clause) that would have been eliminated by the inner join operation, are returned in the result data set produced.
Right outer join. When a right outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in the rightmost table of the join operation (i.e., the table listed last in the OUTER JOIN clause) that would have been eliminated by the inner join operation, are returned in the result data set produced.
Full outer join. When a full outer join operation is performed, rows that would have been returned by an inner join operation, together with all rows stored in both tables of the join operation that would have been eliminated by the inner join operation, are returned in the result data set produced.
To understand the basic principles behind an outer join operation, it helps to look at an example. Suppose Table A and Table B are joined by an ordinary inner join operation. Any row in either Table A or Table B that does not have a matching row in the other table (according to the rules of the join condition) is eliminated from the final result data set produced. By contrast, if Table A and Table B are joined by an outer join, any row in either Table A or Table B that does not contain a matching row in the other table is included in the result data set (exactly once), and columns in that row that would have contained matching values from the other table are empty. Thus, an outer join operation adds nonmatching rows to the final result data set produced whereas an inner join operation excludes them. A left outer join of Table A with Table B preserves all nonmatching rows found in Table A, a right outer join of Table A with Table B preserves all nonmatching rows found in Table B, and a full outer join preserves nonmatching rows found in both Table A and Table B.
Figure 5-2 illustrates how a left outer join operation works; Figure 5-3 illustrates how a right outer join operation works; and Figure 5-4 illustrates how a full join operation works.
Figure 5-2: A simple left outer join operation.
Figure 5-3: A simple right outer join operation.
Figure 5-4: A simple full outer join operation.
The basic syntax used to create a SELECT statement that performs an outer join operation is:
SELECT [*|[Expression] <<AS> [NewColumnName]>, ...] FROM [[TableName1] <<AS> [CorrelationName1]>] [LEFT | RIGHT | FULL] OUTER JOIN [[TableName2] <<AS> [CorrelationName2]>] ON [JoinCondition]
where:
Expression | Identifies one or more columns whose values are to be returned when the SELECT statement is executed. The value specified for this option can be any valid SQL language element; however, corresponding table or view column names are commonly used. |
NewColumnName | Identifies a new column name that is to be used in place of the corresponding table or view column name specified in the result data set returned by the SELECT statement. |
TableName1 | Identifies the name assigned to the first table from which data is to be retrieved. This table is considered the "left" table in an outer join. |
CorrelationName1 | Identifies a shorthand name that can be used when referencing the leftmost table of the join operation. |
TableName2 | Identifies the name assigned to the second table from which data is to be retrieved. This table is considered the "right" table in an outer join. |
CorrelationName2 | Identifies a shorthand name that can be used when referencing the rightmost table of the join operation. |
JoinCondition | Identifies the condition to be used to join the two tables specified. |
Thus, a simple left outer join operation could be conducted by executing a SELECT statement that looks something like this:
SELECT lastname, deptname FROM employee e LEFT OUTER JOIN department d ON e.workdept = d.deptno
A simple right outer join operation could be conducted by executing a SELECT statement that looks something like this:
SELECT lastname, deptname FROM employee e RIGHT OUTER JOIN department d ON e.workdept = d.deptno
And finally, a simple full outer join operation could be conducted by executing a SELECT statement that looks something like this:
SELECT lastname, deptname FROM employee e FULL OUTER JOIN department d ON e.workdept = d.deptno
If you executed this query against the SAMPLE database provided with DB2, you would get a result data set that looks something like this:
LASTNAME DEPTNAME -------- --------------------------- HAAS SPIFFY COMPUTER SERVICE DIV. THOMPSON PLANNING KWAN INFORMATION CENTER GEYER SUPPORT SERVICES STERN MANUFACTURING SYSTEMS PULASKI ADMINISTRATION SYSTEMS HENDERSON OPERATIONS SPENSER SOFTWARE SUPPORT LUCCHESSI SPIFFY COMPUTER SERVICE DIV. O'CONNELL SPIFFY COMPUTER SERVICE DIV. QUINTANA INFORMATION CENTER NICHOLLS INFORMATION CENTER ADAMSON MANUFACTURING SYSTEMS PIANKA MANUFACTURING SYSTEMS YOSHIMURA MANUFACTURING SYSTEMS SCOUTTEN MANUFACTURING SYSTEMS WALKER MANUFACTURING SYSTEMS BROWN MANUFACTURING SYSTEMS JONES MANUFACTURING SYSTEMS LUTZ MANUFACTURING SYSTEMS JEFFERSON ADMINISTRATION SYSTEMS LASTNAME DEPTNAME -------- ------------------------- MARINO ADMINISTRATION SYSTEMS SMITH ADMINISTRATION SYSTEMS JOHNSON ADMINISTRATION SYSTEMS PEREZ ADMINISTRATION SYSTEMS SCHNEIDER OPERATIONS PARKER OPERATIONS SMITH OPERATIONS SETRIGHT OPERATIONS MEHTA SOFTWARE SUPPORT LEE SOFTWARE SUPPORT GOUNOT SOFTWARE SUPPORT HEMMINGER SPIFFY COMPUTER SERVICE DIV. ORLANDO SPIFFY COMPUTER SERVICE DIV. NATZ INFORMATION CENTER YAMAMOTO MANUFACTURING SYSTEMS JOHN MANUFACTURING SYSTEMS MONTEVERDE ADMINISTRATION SYSTEMS SCHWARTZ OPERATIONS SPRINGER OPERATIONS WONG SOFTWARE SUPPORT ALONZO SOFTWARE SUPPORT - BRANCH OFFICE H2 - RANCH OFFICE I2 - BRANCH OFFICE G2 - DEVELOPMENT CENTER - BRANCH OFFICE F2 - BRANCH OFFICE J2 48 record(s) selected.
Records that have a null value assigned to one of the columns are records that did not have a matching value in the two tables.
With DB2, it is possible to combine two or more queries into a single query by using a special operator known as a set operator. When a set operator is used, the results of each query executed are combined in a specific manner to produce a single result data set. The following set operators are available:
UNION. When the UNION set operator is used, the result data sets produced by each individual query are combined and all duplicate rows are eliminated. Figure 5-5 illustrates how a simple UNION operation works.
UNION ALL. When the UNION ALL set operator is used, the result data sets produced by each individual query are combined; all duplicate rows found are retained. Figure 5-6 illustrates how a simple UNION ALL operation would work.
INTERSECT. When the INTERSECT set operator is used, the result data sets produced by each individual query are combined, all duplicate rows found are eliminated, and all records found in the first result data set that do not have a corresponding record in the second result data set are eliminated, leaving just the records that are found in both result data sets. Figure 5-7 illustrates how a simple INTERSECT operation would work.
INTERSECT ALL: When the INTERSECT ALL set operator is used, the result data sets produced by each individual query are combined and all records found in the first result data set that do not have a corresponding record in the second result data set are eliminated, leaving just the records that are found in both result data sets; all duplicate rows found are retained. Figure 5-8 illustrates how a simple INTERSECT ALL operation would work.
EXCEPT: When the EXCEPT set operator is used, the result data sets produced by each individual query are combined, all duplicate rows found are eliminated, and all records found in the first result data set that have a corresponding record in the second result data set are eliminated, leaving just the records that are not found in both result data sets. Figure 5-9 illustrates how a simple EXCEPT operation would work.
It is important to note that, in contrast to the UNION, UNION ALL, INTERSECT, and INTERSECT ALL set operators, if the queries used with the EXCEPT and EXCEPT ALL set operators are reversed, a different result data set will be produced. Figure 5-10 illustrates how the same EXCEPT operation would work if the order of the queries specified were reversed.
EXCEPT ALL: When the EXCEPT ALL set operator is used, the result data sets produced by each individual query are combined and all records found in the first result data set that have a corresponding record in the second result data set are eliminated, leaving just the records that are not found in both result data sets; all duplicate rows found are retained. Figure 5-11 illustrates how a simple EXCEPT ALL operation would work.
Figure 5-5: A simple UNION set operation.
Figure 5-6: A simple UNION ALL set operation.
Figure 5-7: A simple INTERSECT set operation.
Figure 5-8: A simple INTERSECT ALL set operation.
Figure 5-9: A simple EXCEPT set operation.
Figure 5-10: A simple EXCEPT set operation with the queries reversed.
Figure 5-11: A simple EXCEPT ALL set operation.
In order for two result data sets to be combined with a set operator, both must have the same number of columns, and each of those columns must have the same data types assigned to it. So when would you want to combine the results of two queries using a set operator? Suppose your company keeps individual employee expense account information in a table whose contents are archived at the end of each fiscal year. When a new fiscal year begins, expenditures for that year are essentially recorded in a new table. Now suppose, for tax purposes, you need a record of all employees' expenses for the last two years. To obtain this information, each archived table must be queried, and the results must then be combined. Rather than do this by running individual queries against the archived tables and storing the results in some kind of temporary table, this operation could be performed simply by using the UNION set operator, along with two SELECT SQL statements. Such a set of SELECT statements might look something like this:
SELECT * FROM emp_exp_02 UNION SELECT * FROM emp_exp_01