Lesson 2:Using Advanced Query Techniques to Access Data | MCSE Training Kit (Exam 70-229): Microsoft SQL Server(TM) 2000 Database Design and Implementation (Pro Certification)

3 4

Once you have grown comfortable with the fundamentals of a SELECT statement and are familiar with the various clauses, you are ready to learn more advanced querying techniques. One of these techniques is to combine the contents of two or more tables to produce a result set that incorporates rows and columns from each table. Another technique is to use subqueries, which are SELECT statements nested inside other SELECT, INSERT, UPDATE, or DELETE statements. Subqueries can also be nested inside other subqueries. You can also use Transact-SQL elements such as CUBE and ROLLUP to summarize data. This lesson reviews each of these advanced techniques and provides examples of how they are used to retrieve specific data from a SQL Server database.

After this lesson, you will be able to:

Define joins that enable you to retrieve data from two or more tables.
Define subqueries inside SELECT statements.
Use the GROUP BY clause in a SELECT statement to summarize data.

Estimated lesson time: 35 minutes

Using Joins to Retrieve Data

By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. Joins indicate how SQL Server should use data from one table to select the rows in another table.

Joins can be specified in either the FROM or WHERE clauses. The join conditions combine with the WHERE and HAVING search conditions to control the rows that are selected from the base tables referenced in the FROM clause. Specifying the join conditions in the FROM clause, however, helps separate them from any other search conditions that might be specified in a WHERE clause and is the recommended method for specifying joins.

When multiple tables are referenced in a single query, all column references must be unambiguous. The table name must be used to qualify any column name that is duplicated in two or more tables referenced in a single query.

The select list for a join can reference all of the columns in the joined tables or any subset of the columns. The select list is not required to contain columns from every table in the join. For example, in a three-table join, only one table can be used as a bridge from one of the other tables to the third table, and none of the columns from the middle table have to be referenced in the select list.

Although join conditions usually use the equals sign (=) comparison operator, other comparison or relational operators can be specified (as can other predicates).

When SQL Server processes joins, the query engine chooses the most efficient method (from several possibilities) of processing the join. Although the physical execution of various joins uses many different optimizations, the logical sequence is as follows:

The join conditions in the FROM clause are applied.
The join conditions and search conditions from the WHERE clause are applied.
The search conditions from the HAVING clause are applied.

This sequence can sometimes influence the result of the query if conditions are moved between the FROM and WHERE clauses.

Columns used in a join condition are not required to have the same name or to be the same data type. If the data types are not identical, however, they must be compatible or be types that SQL Server can implicitly convert. If the data types cannot be implicitly converted, the join condition must explicitly convert the data type by using the CAST function.

NOTE

Most joins can be rewritten as subqueries (a query nested within another query), and most subqueries can be rewritten as joins. Subqueries are covered in detail later in this lesson.

Most joins can be categorized as inner joins or outer joins. Inner joins return rows only when there is at least one row from both tables that matches the join condition, eliminating the rows that do not match with a row from the other table. Outer joins, however, return all rows from at least one of the tables or views mentioned in the FROM clause as long as these rows meet any WHERE or HAVING search conditions. You can also create cross-joins and self-joins. For more infor-mation about cross-joins and self-joins, refer to SQL Server Books Online.

Inner Joins

An inner join is a join in which the values in the columns being joined are compared through the use a comparison operator. In the SQL-92 standard, inner joins can be specified in either the FROM or WHERE clause. Inner joins are the only type of join that SQL-92 supports in the WHERE clause. Inner joins specified in the WHERE clause are known as old-style inner joins.

The following SELECT statement uses an inner join to retrieve data from the Publishers table and the Titles table in the Pubs database:

 SELECT t.Title, p.Pub_name FROM Publishers AS p INNER JOIN Titles AS t ON p.Pub_id = t.Pub_id ORDER BY Title ASC

This SELECT statement retrieves data from the Title column in the Titles (t) table and from the Pub_name column in the Publishers (p) table. Because the statement uses an inner join, it returns only those rows for which there is an equal value in the join columns (the p.Pub_id column and the t.Pub_id column).

Outer Joins

SQL Server supports three types of outer joins: left, right, and full. All rows retrieved from the left table are referenced with a left outer join, and all rows from the right table are referenced in a right outer join. All rows from both tables are returned in a full outer join.

Using Left Outer Joins

A result set generated by a SELECT statement that includes a left outer join includes all rows from the table referenced to the left of LEFT OUTER JOIN. The only rows that are retrieved from the table to the right are those that meet the join condition.

In the following SELECT statement, a left outer join is used to retrieve the authors' first names, last names, and (when applicable) the names of any publishers that are located in the same cities as the authors:

 USE Pubs SELECT a.Au_fname, a.Au_lname, p.Pub_name FROM Authors a LEFT OUTER JOIN Publishers p        ON a.City = p.City ORDER BY p.Pub_name ASC, a.Au_lname ASC, a.Au_fname ASC

The result set from this query will list the name of every author in the Authors table. The result set will include only those publishers that are located in the same cities as the authors, however. If a publisher is not located in the author's city, a null value is returned for the Pub_name column of the result set.

Using Right Outer Joins

A result set generated by a SELECT statement that includes a right outer join includes all rows from the table referenced to the right of RIGHT OUTER JOIN. The only rows that are retrieved from the table to the left are those that meet the join condition.

In the following SELECT statement, a right outer join is used to retrieve the list of publishers and the authors' first names and last names, if those authors are located in the same cities as the publishers:

 USE Pubs SELECT a.Au_fname, a.Au_lname, p.Pub_name FROM Authors a RIGHT OUTER JOIN Publishers p         ON a.City = p.City ORDER BY p.Pub_name ASC, a.Au_lname ASC, a.Au_fname ASC

The result set from this query will list the name of every publisher in the Publishers table. The result set will include only those authors that are located in the same cities as the publishers, however. If an author is not located in the publisher's city, a null value is returned for the Au_fname and Au_lname columns of the result set.

Using Full Outer Joins

A result set generated by a SELECT statement that includes a full outer join includes all rows from both tables, regardless of whether the tables have a matching value (as defined in the join condition).

In the following SELECT statement, a full outer join is used to retrieve the list of publishers and authors' first and last names:

 USE Pubs SELECT a.Au_fname, a.Au_lname, p.Pub_name FROM Authors a FULL OUTER JOIN Publishers p          ON a.City = p.City ORDER BY p.Pub_name ASC, a.Au_lname ASC, a.Au_fname ASC

The result set from this query lists the name of every publisher in the Publishers table and every author in the Authors table. If an author is not located in the publisher's city, a null value is returned for the Au_fname and Au_lname columns of the result set. If a publisher is not located in the author's city, a null value is returned in the Pub_name column of the result set. When the join condition is met, all columns in the result set will contain a value.

Defining Subqueries inside SELECT Statements

A subquery is a SELECT statement that returns a single value and is nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery. A subquery can be used anywhere an expression is allowed. A subquery is also called an inner query or inner select, while the statement containing a subquery is called an outer query or an outer select.

In the following example, a subquery is nested in the WHERE clause of the outer SELECT statement:

 USE Northwind SELECT ProductName FROM Products WHERE UnitPrice =       (      SELECT UnitPrice      FROM Products      WHERE ProductName = 'Sir Rodney''s Scones'      )

The embedded SELECT statement first identifies the UnitPrice value for Sir Rodney's Scones, which is $10. The $10 value is then used in the outer SELECT statement to return the product name of all products whose unit price equals $10.

If a table appears only in a subquery and not in the outer query, then columns from that table cannot be included in the output (the select list of the outer query).

In some Transact-SQL statements, the subquery can be evaluated as if it were an independent query. Conceptually, the subquery result is substituted into the outer query (although this is not necessarily how SQL Server actually processes Transact-SQL statements that have subqueries).

Types of Subqueries

Subqueries can be specified in many places within a SELECT statement. Statements that include a subquery usually take one of the following formats, however:

WHERE <expression> [NOT] IN (<subquery>)
WHERE <expression> <comparison_operator> [ANY | ALL] (<subquery>)
WHERE [NOT] EXISTS (<subquery>)

Subqueries that Are Used with IN or NOT IN

The result of a subquery introduced with IN (or with NOT IN) is a list of zero or more values. After the subquery returns the result, the outer query makes use of it.

In the following example, a subquery is nested inside the WHERE clause, and the IN keyword is used:

 USE Pubs SELECT Pub_name FROM Publishers WHERE Pub_id IN      (      SELECT Pub_id      FROM Titles      WHERE Type = 'business'      )

You can evaluate this statement in two steps. First, the inner query returns the identification numbers of the publishers that have published business books (1389 and 0736). Second, these values are substituted into the outer query, which finds the names that match the identification numbers in the Publishers table.

Subqueries introduced with the NOT IN keywords also return a list of zero or more values. The query is exactly the same as the one in subqueries with IN, except that NOT IN is substituted for IN.

Subqueries that Are Used with Comparison Operators

Comparison operators that introduce a subquery can be modified with the keyword ALL or ANY. The SOME keyword is a SQL-92 standard equivalent for ANY. Subqueries introduced with a modified comparison operator return a list of zero or more values and can include a GROUP BY or HAVING clause. These subqueries can be restated with EXISTS.

The ALL and ANY keywords each compare a scalar value with a single-column set of values. The ALL keyword applies to every value, and the ANY keyword applies to at least one value.

In the following example, the greater than (>) comparison operator is used with the ANY keyword:

 USE Pubs SELECT Title FROM Titles WHERE Advance > ANY       (       SELECT Advance       FROM Publishers INNER JOIN Titles       ON Titles.Pub_id = Publishers.Pub_id            AND Pub_name = 'Algodata Infosystems'       )

This statement finds the titles that received an advance larger than the minimum advance amount paid by Algodata Infosystems (which, in this case, is $5,000). The WHERE clause in the outer SELECT statement contains a subquery that uses a join to retrieve advance amounts for Algodata Infosystems. The minimum advance amount is then used to determine which titles to retrieve from the Titles table.

Subqueries that Are Used with EXISTS and NOT EXISTS

When a subquery is introduced with the keyword EXISTS, it functions as an existence test. The WHERE clause of the outer query tests for the existence of rows returned by the subquery. The subquery does not actually produce any data; instead, it returns a value of TRUE or FALSE.

In the following example, the WHERE clause in the outer SELECT statement contains the subquery and uses the EXISTS keyword:

 USE Pubs SELECT Pub_name FROM Publishers WHERE EXISTS      (      SELECT * FROM Titles      WHERE Titles.Pub_id = Publishers.Pub_id      AND Type = 'business'      )

To determine the result of this query, consider each publisher's name in turn. In this case, the first publisher's name is Algodata Infosystems, which has identification number 1389. Are there any rows in the Titles table in which Pub_id is 1389 and the type is business? If so, Algodata Infosystems should be one of the values selected. The same process is repeated for each of the other publishers' names.

The NOT EXISTS keywords work like EXISTS, except the WHERE clause in which NOT EXISTS is used is satisfied if the subquery returns no rows.

Summarizing Data

Transact-SQL includes several elements that enable you to generate simple summary reports. You can use the CUBE or ROLLUP operators, which are both part of the GROUP BY clause of the SELECT statement. You can also use the COMPUTE or COMPUTE BY operators, which are also associated with the GROUP BY clause.

The COMPUTE and COMPUTE BY operators are supported for backward compatibility. For more details about these operators, refer to SQL Server Books Online.

Using the CUBE Operator to Summarize Data

The CUBE operator generates a result set that is a multi-dimensional cube. A multi-dimensional cube is an expansion of fact data, or data that records individual events. The expansion is based on columns that the user wants to analyze. These columns are called dimensions. The cube is a result set containing a cross-tabulation of all the possible combinations of the dimensions.

The following SELECT statement uses the CUBE operator in the GROUP BY clause:

 USE Pubs SELECT SUBSTRING(Title, 1, 65) AS Title,     SUM(qty) AS 'Quantity' FROM Sales INNER JOIN Titles     ON Sales.Title_id = Titles.Title_id GROUP BY Title WITH CUBE ORDER BY Title

This SELECT statement covers a one-to-many relationship between book titles and the quantity of each book sold. By using the CUBE operator, the statement returns an extra row. The extra row (which contains a null value in the Title column of the result set) represents all values in the Title column of the Titles table. The result set returns values for the quantity sold of each title and the total quantity sold of all titles. In this case, applying the CUBE operator or ROLLUP operator returns the same result.

Using the ROLLUP Operator to Summarize Data

The ROLLUP operator is useful for generating reports that contain subtotals and totals. The ROLLUP operator generates a result set that is similar to the result sets generated by the CUBE operator. The differences between CUBE and ROLLUP are as follows:

CUBE generates a result set showing aggregates for all combinations of values in the selected columns.
ROLLUP generates a result set showing aggregates for a hierarchy of values in the selected columns.

The following SELECT statement contains a ROLLUP operator in the GROUP BY clause:

 USE Pubs SELECT Pub_name, Au_lname, Title, SUM(qty) AS 'Sum' FROM Authors a INNER JOIN TitleAuthor ta    ON a.Au_id = ta.Au_id INNER JOIN Titles t    ON t.Title_id = ta.Title_id INNER JOIN Publishers p    ON p.Pub_id = t.Pub_id INNER JOIN Sales s    ON s.Title_id = t.Title_id GROUP BY Pub_name, Au_lname, Title WITH ROLLUP

By using the ROLLUP operator, you can create groupings in the result set. For the grouped rows, a null value is used to represent all values for a column (except the Sum column). If you use the SELECT statement without the ROLLUP operator, the statement generates only a single grouping. If you use the CUBE operator, many more groupings will be returned. The ROLLUP operator returns the following data when the columns Pub_name, Au_lname, and Title are listed (in that order) in the GROUP BY clause:

Quantity of each title that each publisher has sold for each author
Quantity of all titles each publisher has sold for each author
Quantity of all titles each publisher has sold
Total quantity of all titles sold by all publishers for all authors

Exercise 2: Using Advanced Query Techniques to Retrieve Data

In this exercise, you will use SELECT statements and advanced query techniques to retrieve data from the Northwind database. The SELECT statements will include joins or subqueries, and you will use them to summarize data. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator.

To use an inner join to retrieve data

Open SQL Query Analyzer and connect to your local server.
In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT o.CustomerID, o.OrderID, s.CompanyName FROM Orders o JOIN Shippers s       ON o.ShipVia = s.ShipperID WHERE ShipCountry = 'USA' ORDER BY o.CustomerID, s.CompanyName

In this statement, you are identifying the database containing the table that you want to view, and you are using a SELECT statement to retrieve the customer ID, the order ID, and the name of the shipping company associated with each order. Because the name of the shipping company is in a separate table (Shippers), you must join the Orders table and the Shippers table and use the shipper ID for the join condition. The ShipVia column is a foreign key to the ShipperID column, and both columns contain the shipper IDs. Note that the Orders table is given an alias of o, and the Shippers table is given an alias of s; also note that for an inner join, you can simply use the JOIN keyword (you do not need to specify INNER JOIN).

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

To use a left outer join to retrieve data

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT o.OrderID, o.CustomerID, c.ContactName, c.City FROM Orders o LEFT JOIN Customers c       ON o.CustomerID = c.CustomerID       AND o.ShipCity = c.City ORDER BY o.OrderID

In this statement, you are using a SELECT statement to retrieve data from the OrderID and CustomerID columns in the Orders table, and the ContactName and City columns in the Customers table. The join condition uses the CustomerID columns to join the tables and further qualifies the join by specifying that values in the ShipCity column must equal the values in the City column.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

Click the Messages tab of the Results pane.

Notice that 830 rows are returned. Because a left outer join is used, every row in the Orders table is returned, but the only rows returned from the Customers table are those that contain the same city as where the orders are shipped.

Click the Grids tab of the Results pane.
Scroll down to row 108.

Notice that the ContactName and City columns contain null values. If you look at the data in the table, you will see that a value exists for the ContactName and City columns. Because one of the join conditions (o.ShipCity = c.City) is not met, however, null values are inserted into the columns. If you use an inner join for this query, 817 rows are returned. Only those rows that meet the join condition in both tables are returned.

To use a right outer join to retrieve data

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT o.OrderID, o.CustomerID, c.ContactName, c.City FROM Orders o RIGHT JOIN Customers c       ON o.CustomerID = c.CustomerID       AND o.ShipCity = c.City ORDER BY o.OrderID

In this statement, you are using the same SELECT statement as in the previous procedure, only this time you are specifying a right outer join.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

Click the Messages tab of the Results pane.

Notice that 820 rows have been returned. Because a right outer join is used, every row in the Customers table is returned, but the only rows returned from the Orders table are those that contain the same city value as where the customers are located.

Click the Grids tab of the Results pane.

Notice that the first three rows of the result set contain null values for the OrderID and CustomerID columns. These rows don't meet the join condition for the Orders table.

To use a full outer join to retrieve data

Use the same SELECT statement that you used in the two previous examples, except specify a full outer join.
Execute the Transact-SQL statement and view the Messages tab of the Results pane.

Notice that 833 rows have been returned. Because a full outer join is used, every row from both tables is returned.

Click the Grids tab of the Results pane.

Notice that once again, the first three columns contain null values in the OrderID and CustomerID columns.

Scroll down to row 111.

Notice that once again, the ContactName column and the City column contain null values.

To use the IN keyword in a subquery

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT OrderID, EmployeeID AS EmpID FROM Orders WHERE EmployeeID IN  (   SELECT EmployeeID   FROM Employees   WHERE City = 'Seattle'   ) ORDER BY OrderID

In this statement, you are using a subquery to identify the employee ID of those employees who live in Seattle. The values returned from this query are then used in the WHERE clause of the outer SELECT statement to limit the result set to those orders that were processed by employees in Seattle.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

To use comparison operators and the ALL keyword in a subquery

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT OrderID, UnitPrice FROM [Order Details] WHERE UnitPrice > ALL  (  SELECT UnitPrice  FROM [Order Details] JOIN Orders  ON [Order Details].OrderID = Orders.OrderID  AND Orders.EmployeeID = 5  ) ORDER BY UnitPrice, OrderID

In this statement, you are using a subquery to identify the maximum unit price of orders processed by employee 5. The WHERE clause in the outer SELECT statement then uses that price to limit the result set to those orders whose unit price exceeds the amount derived from the subquery.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

To use the EXISTS keyword in a subquery

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT OrderID, CustomerID FROM Orders WHERE EXISTS  (  SELECT * FROM Customers  WHERE Customers.CustomerID = Orders.CustomerID       AND City = 'London'  ) ORDER BY OrderID

In this statement, you are using a subquery to identify those customers who are located in London. Customers located in London evaluate to TRUE. The WHERE clause in the outer SELECT statement uses those rows that return TRUE to determine which orders should be returned in the result set. Any order whose customer is located in London is included.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

To use the ROLLUP operator to summarize data

In the Editor pane of the Query window, enter the following Transact-SQL code:

 USE Northwind SELECT ProductID, UnitPrice, SUM(Quantity) AS 'Sum' FROM [Order Details] GROUP BY ProductID, UnitPrice WITH ROLLUP ORDER BY ProductID

In this statement, you are using a SELECT statement to summarize the number of products sold, based on the product ID and the unit price.

Execute the Transact-SQL statement.

The result set appears in the Grids tab of the Results pane.

Review the result set in the Grids tab.

Notice that the first row of the result set contains null values in the ProductID column and in the UnitPrice column. The value in the Sum column represents the total number of all units in the result set. Also notice that the remaining rows are first grouped together by product and that each product contains subgroups that are based on the unit price. Each product then includes a row that contains a null value in the UnitPrice column. For this row, the value in the Sum column is the total number of that type of product. For example, product 1 sold 174 units at a price of $14.40 per unit and sold 654 units at a price of $18.80 per unit. The total number of units sold with a ProductID value of 1 is 828.

Scroll through the result set to view the summaries of the other products.
Close SQL Query Analyzer.

Lesson Summary

Transact-SQL provides several advanced query techniques that you can use to retrieve data from a SQL Server database, including joins, subqueries, and the capability to summarize data. By using joins, you can retrieve data from two or more tables based on logical relationships between the tables. Most joins can be categorized as inner joins or outer joins. Inner joins return rows only if there is at least one row from both tables that matches the join condition—eliminating the rows that do not match with a row from the other table. Outer joins, however, return all rows from at least one of the tables or views mentioned in the FROM clause as long as these rows meet any WHERE or HAVING search condition. SQL Server supports three types of outer joins: left outer, right outer, and full outer. All rows are retrieved from the left table referenced with a left outer join, and all rows from the right table are referenced in a right outer join. All rows from both tables are returned in a full outer join. A subquery is a SELECT query that returns a single value and is nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery. Subqueries can be specified in many places within a SELECT statement. Most subqueries are specified in the WHERE clause and use the IN and NOT IN keywords, use comparison operators with the ANY or ALL keywords, or use the EXISTS and NOT EXISTS keywords. Transact-SQL also includes several elements that enable you to generate simple summary reports, including the CUBE or ROLLUP operators. The CUBE and ROLLUP operators are specified in the GROUP BY clause of a SELECT statement. These operators generate result sets that contain both detail rows for each item in the result set and summary rows for each group showing the aggregate totals for that group.