Lesson 2: GROUP BY Fundamentals

An aggregate function used in a simple SELECT statement that has only a FROM clause and a WHERE clause produces a single summary value for all rows in a column.

If you want to generate summary values for a column based on groups of rows, use aggregate functions in SELECT statements with the GROUP BY clause. Use the HAVING clause with the GROUP BY clause to restrict the groups of rows that are returned in the result set.

After this lesson, you will be able to:

  • Organize summary data for a column by using aggregate functions with the GROUP BY and HAVING clauses

Estimated lesson time: 50 minutes

Using the GROUP BY Clause

Use the GROUP BY clause on columns or expressions to organize rows into groups and to summarize those groups. For example, Figure F.1 shows a SELECT statement that uses the SUM aggregate function and the GROUP BY clause to determine the quantity of each product that was ordered.

Compare the result sets in Figure F.1. The table on the left lists all the rows in the OrderHist table. The table on the top right uses the GROUP BY clause to group all orders by ProductID and present the total quantity that is ordered for each product. The table on the bottom right uses the GROUP BY clause and the WHERE clause to further restrict the number of rows returned.

TIP
The OrderHist table is created in the Northwind database specifically for the examples in this lesson and Lesson 3. If you want to execute the examples in these lessons, you can create the OrderHist table by executing the script in the C:\SQLDW\Exercise\AppF\ordhist.SQL script file.

click to view at full size

Figure F.1 Using the GROUP BY clause

Partial Syntax

SELECT select_list
FROM table_ source[ WHERE search_condition ]
[ GROUP BY [ALL] group_by_expression [, n]]
[HAVING search_condition ]

When you use the GROUP BY clause, consider the following facts and guidelines:

  • SQL Server produces a single row of values for each defined group.
  • All columns that are specified in the select list must be included in the GROUP BY clause. Columns included in the GROUP BY clause do not need to be specified in the select list.
  • If you include a WHERE clause, SQL Server groups only the rows that satisfy the WHERE clause conditions.
  • The total size of the columns listed in the GROUP BY clause can be up to 8,060 bytes, which is the maximum row length supported by SQL Server 7.
  • If you use the GROUP BY clause on columns that contain null values, the null values are processed as a group. To exclude the rows with null values, use the IS NOT NULL comparison operator in the WHERE clause of the SELECT statement.
  • Use the ALL keyword with the GROUP BY clause to display summary rows for every group, regardless of whether the rows satisfy the WHERE clause. For groups for which no rows satisfy the WHERE clause, NULL is returned in the summary columns.

Example 1

This example returns information about orders from the OrderHist table. The query groups the orders for each product ID and calculates the total quantity ordered for that product. The total quantity is calculated with the SUM aggregate function, which displays one value for each product in the result set.

 USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM OrderHist  GROUP BY ProductID 

Result

 ProductID   Total_Quantity   ----------- --------------   1           15  2           35  3           45 (3 row(s) affected) 

Example 2

This example adds a WHERE clause to the query in Example 1. The query restricts the rows to product ID 2 and then groups these rows and calculates the total quantity ordered. Compare this result set to that in Example 1.

 USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM OrderHist  WHERE ProductID = 2  GROUP BY ProductID 

Result

 ProductID   Total_Quantity   ----------- --------------   2           35 (1 row(s) affected) 

Example 3

This example summarizes information from the [Order Details] table. The query groups the orders by product ID and calculates the total quantity ordered for each product. This example does not include a WHERE clause and, therefore, returns a total for each product ID.

 USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  GROUP BY ProductID 

Result

 ProductID   Total_Quantity   ----------- --------------   61          603  3           328  32          297    47          485  38          623 (77 row(s) affected) 

Example 4

This example adds a WHERE clause to the query in Example 3. The query returns only rows that have a product ID of less than 5. Compare this result set with the result set in Example 3.

CAUTION
The WHERE clause must be listed before the GROUP BY clause in the SELECT statement or SQL Server will return an error.

 USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  WHERE ProductID < 5  GROUP BY ProductID 

Result

 ProductID   Total_Quantity   ----------- --------------   1           828  2           1057  3           328  4           453 (4 row(s) affected) 

Exercise 1: Using the GROUP BY Clause to Summarize Data

In this exercise, you will open a script that contains a query that includes the GROUP BY clause.

  • To write a SELECT statement that uses the GROUP BY clause
    1. Open SQL Server Query Analyzer.
    2. In the DB list, click Northwind.
    3. Open C:\SQLDW\Exercise\AppF\groupby.SQL, review the contents of the script file, and then execute it.

    Result

    Your result will look similar to the following result set:

     CategoryID  Total_Quantity   ----------- --------------   1           9532  2           5298 (2 row(s) affected) 

    Using the GROUP BY Clause with the HAVING Clause

    Use the HAVING clause on columns or expressions to set conditions on the groups included in a result set. The HAVING clause sets conditions on the GROUP BY clause in much the same way that the WHERE clause sets conditions for the rows returned by the SELECT statement.

    The WHERE clause determines which rows are grouped. The HAVING clause determines which groups are returned.

    Figure F.2 shows how the HAVING clause limits the result set to products for which the total quantity ordered is at least 30.

    click to view at full size

    Figure F.2 Using the GROUP BY clause with the HAVING clause

    When you use the HAVING clause, consider the following facts and guidelines:

    • Use the HAVING clause only with the GROUP BY clause to restrict the grouping. Using the HAVING clause without the GROUP BY clause is not meaningful.
    • You can have up to 128 conditions in a HAVING clause. When you have multiple conditions, combine them with logical operators (AND, OR, or NOT).
    • In the HAVING clause, you can reference any of the columns that appear in the select list, including the aggregate functions. You cannot reference aggregate functions in the WHERE clause.
    • The HAVING clause is applied after the ALL keyword so the HAVING clause overrides the ALL keyword and returns only groups that satisfy the HAVING clause.

    Example 1

    This example lists each product from the OrderHist table that has orders of 30 or more units.

     USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM OrderHist  GROUP BY ProductID  HAVING SUM(Quantity) >= 30 

    Result

     ProductID   Total_Quantity   ----------- --------------   2           35  3           45 (2 row(s) affected) 

    Example 2

    This example lists the product ID and quantity for products that have orders for more than 1,200 units in the [Order Details] table.

     USE Northwind  SELECT ProductID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  GROUP BY ProductID  HAVING SUM(Quantity) > 1200 

    Result

     ProductID   Total_Quantity   ----------- --------------   59          1496  56          1263  60          1577  31          1397 (4 row(s) affected) 

    Exercise 2: Using the GROUP BY and HAVING Clauses to Summarize Data

    In this exercise, you will write queries that include the GROUP BY and the HAVING clauses. The C:\SQLDW\Exercise\AppF\groupbya.SQL script file contains a completed script for this exercise.

  • To write a SELECT statement that uses the GROUP BY and HAVING clauses
    1. Open SQL Server Query Analyzer.
    2. Write a query that summarizes the quantity sold by category for all products, regardless of category in the [Order Details] table in the Northwind database. Include CategoryID and Total_Quantity in the select list of the query. Execute the query.
    3.  USE Northwind  SELECT CategoryID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  JOIN Products ON [Order Details].ProductID = Products.ProductID  GROUP BY CategoryID  ORDER BY CategoryID 

      Result

      Your result will look similar to the following result set:

       CategoryID  Total_Quantity   ----------- --------------  1           9532  2           5298  3           7906  4           9149  5           4562  6           4199  7           2990  8           7681 (8 row(s) affected) 

    4. Modify the query to summarize the quantity by the OrderID column rather than the CategoryID column, and then execute the query.
    5.  USE Northwind  SELECT OrderID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  GROUP BY OrderID 

      Result

      Your result will look similar to the following partial result set:

       OrderID     Total_Quantity   ----------- --------------   10248       27  10249       49  10250       60   11076       50  11077       72 (830 row(s) affected) 

    6. Modify the query to limit the group summaries to orders with a total quantity of 250 and higher, and then execute the query.
    7.  USE Northwind  SELECT OrderID, SUM(Quantity) AS Total_Quantity  FROM [Order Details]  GROUP BY OrderID  HAVING SUM(Quantity) >= 250 

      Result

      Your result will look similar to the following result set:

       OrderID     Total_Quantity   ----------- --------------   10515       286  10612       263  10658       255  10678       280  10847       288  10895       346  10990       256  11030       330 (8 row(s) affected) 

    Lesson Summary

    The GROUP BY clause allows you to specify columns and expressions by which the rows of a result set of a SELECT query are grouped into summary rows. The expressions in the select list of a query that has a GROUP BY clause must be aggregate functions or appear in the GROUP BY list in order for summary rows to be generated. The HAVING clause restricts the summary rows that are included in the result set.



    Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
    Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
    ISBN: 0735606706
    EAN: 2147483647
    Year: 1999
    Pages: 114

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net