GROUP BY and HAVING Clauses: Summarizing Results | Microsoft SQL Server 2008 Bible

Grouping records in the resultset based on some criteria could provide a valuable insight into data that has accumulated in the table. For example, you would like to see the final resultset of your orders (where there could be one or more order items per order) not in the random order they were entered in, but rather in groups of items that belong to the same order:

SELECT   ordline_ordhdrid_fn,           ordline_ordqty_n AS QTY_PER_ITEM FROM     order_line GROUP BY ordline_ordhdrid_fn,          ordline_ordqty_n;     ordline_ordhdrid_fn QTY_PER_ITEM  ------------------- ------------  30607               90 30607               500 30608               275 30608               340 30609               560 . . .               . . . 30666               560 30667               560 30668               72 30668               90 . . .               . . . 30669               120 30669               480 . . .               . . . 30670               126 30670               450     97 records selected.

Note the repeating values (groups) in the ORDLINE_ORDHDRID_FN field, representing the order header ID for which there could be one or more order items and for which there are different quantities. This information might become more concise with the use of aggregate functions that could sum the quantity for the order or calculate the average, and so on (see the example in this paragraph using SUM function).

Note

All columns specified in a SELECT clause list, with the exception of aggregate columns (these used as an argument for an aggregate function), must be listed in the GROUP BY clause as well.

The GROUP BY clause is mostly (but not always) used in conjunction with aggregate functions, which are introduced in Chapter 10. The aggregate functions return a single value as a result of an operation conducted on a set of values. The set is grouped to provide a series of sets for use with the aggregate functions.

Table 10-7 in Chapter 10 lists five aggregate functions alongside their descriptions and particulars of implementation across all three RDBMS. For examples of the specific function usage, please refer to Chapter 10; here we're going to discuss the concept of aggregation and how it refers to the GROUP BY clause within a SELECT query.

To calculate the total sum of an ordered quantity for all orders (and one order could have one or more items) in the table ORDER_LINE you could use the SUM aggregate function:

SELECT          SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER FROM     order_line;     TOT_QTY_PER_ORDER  -----------------  31847     1 record selected.

The single value that summed up all ordered quantities for all the records in the table was returned. While useful, this information could be more valuable if the ordered quantity is summed up per order — you would know how many items were ordered in each and every order. Here is the query that accomplishes this task:

SELECT   ordline_ordhdrid_fn,           SUM(ordline_ordqty_n) AS TOT_QTY_PER_ORDER FROM     order_line GROUP BY ordline_ordhdrid_fn     ORDLINE_ORDHDRID_FN   TOT_QTY_PER_ORDER -------------------   -----------------               30607                 590               30608                 615               30609                 560               ...                   ...               30669                 600               30670                 720     51 rows selected.

Here we have much more detailed information, as the quantities were grouped by order and then summed up for each order, producing a single value for each order (as opposed to producing it for a whole resultset).

Here is an example of another aggregate function AVG,  which calculates the average of the values. In this case, you are  going to calculate the average quantity per order. SELECT   ordline_ordhdrid_fn,           AVG(ordline_ordqty_n) AS AVG_QTY_PER_ORDER FROM     order_line GROUP BY ordline_ordhdrid_fn ORDLINE_ORDHDRID_FN AVG_QTY_PER_ORDER ------------------- -----------------               30607               295               30608             307.5               30609               560               ...                 ...               30669               300               30670               180     51 rows selected.

Note

For order #30608, the average quantity is 307.5 in Oracle, but MS SQL Server and DB2 would round the value to 307 because the field is of INTEGER data type, whereas Oracle uses the NUMBER data type able to accommodate decimals. Refer to Chapter 3 for more information on data types.

The HAVING clause used exclusively with the GROUP BY clause provides a means of additional selectivity. Imagine that you need to select not all records in your GROUP BY query but only those that would have their grouped value greater than 750. Adding additional criterion to the WHERE clause would not help, as the value by which we could limit the records is calculated using GROUP BY and is unavailable outside it before the query has completed execution. The HAVING clause used within the GROUP BY clause allows us to add this additional criterion to the results of the GROUP BY operation. For example, to display orders with a total quantity greater than 750, the following query could be used:

SELECT    ordline_ordhdrid_fn,            SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER FROM      order_line GROUP BY  ordline_ordhdrid_fn HAVING   SUM(ordline_ordqty_n) > 750     ORDLINE_ORDHDRID_FN TOT_QTY_PER_ORDER  ------------------- -----------------                30628               789                30668               789     2 records selected.

As you can see, only 2 records among 51 from the previous query had satisfied this additional restriction.

We could have used a column ORDLINE_ORDHDRID_FN, without the SUM aggregate function in the HAVING clause to restrict the returned records by some other criteria, but we cannot use just any column from the SELECT clause: It also has to be listed in the GROUP BY clause to be used with HAVING. Here is a query example that sums up order quantities grouped by order header ID only if they fall into a specified list of orders:

SELECT   ordline_ordhdrid_fn,           SUM(ordline_ordqty_n) TOT_QTY_PER_ORDER FROM     order_line GROUP BY ordline_ordhdrid_fn HAVING   ordline_ordhdrid_fn IN (30607,30608,30611,30622)     ordline_ordhdrid_fn TOT_QTY_PER_ORDER  ------------------- -----------------  30607               590 30608               615 30611               625 30622               726     4 records selected.

Note

While GROUP BY would consider the null values in the columns by which the grouping is performed a valid group, this is not the way the NULLs are treated by the aggregate functions. Aggregate functions simply exclude the NULL records — they will not make it to the final result. See Chapter 10 for more information.