GROUP BY | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

The GROUP BY takes the result set and chops it up into groups. Each group can thus have its own aggregate values, which is useful for ad hoc reporting, as you will soon see. The syntax of this clause is as follows:

 [ GROUP BY [ ALL ] group_by_expression [ ,...n ] [ WITH { CUBE | ROLLUP } ] ]

Let’s go over the arguments and then look at some examples.

GROUP BY Arguments

The GROUP BY ALL directive tells SQL Server to generate all possible groups and result sets. It even returns groups that do not have any rows meeting the search condition that you specified in the WHERE clause. When ALL is specified, null values are returned for the summary columns of groups that do not meet the search condition.

The group_by_expression placeholder offers an expression that you can use to obtain a particular grouping. It can also be used as a grouping column in which the group_by_ expression can be a column or a nonaggregate expression that references a column. However, T-SQL does not allow you to use a column alias defined in the select list to specify a grouping column.

Note

Columns of type varchar(max), nvarchar(max), or varbinary(max) cannot be used in group_ by_expression.

When you have GROUP BY clauses that do not contain a CUBE or ROLLUP, the number of group_by_expression items is restricted to the GROUP BY column sizes, the aggregated columns, and the aggregate values involved in the query. This is an architecture limitation that originates from the storage limits on the intermediate worktables that hold the intermediate query results. In addition, you cannot use more than ten grouping expressions when specifying CUBE or ROLLUP, which now follow.

WITH CUBE

The WITH CUBE argument lets you direct SQL Server to introduce summary rows to the result set, in addition to rows provided by the GROUP BY clause. This clause essentially results in a data explosion that returns a multidimensional cube.

ROLLUP

The ROLLUP argument also specifies that in addition to rows provided by the GROUP BY clause, summary rows must be introduced into the result set. The groups, however, are summarized in a hierarchical order, from the lowest level in the group to the highest. The group hierarchy is determined by the order in which the grouping columns are specified. Changing the order of the grouping columns can affect the number of rows produced in the result set.

Distinct aggregates such as AVG(DISTINCT column_name), COUNT(DISTINCT column_ name), and SUM(DISTINCT column_name) are not supported as part of CUBE or ROLLUP queries.

When GROUP BY is specified in a query, either each column in any nonaggregate expression in the select list should be included in the GROUP BY list, or the GROUP BY expression must match exactly the select list expression.

If aggregate functions are included in the select list, the GROUP BY calculates a summary value for each group. However, if the ORDER BY clause is not specified, groups returned using the GROUP BY clause are not in any particular order. You should thus make it a habit to always use the ORDER BY clause to specify a particular ordering of returned data.

Note

GROUP BY ALL is not supported in queries that access remote tables if there is also a WHERE clause in the query.

The following is a simple GROUP BY query, which you can later expand with more advanced capabilities such as joins:

 SELECT Item, Quantity, Category FROM Items

returns the following result set:

Item	Quantity	Category
Cabbage	3199	Veggie
Plums	15	Fruit
Squash	98	Veggie
Apes	3434	Animals
Peaches	23423	Fruit
Monkeys	342	Animals

But if you add the GROUP BY

 GROUP BY Category, Item, Quantity

the result set is grouped as follows on Category:

Item	Quantity	Category
Apes	3434	Animals
Monkeys	342	Animals
Peaches	23423	Fruit
Plums	15	Fruit
Cabbage	3199	Veggie
Squash	98	Veggie