The GROUP BY takes the result set and chops it up into groups. Each group can thus have its own aggregate values, which is useful for ad hoc reporting, as you will soon see. The syntax of this clause is as follows:
[ GROUP BY [ ALL ] group_by_expression [ ,...n ] [ WITH { CUBE | ROLLUP } ] ]
Let’s go over the arguments and then look at some examples.
The GROUP BY ALL directive tells SQL Server to generate all possible groups and result sets. It even returns groups that do not have any rows meeting the search condition that you specified in the WHERE clause. When ALL is specified, null values are returned for the summary columns of groups that do not meet the search condition.
The group_by_expression placeholder offers an expression that you can use to obtain a particular grouping. It can also be used as a grouping column in which the group_by_ expression can be a column or a nonaggregate expression that references a column. However, T-SQL does not allow you to use a column alias defined in the select list to specify a grouping column.
Note | Columns of type varchar(max), nvarchar(max), or varbinary(max) cannot be used in group_ by_expression. |
When you have GROUP BY clauses that do not contain a CUBE or ROLLUP, the number of group_by_expression items is restricted to the GROUP BY column sizes, the aggregated columns, and the aggregate values involved in the query. This is an architecture limitation that originates from the storage limits on the intermediate worktables that hold the intermediate query results. In addition, you cannot use more than ten grouping expressions when specifying CUBE or ROLLUP, which now follow.
The WITH CUBE argument lets you direct SQL Server to introduce summary rows to the result set, in addition to rows provided by the GROUP BY clause. This clause essentially results in a data explosion that returns a multidimensional cube.
The ROLLUP argument also specifies that in addition to rows provided by the GROUP BY clause, summary rows must be introduced into the result set. The groups, however, are summarized in a hierarchical order, from the lowest level in the group to the highest. The group hierarchy is determined by the order in which the grouping columns are specified. Changing the order of the grouping columns can affect the number of rows produced in the result set.
Distinct aggregates such as AVG(DISTINCT column_name), COUNT(DISTINCT column_ name), and SUM(DISTINCT column_name) are not supported as part of CUBE or ROLLUP queries.
When GROUP BY is specified in a query, either each column in any nonaggregate expression in the select list should be included in the GROUP BY list, or the GROUP BY expression must match exactly the select list expression.
If aggregate functions are included in the select list, the GROUP BY calculates a summary value for each group. However, if the ORDER BY clause is not specified, groups returned using the GROUP BY clause are not in any particular order. You should thus make it a habit to always use the ORDER BY clause to specify a particular ordering of returned data.
Note | GROUP BY ALL is not supported in queries that access remote tables if there is also a WHERE clause in the query. |
The following is a simple GROUP BY query, which you can later expand with more advanced capabilities such as joins:
SELECT Item, Quantity, Category FROM Items
returns the following result set:
Item | Quantity | Category |
---|---|---|
Cabbage | 3199 | Veggie |
Plums | 15 | Fruit |
Squash | 98 | Veggie |
Apes | 3434 | Animals |
Peaches | 23423 | Fruit |
Monkeys | 342 | Animals |
But if you add the GROUP BY
GROUP BY Category, Item, Quantity
the result set is grouped as follows on Category:
Item | Quantity | Category |
---|---|---|
Apes | 3434 | Animals |
Monkeys | 342 | Animals |
Peaches | 23423 | Fruit |
Plums | 15 | Fruit |
Cabbage | 3199 | Veggie |
Squash | 98 | Veggie |