Using Aggregate Functions


The simplest technique is aggregating all rows in a query. Aggregate functions include the means to summarize a range of values in a variety of ways. You may simply want to count the rows that match a criterion or get the sum of a range of numeric values. The following table contains all of the system-supplied aggregate functions supported by Transact-SQL used to summarize column values.

Function

Description

COUNT()

Calculates the count of all non-null values for a specific column. Can also be used as COUNT(*) to return the absolute count of rows regardless of null values. Returns Int data type.

COUNT_BIG()

Same as the COUNT() function but returns the BigInt data type. This would only be necessary if the table contains more than two billion rows.

SUM()

Returns the sum of all non-null values in the range. The return data type is the same as the numeric column data type.

AVG()

Returns the average of all non-null values in the range. The return data type is the same as the numeric column data type.

MIN()

Returns the smallest non-null value in the range. Can be used with any sortable data type.

MAX()

Returns the largest non-null value in the range. Can be used with any sortable data type.

STDEV()

Returns the simple standard deviation for all non-null values in a numeric range. Returns a Float data type regardless of the column type.

STDEVP()

Returns the standard deviation for a population, for all non-null values in a numeric range. Returns a Float data type regardless of the column type.

VAR()

Returns the simple variance for all non-null values in a numeric range. Returns a Float data type regardless of the column type.

VARP()

Returns the variance for a population, for all non-null values in a numeric range. Returns a Float data type regardless of the column type.

The COUNT() Function

The COUNT() function simply counts rows or non-null values in a column's value range. Because the data type of the column isn't considered, it will work with columns of practically any type of data. Consider the following two examples. If you execute this query against the Product table, the total number of rows is returned:

 SELECT COUNT(*) FROM Product 

As you can see, the Product table contains 999 rows. Now, count only the values in the ListPrice column using the following expression:

 SELECT COUNT(ListPrice) FROM Product 

Because 200 records don't have a ListPrice value (these rows contain the value NULL for this column), only 799 rows get counted. Now add the word DISTINCT before the column reference and execute the query again:

 SELECT COUNT(DISTINCT ListPrice) FROM Product 

Because so many of the products have the same prices, only 104 records are counted. The DISTINCT modifier can be used with any of the aggregate functions except when using the CUBE or ROLLUP statements, which are discussed later in this chapter.

The SUM() Function

The SUM() function simply returns the sum of a range of numeric column values. Like the others, this function only considers non-null values. A simple example returns the subtotal for a product order. This query adds up the UnitPrice for each detail line in the order whose SalesOrderID is 5005:

 SELECT SUM(UnitPrice) FROM SalesOrderDetail  WHERE SalesOrderID = 5005 

The result is a single row with a single column just like the previous examples, as shown in Figure 7-1.

image from book
Figure 7-1:

I have two issues with this result. The first is that the column doesn't have a name. When applying aggregate functions, the resulting column won't be named unless you specifically define an alias for the column name. If you use visual query design tools, such as Access or the Transact-SQL Designer (in Visual Studio or to create a view in Enterprise Manager), these tools will devise column aliases such as SumOfUnitPrice or Expr1. The first order of business is to assign an alias so this column has a sensible name. The other problem with this simple example is that it assumes that the customer purchased one of each product. The fact is that there are three detail rows for this order with respective quantities 1, 3, and 4. To accurately total the order, you'll have to do a little math. This query resolves both of these issues, calculating extended price and defining an alias for the column:

 SELECT SUM(UnitPrice * OrderQty) As OrderTotalPrice  FROM SalesOrderDetail WHERE SalesOrderID = 5005 

The result shown in Figure 7-2 contains the correct amount (the total of all three order detail rows, considering the quantity for the product purchased), and the column has a name.

image from book
Figure 7-2:

The AVG() Function

The AVG() function returns the calculated average for a range of numeric values. Internally, the query processor calculates the sum of all the values and then divides by the number of rows in the range (containing non-null values.) Optionally, the AVG() function can make a distinct selection of values and then perform the same calculation on this abbreviated set of values. Using a distinct selection can greatly affect the results and is not as common.

I'd like to use the product sales data in the AdventureWorks2000 database to demonstrate the practical application of these functions. In this scenario, the director of marketing has asked for an analysis of road bike sales in 2003. This information exists in three related tables. Pay no attention to the join statements for the time being; they are covered in Chapter 8. The following query uses the SalesOrderHeader table to filter the sales order, the Product table to filter by ProductSubCategoryID (2 is road bikes), and the UnitPrice is retrieved from the SalesOrderDetail table. For simplicity, I'm not considering the quantity of bikes purchased.

I'll start with the lowest price paid for a bike. Using the MIN() function should return only one value:

 SELECT   MIN(UnitPrice) FROM     SalesOrderHeader  INNER JOIN SalesOrderDetail ON  SalesOrderHeader.SalesOrderID = SalesOrderDetail.SalesOrderID INNER JOIN Product ON SalesOrderDetail.ProductID = Product.ProductID WHERE    Product.ProductSubCategoryID = 2 AND SalesOrderHeader.OrderDate BETWEEN ‘1-1- 

You can see that the lowest UnitPrice value in this range is $388.79. Just modify the query, substituting the following functions in place of the MIN() function in the example. The following table shows the results.

Question

Function

Result

What was the lowest price paid?

MIN()

388.79

What was the highest price paid?

MAX()

2443.35

What is the average price paid?

AVG()

1116.04




Beginning Transact-SQL with SQL Server 2000 and 2005
Beginning Transact-SQL With SQL Server 2000 and 2005
ISBN: 076457955X
EAN: 2147483647
Year: 2006
Pages: 131
Authors: Paul Turley

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net