Function | Usage |
---|---|
AVG( expression ) | Computes the average value of a column given by expression . |
CORR( dependent , independent ) | Computes a correlation coefficient. |
COUNT( expression ) | Counts the rows defined by the expression . |
COUNT(*) | Counts all rows in the specified table or view. |
COVAR_POP ( dependent , independent ) | Computes population covariance. |
COVAR_SAMP( dependent , independent ) | Computes sample covariance. |
CUME_DIST( value_list ) WITHIN GROUP (ORDER BY sort_list ) | Computes the relative rank of a hypothetical row within a group of rows, where the rank is equal to the number of rows less than or equal to the hypothetical row divided by the number of rows in the group. |
DENSE_RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) | Generates a dense rank (no ranks are skipped ) for a hypothetical row ( value_list ) in a group of rows generated by GROUP BY. |
MIN( expression ) | Finds the minimum value in a column given by expression . |
MAX( expression ) | Finds the maximum value in a column given by expression . |
PERCENT_RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) | Generates a relative rank for a hypothetical row by dividing that row's rank less 1 by the number of rows in the group. |
PERCENTILE_CONT ( percentile ) WITHIN GROUP (ORDER BY sort_list ) | Generates an interpolated value that, if added to the group, would correspond to the percentile given. |
PERCENTILE_DISC ( percentile ) WITHIN GROUP (ORDER BY sort_list ) | Returns the value with the smallest cumulative distribution value greater than or equal to percentile . |
RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) | Generates a rank for a hypothetical row ( value_list ) in a group of rows generated by GROUP BY. |
REGR_AVGX( dependent , independent ) | Computes the average of the independent variable. |
REGR_AVGY( dependent , independent ) | Computes the average of the dependent variable. |
REGR_COUNT( dependent , independent ) | Counts the number of pairs remaining in the group after any pair with one or more NULL values has been eliminated. |
REGR_INTERCEPT( dependent , independent ) | Computes the y-intercept of the least-squares-fit linear equation. |
REGR_R2( dependent , independent ) | Squares the correlation coefficient. |
REGR_SLOPE( dependent , independent ) | Determines the slope of the least-squares-fit linear equation. |
REGR_SXX( dependent , independent ) | Sums the squares of the independent variables . |
REGR_SXY( dependent , independent ) | Sums the products of each pair of variables. |
REGR_SYY( dependent , independent ) | Sums the squares of the dependent variables. |
STDDEV_POP( expression ) | Computes the population standard deviation of all expression values in a group. |
STDDEV_SAMP( expression ) | Computes the sample standard deviation of all expression values in a group. |
SUM( expression ) | Computes the sum of column values given by expression . |
VAR_POP( expression ) | Computes the population variance of all expression values in a group. |
VAR_SAMP( expression ) | Computes the sample standard deviation of all expression values in a group. |
Technically speaking, ANY , EVERY , and SOME are considered aggregate functions. However, they have been discussed as range search criteria since they are most often used that way. Refer to ALL/ANY/SOME Operators for more information on these functions.
The number of values processed by an aggregate function varies depending on the number of rows queried from the table. This behavior makes aggregate functions different from scalar functions, which can only operate on the values of a single row per invocation.
The general syntax of an aggregate function is:
aggregate_function_name ( [ALL DISTINCT] expression )
The aggregate function name may be AVG, COUNT, MAX, MIN , or SUM , as listed in Table 4-1. The ALL keyword, which specifies the default behavior, evaluates all rows when aggregating the value of the function. The DISTINCT keyword uses only distinct values when evaluating the function.
|
AVG and SUM |
The AVG function computes the average of values in a column or an expression. SUM computes the sum. Both functions work with numeric values and ignore NULL values. Use the DISTINCT keyword to compute the average or sum of all distinct values of a column or expression.
AVG ([ALL DISTINCT] expression ) SUM ([ALL DISTINCT] expression )
All these platforms support the SQL2003 syntax of AVG and SUM .
DB2 and Oracle support the ANSI syntax and the following analytic syntax:
AVG ([ALL DISTINCT] expression ) OVER ( window_clause ) SUM ([ALL DISTINCT] expression ) OVER ( window_clause )
For an explanation of the window_clause , see the Section 4.3 later in this chapter.
The following query computes average year-to-date sales for each type of book:
SELECT type, AVG( ytd_sales ) AS "average_ytd_sales"FROM titles GROUP BY type;
This query returns the sum of year-to-date sales for each type of book:
SELECT type, SUM( ytd_sales ) FROM titles GROUP BY type;
CORR |
The CORR function returns the correlation coefficient between a set of dependent and independent variables.
Calls the function with two variables, one dependent and the other independent:
CORR( dependent , independent )
Any pair in which either the dependent variable, independent variable, or both are NULL is ignored. The result of the function is NULL when none of the input pairs consist of two non-NULL values.
Oracle supports the SQL2003 syntax, and the following analytic syntax:
CORR ( dependent , independent ) OVER ( window_ clause )
For an explanation of the window_clause , see the Section 4.3 later in this chapter.
These platforms do not support any form of the CORR function.
The following CORR example uses the data shown by the first SELECT :
SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SELECT CORR(y,x) FROM test2; CORR(Y,X) ---------- -1
COUNT |
The COUNT function is used to compute the number of rows in an expression.
COUNT(*) COUNT( [ALLDISTINCT] expression )
Counts all the rows in the target table whether or not they include NULLs.
Computes the number of rows with non-NULL values in a specific column or expression. When the keyword DISTINCT is used, duplicate values will be ignored and a count of the distinct values is returned. ALL returns the number of non-NULL values in the expression and is implicit when DISTINCT is not used.
All of these platforms support the SQL2003 syntax of COUNT .
DB2 and Oracle support the ANSI syntax and the following analytic syntax:
COUNT ({*[DISTINCT] expression }) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
This query counts all rows in a table:
SELECT COUNT(*) FROM publishers;
The following query finds the number of different countries where publishers are located:
SELECT COUNT(DISTINCT country) "Count of Countries" FROM publishers
COVAR_POP |
The COVAR_POP function returns the population covariance of a set of dependent and independent variables.
Call the function with two variables, one dependent and the other independent:
COVAR_POP( dependent , independent )
The function disregards any pair in which either the dependent variable, independent variable, or both are NULL. If no rows remain in the group after NULL elimination , then the result of the function is NULL.
Oracle supports the SQL2003 syntax and implements the following analytic syntax:
COVAR_POP ( dependent , independent ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
In DB2, the function is named CORRELATION .
These platforms do not support any form of the COVAR_POP function.
The following COVAR_POP example uses the data shown by the first SELECT :
SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SELECT COVAR_POP(y,x) FROM test2; COVAR_POP(Y,X) -------------- -.66666667
COVAR_SAMP |
The COVAR_SAMP function returns the sample covariance of a set of dependent and independent variables.
Call the function with two variables, one dependent and the other independent:
COVAR_SAMP( dependent , independent )
The function disregards any pair in which either the dependent variable, independent variable, or both are NULL. The result of the function is NULL when none of the input pairs consist of two non-NULL values.
Oracle supports the SQL2003 syntax and implements the following analytic syntax:
COVAR_SAMP ( dependent , independent ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
These platforms do not support any form of the COVAR_SAMP function.
The following COVAR_SAMP example uses the data shown by the first SELECT :
SELECT * FROM test2; Y X ---------- ---------- 1 3 2 2 3 1 SQL> SELECT COVAR_SAMP(y,x) FROM test2; COVAR_SAMP(Y,X) --------------- -1
CUME_DIST |
Computes the relative rank of a hypothetical row within a group of rows, where that relative rank is computed as follows :
( rows_preceding_hypothetical + rows_peered_with_hypothetical ) / rows_in_group
Bear in mind that the rows_in_group value includes the hypothetical row that you are proposing when you call the function.
In the following syntax, items in the value_list correspond by position to items in the sort_list . Therefore, both lists must have the same number of expressions.
CUME_DIST( value_list ) WITHIN GROUP (ORDER BY sort_list ) value_list ::= expression [, expression ...] sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle follows the SQL2003 syntax and implements the following analytic syntax:
CUME_DIST OVER ( [partioning ] ordering )
For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3.
These platforms do not implement the CUME_DIST aggregate function.
The following example determines the relative rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4 , where groups are distinguished by the values in the odd column:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, CUME_DIST(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD CUME_DIST(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ----------------------------------------- 0 1 1 .8
In group odd=0 , the new row comes after the three rows: (0,0) , (2,0) , and (4,0) . It will peer with itself. The total number of rows in the group will be four, which includes the hypothetical row. The relative rank, therefore, is computed as follows:
(3 rows preceding + 1 peering) / (3 in group + 1 hypothetical) = 4 / 4 = 1
In group odd=1 , the new row follows the three rows (1,1) , (3,1) , and a duplicate (3,1) . Again, there is one peer, the hypothetical row itself. The number of rows in the group is five, which includes the hypothetical row. The relative rank is then:
(3 rows preceding + 1 peering) / (4 in group + 1 hypothetical) = 4 / 5 = .8
DENSE_RANK |
Computes a rank in a group for a hypothetical row that you supply. This is a dense rank . Rankings are never skipped, even when a group contains rows that rank identically.
In the following syntax, items in the value_list correspond by position to items in the sort_list . Therefore, both lists must have the same number of expressions.
DENSE_RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) value_list ::= expression [, expression ...] sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle follows the SQL2003 syntax and implements the following analytic syntax:
DENSE_RANK( ) OVER ( [partioning ] ordering )
For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3.
These platforms do not implement the DENSE_RANK aggregate function. However, DB2 does support DENSE_RANK as an analytic function. See the section later in this chapter titled Section 4.3.
The following example determines the dense rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4 , where groups are distinguished by the values in the odd column:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, DENSE_RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD DENSE_RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ------------------------------------------ 0 4 1 3
In group odd=0 , the new row comes after (0,0) , (2,0) , and (4,0) , and thus it is position 4. In group odd=1 , the new row follows (1,1) , (3,1) , and a duplicate (3,1) . In that case, the duplicate occurrences of (3,1) both rank #2, so the new row is ranked #3. Compare this behavior with RANK , which gives a different result.
MIN and MAX |
MIN( expression ) and MAX( expression ) find the minimum and maximum value of expression (string, datetime, or numeric) in a set of rows. DISTINCT or ALL may be used with these functions, but do not affect the result.
MIN( [ALL DISTINCT] expression ) MAX( [ALL DISTINCT] expression )
These platforms support the SQL2003 syntax of MIN and MAX .
DB2 and Oracle support the ANSI syntax and implements the following analytic syntax:
MIN ({ALL[DISTINCT] expression }) OVER ( window_clause ) MAX ({ALL[DISTINCT] expression }) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
MySQL supports the SQL2003 syntax of MIN and MAX . MySQL also supports the functions LEAST( ) and GREATEST( ), providing the same capabilities.
The following query finds the best and worst sales for any title on record:
SELECT MIN(ytd_sales), MAX(ytd_sales) FROM titles;
Aggregate functions are used often in the HAVING clause of queries with GROUP BY . The following query selects all categories (types) of books that have an average price for all books in the category higher than $15.00:
SELECT type 'Category', AVG( price ) 'Average Price' FROM titles GROUP BY type HAVING AVG(price) > 15
PERCENT_RANK |
Generates a relative rank for a hypothetical row by dividing that row's rank less 1 by the number of rows in the group.
In the following syntax, items in the value_list correspond by position to items in the sort_list . Therefore, both lists must have the same number of expressions.
PERCENT_RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) value_list ::= expression [, expression ...] sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle follows the SQL2003 syntax and implements the following syntax:
PERCENT_RANK( ) OVER ([ partioning ] ordering )
For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3.
These platforms do not implement the PERCENT_RANK aggregate function.
The following example determines the percentage rank of the hypothetical new row (num=4, odd=1) within each group of rows from test4 , where groups are distinguished by the values in the odd column:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENT_RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD PERCENT_RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- -------------------------------------------- 0 1 1 .75
In group odd=0 , the new row comes after (0,0) , (2,0) , and (4,0) , and thus it is position 4. The rank computation is: (4th rank - 1)/3 rows = 100% . In group odd=1 , the new row follows (1,1) , (3,1) , and a duplicate (3,1) , and is again ranked at #4. The rank computation for odd=1 is: (4th rank - 1)/4 rows = 3/4 = 75% .
PERCENTILE_CONT |
Generates an interpolated value corresponding to a percentile that you specify.
In the following syntax, percentile is a number between zero and one:
PERCENTILE_CONT( percentile ) WITHIN GROUP (ORDER BY sort_list ) sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle allows only one expression in the ORDER BY clause:
PERCENTILE_CONT( percentile ) WITHIN GROUP (ORDER BY expression )
Oracle also allows some use of windowing syntax:
PERCENTILE_CONT ( percentile ) WITHIN GROUP (ORDER BY sort_list ) OVER ( partitioning )
See Section 4.3 later in this chapter for a description of partitioning.
These platforms do not implement PERCENTILE_CONT .
The following example groups the data in test4 by the column named odd , and invokes PERCENTILE_CONT to return a 50 th percentile value for each group:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY NUM) FROM test4 GROUP BY odd; ODD PERCENTILE_CONT(0.50)WITHINGROUP(ORDERBYNUM) ---------- -------------------------------------------- 0 2 1 3
PERCENTILE_DISC |
Determines the value in a group with the smallest cumulative distribution greater than or equal to a percentile that you specify.
In the following syntax, percentile is a number between zero and one:
PERCENTILE_DISC( percentile ) WITHIN GROUP (ORDER BY sort_list ) sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle allows only one expression in the ORDER BY clause:
PERCENTILE_DISC( percentile ) WITHIN GROUP (ORDER BY expression )
Oracle also allows some use of windowing syntax:
PERCENTILE_DISC ( percentile ) WITHIN GROUP (ORDER BY sort_list ) OVER ( partitioning )
See Section 4.3 later in this chapter for a description of partitioning.
These platforms do not implement PERCENTILE_DISC .
The following example is similar to that for PERCENTILE_CONT , except that it returns, for each group, the value closest , but not exceeding , the 60 th percentile:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, PERCENTILE_DISC(0.60) WITHIN GROUP (ORDER BY NUM) FROM test4 GROUP BY odd; PERCENTILE_CONT(0.50)WITHINGROUP(ORDERBYNUM) -------------------------------------------- 2 3
RANK |
Computes a rank in a group for a hypothetical row that you supply. This is not a dense rank. If the group contains rows that rank identically, then it's possible for ranks to be skipped. If you want a dense rank, use the DENSE_RANK function.
In the following syntax, items in the value_list correspond by position to items in the sort_list . Therefore, both lists must have the same number of expressions.
RANK( value_list ) WITHIN GROUP (ORDER BY sort_list ) value_list ::= expression [, expression ...] sort_list ::= sort_item [, sort_item ...] sort_item ::= expression [ASCDESC] [NULLS FIRSTNULLS LAST]
Oracle follows the SQL2003 syntax and implements the following analytic syntax:
RANK( ) OVER ([ partitioning ] ordering )
For an explanation of the partioning and order clauses, see the section later in this chapter titled Section 4.3.
These platforms do not implement the RANK aggregate function.
The following example determines the rank of the hypothetical new row (num=4 , odd=1) within each group of rows from test4 , where groups are distinguished by the values in the odd column:
SELECT * FROM test4; NUM ODD ---------- ---------- 0 0 1 1 2 0 3 1 3 1 4 0 5 1 SELECT odd, RANK(4,1) WITHIN GROUP (ORDER BY num, odd) FROM test4 GROUP BY odd; ODD RANK(4,1)WITHINGROUP(ORDERBYNUM,ODD) ---------- ------------------------------------ 0 4 1 4
In both cases, the rank of the hypothetical new row is 4. In group odd=0 , the new row comes after: (0,0) , (2,0) , and (4,0) , and thus it is position 4. In group odd=1 , the new row follows (1,1) , (3,1) , and a duplicate (3,1) . In that case, the new row is preceding by three rows, so it is ranked #4. Compare this behavior with DENSE_RANK .
The REGR Family of Functions |
SQL2003 defines a family of functions, having names beginning with REGR_ , that relate to different aspects of linear regression. The functions work in the context of a least-squares regression line.
Following is the syntax and a brief description of each REGR_ function:
Averages (as in AVG( x ) ) the independent variable values.
Averages (as in AVG( y ) ) the dependent variable values.
Counts the number of non-NULL number pairs.
Computes the y-intercept of the regression line.
Computes the coefficient of determination.
Computes the slope of the regression line.
Sums the squares of the independent variable values.
Sums the products of each pair of values.
Sums the squares of the dependent variable values.
The REGR_ functions only work on number pairs containing two non-NULL values. Any number pair with one or more NULL values will be ignored.
DB2 and Oracle support the SQL2003 syntax for all REGR_ functions. In addition, DB2 allows the shortened name REGR_ICPT in place of REGR_INTERCEPT .
Oracle supports the following analytic syntax:
REGR_ function ( dependent , independent ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
These platforms do not implement the REGR family of functions.
The following REGEXP_COUNT example demonstrates that any pair with one or more NULL values is ignored. The table test3 contains three non-NULL number pairs, and three other pairs having at least one NULL:
SQL> SELECT * FROM test3; Y X ---------- ---------- 1 3 2 2 3 1 4 NULL NULL 4 NULL NULL
The REGR_COUNT function ignores the pairs having NULLs, counting only those pairs with non-NULL values:
SELECT REGR_COUNT(y,x) FROM test3; REGR_COUNT(Y,X) --------------- 3
Likewise, all other REGR_ functions filter out any pairs having NULL values before performing their respective computations .
STDDEV_POP |
Use STDDEV_POP to find the population standard deviation within a group of numeric values.
STDDEV_POP( numeric_expression )
Use the STDDEV function. In DB2 and MySQL, STDDEV returns the population standard deviation.
This platform does not provide a function to compute population standard deviation.
Oracle supports the standard syntax and the following analytic syntax:
STDDEV_POP ( numeric_expression ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
Use the STDEVP function.
The following example computes the population standard deviation for the values 1, 2, and 3:
SELECT * FROM test; X ---------- 1 2 3 SELECT STDDEV_POP(x) FROM test; STDDEV_POP(X) ------------- .816496581
STDDEV_SAMP |
Use STDDEV_SAMP to find the sample standard deviation within a group of numeric values.
STDDEV_SAMP( numeric_expression )
Oracle supports the standard syntax. Oracle also provides the STDDEV function, which operates similar to STDDEV_SAMP except that it returns zero as not NULL when there is only one value in the set.
Oracle also supports analytic syntax:
STDDEV_SAMP ( numeric_expression ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
This platform does not provide a function to compute sample standard deviation.
MySQL does not provide a function to compute sample standard deviation. MySQL does provide a function named STDDEV , but it returns the population standard deviation.
Use STDDEV .
Use STDEV (with only one D !).
The following example computes the sample standard deviation for the values 1, 2, and 3:
SELECT * FROM test; X ---------- 1 2 3 SELECT STDDEV_SAMP(x) FROM test; STDDEV_SAMP(X) -------------- 1
VAR_POP |
Use VAR_POP to compute the population variance of a set of values.
VAR_POP( numeric_expression )
These platforms do not provide a function to compute population variance.
Use the VARIANCE function, which in MySQL returns the population variance.
Oracle supports the standard syntax and the following analytic syntax:
VAR_POP ( numeric_expression ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
Use the VARP function.
The following example computes the population variance for the values 1, 2, and 3:
SELECT * FROM test; X ---------- 1 2 3 SELECT VAR_POP(x) FROM test; VAR_POP(X) ---------- .666666667
VAR_SAMP |
Use VAR_SAMP to compute the sample variance of a set of values.
VAR_SAMP( numeric_expression )
Use VARIANCE ( numeric_expression ) to compute sample variance.
MySQL provides no function for computing sample variance. There is the VARIANCE function, but in MySQL that function returns the population variance.
Oracle supports the standard syntax. You may also use the VARIANCE function, which differs from VAR_SAMP by returning zero (and not NULL) for sets having only a single value.
Oracle also supports analytic syntax:
VAR_SAMP ( numeric_expression ) OVER ( window_clause )
For an explanation of the window_clause , see the section later in this chapter titled Section 4.3.
Use the VAR function.
The following example computes the sample variance for the values 1, 2, and 3:
SELECT * FROM test; X ---------- 1 2 3 SELECT VAR_SAMP(x) FROM test; VAR_SAMP(X) -----------
1