Using the Analytic Functions


The database has many built-in analytic functions that enable you to perform complex calculations such as finding the top-selling product type for each month, the top salespersons, and so on. The analytic functions are organized into the following categories:

  • Ranking functions    Enable you to calculate ranks, percentiles, and n -tiles (tertiles, quartiles, and so on)

  • Inverse percentile functions    Enable you to calculate the value that corresponds to a percentile

  • Window functions    Enable you to calculate cumulative and moving aggregates

  • Reporting functions    Enable you to calculate things like market shares

  • Lag and lead functions    Enable you to get a value in a row where that row is a certain number of rows away from the current row

  • First and last functions    Enable you to get the first and last values in an ordered group

  • Linear regression functions    Enable you to fit an ordinary-least-squares regression line to a set of number pairs

  • Hypothetical rank and distribution functions    Enable you to calculate the rank and percentile that a new row would have if you inserted it into a table

You ll learn about these functions shortly, but first let s examine the example table used in this section.

The Example Table

You ll see the use of the all_sales table in the following sections. all_sales stores the sum of all the sales by dollar amount for a particular year, month, product type, and employee.

The all_sales table is created by the store_schema.sql script as follows :

 CREATE TABLE all_sales (year INTEGER NOT NULL,  month INTEGER NOT NULL,  prd_type_id INTEGER  CONSTRAINT all_sales_fk_product_types   REFERENCES product_types(product_type_id),  emp_id INTEGER   CONSTRAINT all_sales_fk_employees2  REFERENCES employees2(employee_id),  amount NUMBER(8, 2),  CONSTRAINT all_sales_pk PRIMARY KEY (year, month, prd_type_id, emp_id)); 

As you can see, the all_sales table contains five columns , which are as follows:

  • year    Stores the year the sales took place

  • month    Stores the month the sales took place (1 to 12)

  • prd_type_id    Stores the product_type_id of the product

  • emp_id    Stores the employee_id of the employee who handled the sales

  • amount    Stores the total dollar amount of the sales

The following output shows the first 13 rows in the all_sales table:

  YEAR   MONTH   PRD_TYPE_ID   EMP_ID   AMOUNT  ---------- ---------- ----------- ---------- ----------    2003   1    1  21 10034.84   2003    2    1   21  15144.65   2003   3  1    21 20137.83   2003   4  1   21 25057.45   2003   5   1   21 17214.56   2003   6   1 21 15564.64    2003   7  1  21 12654.84   2003    8     1   21 17434.82    2003   9   1   21 19854.57   2003     10    1  21  21754.19   2003    11    1  21  13029.73   2003    12    1    21 10034.84   2003   1    1   22  11034.84 
Note  

The all_sales table actually contains a lot more rows than this, but for brevity I ve omitted listing them all here.

Let s examine the ranking functions.

Using the Ranking Functions

You use the ranking functions to calculate ranks, percentiles, and n -tiles. The ranking functions are shown in Table 7-2.

Table 7-2: Ranking Functions

Function

Description

RANK()

Returns the rank of items in a group. RANK() leaves a gap in the sequence of rankings in the event of a tie.

DENSE_RANK()

Returns the rank of items in a group. DENSE_RANK() doesn't leave a gap in the sequence of rankings in the event of a tie.

CUME_DIST()

Returns the position of a specified value relative to a group of values; CUME_DIST() is short for cumulative distribution.

PERCENT_RANK()

Returns the percent rank of a value relative to a group of values.

NTILE()

Returns n -tiles: tertiles, quartiles, and so on.

ROW_NUMBER()

Returns a number with each row in a group.

Let s examine the RANK() and DENSE_RANK() functions first.

Using the RANK() and DENSE_RANK() Functions

You use RANK() and DENSE_RANK() to rank items in a group. The difference between these two functions is in the way they handle items that tie: RANK() leaves a gap in the sequence when there is a tie, but DENSE_RANK() leaves no gaps. For example, if you were ranking sales by product type and two product types tie for first place, RANK() would put the two product types in first place, but the next product type would be in third place. DENSE_RANK() would also put the two product types in first place, but the next product type would be in second place.

The following query illustrates the use of RANK() and DENSE_RANK() to get the ranking of sales by product type for 2003. Notice the use of the keyword OVER in the syntax when calling the RANK() and DENSE_RANK() functions:

  SELECT   prd_type_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,   DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank   FROM all_sales   WHERE year = 2003   AND amount IS NOT NULL   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)  RANK DENSE_RANK ----------- ----------- ---------- ----------    1  905081.84   1    1    2 186381.22   4    4    3  478270.91    2    2      4 402751.16    3   3 

Notice sales for product type #1 are ranked first, sales for product type #2 are ranked fourth, and so on. Because there are no ties, RANK() and DENSE_RANK() return the same ranks.

The all_sales table actually contains nulls in the amount column for all rows whose prd_type_id column is 5, but the previous query omits these rows because of the inclusion of the line AND amount IS NOT NULL in the WHERE clause. The next example includes these rows by leaving out the AND line from the WHERE clause:

  SELECT   prd_type_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,   DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS dense_rank   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)  RANK DENSE_RANK ----------- ----------- ---------- ----------    1 905081.84    2   2   2  186381.22   5    5    3  478270.91   3   3   4 402751.16   4   4      5     1   1 

Notice the last row contains null for the sum of the amount column and RANK() and DENSE_RANK() return 1 for this row. This is because by default RANK() and DENSE_RANK() assign the highest rank of 1 to null values in descending rankings (that is, DESC is used in the OVER clause), and the lowest rank in ascending rankings (that is, ASC is used in the OVER clause).

Controlling Ranking of Null Values Using the NULLS FIRST and NULLS LAST Clauses    When using an analytic function, you can explicitly control whether nulls are the highest or lowest in a group using NULLS FIRST or NULLS LAST . The next example uses NULLS LAST to specify nulls are the lowest:

  SELECT   prd_type_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC NULLS LAST) AS rank,   DENSE_RANK() OVER (ORDER BY SUM(amount) DESC NULLS LAST) AS   dense_rank   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)   RANK DENSE_RANK ----------- ----------- ---------- ----------    1 905081.84    1    1     2 186381.22    4    4     3 478270.91   2    2    4  402751.16    3    3     5      5   5 

Using the PARTITION BY Clause with Analytic Functions    You use the PARTITION BY clause with the analytic functions when you need to divide the groups into subgroups. For example, if you need to subdivide the sales amount by month, you can use PARTITION BY month as shown in the following query:

  SELECT   prd_type_id, month, SUM(amount),   RANK() OVER (PARTITION BY month ORDER BY SUM(amount) DESC) AS rank   FROM all_sales   WHERE year = 2003   AND amount IS NOT NULL   GROUP BY prd_type_id, month   ORDER BY prd_type_id, month;  PRD_TYPE_ID  MONTH SUM(AMOUNT)   RANK ----------- ---------- ----------- ----------    1  1  38909.04  1   1  2  70567.9   1     1 3 91826.98   1     1   4  120344.7  1     1    5 97287.36    1   1   6  57387.84  1     1   7  60929.04     2    1   8  75608.92    1     1  9 85027.42   1   1  10 105305.22   1    1   11  55678.38    1      1  12  46209.04  2     2   1  14309.04  4      2    2  13367.9    4    2     3  16826.98   4    2   4  15664.7   4     2    5  18287.36   4   2   6  14587.84   4    2    7  15689.04    3     2   8 16308.92    4     2    9  19127.42    4      2    10  13525.14    4     2    11  16177.84    4    2   12 12509.04   4   3     1  24909.04    2      3     2   15467.9    3    3   3 20626.98     3    3    4 23844.7    2     3   5  18687.36     3      3     6 19887.84   3     3    7 81589.04    1   3    8  62408.92   2   3   9 46127.42     3    3    10 70325.29   3    3  11  46187.38  2     3  12  48209.04   1    4    1 17398.43    3      4   2   17267.9    2    4     3  31026.98     2     4   4 16144.7    3   4    5  20087.36   2    4   6  33087.84   2   4  7 12089.04   4     4    8 58408.92   3     4  9  49327.42   2    4   10 75325.14    2  4  11  42178.38   3      4   12  30409.05    3 

Using ROLLUP, CUBE, and GROUPING SETS Operators with Analytic Functions    You can use the ROLLUP , CUBE , and GROUPING SETS operators with the analytic functions. The following query uses ROLLUP and RANK() to get the sales rankings by product type ID:

  SELECT   prd_type_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank   FROM all_sales   WHERE year = 2003   GROUP BY ROLLUP(prd_type_id)   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)  RANK ----------- ----------- ----------     1 905081.84    3     2 186381.22   6    3  478270.91   4    4 402751.16  5   5        1      1972485.13     2 

The next query uses CUBE and RANK() to get all rankings of sales by product type ID and employee ID:

  SELECT   prd_type_id, emp_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank   FROM all_sales   WHERE year = 2003   GROUP BY CUBE(prd_type_id, emp_id)   ORDER BY prd_type_id, emp_id;  PRD_TYPE_ID  EMP_ID SUM(AMOUNT)   RANK ----------- ---------- ----------- ----------    1   21 197916.96   19    1   22 214216.96   17   1  23  98896.96    26    1 24 207216.96   18     1  25  93416.96    28    1    26  93417.04  27    1    905081.84     9    2   21  20426.96   40     2   22  19826.96  41    2  23 19726.96   42    2  24 43866.96  34   2   25 32266.96 38   2 26 50266.42   31    2     186381.22  21    3  21 140326.96  22     3   22 116826.96  23    3 23  112026.96  24     3   24 34829.96    36     3   25  29129.96   39    3   26 45130.11 33    3    478270.91   10    4   21 108326.96    25   4  22 81426.96   30     4  23 92426.96   29     4    24  47456.96   32     4    25 33156.96    37     4   26  39956.36   35     4    402751.16   13     5    21       1    5  22        1    5   23       1     5   24        1    5    25        1     5  26          1    5           1       21  466997.84   11      22  432297.84   12        23 323077.84 15      24 333370.84 14        25  187970.84    20       26 228769.93    16      1972485.13  8 

The next query uses GROUPING SETS and RANK() to get just the sales amount subtotal rankings:

  SELECT   prd_type_id, emp_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank   FROM all_sales   WHERE year = 2003   GROUP BY GROUPING SETS(prd_type_id, emp_id)   ORDER BY prd_type_id, emp_id;  PRD_TYPE_ID  EMP_ID SUM(AMOUNT)   RANK ----------- ---------- ----------- ----------    1      905081.84    2   2      186381.22     11    3     478270.91   3     4      402751.16  6     5             1       21 466997.84   4      22  432297.84   5       23  323077.84    8       24 333370.84  7         25 187970.84   10         26  228769.93   9 

Using the CUME_DIST() and PERCENT_RANK() Functions

You use CUME_DIST() to calculate the position of a specified value relative to a group of values; CUME_DIST() is short for cumulative distribution. You use PERCENT_RANK() to calculate the percent rank of a value relative to a group of values.

The following query illustrates the use of CUME_DIST() and PERCENT_RANK() to get the cumulative distribution and percent rank of sales:

  SELECT   prd_type_id, SUM(amount),   CUME_DIST() OVER (ORDER BY SUM(amount) DESC) AS cume_dist,   PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT) CUME_DIST PERCENT_RANK ----------- ----------- ---------- ------------    1 905081.84   .4    .25     2 186381.22    1   1     3 478270.91  .6     .5    4 402751.16  .8     .75    5      .2    0 

Using the NTILE() Function

You use NTILE( buckets ) to calculate n -tiles: tertiles, quartiles, and so on; bucket specifies the number of buckets into which groups of rows are placed. For example, NTILE(2) specifies two buckets and divides the groups of rows in two; NTILE(4) divides the groups into four.

The following query illustrates the use of NTILE() . Notice 4 is passed to NTILE() to split the groups of rows into four buckets:

  SELECT   prd_type_id, SUM(amount),   NTILE(4) OVER (ORDER BY SUM(amount) DESC) AS ntile   FROM all_sales   WHERE year = 2003   AND amount IS NOT NULL   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)   NTILE ----------- ----------- ----------    1  905081.84    1     2  186381.22  4    3 478270.91    2    4 402751.16    3 

Using the ROW_NUMBER() Function

You use ROW_NUMBER() to return a number with each row in a group, starting at 1. The following query illustrates the use of ROW_NUMBER() :

  SELECT   prd_type_id, SUM(amount),   ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS row_number   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT) ROW_NUMBER ----------- ----------- ----------    1  905081.84    2     2 186381.22   5    3  478270.91    3     4 402751.16    4   5      1 

This concludes the discussion of ranking functions.

Using the Inverse Percentile Functions

In the section Using the CUME_DIST() and PERCENT_RANK() Functions, you saw that CUME_DIST() is used to calculate the position of a specified value relative to a group of values. You also saw that PERCENT_RANK() is used to calculate the percent rank of a value relative to a group of values.

In this section, you ll see how you use the inverse percentile functions to get the value that corresponds to a percentile. There are two inverse percentile functions: PERCENTILE_DISC( x ) and PERCENTILE_CONT( x ) . They operate in the reverse sense of CUME_DIST() and PERCENT_RANK() . PERCENTILE_DISC( x ) examines the cumulative distribution values in each group until it finds one that is greater than or equal to x . PERCENTILE_CONT( x ) examines the percent rank values in each group until it finds one that is greater than or equal to x .

The following query illustrates the use of PERCENTILE_CONT() and PERCENTILE_DISC() to get the sum of the amount whose percentile is greater than or equal to 0.6:

  SELECT   PERCENTILE_CONT(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC)   AS percentile_cont,   PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY SUM(amount) DESC)   AS percentile_disc   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id;  PERCENTILE_CONT PERCENTILE_DISC --------------- ---------------   417855.11   402751.16 

If you compare the sum of the amounts shown in these results with those shown in the earlier section Using the CUME_DIST() and PERCENT_RANK() Functions, you ll see the sums correspond to those whose cumulative distribution and percent rank are 0.6 and 0.75, respectively.

Using the Window Functions

You use the window functions to calculate things like cumulative sums and moving averages within a specified range of rows, a range of values, or an interval of time. The term window is used because processing of results involves a sliding range of rows returned by a query.

You can use windows with the following functions: SUM() , AVG() , MAX() , MIN() , COUNT() , VARIANCE() , and STDDEV(); you saw these functions in Chapter 3. You can also use windows with FIRST_VALUE() and LAST_VALUE() , which return the first and last values in a window. (You ll learn more about the FIRST_VALUE() and LAST_VALUE() functions later in the section Getting the First and Last Rows Using FIRST_VALUE() and LAST_VALUE() .)

In this section, you ll see how to perform a cumulative sum, a moving average, and a centered average.

Performing a Cumulative Sum

The following query performs a cumulative sum to compute the cumulative sales amount for 2003 starting with January and ending in December. Notice each monthly sales amount is added to the cumulative amount that grows after each month:

  SELECT   month, SUM(amount) AS month_amount,   SUM(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)   AS cumulative_amount   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT ---------- ------------ -----------------     1 95525.55    95525.55    2 116671.6    212197.15      3  160307.92 372505.07    4   175998.8  548503.87     5  154349.44    702853.31     6  124951.36    827804.67     7  170296.16   998100.83     8  212735.68   1210836.51     9  199609.68   1410446.19    10 264480.79  1674926.98   11  160221.98    1835148.96   12 137336.17  1972485.13 

Notice the previous query uses the following expression to compute the cumulative aggregate:

 SUM(SUM(amount)) OVER  (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS cumulative_amount 

Let s break this expression down:

  • SUM(amount) computes the sum of an amount. The outer SUM() computes the cumulative amount.

  • ORDER BY month orders the rows read by the query by month.

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW defines the start and end of the window. The start includes all rows read by the query as indicated by UNBOUNDED PRECEDING; the end of the window is the current row. CURRENT ROW is actually the default, and I could have implicitly indicated the window size using ROWS UNBOUNDED PRECEDING and the results of the query would be the same.

So the entire expression means compute the cumulative sum of the amount for each month starting at the first row read by the query.

Each row in the window is processed one at a time, starting with the first row in the window. As each row is processed, the current row s amount is added to the cumulative amount and the end of the window moves down to the next row. Processing continues until the last row read by the query is processed .

Caution  

Don t confuse the end of the window with the end of the rows read by the query. The end of the window slides down as each current row read from the query is processed. So, in the example, the window starts off with 1 row, increases by 1 row as each row is processed, and ends with 12 rows.

The next query uses a cumulative sum to compute the cumulative sales amount starting with June and ending in December of 2003. Notice the use of ROWS UNBOUNDED PRECEDING to implicitly indicate the end of the window is the current row:

  SELECT   month, SUM(amount) AS month_amount,   SUM(SUM(amount)) OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) AS   cumulative_amount   FROM all_sales   WHERE year = 2003   AND month BETWEEN 6 AND 12   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT CUMULATIVE_AMOUNT ---------- ------------ -----------------    6  124951.36   124951.36    7  170296.16   295247.52    8  212735.68    507983.2    9  199609.68    707592.88   10  264480.79    972073.67    11  160221.98   1132295.65   12  137336.17   1269631.82 

Performing a Moving Average

The following query computes the moving average of the sales amount between the current month and the previous three months:

  SELECT   month, SUM(amount) AS month_amount,   AVG(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)   AS moving_average   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT MOVING_AVERAGE ---------- ------------ --------------   1  95525.55   95525.55    2   116671.6  106098.575    3 160307.92 124168.357     4  175998.8  137125.968     5  154349.44 151831.94    6  124951.36  153901.88     7  170296.16  156398.94    8  212735.68  165583.16    9  199609.68  176898.22    10  264480.79   211780.578    11  160221.98  209262.033     12 137336.17  190412.155 

Notice the previous query uses the following expression to compute the moving average:

 AVG(SUM(amount)) OVER  (ORDER BY month ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)  AS moving_average 

Let s break this expression down:

  • SUM(amount) computes the sum of an amount. The outer AVG() computes the average.

  • ORDER BY month orders the rows read by the query by month.

  • ROWS BETWEEN 3 PRECEDING AND CURRENT ROW defines the start of the window as including the three rows preceding the current row; the end of the window is the current row. I could have implicitly indicated the window size using ROWS 3 PRECEDING and the results of the query would be the same.

So the entire expression means compute the moving average of the sales amount between the current month and the previous three months. Because for the first two months less than the full three months of data are available, the moving average is based on only the months available.

Both the start and the end of the window begin at row #1 read by the query. The end of the window moves down after each row is processed. The start of the window only moves down once row #4 has been processed, after which time the start of the window moves down after each row is processed. Processing continues until the last row read by the query is processed.

Performing a Centered Average

The following query computes the moving average of the sales amount centered between the previous and next month from the current month:

  SELECT   month, SUM(amount) AS month_amount,   AVG(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS moving_average   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT MOVING_AVERAGE ---------- ------------ --------------    1  95525.55   106098.575     2  116671.6  124168.357     3 160307.92  150992.773    4 175998.8  163552.053     5 154349.44  151766.533     6  124951.36  149865.653   7  170296.16 169327.733     8 212735.68   194213.84  9  199609.68  225608.717   10  264480.79  208104.15    11  160221.98  187346.313    12  137336.17 148779.075 

Notice the previous query uses the following expression to compute the moving average:

 AVG(SUM(amount)) OVER  (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)  AS moving_average 

Let s break this expression down:

  • SUM(amount) computes the sum of an amount. The outer AVG() computes the average.

  • ORDER BY month orders the rows read by the query by month.

  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING defines the start of the window as including the row preceding the current row. The end of the window is the row following the current row.

So the entire expression means compute the moving average of the sales amount between the current month and the previous month. Because for the first and last month less than the full three months of data are available, the moving average is based on only the months available.

The start of the window begins at row #1 read by the query. The end of the window begins at row #2 and moves down after each row is processed. The start of the window only moves down once row #2 has been processed. Processing continues until the last row read by the query is processed.

Getting the First and Last Rows Using FIRST_VALUE() and LAST_VALUE()

You use the FIRST_VALUE() and LAST_VALUE() functions to get the first and last rows in a window. The following query uses FIRST_VALUE() and LAST_VALUE() to get the previous and next month s sales amount:

  SELECT   month, SUM(amount) AS month_amount,   FIRST_VALUE(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS previous_month_amount,   LAST_VALUE(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS next_month_amount   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT PREVIOUS_MONTH_AMOUNT NEXT_MONTH_AMOUNT ---------- ------------ --------------------- -----------------     1  95525.55     95525.55   116671.6    2 116671.6     95525.55   160307.92     3 160307.92       116671.6   175998.8    4  175998.8     160307.92    154349.44    5  154349.44     175998.8    124951.36   6  124951.36    154349.44   170296.16    7  170296.16     124951.36   212735.68     8 212735.68    170296.16    199609.68    9  199609.68     212735.68    264480.79   10  264480.79    199609.68   160221.98     11  160221.98   264480.79   137336.17     12  137336.17     160221.98    137336.17 

The next query divides the current month s sales amount by the previous month s sales amount (labeled as curr_div_prev ), and also divides the current month s sales amount by the next month s sales amount (labeled as curr_div_next ):

  SELECT   month, SUM(amount) AS month_amount,   SUM(amount)/FIRST_VALUE(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS curr_div_prev,   SUM(amount)/LAST_VALUE(SUM(amount)) OVER   (ORDER BY month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS curr_div_next   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT CURR_DIV_PREV CURR_DIV_NEXT ---------- ------------ ------------- -------------    1  95525.55      1  .818755807    2 116671.6  1.22136538  .727796855     3  160307.92  1.37400978  .910846665     4 175998.8  1.09787963  1.14026199    5  154349.44  .876991434 1.23527619    6 124951.36  .809535558  .733729756    7  170296.16  1.36289961  .800505867    8  212735.68  1.24921008  1.06575833    9  199609.68  .93829902  .754722791    10 264480.79   1.3249898 1.65071478    11 160221.98 .605798175  1.16664081    12  137336.17 .857161858    1 

This concludes the discussion of window functions.

Using the Reporting Functions

You use the reporting functions to perform calculations across groups and partitions within groups.

You can perform reporting with the following functions: SUM() , AVG() , MAX() , MIN() , COUNT() , VARIANCE() , and STDDEV() . You can also use the RATIO_TO_REPORT() function to compute the ratio of a value to the sum of a set of values.

In this section, you ll see how to perform a report on a sum and use the RATIO_TO_REPORT() function.

Reporting on a Sum

For the first three months of 2003, the following query reports :

  • The total sum of all sales for all three months (labeled as total_month_amount )

  • The total sum of all sales for all product types (labeled as total_product_type_amount )

      SELECT   month, prd_type_id,   SUM(SUM(amount)) OVER (PARTITION BY month)   AS total_month_amount,   SUM(SUM(amount)) OVER (PARTITION BY prd_type_id)   AS total_product_type_amount   FROM all_sales   WHERE year = 2003   AND month  <  = 3   GROUP BY month, prd_type_id   ORDER BY month, prd_type_id;  MONTH PRD_TYPE_ID TOTAL_MONTH_AMOUNT TOTAL_PRODUCT_TYPE_AMOUNT ---------- ----------- ------------------ -------------------------   1   1   95525.55      201303.92     1    2     95525.55     44503.92    1   3   95525.55    61003.92     1    4   95525.55        65693.31    1    5    95525.55     2     1   116671.6      201303.92    2    2    116671.6     44503.92     2    3   116671.6       61003.92   2   4    116671.6      65693.31     2   5    116671.6    3    1  160307.92       201303.92     3    2    160307.92       44503.92    3    3    160307.92       61003.92    3    4     160307.92      65693.31     3    5    160307.92 

Notice the previous query uses the following expression to report the total sum of all sales for all months (labeled as total_month_amount ).

 SUM(SUM(amount)) OVER (PARTITION BY month)  AS total_month_amount 

Let s break this expression down:

  • SUM(amount) computes the sum of an amount. The outer SUM() computes the total sum.

  • OVER (PARTITION BY month) causes the outer SUM() to compute the sum for each month.

The previous query also uses the following expression to report the total sum of all sales for all product types (labeled as total_product_type_amount ):

 SUM(SUM(amount)) OVER (PARTITION BY prd_type_id)  AS total_product_type_amount 

Let s break this expression down:

  • SUM(amount) computes the sum of an amount. The outer SUM() computes the total sum.

  • OVER (PARTITION BY prd_type_id) causes the outer SUM() to compute the sum for each product type.

Using the RATIO_TO_REPORT() Function

You use the RATIO_TO_REPORT() function to compute the ratio of a value to the sum of a set of values.

For the first three months of 2003, the following query reports:

  • The sum of the sales amount by product type for each month (labeled as prd_type_amount )

  • The ratio of the product type s sales amount for the entire month s sales (labeled as prd_type_ratio ), which is computed using RATIO_TO_REPORT()

      SELECT   month, prd_type_id,   SUM(amount) AS prd_type_amount,   RATIO_TO_REPORT(SUM(amount)) OVER (PARTITION BY month)   AS prd_type_ratio   FROM all_sales   WHERE year = 2003   AND month  <  = 3   GROUP BY month, prd_type_id   ORDER BY month, prd_type_id;  MONTH PRD_TYPE_ID PRD_TYPE_AMOUNT PRD_TYPE_RATIO ---------- ----------- --------------- --------------    1     1   38909.04   .40731553    1     2   14309.04  .149792804   1    3    24909.04  .260757881   1   4  17398.43  .182133785   1    5     2   1   70567.9  .604842138    2    2   13367.9  .114577155    2    3   15467.9 .132576394    2   4   17267.9  .148004313    2    5     3    1   91826.98  .57281624     3    2   16826.98  .104966617    3     3   20626.98 .128670998    3    4   31026.98  .193546145    3     5 

Notice the previous query uses the following expression to compute the ratio (labeled as prd_type_ratio ):

 RATIO_TO_REPORT(SUM(amount)) OVER (PARTITION BY month)  AS prd_type_ratio 

Let s break this expression down:

  • SUM(amount) computes the sum of the amount.

  • OVER (PARTITION BY month) causes the outer SUM() to compute the sum for each month.

  • The ratio is computed by dividing the sum of the amount for each product type by the sum of the amount for the entire amount.

This concludes the discussion of reporting functions.

Using the LAG() and LEAD() Functions

You use the LAG() and LEAD() functions to get a value in a row where that row is a certain number of rows away from the current row. The following query uses LAG() and LEAD() to get the previous and next month s sales amount:

  SELECT   month, SUM(amount) AS month_amount,   LAG(SUM(amount), 1) OVER (ORDER BY month) AS previous_month_amount,   LEAD(SUM(amount), 1) OVER (ORDER BY month) AS next_month_amount   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  MONTH MONTH_AMOUNT PREVIOUS_MONTH_AMOUNT NEXT_MONTH_AMOUNT ---------- ------------ --------------------- -----------------    1  95525.55             116671.6     2  116671.6     95525.55  160307.92    3  160307.92      116671.6     175998.8     4   175998.8     160307.92    154349.44     5  154349.44      175998.8    124951.36    6  124951.36      154349.44    170296.16    7  170296.16     124951.36   212735.68     8  212735.68      170296.16    199609.68   9  199609.68    212735.68   264480.79     10  264480.79    199609.68    160221.98    11  160221.98    264480.79   137336.17   12  137336.17      160221.98 

Notice the previous query uses the following expressions to get the previous and next month s sales:

 LAG(SUM(amount), 1) OVER (ORDER BY month) AS previous_month_amount, LEAD(SUM(amount), 1) OVER (ORDER BY month) AS next_month_amount 

LAG(SUM(amount), 1) gets the previous row s sum of the amount. LEAD(SUM(amount), 1) gets the next row s sum of the amount.

Using the FIRST and LAST Functions

You use the FIRST and LAST functions to get the first and last values in an ordered group. You can use FIRST and LAST with the following functions: MIN() , MAX() , COUNT() , SUM() , AVG() , STDDEV() , and VARIANCE() .

The following query uses FIRST and LAST to get the months in 2003 that had the highest and lowest sales:

  SELECT   MIN(month) KEEP (DENSE_RANK FIRST ORDER BY SUM(amount))   AS highest_sales_month,   MIN(month) KEEP (DENSE_RANK LAST ORDER BY SUM(amount))   AS lowest_sales_month   FROM all_sales   WHERE year = 2003   GROUP BY month   ORDER BY month;  HIGHEST_SALES_MONTH LOWEST_SALES_MONTH ------------------- ------------------       1      10 

Using the Linear Regression Functions

You use the linear regression functions to fit an ordinary-least-squares regression line to a set of number pairs. You can use the linear regression functions as aggregate, windowing, or reporting functions. The following table shows the linear regression functions. In the function syntax, y is interpreted by the functions as a variable that depends on x .

Function

Description

REGR_AVGX( y , x )

Returns the average of x after eliminating x and y pairs where either x or y is null.

REGR_AVGY( y , x )

Returns the average of y after eliminating x and y pairs where either x or y is null.

REGR_COUNT( y , x )

Returns the number of non-null number pairs that are used to fit the regression line.

REGR_INTERCEPT( y , x )

Returns the intercept on the y -axis of the regression line.

REGR_R2( y , x )

Returns the coefficient of determination, or R-squared, of the regression line.

REGR_SLOPE( y , x )

Returns the slope of the regression line.

REGR_SXX( y , x )

Returns REG_COUNT( y , x ) * VAR_POP( x ) .

REGR_SXY( y , x )

Returns REG_COUNT( y , x ) * COVAR_POP( y , x ) .

REGR_SYY( y , x )

Returns REG_COUNT( y , x ) * VAR_POP( y ) .

The following query shows the use of the linear regression functions:

  SELECT   prd_type_id,   REGR_AVGX(amount, month) AS avgx,   REGR_AVGY(amount, month) AS avgy,   REGR_COUNT(amount, month) AS count,   REGR_INTERCEPT(amount, month) AS inter,   REGR_R2(amount, month) AS r2,   REGR_SLOPE(amount, month) AS slope,   REGR_SXX(amount, month) AS sxx,   REGR_SXY(amount, month) AS sxy,   REGR_SYY(amount, month) AS syy   FROM all_sales   WHERE year = 2003   GROUP BY prd_type_id;  PRD_TYPE_ID  AVGX  AVGY COUNT   INTER  R2 ----------- ---------- ---------- ---------- ---------- ----------   SLOPE   SXX   SXY   SYY ---------- ---------- ---------- ----------    1    6.5 12570.5811  72 13318.4543 .003746289 -115.05741  858 -98719.26 3031902717      2  6.5 2588.62806    72 2608.11268  .0000508  -2.997634   858 -2571.97 151767392    3  6.5 6642.65153   72 2154.23119 .126338815 690.526206   858 592471.485 3238253324     4   6.5 5593.76611   72 2043.47164 .128930297 546.199149   858 468638.87 1985337488     5           0 

Using the Hypothetical Rank and Distribution Functions

You use the hypothetical rank and distribution functions to calculate the rank and percentile that a new row would have if you inserted it into a table. You can perform hypothetical calculations with the following functions: RANK() , DENSE_RANK() , PERCENT_RANK() , and CUME_DIST() .

Before you see an example of a hypothetical, the following query uses RANK() and PERCENT_RANK() to get the rank and percent rank of sales by product type for 2003:

  SELECT   prd_type_id, SUM(amount),   RANK() OVER (ORDER BY SUM(amount) DESC) AS rank,   PERCENT_RANK() OVER (ORDER BY SUM(amount) DESC) AS percent_rank   FROM all_sales   WHERE year = 2003   AND amount IS NOT NULL   GROUP BY prd_type_id   ORDER BY prd_type_id;  PRD_TYPE_ID SUM(AMOUNT)   RANK PERCENT_RANK ----------- ----------- ---------- ------------    1  905081.84   1    0     2 186381.22    4    1    3 478270.91    2  .333333333      4 402751.16   3 .666666667 

The next query shows the hypothetical rank and percent rank of a sales amount of $500,000:

  SELECT   RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC)   AS rank,   PERCENT_RANK(500000) WITHIN GROUP (ORDER BY SUM(amount) DESC)   AS percent_rank   FROM all_sales   WHERE year = 2003   AND amount IS NOT NULL   GROUP BY prd_type_id   ORDER BY prd_type_id;  RANK PERCENT_RANK ---------- ------------     2   .25 

As you can see, the hypothetical rank and percent rank of a sales amount of $500,000 are 2 and .25.

This concludes the discussion of hypothetical functions.




Oracle Database 10g SQL
Oracle Database 10g SQL (Osborne ORACLE Press Series)
ISBN: 0072229810
EAN: 2147483647
Year: 2004
Pages: 217

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net