14.3 Windowing Functions

   

The ranking functions described thus far are quite useful when comparing items within a fixed window of time, such as "last year" or "second quarter." But what if you want to perform computations using a window that slides as you progress through the data set? Oracle's windowing functions allow aggregates to be calculated for each row in a result set based on a specified window. The aggregation window can be defined in one of three ways:

  • By specifying a set of rows: "From the current row to the end of the partition"

  • By specifying a time interval: "For the 30 days preceding the transaction date"

  • By specifying a range of values: "All rows having a transaction amount within 5% of the current row's transaction amount"

The first set of examples will generate a window that fills the entire partition, and then show how the window can be detached from one or both ends of the partition so that it floats with the current row. All of the examples will be based on the following query, which calculates total monthly sales in 2001 for the Mid-Atlantic region:

SELECT month,    SUM(tot_sales) monthly_sales FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES ---------- -------------          1        610697          2        428676          3        637031          4        541146          5        592935          6        501485          7        606914          8        460520          9        392898         10        510117         11        532889         12        492458

The first step is to sum the monthly sales for the entire result set by specifying an "unbounded" window. Note the ROWS BETWEEN clause in the following example:

SELECT month,    SUM(tot_sales) monthly_sales,   SUM(SUM(tot_sales)) OVER (ORDER BY month      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) total_sales FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES TOTAL_SALES ---------- ------------- -----------          1        610697     6307766          2        428676     6307766          3        637031     6307766          4        541146     6307766          5        592935     6307766          6        501485     6307766          7        606914     6307766          8        460520     6307766          9        392898     6307766         10        510117     6307766         11        532889     6307766         12        492458     6307766

Each time the function executes, it sums the monthly sales from months 1 through 12; thus, the same calculation is being performed 12 times. This is a rather inefficient way to generate the yearly sales total (see Section 14.4 later in this chapter for a better method), but it should give you an idea of the syntax for building an aggregation window. The next query will create a window that spans from the top of the partition to the current row. The function identifies the month that has the maximum sales, up to and including the current month:

SELECT month,    SUM(tot_sales) monthly_sales,   MAX(SUM(tot_sales)) OVER (ORDER BY month     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) max_preceeding FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES MAX_PRECEEDING ---------- ------------- --------------          1        610697         610697          2        428676         610697          3        637031         637031          4        541146         637031          5        592935         637031          6        501485         637031          7        606914         637031          8        460520         637031          9        392898         637031         10        510117         637031         11        532889         637031         12        492458         637031

Unlike the first query, which has a window size fixed at 12 rows, this query's aggregation window grows from a single row for month 1 to 12 rows for month 12. The keywords CURRENT ROW are used to indicate that the window should end at the current row being inspected by the function. If you replace MAX in the previous query with SUM, you can calculate a running total:

SELECT month,    SUM(tot_sales) monthly_sales,   SUM(SUM(tot_sales)) OVER (ORDER BY month     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) running_total FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES RUNNING_TOTAL ---------- ------------- --------------          1        610697         610697          2        428676        1039373          3        637031        1676404          4        541146        2217550          5        592935        2810485          6        501485        3311970          7        606914        3918884          8        460520        4379404          9        392898        4772302         10        510117        5282419         11        532889        5815308         12        492458        6307766

You have now seen examples using windows that are fixed at one or both ends. The next query will define a window that floats freely with each row:

SELECT month,    SUM(tot_sales) monthly_sales,   AVG(SUM(tot_sales)) OVER (ORDER BY month      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- -----------          1        610697    519686.5          2        428676  558801.333          3        637031  535617.667          4        541146  590370.667          5        592935  545188.667          6        501485  567111.333          7        606914      522973          8        460520  486777.333          9        392898  454511.667         10        510117  478634.667         11        532889  511821.333         12        492458    512673.5

For each of the 12 rows, the function calculates the average sales of the current month, the previous month, and the following month. The value of the ROLLING_AVG column is therefore the average sales within a three month floating window centered on the current month, with the exception that months 1 and 12 are calculated using a two-month window, since there is no previous month for month 1 or following month for month 12.

14.3.1 Working with Ranges

The previous windowing examples use the ROWS BETWEEN option to specify which rows to include in the aggregation. You may alternately specify a range and let Oracle determine which rows lie within the range. For example, the previous query used ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to generate a three-month rolling average; the same results can be achieved by substituting RANGE for ROWS:

SELECT month,   SUM(tot_sales) monthly_sales,   AVG(SUM(tot_sales)) OVER (ORDER BY month   RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001   AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- -----------          1        610697    519686.5          2        428676  558801.333          3        637031  535617.667          4        541146  590370.667          5        592935  545188.667          6        501485  567111.333          7        606914      522973          8        460520  486777.333          9        392898  454511.667         10        510117  478634.667         11        532889  511821.333         12        492458    512673.5

This substitution works because the month column contains integer values, so adding and subtracting 1 from the current month yields a three-month range. The next variation achieves the same results but specifies a range of +/- 1.999:

SELECT month,   SUM(tot_sales) monthly_sales,   AVG(SUM(tot_sales)) OVER (ORDER BY month   RANGE BETWEEN 1.999 PRECEDING AND 1.999 FOLLOWING) rolling_avg FROM orders WHERE year = 2001   AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES ROLLING_AVG ---------- ------------- -----------          1        610697    519686.5          2        428676  558801.333          3        637031  535617.667          4        541146  590370.667          5        592935  545188.667          6        501485  567111.333          7        606914      522973          8        460520  486777.333          9        392898  454511.667         10        510117  478634.667         11        532889  511821.333         12        492458    512673.5

If you are generating a window based on a DATE column, you can specify a range in increments of days, months, or years. Since the orders table has no DATE columns, the next example shows how a date range can be specified against the order_dt column of the cust_order table:

SELECT TRUNC(order_dt) day,   SUM(sale_price) daily_sales,   AVG(SUM(sale_price)) OVER (ORDER BY TRUNC(order_dt)   RANGE BETWEEN INTERVAL '2' DAY PRECEDING      AND INTERVAL '2' DAY FOLLOWING) five_day_avg FROM cust_order WHERE sale_price IS NOT NULL    AND order_dt BETWEEN TO_DATE('01-JUL-2001','DD-MON-YYYY')   AND TO_DATE('31-JUL-2001','DD-MON-YYYY') GROUP BY TRUNC(order_dt); DAY       DAILY_SALES FIVE_DAY_AVG --------- ----------- ------------ 16-JUL-01         112          146 18-JUL-01         180          114 20-JUL-01          50          169 21-JUL-01          50   165.333333 22-JUL-01         396   165.333333

This query generates a five-day rolling window by specifying a range of +/- two days around the truncated order date.

14.3.2 FIRST_VALUE and LAST_VALUE

Oracle provides two additional aggregate functions, called FIRST_VALUE and LAST_VALUE, that can be used with windowing functions to identify the values of the first and last values in the window. In the case of the three-month rolling average query shown previously, you could display the values of all three months along with the average of the three, as in:

SELECT month,   FIRST_VALUE(SUM(tot_sales)) OVER (ORDER BY month      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) prev_month,   SUM(tot_sales) monthly_sales,   LAST_VALUE(SUM(tot_sales)) OVER (ORDER BY month      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) next_month,   AVG(SUM(tot_sales)) OVER (ORDER BY month      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) rolling_avg FROM orders WHERE year = 2001    AND region_id = 6 GROUP BY month ORDER BY month;      MONTH PREV_MONTH MONTHLY_SALES NEXT_MONTH ROLLING_AVG ---------- ---------- ------------- ---------- -----------          1     610697        610697     428676    519686.5          2     610697        428676     637031  558801.333          3     428676        637031     541146  535617.667          4     637031        541146     592935  590370.667          5     541146        592935     501485  545188.667          6     592935        501485     606914  567111.333          7     501485        606914     460520      522973          8     606914        460520     392898  486777.333          9     460520        392898     510117  454511.667         10     392898        510117     532889  478634.667         11     510117        532889     492458  511821.333         12     532889        492458     492458    512673.5

These functions are useful for queries that compare each value to the first or last value in the period, such as: "How did each month's sales compare to the first month?"

14.3.3 LAG/LEAD Functions

Although not technically windowing functions, the LAG and LEAD functions are included here because they allow rows to be referenced by their position relative to the current row, much like the PRECEDING and FOLLOWING clauses within windowing functions. LAG and LEAD are useful for comparing one row of a result set with another row of the same result set. For example, the query "Compute the total sales per month for the Mid-Atlantic region, including the percent change from the previous month" requires data from both the current and preceding rows to calculate the answer. This is, in effect, a two-row window, but the offset from the current row can be specified as one or more rows, making LAG and LEAD act like specialized windowing functions where only the outer edges of the window are utilized.

Here is the SQL that uses the LAG function to generate the data needed to answer the question posed in the previous paragraph:

SELECT month,    SUM(tot_sales) monthly_sales,   LAG(SUM(tot_sales), 1) OVER (ORDER BY month) prev_month_sales FROM orders WHERE year = 2001   AND region_id = 6 GROUP BY month ORDER BY month;      MONTH MONTHLY_SALES PREV_MONTH_SALES ---------- ------------- ----------------          1        610697          2        428676           610697          3        637031           428676          4        541146           637031          5        592935           541146          6        501485           592935          7        606914           501485          8        460520           606914          9        392898           460520         10        510117           392898         11        532889           510117         12        492458           532889

As you might expect, the LAG value for month 1 is NULL, since there is no preceding month. This would also be the case for the LEAD value for month 12. If you would like the LAG and LEAD functions to return a non-NULL value for these cases, you can specify a substitute value via the optional third parameter (see next example).

The next query utilizes the output from the previous query to generate the percentage difference from month to month. Note how a third parameter has been specified for the LAG function so that month 1 will use the current month's sales instead of a NULL value for the percentage change:

SELECT months.month month, months.monthly_sales monthly_sales,   ROUND((months.monthly_sales - months.prev_month_sales) /     months.prev_month_sales, 3) * 100 percent_change FROM  (SELECT month,      SUM(tot_sales) monthly_sales,     LAG(SUM(tot_sales), 1, SUM(tot_sales))        OVER (ORDER BY month) prev_month_sales   FROM orders   WHERE year = 2001     AND region_id = 6   GROUP BY month) months ORDER BY month;      MONTH MONTHLY_SALES PERCENT_CHANGE ---------- ------------- --------------          1        610697              0          2        428676          -29.8          3        637031           48.6          4        541146          -15.1          5        592935            9.6          6        501485          -15.4          7        606914             21          8        460520          -24.1          9        392898          -14.7         10        510117           29.8         11        532889            4.5         12        492458           -7.6



Mastering Oracle SQL
Mastering Oracle SQL, 2nd Edition
ISBN: 0596006322
EAN: 2147483647
Year: 2003
Pages: 154

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net