Section 3.2. Cursors, Pipelining, and Nesting

3.2. Cursors, Pipelining, and Nesting

So far, table functions may strike you as a performance booster that might help you out occasionally when you're in a tight spot. The features described in this section, howeverthe use of cursors, pipelining, and nesting with table functionsare powerful enough that they might persuade you to actually tailor your code to make use of them.

Cursors: The omnipresent cursor makes an appearance in table functions as both a parameter datatype and a SQL function to allow SELECT statements to be passed right into a table function to be processed.
Pipelining: This feature allows a table function to send its results back one by one, rather than having to assemble a full result set. The effect is to allow downstream processing to begin much faster. Going back to the cancer research example I mentioned earlier, what if the function had to parse 100 results that each took three seconds? That means that any downstream processing would have to wait five minutes before it could begin. With pipelined functions, the downstream work would begin after only three seconds.
Nesting: Table functions can be nested to perform multiple tasks on data. This is especially powerful for Data Warehouse ETL work.

I think the best way to demonstrate all of these capabilities is with an example from a Data Warehouse ETL process that extracts information about work orders. The specific function I'll show extracts components of the creation, assignment, and close date of the orders. These components are then passed on for further processing in the ETL.

3.2.1. Cursors

We looked carefully at cursors in Chapter 2, and here they are again! How do cursors and table functions interact? Let's start by looking at a non-pipelined version of a table function.

     /* File on web: date_parser.sql */     CREATE OR REPLACE FUNCTION date_parse ( p_curs SYS_REFCURSOR )                       RETURN order_date_t AS       v_order_rec orders%ROWTYPE;       v_ret_val   order_date_t := order_date_t(  );     BEGIN       -- for every order in the cursor...       LOOP         FETCH p_curs INTO v_order_rec;         EXIT WHEN p_curs%NOTFOUND;         -- extend the array by 3 and populate with cmoponents of the         -- orders creation, assignment and close date         v_ret_val.EXTEND(3);         v_ret_val(v_ret_val.LAST - 2) := order_date_o(v_order_rec.order_number,                                                       'O',                                         TO_CHAR(v_order_rec.create_date,'YYYY'),                                         TO_CHAR(v_order_rec.create_date,'Q'),                                         TO_CHAR(v_order_rec.create_date,'MM'));         v_ret_val(v_ret_val.LAST - 1) := order_date_o(v_order_rec.order_number,                                                       'A',                                         TO_CHAR(v_order_rec.assign_date,'YYYY'),                                         TO_CHAR(v_order_rec.assign_date,'Q'),                                         TO_CHAR(v_order_rec.assign_date,'MM'));         v_ret_val(v_ret_val.LAST) := order_date_o(v_order_rec.order_number,                                                   'C',                                         TO_CHAR(v_order_rec.close_date,'YYYY'),                                         TO_CHAR(v_order_rec.close_date,'Q'),                                         TO_CHAR(v_order_rec.close_date,'MM'));       END LOOP;  -- every order in ths cursor       RETURN(v_ret_val);     END;

And here are the results when three orders are queried.

     ORDER_NUMBER D       YEAR    QUARTER      MONTH     ------------ - ---------- ---------- ----------                1 O       2005          3          8                1 A       2005          3          8                1 C       2005          3          8                2 O       2005          4         10                2 A       2005          4         10                2 C       2005          4         10                3 O       2005          4         12                3 A       2005          4         12                3 C       2005          4         12

You'll notice that I said three orders were queried, but I was careful not to say where they were queried from. The answer is that they are queried from the cursor passed to the function as shown here.

     SELECT *       FROM TABLE(date_parse(CURSOR(SELECT *                                      FROM orders)));

The TABLE keyword indicates the call to the date_parse function. The CURSOR keyword indicates that the text that follows (in parentheses) is to be used as the cursor in the table function. It is implicitly opened, and the function fetches records from it. It is implicitly closed when the table function goes out of scope. The ability to pass cursor text to the function is extremely powerful because all it requires is valid SQL. It can be called (and hence reused) in many places. For example, if I want to process only today's orders, I would call it like this:

     SELECT *       FROM TABLE(date_parse(CURSOR(SELECT *                                      FROM orders                                     WHERE create > TRUNC(SYSDATE))));

If I want to further restrict it to orders from a specific region, I would add that restriction to the SELECT statement as follows:

     SELECT *       FROM TABLE(date_parse(CURSOR(SELECT *                                      FROM orders                                     WHERE create > TRUNC(SYSDATE)                                       AND region_id = 33)));

The only restriction on the SELECT statement here is it has to select every column from the ORDERS table because within the table function, the record variable it selects into is declared as such (orders%ROWTYPE). You don't always have to be this restrictive, however, and there are plenty of options for matching up SELECT statements, cursor parameter types, and local variables. I'll discuss those options later in this chapter.

3.2.2. Pipelined Table Functions

A pipelined table function is one that returns a result set as a collection, but it does so iteratively. In other words, Oracle does not wait for the function to run to completion, storing all the rows it computes in the PL/SQL collection before returning it. Instead, as each row is ready to be assigned to the collection, it is "piped out" of the function. Let's see a pipelined table function in action.

     /* File on web: date_parser_pipelined.sql */     CREATE OR REPLACE FUNCTION date_parse ( p_curs SYS_REFCURSOR )                       RETURN order_date_t                       PIPELINED AS       v_order_rec orders%ROWTYPE;     BEGIN       -- for every order in the cursor...       LOOP         FETCH p_curs INTO v_order_rec;         EXIT WHEN p_curs%NOTFOUND;         -- pipe out the components of the orders open data         PIPE ROW(order_date_o(v_order_rec.order_number,                                'O',                                TO_CHAR(v_order_rec.create_date,'YYYY'),                                TO_CHAR(v_order_rec.create_date,'Q'),                                TO_CHAR(v_order_rec.create_date,'MM')));         -- pipe out the components of the orders assign date         PIPE ROW(order_date_o(v_order_rec.order_number,                               'A',                               TO_CHAR(v_order_rec.assign_date,'YYYY'),                               TO_CHAR(v_order_rec.assign_date,'Q'),                               TO_CHAR(v_order_rec.assign_date,'MM')));         -- pipe out the components of the orders close date         PIPE ROW(order_date_o(v_order_rec.order_number,                               'C',                               TO_CHAR(v_order_rec.close_date,'YYYY'),                               TO_CHAR(v_order_rec.close_date,'Q'),                               TO_CHAR(v_order_rec.close_date,'MM')));       END LOOP;  -- every order in the cursor       RETURN;     END;

There are four main syntax changes between the non-pipelined version and the pipelined version:

The PIPELINED keyword is added to the function's header to tell Oracle to pipe results back as requested, rather than assembling a full result set and returning it.
The PIPE ROW command indicates the point at which the function sends back a single result.
The poor lonely RETURN keyword is left with nothing to do but...return. All of the results will already have been piped back via the PIPE ROW command.
The datatype returned (order_date_o) differs from the return datatype declared for the function (order_date_t). Though syntactically different, they must be related, as discussed in the next paragraph.

Oracle won't allow just any old datatype to be piped back from any old table function. The collection specified as the return type of the function must be of the object type. The two datatypes declared for my example were declared like this:

     CREATE OR REPLACE TYPE order_date_o AS OBJECT ( order_number NUMBER,                                                     date_type    VARCHAR2(1),                                                     year         NUMBER,                                                     quarter      NUMBER,                                                     month        NUMBER );     /     CREATE TYPE order_date_t AS TABLE OF order_date_o;     /

To demonstrate how much pipelining helps in this situation, I'll beef up the number of orders to 10,000 and run the following query against both the non-pipelined and the pipelined versions of the function.

     SELECT *       FROM TABLE(date_parse(CURSOR(SELECT *                                      FROM orders)))      WHERE ROWNUM <= 10;

This query is perfect for this situation because it will show exactly how long the function took to send back its 10th result. And the winner is...

In second place with a respectable time of 2.73 seconds is the non-pipelined version.
In first place with an astounding duration of 0.07 seconds is the pipelined version.

The pipelined version runs 2.66 seconds or almost 93% faster. More fulfilling from a DBA point of view is that behind the scenes the database had to spend 93% less time maintaining a read-consistent copy of the ORDERS table to satisfy the query, and there is 93% less chance of requiring excess physical reads because the query ran so long. I could go on and on.

Even more pleasing from a DBA point of view is the reduction in the amount of Oracle session memory required to execute the pipelined version. For the sake of simplicity here, I'll restrict the concept of session memory to the User Global Area (UGA) and the Process Global Area (PGA) . Table 3-1 shows a comparison between the UGA and the PGA required to execute both versions of the function. These stats were gathered after signing onto the database and executing the pipelined and non-pipelined functions once each.

Table 3-1. Pipelined versus non-pipelined Oracle session memory
	Non-pipelined	Pipelined	Difference
UGA maximum	7,105,168	90,284	7,014,884
PGA maximum	12,815,736	242,708	2,573,028

That's a 98% reduction in both session UGA and session PGAthe database has less to keep track of, so it can spend more time being fast.

3.2.3. Nested Table Functions

The nesting of table functions refers to the execution of several table functions in a nested manner: the results of one are sent to the next for further processing, and those results are sent on to the next, and so on. This process is sometimes known as daisy-chaining . Combined with pipelining, the nesting of table functions provides a particularly powerful technique for ETL processing.

I'll demonstrate by coding a function that accepts the results of my date_parse table function via a cursor and performs an operation on them. To avoid the clutter of explaining a complex ETL transformation, we'll stick to a simple one by adding up the order_number, year, quarter, and month values queried. Here's the function.

     /* File on web: next_in_line.sql */     CREATE OR REPLACE FUNCTION next_in_line ( p_curs SYS_REFCURSOR )                       RETURN next_t                       PIPELINED IS       v_ret_val next_t := next_t(  );       -- local variables for cursor results       v_on NUMBER;       v_dt VARCHAR2(1);       v_yr NUMBER;       v_qt NUMBER;       v_mt NUMBER;     BEGIN       -- for all date components from the cursor...       LOOP         FETCH p_curs INTO v_on, v_dt, v_yr, v_qt, v_mt;         EXIT WHEN p_curs%NOTFOUND;         -- pipe out the sum of the components         PIPE ROW(next_o(v_on + v_yr + v_qt + v_mt));       END LOOP;  -- every date component       RETURN;     END;

We've already covered most of the syntax in this example. Only the syntax used to nest the two functions together in a query is new:

     SELECT *       FROM TABLE(next_in_line                  (CURSOR                   (SELECT *                      FROM TABLE(date_parse                                 (CURSOR                                  (SELECT *                                     FROM orders))))));

With my nifty indentation, it even looks nested with the embedded TABLE and CURSOR keywords. Simply put, what this example does is start the date_parse function, which pipes results to the next_in_line function, which pipes them to the world.

Section 3.2. Cursors, Pipelining, and Nesting

3.2. Cursors, Pipelining, and Nesting

3.2.1. Cursors

3.2.2. Pipelined Table Functions

Table 3-1. Pipelined versus non-pipelined Oracle session memory

3.2.3. Nested Table Functions