Section 3.4. Using Table Functions | Oracle PL(s)SQL For DBAs

3.4. Using Table Functions

In this section, I'll show the use of table functions in a real-world example based on a large cable company's attempt to track repeat orders. Simply put, a repeat order occurs when a technician is dispatched to a location more than once within a 30-day period to perform the same type of work or to repair the original work. A repeat visit doesn't always have a negative connotationit could result from an installation occurring within 30 days of a pre-sales call.

Locations are identified by preassigned ID values. These can map to anything from a specific neighborhood cable outlet to a house or a large shopping mall. The type of work is identified using order type IDs. For example, type 1 might be "Cable Installation," and type 2 might be "Cable Upgrade."

The company is divided into several regions, each with its own set of location codes, order types, and repeat-order criteria. These criteria define sets of two order types that must occur at the same location within 30 days of each other to be considered a repeat order. The two order types may or may not be the same; for example, a "Cable Installation" followed by a subsequent "Cable Repair" might be deemed a repeat order in the same way that two "Cable Installations" would.

The criteria are held in this table:

     SQL> DESC repeat_order_criteria      Name                                      Null?    Type      ----------------------------------------- -------- ------      REGION_ID                                          NUMBER      START_DATE                                         DATE      FIRST_TYPE_ID                                      NUMBER      REPEAT_TYPE_ID                                     NUMBER

The table holds each region's repeat-order definitions, including the date they are in effect. Here's an example record.

     SQL> SELECT *       2    FROM repeat_order_criteria;      REGION_ID START_DAT FIRST_TYPE_ID REPEAT_TYPE_ID     ---------- --------- ------------- --------------              1 19-APR-05            44            102

This record states that any order of type 44 followed by an order of type 102 at the same location in region 1 within 30 days constitutes a repeat order.

The ORDERS table contains the fields necessary to determine an order's "repeatedness."

     SQL> DESC orders      Name                                      Null?    Type      ----------------------------------------- -------- ------      ORDER_NUMBER                              NOT NULL NUMBER      ORDER_DATE                                NOT NULL DATE      REGION_ID                                 NOT NULL NUMBER      TYPE_ID                                   NOT NULL NUMBER      LOCATION_ID                               NOT NULL NUMBER

The requirement is made a little more complex by the fact that this company processes tens of thousands of orders every day, and the repeat-order criteria can change at any time, so a new result set has to be available quickly. The large number of records combined with the serialized processing of one record at a time to create the result set makes this application a perfect candidate for table functions.

3.4.1. The Function Header

I'll start things off by explaining the header for the function I'll write.

     1 CREATE FUNCTION repeat_order_finder ( p_curs cursors.repeat_orders_curs )     2                 RETURN repeat_region_location_t     3                 PIPELINED     4                 PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(region_id) )     5                 ORDER p_curs BY (location_id, order_date) IS

As far as function headers go, this one has a lot to say, so let's look at it line by line.

Line 1

States the function name and its parametera strongly typed REF cursor declared in another package like this:

 CREATE OR REPLACE PACKAGE cursors AS   TYPE repeat_orders_rec IS RECORD (order_number NUMBER,                                     order_date   DATE,                                     region_id    NUMBER,                                     type_id      NUMBER,                                     location_id  NUMBER );   TYPE repeat_orders_curs IS REF CURSOR RETURN repeat_orders_rec; END;

When the function is executed, I'll be passing it a SELECT statement getting all orders from the past 30 days.

Line 2

Explains the structure of the rows this function will return. It was created using the following SQL defining an object and a collection:

 CREATE TYPE repeat_region_location_o AS OBJECT ( region_id      NUMBER,                                                  location_id    NUMBER,                                                  first_type_id  NUMBER,                                                  repeat_type_id NUMBER ); / CREATE TYPE repeat_region_location_t AS TABLE OF repeat_region_location_o; /

Line 3

Explains that this function will pipe rows back upstream as soon as they are calculated.

Line 4

Defines the way that records from the passed-in REF cursor will be partitioned across the multiple parallel instances of this function. They are to be partitioned by values in the REGION_ID column. This means that all values for a particular region will be processed by the same function instance. It does not mean there will be one instance per region. Oracle will allocate available PQ servers as it sees fit to run instances of the function. This means that a single instance may process more than one region.

Line 5

States that, within each function instance, the records are to be further ordered by their LOCATION_ID and ORDER_DATE values.

Just working through the header has exposed most of the power behind the table function. This function allows me to throw the power of parallel processing at what would have been serial processing of tens of thousands of records. It even allows me to define what records go to each parallel function instance so I can make some assumptions in my code. Topping that off with not having to wait for all records to be processed before sending results upstream makes me think I've achieved nirvana!

Enough time on the pulpit! Let's get back to the function.

3.4.2. The Basic Loop

The main algorithm of the function is a loop that fetches records from the REF cursor as shown in this pseudo-code.

     BEGIN       -- for every order...       LOOP         FETCH p_curs INTO v_order;         EXIT WHEN p_curs%NOTFOUND;         IF it's a repeat order then           PIPE ROW(  );         END IF;       END LOOP;  -- every order       RETURN;     END;

It's very straightforward. Just fetch records from the passed-in cursor and evaluate them against repeat-order criteria. If a match is found, then pipe a record upstream. Now I'll add in the loading of the regions' repeat-order criteria.

3.4.3. Bulk Fetching the Criteria

Because my table function guarantees that records will be grouped by region, I can safely assume that any time the region ID value of a fetched record changes, I will have moved on to the next region. And the first thing I want to do for each region is load its repeat-order criteria into a PL/SQL associative array. I'll accomplish this using a cursor as shown here.

     CURSOR curs_get_criteria ( cp_region NUMBER ) IS       SELECT *         FROM repeat_order_criteria        WHERE region_id = cp_region;

Then, within the function, I implement a simple "last region id" check to see when the value changes, and, if it does, I bulk load the criteria:

     -- if it's a new region...     IF NVL(v_last_region,0) <> v_order.region_id THEN       -- set the local region ID and bulk load       -- its criteria       v_last_region := v_order.region_id;       OPEN curs_get_criteria(v_order.region_id);       FETCH curs_get_criteria BULK COLLECT INTO v_criteria;       CLOSE curs_get_criteria;     END IF;  -- new region

This exposes another benefit of table functions the ability to implement focused data access within the function itself. This means that the database can focus on getting the orders with one query and the repeat criteria with another.

Getting back to the function, the pseudo-code looks like this now that I've added the query to get the criteria:

     BEGIN       -- for every order...       LOOP         FETCH p_curs INTO v_order;         EXIT WHEN p_curs%NOTFOUND;         IF first record or new region THEN           Load region criteria         END IF;         IF it's a repeat order then           PIPE ROW(  );         END IF;       END LOOP;  -- every order       RETURN;     END;

3.4.4. Identifying Potential Repeats

Now for the slightly difficult partfinding repeat orders. I've decided to do this in two distinct operations. The first operation determines if the order is a potential outage by matching its order type and date with the first order type in a criterion. The second operation decides if the order is a genuine repeat by matching its location and order type with the repeat-order type of a potential outage.

I'll explain further with some example data. Consider the following order criterion record:

     START_DAT FIRST_TYPE_ID REPEAT_TYPE_ID     --------- ------------- --------------     19-APR-05           801             87

This record states that an order on or after April 19, 2005, with an order type of 334 that is followed within 30 days by an order of type 87 at the same location is considered a repeat order.

Now consider the following three orders.

     ORDER_NUMBER ORDER_DAT   TYPE_ID LOCATION_ID     ------------ --------- ---------- ----------             1016 19-APR-05        801        343             1863 20-APR-05         87        343             2228 21-APR-05         87        343

When processed by my function order, 1016 would become a potential repeat order for order type 801 and location 343. Any subsequent order of type 87 within 30 days at location 343 would be considered a genuine repeat order. Thus, orders 1863 and 2228 would be counted as genuine repeats by my function.

To find genuine repeat orders, I need to discover potential repeats first. For the sake of clean code, I'll implement the "potential repeats" logic as a sub-function named load_potential_repeat. First I'll show you the code, and then I'll explain it.

     /*------------------------------------------------------------------*/     PROCEDURE load_potential_repeat ( p_location_id NUMBER,                                       p_type_id     NUMBER,                                       p_date     DATE ) IS     /*------------------------------------------------------------------*/       v_hash NUMBER;     BEGIN       -- for every criteria...       FOR counter IN 1..v_criteria.LAST LOOP         -- if the order type of the order matches that of the criteria         IF v_criteria(counter).first_type_id = p_type_id THEN           -- if date range is valid           IF v_criteria(counter).start_date <= p_date THEN             -- create a hash based on the location and two repeat criteria             v_hash := DBMS_UTILITY.GET_HASH_VALUE(p_location_id || ':' ||                               v_criteria(counter).first_type_id || ':' ||                               v_criteria(counter).repeat_type_id,                               -32767,65533);             -- if the criteria is not already in the potential list then             -- put it there             IF NOT v_potential_repeat.EXISTS(v_hash) THEN               v_potential_repeat(v_hash).location_id := p_location_id;               v_potential_repeat(v_hash).first_type_id :=                                     v_criteria(counter).first_type_id;                   v_potential_repeat(v_hash).repeat_type_id :=                                    v_criteria(counter).repeat_type_id;             END IF;           END IF;  -- date range is valid         END IF;  -- order type matches       END LOOP;  -- every criteria     END load_potential_repeat;

It may look a little daunting, but it's actually a rather simple algorithm. For every criterion for the region, ask:

Does the first order type of the criterion match that of the order being processed? If yes, then carry on.
Is the criterion date less than or equal to the order date? If yes, then carry on.
Manufacture a hash value based on the location of the order, the first order type of the criterion, and the repeat-order type from the criterion.
Has an entry already been created in the associative array at the index point indicated by the manufactured hash? If no, then carry on.
Add the location, first order type, and repeat-order type to the associative array at the index point indicated by the manufactured hash.

For example, if three separate potential repeats were found, the associative array would look something like this:

     INDEX LOCATION_ID FIRST_TYPE_ID REPEAT_TYPE_ID     ----- ----------- ------------- --------------     -3421         874          1876            202       -99        1098             2             18     88862          18           100             88

Thus, any subsequent orders of type 202 and location 874 would be considered repeats, as would any orders of type 18 at location 1098 or type 88 at location 18.

Now, let's move on to find the genuine repeats.

3.4.5. Finding Genuine Repeats

Finding genuine repeats is straightforward. If the order type matches the repeat type of a criteria record, and if a corresponding match is found in the associative array containing potential repeats, then we have found a genuine repeat. I've put this code into a sub-function, as well.

     /*------------------------------------------------------------------*/     FUNCTION order_is_a_repeat ( p_location_id NUMBER,                                  p_type_id     NUMBER,                                  p_date        DATE )              RETURN NUMBER IS     /*------------------------------------------------------------------*/       v_hash NUMBER;     BEGIN       -- for every criteria...       FOR counter IN 1..v_criteria.LAST LOOP         -- if order type matches the repeat order type of a criteria         IF v_criteria(counter).repeat_type_id = p_type_id THEN           -- calculate a hash of the location, first and repeat order types           v_hash := DBMS_UTILITY.GET_HASH_VALUE(p_location_id || ':' ||                             v_criteria(counter).first_type_id || ':' ||                             v_criteria(counter).repeat_type_id,                             -32767,65533);           -- if logged as a potential repeat then its safe to assume           -- we are repeating now           IF v_potential_repeat.EXISTS(v_hash) THEN             RETURN(v_hash);           END IF;         END IF;  -- order type match       END LOOP;  -- every criteria       RETURN(NULL);     END order_is_a_repeat;

The algorithm is as follows. For every criterion for the region:

Does the order type of the order being processed match the repeat-order type of the criterion? If yes, then carry on.
Manufacture a hash value based on the location of the order, the first order type of the criterion, and the repeat-order type of the criterion.
If an entry exists in the associative array of potential repeat orders at the index point denoted by the hash value, then we've found a genuine repeat order.
Return the manufactured hash so the correct row can be found and piped upstream immediately.

3.4.6. The Final Function

Now that all facets of the function are in place, it's time to take one last look at the function's pseudo-code.

     BEGIN       -- for every order...       LOOP         FETCH p_curs INTO v_order;         EXIT WHEN p_curs%NOTFOUND;         IF first revord or new region THEN           Load region criteria         END IF;         IF it's a potential repeat then add to associative array.         IF it's a repeat order then           PIPE ROW(  );         END IF;       END LOOP;  -- every order       RETURN;     END;

The complete function code is available in the repeat_orders.sql file on the book's web site.

3.4.7. Running the Function

The function is executed using a SQL SELECT statement that is passed another SQL SELECT statement. I know that may still take some getting used to, but believe meit's worth checking out. Here's the SQL used to run the function:

     /* File on web: repeat_orders.sql */     SQL> SELECT *       2    FROM TABLE(repeat_order_finder(CURSOR(       3                 SELECT order_number,       4                        order_date,       5                        region_id,       6                        type_id,       7                        location_id       8                   FROM orders       9                  WHERE order_date >= SYSDATE - 30      10               )))      11  /      REGION_ID LOCATION_ID FIRST_TYPE_ID REPEAT_TYPE_ID     ---------- ----------- ------------- --------------              1           1             1              2              2           2             2              3              3           3             3              4              4           4             4              5              4           4             4              5              5           5             5              6              6           6             6              7              7           7             7              8              8           8             8              9              9           9             9             10             10          10            10             11     11 rows selected.

The result set is treated just as if it came from an Oracle table or view. Criteria could be applied to limit the result setfor example, "WHERE first_type_id = 3". Even more exciting is the fact that the results of a somewhat convoluted business process are available as a simple SQL query upon which reports can easily be built. All of the business logic to assemble the result set is done in the database.

Pipelining the function serves records up to be processed right away, saving even more valuable time.

3.4.8. The Totaling Function

Next, I'll build another table function to total the repeat orders by region, and then I'll nest it right into the SQL shown in the previous section. The new function looks like this.

     /* File on web: repeat_orders_summary.sql */     CREATE OR REPLACE FUNCTION summarize_repeat_orders ( p_curs cursors.repeat_summary_curs )                       RETURN repeat_summary_t                       PIPELINED                       PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(region_id) ) AS       v_summary_rec cursors.repeat_summary_rec;       v_last_region NUMBER;       v_count       NUMBER := 0;     BEGIN       -- for every repeat order       LOOP         -- fetch the repeat order         FETCH p_curs INTO v_summary_rec;         EXIT WHEN p_curs%NOTFOUND;         -- if this is the first record then set the local         -- region ID         IF p_curs%ROWCOUNT = 1 THEN           v_last_region := v_summary_rec.region_id;         END IF;         -- if this is a new region then pipe the region count         -- out and reset the local variables         IF v_summary_rec.region_id <> v_last_region THEN           PIPE ROW(repeat_summary_o(v_last_region,v_count));           v_last_region := v_summary_rec.region_id;           v_count := 0;         END IF;         v_count := v_count + 1;       END LOOP;  -- every repeat order       -- don't forget the last record       IF v_count > 0 THEN         PIPE ROW(repeat_summary_o(v_last_region,v_count));       END IF;       RETURN;     END;

The algorithm is a straightforward loop through the fetched repeat orders, summarizing them by region. Whenever the region ID value changes in the repeat order, a result must be piped out.

The summary function is activated using the SQL SELECT statement shown here.

     SQL> SELECT *       2    FROM TABLE(summarize_repeat_orders(CURSOR(       3         SELECT *       4          FROM TABLE(repeat_order_finder(CURSOR(       5               SELECT order_number,       6                      order_date,       7                      region_id,       8                      type_id,       9                      location_id      10                 FROM orders      11                WHERE order_date >= SYSDATE - 30      12                    )))      13              )));      REGION_ID REPEAT_COUNT     ---------- ------------              1            1              2            1              3            1              4            2              5            1              6            1              7            1              8            1              9            1             10            1     10 rows selected.

The method shown here (using multiple table functions ) illustrates the nesting (or daisy-chaining ) of table functions that we introduced earlier. The work is split out among several functions, passing result sets upstream, record by record, until the final result set is assembled. Factor in the parallelization of each function along the way, and it's easy to see the enormous benefits of table functions .