3.4. Using Table FunctionsIn this section, I'll show the use of table functions in a real-world example based on a large cable company's attempt to track repeat orders. Simply put, a repeat order occurs when a technician is dispatched to a location more than once within a 30-day period to perform the same type of work or to repair the original work. A repeat visit doesn't always have a negative connotationit could result from an installation occurring within 30 days of a pre-sales call. Locations are identified by preassigned ID values. These can map to anything from a specific neighborhood cable outlet to a house or a large shopping mall. The type of work is identified using order type IDs. For example, type 1 might be "Cable Installation," and type 2 might be "Cable Upgrade." The company is divided into several regions, each with its own set of location codes, order types, and repeat-order criteria. These criteria define sets of two order types that must occur at the same location within 30 days of each other to be considered a repeat order. The two order types may or may not be the same; for example, a "Cable Installation" followed by a subsequent "Cable Repair" might be deemed a repeat order in the same way that two "Cable Installations" would. The criteria are held in this table: SQL> DESC repeat_order_criteria Name Null? Type ----------------------------------------- -------- ------ REGION_ID NUMBER START_DATE DATE FIRST_TYPE_ID NUMBER REPEAT_TYPE_ID NUMBER The table holds each region's repeat-order definitions, including the date they are in effect. Here's an example record. SQL> SELECT * 2 FROM repeat_order_criteria; REGION_ID START_DAT FIRST_TYPE_ID REPEAT_TYPE_ID ---------- --------- ------------- -------------- 1 19-APR-05 44 102 This record states that any order of type 44 followed by an order of type 102 at the same location in region 1 within 30 days constitutes a repeat order. The ORDERS table contains the fields necessary to determine an order's "repeatedness." SQL> DESC orders Name Null? Type ----------------------------------------- -------- ------ ORDER_NUMBER NOT NULL NUMBER ORDER_DATE NOT NULL DATE REGION_ID NOT NULL NUMBER TYPE_ID NOT NULL NUMBER LOCATION_ID NOT NULL NUMBER The requirement is made a little more complex by the fact that this company processes tens of thousands of orders every day, and the repeat-order criteria can change at any time, so a new result set has to be available quickly. The large number of records combined with the serialized processing of one record at a time to create the result set makes this application a perfect candidate for table functions. 3.4.1. The Function HeaderI'll start things off by explaining the header for the function I'll write. 1 CREATE FUNCTION repeat_order_finder ( p_curs cursors.repeat_orders_curs ) 2 RETURN repeat_region_location_t 3 PIPELINED 4 PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(region_id) ) 5 ORDER p_curs BY (location_id, order_date) IS As far as function headers go, this one has a lot to say, so let's look at it line by line.
Just working through the header has exposed most of the power behind the table function. This function allows me to throw the power of parallel processing at what would have been serial processing of tens of thousands of records. It even allows me to define what records go to each parallel function instance so I can make some assumptions in my code. Topping that off with not having to wait for all records to be processed before sending results upstream makes me think I've achieved nirvana! Enough time on the pulpit! Let's get back to the function. 3.4.2. The Basic LoopThe main algorithm of the function is a loop that fetches records from the REF cursor as shown in this pseudo-code. BEGIN -- for every order... LOOP FETCH p_curs INTO v_order; EXIT WHEN p_curs%NOTFOUND; IF it's a repeat order then PIPE ROW( ); END IF; END LOOP; -- every order RETURN; END; It's very straightforward. Just fetch records from the passed-in cursor and evaluate them against repeat-order criteria. If a match is found, then pipe a record upstream. Now I'll add in the loading of the regions' repeat-order criteria. 3.4.3. Bulk Fetching the CriteriaBecause my table function guarantees that records will be grouped by region, I can safely assume that any time the region ID value of a fetched record changes, I will have moved on to the next region. And the first thing I want to do for each region is load its repeat-order criteria into a PL/SQL associative array. I'll accomplish this using a cursor as shown here. CURSOR curs_get_criteria ( cp_region NUMBER ) IS SELECT * FROM repeat_order_criteria WHERE region_id = cp_region; Then, within the function, I implement a simple "last region id" check to see when the value changes, and, if it does, I bulk load the criteria: -- if it's a new region... IF NVL(v_last_region,0) <> v_order.region_id THEN -- set the local region ID and bulk load -- its criteria v_last_region := v_order.region_id; OPEN curs_get_criteria(v_order.region_id); FETCH curs_get_criteria BULK COLLECT INTO v_criteria; CLOSE curs_get_criteria; END IF; -- new region This exposes another benefit of table functions the ability to implement focused data access within the function itself. This means that the database can focus on getting the orders with one query and the repeat criteria with another. Getting back to the function, the pseudo-code looks like this now that I've added the query to get the criteria: BEGIN -- for every order... LOOP FETCH p_curs INTO v_order; EXIT WHEN p_curs%NOTFOUND; IF first record or new region THEN Load region criteria END IF; IF it's a repeat order then PIPE ROW( ); END IF; END LOOP; -- every order RETURN; END; 3.4.4. Identifying Potential RepeatsNow for the slightly difficult partfinding repeat orders. I've decided to do this in two distinct operations. The first operation determines if the order is a potential outage by matching its order type and date with the first order type in a criterion. The second operation decides if the order is a genuine repeat by matching its location and order type with the repeat-order type of a potential outage. I'll explain further with some example data. Consider the following order criterion record: START_DAT FIRST_TYPE_ID REPEAT_TYPE_ID --------- ------------- -------------- 19-APR-05 801 87 This record states that an order on or after April 19, 2005, with an order type of 334 that is followed within 30 days by an order of type 87 at the same location is considered a repeat order. Now consider the following three orders. ORDER_NUMBER ORDER_DAT TYPE_ID LOCATION_ID ------------ --------- ---------- ---------- 1016 19-APR-05 801 343 1863 20-APR-05 87 343 2228 21-APR-05 87 343 When processed by my function order, 1016 would become a potential repeat order for order type 801 and location 343. Any subsequent order of type 87 within 30 days at location 343 would be considered a genuine repeat order. Thus, orders 1863 and 2228 would be counted as genuine repeats by my function. To find genuine repeat orders, I need to discover potential repeats first. For the sake of clean code, I'll implement the "potential repeats" logic as a sub-function named load_potential_repeat. First I'll show you the code, and then I'll explain it. /*------------------------------------------------------------------*/ PROCEDURE load_potential_repeat ( p_location_id NUMBER, p_type_id NUMBER, p_date DATE ) IS /*------------------------------------------------------------------*/ v_hash NUMBER; BEGIN -- for every criteria... FOR counter IN 1..v_criteria.LAST LOOP -- if the order type of the order matches that of the criteria IF v_criteria(counter).first_type_id = p_type_id THEN -- if date range is valid IF v_criteria(counter).start_date <= p_date THEN -- create a hash based on the location and two repeat criteria v_hash := DBMS_UTILITY.GET_HASH_VALUE(p_location_id || ':' || v_criteria(counter).first_type_id || ':' || v_criteria(counter).repeat_type_id, -32767,65533); -- if the criteria is not already in the potential list then -- put it there IF NOT v_potential_repeat.EXISTS(v_hash) THEN v_potential_repeat(v_hash).location_id := p_location_id; v_potential_repeat(v_hash).first_type_id := v_criteria(counter).first_type_id; v_potential_repeat(v_hash).repeat_type_id := v_criteria(counter).repeat_type_id; END IF; END IF; -- date range is valid END IF; -- order type matches END LOOP; -- every criteria END load_potential_repeat; It may look a little daunting, but it's actually a rather simple algorithm. For every criterion for the region, ask:
For example, if three separate potential repeats were found, the associative array would look something like this: INDEX LOCATION_ID FIRST_TYPE_ID REPEAT_TYPE_ID ----- ----------- ------------- -------------- -3421 874 1876 202 -99 1098 2 18 88862 18 100 88 Thus, any subsequent orders of type 202 and location 874 would be considered repeats, as would any orders of type 18 at location 1098 or type 88 at location 18. Now, let's move on to find the genuine repeats. 3.4.5. Finding Genuine RepeatsFinding genuine repeats is straightforward. If the order type matches the repeat type of a criteria record, and if a corresponding match is found in the associative array containing potential repeats, then we have found a genuine repeat. I've put this code into a sub-function, as well. /*------------------------------------------------------------------*/ FUNCTION order_is_a_repeat ( p_location_id NUMBER, p_type_id NUMBER, p_date DATE ) RETURN NUMBER IS /*------------------------------------------------------------------*/ v_hash NUMBER; BEGIN -- for every criteria... FOR counter IN 1..v_criteria.LAST LOOP -- if order type matches the repeat order type of a criteria IF v_criteria(counter).repeat_type_id = p_type_id THEN -- calculate a hash of the location, first and repeat order types v_hash := DBMS_UTILITY.GET_HASH_VALUE(p_location_id || ':' || v_criteria(counter).first_type_id || ':' || v_criteria(counter).repeat_type_id, -32767,65533); -- if logged as a potential repeat then its safe to assume -- we are repeating now IF v_potential_repeat.EXISTS(v_hash) THEN RETURN(v_hash); END IF; END IF; -- order type match END LOOP; -- every criteria RETURN(NULL); END order_is_a_repeat; The algorithm is as follows. For every criterion for the region:
3.4.6. The Final FunctionNow that all facets of the function are in place, it's time to take one last look at the function's pseudo-code. BEGIN -- for every order... LOOP FETCH p_curs INTO v_order; EXIT WHEN p_curs%NOTFOUND; IF first revord or new region THEN Load region criteria END IF; IF it's a potential repeat then add to associative array. IF it's a repeat order then PIPE ROW( ); END IF; END LOOP; -- every order RETURN; END; The complete function code is available in the repeat_orders.sql file on the book's web site. 3.4.7. Running the FunctionThe function is executed using a SQL SELECT statement that is passed another SQL SELECT statement. I know that may still take some getting used to, but believe meit's worth checking out. Here's the SQL used to run the function: /* File on web: repeat_orders.sql */ SQL> SELECT * 2 FROM TABLE(repeat_order_finder(CURSOR( 3 SELECT order_number, 4 order_date, 5 region_id, 6 type_id, 7 location_id 8 FROM orders 9 WHERE order_date >= SYSDATE - 30 10 ))) 11 / REGION_ID LOCATION_ID FIRST_TYPE_ID REPEAT_TYPE_ID ---------- ----------- ------------- -------------- 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 rows selected. The result set is treated just as if it came from an Oracle table or view. Criteria could be applied to limit the result setfor example, "WHERE first_type_id = 3". Even more exciting is the fact that the results of a somewhat convoluted business process are available as a simple SQL query upon which reports can easily be built. All of the business logic to assemble the result set is done in the database. Pipelining the function serves records up to be processed right away, saving even more valuable time. 3.4.8. The Totaling FunctionNext, I'll build another table function to total the repeat orders by region, and then I'll nest it right into the SQL shown in the previous section. The new function looks like this. /* File on web: repeat_orders_summary.sql */ CREATE OR REPLACE FUNCTION summarize_repeat_orders ( p_curs cursors.repeat_summary_curs ) RETURN repeat_summary_t PIPELINED PARALLEL_ENABLE ( PARTITION p_curs BY RANGE(region_id) ) AS v_summary_rec cursors.repeat_summary_rec; v_last_region NUMBER; v_count NUMBER := 0; BEGIN -- for every repeat order LOOP -- fetch the repeat order FETCH p_curs INTO v_summary_rec; EXIT WHEN p_curs%NOTFOUND; -- if this is the first record then set the local -- region ID IF p_curs%ROWCOUNT = 1 THEN v_last_region := v_summary_rec.region_id; END IF; -- if this is a new region then pipe the region count -- out and reset the local variables IF v_summary_rec.region_id <> v_last_region THEN PIPE ROW(repeat_summary_o(v_last_region,v_count)); v_last_region := v_summary_rec.region_id; v_count := 0; END IF; v_count := v_count + 1; END LOOP; -- every repeat order -- don't forget the last record IF v_count > 0 THEN PIPE ROW(repeat_summary_o(v_last_region,v_count)); END IF; RETURN; END; The algorithm is a straightforward loop through the fetched repeat orders, summarizing them by region. Whenever the region ID value changes in the repeat order, a result must be piped out. The summary function is activated using the SQL SELECT statement shown here. SQL> SELECT * 2 FROM TABLE(summarize_repeat_orders(CURSOR( 3 SELECT * 4 FROM TABLE(repeat_order_finder(CURSOR( 5 SELECT order_number, 6 order_date, 7 region_id, 8 type_id, 9 location_id 10 FROM orders 11 WHERE order_date >= SYSDATE - 30 12 ))) 13 ))); REGION_ID REPEAT_COUNT ---------- ------------ 1 1 2 1 3 1 4 2 5 1 6 1 7 1 8 1 9 1 10 1 10 rows selected. The method shown here (using multiple table functions ) illustrates the nesting (or daisy-chaining ) of table functions that we introduced earlier. The work is split out among several functions, passing result sets upstream, record by record, until the final result set is assembled. Factor in the parallelization of each function along the way, and it's easy to see the enormous benefits of table functions . |