UDF Cookbook

for RuBoard

The section that follows is a cookbook of UDFs I've written to address various business needs. These types of sections are always a lot of fun for me because I get to roll out some of my homegrown code and show the world what I've been up to. One of the design goals of this book is to provide you with code that has intrinsic value apart from the bookcode that you could drop into place on your own systems and put to work. That's what this section is about. It's about demonstrating some of the techniques we've been talking about (and even a few we haven't) by working through some real-world code.

An Improved SOUNDEX() Function

The built-in SOUNDEX() function is certainly a handy tool, but it uses a fairly primitive soundex algorithm, one that could be easily improved. The UDF that follows is fn_soundex(), a custom replacement for the stock SOUNDEX() function. It returns a larger number of unique soundex codes and is generally more functional than SOUNDEX(). Here's the code (Listing 10-11):

Listing 10-11 A UDF that implements a better soundex routine.
 USE master GO EXEC sp_configure 'allow updates',1 GO RECONFIGURE WITH OVERRIDE GO DROP FUNCTION system_function_schema.fn_soundex GO CREATE FUNCTION system_function_schema.fn_soundex(@instring varchar(50)) RETURNS varchar(50) /* Object: fn_soundex Description: Returns the soundex of a string (Russell optimization) Usage: SELECT fn_soundex(@instring=string to translate) Returns: string containing the soundex code Created by: Ken Henderson. Email: khen@khen.com Version: 8.0 Example: SELECT fn_soundex('Rodgers') Created: 1998-05-15. Last changed: 2000-05-20. Notes: Based on the soundex algorithm published by Robert Russell and Margaret O'Dell, 1918, extended to incorporate Russell's optimizations for finer granularity. */ AS BEGIN        DECLARE @workstr varchar(10), @soundex varchar(50)        SET @instring=UPPER(@instring)        -- Put all but the 1st char in a work buffer (we always return 1st char)        SET @soundex=RIGHT(@instring,LEN(@instring)-1)        /*        Translate characters to numbers per the following table:        Char                Number        B,F,P,V                   1        C,G,J,K,Q,S,X,Z           2        D,T                 3        L                   4        M,N                 5        R                   6        A,E,H,I,O,U,W,Y           9        */        SET @workstr='BFPV'        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'1')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        SET @workstr='CGJKQSXZ'        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'2')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        SET @workstr='DT'        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'3')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        SET @soundex=replace(@soundex,'L','4')        SET @workstr='MN'        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'5')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        set @soundex=replace(@soundex,'R','6')        SET @workstr='AEHIOUWY'        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'9')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        -- Now replace repeating digits (e.g., '11' or '22') with single digits        DECLARE @c int        SET @c=1        WHILE (@c<10) BEGIN          -- Multiply by 11 to produce repeating digits          SET        @soundex=REPLACE(@soundex,CONVERT(char(2),@c*11),CONVERT(char(1),@c))          SET @c=@c+1        END        SET @soundex=REPLACE(@soundex,'00','0') -- Get rid of double zeros        SET @soundex=REPLACE(@soundex,'9','') -- Get rid of 9's        SET @soundex=LEFT(@soundex,3)        WHILE (LEN(@soundex)<3) SET @soundex=@soundex+'0' -- Pad with zero        SET @soundex=LEFT(@instring,1)+@soundex -- Prefix first char and return        RETURN @soundex END GO EXEC sp_configure 'allow updates',0 GO RECONFIGURE WITH OVERRIDE GO SELECT fn_soundex('Rodgers') 

(Results)

 -------------------------------------------------- R326 

This function uses a better algorithm than SOUNDEX() and should perform about as well. Because it's a system function, you can use it any place that you are currently using SOUNDEX().

Although fn_soundex() is an improvement over the built-in SOUNDEX() function, it's still not as functional as it could be. Because it uses numerals in three of the four digits it returns, its total number of possible codes is only 26,000 (26 * 10 3 ). Contrast this with fn_soundex_ex (Listing 10-12), which uses alphabetic letters for each digit, for a total of 456,976 (26 4 ) possible codes. Here's the function:

Listing 10-12 fn_soundex_exa vast improvement over SOUNDEX().
 USE master GO EXEC sp_configure 'allow updates',1 GO RECONFIGURE WITH OVERRIDE GO IF OBJECT_ID('fn_soundex_ex') IS NOT NULL   DROP FUNCTION system_function_schema.fn_soundex_ex GO CREATE FUNCTION system_function_schema.fn_soundex_ex(@instring varchar(50)) RETURNS varchar(50) /* Object: fn_soundex_ex Description: Returns the soundex of a string Usage: fn_soundex_ex(@instring=string to translate) Returns: string containing soundex code Created by: Ken Henderson. Email: khen@khen.com Version: 8.0 Example: SELECT dbo.fn_soundex_ex('Rodgers') Created: 1998-05-15. Last changed: 2000-11-21. Notes: Original source unknown. Translation to Transact-SQL by Ken Henderson. */ AS BEGIN        DECLARE @workstr varchar(10), @soundex varchar(50)        SET @instring=UPPER(@instring)        -- Put all but the 1st char in a work buffer (we always return 1st char)        SET @soundex=RIGHT(@instring,LEN(@instring)-1)        SET @workstr='EIOUY' -- Replace vowels with A        WHILE (@workstr<>'') BEGIN          SET @soundex=REPLACE(@soundex,LEFT(@workstr,1),'A')          SET @workstr=RIGHT(@workstr,LEN(@workstr)-1)        END        /*        Translate word prefixes using this table        From         To        MAC          MCC        KN           NN        K            C        PF           FF        SCH          SSS        PH           FF        */        -- Reaffix first char        SET @soundex=LEFT(@instring,1)+@soundex        IF (LEFT(@soundex,3)='MAC')               SET @soundex='MCC'+RIGHT(@soundex,LEN(@soundex)-3)        IF (LEFT(@soundex,2)='KN')               SET @soundex='NN'+RIGHT(@soundex,LEN(@soundex)-2)        IF (LEFT(@soundex,1)='K')               SET @soundex='C'+RIGHT(@soundex,LEN(@soundex)-1)        IF (LEFT(@soundex,2)='PF')               SET @soundex='FF'+RIGHT(@soundex,LEN(@soundex)-2)        IF (LEFT(@soundex,3)='SCH')               SET @soundex='SSS'+RIGHT(@soundex,LEN(@soundex)-3)        IF (LEFT(@soundex,2)='PH')               SET @soundex='FF'+RIGHT(@soundex,LEN(@soundex)-2)        -- Remove first char        SET @instring=@soundex        SET @soundex=RIGHT(@soundex,LEN(@soundex)-1)        /*        Translate phonetic prefixes (following the 1st char) using this table:        From         To        DG           GG        CAAN         TAAN        D            T        NST          NSS        AV           AF        Q            G        Z            S        M            N        KN           NN        K            C        H            A (unless part of AHA)        AW           A        PH           FF        SCH          SSS        */        SET @soundex=REPLACE(@soundex,'DG','GG')        SET @soundex=REPLACE(@soundex,'CAAN','TAAN')        SET @soundex=REPLACE(@soundex,'D','T')        SET @soundex=REPLACE(@soundex,'NST','NSS')        SET @soundex=REPLACE(@soundex,'AV','AF')        SET @soundex=REPLACE(@soundex,'Q','G')        SET @soundex=REPLACE(@soundex,'Z','S')        SET @soundex=REPLACE(@soundex,'M','N')        SET @soundex=REPLACE(@soundex,'KN','NN')        SET @soundex=REPLACE(@soundex,'K','C')        -- Translate H to A unless it's part of "AHA"        SET @soundex=REPLACE(@soundex,'AHA','~~~')        SET @soundex=REPLACE(@soundex,'H','A')        SET @soundex=REPLACE(@soundex,'~~~','AHA')        SET @soundex=REPLACE(@soundex,'AW','A')        SET @soundex=REPLACE(@soundex,'PH','FF')        SET @soundex=REPLACE(@soundex,'SCH','SSS')        -- Truncate ending A or S        IF (RIGHT(@soundex,1)='A' or RIGHT(@soundex,1)='S')               SET @soundex=LEFT(@soundex,LEN(@soundex)-1)        -- Translate ending "NT" to "TT"        IF (RIGHT(@soundex,2)='NT')               SET @soundex=LEFT(@soundex,LEN(@soundex)-2)+'TT'        -- Remove all As        SET @soundex=REPLACE(@soundex,'A','')        -- Reaffix first char        SET @soundex=LEFT(@instring,1)+@soundex        -- Remove repeating characters        DECLARE @c int        SET @c=65        WHILE (@c<91) BEGIN          WHILE (CHARINDEX(char(@c)+CHAR(@c),@soundex)<>0)            SET @soundex=REPLACE(@soundex,CHAR(@c)+CHAR(@c),CHAR(@c))          SET @c=@c+1        end        SET @soundex=LEFT(@soundex,4)        -- Pad with spaces        IF (LEN(@soundex)<4) SET @soundex=@soundex+SPACE(4-LEN(@soundex))        RETURN(@Soundex) END GO EXEC sp_configure 'allow updates',0 GO RECONFIGURE WITH OVERRIDE GO USE Northwind GO SELECT fn_soundex_ex(LastName) AS ex_Last,        fn_soundex_ex(FirstName) AS ex_First, SOUNDEX(LastName) AS bi_Last, SOUNDEX(FirstName) AS bi_First FROM employees 

(Results)

 ex_Last                ex_First              bi_Last bi_First ---------------------- --------------------- ------- -------- DFL                    NC                    D140    N520 FLR                    ANTR                  F460    A536 LFRL                   JNT                   L164    J530 PC                     MRGR                  P220    M626 BCN                    STFN                  B255    S315 SN                     MCL                   S500    M240 CNG                    RBRT                  K520    R163 CLHN                   LR                    C450    L600 DTSW                   AN                    D326    A500 

As you can see, UDFs can be passed values from columns directly. This is the chief advantage of a UDF over a stored procedure: You can use it in DML statements to refer directly to table data.

Statistical Functions

With the advent of UDF support in SQL Server, Transact-SQL has become much more capable as a statistics computation language. Given its set orientation and direct access to data, you may find that it outperforms traditional statistics computational packages in certain specific circumstances. There is a learning curve associated with most commercial statistics tools, so it makes sense to do operations that Transact-SQL can easily handle in UDFs and stored procedures, and only resort to industrial-strength statistics packages for the really heavy lifting .

Clipping

The process of removing the topmost and bottommost values from a distribution set is called clipping. In statistics computations , we frequently want to toss out the highest and lowest members of a set so that we can focus on more typical values. The function in Listing 10-13, MiddleTemperatures(), shows you how to do this. It starts with a list of temperature samples and removes a user -specified section of the head and tail. It's implemented as an inline function so that you don't have the overhead of storing the slimmed-down version of the distribution anywhere . It simply acts as a parameterized viewyou supply the size of the clipped region, and it takes care of the rest. Here's the code:

Listing 10-13 You can use a UDF to perform statistical clipping.
 USE tempdb GO CREATE TABLE tempdb..TemperatureReadings (MiddayTemp int) INSERT tempdb..TemperatureReadings VALUES (75) INSERT tempdb..TemperatureReadings VALUES (90) INSERT tempdb..TemperatureReadings VALUES (76) INSERT tempdb..TemperatureReadings VALUES (81) INSERT tempdb..TemperatureReadings VALUES (98) INSERT tempdb..TemperatureReadings VALUES (68) GO DROP FUNCTION dbo.MiddleTemperatures GO CREATE FUNCTION dbo.MiddleTemperatures(@ClipSize int = 2) RETURNS TABLE AS    RETURN(SELECT v.MiddayTemp    FROM tempdb..TemperatureReadings v CROSS JOIN tempdb..TemperatureReadings a    GROUP BY v.MiddayTemp    HAVING COUNT(CASE WHEN a.MiddayTemp <=v.MiddayTemp THEN 1 ELSE NULL END)    >@ClipSize    AND COUNT(CASE WHEN a.MiddayTemp >= v.MiddayTemp THEN 1 ELSE NULL END)    >@ClipSize) GO SELECT * FROM dbo.MiddleTemperatures(2) ORDER BY MiddayTemp 

(Results)

 MiddayTemp ----------- 76 81 Warning: Null value is eliminated by an aggregate or other SET operation. 

The @ClipSize parameter specifies the size of the clipped region. It indicates how many rows to remove from the top and bottom of the distribution before returning it.

Histograms

Histogramsa kind of bar chart in which the lengths or widths of its bars represent data valuesare a popular business reporting tool. You see them on everything from your utility bill to the stock prospectus brochures of the largest companies. The best reporting and charting tools have built-in features for producing histograms from relational data. Usually they can group, summarize, and extrapolate histogram data in numerous ways. That said, there are times when you need to compute your own histogramsto produce cross tabs from linear data that could function as the source to a histogram chart or be fed into a more complex routine for further processing. The following code shows you how to create histogram-oriented data using a UDF. It produces a two-dimensional histogram cross tab using the data from the pubs..sales table. You supply a filtering condition, and it takes care of the rest:

 USE pubs GO DROP FUNCTION dbo.SalesHistogram GO CREATE FUNCTION dbo.SalesHistogram(@payterms varchar(12)='%') RETURNS TABLE AS RETURN( SELECT PayTerms=isnull(s.payterms,'NA'), "Less than 10"=COUNT(CASE WHEN s.sales >=0 AND s.sales <10 THEN 1 ELSE NULL END), "10-19"=COUNT(CASE WHEN s.sales >=10 AND s.sales <20 THEN 1 ELSE NULL END), "20-29"=COUNT(CASE WHEN s.sales >=20 AND s.sales <30 THEN 1 ELSE NULL END), "30-39"=COUNT(CASE WHEN s.sales >=30 AND s.sales <40 THEN 1 ELSE NULL END), "40-49"=COUNT(CASE WHEN s.sales >=40 AND s.sales <50 THEN 1 ELSE NULL END), "50 or more"=COUNT(CASE WHEN s.sales >=50 THEN 1 ELSE NULL END) FROM (SELECT t.title_id, s.payterms, sales=ISNULL(SUM(s.qty),0) FROM titles t LEFT OUTER JOIN sales s ON (t.title_id=s.title_id) GROUP BY t.title_id, payterms) s WHERE s.payterms LIKE @payterms GROUP BY s.payterms ) GO SELECT * FROM dbo.SalesHistogram(DEFAULT) 

(Results)

 PayTerms     Less than 10 10-19       20-29       30-39   40-49   50 or more ------------ ------------ ----------- ----------- ------- ------- ----------- Net 30       0            0           5           1       1       1 Net 60       1            4           3           0       0       0 ON invoice   0            2           0           1       0       1 Warning: Null value is eliminated by an aggregate or other SET operation. 

Here we specify the pay terms we're interested in seeing, and SalesHistogram() creates a cross tab representing unit sales organized by ranges. In the previous example, we allowed all pay terms to be displayed because we supplied the DEFAULT parameter. We could just as easily have listed just one set of pay terms data (Listing 10-14):

Listing 10-14 A UDF that implements a basic histogram table.
 SELECT * FROM dbo.SalesHistogram('Net 30') 

(Results)

 PayTerms     Less than 10 10-19       20-29       30-39   40-49   50 or more ------------ ------------ ----------- ----------- ------- ------- ----------- Net 30       0            0           5           1       1       1 Warning: Null value is eliminated by an aggregate or other SET operation. 
Time Series Fluctuation

It's common to need to track the fluctuation of a value across a time series. An obvious example of this is stock price fluctuation. The various exchanges report stock prices on a regular basis throughout the business day, with each stock having a daily opening and closing price. The function presented in Listing 10-15, StockPriceFluctuation(), shows how to report time series fluctuation using stock prices as sample data. It consists of an inline UDF that joins the stock prices table with itself and matches opening and closing dates for (approximate) weekly periods, then reports on the fluctuation of the stock price over each period. The sample data lists the actual weekly closing prices for Microsoft (MSFT) and Oracle (ORCL) over a six-month period of time beginning in early July 2000 and ending in early January 2001. Here's the data and code:

Listing 10-15 A UDF that reports time series fluctuation.
 USE tempdb go CREATE TABLE dbo.stockprices (Symbol varchar(4), TradingDate smalldatetime, ClosingPrice decimal(10,4)) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000706',   82.000) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000710',   78.938) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000717',   72.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000724',   69.688) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000731',   69.125) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000807',   72.438) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000814',   71.000) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000821',   70.625) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000828',   70.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000905',   69.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000911',   64.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000918',   63.250) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20000925',   60.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001002',   55.563) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001009',   53.750) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001016',   65.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001023',   67.688) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001030',   68.250) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001106',   67.375) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001113',   69.063) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001120',   69.938) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001127',   56.625) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001204',   54.438) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001211',   49.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001218',   46.438) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20001226',   43.375) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20010102',   49.125) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20010108',   53.500) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000706',   37.938) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000710',   38.063) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000717',   37.719) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000724',   36.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000731',   40.781) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000807',   40.563) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000814',   40.656) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000821',   42.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000828',   46.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000905',   43.281) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000911',   39.156) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000918',   40.367) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20000925',   39.375) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001002',   33.813) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001009',   35.625) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001016',   35.250) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001023',   34.188) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001030',   30.313) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001106',   25.438) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001113',   28.813) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001120',   24.125) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001127',   26.438) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001204',   30.063) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001211',   28.563) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001218',   31.875) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20001226',   29.063) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20010102',   30.125) INSERT dbo.stockprices (Symbol, TradingDate, ClosingPrice) VALUES ('ORCL','20010108',   32.313) DROP FUNCTION dbo.StockPriceFluctuation GO CREATE FUNCTION StockPriceFluctuation(@Symbol varchar(10),        @StartDate smalldatetime='19900101',        @EndDate smalldatetime='20100101') RETURNS TABLE AS RETURN(        SELECT        v.Symbol,        StartDate=CONVERT(char(8),v.TradingDate,112),        EndDate=CONVERT(char(8), a.TradingDate,112),        StartingPrice=v.ClosingPrice,        EndingPrice=a.ClosingPrice,        Change=SUBSTRING('- +',               CAST(SIGN(a.ClosingPrice-v.ClosingPrice)+2 AS               int),1)+CAST(ABS(a.ClosingPrice-v.ClosingPrice) AS varchar)        FROM          (SELECT Symbol, TradingDate, ClosingPrice,               ranking=(SELECT COUNT(DISTINCT TradingDate)                      FROM dbo.stockprices u                      WHERE u.TradingDate <= l.TradingDate)               FROM dbo.stockprices l) v           LEFT OUTER JOIN          (SELECT Symbol, TradingDate, ClosingPrice,               ranking=(SELECT COUNT(DISTINCT TradingDate)                      FROM dbo.stockprices u                      WHERE u.TradingDate <= l.TradingDate)               FROM dbo.stockprices l) a           ON (a.ranking=v.ranking+1)           WHERE v.Symbol = @Symbol AND a.Symbol = @Symbol           AND a.TradingDate IS NOT NULL           AND v.TradingDate BETWEEN @StartDate AND @EndDate           AND a.TradingDate BETWEEN @StartDate AND @EndDate ) GO SELECT * FROM StockPriceFluctuation('ORCL',DEFAULT,DEFAULT) ORDER BY StartDate 

(Results)

 Symbol StartDate EndDate  StartingPrice EndingPrice  Change ------ --------- -------- ------------- ------------ --------------------- ORCL   20000706  20000710 37.9380       38.0630      +0.1250 ORCL   20000710  20000717 38.0630       37.7190      -0.3440 ORCL   20000717  20000724 37.7190       36.1880      -1.5310 ORCL   20000724  20000731 36.1880       40.7810      +4.5930 ORCL   20000731  20000807 40.7810       40.5630      -0.2180 ORCL   20000807  20000814 40.5630       40.6560      +0.0930 ORCL   20000814  20000821 40.6560       42.3130      +1.6570 ORCL   20000821  20000828 42.3130       46.3130      +4.0000 ORCL   20000828  20000905 46.3130       43.2810      -3.0320 ORCL   20000905  20000911 43.2810       39.1560      -4.1250 ORCL   20000911  20000918 39.1560       40.3670      +1.2110 ORCL   20000918  20000925 40.3670       39.3750      -0.9920 ORCL   20000925  20001002 39.3750       33.8130      -5.5620 ORCL   20001002  20001009 33.8130       35.6250      +1.8120 ORCL   20001009  20001016 35.6250       35.2500      -0.3750 ORCL   20001016  20001023 35.2500       34.1880      -1.0620 ORCL   20001023  20001030 34.1880       30.3130      -3.8750 ORCL   20001030  20001106 30.3130       25.4380      -4.8750 ORCL   20001106  20001113 25.4380       28.8130      +3.3750 ORCL   20001113  20001120 28.8130       24.1250      -4.6880 ORCL   20001120  20001127 24.1250       26.4380      +2.3130 ORCL   20001127  20001204 26.4380       30.0630      +3.6250 ORCL   20001204  20001211 30.0630       28.5630      -1.5000 ORCL   20001211  20001218 28.5630       31.8750      +3.3120 ORCL   20001218  20001226 31.8750       29.0630      -2.8120 ORCL   20001226  20010102 29.0630       30.1250      +1.0620 ORCL   20010102  20010108 30.1250       32.3130      +2.1880 

As you can see, this routine plays the role of a parameterized view, just as the SalesHistogram() function did. As with all inline and table-valued functions, you can aggregate the results returned by the function as though they resided in a table:

 SELECT SUM(CAST(Change AS decimal(10,2))) FROM StockPriceFluctuation('ORCL',DEFAULT,DEFAULT) 

(Results)

 ---------------------------------------- -5.63 

So the value of Oracle Corporation's stock declined by approximately $5.63/share over the six-month period between July 2000 and January 2001. Because we also have data for Microsoft, let's see how it did (Listing 10-16):

Listing 10-16 Time series fluctuation for Microsoft stock in 2H 2000.
 SELECT * FROM StockPriceFluctuation('MSFT',DEFAULT,DEFAULT) ORDER BY StartDate 

(Results)

 Symbol StartDate EndDate  StartingPrice EndingPrice  Change ------ --------- -------- ------------- ------------ ---------------------- MSFT   20000706  20000710 82.0000       78.9380      -3.0620 MSFT   20000710  20000717 78.9380       72.3130      -6.6250 MSFT   20000717  20000724 72.3130       69.6880      -2.6250 MSFT   20000724  20000731 69.6880       69.1250      -0.5630 MSFT   20000731  20000807 69.1250       72.4380      +3.3130 MSFT   20000807  20000814 72.4380       71.0000      -1.4380 MSFT   20000814  20000821 71.0000       70.6250      -0.3750 MSFT   20000821  20000828 70.6250       70.1880      -0.4370 MSFT   20000828  20000905 70.1880       69.3130      -0.8750 MSFT   20000905  20000911 69.3130       64.1880      -5.1250 MSFT   20000911  20000918 64.1880       63.2500      -0.9380 MSFT   20000918  20000925 63.2500       60.3130      -2.9370 MSFT   20000925  20001002 60.3130       55.5630      -4.7500 MSFT   20001002  20001009 55.5630       53.7500      -1.8130 MSFT   20001009  20001016 53.7500       65.1880      +11.4380 MSFT   20001016  20001023 65.1880       67.6880      +2.5000 MSFT   20001023  20001030 67.6880       68.2500      +0.5620 MSFT   20001030  20001106 68.2500       67.3750      -0.8750 MSFT   20001106  20001113 67.3750       69.0630      +1.6880 MSFT   20001113  20001120 69.0630       69.9380      +0.8750 MSFT   20001120  20001127 69.9380       56.6250      -13.3130 MSFT   20001127  20001204 56.6250       54.4380      -2.1870 MSFT   20001204  20001211 54.4380       49.1880      -5.2500 MSFT   20001211  20001218 49.1880       46.4380      -2.7500 MSFT   20001218  20001226 46.4380       43.3750      -3.0630 MSFT   20001226  20010102 43.3750       49.1250      +5.7500 MSFT   20010102  20010108 49.1250       53.5000      +4.3750 

And, because we've listed fluctuation from week to week within the series, let's look at the total change over the entire six-month period, as we did with Oracle:

 SELECT SUM(CAST(Change AS decimal(10,2))) FROM StockPriceFluctuation('MSFT',DEFAULT,DEFAULT) 

(Results)

 ---------------------------------------- -28.52 

Looks like Microsoft had a bad six-month run at the end of 2000 as well. Fortunately, both stocks appeared to be climbing at the end of the sampling period.

Trend Analysis

Beyond listing fluctuation from sample point to sample point, we often want to identify trends in data, especially time series data. The function that follows in Listing 10-17, StockPriceTrends(), shows you how to identify trends in series-oriented data.

Technically, a trend is simply a sequential region or subsection of the data that conforms to some predetermined criteria. For example, we may be looking for members of the distribution that have the same absolute value or the same value relative to one another, or that qualify in some other way. Identifying these regions helps us analyze the trends they represent. StockPriceTrends() identifies regions within stock price data where the price increased from week to week. It uses a variation on the earlier StockPrices table that includes an identity column. Here's the code:

Listing 10-17 You can use UDFs to identify complex data trends.
 DROP FUNCTION dbo.StockPriceTrend GO CREATE FUNCTION StockPriceTrend(@Symbol varchar(10),        @StartDate smalldatetime='19900101',        @EndDate smalldatetime='20100101') RETURNS TABLE AS RETURN(        SELECT v.TradingDate, v.ClosingPrice        FROM dbo.StockPrices v JOIN dbo.StockPrices a        ON ((a.ClosingPrice >= v.ClosingPrice) AND (a.SampleId = v.SampleId+1))        OR ((a.ClosingPrice <= v.ClosingPrice) AND (a.SampleId = v.SampleId-1))        WHERE a.Symbol=@Symbol AND v.Symbol=@Symbol        AND v.TradingDate BETWEEN @StartDate AND @EndDate        AND a.TradingDate BETWEEN @StartDate AND @EndDate        GROUP BY v.TradingDate, v.ClosingPrice ) GO SELECT * FROM dbo.StockPriceTrend('MSFT',DEFAULT,DEFAULT) ORDER BY TradingDate 

(Results)

 TradingDate                                            ClosingPrice ------------------------------------------------------ ------------ 2000-07-31 00:00:00                                    69.1250 2000-08-07 00:00:00                                    72.4380 2000-10-09 00:00:00                                    53.7500 2000-10-16 00:00:00                                    65.1880 2000-10-23 00:00:00                                    67.6880 2000-10-30 00:00:00                                    68.2500 2000-11-06 00:00:00                                    67.3750 2000-11-13 00:00:00                                    69.0630 2000-11-20 00:00:00                                    69.9380 2000-12-26 00:00:00                                    43.3750 2001-01-02 00:00:00                                    49.1250 2001-01-08 00:00:00                                    53.5000 

If you look back at the data itself, you'll see that each of the sample points listed here belongs to a sequence of data in which the stock price increased from week to week. You can infer the length of the trend by computing the number of sequential weeks each region spans . For example, the first price increase trend begins on July 31st and runs through August 7th. Then we have a gap of a couple months, and then another increase trend starting on October 10th. This trend runs through November 20th. Encapsulating the code in a UDF makes identifying the trends as easy as querying a table.

Least Squares Linear Regression

When the relationship between two variables is approximately linear, it can be summarized with a straight line. A statistical modeling technique for establishing this relationship is known as least sum of squares linear regression. It's what most people mean when they say they're using least squares, regression, or linear regression to fit a model to their data.

Least squares regression allows us to draw a regression line through the points on a two-dimensional plot of the data points in a series. It allows us to establish the relationship between the x- and y-coordinates on the plot.

The function in Listing 10-18, LSLR_StockPrices() extends our stock price metaphor a bit further and computes the slope intercept and coefficient on the stock data we've been working with. In this example, the week of the year is the x-coordinate and the stock price is the y-coordinate, although you could extend the code to work with any two-dimensional data. I've included the week of the year as a computed column in the table to make the routine easier to follow. Here's the code:

Listing 10-18 A linear squares UDF.
 DROP FUNCTION dbo.LSLR_StockPrices GO CREATE FUNCTION dbo.LSLR_StockPrices() RETURNS @LSLR TABLE (SlopeCoefficient decimal(38,4),                     SlopeIntercept decimal(38,3)) AS BEGIN   DECLARE @MeanX decimal (38,4),          @MeanY decimal (38,4),          @Count decimal (38,4),          @SlopeCoefficient decimal (38,4)   DECLARE @WorkTable   TABLE (x decimal(38,4),          y decimal(38,4),          XDeviation decimal(38,4),          YDeviation decimal(38,4),          CrossProduct decimal(38,4),          XDevSquared decimal(38,4),          YDevSquared decimal(38,4)   )   -- Get the means of x and y, and the total number of values   SELECT  @MeanX=AVG(TradingWeek),           @MeanY=AVG(ClosingPrice),           @Count=COUNT(*)   FROM dbo.StockPrices   -- Store the deviations for each point,   -- the cross product of the deviations,   -- and the squares of the deviations   INSERT @WorkTable   SELECT     TradingWeek,     ClosingPrice,     TradingWeek-@MeanX,     ClosingPrice-@MeanY,     (TradingWeek-@MeanX)*(ClosingPrice-@MeanY),     POWER(TradingWeek-@MeanX, 2),     POWER(ClosingPrice-@MeanY, 2)   FROM dbo.StockPrices   -- Compute the slope coefficient   SELECT @SlopeCoefficient =   ((@Count * SUM(CrossProduct)) - SUM(x) * SUM(y))    /   ((@Count * SUM(XDevSquared)) - POWER(SUM(x), 2))   FROM @WorkTable   -- Insert the slope coefficient and the slope intercept   -- into the table to return   INSERT @LSLR SELECT @SlopeCoefficient,   (@MeanY - (@SlopeCoefficient * @MeanX))     AS SlopeIntercept -- (For clarity)   RETURN END GO SELECT 'Slope-intercept equation is y = '        +CAST(SlopeCoefficient AS varchar(10))+'x + '        +CAST(SlopeIntercept AS varchar(10)) FROM LSLR_StockPrices() 

(Results)

 -------------------------------------------------------- Slope-intercept equation is y = 1.7903x + -2.811 

This presents us with a picture of the relationship between x (the week of the year) and y (the stock price). It's expressed in the form of y = mx + b, the slope-intercept formula. Using it, we can predict future values for y (stock prices) if we have x. Of course, there's no strong correlation between the week of the year and the stock price. Nothing about the passage of time makes stock go up or down in value. For that matter, there's no concrete way to predict stock prices with certainty using any formula, but this function at least gives a semblance of an analysis tool.

Recursion

As with stored procedures and triggers, Transact-SQL UDFs support recursion. A UDF can call itself without having to worry about stack management issues or reentrancy. Listing 10-19 is a simple function that uses recursion to compute a number's factorial:

Listing 10-19 Transact-SQL UDFs support recursion.
 USE tempdb GO IF OBJECT_ID('dbo.Factorial') IS NOT NULL   DROP FUNCTION dbo.Factorial GO CREATE FUNCTION dbo.Factorial(@base_number decimal(38,0)) RETURNS decimal(38,0) AS BEGIN        DECLARE @previous_number decimal(38,0), @factorial decimal(38,0)        IF ((@base_number>26) and (@@MAX_PRECISION<38)) OR (@base_number>32)           RETURN(NULL)        IF (@base_number<0)          RETURN(NULL)        IF (@base_number<2) SET @factorial=1 -- Factorial of 0 or 1=1        ELSE BEGIN          SET @previous_number=@base_number-1  SET @factorial=dbo   .   Factorial(@previous_number)  -- Recursive call          IF (@factorial=-1) RETURN(NULL) -- Got an error, return          SET @factorial=@factorial*@base_number          IF (@@ERROR<>0) RETURN(NULL) -- Got an error, return        END   RETURN(@factorial) END GO SELECT dbo.Factorial(32) AS Factorial Factorial ---------------------------------------- 263130836933693530167218012160000000 

Note the bold line in Listing 10-19. It's the recursive call of the function to itself. Factorial() continues to call itself until the entire computation is complete.

Parameterized UDFs

One of the limitations of UDFs is that you cannot parameterize the objects they work with. That is, if a UDF is coded to work with the StockPrices table, there's no way to have it work with a different tablethere's no way to generalize the function. You can't pass the table name in as a parameter to the UDF. If you have ten StockPrice tables named StockPrice0 through StockPrice9, you'll need ten LSLR_StockPrices() functions to work with them.

Because you can't call stored procedures from UDFs or use INSERT EXEC with table variables, there's no out-of-the-box method for generalizing a UDF so that it can work with any table. Microsoft has made this fairly airtight, but they did leave one backdoorextended stored procedures. You can execute them from UDFs. This is where xp_exec comes in. Xp_exec is an extended procedure that you can call to execute ad hoc Transact-SQL. You can call it from a UDF. It takes three parameters: a string containing the query to run, a Yes/No flag indicating whether to enlist in the caller's transaction, and the name of the database in which to run the query. You can find the complete discussion of xp_exec in Chapter 20. For now, let's look at a technique that addresses the UDF shortcomings I just mentioned through the use of xp_exec (Listing 10-20):

Listing 10-20 A UDF that breaks all the UDF rules.
 USE tempdb GO CREATE TABLE dist (c1 int) GO INSERT dist VALUES (1) INSERT dist VALUES (2) INSERT dist VALUES (2) INSERT dist VALUES (3) INSERT dist VALUES (3) INSERT dist VALUES (4) INSERT dist VALUES (5) INSERT dist VALUES (8) GO DROP FUNCTION MEDIAN GO CREATE FUNCTION dbo.MEDIAN(@Tablename sysname, @Colname sysname) RETURNS @Median Table (Median sql_variant) AS BEGIN   DECLARE @funcsql varchar(8000)   SET @funcsql='  CREATE FUNCTION dbo   .   MedianPrim()   RETURNS @MedianTab Table(Median sql_variant)   AS   BEGIN   INSERT @MedianTab   SELECT Median=AVG(c1) FROM (   SELECT MIN(c1) AS c1 FROM (   SELECT TOP 50 PERCENT '+@Colname+' AS c1 FROM '+@Tablename+'   ORDER BY c1 DESC) t   UNION ALL   SELECT MAX(c1) FROM (   SELECT TOP 50 PERCENT '+@Colname+' AS c1 FROM '+@Tablename+'   ORDER BY c1) t   ) M   RETURN   END   '  EXEC master..xp_exec 'DROP FUNCTION dbo.MEDIANPRIM','N','tempdb'   EXEC master..xp_exec @funcsql,'N','tempdb'   INSERT @median SELECT * FROM MEDIANPRIM()   RETURN END GO SELECT * FROM Median('dist','c1') 

(Results)

 Median ----------- 3 

In this example, we create a table-valued function named Median() that takes two parameters: the name of the table on which to compute the median and the name of the column to use to compute it. Let's examine the Median() function to see how it works.

Note the large @funcsql varchar at the top of the function. What is it? It stores the text of a second UDF, called MedianPrim(), that Median() creates via xp_exec. Median() has to create MedianPrim() for two reasons:

  1. So that it can INSERT SELECT the dist table's median into its return result. UDFs can't INSERT EXEC, but they can INSERT SELECT.

  2. There's no other way for it to dynamically specify the table and column to use for its calculation.

From a performance standpoint, the technique that MedianPrim() uses to compute the median of the table is very efficient. It should execute almost instantly on even very large sets of data.

So, Median() creates MedianPrim() via xp_exec, then INSERT SELECTs its result. This value is then returned as Median()'s function result. We could have coded both functions to return numeric values instead of tables, but using a table allows us to return multiple values should we decide to (e.g., for vectored medians, and so on).

So. There you have it. A means of getting around most UDF restrictions and making a function that can run any Transact-SQL you specify. Just build a function that follows the general pattern laid out in the Median() routine and voila! you have a UDF that can do almost anything.

You could rewrite most of the table-valued functions in this chapter to take tables/columns as parameters. Many of them would benefit immensely from this flexibility. For example, the StockPriceFluctuation() and StockPriceTrend() functions would benefit enormously from being able to refer to other objects.

Before you jump in and start coding every function like Median(), keep these caveats in mind:

  1. As mentioned in Chapter 20, xp_exec connects back to the server using an ODBC connection. It connects using the LocalServer system DSN, so this will need to exist on the SQL Server machine.

  2. Never specify Y for xp_exec's second parameter when calling xp_exec from a UDF. This parameter specifies whether to join the transaction of the caller. You should not try to enlist in the transaction space of a UDFit does not work consistently and Microsoft has recommended against it.

  3. Queries you run with xp_exec must be less than 8000 bytes in size. This isn't a huge limitation, but it's something to be aware of. The reason for this is obvious. I'm using a varchar to define the parameter within the procedure, and varchars are limited to 8000 bytes.

  4. If you want to run your query in a database other than the default specified by the LocalServer DSN or for your login name, you must pass it as xp_exec's third parameter. This parameter is a varchar(128). It doesn't support Unicode types such as nvarchar or sysname.

  5. By its very nature, this technique is not multiuser, reentrant, or thread safe. Obviously, if two users try to create MedianPrim() at the same time with two different tables, you've got a problem. If SQL Server calls multiple instances of Median() in parallel threads, you've also got a problem because these calls may well collide with one another. You can disable parallel query execution for a query via the MAXDOP query hint. And, of course, you can't nest recursive calls to this type of function. If MedianPrim() were to call Median(), it would fail because the server would not allow it to drop itself. There's a workaround for the multiuser issue that I'll show you in a moment, but given that UDFs can't reference temporary objects, there's really no seamless solution. It's best to save functions like this for use only in very unusual circumstances.

So, now that we have a working generalized UDF, let's try it out against a different table:

 SELECT * FROM Median('StockPrices','ClosingPrice') 

(Results)

 Median --------- 67.531500 

Of course, this is the StockPrices table we created in some of the earlier examples. We specify the name of the table and the column on which to compute the median, and the UDF takes care of the rest.

What happens if we want to filter the data in some way? As currently coded, the UDF examines the entirety of the table you pass in. What if you wanted to specify some filter criteria or grouping that should occur before the UDF does its magic? Very simple. Pass a derived table expression into the UDF, like this:

 SELECT * FROM Median('  (SELECT * FROM StockPrices WHERE Symbol=''MSFT'') AS sp',  'ClosingPrice') 

The bold string in the previous query is a derived table expression. Have a look at the @funcsql variable in the Median() function, and you'll understand why this works. Because of how we've laid out the function, either a table name or a complete derived table expression can be dropped in place in the MedianPrim() function. You can use this technique to filter or group the data that the UDF sees however you want.

I mentioned earlier that the parameterized UDF technique is inherently a single-user approach. The main reason for this is that the technique depends on the ability to drop and recreate an object while it executes, and it cannot use a unique name or temporary table for this object because UDFs do not allow it. So, every execution of the Median() function drops and recreates the same object: the MedianPrim() function. This means that multiple simultaneous executions would likely collide. What can you do about this? The obvious thing is not to use it in situations in which multiple users may execute it at the same time. Depending on your situation, this may not be practical, so I'll give you a workaround.

Given that we control the code that's generated to create MedianPrim(), we can owner-qualify the function with the name of the current user, then leave off the owner in the INSERT SELECT back in the Median() function. If we're logged in as, say, JoeUser and we're not a db_owner, MedianPrim() will be created as JoeUser.MedianPrim(), and, as long as no one else logs in with the same account, we won't have to worry about the multiuser collisions I spoke of earlier. Because the owner qualification is omitted in Median(), SQL Server looks first to see whether user JoeUser owns an object named MedianPrim(), and, because he does, that's the function that Median() will end up executing.

Obviously, this workaround requires that your users use different login names and restricts them from being db_owners (a db_owner's database name always reports as "dbo"; hence, we're right back where we started in terms of name collisions). This workaround also requires that you specifically grant rights on both the underlying table that MedianPrim() will be querying and the Median() function. Listing 10-21 provides code that demonstrates this:

Listing 10-21 You can work around the multiuser issue using the USER_NAME() function.
 SET NOCOUNT ON USE tempdb go DROP TABLE dbo.StockPrices GO CREATE TABLE dbo.StockPrices (Symbol varchar(4), TradingDate smalldatetime, ClosingPrice decimal(10,4)) /* Code abridged */ INSERT dbo.StockPrices (Symbol, TradingDate, ClosingPrice) VALUES ('MSFT','20010108',   53.500) GO  GRANT SELECT ON dbo   .   StockPrices TO public  GO DROP FUNCTION MEDIAN GO CREATE FUNCTION dbo.MEDIAN(@Tablename sysname, @Colname sysname) RETURNS @Median Table (Median sql_variant) AS BEGIN   DECLARE @funcsql varchar(8000), @cmd varchar(255)   SET @funcsql=' CREATE FUNCTION '  +USER_NAME()+  '.MedianPrim() RETURNS @MedianTab Table(Median sql_variant) AS BEGIN INSERT @MedianTab SELECT Median=AVG(c1) FROM (        SELECT MIN(c1) AS c1 FROM (        SELECT TOP 50 PERCENT '+@Colname+' AS c1 FROM '+@Tablename+'        ORDER BY c1 DESC) t UNION ALL        SELECT MAX(c1) FROM (        SELECT TOP 50 PERCENT '+@Colname+' AS c1 FROM '+@Tablename+'        ORDER BY c1) t ) M RETURN END '   SET @cmd='DROP FUNCTION '  +USER_NAME()+  '.MEDIANPRIM'   EXEC master..xp_exec @cmd,'N','tempdb'   EXEC master..xp_exec @funcsql,'N','tempdb'  INSERT @median SELECT * FROM MEDIANPRIM()  RETURN END GO  GRANT SELECT ON dbo   .   Median TO public  GO 

This code creates the table and the Median() function and grants SELECT access on both of them to the public group. Note the use of the USER_NAME() function within the Median() function. For non-db_owners, this provides a modicum of isolation between concurrent executions of parameterized functionsa kind of namespace, if you will.

Once this is complete, the stage is set for other users to compute medians using the parameterized UDF. They query just as the dbo or any other user would:

 USE tempdb GO SELECT * FROM Median('dbo.TemperatureData','HiTemp') 
for RuBoard


The Guru[ap]s Guide to SQL Server[tm] Stored Procedures, XML, and HTML
The Guru[ap]s Guide to SQL Server[tm] Stored Procedures, XML, and HTML
ISBN: 201700468
EAN: N/A
Year: 2005
Pages: 223

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net