Managing Data Manipulation with User-Defined Functions

Managing Data Manipulation with User -Defined Functions

Managing data manipulation with user-defined functions.

User-defined functions have been on the most- requested -feature list of most SQL Server programmers for a very long time. The system-defined functions, such as USER_NAME() , SERVER_NAME , CONVERT() , and GETDATE() , are used constantly; they provide the functionality that makes T-SQL useful in many respects. By passing back basic values, user-defined functions can be used in many places where stored procedures just wouldn't work.

Types of User-Defined Functions

There are three types of functions. Scalar functions are functions that return a single scalar data value. The word scalar in this context means "single value," and includes variables of the type int , bigint , varchar , and so on. Scalar functions cannot return data of a user-defined data type, a table type, a cursor, or the usual exceptions to most rules: text, ntext, timestamp, and image.

Table-valued functions return variables of type table , which were discussed in Chapter 6. These are essentially variables that are accessed like tables, with rows and columns . There are two types of table-valued functions: inline and multi-statement . An inline table-valued function is one statement long, it has no BEGIN or END , and the table that results is from one SELECT statement. A multi-statement table-valued function has a BEGIN and an END , and returns a table.

Restrictions on User-Defined Functions

User-defined functions have a whole list of things that they cannot do, which can be summed up with one term : side effects . In classic computer science jargon, a side effect is something that a function does that changes something outside the function. In general, functions are allowed to look at anything they want, but they cannot change anything except the variables that are declared inside the function. Any changes made to objects outside the function are called side effects, and are at least bad form, and, in this instance, are just downright not allowed. Functions are not allowed to change data in any databases except for incidental changes to Tempdb. They are not allowed to send email, to change objects, or to return a resultset to the user. They are also not allowed to call any of the following system-defined functions:

`@@CONNECTIONS`	`@@PACK_SENT`	`GETDATE()`
`@@CPU_BUSY`	`@@PACKET_ERRORS`	`GETUTCDATE()`
`@@IDLE`	`@@TIMETICKS`	`NEWID()`
`@@IO_BUSY`	`@@TOTAL_ERRORS`	`RAND()`
`@@MAX_CONNECTIONS`	`@@TOTAL_READ`	`TEXTPTR`
`@@PACK_RECEIVED`	`@@TOTAL_WRITE`

Now look at what user-defined functions actually can do.

Creating User-Defined Functions

You may have guessed that the statement used to create your own user-defined function is CREATE FUNCTION . To create a scalar function, you just do something like this:

 CREATE FUNCTION <function name> (<parameter list>) RETURNS <data type> [WITH <options>] AS BEGIN <function stuff> RETURN <value> END

Looks a lot like the CREATE PROCEDURE syntax, doesn't it? The parameter list is just like the one for CREATE PROCEDURE , except this one doesn't allow OUTPUT parameters and has to have parentheses around it. The RETURNS keyword is new; that's where you put the data type of the data you want to send back. You're allowed to send only one value back. The keyword AS is optional. The WITH <options> part is, well, also optional.

Only two options are allowed, and they're allowed for all three function types: ENCRYPTION and SCHEMABINDING . The ENCRYPTION option encrypts the function's code, so users can't see what's going on inside and the actual inner workings are kept secret. This is similar to how the ENCRYPTION option works for stored procedures.

The SCHEMABINDING option ties the function to database objects and doesn't allow the database objects to be altered or dropped as long as the function exists. This prevents any objects from being dropped that would impair the function of the function, if you will. You can remove the binding by dropping the function or altering the function to remove the SCHEMABINDING option. You can use this option only when all the following apply:

Any views or other user-defined functions used by this function also have SCHEMABINDING turned on.
The objects are all local to the same database and not referenced by a two-part name.
The user creating the function has REFERENCES permission on all the objects used in the function.

Here's a simple, yet useful, function:

 CREATE FUNCTION distance ( @x1 int = 0, @y1 int = 0, @x2 int = 0, @y2 int = 0 ) returns float as begin declare @distance float set @distance = sqrt ( power(@y1 - @y2, 2) + power(@x1 - @x2, 2) ) return @distance end

You may remember this function from your high school algebra class: It finds the distance between two points on a square grid. So, how can you create these user-defined functions with Enterprise Manager? See Step by Step 9.2.

STEP BY STEP

9.2 Creating User-Defined Functions with SQL Server Enterprise Manager

To create a user-defined function using SQL Server Enterprise Manager, follow these steps:

Open SQL Server Enterprise Manager and connect to the server to which you want to add a stored procedure. If you need to, you can register the server now.
View the containers inside the server by clicking the plus sign to the left of the server, as shown in Figure 9.5.

Figure 9.5. These are the objects within the server in SQL Server Enterprise Manager.
Open the Databases container, again by clicking the plus sign to the left of the word Databases. Choose a database from the list and view its containers by clicking the plus sign to the left of the database you want to work with. If you're just playing with the examples in the book, use the Pubs database. Click the User Defined Functions container, as shown in Figure 9.6.

Figure 9.6. These are User-Defined Functions within the Pubs database.
To create a new user-defined function, right-click in the right pane and choose New User-Defined Function. The User-defined Function Properties - New User-Defined Function window appears, as shown in Figure 9.7.

Figure 9.7. This is the window you use to create new user-defined functions.
Change the [OWNER] and [FUNCTION NAME] to the correct owner and name of the function. If you're unsure of the owner name, delete the [OWNER] and the period that follows it. Fill in the PARAMETER LIST , return_type_spec , and FUNCTION BODY . An example is shown in Figure 9.8.

Figure 9.8. This is an example user-defined function typed into the Stored Procedures Properties window.
Click the Check Syntax button to make sure you've typed everything in properly. This should pop up a message box that says Syntax Check Successful! . Clear the box by clicking OK.
Click OK to save your user-defined function.
To edit an existing user-defined function, just double-click it. You'll have the same option to check syntax as before.

Scalar functions, like the following, have to be fully qualified with at least a two-part name when you call them:

 print dbo.distance(1, 1, 2, 2)

If you create that function in the Pubs database, that's how you'd call it. It returns a floating point number, 1.41421. That's a pretty simple call, though. Now look at something a bit more complex:

 use pubs go create table coords (x int, y int) go insert into coords values (1, 1) insert into coords values (1, 2) insert into coords values (1, 3) insert into coords values (4, 1) insert into coords values (4, 2) insert into coords values (4, 3) insert into coords values (4, 4) select * from coords Point1 cross join coords Point2 where pubs.dbo.distance(Point1.x, Point1.y, Point2.x, Point2.y) > 4.0

This is an example of how you use a function to filter data. The SELECT statement returns only rows that are a minimum of 4 units apart, which in this case is the points (1,1) to (4,4) only. You can use a user-defined function anywhere you can use an expression or a system function, such as in a column list or an order by.

For an inline user-defined function, the entire function can consist of only one SELECT statement. The returned recordset is then placed into the variable of type table , which is returned back to the caller. Here's an example of an inline user-defined function:

 CREATE FUNCTION ObjectList (@ObjectType char(1)) RETURNS table AS RETURN (SELCT name FROM sysobjects WHERE type = @ObjectType ) go

Here's an example of how that function would be called:

 SELECT * FROM master.dbo.ObjectList('u')

Why is this better than just using a view? Well, you can't pass parameters to a view, so you're stuck with the WHERE clause specified in the view. An inline function provides a lot more flexibility than a view. Also, if you create function like this on top of an indexed view, and the parameters that you use for the function are part of the index, then you can dramatically improve performance because you've created a parameterized, indexed view.

The third type of function is the multi-line table-valued function. Basically, this just combines the scalar function's capability to do more complex logic and functions, and the capability of an inline function to return a table. The layout of the table being returned is specified in the RETURNS clause, like this:

 CREATE FUNCTION TableDistance (@x1 int,        @y1 int,        @MaxDistance int ) RETURNS @DistanceTable TABLE ( ID int, Dist float ) AS BEGIN INSERT @DistanceTable SELECT ID, sqrt ( power(@y1 - y, 2) + power(@x1 - x, 2) ) FROM location WHERE sqrt ( power(@y1 - y, 2) + power(@x1 - x, 2) ) <= @MaxDistance RETURN END

This particular function scans a table called location and finds all the points within a radius of @MaxDistance of the point (@x1, @y1). Here's how to call it:

 CREATE TABLE location ( ID int IDENTITY(1,1), X int, Y int ) GO INSERT INTO location VALUES (1, 1) INSERT INTO location VALUES (2, 1) INSERT INTO location VALUES (2, 2) INSERT INTO location VALUES (3, 1) INSERT INTO location VALUES (3, 2) INSERT INTO location VALUES (3, 3) SELECT * FROM dbo.TableDistance(4, 3, 1)

If you've been following along so far, you've probably created a whole bunch of functions all over the place. So, how do you get rid of them?

Dropping and Altering User-Defined Functions

User-defined functions can be dropped with the DROP FUNCTION statement. Just do DROP FUNCTION and the name of the function. You probably already figured that out, though.

The ALTER FUNCTION statement affects functions similarly to how the ALTER PROCEDURE statement changes stored procedures. It does not create a new function for you, but it does change the function without changing the permissions on the function.

Determinism and Functions

Functions can be placed into two different categories: deterministic and non-deterministic . A deterministic function is a function that always returns the same output with the same input. For example, the DATALENGTH() function always returns the same value if you pass it the same string; the ABS() function always returns the same value if you pass it the same number; and the distance user-defined function you just saw always returns the same distance value when passed the same set of coordinates, but it is not deterministic because it does not have the SCHEMABINDING option set.

For a user-defined function to be deterministic, it has to pass a four-part test:

The function must be schema bound.
Any function called by the user-defined function must be deterministic.
The function cannot reference database objects outside of its scopeit cannot contain select statements that reference a database.
The function cannot call any extended stored procedures.

There are also many non-deterministic functions. These are functions that don't return the same output for the same input. For example, GETDATE() never returns the same output twice: unless you monkey around with your system clock, it always returns the current date. The NEWID() function is extremely non-deterministic: it always returns a unique value.

Why is this important? First of all, non-deterministic system functions are not allowed in a user-defined function. Remember the list that laid out which functions you can't use in a user-defined function? Well, those are all non-deterministic system functions. Also, if you use an expression to define a computed column, and if the expression relies on a non-deterministic function, you cannot index the computed column. Also, you cannot create a clustered index on a view that relies on a non-deterministic function.

EXAM TIP

Is "Deterministic Function" on the Test? Probably not, as such. But you need to understand what a deterministic function is so that you can understand what functions cannot do and how the SCHEMABINDING works, because those will be on the test.

When to Use Functions or Stored Procedures

When do you use stored procedures and when do you use functions? Here's the general rule: Use a stored procedure whenever you need to return output to an application. Use a function whenever you need to return the output to a query or other T-SQL statement.

If you need to return a rowset that you're going to use in the FROM clause, use a function. If you're going to return a rowset that you want to use in an application, use a stored procedure.