Operations in relational database systems, like SQL Server, are all centered on set-based operations, which means that we can operate on more than one row at a time. All the classic SQL statements, such as SELECT, INSERT, UPDATE, and DELETE, are set-based, as they operate on sets of data. A set of data may be one row, a couple of rows, or even millions of rows. SQL Server is optimized for these kinds of operations. Almost all programming in T-SQL should be done using set-based operations.
However, there are some occasions when we cannot perform actions in a set-based manner. For such occasions, we have cursors. The real problem with cursors lies in understanding when and where to use them. If used improperly (where set-based operations would suffice), they can heavily degrade the performance of the system. Cursors perform operations at a record level; they do not benefit from the set-based efficiencies of the database engine, and so use more system resources (memory and processor cycles) than necessary.
Prior to SQL Server 2000, cursors were frequently employed, as there were only a few operations that could be done without them; even operations like moderately complex string manipulation had to use cursors. However, SQL Server 2000 introduced user-defined functions (for details, refer to Chapter 6), which can be used to package operations that cannot be done directly in an SQL statement. An example of this is a complex calculation that requires a lookup into multiple tables, such as a sales tax calculation, where you may need to take into consideration the type of item (food, medicine, and so on), the state the purchaser is in, and more. There may be fifty different ways to calculate it in fifty different locales. Previously, we might have encapsulated that logic into a stored procedure, but then had to cursor through the set row by row to calculate the sales tax for each item. With a function, this kind of operation is no longer needed; you simply change the stored procedure to a user-defined function, and code it directly into the SQL DML statement.
In this section, we will discuss the basic syntax of cursors, and the best practices for their use. Do not confuse the T-SQL cursors with the cursors that we use to move data between the server and the client. They are closely related, but the issues involved in their use are different. Cursors are more necessary on the client.
There are two different styles for implementing cursors. The first is a special cursor syntax, while the other is a more typical syntax where the cursor is referred to by a variable. In this section, we will briefly look at the different syntaxes and what the settings mean.
The full cursor syntax is overkill for most T-SQL coding, and you will find that you do not need many of the available settings, since many of them are centered on modifying data through the cursor. This is because the same cursor machinery serves client applications, which declare and open server-side cursors to work through a set of data for client consumption. We seldom need such features in stored procedure code because, unlike in client-side operations, we know which row we want to modify, and how we want to modify it.
The Query Analyzer tool has several cursor templates that can be used to automate writing the coding of cursors. They can be accessed via the Edit | Insert Template menu, in the Using Cursors folder (assuming you have not made any modifications to your template directories, that is). The templates are useful for getting started, though you will need to modify them for best practices, in most cases.
Let's now look at the basic syntax for cursors:
DECLARE cursor_name CURSOR
    [LOCAL | GLOBAL]
    [FORWARD_ONLY | SCROLL]
    [STATIC | KEYSET | DYNAMIC | FAST_FORWARD]
    [READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
    [TYPE_WARNING]
FOR <selectStatement>
    [FOR UPDATE [OF column_name [ ,...n ]]]
The cursor_name value must be a value conforming to the rules for naming identifiers.
[LOCAL | GLOBAL]
From a coding standpoint, this is the most important part of the declaration. If we declare the cursor as LOCAL, it will be scoped to the stored procedure (or batch) that we are working with. Therefore, it will be automatically destroyed when the procedure finishes executing. If it is GLOBAL, it is scoped to the connection, and even after the procedure is finished, the cursor is still active until destroyed. Unless you have some valid reason to use your cursor after the procedure is complete (for example, in your client code or in a different procedure that your code will execute), it is best to declare it as LOCAL. This guarantees that your cursor is only accessible to the authorized code, and will be destroyed when your code finishes executing, even if something occurs to interrupt the code being executed (like an unhandled, unexpected error), and your deallocation code never gets the chance to execute.
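As a minimal sketch of the preferred form (the table and cursor names here are illustrative, not from the examples later in this section):

```sql
-- LOCAL scopes the cursor to this procedure or batch; it is destroyed
-- automatically when the batch completes, even if an error interrupts it
DECLARE employee_cursor CURSOR LOCAL FAST_FORWARD
FOR SELECT employeeId FROM employees

OPEN employee_cursor
-- ...FETCH and process rows here...
CLOSE employee_cursor
DEALLOCATE employee_cursor
```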
If you do not specify either LOCAL or GLOBAL, the type of cursor will be governed by the CURSOR_DEFAULT database setting. It is set using the SET clause of the ALTER DATABASE command:
ALTER DATABASE <databaseName> SET CURSOR_DEFAULT LOCAL --or GLOBAL
The current value of this setting can be checked by using the DATABASEPROPERTYEX() function:
SELECT DATABASEPROPERTYEX(db_name(), 'IsLocalCursorsDefault')
If it returns 1, then LOCAL is the default, otherwise GLOBAL is the default.
[FORWARD_ONLY | SCROLL]
It is usually best to use a FORWARD_ONLY cursor, which is the faster of the two, since we seldom need to scroll back and forth through a cursor in a stored procedure. This setting is commonly left blank: we cannot specify it if we have specified FAST_FORWARD (see below), which is usually the best choice in stored procedures, and FORWARD_ONLY is already the default unless you specify a STATIC, KEYSET, or DYNAMIC cursor type, covered in the next subsection.
[STATIC | KEYSET | DYNAMIC | FAST_FORWARD]
STATIC cursor types fetch all rows resulting from the SELECT statement, and copy them to a temporary table.
KEYSET cursor types fetch all keys from the SELECT statement, and copy them into a temporary table. This obviously requires SQL Server to identify some unique key.
DYNAMIC fetches the data as you request it, so unlike the previous types, the actual row that will be fetched is not known until you fetch it. FAST_FORWARD is an optimized static, read-only, forward-only cursor type, which should be used in most cases.
[READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
READ_ONLY specifies that we do not intend to modify the data in the cursor. Without this setting, we can change the current row that the cursor points to using the UPDATE…WHERE CURRENT OF cursor syntax. This is usually a bad idea, because it makes code quite difficult to deal with; as such, it is generally not good practice to use the CURRENT OF syntax. This does not mean that we cannot update the data that the cursor was built from; we can still do so with an ordinary UPDATE statement keyed on a value in the cursor.
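To see what the discouraged syntax looks like next to the key-based alternative, consider this sketch (the table, column, and cursor names are hypothetical; the positioned update assumes the cursor was declared without READ_ONLY, over an updatable SELECT):

```sql
-- discouraged: positioned update tied to the cursor itself
UPDATE employees
SET    salary = salary * 1.1
WHERE CURRENT OF employee_cursor

-- preferred: update via the key value fetched from the cursor
UPDATE employees
SET    salary = salary * 1.1
WHERE  employeeId = @employeeId
```

The second form works with any cursor type, including FAST_FORWARD, and makes the target row explicit in the code.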
SCROLL_LOCKS tells SQL Server to hold locks on the rows in the cursor, so that if we fetch a row, we know that we have it to ourselves, and can modify it. OPTIMISTIC uses an optimistic locking mechanism to make sure that the data fetched remains unmodified in the database table. This is done by using a column with a TIMESTAMP data type, and comparing the timestamp of the physical table to the timestamp in the cursor, or with a checksum of the fields in the table and cursor. It is similar to the type of optimistic locking schemes that we will implement in the Concurrency section of this chapter.
[TYPE_WARNING]
No matter what type we specify, depending on the requirements of the settings, SQL Server may have to change the type of cursor that is being implemented. For example, if we choose a KEYSET cursor, and SQL Server cannot determine a key, it will probably downgrade to a STATIC cursor type. By including the TYPE_WARNING setting, a warning message will be raised to the client telling them that the type has been changed. It is generally used to tell dynamic clients what is going on, so they can adjust their features accordingly. The client can then interrogate SQL Server to see what kind of cursor is being used. For more information, check the sp_cursor_list, sp_describe_cursor, sp_describe_cursor_columns, and sp_describe_cursor_tables topics in SQL Server Books Online.
In the case of a conversion from KEYSET to STATIC, users will no longer be able to modify the data in the cursor. As always, ensure that your client can respond to the warning message.
FOR <selectStatement>
This can be any SELECT statement, but you cannot include the keywords COMPUTE, COMPUTE BY, FOR BROWSE, and INTO. The SELECT statement used for the cursor will determine whether the query can meet the cursor options chosen, or not. For example, if your SELECT statement has no discernable unique columns, you will not be able to choose a KEYSET cursor type, since no key will be found.
[FOR UPDATE [OF column_name[ ,...n]]]
This option is included for completeness. If you set this, it allows you to update the data in the cursor, and let it modify the data in the table. If you list the column names, then only these columns can be updated, but if you omit all the column names, all the columns will be updateable. This does not make sense for stored procedures, as it makes the code messy. It is a far better practice to update rows of data using a key in the cursor columns. It makes your intentions clearer, and is easier to test. For more details, you can refer to the UPDATE topic in SQL Server Books Online.
There is another form of the declaration that is more natural in stored procedures. It is based on using cursors named as variables:
DECLARE @cursorVariable CURSOR
SET @cursorVariable = CURSOR <options> FOR <selectStatement>...
The same options exist for variable-based cursors, except for LOCAL and GLOBAL. All variable-based cursors are local cursors, which is preferable, and we still have a way to use them in subordinate objects. With the original syntax, we mentioned that if you wanted to use the cursor in a subordinate procedure, you had to declare it as GLOBAL.
When dealing with cursors as variables, using cursors in subordinate procedures is more natural. If you need to use a variable-based cursor in a subordinate procedure, you can pass it as a parameter when you call the other procedure, rather than creating it as GLOBAL. It may seem like more work to require declaring the parameter, but it explicitly documents that we intend to use a cursor built elsewhere. There is no performance gain from using variable-based cursors, so their use is a matter of coding convention only.
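A sketch of passing a cursor between procedures might look like the following (the procedure and variable names are hypothetical; note the CURSOR VARYING OUTPUT syntax, which is required for cursor parameters):

```sql
-- the subordinate procedure creates and opens the cursor,
-- then hands it back to the caller through an OUTPUT parameter
CREATE PROCEDURE order$openCursor
(
    @orderCursor CURSOR VARYING OUTPUT
)
AS
BEGIN
    SET @orderCursor = CURSOR LOCAL FAST_FORWARD FOR
        SELECT orderId FROM orders
    OPEN @orderCursor
END
GO

-- the caller receives the opened cursor and fetches from it
DECLARE @cursor CURSOR
EXECUTE order$openCursor @orderCursor = @cursor OUTPUT

DECLARE @orderId INT
FETCH NEXT FROM @cursor INTO @orderId
WHILE @@FETCH_STATUS = 0
BEGIN
    --process @orderId here
    FETCH NEXT FROM @cursor INTO @orderId
END
CLOSE @cursor
DEALLOCATE @cursor
```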
Now that we have created the cursor, let's look at how to reference it from our code.
Once we have declared the cursor, we need to open it. The following statement does this:
OPEN <cursorName>
Once we have opened the cursor, it creates the set and is ready to use. Now we can use the FETCH statement to move back and forth through the set. The syntax for FETCH is:
FETCH [[FIRST | NEXT | PRIOR | LAST
        | ABSOLUTE { n | @nvar }
        | RELATIVE { n | @nvar }] FROM]
    {{[GLOBAL] cursor_name} | @cursor_variable_name}
    [INTO @<variable>, @<variable2>, ..., @<variableN>]
There are two parts of the FETCH statement to discuss: FETCH and INTO. We use FETCH to position the internal pointer on a given row in the set, while INTO extracts the values from that row into variables for use. Omitting the INTO clause in stored procedures is highly discouraged, since without it you cannot programmatically access the data in the current row of the cursor from T-SQL. Besides, it returns a result set for each FETCH statement, which is seldom desired when building procedures. The common goal for stored procedures is either to do something to a group of rows, or to build a single result set. Fetching without INTO should be left to the cursors used between client and server, and not used in stored procedures.
FETCH <position> INTO <Variables>
This is used to position the cursor pointer on a row in the set. You have the following options for <position>:
FIRST
It gets the first row from the set.
NEXT
It selects the row after the currently selected row.
PRIOR
It selects the row before the currently selected row.
LAST
It gets the last row in the set.
ABSOLUTE <number of row>
It returns the nth row in the set. It can be a variable value.
RELATIVE <number of row>
It returns the nth row from the current row. It can be negative, and may be a variable.
No position information
If the position information is missing, it re-fetches the current row.
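The positioning options above can be sketched against a scrollable cursor as follows (the names are illustrative; remember that forward-only and FAST_FORWARD cursors support only FETCH NEXT):

```sql
DECLARE scrollCursor CURSOR LOCAL SCROLL FOR
    SELECT orderId FROM orders ORDER BY orderId
OPEN scrollCursor

DECLARE @orderId INT
FETCH FIRST       FROM scrollCursor INTO @orderId  --first row
FETCH LAST        FROM scrollCursor INTO @orderId  --last row
FETCH ABSOLUTE 5  FROM scrollCursor INTO @orderId  --fifth row in the set
FETCH RELATIVE -2 FROM scrollCursor INTO @orderId  --two rows before current
FETCH             FROM scrollCursor INTO @orderId  --re-fetch the current row

CLOSE scrollCursor
DEALLOCATE scrollCursor
```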
FROM {{[GLOBAL] <cursor name>} | @<cursor variable name>}
This is where you choose the cursor to use for the FETCH statement. GLOBAL is used to refer to a cursor that was not declared within the scope of your procedure; without it, SQL Server will raise an error saying that it does not recognize the cursor you are referring to. When the actual call is made, the cursor you are referring to must have been declared as GLOBAL, and may not be a variable-based cursor. The better way to handle this situation is to pass the cursor to the stored procedure as a variable.
[INTO @<variable>, @<variable2>, ... , @<variableN>]
INTO is used to take the values from the SELECT statement, and place them into the variables for use. For example, if the cursor was declared as:
DECLARE @cursorVar CURSOR
SET @cursorVar = CURSOR FAST_FORWARD FOR
    SELECT orderId, customerId
    FROM northwind..orders

--open the cursor
OPEN @cursorVar
Then, you will have to fashion your FETCH with INTO in the following manner:
DECLARE @orderId INT,
        @customerId CHAR(5)

--fetches the current row, which is the first
FETCH FROM @cursorVar INTO @orderId, @customerId
SELECT @orderId, @customerId
The SELECT statement will return:
10248 VINET
Finally, there are some other syntactical elements we need to look at – statements to check the status of the FETCH statement, and statements to close the cursor.
@@FETCH_STATUS is a global integer variable that holds the status of the last cursor FETCH statement issued against any cursor currently opened by the connection. It can have the following values:
| Value | Description |
| --- | --- |
| 0 | The FETCH statement was successful, and the row was retrieved. |
| −1 | The FETCH statement failed; the most common reason is that the requested row was beyond the result set. |
| −2 | The row being fetched no longer exists; for example, it may have been deleted by another user. |
Be careful to check the value of @@FETCH_STATUS immediately after fetching from the cursor. This value is global to all cursors that exist in your connection, so if you call a subordinate procedure that uses a cursor, that call can reset the value of @@FETCH_STATUS.
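One defensive pattern, sketched below, is to copy @@FETCH_STATUS into a local variable immediately after each FETCH, so that a subordinate call cannot overwrite the value you are testing (the procedure and variable names here are hypothetical):

```sql
DECLARE @fetchStatus INT

FETCH NEXT FROM @cursorVar INTO @orderId
SET @fetchStatus = @@FETCH_STATUS  --capture the status before anything else runs

WHILE @fetchStatus = 0
BEGIN
    EXECUTE someProcedure @orderId  --may itself use a cursor internally

    FETCH NEXT FROM @cursorVar INTO @orderId
    SET @fetchStatus = @@FETCH_STATUS
END
```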
Finally, we need to close our cursor with the CLOSE and DEALLOCATE statements:
CLOSE {{[GLOBAL] cursor_name} | @cursor_variable_name}
DEALLOCATE {{[GLOBAL] cursor_name} | @cursor_variable_name}
CLOSE is used to close the cursor, although you can reopen it. Deallocating the cursor removes it from memory. This is especially important for global cursors, since the cursor is scoped to the entire connection, and will not be automatically destroyed when the procedure ends.
Now, let's look at the various uses of cursors.
In this section, we will look at some cases that involve the use of cursors:
Formatting a Result Set
This may be needed when we cannot formulate a result set by using the set-based operations in T-SQL, but we can use cursors to step through the set to build another set.
Batch Operations
Frequently, we may need to loop through a set of data and run some process that performs multiple operations per row. Cursors are ideal for this situation.
Let's look at these scenarios in detail.
When formatting a result set, there are two possibilities:
Formatting fields in a result set, or
Building a complex result set that cannot be made in the current version of T-SQL
In the first case, we take a query, place the results in a temporary table, and then loop through each row in the temporary table, updating fields in the result set. We might adopt this method to perform some complex calculation that does not fit into standard SQL. For example, let's look at the following cursor code based on the Northwind database. Note that, to keep the example manageable, we have simplified it.
The code will give us all orders, the customer for each order, and the total of each order.
First, we create a temporary table for our output:
--first create our temp table
CREATE TABLE #holdOutput
(
    orderId INT PRIMARY KEY NONCLUSTERED,
    companyName NVARCHAR(40),
    orderTotal MONEY NULL
)
Then, we load the temporary table with all of the orders, with the name of the company that the order was made for:
--then go and fetch rows from database, with
--only the columns that do not need formatting
INSERT INTO #holdOutput (orderId, companyName)
SELECT Orders.OrderId, Customers.CompanyName
FROM Orders
     JOIN Customers
        ON Orders.CustomerId = Customers.CustomerId
Then we declare a cursor of the FAST_FORWARD type, over a SELECT statement of all rows in the temporary table.
--DECLARE our cursor, which will contain the keys of the #holdOutput table
DECLARE @outputCursor CURSOR
SET @outputCursor = CURSOR FAST_FORWARD FOR
    SELECT orderId
    FROM #holdOutput
Next, we open the cursor and build a variable for holding the key of the cursor.
--open the cursor
OPEN @outputCursor

--variable to fetch into
DECLARE @orderId INT
Now, we fetch the first row (which we are already on) from the @outputCursor, and check the @@FETCH_STATUS.
--fetch the first row from our cursor
FETCH FROM @outputCursor INTO @orderId

WHILE @@fetch_status = 0 --loop until a fetch is invalid, in this
                         --case past the end of the table
BEGIN
Then, inside the loop, we update our temporary table, setting the order total from the unit prices and quantities in the Order Details table for the current key. Note that we use the Northwind database by default in all code snippets in this chapter.
    --update the total field
    UPDATE #holdOutput
    SET orderTotal = (SELECT SUM(unitPrice * quantity)
                      FROM [order details]
                      WHERE orderId = @orderId)
    WHERE orderId = @orderId
Now, we fetch the next row from the cursor:
    --get the next row
    FETCH NEXT FROM @outputCursor INTO @orderId
END
Once the @@fetch_status is not equal to 0, we exit the WHILE loop and output all the rows of our temporary table:
--get the output
SELECT *
FROM #holdOutput
Finally, clean up and exit.
CLOSE @outputCursor
DEALLOCATE @outputCursor
DROP TABLE #holdOutput
Prior to SQL Server 2000, this was a common operation. In a real-life situation, this would represent a complex query, which might include calculations that could not be represented in a subquery. However, in SQL Server 2000, we have user-defined functions (UDFs) to cover this situation, which we will see more of in Chapter 6.
In the previous example, we updated a field called orderTotal. Using a UDF, the same example can be written as:
CREATE FUNCTION order$returnTotal
(
    @orderId INT
)
RETURNS MONEY
AS
BEGIN
    RETURN (SELECT SUM(unitPrice * quantity)
            FROM [order details]
            WHERE orderId = @orderId)
END
Then, our query will look like:
SELECT orders.orderId, customers.companyName,
       dbo.order$returnTotal(orders.orderId)
FROM orders
     JOIN customers
        ON orders.customerId = customers.customerId
This query, included in the downloadable sample code, returns exactly the same data more efficiently, and with less code. Even for this small set, you will notice an appreciable difference in performance between the previous example and this function. Essentially, both perform the same operations, but the latter is optimized, faster, and certainly easier to read and manage. Chapter 6 will cover UDFs in detail.
Generally, you can formulate any result set that you need; however, in some cases it is not possible. For example, let's consider the case where we want to build a random set of data. This kind of operation is useful if you have millions or even billions of rows of data (like logging data from a web site). In our case, we are going to randomly pick employees for some study, such as doing a sample blood test, or filling out a questionnaire.
For our example code, we will use the RAND function from T-SQL to choose approximately half the employees. We will do this by executing the following code snippet:
ROUND(RAND(),0)
RAND returns a floating-point number between 0 and 1, which we then round to either 0 or 1. The sample code includes a test that sums this value 100 times; the totals were consistently about 50 on average – sometimes the high thirties, sometimes the low sixties, but decently random.
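That test can be sketched as a simple loop (a hypothetical re-creation, not the exact sample code):

```sql
DECLARE @i INT, @total INT
SELECT @i = 1, @total = 0

WHILE @i <= 100
BEGIN
    --each call to RAND inside the loop is a separate execution,
    --so we get a fresh 0 or 1 every iteration
    SET @total = @total + CAST(ROUND(RAND(), 0) AS INT)
    SET @i = @i + 1
END

--@total should usually land somewhere near 50
SELECT @total AS totalOnes
```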
Sadly, we cannot do this in a simple SELECT statement, because the RAND function executes only once for the entire statement:
SELECT ROUND(RAND(),0) AS rand, employeeId FROM employees
This will return:
rand    employeeId
------  ----------
0.0     3
0.0     4
0.0     8
0.0     1
0.0     2
0.0     6
0.0     7
0.0     5
0.0     9
Note that it might just as easily return all 1s instead of 0s. Since the function executes only once per statement, we cannot build a WHERE clause around WHERE 1 = ROUND(RAND(),0). Therefore, we have to build a temporary table to hold the employee IDs, and then execute the RAND function once for each row:
CREATE TABLE #holdEmployees
(
    employeeId INT
)

DECLARE @outputCursor CURSOR
SET @outputCursor = CURSOR FAST_FORWARD FOR
    SELECT employeeId
    FROM employees

--open the cursor
OPEN @outputCursor

--variable to fetch into
DECLARE @employeeId INT

--fetch the first row from our cursor
FETCH FROM @outputCursor INTO @employeeId

WHILE @@fetch_status = 0 --loop until a fetch is invalid, in this
                         --case past the end of the table
BEGIN
    INSERT INTO #holdEmployees (employeeId)
    SELECT @employeeId
    WHERE 1 = ROUND(RAND(), 0)

    FETCH NEXT FROM @outputCursor INTO @employeeId
END
Once we have built the set, we run the SELECT query:
SELECT *
FROM employees
     JOIN #holdEmployees
        ON employees.employeeId = #holdEmployees.employeeId

CLOSE @outputCursor
DEALLOCATE @outputCursor
DROP TABLE #holdEmployees
When you execute this code, you can observe that it gives us a good random sampling in this small set. This kind of logic could be used for several purposes, like selecting a percentage of customers for sending promotion mails, questionnaires, and so on.
Even today, many businesses rely upon batch operations for processing their information. For example, an online retailer might take the order, mark a product as being committed, and verify the payment method while the user is sitting at his web browser waiting for the Thank you for your order message to come up. Later, a process might go through all of the day's orders, post the money to accounts, schedule the order, and perform the other housekeeping tasks.
The other classic example is that of an accounting system. It is also what our example code will implement. Take the case of a donation system. All day, money is received by different processes, through multiple channels (phone, mail, online, and so on). Each donation will have several different outcomes, such as sending a ‘Thank you’ card, a product, and certainly a receipt. It might also add the donor to a list for being contacted later, for follow-ups. We will also want to move the money received to the accounts receivable system, possibly debiting the account from where the money will be used, and crediting the cash accounts.
For our example code, however, we will assume that all of the various activities have been coded into a stored procedure called donation$processNew:
CREATE PROCEDURE donation$processNew
(
    @donationId INT
)
AS
BEGIN
    UPDATE #donation
    SET processedFlag = 1
    WHERE donationId = @donationId
END
This simply flips the processedFlag bit on our #donation table. Obviously, in reality, processing donations will be a far more intensive process, but the overall manner of processing a batch of them could be similar to the following code.
First, we create a temporary table, in place of a permanent donation table.
CREATE TABLE #donation
(
    donationId INT IDENTITY PRIMARY KEY,
    donorId INT, --fkey to a donor table
    amount MONEY,
    date SMALLDATETIME,
    processedFlag BIT DEFAULT 0
)
In the sample code, we will load the table with some simple data. Next, we declare our cursor for looping through our donation records, and open the cursor.
--DECLARE our cursor, which will contain the keys
--of the #donation table
DECLARE @donorCursor CURSOR
SET @donorCursor = CURSOR FAST_FORWARD FOR
    SELECT donationId
    FROM #donation
    WHERE processedFlag = 0

--open the cursor
OPEN @donorCursor
Then we declare our variables, and fetch the first row of the cursor:
--variables to fetch into
DECLARE @donationId INT,
        @returnValue INT,
        @message VARCHAR(1000)

FETCH @donorCursor INTO @donationId
The loop steps through each row in the cursor, and runs the donation$processNew procedure. We check the error status and the return value, in case our subordinate procedure causes any unexpected errors:
WHILE @@fetch_status = 0
BEGIN
    BEGIN TRANSACTION

    EXECUTE @returnValue = donation$processNew @donationId

    IF @@error <> 0 OR @returnValue < 0
    BEGIN
        ROLLBACK TRANSACTION
        SET @message = 'Donation ' + CAST(@donationId AS VARCHAR(10))
                       + ' failed.'
        RAISERROR 50001 @message
    END
    ELSE
    BEGIN
        COMMIT TRANSACTION
    END

    FETCH NEXT FROM @donorCursor INTO @donationId
END
Finally, we clean up:
CLOSE @donorCursor
DEALLOCATE @donorCursor
DROP TABLE #donation
In some cases, it is actually better to write code that runs outside the SQL Server, and simply calls the stored procedures. The SQL Agent is quite similar to this as it is written external to the T-SQL kernel. This method enables us to take advantage of multi-threading, by using multiple SQL connections. We will discuss the factors enabling us to run more than one SQL process concurrently in the Concurrency section.
The best practice is to avoid cursors, but the truth is best stated as – whenever possible, avoid cursors. The problem with cursors is that they are seldom needed, and typically used unnecessarily. Earlier, in the Formatting Fields in a Result Set section we looked at an example where we used a cursor needlessly, as it was possible to use a user-defined function. However, in the other cases, since there were no methods in T-SQL to perform the task we needed, our use of cursors was justified.
The following are good practices while using cursors, which will help you to write faster code when cursors are needed.
Use read-only, fast forward-only cursors whenever possible
When writing stored procedure code that employs cursors, we rarely need to modify the data in the cursor, nor will we need to move back and forth through the data. In such cases, we can use fast forward-only cursors, as they have optimizations that make them the fastest cursors.
It can be faster to build cursors on temporary tables
It may be better to build a cursor on a set of data in a local temporary table (with a single # in front of the name), especially when we need to hold the cursor open for a long time, or even through a transaction. Since the temporary table is scoped to our connection, there will be no concurrency issues from holding locks on tables that other users may need.
Close and deallocate cursors when you are finished with them
Be sure to close and deallocate cursors as soon as possible. Cursors use resources, which may be holding other resources. This, in turn, can slow down other processes.
Keep the data set in the cursor as small as possible
When building the data set for the cursor, make the WHERE clause as specific as possible. It can be tempting to include the WHERE clause in the cursor loop, but avoid this, as the smaller the data set, the faster the cursor, as it will use fewer resources on the server to maintain state.
Use local cursors
This is not a performance requirement, but is a good coding practice. Using local cursors (either referenced as a variable, or declared as LOCAL) will help us to prevent issues arising from name clashes, where two procedures have the same cursor names and one calls the other. The preferred method of using cursors declared in one procedure in another one is to pass cursors as parameters.
Cursors are great tools that give us a great deal of control when dealing with data. As we mentioned earlier, it is important to use them only when needed, since they are not the optimal way to deal with data in SQL Server. The best practice is to always comment your cursor code (as with all your other code) with an explanation of why you are using a cursor, and whether it is truly necessary.