Using Stored Procedures | Designing for Scalability with Microsoft Windows DNA (DV-MPS Designing)

[Previous] [Next]

Using stored procedures is just an extension of what we have talked about so far in this chapter. One difference between a stored procedure and a cached SQL statement is that the stored procedure is created beforehand and that clients call it by name rather than sending the SQL text to the database. Another difference is that a stored procedure is a database object, whereas obviously SQL text sent from the client to the database is not. Because it's a database object, you can use it for security. You—or the database administrator—can give users permission to run a specific stored procedure, even if they're not allowed to do the job the procedure performs. For example, you can allow a user to call a procedure that fetches all the horses trained by a specified trainer, even though the user might not have SELECT privileges for the Horses table. This feature helps you guarantee that the user always accesses the database through well-tested stored procedures and never directly accesses database tables.

In many ways, this practice is similar to creating data access classes in the business tier of a Microsoft Windows DNA application, giving these classes permission to access database objects and never allowing users direct access to tables or any other database objects.

Stored Procedures for SELECT Operations

It's easy to create and call a parameterized stored procedure for a SELECT operation. Say, for example, that you want to create a stored procedure that fetches ID and name values for horses with names that correspond to a certain pattern. The command you use in SQL Server to create such a procedure is, not surprisingly, CREATE PROCEDURE. The following SQL command creates the GetHorseListForNamePattern stored procedure:

 CREATE PROCEDURE GetHorseListForNamePattern @prmNamePattern varchar(20) AS SELECT HorseId, HorseName FROM Horses WHERE HorseName LIKE @prmNamePattern ORDER BY HorseName

The most interesting parts of the preceding code appear in boldface type. Concentrating on that part of the code, you can see that this stored procedure takes one parameter, which is the name pattern you need to use for the query. You can also see that the @prmNamePattern variable is represented in the procedure's SQL text. Notice that in SQL Server the names of all parameters and arguments are prefixed by the @ sign.

How do you call such a procedure from ADO? As a matter of fact, you call such a procedure using ADO almost the same way you would if you were sending the corresponding parameterized SQL statement from the client to the database. The following code is an example. We show the code specific for calling a stored procedure in boldface type.

Public Function GetListForNamePattern _ (ByVal strNamePattern As String) As Recordset Dim cmd As Command, prm As Parameter, rs As Recordset, _ strSQL As String On Error GoTo GetListForNamePatternErr Set cmd = CreateObject("ADODB.Command") cmd.ActiveConnection = strConn  cmd.CommandType = adCmdStoredProc cmd.CommandText = "sp_GetHorseListForNamePattern"  Set prm = CreateObject("ADODB.Parameter") prm.Type = adVarChar prm.Size = 20 prm.Direction = adParamInput prm.Value = strNamePattern cmd.Parameters.Append prm Set rs = CreateObject("ADODB.Recordset") Set rs.Source = cmd rs.LockType = adLockReadOnly rs.CursorLocation = adUseClient rs.CursorType = adOpenStatic rs.Open Set rs.ActiveConnection = Nothing Set GetListForNamePattern = rs Set cmd.ActiveConnection = Nothing Set cmd = Nothing Exit Function GetListForNamePatternErr: Err.Raise Number:=Err.Number, Source:=Err.Source, _ Description:=Err.Description End Function

As you can see, calling a parameterized stored procedure is not much different from sending a parameterized SQL statement through an ADO Command object. So why should you even bother to create, let alone call, such procedures? We'll give you just the following list of reasons:

When you use stored procedures, not allowing any other communication with the database, you give full control of the data resources to the database administrator and owner. Only (we hope) good and well-tested queries will ever be performed, never untested resource-hungry or buggy statements that can eat up all the resources available, leaving nothing to other users, or even give misleading results.

Stored procedures should execute as fast as any SQL statement sent from the client and faster—even much faster—than most. A stored procedure almost guarantees reuse of an existing execution plan, which stands in contrast to cached ad-hoc queries that don't offer such a guarantee.

Client-bound SQL queries are scattered all over the place. Every user, and every user application, has them. If—or rather when—you need to change a statement, you're much better off with stored procedures than with client-bound statements. SQL statements in data access components on the server side offer similar advantages, but you can't put them more in the center of things than when they're in the database as stored procedures.

Stored procedures isolate clients from the structure of the database. For example, if you want to denormalize² the database to optimize certain queries, only the stored procedure must be modified. Clients can continue to call the procedure the same way as before, receiving results the same way as always but faster. This stands in sharp contrast with the changes you must make in the same circumstances with ordinary client-based SQL statements—not to mention the amount of work needed just to find the queries in all those client applications and machines.

In some cases—mainly those involving complex stored procedures—network traffic can be reduced when you use stored procedures. Rather than sending several different SQL statements to the database over the network, you can call a specific stored procedure once and have that procedure call other procedures as required without using the network. This practice is a great bandwidth saver!

Those are some of the reasons we offer, hoping to encourage you to increase your use of stored procedures. Let's see if we can find any reasons not to use them. (Such reasons do exist.)

Stored procedures for updates

Let's take a narrow table—one with only a few columns—such as the RaceTracks table. It's narrow enough, having only four columns. Here's the SQL script required to create that table:

 CREATE TABLE RaceTracks ( TrackId char (3) NOT NULL , TrackName varchar (30) NOT NULL , City varchar (20) NOT NULL , Country char (3) NOT NULL )

A very simple stored procedure designed to update this table would look like the following example:

 CREATE PROCEDURE UpdateRaceTracks @TrackId char(3), @TrackName varchar(30), @City varchar(20), @CountryCode char(3) AS UPDATE RaceTracks SET TrackName = @TrackName, City = @City, Country = @CountryCode WHERE TrackId = @TrackId

There's at least one problem with this procedure: it doesn't have any concurrency control at all. Concurrency control protects users from having their modifications written over by other users. Say, for example, that user A and user B almost simultaneously select all the columns from the RaceTracks table. Because they select their data almost at the same time, they both get the same data back. User A then modifies the city of the track and calls the preceding procedure. Only seconds later, user B calls the same procedure after modifying the track's country. Unfortunately, because B doesn't know that the city value is now different from the value B received, B gives the old city value as the @City parameter, thus writing over and destroying the new city value user A entered.

In some cases, it might be perfectly all right for a modification to destroy other users' modifications, but in most cases it's not. Normally you should use some kind of concurrency control. If you use ADO recordsets when modifying data in SQL Server, concurrency control comes very easy. All you have to do is set the LockType property of the recordset to adLockOptimistic. With this LockType value, ADO makes sure that you don't overwrite other users' modifications when you update the database. As an experiment, let's see what kind of call to SQL Server such an ADO recordset will make when only the TrackName column is modified. The following call to sp_executesql is a typical example:

 sp_executesql N'UPDATE MSPress_ADBRacing..RaceTracks SET TrackName = @P1 WHERE TrackId = @P2 AND TrackName = @P3', N'@P1 varchar(30), @P2 char(3), @P3 varchar(30)', 'Täbyholm Galopp', 'Täb', 'Täby Galopp'

The preceding statement tries to update the name of the racetrack identified by the 'Täb' racetrack ID from 'Täby Galopp'—the old value—to 'Täbyholm Galopp'—the new value. Notice that ADO uses parameters and qualifies table names to make it as easy as possible for SQL Server to reuse any similar execution plan already established, executed, and cached. The performance of this call will be nearly as good as the performance of a stored procedure doing exactly the same work. There will be a very small difference in performance because it's easier for SQL Server to find an existing plan for a stored procedure by name than to search for a cached execution plan that matches a certain query.

Going back to the concurrency control problem, another user might have changed the track name of Täby Galopp in between—after we fetched it but before we tried to update it. If so, SQL Server won't find the row to update. The AND part of the WHERE clause, which specifies the old value, makes sure of that. This query doesn't just ask for a track with TrackId = "Täb"; it asks for a track with TrackId = "Täb" and TrackName = "Täby Galopp". So if another user really modified the name of this track, SQL Server can't find the row we want to update and our update won't be executed. You get this type of concurrency checking for free when you use ADO recordsets to do your updates for you.

Let's consider another example, updating the city as well as the name of the track. If American and Canadian hockey teams can move from one city to another, why shouldn't racetracks be able to do the same? The code ADO would generate for this kind of update is shown below.

 sp_executesql N'UPDATE MSPress_ADBRacing..RaceTracks SET TrackName = @P1, City = @P2, WHERE TrackId = @P3 AND TrackName = @P4 AND City = @P5', N'@P1 varchar(30), @P2 varchar(20), @P3 char(3) @P4 varchar(30), @P5 varchar(20)', 'Täbyholm Galopp', 'Gothenburg', 'Täb', 'Täby Galopp', 'Stockholm'

This statement, automatically generated by ADO for this update, is somewhat more complex than the first one. But just the same, it's a similar statement. It furnishes sp_executesql with as many arguments as it takes to protect users from having their modifications overwritten by other users. What's more, ADO does this without constraining other users more than necessary. If, for a specific track, user A modifies the country and user B modifies both the track name and the city, their modifications don't conflict. And both sets of changes will be made. But if two users modify the same column of the same track, doing it from the same starting point, SQL Server performs only one of the updates; thanks to the augmented WHERE clause, the other one will be denied.

The following table lists all possible column combinations in the racetracks table. We exclude the TrackId column because normally you should avoid modifying key columns.

TrackName TrackName City TrackName City Country TrackName Country City City Country Country

Even for such a small table with only four columns (including the missing TrackId column), you still need seven different UPDATE statements to cover all possible combinations of column updates. This is true at least if you need concurrency control when the table is modified; if you don't, you might get away with only one UPDATE statement. For a table with 10 nonkey columns, you need many different UPDATE statements. Knowing this, it's obvious that you can't have one UPDATE statement for each update option, at least not if you must write one stored procedure for each update option. You need a simpler model. Let us give you two options:

The first choice is to mimic the behavior of the sp_executesql procedure. Using this option, you'd have to make your data access component parse the record to update, finding out which columns to update and which not to update. Going from information thus gained, the component would have to set up the parameters to send to your procedure. The procedure in turn would have to parse the parameters received, setting up an UPDATE statement and a call to the sp_executesql procedure. This isn't an easy feat. Both the component and the stored procedure would be rather complex.

The second option is to have your data access component parse the recordset, as in the first option, but this time in order to generate a SET clause and a WHERE clause; you'd send both these clauses as parameters to the update procedure. The update procedure would combine all the pieces together, including some that are defined within the procedure, to set up and execute the full UPDATE statement. This alternate model is easier to handle than the first one, but we should point out that you have to do a lot of work yourself, work that ADO would have done for you if you had decided not to use stored procedures for updates at all.

We created the following simple stored procedure as an example:

 CREATE PROCEDURE UpdHorse @SetClause varchar(500), @WhereClause varchar(500) AS DECLARE @Stmt varchar(500) SELECT @Stmt = 'UPDATE Horses ' + @SetClause + ' ' + @WhereClause EXECUTE (@Stmt)

The preceding procedure takes a SET clause and a WHERE clause as arguments. Then it builds the complete UPDATE statement, furnishing the UPDATE part itself, and then it goes on to save the completed statement in the @Stmt variable. Finally it uses the EXECUTE keyword to execute the string. Without any problems at all, we used the following EXECUTE statement to call this procedure:

 EXECUTE UpdHorse 'SET HorseName = "Vildkatten"', 'WHERE HorseId = 1610'

It worked just fine, and the name Wildcat was changed to Vildkatten, which by the way is the Swedish word for wildcat.

While this call to the procedure worked, there was a snag. We made the update without controlling concurrency. Our update might conflict with somebody else's; somebody else's update might conflict with ours. So when we changed back to the real name of the horse, we enhanced the statement that calls the procedure, making the WHERE clause check for the old horse name as well as for the ID of the horse. We knew that this time the call wouldn't result in the database being updated because it shouldn't, but we made the call anyway just to prove our point. Here's the enhanced EXECUTE statement:

 EXECUTE UpdHorse 'SET HorseName = "Wildcat"', 'WHERE HorseId = 1610 AND HorseName = "Wildcat"'

Now we've copied ADO's way of updating under concurrency control. We used the wrong horse name in the AND part of the WHERE clause, but that was on purpose. The previous update had named the horse Vildkatten, and we asked for a record of a horse having 1610 as his horse ID and Wildcat as his name. Of course, such a record didn't exist anymore when we issued the statement, so SQL Server didn't find it.

To prove that the record would be updated if we did everything right, we modified the EXECUTE statement again to reflect the present state of things. This time it worked, and we've given back to the horse its correct name. Here's the statement that fixed it:

 EXECUTE UpdHorse 'SET HorseName = "Wildcat"', 'WHERE HorseId = 1610 AND HorseName = "Vildkatten"'

Another way to do it

The method we've used so far to achieve concurrency control is called the optimistic way. It's said to be optimistic because we assume that nobody else will update our data as long as we hold it. Since we're optimistic, we don't bother to lock the record to prevent other users from even seeing it. Implicitly we say something like this: "Now I'm fetching this data from the database. I'm probably not going to change it, so I don't lock the record containing it. If I should happen to change it anyway, probably no other user will come up with the same idea, so there's no real need to lock the record." But even though we're optimistic, we don't just haphazardly change any data without checking for possible concurrency problems. We make sure that at least the fields we're changing still hold the same values they did when we fetched them. This is the reason this strategy is called not only optimistic but optimistic concurrency control. There's optimism involved, but also concurrency control.

You can use another strategy, less optimistic, to make sure when updating that no user unintentionally destroys any updates made by other users. As you would imagine, the other way is called the pessimistic way. Using it, you say something like this: "Now I'm fetching this data from the database. I'm probably going to change it, so I prevent other users from modifying it before me by locking it. When I lock the data I read, I'm guaranteed that nobody else gets access to this data while I have it." It's easy to see why this way of doing things is called the pessimistic way, or pessimistic concurrency control.

It might not be as easy to see how this process of locking most of the records users read reduces concurrency, but it does. Pessimistic locks tend to be long lasting. A pessimistic user gets the lock when retrieving data from the database. Then there's think time and local work time before the update is sent back to the database. During all this time, the affected data is still locked in the database with the express intention of keeping other users away. In an installation with many users and lots of updates, pessimistic concurrency control might become very painful. This is the reason for the popularity of optimistic schemes, one of which we have just described. We recommend that you use pessimistic concurrency control only when you're absolutely sure you need it; in all other situations, use the optimistic way!

We still owe you an explanation of how you can implement pessimistic concurrency control. Using it, you must stay connected to the database because you can't otherwise keep the lock. You must perform the update in the same transaction that acquired the lock. Therefore, you must issue a BEGIN TRANSACTION command before you acquire the lock. Within that transaction, you can acquire your lock on a specific table row by updating one of its values with the value it already has. In other words, you update the table row without changing it. The only effect of this action is to give you an exclusive lock on the row, thus preventing all other users from even seeing the row. This lock remains in effect just as long as your transaction is still alive. As a demonstration, we issued the following command batch of two commands—one to start a transaction, the other to update a row in the Horses table:

 BEGIN TRANSACTION UPDATE Horses SET HorseName = HorseName WHERE HorseId = 365

Notice that we didn't change anything in the table. We just updated the name of the horse identified by the HorseId value of 365 with its own name. What we achieved was an exclusive lock on that table row. Because we started the batch with a BEGIN TRANSACTION command, SQL Server holds that lock for us until we resolve the transaction by issuing a COMMIT or ROLLBACK TRANSACTION command.³

Using SQL Server Enterprise Manager, we opened the Management folder for the server that we used. In that folder, we first opened the Current Activity folder and then the Locks/Object folder. Clicking on the database we were working in, we received the list of locks displayed in Figure 17-1.

click to view at full size.

Figure 17-1. We now hold one exclusive row level lock and two intent exclusive locks in our table.

In the Mode column of the table displayed in Figure 17-1, you can see that we now have two IX locks and one X lock. IX means intent exclusive and X means exclusive. Every lock shown in the figure is given to us. The meaning of each lock shown in the figure is as follows:

The first lock shown is an intent exclusive lock on the entire table. This lock indicates that it's SQL Server's intent to grant us an exclusive lock somewhere lower down in the hierarchy. We'll get that exclusive lock immediately if the area we want to lock is available for locking, but we might also have to wait for the lock until the area becomes available. If, during the time that we hold this intent exclusive lock, another user needs an exclusive lock on this table, SQL Server can see right away that granting such an exclusive lock would conflict with SQL Server's intention to give us an exclusive lock somewhere in the table. If this is the case, SQL Server won't issue the exclusive table lock the other user asked for; that user will be kept waiting until we (and everybody else) release every lock that blocks this second intent. Intent locks help SQL Server make such decisions early, which makes intent locks performance boosters, and that's also the reason they exist.

Postponing talk about the second lock of the figure for a moment, the third lock is another intent lock, this time on a page. The page that has this intent lock is the one containing the row on which we want an exclusive lock, and the intent lock says that it's SQL Server's intention to lock a row on that page.

The second lock is the real lock, our exclusive row lock. The RID lock type tells us that this lock is in fact a row-level lock. It's the one we've been asking for, and now we have it.

SQL Server holds the exclusive row lock just mentioned for the lifetime of the user's transaction. As long as the transaction lives, the lock stays. There's no way another user can modify, or even read,⁴ any data on this row as long as someone owns an exclusive lock on it. Keeping this lock, the owner holding the lock (but nobody else) can fetch the values of the row columns in order to modify them. Just as long as this user stays connected and avoids closing the transaction, the owner has an absolute guarantee that nobody else is allowed to make in-between changes to this record. It doesn't matter whether it takes seconds, minutes, hours, or even days for the user to come back from the races, say. The lock stays, and the data locked conceptually belongs to the owner of the lock.

This, of course, is also the downside of pessimistic concurrency control. Concurrency is reduced, sometimes severely reduced. Normally, if you use pessimistic concurrency control, you should make sure that it's virtually impossible for a user to acquire a lock on some data and then go take his or her coffee break.

Design your stored procedures with care

Stored procedures are great, but they require more work than ordinary updatable ADO recordsets. It has been said that stored procedures double the scalability of SQL Server. This is because stored procedures reuse existing execution plans and therefore also use fewer server resources for a shorter time. It should also be said that it's not all that difficult for a developer to write and access stored procedures in such a way that the advantage of using them is entirely dissipated, perhaps even turning the use of stored procedures into a disadvantage when you look at the system as a whole.

If you decide to use stored procedures, we definitely won't argue with you; just make sure you enforce the concurrency control you need. Normally, the concurrency control you use should be optimistic; only if it's clearly inappropriate to use optimistic concurrency control should you go the pessimistic way.

Whatever you choose, SQL from the client or stored procedures, optimistic or pessimistic or no concurrency control, we wish you the best of luck!