In Chapter 6, we examined how SQL Server stores data. Now we'll look at what SQL Server actually does internally when data is modified. We've already discussed (in Chapters 3 and 6) clustered indexes and how they control space usage within SQL Server. As a rule of thumb, you should always have a clustered index on a table. Only a couple of cases exist in which you would not have one. In Chapter 14, when we look at the query optimizer, we'll examine the benefits and tradeoffs of clustered and nonclustered indexes and establish some guidelines for their use. In this chapter, we'll look only at how SQL Server deals with the existence of indexes when processing data modification statements.
When inserting a new row into a table, SQL Server must determine where to put it. When a table has no clustered index (we'll refer to this kind of table as a heap), a new row is always inserted wherever room is available in the table. In Chapter 5, we looked at how IAMs and the PFS pages keep track of which extents in a file already belong to a table, and which of the pages in those extents have space available. Even without a clustered index, space management is quite efficient. If no pages with space are available, SQL Server must allocate a whole new extent to the table. Chapter 5 also discussed how the GAMs and SGAMs are used to find extents available to be allocated to an object.
A clustered index directs an insert to a specific page, determined by the value the new row has for the clustered index key columns. This insert occurs when the new row is the direct result of an INSERT statement or when it's the result of an UPDATE statement executed via a delete-followed-by-insert (delete/insert) strategy. New rows are inserted into their clustered position, splicing in a page via a page split if the current page has no room. We saw in Chapters 3 and 6 that if a clustered index isn't declared as unique, and duplicate values are inserted for a key, an automatic uniqueifier is generated for all subsequent rows with the same key. Earlier versions of SQL Server allowed duplicates of a clustered index key value, and if all the rows with the same key values wouldn't fit on the same page, SQL Server used a special kind of page called an overflow page. In SQL Server 7, overflow pages aren't needed because all clustered index keys are forced to be unique.
Because the clustered index dictates a particular ordering for the rows in a table, every new row has a specific location where it belongs. If there's no room for the new row on the page where it belongs, a new page must be allocated and linked into the list of pages. If possible, this new page is allocated from the same extent as the other pages to which it will be linked. If the extent is full, a new extent (eight pages, or 64 KB) is allocated to the object. As described in Chapter 5, SQL Server uses the GAM pages to find an available extent.
After SQL Server finds the new page, the original page must be split; half the rows are left on the original page and half are moved to the new page. In some cases, SQL Server finds that even after the split, there's still no room for the new row, which, because of variable-length fields, could potentially be much larger than any of the existing rows on the pages; in that case, the page must be split again until the new row fits. After the split, one or two rows are promoted to the parent page.
An index tree is always searched from the root down, so during an insert operation, pages are split on the way down. This means that while the index is being searched on an insert, the index is being protected in anticipation of possibly being updated. A parent node (not a leaf node) is latched until the child node is known to be available for its own latch. Only then can the parent latch be released safely.
Before the latch on a parent node is released, SQL Server determines whether the page will accommodate another two rows and splits it if not. This occurs only if the page is being searched with the objective of adding a row to the index. The goal is to ensure that the parent page always has room for the row or rows that result from a child page splitting. (Occasionally, this results in pages being split that don't need to be split, at least not yet. In the long run, it's a performance optimization.) Three types of splits can occur, depending on the type of page being split: root page of an index, intermediate index page, and data page.
If the root page of an index needs to split in order to insert a new index row, two new pages are allocated to the index. All the rows from the root are split between these two new pages, and the new index row is inserted into the appropriate place on one of these pages. The original root page is still the root, but now it has only two rows on it, pointing to each of the newly allocated pages. A root page split creates a new level in the index. Because indexes are usually only a few levels deep, this type of split doesn't occur often.
An intermediate index page split is accomplished simply by locating the midpoint of the index keys on the page, allocating a new page, and then copying the lower half of the old index page into the new page. Again, this doesn't occur often, although it's more common than splitting the root page.
A data page split is the most interesting and potentially common case, and it's probably the only split that you, as a developer, should be concerned with. Data pages split only under insert activity and only when a clustered index exists on the table. If no clustered index exists, the insert goes on any page where there's room for the new row, according to the PFS pages. Although splits are caused only by insert activity, that activity can be a result of an UPDATE statement, not just an INSERT statement. As you're about to learn in detail, if the row can't be updated in place or at least on the same page, the update is performed as a delete of the original row followed by an insert of the new version of the row. The insertion of the new row can, of course, cause a page split.
Splitting a data page is a complicated operation. Much like an intermediate index page split, it's accomplished by locating the midpoint of the index keys on the data page, allocating a new page, and then copying half of the old page into the new page. It requires that the index manager determine the page on which to locate the new row and then handle large rows that don't fit on either the old page or the new page. When a data page is split, the clustered index key values don't change, so the nonclustered indexes aren't affected.
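The mechanics of the midpoint split can be sketched in a few lines of Python. This is a simplified model of the idea, not SQL Server's actual code: rows sorted by clustered key are divided at the midpoint, and the new row lands on whichever half owns its key range.

```python
def split_page(page_rows, new_row, key=lambda r: r[0]):
    """Simplified model of a data page split: leave the first half of the
    rows (already sorted by clustered key) on the old page, move the rest
    to a new page, then insert the new row into the half that owns its key.
    An illustration only, not SQL Server's implementation."""
    mid = (len(page_rows) + 1) // 2
    old_page, new_page = page_rows[:mid], page_rows[mid:]
    # The new row belongs on the page whose key range covers it.
    target = new_page if key(new_row) >= key(new_page[0]) else old_page
    target.append(new_row)
    target.sort(key=key)
    return old_page, new_page

# Five rows with clustered keys 5 through 25; inserting key 22 splits the page.
rows = [(5, 'a'), (10, 'b'), (15, 'c'), (20, 'd'), (25, 'e')]
old_page, new_page = split_page(rows, (22, 'x'))
# old_page keeps keys 5, 10, 15; new_page gets keys 20, 22, 25
```

This mirrors the example later in the section, where the first three rows stay on the original page and the new row joins the upper half on the new page.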
Let's look at what happens to a page when it splits. The following script creates a table with large rows, so big, in fact, that only five rows will fit on a page. Once the table is created and populated with five rows, we'll find its first page by looking at the sysindexes.first column. As discussed in Chapter 6, we'll convert the address of the first page of the table to a decimal value and then use DBCC PAGE to look at the contents of the page. Because we don't need to see all 8000 bytes of data on the page, we'll look at only the row offset array at the end of the page, and then see what happens to those rows when we insert a sixth row.
/* First create the table */
use pubs
go
drop table bigrows
go
create table bigrows
    (a int primary key, b varchar(1600))
go
/* Insert five rows into the table */
insert into bigrows values (5, replicate('a', 1600))
insert into bigrows values (10, replicate('b', 1600))
insert into bigrows values (15, replicate('c', 1600))
insert into bigrows values (20, replicate('d', 1600))
insert into bigrows values (25, replicate('e', 1600))
go
select first from sysindexes where id = object_id('bigrows')
go
/* Convert the value returned to a decimal by byte-swapping and then
   doing a hex-to-decimal conversion on the last four bytes. See
   Chapter 6 for details. For our example, we'll assume the converted
   value is decimal 249. */
dbcc traceon(3604)
go
dbcc page(pubs, 1, 249, 1, 1)
go
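The byte-swap conversion described in the comment can be sketched in Python. The first column holds a 6-byte page address: a 4-byte page number followed by a 2-byte file number, both stored low byte first. The sample value below (0xF90000000100) is hypothetical, chosen to match the 249 assumed in the script; your value will differ.

```python
def page_address(first_bytes):
    """Decode a 6-byte sysindexes.first value: a 4-byte page number
    followed by a 2-byte file number, both stored little-endian."""
    page = int.from_bytes(first_bytes[:4], 'little')
    file = int.from_bytes(first_bytes[4:6], 'little')
    return file, page

# The raw bytes of the displayed value 0xF90000000100:
# page 0x000000F9 = 249, in file 1.
file_num, page_num = page_address(bytes([0xF9, 0x00, 0x00, 0x00, 0x01, 0x00]))
```

The two decoded numbers are exactly the file and page arguments passed to DBCC PAGE above.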
Here is the row offset array from the results:
Row - Offset
4 (0x4) - 6556 (0x199c)
3 (0x3) - 4941 (0x134d)
2 (0x2) - 3326 (0xcfe)
1 (0x1) - 1711 (0x6af)
0 (0x0) - 96 (0x60)
Now insert one more row, and look at the row offset array again.
insert into bigrows values (22, replicate('x', 1600))
go
dbcc page(pubs, 1, 249, 1, 1)
go
When inspecting the original page after the split, you might find that it contains either the first half of the rows from the original page or the second half. SQL Server normally moves the rows so that the new row to be inserted goes on the new page. Because the rows are moving anyway, it makes more sense to position them to accommodate the newly inserted row. In this example, the new row, with a clustered key value of 22, belongs in the second half of the page. So when the page split occurs, the first three rows stay on the original page 249. You need to inspect the page header to find the location of the next page, which contains the new row.
The page number is indicated by the field called m_nextPage . This value is expressed as a file number:page number pair, in decimal, so we can easily use it with the DBCC PAGE command. When we ran this query, we got an m_nextPage of 1:179, so we ran the following command:
dbcc page(pubs, 1, 179, 1, 1)
Here is the row offset array after the insert, for the second page:
Row - Offset
2 (0x2) - 1711 (0x6af)
1 (0x1) - 3326 (0xcfe)
0 (0x0) - 96 (0x60)
Note a few important points from this output. After the page split, there are three rows on the page: the last two original rows with keys of 20 and 25, and the new row with a key of 22.
If you examine the actual data on the page, you'll notice that the new row with the key of 22 is at slot position 1, even though the row itself is the last one on the page. Slot 1 (with value 22) starts at offset 3326, and slot 2 (with value 25) starts at offset 1711. The clustered key ordering of the rows is indicated by the slot number of the row, not by the physical position on the page. If a clustered index is on the table, the row at slot 1 will always have a key value less than the row at slot 2 and greater than the row at slot 0. You can also run DBCC PAGE on the first page and see the first three rows in the same position as they were originally.
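The separation between slot order and physical position can be sketched with the numbers from this example (a simplified model of the page, not actual page internals):

```python
# Simplified model of the row offset array: rows sit at arbitrary byte
# offsets within the page, and the slot array, kept in key order, makes
# the page read as sorted without moving any row data.
rows_by_offset = {96: 20, 3326: 22, 1711: 25}   # byte offset -> clustered key
slot_array = [96, 3326, 1711]                    # slots 0, 1, 2 after the insert

keys_in_slot_order = [rows_by_offset[off] for off in slot_array]
# Slot order follows the key (20, 22, 25), not the physical offsets.
```

Reading the slots in order yields the keys sorted, even though the row with key 25 physically precedes the row with key 22 on the page.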
Although the typical page split isn't too expensive, you'll want to minimize the frequency of page splits in your production system, at least during peak usage times. You can avoid system disruption during busy times by reserving some space on pages, using the FILLFACTOR clause when you're creating the clustered index on existing data. During your least-busy operational hours, you can periodically re-create the clustered index with the desired FILLFACTOR. That way, the extra space is available during your peak usage times, and you'll save the overhead of splitting then. (Using SQL Server Agent, you can easily schedule the rebuilding of indexes to occur at times when activity is lowest.)
Using the FILLFACTOR setting is helpful only when you're creating the index on existing data; the FILLFACTOR value isn't maintained when inserting new data. Trying to maintain extra space on the page would actually defeat the purpose because you'd need to split pages anyway to keep that amount of space available.
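The space tradeoff can be roughed out with quick arithmetic. The sketch below assumes about 8060 usable bytes per 8-KB page and a hypothetical 200-byte row; per-row overhead and the offset array are ignored, so treat the numbers as approximations.

```python
def rows_per_page(row_bytes, fillfactor, usable_bytes=8060):
    """Rough capacity of a data page at a given fillfactor.
    8060 usable bytes per 8-KB page is an approximation; row overhead
    and the row offset array are ignored."""
    return int(usable_bytes * fillfactor / 100) // row_bytes

# A hypothetical 200-byte row: about 40 rows fit at FILLFACTOR 100,
# about 28 at FILLFACTOR 70, leaving roughly 12 rows' worth of free
# space per page to absorb inserts before any split is needed.
full = rows_per_page(200, 100)
padded = rows_per_page(200, 70)
```

The reserved 30 percent is what absorbs insert activity between index rebuilds; once it's consumed, splits resume, which is why the fillfactor is applied only at index creation time.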
When deleting rows from a table, we need to consider what happens both to the data pages and the index pages. Remember that the data is actually the leaf level of a clustered index, and deleting rows from a table with a clustered index happens the same way as deleting rows from the leaf level of a nonclustered index. Deleting rows from a heap table is managed in a different way, as is deleting from node pages of an index.
Unlike earlier versions of SQL Server, SQL Server 7 doesn't automatically compress space on a page when a row is deleted. As a performance optimization, the compaction doesn't occur until a page needs additional contiguous space for inserting a new row. We can see this in the following example, which deletes a row from the middle of a page and then inspects that page using DBCC PAGE. Figure 8-2 shows the output from DBCC PAGE.
use pubs
go
drop table smallrows
go
create table smallrows
    (a int identity, b char(10))
go
insert into smallrows values ('row 1')
insert into smallrows values ('row 2')
insert into smallrows values ('row 3')
insert into smallrows values ('row 4')
insert into smallrows values ('row 5')
go
select first from sysindexes where id = object_id('smallrows')
go
dbcc traceon(3604)
go
dbcc page(pubs, 1, 248, 1, 1)
go
Figure 8-2. Page contents before a delete.
Now we'll delete the middle row (WHERE a = 3) and look at the page again. Figure 8-3 on the following page contains the output from the second execution of DBCC PAGE.
delete from smallrows
where a = 3
go
dbcc page(pubs, 1, 248, 1, 1)
go
Figure 8-3. Page contents after the delete from a heap.
Note that in the heap table in Figure 8-3, the deleted row no longer shows up in the page itself. The row offset array at the bottom of the page shows that row 3 (at slot 2) is now at offset 0 (which means there really is no row using slot 2), and the row using slot 3 is at the same offset as before the delete. The data on the page is not compressed.
In the leaf level of an index, when rows are deleted, they're marked as ghost records. This means that the row stays on the page, but a bit is changed in the row header to indicate that the row is really a ghost. The page header also reflects the number of ghost records on a page. Ghost records are primarily a concurrency optimization for key-range locking. The details of key-range locking, as well as all other locking modes, are discussed in Chapter 13, but a short explanation is in order here, just for completeness.
Ghost records are present only in the leaf level of indexes. If ghost records weren't used, the entire range surrounding a deleted key would have to be locked. Suppose you had a unique index on an integer and the index contained the values 1, 30, and 100. If you deleted 30, SQL Server would need to lock (and prevent inserts into) the entire range between 1 and 100. With ghost records, the 30 is still visible to be used as the endpoint of a key-range lock, so during the delete transaction, SQL Server can allow inserts of any value other than 30 itself to proceed. Even after the delete of 30 is committed, the ghost record remains available for key-range locks to use, providing greater concurrency.
SQL Server provides a special housekeeping thread that periodically checks B-trees for the existence of ghosted records and asynchronously removes them from the leaf level of the index. (This same thread carries out the automatic shrinking of databases if you have that option set. When this thread is active, the Current Activity window in SQL Server Enterprise Manager, or the sp_who command, will show you an active process with a spid value of 6.)
The following example builds the same table used in the previous DELETE example, but this time the table has a primary key declared, which means a clustered index will be built. The data is the leaf level of the clustered index, so when the row is removed, it will be marked as a ghost. (Figure 8-4 shows the output from DBCC PAGE.)
use pubs
go
drop table smallrows
go
create table smallrows
    (a int identity primary key, b char(10))
go
insert into smallrows values ('row 1')
insert into smallrows values ('row 2')
insert into smallrows values ('row 3')
insert into smallrows values ('row 4')
insert into smallrows values ('row 5')
go
select first from sysindexes where id = object_id('smallrows')
go
delete from smallrows
where a = 3
go
dbcc page(pubs, 1, 253, 1, 1)
go
Note that in Figure 8-4, because the table has a clustered index, the row still shows up in the page itself. The header information for the row shows that this is really a ghosted record. The row offset array at the bottom of the page shows that the row at slot 2 is still at the same offset and that all rows are in the same location as before the deletion. In addition, the page header gives us a value (m_ghostRecCnt) for the number of ghosted records in the page. To see the total count of ghost records in a table, you can enable trace flag 2514. If you then execute the DBCC CHECKTABLE command, you'll see the number of ghost records in that table.
DBCC TRACEON (2514)
GO
DBCC CHECKTABLE (smallrows)
You'll see results that look like this:
DBCC results for 'smallrows'.
There are 4 rows in 1 page for object 'smallrows'.
Ghost Record count = 1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Note that the trace flag setting applies only to the current connection. You can force the setting to apply to all connections by enabling the trace flag along with a -1:
DBCC TRACEON (2514, -1)
When you delete a row from a table, all nonclustered indexes need to be maintained, because every nonclustered index had a pointer to the row that's now gone. Rows in index node pages aren't ghosted when deleted, but just like heap pages, the space isn't compressed until new index rows need space in that page.
When the last row is deleted from a data page, the entire page is deallocated. (If the page is the only one remaining in the table, it isn't deallocated. A table always contains at least one page, even if it's empty.) This also results in the deletion of the row in the index page that pointed to the old data page. Index pages are deallocated if an index row is deleted (which again might occur as part of a delete/insert update strategy), leaving only one entry in the index page. That entry is moved to its neighboring page, and then the empty page is deallocated.
The discussion so far has focused on the page manipulation necessary for deleting a single row. If multiple rows are deleted in a single operation, you need to be aware of some other issues. Because the issues of modifying multiple rows in a single query are the same for inserts, updates, and deletes, we'll discuss them near the end of the chapter.
Figure 8-4. Page contents after the delete from a table with a clustered index.
SQL Server updates rows in multiple ways, automatically and invisibly choosing the fastest update strategy that can be applied for the specific operation. In determining the strategy, SQL Server evaluates the number of rows affected, how the rows will be accessed (via a scan or an index retrieval, and via which index), and whether changes to the index keys will occur. Updates can happen either in place or as a delete followed by an insert. In addition, updates can be managed by the query processor or by the storage engine. In this section, we'll examine only whether the update happens in place. The decision of whether the update is controlled by the query processor or the storage engine is actually relevant to all data modification operations (not just updates), so we'll look at that in a separate section.
Earlier versions of SQL Server could perform either a direct or a deferred update. Deferred meant that the update occurred in two complete passes. The first pass used the transaction log as a holding area, in which SQL Server recorded all the changes that were going to be made and marked them as NO-OP (because they weren't actually executed at that time). The second pass would then reread the log and apply the changes. Because all changes, including changes to every nonclustered index, had to be recorded twice, this was log intensive. In addition, in earlier versions of SQL Server, any time a row moved, every nonclustered index had to be updated to hold the new location of the row. Deferred updates were by far the slowest kind of update.
In SQL Server 7, no deferred updates exist. All updates are done in a direct mode, without using the transaction log as an intermediate holding area. In addition, because of the way SQL Server 7 maintains nonclustered indexes, the overhead of moving rows, even if the table has multiple nonclustered indexes, is an inexpensive operation.
What happens if a table row has to move to a new location? In SQL Server 7, this can happen because a row with variable-length columns is updated to a new, larger size so that it no longer fits on the original page. It can also happen when the clustered index column is changing, because rows are stored in order of the clustering key. For example, if we have a clustered index on lastname , a row with a lastname value of Abbot will be stored near the beginning of the table. If the lastname value is then updated to Zappa , this row will have to move to be near the end of the table.
In Chapter 3, we looked at the structure of indexes and saw that the leaf level of nonclustered indexes contains a row locator for every single row in the table. If the table has a clustered index, that row locator is the clustering key for that row. So if, and only if, the clustered index key is being updated, modifications are required in every nonclustered index. Keep this fact in mind when deciding which columns to build your clustered index on. It's a great idea to cluster on a nonvolatile column.
If a row moves because it no longer fits on the original page, it will still have the same row locator (in other words, the clustering key for the row stays the same), and no nonclustered indexes will have to be modified.
In Chapter 3, we also saw that if a table has no clustered index (in other words, it's a heap), then the row locator stored in the nonclustered index is actually the physical location of the row. In SQL Server 7, if a row in a heap moves to a new page, the row will leave a forwarding pointer in the original location. The nonclustered indexes won't need to be changed; they'll still refer to the original location, and from there they'll be directed to the new location.
Let's look at an example. We'll create a table a lot like the one we created for doing inserts, but this table will have a third column of variable length. After we populate the table with five rows, which will fill the page, we'll update one of the rows to make its third column much longer. The row will no longer fit on the original page and will have to move. Remember that we need to select the first column from sysindexes to know the address of the page. The value you actually supply to DBCC PAGE depends on the value that first returns.
use pubs
go
drop table bigrows
go
create table bigrows
    (a int identity, b varchar(1600), c varchar(1600))
go
insert into bigrows values (replicate('a', 1600), '')
insert into bigrows values (replicate('b', 1600), '')
insert into bigrows values (replicate('c', 1600), '')
insert into bigrows values (replicate('d', 1600), '')
insert into bigrows values (replicate('e', 1600), '')
go
select first from sysindexes where id = object_id('bigrows')
go
update bigrows
set c = replicate('x', 1600)
where a = 3
go
dbcc traceon(3604)
go
dbcc page(pubs, 1, 266, 1, 1)
go
We won't show you the entire output from the DBCC PAGE command, but we'll show you what appears in the slot where the row with a = 3 used to appear:
Slot = 2 Offset = 0x1feb
Record Type = FORWARDING_STUB
Record Attributes =
11ef3feb:  0000f904 00000100 00 .........
The value of 4 in the first byte means this is just a forwarding stub. The 0000f9 in the next 3 bytes is the page number to which the row has been moved. Because this is a hexadecimal value, we need to convert it to 249 decimal. The next group of four bytes tells us the page is at slot 0, file 1. If you then use DBCC PAGE to look at that page 249, you can see what the forwarded record looks like.
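The decoding just described can be sketched in Python. The layout is simplified to only the fields the text mentions (the type indicator and the destination page number), and the bytes come from the dump above:

```python
def decode_forwarding_stub(record):
    """Decode the fields discussed in the text from a forwarding stub.
    Byte 0 is the record-type indicator (4 = forwarding stub); the next
    three bytes are the little-endian page number the row moved to.
    The layout is simplified to just these fields for illustration."""
    record_type = record[0] & 0x0F
    page = int.from_bytes(record[1:4], 'little')
    return record_type, page

# The leading bytes from the DBCC PAGE dump above: 04 f9 00 00 ...
rtype, page = decode_forwarding_stub(bytes([0x04, 0xF9, 0x00, 0x00]))
# rtype is 4 (a forwarding stub); page is 249, the row's new home
```

Note that the dump displays the bytes grouped as 4-byte words, so 0x0000f904 on screen is the byte sequence 04 f9 00 00 in storage order.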
Forward pointers allow us to modify data in a heap without worrying about having to make drastic changes to the nonclustered indexes. If a row that has been forwarded must move again, the original forwarding pointer is updated to point to the new location. We'll never end up with a forwarding pointer pointing to another forwarding pointer. In addition, if the forwarded row shrinks enough to fit in its original place, the forward pointer is eliminated, and the record moves back to its place.
In a future version of SQL Server, some mechanism might be provided for performing a physical reorganization of the data in a heap, which would get rid of forward pointers. Currently, when a forward pointer is created, it stays there forever, with only two exceptions. The first exception is the case just discussed, when a row shrinks enough to return to its original location. The second exception is when the entire database shrinks. The RIDs (used as the row locators) are actually reassigned when a file is shrunk. The shrink process never generates forwarded rows; for those pages removed by the shrinking process, any forwarded rows or stubs they contain are effectively "unforwarded."
Unlike earlier versions of SQL Server, updating a row right in place is the rule rather than the exception in SQL Server 7. This means that the row stays in exactly the same location on the same page, and only the bytes affected are changed. In addition, in most cases of updating in place, the log will contain a single record for each such updated row. The exceptions occur when the table has an update trigger on it or when the table is marked for replication. In these cases, the update still happens in place, but the log contains a delete record followed by an insert record.
In cases where a row can't be updated in place, the cost of a not-in-place update is minimal because of the way the nonclustered indexes are stored and because of the use of forwarding pointers. Updates will happen in place if a heap is being updated or if a table with a clustered index is updated without changing any of the clustering keys.
To see the total count of forwarded records in a table, enable trace flag 2509 and then execute the DBCC CHECKTABLE command:
DBCC TRACEON (2509)
GO
DBCC CHECKTABLE (bigrows)
You'll see results like this:
DBCC results for 'bigrows'.
There are 5 rows in 2 pages for object 'bigrows'.
Forwarded Record count = 1
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
If your update can't happen in place because you're updating clustering keys, it will occur as a delete followed by an insert. In some cases, you'll get a hybrid update: some of the rows are updated in place and some aren't. If you're updating index keys, SQL Server will build a list of all the rows that need to change as both a delete and an insert operation.
This list is stored in memory, if it's small enough, and is written to tempdb if necessary. The list is then sorted by key value and operator (delete or insert). If the index whose keys are changing isn't unique, the delete and insert steps are then applied to the table. If the index is unique, an additional step collapses delete and insert operations on the same key into a single update operation. Let's look at a simple example. The following code builds a table with a unique clustered index on column X and then updates that column in both rows.
use pubs
GO
DROP TABLE T1
GO
CREATE TABLE T1(X int primary key, Y int)
INSERT T1 VALUES(1, 10)
INSERT T1 VALUES(2, 20)
GO
UPDATE T1 SET X = 3 - X
GO
Table 8-3 shows the operations that SQL Server generates internally for the above update statement. This list of operations is called the input stream and consists of both the old and new values of every column and an identifier for each row that is to be changed.
Table 8-3. A SQL Server input stream for an update operation.
| Operation | Original X | Original Y | New X | New Y | Row ID |
|-----------|------------|------------|-------|-------|--------|
| update    | 1          | 10         | 2     | 10    | ROW1   |
| update    | 2          | 20         | 1     | 20    | ROW2   |
However, in this case the updates must be split into delete and insert operations. If they aren't, the update to the first row, changing X from 1 to 2, will fail with a duplicate-key violation. So a converted input stream is generated and looks like Table 8-4.
Table 8-4. Converted input stream for an update operation.
| Operation | X  | Y  | Row ID |
|-----------|----|----|--------|
| delete    | 1  | 10 | ROW1   |
| insert    | 2  | 10 | NULL   |
| delete    | 2  | 20 | ROW2   |
| insert    | 1  | 20 | NULL   |
Note that for insert operations, the RID isn't relevant in the input stream of operations to be performed. Before a row is inserted, it has no RID. This input stream is then sorted by index key and operation, as in Table 8-5.
Table 8-5. Input stream sorted by index key and operation.

| Operation | X  | Y  | Row ID |
|-----------|----|----|--------|
| delete    | 1  | 10 | ROW1   |
| insert    | 1  | 20 | NULL   |
| delete    | 2  | 20 | ROW2   |
| insert    | 2  | 10 | NULL   |
Finally, if the same key value has both a delete and an insert, the two rows in the input stream are collapsed into an update operation. So the final input stream looks like Table 8-6.
Table 8-6. The final output stream for an update operation.
| Operation | Original X | Original Y | New X | New Y | Row ID |
|-----------|------------|------------|-------|-------|--------|
| update    | 1          | 10         | 1     | 20    | ROW1   |
| update    | 2          | 20         | 2     | 10    | ROW2   |
Note that although the original query was an update to column X, after the split, sort, and collapse of the input stream, the final set of actions looks like we're updating column Y! This method of carrying out the update ensures that no intermediate violations of the index's unique key occur.
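The split/sort/collapse sequence shown in Tables 8-3 through 8-6 can be modeled in a few lines of Python. This is a toy model of the processing, not SQL Server's implementation:

```python
def split_sort_collapse(updates):
    """Model of unique-index update processing: split each update into a
    delete and an insert, sort by key (deletes before inserts for the
    same key), then collapse a delete/insert pair on the same key back
    into a single update. Illustration only, not SQL Server internals."""
    stream = []
    for rid, (old_x, old_y), (new_x, new_y) in updates:
        stream.append(('delete', old_x, old_y, rid))
        stream.append(('insert', new_x, new_y, None))  # no RID before insert
    # Sort by key value; for equal keys, deletes sort ahead of inserts.
    stream.sort(key=lambda op: (op[1], op[0] != 'delete'))
    result, i = [], 0
    while i < len(stream):
        nxt = stream[i + 1] if i + 1 < len(stream) else None
        if (nxt and stream[i][0] == 'delete' and nxt[0] == 'insert'
                and stream[i][1] == nxt[1]):
            d = stream[i]
            result.append(('update', d[1], d[2], nxt[1], nxt[2], d[3]))
            i += 2
        else:
            result.append(stream[i])
            i += 1
    return result

# UPDATE T1 SET X = 3 - X on rows (X=1, Y=10) and (X=2, Y=20):
final = split_sort_collapse([('ROW1', (1, 10), (2, 10)),
                             ('ROW2', (2, 20), (1, 20))])
```

Running it reproduces the final stream of Table 8-6: two updates in which each key value is unchanged and only the Y value differs, so no intermediate duplicate-key violation can occur.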
Part of the graphical query plan for this update (Figure 8-5) shows you the split, sort, and collapse phases. We'll be looking at query plans and the mechanisms for examining query plans in Chapter 14.
Updates to any nonclustered index keys also affect the leaf level of the index by splitting the operation into deletes and inserts. Think of the leaf level of a nonclustered index as a mini-clustered index, so any modification of the key could potentially affect the sequence of values in the leaf level. Just as for the data in a clustered index, the actual type of update is determined by whether the index is unique. If the nonclustered index isn't unique, the update is split into delete and insert operations. If the nonclustered index is unique, the split is followed by a sort and an attempt to collapse any deletes and inserts on the same key back into an update operation.
Figure 8-5. The update query plan displayed graphically.
We've just considered the placement and index manipulation necessary for modifying either a single row or only a few rows with no more than a single index. If you are modifying multiple rows in a single statement (INSERT, UPDATE, or DELETE) or by using BCP or the BULK INSERT command, and the table has multiple indexes, you must be aware of some other issues. In SQL Server 7, two different strategies exist for maintaining all the indexes that belong to a table. The query optimizer chooses between them based on its estimate of the anticipated execution costs for each strategy.
The two strategies are called table-level modification and index-level modification. Table level is sometimes called row at a time, and index level is sometimes called index at a time. In a table-level modification, all indexes are maintained for each row as that row is modified. If the update stream isn't sorted in any way, SQL Server will have to do a lot of random index accesses, one access per index per update row. If the update stream is sorted, it can't be sorted in more than one order; therefore, there might be nonrandom index accesses for at most one index.
In index-level modifications, SQL Server gathers all the rows to be modified and sorts them for each index. In other words, there are as many sort operations as there are indexes. Then, for each index, the updates are merged into the index, and each index page will never be accessed more than once, even if multiple updates pertain to a single index leaf page.
Clearly, if the update is small, say less than a handful of rows, and the table and its indexes are sizable, then the query optimizer usually considers table level the best choice. Most OLTP operations will use table-level modifications. On the other hand, if the update is relatively large, table-level modifications require a lot of random I/O operations and might even read and write each leaf page in each index multiple times. In that case, index-level modification performs much better. The amount of logging required is the same for both strategies.
Let's look at a specific example with some actual numbers. Suppose you use BULK INSERT to increase the size of a table by 1 percent (which could correspond to one week in a two-year sales history table). The table is stored in a heap (no clustered index), and the rows are simply appended at the end because there is not much available space on other pages in the table. There's nothing to sort on for insertion into the heap, and the rows (and the required new heap pages and extents) are simply appended at the end of the heap. Assume also that there are two nonclustered indexes on columns a and b. The insert stream is sorted on column a and merged into the index on a.
If an index entry is 20 bytes long, an 8 KB page filled at 70 percent (which is the natural, self-stabilizing value in a B-tree after lots of random insertions and deletions) would contain 8 KB * 70% / 20 bytes = 280 entries. Eight pages (one extent) would then contain 2240 entries, presuming leaf pages are nicely laid out in extents. A 1 percent table growth implies, on average, 2.8 new entries per page, or 22.4 new entries per extent.
Table-level (row-at-a-time) insertion would touch each page, on average, 2.8 times. Unless the buffer pool is large, row-at-a-time insertion reads, updates, and writes each page 2.8 times. That is 5.6 I/O operations per page, or 44.8 I/O operations per extent. Index-at-a-time reads, updates, and writes each page (and extent, again presuming a nice contiguous layout) exactly once. In the best case, about 45 page I/O operations have been replaced by 2 extent I/O operations, for each extent. Of course, the cost of doing the insert this way also includes the cost of sorting the inputs for each index. But the much-reduced cost of the index-level strategy can easily dwarf the cost of sorting.
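The arithmetic in the preceding two paragraphs can be reproduced directly, using the chapter's round figures (8000 usable bytes per page, 70 percent full, 20-byte entries):

```python
USABLE_BYTES = 8000          # round figure used in the text for an 8-KB page
FILL = 0.70                  # natural steady-state fullness of a B-tree
ENTRY_BYTES = 20
PAGES_PER_EXTENT = 8

entries_per_page = int(USABLE_BYTES * FILL / ENTRY_BYTES)   # 280 entries
entries_per_extent = entries_per_page * PAGES_PER_EXTENT    # 2240 entries

growth = 0.01                                               # 1 percent bulk load
new_per_page = entries_per_page * growth                    # ~2.8 new entries
new_per_extent = entries_per_extent * growth                # ~22.4 new entries

# Row-at-a-time: each touched leaf page is read and written once per
# new entry (assuming a small buffer pool, so pages don't stay cached).
row_at_a_time_ios_per_extent = new_per_extent * 2           # ~44.8 page I/Os
# Index-at-a-time: the extent is read once and written once.
index_at_a_time_ios_per_extent = 2                          # 2 extent I/Os
```

The roughly 45-to-2 ratio per extent is what pays for the extra sort that the index-level strategy requires.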
Earlier versions of SQL Server always recommended that you drop all your indexes if you were going to import a lot of rows into a table using a bulkcopy interface, because no index-level strategy existed for maintaining all the indexes.
You can determine whether your updates were done at the table level or the index level by inspecting the SHOWPLAN output. (Query plans and SHOWPLAN will be discussed in more detail in Chapter 14.) If SQL Server performs the update at the index level, you'll see a plan produced that contains an update operator for each of the affected indexes. If SQL Server performs the update at the table level, you'll see only a single update operator in the plan.
Standard INSERT, UPDATE, and DELETE statements are always logged to ensure atomicity, and you can't disable logging of these operations. The modification must be known to be safely on disk in the transaction log (write-ahead logging) before the commit of the statement or transaction can be acknowledged to the calling application. Page allocations and deallocations, including those done by TRUNCATE TABLE, are also logged. You can specify a few unlogged operations by enabling the select into/bulkcopy option. These unlogged operations are available for faster bulk loading of data.
Any data modification must always be protected with some form of exclusive lock. For the most part, SQL Server makes all the locking decisions internally; a user or programmer doesn't need to request a particular kind of lock. Chapter 13 explains the different types of locks and their compatibility. Chapter 14 discusses techniques for overriding SQL Server's default locking behavior. However, because locking is closely tied to data modification, you should always be aware of the following: