Working with Text and Image Data | Inside Microsoft SQL Server 2000

SQL Server provides large object (LOB) support via the ntext, text, and image datatypes. I talked about their storage structure in Chapter 6. If you work with these datatypes, you might want to use the additional statements provided by SQL Server along with the standard SELECT, INSERT, UPDATE, and DELETE statements. Because a single LOB column can be as large as 2 GB, you frequently need to work with LOB data in chunks, and these additional statements (which we'll see in a moment) can help. I could have discussed this topic earlier, when I discussed Transact-SQL programming in Chapter 10. However, because you need some knowledge of isolation levels, transactions, and consistency issues to understand this topic, I've waited until after I covered most of those issues. In addition, one of the examples I'll show you involves cursors, so I really couldn't show that one before now.

For simplicity's sake, most of the discussion will deal with the text datatype. But everything here is also relevant to the image and ntext datatypes. These three datatypes are essentially the same internally. Recall that ntext, text, and image datatypes are special because they are normally not stored on the same data page as the rest of the row. Instead, the data row holds only a 16-byte pointer to an ntext, text, or image structure. Remember that you can choose to store LOB data in the row itself if the amount of data is small enough and the table option text in row has been enabled.

Although the space required by text data is managed much more efficiently in SQL Server 2000 than in earlier versions, functional drawbacks still exist. Although you can indeed use standard INSERT, UPDATE, DELETE, and SELECT statements with a text column, some significant restrictions apply. In a WHERE clause, you can search on the text column only with the LIKE operator or with the functions SUBSTRING and PATINDEX. Variables of type text cannot be declared or manipulated. You can declare a parameter in a stored procedure to be of type text (or ntext or image), but you can't do much besides pass a value to the procedure initially. For example, you cannot subsequently assign different values to the parameter or return it as an OUTPUT parameter. Because of these limitations, you'll want to use these special datatypes only when another datatype isn't a reasonable option. If a varchar(8000) column can work for you, (or nvarchar(4000) for Unicode data) you can use it and avoid text or ntext altogether. But if you absolutely need a memo field, for example, and 8000 characters are not enough, you'll need to use text or denormalize the data and use multiple varchar columns.

If a text column makes the most sense for you despite its drawbacks, you need to understand how to work effectively with text. When you can, it is easiest to work with text datatypes using standard SELECT, INSERT, UPDATE, and DELETE statements. But if your text data gets large, you'll run into issues (such as how big a string your application can pass) that might make it necessary for you to deal with chunks of data at a time instead of the entire column.

The special statements for working with text data are WRITETEXT, READTEXT, and UPDATETEXT. Both READTEXT and UPDATETEXT let you work with chunks of a text column at a time. The WRITETEXT statement does not let you deal with chunks but rather with the entire column only.

You'll need to consider the overhead of transaction logging done with these operations. Unlike the normal INSERT, UPDATE, and DELETE operations that always cause the entire row to be logged, the amount of logging done by the WRITETEXT and UPDATETEXT operations depends on the recovery model you're using. These two operations are considered BULK operations, as I discussed in Chapter 5, which means that if you're in FULL or SIMPLE recovery mode, all the LOB data will be written to the log. If you're in BULK LOGGED mode, the LOB data isn't written directly to the log; instead, all pages modified by these BULK operations will be backed up when the transaction log is backed up. Also, if you use SQL Server replication to replicate text or image columns, any UPDATETEXT or WRITETEXT operations must be logged because the replication process looks for changes based on the transaction log.

The WRITETEXT, READTEXT, and UPDATETEXT statements all work with a text pointer. A text pointer is a unique varbinary(16) value for each text, ntext, or image column of each row. The following discussion of the three operations will talk about text data for simplicity, but in most cases it will apply to all three of the LOB datatypes.

WRITETEXT

WRITETEXT completely overwrites an existing text column. You provide the column name (qualified by the table name), the text pointer for the specific column of a specific row, and the actual data to be written. It might seem like a catch-22 when using WRITETEXT initially, because you need to pass it a text pointer—but if the column is initially NULL, there is no text pointer. So how do you get one? You SELECT it using the textptr function. But if the text column has not been initialized, the textptr function returns NULL. To initialize a text pointer for a column of a row with text or image data, you can perform some variation of the following:

Explicitly insert a non-null value in the text column when you use an INSERT statement. Recognize that WRITETEXT will completely overwrite the column anyway, so the value can always be something such as A or a space.
Define a default on the column with a non-null value such as A. Then when you do the insert, you can specify DEFAULT or omit the column, which will result in the default value being inserted and the text pointer being initialized.
Update the row after inserting it, explicitly setting it to anything, even NULL.

You then select the text pointer into a variable declared as varbinary(16) and pass that to WRITETEXT. You can't use SELECT statements or expressions in the WRITETEXT statement. This means that the statement is limited to being executed one row at a time, although it can be done from within a cursor. The SELECT statement that gets the text pointer should be known to return only one row, preferably by using an exact match on the primary key value in the WHERE clause, because that will ensure that at most one row can meet the criteria. You can, of course, use @@ROWCOUNT to check this if you're not absolutely sure that the SELECT statement can return only one row. Before using the WRITETEXT statement, you should also ensure that you have a valid text pointer. If you find a row with the criteria you specified and the text pointer for that row was initialized, it will be valid. You can check it as a separate statement using the TEXTVALID function. Or you can check that you do not have a NULL value in your variable that was assigned the text pointer, as you'll see in the following example. Make sure that you don't have an old text pointer value from a previous use, which would make the IS NOT NULL check be TRUE. In this example, I do one variable assignment and the variable starts out NULL, so we can be sure that a non-null value means we have selected a valid text pointer:

 -- WRITETEXT with an unprotected text pointer DECLARE @mytextptr varbinary(16) SELECT @mytextptr=TEXTPTR(pr_info)      FROM pub_info WHERE pub_id='9999'  IF @mytextptr IS NOT NULL     WRITETEXT pub_info.pr_info @mytextptr  'Hello Again'

In this example, the text pointer is not protected from changes made by others. Therefore, it is possible that the text pointer will no longer be valid by the time the WRITETEXT operation is performed. Suppose that you get a text pointer for the row with pub_id='9999'. But before you use it with WRITETEXT, another user deletes and reinserts the row for publisher 9999. In that case, the text pointer you're holding will no longer be valid. In the example above, the window for this occurrence is small, since we do the WRITETEXT immediately after getting the text pointer. But there is still a window of vulnerability. In your application, the window might be wider. If the text pointer is not valid when you do the WRITETEXT operation, you will get an error message like this:

 Msg 7123, Level 16, State 1 Invalid text pointer value 000000000253f380.

You can easily see this for yourself if you add a delay (for example, WAITFOR DELAY '00:00:15') after getting the text pointer and then delete the row from another connection. You'll get error 7123 when the WRITETEXT operation executes. If you think the chances of getting this error are slim, you can choose to simply deal with the error if it occurs.

You should instead consider using transaction protection to ensure that the text pointer will not change from the time you read it until you use it, and to serialize access for updates so that you do not encounter frequent deadlocks. Many applications use TEXTVALID to check right before operating—that's the right idea, but it's hardly foolproof. There's still a window between the TEXTVALID operation and the use of the text pointer during which the text pointer can be invalidated. The only way to close the window is to make both operations part of an atomic operation. This means using a transaction and having SQL Server protect the transaction with a lock. (For more about locking, see Chapter 14.)

By default, SQL Server operates with Read Committed isolation and releases a share (READ) lock after the page has been read. So simply putting the pointer in a transaction with the Read Committed isolation level, which is SQL Server's default, is not enough. You need to ensure that the lock is held until the text pointer is used. You could change the isolation level to Repeatable Read, which is not a bad solution, but this changes the isolation behavior for all operations on that connection, so it might have a more widespread effect than you intend. (Although you could, of course, then change it right back.) Also, this approach doesn't guarantee that you will subsequently be able to get the exclusive lock required to do the WRITETEXT operation; it ensures only that when you get to the WRITETEXT operation, the text pointer will still be valid. You won't be sure that you're not in the lock queue behind another connection waiting to update the same row and column. In that case, your transaction and the competing one would both hold a share lock on the same row and would both need to acquire an exclusive lock. Since both transactions are holding a share lock, neither can get the exclusive lock and a deadlock results in one of the connections having its transaction automatically aborted. (In Chapter 14, you'll see that this is an example of a conversion deadlock.) If multiple processes are intending to modify the text and all transactions first request an update lock in a transaction when selecting the text pointer, conversion deadlocks will be avoided because only one process will get the update lock and the others will queue for it. But users' transactions that need only to read the row will not be affected because an update lock and a share lock are compatible.

Using the update lock on the text pointer is good for serializing access to the actual text data, even though the lock on the text data is distinct from the lock on the row with the text pointer. In this case, you essentially use the update lock on a text pointer as you'd use an intent lock for the text data. This is conceptually similar to the intent locks that SQL Server uses on a table when a page-locking or row-locking operation for that table will take place. That operation recognizes that rows, pages, and tables have an implicit hierarchy. You can think of text pointers and text data as having a similar hierarchy and use the update lock on the text pointer to protect access to the associated text data.

Here is the improved version that protects the text pointer from getting invalidated and also reserves the transaction's spot in the queue so that it will get the exclusive lock that's necessary to change the column. This approach will avoid conversion deadlocks on the text pages:

 -- WRITETEXT with a properly protected text pointer BEGIN TRAN DECLARE @mytextptr varbinary(16) SELECT @mytextptr=TEXTPTR(pr_info)      FROM pub_info (UPDLOCK) WHERE pub_id='9999'  IF @mytextptr IS NOT NULL     WRITETEXT pub_info.pr_info @mytextptr 'Hello Again' COMMIT TRAN

READTEXT

READTEXT is used in a similar way to WRITETEXT, except that READTEXT allows you to specify a starting position and the number of bytes to read. This is its basic syntax:

  READTEXT [[database.]owner.]table_name.column_name     text_ptr offset size [HOLDLOCK]

Unlike with WRITETEXT, with READTEXT you do not need to work with the entire contents of the data. You can specify the starting position (offset) and the number of bytes to read (size). READTEXT is often used with the PATINDEX function to find the offset (in number of characters) at which some string or pattern exists, and it's also used with datalength to determine the total size of the text column in bytes. If you're working with ntext data, you'll have to divide DATALENGTH by 2 to get the number of characters. But you cannot directly use these functions as the offset parameter. Instead, you must execute them beforehand and keep their values in a local variable, which you then pass. As I mentioned in the discussion of WRITETEXT, you'll want to protect your text pointer from becoming invalidated. In the next example, you'll read text without updating it. So you can use the HOLDLOCK lock hint on the SELECT statement for the text pointer (or set the isolation level to Repeatable Read).

Sometimes people think that transactions are used only for data modifications, but notice that in this case you use a transaction to ensure read repeatability (of the text pointer) even though you're not updating anything. You can optionally add HOLDLOCK to the READTEXT statement to ensure that the text doesn't change until the transaction has completed. But in the example below, we'll read the entire contents with just one read and we will not be rereading the contents, so there's no point in using HOLDLOCK. This example finds the pattern Washington in the pr_info column for pub_id 0877 and returns the contents of that column from that point on:

 -- READTEXT with a protected text pointer BEGIN TRAN DECLARE @mytextptr varbinary(16), @sizeneeded int, @pat_offset int SELECT @mytextptr=TEXTPTR(pr_info),      @pat_offset=PATINDEX('%Washington%',pr_info) - 1,     @sizeneeded=DATALENGTH(pr_info) -          (PATINDEX('%Washington%',pr_info) - 1)     FROM pub_info (HOLDLOCK) WHERE pub_id='0877'  IF @mytextptr IS NOT NULL AND @pat_offset >= 0 AND     @sizeneeded IS NOT NULL     READTEXT  pub_info.pr_info @mytextptr @pat_offset @sizeneeded COMMIT TRAN

NOTE
If you run the preceding query using SQL Query Analyzer, you might not see the entire text result. SQL Query Analyzer has a default limit of 256 characters that can be returned for a text column. You can use the Tools/Options command and go to the Results tab to change this limit.

The offset returned by PATINDEX and the offset used by READTEXT are unfortunately not consistent. READTEXT treats the first character as offset 0. (This makes sense because you can think of an offset as how many characters you have to move to get to the desired position—to get to the first character, you don't need to move at all.) But PATINDEX returns the value in terms of position, not really as an offset, and so the first character for it would be 1. You could actually call this a minor bug. But people have adapted to it, so changing it would cause more problems than it would solve at this point. You should assume that this acts as intended and adjust for it. You need to fix this discrepancy by taking the result of PATINDEX and subtracting 1 from it. PATINDEX returns 0 if the pattern is not found. Since we subtract 1 from the value returned by PATINDEX, if the pattern was not found, the variable @pat_offset would be -1. We simply check that @pat_offset is not negative.

You also need to specify how many bytes you want to read. If you want to read from that point to the end of the column, for example, you can take the total length as returned from datalength and subtract the starting position, as shown in the preceding example. You cannot simply specify a buffer that you know is large enough to read the rest of the column into; that will result in error 7124:

 The offset and length specified in the READTEXT command is greater  than the actual data length of %d.

If you could always perform a single READTEXT to handle your data, you'd probably not use it; instead, you could use SELECT. You need to use READTEXT when a text column is too long to reasonably bring back with just one statement. For example, the text size of the pr_info field for publisher 1622 is 18,518 bytes. You can't select this value in an application such as SQL Query Analyzer because it's longer than the maximum expected row length for a result set, so it would be truncated. But you can set up a simple loop to show the text in pieces. To understand this process, you need to be aware of the system function @@TEXTSIZE, which returns the maximum amount of text or image data that you can retrieve in a single statement. Of course, you need to make sure that the buffer in your application will also be large enough to accept the text. You can read chunks smaller than the @@TEXTSIZE limit, but not larger.

NOTE
The value for @@TEXTSIZE is unrelated to the setting in SQL Query Analyzer for the maximum number of characters discussed in the previous note. @@TEXTSIZE is a server option, while the character limit controls the client tool's display.

You can change the value of @@TEXTSIZE for your connection by using SET TEXTSIZE n. The default value for @@TEXTSIZE is 64 KB. You should read chunks whose size is based on the amount of space available in your application buffer and based on the network packet size so that the text will fit in one packet, with an allowance for some additional space for metadata. For example, with the default network packet size of 4096 bytes (4 KB), a good read size would be about 4000 bytes, assuming that your application could deal with that size. You should also be sure that @@TEXTSIZE is at least equal to your read size. You can either check to determine its size or explicitly set it, as we do in the example below. Also notice the handy use of the CASE statement for a variable assignment to initialize the @readsize variable to the smaller of the total length of the column and the value of @@TEXTSIZE. In this example, I make the read size only 100 characters so that it displays easily in SQL Query Analyzer or in a similar query window. But this is too small for most applications and is used here for illustration only.

 -- READTEXT in a loop to read chunks of text. -- Instead of using HOLDLOCK, use SET TRANSACTION ISOLATION LEVEL  -- REPEATABLE READ (equivalent). Then set it back when done but  -- be sure to do so in a separate batch. SET TRANSACTION ISOLATION LEVEL REPEATABLE READ SET TEXTSIZE 100    —- Just for illustration; too small for                      —- real world. 4000 would be a better value. BEGIN TRAN DECLARE @mytextptr varbinary(16), @totalsize int,     @lastread int, @readsize int SELECT     @mytextptr=TEXTPTR(pr_info), @totalsize=DATALENGTH(pr_info),     @lastread=0,     -- Set the readsize to the smaller of the @@TEXTSIZE setting      -- and the total length of the column     @readsize=CASE WHEN (@@TEXTSIZE < DATALENGTH(pr_info)) THEN          @@TEXTSIZE ELSE DATALENGTH(pr_info) END      FROM pub_info WHERE pub_id='1622'  IF @mytextptr IS NOT NULL AND @readsize > 0     WHILE (@lastread < @totalsize)     BEGIN         READTEXT pub_info.pr_info @mytextptr @lastread @readsize         IF (@@error <> 0)             BREAK    -- Break out of loop if an error on read         -- Change offset to last char read         SELECT @lastread=@lastread + @readsize                 -- If read size would go beyond end, adjust read size         IF ((@readsize + @lastread) > @totalsize)              SELECT @readsize=@totalsize - @lastread      END      COMMIT TRAN GO —- Set it back, but in a separate batch SET TRANSACTION ISOLATION LEVEL READ COMMITTED

Notice that in this example I need to ensure not only that the text pointer is still valid when we get to READTEXT but also that the column is not changed between iterations of READTEXT. If another connection simply updated the text in place, the text pointer would still be valid, although the read would be messed up since the contents and length were changed. I could use HOLDLOCK both on the READTEXT statement as well as for protecting the text pointer. But for illustration, I instead changed the isolation level to REPEATABLE READ.

NOTE
If the pub_info table in your pubs database doesn't have any text data in the pr_info column, you can load it using a supplied script in the install directory of your SQL Server installation folder. The script called pubtext.bat requires two parameters: your sa password and the name of the SQL Server. There is also a script to load in image data for the logo column of pub_info, called pubimage.bat. Both of these scripts use the undocumented command line utility called textcopy. (Although textcopy will allow image data to be copied into a SQL Server image column, there are no supplied facilities for displaying image data graphically. You'll need to use a client tool that provides this capability in order to "see" the graphics.)

Copying Text to Sequenced varchar Columns

As you've seen, the text datatype is sometimes awkward to work with. Many functions don't operate on text, stored procedures are limited in what they can do with text, and some tools don't deal with it well. If the amount of text isn't too big—and you get to be the judge of what is too big for you—you can covert a text column to a set of varchar columns. In SQL Server 2000, varchar columns can be up to 8000 bytes. Although the code that follows can produce columns of that size, that might still be too big to work with effectively. All of SQL Server's string functions will work on these columns, but if you want to print out each column, you might want to find something smaller that fits your needs better. The code below produces columns of length 200. You can take this code and substitute varchar(8000) or any other length up to 8000 for varchar(200).

Like the previous example, the following procedure operates on the pub_info table from the pubs database. For a given pub_id, the procedure takes the text column of the row and does successive READTEXT operations in chunks of 200 until the entire text column has been read. After creating the procedure, we use a cursor to iterate for each row of pub_info and copy the text column into multiple rows of a temporary table. When we're done for a given pub_id, we take the temporary table's contents and add it to the permanent table. We truncate the temporary table and move on to the next pub_id value.

 -- copy_text_to_varchar.sql -- Proc get_text does READTEXT in a loop to read chunks of text -- no larger than 200, or the column size or @@TEXTSIZE, and -- produces as many rows as necessary to store the text column -- as a series of sequenced varchar(200) rows CREATE PROC get_text @pub_id char(4) AS DECLARE @mytextptr varbinary(16), @totalsize int, @lastread int,     @readsize int -- Use a TRAN and HOLDLOCK to ensure that textptr and text are  -- constant during the iterative reads BEGIN TRAN SELECT @mytextptr=TEXTPTR(pr_info), @totalsize=DATALENGTH(pr_info),     @lastread=0,     -- Set the readsize to the smaller of the @@TEXTSIZE settings,     -- 200, and the total length of the column     @readsize=CASE WHEN (200 < DATALENGTH(pr_info)) THEN 200      ELSE DATALENGTH(pr_info) END     FROM pub_info (HOLDLOCK) WHERE pub_id=@pub_id  -- If debugging, uncomment this to check values  -- SELECT @mytextptr, @totalsize, @lastread, @readsize -- Do READTEXT in a loop to get next 200 characters until done IF @mytextptr IS NOT NULL AND @readsize > 0     WHILE (@lastread < @totalsize)     BEGIN         -- If readsize would go beyond end, adjust readsize         IF ((@readsize + @lastread) > @totalsize)              SELECT @readsize = @totalsize - @lastread                   -- If debugging, uncomment this to check values          -- SELECT 'valid ptr?'=textvalid('pub_info.pr_info',         --    @mytextptr), 'totalsize'=@totalsize,          --    'lastread'=@lastread, 'readsize'=@readsize         READTEXT pub_info.pr_info @mytextptr @lastread @readsize         IF (@@error <> 0)             BREAK    -- Break out of loop if an error on read         -- Change offset to last char read         SELECT @lastread=@lastread + @readsize               END       COMMIT TRAN GO IF EXISTS (SELECT * FROM tempdb.dbo.sysobjects     WHERE name='##mytmptext' AND type='U')     DROP TABLE ##mytmptext GO -- Intermediate temp table that READTEXT will use. -- This table is truncated after each pub_id value, so the Identity -- property sequences the rows for each publisher separately. CREATE TABLE ##mytmptext (     seq_no        int    IDENTITY,     text_chunk    text ) GO IF EXISTS (SELECT * FROM sysobjects     WHERE name='newprinfo' AND type='U')     DROP TABLE newprinfo GO -- This is the new table that pub_info is copied to. -- It keeps chunks of text and is sequenced for each pub_id. CREATE TABLE newprinfo (     pub_id        char(4)    NOT NULL,     seq_no        int        NOT NULL,     text_chunk    varchar(200),     CONSTRAINT PK PRIMARY KEY (pub_id, seq_no) ) GO -- Having created the procedure get_text, iterate for each pub_id -- value, temporarily sequencing them in the temp table. Once done -- for a given pub_id, copy them to the new permanent table. Then  -- truncate the temp table for use with reseeded Identity for next  -- pub_id row. DECLARE @pub_id char(4) DECLARE iterate_prinfo CURSOR FOR     SELECT pub_id FROM pub_info ORDER BY pub_id OPEN iterate_prinfo      FETCH NEXT FROM iterate_prinfo INTO @pub_id         BEGIN          TRUNCATE TABLE ##mytmptext         INSERT ##mytmptext             EXEC get_text @pub_id         INSERT newprinfo (pub_id, seq_no, text_chunk)             SELECT @pub_id, SEQ_NO,             CONVERT(varchar(200), TEXT_CHUNK)              FROM ##mytmptext ORDER BY SEQ_NO         FETCH NEXT FROM iterate_prinfo INTO @pub_id         END     CLOSE iterate_prinfo DEALLOCATE iterate_prinfo GO -- Simply verify contents of the new table SELECT * FROM newprinfo ORDER BY pub_id,seq_no GO

A subset of the output looks like the following:

 pub_id    seq_no    text_chunk ------    ------    ----------------------------------------------- 0736      1         This is sample text data for New Moon Books,                     publisher 0736 in the pubs database. New Moon                     Books is located in Boston, Massachusetts.                     This is sample text data for New Moon Books,                      publisher 0736 in 0736      2         the pubs database. New Moon Books is located in                     Boston, Massachusetts.                     This is sample text data for New Moon Books,                     publisher 0736 in the pubs database. New Moon                      Books is located in Boston, Massach 0736      3         usetts.                     This is sample text data for New Moon Books,                      publisher 0736 in the pubs database. New Moon                      Books is located in Boston, Massachusetts.                      This is sample text data for New Moon Books,                     publish

UPDATETEXT

UPDATETEXT is a big improvement for text processing. If you had only WRITETEXT, you'd need to completely rewrite the entire column to make even a minor change. UPDATETEXT lets you work with text in pieces to insert, overwrite, or append data. Or you can copy data from another text column and append it or overwrite the column with it. Because of its additional capability and flexibility, the syntax for UPDATETEXT is a bit more complex:

 UPDATETEXT table_name.dest_column_name dest_text_ptr     offset delete_length [inserted_data |     table_name.src_column_name src_text_ptr ]

The destination column name and text pointer parameters point to the column that you will be updating; these parameters are required. As you would with WRITETEXT, you should use the UPDLOCK hint to protect the text pointer from becoming invalid and to serialize access to the text pages to prevent a conversion deadlock. The source column name parameters are used only when you copy data from another text column. Otherwise, you directly include in that spot the data you'll be adding or you omit the parameter if you're deleting data. The offset is the position at which you start your data modification. It should be NULL if you're appending to the current contents and 0 if you're starting from the beginning of the column. The delete_length parameter tells you how many characters (if any) to delete starting from the offset parameter. Use NULL for this parameter if you want to delete all contents from the offset up to the end of the column, and use 0 if you want to delete no bytes. As with READTEXT, the first character of the column is considered to have a 0 offset.

UPDATETEXT can do everything WRITETEXT can do, and much more. So you might choose to use only READTEXT and UPDATETEXT and forget about WRITETEXT. (WRITETEXT existed in versions of SQL Server before UPDATETEXT appeared; it is maintained for backward compatibility, but there isn't much need for it now.)

Following are some examples that illustrate the use of UPDATETEXT better than further explanation.

EXAMPLE 1

Use UPDATETEXT to completely replace the contents of a column:

 -- Use UPDATETEXT to completely overwrite a text column. -- Alternative to WRITETEXT. DECLARE @mytextptr varbinary(16) BEGIN TRAN SELECT @mytextptr=TEXTPTR(pr_info) FROM pub_info (UPDLOCK) WHERE     pub_id='9999'  IF @mytextptr IS NOT NULL     UPDATETEXT pub_info.pr_info @mytextptr 0 NULL          'New text for 9999'  COMMIT TRAN

EXAMPLE 2

Use UPDATETEXT to delete characters off the end; first notice that publisher 0877, Binnet & Hardley, has the following contents in the text column pr_info:

 This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington,  D.C. This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington,  D.C.  This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington,  D.C. This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington,  D.C. This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington,  D.C.

Because the text is repeated several times, we'll delete all characters that follow the first occurrence of D.C.:

 DECLARE @mytextptr varbinary(16), @pat_offset int BEGIN TRAN SELECT @mytextptr=TEXTPTR(pr_info),      @pat_offset=PATINDEX('%D.C.%', pr_info)-1+4      -- For offset, subtract 1 for offset adjust but add 4 for     -- length of "D.C."     FROM pub_info (UPDLOCK) WHERE pub_id='0877'  IF @mytextptr IS NOT NULL AND @pat_offset >= 4     UPDATETEXT pub_info.pr_info @mytextptr @pat_offset NULL  COMMIT TRAN

The column now has these contents (only):

 This is sample text data for Binnet & Hardley, publisher 0877 in the pubs database. Binnet & Hardley is located in Washington,  D.C.

EXAMPLE 3

With the small amount of text here, it wouldn't be bad to simply rewrite the column with new text. But if this were a large text column (you could literally store the contents of War and Peace in a single text column), it would be extremely inefficient to rewrite the entire column just to make a minor change. In this example, we'll add the text Kimberly Tripp is president of the company. to the current contents. We'll use UPDATETEXT to append text to the column:

 DECLARE @mytextptr varbinary(16) BEGIN TRAN SELECT @mytextptr=TEXTPTR(pr_info) FROM pub_info (UPDLOCK)     WHERE pub_id='0877'  IF @mytextptr IS NOT NULL     UPDATETEXT pub_info.pr_info @mytextptr NULL NULL          'Kimberly Tripp is president of the company.'  COMMIT TRAN

And the result:

 This is sample text data for Binnet & Hardley, publisher 0877 in the pubs database. Binnet & Hardley is located in Washington,  D.C.Kimberly Tripp is president of the company.

That worked exactly as specified, but you might wish we had skipped a line and then included a tab before adding the new sentence. We can easily add both a vertical and a horizontal tab, as you can see in the next example.

EXAMPLE 4

Use UPDATETEXT to insert some characters:

 DECLARE @mytextptr varbinary(16), @pat_offset int,     @mystring char(2) BEGIN TRAN SELECT  @mystring=char(13) + CHAR(9),   -- Vertical tab is code point 13.                                  -- Tab is 9. @pat_offset=PATINDEX('%Kim%', pr_info)-1, @mytextptr=TEXTPTR(pr_info) FROM pub_info (UPDLOCK)     WHERE pub_id='0877'  IF @mytextptr IS NOT NULL AND @pat_offset >= 0     UPDATETEXT pub_info.pr_info @mytextptr @pat_offset 0          @mystring COMMIT TRAN

And the result:

 This is sample text data for Binnet & Hardley, publisher 0877 in  the pubs database. Binnet & Hardley is located in Washington, D.C.     Kimberly Tripp is president of the company.

EXAMPLE 5

Oops! We just learned that the president has gotten married and changed her last name from Tripp to Tripp-Simonnet. We need to fix that. Use UPDATETEXT for search and replace:

 -- UPDATETEXT for Search and Replace DECLARE @mytextptr varbinary(16), @pat_offset int,     @oldstring varchar(255), @newstring varchar(255),     @sizeold int BEGIN TRAN SELECT @oldstring='Tripp', @newstring='Tripp-Simonnet' SELECT @sizeold=DATALENGTH(@oldstring), @pat_offset=PATINDEX('%' + @oldstring + '%', pr_info)-1, @mytextptr=TEXTPTR(pr_info)  FROM pub_info (UPDLOCK) WHERE pub_id='0877'  IF @mytextptr IS NOT NULL AND @pat_offset >= 0     UPDATETEXT pub_info.pr_info @mytextptr @pat_offset @sizeold          @newstring COMMIT TRAN

And the result:

 This is sample text data for Binnet & Hardley, publisher 0877 in the pubs database. Binnet & Hardley is located in Washington, D.C.     Kimberly Tripp-Simonnet is president of the company.

We used variables above and figured lengths and offsets using SQL Server's built-in functions. By doing this, we ensured that the procedure is pretty generic and that it can deal with changing the string to another string that is either longer or shorter than the original.

EXAMPLE 6

Suppose we want to append the contents of the text for publisher Scootney (pub_id 9952) to the text for Binnet (pub_id 0877). If we didn't have this option in UPDATETEXT, it would be necessary to bring all that text back to the client application, append it, and then send it back to the server. Over a slow network such as the Internet, this would not be practical if the text columns were large. But with UPDATETEXT, the whole operation is done on the server. In this example, notice that we protect the text pointer for the target with UPDLOCK, since we'll be updating that row. We can use HOLDLOCK for the source row since we're reading it only and we want to ensure that it hasn't changed.

We'll use UPDATETEXT to copy and append one text column to another:

 -- UPDATETEXT to copy and append another text column DECLARE @target_textptr varbinary(16),     @source_textptr varbinary(16) BEGIN TRAN SELECT @target_textptr=TEXTPTR(pr_info) FROM pub_info (UPDLOCK) WHERE pub_id='0877'  SELECT @source_textptr=TEXTPTR(pr_info) FROM pub_info (HOLDLOCK) WHERE pub_id='9952'  IF @target_textptr IS NOT NULL AND @source_textptr IS NOT NULL     UPDATETEXT pub_info.pr_info @target_textptr NULL NULL          pub_info.pr_info @source_textptr COMMIT TRAN

Text-in-Row Data

As discussed in Chapter 6, you can enable a table to store LOB data in the row itself. Most of the examples I've shown, as well as the details regarding the READTEXT, WRITETEXT, and UPDATEXT, apply equally to LOB data stored in a row or out of a row. However, the previous discussion of locking the text pointer applies only to LOB data that is stored outside of the data row. For tables that have text in row enabled, the locking issues with the text pointer are slightly different. When a text pointer is acquired for text data in a table with text in row enabled, the data row is locked with a share lock if the isolation level of the transaction is higher than READ UNCOMMITTED, and the database is not in read-only or single-user mode. The lock ensures that nobody else can modify or delete the row while you have a text pointer on a text value from that row. This lock is released when the text pointer becomes invalid. Since at the end of the transaction all text pointers are invalid, the locks are all released. You'll see a few more details about what kind of operations can invalidate a text pointer for in-row LOB data later in this section.

To illustrate the automatic locking of the row with in-row text data, let's see some basic examples. The share locks on rows with in-row text are held until the end of a transaction, and this means that we cannot get a text pointer for in-row text unless we are in a user defined transaction. This example shows an attempt to acquire a text pointer without an explicit transaction:

 CREATE TABLE t1 (c1 int, c2 text) EXEC sp_tableoption 't1', 'text in row', 'ON' INSERT INTO t1 SELECT 1, 'hello there' GO DECLARE @ptrval varbinary(16) SELECT @ptrval = TEXTPTR(c2) FROM t1 WHERE c1 = 1 READTEXT t1.c2 @ptrval 0 1 RESULTS: Server: Msg 7101, Level 16, State 1, Line 3 You cannot use a text pointer for a table with option 'text in row'  set to ON. Server: Msg 7133, Level 16, State 1, Line 4 NULL textptr (text, ntext, or image pointer) passed to  READTEXT function.

However, attempting to do the same text pointer assignment and READTEXT inside a user-defined transaction succeeds:

 BEGIN TRAN DECLARE @ptrval varbinary(16) SELECT @ptrval = TEXTPTR(c2) FROM t1 WHERE c1 = 1 READTEXT t1.c2 @ptrval 0 1 COMMIT TRAN

To see that the shared locks are taken and held automatically, we can execute the same code shown earlier for the text data in the pub_info table, which was not stored in the row itself. If you actually ran that earlier example, the row with pub_id 9999 is now gone, so I'll change the code to look for a row with pub_id 1756. I also must first enable the text in row option for the pub_info table and then use an explicit transaction.

 -- WRITETEXT with text-in-row data EXEC sp_tableoption pub_info, 'text in row', 'ON' GO BEGIN TRAN DECLARE @mytextptr varbinary(16) SELECT @mytextptr=TEXTPTR(pr_info)      FROM pub_info WHERE pub_id='1756'  WAITFOR DELAY '00:00:20' -- During the delay, you can attempt to delete -- this row in another Query Analyzer connection: --        DELETE FROM pub_info WHERE pub_id='1756'  IF @mytextptr IS NOT NULL     WRITETEXT pub_info.pr_info @mytextptr 'Hello Again' COMMIT TRAN

If the isolation level of the transaction is READ UNCOMMITTED, any text pointer obtained from a table with text in row enabled is considered read-only and cannot be used to update LOB values. They can be used only to read the data. The example below creates a table with in-row text data and then sets the transaction isolation level to READ UNCOMMITTED. You can see that the READTEXT succeeds but the WRITETEXT does not.

 CREATE TABLE t1 (c1 int, c2 text) EXEC sp_tableoption 't1', 'text in row', 'ON' INSERT INTO t1 SELECT 1, 'hello there' GO SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED GO BEGIN TRAN DECLARE @ptr varbinary(16) SELECT @ptr=textptr(c2) FROM t1 WHERE c1=1 READTEXT t1.c2 @ptr 0 5 COMMIT TRAN GO BEGIN TRAN DECLARE @ptr varbinary(16) SELECT @ptr=textptr(c2) FROM t1 WHERE c1=1 WRITETEXT t1.c2 @ptr 'xx' COMMIT TRAN GO

This is the error message returned by the WRITETEXT operation:

 Server: Msg 7106, Level 16, State 1, Line 5 You cannot update a blob with a read-only text pointer The statement has been terminated.

Note that the message uses the term BLOB, which should refer just to image fields; in my discussion, I've been using the term LOB, which refers to text and ntext as well as image.

Invalidating Text Pointers As you saw earlier, if a table was not enabled for in-row text storage, no row lock is held for the text pointer. The only way you saw to lock the data row was to raise the isolation level to REPEATABLE READ. The drawback of using the isolation level to prolong the lock is that there is no way to choose to release the lock earlier. When working with LOB data in a table that has text in row enabled, you do have control over when the lock is released because it will be released whenever the text pointer becomes invalidated. I mentioned above that the end of a transaction will invalidate a text pointer, but that is not the only way.

Text pointers for in-row text data are also invalidated when certain DDL operations are performed on the table. Here are the DDL statements that invalidate your text pointers:

CREATE CLUSTERED INDEX
DROP CLUSTERED INDEX
ALTER TABLE
DROP TABLE
TRUNCATE TABLE
sp_tableoption with the text in row option
sp_indexoption

Finally, you can invalidate a text pointer for in-row text data using a special stored procedure, sp_invalidate_textptr, which takes an argument of a text pointer.

You might need to force invalidation of text pointers because SQL Server sets an upper limit on the number of pointers for in-row text data. A limit is needed because SQL Server maintains internal memory for each valid text pointer, and we want to limit the amount of memory that can be used in this way. You can have up to 1024 valid text pointers per transaction per database. If your transaction spans two databases, you can have 1024 text pointers in each database. Within a transaction, obtaining a text pointer twice on the same LOB data gives you the same text pointer structure and will not count against your limit of 1024. Note that there is no limit on the number of pointers you can have for LOB data that is not stored in the row.