Internal Storage | Inside Microsoft SQL Server 2000

This section covers system catalogs and the internal data storage of tables. Although you can use SQL Server effectively without understanding the internals, understanding the details of how SQL Server stores data will help you develop efficient applications. (If you don't need this in-depth information, you can skip this discussion and proceed to the section on constraints.)

When you create a table, one or more rows are inserted into a number of system catalogs to manage that table. At a minimum, rows are added to the sysobjects, sysindexes, and syscolumns system catalogs (tables). When you define the new table with one or more foreign key constraints, rows are added to the sysreferences system table.

For every table created, a single row that contains—among other things—the name, object ID, and owner of the new table is added to the sysobjects table. The syscolumns table gains one row for each column in the new table, and each row will contain information such as the column name, datatype, and length. Each column receives a column ID, which directly corresponds to the order in which you specified the columns when you created the table. That is, the first column listed in the CREATE TABLE statement will have a column ID of 1, the second column will have a column ID of 2, and so on. Figure 6-3 shows the rows added to the sysobjects, and syscolumns system tables when you create a table. (Not all columns are shown for each table.)

click to view at full size.

Figure 6-3. Catalog information stored after a table is created.

Notice in the syscolumns output in the figure that the xoffset column contains negative numbers in some rows. Any column that contains variable-length data will have a negative xoffset value in syscolumns. Successive variable-length columns in the CREATE TABLE statement are given decreasing xoffset values (-1, -2, -3, and so on). You can see in Figure 6-3 that the employee last name (emp_lname) is the first variable-length column in the table.

The sysindexes table will have at least one row that will eventually contain pointers to the storage space that the new table uses and information about the table's size. However, when the table is first created, no space is allocated to it, so the columns in sysindexes that indicate page addresses and storage space reserved will all be 0. Note that this is a change from all previous versions of SQL Server, in which at least two pages were allocated to the table as soon as it was created—one page for the data, one root page for each index, one Index Allocation Map (IAM) page for each index, and one IAM page for the table itself.

In SQL Server 2000, no space is allocated until the first row is inserted. If the table is defined without any primary key or unique constraints, sysindexes will have only one row, with an indid (index ID) value of 0, which means that the table is a heap. If a primary key or unique constraint is defined on the table, specifying that it is to be supported by a clustered index, the indid value for the table's row in sysindexes will be 1. Any additional constraints supported by nonclustered indexes will require that additional rows be added to sysindexes. Each nonclustered index will have a row with an indid value between 2 and 250. In addition, if there is text or image data in the row, there will be a row in sysindexes for the text or image data. The name for that row will be the name of the table prefixed by the character t, and the indid will always be 255. Indid values of 251 through 254 are reserved for future needs.

Data Pages

Data pages are the structures that contain all data that is not text or image. As with all other types of pages in SQL Server, data pages have a fixed size of 8 KB, or 8192 bytes. They consist of three major components: the page header, data rows, and the row offset array, as shown in Figure 6-4.

Figure 6-4. The structure of a data page.

Page Header

As you can see in Figure 6-4, the page header occupies the first 96 bytes of each data page (leaving 8096 bytes for data and row offsets). Table 6-5 shows the information contained in the page header.

Data Rows

Following the page header is the area in which the table's actual data rows are stored. The maximum size of a single data row is 8060 bytes. A data row can't span multiple pages (except for text or image columns, which can be stored in their own separate pages). The number of rows stored on a given page will vary depending on the structure of the table and on the data being stored. A table that has all fixed-length columns will always store the same number of rows per page; variable-length rows will store as many rows as will fit based on the actual length of the data entered. Keeping row length shorter allows more rows to fit on a page, thus reducing I/O and improving the cache-hit ratio.

Table 6-5. Information contained in the page header.

Field	What It Contains
pageID	File number and page number of this page in the database
nextPage	File number and page number of the next page if this page is in a page chain
prevPage	File number and page number of the previous page if this page is in a page chain
objID	ID of the object to which this page belongs
lsn	Log sequence number (LSN) value used for changes and updates to this page
slotCnt	Total number of slots (rows) used on this page
level	Level of this page in an index (always 0 for leaf pages)
indexId	Index ID of this page (always 0 for data pages)
freeData	Byte offset of the first free space on this page
pminlen	Number of bytes in fixed-length portion of rows
freeCnt	Number of free bytes on page
reservedCnt	Number of bytes reserved by all transactions
xactreserved	Number of bytes reserved by the most recently started transaction
tornBits	1 bit per sector for detecting torn page writes (discussed in Chapter 5)
flagBits	2-byte bitmap that contains additional information about the page

Row Offset Array

The row offset array is a block of 2-byte entries, each of which indicates the offset on the page on which the corresponding data row begins. Every row has a 2-byte entry in this array. Although these bytes aren't stored in the row with the data, they do affect the number of rows that will fit on a page.

The row offset array indicates the logical order of rows on a page. For example, if a table has a clustered index, SQL Server will store the rows in the order of the clustered index key. This doesn't mean that the rows will be physically stored on the page in the order of the clustered index key. Rather, slot 0 in the offset array will refer to the first row in the order, slot 1 will refer to the second row, and so forth. As we'll see shortly when we examine an actual page, the offset of these rows can be anywhere on the page.

There's no internal global row number for every row in a table. SQL Server uses the combination of file number, page number, and slot number on the page to uniquely identify each row in a table.

Examining Data Pages

You can view the contents of a data page by using the DBCC PAGE statement, which allows you to view the page header, data rows, and row offset table for any given page in a database. Only a system administrator can use DBCC PAGE. But because you typically won't need to view the content of a data page, you won't find much about DBCC PAGE in the SQL Server documentation. Nevertheless, in case you want to use it, here's the syntax:

 DBCC PAGE ({dbid | dbname}, filenum, pagenum[, printopt])

The DBCC PAGE command includes the parameters shown in Table 6-6. Figure 6-5 shows sample output from DBCC PAGE. Note that DBCC TRACEON(3604) instructs SQL Server to return the results to the client instead of to the error log, as is the default for many of the DBCC commands that deal with internals issues.

Table 6-6. Parameters of the DBCC PAGE command.

Parameter	Description
dbid	ID of the database containing the page
dbname	Name of the database containing the page
filenum	File number containing the page
pagenum	Page number within the file
printopt	Optional print option; takes one of these values: 0: Default; prints the buffer header and page header 1: Prints the buffer header, page header, each row separately, and the row offset table 2: Prints the buffer and page headers, the page as a whole, and the offset table 3: Prints the buffer header, page header, each row separately, and the row offset table; each row is followed by each of its column values listed separately

click to view at full size.

Figure 6-5. Sample output from DBCC PAGE.

As you can see, the output from DBCC PAGE is divided into four main sections: BUFFER, PAGE HEADER, DATA, and OFFSET TABLE (really the Offset Array). The BUFFER section shows information about the buffer for the given page. (A buffer in this context is an in-memory structure that manages a page.)

The PAGE HEADER section in Figure 6-5 displays the data for all the header fields on the page. (Table 6-5 shows the meaning of most of these fields.) The DATA section contains information for each row. For each row, DBCC PAGE indicates the slot position of the row and the offset of the row on the page. The page data is then divided into three parts. The left column indicates the byte position within the row where the displayed data occurs. The next four columns contain the actual data stored on the page, displayed in hexadecimal. The right column contains a character representation of the data. Only character data is readable in this column, although some of the other data might be displayed.

The OFFSET TABLE section shows the contents of the row offset array at the end of the page. In the figure, you can see that this page contains 23 rows, with the first row (indicated by slot 0) beginning at offset 1585 (0x631). The first row physically stored on the page is actually row 6, with an offset in the row offset array of 96. DBCC PAGE displays the rows in slot number order, even though, as you can see by the offset of each of the slots, that isn't the order in which the rows physically exist on the page.

The Structure of Data Rows

A table's data rows have the general structure shown in Figure 6-6. The data for all fixed-length columns is stored first, followed by the data for all variable-length columns. Table 6-7 shows the information stored in each row.

Status Bits A contains a bitmap indicating properties of the row. The bits have the following meaning:

Bit 0 Versioning information; in SQL Server 2000, it's always 0.
Bits 1 through 3 Taken as a 3-bit value, 0 indicates a primary record, 1 indicates a forwarded record, 2 indicates a forwarded stub, 3 indicates an index record, 4 indicates a blob fragment, 5 indicates a ghost index record, and 6 indicates a ghost data record. (I'll discuss forwarding and ghost records in Chapter 9.)
Bit 4 Indicates that a NULL bitmap exists; in SQL Server 2000, a NULL bitmap is always present, even if no NULLs are allowed in any column.
Bit 5 Indicates that variable-length columns exist in the row.
Bits 6 and 7 Not used in SQL Server 2000.

click to view at full size.

Figure 6-6. The structure of data rows.

Table 6-7. Information stored in a table's data rows.

Information	Mnemonic	Size
Status Bits A	TagA	1 byte
Status Bits B (not used in SQL Server 2000)	TagB	1 byte
Fixed-length size	Fsize	2 bytes
Fixed-length data	Fdata	Fsize -4
Number of columns	Ncol	2 bytes
NULL bitmap (1 bit for each column in table; a 1 indicates that the corresponding column is NULL)	Nullbits	Ceiling (Ncol / 8)
Number of variable-length columns	VarCount	2 bytes
Variable column offset array	VarOffset	2 * VarCount
Variable-length data	VarData	VarOff[VarCount] - (Fsize + 4 + Ceiling (Ncol / 8) + 2 * VarCount)

Within each block of fixed-length or variable-length data, the data is stored in the column order in which the table was created. For example, suppose a table is created with the following statement:

 CREATE TABLE Test1 ( Col1 int NOT NULL, Col2 char(25) NOT NULL, Col3 varchar(60) NULL, Col4 money NOT NULL, Col5 varchar(20) NOT NULL )

The fixed-length data portion of this row would contain the data for Col1, followed by the data for Col2, followed by the data for Col4. The variable-length data portion would contain the data for Col3, followed by the data for Col5. For rows that contain only fixed-length data, the following is true:

The first hexadecimal digit of the first byte of the data row will be 1, indicating that no variable-length columns exist. (The first hexadecimal digit is comprised of bits 4 through 7; bits 6 and 7 are always 0, and if no variable-length columns exist, bit 5 is also 0. Bit 4 is always 1, so the value of the four bits is displayed as 1.)
The data row ends after the NULL bitmap, which follows the fixed-length data. (That is, the shaded portion shown in Figure 6-6 won't exist in rows with only fixed-length data.)
The total length of every data row will be the same.

Column Offset Arrays

A data row that has all fixed-length columns has no variable column count or column offset array. A data row that has variable-length columns has a column offset array in the data row with a 2-byte entry for each variable-length column, indicating the position within the row where each column ends. (The terms offset and position aren't exactly interchangeable. Offset is 0-based, and position is 1-based. A byte at an offset of 7 is in the eighth byte position in the row.)

Storage of Fixed-Length and Variable-Length Rows

The two examples that follow illustrate how fixed-length and variable-length data rows are stored. First, the simpler case of an all fixed-length row:

 CREATE TABLE Fixed ( Col1 char(5) NOT NULL, Col2 int NOT NULL, Col3 char(3) NULL, Col4 char(6) NOT NULL, Col5 float NOT NULL )

When this table is created, the following row (or one very much like it) is inserted into the sysindexes system table:

 id name indid first minlen ----------- ----- ------ -------------- ------ 1797581442 Fixed 0 0xC70000000100 30

And these rows are inserted into the syscolumns system table:

 name colid xtype length xoffset ---- ------ ----- ------ ------- Col1 1 175 5 4 Col2 2 56 4 9 Col3 3 175 3 13 Col4 4 175 6 16 Col5 5 62 8 22

For tables containing only fixed-length columns, the minlen value in sysindexes will be equal to the sum of the column lengths (from syscolumns.length), plus 4 bytes. It won't include the 2 bytes for the number of columns or the bytes for the NULL bitmap.

To look at a specific data row in this table, you first insert a new row:

 INSERT Fixed VALUES ('ABCDE', 123, NULL, 'CCCC', 4567.8)

Figure 6-7 shows this row's actual contents on the data page. To run the DBCC PAGE command, I had to take the value of first from the syindexes output above (0xC70000000100) and convert it to a file and page address. In hexadecimal notation, each set of two hexadecimal digits represents a byte. I first had to swap the bytes to get 00 01 00 00 00 C7. The first two groups represent the 2-byte file number, and the last four groups represent the page number. So the file is 0x0001, which is 1, and the page number is 0x000000C7, which is 199 in decimal.

click to view at full size.

Figure 6-7. A data row containing all fixed-length columns.

NOTE
The sysindexes table contains three columns that represent page numbers within a database: first, firstIAM, and root. Each is stored in a byte-swapped format. To convert to a decimal file number and page number, you must first swap the bytes and then convert the values from hexadecimal to decimal. You could use the Windows calculator to do the conversion, but a script has been provided on the companion CD to create a stored procedure called ConvertPageNums, which converts all these columns in the sysindexes table for you.

WARNING
SQL Server does not guarantee that the sysindexes.first column will also indicate the first page of a table. I've found that first is reliable until you begin to perform deletes and updates on the data in the table.

Reading the output takes a bit of practice. DBCC PAGE displays the data rows in groups of 4 bytes at a time. Within each group of four, the bytes are listed in reverse order. So the first group of four bytes is byte 3, byte 2, byte 1, and byte 0. The shaded area in Figure 6-7 has been expanded to show the bytes in the actual byte-number sequence.

The first byte is Status Bits A, and its value (0x10) indicates that only bit 3 is on, so the row has no variable-length columns. The second byte in the row remains unused. The third and fourth bytes (1e00) indicate the length of the fixed-length fields, which is also the column offset in which the Ncol value can be found. (The byte-swapped value is 0x001e, which translates to 30.) You can identify the data in the row for each column simply by using the offset value in the syscolumns table: the data for column Col1 begins at offset 4, the data for column Col2 begins at offset 9, and so on. As an int, the data in Col2 (7b000000) must be byte-swapped to give us the value 0x0000007b, which is equivalent to 123 in decimal.

Note that the 3 bytes of data for Col3 are all zeros, representing an actual NULL in the column. Because the row has no variable-length columns, the row ends 3 bytes after the data for column Col5. The 2 bytes starting right after the fixed-length data at offset 30 (0500, which is byte-swapped to yield 0x0005) indicate that five columns are in the row. The last byte is the NULL bitmap. The value of 4 means that only the third bit is on, because in our row, the third column was indeed a NULL.

WARNING
Fixed-length columns always use the full number of bytes defined for the table, even if the column holds a NULL value. If you're just upgrading from version 6.5 of SQL Server, this might take you by surprise. I know of at least one large database at a manufacturing company that expanded to three or four times its size when it was upgraded from version 6.5. This is because they had fields in many tables that were defined as char(255) and allowed NULLs. Some tables had 10 or more such columns, and the vast majority of the data was actually NULL. In version 6.5, the only space that these NULL columns needed was the 2 bytes in the variable column offset array; the actual NULL data took no space at all. In SQL Server 2000, every one of these columns needed a full 255 bytes of storage and the database ballooned in size!

Here's the somewhat more complex case of a table with variable-length data. Each row has three varchar columns:

 CREATE TABLE Variable ( Col1 char(3) NOT NULL, Col2 varchar(250) NOT NULL, Col3 varchar(5) NULL, Col4 varchar(20) NOT NULL, Col5 smallint NULL )

When you create this table, the following row is inserted into the sysindexes system table:

 id name indid first minlen ----------- -------- ------ -------------- ------ 1333579789 Variable 0 0xC90000000100 9

And these rows are inserted into the syscolumns system table:

 name colid xtype length xoffset ---- ------ ----- ------ ------- Col1 1 175 3 4 Col2 2 167 250 -1 Col3 3 167 5 -2 Col4 4 167 20 -3 Col5 5 52 2 7

Now you insert a row into the table:

 INSERT Variable VALUES ('AAA', REPLICATE('X', 250), NULL, 'ABC', 123)

The REPLICATE function is used here to simplify populating a column; this function builds a string of 250 Xs to be inserted into Col2.

As shown in Figure 6-8, the data for the fixed-length columns is located using the offset value in syscolumns. In this case, Col1 begins at offset 4, and Col5 begins at offset 7.

To find the variable-length columns, you first locate the column offset array in the row. Right after the 2-byte field indicating the total number of columns (0x0500) and the NULL bitmap with the value 0x04, a 2-byte field exists with the value 0x0300 (or 3, decimal) indicating that three variable-length fields exist. Next comes the column offset array. Three 2-byte values indicate the ending position of each of the three variable-length columns: 0x0e01 is byte-swapped to 0x010e, so the first variable byte column ends at position 270. The next 2-byte offset is also 0x0e01, so that column has no length and has nothing stored in the variable data area. Unlike with fixed-length fields, if a variable-length field has a NULL value, it takes no room in the data row. SQL Server distinguishes between a varchar containing NULL and an empty string by determining whether the bit for the field is 0 or 1 in the NULL bitmap. The third 2-byte offset is 0x1101, which, when byte-swapped, gives us 0x0111. This means that the row ends at position 273 (and is a total of 273 bytes in length).

The total storage space needed for a row depends on a number of factors. Variable-length fields add additional overhead to a row, and their actual size is probably unpredictable. Even for fixed-length fields, the number of bytes of overhead can change depending on the number of columns in the table. In the earlier example illustrated in Figure 6-1, I mentioned that 10 bytes of overhead will exist if a row contains all fixed-length columns. For that row, 10 is the correct number. The size of the NULL bitmap needs to be long enough to store a bit for every column in the row. In the Figure 6-1 example, the table has 11 columns, so the NULL bitmap needs to be 2 bytes. In the examples illustrated by Figures 6-7 and 6-8, the table has only 5 columns, so the NULL bitmaps need only a single byte. Don't forget that the total row overhead also needs to include the 2 bytes for each row in the offset table at the bottom of the page.

click to view at full size.

Figure 6-8. A data row with variable-length columns

Page Linkage

SQL Server 2000 doesn't connect the individual data pages of a table in a doubly linked list unless the table has a clustered index. Pages at each level of an index are linked together, and since the data is considered the leaf level of a clustered index, SQL Server does maintain the linkage. However, for a heap, there is no such linked list connecting the pages to each other. The only way that SQL Server determines which pages belong to a table is by inspecting the IAMs for the table.

If the table has a clustered index, you can use the M_nextPage and M_prevPage values in the page header information to determine the ordering of pages in the list. Alternatively, you can use the DBCC EXTENTINFO command to get a list of all the extents that belong to an object. This example uses the Orders table in the Northwind database:

 DBCC EXTENTINFO ('Northwind', 'Orders', 1)

The last argument indicates only extents for index 1, which is the clustered index (and includes the data). Here is the output:

 file_id page_id pg_alloc ext_size obj_id ----------- ----------- ----------- ----------- ----------- 1 203 1 1 21575115 1 205 1 1 21575115 1 230 1 1 21575115 1 231 1 1 21575115 1 232 1 1 21575115 1 233 1 1 21575115 1 234 1 1 21575115 1 235 1 1 21575115 1 240 8 8 21575115 1 264 5 8 21575115 index_id pfs_bytes -------- -------- 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x60000000… 1 0x40404040… 1 0x40404040…

Notice that the first eight rows indicate an extent size (ext_size) of 1. As discussed in Chapter 5, the first eight pages of a table are allocated from mixed extents. Only after the table has reached eight pages does SQL Server allocate uniform extents of eight pages each. The last two rows in the table show this situation, and the page number (page_id) column gives the page number of the first page of the extent. Note that the last extent (starting on page 264) has used only five of its pages so far.

Text and Image Data

As mentioned earlier, if a table contains LOB data (text, ntext, or image data), the actual data might not be stored on the data pages with the rest of the data for a row. In versions prior to SQL Server 2000, SQL Server always stored a 16-byte pointer in the data row that indicated where the actual data could be found. In SQL Server 2000, the default behavior is identical to that in SQL Server 7, but you have the option of changing the storage mechanism by setting a table option to allow LOB data to be stored in the data row itself. So, first I'll tell you about the default storage of LOB data.

LOB Data Stored Outside the Table

As stated earlier, by default no LOB data is stored in the data row. Instead, the data row contains only a 16-byte pointer to a page where the data can be found. These pages are 8 KB in size, like any other page in SQL Server, and individual text, ntext, and image pages aren't limited to holding data for only one occurrence of a text, ntext, or image column. A text, ntext, or image page can hold data from multiple columns and from multiple rows; the page can even have a mix of text, ntext, and image data. However, one text or image page can hold only text or image data from a single table.

The collection of 8-KB pages that make up a LOB field aren't necessarily located next to each other. The pages are logically organized in a B-tree structure so that operations starting in the middle of the string are very efficient. SQL Server 2000 can quickly navigate the tree; earlier versions of SQL Server had to scan through the page chain. The structure of the B-tree differs slightly depending on whether the amount of data is less than or more than 32 KB. (See Figure 6-9 for the general structure.) I'll discuss B-trees in more detail when I talk about index internals in Chapter 8.

click to view at full size.

Figure 6-9. A text column pointing to a B-tree that contains the blocks of data.

If the amount of data is less than 32 KB, the text pointer in the data row points to an 84-byte text root structure. This forms the root node of the B-tree structure. The root node points to the blocks of text, ntext, or image data. While the data for LOB columns is arranged logically in a B-tree, physically both the root node and the individual blocks of data are spread throughout LOB pages for the table. They're placed wherever space is available. The size of each block of data is determined by the size written by an application. Small blocks of data are combined to fill a page. If the amount of data is less than 64 bytes, it's all stored in the root structure.

If the amount of data for one occurrence of a LOB column exceeds 32 KB, SQL Server starts building intermediate nodes between the data blocks and the root node. The root structure and the data blocks are interleaved throughout the text and image pages in the same manner as described earlier. The intermediate nodes, however, are stored in pages that aren't shared between occurrences of text or image columns. Each page storing intermediate nodes contains only intermediate nodes for one text or image column in one data row.

LOB Data Stored in the Data Row

If you store all your LOB data outside of your data pages, every time you access that data SQL Server will need to perform additional page reads. If you need the LOB data frequently, and if some of your data is small enough to fit within a SQL Server data page, you might notice a performance improvement by allowing some of the LOB data to be stored in the data row. You can enable a table option called text in row for a particular table by either setting the option 'ON' or by specifying a maximum number of bytes to be stored in the data row. The following command enables up to 500 bytes of LOB data to be stored on the data pages of the Employees table in the Northwind database:

 sp_tableoption Employees, 'text in row', 500

Instead of a number, you can specify the option 'ON' (including the quotes), which sets the maximum size to 256 bytes. Note that in the employees table, the only LOB column is notes, which is defined as ntext. This means that 500 bytes allows only 250 characters. Once you enable the text in row option, you never get just the 16-byte pointer for the LOB data in the row, as is the case when the option is not on. If the data in the LOB field is more than the specified maximum, the row will hold the root structure containing pointers to the separate chunks of LOB data. The minimum size of a root structure is 24 bytes, and the possible range of values that text in row can be set to is 24 to 7000 bytes.

To disable the text in row option, you can set the value to either 'OFF' or 0. To determine whether a table has the text in row property enabled, you can use the OBJECTPROPERTY function, as shown here:

 SELECT OBJECTPROPERTY (object_id('Employees'), 'TableTextInRowLimit')

This function returns the maximum number of bytes allowed for storing LOBs in a data row. If a 0 is returned, the text in row option is disabled.

Let's create a table very similar to the one we created to look at row structures, but we'll change the varchar(250) column to the text datatype. We'll use almost the same insert statement to insert one row into the table.

 CREATE TABLE HasText ( Col1 char(3) NOT NULL, Col2 text NOT NULL, Col3 varchar(5) NULL, Col4 varchar(20) NOT NULL, Col5 smallint NULL ) INSERT HasText VALUES ('AAA', REPLICATE('X', 250), NULL, 'ABC', 123)

Now let's find the necessary information for this table from the sysindexes table and note the results:

 SELECT indid, first, name FROM sysindexes WHERE id = object_id('HasText') Results: indid first name ------ -------------- -------------- 0 0xBF0000000100 HasText 0xBD0000000100 tHasText

Even though there are no indexes on this table, there are two rows in sysindexes. Sysindexes keeps track of all objects that take up space, and since a LOB field takes space separate from the table, it has its own row in sysindexes. The index ID for LOB data is always 255, and the "name" of the structure is the name of the table prefixed by the letter t. The first page of the table is page 0xBF, which is 191 decimal, and the LOB data is at 0xBD, which is 189 decimal. Figure 6-10 shows the output from running DBCC PAGE. The row structure is very similar to the row structure in Figure 6-x, except for the text field itself. Bytes 20 to 35 are the 16-byte text pointer, and you can see the value bd (or 0xBD) at offset 31. This is the pointer to the page where the text data starts, and as we saw from the row in sysindexes, that is indeed the location of the LOB data for this table.

click to view at full size.

Figure 6-10. A row containing a text pointer.

Now let's enable text data in the row, for up to 500 bytes:

 EXEC sp_tableoption HasText, 'text in row', 500

Enabling this option does not force the text data to be moved into the row. We have to update the text value to actually force the data movement:

 UPDATE HasText SET col2 = REPLICATE('Z', 250)

Figure 6-11 shows the output from running DBCC PAGE after the text has been moved into the row. Note that the row is practically identical to the row containing varchar data, as shown in Figure 6-8.

click to view at full size.

Figure 6-11. A row containing text data.

Finally, let's change the maximum size of in-row text data to only 50 bytes. This will force the text data to be moved off the page, and the root of the text structure will be stored in the row. As long as the text in row option is not OFF (or 0), SQL Server will never store the simple 16-byte LOB pointer in the row. It will store either the LOB data itself, if it fits, or the root structure for the LOB data B-tree.

 EXEC sp_tableoption HasText, 'text in row', 50

Figure 6-12 shows the output from running DBCC PAGE after the text pointer root has been stored on the page. A root structure is at least 24 bytes long (which is why 24 is the minimum size for setting the text-in-row limit.) Again, you can see the page number 0xBD at offset 40, indicating the location of the actual text data. Other information in the root includes:

Bytes 0 through 1: The type of column; 1 indicates a LOB root
Byte 2: Level in the B-tree
Byte 3: Unused
Bytes 4 through 7: A value used by optimistic concurrency control for cursors that increases every time a LOB is updated
Bytes 8 through 11: A random value used by DBCC CHECKTABLE that remains unchanged during the lifetime of each LOB
Bytes 12 through 23 and each succeeding group of 12 bytes in the column: Links to LOB data on a separate page

click to view at full size.

Figure 6-12. A row containing a text pointer root.

As indicated above, when you first enable text in row, no data movement occurs until text data is actually updated. The same is true if the limit is increased—that is, even if the new limit is large enough to accommodate LOB data that had been stored outside the row, the LOB data will not be moved onto the row automatically. You must update the actual LOB data first.

However, if the text in row option is turned OFF or the limit is reduced, all LOB data must adjust to comply with the new specifications. If the option is set to OFF, all LOB data is moved off the rows and replaced by a 16-byte text pointer. If the option limit is decreased, all LOB data larger than the new limit is moved off the rows and replaced by a LOB root pointer. You should be aware that SQL Server will log all movement of LOB data, which means that reducing the limit of or turning OFF the text in row option can be a very time-consuming operation for a large table.

Another point to keep in mind is that just because the amount of LOB data is less than the limit, that does not always mean the data will be stored in the row. You're still limited to a maximum row size of 8060 bytes for a single row on a data page. In addition, if a variable-length column needs to grow, it might push LOB data off of the page so as not to exceed the 8060-byte limit. Growth of variable-length columns always has priority over storing LOB data in the row because the non-LOB data must be stored on the data page. If no variable-length char fields need to grow during an update operation, SQL Server will check for growth of in-row LOB data, in column offset order. If one LOB needs to grow, others might be pushed off the row.

sql_variant Datatype

The new sql_variant data type provides support for columns that contain any or all of SQL Server's base datatypes except LOBs, rowversion (timestamp), and the types that can't be defined for a column in a table, namely cursor and table. For instance, a column can contain a smallint value in some rows, a float value in others, and a char value in the remainder.

This feature was designed to support what appears to be semistructured data in products sitting above SQL Server. This semistructured data exists in conceptual tables that have a fixed number of columns of known datatypes and one or more optional columns whose type might not be known in advance. An example is e-mail messages in Microsoft Outlook and Microsoft Exchange. With the sql_variant type, you can pivot a conceptual table into a real, more compact table with sets of property-value pairs. Here is a graphical example. The conceptual table shown in Table 6-8 has three rows of data. The fixed columns are the ones that exist in every row. Each row can also have values for one or more of the three different properties, which have different datatypes.

Table 6-8. Conceptual table with an arbitrary number of columns and datatypes.

	fixed-columns	property-1	property-2	property-3
row-1	XXXXXX	value-11		value-13
row-2	YYYYYY	value-22
row-3	ZZZZZZ	value-31	value-32

This can be pivoted into Table 6-9, where the fixed columns are repeated for each different property that appears with those columns. The column called value can be represented by sql_variant data and be of a different datatype for each different property.

Table 6-9. Semi-structured data stored using the SQL_VARIANT datatype.

fixed-columns	property	value
XXXXXX	property-1	value-11
XXXXXX	property-3	value-13
YYYYYY	property-2	value-22
ZZZZZZ	property-1	value-31
ZZZZZZ	property-2	value-32

Internally, columns of type sql_variant are always considered variable-length. Their storage structure depends on the type of data, but the first byte of every sql_variant field always indicates the actual datatype being used in that row.

I'll create a simple table with a sql_variant column and insert a few rows into it so we can observe the structure of the sql_variant storage.

 USE pubs GO CREATE TABLE var (a int, b sql_variant) GO INSERT INTO var VALUES (1, 3) INSERT INTO var VALUES (2, 3000000000) INSERT INTO var VALUES (3, 'abc') INSERT INTO var VALUES (4, current_timestamp)

SQL Server decides what datatype to use in each row based on the data supplied. For example, the 3 in the first INSERT is assumed to be an integer. In the second INSERT, the 3000000000 is larger than the biggest possible integer, so SQL Server assumes a decimal with a precision of 10 and a scale of 0. (It could have used a bigint, but that would need more storage space.) We can now find the first page of the table and use DBCC PAGE to see its contents:

 SELECT first FROM sysindexes WHERE name = 'var' -- (I got a value of 0x8E0000000100 for first, so that can be -- converted to a value of 142 decimal.) GO DBCC TRACEON (3604) GO DBCC PAGE (5, 1, 142, 1)

Figure 6-13 shows the contents of the four rows. I won't go into the details of every single byte because most are the same as what we've already examined.

click to view at full size.

Figure 6-13. Rows containing sql_variant data.

The difference between the three rows starts at bytes 13-14, which indicate the position where the first variable-length column ends. Since there is only one variable-length column, this is also the length of the row. The sql_variant data begins at byte 15. Byte 15 is the code for the datatype. You can find the codes in the xtype column of the systypes system table. I've reproduced the relevant part of that table here:

 xtype name ----- ---------------- 34 image 35 text 36 uniqueidentifier 48 tinyint 52 smallint 56 int 58 smalldatetime 59 real 60 money 61 datetime 62 float 98 sql_variant 99 ntext 104 bit 106 decimal 108 numeric 122 smallmoney 127 bigint 165 varbinary 167 varchar 167 tid 167 id 173 binary 175 char 175 empid 189 timestamp 231 sysname 231 nvarchar 239 nchar

In our table, we have the datatypes 38 hex (which is 56 decimal, which is int), 6C hex (which is 108 decimal, which is numeric), A7 hex (which is 167 decimal, which is varchar) and 3D hex (which is 61 decimal, which is datetime). Following the byte for datatype is a byte representing the version of the sql_variant format, and that is always 1 in SQL Server 2000. Following the version, there can be one of the following four sets of bytes:

For numeric and decimal: 1 byte for the precision and 1 byte for the scale
For strings: 2 bytes for the maximum length and 4 bytes for the collation ID
For binary and varbinary: 2 bytes for the maximum length
For all other types: no extra bytes

These bytes are then followed by the actual data in the sql_variant column.

There are many other issues related to working with sql_variant data and comparing data of different types. I'll cover these when we look at data retrieval and manipulation in Chapter 7.