Index Space Requirements | Inside Microsoft SQL Server 2005: The Storage Engine (Solid Quality Learning)

I've shown you the structure and number of bytes in individual index rows, but you really need to be able to translate that into overall index size. In general, the size of an index is based on the size of the index keys, which determines how many index rows can fit on an index page and the number of rows in the table.

B-Tree Size

When we talk about index size, we usually mean the size of the index tree. The clustered index does include the data, but because you still have the data even if you drop the clustered index, we're usually just interested in how much additional space the nonleaf levels require. A clustered index's node levels typically take up very little space. You have one index row for each page of table data, so the number of index pages at the level above the leaf (level 1) is the bytes available per page divided by the index key row size, and this quotient divided into the number of data pages. You can do a similar computation to determine the pages at level 2. Once you reach a level with only one page, you can stop your computations because that level is the root.

For example, consider a table of 10,000 pages and 500,000 rows with a clustered key of a 5-byte fixed-length character. As you saw in Figure 7-5, the index key row size is 12 bytes, so we can fit 674 rows (8,096 bytes available on a page/12 bytes per row) on an index page. Because we need index rows to point to all 10,000 pages, we'll need 15 index pages (10,000/674) at level 1. Now, all these index pages need pointers at level 2, and because one page can easily store the 15 index rows needed to point to these 15 pages at level 1, level 2 is the root. If our 10,000-page table also has a 5-byte fixed-length character nonclustered index, the leaf-level rows (level 0) will be 11 bytes long because they will each contain a clustered index key (5 bytes), a nonclustered index key (5 bytes), and 1 byte for the status bits. The leaf level will contain every single nonclustered key value along with the corresponding clustered key values. An index page can hold 736 index rows at this leaf level. We need 500,000 index rows, one for each row of data, so we need 680 leaf pages. If this nonclustered index is unique, the index rows at the higher levels will be 12 bytes each, so 674 index rows will fit per page, and we'll need two pages at level 1, with level 2 as the one root page.

So how big are these indexes compared to the table? For the clustered index, we had 16 index pages for a 10,000-page table, which is less than 1 percent of the size of the table. I frequently use 1 percent as a ballpark estimate for the space requirement of a clustered index, even though you can see that in this case it's an overly large estimate. On the other hand, our nonclustered index needed 683 pages for the 10,000-page table, which is about 6 percent additional space. For nonclustered indexes, it is much harder to give a good ballpark figure. My example used a very short key. Nonclustered index keys are frequently built on larger keys, or are even composite, so it's not unusual to have key sizes of over 100 bytes. In that case, we'd need a lot more leaf-level pages, and the total nonclustered index size could be 30 or 40 percent of the size of the table. I've even seen nonclustered indexes that are as big as or bigger than the table itself. Once you have two or three nonclustered indexes, you need to double the space to support these indexes. Remember that SQL Server allows you to have up to 249 nonclustered indexes! Disk space is cheap, but is it that cheap? You still need to plan your indexes carefully. Inside Microsoft SQL Server 2005: T-SQL Querying gives you an introduction to how to choose the most appropriate indexes for your tables based on the queries that you will be executing, and Inside Microsoft SQL Server 2005: Tuning and Optimization will provide even more information.

Actual vs. Estimated Size

The actual space used for tables or indexes, as opposed to the ballpark estimates I just described, can be seen in the sys.allocation_units catalog view. As mentioned earlier, in Chapter 6 I showed you a query I referred to as the "allocation query," which joins sys.partitions with sys.allocation_units to give you the number of pages and the number of rows for each type of page in each partition of each table and index. Another way to get this information, in a slightly different format, is to use the dynamic management view called sys.dm_db_partition_stats. This view returns all the information for each partition in a single row, which includes the following result columns:

partition_id
object_id
index_id
partition_number
in_row_data_page_count
in_row_used_page_count
in_row_reserved_page_count
lob_used_page_count
lob_reserved_page_count
row_overflow_used_page_count
row_overflow_reserved_page_count
used_page_count
reserved_page_count
row_count
Based on all the discussion of structures in Chapter 6 and in this chapter, most of the columns should be self-explanatory. Note that there is a separate value for used pages and reserved pages for each type of page and then a column for total used space and one for total reserved space. (These columns are just the sum of the used and reserved space for the three types of storage.) Remember that after the first eight pages are allocated to a table or index, SQL Server grants all future allocations in units of eight pages, called extents. The total of all the pages in all the allocated extents for a particular type of page is the reserved page count. The total of all the pages that actually contain data is the in_row_used_page_count. It might be that only a few of the eight pages in some of the extents actually have data in them and are counted as used, so the reserved value is frequently higher than the used value.

I'll create a table called hugerows_with_text that contains regular in-row data, row-overflow data, and LOB data.

USE AdventureWork; CREATE TABLE dbo.hugerows_with_text    (a varchar(3000),    b varchar(8000),    c varchar(8000),    d text); INSERT INTO dbo.hugerows_with_text      SELECT REPLICATE('a', 3000), REPLICATE('b', 8000),          REPLICATE('c', 8000),  REPLICATE('d', 8000); We can compare the results of my 'allocation query' with the results of selecting from sys. db_dm_partition_stats: -- Here is my 'allocation query' SELECT object_name(object_id) AS name,     partition_id, partition_number AS pnum, rows,     allocation_unit_id AS au_id, type_desc as page_type_desc,     total_pages AS pages FROM sys.partitions p JOIN sys.allocation_units a      ON p.partition_id = a.container_id WHERE object_id=object_id('dbo.hugerows_with_text '); RESULTS: name               partition_id      pnum rows au_id             page_type_desc     pages ------------------ ----------------- ---- ---- ----------------- ----------------- ----- hugerows_with_text 72057594042449920 1    1    72057594048413696 IN_ROW_DATA       2 hugerows_with_text 72057594042449920 1    1    72057594048479232 ROW_OVERFLOW_DATA 3 hugerows_with_text 72057594042449920 1    1    72057594048544768 LOB_DATA          3 SELECT * FROM sys.dm_db_partition_stats WHERE object_id=object_id('dbo.hugerows_with_text '); RESULTS: (wrapped) partition_id         object_id   index_id    partition_number in_row_data_page_count -------------------- ----------- ----------- ---------------- ---------------------- 72057594042449920    923866358   0           1                1 in_row_used_page_count in_row_reserved_page_count lob_used_page_count  lob_reserved_page_count ---------------------- -------------------------- -------------------- ----------------------- 2                      2                          3                    3 row_overflow_used_page_count row_overflow_reserved_page_count ---------------------------- -------------------------------- 3                            3 used_page_count reserved_page_count row_count --------------- ------------------- --------- 8               8                   1

The first query will return three rows, one for each type of page. The second query will return only one row. Note that for this first row, the used and reserved page count values are the same for all three types of pages. Also note that for the in-row data pages, three values are returned: a page count, a used page count, and a reserved page count. The simple page count is the number of pages at the leaf level of an index, or if the structure is a heap, it's the total pages. The used page count includes all the other pages in the index, including the node pages and IAM pages. If you run the preceding INSERT statement several more times (I suggest at least eight more times), you start to get some bigger values returned from sys.db_dm_partition_stats, and you see that the values for used and reserved will no longer be the same.

Although sys.db_dm_partition_stats doesn't return the index name, it does return the object ID and index ID. For my hugerow_with_text table, that is irrelevant because the table is a heap. However, if you want to look at the space allocated to the Sales.SalesOrderDetail table, you might want to see the index name. Unfortunately, although there is a supplied system function that provides the name of an object given the object ID, there is no function to return an index name. So I have written one. If you create the index_name function in your AdventureWorks database, you can use it when selecting from sys.db_dm_partition_stats for a table in that database.

USE AdventureWorks; CREATE FUNCTION dbo.index_name (@object_id int, @index_id tinyint) RETURNS sysname AS BEGIN   DECLARE @index_name sysname   SELECT @index_name = name FROM sys.indexes      WHERE object_id = @object_id and index_id = @index_id   RETURN(@index_name) END;

Now I will use this function to return just a few selected columns from sys.db_dm_partition_stats.

SELECT convert(char(25),dbo.index_name(object_id, index_id)) AS index_name,      used_page_count, reserved_page_count, row_count FROM sys.dm_db_partition_stats WHERE object_id=object_id('Sales.SalesOrderDetail'); RESULTS: index_name                     used_page_count      reserved_page_count  row_count ------------------------------ -------------------- -------------------- --------- PK_SalesOrderDetail_SalesOrder 1245                 1278                 121317 AK_SalesOrderDetail_rowguid    417                  454                  121317 IX_SalesOrderDetail_ProductID  237                  253                  121317

SQL Server also supplies a stored procedure called sp_spaceused, which, when passed a table name as a parameter, reports the total amount of space used by the table. It reports total reserved space, total used data space, and total index space for the table and all its relational indexes, but it does not give an index by index breakdown. However, sp_spaceused does include the space used by XML indexes and full-text indexes that are dependent on the specified table, so the totals it gives might not agree with what you can compute by totaling all the values returned by sys.db_dm_partition_stats. In earlier versions of SQL Server, the values representing total space used are not always updated every time a table or index is modified. In particular, immediately after you create a table or after certain bulk operations, sp_spaceused does not accurately reflect the total space reserved or used. You can force these earlier versions of SQL Server to update the internal metadata by passing a parameter called @updateusage to sp_spaceused, with the value true, or by running the command DBCC UPDATEUSAGE. SQL Server 2005 Books Online reports that this command is no longer needed because the space usage information is always maintained accurately. The only time you might want to use one of those options is after upgrading a SQL Server 2000 or SQL Server 7.0 database to SQL Server 2005.

It's great to have a way to see how much space is being used by a particular table or index, but what if we're getting ready to build a database and we want to know how much space our data and indexes will need? For planning purposes, it would be nice to know this information ahead of time.

If you've already created your tables and indexes, even if they're empty, SQL Server can get column and data type information from the sys.columns catalog view, and it should be able to see any indexes you've defined in the sys.indexes view. You shouldn't need to type all this information into a spreadsheet. The companion content includes a set of stored procedures you can use to calculate estimated space requirements when the table and indexes have already been built. The main procedure is called sp_EstTableSize, and it requires only two parameters: the table name and the anticipated number of rows. The procedure calculates the storage requirements for the table and all indexes by extracting information from the sys.indexes, sys.columns, and sys.types views. The result is only an estimate when you have variable-length fields. The procedure has no way of knowing whether a variable-length field will be completely filled, half filled, or mostly empty in every row, so it assumes that variable-length columns will be filled to the maximum size. If you know that this won't be the case with your data, you can create a second table that more closely matches the expected data. For example, if you have a varchar(1000) column that will only rarely use the full 1000 bytes, and 99 percent of your rows will use only about 100 bytes, you can create a second table with a varchar(100) column and run sp_EstTableSize on that table.