Index Space Requirements

I've shown you the structure and number of bytes in individual index rows, but you really need to be able to translate that into overall index size. In general, the size of an index is based on the size of the index keys, which determines how many index rows can fit on an index page and the number of rows in the table.

B-Tree Size

When we talk about index size, we usually mean the size of the index tree. The clustered index does include the data, but since you still have the data even if you drop the clustered index, we're usually just interested in how much additional space the nonleaf levels require. A clustered index's node levels typically take up very little space. You have one index row for each page of table data, so the number of index pages at the level above the leaf (level 1) is the bytes available per page divided by the index key row size, and this quotient divided into the number of data pages. You can do a similar computation to determine the pages at level 2. Once you reach a level with only one page, you can stop your computations because that level is the root.

For example, consider a table of 10,000 pages and 500,000 rows with a clustered key of a 5-byte fixed-length character. As you saw in Figure 8-5, the index key row size is 12 bytes, so we can fit 674 (8096 bytes available on a page / 12 bytes per row) rows on an index page. Since we need index rows to point to all 10,000 pages, we'll need 15 index pages (10000 / 674) at level 1. Now, all these index pages need pointers at level 2, and since one page can contain 15 index rows to point to these 15 pages, level 2 is the root. If our 10,000-page table also has a 5-byte fixed-length character nonclustered index, the leaf-level rows (level 0) will be 11 bytes long as they will each contain a clustered index key (5 bytes), a nonclustered index key (5 bytes) and 1 byte for the status bits. The leaf level will contain every single nonclustered key value along with the corresponding clustered key values. An index page can hold 736 index rows at this leaf level. We need 500,000 index rows, one for each row of data, so we need 680 leaf pages. If this nonclustered index is unique, the index rows at the higher levels will be 12 bytes each, so 674 index rows will fit per page, and we'll need two pages at level 1, with level 2 as the one root page.

So how big are these indexes compared to the table? For the clustered index, we had 16 index pages for a 10,000-page table, which is less than 1 percent of the size of the table. I frequently use 1 percent as a ballpark estimate for the space requirement of a clustered index, even though you can see that in this case it's an overly large estimate. On the other hand, our nonclustered index needed 683 pages for the 10,000-page table, which is about 6 percent additional space. For nonclustered indexes, it is much harder to give a good ballpark figure. My example used a very short key. Nonclustered index keys are frequently much larger, or even composite, so it's not unusual to have key sizes of over 100n bytes. In that case, we'd need a lot more leaf level pages, and the total nonclustered index size could be 30 or 40 percent of the size of the table. I've even seen nonclustered indexes that are as big or bigger than the table itself. Once you have two or three nonclustered indexes, you need to double the space to support these indexes. Remember that SQL Server allows you to have up to 249 nonclustered indexes! Disk space is cheap, but is it that cheap? You still need to plan your indexes carefully.

Actual vs. Estimated Size

The actual space used for tables or indexes, as opposed to the ballpark estimates I just described, is stored in the sysindexes table. A column called rowcnt contains the number of data rows in the table. A column called dpages contains, for a clustered index or a heap, the number of data pages used; for all other indexes, dpages contains the number of index pages used. For LOB data (text, ntext, or image), with an indid value of 255, dpages is 0. Sysindexes also has a column called reserved, which, for a clustered index or a heap contains the number of data pages reserved; for a nonclustered index, it contains the number of pages reserved for the index itself; and for LOB data, reserved is a count of the pages allocated. I'll tell you more about what "reserved" means later in this section.

Finally, there is a column in sysindexes called used. For clustered indexes or heaps, it contains the total number of pages used for all index and table data. For LOB data, used is the number of LOB pages. For nonclustered indexes, used is the count of index pages. Because no specific value is just the number of nonleaf, or nondata, pages in a clustered index, you must do additional computations to determine the total number of index pages for a table, not including the data.

The stored procedure sp_spaceused examines these values and reports the total size used by a table. Keep in mind that these values in sysindexes are not updated every time a table or index is modified. In particular, immediately after you create a table and after some bulk operations, sp_spaceused might not accurately reflect the total space because the values in sysindexes are not accurate.

You can force SQL Server to update the sysindexes data in two ways. The simplest way to force the space used values to be updated is to ask for this when you execute the sp_spaceused procedure. The procedure takes two optional parameters: the first is called @objname and is a table name in the current database, and the second is called @updateusage, which can have a value of TRUE or FALSE. If you specify TRUE, the data in sysindexes will be updated. The default is FALSE.

Here's an example of forcing SQL Server to update the space used data for a table called charge:

 exec sp_spaceused charge, true 

Here's the result:

 name rows reserved data index_size unused -------- --------- ------------ ----------- ------------ --------- charge 100000 10328 KB 5384 KB 4696 KB 248 KB 

Remember that after the first 8 pages are allocated to a table, SQL Server grants all future allocations in units of 8 pages, called extents. The sum of all the allocations is the value of reserved. It might be that only a few of the 8 pages in some of the extents actually have data in them and are counted as used, so that the reserved value is frequently higher than the used value. The data value is computed by looking at the data column in sysindexes for the clustered index or heap and adding any LOB pages. The index_size value is computed by taking the used value from sysindexes for the clustered index or heap, which includes all index pages for all indexes. Because this value includes the number of data pages, which are at the leaf level of the clustered index, we must subtract the number of data pages from used pages to get the total number of index pages. The procedure sp_spaceused does all of its space reporting in KB, which we can divide by 8 to get the number of pages. The unused value is then the leftover amount after we subtract the data value and the index_size value from the reserved value.

You can also use the DBCC UPDATEUSAGE command to update the space information for every table and index in sysindexes. Or, by supplying additional arguments, you can have the values updated for only a single table, view, or index. In addition, DBCC UPDATEUSAGE gives you a report of the changes it made. You might want to limit the command to only one table because the command works by actually counting all the rows and all the pages in each table it's interested in. For a huge database, this can take a lot of time. In fact, sp_spaceused with the @updateusage argument set to TRUE actually calls DBCC UPDATEUSAGE and suppresses the report of changes made. The following command updates the space used information for every table in the current database:

 DBCC UPDATEUSAGE (0) 

Here's some of the output I received when I ran this in the Northwind database:

 DBCC UPDATEUSAGE: sysindexes row updated for table 'syscolumns' (index ID 2): USED pages: Changed from (2) to (5) pages. RSVD pages: Changed from (2) to (5) pages. DBCC UPDATEUSAGE: sysindexes row updated for table 'Orders' (index ID 2): USED pages: Changed from (2) to (5) pages. RSVD pages: Changed from (2) to (5) pages. DBCC UPDATEUSAGE: sysindexes row updated for table 'Orders' (index ID 3): USED pages: Changed from (2) to (5) pages. RSVD pages: Changed from (2) to (5) pages.  DBCC UPDATEUSAGE: sysindexes row updated for table 'Order Details' (index ID 3): USED pages: Changed from (2) to (6) pages. RSVD pages: Changed from (2) to (6) pages. 

Note that you can use this command to update both the used and reserved values in sysindexes.

It's great to have a way to see how much space is being used by a particular table or index, but what if we're getting ready to build a database and we want to know how much space our data and indexes will need? For planning purposes, it would be nice to know this information ahead of time.

If you've already created your tables and indexes, even if they're empty, SQL Server can get column and datatype information from the syscolumns system table, and it should be able to see any indexes you've defined in the sysindexes table. There shouldn't be a need for you to type all this information into a spreadsheet. The CD also includes a set of stored procedures you can use to calculate estimated space requirements when the table and indexes have already been built. The main procedure is called sp_EstTableSize, and it requires only two parameters: the table name and the anticipated number of rows. The procedure calculates the storage requirements for the table and all indexes by extracting information from the sysindexes, syscolumns, and systypes tables. The result is only an estimate when you have variable-length fields. The procedure has no way of knowing whether a variable-length field will be completely filled, half full, or mostly empty in every row, so it assumes that variable-length columns will be filled to the maximum size. If you know that this won't be the case with your data, you can create a second table that more closely matches the expected data. For example, if you have a varchar(1000) field that you must set to the maximum because a few rare entries might need it but 99 percent of your rows will use only about 100 bytes, you can create a second table with a varchar(100) field and run sp_EstTableSize on that table.

NOTE


The procedures for calculating estimated storage space were developed primarily by SQL Server MVP BP Margolin and are included on the CD with his permission.



Inside Microsoft SQL Server 2000
Inside Microsoft SQL Server 2000
ISBN: 0735609985
EAN: 2147483647
Year: 2005
Pages: 179
Authors: Kalen Delaney

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net