The Procedure Cache | Inside Microsoft SQL Server 2000

I've mentioned SQL Server's procedure cache several times. It's important to understand that the procedure cache in SQL Server 2000 is not actually a separate area of memory. Releases prior to SQL Server 7 had two effective configuration values to control the size of the procedure cache: one specifying a fixed size for SQL Server's total usable memory and one specifying a percentage of that memory (after fixed needs were satisfied) to be used exclusively for storing query plans. Also, in previous versions query plans for ad hoc SQL statements were never stored in the cache, only the plans for stored procedures. In SQL Server 2000, the total size of memory is by default dynamic and the space used for query plans is also very fluid.

One of the first things SQL Server determines when processing a query is whether the query is both ad hoc and cheap to compile; if so, SQL Server won't bother caching it at all. (In the next section, we'll look at what "ad hoc" actually means and compare it to the alternative types of queries: autoparameterized, prepared, and stored procedures.) It's not a good idea to flood the cache with things that probably won't be reused very often. It is cheaper to simply recompile if necessary. If the query is not ad hoc or is not cheap to compile, SQL Server will get some memory from the buffer cache. There is only one source of memory for 99 percent of the server's needs, and that is the buffer cache itself. In a few cases, SQL Server will go out and allocate large chunks of memory directly from the operating system, but those cases are extremely rare. For everything else, the management is centralized.

Plans are saved in cache along with a cost factor that reflects the cost of actually creating the plan by compiling the query. If the plan is ad hoc, SQL Server will set its cost to 0. The value 0 means that the plan is immediately available to be kicked out of the procedure cache. You're taking a chance that the plan might be reused, but the probability is low. If there's memory pressure on the system, plans for ad hoc statements should be the first to go. If the plan is not ad hoc, SQL Server sets the cost to the actual cost of compiling the query. These costs are basically in units of disk I/O. When a data page is read off the disk, it has a cost of 1 I/O. When this plan was built, information was read off the disk, including statistics and the text of the query itself. SQL did additional processing and that processing work is normalized to the cost of an I/O. Once the cost is computed, the plan is put into the cache. Figure 15-9 shows the flow of costing a plan and placing it in cache.

If another query comes along that can reuse that plan, SQL Server again determines what type of plan it is. If it's an ad hoc plan, SQL Server increments the cost by 1. So if ad hoc plans are really being reused, they'll stay in the cache for a little while longer as their cost factor increases. If the plan is reused frequently, the cost will keep getting bumped up by 1 until it reaches its actual creation cost—and that's as high as it will ever be set. But if it's reused a lot—if the same user or other users keep resubmitting the exact same SQL text—the plan will remain in cache.

click to view at full size.

Figure 15-9. Costing a query plan and placing it in cache.

If the query is not ad hoc, which means that it's a stored procedure, a prepared query, or an autoparameterized query, the cost is set back up to the creation cost every time the plan is reused. As long as the plan is reused, it will stay in cache. Even if it's not used for a while, depending on how expensive it was to create initially, it can remain in cache for a fair amount of time. Figure 15-10 shows the flow of retrieving a plan from cache, and adjusting the cost.

click to view at full size.

Figure 15-10. Retrieving a plan from cache.

The lazywriter is the mechanism that actually manages the cost of plans and is responsible for removing plans from cache when necessary. The lazywriter is actually part of the storage engine, but since it's so important to the query processing mechanism that it needs to be mentioned here. The lazywriter uses the same mechanism for managing memory used by query plans that it does for managing data and index pages, because plans are stored in the normal buffer cache. The lazywriter looks through all the buffer headers in the system. If there's very little memory pressure on the system, it looks very slowly; if memory pressure grows and the hit rates on the cached objects start going up, the lazywriter starts running more frequently. As the lazywriter runs, it looks at each buffer header and then at the current cost for the page that the buffer is holding. If the cost is 0, which means it hasn't been touched since the lazywriter's last pass through the cache, the lazywriter will release that page to get some free memory in the system to use for page I/O or for other plans. In addition, if the buffer contains a procedure plan, the lazywriter will call the SQL manager, which does some cleanup. Finally, the buffer is put on the free list for reuse.

If the cost associated with a buffer is greater than 0, the lazywriter decrements the cost and continues inspecting other buffers. The cost then actually indicates for how many cycles of the lazywriter something will stay in the cache without reuse before it's removed. This algorithm doesn't differentiate between query plans in cache and data or index pages in cache, with the exception of the step that calls the SQL manager if the object is a stored procedure. The lazywriter doesn't really know anything about an object being a query plan, and the algorithm balances the use of a cache for disk I/O versus the use of cache for a procedure plan. Figure 15-11 shows the flow of the lazywriter's processing of the cache.

click to view at full size.

Figure 15-11. The flow of the lazywriter's processing of plans in cache.

You'll find that if you have something that is expensive to compile, even if it is not touched for a long time, it will stay in cache because its initial cost is so high. Anything that is reused frequently will sit in cache for a long time because its cost is also reset whenever it's used, and the lazywriter will never see it go to 0.