The Mechanics of Query Processing | Beginning Transact-SQL With SQL Server 2000 and 2005

To drive a car, it's not essential to understand how the engine works. However, if you want to be able to drive a car well (and perhaps maintain and tune it for optimal performance), it's helpful to have a fundamental understanding of the engine mechanics and to know what's going on inside. Likewise, it's possible to use SQL Server without fully understanding its mechanics, but if you want to create queries that work efficiently, it will help to understand what goes on within the relational database engine and the query processor.

When a SQL statement is presented to the database engine, it begins to analyze the request and break it down into steps. Based on characteristics of the data stored in tables, decisions are made resulting in the selection of appropriate operations. Many factors are considered including the table structures, existence of indexes, and the relative uniqueness of relevant data values.

It would be inefficient for the query-processing engine to analyze all of the data prior to each query, so SQL Server gathers statistical information it uses to make these decisions. In essence, SQL Server learns from previous query executions and adapts as the data changes (see Figure 2-15). In theory, queries will continue to be optimized and updated as time goes on.

image from book
Figure 2-15:

Complex queries are broken down into individual steps — smaller queries — that process granular operations. This list of steps and operations is known as an execution plan. The query's syntax may actually be rewritten by the query optimizer into a standard form of SQL. SQL Server doesn't actually execute SQL — that's just how we talk to it. Before SQL Server can send instructions to the computer's processor, these commands must be compiled into low-level computer instructions, or object code. The optimized, compiled query is placed into an in-memory cache. Depending on how the query is created (for example, it may be saved as a view or stored procedure), the execution plan and cache are saved with that object in the database, called procedure cache. Even ad-hoc queries may benefit from this process. The cached compiled query and execution plan is held into memory as buffer cache and reused until the user and client application's connection is closed. This way, if the same query is executed multiple times, it should run faster and more efficiently after the first time. In SQL Server 2005, the same mechanism is used to manage both buffer cache and procedure cache. Here's a closer look at this process, also illustrated in Figure 2-16:

image from book
Figure 2-16:

First the query text is flat-lined and translated into a standardized form of SQL.
Objects and then permissions are resolved, replacing object names with data-specific numeric identifiers and security context. These identifiers streamline conversations between the relation and storage engine.
The query is semantically translated from SQL to Tabular Data Stream (TDS), the native language of the SQL Server net libraries. In this translation, operations are simplified and optimized. More than 300 possible semantic operations exist.
Compiled version of the plan and call are placed into the buffer.
The relational engine spawns threads for calling logical and physical I/O and operational execution. Database object locks are placed and managed by the transactional engine.