SQLOS


Put simply, SQLOS is a user-mode component of SQL Server that functions as a miniature operating system. SQLOS is designed to abstract the complexities of the underlying hardware while at the same time providing a rich programming environment that enables the developers of SQL Server to take full advantage of hardware features and functionality. SQLOS provides operating system-like services such as non-preemptive scheduling, memory management, deadlock detection, and exception handling.

SQLOS Architecture

SQLOS is made up of three major components: nodes, schedulers, and tasks. Nodes represent the resources being managed, such as CPU or memory resources. Schedulers manage the queue of work items for a given resource. Tasks represent the actual work items for that resource, such as executing a batch or performing an I/O operation. There is a strict hierarchical relationship among these components, as shown in Figure B-2.

Figure B-2: The hierarchical relationship between SQLOS nodes, schedulers, and tasks

It is important to understand how the underlying architecture of SQLOS is represented on various types of hardware. In SMP systems, for example, all CPU nodes share only one memory node. In NUMA systems, multiple memory nodes each sponsor multiple CPU nodes. (For example, a 16-core system most likely contains four memory nodes, each "sponsoring" four CPU nodes.) When SQL Server first starts, it examines the underlying hardware configuration and dynamically constructs the SQLOS structure. This information is reported in the SQL Server Log, which can be viewed from the Management node in SQL Server Management Studio as follows:

 SQL Server is starting at normal priority base (=7). This is an informational message only. No user action is required.
 Detected 4 CPUs. This is an informational message; no user action is required.
 Using locked pages for buffer pool.
 Using dynamic lock allocation. Initial allocation of 2500 Lock blocks and 5000 Lock Owner blocks per node. This is an informational message only. No user action is required.
 Multinode configuration: node 0: CPU mask: 0x000000000000000c Active CPU mask: 0x000000000000000c. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.
 Multinode configuration: node 1: CPU mask: 0x0000000000000003 Active CPU mask: 0x0000000000000003. This message provides a description of the NUMA configuration for this computer. This is an informational message only. No user action is required.

This log demonstrates the default SQLOS configuration for a four-processor machine with 8 GB of RAM, showing two memory nodes, each hosting two CPU nodes. What this means, from a practical standpoint, is that even though 8 GB of RAM is present in the system, 4 GB of RAM is associated with one set of processors, and 4 GB of RAM is associated with another set.

In SQL Server 2005, all user access to data occurs through memory buffers known collectively as the buffer cache. This is especially important to note on systems that support NUMA, because the total memory in the system is divided equally among NUMA memory nodes, and the majority of that memory is allocated to the buffer cache. Because Windows does not provide an Application Programming Interface (API) that enables SQLOS to allocate physical memory directly to NUMA nodes, SQLOS must take a dynamic approach to allocating memory. This means that the behavior of SQL Server 2005 on a system that has just started will sometimes differ from its behavior on a system that has been running for a while. From a practical standpoint, developers must be aware of how memory is divided among NUMA nodes and how to partition their application's access to the system so that no single node is overly taxed.

Due to the nature of the SQLOS hierarchy, the memory node is probably the most important node for developers to understand, because it controls the system's memory resources. The following query, which uses the sys.dm_os_memory_clerks Dynamic Management View (DMV), shows how many memory nodes SQLOS has configured, as well as the amount of memory assigned and allocated to each.

 SELECT
    memory_node_id AS [Memory Node],
    SUM(single_pages_kb) AS [Single Pages KB],
    SUM(multi_pages_kb) AS [Multi Pages KB],
    SUM(virtual_memory_reserved_kb) AS [Reserved Virtual Memory KB],
    SUM(virtual_memory_committed_kb) AS [Virtual Memory Committed KB],
    SUM(awe_allocated_kb) AS [AWE Memory Allocated KB]
 FROM sys.dm_os_memory_clerks
 GROUP BY memory_node_id;

Executing this query returns a result set that details the amount and type of memory allocated to each memory node. Figure B-3 shows the result of this query on a four-processor machine with 8 GB of total RAM.

Figure B-3: Displaying the SQLOS memory node configuration and allocations

If the query returns a single row, it indicates that your system is not configured to use NUMA. In some cases, hardware vendors have implemented NUMA architectures that are not detectable by SQLOS. This can lead to performance problems very quickly. To alleviate this problem, SQL Server 2005 has the capability to manually configure NUMA-like support. This is known as "soft-NUMA."

Configuring SQL Server Soft-NUMA Support

The primary instance in which you should consider utilizing soft-NUMA is when you suspect that there is an I/O-related performance problem when running applications but you also believe that your I/O subsystem is properly configured. This is a common problem among high-volume systems that are configured to utilize Storage Area Networks (SANs). Each NUMA node has its own I/O thread that is responsible for reading and writing data on the disk and another thread, known as a lazy writer, that is responsible for managing the state of the SQL Server buffer cache. For systems in which NUMA is not automatically detected by SQLOS, configuring soft-NUMA can significantly affect I/O performance. Another reason to consider soft-NUMA is for multiple-instance support. Each instance of SQL Server on a single machine shares NUMA nodes by default, so when dealing with multiple instances and more than 8 GB on a machine, using soft-NUMA can lead to performance gains.

Configuring soft-NUMA support in SQL Server 2005 is a two-step process. The first step is setting the processor affinity mask, a bitmap that binds an instance of SQL Server to a specific set of processors. The second step is adding registry values that map soft-NUMA nodes onto specific processors.

Configuring soft-NUMA support also requires an understanding of binary and hexadecimal math. For example, if you have a system with eight processors and two instances of SQL Server 2005 running, and you want to assign four processors to each instance, you would set the processor affinity of the first instance to 15 decimal (00001111 binary), and the processor affinity of the second instance to 240 decimal (11110000 binary). Processor affinity on the first instance can be set with the sp_configure stored procedure as follows:

 EXEC sp_configure 'show advanced options', 1;
 RECONFIGURE;
 GO
 EXEC sp_configure 'affinity mask', 15;
 RECONFIGURE;

Processor affinity on the second instance can be set as follows:

 EXEC sp_configure 'show advanced options', 1;
 RECONFIGURE;
 GO
 EXEC sp_configure 'affinity mask', 240;
 RECONFIGURE;
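The mask arithmetic behind these values can be sketched as follows (Python is used here purely for illustration; `affinity_mask` is not a SQL Server function, just a helper showing how one bit per CPU produces the decimal values passed to sp_configure):

```python
def affinity_mask(cpus):
    """Build an affinity mask by setting one bit per CPU the instance may use."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # bit n corresponds to CPU n
    return mask

# First instance: CPUs 0-3 -> 00001111 binary = 15 decimal
print(affinity_mask(range(0, 4)))
# Second instance: CPUs 4-7 -> 11110000 binary = 240 decimal
print(affinity_mask(range(4, 8)))
```

The same arithmetic scales to any processor count; only the bit positions change.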

Important 

Configuring NUMA support on more than 32 processors

The sp_configure option shown above is specific to systems with 32 or fewer processors. If you are working with a system that has more than 32 processors and you want to configure processor affinity, you must use both the affinity mask option shown above and the affinity64 mask option. For more information, see the SQL Server 2005 Books Online article "affinity64 mask Option" at http://msdn2.microsoft.com/en-us/library/ms188291.aspx.

Instead of computing the mask values by hand, you can also use SQL Server 2005 Management Studio to configure processor affinity, as shown in Figure B-4.

Figure B-4: Configuring the processor affinity using the Microsoft SQL Server 2005 Management Studio Server Properties dialog box

Once the processor affinity is set, soft-NUMA can be configured as desired through the registry. Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\90\NodeConfiguration, create a key for each soft-NUMA node (the first node is Node0, the second is Node1, and so on), and in each key create a DWORD value named CPUMask set to the hexadecimal representation of the binary affinity mask for that node. For example, to assign two NUMA memory nodes, each with two processors, to the first instance from the above scenario, you would create keys for Node0 and Node1 and set their CPUMask values to 0x03 (00000011 binary) and 0x0C (00001100 binary), respectively. To assign a single NUMA memory node (with all four processors) to the second instance, create a key for Node2 and set its CPUMask value to 0xF0 (11110000 binary). Soft-NUMA can also be used on systems that support hardware NUMA to control the processor-to-memory affinity.
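The per-node CPUMask values are derived the same way as the instance-level affinity mask, just restricted to the CPUs assigned to each node. A small illustrative sketch (Python for demonstration only; `cpu_mask` is a hypothetical helper, not part of any SQL Server tooling):

```python
def cpu_mask(cpus):
    """Return the hexadecimal CPUMask registry value for a soft-NUMA node."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # one bit per CPU, bit n = CPU n
    return hex(mask)

# First instance (CPUs 0-3) split into two soft-NUMA nodes:
print(cpu_mask([0, 1]))        # Node0
print(cpu_mask([2, 3]))        # Node1
# Second instance (CPUs 4-7) as a single soft-NUMA node:
print(cpu_mask([4, 5, 6, 7]))  # Node2
```

Note that the union of a given instance's node masks must equal that instance's processor affinity mask, or CPUs will be left unscheduled.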

Important 

Configure processor affinity

SQL Server 2005 provides the ability to dynamically configure processor affinity, but soft-NUMA support requires a restart of SQL Server to take effect.

SQLOS provides a granular level of control over NUMA configurations, giving experienced database developers good control over how their applications perform on high-end hardware. Unfortunately, processor affinity and soft-NUMA are not the only places where SQLOS needs to step in on high-end systems. In the SQLOS hierarchy, schedulers play a very important role in deciding what tasks are executed and when.

SQL Server Task Scheduling

When an application connects to SQL Server, it is assigned a server process identifier (SPID). All information that must be maintained for the life of the connection is managed in internal data structures associated with the SPID. As an instance of SQL Server receives batches from clients, it breaks the batch into one or more tasks, and then associates each task with an available worker thread from a pool of worker threads. A worker thread is bound to the task for the lifetime of the task. A worker thread runs the request on the associated SQL scheduler. If there are no free worker threads and the configuration value "max worker threads" has not been reached, the instance of SQL Server allocates a new worker thread for the new batch. If there are no free threads or fibers available and the max worker threads value has already been reached, the instance of SQL Server blocks the new task until a worker thread is freed.
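The dispatch rules described above can be sketched as a toy model (Python for illustration only; the class and attribute names are invented for this sketch and do not correspond to actual SQLOS internals):

```python
class WorkerPool:
    """Toy model of worker-thread dispatch: reuse a free worker, allocate a
    new one up to 'max worker threads', otherwise block the task."""
    def __init__(self, max_worker_threads):
        self.max_worker_threads = max_worker_threads
        self.free_workers = []    # idle workers available for new tasks
        self.total_workers = 0
        self.blocked_tasks = []   # tasks waiting for a worker to free up

    def dispatch(self, task):
        if self.free_workers:                          # reuse an idle worker
            return self.free_workers.pop()
        if self.total_workers < self.max_worker_threads:
            self.total_workers += 1                    # allocate a new worker
            return f"worker{self.total_workers}"
        self.blocked_tasks.append(task)                # pool exhausted: block
        return None

    def release(self, worker):
        """Worker freed when its task's last result set has been returned."""
        self.free_workers.append(worker)

pool = WorkerPool(max_worker_threads=2)
print(pool.dispatch("batch1"))  # a new worker is allocated
print(pool.dispatch("batch2"))  # a second worker is allocated
print(pool.dispatch("batch3"))  # pool exhausted; the task blocks
```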

After a worker is associated with a task, it remains associated with the task until the task is completed; for example, until the last of the result sets generated by the batch has been returned to the client. At that time, the worker thread is freed and can be paired with the tasks associated with the next batch.

The database engine must actively perform work for a connection only from the time a batch is received until the results have been returned to the client. During this period, there might be times when the batch does not require active processing. For example, there might be times during which the database engine must wait for a read operation to retrieve the data required for the current query, or must wait for another batch to release a lock. The task-to-worker association is maintained even when the task is blocked on some resource.

Whenever the database engine starts to process a task associated with a batch, it schedules the worker thread associated with the task to perform the work. After the worker thread has completed the work for the task, an instance of SQL Server dispatches the worker thread to the next task that is ready to work. A SPID remains constant for a connection for the life of the connection. Long-running connections might have their individual batch tasks executed by many different worker threads. For example, the tasks from the first batch might be executed by worker1, but the tasks from the second batch could be executed by worker2. Some statements can be processed in parallel. In this case, a batch might have multiple tasks that are executed by multiple worker threads at the same time.

SQLOS Schedulers

Due to the nature of SQLOS, only one task can execute on a given CPU at any given time, so it is the scheduler's job to determine which task runs. Most tasks that SQL Server executes begin with a user request, although several system tasks run as well. User requests generally begin with a client system making a request of the server. The catch is that all user requests are generally funneled through a single TCP/IP port (the IP Sockets protocol uses port 1433 by default), which, by default, assigns connections to each NUMA node's schedulers in a round-robin fashion. The theory behind this approach is that the workload will be spread evenly among all schedulers. However, for systems that support multiple applications, database developers might want tighter control over which NUMA nodes their applications use.
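The default round-robin placement can be illustrated with a short sketch (Python for demonstration only; node and connection names are invented):

```python
from itertools import cycle

# Round-robin assignment of incoming connections across NUMA node schedulers,
# modeling the default behavior described above.
nodes = ["node0", "node1"]
assign = cycle(nodes)

connections = ["app_a", "app_b", "app_c", "app_d"]
placement = {conn: next(assign) for conn in connections}
print(placement)
```

Round-robin evens out the count of connections per node, but not their actual workload, which is exactly why heavily mixed workloads may benefit from the explicit port-to-node binding described later in this section.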

Important 

TCP/IP port affinity

When considering whether to configure TCP/IP port affinity, database developers must have a thorough understanding of how their application consumes system resources. Normally, the default configuration is more than sufficient for most applications.

Configuring TCP/IP Port Affinity

One of the new features in SQL Server 2005 is the ability to create TCP/IP endpoints. Usually, these endpoints are used to host native Web services; however, they can also be used to configure additional Tabular Data Stream (TDS) endpoints for "normal" client communication. For very high-activity systems that support multiple applications, it might be desirable to create one TCP/IP port on the server for each NUMA memory node and bind applications to each port in a logical fashion, such as grouping applications that have similar data-access patterns. Creating an additional TCP/IP endpoint for SQL Server involves the following steps:

  1. Create an endpoint for a specific TCP/IP port.

  2. Grant permissions to the new endpoint.

  3. Configure SQL Server to listen on the new port.

To create the endpoint, execute the CREATE ENDPOINT command as follows. (In this example, the endpoint is named "CustomConnection" and is configured to listen to TCP port 1500 on any IP address in the system.)

 CREATE ENDPOINT [CustomConnection]
 STATE = STARTED
 AS TCP (LISTENER_PORT = 1500, LISTENER_IP = ALL)
 FOR TSQL();

Important 

Re-enable access to the default port

Whenever a new endpoint is created for T-SQL, SQL Server assumes that the default endpoint (TCP port 1433) is no longer being used. If you still want to be able to use the default port, you must re-enable access by granting the public role access to it with the command GRANT CONNECT ON ENDPOINT::[TSQL Default TCP] TO [public].

Once the endpoint is created, proper permissions must be assigned to the users who will connect by using the GRANT command as follows:

 GRANT CONNECT ON ENDPOINT::[CustomConnection] TO [corp\SQLSupport];

The final step in configuring additional TCP/IP ports is to add the newly configured endpoint to the listener by performing the following steps:

  1. Start SQL Server Configuration Manager.

  2. Expand the SQL Server 2005 Network Configuration node.

  3. Expand the Protocols node.

  4. In the right pane, select the TCP/IP protocol, right-click, and select Properties.

  5. On the IP Addresses tab, under IPAll, enter the new port number, as shown in Figure B-5.

Figure B-5: Configuring SQL Server 2005 to listen on an additional TCP port

Once the new port (or ports) is created, it can be bound to specific NUMA memory nodes by appending the node mask to the port number in the TCP/IP configuration. For example, on a system with three NUMA nodes and two TCP/IP ports, where you want to bind the first two nodes to the default port and the third node to the port just created, the default port would use a node mask of 0x3 (00000011) and the second port would use a node mask of 0x4 (00000100), as shown in Figure B-6.
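The node masks follow the same one-bit-per-node pattern as the CPU masks earlier. A sketch of the arithmetic (Python for illustration; `port_binding` and the bracketed `port[mask]` string it produces are written to match the configuration style described above, and the exact syntax should be verified against the product documentation):

```python
def port_binding(port, node_ids):
    """Format a port-to-NUMA-node binding as a 'port[mask]' string."""
    mask = 0
    for node in node_ids:
        mask |= 1 << node  # bit n corresponds to NUMA node n
    return f"{port}[{hex(mask)}]"

# Default port serves nodes 0 and 1; the new port serves node 2:
print(port_binding(1433, [0, 1]))
print(port_binding(1500, [2]))
```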

Figure B-6: Configuring SQL Server 2005 to utilize IP Port affinity

Getting each SQLOS node and scheduler to the optimal configuration is a challenge for even the most experienced database developer and is very different for each system, depending on usage characteristics. Fortunately, SQL Server 2005 and SQLOS provide a rich management structure that enables developers and administrators to understand exactly how the system is behaving at any given point in time.

Using SQLOS to Determine Application Performance Characteristics

One challenge that database developers face when developing applications is understanding how the application performs while in production. To accurately understand the performance characteristics of an application, the developer needs to know the specific cause of any wait states that occur during the execution of the application. At the most basic level, waits can be broken down into two categories: signal waits and resource waits. In SQL Server, a signal wait occurs when a task is waiting for a processor so that it can be scheduled, and a resource wait occurs when a task has a processor available but is waiting on a disk or memory resource. When examining the overall waits for an application, a high ratio of signal to resource waits (that is, many more signal waits than resource waits) might signify an inefficient use of processor resources, whereas a high ratio of resource to signal waits can indicate an inefficient use of disk resources. To begin the process of evaluating application performance, developers need to understand which of these wait types is most prevalent. Fortunately, SQLOS provides a window into the system that is granular enough for developers to answer that question.

Important 

SQLOS vs. SQL Server 2000 UMS

With the User Mode Scheduler (UMS) in SQL Server 7.0 and SQL Server 2000, developers could never tell whether application performance problems were the result of signal waits or resource waits. Many times a developer would use a tool such as Windows System Monitor, notice that the processor was very busy, and assume that the machine needed a faster processor. In reality, because of how UMS accomplishes thread manipulation, the problem was often caused by a slow disk subsystem, and upgrading to a faster processor simply exacerbated it. With SQLOS, this problem no longer exists, but developers must know how to use the new functionality in SQLOS to distinguish signal waits from resource waits.

The following query utilizes the sys.dm_os_wait_stats DMV to query SQLOS and report on resource versus signal waits:

 SELECT
    signal_wait_time_ms = SUM(signal_wait_time_ms),
    '%signal waits' = CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(20,2)),
    resource_wait_time_ms = SUM(wait_time_ms - signal_wait_time_ms),
    '%resource waits' = CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM(wait_time_ms) AS NUMERIC(20,2))
 FROM sys.dm_os_wait_stats;

Executing this query on a SQL Server 2005 machine returns a result set detailing what percentage of overall waits are signal waits versus resource waits, as shown in Figure B-7. Generally, systems that exhibit a low signal-to-resource wait ratio exhibit the best overall performance.

Figure B-7: Using SQL Server 2005 Dynamic Management Views to query SQLOS wait types
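The percentage calculation performed by the query can be illustrated with made-up numbers (Python for demonstration only; the wait-time figures below are invented, not taken from a real system):

```python
# Per-wait-type totals: (wait_time_ms, signal_wait_time_ms).
# Resource wait time is, by definition, wait_time_ms - signal_wait_time_ms.
waits = [
    (120_000, 12_000),
    (80_000, 4_000),
]

total_wait = sum(w for w, _ in waits)
total_signal = sum(s for _, s in waits)
total_resource = total_wait - total_signal

pct_signal = round(100.0 * total_signal / total_wait, 2)
pct_resource = round(100.0 * total_resource / total_wait, 2)
print(pct_signal, pct_resource)  # 8.0 92.0 - a resource-dominated workload
```

With these illustrative figures, resource waits dominate, which would point the investigation toward the disk and memory subsystems rather than the processors.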

When performing "normal" performance-monitoring tasks (for a discussion of normal performance monitoring, see Chapter 8, "Improving Database Application Performance," Lesson 2, "Resolving Performance Problems"), database developers can isolate a specific SPID that is causing performance issues, such as excessive blocking or excessive memory consumption. Using SQLOS, developers can dig deeper into the process to determine the exact statement that is executing, as well as the resource utilization of the process. The following query utilizes the sys.dm_exec_requests DMV to query all non-system processes and report on specific resource utilization. (To isolate a specific process, replace the "> 50" in the WHERE clause with an equality test on the session ID you wish to view.)

 SELECT
    r.session_id,
    status,
    SUBSTRING(qt.text, r.statement_start_offset/2,
        (CASE WHEN r.statement_end_offset = -1
              THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
              ELSE r.statement_end_offset
         END - r.statement_start_offset)/2) AS query_text,
    qt.dbid,
    qt.objectid,
    r.cpu_time,
    r.total_elapsed_time,
    r.reads,
    r.writes,
    r.logical_reads,
    r.scheduler_id
 FROM sys.dm_exec_requests r
 CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS qt
 WHERE r.session_id > 50
 ORDER BY r.scheduler_id, r.status, r.session_id;

Executing this query on SQL Server 2005 will provide detail on all system activity and resource consumption, as shown in Figure B-8.

Figure B-8: Using SQLOS to display all currently executing statements and their resource utilization

Using the above query, database developers can isolate specific statements that are causing performance issues, as well as gain insight into how the various statements are distributed among SQLOS memory nodes and schedulers.

Once database developers learn how SQL Server is processing their queries and utilizing resources, they might want to create a system configuration strategy for deploying the application. Part of that strategy is understanding how evenly the load on the system is distributed. SQLOS can report on the distribution of the current system workload and the load factor each scheduler is experiencing via the sys.dm_os_schedulers DMV. The following code queries SQLOS to determine the distribution of current tasks and their associated workload:

 SELECT
    scheduler_id,
    cpu_id,
    current_tasks_count,
    runnable_tasks_count,
    pending_disk_io_count,
    current_workers_count,
    active_workers_count,
    work_queue_count,
    load_factor
 FROM sys.dm_os_schedulers
 WHERE scheduler_id < 255;

The output of this query is shown in Figure B-9.

Figure B-9: Querying SQLOS to determine the current system workload distribution

One very interesting column in the result set is load_factor, which indicates the perceived load on a scheduler. This value is used to determine whether a new task should be placed on this scheduler or another one, and it is useful for debugging when schedulers appear not to be evenly loaded. In SQL Server 2000, tasks were routed to schedulers in a round-robin fashion; in SQL Server 2005, the routing decision is based on the load on each scheduler. SQL Server 2005 also uses the load factor of nodes and schedulers to help determine the best location from which to acquire resources. When a task is enqueued, the load factor increases; when a task completes, the load factor decreases. Using load factors in this way helps SQLOS evenly distribute the system workload.
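The enqueue/complete bookkeeping can be sketched as a least-loaded router (Python for illustration only; this is a simplification of the behavior described above, not actual SQLOS code):

```python
class Scheduler:
    """Minimal model of a scheduler with a perceived load."""
    def __init__(self, scheduler_id):
        self.scheduler_id = scheduler_id
        self.load_factor = 0

def route(schedulers):
    """Place a new task on the scheduler with the lowest load factor."""
    target = min(schedulers, key=lambda s: s.load_factor)
    target.load_factor += 1  # enqueue: load factor increases
    return target

schedulers = [Scheduler(i) for i in range(2)]
picks = [route(schedulers).scheduler_id for _ in range(4)]
print(picks)  # tasks alternate between the two schedulers, keeping load even
```

Completion would decrement `load_factor` symmetrically, so long-running tasks naturally steer new work toward the less busy schedulers.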

On multiprocessor systems, performance problems can arise from the way the query optimizer chooses to parallelize a particular query plan. When the optimizer first examines a query, it decides whether the query should run serially or whether portions of it can run in parallel. The number of query portions that the optimizer chooses to run in parallel is known as the degree of parallelism. SQL Server 2005 automatically detects the best degree of parallelism for each query. It does this based on the following criteria, which are evaluated in the following order:

  1. Whether SQL Server is running on a computer that has more than one processor.

  2. Whether sufficient threads are available. Each query or index operation requires a certain number of threads to execute. Executing a parallel plan requires more threads than a serial plan, and the number of required threads increases with the degree of parallelism.

  3. The type of query or index operation executed. Index operations that create or rebuild an index, or drop a clustered index, and queries that use processor cycles heavily are the best candidates for a parallel plan. For example, joins of large tables, large aggregations, and sorting large result sets are good candidates. For simple queries, which are frequently found in transaction-processing applications, the additional coordination required to execute a query in parallel outweighs the potential performance boost. To distinguish between queries that benefit from parallelism and those that do not benefit, SQLOS compares the estimated cost of executing the query or index operation with the cost threshold for parallelism value.

  4. Whether there are a sufficient number of rows to process. If the query optimizer determines that the number of rows is too low, it does not introduce parallel operations.

  5. Whether current distribution statistics are available. In earlier versions of SQL Server, the database engine abandoned parallel plans if the available statistics prevented the database engine from providing the highest degree of parallelism. In SQL Server 2005, if the highest degree of parallelism is not possible, lower degrees are considered before the parallel plan is abandoned. For example, when you create a clustered index on a view, distribution statistics cannot be evaluated, because the clustered index does not yet exist. In this case, the database engine cannot provide the highest degree of parallelism for the index operation. However, some operators, such as sorting and scanning, can still benefit from parallel execution.
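The decision sequence above can be sketched as a simple gate-then-cap function (Python for illustration only; the function name, thresholds, and parameters are invented for this sketch and do not reflect actual optimizer internals):

```python
def choose_degree_of_parallelism(cpu_count, free_threads, estimated_cost,
                                 cost_threshold, row_count, min_rows=1000):
    """Simplified model of the serial-vs-parallel checks described above."""
    if cpu_count <= 1:
        return 1  # criterion 1: single processor, always serial
    if estimated_cost < cost_threshold:
        return 1  # criterion 3: cheap query, coordination cost outweighs gain
    if row_count < min_rows:
        return 1  # criterion 4: too few rows to be worth splitting
    # Criterion 2: cap the degree by CPUs and by the threads actually free.
    return max(1, min(cpu_count, free_threads))

# Expensive query on an 8-CPU box with 6 free threads: parallel, degree 6.
print(choose_degree_of_parallelism(8, 6, estimated_cost=30,
                                   cost_threshold=5, row_count=50_000))
# Cheap query on the same box: serial plan.
print(choose_degree_of_parallelism(8, 6, estimated_cost=2,
                                   cost_threshold=5, row_count=50_000))
```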

Important 

SQL Server 2005 Enterprise Edition supports parallel index operations

Only SQL Server 2005 Enterprise Edition supports parallel index operations.

At execution time, SQLOS determines whether the current system workload and the configuration information previously described allow parallel execution. If parallel execution is warranted, the database engine determines the optimal number of threads and spreads the execution of the parallel plan across them. When a query or index operation starts executing on multiple threads, the same number of threads is used until the operation completes. The database engine re-examines the optimal number of threads each time an execution plan is retrieved from the procedure cache; for example, one execution of a query might use a serial plan, a later execution of the same query might use a parallel plan with three threads, and a third execution might use a parallel plan with four threads.

The query below, which uses the sys.dm_os_tasks DMV, provides details on the tasks assigned to specific schedulers and can be used to determine the current state of queries that have been parallelized. (When a query is parallelized, there will be multiple entries in the result set with the same session_id and request_id values.)

 SELECT
    request_id,
    session_id,
    task_address,
    task_state,
    context_switches_count,
    pending_io_count,
    pending_io_byte_count,
    pending_io_byte_average,
    scheduler_id,
    exec_context_id,
    worker_address,
    host_address
 FROM sys.dm_os_tasks
 WHERE session_id > 50
 ORDER BY session_id, request_id;

The output of this query is shown in Figure B-10.

Figure B-10: Querying SQLOS to determine the degree of parallelism of currently executing queries

In some cases, a query that has been parallelized can perform worse than the same query running serially. The easiest way to determine whether parallelism is causing system performance issues is to follow the normal performance-tuning procedures described in Chapter 8 and look for CXPACKET waits. If a specific SPID is experiencing CXPACKET waits, run the query above and note the statement executing on that SPID. If a specific statement or batch is causing a problem with CXPACKET waits, add the OPTION (MAXDOP 1) query hint to that batch and execute it again to see whether it performs better.

As discussed earlier in this appendix, all user access to data in SQL Server occurs through the buffer cache. When a query is executed, the query engine first checks to see whether the data needed is already present in the buffer cache. If it isn't present, a request is made for the storage engine to read the data into buffer cache. Because the buffer cache is a finite resource, SQLOS must manage the buffer cache very carefully. Developers can assist SQLOS by controlling how their applications access data (for example, group access to similar objects as much as possible), which can help minimize data movement within the buffer cache. SQLOS provides a mechanism in the form of the sys.dm_os_buffer_descriptors DMV that enables developers to peer into buffer cache and determine which objects currently have pages in buffer cache. The following query will provide a list of all objects and the number of pages from that object currently in buffer cache:

 SELECT COUNT(*) AS cached_pages_count,
    name, index_id
 FROM sys.dm_os_buffer_descriptors AS bd
    INNER JOIN
    (
        SELECT OBJECT_NAME(object_id) AS name,
            index_id, allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p
                ON au.container_id = p.hobt_id
                    AND (au.type = 1 OR au.type = 3)
        UNION ALL
        SELECT OBJECT_NAME(object_id) AS name,
            index_id, allocation_unit_id
        FROM sys.allocation_units AS au
            INNER JOIN sys.partitions AS p
                ON au.container_id = p.partition_id
                    AND au.type = 2
    ) AS obj
        ON bd.allocation_unit_id = obj.allocation_unit_id
 WHERE database_id = DB_ID()
 GROUP BY name, index_id
 ORDER BY cached_pages_count DESC;

The output of this query is shown in Figure B-11.

Figure B-11: Determining the objects that have data currently stored in buffer cache
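Because SQL Server data pages are 8 KB, the page counts returned by this query translate directly into memory consumed per object. A quick illustrative conversion (Python for demonstration only; the page count below is invented):

```python
PAGE_KB = 8  # SQL Server data pages are 8 KB

def cached_mb(cached_pages_count):
    """Convert a cached page count into megabytes of buffer cache."""
    return cached_pages_count * PAGE_KB / 1024.0

# An object with 12,800 cached pages is holding 100 MB of buffer cache.
print(cached_mb(12_800))
```

Ranking objects by this figure quickly shows which tables and indexes dominate the buffer cache on a memory-constrained system.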

Having a solid understanding of how database objects are used relative to buffer cache can have a major impact on how developers build applications to interact with data. This is especially important on systems that have limited memory.




MCITP Self-Paced Training Kit Exam 70-442  .Designing and Optimizing Data Access by Using Microsoft SQL Server 2005
ISBN: 073562383X
EAN: N/A
Year: 2007
Pages: 162
