The SQL Server Engine | Inside Microsoft SQL Server 2000

The SQL Server engine supports a variety of demanding applications, such as online transaction processing (OLTP) and decision-support applications. At the core of its decision-support capabilities is Transact-SQL, Microsoft's version of Structured Query Language (SQL). Beneath this query language are the components that support transaction processing and recoverability.

Transact-SQL

SQL is a well-known, widely used data access language. Every mainstream database management system (DBMS) product implements SQL in some way. Transact-SQL (often referred to as T-SQL) is a powerful and unique superset of the SQL standard.

The SQL SELECT statement provides tremendous power and flexibility for retrieving information. Using SELECT, data from multiple tables can be easily chosen and combined and the results returned in tabular format.

Take a look at the following two tables from the pubs sample database. (This database, which I use in many examples in this book, is installed when SQL Server is installed. For brevity, I'll sometimes use an abbreviated amount of the data, as in this example.)

 publishers Table pub_id       pub_name                city          state 0736         New Moon Books          Boston        MA 0877         Binnet & Hardley        Washington    DC 1389         Algodata Infosystems    Berkeley      CA titles Table title_id    title                                    pub_id BU1032      The Busy Executive's Database Guide      1389 BU1111      Cooking with Computers: Surreptitious    1389             Balance Sheets BU2075      You Can Combat Computer Stress!          0736 BU7832      Straight Talk About Computers            1389 MC2222      Silicon Valley Gastronomic Treats        0877 MC3021      The Gourmet Microwave                    0877 MC3026      The Psychology of Computer Cooking       0877

The following simple SELECT statement logically joins the titles and publishers tables to retrieve the book titles with their corresponding publishers:

 SELECT title, pub_name, city, state  FROM titles JOIN publishers    ON titles.pub_id = publishers.pub_id

Here's the result:

 title                             pub_name               city       state -----                             --------               ----       ----- The Busy Executive's Database     Algodata Infosystems   Berkeley      CA Guide Cooking with Computers:           Algodata Infosystems   Berkeley      CA Surreptitious Balance Sheets You Can Combat Computer Stress!   New Moon Books         Boston        MA Straight Talk About Computers     Algodata Infosystems   Berkeley      CA Silicon Valley Gastronomic        Binnet & Hardley       Washington    DC Treats The Gourmet Microwave             Binnet & Hardley       Washington    DC The Psychology of Computer        Binnet & Hardley       Washington    DC Cooking

The preceding query, a simple SQL statement, shows that standard SQL provides a powerful way to query and manipulate data. (In Chapters 7 and 10, I'll discuss SQL queries in much greater depth.)

The Transact-SQL language is compliant with the American National Standards Institute (ANSI) SQL-92 standard at the entry level. It offers considerably more power because of its unique extensions to the standard.

Transact-SQL extensions

Transact-SQL provides a number of capabilities that extend beyond typical implementations of SQL. These capabilities allow you to easily and efficiently write queries that are difficult to write in standard SQL. You can, for example, embed additional SELECT statements in the SELECT list, and you can drill into a result set by further selecting data directly from a SELECT statement, a feature known as a derived table. Transact-SQL provides many system functions for dealing with strings (for finding substrings and so on), for converting datatypes, and for manipulating and formatting date information.

Transact-SQL also provides mathematical operations such as returning the square root of a number. In addition, you can use special operators such as CUBE and ROLLUP to efficiently perform multidimensional analysis at the database server, where the analysis can be optimized as part of the execution plan of a query. You can use the CASE expression to easily make complex conditional substitutions in the SELECT statement. Multidimensional (sometimes referred to as OLAP, or online analytical processing) operators, such as CUBE, and conditional expressions, such as CASE, are especially useful in implementing data warehousing solutions with SQL Server.

The query optimizer

In Transact-SQL, a cost-based query optimizer determines the likely best way to access data. This automatic optimization allows you to concentrate on defining your query criteria rather than defining how the query should be executed. For example, this nonprocedural approach eliminates the need for you to know which indexes exist and which, if any, should be used. Would it be more efficient to incur additional I/Os to read index pages in addition to data pages, or would it be better just to scan the data and then sort it? The optimizer automatically, invisibly, and efficiently resolves these types of important questions for you.

The query optimizer maintains statistics about the volume and dispersion of data, which it then uses to determine the execution plan that will most likely work best for the operation requested. Because a cost-based optimizer is by definition probability-based, an application might want to override the query optimizer in some special cases. In your application, you can specify optimizer hints that direct the choice of execution plan. In addition, you can use one of SQL Server's SHOWPLAN options to explain the chosen execution plan and provide insight into why it was chosen—this is information that allows you to fine-tune the application and database design.

The programmable server

Transact-SQL provides programming constructs—such as variables, conditional operations (IF-THEN-ELSE), and looping—that can dramatically simplify application development by allowing you to use a simple SQL script rather than a third-generation programming language (3GL). These branching and looping constructs can dramatically improve performance in a client/server environment by eliminating the need for network conversations. Minimizing network latency is an important aspect of maximizing client/server application performance. For example, instead of having the server return a value that the client needs to evaluate in order to issue a subsequent request, you can build conditional logic directly into the SQL batch file so that the routine is evaluated and executed entirely at the server.

You can use Transact-SQL to write complex batches of SQL statements. (A batch of SQL statements in a complex application can potentially be hundreds, or even thousands, of lines long.) An important capability of SQL Server 2000 is the T-SQL Debugger, which allows you to use SQL Query Analyzer to fully debug Transact-SQL routines, including stepping through the statements, setting breakpoints, and setting watchpoints on Transact-SQL variables.

Stored procedures and functions

Simply put, stored procedures and functions are collections of SQL statements stored in a SQL Server database. You can write complex queries and transactions as stored procedures and then invoke them directly from the front-end application. Whenever an ad hoc SQL command is sent to a database server for processing, the server must parse the command, check its syntax for sense, determine whether the requester has the permissions necessary to execute the command, and formulate an optimal execution plan to process the request. Stored procedures and functions execute faster than batches of dynamic SQL statements, sometimes dramatically faster, because they eliminate the need to reparse and reoptimize the requests each time they're executed. SQL Server supports stored procedures that let you store groups of compiled SQL statements on the server for later recall, to limit the overhead when the procedures are subsequently executed.

Stored procedures and functions differ from ad hoc SQL statements and from batches of SQL statements in that they're checked for syntax and compiled only the first time they're executed. SQL Server stores this compiled version in its memory cache and then uses the cached, compiled version to process subsequent calls, which results in faster execution times. Stored procedures and functions can also accept parameters, so they can be used by multiple applications using different input data.

Stored procedures and functions differ from each other in how they're used in your code. Stored procedures perform actions and usually contain all the error checking associated with that action. Many developers prefer to have each procedure perform a single, well-defined action. For example, a stored procedure might be written to insert a new order into the orders table, and the data must be checked for validity before the actual insert occurs. A function, on the other hand, returns a value, and that value can be used anywhere that any SQL statement can use a single value. For example, you can use a function in the list of items to return in the SELECT clause, you can compare it to other values in a WHERE clause, you can use it to generate a value for insertion in the VALUES clause of an INSERT statement, or you can use it to supply a parameter to a stored procedure or to another function. In addition, functions in SQL Server 2000 can return a table-valued result that you can use in a FROM clause. I'll discuss stored procedures and functions in detail in Chapter 11.

You can think of stored procedures as basic building blocks in database application development. Even if stored procedures provided no performance advantage (which, of course, they do), you'd still have a compelling reason to use them: they provide an important layer of insulation from changes in business practices. Suppose, for example, that a retailer maintains a mailing list for catalog distribution, which is used by multiple applications. After the applications are deployed, a change in criteria and logic (that is, the business rules) occurs, which affects which customers should automatically receive new catalogs. If the business rules were programmed directly into the company's applications, every application would need to be modified—probably an expensive and time-consuming operation. Furthermore, if multiple developers worked on the applications, the rules might not have been programmed with exactly the same semantics by every programmer.

A stored procedure, on the other hand, can be modified once, in seconds, at the server. An application need not be changed or even recompiled. The next time the application executes the stored procedure, the new rules are in place automatically. In addition to providing a performance advantage, stored procedures can provide an important security function. By granting users access to a stored procedure but not to the underlying tables, you can allow them to access or manipulate data only in the way prescribed by the stored procedure.

Extended stored procedures

Extended stored procedures, which are unique to SQL Server, allow you to extend the programming capabilities provided by Transact-SQL and to access resources outside of SQL Server. Messaging integration, security integration, the ability to write Hypertext Markup Language (HTML) files (files formatted for use on the Internet), and much of the power of SQL Enterprise Manager (discussed later in this chapter) are all implemented using extended stored procedures. You can create extended stored procedures as external dynamic link libraries (DLLs). For example, you can write a DLL to establish a modem connection, dial the ACME Credit Service, and return a status value indicating credit approval or rejection. Also, you can build extended stored procedures using the Microsoft Open Data Services (ODS) API, which lets your stored procedures return self-describing result sets to the calling client applications, just as a normal stored procedure would.

Extended stored procedures allow even Microsoft to extend SQL Server. Good engineering practices dictate that if code doesn't benefit from being shared, it should be segregated and isolated. With this principle in mind, Microsoft added integration with messaging via the Messaging API (MAPI) as a set of extended stored procedures (xp_sendmail, xp_readmail, and so on) instead of directly modifying the SQL Server engine. Extended stored procedures allow you to add powerful features without risk of disrupting the core server engine. You can thus add more features quickly, with less risk of destabilizing the server. And because an extended procedure's DLL is loaded only if the procedure is executed, the memory footprint of SQL Server doesn't grow for services that aren't being used.