Overview of the Core Architectures


Experienced SQL Server users (with at least versions 7.0 and 2000 under their belts) can breeze through this overview, because I have written it for newcomers with two objectives in mind. First, it is an orientation intended to bring you to a level at which you can better grasp the fuller discussions of the architecture that come later. Second, it revisits in more depth the central design themes introduced earlier, which will help you appreciate the new power you now have. For those who do intend to breeze through this section, I also point out several new features worth zeroing in on.

You will also notice that I discuss several key architectures of SQL Server, but this is not the whole story: SQL Server is composed of a number of collateral technologies and architectures, many of them lengthy and complex subjects in their own right, that you will learn about in later chapters. These include the replication architecture, the I/O architecture, the query processing architecture, language support, and much more.

The Database Architecture

SQL Server’s database architecture is the subject of the next chapter, so I will save the interesting (and complex) stuff until then. However, to get a good footing in SQL Server, you should understand the basic database structure as early as possible, which is why I go directly to the database architecture in Chapter 2.

SQL Server is not a difficult product to comprehend, even though (and I agree) it is often considered the most complex server product in Microsoft’s server offering. I have managed all of the Microsoft server products in busy production environments and data centers, and not one other server product presents the challenge and technological “high” that SQL Server does.

I consider this to be the case in large part because of the three levels at which an IT professional interacts with and services the product: the DBA level, the SQL Server developer level, and the client applications developer level (touched upon in Part III). This is the only server product in the Microsoft lineup in which the system administrator of a high-end, mission-critical system needs a lot more on the job than certification in bells and whistles. He or she needs a solid grounding in relational database theory and more than a smattering of programming skills. Quite frankly, if you want to be the best DBA in the company, you need to know about the server platform, such as Windows Server 2003; its hardware needs, such as storage, memory, and processing requirements; networking (for replication, clustered servers, and so on); security; disaster recovery; administration; SQL and Transact-SQL; relational theory; and the basics of object-oriented design. You should also be proficient in at least one mainstream .NET language, such as C#.

The SQL Server 2005 relational database engine comprises a number of highly scalable and well-tested components for storing, manipulating, analyzing, and accessing data. With the number of systems in use in the world running into the multimillions and the number of certified DBAs well into the six figures, SQL Server 2005 is most certainly proven technology.

However, it is also a brilliantly elegant product. It is by no means as convoluted as it appears, which is not the case for many of its competitors. While you can deploy SQL Server in the most demanding and complex situations, it also has the lowest barrier to entry of all the database products on the market. This is due in part to its architecture, and in large part to the huge installed base, which is supported by many books such as this one, dozens of devotee Web sites, and user groups in just about every major city in the world.

The SQL Server 2005 database engine stores your data in tables. Each of the tables holds data that represents information of interest and value to your organization. This data consists of a collection of classes of information (called domains) that, when taken as a whole, represent a unique and complete record in the database. For example, a row holding a social security number is not by itself a complete record, according to your model, unless it is somehow linked to the name of a person. That person’s name can be in the same row of data, in a value drawn from the domain of last names, or in another table in the same database. In the latter case, the two rows taken together represent one record of your data, even though its values are spread across different tables.
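To sketch this idea in T-SQL (all table and column names here are hypothetical, chosen only for illustration), a person's name and social security number might live in two related tables, with a join reassembling the complete record:

```sql
-- Hypothetical illustration: two tables whose rows together form one record.
CREATE TABLE dbo.Person (
    PersonID   int         NOT NULL PRIMARY KEY,
    LastName   varchar(50) NOT NULL   -- a value from the "last names" domain
);

CREATE TABLE dbo.SocialSecurity (
    PersonID   int      NOT NULL REFERENCES dbo.Person (PersonID),
    SSN        char(11) NOT NULL      -- a value from the "SSN" domain
);

-- Joining the tables reassembles the complete record.
SELECT p.LastName, s.SSN
FROM   dbo.Person p
JOIN   dbo.SocialSecurity s ON s.PersonID = p.PersonID;
```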

Collections of tables (and there can be thousands of them) are stored in a container object known as a SQL Server database. Each database also stores objects that pertain to it and its tables, including stored procedures, triggers, and so on.

A collection of databases can be managed as a unit by a single instance of SQL Server. System-wide management features include functions such as backup/restore, logins and security, replication, and data transformation.

The DBMS or Administration Architecture

SQL Server ships with a rich and powerful set of administrative tools. These are the tools that DBAs use to manage databases, user access, backups and restores, and other administrative functions. Microsoft has published the full APIs to these tools, enabling independent software developers to provide turnkey solutions that use SQL Server as their storage database. A computer telephony application is an example of a turnkey application that would typically include many automated administrative features, so that actual hands-on administrative involvement on the part of end-user database owners would be minimized or obviated. The chapters in Part II cover all the DBA tools and then some.

The Application Development Architecture

If you are a database applications developer (the client developer or SQL Server developer), you have a wide choice of APIs and technologies to use in programming SQL Server. If you are building applications that depend on SQL Server databases for their data stores, or that need to obtain the functionality of the SQL Server database engine and its other components, you can use a number of interfaces and technologies.

If you are accustomed to developing to the Open Database Connectivity (ODBC) API, or have legacy code to maintain, you can code directly to it to work with SQL Server 2005. However, you can also get to ODBC functionality via the object-oriented (OO) APIs, such as ADO.NET and OLE DB. These APIs are discussed in Chapters 10 and 11.

All Microsoft’s modern OO database APIs enable you to encapsulate or encase Transact-SQL (T-SQL) statements within your code. Also, T-SQL can be used to access the ODBC extensions, even from within ADO.NET methods, and from standard ODBC applications. You can also continue to program SQL Server from the legacy object model APIs that encapsulate ODBC, such as Remote Data Objects (RDO) and Data Access Objects (DAO). However, it would make sense, for many reasons, to use the more modern ADO.NET libraries for data access.

SQL Server application development is simplest and fastest using the ADO.NET components, which can be used in any application development environment that supports .NET technology. Naturally, the easiest environment for a novice to start with is Visual Studio 2005, using Visual C# 2005 or Visual Basic 2005.

OLE DB is more complex and is typically used in old-fashioned C++ applications. While it is extremely powerful, it makes little sense to extend or support direct OLE DB functionality in new development against SQL Server 2005.

To talk to SQL Server (that is, to ask it to save data, change it, return it, or process it) you need to know SQL (or, more accurately, Transact-SQL; but more about this in a moment). SQL is the international, standards-supported computer language of the relational database, standardized by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). The official name of the 1992 standard is International Standard Database Language SQL; it is usually known as SQL-92, and the language is commonly pronounced “Sequel.”

All relational database products support a flavor of SQL infused with support for the individual product and its extensions. Microsoft calls its version of SQL for SQL Server “Transact-SQL,” or T-SQL for short. SQL, the base language, is by no means a perfect language; it has its share of inconsistencies and mistakes, and each product deals with them in ways that reflect the needs of its users. For example, standard SQL supports the concept of a persistent stored module (SQL/PSM), which allows stored procedures and similar functions to be grouped together so that they can share and reuse programmatic data.

Note 

I will use the more convenient T-SQL acronym most of the time.

With SQL Server 2000, the members of the Transact-SQL development team felt there was so little demand for adding PSM support that their time would be better spent in other badly needed areas (user-defined functions are one example; see Chapter 10, Programmability). With SQL Server 2005, Transact-SQL is still not completely compliant with SQL/PSM, but it goes beyond the SQL/PSM standard in its new functionality, such as system-defined aggregates and a variety of new data manipulation and mathematical functions.

These constructs can be referenced in stored procedures to take advantage of the proximity of the code to data. The objective: cutting down on the number of network round trips from client to server.
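The round-trip saving is easy to picture. In the hypothetical sketch below (the procedure, table, and column names are all invented for illustration), a single client call executes several statements server-side, right next to the data, instead of the client issuing each statement over the network:

```sql
-- Hypothetical sketch: one client call, several statements run at the data.
CREATE PROCEDURE dbo.TransferFunds
    @FromAccount int,
    @ToAccount   int,
    @Amount      money
AS
BEGIN
    BEGIN TRANSACTION;
        UPDATE dbo.Account SET Balance = Balance - @Amount
        WHERE  AccountID = @FromAccount;

        UPDATE dbo.Account SET Balance = Balance + @Amount
        WHERE  AccountID = @ToAccount;
    COMMIT TRANSACTION;
END;
GO

-- The client issues a single round trip:
EXEC dbo.TransferFunds @FromAccount = 1, @ToAccount = 2, @Amount = 100.00;
```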

However, while T-SQL is extremely powerful, even more flexibility and omni-accessibility to code and data can be obtained using any one of the .NET languages that SQL Server 2005 supports. The ability to use a language like C# to write procedural code against SQL Server is perhaps the most revolutionary enhancement of the product, if not the most compelling reason to adopt it.

Yes, in SQL Server 2005 you can write procedural code in Transact-SQL, C#, J#, managed Visual Basic, and managed Visual C++. Managed COBOL is also available. Everything you can write in Transact-SQL (stored procedures, user-defined functions (UDFs), and triggers) can also be written in one of the aforementioned languages.

Later, in the Part III chapters, we will discuss when you would choose T-SQL over C#, and the converse. Consider, however, that a UDF that executes a complex mathematical operation would be better written in C#, because T-SQL is interpreted whereas C# is compiled. In addition, T-SQL is not integrated with the .NET Framework as C# is, so T-SQL triggers and stored procedures cannot directly access external resources such as files, event logs, the registry, Active Directory, Web services, or the plethora of .NET assemblies, such as the complex math libraries or the data manipulation classes found in the .NET Framework’s regular expression library.
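For comparison, here is what a simple scalar UDF looks like in interpreted T-SQL (the function name and calculation are illustrative, not from any shipped library); a genuinely computation-heavy equivalent would be a candidate for a compiled .NET language instead:

```sql
-- Illustrative T-SQL scalar UDF: fine for simple logic like this,
-- but heavy number-crunching would favor a compiled C# UDF.
CREATE FUNCTION dbo.CompoundValue
    (@Principal money, @Rate float, @Years int)
RETURNS money
AS
BEGIN
    RETURN @Principal * POWER(1 + @Rate, @Years);
END;
GO

-- Scalar UDFs are invoked with a two-part name:
SELECT dbo.CompoundValue(1000, 0.05, 10) AS FutureValue;
```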

If what we just discussed does not blow your mind, consider that SQL Server procedural code can also be written as extended stored procedures (XPs), using the Open Data Services API in SQL Server 2005. These special system stored procedures can call COM objects that are loaded into the SQL Server process. XPs are not as safe as .NET or T-SQL stored procedures, because their code is not subject to any SQL Server control; a badly written XP can do a lot of damage in the areas of memory management and thread management. The power is there; it’s how you use it that matters. More about the “Xtreme” stuff in Chapter 10.

T-SQL is compliant with the SQL-2003 standard, the latest version of the SQL specification (also known as SQL-200n). It thus fully complies with the Entry Level features of SQL-92 and SQL-99 and also supports a number of features from the Intermediate and Full Levels. T-SQL is further discussed in Part III, which deals exclusively with programming.

In 1999 and 2000, Microsoft promised a number of new features for SQL Server 2000 that had to be held over for SQL Server 2005. These included enhancements to the TOP clause, the PIVOT clause, and support for common table expressions (CTEs). These enhancements, and many more, are now available in SQL Server 2005 and are discussed in the chapters of Part III, Programming SQL Server 2005.
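As a taste of two of these additions (the table and column names below are hypothetical), a common table expression can feed a TOP clause, which in SQL Server 2005 can take a parenthesized variable or expression rather than only a constant:

```sql
-- Hypothetical example combining a CTE with the enhanced TOP clause.
DECLARE @HowMany int;
SET @HowMany = 5;

WITH RegionSales (Region, Total) AS
(
    SELECT Region, SUM(Amount)
    FROM   dbo.Sales
    GROUP  BY Region
)
SELECT TOP (@HowMany) Region, Total   -- TOP now accepts a variable
FROM   RegionSales
ORDER  BY Total DESC;
```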

The Client/Server Architecture

As explained earlier, SQL Server is a client/server database system. At home in both two-tier and multitier environments, it is capable of servicing very small databases in limited storage areas or very large databases, also known as VLDBs, in huge storage silos. A huge number of users can be connected to SQL Server at the same time, all sending queries and instructions to the server, performing inserts, updates, deletes, queries, and just about anything else you can imagine doing with a database. You can also install and operate multiple instances of SQL Server on the same computer, or your users can be querying a database under the control of instance X on one server while inserting data into a database under the control of instance Y on another. But before we get carried away, let’s also look at more modest implementations.

SQL Server, as a client/server system, can be easily implemented as a two-tiered system on a single machine. Both the SQL Server engine and the user interface are installed on the same computer, much as you would install and use Access or FoxPro applications. Front-end or user-interface database applications communicate with SQL Server via the Windows Interprocess Communication (IPC) facilities and the Shared Memory protocol (explained in Chapters 2 and 4), not over a network.

This capability makes SQL Server 2005 ideal for single-user needs, but it is also ideal for thin-client/server environments in which all users operate in terminal sessions on the same server-class machine hosting SQL Server. When the product is installed on Windows Server 2003 in Application Server mode, your users would have access to applications that send requests to the database server engine via IPC facilities. This is a much more elegant, safer, and cheaper solution than giving every user a copy of Access (which must be licensed), or a custom application, and having every user tax the system with application-side processing overhead.

But what if you need to implement a high-end or demanding application on the local machine? A good example that does not involve human users at all, but that is one I am intimately familiar with, is computer telephony.

Computer telephony systems are not small or trivial applications. A computer telephony application often requires access to a locally installed database, mainly because network latency, even when measured in milliseconds, is not tolerable when you have hundreds of callers on telephone lines, all waiting for service. Using a file/server architecture for such a high-end application can lead to disaster.

A computer telephony system has several telephony cards in it that can answer calls on a number of channels (in some systems the number of simultaneous calls can be as high as several hundred per server). A file/server database engine like Jet or the BDE (child of the old Paradox engine) cannot handle such a dramatic burst of read or write requests during peak hours, often sustained for long periods. Each request for data or write to the database is essentially a user connection to the database, the “user” being a process thread spun out of the computer telephony application to the database server. Some of the requests for data services can be as simple as checking an extension number or the number of rings to wait before forwarding a call, whereas others, such as recording a voice message or accepting an incoming stream of fax data, can be very complex.

Internet applications are not unlike computer telephony applications from a load perspective. You might have one stock exchange server chugging along servicing a modest number of buys and sells over the Web. Then suddenly an issue goes “gold” and ten servers are not enough.

This is where SQL Server’s capabilities as a key role player in a multitier or n-tier client/server system come to the foreground. In high-traffic Internet environments, you have situations in which zero data processing can be performed at the client, which does nothing but present data in a browser. In such situations, numerous “client” processes might come into play to connect to more than one instance of SQL Server for a variety of reasons.

Transactions can be handed to legacy services like COM+ Component Services (formerly Microsoft Transaction Server, or MTS) and Microsoft Message Queuing (MSMQ), or to the new SQL Server 2005 Service Broker technology (see Chapter 17), for a variety of reasons. Such technology is key, for example, in banking applications over the Internet. Take funds transfer: you need assurance from the server that money debited from account A gets credited to account B, and that the execution and result (success or failure) of the transaction is properly communicated to the user, even over an unreliable connection like the Internet.

SQL Server’s client/server architecture does not support traditional mainframe-type load balancing for extremely high volume. Instead the work is shared between servers that hold data partitions. The servers cooperate to share the load as a server “federation,” which is a SQL Server term that means, essentially, the same thing as a server farm that allows a client to connect to any of a number of servers for the same service. (You will learn about database federations again in Chapter 13, but they and the other new features aimed at availability deserve some introduction here.)

High Availability

SQL Server 2005 benefits from its client/server architecture and from the host of new scalability and reliability features of the truly mission-critical Windows Server 2003 platform.

A new feature is database mirroring, in which transaction log records from a source server are continuously applied to a single destination server. If the source system dies, your applications can immediately reconnect to the database on the destination server: the recovery instance detects the failure of the primary system and within seconds accepts connections from clients. You need no special shared storage technology or fancy hardware to make this work. I will cover database mirroring in more detail in Chapter 8.
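Once mirroring endpoints exist on both servers, pairing the partners is a matter of a few T-SQL statements. The database name, host name, and port below are hypothetical, and this omits the endpoint setup; it is only a sketch of the shape of the configuration:

```sql
-- Hypothetical sketch: pointing a mirror database at its partner.
-- Run on the mirror server first (naming the principal's endpoint),
-- then the corresponding statement on the principal (naming the mirror's).
ALTER DATABASE Sales
    SET PARTNER = 'TCP://principal.example.com:5022';
```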

SQL Server can scale up and scale out. On the scale-up side, one Enterprise Edition server can address up to 64GB of RAM and can take advantage of the 64-way SMP support built into its Windows Server 2003 host. And for advanced fail-over in the scale-up model, SQL Server 2005 includes enhanced fail-over clustering across as many as eight nodes on the Windows Server 2003 platform.

Note 

SQL Server clustering support has been extended to the collateral services as well, such as Analysis Services, Notification Services, and Replication.

SQL Server 2005 also does software scale-out, operating on a high-availability model known as a shared-nothing federation. A federation of servers shares load by distributing hits across all available SQL Server instances that are members of the federation. Instead of sharing physical resources, the servers share nothing (hence the shared-nothing approach to clustering): each server holds its own updatable partition of the data, united into one logical whole by a technique known as distributed partitioned views. All servers in a federation are active.
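A distributed partitioned view can be sketched roughly as follows (the server, database, table, and range names are all hypothetical): each federation member holds a horizontal slice of the data, enforced by a CHECK constraint, and a view over linked servers unions the slices back into one logical table.

```sql
-- Hypothetical sketch of a distributed partitioned view.
-- This member server stores one horizontal partition of Customer,
-- its range enforced by a CHECK constraint on the partitioning column.
CREATE TABLE dbo.Customer_1to4999 (
    CustomerID int NOT NULL PRIMARY KEY
        CHECK (CustomerID BETWEEN 1 AND 4999),
    Name varchar(100) NOT NULL
);
GO

-- On each member, a view unions the local slice with the remote
-- slices reached through linked servers (here, one remote member).
CREATE VIEW dbo.Customer AS
    SELECT CustomerID, Name FROM dbo.Customer_1to4999
    UNION ALL
    SELECT CustomerID, Name FROM Server2.Sales.dbo.Customer_5000to9999;
```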

The 64-bit versions of SQL Server 2005 have been optimized for the Intel Itanium processor. They fully exploit advanced memory addressing capabilities for structures such as buffer pools, caches, and sort heaps, reducing the need to perform multiple I/O operations to move data between disk and memory.

The Windows Server 2003 x64 platform provides a high-performance architecture for both 32-bit and 64-bit applications. The x64 architecture comprises 64-bit extensions to the industry-standard x86 instruction set, which means your 32-bit applications can run natively on x64 processors, while 64-bit applications execute in 64-bit mode. Native 64-bit applications gain access to far more memory and can move more data per clock cycle. So you have a system that can handle your legacy 32-bit applications while at the same time paving the way to native 64-bit computing.

The Database Engine

The Microsoft SQL Server 2005 database engine, controlled by the SQL Server service sqlservr.exe, manages the databases and all related files and mechanisms that its instance owns on the host server. It is also responsible for processing all the T-SQL statements sent to it by client applications and processes. It also controls the execution of SQL Server components (the objects) installed in the database, such as stored procedures, triggers, and the integrity mechanisms we will discuss in Chapter 12.

The SQL Server Database Engine is the DBMS. It is the only engine on the server that creates and manages the databases, juggles concurrent connections, enforces security, processes queries, builds and applies indexes, and so on. The database engine, which we will refer to as the DBMS from here on, is the largest and most important SQL Server entity. It is discussed in depth in the next chapter.




Microsoft SQL Server 2005: The Complete Reference
ISBN: 0072261528
Year: 2006
Pages: 239
