Managing SQL Server Connections | Hitchhikers Guide to Visual Studio and SQL Server: Best Practice Architectures and Examples, 7th Edition (Microsoft Windows Server System Series)

To access the data or manage the objects in SQL Server, your application or the tools you use have to open a link to the networked or local SQL Server instance's serviceassuming you've already installed the server. This link is called a "connection," and it's typically implemented behind the scenes by exposing specific TCP and UDP ports (that you can specify by number, if necessary).

One of the most troublesome aspects of working with SQL Server can be establishing and managing one or more connectionsthat's why I devote all of Chapter 9 to connecting and connection issues. Why? Well, SQL Server protects itself and your data by preventing unauthorized users from connecting, so it's designed to guard against casual access (so to speak). These security constraints on Windows, the network, and SQL Server (in several places) prevent you and everyone else from gaining unfettered access to the server and the data unless you have proper authenticated credentials.

Until SQL Server 2005, you couldn't just use a program to point to a SQL Server .MDF file and open it. You still can't open a .MDF file and make much sense out of itonly SQL Server can open and manage its binary data files. However, all it takes is physical access to the .MDF to compromise its datasimply transport it to another system containing a friendly SQL Server instanceand....

Before I go on, let's make a few terms clear. When an application (any code anywhere) opens a connection to SQL Server (or any DBMS data source), it's considered to be a "client." This application can be embedded in a Windows Forms application, in a middle-tier XML Web Service, in an ASP application, or on a hand-held PC connected to a milking machine. "Client/server" applications are considered to be those programs (like Windows Forms applications) that open connections and provide interactive user interfaces, and are not browser-based. That is, the application runs on an independent computer separate from the system that hosts the DBMS engine (like SQL Server). A "server" is the system that hosts SQL Server or other shared DBMS. Yes, a server can share files and it's then dubbed a "file server." SQL Server gets its name from the fact that it provides Structured Query Language (SQL) "services" to the network.

Understanding the Data Access Interface Role

In Chapter 1, I showed you how Microsoft has exposed a number of data access interfaces. These include DB-Library, ODBC, DAO, RDO, OLE DB, ADO, and now ADO.NET. This, of course assumes that Microsoft has not created another TLA since this book hit the streets. For good or bad, each of these interfaces requires a separate (and distinct) "stack" of DLLs to implement their functionality. This has meant developers must install the correct stack for the application they're deployingone that matches the interfaces they used when the application was developed and tested. As I have said, when forcing the application to "upgrade" an existing stack on the target system, developers saw an alarming number of incidents where existing code misbehaved (to be kind). Most notorious of these is the MDAC stack, which is used to support OLE DB, ODBC, DAO, and ADO (classic). Over the years, its constituent parts have changed, but it has (at one time or another) contained the JET engine, OLE DB drivers for a variety of backends, ODBC bridge providers, and several other sundry DLLs.

Understanding the Protocols

For the most part, you needn't be concerned about which protocol is being used to transport TDS packets to SQL Server. If you do care, ADO.NET and the data access interfaces support four different protocols. Each has its "best use" scenario and limitations:

TCP/IP: The "Transmission Control Protocol/Internet Protocol" is the standard (open source) protocol used to move data over the Internet. It addresses source and destination locations using a TCP/IP address and can be routed anywhereexcept parts of rural Leavenworth County, Kansas. This protocol is the most flexible but not the most efficient.
Named Pipes: This protocol is used to move data over local area networks in a Windows domain. It requires that the host server (hosting SQL Server) be capable of creating a named pipe interfaceonly NT-class operating systems are so capable. This means Windows 9x or ME cannot expose SQL Server instances using named pipesyou'll have to use NT, XP, or Windows 2003 Server (or better), or TCP/IP protocol. Peter tells me that Named Pipes is evil, as it's not really a routable protocolwho am I to disagree?
Shared Memory: This protocol is used to link SQL Server within client memory on the same system. It bypasses the "network" layers so it's the fastest due to the lower overhead. By default, this is the only protocol SQL Server 2005 instances support. All other protocols have to be specifically enabled.
VIA: "Virtual Interface Adaptor" protocol. Used on more recent NT (and later) OS builds, this protocol is designed to work with specific VIA LAN hardware^[9].
^[9] VI Protocol: See www.windowsitpro.com/Article/ArticleID/16097/16097.html

As I discuss coding the ADO.NET connection string in Chapter 9, I'll show how to choose the protocol that makes most sense.

Opening a Connection

To open a connection to SQL Server, ADO.NET is used to capture a number of parameters and pass these on to the .NET provider. These include the TCP/IP location or DNS name of the SQL Server, the instance being addressed, appropriate security credentials, and a number of optional configuration settings. These parameters are placed in the ADO.NET SqlConnection object property called the ConnectionString and referenced when you execute the ADO.NET Connection.Open method. That's it. When creating a connection from one of the tools such as SQL Server Management Studio or Express Manager, you provide the same parameters via an interactive dialog (as shown in Figure 2.22). The mechanism behind the scenes is the same.

Figure 2.22. Opening a connection to a SQL Server instance using SQL Server Management Studio.

There are a dozen or more other optional parameters (expressed as key ; value pairs) that help your application point to a specific user database on the server; name your application; specify the I/O port, the protocol, and the data cache size; or change how the data access layers manage the connection and the connection pool for you. I'll discuss how to manage these options in ADO.NET in Chapter 9.

When SQL Server is asked to open a connection, the client layers are forced to load (or locate) specific DLLs, network libraries, and other support structures; locate the server (machine) using the Active Directory and the DNS; and address a specific SQL Server instance on the network based on the Windows domain name, (or an IP address) and TCP/UDP ports, and the proddings of a good acupuncturist. Once the server and instance are located (of course, each server machine can support multiple SQL Server instances), it's queried to see if it can support another connection (based on license and other restrictions). If a connection is available, it's assigned to the data access interface requesting the connection and an agent (worker thread) is started on the host SQL Server instance to manage any queries requested by the application at the other end of the connection. This can take quite some timeespecially the first time, as DLLs and other overhead tasks must be initialized on first use. After that, subsequent connections can be established more quickly, but it's still a relatively expensive process.

When trying to open a connection for the first time, if you find it takes too long to open the server using a DNS (machine) name, you might be having problems with a slow (or non-existent) DNS server. Try putting the machine name along with its hard-coded TCP/IP address in the hosts file.

The next time your application has to open a connection to the same server, only some of these steps need to be repeated. However, each time you open a connection, the supplied credentials are revalidatedeven if they have been validated a thousand times in the last five minutes. This takes timeeven if you're fetching a connection from the connection pool. However, many of the other steps have already been done, so connecting for the second time can be far fasterassuming the pool hasn't already been exhausted. See Chapter 9 for more details on how the connection pool worksor is supposed to work.

Unfortunately, SQL Server is not particularly informative when it complains that it can't connect. Part of the problem is the TDS packets that the device drivers and providers return, and part of the problem is how the connection process is handled behind the scenes. For whatever reason, when you don't get connected, you're likely to get a less-than-useful message that says that something went wrong. Fortunately, the .NET Framework and SQL Server 2005 have improved the quality and usefulness of these messages. While they might be hard to decipher, they usually contain several clues as to why you didn't get connected. For example, when trying to create a new SQL Server registration in SQL Server Management Studio, the dialog shown in Figure 2.23 was exposed. This more-or-less generic message says that the problem could be the server name, login credentials, or database chosen. To get more information, I clicked on "Show advanced information".

Figure 2.23. Initial exception message in SQL Server Management Studio register server dialog.

This exposes a dialog that provides a lot more (albeit complex) information about the exception. Here, you can see which interface (TCP/IP, named pipes, shared memory or via) was being used (named pipes, in this case) to open the connection and (roughly) what when wrongas shown in Figure 2.24. I can see that the network path was not found. This means the network could not resolve the name of the server.

Figure 2.24. The advanced information dialog provides details about the connection exception.

Since I tried to connect using a server name, I have to figure out why the named server could not be found. The first thing I usually do is "ping" the server by name using the Ping utility from a command prompt, as shown in Figure 2.25. In this case, I can see that the server name resolved to an IP address. This address was found in either the hosts file or the directory name service (DNS). However, ping could not get a response from the server. This means the server itself or the network link to the server is down.

Figure 2.25. Using the ping command-line utility to test for the presence of a server on the network.

At this point, you need to determine:

Is the server name correctly spelled? Server names are not case-sensitive, but you have to spell the name correctly.
If an IP address is used to address the server, does the address include the correct port number using the syntax xxx.xxx.xxx.xxx.:pppp, where pppp is the port number?
If you're using a server name with no instance name, is SQL Server installed on the server as the default instance?
If you're using either "(local)" or "." to reference the local server, is SQL Server installed as the default instance on the local system? This syntax won't work with named instances unless you include the instance name using the ".\instancename" syntax.
If the SQL Server is installed as a named instance, is the name included along with the server name using the server\instance syntax?
Are you connecting to a local instance or a network instance?
Is the network link to the server up? If you're using a VPN connection to expose the remote server, is it enabled?
Does the ISA server on the server-end of the VPN permit dialup access to the server?
Is the server configured to expose its I/O ports (usually 1433, but not always)?
Is the server running the SQL Browser service so that it can be seen on the LAN?
Assuming the server is found on the network with ping, does the server recognize the credentials provided?
Is there a login account that matches the credentials? This might be a role that contains Windows domain groups or individual SQL Server logins.
If the credentials are recognized, do they have rights to the default database as assigned by the system administrator when the Login account was set up?

In Chapter 9, I'll add to this list as I discuss connecting to SQL Server from ADO.NET.

Debugging a Connection

I've provided a flowchart (shown in Figure 2.26) to help you walk through the process of opening a connection. As you can see, there are a quite a few issues to keep in mind. Notice how SQL Server uses (or demands use of) specific data access transport providers such as TCP/IP or named pipes. I also show what to do at each step to help build the ConnectionString or to diagnose what went wrong. While this chart won't help find all of the possible issues you'll encounter over your lifetime as a developer, I expect it to help get connected in the majority of the cases you'll have to deal with.

Figure 2.26. Flowchart showing how to debug a connection.

Using Connection Pooling

In an attempt to make the process of opening connections less expensive (after the first connection), many years ago, Microsoft implemented a technology called "Connection Pooling". It's still alive and kicking in the 2.0 version of the .NET Framework. ADO.NET 2.0 permits even tighter control over this automated connection management interface. Basically, the connection pool is a software mechanism that runs on the client (it's not managed by SQL Server) that's implemented by the data access provider. That means the ODBC, OLE DB, and .NET drivers and providers each implement connection pooling on their owneach has its own properties and setup options. When I get to the ADO.NET SqlConnection object in Chapter 9, I'll show that the SqlClient .NET data provider has a wealth of properties, counters, and methods to monitor, control, and even disable the connection pool.

Each process gets its own connection pool. In the case of Windows applications, that means each application gets its own pooleven if 10 instances of the same application are running on the same system. In ASP/ASP.NET applications, each application domain gets its own pool. This means multiple instances of ASP pages share a common poolas long as the ConnectionString remains the same^[10]. When an application requests a new connection (using the Open method), the following operations take place:

^[10] There are a number of other factors that determine when additional pools are created; these are discussed in Chapter 9.

The connection pooling mechanism searches the pool for a matching connection string (that contains the credentials and connection settings).
If a match is not found, a new connection is established. This process involves authenticating the credentials, locating the server, passing the credentials to the server for revalidation, locating the default database, and authenticating access by the given login.
The server creates an "agent" to handle the connection.
The connection pooling mechanism passes back a pointer to the new connection to data provider.

When another connection is needed, the process is repeated. If the existing connection is still in use, the process is repeated and another connection is added to the pool. When your code closes a connection, the connection pooling mechanism marks the pooled connection as "dormant" so it's available when your application attempts to open the connection again. After four to eight minutes^[11], the connection is automatically dropped from the pool and the "real" connection to SQL Server is closed. When this happens, the server releases all of the resources owned by the connection. Note that the connection pooling mechanism does not close the connection to the server when ADO.NET executes the Close methodit's simply returned to a pool.

^[11] No, you can't programmatically alter this dormant to closed timeout period.

When your application attempts to establish an identical connection using the ADO.NET Open method (using an identical connection string), it's simply passed to the requesting application, thus saving the cost of establishing a new connection. If your connection string changes (as when you change the user credentials, default database, or any other setting), you'll get a new connection pool to manage other connections using the same connection string.

Before the application is passed the "used" connection, the connection pooling mechanism revalidates the credentials. In addition, when a reopen is requested, the connection pooling mechanism resets the connection (during the first operation on the connection), so any resources held by the previous owner are released, so your application can inherit a clean connection each timethis also takes time. It turns out that since SQL Server 2000, the reset request is tied to first request to use the connection and does not take a separate round-trip. The problem with this approach is that the resources can be held on the server for several minutes (48), which is a very long time. Basically, this means reconnecting is not freein some (many) cases, it will be faster to leave the connection open for the lifetime of the application. That's what I often recommend for Windows Forms/Smart Client applications.

Managing Multiple Active Resultsets (MARS)

SQL Server can support hundreds to thousands of connections. In earlier versions, each connection could perform only one operation at a time. That is, you could execute SQL that returned a set of rows, but you could not use the connection to do something else until all the rows had been fetched and the connection was freed for reuse. It's like using an old-fashioned two-way radio. You could talk or listen, but not both at the same time. This is called a half-duplex connection. SQL Server 2005 supports new technology called Multiple Active Result Sets (MARS). This permits you to use the same connection to perform one, two, or more operations at the same timea full-duplex connection. Whether or not this makes sense (since individual connections have other benefits) is TBD. Microsoft added this functionality to help developers create applications that can more easily work against either SQL Server or Oracle (which has supported multiple operations per connection for some time). As you'll discover after having read Chapter 9, I'm no fan of MARSor the third moon of Jupiter.

Choosing a Connection Strategy

When working with Windows forms applications, you might consider opening a connection and leaving it open for the lifetime of the application. This is especially useful, as when using this approach, you'll be able to manage server state, including building and reusing temporary resultsets or global variables as well as server-side cursors (that you'll have to create manually). The downside to this approach (in some folks' mind) is that each connection consumes resources (RAM and CPU time) on the server. More connections mean less RAM available for the cache that helps SQL Server performanceeach connection costs about 40K. However, this problem can be addressed by adding more RAM to the server. This approach does have its limitsI expect an average server would be hard pressed to support more than several hundred users with an average application. Since the vast majority of SQL Server systems support hundreds instead of thousands of users, this should not be a problem. As I discussed a little earlier, it's not free to close and reopen connections, as the connection pooling mechanism still has to revalidate and reset the connection on each open.

When working with other architectures such as IIS/ASP applications, you can't get away with leaving the connection openyour ASP page does not live that long and applications that hold open connections are impractical (but not impossible) to write and hard to scale. Almost universally, these applications launch a selected executable page, open a connection, execute a query, close the connection, and wait to be reactivated to post the changes to the server. In this case, connections are opened and closed like the turnstiles on a subway platformoften thousands of times a minute. Ordinarily, this would slow the server down quite a bit, as the process of opening a connection is fairly complex and expensive, but the connection pooling mechanism reduces the overhead of this connection thrashing to a minimum. Because the connection is not left open, ASP applications can't depend on SQL Server server-side statethey can't expect to build and reuse temporary tables or cursors as they can with Windows Forms applications. However, this type of application scales up far higher than a typical Windows Forms application. They also expose the database to anyone on the planet, so they need especially tight security.

SQL Server speaks one and only one language, "Tabular Data Stream" (TDS). This is a proprietary protocol that encodes the conversations between all clients and the server. Each version of SQL Server uses a subtly different version of TDS, but the server is capable of recognizing older versions and executing their requests. At one time, Sybase's version of SQL Server used TDS, but it was abandoned many years ago, so you won't be able to use TDS to communicate with current versions of Sybase SQL Server. TDS dictates several aspects of the data that's sent and received from SQL Server. For example, when the data access layer (like ADO) needs to retrieve the results of a query, it has to wait until all rowsets (as generated from SELECT statements) are returned before the RETURN value or OUTPUT parameters are returned. TDS also determines how exceptions are handled, how MARS works (or doesn't), and many other factors that are really outside of the control of ADO.NET (or any other data access interface to SQL Server).

Managing Server State

As your applications get more sophisticated, you'll find that you might be able to help performance or tune the way your queries are executed by managing server state yourself. Server state is simply the settings and objects owned by the connection on the server. For example, one of the easiest changes you can make is to set or change the initial catalog/default database. When working with ADO.NET (or any data access interface), you can set this value when you connect by adding "Database=<your default database name>" to the ConnectionString property. Yes, there are other settings you can make using the ConnectionString. You can also create other objects on the server that are persisted as long as your connection remains open. These include temporary tables, server-side cursors, pessimistic locks, and various settings used to control the behavior of the T-SQL compiler and SQL Server itself. These settings and temporary objects can play a major role in how SQL Server handles dates, locking, query execution, transactions, arithmetic issues, and much more.

Some of the settings you can change have an impact on all connectionsnot just yours. This means you should not make changes to these settings until you fully understand their impact. Generally, when a stored procedure or trigger makes a change to a setting (by executing the SET statement), the setting is rolled back after the procedure is done. This is also the case when the query is executed with sp_executesql (as ADO.NET often does). There are a couple of exceptions: SET ANSI_NULLS and SET QUOTED_IDENTIFIERthese settings are persisted and cannot be changed in a stored procedure. In some cases, these settings impact not just your database, but all databases on the server. See "SET (described)" in BOL^[12] for more information.

^[12] BOL: SQL Server documentation (Books Online).

The connection pooling mechanism rolls back any connection-owned settings and objects (like temporary tables and cursors) when the connection is reset on reuse. Actually, cursors are closed and their resources are released when the client executes the Close method. Why is this important? Well, if you want to create your own temporary (#Temp) tables or server-side cursors (ADO.NET won't do it for you), you'll need to remember that these objects are owned by the connection and live only as long as the connection. In addition, these objects consume resources on the serverand that limits scalability. If you need to support fewer users than a public web site selling cheap drugs, SQL Server can easily handle a couple hundred server-side cursors and #Temp tablesassuming you don't overload them with too many rows.