Database Issues | Designing Enterprise Applications with Microsoft Visual Basic .NET (Pro-Developer)

I l @ ve RuBoard

Databases are critical to most applications these days. Problems with database access occur most frequently in a small number of areas. These problems are not always easy to track down, but they can have catastrophic effects on the performance of any application. For the most part, I'll focus on using the SqlClient namespace data classes, for two reasons. First, they provide performance counters that enable you to track the state of the data layer. Second, I'm talking about high-performance applications that are likely to be using an industrial-level database system such as Microsoft SQL Server anyway.

Database Connection Leaks and Connection Pooling

Connection pooling is a critical part of the data access infrastructure of modern applications. The general idea behind pooling is that establishing connections with a database is an expensive process, especially when that database will most likely exist on another system. Consider a Web application that deals with a high volume of incoming requests . It will likely talk to the one database or set of databases. Creating a new database connection for each incoming request is not especially efficient. A connection pool allows the application to reuse existing database connections that aren't in use but are still open. The application can therefore open a minimal number of database connections to satisfy a much larger set of incoming requests.

So how does this work? The connection pool mechanism comes into play when you call the Open method on a connection object. The Open logic first checks to see whether a connection pool exists for the specified connection string. If a connection pool doesn't already exist, a new connection will be created, opened, and inserted into a new connection pool. If a pool does exist, Open will look for an available open connection. If it finds one, Open will succeed. If no open connections are available but space is available in the pool, a new connection will be opened and added to the pool. On the other hand, if there are no available open connections and no empty slots in the pool, the Open method will wait up to 30 seconds for a chance at a connection in the pool. If one is not made available in that time, an InvalidOperationException will be generated. Figure 11-1 illustrates this process using the SqlConnection pooling system.

Figure 11-1. SqlConnection pooling.

graphics/f11pn01.jpg

That covers the process of opening connections, but you need to know three additional facts about SqlConnection pools:

Pools are limited to a finite number of open connections. You can specify a custom pool size in your connection string (this applies to the SqlConnection but not the OleDbConnection ), but it will always be finite and I don't recommend changing the default size unless absolutely necessary.
Pools are based on connection strings. Different connection strings will generate separate connection pools, each with its own connection limit.
Calling Close or Dispose on a SqlConnection object will return the open connection to the pool. If you don't call Close or Dispose on the connection, it will eventually be destroyed by the garbage collector and will not be returned to the connection pool. This will also cause what I call a "connection leak." (I'll get into that a little bit later on.)

Making the Most of Connection Pools

Three main rules will allow you to make the most of connection pooling in your applications:

Always use the same connection string to talk to the same database. This will ensure that all of those connections end up in the same pool.
Always call Close or Dispose on your SqlConnection object. This will ensure that the connection ends up back in the connection pool.
Open your connections as late as possible and close them as soon as possible. This will increase connection churn and should help avoid timeouts with other threads that are waiting for available connections.

This is all well and good, but if you have a large application that's performing poorly, how can you tell whether the problem is related to pooling or to connection leaks? Read on.

Tracking Down Connection Leaks

Frankly, tracking down connection leaks can be a real pain. I spent days trying to find the cause of performance and scalability problems in a Web application that were actually caused by several connection leaks. The problem was that the application ran fine for a single user , but as additional users added stress, the application suddenly hit a wall. The worst part was that the application didn't really look like it was working all that hard, nor did the database. (In fact, processor use was next to nothing on both boxes.)

Symptoms of a potential connection leak

You should look out for four major symptoms when you test your application under various loads. If any of the following scenarios is true, you might have one or more connection leaks:

Database connections continually increase under light application stress until the connection pool size limit is reached.
Under a constant but heavier load, your application starts creating more and more connections, eventually hitting the pool size limit.
Your application is throwing timeout exceptions from the connection's Open method.
Your application runs fine for a single user but quickly croaks as more users are added.
Your application is running slowly under stress and is exhibiting low processor usage.

Avoiding connection leaks

The simplest way to avoid connection leaks is to always close your connections. Easy, right? OK. Let's think through a scenario. You've gone through your application and have closed all of the connections explicitly. Great. But you notice that, under stress, your application still exhibits some connection leak behavior (albeit at a much slower rate). What have you done wrong? It turns out that closing a connection isn't always so simple. Look at the following example:

 DimconnAsNewSqlConnection(connectionString) conn.Open() DoStuff(conn) conn.Close()

Is this a form of potential connection leak? You bet! The problem is in the DoStuff section. Let's say an exception is thrown ”which is certainly not out of the question. Will Close be called? No. The reference to the connection object will be lost, and it will be up to the garbage collector to clean up, when it gets around to it. So this code can be deceiving. It seems to be doing all of the right things, but it doesn't handle itself well in an error situation. Some might say that you should just fix DoStuff , but that misses the point. Bad things always happen in applications. As developers, we all need to plan for success as well as failure. In this case, the exception might be thrown by invalid input (from a hacker, maybe). If that input is repeated, it might cause a denial of service attack because your application will eventually run out of available connections and stall.

The best way out of this situation is to use a Try...Finally block to ensure that your connections will always be closed, regardless of what happens. In this way, your code will behave as expected in the case of either success or failure. The following example demonstrates this:

 DimconnAsNewSqlConnection(connectionString) Try conn.Open() DoStuff(conn) Finally conn.Close() EndTry

Note that I don't open the connection in the Try block ”because I don't care about the Open method failing. If it fails, no connection will be opened, so I don't need to close it. This doesn't mean that you can't nest Try blocks to deal with an exception. In that case, you might try something like the following:

 DimconnAsNewSqlConnection(connectionString) Try conn.Open() Try DoStuff(conn) Finally conn.Close() EndTry CatchexAsException Debug.WriteLine(ex.ToString()) EndTry

Using a DataReader with a Stored Proc Whenever Possible

Microsoft ADO.NET provides many ways of getting data from a database to your application ”too many ways. For the best performance, you should use only database stored procedures on the server and use a DataReader only in your application. The DataReader is the most efficient way to get data because it reads the incoming information "right off the wire." It is a forward-only data model but has minimal overhead. Essentially, the DataReader processes only the current record from the database connection. Each call to Read causes the DataReader to process only the next immediate row of data and throw away the previous row. You can't get much more efficient than that.

 DimconnAsNewSqlConnection(connString) DimcmdAsNewSqlCommand("MyStoredProc",conn) DimrdrAsSqlDataReader conn.Open() Try rdr=cmd.ExecuteReader() Whilerdr.Read() Console.WriteLine(rdr(0)) EndWhile rdr.Close() Finally conn.Close() EndTry

The other data models in ADO.NET all require a lot more processing overhead. Of course, they're also more flexible. But for sheer performance, you can't beat a DataReader .

Using Ordinals Instead of Column Names

One of the most common performance mistakes made in database applications is using column names to access row column data. We've all created programs that have done this, at one time or another. (This is mostly a classic ADO bad practice, but ADO.NET makes it just as easy to do.) The following example should feel fairly familiar to most readers:

 Whiledr.Read() sb.AppendFormat("CustomerID='{0}' ",dr("CustomerID")) sb.AppendFormat("CompanyName='{0}' ",dr("CompanyName")) sb.AppendFormat("ContactName='{0}' ",dr("ContactName")) sb.AppendFormat("ContactTitle='{0}'",dr("ContactTitle")) sb.AppendFormat("Address='{0}' ",dr("Address")) sb.AppendFormat("City='{0}' ",dr("City")) sb.AppendFormat("Region='{0}' ",dr("Region")) sb.AppendFormat("PostalCode='{0}' ",dr("PostalCode")) sb.AppendFormat("Country='{0}' ",dr("Country")) sb.AppendFormat("Phone='{0}' ",dr("Phone")) sb.AppendFormat("Fax='{0}'{1}",dr("Fax"),vbCrLf) EndWhile

Using strings to access column data is a horribly bad idea. The most efficient and appropriate method is to use the actual column numbers (ordinals) to access the data. This is extremely efficient and is a huge time-saver. It frees ADO.NET from having to look up the column name every time. The following example demonstrates this:

 Whiledr.Read() sb.AppendFormat("CustomerID='{0}' ",dr(0)) sb.AppendFormat("CompanyName='{0}'",dr(1)) sb.AppendFormat("ContactName='{0}' ",dr(2)) sb.AppendFormat("ContactTitle='{0}' ",dr(3)) sb.AppendFormat("Address='{0}' ",dr(4)) sb.AppendFormat("City='{0}' ",dr(5)) sb.AppendFormat("Region='{0}' ",dr(6)) sb.AppendFormat("PostalCode='{0}' ",dr(7)) sb.AppendFormat("Country='{0}' ",dr(8)) sb.AppendFormat("Phone='{0}' ",dr(9)) sb.AppendFormat("Fax='{0}'{1}",dr(10),vbCrLf) EndWhile

The one major problem with the example is that you run into problems if someone changes the column ordering of the database query. If a column is added in the middle of the column collection or two columns are switched, bad things will happen. So intuitively this feels more fragile, albeit faster, than using the column names. Using column names at least affords you the flexibility of not knowing the exact column positions in advance. Thankfully, a happy medium is available: IDataRecord.GetOrdinal , which is implemented by both SqlDataReader and OleDbDataReader .

Essentially, the IDataRecord.GetOrdinal method allows you to perform an ordinal lookup for a column name. You can store the value of an ordinal in a variable once and then reuse it in a loop. This is advantageous because it allows for column position flexibility, while at the same time minimizing the use of column name lookups. So this really is the best of both worlds . The following example demonstrates this:

 DimCustomerIDAsInteger=dr.GetOrdinal("CustomerID") DimCompanyNameAsInteger=dr.GetOrdinal("CompanyName") DimContactNameAsInteger=dr.GetOrdinal("ContactName") DimContactTitleAsInteger=dr.GetOrdinal("ContactTitle") DimAddressAsInteger=dr.GetOrdinal("Address") DimCityAsInteger=dr.GetOrdinal("City") DimRegionAsInteger=dr.GetOrdinal("Region") DimPostalCodeAsInteger=dr.GetOrdinal("PostalCode") DimCountryAsInteger=dr.GetOrdinal("Country") DimPhoneAsInteger=dr.GetOrdinal("Phone") DimFaxAsInteger=dr.GetOrdinal("Fax") Whiledr.Read() sb.AppendFormat("CustomerID='{0}' ",dr(CustomerID)) sb.AppendFormat("CompanyName='{0}'",dr(CompanyName)) sb.AppendFormat("ContactName='{0}'",dr(ContactName)) sb.AppendFormat("ContactTitle='{0}'",dr(ContactTitle)) sb.AppendFormat("Address='{0}' ",dr(Address)) sb.AppendFormat("City='{0}' ",dr(City)) sb.AppendFormat("Region='{0}' ",dr(Region)) sb.AppendFormat("PostalCode='{0}' ",dr(PostalCode)) sb.AppendFormat("Country='{0}' ",dr(Country)) sb.AppendFormat("Phone='{0}' ",dr(Phone)) sb.AppendFormat("Fax='{0}'{1}",dr(Fax),vbCrLf) EndWhile

Yet again, we see a situation in which more code can actually be faster. The first example had less code but required more work at run time, whereas the last example minimized the actual amount of work done by paying a one-time, up-front penalty (looking up the column names) instead of paying for it each time through the loop.

I l @ ve RuBoard