Performance and Scalability Optimizations


If your organization is like most others, the ADO.NET applications you develop will get their data from a relational database, such as SQL Server or Oracle. As a result, the performance and scalability of your applications depend not only on the efficiency of the code you write, but also (and more importantly) on the techniques you employ to actually retrieve and maintain the data. This section details those techniques, grouped into several categories, starting with those applied closest to the data and moving outward to the managed code.

Query Techniques

This set of techniques can be applied to the actual statements executed against the data store and can be used to increase the raw performance of an application.

Ask for the Appropriate Rows

Probably the most immediate way you can increase the performance of your applications is to ask for only the data that's absolutely required. Many applications spend a lot of time querying for data that the user will never see, but that "needs" to be available in case the user wants to scroll to it. Constantly retrieving hundreds or thousands of rows not only puts undue strain on the database server that must fulfill the request, but also consumes unnecessary network bandwidth and resources on the middle-tier component or Web server that must, for example, populate the DataSet and store it in memory. Generally speaking, most applications rarely need to query more than 50 rows at a time.

Note

The techniques we discussed on Day 3, "Working with DataSets," for filtering DataSet objects using a DataView and the Select and Find methods should then be applied to these smaller and more appropriate result sets.


As a result, you should employ WHERE clauses in almost all your SELECT statements and return only enough rows to satisfy the immediate need of the user. You can then requery the database if and only if the user requires more data. As we discussed on Day 16, "ADO.NET in the Presentation Services Tier," in ASP.NET applications, you can take advantage of the custom paging behavior of the DataGrid control to execute statements that return only the relevant data for the particular page.
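As a sketch of this technique (the Titles table, its columns, and the connection string here are hypothetical, not tied to any particular schema), a parameterized SELECT that returns only one page of rows might look like this:

```csharp
// Sketch only: table and column names are hypothetical.
using System;
using System.Data;
using System.Data.SqlClient;

public class TitlesData
{
    // Return at most one page of rows, filtered on the server
    public static DataSet GetRecentTitles(string connString, DateTime since)
    {
        string sql = @"SELECT TOP 50 Title, Price, PubDate
                       FROM Titles
                       WHERE PubDate >= @since
                       ORDER BY PubDate DESC";
        using (SqlConnection con = new SqlConnection(connString))
        {
            SqlDataAdapter da = new SqlDataAdapter(sql, con);
            da.SelectCommand.Parameters.Add(
                "@since", SqlDbType.DateTime).Value = since;
            DataSet ds = new DataSet();
            da.Fill(ds, "Titles"); // Fill opens and closes the connection
            return ds;
        }
    }
}
```

If the user pages forward, the method can be called again with a narrower WHERE clause rather than having retrieved everything up front.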

A second point that should be stressed is that you should remember that applications typically perform better when they ask for summarized data directly from the database server rather than retrieving all the detail rows and then summarizing them on the middle-tier server or client machine. Not only is the database server optimized for just these kinds of operations, but retrieving summarized data also decreases both the network bandwidth required and the resource requirements on the middle tier or client.
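To illustrate the point with a hypothetical Sales table, a server-side aggregate returns a handful of summary rows instead of the thousands of detail rows it was computed from (con is assumed to be an open SqlConnection):

```csharp
// Sketch only: the Sales table and its columns are hypothetical.
string sql = @"SELECT StorId, SUM(Qty) AS TotalQty
               FROM Sales
               GROUP BY StorId";
SqlCommand cmd = new SqlCommand(sql, con);
SqlDataReader dr = cmd.ExecuteReader();
while (dr.Read())
{
    // Only the summarized rows cross the network
    Console.WriteLine("{0}: {1}", dr["StorId"], dr["TotalQty"]);
}
dr.Close();
```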

Ask for the Appropriate Columns

As a corollary to the previous technique, you should also ask for only the columns that the application requires. In other words, use specific column lists in your SELECT statements rather than simply using SELECT * . This not only makes your code easier to read because it's more explicit, but also ensures that the database server needn't go to the extra work of retrieving columns that will never be displayed or manipulated.


This technique is particularly important when your database server supports BLOB or long text columns that might be stored in separate data structures from the rest of the row. Asking for those columns when they're not needed simply wastes CPU cycles on the database server as the data is found and retrieved. Further, some database servers such as SQL Server support covered queries that can greatly increase performance. In a covered query, the database server needn't access the actual rows in the table on the disk if the query asks for only data found in the index the server is using to satisfy the query. And so, coupled with the appropriate use of indexes, asking for only those columns that are required can shorten the amount of time the database needs to fulfill the query.

Perform Joins on the Server

One of the things that relational database engines are particularly efficient at is joining tables together based on foreign key relationships. This is fortunate because, in a normalized database, joins are often required to display a complete logical record that is stored across multiple tables.

However, as you learned on Day 4, "DataSet Internals," in ADO.NET, you also have the ability to retrieve multiple sets of data and place each of them in a DataTable within a DataSet . Then you can use DataRelation objects to relate the tables in order to traverse the relationships programmatically, or show the relationship graphically in controls such as the Windows Forms DataGrid .

Although it's possible to use DataSet objects in this way, you should do so only when the data needs to be displayed in a master/detail relationship or is retrieved from multiple data sources and combined in the same DataSet . For most applications, it will be more efficient and simpler to join the related tables on the server using a JOIN or WHERE clause, and then read the joined result set directly into a DataSet or through a data reader. This is the case because the database server can use indexes on the server to join the related rows quickly, multiple result sets needn't be returned to the client, and only one result set need be loaded or traversed.
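A sketch of the server-side approach (table and column names hypothetical) joins the rows before they leave the database and loads a single flat result set:

```csharp
// Sketch only: join on the server and load one result set,
// rather than relating two DataTables on the client.
using System.Data;
using System.Data.SqlClient;

public class JoinedTitles
{
    public static DataSet GetTitlesWithPublishers(string connString)
    {
        string sql = @"SELECT t.Title, t.Price, p.PubName
                       FROM Titles t
                       INNER JOIN Publishers p ON t.PubId = p.PubId";
        using (SqlConnection con = new SqlConnection(connString))
        {
            SqlDataAdapter da = new SqlDataAdapter(sql, con);
            DataSet ds = new DataSet();
            da.Fill(ds, "TitlesWithPublishers"); // one table, already joined
            return ds;
        }
    }
}
```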

Note

Of course, if you're going to update the data through a DataSet , the insert, update, and delete commands for the data adapter, along with their SourceColumn properties, need to be aware of which columns will be used to update which tables.


Database Techniques

This second set of techniques applies to higher-level design issues you can apply to the database itself. These are in addition to the database design issues we discussed on Day 13, "Working with SQL Server."

Make Sure That Indexes Are Employed

Many times during the development and testing phases of a project, developers and testers will use small data sets for their application. Although this is convenient and often necessary due to hardware limitations, it can mask performance problems that appear only when the application is put into production. Chief among these problems is the time it takes the database server to retrieve or update rows in larger tables.


For a database server to efficiently access data in large tables, it relies on separate data structures (indexes) that are easily navigated and that typically point to the actual data in the rows. Without an appropriate index for a particular query, the server resorts to examining each row in the table. This is referred to as a table scan. Obviously, in tables with hundreds of thousands or millions of rows, table scans will be extremely slow, even on the best hardware available.

Note


As we discussed on Day 13, SQL Server allows you to create both clustered and nonclustered indexes. The latter is a separate data structure that points to the row data, whereas the former actually organizes the entire table on the disk into one big index. Clustered indexes are appropriate when you typically access a particular table in ranges (such as when the BETWEEN clause is used) or when the results are typically ordered using an ORDER BY clause.


As a result, even if everything else in your application is as efficient as possible, not using indexes will ruin its performance because the time your application spends waiting for data far exceeds any other single task.

To make sure that your queries and statements use indexes, follow the suggestions we discussed on Day 13. These included tips on determining which columns to index, along with the tools provided by the database server, such as the Index Tuning Wizard and the SET SHOWPLAN ON statement in SQL Server.

Use Stored Procedures

As outlined on Day 13, employing stored procedures or other constructs that encapsulate SQL statements usually increases performance. This is because the procedures are precompiled on the server, result in less network traffic, and make it easier to create the appropriate indexes for your applications. Even if you don't use stored procedures, using standardized SQL statements and the sp_executesql stored procedure in SQL Server also increases performance by allowing SQL Server to reuse the execution plans saved in its cache.
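Calling a stored procedure from ADO.NET simply means setting the command's CommandType; the procedure and parameter names below (usp_GetTitles, @Publisher) are hypothetical:

```csharp
// Sketch only: usp_GetTitles is a hypothetical stored procedure.
// con is assumed to be an open SqlConnection.
SqlCommand cmd = new SqlCommand("usp_GetTitles", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@Publisher", SqlDbType.VarChar, 40).Value = "Sams";
SqlDataReader dr = cmd.ExecuteReader();
```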

Favor Output Parameters over DataSet Objects

On Day 10, "Using Commands," you learned how to catch both return values and output parameters from stored procedures. Because output parameters can typically use any data type supported by the database server, and also because you can have multiple output parameters for a single stored procedure, they can be used to return a single value or a related set of values to a client.

Although the code required to use output parameters is slightly more complex on both the client and the server, using them instead of returning a single-row result set in a DataSet results in better performance. This is the case because the server needn't create a result set, and the overhead of populating a DataSet object is eliminated. In SQL Server, using output parameters performs almost identically to using the ExecuteScalar method of the command object to retrieve a single value, or to retrieving a single row with a data reader. However, this might not be the case with all database servers, so performing a few tests is warranted.
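As a sketch, assuming a hypothetical usp_GetTitleCount procedure that returns its result in an @Count output parameter, the client-side code looks like this:

```csharp
// Sketch only: procedure and parameter names are hypothetical.
// con is assumed to be a SqlConnection.
SqlCommand cmd = new SqlCommand("usp_GetTitleCount", con);
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.Add("@Publisher", SqlDbType.VarChar, 40).Value = "Sams";
SqlParameter pCount = cmd.Parameters.Add("@Count", SqlDbType.Int);
pCount.Direction = ParameterDirection.Output;
con.Open();
cmd.ExecuteNonQuery();          // no result set is created
int count = (int)pCount.Value;  // read the output parameter instead
con.Close();
```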

Tip

Remember that you can pass the SingleRow value of the CommandBehavior enumeration to the ExecuteReader method of the command object. .NET Data Providers can then use this information to optimize retrieval of the data. The OleDb provider does this by using the IRow rather than the IRowset interface to bind the data.
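For instance, a single-row lookup might be written as follows (the query and key value are hypothetical):

```csharp
// Sketch only: hint to the provider that at most one row comes back.
// con is assumed to be an open SqlConnection.
SqlCommand cmd = new SqlCommand(
    "SELECT Title, Price FROM Titles WHERE TitleId = @id", con);
cmd.Parameters.Add("@id", SqlDbType.VarChar, 6).Value = "BU1032";
SqlDataReader dr = cmd.ExecuteReader(CommandBehavior.SingleRow);
if (dr.Read())
{
    string title = (string)dr["Title"];
}
dr.Close();
```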


Managed Code Techniques

The final set of techniques applies to the managed code you write and can be used to optimize performance on the middle-tier server.

Use Data Readers

As documented in the article "Performance Comparison: Data Access Techniques" published on the MSDN Web site, data readers typically offer the highest throughput (measured in requests per second) and the lowest response time. In fact, the more rows that are returned, the greater the data reader's advantage becomes (it can nearly double the throughput). Data readers also lessen the resource load on the middle-tier server because the values aren't stored in memory as they are with a DataSet. As a result, in applications in which the data access code lives in the same application domain as the client code that uses it, you should consider returning data readers through the data services tier to allow clients to stream through data quickly.

Tip

Keep in mind that if you return a data reader, you should pass the CloseConnection value of the CommandBehavior enumeration to the ExecuteReader method to allow the client to close the connection automatically when the data reader is closed.


However, using data readers efficiently implies that you close them as quickly as possible in order to release connections back to the pool, because a data reader ties up its database connection for as long as it's being processed.
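A data access method that returns a reader can combine both ideas; this sketch assumes a hypothetical Titles query and connection string:

```csharp
// Sketch only: return a reader that carries its connection with it.
using System.Data;
using System.Data.SqlClient;

public class TitlesReader
{
    public static SqlDataReader GetTitles(string connString)
    {
        SqlConnection con = new SqlConnection(connString);
        SqlCommand cmd = new SqlCommand(
            "SELECT Title, Price FROM Titles", con);
        con.Open();
        // The connection is released to the pool when the reader is closed
        return cmd.ExecuteReader(CommandBehavior.CloseConnection);
    }
}

// Caller: stream through the rows and close promptly
// SqlDataReader dr = TitlesReader.GetTitles(connString);
// try { while (dr.Read()) { /* process the row */ } }
// finally { dr.Close(); }  // also closes the pooled connection
```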

Use Narrow Providers Where Possible

Because narrow providers like SqlClient can use native protocols to communicate with their data stores, they'll likely provide the fastest access to the data store. This can increase both performance (because individual clients will experience better response times) and scalability (applications will scale because resources such as connections are released more quickly). In fact, much of the impetus for creating the ProviderFactory and DataFactory classes on Days 17, "ADO.NET in the Data Services Tier," and 18, "Building a Data Factory," resulted from the desire to use a single code base and employ narrow providers rather than resort to using a broad provider such as OleDb.

Focusing only on the provider when writing managed code stands in sharp contrast to the past, when the only way to achieve better performance was to use a lower-level language and interfaces, such as C++ with the OLE DB COM interfaces, instead of the combination of VB 6.0 and ADO 2.x. Because all managed code, regardless of language, ultimately compiles to intermediate language and is JIT compiled, there are no performance differences between the languages themselves. The interchangeable use of VB and C# in this book attests to that fact.

Minimize the Number of Round Trips

One of the problems developers often unknowingly ran into when using ADO 2.x was that it was easy to incur extra round trips to the database server by inadvertently opening server-side cursors. In fact, in many cases, the client code would execute a stored procedure against the database server for each and every row returned to the client. Obviously, this situation hurts both scalability and performance because the time required to retrieve even small result sets is multiplied by the latency of the network, and connections are held longer than necessary.

Because ADO.NET doesn't expose a server-side data access model, this particular problem is avoided. However, the same rule applies to the explicit code you write as well. In other words, you should always strive to minimize the number of times you request data from the data store because the time it takes to make a round trip to the data store is typically among the highest-cost operations your applications will perform. In addition, it decreases scalability by using more resources on the database server. One way to avoid incurring extra round trips is to take advantage of various caching techniques on the middle-tier server or client machine. At the same time, this must be balanced against consuming an undue amount of resources on the middle-tier by, for example, creating large DataSet objects and storing them in session state. Usually, only careful testing will reveal the proper balance for a particular application.

Note

The need to minimize round trips is one of the reasons (along with the fact that command builders don't use stored procedures) I recommend you don't use command builders such as OleDbCommandBuilder. When the Update method of the data adapter is called, a command builder must make a round trip to the data store to discover the schema information of the base table so that it can create the insert, update, and delete commands.


Always Use Connection Pooling

One of the best ways to increase scalability (although not performance) of an application is to make sure to take advantage of connection pooling. As we discussed on Day 10, both the SqlClient and OleDb providers allow connections to be stored and retrieved from a pool. This increases scalability because the expensive operation of constantly creating new connection objects is avoided. Although dependent on the design of the application, you would typically expect an application that used connection pooling to consume only one database connection for every five or more concurrent users.

Tip

To monitor the number of connections both in and out of the pools for SqlClient, you can use the Performance Monitor utility found in the Administrative Tools group and select the counters under the .NET CLR Data performance object.


Keep in mind that for connection pooling to work, the connection string your application uses must be identical from call to call, and all connections must be made with the same security credentials. Connection pooling is turned on by default for both SqlClient and OleDb, although it can be disabled by setting the Pooling attribute to false in a SqlClient connection string or the OLE DB Services attribute to -2 in an OleDb connection string.
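The following connection strings illustrate both points; the server and database names are placeholders:

```csharp
// Pooling is keyed on the exact connection string: these two strings
// name the same database but differ in attribute order, so they map
// to two DIFFERENT pools. Standardize the string in one place.
string cs1 = "server=MyServer;database=pubs;trusted_connection=yes";
string cs2 = "database=pubs;server=MyServer;trusted_connection=yes";

// Opting out of pooling explicitly:
string noPoolSql = cs1 + ";Pooling=false";            // SqlClient
string noPoolOle = "Provider=SQLOLEDB;Data Source=MyServer;" +
                   "Initial Catalog=pubs;Integrated Security=SSPI;" +
                   "OLE DB Services=-2";              // OleDb
```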

Take Advantage of ASP.NET Caching

For ASP.NET applications, one of the most effective ways to increase the responsiveness and scalability of an application is to take advantage of its caching engine. In an ASP.NET application, you can take advantage of the cache using page and fragment caching as well as manipulating the cache directly.

Both page and fragment caching can be implemented using the OutputCache page directive in a Web form or user control in order to cache the output for the entire page or merely the portion rendered by the user control. This is the technique used in Listing 19.5, and, as you might imagine, is most useful when the page or control is built using an expensive database query.

Note

Although not used in Listing 19.5, the VaryByParam attribute of the directive is particularly useful for page and user control content generated from database queries because queries typically accept parameters. In addition, the directive supports VaryByHeader and VaryByCustom attributes to support caching for particular HTTP headers, browser types, and even custom information you specify.


If you need more granular control of the elements in the cache, you can manipulate the cache programmatically using the System.Web.Caching.Cache object exposed through the Cache property of the HttpContext object. Using this property, you can add objects, such as DataGrid controls and DataSet objects, to the cache. Through the Add method or overloads of the Insert method, you can even age-out items, invoke a callback method when an item is removed, and create dependencies between items.

For example, to add a DataGrid control called dgTitles to the cache and then pull it from the cache if it's already populated, you could use the following snippet of C# code in the Load event of the page:

 if (this.Context.Cache["dgTitles"] == null)
 {
    // Populate the grid, then cache it
    this.Context.Cache.Insert("dgTitles", dgTitles, null);
 }
 else
 {
    // Cache items are stored as Object, so cast back to DataGrid
    dgTitles = (DataGrid)this.Context.Cache["dgTitles"];
 }

Note

Keep in mind that the ASP.NET cache is not tied to a particular user as is the session object. As a result, you would need to append user-specific information to the key name if you want to store user-specific data.


Use Multithreading for Long Queries

Although ADO.NET doesn't support multithreaded operations natively, it's relatively simple to add multithreading to an application that uses ADO.NET. This is particularly effective for Windows Forms-based applications when you know that you'll have long-running queries that can be completed while the user performs some other useful work.

In the .NET Framework, there are two primary means of doing work on multiple threads: using the Thread class and using asynchronous delegates.

Using the Thread Class

The Thread class can be found in the System.Threading namespace and enables you to spawn and control threads explicitly. For example, if you want to execute a method called GetTitles on a background thread, you could use the following code snippet:

 Thread tTitles = new Thread(new ThreadStart(this.GetTitles));
 tTitles.Name = "TitlesQuery";
 tTitles.Priority = ThreadPriority.BelowNormal;
 tTitles.Start();
 // Foreground thread is free

In this case, an instance of the ThreadStart delegate that points to the GetTitles method is passed to the constructor of the Thread object tTitles . The object can then have its properties, such as Name and Priority , set before invoking the Start method.

The Name property allows the thread's name to show up in the Debug Location toolbar in the VS .NET IDE when in debug mode, whereas the Priority property requests that the operating system run the thread at one of five priorities set using the ThreadPriority enumeration.

Note

In VB, you would use the AddressOf keyword to point to the GetTitles method in the constructor of Thread so that you wouldn't have to create the ThreadStart delegate explicitly.


After the Start method is invoked, the foreground thread is free to continue other work. Because the thread can still be referenced with the tTitles variable, however, it can be manipulated using its Suspend , Resume , Abort , Interrupt , and Join methods. The Join method is particularly interesting because it blocks the foreground thread until the other thread completes or a specified amount of time has elapsed. In this way, you can synchronize the activities of the foreground and background threads.

Using Delegates

The second technique for doing work on multiple threads is built into the .NET Framework and uses delegates to handle thread management and underlying infrastructure. This asynchronous delegate pattern is found throughout the .NET Framework.

The interesting aspect of this model is that it's client driven. This means that clients determine whether to call a method synchronously or asynchronously, either using methods provided by the server class (the class doing the work) or through an asynchronous delegate. For example, classes in the .NET Framework, such as FileStream and the proxy class generated by VS .NET for calling a Web service, provide Begin and End methods in addition to the standard methods that provide the functionality. These methods can be used to start and finish an asynchronous call. In the case of FileStream, this means that it supports both BeginRead and BeginWrite methods that can be used asynchronously, in addition to the Read and Write methods that are used synchronously. Typical examples of where Begin and End methods already exist in the framework include network IO, remoting, and messaging.

However, if the class you want to call doesn't support Begin and End methods, you can still call one of its methods asynchronously by simply creating and invoking a delegate. This would be the case when you simply want to call a method like GetTitles of a data access class asynchronously. To do so, you can use the BeginInvoke and EndInvoke methods exposed by the delegate to call the method asynchronously.

Whether you're using server classes that expose Begin and End methods or calling a method asynchronously through a delegate, the pattern is basically the same.

First, the caller creates a delegate of type AsyncCallback that points to the method to be called when the asynchronous operation finishes. This provides notification that the thread has completed its work. For example, if you want to call the GetTitles method asynchronously, you could be notified through a method called GetTitlesDone like so:

 AsyncCallback cb = new AsyncCallback(this.GetTitlesDone); 

Next, the client code invokes the asynchronous operation and passes it the delegate either through the Begin method exposed by the server class or the BeginInvoke method if you're working with a class that doesn't expose them. If the data access class that contains the GetTitles method doesn't support the Begin and End methods, you would use a delegate as shown in the following code snippet:

 // Declared at the class level
 public delegate void GetTitlesAsync(string publisher);

 // In a method
 GetTitlesAsync dgetTitles = new GetTitlesAsync(this.GetTitles);
 dgetTitles.BeginInvoke("Sams", cb, null);

Note that the delegate is declared at the class level of the client code and can specify arguments passed to the method.

Note

Passing an argument to the delegate points out a key difference between using delegates and using threads explicitly. It's more difficult to pass arguments to an explicit thread because the ThreadStart delegate can only point to methods that don't take arguments. As a result, you'd have to resort to using thread local storage (TLS) to pass the thread its values, a topic beyond the scope of this book. For more information, see Chapter 12 of my book Building Distributed Applications with Visual Basic .NET, published by Sams.


The delegate is then instantiated in the calling code and executed using the BeginInvoke method of the delegate. Note that the arguments to the GetTitles method, in this case "Sams," are passed first. They are followed by the AsyncCallback object and a state object that can be populated with any other state information you want to pass to the method. At this point, the thread that called BeginInvoke will be free to perform other operations as the execution of the GetTitles method continues on a separate thread managed by the common language runtime.

Finally, when the operation completes, the GetTitlesDone method will be called through the delegate, as shown in Listing 20.1.

Listing 20.1 Using a callback. This method is called by the asynchronous delegate when the operation completes.
 public void GetTitlesDone(IAsyncResult ar)
 {
     // Extract the delegate from the AsyncResult
     AsyncResult result = (AsyncResult)ar;
     GetTitlesAsync gt = (GetTitlesAsync)result.AsyncDelegate;
     // End the operation
     gt.EndInvoke(ar);
     // Switch back to the UI thread before updating the interface
     MethodInvoker mi = new MethodInvoker(this.UpdateUI);
     this.Invoke(mi);
 }

In Listing 20.1, you'll notice that the GetTitlesDone method must accept one parameter of type IAsyncResult. The IAsyncResult interface exposes several members that can be used to determine whether the operation completed synchronously, to retrieve the state passed to the Begin method, and even to poll or wait for the completion of the operation. In this case, the ar object is cast to an AsyncResult object from the System.Runtime.Remoting.Messaging namespace. The AsyncResult object then exposes the delegate that originally started the operation in its AsyncDelegate property. By casting it to the GetTitlesAsync delegate, the delegate can be retrieved and its EndInvoke method called to signal the completion of the operation. If the GetTitles method returned a value or used arguments passed by reference, they could be retrieved through the EndInvoke method as well.

After the operation is complete, control can be returned to the foreground thread. In a Windows Forms application, you might first want to update the user interface by calling a custom method such as UpdateUI on the form. Because the callback executes on a background thread, you can use the MethodInvoker delegate from the System.Windows.Forms namespace together with the form's Invoke method, as shown in Listing 20.1.

Although the pattern shown here is the one you'll most often use with asynchronous operations, the Begin methods can be called without the AsyncCallback delegate by passing a null value (Nothing in VB .NET). In that case, you'll need to poll for the completion of the operation using the IsCompleted property of the IAsyncResult interface returned from the BeginInvoke method; simply call the EndInvoke method (which blocks the current thread until the operation completes); or use the WaitHandle returned from the AsyncWaitHandle property of the IAsyncResult interface to wait for completion. In the first and third cases, you'll then need to explicitly call the End method.
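A sketch of the no-callback variants, reusing the hypothetical GetTitlesAsync delegate from the earlier snippets:

```csharp
// Sketch only: start the call with no AsyncCallback
IAsyncResult ar = dgetTitles.BeginInvoke("Sams", null, null);

// Option 1: poll while doing other work
while (!ar.IsCompleted)
{
    // perform other foreground work here
}

// Option 2: block on the wait handle (here for up to 5 seconds)
// ar.AsyncWaitHandle.WaitOne(5000, false);

// In either case, finish by explicitly calling EndInvoke
dgetTitles.EndInvoke(ar);
```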

Caution

No matter which technique you use, be careful not to introduce locking contention into your application. In other words, ideally each thread should access only objects and data private to it. If that can't be avoided, you can use the Monitor class in the System.Threading namespace to provide synchronized access to objects.




Sams Teach Yourself ADO.NET in 21 Days
ISBN: 0672323869
Year: 2002
Pages: 158
Author: Dan Fox