Understanding Basic Data Access Architectures


This section discusses several different application architectures, based on how they access data. No, not all applications need a DBMS engine (as I discussed earlier), but almost every application you design or build accesses data. This could be as simple as saving configuration settings or user preferences to a local file or as complex as extracting data from a web-based XML Web Service or a complex relational database. Each of the architectures discussed here has advantages and disadvantages. Each requires a different skill level, resource demands, and interdependencies. It's these interdependencies that can make applications especially complex. When your application starts depending on other platforms, people, and shared resources, it needs to be programmed to deal with a new set of realities. In our experience, applications most often fail because they don't adapt when their environment changes: when business situations change, when companies merge, when systems go down, when business gets bad, when business gets good. In other words, when stuff happens. For example, if your application depends on a specific database that becomes unavailable or simply changes in some way, your application had better be prepared to deal with this reality. No, it's unrealistic to expect your application to automatically morph itself to the new data structure, but it does make sense to use a development scheme that can more easily adapt to changing schemas, especially during the development process. Yes, object-oriented programming and strong type checking are a good start, but the question remains: What does your application do when it finds that the data structure it expects has changed, is missing, or is temporarily unavailable? What should your application do when the Net connection disappears? Perhaps your code expects an integer but is sent a float or a long. While many development shops can't afford to write applications that deal with every possible contingency, these caveats should keep you aware of the issues and encourage you to take all the precautions you can to help recover from the application crashes that inevitably occur.

At this point, you should understand many of the issues that all database software architects face, whether they're designing an application for a team of developers or simply building a tool for themselves. The following sections briefly describe each of these architectures and their principal attributes and limitations.

IMHO

The simplest application to build and maintain has the fewest number of interdependencies.


Multi-Tiered Designs

No book on application architectures would be worth much if it did not at least mention "tiered" application architecture design principles. This is not a new concept; it's been around for many decades. However, over the years, how architects view tiered architectures has evolved. We build tiered applications for many reasons. There are "good" reasons and "bad" reasons. A common "bad" reason is to complicate your application by breaking it into tiers because someone said it was the best thing to do. There are several good reasons, including separation of work, separation of logic, and database/data source independence. When you break up any application, you must create "interfaces" between the layers. In the COM days, one would create an object and expose properties and methods on that object that other tiers could consume. In the Service Oriented Architecture (SOA) world, one could create a stand-alone "service" (like an XML Web Service) that's accessed via the Simple Object Access Protocol (SOAP), which Visual Studio can help generate. Another approach is to simply move business logic into SQL Server implemented as stored procedures, rules, and triggers; I've used that approach for decades. Regardless of how you expose the objects, properties (attributes), or methods of your tiers, publishing and documenting these interfaces is of critical importance. If these interfaces change (as they often do during development), the "object" consumers must be informed of this change so they can adapt, recode, recompile, and redeploy their applications.

Basically, the idea behind 2-tier, 3-tier, or "n-tiered" applications is to:

  • Separate the logical parts of an application to permit different groups to work on the application. For example, business logic, user interface, and data access logic can be placed in separate tiers. This way, the "artsy" folks can work independently on the user interface and the "data" folks can work on the data tier logic.

  • Permit one or more tiers to be genericized so that they can be replaced at a later time. For example, one might create a UI for an English-speaking user and another for a Spanish-speaking user. Another example would be to create a data access module that permits switching between SQL Server and Oracle backend database engines.

  • Separate the parts of an application so they can be executed on separate processors or systems or remotely. Typically, I use this approach to improve performance and scalability by having each system execute just those tasks it is tuned to execute. For example, moving business rules to SQL Server stored procedures is a way to move logic into the "data" tier.

You can break these tiers down any way you want to, but generally they are broken down into the "presentation" (or UI) layer, the "business intelligence" (or BI) layer, and the "data" layer, as shown in Figure 1.1.

Figure 1.1. Three-tiered architecture.


Each of the architectures I discuss in this chapter can be constructed using a multi-tiered approach. Sure, some architectures and applications and development teams lend themselves better to segmented designs than others. Do multi-tiered designs make your applications execute faster? Usually not; this approach usually adds more overhead as the logic flow moves from tier to tier over the interface "bridges" between tiers. Does this approach make development easier? Well, yes and no. If you choose this approach, it means the interfaces between the layers must be defined early in the development process and rigidly adhered to, especially if different teams (possibly different companies) are expected to develop and implement the separate tiers independently. In other words, the teams have to be disciplined to work together. In complex projects, this approach can make the overall project easier (sometimes possible) to code; it might be the only way to address the problems.

Creating Database-Agnostic Applications

Oftentimes, I hear from folks who have tried to create a multi-tiered application designed to use either SQL Server, JET/Access, Oracle, or 3 x 5 cards as the data source. They hope to be able to "simply" replace the data tier and leave the BI and UI layers intact. Not many of these succeed. It turns out that there is more to a layer than just the interfaces. While it's possible to implement each of these layers with carefully constructed classes with "standard" interfaces (properties, methods, events, and arguments), what goes on behind the scenes is also very important. Sure, it's possible to create a generic one-size-fits-all (OSFA) data interface that simply constructs "appropriate" SQL to return a rowset or update a data table. However, most DBMS systems do many things differently; it's their competitive edge. If you discount the 3 x 5 card system (which has a distinct advantage in situations where the power fails), JET, SQL Server, and Oracle all implement (so-called ANSI) SQL differently. Okay, so the data tier can accommodate those differences as well as the differences in the .NET data provider, the connection string, security, and the data types each supports (or doesn't). But consider that JET, SQL Server, and Oracle all implement stored procedures very differently. Their administrative functions are also radically different, so backup, restore, compression, repair, transaction log management, index maintenance, and a dozen more issues make each backend very different. Sure, creating database-agnostic applications can be done. It's just going to be more expensive than writing dedicated applications that take advantage of the special features and sidestep the shortcomings of the backend server, especially if you consider that very few of the desirable backends are static: Most are evolving. With the exception of JET (I hope), all of these platforms are adding new features, new paradigms, and new ways to manage data more efficiently. A database-agnostic application will be hard-pressed to incorporate these new features as they are added.

Building Stand-Alone Applications

The simplest application to build and maintain has the fewest number of interdependencies. A healthy portion of the computer applications in use today are designed to be used by only one person at a time, what I categorize as a stand-alone application. These applications usually save data to a local file and have very few dependencies to confuse the logic. I expect there are a thousand stand-alone applications that save their data locally for every application that requires access to a DBMS of some kind. Consider that Excel is the most popular "database" in use today, even exceeding Access or other "real" DBMS applications. Excel applications depend on one or more spreadsheet "workbook" files to maintain the data state. However, there are even more applications that simply save the application data state or user preferences to a local file or the Registry. While stand-alone applications might collect information from the user and perhaps save and retrieve data, they don't always need a sophisticated DBMS to manage their data. Stand-alone "disconnected" applications include calculators, word processors, spreadsheets, accounting applications, games, thermostats, clock radios, and many others. Stand-alone applications also include custom applications written for and by your company to deal with special dedicated, often complex operations.

How many of you have looked over the counter at the screen used by hotel clerks, car dealers, travel agents, and other salespeople? I have. Clearly, one of the most widely used paradigms in use world-wide is not a Windows Forms or ASP application; it's CICS. CICS (Customer Information Control System) is the transaction server-based system that (predominantly) runs on IBM mainframes and uses a 3270-style vector-graphic (line-drawing) or character-based screen. I got my start in data processing working with these systems. Sure, some of these applications are hosted on Windows using 3270 emulators, but Visual Studio is nowhere to be found. I expect that many of these applications were first written in the 1970s and 1980s. Why are they still in use today? Because they work. They're reliable, fast, and well understood. Why am I mentioning this approach here? Because when it comes time to choose an application architecture, you should consider the experience these systems can bring to bear. You also are likely to have to compete with them or interface to the data they collect and query.

While you can consider that process control applications (designed to run elevators, robots, complex machines, security systems, and the six-disc CD changer in your car) are typically "single-user" stand-alone applications, I'm going to skip over this type of application for now. Perhaps I can write about process control application development in one of my EBooks; it's a fascinating subject.

Another platform for stand-alone applications is personal digital assistants (PDAs). Again, this is a rapidly expanding market, what I call a "target-rich" platform where there are lots of potential customers and more to come over the years. I bought over $300 worth of software for my PDA (and Smart Phones) in the last couple of years; just don't tell my wife. No, they weren't all games. The problem with this space is lack of support by the manufacturers. I've had to buy a new cell phone each time a new version of Windows came out for the platform; Motorola did not want to (or couldn't) upgrade. The techniques, limitations, and tools used to develop these applications are also fodder for another EBook, and beyond the scope of this book.

Sure, a stand-alone application usually needs to retrieve, manipulate, and save data. The question is, how is this done? Can these applications justify use of even a simple DBMS like JET or SQL Server Everywhere[13], or can they get away with building a custom data management system on their own? Well, the answer is yes; they can do either. For decades, I've designed and written a variety of applications that don't require anything but a simple file I/O data retrieval system. Nowadays, you can save data to a file using your own custom data structures in a .NET stream, so you don't need a DBMS to persist (save) and retrieve your data. To make this even more powerful, ADO.NET (as I'll show you later in Chapter 8) permits you to build stand-alone DataTable objects and save them to a stream in an XML format. This makes "rehydration" (the process of rebuilding the data structures from a file or other external source) very simple.

[13] Be sure to check out my EBook, Hitchhiker's Guide to SQL Server Everywhere. See the support site www.hitchhikerguides.net for more information.
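
Here's a minimal sketch of the idea, using a DataSet rather than a bare DataTable; the table, column, and file names are just placeholders, and the code assumes nothing more than the System.Data classes:

using System.Data;

// Build a simple stand-alone DataTable and persist it as XML--no DBMS required.
DataSet settings = new DataSet("LocalStore");
DataTable prefs = settings.Tables.Add("Preferences");
prefs.Columns.Add("Name", typeof(string));
prefs.Columns.Add("Value", typeof(string));
prefs.Rows.Add("Theme", "Dark");

// Persist the data (and its schema) to a local file...
settings.WriteXml(@"C:\Data\Preferences.xml", XmlWriteMode.WriteSchema);

// ...and "rehydrate" it later in another session.
DataSet rehydrated = new DataSet();
rehydrated.ReadXml(@"C:\Data\Preferences.xml", XmlReadMode.ReadSchema);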

IMHO

Keep in mind that XML file storage is not particularly efficient; sure, it's flexible and feature-rich, but not efficient.


Keep in mind that XML file storage is not particularly efficient. Even a small block of data can expand to several times its binary size when translated and encoded to Unicode (16 bits/character) and formatted as an XML data structure. However, with systems as fast as they are today and disk space relatively cheap, this should not present a big problem for a reasonably small number of data records. I say "records" because I'm not describing a relational database, but a flat-file data store; it's not even an indexed sequential access method (ISAM) database like JET, as the data structure does not include any indexes to speed data retrieval. Sure, if you use the XML Document Object Model and its associated methods, you can search XML data structures and retrieve selected data on demand.

As the number of data rows you plan to store in one or more local files increases, consider that ADO.NET supports quite a few methods and properties on its data structures (the DataSet, DataTable, and DataView) that make it easy to find, sort, sift, and filter the data once it's in memory. This means you might not need to graduate to a DBMS for some time, as you can build, save, and reload a complex DataSet from a number of separate files. However, this approach means that you won't be able to do cross-table "JOIN" queries on DataSet objects as you could with a relational DBMS engine, but you will be able to define your own inter-table relationships. Once the DataSet is created and populated, you can establish ADO.NET-managed parent-child relationships on its DataTables and use all of the relational features afforded to this in-memory DataSet "database," including binding and parent-child navigation and consistency validation.
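
To illustrate, here's a rough sketch of such an in-memory DataSet "database" with a parent-child relationship; the table and column names are invented for the example:

using System.Data;

// A sketch of an in-memory "database": two DataTables tied together with a DataRelation.
DataSet orders = new DataSet("OrderData");
DataTable customers = orders.Tables.Add("Customers");
customers.Columns.Add("CustomerID", typeof(int));
customers.Columns.Add("Name", typeof(string));
customers.PrimaryKey = new DataColumn[] { customers.Columns["CustomerID"] };

DataTable invoices = orders.Tables.Add("Invoices");
invoices.Columns.Add("InvoiceID", typeof(int));
invoices.Columns.Add("CustomerID", typeof(int));

// The relation enables binding, parent-child navigation, and consistency checks.
orders.Relations.Add("CustomerInvoices",
    customers.Columns["CustomerID"], invoices.Columns["CustomerID"]);

// Navigate from a parent row to its children without a JOIN.
customers.Rows.Add(1, "Bogus Boats, Inc.");
invoices.Rows.Add(100, 1);
DataRow[] children = customers.Rows[0].GetChildRows("CustomerInvoices");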

IMHO

Another advantage to the stand-alone data file approach is the lack of a DBMS engine.


Another advantage to the stand-alone data file approach is the lack of a DBMS engine. Because this type of application does not require installation of a separate DBMS engine (and its inherent compatibility, DLL Hell, and resource issues), your application should be easier to deploy and more stable once installed, no matter how many times MDAC is updated. If you look closely, you'll also discover that Visual Studio supports the ability to create DataSets (even strongly typed DataSets) based on "objects" that you define as classes. In other words, you'll be able to use many of the drag-and-drop code generators even though you aren't connected to a DBMS. I'll talk about this again in Chapter 6.

Deciding How Much Data to Store

So what's a "reasonable" number of records to persist in a flat-file data store? Well, when the user starts to complain about the time it takes to get the application going (as you rehydrate your DataSet-based "database"), you had better start thinking about a different, more efficient strategy. Unfortunately, once you've built a flat-file application and the entire support infrastructure around this approach, it's harder to "upgrade" to a DBMS than if you had started with a data engine in the first place. One performance alternative might be to switch to a binary file format instead of an XML structure, trading disk I/O volume for additional saving and rehydration code. However, this approach requires quite a bit more knowledge of binary code structures and more sophisticated persistence techniques.
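
As a sketch of one such alternative, the 2.0 Framework lets you serialize a DataSet in its binary rather than XML format; the file path here is just a placeholder:

using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Persist a DataSet in binary form (2.0 Framework) to cut file size and load time.
void SaveBinary(DataSet ds, string path)
{
    ds.RemotingFormat = SerializationFormat.Binary;   // the default is Xml
    using (FileStream fs = new FileStream(path, FileMode.Create))
    {
        new BinaryFormatter().Serialize(fs, ds);
    }
}

DataSet LoadBinary(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open))
    {
        return (DataSet)new BinaryFormatter().Deserialize(fs);
    }
}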

If you think your application might need a DBMS engine at some time in the near term, it's probably best to start out using one instead of trying to redesign six months after you deploy the new application. Because the basic architecture of a DBMS front-end application is different than an application where your own code carries much of the data access burden, you'll often find it's far more expensive to switch than to start out using a DBMS like MSDE or SQL Server Express. In the same way, if you think your single-user application might be required to access shared data, it's essential that you start from an architectural base that best supports this data access requirement.

Dealing with Hybrid Architectures

Sure, a number of applications are not "pure"; they often have to support stand-alone modes and keep working while the DBMS is unavailable, busted, or simply offline. For example, for insurance salesmen (the ones who are not geckos) or agents who work out of the office, it's not uncommon to use a local laptop-based data store that's synchronized with the central DBMS once a day or so. In this case, it's best to build your application as a shared-data application from the beginning and include routines that fall back on a local DBMS engine like SQL Express or simply log changes to an XML flat file (as I have been discussing) to be posted later. When it comes time to synchronize the data, you can let SQL Server do it for you using replication, or write custom routines that rehydrate a DataSet and post the changes to the DBMS using the Update method.

Figure 1.2. Client/server architecture.


Managing Client/Server Connections

The connection is a direct channel to SQL Server from the client. It carries the Tabular Data Stream (TDS) packets to and from the SQL Server query engine. These packets are used to transport T-SQL strings, which can contain a SQL SELECT, other T-SQL commands, or utility invocations (such as database maintenance [DBCC] or bulk copy [BCP] commands). Since SQL Server supports multiple T-SQL statements in a single string, you can group several T-SQL commands together (with or without intervening semicolons). Each returns its own resultset, as I discuss in Chapter 2.
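
As a quick illustration (the table names are borrowed from the familiar Northwind sample database, and connectionString is assumed to be defined elsewhere), a single command can carry a multi-statement batch, and the DataReader walks the resultsets with NextResult:

using System.Data.SqlClient;

// One round-trip carrying two T-SQL statements; each returns its own resultset.
using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(
    "SELECT CustomerID, CompanyName FROM Customers; " +
    "SELECT OrderID, OrderDate FROM Orders", cn))
{
    cn.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        do
        {
            while (rdr.Read()) { /* consume the current resultset */ }
        } while (rdr.NextResult());   // move to the next resultset in the batch
    }
}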

In a client/server application, the "client" application is generally expected to open a network "connection" to the central server and leave it open for the life of the application. A client/server application (actually, any application) can open as many connections as needed. Yes, to better leverage the number of available connections and economize on server-side resources, applications tend to close unneeded connections during idle times and reconnect as needed, as when it comes time to update the data. This means the application usually maintains state locally so that it can resume from where it left off without a lot of extra work and round-trips back to the server. The server connection also persists information about the "state" of the client and manages a number of scoped resources that can be seen or manipulated only by that client connection (and no other). These resources include temporary tables (#Temp tables) and intermediate sort results in TempDb, server-side cursors, and local connection-scoped variables such as @@IDENTITY.

In the 2.0 Framework, ADO.NET exposes a new feature called Multiple Active Result Sets (MARS) that permits (for some reason) multiple use of a connection by more than one part of the application. For example, you might need to populate a local rowset with the results of a query while at the same time executing an UPDATE on another set of rows. With MARS enabled (it's off by default), you can use the same connection for both operations. Frankly, I'm of mixed mind about this "feature." First, MARS adds overhead to connection and TDS management behind the scenes. It also breaks down once you attempt to open more than 10 connections. Next, it's not that expensive to open one or more additional connections; at least, not to SQL Server. While some other platforms make you pay dearly for more connections (like Oracle and JET), SQL Server connections are relatively cheap to open.
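
For reference, MARS is switched on through the connection string; this fragment assumes a local server and the Northwind sample database:

// MARS is off by default; it's enabled through the connection string (SQL Server 2005).
string connectionString =
    "Data Source=.;Initial Catalog=Northwind;" +
    "Integrated Security=SSPI;MultipleActiveResultSets=True";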


When a connection is "closed," all server-side resources associated with the connection are released; at least, they are if the connection is actually closed. Because of the way that ADO.NET and the .NET data provider pool connections behind the scenes, the connection-pooling mechanism does not actually close the connection immediately. This means resources remain allocated until the connection is actually closed, times out (after 4 to 8 minutes), or is reused. However, connection pooling is not an important factor in client/server applications, so you might want to disable this functionality.
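
If you do decide to turn pooling off, it's a connection string setting; the server and database names here are placeholders:

// Connection pooling is on by default; a client/server application that holds one
// connection open for its lifetime can switch it off in the connection string.
string connectionString =
    "Data Source=MyServer;Initial Catalog=MyDatabase;" +
    "Integrated Security=SSPI;Pooling=False";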

The client application is responsible for connection management. As long as the connection is open, the server periodically polls the client to ensure that the client is still breathing. This means if your client is very busy playing a game or goes to sleep (goes into suspend mode), the connection might be dropped at the server end. Any number of things can break a connection, so your client application should implement fairly robust exception handlers to deal with these contingencies. If the connection is lost, you'll need to be prepared to reconnect and recover server state, as the server will drop all server-side state that belongs to your connection.

Each connection is expected to provide some flavor of authentication credentials that permit the server to grant access to the DBMS resources your application needs (and no more). These can be a login ID and password or a request to let the Windows domain help manage client validation. Based on these credentials, SQL Server grants, prohibits, or simply hides access to specific DBMS objects, including the server and database itself. I discuss connection management in more depth in Chapter 9.

Client/Server Data Entry Paradigms

Generally, a client/server application exposes forms used to capture parameters for fetching relevant data and to display the results of the queries. The application is also designed to capture user-generated data. If the user is expected to enter data from the keyboard, the client/server application has a much heavier responsibility: to pre-validate the data. This is an essential step for several reasons. Data validation includes ensuring that the data entered meets datatype constraints (integer range checks, float/integer and numeric tests, string-length checks, and precision and scale checks) and business rule checks (allowable range, minimum, maximum, and inter-field relationships, to name a few). An extremely important reason to check user input has to do with SQL injection attacks, a technique used by hackers to gain access to or damage your DBMS and the data it protects. Later in Chapter 10, I discuss the details of this security problem, but suffice it to say here, it's very important that you not permit users to provide strings that are blindly inserted or concatenated into SQL statements to be executed on the server. Be sure to learn how to use an ADO.NET Command object before executing any queries, as this can usually defeat this (very common) type of hacker attack.
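
Here's a bare-bones sketch of the parameterized approach; the Customers table, its City column, and the txtCity TextBox are stand-ins for whatever your form captures:

using System.Data;
using System.Data.SqlClient;

// Pass user input as a parameter--never concatenate it into the SQL string.
using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(
    "SELECT CustomerID, CompanyName FROM Customers WHERE City = @City", cn))
{
    cmd.Parameters.Add("@City", SqlDbType.NVarChar, 30).Value = txtCity.Text;
    cn.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        // ... bind or display the rows
    }
}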

IMHO

Be sure to learn how to use an ADO.NET Command object before executing any queries that include user input, as this can usually defeat SQL injection attacks.


Sometimes (often), business rule validations can be done only by accessing lookup tables. In some cases, you might be able to download validation values from lookup tables and force the user to select from one of these pre-validated values. This is a unique advantage of the client/server architecture, as your application can easily afford to download this data. The brief time it takes to populate client-side validation tables is miniscule in comparison to the time saved by preventing server-side validation failures. Another approach to this task might be to save lookup table data in a local data store like SQLEv.

And here's the first of many catch-22 situations. How can client/server applications validate user input without hard-coding business rules? There are several suggested solutions to this quandary; which of these are viable is another question:

  • Use a mechanism to fetch the business rules from the server when the application is first started. Some time back, I wrote an article[14] describing how to do this using SQL Server extended properties. Basically, it works like this: Each object in a SQL Server database can have one or more extended properties assigned. These properties can be given any name or value and can be assigned to the database itself, tables, columns, views, or any other object. For example, you could assign "minimum", "maximum", "default", "mask", or any other property value to an individual column. With a bit of logic, you could build a state machine to validate user input (see the sketch after this list). When it comes time to change these values, it's a simple matter of changing the values on the server; the next time the application starts, it can fetch the new values. Of course, you can devise a way to signal to the applications that new parameters are available so that this client application update process can be done automatically. One technique introduced in the 2.0 Framework is client notifications, where the client application can be notified by event when a table changes. However, this approach would be problematic if events were lost, as they appear to be when the engine is under heavy load. If you limit the notifications to cache sync, you should have more luck.

    [14] http://www.sqlmag.com/Articles/Index.cfm?ArticleID=20886

  • Store the business rules in a configuration file that's managed and deployed separately from the base application. This might be a suitable way to use XML to persist and rehydrate your client-side business rules. A better solution would be to store the business rules in a SQLEv database, as illustrated in my EBook.

  • Since SQL Server now permits you to store XML in the database as a native column type, you might consider creating a business rules table to help manage client-side rules and other business logic. Again, you could use client notification to signal a change in these values, so the clients could fetch them as soon as something changes, as long as you don't over-stress the engine.
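
As a sketch of the first (extended properties) approach, the client could pull column-level rules at startup with something like the following; it assumes SQL Server 2005's sys.extended_properties catalog view and a hypothetical dbo.Customers table carrying properties named "minimum" and "maximum":

using System.Data.SqlClient;

// Fetch column-level validation rules stored as extended properties (SQL Server 2005).
string sql =
    "SELECT c.name AS ColumnName, ep.name AS RuleName, ep.value AS RuleValue " +
    "FROM sys.extended_properties ep " +
    "JOIN sys.columns c ON ep.major_id = c.object_id AND ep.minor_id = c.column_id " +
    "WHERE ep.major_id = OBJECT_ID('dbo.Customers')";

using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand(sql, cn))
{
    cn.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read())
        {
            // e.g., "CreditLimit" / "maximum" / "50000"--feed these values
            // into the client-side validation state machine.
        }
    }
}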

Managing Client/Server Data Validation

It's important (and easier) to pre-validate data as it's entered instead of trying to deal with one or more server-side update failures. In many cases, the user is working with an input document or is talking to the "real" source of the data: another person. If the user has to wait until the entire form is complete before validation errors are detected, the whole data-entry process can be slowed down and made far more difficult. If you submit an entire form at once to validate, the exceptions that are returned are often tough to trap and manage. You might also find that these exceptions do not include all of the errors, as the server-side update logic might stop at the first exception and not discover remaining exceptions until you try the update process again. This might mean you'll have to attempt the update operation two, three, or more times before it actually succeeds or the user gives up in frustration, throws the monitor through the nearest window, and crams the keyboard down the nearest toilet[15]. One approach to deal with this problem might be to attempt "pseudo updates." That is, instead of asking the server to actually perform a complex update, you can submit a query that simply tests the validity of the data just entered. That way, when you actually execute the complete form update, you can be relatively certain that the update will succeed. These pseudo updates can be (should be) done behind the scenes without the data entry clerk even knowing that they are taking place, assuming you can master asynchronous queries. This feature is re-enabled in the 2.0 Framework, as I'll discuss in Chapter 11, "Executing SqlCommand Objects."

[15] I think it's interesting that people take their frustrations out on the monitor and keyboard, which are rarely as guilty as the computer itself.
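
Here's a rough sketch of a background pseudo update using the 2.0 Framework's asynchronous command support; dbo.ValidateOrder is a hypothetical validation procedure, customerId and quantity are values already captured from the form, and the connection string must include "Asynchronous Processing=true":

using System.Data;
using System.Data.SqlClient;

// Kick off a background validation query; the Begin/End methods require
// "Asynchronous Processing=true" in the connection string.
SqlConnection cn = new SqlConnection(connectionString + ";Asynchronous Processing=true");
SqlCommand cmd = new SqlCommand("dbo.ValidateOrder", cn);   // hypothetical validation proc
cmd.CommandType = CommandType.StoredProcedure;
cmd.Parameters.AddWithValue("@CustomerID", customerId);
cmd.Parameters.AddWithValue("@Quantity", quantity);

cn.Open();
cmd.BeginExecuteNonQuery(delegate(IAsyncResult ar)
{
    try
    {
        cmd.EndExecuteNonQuery(ar);    // throws if the server-side checks fail
        // flag the field(s) as pre-validated
    }
    catch (SqlException)
    {
        // surface the validation error next to the offending field
    }
    finally
    {
        cn.Close();
    }
}, null);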

Limiting User Choices

I'm of the opinion that most errors are caused by humans: either those programming the computers or the users entering data. No, not all problems are caused by politicians or "evil doers," but many are caused by users given access to a keyboard. As I see it, there is (approximately) a 1 in 103 chance that a user will hit the correct key when asked to type in a response. I still hear of people wanting to know the location of the "any" key ("press any key to continue"). To address this issue, I encourage (and use) "pick lists" of valid options when building data access forms. This way, users must choose from a list of valid things, like states, sizes, options, and part numbers. Sure, when building client/server applications, I have the option of preloading or reloading these pick lists based on what the user is doing now or did yesterday, or what I expect him or her to do. For example, if the user chooses Bellevue, Washington, as the city and state, I already know the postal codes for that city. Frankly, it would be easier if they picked the zip/postal code first, so I would immediately know which city they should be entering; I actually do that for them. All the user has to do is verify that I chose the right city and state. The United States Postal Service (no, that's not an obscure Windows service) makes machine-readable postal codes available for this type of validation query[16].

[16] See www.usps.com/ncsc/ for more information.

Pick lists can also be populated based on (ever-changing) business rules using the techniques I just discussed. This way, when dealing with a special customer (one that deserves a special discount or one to which you would rather not extend any additional credit), your pick lists can adapt automatically. This means you don't have to train your clerks (as much) to deal with these special cases.

IMHO

The real test for a data-entry form is the ability to pass the "cat" test. That is, if a cat walks (or takes a nap) on your keyboard and your application doesn't fail, you've done a good job. Note that there is very little that can protect your system from other cat ejecta and bodily emissions.


Managing Client/Server Data Retrieval

Unlike some other architectures, the fact that the server is just a brief query away makes it far easier to populate browse lists and perform more complex searches. A browse list is a (relatively) long list of data rows requested by the user. A client/server application can easily cache these lists indefinitely and manage their content locally, fetching and refreshing the list as needed. For example, if a user wants to see local used car lots, they might initiate a query that returns quite a few rows. Let's assume the user does not live in Copperas Cove, Texas (where there are only 14), but in a city with several hundred dealers anxious to take your money. I think it makes sense to return no more than a few hundred rows from a query before you ask the user to narrow the search. This is not really a problem with used-car dealers, as there aren't that many, even in bigger cities. But if the user wants to locate all of the nationwide retail outlets that sell Barbie dolls, they would have to deal with over 10,000 sources (not counting the black market outlets and guys hanging out in alleys near elementary schools selling cheap knock-offs to unsuspecting kids). In this case, the volume of data is far too much for a user to "browse," even if you used very tiny fonts. To address this issue, I offer the following words of advice:

  • Fetch just the rows the user needs now and no more. If the query is too broad, suggest (or anticipate) ways to narrow the search. You especially need to avoid sending rows the user does not need or throws away. When I get to the ADO sections of the book, you'll find out that it's not all that simple to discard rows once you ask for them in a query.

  • Limit returned rowsets to no more than a few hundred rows. If you design around smaller query result rowsets, your application will scale better, perform better, and yield happier customers.

  • If the user wants to visually scan a lot of rows, fetch the first couple of screens first (20 to 100 rows) and display them immediately. In the background, fetch the next couple of screens so when the user clicks "down" to see the next set of rows, you're ready. The code behind this technique involves use of TOP expressions and a WHERE clause that helps segment the fetched rowsets into smaller groups, as sketched below.
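
Here's a sketch of that technique; the Orders table, its OrderID key, the 50-row page size, and the lastOrderId variable (0 on the first call) are just placeholders:

using System.Data;
using System.Data.SqlClient;

// Fetch the next "screenful" of rows, keyed off the last row already displayed.
string sql =
    "SELECT TOP 50 OrderID, OrderDate, CustomerID " +
    "FROM Orders WHERE OrderID > @LastOrderID ORDER BY OrderID";

DataTable page = new DataTable("OrdersPage");
using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlDataAdapter da = new SqlDataAdapter(sql, cn))
{
    da.SelectCommand.Parameters.Add("@LastOrderID", SqlDbType.Int).Value = lastOrderId;
    da.Fill(page);   // the adapter opens and closes the connection for us
}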

Tip

Remember that SQL Server is more obedient than the androids in I, Robot; it will impale itself on a query that sucks every ounce of resources out of the system to return the data requested.


Managing Stored Procedures

I talked about stored procedures earlier, but I want to make sure that you understand how they can benefit client/server architecture. Remember, the idea behind client/server is to build an intelligent server and keep the client simple. Stored procedures are designed to help implement this strategy. Basically, a stored procedure is simply a program that runs on the server once invoked by the client (or another stored procedure). These can be written in T-SQL or, in SQL Server 2005, in one of the CLR languages (as long as it's Visual Basic .NET or C#). Sure, these procedures can be very complex, as complex as you like, but remember that this code must be executed for each client that needs the functionality, so it should be coded efficiently. SQL Server is very clever about loading and executing stored procedures, as I discuss in great depth in Chapter 5, so if several clients need the same stored procedure, the server can use a cached version already in memory.

By keeping the logic in the server, a client/server application can leverage its server-side code and not need to be redeployed when the stored procedure changes, at least if its signature does not change. The stored procedure signature is the set of input and output parameters (basically, the method parameters) required to execute the procedure. One could stretch this definition to include the number and placement of rowsets and intermediate resultsets returned by the stored procedure. If these change, you're likely going to have to change the application that calls them as well.
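
To make the "signature" idea concrete, here's a sketch of calling a hypothetical dbo.GetCustomersByState procedure that takes one input parameter and returns a rowset plus a RETURN value:

using System.Data;
using System.Data.SqlClient;

// Invoke a stored procedure through its "signature": the parameters it expects.
using (SqlConnection cn = new SqlConnection(connectionString))
using (SqlCommand cmd = new SqlCommand("dbo.GetCustomersByState", cn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@State", SqlDbType.Char, 2).Value = "WA";

    SqlParameter retval = cmd.Parameters.Add("@RETURN_VALUE", SqlDbType.Int);
    retval.Direction = ParameterDirection.ReturnValue;

    cn.Open();
    using (SqlDataReader rdr = cmd.ExecuteReader())
    {
        while (rdr.Read()) { /* consume the rowset */ }
    }
    int result = (int)retval.Value;   // available once the reader is closed
}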

I discuss invoking stored procedures in Chapter 11, but there are alluring ways to shortcut this process. A word of warning: Don't use these CommandBuilder-based approaches. They are expensive to execute at runtime and don't eliminate the need to redesign your code to deal with changes in the stored procedure logic.

Managing Client/Server Data Binding

When ADO.NET executes a query, it returns a rowset in the form of a DataReader. In a Windows Forms application, you can load a DataTable with a single line of code (as discussed in Chapter 12). If you "bind" the rowset to controls on your form, behind the scenes, ADO.NET moves that data from the rowset columns to the controls' .Text properties. This means inbound data appears automatically in the TextBox or other "simple" bound controls (that show one row at a time) or "complex" bound controls, like the DataGridView control (that can show the entire rowset at once). In a Windows Forms application, data binding is "full-duplex." That is, when changes are made to bound controls, the changes are automatically posted back to the ADO.NET DataTable (if you play your cards right). It turns out that in the 2.0 Framework, ASP.NET applications can also implement full-duplex data binding. The advantage of Windows Forms data binding will be very clear once you start working with the Visual Studio 2005 IDE, which can automatically tie ADO.NET data structures (like DataSets, DataTables, and DataViews) to your form controls.
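
Here's a stripped-down sketch of full-duplex Windows Forms binding through a BindingSource; customersTable, dataGridView1, and textBox1 are assumed to already exist on the form:

using System.Data;
using System.Windows.Forms;

// The grid shows the whole rowset ("complex" binding), the TextBox shows one
// column of the current row ("simple" binding), and edits flow back into the
// DataTable automatically.
BindingSource bs = new BindingSource();
bs.DataSource = customersTable;                        // a populated DataTable

dataGridView1.DataSource = bs;                         // complex-bound control
textBox1.DataBindings.Add("Text", bs, "CompanyName");  // simple-bound control

// Moving the BindingSource position repositions every bound control at once.
bs.MoveNext();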

Over the last decade, Microsoft has spent quite a bit of time and talent to re-re-reinvent the VB3 Data control. This control ties a specific connection and rowset to one or more bound controls. In the pre-ADO.NET days, you could position a pointer in the rowset (a Recordset in ADO classic), which reset the bound controls, posting changes in the controls to the rowset and moving data from the new "current" row to the bound controls. ADO.NET's data binding was a "start-from-scratch" effort that threw out this approach in favor of one that actually works. It seems that ADO.NET 2.0 has added yet another set of controls to the litany of ways to manage a bound rowset. As I discuss in detail in Chapter 12, developers can now choose to add a BindingNavigator to their forms to provide the scrolling features once provided by the Data control. Note that there is also a new ToolStrip control to capture the input parameters to the command being executed and provide a button control to execute the query to populate the DataSet. In this case, as shown in Figure 1.3, I'm executing a stored procedure with two input parameters.

Figure 1.3. Data binding in a Windows form application using the BindingNavigator.


These new controls can make it easier to build Windows[17] applications that need to browse through rowsets row by row or move to the first or last row of the rowset. The BindingNavigator also provides an easy way to activate code to add or delete rows; the IDE will even write this code for you if the SELECT statement is simple enough.

[17] Nope, these controls aren't available for web development.

Okay, but do these drag-and-drop controls really help when you have complex data structures returned from stored procedures? Nope, not really. These controls are fine for handling the rowsets returned from simple queries. When you start returning multiple resultsets or the product of a JOIN as the rowset, many of the features exposed by these controls are left behind. Do you need to expose all of these controls to navigate through a rowset? No; they are pretty flexible, as they permit you to disable the parts you don't think are necessary, as well as supporting a number of visual layouts that I'll examine in Chapter 7, "The Visual Studio 2005 Data Tools."

Managing Server-Side Cursors

In earlier versions of ADOc (and RDO and DAO), developers could choose from one of several different types of server-side cursors maintained on a connection-by-connection basis. The most efficient of these cursors (the keyset cursor) permits the server to build a structure on the server that contains a pointer to each of the selected live rows (instead of returning physical rows). This means that when a data row changes on the server, the client application can easily (and quickly) fetch the current row contents without having to run the entire query again (which could be prohibitively expensive). Server-side cursors only really make sense for client/server applications, as the cursor is "owned" by the connection; other architectures can't depend on a persistent connection. Yes, when the connection is closed, the cursor is dropped automatically. Unfortunately, early versions of ADO.NET never implemented server-side cursors and failed to do so again in ADO.NET 2.0. The good news is that you can implement these yourself using ANSI SQL commands. The bad news is that if you aren't careful implementing server-side cursors, you can severely limit scalability and cripple system performance. An in-depth discussion on how to build ANSI server-side cursors is provided in Appendix IV, "Creating and Managing Server-Side Cursors."
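
Just to show the moving parts (Appendix IV has the real discussion), here's a minimal sketch of a keyset server-side cursor declared, fetched, and released over one persistent connection; the Customers table is a placeholder:

using System.Data.SqlClient;

// A bare-bones server-side cursor, built and consumed on a single open connection.
using (SqlConnection cn = new SqlConnection(connectionString))
{
    cn.Open();

    new SqlCommand(
        "DECLARE crsCustomers CURSOR GLOBAL KEYSET FOR " +
        "SELECT CustomerID, CompanyName FROM Customers; " +
        "OPEN crsCustomers", cn).ExecuteNonQuery();

    // Fetch one row at a time; the cursor (and its keyset) lives on the server.
    using (SqlCommand fetch = new SqlCommand("FETCH NEXT FROM crsCustomers", cn))
    using (SqlDataReader rdr = fetch.ExecuteReader())
    {
        if (rdr.Read()) { /* show the row */ }
    }

    new SqlCommand("CLOSE crsCustomers; DEALLOCATE crsCustomers", cn).ExecuteNonQuery();
}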

Managing Temporary Tables

Because the server can save a temporary table on a persistent connection-by-connection basis, I've often created "helper" #temp tables (in TempDB) to shortcut JOINs and permit faster lookup table performance. Suppose a user is assigned to manage a particular part of the database, say, the Washington and Oregon accounts. When this user logs on, I know his or her duties include only these two states, so I can execute a complex JOIN to fetch index rows that apply to this database subset, because I know that subsequent operations need this list of applicable rows. When it comes time to actually fetch rows for this user, I use the #temp table I've created by incorporating it into a T-SQL IN clause. Using this approach, I've been able to reduce query JOIN time dramatically. Remember that #temp tables live as long as the connection remains open; when the connection is closed, the #temp table is dropped. That means this strategy works only if your application is designed around a persistent connection, not one that's opened and closed frequently.

#Temp tables created within stored procedures are visible only within the stored procedure.
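
Here's a sketch of the technique; the Accounts, Regions, and Orders tables are invented for the example, and persistentConnection is assumed to be the connection the application holds open for its lifetime:

using System.Data.SqlClient;

// Build the helper #temp table once, right after logon, over the open connection.
SqlCommand build = new SqlCommand(
    "SELECT a.AccountID INTO #MyAccounts " +
    "FROM Accounts a JOIN Regions r ON a.RegionID = r.RegionID " +
    "WHERE r.State IN ('WA', 'OR')", persistentConnection);
build.ExecuteNonQuery();

// Later queries on the same connection reuse it through an IN clause.
SqlCommand fetch = new SqlCommand(
    "SELECT OrderID, OrderDate FROM Orders " +
    "WHERE AccountID IN (SELECT AccountID FROM #MyAccounts)", persistentConnection);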


The Advantages and Challenges of Client/Server Architectures

As we've seen, there are several advantages to the client/server approach:

  • Carefully (correctly) designed and implemented client/server applications can be lighter "thin" clients that depend on intelligent server-side data validation instead of complex client-side logic that needs periodic maintenance and redeployment. The client-side logic can also be automatically morphed to deal with changing business constraints and rules.

  • The application can safely and efficiently manage and depend on server state. This includes the ability to create server-side cursors, as well as temporary tables that can be used to improve performance.

  • Client/server applications can afford to be more interactive and more functionally aware. That is, they can be more easily adapted to the Windows Forms application metaphor, which helps maintain application state; that state can also mirror server state without extra coding.

  • Pseudo updates are far simpler and more practical, as the server is always within easy reach for testing data validation; there's no need to open a connection to perform background operations.

  • Applications can be written to support one or very few users and fairly easily adapted to multi-user situations.

  • The load on the network is extremely low. This makes client/server architectures ideal for low-speed connection conditions and permits a higher degree of scalability without over-taxing the network.

The challenges and disadvantages of client/server architecture include the following points:

  • Client/server architecture is best when you connect the clients to a server on a local-area network (LAN). Due to the connectivity strategies, this approach should not be implemented on networks subject to interruption in service. This means that it's not really suitable for use on the WWW. Sure, I've implemented client/server applications over ordinary modems, but I also included several layers of exception management that helped insulate the user from the vagaries of the connection.

  • Client/server applications are best when you're supporting a few to a few hundred users. As you add servers, you can scale to thousands of users, on a LAN. If you expect to support the population of a small country, you had best choose another approach, like ASP.

  • A client/server application with a UI assumes you're going to build a Windows application. The problem here is that there are a dozen different versions of Windows in general use today and more on the way. Each version and each service-pack upgrade changes the environment in which you must run. When building a .NET application, you're going to have to ensure (or install) the correct version of the Framework (at least) along with your application. Unfortunately, the Framework does not run on all versions of Windows, so you have to ensure that your choice of architecture matches the target systems before you commit to it.

  • Client/server architecture is not politically popular because of its age; it's been "done." While it can out-perform ASP applications in speed, features, and flexibility, it does not appeal to those who choose to connect over the web.

  • Traditionally, client/server applications have been far more expensive to deploy than newer ASP approaches. However, this problem has been addressed (at least, to some extent) with the new .NET "one-touch" deployment paradigm. Deployment is always a concern when installing anything on a user's computer, even with low-impact deployment schemes. This makes client/server architectures less desirable for web-based systems but is not as big an issue for LAN corporate network deployment, where the company can more easily control what gets installed on the user's system.

ASP Browser-Based Architectures

No, I can't say that I'm an ASP expert. While I've learned a great deal about ASP architecture over the years, I don't pretend to know how to design (or tell you how to design) ASP or ASP.NET applications. Frankly, I depend on a number of technical resources for guidance in this genre, like Peter Blackburn. Remember, my focus is and has been on data access strategies, and ASP.NET applications also consume and update data. The low-level mechanics of how this is done are different than the strategies used in Windows Forms applications, but not that different. Consider that ADO.NET was designed (primarily) to support disconnected data structures; ASP.NET applications have no need for pessimistic locking, server-side cursors, or other techniques that require the server to maintain per-user/connection state. The connection-pooling mechanism was implemented to help ASP.NET applications open connections more quickly; a Windows Forms application opens so few connections that connection pooling does not really help (if at all). ASP.NET applications also must handle security differently. Since many of these applications are designed to expose information and data interaction to the general WWW public, it makes no sense to try to manage these "users" with domain accounts, as you can (and often should) with Windows Forms applications. Sure, there are situations where you pass the user credentials through to IIS (which hosts ASP.NET), but very few applications take that route.

As far as handling data queries and updates in an ASP.NET application, there are a number of differences. Consider that in a scalable ASP.NET application, no (or very little) state is maintained on the server or in the structures passed to and from the browser (the View State). This means that if your application executes a query, the rowset must be either persisted locally as a DataTable, DataSet, or TableAdapter object (which can be expensive), or requeried each time the rows are needed (which can be more, or less, expensive, depending on the complexity of the query and the number of rows returned). In previous versions, once the data was changed in the browser and the new data values from the controls were returned to your ASP page, you had to move the data control by control to the "bound" data table columns. In ASP.NET 2.0, you can set up full-duplex data binding so your code is simpler, but still not as simple as Windows Forms applications, which maintain a strongly bound link (in memory) between the UI controls and the underlying DataTable objects.
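
One common compromise (a sketch, not a prescription) is to park an expensive rowset in the ASP.NET Cache so it's requeried only when the cached copy expires; GetProductList and GridView1 here are hypothetical stand-ins for your own query routine and page control:

using System.Data;
using System.Web;
using System.Web.Caching;

// Cache an expensive rowset instead of requerying on every page hit.
DataSet products = (DataSet)HttpContext.Current.Cache["ProductList"];
if (products == null)
{
    products = GetProductList();   // hypothetical helper that runs the query
    HttpContext.Current.Cache.Insert("ProductList", products, null,
        DateTime.Now.AddMinutes(10), Cache.NoSlidingExpiration);
}
GridView1.DataSource = products;   // assumed ASP.NET 2.0 GridView on the page
GridView1.DataBind();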

While there are differences, the design principles you use to build successful, high-performance, scalable Windows Forms applications are basically the same. Code your queries to return just the rows and columns you need in the immediate future. While scalable (and functional) ASP applications insist on an open-query-close connection strategy, this approach can also be used in Windows Forms applications, but it does not have to be. As you read through the rest of the book, you'll find additional hints and strategies that can help make your ASP application support more users with less hardware.

XML Web Service Architectures

One of the newest and most widely touted architectures released with the .NET Framework is XML Web Services. Basically, a Web Service is a block of code hosted on IIS (or an HTTP endpoint) that is exposed as a class. It uses XML structures to render its object names, methods, properties, and events. It exposes these structures via SOAP. Think of Web Services as a replacement for distributed COM (DCOM).

As far as data access strategies, this approach is similar in many respects to ASP.NET. Again, it's tough to maintain data state on a block of code that's designed to serve 100 or 100,000 users an hour. It's well beyond the scope of this book to do more than point them out as we tour other parts of the Visual Studio and SQL Server landscape.



