Chapter 6: Data Access in .NET | Introducing Microsoft .NET (Pro-Developer)

An’ home again, the Rio run: it’s no child’s play to go
Steamin’ to bell for fourteen days o’ snow an’ floe an’ blow—
The bergs like kelpies overside that girn an’ turn an’ shift
Whaur, grindin’ like the Mills o’ God, goes by the big South drift.
(Hail, snow an’ ice that praise the Lord: I’ve met them at their work,
An’ wished we had anither route or they anither kirk.)

—Rudyard Kipling, writing on the perils of
data access, “McAndrew’s Hymn,” 1894.

Problem Background

Unlike standalone desktop programs, which usually deal with documents on a user’s local hard disk, essentially all distributed programs access remote data stores in some way. Remote data access is the main engine driving the phenomenal growth of the Internet: the incredible potential of easy access to data from anyone who wants to make it available. Sometimes the owner of that data charges money for the data itself. Pornographers were the first to make this business model really sing, as few users would fork over the bucks for any other type of content. Other businesses, such as the Wall Street Journal (porn for a different audience, some say) and the Oxford English Dictionary, are enjoying limited success with this model today, and mainstream music companies may eventually figure it out if they ever get their heads screwed on right. More often today, the owner of the data makes money by using the Internet’s easy access to that data to lower the friction of existing business processes, such as removing human employees from airline reservation systems or from overnight package delivery tracking. Accessing remote data over the Internet is primarily why you have a PC today.

All Internet applications access remote data stores.

Once you realize that the goals of most Internet applications differ radically from those of desktop programs, you won’t be surprised to learn that we encounter different design problems when we write Internet apps. (Are you starting to see a pattern in this book?) First, the data that we want to see and perhaps change resides in many different locations and many different types of containers. I purposely selected the term data stores in this chapter’s opening sentence instead of the narrower databases. Certainly an enormous amount of data resides in large relational database programs such as Microsoft SQL Server or Oracle9i, but the data that an Internet app uses can and often does reside in many other locations. Some of these sources will be familiar to you, and only the notion of easy remote access will be new. For example, the financial data for my current house remodeling project lives in Microsoft Excel spreadsheets and Microsoft Money files on my hard disk. I’d like my architect and contractor to be able to read these files and update them with their latest cost overruns, and I’d like my banker to be able to read them and recoil in shock before handing over the money to cover the costs. Other data sources are new, and you might not have thought of them as data sources just a year ago. For example, the April 30, 2001, Wall Street Journal carried a story about a software product that reports the status of all the remote substations of an electric power utility company using the Web as a transport and display mechanism (oooh, baby, that feels SO good).

Data stores live in many different programs in many different locations.

Naturally, the greater the number of different data sources, the more difficult the task of writing client applications that access them. We can’t take the time to learn a different programming model for every conceivable data store: one for SQL Server, another for Oracle, yet another for Excel, and heaven knows what programmatic interface those electric power guys are exposing to clients. This problem is especially bad for small-scale data providers because they don’t have the clout to make clients learn their proprietary language, as some would argue that Microsoft and Oracle do. We need one basic programming model for accessing all types of data no matter where the data lives. Otherwise we’ll spend our entire development budget dealing with different data access schemes and have nothing left for writing code that does useful work with the data once we’ve fetched it.
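One way to picture “one basic programming model” is a common interface that each data store plugs into through an adapter, so the client code is written only once. The sketch below is a minimal illustration in Python. It is not the .NET mechanism this chapter goes on to describe. The `DataSource` protocol, the `SqliteSource` and `CsvSource` adapters, and `total_cost` are hypothetical names. SQLite stands in for a server database, and CSV text stands in for a spreadsheet.

```python
from __future__ import annotations

import csv
import io
import sqlite3
from typing import Iterator, Protocol


class DataSource(Protocol):
    """Hypothetical common interface: every store exposes its records as dicts."""

    def rows(self) -> Iterator[dict]: ...


class SqliteSource:
    """Adapter for a relational store (SQLite stands in for SQL Server or Oracle)."""

    def __init__(self, conn: sqlite3.Connection, table: str) -> None:
        self.conn = conn
        self.table = table  # assumed to be a trusted name, never user input

    def rows(self) -> Iterator[dict]:
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        names = [col[0] for col in cur.description]
        for record in cur:
            yield dict(zip(names, record))


class CsvSource:
    """Adapter for spreadsheet data exported as CSV text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def rows(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self.text))


def total_cost(source: DataSource) -> float:
    """Client code written once against the interface, not against a store."""
    return sum(float(row["cost"]) for row in source.rows())


# Usage: the same client function reads two very different stores.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name TEXT, cost REAL)")
db.executemany("INSERT INTO items VALUES (?, ?)", [("lumber", 1200.0), ("tile", 800.0)])
sheet = "name,cost\nplumbing,450\nwiring,300\n"

print(total_cost(SqliteSource(db, "items")))  # 2000.0
print(total_cost(CsvSource(sheet)))  # 750.0
```

Each adapter absorbs its store's quirks, and `total_cost` never learns which store it is reading. Adding a third kind of store means writing one more adapter, not touching the client.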

We want our many sources of data to look the same to a client program.

Internet data access programming is also difficult because of the heterogeneous and nondeterministic nature of the Internet environment. When a desktop PC accesses a database file on its own hard disk (say, in an application for a small dry-cleaning business looking for a missing garment), the developer can depend on that access being fast because it uses the PC’s internal bus. On the Internet, a similar request might have to travel over congested transmission lines and wait for the attention of overloaded servers. The request is slower, and its speed varies from one access to the next. A developer needs to write code that accounts for these varying conditions. In addition, a data source and its client are coupled much more loosely over the Internet than they would be if they resided on the same PC. For example, it’s relatively easy to write code that opens a database connection and keeps it open for the duration of the human user’s work session. While this might be reasonable on a single PC, it doesn’t work well over the Internet. The server probably has (desperately hopes it has) many concurrent users, and it will buckle under the load of keeping many simultaneous connections open, even if most of them aren’t doing anything. We want to be able to access data in a way that copes with slow and varying response times and doesn’t tie up server resources for long periods.

Our data access strategy needs to work well in the loosely coupled world of the Internet.

XML (eXtensible Markup Language) is quickly emerging as the lingua franca of the Internet. That Italian term has been in English use for centuries. It literally means “Frankish language,” but figuratively it means “the language everyone speaks.” Today, we’d probably call XML the English of the Internet. I like to call XML the tofu of the Internet because it has no flavor of its own (it takes on the flavor of whatever you cook it with). Occasionally I call it the WD-40 of the Internet, because it drastically lowers the friction of crossing boundaries. XML makes an excellent wire format for transporting data from one computer system to another because it’s widely supported and free of implementation dependencies. Our data access strategy needs to move data into and out of XML easily.
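Here is what moving data into and out of XML can look like: a minimal Python round trip that uses only the standard library's `xml.etree.ElementTree`. Records become a `<rows>` document with one `<row>` element per record, and that document parses back into the same records. The helper names, the element names, and the substation fields are hypothetical; the fields only echo the power-utility example. This sketch assumes field names are valid XML tag names, and every value comes back as a string.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def rows_to_xml(rows: list[dict], root_tag: str = "rows") -> str:
    """Serialize records: one <row> element per record, one child per field."""
    root = ET.Element(root_tag)
    for record in rows:
        row = ET.SubElement(root, "row")
        for field, value in record.items():
            ET.SubElement(row, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(text: str) -> list[dict]:
    """Parse the same format back into records; every value returns as a string."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in row} for row in root.findall("row")]


records = [
    {"substation": "North", "status": "online"},
    {"substation": "East", "status": "offline"},
]
wire = rows_to_xml(records)
print(wire)
print(xml_to_rows(wire) == records)  # True
```

Neither side needs to know how the other stores the data. The sender and receiver only have to agree on the document's shape.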

We need our data access strategy to work well with XML.

Finally, we need to maintain backward compatibility with existing code and data. The installed base of data access code is enormous, written and tested at great expense, and we can’t afford to jettison it. Any new architecture that doesn’t provide a bridge from the current state of affairs, whatever it is, doesn’t have much chance in the market, no matter how cool it is on its own.

We need our new data access strategy to keep working with what’s been working.