Datasets, Adapters, and Sources | VSTO Data Programming

Now that we have seen a couple no-coding-required examples, let's take a peek under the hood and see how data binding actually works. Several players make data binding work, many of which can be seen on the component tray or design surface:

A back-end data source, such as an Access database, a remote SQL Server database, a Web service, or some other data storage and retrieval system, is where the data ultimately lives.
A dataset is a disconnected local cache of some portion of the back-end data source.
An adapter connects the dataset to the back-end data source, to both fill the dataset from the back-end source, and to update the back end with any changes. There is usually one adapter per table, which is why we saw two adapters in the preceding example.
A binding source acts as the intermediary between the user interface control and the dataset. Although it is possible to bind a control directly to a dataset, as discussed later, it is usually more convenient to go through a dedicated binding source object.
A data-bindable control provides a user interface element that enables the user to read or write the data.

The back-end data source is represented in a VSTO project by the connection string passed to the adapter; everything else is represented by a member of the customized host item (the worksheet or document) class.

Let's take a look at these different components in more detail.

Data Sources and Security Best Practices

As you probably noticed in the Connection Wizard, all the information required to connect to the back-end data source is stored in a "connection string" generated by the wizard. It typically looks something like this:

Server=MyDataServer; Database=Customers; Integrated Security=true;

That is, it says where the database is located, what it is called, and how the user should be authenticated. All this is potentially sensitive information! Use caution when embedding connection strings into your programs; remember, even without the source code, it is very easy to figure out which strings are embedded in a managed application. This particularly applies to connection strings where, instead of using Windows NT integrated security, you simply embed UserID=eric;Password=BigSecret123 directly.

Furthermore, hard-coded embedded strings in your source code make it hard for developers, testers, end users, and database administrators to update your application should the database connection information change over time. As discussed previously, Visual Studio gives you the option of embedding the connection string in the configuration file. The automatically generated configuration file in our example above looks something like Listing 17-1.

Listing 17-1. A Typical Database Connection String in a Configuration File

It is also a good idea to use the "principle of least privilege." This is one of the fundamental principles of secure design: Grant exactly as much privilege as you need to get the job done, no more, no less. For example, if your user needs to be able to read from the database but not write to it, do not specify a connection string that gives the user administrator rights to the database. Instead, choose a connection string that specifies a username and password with read-only access. That way, if the username and password are ever compromised, at least the attacker does not get administrator access out of it.

Better still, do not use stored user IDs and passwords at all; some databases use integrated Windows authentication, so the logged-on user can seamlessly use his already authenticated credentials. Or, if your database system requires a username and password, make the user type them in rather than storing them. As you'll see later when we discuss adapters, you can manually change the connection string used by the adapter before it fills the dataset. That way you could ask the user to type in his user ID and password and then generate a new connection string from that information.

Datasets

The cornerstone of the VSTO 2005 data model, and of ADO.NET in general, is the dataset. We should motivate the existence of datasets by describing the old way of doing data access. Back in the twentieth century, you typically communicated with a database via "ADO Classic" something like this:

Create and open a connection to a database.
Create and execute a database command (such as SELECT partnumber FROM invoices WHERE price > 100).
Enumerate the resulting record set.
Close the connection.

This approach worked fairly well, but it has several drawbacks. The principle drawbacks are consequences of the fact that this model requires a live connection to a database. If there are going to be many live connections, the server needs to be scalable and robust, which can be expensive. Therefore, to minimize load upon the server, we want connections to be short-lived. But because the connection is open while the user is enumerating the record set, the connection is typically open for quite some timeas long as the user is working with the data.

Furthermore, even if the server-side expense of keeping connections open is unimportant, this model does not work well in a world where you want to be able to work with your data even if you temporarily lack network connectivity.

A Disconnected Strategy

Database connections are both expensive and necessary and therefore must be managed carefully. In a typical ADO application, much developer effort is expended writing code to ensure that the connection is open for as little time as possible while still meeting the needs of the application's users. ADO.NET addresses the problems of ADO by going straight to the root; if we cannot make connections inexpensive, we can at least make them less necessary. ADO.NET is therefore fundamentally a disconnected strategy. A typical ADO.NET scenario goes something like this:

Create a DataAdapter to manage the connection to a specific database or other data source.
Set properties on the adapter that tell it what query to execute against the database.
Create a dataset to be filled.
Invoke a method on the adapter to take care of the details of opening a connection, executing the query, saving the results in the dataset, and closing the connection as soon as possible.
Work with the data in the now-disconnected dataset.
When you have finished working with the data, invoke a method on the adapter to re-open the connection to the database and update it with any changes.

And indeed, as you will see later when we discuss adapters, VSTO does exactly this on your behalf.

Because the dataset acts much like the original database, the connection need be open only as long as it takes to fill the dataset. After the data has been copied to the dataset, you can query and manipulate the dataset for as long as you want without worrying that you are consuming a valuable database connection.

Furthermore, there is no reason why the data used to fill the dataset have to come from a connected database; you could fill the dataset from an XML file, or write a program to add tables and rows to build one "from scratch." Datasets have no knowledge of where the data they contain came from; if you need it, all that knowledge is encapsulated in the adapter.

The foregoing is not to say that old-fashioned connected data access is impossible in ADO.NET, or even discouraged; the DataReader class allows for traditional always-connected access to a database. However, neither Windows Forms controls nor VSTO 2005 host items/host controls can use DataReaders for data binding, so we speak of them no more in this book.

Typed and Untyped Datasets

In the Solution Explorer of the Word or Excel projects we created earlier, you will find a NorthwindDataSet.xsd file containing the database schema. This is an XML document that describes the tables, columns, and relationships that make up the dataset. One of the child nodes in the Solution Explorer tree view is NorthwindDataSet.Designer.cs. This file contains the automatically generated code for the dataset and table adapters.

The first line of the declaration is interesting:

public partial class NorthwindDataSet : System.Data.DataSet {

The generated class is partial so that if you need to add your own extensions to it, you can do so in a separate file; it is a bad idea to edit automatically generated files. More importantly, this dataset extends the System.Data.DataSet class. A System.Data.DataSet consists of a collection of data tables. As you would expect, data tables consist of a collection of data columns and data rows. Each class exposes various collections as properties that allow you to navigate through the dataset.

System.Data.DataSet is not an abstract class; you can create instances and fill them from any back-end data source. But that would be an untyped dataset; the NorthwindDataSet is a typed dataset. Untyped datasets give you great flexibility but are so general that they are somewhat harder to use.

For example, if you were to fill an untyped dataset with data from the Northwind database file, you could access a particular datum with an expression such as this:

name = myDataSet.Tables["Products"].Rows[1]["ProductName"];

But that flexibility comes at a cost: You can accidentally pass in a bad table name, a bad column name, or make a bad assumption about the type of the data stored in a column. Because none of the structure of the tables or types of the columns is known at compile time, the compiler is unable to verify that the code will run without throwing exceptions. Also, the IntelliSense engine is unable to provide any hints about the dataset's structure while you are developing the code.

Typed datasets mitigate these problems. A typed dataset is a class that extends the dataset base class; it has all the flexible, untyped features of a regular untyped dataset, but also has compile-time strongly typed properties that expose the tables by name. A typed dataset also defines typed data table and data row subclasses, too.

As you can see from the NorthwindDataSet.Designer.cs file, the typed dataset has public properties that enable you to write much more straightforward code, such as this:

name = myDataSet.Products[1].ProductName;

Typed datasets extend untyped datasets in many ways; some of the most important are as follows:

Tables are exposed as read-only properties typed as instances of typed data tables.
Tables have read-only properties for each column.
Tables have an indexer that returns a typed data row.
Event delegates for row change events pass typed change event arguments. Each row type has a row-changing, row-changed, row-deleting, and row-deleted event. (You might be wondering where the row-adding and row-added events are. The changing/changed events pass a DataRowAction enumerated type to indicate whether the row in question was newly created.)
Tables provide methods for adding and removing typed data rows.
Rows provide getters, setters, and nullity testers for each column.

In short, it is almost always a good idea to use a typed dataset. Weakly typed code is harder to read, harder to reason about, and harder to maintain.

Adapters

Take a look at the Startup event handler in either the Word or Excel examples above. Visual Studio has automatically generated the code in Listing 17-2 on your behalf.

Listing 17-2. Auto-Generated Table-Filling Code

public partial class ThisDocument
{
 private void ThisDocument_Startup(object sender, System.EventArgs e)
 {
 // TODO: Delete this line of code to remove the default
 // AutoFill for 'northwindDataSet.Products'.
 if (this.NeedsFill("northwindDataSet"))
 {
 this.productsTableAdapter.Fill(this.northwindDataSet.Products);
 }
 // TODO: Delete this line of code to remove the default
 // AutoFill for 'northwindDataSet.Suppliers'.
 if (this.NeedsFill("northwindDataSet"))
 {
 this.suppliersTableAdapter.Fill(this.northwindDataSet.Suppliers);
 }
}

We discuss what exactly NeedsFill is for in more detail when we discuss data caching later in this chapter and in Chapter 18, "Server Data Scenarios." But for now, this should look fairly straightforward: If the two tables need to be filled from the back-end data source, the adapters fill the appropriate tables.

There are a number of reasons why you might want to not automatically fill the data tables in the Startup event, which is why the comment points out that you can remove the auto-generated code. For example, as mentioned earlier, you might want to require that the user enter a database password before attempting to fill the dataset. You can generate a new connection string and then set the adapter's Connection.ConnectionString property.

Or perhaps you want to give the user the option of whether to connect to the back end. If the user is on an expensive or slow connection, the user might want to skip downloading a large chunk of data. For any number of reasons, you might not want to connect right away or use the default connection string, so Visual Studio allows you to modify this startup code.

Visual Studio generates strongly typed custom adapters at the same time as it generates the typed dataset. If you read through the generated adapter code in NorthwindDataSet.Designers.cs, you will see that the generated adapter has been hard-coded to connect to the database specified by the connection string in the configuration file. The bulk of the generated adapter code consists of the query code to handle reading from the back-end data store into the typed dataset, and then taking any changes in the dataset and updating or deleting the appropriate rows in the store.

The adapter takes care of all the details of opening the connection, executing the query, copying the data into the dataset, and closing the connection. At this point, we have a local copy of the data, which we can use to our heart's content without worrying about taxing the server further.

When you are done editing the local copy of the data in the dataset, you can use the adapter to update the database with the changes by calling the Update method of the adapter. The adapter will then take care of making the additions, changes, and deletions to the back-end database.

By default, the adapter assumes that you want "optimistic concurrency." That is, other users will be able to update the database unless you are currently in the process of updating the database. Other concurrency models are possible but beyond the scope of this text. If you want either "pessimistic concurrency" (that is, the database remains locked during the whole time that you have the offline dataset) or "destructive concurrency" (that is, the database is never locked even when multiple people are writing at once), consult a reference on ADO.NET to see how to configure your adapter appropriately.

Using Binding Sources as Proxies

Why does Visual Studio bind the controls to a BindingSource "proxy" object, rather than binding controls directly to the data table?

The reason is because the control can bind to the proxy even if the data in the table is not currently available. For instance, perhaps the data table is going to be derived from a call to a Web service, which will not happen until long after the initialization is complete, or until the user types in his password or presses a button to start the database connection.

The proxy object is created when the customization starts up and controls can be bound to it even if there is no "real" data available. When the real data is available, the binding source updates the controls. It is essentially just a thin "shim" that makes it easier to set up bindings before the data is all available.

As you saw in the examples, multiple controls can share the same binding source and therefore have the same "currency." That is, when one control is updated, every other control linked to the same binding source is also updated automatically. Controls on different worksheets or even on the actions pane can share binding sources and thereby share currency. You will learn about currency management in more detail later in this chapter.

Data-Bindable Controls

The last piece of the data binding puzzle is the host control or Windows Forms control on the spreadsheet or document that actually displays the bound data. There are two flavors of data-bindable controls: "simple" and "complex." Controls that can bind a single datum to a particular property are "simple-data-bindable." Controls that can bind multiple rows and/or columns are "complex-data-bindable."

In the preceding examples, the list object in Excel and the combo box and data grid Windows Forms controls are complex-data-bindable; the list object and data grid display multiple rows and columns from a table, and the combo box displays multiple rows from a single column. The bookmark and named range controls by contrast are simple-data-bindable; only a single datum is bound to the Value property of the named range.

All the Windows Forms controls are simple-data-bindable, as are almost all of the Word and Excel host items and host controls. (There is one exception: Word XMLNodes host control is neither simple- nor complex-data-bindable.) Of the host items and host controls, only Excel's list object is complex-data-bindable.

The behind-the-scenes mechanisms by which controls implement data binding and manage currency are fairly complex; we cover them in more detail toward the end of this chapter. But first, now that we have gotten a little context as to what all these parts are and how they relate, let's take a look at a somewhat more labor-intensive way to do data binding in Excel. This time we are going to actually write a few lines of code.

Part One. An Introduction to VSTO

Part One. An Introduction to VSTO

An Introduction to Office Programming

Introduction to Office Solutions

Part Two. Office Programming in .NET

Part Two. Office Programming in .NET

Programming Excel

Working with Excel Events

Working with Excel Objects

Programming Word

Working with Word Events

Working with Word Objects

Programming Outlook

Working with Outlook Events

Working with Outlook Objects

Introduction to InfoPath

Part Three. Office Programming in VSTO

Part Three. Office Programming in VSTO

The VSTO Programming Model

Using Windows Forms in VSTO

Working with Actions Pane

Working with Smart Tags in VSTO

VSTO Data Programming

Server Data Scenarios

.NET Code Security

Deployment

Part Four. Advanced Office Programming

Part Four. Advanced Office Programming

Working with XML in Excel

Working with XML in Word

Developing COM Add-Ins for Word and Excel

Creating Outlook Add-Ins with VSTO