Choosing a Data Storage Methodology | Professional ASP.NET MVC 1.0 (Wrox Programmer to Programmer)

Having seen both relational and XML data access in action within the .NET Framework (albeit in a fairly basic way), how do you decide on a data storage methodology? The simple answer is that, with the advent of .NET, you really don't need to worry about this anymore.

Years ago, one of the main directions in data storage and access was the construction of huge data repositories, or data warehouses, where all the data your organization required was stored in a massive central database. While this might still suit some situations (such as a government tax office), it has become clear that it is not a generally practical approach in today's distributed and disconnected computing world.

In fact, there has been even less centralization of data over recent years, and the drive now is far more towards the provision of access through common methods to all kinds of remote and non-centralized data. As an example, the Internet contains vast quantities of data in myriad different formats, but we increasingly need to be able to get at this data in a structured and standard way.

Likewise, in an office environment, the promised takeover of thin client computing has not really taken place yet. People like to store information locally as they work, and use it when disconnected from the corporate network. In some cases, such as the traveling salesperson with a laptop computer, this is the prime requirement when working with corporate data.

Access and Manipulation Is the Key

In fact, it's obvious that where and (to some extent) how we store data is not important. The crux of the matter is how we can access and manipulate that data in whatever format it's stored and wherever it resides. As you saw at the start of this chapter, this is what has been the driving force behind the adoption of XML, and the design of the .NET data access libraries.

So, what issues should you consider when implementing a data store, and which data access technique is most appropriate for that data? The answer lies more in the nature of the data, and the way we need to use it. For example, highly structured data, such as stock lists or customer details, is well suited to storage in a relational database such as SQL Server or Oracle, or MS Access on the desktop. However, less rigidly structured data, such as reports, data sheets, email messages, family trees, and other common everyday documents, is more suited to storage using the tree-like metaphor of XML.

Likewise, if we regularly need to access parts of the data in specific ways, or all the data on a very regular basis, the relational database is probably the most efficient. It is optimized to provide indexing and other features to give the best performance. But if we usually access the entire data entity in one go, or access it only rarely, an XML-based approach is probably the best choice. And, being basically just text files, XML documents are easy to archive and retrieve.
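To make the contrast concrete, here is a minimal sketch of the two access patterns. It uses Python's standard sqlite3 and xml.etree.ElementTree modules purely as stand-ins for a relational database and an XML document; the customers table, its index, and the report document are hypothetical examples, not anything defined in this chapter.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Relational side: a hypothetical "customers" table, indexed on the column
# we expect to filter by most often.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ann", "Leeds"), ("Bob", "York"), ("Cat", "Leeds")],
)

# Selective access: ask only for the rows that match one city.
leeds_customers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Leeds",)
    )
]

# XML side: a hypothetical report that is loaded and walked as a whole.
REPORT_XML = """
<report title="Quarterly Review">
  <section heading="Summary"><para>Sales rose.</para></section>
  <section heading="Outlook">
    <para>Growth is expected.</para>
    <para>Costs are stable.</para>
  </section>
</report>
"""
report = ET.fromstring(REPORT_XML.strip())
headings = [section.get("heading") for section in report.iter("section")]
para_count = len(report.findall(".//para"))

conn.close()
```

The relational query touches only the rows it needs, while the XML document is parsed in full before anything is read from it, which is acceptable when the whole document is what you want anyway.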

Of course, in some cases, you don't actually get to choose the data storage format. For example, your email server and your fax server probably have dedicated storage mechanisms that can't be changed. In such cases, you have to make do with what's there, or change to another product.

A New Approach to Querying

Another point to be aware of is that you should not base your data format decision on current querying technologies. One of the major issues at the moment is that each data storage format has its own specific techniques for querying and extracting data; for example, SQL for relational data, and XPath or XSLT for XML data. If you want to perform a query across different types of data, you generally have to convert them all to the same type first.
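The conversion step described above can be sketched as follows. This is an illustration only, again using Python's standard library in place of the .NET classes: the order data arrives as XML, the customer data is relational, and the XML has to be loaded into a table before a single SQL query can span both. The function name, table names, and sample data are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order data held as XML.
ORDERS_XML = """<orders>
  <order customer="1" total="25.00"/>
  <order customer="2" total="10.50"/>
  <order customer="1" total="4.50"/>
</orders>"""


def total_spend_by_customer(orders_xml):
    """Join relational customers with XML orders by converting the XML first."""
    conn = sqlite3.connect(":memory:")

    # Data that is already relational (hypothetical customers).
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

    # Convert the XML orders to rows so that SQL can see them.
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    rows = [
        (int(order.get("customer")), float(order.get("total")))
        for order in ET.fromstring(orders_xml).iter("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

    # Only now can one query cover both sources.
    result = conn.execute(
        "SELECT c.name, SUM(o.total) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return result


spend = total_spend_by_customer(ORDERS_XML)
```

The extra load step is exactly the overhead that a universal query mechanism aims to remove.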

However, this is set to change with the growing realization that a new querying technology, called XML Query Language or XQuery, will be able to integrate different types of data under a universal query mechanism. XQuery has been called SQL querying for XML data because it uses a syntax that is similar to the widely accepted SQL standards, and yet can be applied to XML documents.

And as relational data stores such as SQL Server become increasingly XML-capable, and the tools to access and manage XML data inside a relational database improve, XQuery can also be used with suitable relational databases. In future releases of .NET, this scenario will become a core part of the way you query data in mixed environments.

Microsoft offers a preview of its approach to XQuery, at least as far as working with XML is concerned, on a special Web site it has set up at http://xqueryservices.com. You can experiment with XQuery online, or download the Microsoft XQuery demo to run on your server.

Transport Protocols Are the Future

Once you've decided on the storage mechanism for your data, the next important decision comes when you consider how you will transport this data from one place to another. Here, there is probably only one good solution that matches the requirements of the future. There's no doubt that we'll face increasing needs to interface with other systems and other organizations as time goes by, and for this, a standard data interchange format will be an absolute necessity.

The only obvious choice today is XML (and the associated standards such as SOAP and other industry-specific implementations of XML). XML is independent of the platform, application, and operating system, and so it provides the best chance for interoperability.

In fact, Microsoft BizTalk Server and similar systems can handle the transmission and guaranteed delivery of data in XML format over almost any kind of network, as well as the conversion to and from other formats. Using the tools available today and in the near future, you can transform an XML document into almost any other document type on demand and often transform any non-XML document or data into XML as well.

And .NET Is a Great Solution

So, if the transport protocol and transmission format for data are going to be XML-based, and the data storage and manipulation could be through any existing or new technology, what you really need is a solid, reliable, and wide-ranging technique to connect to any kind of data store, and work with any kind of data.

This is where the combination of the relational and XML data access techniques provided by the .NET Framework comes in. As you've seen (and will see), you can use the .NET data access classes to connect to almost any kind of data store, be it a mail server, a relational database, an office application document, an XML document, or whatever. Then, once you have extracted the data, you can convert it between XML and traditional relational rowsets at will, and either update the data store or save the data to disk in almost any format you need.
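The round trip between a rowset and XML can be illustrated with a short sketch. It does not use the .NET data access classes themselves; it is a language-neutral illustration written with Python's standard library, and the helper names rows_to_xml and xml_to_rows, the products table, and the element names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag="rows", row_tag="row"):
    """Serialize a query result as XML, one child element per column."""
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for record in cursor:
        row = ET.SubElement(root, row_tag)
        for column, value in zip(columns, record):
            # Every value is written as text, so type information is lost.
            ET.SubElement(row, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text, row_tag="row"):
    """Read the XML back into a list of column-name to text-value mappings."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in row} for row in root.iter(row_tag)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [("A1", "Widget"), ("B2", "Gadget")]
)

xml_text = rows_to_xml(conn.execute("SELECT sku, name FROM products ORDER BY sku"))
round_tripped = xml_to_rows(xml_text)
conn.close()
```

Because this sketch flattens every value to text, a real conversion would also need a schema or type hints to restore numbers and dates on the way back.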