In this section, we'll address the challenge of accessing a multitude of data sources in a little more detail. Every data store provides a native access method. Each database vendor provides a vendor-specific API to ease database access. Non-DBMS data can be accessed via data-specific APIs, such as the Microsoft Windows NT Directory Service API or the Messaging API (MAPI) for accessing mail data, or via file system APIs. By using the native access method for each data store, a developer can use the full power of each store. However, this technique requires the developer to learn every access method: the API functions themselves, how to use them efficiently, and the diagnostic and configuration tools associated with each data store. The cost of training developers on all the data access methods used within a company can be quite high.
Instead of using native data access methods, developers can choose a generic, vendor-neutral API such as the Microsoft ODBC interface. ODBC is a C programming language interface for accessing data in a DBMS using Structured Query Language (SQL). The ODBC driver manager exposes this programming interface to applications and provides the run-time components that locate and load the appropriate DBMS-specific driver. ODBC drivers are typically supplied by the DBMS vendor; they translate generic calls from the ODBC driver manager into calls to the native data access method.
The primary advantage of using ODBC is that developers need to learn only one API to access a wide range of DBMSs. Applications can access data from multiple DBMSs at the same time. In fact, the application developer need not even target a specific DBMS—the exact DBMS to be used can be decided when the application is deployed.
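The division of labor between the driver manager and the drivers can be modeled in a few lines. The following Python sketch is a toy model, not the ODBC API: every class, the `execute` method, and the `DRIVER=` connection-string handling (loosely modeled on ODBC connection strings) are hypothetical. The application calls one generic method, the manager picks a driver from the connection string at run time, and each driver translates the call into its own store's native API.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]

# Hypothetical "native" APIs: each vendor exposes a different call style.
class VendorANative:
    def __init__(self, data: Dict[str, List[Row]]):
        self._data = data

    def run_query(self, text: str) -> List[Row]:
        table = text.split()[-1]  # toy parser: only "SELECT * FROM <table>"
        return list(self._data[table])

class VendorBNative:
    def __init__(self, data: Dict[str, List[Row]]):
        self._data = data

    def fetch_table(self, name: str) -> List[Row]:
        return list(self._data[name])

# Each driver translates the one generic call (execute) into a native call.
class VendorADriver:
    def __init__(self, native: VendorANative):
        self._native = native

    def execute(self, sql: str) -> List[Row]:
        return self._native.run_query(sql)

class VendorBDriver:
    def __init__(self, native: VendorBNative):
        self._native = native

    def execute(self, sql: str) -> List[Row]:
        # Vendor B has no query language, so the driver must parse SQL itself.
        return self._native.fetch_table(sql.split()[-1])

class DriverManager:
    """Looks up a registered driver by the DRIVER= key of a connection string."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[Dict[str, str]], object]] = {}

    def register(self, name: str, factory: Callable[[Dict[str, str]], object]) -> None:
        self._factories[name] = factory

    def connect(self, conn_str: str):
        attrs = dict(part.split("=", 1) for part in conn_str.split(";") if part)
        factory = self._factories[attrs.pop("DRIVER")]
        return factory(attrs)  # remaining attributes go to the driver

SAMPLE = {"orders": [(1, "widget"), (2, "gadget")]}
manager = DriverManager()
manager.register("VendorA", lambda attrs: VendorADriver(VendorANative(SAMPLE)))
manager.register("VendorB", lambda attrs: VendorBDriver(VendorBNative(SAMPLE)))

def count_orders(conn_str: str) -> int:
    conn = manager.connect(conn_str)  # the driver is chosen here, at run time
    return len(conn.execute("SELECT * FROM orders"))

# count_orders("DRIVER=VendorA;") and count_orders("DRIVER=VendorB;") both return 2.
```

Because `count_orders` names no vendor, the choice of store is made at deployment: only the connection string changes. Note that `VendorBDriver` has to parse SQL on behalf of a store that has no query language of its own.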
Unfortunately, there are several drawbacks to the ODBC approach. First, there must be an ODBC driver for every data store you want to access. These drivers must support SQL queries, even if the database does not use SQL for its native query language. Second, the ODBC API treats all data as relational tables. Both of these constraints can cause problems for unstructured and nonrelational data stores. Finally, the ODBC API is a standard, controlled by a committee, which means that regardless of the capabilities of the underlying DBMS, the ODBC driver can expose only functionality that is part of the standard. Modifying the API is a complex process. The committee must agree to the proposed change, specify how ODBC drivers should implement the new function(s), and specify how applications or the driver manager can detect whether a given driver supports the new specification. Drivers must be updated, and applications must ensure that the new drivers are installed or that the applications are written defensively against older drivers.
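Writing applications defensively against older drivers amounts to checking a driver's capabilities before calling an optional function. Real ODBC applications can ask which functions a driver supports through `SQLGetFunctions`; the Python sketch below only imitates that idea, using a hypothetical `supported` set and hypothetical `execute`/`execute_sorted` methods.

```python
from typing import Any, List, Tuple

Row = Tuple[object, ...]

class OldDriver:
    """Hypothetical driver written against an older version of the spec."""
    supported = frozenset({"execute"})

    def execute(self, sql: str) -> List[Row]:
        # Canned result; a real driver would run the SQL against its store.
        return [(3, "c"), (1, "a"), (2, "b")]

class NewDriver(OldDriver):
    """Hypothetical driver that also implements a newer, optional function."""
    supported = frozenset({"execute", "execute_sorted"})

    def execute_sorted(self, sql: str, column: int) -> List[Row]:
        return sorted(self.execute(sql), key=lambda row: row[column])

def driver_supports(driver: Any, function_name: str) -> bool:
    """Stand-in for an SQLGetFunctions-style capability query."""
    return function_name in getattr(driver, "supported", frozenset())

def sorted_rows(driver: Any, sql: str, column: int) -> List[Row]:
    """Prefer the newer function; fall back to client-side sorting otherwise."""
    if driver_supports(driver, "execute_sorted"):
        return driver.execute_sorted(sql, column)
    return sorted(driver.execute(sql), key=lambda row: row[column])

# Both calls return the same rows, sorted by the first column.
print(sorted_rows(OldDriver(), "SELECT * FROM t", 0))
print(sorted_rows(NewDriver(), "SELECT * FROM t", 0))
```

The fallback branch duplicates work that a newer driver would do itself, illustrating the extra application code that differences between driver versions impose.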
In practice, ODBC is a widely used mechanism for database access and is supported by most major DBMS vendors. For applications that work only with traditional relational databases, ODBC is a fine solution. As applications move beyond the realm of the relational DBMS (RDBMS), however, a more comprehensive solution is needed.
Another way to attack the problem of disparate data sources is to put all the data into a single data store. This approach is sometimes called the universal storage approach because the single data store is supposed to hold any and all kinds of data. Universal storage would solve the problem of multiple access methods, because there would be only one type of store. However, it presents a huge technical problem: writing a data store that can efficiently store and retrieve any type of data. It also presents a huge business problem: what to do about the terabytes of existing data that are stored somewhere else. The cost of converting data to the universal store would be enormous, not to mention the risk of the single point of failure that the universal store itself represents.
Realistically, the ODBC approach of a common access method seems more feasible than the universal storage approach. However, the access method cannot be limited to relational database tables and SQL queries. It must encompass all types of data.
The Microsoft Universal Data Access (UDA) architecture is designed to provide high-performance access to any type of data (structured or unstructured, relational or nonrelational) stored anywhere in the enterprise. UDA defines a set of COM interfaces that generalize the idea of accessing data, as illustrated in Figure 3-1. At the core of UDA is OLE DB, a set of COM interfaces for building database components. OLE DB lets data stores expose their native functionality without making nonrelational data look relational. OLE DB also lets generic service components, such as specialized query processors, augment the features of simpler data providers. Because OLE DB is optimized for efficient data access rather than ease of use, UDA also defines an application-level programming interface called Microsoft ActiveX Data Objects (ADO). ADO exposes dual interfaces, so it can easily be used from scripting languages as well as from C++, Microsoft Visual Basic, and other developer tools. ADO is discussed further in the section "ActiveX Data Objects" later in this chapter.
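The idea of a service component augmenting a simpler provider can be illustrated with a small model. In this hypothetical Python sketch, `MailFolderProvider` can only enumerate its records, and `QueryProcessor` wraps any object with a `rows()` method to add filtering and sorting. These names are illustrative assumptions, not OLE DB interfaces; real OLE DB components communicate through COM interfaces such as rowsets.

```python
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, object]

class MailFolderProvider:
    """Hypothetical minimal provider: it can only enumerate its records."""

    def __init__(self, messages: List[Record]):
        self._messages = messages

    def rows(self) -> Iterator[Record]:
        # Records keep their native shape; nothing is forced into a fixed schema.
        yield from self._messages

class QueryProcessor:
    """Hypothetical service component that adds filtering and sorting
    on top of any object exposing a rows() method."""

    def __init__(self, provider):
        self._provider = provider

    def rows(self) -> Iterator[Record]:
        # Still looks like a provider, so components can be stacked.
        return self._provider.rows()

    def select(self, where: Optional[Callable[[Record], bool]] = None,
               order_by: Optional[str] = None) -> List[Record]:
        result = [r for r in self._provider.rows() if where is None or where(r)]
        if order_by is not None:
            result.sort(key=lambda r: r[order_by])
        return result

inbox = MailFolderProvider([
    {"sender": "kim", "subject": "Budget", "size": 120},
    {"sender": "lee", "subject": "Agenda", "size": 45},
    {"sender": "kim", "subject": "Re: Budget", "size": 80},
])
qp = QueryProcessor(inbox)
from_kim = qp.select(where=lambda m: m["sender"] == "kim", order_by="size")
print([m["subject"] for m in from_kim])
```

Because the query features live in a separate component, the provider author does not have to implement a query language, and the same processor can sit in front of any other provider.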
Figure 3-1. The UDA architecture.
Microsoft Data Access Components (MDAC) provides an implementation of UDA that includes ADO as well as an OLE DB provider for ODBC. Because of this provider, ADO can be used to access any database that has an ODBC driver, which effectively means any major database platform. OLE DB providers are also available for other types of stores, such as the Microsoft Exchange mail store, the Windows NT Directory Services, and the Windows file system itself (via Microsoft Index Server). Developers can therefore use ADO as the single data access mechanism for existing and new data, structured or unstructured, wherever it is located.
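The provider-for-ODBC arrangement is an adapter: one component wraps an ODBC driver so that it looks like any other provider to the application-level API. In MDAC this adapter is the Microsoft OLE DB Provider for ODBC (ProgID `MSDASQL`). The Python sketch below models only the layering; `ToyOdbcDriver`, `OdbcBridgeProvider`, `FileListProvider`, and `open_recordset` are hypothetical and do not reproduce the ADO or OLE DB APIs.

```python
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]

class ToyOdbcDriver:
    """Hypothetical ODBC-style driver: SQL in, (column names, tuples) out."""

    def execute(self, sql: str) -> Tuple[List[str], List[tuple]]:
        # Canned result; a real driver would run the SQL against its DBMS.
        return ["id", "name"], [(1, "widget"), (2, "gadget")]

class OdbcBridgeProvider:
    """Hypothetical adapter in the spirit of the OLE DB provider for ODBC:
    wraps any ODBC-style driver and exposes its results as provider rows."""

    def __init__(self, driver):
        self._driver = driver

    def rows(self, command: str) -> Iterator[Record]:
        columns, tuples = self._driver.execute(command)
        for t in tuples:
            yield dict(zip(columns, t))

class FileListProvider:
    """Hypothetical nonrelational provider: the 'command' is a folder path."""

    def __init__(self, tree: Dict[str, List[str]]):
        self._tree = tree

    def rows(self, command: str) -> Iterator[Record]:
        for name in self._tree.get(command, []):
            yield {"path": f"{command}/{name}", "name": name}

PROVIDERS = {
    "OdbcBridge": OdbcBridgeProvider(ToyOdbcDriver()),
    "FileList": FileListProvider({"/docs": ["a.txt", "b.txt"]}),
}

def open_recordset(provider: str, command: str) -> List[Record]:
    """One application-level call, whatever store sits behind it."""
    return list(PROVIDERS[provider].rows(command))

print(open_recordset("OdbcBridge", "SELECT id, name FROM products"))
print(open_recordset("FileList", "/docs"))
```

The application calls `open_recordset` the same way for a SQL database reached through the bridge and for a nonrelational file listing; only the provider name and command differ.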