In Lesson 1, "Defining a Web Application Strategy," you were introduced to applications that are distributed across a multitiered environment. One of the components of this environment is the data tier, which provides the data store for applications that are responding to client requests for data. To support requests, Microsoft provides three data access technologies: Open Database Connectivity (ODBC), OLE DB, and ActiveX Data Objects (ADO). ASP and COM components can use these technologies to access data from local data sources such as SQL Server databases. This lesson describes ODBC, OLE DB, and ADO and explains how ASP and COM components can use them to communicate with a data source. The lesson also describes the steps you should follow when planning your integration strategy.
Many of today’s Web applications rely on a relational database (such as a SQL Server database) or another type of data store to provide accurate information to users. Because of the demands of user interactivity and application complexity, the information must be accessible in a way that makes it easy to manipulate and modify. To facilitate this accessibility, Microsoft provides several technologies that enable data access, including ODBC, OLE DB, and ADO. Each of these technologies is described in Table 9.5.
Table 9.5 Data Access Technologies
ODBC is a standardized method for accessing data in relational databases. Although ODBC is fast and lightweight, it’s a generic method that isn’t optimized for any specific data source.
OLE DB, like ODBC, is an open specification. OLE DB is a set of COM interfaces that can communicate with any data source, including relational data sources, such as SQL Server databases, and nonrelational data sources, such as Microsoft Excel spreadsheets, e-mail data stores, and text files. With OLE DB, you can access relational databases as well as indexed sequential access method (ISAM), text, and hierarchical data sources.
ADO is a collection of objects that exposes the properties and methods used to communicate with an OLE DB data source. It provides a common programming model designed to work with OLE DB. ADO is ideal for server-side scripting and is the recommended data access technology for ASP applications.
Server-side ASP and COM components can access ADO, which uses OLE DB providers to communicate with the data source. A provider is any component that allows technologies such as ADO to access data in a uniform way through the OLE DB interfaces. A provider manages access to the data source. Figure 9.5 illustrates how ASP and a COM component access data from a SQL Server database. The OLE DB provider used to access data from a SQL Server database is SQLOLEDB.
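In classic ASP, this access is typically performed through the ADODB objects (Connection and Recordset) opened with a SQLOLEDB provider string. As a runnable sketch of the same open, query, and close pattern, the example below uses Python's standard sqlite3 module as a stand-in for an OLE DB provider; the table and column names are hypothetical:

```python
import sqlite3

# Stand-in for the ADO pattern: open a connection through a provider,
# run a query, read the results, and close the connection.
# (In ASP/ADO the equivalent is an ADODB.Connection opened with a string
# such as "Provider=SQLOLEDB;Data Source=...;Initial Catalog=...".)
conn = sqlite3.connect(":memory:")           # ADO: Connection.Open
conn.execute("CREATE TABLE Customers (ID INTEGER, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (1, 'Margie')")

# ADO: Connection.Execute returning a Recordset.
rows = conn.execute("SELECT Name FROM Customers").fetchall()
print(rows[0][0])

conn.close()                                 # ADO: Connection.Close
```

In an ASP page the corresponding calls would be Server.CreateObject("ADODB.Connection"), Connection.Open, Connection.Execute, and Connection.Close.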
Figure 9.5 - Accessing data from a SQL Server database
When designing your strategy for integrating a database into your Web application, your planning process should include a number of steps, including setting up permissions, optimizing database connections, using stored procedures to manage data, managing data files, and scaling out your database. The rest of this section describes each of these steps.
This lesson focuses on server-side solutions to database access because these solutions are browser-independent, which is often preferable in mixed-browser environments like the Internet. However, Microsoft also supports technologies that facilitate client-side programming, such as Remote Data Service (RDS). For more information about these technologies, refer to the Microsoft Windows 2000 Server Resource Kit or to http://www.microsoft.com/windows2000.
Before an application can access data from a database, permissions must be set up properly. This is especially true when a database, such as a SQL Server database, and IIS are running on different computers, which is the most likely scenario in a distributed environment. In a Windows 2000 environment, connections to a SQL Server computer use named pipes by default. A named pipe is a high-level interprocess communication mechanism that networked computers use to provide connection-oriented messaging. In many cases, however, the connection type needs to be changed to the Transmission Control Protocol/Internet Protocol (TCP/IP) network library or another nonauthenticated protocol.
In order for a client to gain access to a named pipe, and subsequently the SQL Server database, the SQL Server computer must validate the client. This can occur in one of two ways: the connection can be validated anonymously through a guest account, or the user can be authenticated.
When attempting to validate a connection, the SQL Server computer uses the identity of the user associated with the Web connection. If the connection is anonymous, you must ensure that a guest account has been created that corresponds to the IUSR_computername account and that permissions have been granted to that account to allow it to log on to the SQL Server computer.
You can also set up your system to support user authentication. For example, if you configure IIS to use Integrated Windows authentication, IIS tries to connect to the SQL Server computer by using the user’s security context. As with anonymous users, the connection attempts to use a named pipe to connect to the SQL Server computer if SQL Server is located on a separate computer from IIS. When Windows 2000 detects the attempt to use a named pipe that has been opened in a different user context, it closes the pipe, according to its security rules. To avoid this situation, you can use a nonauthenticated protocol, such as the TCP/IP network library, between the IIS computer and the SQL Server computer.
Note that in a highly available environment, SQL Server 2000 will often be installed on a cluster configured with the Cluster service. Currently in Windows 2000, clients can’t use Kerberos to authenticate a connection to a cluster virtual server. If Kerberos can’t be used, the clients attempt to authenticate with Windows NT LAN Manager (NTLM) authentication. (SQL Server 2000 supports NTLM authentication.) You shouldn’t disable NetBIOS over TCP/IP or restrict NTLM authentication on clients that will be communicating with cluster virtual servers.
Managing database connections can be a significant challenge with distributed Web applications. The process of opening and maintaining connections can severely affect the database server’s resources, even when no information is transmitted. As a result, connectivity performance can be affected. Table 9.6 describes several steps you can take to optimize your database connections.
Table 9.6 Optimizing Connections
Enhancing application performance on SQL Server
In the registry, change the threading model for the main ADO component from Apartment to Both. Don’t make this change if you intend to use ADO to connect to a Microsoft Access database.
Setting the connection timeout
Limit the amount of time that your application will wait for a connection to the database. A surge in database activity can cause the database to become backlogged, resulting in increased waiting periods. Excessive delays increase the amount of time a user must wait to learn that a request can’t be processed.
Closing your connections
Design your application to close connections as soon as they’re no longer needed to reduce the demand on database resources and to make those resources available to other users.
Sharing active connections
Design your application to share connections whenever possible, and don’t use connections that you don’t need.
Increasing the size of the record cache
Request multiple records at one time in order to increase connection throughput. You can increase the number of records that the provider will retrieve at one time into local memory.
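As a rough illustration of these guidelines, the sketch below uses sqlite3 as a stand-in for an ADO connection; in ADO the corresponding settings are the Connection object's ConnectionTimeout property and the Recordset object's CacheSize property. The table name and batch size are hypothetical:

```python
import sqlite3
from contextlib import closing

def fetch_orders(batch_size=100):
    # Limit how long we wait for a busy database
    # (ADO equivalent: Connection.ConnectionTimeout).
    with closing(sqlite3.connect(":memory:", timeout=5)) as conn:
        conn.execute("CREATE TABLE Orders (ID INTEGER)")
        conn.executemany("INSERT INTO Orders VALUES (?)",
                         [(i,) for i in range(250)])
        cur = conn.execute("SELECT ID FROM Orders")
        results = []
        while True:
            # Retrieve many records per round trip to increase
            # throughput (ADO equivalent: Recordset.CacheSize).
            batch = cur.fetchmany(batch_size)
            if not batch:
                break
            results.extend(batch)
        return results
    # closing() guarantees the connection is released as soon as the
    # work is done, freeing database resources for other users.

orders = fetch_orders()
print(len(orders))
```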
A stored procedure is a precompiled set of SQL statements that’s stored on the database server. You can use stored procedures to simplify development and speed up complex queries. They control which operations are performed and which database fields are accessed.
Stored procedures are a useful way of securing your database because you can grant users and applications execute permission on the stored procedures without providing direct access to the underlying views and tables. That way, data can be accessed only through the application front end, not directly through query tools.
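The T-SQL below sketches this pattern (held in Python strings for illustration; the procedure, table, and role names are hypothetical). The procedure encapsulates the query, and a role receives EXECUTE permission on the procedure without receiving SELECT permission on the underlying table:

```python
# Illustrative T-SQL: the stored procedure controls which operations
# are performed and which fields are accessed.
create_proc = """
CREATE PROCEDURE GetCustomerOrders @CustomerID int
AS
    SELECT OrderID, OrderDate
    FROM Orders
    WHERE CustomerID = @CustomerID
"""

# Grant execute permission on the procedure only; members of the role
# have no direct SELECT rights on the Orders table, so query tools
# can't reach the data outside the application front end.
grant_execute = "GRANT EXECUTE ON GetCustomerOrders TO WebAppRole"
```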
When you create a database, you must specify an initial size for the data files, which include the database files and the transaction log. One way you can tune your database’s performance is by properly configuring the size of those files. When you first set up the database, the data files should be large enough to avoid the Input/Output (I/O)–intensive operation of autogrowing data files. If the initial files are too small, the database management system (DBMS) will have to increase the size of the data files frequently, which can degrade performance.
In addition to the initial size, you should set a maximum size appropriate to your storage system. For example, if the server uses Small Computer System Interface (SCSI) disks, you shouldn’t allow the data files to grow beyond 85 percent of the drive’s capacity. Beyond that, SCSI disk performance begins to degrade.
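The CREATE DATABASE statement below (shown as a Python string; the database name, file paths, and sizes are hypothetical) illustrates how an initial size, a maximum size, and a growth increment are set for both the data file and the transaction log:

```python
# Illustrative T-SQL for sizing data files at creation time.
create_db = """
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_data,
     FILENAME = 'D:\\Data\\Sales.mdf',
     SIZE = 500MB,          -- large enough to avoid frequent autogrow
     MAXSIZE = 2000MB,      -- capped below the drive's capacity
     FILEGROWTH = 100MB)    -- grow in large steps, not many small ones
LOG ON
    (NAME = Sales_log,
     FILENAME = 'E:\\Logs\\Sales.ldf',
     SIZE = 100MB,
     MAXSIZE = 500MB,
     FILEGROWTH = 50MB)
"""
```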
You can scale out your database to provide high availability and performance to your Web applications. Three technologies that you can use are data partitions, failover clustering, and log shipping.
Partitioning refers to the process of distributing data from one table into multiple, identical tables on different servers. Once the data is partitioned, you can use distributed partitioned views to access that data. In SQL Server, you first create the member tables and then you create the partitioned view, which merges all the data so that the table divisions are transparent to users and applications. From the perspective of the users and applications, it doesn’t matter if the view is made up of one or eight tables—it looks and acts like one entity. The partitioned view provides seamless integration of data access to all the member tables.
Figure 9.6 illustrates how a distributed partitioned view accesses data. The CustomerData view accesses the Customer1, Customer2, and Customer3 tables. Each table contains a subset of the data that’s included in the CustomerData view.
Figure 9.6 - Accessing data through a distributed partitioned view
A copy of the distributed partitioned view resides in each partitioned database. For the partitioned view to be usable, all servers that contain a member table that participates in the view must be available. There is no built-in fault tolerance for partitioned views. If a server in the partitioned view fails, the view can’t access data from any of the member tables, even if the query isn’t requesting data from the table on the failed server.
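A sketch of what the CustomerData view from Figure 9.6 might look like is shown below as a Python string; the server, database, and member table names are hypothetical, and in practice each member table would also carry a CHECK constraint on the partitioning column so that SQL Server can route queries to the right member:

```python
# Illustrative T-SQL: a distributed partitioned view merges the member
# tables with UNION ALL so the division is transparent to callers.
partitioned_view = """
CREATE VIEW CustomerData
AS
    SELECT * FROM Customer1                       -- local member table
    UNION ALL
    SELECT * FROM Server2.Sales.dbo.Customer2     -- linked server
    UNION ALL
    SELECT * FROM Server3.Sales.dbo.Customer3     -- linked server
"""
```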
You should partition only those tables whose use is appropriate for partitioning. When you’re designing a partition strategy, carefully consider how requests will be routed to the member tables. Your most important goal should be to minimize the amount of processing that the server needs to perform in order to return the requested data. The partitioning scheme should reflect how the data is actually used.
For example, suppose your database contains customer and sales data and you find that requests for data usually focus on sales region. A good strategy might be to partition data based on region. In this case you’d create member tables in which each one holds the horizontally partitioned data for one or more regions. The partitioned view then retrieves data from the member servers according to how the data has been queried, routing each request only to the member server that contains the data for the relevant region.
Another way to access partitioned data is through data-dependent routing, which is an application-based approach that uses code to determine where the target data is located. Connections are then routed to the appropriate server. You can use data-dependent routing instead of partitioned views or in combination with them. In data-dependent routing, information about where the data is located is made available to the application.
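A minimal sketch of data-dependent routing follows (the region and server names are hypothetical): the routing table lives in the application itself, which picks the member server before opening a connection:

```python
# The application knows which member server holds each region's data.
REGION_MAP = {
    "East":  "SQLEAST01",
    "West":  "SQLWEST01",
    "South": "SQLSOUTH01",
}

def route_connection(region):
    """Return the member server that holds the data for the region."""
    try:
        return REGION_MAP[region]
    except KeyError:
        raise ValueError(f"No member server mapped for region {region!r}")

# The returned server name would then be used to build the
# connection string for that request.
print(route_connection("West"))
```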
When accessing data through distributed partitioned views, data partitioning alone doesn’t provide high availability. However, two high-availability solutions that you can use in conjunction with partitioning are failover clustering and log shipping.
SQL Server 2000 failover clustering is built on top of the Cluster service in Windows 2000 Advanced Server and Windows 2000 Datacenter Server. Failover clustering in SQL Server is made up of several main components, one of which is the instance.
In SQL Server, an instance is an installation of SQL Server that’s completely separate from any other installations. However, instances can share underlying components that affect how SQL Server works in a clustered environment. SQL Server allows you to set up multiple instances per server.
SQL Server 2000 supports two types of failover clustering: single instance (similar to an active/passive cluster) and multiple instances (similar to an active/active cluster). These two types of failover clustering are described in Table 9.7.
Table 9.7 Types of Failover Clustering
Single instance
This type of cluster contains only one active instance of SQL Server, which is owned by a single node. All other nodes in the cluster are in a wait state. A waiting node is brought online if the active node fails.
Multiple instance
This type of cluster supports up to four nodes. Although SQL Server can support up to 16 instances, having more than 4 instances isn’t recommended (1 instance per node). One strategy that you can use in a multiple-instance cluster is an N+1 topology. In this configuration all nodes are active except one. The passive node remains in standby and is configured as the primary failover computer.
Part of integrating a database into a distributed design is ensuring that the database remains available. One method that you can use to protect your database is log shipping, which is a process of copying transaction logs from a primary SQL Server computer and applying them sequentially (on a scheduled basis) to another SQL Server computer. If the primary SQL Server computer fails, you can direct your application to the backup server. Depending on when the last transactions were applied, your backup server will either be up to date or only slightly out of sync. Log shipping can work in conjunction with failover clustering.
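The toy simulation below (a sketch only, not actual SQL Server tooling) shows why the secondary server trails the primary by at most one shipping interval, and why transactions that haven’t yet been shipped are at risk if the primary fails:

```python
# Simulate log shipping: transaction log backups from the primary are
# applied to the secondary in sequence, on a schedule.
primary_logs = []       # transaction log backups, in order
secondary_state = []    # transactions applied on the secondary
shipped = 0             # how many log backups have been applied

def ship_logs():
    """Apply any not-yet-shipped log backups to the secondary, in order."""
    global shipped
    for log in primary_logs[shipped:]:
        secondary_state.extend(log)
        shipped += 1

primary_logs.append(["txn1", "txn2"])   # first scheduled log backup
primary_logs.append(["txn3"])           # second log backup
ship_logs()                             # scheduled restore on secondary
primary_logs.append(["txn4"])           # not yet shipped: this is what
                                        # is lost if the primary fails now
print(secondary_state)
```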
Table 9.8 describes many of the factors that you should take into consideration when planning to implement log shipping.
Table 9.8 Configuring Log Shipping
Determining the synchronization frequency
You must determine how often the log data should be synchronized by taking into account how far the secondary server can lag behind the primary server. In other words, how many transactions can you afford to lose? By default, transaction logs are backed up every 15 minutes.
Placing the servers
The location of the servers is important. Log shipping allows you to disperse log data across geographical areas, which is recommended. At a minimum, the primary and secondary servers should be on separate power grids, and the two members of a log-shipping pair should never reside on the same computer.
Connecting the servers
The log-shipping pair must be able to communicate with each other. Ideally, the servers should communicate log-shipping information over a private local area network (LAN) rather than share the network bandwidth. You should also configure the servers with higher bandwidth network cards, if your network infrastructure can support them.
Planning secondary server capacity
The secondary server should have the same capacity as the primary server to ensure that the application will perform as expected should failover occur.
Generating database backups
In most cases you can use the secondary server to perform full database backups because the secondary server is only out of sync by a small increment of time. This strategy keeps the primary server free of the overhead and contention incurred by the backup process.
In general, failover clustering is a better high-availability solution than log shipping (although you can use the two together). For example, failover clustering is fast and automatic, while log shipping is slow and manual. In failover clustering, all transactions are rolled forward if completed or rolled back during the failover process. In log shipping, if the primary server is unavailable, the last transactions can’t be retrieved. However, log shipping is generally less expensive than failover clustering and does provide a high level of protection—just not as much as failover clustering.
The process of designing a database Web integration strategy includes five steps. Table 9.9 describes the considerations that you should take into account for each of these steps.
Table 9.9 Integrating a Database into Your Application
Setting up permissions
When you set up permissions for data access in a Web environment, you must take into account whether users will access the site anonymously or whether they’ll have to be authenticated in order to access the site.
Optimizing database connections
When optimizing database connections, you should take into account application performance, connection time-outs, closing your connections, sharing active connections, and the size of the record cache.
Using stored procedures to manage data
Stored procedures can simplify development, speed up complex queries, and provide a security structure for protecting data.
Managing data files
You can fine-tune the performance of your database by controlling the limits and rate at which the data files can grow.
Scaling out your database
When scaling out your database to achieve high availability and performance, you can use data partitioning, failover clustering, and log shipping.
You can use ASP and components to access data through ADO. This process allows you to integrate your database into your distributed Web applications. When determining how to integrate your database into your distributed application, you should adhere to the guidelines described in this lesson: set up permissions properly, optimize your database connections, use stored procedures to manage data, and manage your data files carefully.
In addition to following these guidelines, you should take steps to scale out your database in order to provide highly available data to your applications. You can use partitioning to facilitate the scaling out process. Your goal in partitioning data should be to minimize the amount of processing that the server needs to perform to return the requested data.
Although partitioning alone isn’t enough to ensure highly available data, you can use it in conjunction with failover clustering, log shipping, or both. If partitioning isn’t appropriate for your organization, you should still implement failover clustering or log shipping. Whenever you’re trying to ensure high availability, your first choice should be clustering because it provides the quickest and most automatic failover and is virtually transparent to users and applications.
Margie’s Travel is implementing a Web site that provides travel information to the company’s customers. Much of the information is generated dynamically from content stored in a database. Customers on the Web can access the site at http://www.margiestravel.com/. From there they can search for the information they need.
The site supports this functionality by using ASP-based applications and COM+ components to access a SQL Server database. The application will use stored procedures to access the data. A multitiered topology is used to distribute the application components across the presentation tier, business logic tier, and data tier, as shown in Figure 9.7.
Figure 9.7 - Database integration in a distributed environment
The business logic layer comprises a Web cluster that’s configured with IIS and ASP. The COM+ components are located on the same Web cluster. The SQL Server database is located on the data layer, which is set up as a single-instance failover cluster. One node is configured with an active instance of SQL Server, and a second node is configured with a waiting instance of SQL Server. Both servers in the failover cluster are identical and each can easily support the anticipated number of data requests at peak times. Users will log on anonymously, so an appropriate guest account has been configured on the SQL Server computers.
Microsoft provides several technologies that enable data access, including ODBC, OLE DB, and ADO. Server-side ASP and COM components can access ADO, which uses OLE DB providers to communicate with the data source. Before an application can access data from a database, permissions must be set up properly. The method used to ensure data access depends on whether users access the site anonymously or whether they must be authenticated.
You can optimize database connections by using the Both threading model, limiting connection time-outs, closing and sharing connections, and increasing the size of the record cache. When accessing data, your application should use stored procedures when possible because they simplify development, speed up complex queries, and provide a security structure. Database integration should also take data file configuration into account: you should set an appropriate initial size and maximum size for the data files.
You can use data partitions, failover clustering, and log shipping to scale out your database and provide high availability and performance. Partitioning refers to the process of distributing data from one table across multiple, identical tables on different servers. Failover clustering builds on the Windows 2000 Cluster service and provides failover and failback protection to your database services. Log shipping refers to the process of copying transaction logs from a primary SQL Server computer and applying them sequentially (on a scheduled basis) to another SQL Server computer to provide a backup server that’s either up to date or nearly up to date.