Lesson 2: SQL Server 7.0 Data Warehousing Features | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

The data warehousing environment can be complex. Your tools need to interoperate with and leverage one another. This lesson introduces capabilities of SQL Server 7, such as linked servers and replication that support data warehousing particularly well. The tools discussed here are not used exclusively by data warehousing solutions. Because of the tools nature, however, they are often used in data warehousing environments.

SQL Server contains many features to further the goal of making the data warehousing process easier. The features described in this section are

Types of servers

Replication in a data warehousing environment

Query processor features

After this lesson, you will be able to:

Highlight certain features of SQL Server that particularly support data warehousing

Estimated lesson time: 25 minutes

Types of Servers

SQL Server has the ability to tightly integrate more than one computer in a solution. This is important when you are storing data on more than one computer, which is common in large data warehouses. An example of this functionality in SQL Server is the ability to support heterogeneous queries; these queries can directly access data from more than one server in a single SQL statement.

Two methods are supported for integrating multiple servers: remote servers and linked servers.

Remote Servers

Earlier versions of SQL Server supported the execution of stored procedures on remote servers. SQL Server 7.0 still supports remote servers for backward compatibility but introduces a much more powerful concept called linked servers.

Example

A sample remote server execution statement might look like the following:

EXEC HRServer.HRDB.dbo.GetEmployeeInfo 3716

Linked Servers

SQL Server allows you to create links to OLE DB data sources. These links are called linked servers. Linked servers make distributed queries, updates, and commands across heterogeneous data sources possible. With linked servers, you can refer to data sources that reside in different database management systems (DBMSs) as though they were tables being joined in the same database.

Example

A sample SELECT statement on two tables in different databases on different servers, one running SQL Server and the other running Oracle, might look like the following:

 SELECT emp.LastName, ord.OrderID, ord.Discount FROM SQLServer1.Northwind.dbo.Employees AS emp,      OracleSvr.Catalog1.SchemaX.Orders AS ord WHERE ord.EmployeeID = emp.EmployeeID AND ord.Discount > 0

Backup and Restore Features

Data warehouses are large and mission-critical. You don t want to waste time restoring and losing data because you didn t have an adequate backup plan. With the release of version 7.0 of SQL Server, Microsoft has added many new features and enhanced some existing functionality to provide for an easy, approachable way to safeguard your data. New features provide for faster and more precise database restoration and recovery than was previously available.

Differential Backups

When implemented properly, differential backups help save database restoration time. Differential backups are backups of only the data pages that have changed from the time of the last full database backup. They are many times faster than transaction log backups for the same time periods. Using these in conjunction with full database backups and transaction log backups will provide a quick database recovery solution when needed.

File or Filegroup Backups

File or filegroup backups are a specialized form of database backup in which only certain individual files or filegroups from a database are backed up. This is usually done when there is not enough time to perform a database backup of a very large database. To make use of file and filegroup backups, transaction log backups must be created as well.

Point-in-Time Restoration

Another enhancement is point-in-time restoration of a database. This is the ability to restore a database backup and transaction log to its condition in an earlier stated point in time. You can restore a database to any point in time as long as that point in time is covered in your transaction log backup. You can execute this transaction in both Transact-SQL and SQL Server Enterprise Manager.

Restarting a Restore

SQL Server 7.0 provides a way to restart a backup and restore operation that for one reason or another (such as power failure) stopped in mid-process. This method applies only to tape backups.

Query Processor Enhancements

In the past, many relational database management systems had query processors that were optimized for OLTP environments. As a result, the query processor could not handle the more complex query types that one finds in typical data warehousing operations. With version 7, the SQL Server query processor has been optimized to better handle complex queries.

New Join Techniques

The query processor can now support complex queries against large databases. The query processor incorporates new join techniques, which are more scalable to large databases than the nested-loop join used in earlier versions of SQL Server. These new join techniques provide the query processor with options for an optimum plan of query execution.

The join types are merge, hash match, and hash match teams.

Merge

The query optimizer generally uses a merge join when join fields are indexed and the columns represented in the index cover the query. The merge join directly uses the sorted merge columns, which are defined by the equality clauses of the join, to perform the join. The query is performed by scanning an index if one exists on the proper set of columns, or is performed after sorting these columns.

Hash Match

The query optimizer generally uses a hash match join on large, non-indexed tables. Hash joins create hash values based on the equality clauses of the join and compare them with values in a hash table, looking for matches.

Hash Match Teams

The query optimizer generally uses the hash match teams join when there are multiple hash operators using the same columns or a set of columns working together as a team.

See "Understanding Merge Joins" and "Understanding Hash Joins" in the SQL Server Books Online for a detailed description of the join technology.

New Index Intersection Methods

SQL Server 7.0 now uses various index intersection and union methods to quickly filter data before retrieving rows from a database. These methods allow for resolving of entire queries or portions thereof, without reading the data pages from disk physically (or at least reading a smaller subset than would be required using the single-index technique from earlier versions of SQL Server).

Other Enhancements

Other enhancements include

Intra-query parallelism Intra-query parallelism describes the process by which SQL Server generates a query plan that divides a query into multiple parts that can be processed on different CPUs.

Improved disk input/output (I/O) For a majority of implementations, version 7 of SQL Server provides more efficient disk I/O for data warehousing than prior versions by using 8K database pages versus the 2K pages that were used in earlier versions. Reduced disk I/O means better efficiency.

Cost-based analysis The query optimizer recognizes that data warehouse schemas are different from OLTP database schemas. It is able to use a cost-based analysis optimized for data warehouse types of databases and queries.

Lesson Summary

The data warehousing environment contains many challenges to ease of use and efficient operation. SQL Server 7.0 offers many features that directly address and resolve some of the obstacles that must be overcome to implement a successful analytical environment, whether that environment is an enterprise-level data warehouse or a smaller data mart.