Overview of Analysis Services | Microsoft SQL Server 2000 Administrators Companion

3 4

Analysis Services is a set of tools provided to assist you in developing and managing data used in online analytical processing. Analysis Services consists of the Analysis Service server, English Query, and other supporting components. The Analysis Service server constructs cubes of data to assist in multidimensional analysis. The term "cube" is used to describe aggregate data. This aggregate, or summary, data is used for complex analytical queries such as monthly sales results and sales projections. (Cubes are described in more detail in the section "OLAP Cubes" later in this section.)

In multidimensional analysis, multiple queries look at a database from different points of view, or dimensions. Let's look at an example. Imagine a bicycle store database in which sales data for the last year is kept. One query in a multidimensional analysis operation can look at customer buying habits. Another query can look at monthly sales. Yet another query can look at sales of a particular bicycle or component. Although the data is shared by all of these queries, each query takes a different view (dimension) of the data.

Analysis Services Components

Analysis Services provides a number of tools and wizards you can use to access multidimensional data. Analysis Services consists of the following components:

Analysis Manager Provides a graphical user interface for accessing the Analysis services such as building cubes, managing security, and browsing data sources.
Data Warehousing Framework Consists of a set of components and APIs that implement SQL Server's data warehousing features.
Data Transformation Services (DTS) Assists in loading and transforming data into the data mart or data warehouse. DTS consists of an Import Wizard and an Export Wizard that allow for movement of data as well as data transformation. DTS is described in detail in Chapter 24.
Repository Contains a number of interfaces, database schema models, and predefined data transformations to fit into the Data Warehousing Framework. Because data transformations occur on a regular basis, their definitions can be stored for future reuse.
Data Mining Provide algorithms for defining and implementing multidimensional cubes.
English Query Transforms English-language questions into SQL statements that can be run against the database.
Extensible Markup Language (XML) Provides a standard formatting and data representation language. XML is a key component of application-to-application data transfer and is used for publishing data to the Internet.

As you'll see in this chapter, these components fit together like pieces in a puzzle to provide a uniform tool.

OLAP Cubes

The primary form of data representation within Analysis Services is the OLAP cube. A cube is a multidimensional representation of both detail and summary data. Detail data is specific row data, whereas summary data is aggregate data. Cubes are designed based on the analytical requirements set by the data itself. Each cube represents a different business entity, such as sales or inventory. Each side of the cube presents a different dimensional picture of the data. In other words, a cube consists of various planes of data, hence the term "cube of data."

Analysis Services cubes are built using one of two types of database schemas: the star schema or the snowflake schema. (Even though the topic of schemas is really a development topic, this chapter briefly describes the two types of schemas just mentioned so that you can better understand Analysis Services.) Both the star schema and the snowflake schema are composed of fact tables and dimension tables. Analysis Services aggregates the data in these tables to create the cubes. Let's look at this process in a bit more detail.

Fact Tables

A fact table is the table in the data warehouse that stores historical data. This historical data is the core information of the data warehouse. In our bicycle store example, this information is a record of transactions (both database transactions and sales transactions) that have occurred in the bicycle store. This record includes data such as the transaction date, transaction type, item sold, dollar amount of the transaction, customer name, and salesperson name. This record can be used as the basis for the multidimensional analysis.

As you can see, fact table data revolves around a business transaction. This transaction can be the sale of an item, a credit card transaction, a return, and so on. In essence, the fact table records some type of business event.

The fact table in a data warehouse is the largest table of the database and experiences the most activity. As you might imagine, the fact table can contain millions of records and can take up over a terabyte (or 1024 gigabytes) of space.

Dimension Tables

A dimension table is used to define the fields in the fact table—for example, the salesperson name, the transaction type, or the item. This process is similar to normalization, or the breaking down of data into groups for more efficient processing. While the fact table contains historical information about transactions, the dimension tables contain information that is used to derive useful material from the fact table. In other words, dimension tables are used to specify the meaning of the data contained in the fact table.

For example, a fact table containing sales records might have a dimension table containing information about sales representatives that is used to create summary data such as monthly sales per salesperson. And it might have a dimension table containing regional data that is used to create summary data such as monthly data per region.

Unlike the fact table, dimension tables are usually quite small and contain only a few rows each. A data warehouse contains usually only one or two fact tables but typically several dimension tables.

Schemas

The star schema, or flat schema, is common in data warehouses. This schema consists of a single fact table and several dimension tables. The star schema is usually represented as a fact table surrounded by dimension tables, in the shape of a star. Each dimension table corresponds to a column in the fact table. The dimension tables are used to form the basis of the analysis on the data in the fact table.

In a snowflake schema, or multidimensional schema, several dimension tables are joined before being joined to the fact table—in other words, several layers of dimension tables are created, and each layer corresponds to a column in the fact table. The star and snowflake schemas are illustrated in Figure 29-1.

click to view at full size.

Figure 29-1. The star and snowflake schemas.

Data Aggregation

Analysis Services creates aggregates of columns in the fact table based on the data in the dimension tables. So, for example, dimension tables that relate to employees can be used to create summations of sales data by employee. Dimension tables related to items can be used to create summations of data based on items. Because the aggregations based on the dimension tables create different slices, or dimensions, of the data, a virtual cube of data is formed, as described earlier.

The OLAP cubes created by Analysis Services are essentially the aggregation functions that are calculated according to the star or snowflake schema. You use the Analysis Services Wizard to create these aggregations, which you use as you will to create business models and make business decisions.

Metadata

The term "metadata" is used to describe data about data. Thus, the summary and detail data that is used to describe the data in the database is considered metadata. In the example we've been using, the sales table and surrounding dimension tables represent the data. The summary information that we create (sales by salesperson, sales by item) is the metadata. The ability to create metadata (summary tables or aggregations) is the main benefit provided by Analysis Services. With Analysis Services, you can create the metadata easily and then get on with other work without having to maintain this data by hand.

Data Analysis Enhancements in SQL Server 2000

SQL Server 2000 incorporates several enhancements and new features for data analysis and data warehouses. These enhancements provide additional tools and information to aid in the data analysis process. In this section, you'll learn about some of the major enhancements.

Data-Mining Enhancements

Analysis Services includes new data-mining technology that can be used to discover data relationships in both relational databases and OLAP cubes. These relationships can be added to existing OLAP cubes to provide additional data analysis. One unique new data-mining feature is the Microsoft Decision Tree. The Decision Tree uses sophisticated classification techniques and algorithms to analyze data. The decision tree process then constructs one or more decision trees that can be used to perform predictive analysis on new data. For example, a decision tree could be constructed to analyze the credit history and buying history of potential customers at our bicycle store. This decision tree could thus be used to predict the credit worthiness of a customer.

Another new data-mining feature in Analysis Services is the use of clustering. The clustering technology used in data mining is completely different from the type of clustering described in Chapter 12. When Analysis Services performs clustering, it uses an algorithm called the nearest-neighbor algorithm to quickly group data records into clusters that have similar characteristics. Many times, these relationships are not obvious or intuitive. Thus, clustering technology can open up new avenues of data analysis.

Additionally, the SQL Server data-mining components incorporate new wizards and dialog boxes that make them easy to use. These enhancements enable the DBA to quickly perform most of the tasks involved in creating and maintaining a data mart or data warehouse.

Dimension Enhancements

SQL Server 2000 contains several enhancements to dimension tables. SQL Server now supports parent/child dimensions, relational OLAP (ROLAP) dimensions, and write-enabled dimensions.

The parent/child dimension allows the definition of hierarchies that are based on parent/child links between the members in a source table. An example of a parent/child relationship is assembling a component from parts. A single part, the parent, can have many subcomponents, the children. Using a parent/child dimension "ties" the subcomponents to the part when data analysis is performed.

A ROLAP dimension alleviates the size limitations inherent in the standard multidimensional OLAP (MOLAP) model that Analysis Services uses. The MOLAP model allows dimensions containing up to approximately 5 million members. Once the member set grows beyond this limit, a ROLAP dimension is required. The ROLAP model can grow extremely large, but the MOLAP model significantly outperforms ROLAP when querying the member set. For that reason, you should define ROLAP models only on dimensions that are extremely large.

When a write-enabled dimension is used, the members of the dimension can be updated by using Analysis Manager and client applications that support dimension write-back. SQL Server roles are used to control the client applications' write access to the dimension. SQL Server roles are explained in Chapter 34.

Security Enhancements

SQL Server 2000 also includes security enhancements that enable you to better protect the data used in the analysis of your business, which can be extremely sensitive. The enhancements include changes to the security of dimension tables, cell security features, and support for additional authentication techniques.

The dimension tables now function within the role-based security model of SQL Server. For each role defined, you can limit access to the individual dimensions, levels, and members. In addition, you can set read and read/write permissions on those resources. SQL Server 2000 supports role-based security on both FAT-based and NTFS-based systems.

SQL Server 2000 allows you to apply roles at the level of a cube's cell. Analysis Manager includes dialog boxes that allow you to control a role's access to any combination of a cube's cells. In addition, you can vary the cube's read and read/write permissions for each role.

Because it includes the Windows 2000 security model, SQL Server 2000 allows the Kerberos protocol, NT License Manager Security Support Provider, or any other provider that uses the Security Support Provider Interface (SSPI) to perform authentication when a user or an application requests access to cubes and their data. This allows you to use a consistent security model across all levels of your SQL Server installation.

English Query Enhancements

English Query has been enhanced in SQL Server 2000 to provide greater integration with the Microsoft Visual Studio 6 suite of products. This allows developers to integrate English statements rather than T-SQL statements into applications. In addition, a new graphical authoring tool is included to assist in the development of English Query statements. SQL Server 2000 also includes the SQL Project Wizard, which automatically creates the underlying database structures to support English Query, alleviating some of the complexity of setting up an English Query environment. The wizard scans the tables of a database and builds the associated SQL Server components.