Lesson 1: Introduction to MDX | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

Multidimensional expressions (MDX) provide syntax for querying and manipulating the data that is stored in multidimensional cubes. You can use MDX with cubes stored on an OLAP Server or stored in a local file. It can be difficult to visualize multidimensional data structures. This lesson provides a conceptual overview that will make it easier for you to understand these structures when you start to write MDX queries.

After this lesson, you will be able to:

Define MDX and explain why SQL is not suitable for querying multidimensional data sources

Describe a multidimensional schema and understand the concepts of working with multidimensional data

Estimated lesson time: 35 minutes

MDX vs. Transact-SQL

MDX statements resemble Transact-SQL statements in that they both take the format SELECT FROM WHERE , but the similarity ends here. Do not try to closely compare or relate MDX statements with SQL statements. Although they are kindred languages in that they both retrieve data structures, MDX supports expressions and operations relevant only when dealing with multidimensional and time-based data structures. After you gain an initial understanding of MDX, you should approach MDX as an entirely new language.

TIP
Most analysis tools present users with a graphical interface for creating multidimensional queries. The tools generate MDX statements and submit them to the PivotTable Service for you.

Transact-SQL is designed for manipulating tabular data structures. The results of a Transact-SQL query are returned in a two-dimensional rowset. The ADO object for working with a rowset is called a recordset.

MDX is designed for manipulating multidimensional cubes. The results of an MDX query are returned in a multidimensional dataset. The ADO MD object for working with a dataset is called a cellset. You can retrieve the results of an MDX query into an ADO recordset; in this case, the results are automatically "flattened" into a tabular rowset. This concept is discussed in greater detail in Lesson 3, "Programming with ADO MD Objects."

The following topics highlight the major differences to be aware of when working with multidimensional data.

In OLAP, Data Aggregations Are Always Available

SQL tables usually have only the atomic facts. SQL requires the calculations of aggregates as part of the query by using the GROUP BY clause and aggregate functions.

In OLAP, aggregations are precalculated summaries of data that improve query response time by having the answers ready before the questions are asked. For example, a query requesting the weekly sales totals for a particular product line from a data warehouse fact table that contains hundreds of thousands of rows of transactions can take an unacceptable amount of time to answer if the fact table has to be scanned using standard SQL. The response can be almost immediate if the summarization data to answer this query has been precalculated. Precalculation of summary data is the foundation for the rapid response times of OLAP technology.

Hierarchies Commonly Occur in OLAP Dimensions

Cubes are the structure by which OLAP technology organizes summary data into multidimensional structures. Dimensions and their hierarchical levels reflect the queries that can be asked of the cube. Aggregations are stored in the multidimensional structure in cells at coordinates specified by the dimensions.

For example, the question, "What were the sales of product X in 1998 for the Northwest region?" involves three dimensions (product, time, and geography) and one measure (sales). The numeric value in the sales cell within the cube at the coordinates (product X, 1998, Northwest) is the answer to the question.

Hierarchies relate the levels in a dimension. For example, store totals can be rolled up into city totals, which can be rolled up into regional totals, which can be rolled up into national totals. When you ask questions of a cube, you can specify any level in a dimension, and the data can be retrieved directly or by aggregating the lower levels in a hierarchy.

Handling hierarchies in SQL is possible, but it is a complex task compared with the functionality already built into MDX to handle this type of structure.

OLAP Embeds Business Logic in the Cube

SQL requires procedural language to express business logic. Here are some examples of business logic using MDX:

Margin: Sales - Cost

Annual growth: ([FY97] - [FY96])

Margin increase: (Margin,FY97 - [Margin,FY96])

Cost allocation: (Rent, [All Products]) * Sales/(Sales, [All Products])

Moving average (last four periods): Avg(Lag[Time.CurrentMember,3]: Time.CurrentMember)

Working with Multidimensional Data

When you work with MDX, you query a source multidimensional cube and retrieve results into a multidimensional dataset. You should visualize the dataset you want to achieve and then write an MDX query that returns the dataset. The following topics will help you visualize multidimensional datasets and become familiar with MDX terminology. In Lesson 2, "Writing an MDX Query," you will learn how to apply this understanding to writing MDX queries.

How to Understand Multidimensional Data

In your query, you specify how many axes the resulting cube must have. You will typically create two- and three-dimensional cubes. Cubes with more than four axes are very complex and difficult for people to visualize. Each axis has coordinates along it. The data values in the cube occur at the intersections of these coordinates. The intersection of coordinates is usually referred to as a cell. Each cell has a unique coordinate address in the cube.

How Many Dimensions?

When working with MDX, note the following:

A cube is not a square three-dimensional block. It is any multidimensional structure, so you can have a one-dimensional cube, a two-dimensional cube, a three-dimensional cube, and so on.

The dimensionality of the cube is represented by the number of axes it has, not the number of dimensions. Therefore, a one-dimensional cube has one axis; a two-dimensional cube has two axes; a three-dimensional cube has three axes, and so on. You can project many dimensions on a single axis.

Figures 12.1, 12.2, and 12.3 show simple examples of one-dimensional, two-dimensional, and three-dimensional cubes, respectively.

In the one-dimensional cube shown in Figure 12.1, the highlighted cell has a coordinate of (C). Notice that the first axis is numbered axis 0 (zero).

Figure 12.1 A one-dimensional cube

In the two-dimensional cube shown in Figure 12.2, the highlighted cell has coordinates of (C, 1).

Figure 12.2 A two-dimensional cube

In the three-dimensional cube shown in Figure 12.3, the highlighted cell has coordinates of (C, 1, X).

click to view at full size

Figure 12.3 A three-dimensional cube

Because it is difficult to visualize multidimensional datasets, software products generally generate grid or graphical representations of datasets that allow you to create reports from or manipulate the datasets. In this chapter, you will work with two-dimensional datasets that are displayed in a grid with rows and columns. The benefits of using this flattened format are the following:

It is easy to understand.

You can use the MDX Sample Application that is supplied with SQL Server OLAP Services to test and view the results of two-dimensional MDX queries.

Two-dimensional datasets displayed in a grid can easily be printed on the two-dimensional pages of this book.

The MDX Sample Application is only capable of displaying cellsets that have one or two axes. You must always bear in mind that MDX queries do not have to be limited to this format and can be represented in a variety of three-dimensional ways.

Projecting Dimensions

When you write an MDX query, you create the coordinates on an axis by adding dimensions to the axis a process called projecting the dimension. You can project any number of dimensions from the source cube on each axis, but a dimension can be projected on only one axis. Projecting more than one dimension onto a single axis is called nesting dimensions. The dimensions that you project on each axis determine what data will be present in the resulting cube.

For example, you could project a product dimension on the first axis and a time dimension on the second axis of an MDX query that returns sales figures. The dataset will have a cell for the sales of each product for each time period.

The axes in a cube are all equally weighted; they are said to be symmetrical. This is a major advantage over the tabular results returned by SQL, which forces the data to be represented in strict columns and rows. Switching the rows and the columns in a SQL table is a very complicated process. Because axes in a cube are all equally weighted, MDX queries allow you to easily switch the projected dimensions between axes. You can also group and sort data along all axes.

Figure 12.4 shows an example from the Foodmart sample OLAP database. The example shows how easily you can manipulate the axes of an MDX query. This example shows two axes and illustrates projecting more than one dimension on an axis. Members of the Store dimension (Seattle and Beverly Hills) and members of the Product dimension (Beer, Wine, Drinks, and Hot Beverages) are projected on one of the axes. Members of the Time dimension (Q1, Q2, Q3, and Q4 in 1997) are projected on the other axis. Do not try to understand the queries at this stage, but notice that only the boldfaced parts of the two queries have changed in order to switch the axes in the results.

The query for the first result in Figure 12.4 is as follows:

 SELECT      CROSSJOIN({[Store].[Seattle], [Store].[Beverly Hills]},      {[Product].[Beer and Wine].children, [Product].[Drinks],           [Product].[Hot Beverages]}) ON COLUMNS,           {[1997].[Q1], [1997].[Q2], [1997].[Q3], [1997].[Q4]} ON ROWS FROM Sales  WHERE ([Store Sales])

The query for the second result in Figure 12.4 is as follows:

 SELECT      CROSSJOIN({[Store].[Seattle], [Store].[Beverly Hills]},           {[Product].[Beer and Wine].children, [Product].[Drinks],           [Product].[Hot Beverages]}) ON ROWS,      {[1997].[Q1], [1997].[Q2], [1997].[Q3], [1997].[Q4]} ON COLUMNS  FROM Sales  WHERE ([Store Sales])

click to view at full size

Figure 12.4 MDX query results showing switched axes

Accessing Multidimensional Data

Multidimensional expressions provide the syntax for working with the objects in a multidimensional schema. MDX expressions make it possible to navigate a multidimensional cube and create calculated values from the cells in a cube.

Multidimensional Schemas

Schemas are broken down into many objects: cubes, dimensions, members, and cells.

Cubes

The cube is the central metadata object in OLE DB for OLAP on which ADO MD is built. A cube defines a set of related axes that form an n-dimensional grid. Each point in this grid is uniquely identified by a set of coordinates, the values of which come from the component dimensions of the axes.

Dimensions

Within a multidimensional schema, one or more dimensions identify what an axis represents. The dimensions of a cube depend on your business entities. Typically, each dimension is an independent mechanism for selecting data. Dimensions are divided into two subsets:

Axis dimensions, for which data is retrieved for multiple members. Axis dimensions are projected on the axes of the cube.

For example, sales data is retrieved for each of a number of products specified by an axis dimension.

Slicer dimensions, for which data is retrieved for a single member. Slicer dimensions act as a filter to limit the data that is returned in the dataset. All dimensions that are not projected on the axes of the cube are slicer dimensions. If a value is not specified for a slicer dimension, the first member of the dimension is used.

For example, sales data is retrieved for one product specified by a slicer dimension.

Note that the measures in a cube are a dimension. To return multiple measures in the dataset, add the measures dimension to an axis, making it an axis dimension. To return a single measure in the dataset, make the measures dimension a slicer dimension.

Members

Member is the multidimensional schema term for each discrete value in a dimension. In mathematics, you might use the set of real numbers ranging from zero to infinity as the members of the x and y dimensions of a graph. In a multidimensional cube, the members represent what you are trying to evaluate.

The dimension members that are projected on an axis form a set. Sets on an axis are enclosed in curly braces {}. Each member in the set is known as a tuple, or a position. If multiple dimensions are projected on the same axis, each tuple is a combination of a member from each dimension. These tuples are written as the name of each member in the tuple in descending order, separated by periods.

Figure 12.5 shows a query result from the Foodmart sample OLAP database. In the figure, the set of tuples on the rows axis of the grid is {[Seattle].[Beer], [Seattle].[Wine], [Seattle].[Drinks], [Seattle].[Hot Beverages], [Beverly Hills].[Beer], [Beverly Hills].[Wine], [Beverly Hills].[Drinks], [Beverly Hills].[Hot Beverages]}.

click to view at full size

Figure 12.5 Multidimensional schema example

Cells

A cell is the intersection point of a position from each axis. The cell stores the information that you are trying to evaluate. Each cell is identified by a coordinate set, which is a set containing a tuple (position) from each axis. The coordinates of the highlighted cell in Figure 12.5 are ([Beverly Hills].[Beer], [Q3]). Depending on the implementation, OLE DB for OLAP allows cells to contain more than one value.

Example

Consider a two-dimensional cube representing sales for a single grocery store. The columns axis represents the month in which the sale occurred ([January 1998]; [February 1998], and so on). The rows axis represents the product purchased ([Bread], [Milk], [Apples], and so on).

The cells in this cube would be identified with coordinates like ([January 1998], [Bread]), ([January 1998], [Milk]), ([January 1998], [Apples]), ([February 1998], [Bread]), ([February 1998], [Milk]), ([February 1998], [Apples]), and so on.

Dimension Hierarchies

Within each dimension are natural levels of aggregation that combine multiple members. Hierarchies are used to calculate aggregations and to support drilling down into data. A common example is the aggregation of days into weeks, weeks into months, months into quarters, and quarters into years. When you define hierarchies, you define them in terms of levels. A level consists of all the entities that represent the same type of aggregation. In the multidimensional schema, each entity in the level becomes a member in the dimension.

You use dimension names, level names, and member names in MDX queries, so it is important to distinguish between levels in the dimension hierarchy and members of the dimension. For example, Quarter may be a level in a time dimension and Q1 may be a member of the dimension at the Quarter level.

Because of dimension hierarchies, members are arranged in an inverted tree structure within the dimension. Members can have parent and child members. The members of the root level of the hierarchy do not have parents, and the members of the leaf level of the hierarchy do not have children. For example, in a time dimension that has a hierarchy of Year, Quarter, Month, and Day, 1997 is the parent of Q1, Q2, Q3, and Q4. In the same dimension, 31 is a child of July but not of June.

MDX provides functions for including members from different parts of the hierarchy so that you do not have to list them explicitly. For example, you can include all of the members from a dimension, all of the members from a level, or all of the members from selected levels with a single function call.

Some dimensions have multiple hierarchies. For example, a time dimension could have a hierarchy of Year, Quarter, Month, and Day and another hierarchy of Year, Week of Year, and Day of Week. These hierarchies add new points of intersection with the other dimensions and different options for drilling down into the data.

NOTE
Create a multiple-hierarchy dimension in the OLAP Manager by creating more than one dimension named with the same prefix followed by a period and a unique suffix. For example, you can create a time dimension with two hierarchies by creating two dimensions named Time.Quarters and Time.Weeks.

Properties

Properties represent attributes of a member, not the values that are stored at intersection points. For example, if a company s fiscal first quarter runs from July 1 to September 30, properties for a Quarter 1 member can store the starting date and ending date for the quarter.

Each level of hierarchy can have a different set of properties, but each member of a given level must have the same properties as the other members. The properties can store different values for each member, but they must represent the same type of information.

NOTE
Names (of dimensions, levels, and members) must be delimited with square brackets [ ] if they contain spaces or other special characters, if they begin with a digit, or if they are reserved words. For example, Product, Seattle, and Quarter1 do not need to be delimited, but [Store Sales] (contains a space), [1997] (starts with a digit), and [Name] (reserved word) do need to be delimited. It is a good habit to include delimiters, even when they are not required.

Lesson Summary

Multidimensional data can be difficult to visualize. MDX is designed as a standard syntax for working with multidimensional datasets. The MDX multidimensional schema describes a cube in terms of axes, dimensions, hierarchies, members, and cells.