In contrast to a relational database, which is solely designed as a reliable data storage engine, you can think of an Analysis Services database as a rich semantic layer that provides information to users in the way they like to think about it, and which also loads data into a cache with some precalculated summaries to increase performance.
Getting Started with Analysis Services
We start by using the BI Development Studio to create a new Analysis Services project. This will create a new local folder on your development machine that will contain the definitions for the Analysis Services objects that we will be creating. Later when we have finished designing the objects, we can deploy the project to a server that is running Analysis Services to create the cube and allow users to access the information.
You will notice that this is a different approach from working with normal relational databases. So far, when we have used the SQL Server Management Studio for tasks such as defining the data warehouse database, any modifications that we made to objects such as tables or indexes were immediately updated to the live database when we clicked the Save button.
Defining the Logical Data Model
All Analysis Services databases have one or more data sources that contain the information that will be loaded into the cubes. In our example, the source will be the manufacturing data warehouse database. The first step for creating an Analysis Services project is to add a data source and specify the type of database and connection information, using the same techniques we used for Integration Services projects in the previous chapter.
Most source databases contain a large number of tables and views. You can select the parts of the source database that will be used for analysis by defining a logical view of the database called a data source view (DSV), so we will be focusing this logical view on the dimension and fact views that we created when we set up the database. The DSV can also be used to define relationships between objects and to implement business logic by adding calculated columns to the logical view.
The DSV acts as an abstraction layer over the source system and can prove useful when you are building cubes over legacy data sources where you don't have access to add simplifying views over the normalized underlying schema. When using data warehouses as the source, it is usually better to add any required business logic or calculated columns into the underlying views rather than using the DSV. As discussed in the "Data Modeling" section, we can add calculated columns to the underlying views in the database. These views can then be managed along with their corresponding tables and can be made available for other applications, too.
Defining the Relationships Between Dimension and Fact Views
After we have added all the dimension and fact views to the DSV, we need to create the relationships between them. If we were using physical tables rather than views, the wizard that created the DSV could detect these relationships based on the primary key and any referential integrity constraints, but this doesn't work for views, so we need to create the relationships manually.
The simplest approach is to arrange the dimension tables roughly around the fact table, as shown in Figure 5-2, and then drag each key column from the fact table to the corresponding dimension table primary key column. We also need to create the relationships for any snowflake dimensions such as Product. Similar to the fact columns, you drag the foreign key to the primary key (such as dragging the ProductCategoryKey column from Product Subcategory to Product Category).
Figure 5-2. Data source view
For dimensions such as Customer, there may be multiple relationships with the fact table. You can simply repeat the drag-and-drop process for each column on the fact table, such as ShippingCustomerKey and BillingCustomerKey. This is referred to as a "role playing dimension" in SQL Server Books Online.
Tip: Dealing with Multiple Date Relationships
Fact tables frequently contain multiple date columns, which are often used to calculate values such as the total time taken to move from one state to another. When you are defining relationships in the DSV, you may decide to create relationships between the Time dimension and the fact table for each of the date columns. This makes sense as long as users are going to expect to be able to analyze the information by each of the different dates. So, when the users are looking for year-to-date sales, they would need to decide which kind of date analysis they would use, such as by order date or ship date.
For the Time dimension in the Manufacturing database, we will create relationships for the ShipDateKey, OrderDateKey, and DueDateKey columns on the Shipments fact table. This decision (like many decisions in BI) will make some things easier and others harder. If we had defined a single relationship with the ShipDateKey column on the Shipments fact table, this would have made things simpler for the user. There would be no ambiguity about which Time dimension to pick, but users would not be able to perform certain analyses such as the number of orders placed last month versus the number of orders shipped.
Building the Initial Cube
Now that the logical data model has been defined in the DSV, we can create a cube. BI Development Studio includes a wizard that will analyze the relationships that you have defined, as well as the actual data in the tables; the wizard usually produces a pretty good first pass for a cube design. The wizard makes suggestions for which tables are probably facts and which are probably dimensions, and also looks at the data in your dimension columns to try and come up with some reasonable attributes and hierarchies where possible.
A single cube can contain many different fact tables with different granularities and different subsets of dimensions. Each fact table that you include in a cube becomes a measure group, with a corresponding set of measures based on the numeric fact columns. This allows users to easily compare information from different fact tables, such as comparing Actual Revenue from a Sales measure group with Planned Revenue from a Budget measure group. This may trip you up if you are familiar with other BI technologies that had one separate cube per fact tablein Analysis Services 2005, the most common approach is to have only one cube in the database with multiple measure groups.
You may have noticed that the Ship Method table was identified as both a dimension and a fact. This is because there are numeric or money columns on the dimension table, and the wizard is including the table as a possible fact because the users might want to include those columns as measures in a query. In addition to having a Ship Method dimension, the cube will include the numeric measures from the Ship Method table as part of a separate Ship Method measure group.
Tip: Why the Wizard Sometimes Identifies Dimensions as Facts
When using the Cube Wizard, you will often get to the step that shows which tables were identified as facts or dimensions and see that one of your dimension tables is incorrectly showing up as a fact. This is often due to a missing relationship or a relationship defined in the wrong direction in the DSV, and can usually be fixed by canceling the wizard, returning to the DSV to define the relationship, and then rerunning the wizard.
Defining the Time Dimension Periods
When you select a Time dimension in the Cube Wizard, you are prompted to specify the types of columns in the table. For example, you can select calendar year, month and day columns, as well as the equivalent columns that contain fiscal or manufacturing calendar information. The wizard uses the types that you select to apply special logic for time dimensions such as creating sensible hierarchies, and calculation formulas that you may add to the cube, such as year-to-date values, also use these types.
For elements such as years or months, Time dimension tables often have both descriptive columns containing the name of the period (such as FY 05 or Quarter 2), as well as a key column that identifies the period. This key column may be unique, such as a datetime column storing the first day of the period as the key (for example, 1/1/2006 for the first quarter of 2006). Alternatively, the key column may be a repeating number, such as using the numeric index of the period (for example, 3 for March).
When designing the Time dimension table, it is better to use the first approach (that is, create unique keys for periods), and then select these unique keys when specifying Time dimension periods in the wizard. Using the unique keys rather than the descriptions (such as selecting Year rather than Year_Name) will result in more efficient storage and avoid any problems with hierarchies that may be caused by nonunique keys, such as strange-looking hierarchies with members in the wrong places.
Loading Data into the Cube
When the wizard has finished creating the cube, it will be opened up in the cube designer. You can use the Cube Structure tab shown in Figure 5-4 to view the dimensions and measures in the cube, and to add or remove new measures and dimensions. Before we start changing the structure of the cube, however, we should probably take a look at how the cube would look when a user browses it.
Figure 5-4. Cube designer
If you click the Browser tab after creating a cube, you get an error message that "Either the user does not have access to the database, or the database does not exist." The problem is that although Visual Studio has created the files on your local PC that describe the definition of the cube and dimensions, we have not yet deployed these definitions to the server or loaded data into the objects. If you have a background with relational databases, you can think of our progress so far as similar to having used a fancy designer to generate a bunch of CREATE TABLE SQL statements, but not having executed them on the database server to create the tables, or loaded any data into them.
You can fix this problem by selecting Process from the Database menu. Visual Studio will detect that the definitions that you have created on your development PC differ from the definitions on the server and prompt you to build and deploy the project to the server first. When this deployment has been completed, the server will contain the same cube and dimension definitions that you designed locally, but they will still not contain any data. After the definitions have been deployed, you are prompted to process the database. This executes queries against the source database tables to load the cube and dimensions with data. Note that this might take a while when you have large volumes of data.
You can change to which server the database is deployed by choosing Properties from the Project menu and changing the target server.
Using the Cube Browser
The cube browser in BI Development Studio is intended to show developers how the cube will look when users browse the information. Of course, this assumes that the users have access to a BI application that supports the new features in SQL Server 2005, such as measure groups and attributes. If not, their experience may be very different, and you should test your cube design in the applications that they will ultimately use.
To use the browser, just drag and drop the measures that you want to analyze on to the center of the display, and then drag any hierarchies or attributes on to the columns and rows (see Figure 5-5). You can start to see how some of the decisions that the wizard makes show up in the browser, such as names of measures and attributes, and how hierarchies are structured. After building the initial cube, you will probably want to spend some time trying out different combinations and taking note of aspects that you would like to change.
Figure 5-5. Cube browser
For example, you may notice that the wizard has not created any hierarchies for the Plant dimension, even though it would be useful for users to be able to drill down through divisions to plants. Because Analysis Services 2005 is attribute based, users could actually build this hierarchy dynamically at runtime by simply dropping the Division Name on to the rows, followed by the Plant Name. Still, it is usually helpful for users to see some prebuilt hierarchies, so we will go ahead and add some in the next section.
If you are building a cube that doesn't have much data in the fact table yet, you might have difficulty figuring out how your attributes and hierarchies will look because the browser by default only displays rows and columns that actually have data. If you want to include the rows with no associated fact data, select Show Empty Cells from the Cube menu.
Setting Up the Dimensions
After we spend some time browsing through the data in the Manufacturing cube, we can see that we need to change a number of areas. This is where developing BI solutions gets interesting, because there are probably no clear requirements that you can turn to that say things like "the users would probably prefer to have a Divisions hierarchy to pick from rather than selecting two or three plant attributes manually." There is a lot to be said for an iterative approach to developing cubes, where you build prototypes and show them to users for feedback, and then release versions of the cube to allow the users time to explore the system and give more feedback.
Because many dimension tables contain a lot of fields that are not particularly useful for users, you should exercise some care about which attributes you create rather than just creating an attribute for every field. Too many attributes can be confusing to users, and for very large dimensions, it may impact query performance because each attribute expands the total size of the cube.
Changing the Display Values for Attributes
When the Plant dimension was created, the wizard created separate Plant Name and Division Name attributes. There is also a special attribute called Plant based on the PlantKey column. Each dimension has one attribute like this that has its Usage property set to Key, usually based on the surrogate key, which uniquely identifies a dimension record but doesn't actually contain any information that the user would be interested in seeing. If a user includes the Plant attribute in the browser, the user will see meaningless integer values.
What we can do to fix this is to change the Plant attribute so that the plant name is displayed rather than the surrogate key. This can be accomplished by clicking the Plant attribute in the dimension designer, going to the NameColumn property in the Properties window, and selecting the Plant Name column. This means that having a separate attribute for Plant Name is now redundant, so we can remove it by selecting the attribute and clicking the delete button.
Also, now we have one attribute called Plant and another called Division Name, which is somewhat inconsistent, so it's probably a good idea to change the attribute name to Division because you are not restricted to using the original source column name for an attribute (see Figure 5-6).
Figure 5-6. Defining a Plant hierarchy
Adding Hierarchies to a Dimension
A hierarchy is a way to organize related attributes, providing a handy way for users to navigate the information by drilling down through one or more levels. Although users can define their own drilldown paths by dragging separate related attributes onto the browser, providing a sensible set of hierarchies for each dimension will help them to get started with common analyses.
The next change we make for the Manufacturing cube is to add a hierarchy to the Plant dimension that drills down through Division and Plant (see Figure 5-6). To set up our hierarchy, you can drag the Division attribute on the hierarchies and levels area of the dimension designer, and then drag the Plant name underneath it to create two levels. The name will default to Hierarchy, but this can be changed by right-clicking the new hierarchy and choosing Rename to change the name to something more meaningful.
Usually when you define a hierarchy, the bottom level contains the key attribute for the dimension (Plant, in this case) so that users can drill down to the most detailed data if they want to; this is not mandatory, however, and you can create hierarchies where the lowest level is actually a summary of dimension records.
Another area of the dimension that you might sometimes want to change is the name of the All level. When you include a hierarchy as a filter in the cube browser, the default selection is the top of the hierarchy, which includes all the data in the dimension. The name that displays for this special level can be changed for each hierarchy using the AllMemberName property of the hierarchy, such as All Divisions for the hierarchy we just created. You could make a similar change to the dimension's AttributeAllMemberName property, which affects the name used when any simple attribute is used as a filter.
Declaring the Relationships Between Attributes
One of the most important steps you need to take when working with dimensions is to correctly define the relationships between attributes. For example, in a typical Geographic dimension, city is related to state, and state is related to country. These relationships are used by Analysis Services to figure out which aggregates would be useful, so they can have a big impact on the performance of your cube and also affect the size of your dimensions in memory.
Setting Up the Relationships in the Time Dimension
If you look at the structure of the Time dimension that the wizard has created, you can see that we can clean up a few areas. When we set up the Time dimension in the Cube Wizard, we specified the key columns such as Year and Quarter rather than Year_Name or Quarter_Name, so we also need to change the NameColumn property for each of the attributes to point to the corresponding text column containing the name of the period. We can also rename the hierarchies and some of the attribute names.
If you expand the key attribute in the Dimension Structure tab, you will notice that the wizard created a relationship between the key attribute and every other attribute. The reason is that each attribute needs to be related to the key in some way, and it is technically true that every attribute can be uniquely determined by the key. When you think about how calendars actually work, however, you can probably see that there are other potential ways of defining these relationships. Each month belongs to one quarter, and each quarter belongs to a single year.
Because strong relationships exist between the time period levels, we should define the attribute relationships accordingly so that each child attribute has a relationship with its parent. Starting with the regular calendar periods of Year-Quarter-Month-Date, we can expand each of the attributes in the attributes pane on the left, and then drag the Year attribute onto the <new attribute relationship> marker under the Quarter attribute, and drag the Quarter attribute onto Month. The final level of relationships between Month and Date is already defined by default, so we don't need to create this, but we do need to delete the existing Year and Quarter relationships under the Date attribute because these have been defined in a different way (see Figure 5-7).
Figure 5-7. Defining Time attribute relationships
The same approach applies to the Fiscal and Manufacturing attributes because they also have strong relationships, so we can repeat the process of relating Fiscal_Year to Fiscal_Quarter, Fiscal_Quarter to Fiscal_Month, and Fiscal_Month to Fiscal_Day. However, every attribute needs to be related in some way to the dimension key, whether directly like Month or indirectly like Year. So, we also need to create a relationship between Fiscal_Day and Date so that all the fiscal attributes also have a relationship with the key.
Modifying the Cube
Now that we have the dimensions looking more or less the way we want them, we can turn to setting up the cube itself. The Cube Structure tab in the cube designer shows the measures and measure groups that have been defined, as well as a diagram that shows the underlying parts of the data source that make up the cube.
Numeric Formats for Measures
One of the areas that you probably noted when browsing the cube was that the numeric formats of the measures didn't really match up with how users would want to see the information. For example, currency measures are shown with many decimal positions. You can change the display format by selecting a measure in the Cube Structure tab and modifying the FormatString property. The formats that you can specify are based on the Visual Basic syntax, which basically consists of a set of special named formats such as Percent or Currency, as well as characters that you can use to build user-defined formats.
If you have a lot of measures, changing the formats one at a time is painful, and the standard view of measures doesn't allow you to select more than one measure at a time. A feature can help with this, as shown in Figure 5-8: Select Show Measures In Grid from the Cube menu, and you can then select multiple measures at a time and set all their formats simultaneously.
Figure 5-8. Setting numeric formats
Adjusting the Relationships Between the Cube and the Dimensions
When the wizard was creating the cube, it looked at the relationships that you had defined between the fact and dimension tables in the DSV to determine how to add the dimensions to the cube. You can adjust the results using the Dimension Usage tab in the cube designer, which shows the grain of the different measure groups (see Figure 5-9).
Figure 5-9. Dimension Usage tab
We need to modify a few things to get the cube to work the way we want. As you can see, a dimension such as Time or Sales Territory can be used multiple times in a single cube. The names that are shown to the user for these role-playing dimensions are based on the column names from the fact table. For example, the names that were created for the Time dimension are Due Date, Ship Date, and Order Date. You can modify these names by clicking the dimension on the left side of the Dimension Usage tab and then changing the Name property.
Usually, the order date is used as the standard way of analyzing a date, so we can name this relationship Date and rename the others to Date Due and Date Shipped so that they are listed together in the users' tools.
Adding Calculations to the Cube
Many of the key measures that are part of our manufacturing solution are just pulled directly from the fact table, but a lot of the really interesting information that the users need is derived from the basic data. Cubes can contain calculated measures that take data from the physical measures and apply complex logic. These calculated measures are computed at query time and do not take up any disk space; and because they are based on the Multidimensional Expressions (MDX) language, you can create sophisticated metrics if necessary.
On-Time Delivery Analysis
One of the major business requirements is to analyze how the company is doing when it comes to delivering shipments on time. We have already added some physical measures such as On Time Count that can help with this, but we need to add more to the cube to really allow the users to make decisions. For example, it's probably more important to be able to see the percentage of shipments that were on time rather than just a count.
To add the calculated measure, switch to the Calculations tab and click the New Calculated Member button (see Figure 5-10). The formula for this measure is simply the number of shipments that were on time, divided by the total number of shipments:
[Measures].[On Time Count]/[Measures].[Shipments Count]
Figure 5-10. Defining calculated measures
The major rule that you need to follow when naming calculated measures is that if you include spaces in the name, you have to surround the whole name with square brackets, such as [On Time Percent]. Because the name inside the brackets will be shown in the users' analytical tools, it's a good idea to use spaces and try to make the name as meaningful as possible. This may be counterintuitive for database designers who are used to avoiding spaces in names, but it is important to remember that the names in an Analysis Services cube are shown to users and need to make sense to them.
If you tried to use the Days Late or similar measures that we discussed in the "Data Model" section in a query, you will probably have realized that they are not useful at the moment because they just give the total number of days late for whatever filters you specify. A much more useful measure is the average number of days late for whatever the user has selected, whether that is, for example, a certain type of product or a date range such as the past three months.
The formula to express the average days late is to divide the total number of days late by the number of shipments in question:
[Measures].[Days Late]/[Measures].[Shipments Count]
Because the three physical measures (Days Late, Days Notice, and Days To Ship) aren't really useful, we can switch back to the Cube Structure tab and change the Visible property of each of the three measures to False after we have created the equivalent average calculations. This is a common approach for measures that need to be in the cube to support calculations, but should not be shown to users.
We have some helpful measures such as Total Cost and Discount Amount in the cube, but the other business requirement that we need to support more fully is profitability analysis. Although profitability calculations are not particularly complex, it is much better to define them centrally in the cube instead of requiring users to build them in their client tool or add them to reports.
Percentages are often a useful way to understand information, such as adding a Discount Percentage measure that expresses the discount as a percentage of the total sales amount. Users could then select groups of customers to see what kinds of discounts they have been offered, or look at product categories to see which products are being discounted.
We can also add a Contribution measure that shows the profit achieved after deducting costs and discounts. Because it will probably be useful to also see this number as a percentage, we can add another measure that divides the new Contribution measure that we just created by the total sales amount:
You can use the Browser tab of the cube to test the calculated measures that you have defined. It's a good idea to try different filters and combinations of dimensions on the rows and columns to make sure that your new measures make sense in all the circumstances that the user will see them. Because adding a calculation is a change to the structure of the cube, you need to deploy the solution again and click the Reconnect button in the cube browser to make the new measures show up.
The first thing you will notice is that your new measures are not grouped under the existing measure groups such as Shipments or Budget, but are at the bottom of the list. Because the users probably don't care which measures are calculations and which are physical measures, you should go back to the Calculations tab and select Calculation Properties from the Cube menu and specify an associated measure group such as Shipments for each of the calculations.
Showing Order and Invoice Details
Several columns on the fact table contain potentially useful information that we cannot define as measures because they have no meaning when they are aggregated. For example, if users are looking at all the shipments for the past year, there is no point in showing them the total or average of all the thousands of Carrier Tracking Numbers or Customer PO Numbers.
This kind of column is often referred to as a degenerate dimension, because you can think of it as a dimension with only a key (like Order Number) and no descriptive attributes. Analysis Services 2005 instead uses the term fact dimension, meaning a dimension that is based on a fact table.
Figure 5-11 shows how a user can see the invoice and carrier tracking numbers for a set of shipments by selecting attributes from the Shipment Details dimension that we just created.
Figure 5-11. Shipment Details dimension
Managing Security for Users
The last step before we can deploy our solution is to figure out which users will need to have access to the database and whether they will need to see all the information or just parts of it. You can develop sophisticated security schemes by adding roles to your Analysis Services project, by selecting New Role from the Database menu.
Each role defines which cubes and dimensions users will have access to, and can even restrict the data that they will see (for example, limiting the Sales Territory dimension to show only North American territories for U.S. sales reps). Dimensions or measures with potentially sensitive information can be turned off completely for some groups of users.
After the role has been defined, you can set up the membership by adding Windows groups or specific user accounts to the role. You can test that the role is working the way you intended using the cube browser, by clicking the Change User button on the left of the toolbar and selecting the role.