Designing a Data Mart


Once you establish the need for a data mart, it is time to create a design. As we learned in Chapter 3, a data mart is made up of measures, dimensions organized in hierarchies, and attributes. We begin our design by identifying the information that our decision makers need for effective decision making. We then need to reconcile this with the data we have available from our OLTP systems and organize this data into the data mart components: measures, dimensions, hierarchies, and attributes. When this is complete, we can build the database structure for the data mart using either a star or snowflake schema.

We discuss each step of the data-mart design process in the following sections. This is followed by a section called "Learn By Doing." The Learn By Doing section lets you make the concepts more concrete by applying what you just learned to a business scenario from Maximum Miniatures, Incorporated. I encourage you to work through these Learn By Doing exercises to make what can be complex and overwhelming concepts more tangible and straightforward. As Aristotle said, "One must learn by doing the thing, for though you think you know it, you have no certainty until you try."

Decision Makers' Needs

Business intelligence design should never be done in a vacuum. As we discussed in Chapter 1, the goal of business intelligence is to provide the tools for effective decision making. Therefore, any business intelligence design must start with the decision makers themselves. What foundation and feedback information do they need? How do they need that information sliced and diced for proper analysis? To create truly effective business intelligence, these questions need to be answered by the decision makers themselves.

There are two important reasons to have the decision makers involved in this design process. First, the decision makers are the ones in the trenches. They know the choices that are made each day in the organization's operation. They also have a pretty good idea of what information can aid them in making those choices.

Second, the decision makers are the ones who ultimately determine the success or failure of a project. They do this through their willingness or unwillingness to use the resulting business intelligence tools. Your tool may produce dynamite information but, ultimately, it is a failure if no one uses that information to produce more effective decisions.

Involving decision makers in the design of the data mart structures distributes the perceived ownership of the project. Most people who get their brainstorming ideas on the whiteboard during a design meeting or are allowed to submit a design suggestion via e-mail feel some sense of having contributed to the project. They feel a small piece of ownership in the project. Just as the owner of a single share in a billion-dollar corporation cares whether the company's stock goes up or down, the person who feels ownership in a project, no matter how small their actual contribution might be, cares about the success or failure of that project. A decision maker who has taken a small piece of ownership in our business intelligence project is far more likely to use the resulting tool and, if appropriate, to push for others to make use of it as well.

So, with the goal of both gaining important insight into the decision makers' needs and creating a sense of ownership among the future users of our business intelligence tool, we need to have our decision makers answer the following questions:

  • What facts, figures, statistics, and so forth do you need for effective decision making? (foundation and feedback measures)

  • How should this information be sliced and diced for analysis? (dimensions)

  • What additional information can aid in finding exactly what is needed? (attributes)

The answers to these questions form half of the required design information. The other half comes from the OLTP data itself.

Available Data

The input and the ownership buy-in of the decision makers are important to our data mart design. Next comes the reality check. The fact is, we cannot place any measures, dimensions, hierarchies, or attributes in the data mart if they are not represented in the OLTP data source.

We need to analyze the data to be received from the data source to make sure that all of the information requested by the decision makers can be obtained from there. Measures and attributes can come directly from fields in the OLTP data or from calculations based on those fields. Dimensions and hierarchies must be represented in the data and relationships contained in the OLTP data.

If a piece of requested information is not present in the OLTP data source, we need to determine if it is present in another data source. If so, data from these two data sources can be joined during the populating of the data mart to provide the decision makers with the desired result. If the requested information is unavailable, we have to work with the decision makers to either determine a method to gather the missing information in the future or identify an alternative bit of information already present in the OLTP data source.

Data Mart Structures

We are now ready to specify the structures that will be in our data mart. These are the measures, dimensions, hierarchies, and attributes. These structures lead us to the star or snowflake schema that is going to define our data mart.

Measures

We start our data mart design by specifying the measures. The measures are the foundation and feedback information our decision makers require. We reconcile these requirements with what is available in the OLTP data to come up with a list of measures as shown in Figure 6-4.

image from book
Figure 6-4: Designing the measures in a data mart

In Chapter 4, we learned measures are numeric quantities. The following are some examples of numeric data that can be used as measures:

  • Monetary Amounts

    • The cost of raw materials

    • The value of a sale

    • Operational expenses

    • Labor expenses

  • Counts

    • The number of items produced

    • The number of items ordered

    • The number of items shipped

    • The number of items returned

    • The number of calls to customer service

  • Time Periods

    • The number of minutes or hours required to produce a product

    • The number of days required to fill an order

    • Mean time between failure of a product

In the design, we need to note the following for each measure:

  • Name of the measure

  • What OLTP field or fields should be used to supply the data

  • Data type (money, integer, decimal)

  • Formula used to calculate the measure (if there is one)

As previously discussed, the data cannot appear out of thin air. It must be copied from or be calculated from somewhere else. Therefore, identifying the OLTP fields that supply the data is important.

Refer to the "Learn By Doing—Designing the Maximum Miniatures Manufacturing Data Mart" section for an example of measure design.

Dimensions and Hierarchies

As we learned in Chapter 3, dimensions are used to spread a measure into its constituent parts. Hierarchies are used to organize dimensions into various levels. Dimensions and hierarchies are used to drill down into a measure to move from more general information to more specific information. While measures define what the decision makers want to see, the dimensions and hierarchies define how they want to see it.

When the decision makers tell us they want to see "total sales by sales person by year," they are describing a measure, total sales, and two dimensions: sales person and date. In discussions with decision makers, dimensions often are preceded with the words "by," "for each," or "for every." When the decision makers tell us they want to be able to "roll up sales persons into sales regions" or "drill down from year into quarter," they are describing hierarchies. The sales region dimension is above the sales person dimension and the year dimension is above the quarter dimension in the hierarchy they are describing. These are all indications of how the decision makers want to view the measure.

We again reconcile the requested dimensions and hierarchies with what is available from the OLTP data to come up with the list of dimensions and their hierarchy for our design. This is shown in Figure 6-5. In the design, we need to have the following listed for each dimension:

  • Name of the dimension

  • What OLTP field or fields are to be used to supply the data

  • Data type of the dimension's key (the code that uniquely identifies each member of the dimension)

  • Name of the parent dimension (if there is one)

image from book
Figure 6-5: Designing the dimensions and hierarchies in a data mart

Refer to the "Learn By Doing—Designing the Maximum Miniatures Manufacturing Data Mart" section for an example of dimension and hierarchy design.

Attributes

Attributes provide additional information about a dimension. Attributes may result from information decision makers want to have readily available during analysis. Attributes may also result from information decision makers want to filter on during the analysis process.

As before, we need to reconcile the requested attributes with the data available from the OLTP database to come up with the list of attributes in our design. This is shown in Figure 6-6. In the design, we need to include the following for each attribute:

  • Name of the attribute

  • What OLTP field or fields is to be used to supply the data

  • Data type

  • Name of the dimension to which it applies

image from book
Figure 6-6: Designing the attributes in a data mart

Refer to the "Learn By Doing—Designing the Maximum Miniatures Manufacturing Data Mart" section for an example of attribute design.

Stars and Snowflakes

Data marts are architected using either a star schema or a snowflake schema. Refer to Chapter 3 if you need a refresher on these two layouts. Our last step is to turn our lists into either a star or a snowflake schema. Figure 6-7 shows a star schema, while Figure 6-8 shows a snowflake schema.

image from book
Figure 6-7: A star schema

image from book
Figure 6-8: A snowflake schema

Recall that all the measures are placed in a single table called the fact table. The dimensions at the lowest level of the hierarchies are each placed in their own dimension table. In the star schema, all the information for a hierarchy is stored in the same table. The information for the parent (or grandparent or great-grandparent, and so forth) dimension is added to the table containing the dimension at the lowest level of the hierarchy.

The snowflake schema works a bit differently. In the snowflake schema, each level in the dimensional hierarchy has its own table. The dimension tables are linked together with foreign key relationships to form the hierarchy. Refer to Chapter 3 for a discussion of the advantages and disadvantages of star and snowflake schemas.

Once we create our schema, we are ready to implement that schema in a database. Before we look at implementing the schema, however, let's walk through an example of data mart design.

Learn by Doing—Designing the Maximum Miniatures Manufacturing Data Mart

Feature Highlighted
  • Designing a data mart

In this section, we apply the knowledge gained in the previous sections to complete a sample task for Maximum Miniatures, Incorporated. In this case, we design the Manufacturing data mart to hold information that is initially logged by the manufacturing automation system in comma-delimited text files.

Business Need The vice president (VP) of production for Max Min, Inc. wants to analyze the statistics available from the manufacturing automation system. He would like an interactive analysis tool, rather than printed reports, for this analysis. In keeping with Max Min's new business intelligence strategy, Microsoft SQL Server 2005 Analysis Services is the platform for this analysis tool. Because the manufacturing automation system does not use a database, logging everything to comma-delimited text files instead, a data mart must be designed and built as a repository for this information.

The manufacturing automation system controls all the machines used by Max Min to create its figurines. Each machine handles all the steps in the manufacturing process of a figurine. This includes the following:

  • Filling a mold with the raw material (clay, pewter, or aluminum)

  • Aiding the hardening of this material

  • Removal from the mold when hardening is complete

  • Computerized painting of the figurine, if necessary (pewter figurines are not painted)

  • Curing the paint, if necessary

Multiple painting and curing cycles may be necessary, depending on the intricacy of the paint job required by a product. A quality assurance check is done by the machine operator as the figurine is completed.

Operators log onto a machine. As part of this logon process, the operator tells the manufacturing automation system what product is being produced, along with the batch number of the raw material being used by that machine. The operator also makes an entry in the system when a figurine is rejected.

An interview with the VP of production yielded the following data requirements for effective decision making:

  • Number of accepted products by batch by product by machine by day

  • Number of rejected products by batch by product by machine by day

  • Elapsed time for molding and hardening by product by machine by day

  • Elapsed time for painting and curing by paint type by product by machine by day

  • Product rolls up into product subtype, which rolls up into product type

  • Machine rolls up into machine type, which rolls up into material (clay, pewter, or aluminum)

  • Machine also rolls up into plant, which rolls up into country

  • Day rolls up into month, which rolls up into quarter, which rolls up into year

  • The information should be able to be filtered by machine manufacturer and purchase date of the machine

The export file from the manufacturing automation system contains one row for each product produced. Each row includes the following information:

  • Product

  • Batch number of the raw material

  • Machine number

  • Operator employee number

  • Start of manufacture date and time (when the batch run begins)

  • End of manufacture data and time (when the batch is complete)

  • Reject flag

Steps
Note 

In this particular "Learn By Doing" exercise, you don't have that much to do except follow along and make sure you understand each step. You can use these steps as a guideline when you have to create your own data mart design. Subsequent "Learn By Doing" sections require you to use the SQL Server 2005 tools to create part of a working business intelligence solution.

  1. Prepare a list of the measures requested by the VP of production. This is shown in the Measure column of Figure 6-9.

  2. Determine which fields in the OLTP data source supply the data for these measures. This is shown in the OLTP Fields column of Figure 6-9. The Reject Flag field tells us whether a product has been accepted or rejected. This can be used to determine the number of accepted and number of rejected products. The manufacturing system does not track the time spent in each individual production step. It only tracks the date and time at the start of manufacture, and the date and time at the end of manufacture. Therefore, we need to put NOT AVAILABLE for these two items.

  3. Resolve any problems with information that is unavailable. In this case, a follow-up interview with the VP reveals he will be satisfied with knowing the elapsed minutes for the entire manufacturing process. When our list is updated with this information, it appears as shown in the Measure and OLTP Fields columns of Figure 6-10.

  4. Add the data types and calculations to the list. This is shown in the Data Type and Formula columns of Figure 6-10.

  5. Prepare a list of the dimensions requested by the VP of production. This is shown in the Dimension column of Figure 6-11.

  6. Determine which fields in the OLTP data source are going to supply the data for these dimensions. This is shown in the OLTP Fields column of Figure 6-11.

  7. Resolve any problems with information that is not available. The manufacturing automation system does not include information on the hierarchies within Max Min. To include these hierarchies in the data mart, we need to pull data from another system. It turns out that the Accounting System has the data we need. We remove the Paint Type dimension because this data is not available electronically. When our list is updated with this information, it appears as shown in the Dimension and OLTP Fields columns of Figure 6-12.

  8. Add the data type of the dimension's key along with the name of the parent dimension. This is shown in the Data Type and Parent Dimension columns of Figure 6-12.

  9. Prepare a list of the attributes requested by the VP of production. This is shown in the Attribute column of Figure 6-13.

  10. Determine which fields in the OLTP data source will supply the data for these attributes. Remember, some of this data needs to come from the accounting system. This is shown in the OLTP Fields column of Figure 6-13.

  11. Resolve any problems with information that is unavailable. In this case, we do not have any problem attributes, so no changes need to be made to the list.

  12. Add the data type of the attribute along with the name of the dimension it is associated with. This is shown in the Data Type and Dimension columns of Figure 6-13.

  13. Turn the lists in Figure 6-10, Figure 6-12, and Figure 6-13 into your choice of a star schema or a snowflake schema. In this case, we use a snowflake schema. Place the measures and their data types in the ManufacturingFact table, as shown in Figure 6-14.

  14. Place each dimension into its own dimension table as shown in Figure 6-15. Include a table name beginning with "Dim" and a primary key designation in each table.

    Note 

    The time-related dimensions of day, month, quarter, and year do not need dimension tables. Instead, a single datetime field is placed in the ManufacturingFact table (DateOfManufacture). The entire time-related hierarchy will be extrapolated from this field when we create a cube based on this data mart in a later chapter. This is done using a feature of SQL Server Analysis Services.

  15. Create the dimensional hierarchies by adding foreign keys to the dimension tables. This is shown in Figure 6-16.

  16. Link the lowest-level dimensions to the fact table by adding foreign keys to the fact table. Also, add the attributes to the dimension tables. The final Manufacturing data mart schema is shown in Figure 6-17. This may be a pretty funny looking snowflake, but it is, indeed, a snowflake schema.

Measure

OLTP Fields

Number of Accepted Products

Reject Flag

Number of Rejected Products

Reject Flag

Elapsed Minutes for Molding and Hardening

NOT AVAILABLE

Elapsed Minutes for Painting and Curing

NOT AVAILABLE


Figure 6-9: Requested measures for the Manufacturing data mart

Measure

OLTP Fields

Data Type

Formula

Number of Accepted Products

Reject Flag

Int

Count when reject flag is false

Number of Rejected Products

Reject Flag

Int

Count when reject flag is true

Elapsed Minutes for Manufacturing

Start of manufacture date and time, End of manufacture date and time

Decimal(6.2)

DATEDIFF( mi. [Start of manufacture date and time]. [End of manufacture date and time])


Figure 6-10: Finalized measures for the Manufacturing data mart

Dimension

OLTP Fields

Product

Product

Product Subtype

NOT AVAILABLE

Product Type

NOT AVAILABLE

Batch

Batch

Machine

Machine

Machine Type

NOT AVAILABLE

Material

NOT AVAILABLE

Plant

NOT AVAILABLE

Country

NOT AVAILABLE

Day

Start of Manufacture Date and Time

Month

Start of Manufacture Date and Time

Quarter

Start of Manufacture Date and Time

Year

Start of Manufacture Date and Time

Paint Type

NOT AVAILABLE


Figure 6-11: Requested dimensions for the Manufacturing data mart

Dimension

OLTP Fields

Data Type

Parent Dimension

Product

Product

Int

Product Subtype

Product Subtype

Accounting System.ProductSubtype

Int

Product Type

Product Type

Accounting System.ProductType

Int

None

Batch

Batch

Int

None

Machine

Machine

Int

Machine Type, Plant

Machine Type

Accounting System.MachineType

Varchar(30)

Material

Material

Accounting System.Material

Varchar(30)

None

Plant

Accounting System.Plant

Int

Country

Country

Accounting System.Country

Char(3)

None

Day

Start of Manufacture Date and Time

Int

Month

Month

Start of Manufacture Date and Time

Int

Quarter

Quarter

Start of Manufacture Date and Time

Int

Year

Year

Start of Manufacture Date and Time

Int

None


Figure 6-12: Finalized dimensions and hierarchies for the Manufacturing data mart

Attribute

OLTP Fields

Data Type

Dimension

Machine Manufacturer

Accounting.Equipment

Varchar(50)

Machine

Date of Purchase

Accounting.Equipment

Datetime

Machine


Figure 6-13: Finalized attributes for the Manufacturing data mart

ManufacturingFact

Field Name

Data Type

AcceptedProducts

Int

Rejected Products

Int

ElapsedTimeForManufacture

Decimal(6.2)


Figure 6-14: The Manufacturing data mart schema with measures in a fact table

image from book
Figure 6-15: The Manufacturing data mart schema with dimension tables added

image from book
Figure 6-16: The Manufacturing data mart schema with hierarchies added

image from book
Figure 6-17: The completed Manufacturing data mart schema

The design of the Manufacturing data mart is complete. Next, we learn how to turn that design into a reality.

Creating a Data Mart Using the SQL Server Management Studio

Now that we have a design for our data mart based on the decision maker's requests and the data available from the OLTP systems, it is time to turn that design into database tables. We are going to build the relational data mart as shown in Figure 6-18. We use the schema in Figure 6-17 to create a database with a fact table and several dimension tables. We then add the foreign key constraints to enforce the relationships in the snowflake schema.

image from book
Figure 6-18: Building the relational data mart

Follow the steps in the "Learn By Doing—Creating the Maximum Miniatures Manufacturing Data Mart Using the SQL Server Management Studio" section to create the Manufacturing data mart.

Learn by Doing—Creating the Maximum Miniatures Manufacturing Data Mart Using the SQL Server Management Studio

Features Highlighted
  • Creating a data mart database

  • Creating dimension tables

  • Creating a fact table

  • Setting relationships

Business Need The business need was stated in the previous Learn By Doing section where we created the schema for the Manufacturing data mart. In this section, we implement the Maximum Miniatures Manufacturing data mart schema using the SQL Server Management Studio.

Steps
  1. Open the SQL Server Management Studio, as we discussed in Chapter 5.

  2. Connect to a development or test SQL Server. (Do not perform this or any other "Learn By Doing" activity on a server used for production database operations!)

  3. Expand the node for the SQL Server in the Object Explorer window. Right-click the Databases folder. The Context menu appears as shown in Figure 6-19.

  4. Select New Database from the Context menu. The New Database dialog box appears as shown in Figure 6-20.

  5. Enter MaxMinManufacturingDM for Database Name. Click on the Options page and select Simple for Recovery model. Click OK to create the database.

    Note 

    Because this is not a transactional database, there is no need to recover the content of this database using the transaction log. The simple recovery model truncates the transaction log on checkpoint, keeping the size of the transaction log small. The Full and Bulk-logged recovery models require a backup of the transaction log before it is truncated.

  6. Expand the Databases folder and expand the MaxMinManufacturingDM database folder. Right-click the Tables folder and select New Table from the Context menu. A Table Designer tab appears as shown in Figure 6-21.

  7. We use both the Properties window and the Column Properties window as we create our tables. If the Properties window is not visible as shown in Figure 6-21, select Properties Window from the View menu. Be sure to note whether the entries in the following steps are being made in the Properties window or in the Column Properties window.

  8. Begin by creating the DimProduct table from the schema (Figure 6-17). In the Properties window, enter DimProduct for Name and Product Dimension populated from the Manufacturing Automation System export file. for Description.

    Note 

    To use a larger editing area when entering the description, click on the ellipses button () at the end of the Description property entry area. This opens a dialog window with a multiline entry area for typing the description.

  9. In the first row of the Table Designer tab, enter ProductCode under Column Name, select int under Data Type, and uncheck Allow Nulls. Click the Set Primary Key button in the toolbar to make this the primary key for this table.

  10. In the second row of the Table Designer tab, enter ProductName under Column Name, select varchar(50) under Data Type, and uncheck Allow Nulls. (Leave Length in the Column Properties window set to the default value of 50.)

  11. In the third row of the Table Designer tab, enter ProductSubtypeCode under Column Name, select int under Data Type, and uncheck Allow Nulls.

  12. Click the Save toolbar button to save this table design. This creates the table in the database. Next, click the Close button in the upper-right corner of the Table Designer tab.

  13. Next, create the DimProductSubtype table from the schema. Right-click the Tables folder and select New Table from the Context menu. In the Properties window, enter DimProductSubtype for Name and ProductSubtype Dimension populated from the Accounting System.ProductSubtype table. for Description.

  14. In the first row of the Table Designer tab, enter ProductSubtypeCode under Column Name, select int under Data Type, and uncheck Allow Nulls. Click the Set Primary Key button in the toolbar to make this the primary key for this table.

  15. In the second row of the Table Designer tab, enter ProductSubtypeName under Column Name, select varchar(50) under Data Type, and uncheck Allow Nulls. (Leave Length in the Column Properties window set to the default value of 50.)

  16. In the third row of the Table Designer tab, enter ProductTypeCode under Column Name, select int under Data Type, and uncheck Allow Nulls.

  17. Click the Save toolbar button to create the table, and then click the Close button in the upper-right corner of the Table Designer tab.

  18. Use the same process to create the DimProductType, DimBatch, DimMachine, DimMachineType, DimMaterial, DimPlant, and DimCountry dimension tables based on the schema in Figure 6-17. Be sure to uncheck Allow Nulls for all fields and to enter the appropriate data type along with the appropriate length in the Column Properties window. Also, create a primary key for each field that is followed by a (PK) in Figure 6-17.

  19. Right-click the MaxMinManufacturingDM database entry in the Object Explorer window and select Refresh from the Context menu. When you expand the node for columns under each table, the entries should appear just as in Figure 6-22 and Figure 6-23.

    Note 

    If you notice any mistakes as you compare Figure 6-22 and Figure 6-23 with your tables, right-click the table that is in error and select Modify from the Context menu. Make the necessary changes to the table structure to correct the error. Be sure to click the Save toolbar button when you are done. You need to refresh the contents of the Object Explorer window after your changes before again comparing to Figure 6-22 and Figure 6-23.

  20. Now, create the ManufacturingFact table. Right-click the Tables folder and select New Table from the Context menu. In the Properties window, enter ManufacturingFact for Name and Manufacturing Fact populated from the Manufacturing Automation System export file. for Description.

  21. Using the schema in Figure 6-17, create the entries for the seven fields in the ManufacturingFact table. When creating the ElapsedTimeForManufacture field definition, select decimal under Data Type, and then enter 6 for Precision and 2 for Scale. (Continue to uncheck Allow Nulls for all fields.)

    Note 

    The decimal data type is used to store real numbers. The precision determines the total number of digits contained in the number. The scale tells how many of those digits are to the right of the decimal. The ElapsedTimeForManufacture field has a maximum of six digits with two of those digits to the right of the decimal. Therefore, the largest number that can be stored in this field is 999.99.

  22. Click the square to the left of the DateOfManufacture field definition. Hold down SHIFT and click the square to the left of the MachineNumber field definition. This selects these two field definitions and all the field definitions in between, as shown in Figure 6-24. Click the Set Primary Key button in the toolbar to make these four fields a compound primary key for this table.

  23. Click the Save toolbar button, and then click the Close button in the upper-right corner of the Table Designer tab.

  24. Right-click the Tables folder and select Refresh from the Context menu. Expand the node for Tables and expand the node for the ManufacturingFact table. Right-click the Keys folder and select New Foreign Key from the Context menu. The Foreign Key Relationships dialog box appears.

    Note 

    When New Foreign Key is selected, the SQL Server Management Studio sets some default values for the new foreign key relationship that is being created. The primary key table and the foreign key table will both be set to the name of the table selected in the Object Explorer window. The primary key field(s) and the foreign key field(s) will default to the current primary key of that table. In Step 25, the default primary key table and foreign key table will both be the ManufacturingFact table. The default primary key fields and foreign key fields will both be the set of four fields in the compound primary key for the ManufacturingFact table. If you click OK at this point, the SQL Server Management Studio creates this rather silly, "self-referential" foreign key relationship. To avoid this, we need to either modify the values for the foreign key relationship, as we do in Steps 25–29, or delete the foreign key relationship before clicking OK.

  25. Click the Tables and Columns Specifications entry, and then click the ellipsis () button as shown in Figure 6-25. This displays the Tables and Columns dialog box.

  26. Select DimBatch for Primary Key Table.

  27. Click the first row below Primary Key Table as shown in Figure 6-26. This activates a field drop-down list in this row. Select BatchNumber from this drop-down list.

  28. Note that the ManufacturingFact table is already selected as the Foreign Key Table. The ManufacturingFact table is selected because you clicked New Foreign Key under the ManufacturingFact table entry in the Object Explorer window. Click the first row below Foreign Key Table to activate the field drop-down list. Select <None> from this drop-down list. Select <None> from the drop-down lists in the second and fourth rows under Foreign Key Table as well. The rows end up blank as shown in Figure 6-27. BatchNumber will be left as the only foreign key field.

  29. Click OK to exit the Tables and Columns dialog box.

  30. Click Add in the Foreign Key Relationships dialog box.

  31. The Tables and Columns Specifications entry is still highlighted. (Click this entry if it is not highlighted.) Click the ellipsis () button again to display the Tables and Columns dialog box.

  32. Select DimMachine for Primary Key Table.

  33. Click the first row below Primary Key Table to activate the field drop-down list. Select MachineNumber from this drop-down list.

  34. Click the first row below Foreign Key Table to activate the field drop-down list. Select <None> from this drop-down list. Select <None> from the drop-down lists in the second and third rows under Foreign Key Table. MachineNumber will be left as the only foreign key field.

  35. Click OK to exit the Tables and Columns dialog box.

  36. Repeat Steps 30–35 using DimProduct as the primary key table, ProductCode as the primary key field, and ProductCode as the foreign key field.

  37. Click Close to exit the Foreign Key Relationships dialog box.

  38. Click the Save toolbar button to create the foreign key relationships you have just defined. The Save dialog box appears showing the tables that are to be modified. Both the primary key table and the foreign key table are modified when a foreign key relationship is created. Click Yes to continue.

  39. Click the Close button in the upper-right corner of the Table Designer tab to close the ManufacturingFact tab.

  40. Expand the node for the DimProduct table. Right-click the Keys folder and select New Foreign Key from the Context menu. The Foreign Key Relationships dialog box appears.

    Note 

    Two foreign key relationships appear in the Foreign Key Relationships dialog box: the relationship we created between the Manufacturingfact table and the DimProduct table, and the new relationship we are in the process of creating.

  41. Repeat Steps 31–35, using DimProductSubtype as the primary key table, ProductSubtypeCode as the primary key field, and ProductSubtypeCode as the foreign key field. Note, the DimProduct table is already selected as the foreign key field because we clicked New Foreign Key under the DimProduct table entry in the Object Explorer window.

  42. Click Close to exit the Foreign Key Relationships dialog box.

  43. Click the Save toolbar button to create the new foreign key relationship you have just defined. The Save dialog box appears, showing the tables to be modified. Click Yes to continue.

  44. Click the Close button in the upper-right corner of the Table Designer tab to close the DimProduct tab.

  45. Use this same process to create the remaining foreign key relationships, as shown in the following table:

    Create New Key for This Table

    Primary Key Table

    Primary Key Field

    Foreign Key Field

    DimProductSubType

    DimProductType

    ProductTypeCode

    ProductTypeCode

    DimMachine

    DimMachineType

    MachineType

    MachineType

    DimMachine

    DimPlant

    PlantNumber

    PlantNumber

    DimMachineType

    DimMaterial

    Material

    Material

    DimPlant

    DimCountry

    CountryCode

    CountryCode

  46. Right-click the MaxMinManufacturingDM database entry in the Object Explorer window and select Refresh from the Context menu. When you expand the node for columns and the node for the keys under each table, the entries should appear just as in Figure 6-28 and Figure 6-29.

image from book
Figure 6-19: The Database folder Context menu

image from book
Figure 6-20: The New Database dialog box

image from book
Figure 6-21: The SQL Server Management Studio ready to create a table

image from book
Figure 6-22: The tables in the MaxMinManufacturingDM with columns (Part 1)

image from book
Figure 6-23: The tables in the MaxMinManufacturingDM with columns (Part 2)

image from book
Figure 6-24: Selecting multiple fields to create a compound primary key

image from book
Figure 6-25: Entering the Tables and Columns Specifications for a foreign key

image from book
Figure 6-26: Activating the field drop-down list in the Tables and Columns dialog box

image from book
Figure 6-27: Setting the drop-down lists to <None> in the Tables and Columns dialog box

image from book
Figure 6-28: The tables in the MaxMinManufacturingDM with columns and keys (Part 1)

image from book
Figure 6-29: The tables in the MaxMinManufacturingDM with columns and keys (Part 2)

We have now manually built the Manufacturing data mart structure using the SQL Server Management Studio. Even though this process took 46 steps, we are only halfway done. In Chapter 8, we complete the other half of the process by creating an Analysis Services cube that uses this data mart as its data source.

Creating a Data Mart Using the Business Intelligence Development Studio

In the previous section, we completed the first half of a two-step process for creating a relational data mart and an Analysis Services cube built on that data mart. This is a long and, as you will see somewhat redundant process. The Business Intelligence Development Studio comes to our rescue here. We will now use this tool to create the Analysis Service cube and its underlying relational data mart for the Maximum Miniatures sales data at the same time. This is shown in Figure 6-30. The Cube Wizard creates an Analysis Services cube, and then creates the matching dimension and fact tables in the Sales data mart based on the cube definition.

image from book
Figure 6-30: Building the Sales relational data mart and the Sales Analysis Services cube

Follow the steps in the "Learn By DoingCreating the Maximum Miniatures Sales Data Mart Using the Business Intelligence Development Studio" section to create the Sales data mart through the Cube Wizard.

Learn by Doing—Creating the Maximum Miniatures Sales Data Mart Using the Business Intelligence Development Studio

Features Highlighted
  • Creating an Analysis Services cube using the Cube Wizard in the Business Intelligence Development Studio

  • Creating a relational data mart from a cube definition in the Business Intelligence Development Studio

Business Need The VP of sales for Max Min, Inc. would like to analyze sales information. This information is collected by three OLTP systems: the Order Processing System, the Point of Sale (POS) System, and the MaxMin.com Online System (refer to Figure 2-4). The Order Processing System and the MaxMin.com Online System both use Microsoft SQL Server as their back-end databases. The POS System uses XML files to transfer data from each of the five Max Min-owned retail stores.

Microsoft SQL Server 2005 Analysis Services is the platform for this analysis tool. Because the data must come from three different systems and because one of these systems is file-based, a relational data mart is created to hold the information that is to serve as the data source for the cube.

The VP of sales would like to be able to analyze the following numbers:

  • Dollar value of products sold

  • Number of products sold

  • Sales tax charged on products sold

  • Shipping charged on products sold

These numbers should be viewable by:

  • Store

  • Sales Promotion

  • Product

  • Day, Month, Quarter, and Year

  • Customer

  • Sales Person

An analysis of the three data sources shows that all of this information is available from at least one data source.

We use the Sales and Inventory cube template to create the Analysis Services cube and the relational data mart in the Business Intelligence Development Studio.

Note 

We need to use the SQL Server Management Studio to create a database for our Max Min Sales data mart. The Business Intelligence Development Studio creates the dimension and fact tables in this database later in the process.

Steps
  1. Copy the "Sales and Inventory" folder in the materials from the book's web site to the cube template folder. The default location of this folder is C:\Program FilesMicrosoft SQL Server\90\Tools\Templates\OLAP\Cube Template.

  2. Open the SQL Server Management Studio and connect to your development or test SQL Server. Create a new database called MaxMinSalesDM, which uses the Simple recovery model. If you have any questions about creating this database, refer to Steps 3–5 in the "Learn By Doing—Creating the Maximum Miniatures Manufacturing Data Mart Using the SQL Server Management Studio" section of this chapter.

  3. Close the SQL Server Management Studio.

  4. Open the SQL Server Business Intelligence Development Studio as we discussed in Chapter 5.

  5. Click the New Project button in the toolbar.

  6. Make sure Business Intelligence Projects is selected from the Project Types, and then select Analysis Services Project from the Templates.

  7. Enter MaxMinSalesDM for Name and set the Location to the appropriate folder. Leave Create Directory for Solution checked.

    Note 

    You should create a folder to hold all of the "Learn By Doing" activities in this book. Make sure this folder is clearly named, so it does not become confused with production source code.

  8. Click OK to create the project.

  9. Once the project is open, right-click on the Cubes folder in the Solution Explorer and select New Cube from the Context menu. The Cube Wizard dialog box appears.

  10. On the Welcome page, click Next.

  11. On the Select Build Method page, make sure that Build the Cube Without Using a Data Source is selected. Click Use a Cube Template and select Sales and Inventory from the drop-down list. Click Next.

    Note 

    You do not need to use a cube template to create the Analysis Services cube and the relational data mart at the same time. We are using a cube template in this process so you can see how they work. This particular template uses a standard definition for sales measures. This standard appends 1997 to the measure (or fact) groupings used in the template. This is a means of identifying the standard that was used to create the resulting cube.

  12. On the Define New Measures page, uncheck all of the measures except for Store Sales and Unit Sales as shown in Figure 6-31.

  13. Click the highlighted cell in the Add New Measures grid. Enter Sales Tax in the Measure Name column. Select Sales Fact 1997 in the Measure Group column. Select Currency in the Data Type column. Sum should be selected by default in the Aggregation column.

  14. Click the cell containing Add New Measure. Enter Shipping in the Measure Name column. Sales Fact 1997 should be selected by default in the Measure Group column. Select Currency in the Data Type column. Sum should be selected by default in the Aggregation column.

  15. Click Next.

  16. On the Define New Dimensions page, uncheck the row in the Select Dimensions from Template grid with Warehouse in the Name column.

  17. Click the highlighted cell in the Add New Dimensions grid. Enter Sales Person in the Name column. Check the box in the SCD (Slowly Changing Dimension) column.

    Definition 

    A Slowly Changing Dimension (SCD) varies over time. Of course, the data in many dimensions can change over time. What differentiates an SCD is the fact that the history of that change is important and must be tracked in the business intelligence information. For example, the sales person dimension changes as sales people are transferred from one sales territory to another. We need to track which territory a sales person was in last year versus which territory they are in this year, so sales can be rolled up into the correct territory for year-to-year comparisons.

  18. Click Next.

  19. The Define Time Periods page enables us to specify range, format, and hierarchy for the time dimension. Select Wednesday, January 01, 2003 from the First Calendar Day date picker. Check Month, Quarter, and Year in Time periods. (Leave Date checked as well.)

    Note 

    If we use week in a time hierarchy, we can specify the first day of the week to use with this dimension. We can also specify the language to use for the members of the time dimension.

  20. Click Next.

  21. The Specify Additional Calendars page lets us add additional calendars, other than the standard 12-month calendar to the time dimension. This might include a calendar for a fiscal year that starts on a day other than January 1st or the ISO 8601 standard calendar. Max Min tracks sales based on the standard 12-month calendar, so simply click Next.

  22. The Define Dimension Usage page enables us to specify which dimensions are related to each group of measures. The Max Min Sales data mart is only using one group of measures, called Sales Fact 1997, so we want all the dimensions to apply to this group. Check the Sales Person dimension. All the other dimensions should be checked by default. Click Next.

  23. The Completing the Wizard page lets us review the cube we defined. Enter MaxMinSales for the Cube Name. We can expand the nodes in Preview to view the measures and dimensions that are to be created in this cube. Note, only the Time dimension has a hierarchy. We add additional hierarchies manually. When all the nodes except the Attributes nodes are expanded, the Preview should appear as shown in Figure 6-32.

  24. Check Generate Schema Now. This causes the Cube Wizard to create the relational database to hold the data mart that serves as the source for this cube.

  25. Click Finish.

  26. Because we checked Generate Schema Now, the Schema Generation Wizard is automatically launched. On the Welcome to the Schema Generation Wizard page, click Next.

  27. The Specify Target page enables us to select the database server where the data mart relational database is created. This is done by creating a data source and a data source view. Click New. The Data Source Wizard appears.

  28. On the Welcome to the Data Source Wizard page, click Next. On the Select How to Define the Connection screen, click New.

  29. The Connection Manager dialog box lets us define a connection string used to access a database. Provider enables you to select between Native OLE DB providers and the .NET providers for either SQL Server or Oracle. In this case, we leave Provider set to Native OLE DB\SQL Native Client. For Server Name, enter the name of the database server where you created the MaxMinSalesDM database in Step 2 of this section.

  30. Select Use Windows Authentication under Log On to the Server. The Schema Generation process requires Windows Authentication. Make sure that your current Windows credentials have rights to create tables in the MaxMinSalesDM database.

  31. Select MaxMinSalesDM from the Select or Enter the Database Name drop-down list. (A valid set of credentials must be specified before the drop-down list is populated.)

  32. Click OK to exit the Connection Manager dialog box.

  33. In the Select How to Define the Connection dialog box, click Next.

  34. On the Impersonation Information page, if you are using Windows Authentication in your data connection, select "Use a specific user name and password" and enter a valid Windows user name and password for database access. If you entered a specific SQL Server login in the data connection, you do not need to worry about impersonation. Click Next. The Completing the Wizard page will appear.

    Note 

    Analysis Services needs to access the data mart relational database in order to populate and update the information in the OLAP cube. This is done using the information in the data connection selected in Step 30. The Analysis Services updates may run as a background process. This can be an issue if the data connection specifies Windows Authentication or if the connection is to a file-based database such as Access or FoxPro. In these situations, Analysis Services needs to know what Windows credentials to use when accessing the data source. In other words, it needs to impersonate a Windows identity while accessing the data.

  35. The Completing the Wizard page shows the finished connection string. Click Finish to exit the Data Source Wizard.

  36. We are now back to the Specify Target page of the Schema Generation Wizard. Click Next.

  37. The Subject Area Database Schema Options page lets us select the operations we want the Schema Generation Wizard to perform. Leave all the items checked to allow the wizard to take care of the whole process. Populate should be selected from the Populate Time Table(s) drop-down list. Click Next.

    Note 

    There are three options for handling the population of the tables for the time dimension. The Populate option creates records in the time dimension table for the timeframe specified on the Define Time Periods page in Step 19.

  38. The Specify Naming Conventions page enables us to determine how the tables and fields in the data mart are named. We use all the default settings. Click Next.

  39. The Completing the Wizard page provides a summary of the schema generation we defined. Click Finish.

  40. The Schema Generation Progress dialog box appears. This dialog box shows each step as the data mart (called the subject area database) is created. When the process has completed, the dialog box should appear similar to Figure 6-33. Click Close.

  41. Click the Save All toolbar button.

    Note 

    If an error occurs during the process, a red asterisk appears next to the last step as shown in Figure 6-34. The text next to the red asterisk describes the error. If an error occurs, use the description to determine the cause of the error, click Stop to terminate the process, click Close, and then repeat the process with the problem corrected. The cube and the dimensions may have been created in the Analysis Services project before the error occurred. To restart the entire process, the cube and dimensions need to be deleted from the project. The most common cause of an error is not using Windows Authentication in the data source or being logged onto Windows with credentials that do not have rights to create tables in the MaxMinSalesDM database.

image from book
Figure 6-31: After selecting measures in the Cube Wizard

image from book
Figure 6-32: The MaxMinSales cube preview in the Cube Wizard

image from book
Figure 6-33: The Schema Generation Progress dialog box with a successful process completion

image from book
Figure 6-34: The Schema Generation Progress dialog box after an error has occurred

The Sales Data Mart

We now have a Sales data mart along with an Analysis Services cube built on that data mart. The star schema that defines both the data mart and the cube is shown in Figure 6-35. We can tell this is a star schema rather than a snowflake schema because all the levels in the time hierarchy are contained in a single Time table, rather than multiple linked tables. We will add additional hierarchies to this cube later.

image from book
Figure 6-35: The Max Min Sales star schema

Also, note the fields in the Sales_Person table. Several fields contain SCD in the field name. These fields are included to help maintain the history of a member of this dimension. We see this in action as we populate the Sales_Person table in Chapter 7.




Delivering Business Intelligence with Microsoft SQL Server 2005
Delivering Business Intelligence with Microsoft SQL Server 2005: Utilize Microsofts Data Warehousing, Mining & Reporting Tools to Provide Critical Intelligence to A
ISBN: 0072260904
EAN: 2147483647
Year: 2007
Pages: 112
Authors: Brian Larson

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net