DOT MODELING WORKSHOPS

only for RuBoard - do not distribute or recompile

DOT MODELING WORKSHOPS

The purpose of the remainder of this chapter is to provide assistance to data warehouse practitioners in the delivery of data warehouse applications. This practitioner's guide is subdivided into two main sections, conceptual design and logical design. The conceptual design stage consists of the development of a data model using the dot modeling method described in this chapter. The logical model is developed as an extension of the dot modeling method and this will be described in the next chapter.

The Dot Modeling Methodology

The methodology, as outlined earlier in this chapter, will now be described in detail. As you might expect, there are a fair number of processes associated with the construction of a complete dot model.

Experience shows that these processes can be conducted through the use of two highly structured workshops. The use of joint application development (JAD) workshops is now a well-accepted practice, and holding one or more of these workshop sessions is a good way of gathering user requirements. The purpose of JAD workshops is to enable the development of better systems by bringing together the business people and the system developers. The idea is that the system design is a joint effort. These workshops require skilled facilitation.

Each of the two workshops can be completed in two days

The Information Strategy Workshop

The objective of the information strategy workshop is for the business people, within the customer's organization, to develop their own dot model that reflects their own perception of their business.

The session should last for approximately two days. The emphasis is on the word workshop. Every participant must expect to work and not to be lectured to.

Participants Practitioners

There should be, at least, two consultants present at the workshop sessions. The ideal combination is to have one highly organized workshop facilitator and one business consultant who understands the methodology and the customer's business.

It is very useful to have a third person who is able to record the proceedings of the sessions.

Quite a substantial proportion of the work is actually done in teams . It is quite useful to have an extra person who can assist in wandering from team to team checking that everyone understands what is required and that progress is being made.

The success or failure of these sessions often has very little to do with the technical content of the workshop but has everything to do with softer issues such as:

  • Ambienceare they enjoying actually being there?

  • Relationships between the participants

  • Friendliness of the presenters

  • Comfort of the participants

It is helpful to have consultants who are sensitive to these issues and can detect and respond to body language, suggest breaks at the right time, etc.

Also, where the native language of the participants is different from the native language of the consultants, great care must be taken to ensure that the message, in both directions, is clear. Ideally , at least one of the consultants should be fluent in the language of the participants. If that is not possible, then someone from the customer organization must step in to act as the interpreter. If an interpreter is required to translate every statement by the presenters, then the workshop would generally take about 50 percent longer than normal.

ParticipantsCustomer

The information strategy workshop requires a mixed attendance of business people and IT people. The ideal proportions are two- thirds from the business and one-third from IT. It is very important that the session is not dominated by IT staff.

Getting several senior people together for two whole days can be a virtually impossible task. The only way to achieve this is to ensure that:

  • You have senior executive commitment

  • You book their diaries in advance

It's no good waiting until the week before; you won't get them to attend .

There must be some representation from the senior levels of management. As a test of seniority , we would describe a senior person as one who is measured on the performance of the business. This is someone whose personal income is, to some extent, dependant on the business being successful. In any case, they must be able to influence the business strategy and direction. The key to being successful in data warehousing is to focus on the business, not the technology. There must be a clear business imperative for constructing the data warehouse. It is not enough to merely state that information is strategic. This tendency to cite business requirements as fundamental to success has many followers. Any information system should exist solely to support the firm's business objectives, and it is critical that the business goals are clearly defined. One of the major goals, initially, is to build and validate the business case, and we will be covering this in Chapter 8.

The Information Strategy Workshop Process

The steps involved in running the workshop are now described. Some of the subsequent sections should be regarded as being for guidance only. Different consultants have differing approaches to facilitating workshops, and these variations in technique should be maintained . Here is one method that has been found to be successful in the past. First, the room organization:

  1. The tables in the room should be organized in a horseshoe shape rather than a classroom style, as this greatly encourages interaction.

  2. Plenty of writing space should be available. I personally favor white boards with at least two flip charts to record important points.

  3. Try to work without an overhead projector as they tend to introduce a lecturing or presentation tone to the proceedings, whereas the atmosphere we want to encourage is that of a workshop where everyone contributes .

We'll now go through the workshop step by step. For each step I'll give an estimate for how long it ought to take. Bear in mind that different people might prefer a faster or slower pace.

Workshop Introduction

Estimated time: 60 minutes

Explain why the team is assembled and what the objectives are.

The objective:

By the end of the information strategy workshop we will have built a conceptual model of the information requirements needed to support the business direction of the organization.

Do not assume that everyone in the room will know who all the others are. I have yet to attend a workshop session where everyone knew one another.

Get everybody to introduce themselves . This is, obviously, a good ice-breaking method that has been used successfully for many years . Ask them for:

  • Name (pronunciation is very important if the language is foreign to you)

  • Position in the organization

  • Brief description of their responsibilities

Also, ask them to tell us one thing about themselves that is not work related , maybe a hobby or something.

Breaking the ice is very important in workshops and so it should not be rushed. There are a number of different approaches that you may already be familiar with. One way is to get people to spend 10 minutes talking to the person sitting next to them and for them then to introduce each other. A good variation on this theme is to hang a large piece of paper on the wall and get them to draw a picture of the person they have introduced. If this sounds juvenile, well I suppose it is, but it really does get people relaxed .

I also like to make a note about which of them are business people and which are IT people.

It is a good idea to introduce yourself first and for your colleagues to follow you. This takes the pressure off the participants.

Next come the ground rules. These are important and here are some suggestions. I think they are just common sense, but you never know whom you are dealing with so it is best to be explicit:

Silence means assent. If people do not voice objections to decisions, then they are deemed to have accepted them. It is of no use to anyone if, during the wrap-up session on day two, someone says something like, Well I didn't agree with your basic ideas in the first place!

The sessions start on time. It is incumbent on the facilitator to state clearly and precisely when the next session is due to start. It is the responsibility of all attendees to be there at that time. So it is absolutely vital that the consultants are never late to sessions.

Mobile telephones are not allowed.

Personal criticisms are not allowed. It's OK for them to criticize the consultants, or the method, but not each other.

Only one person is allowed to be speaking at a time.

No stupid questions. This does not mean that you are not allowed to ask stupid questions. It actually means that no question should be regarded as stupid.

Always give them the collective authority to change the rules or add others.

Always give them the collective authority to change or add rules.

The next thing to do is to outline the agenda for the whole of the workshop. People like to know when their breaks are, what time lunch will be served , and roughly at what time they might expect to be leaving in the afternoon. It is always worth asking whether anyone has any time constraints ( flights , etc.) on the last day. Try to accommodate these as far as you are able by manipulating the agenda.

Now we move into the workshop proper. The first session is on business goals.

Business Goals

Estimated Time: 30 Minutes

This is the first real business session. The following question should be posed:

Does your organization have business goals?

This is a directed question in the sense that it should be asked of the senior business people present. You need to pick up the body language quickly because some people may feel:

  1. Embarrassed that they don't know what their business goals are

  2. Uncomfortable about sharing their business goals with outsiders

Experience shows that the former is more common than the latter. Most people have a vague idea about business goals and would offer things like:

  • Increase market share

  • Increase customer loyalty

  • Increase profitability

What we need to get to is to be able to articulate business goals that have the following properties:

  1. Measurable

  2. Time bounded

  3. Customer focused

The third of these is not an absolute requirement, but most business goals do seem to relate to customers in some way. This is particularly true if the data warehouse is being built in support of a CRM initiative. If this is the case, try to guide the participants toward customer-related business goals. So a well- articulated business goal might be:

To increase customer loyalty by 5 percent per annum for the next three years

If you can get one of these types of goals from your business people, that is good. If you can get two or three, that's excellent .

The most likely result is that you won't get any. That's OK too. Its just that, if the organization has business goals and is prepared to share them, then it is sensible to use them.

Thinking About Business Strategy

Estimated time: 1 hour

This is the first real workshop exercise. We split them into groups with three or four people in a group . In order to get the best out of this exercise, the composition of the groups is important. This is why it is useful to know who the business people are and who the IT people are.

We need one senior business person, if possible, in each group. The remaining business people should be spread as evenly as possible throughout the groups and the IT people likewise. Don't allow the IT people to become dominant in any one group. IT people have a tendency to become joined at the hip in these sessions if allowed.

One way of achieving an even spread of business and IT people is as follows :

During the introduction session earlier, the nonspeaking facilitator should note the names and job titles of each of the participants and can form the groups independently.

The attendees can then be simply informed as to which group they have been assigned.

The objectives of the exercise at this stage are:

  1. Decide upon the business goals ( assuming none was forthcoming previously), or use the business goals previously presented. Each team, therefore, is working toward its own set of goals, rather than the entire class having common goals. This is perfectly OK. Each of the business people in the room may have their own goals that will be different from the others' goals. One or two goals are sufficient. Each goal must possess the three properties listed above.

  2. Think about business strategy to help them to achieve the goals. This is a prioritized set of initiatives that the organization might need to implement. It is unlikely that a single item, by itself, would be enough to make the goal achievable.

  3. What steps would they need to take in order to put the strategy into operation? The steps will be driven by the prioritization of the strategic components .

Be careful to allow enough time for this. This process invariably leads to serious discussion in the groups as almost everyone has a view on the single thing their organization needs to do in order to become more successful. These discussions are, in the main, a good thing and should be allowed to run their course. About one hour should be allowed for this part and a break should be included.

It is worth recommending that each person spends the first 15 minutes thinking about the problem, so that each has a contribution to make before the group convenes.

Make sure that you have sufficient room for the group discussions to be conducted without affecting each other.

The groups must record their decisions as, later on, they will be expected to present them to the other groups.

The Initial Dot Model

Estimated time: 12 hours

The class should reconvene for this teaching session. The objective is to get them to understand how to use the dot modeling system. This is quite easy, but the IT people will understand it more quickly than the business people. It is sensible to start by focusing on customer behavior, as this usually is the easiest part for people to comprehend.

The best way to explain how to make a dot model is as follows:

First, on a flipchart or a whiteboard, go through the spreadsheet explanation that we saw earlier in this chapter. Everyone knows what a spreadsheet looks like and can relate to the axes, the cells , etc. Then build a model in front of them, like the one in Figure 5.6. It represents the Wine Club.

Figure 5.6. Initial dot model for the Wine Club.
graphics/05fig06.gif

It is vital that at least some of the members of the team understand how to do this. It is useful to ask questions, especially of the IT people, to convince yourself that each group is capable of developing a rudimentary model. At this stage, we do not expect perfect models and there is a refinement stage later on. However, it is important that they know the difference between a fact and a dimension. This is obvious to some people, but experience shows it is not obvious to others. It helps to draw on the star schema analogy. Most IT people will be able to relate to it straight away. The business people will pick it up, albeit more slowly.

Behavior

Estimated time: 15 minutes

We begin by explaining the meaning of the dot, that is, the behavior. These are the attributes of the dot itself and they represent the focus of the model. Any data warehousing practitioner would be expected to be able to explain this. Typical facts are sales values and quantities . We must be careful to include derived facts such as profit or return on investment (ROI). Each fact attribute must have the properties of being:

  1. Numeric

  2. Summable, or semi summable

Each measurable fact must earn its place in the warehouse. Whoever suggests a fact should demonstrate that it has the capability to contribute toward the business strategy. In addition, questions should be posed, in the form of queries (in natural language, not SQL), to show how the fact will be used.

The measurable fact name, as it will be known in the data warehouse, should be recorded. It is vital that everyone has a clear idea as to the meaning of the facts, and it makes sense to spend a little time focusing on the semantics of the information. In the sales scenario, the question What is a sale? may result in a discussion about the semantics. For instance, an accountant may say that a sale has occurred when an invoice has been produced. A salesperson might say that a sale has occurred when an order is received. A warehouse manager might say that a sale has occurred when a delivery note is signed by the customer. Everyone has to be clear about the meaning of the data before moving on.

We also need to know the time granularity of the measurable fact. This should, ideally, be the lowest level of grain of time that is practical. The best possible level is known as the valid time. Valid time means the time that the event occurred in real life. This usually translates to the transaction level. In a telecommunications application where the data warehouse is to record telephone calls, the valid time relates to the time the call was placed, right down to the second. In other applications, the time granularity might be set to day.

Usually, all the facts will share the same time granularity. However, it is important to know where this is not the case so we record the time granularity for each measurable fact attribute.

For each fact attribute, metadata supporting the attribute should also be recorded. This means a precise description of the attribute in business terms.

The Dimensions

Estimated time: 15 minutes

We now describe the meaning of dimensions. Again, it is expected that data warehousing practitioners understand this term, and so the intention here is to give guidelines as to how to explain the term to nontechnical people.

One way of achieving this is to draw a two-dimensional spreadsheet with axes of, say, customer and product and explain that this is an example of a two-dimensional view of sales. Each intersection, or cell , in the spreadsheet represents the sale of a particular product to a particular customer. I usually draw an intersection using lines and a dot where they intersect. The dot represents the measurable fact and, by using it, you reinforce the dot notation in the minds of the participants. The two-dimensional view can be extended to a three-dimensional view by transforming the diagram into a cube and labeling the third axis as, say, time. Now each intersection of the three dimensions, the dot, represents the sale of a particular product to a particular customer at a particular time. We have now reached the limit of what is possible to draw. So we need a diagrammatic method that releases the dot from its three-dimensional limitation, and this is where the dot model helps. By describing the dimensions as axes of multidimensional spreadsheets, it is usually fairly easy to understand.

Creating the Initial Dot Model

Estimated time: 1 hour

Taking the business goals, the strategy, and the steps they decided upon in the earlier group session, the groups now have to do the following:

  1. Using their highest-priority strategy, decide what information they need to help them. They will need guidance on this at first. Lead them through an example.

  2. Formulate some questions that they would like to be able to ask.

  3. Create a dot model that is able to answer those questions.

You need to allow at least one hour for this exercise and about half an hour for the teaching session. It is useful to include a major break, such as the lunch break, or overnight.

The facilitators must be on hand all the time to assist where required. At this stage you will always get questions like Is this a fact or a dimension? Also, the more switched-on IT people will already start making assumptions about things like:

  • Availability of data from source systems

  • Target database architectures

  • Data quality issues, etc.

They should be discouraged from these lines of thought. It is important that they do not constrain the innovative thinking processes. We are trying to establish the information we need, not what is currently available.

Group Presentations

Estimated time: 30 minutes per group

Each group must now present, to the other groups:

  1. Business goals

  2. Steps in their strategy

  3. Information needed to support the steps

  4. Dot model to support the information requirements

  5. Some example questions that the model supports

  6. How each question is relevant to their strategy

The information should be presented on a flip chart or overhead projector. The groups should each elect a spokesperson. Experience shows that some groups prefer to have two people, or even the whole group, participating in the presentation.

The other groups should provide feedback. The facilitators must provide feedback. The feedback at this stage must always be positive with some suggestions as to how the model could be enhanced. Do not comment on the accuracy or completeness of the models. The refinement process will resolve these issues.

The Refinement Process

Estimated time: 1 hour

We now have a classroom session about refinement of the model.

First, we have to decide whether the groups should continue to work on their own models or whether all groups should adopt the same, perhaps a composite, model. There are no real preferences either way at this time. In either case, the work is still undertaken in groups.

Then we explain about:

  1. Combination of dimensions

  2. Dimensional hierarchies

  3. Inclusion of others' suggestions

Point 1 above concerns dimensions on the original model that are not really dimensions at all. There is an instance of this in the example provided in that the vintage dimension is really no more than an attribute of wine. After discussion, it is sometimes wise to leave these, apparently extraneous, dimensions alone if no agreement can be reached.

The creation of dimensional hierarchies is a fairly straightforward affair. The refined example dot model is shown in Figure 5.7.

Figure 5.7. Refined dot model for the Wine Club.
graphics/05fig07.gif

The groups should now be re- formed to refine the models. The facilitators should wait for about 10 to 15 minutes and then approach the groups to see if they need help. There will usually be some discussion and disagreement about how, and whether, changes should be incorporated.

Presenting the Refined Models

Estimated time: 15 minutes per group

The refined models are re-presented to the other groups. The team has to show:

  1. How the model has evolved from the original model as a result of the review process

  2. How any enhancements, from the previous feedback, have been incorporated

Documenting the Models

Estimated time: 1 hour

The models are documented, in their refined state, using the data model and the first part of the entities and segmentation worksheets (the remaining parts of the entities and segmentation worksheet are completed during the second workshop). The entities include the customers' circumstances and the dimensions from the behavioral dot models. The segmentation relates to the derived segments, which is the final part of our GCM. Sharing a worksheet for these two components of the model is a good idea because it can be difficult for business people to separate the two things in their mind.

Encourage each group member to complete their own worksheets. The facilitators must make sure they take a copy of each group's model. These models provide the primary input to the component analysis workshop.

Some examples of the data model and entities worksheets for the Wine Club are shown in Figure 5.8 (a fuller set of dot model worksheets is included in Appendix B).

Figure 5.8. Dot modeling worksheet showing Wine Club sales behavior.
graphics/05fig08.gif

The following are examples of the entities and dimensions worksheet. The first, predictably, is the customer entity. At this stage, we are really concerned with describing only the entity itself, not the attributes.

Dot Modeling Entities and Segments
Entity Name Retrospection Frequency
Customer True Monthly

Metadata

The customer entity contains all customer details. We require to monitor discontinuous existences and to be able to identify active and nonactive customers.

An active customer is defined to be one who has placed an order within the past 12 months. Any customer not having done so will be classified as inactive. Subsequent orders enable the customer to be reclassified as active, but the periods of inactivity must be preserved.

Attribute Name   PK?
Retrospection Frequency Dependency
     

Metadata

Attribute details to be completed during component analysis workshop.

Source Transformation Data Type
     
Attribute Name   PK?
Retrospection Frequency Dependency
     

Metadata

Attribute details to be completed during component analysis workshop.

Source Transformation Data Type
     

The second example is the wine dimension. This will be used, principally, as a dimension in one or more behavioral models.

Dot Modeling Entities and Segments
Entity Name Retrospection Frequency
Wine True Monthly

Metadata

The wine dimension holds information about the wines sold by the club. Details include prices and costs.

Attribute Name   PK?
Retrospection Frequency Dependency
     

Metadata

Attribute details to be completed during component analysis workshop.

Source Transformation Data Type
     
Attribute Name   PK?
Retrospection Frequency Dependency
     

Metadata

Attribute details to be completed during component analysis workshop.

Source Transformation Data type
     

All that we are recording at this stage is information relating to the entity itself, not to any of the attributes belonging to the entity.

Workshop Wrap-Up

Estimated time: 30 minutes

It is important to summarize the process that has been conducted over the previous two days and to congratulate the participants on having taken the first steps toward the creation of a business-led information model for the organization:

  1. Explain how the model is just the first step and how it becomes the input to the next stage. Briefly describe the main components of the component analysis workshop.

  2. Try to secure commitment from as many people as possible to attend the second workshop.

  3. The arrangements for the second workshop should be confirmed, if that is possible.

  4. Hand out the attendee feedback form and ask each person to complete it. We all want to improve on the service we offer, so we need as much feedback as we can get.

While they are filling in the feedback forms, explain about dimensional attributes and try to encourage them to start thinking about this for the next workshop.

The Component Analysis Workshop

The principal objective of the component analysis workshop is to put some substance into the model that was created in the information strategy workshop.

ParticipantsPractitioners

As far as possible, the same consultancy people as were present in the first workshop should participate in the second. This exercise is more technical than the first and so some additional skills are required but continuity of personnel should be preserved.

ParticipantsCustomer

Similarly, as far as is feasible , the same people as before should be present. However, as has been stated, this is a more technical workshop, and some of the more senior business people should be permitted to send fully authorized deputies to work in their place. The business goals and the first part of the dot model have been established, so their part of the task is complete.

The proportions of business people to IT people can be reversed in this workshop, so maybe two-thirds IT people is OK this time. It is, however, vital that some business people are present.

What is also important is continuity, so I would reiterate the point that we need as many people as we can get from the first workshop to attend this second workshop.

The Component Analysis Workshop Process

The organization of this second workshop is very similar to that of the first workshop. The room layout and writing facilities are the same. We do not need extra rooms for group work this time.

Review of Previous Model

Estimated time: 30 minutes

The purpose of this first exercise is to refresh the minds of the participants as to the state of the dot model as at the end of the first workshop.

Since the previous workshop, it is quite likely that the participants will have thought of other refinements to the model.

Whether these refinements should be adopted depends on the authority of the persons present. As long as any refinements have the full backing of the business, and can be demonstrated to be genuinely in pursuit of business goals, they should be discussed and adopted where appropriate.

The extent of refinement is difficult to assess in general terms. Ideally, the lead consultant on the project will have kept in touch with developments inside the customer organization and will be able to estimate the time needed to conduct this part of the exercise.

If there are no changes required, it should take no more than 30 minutes.

Defining Attributes

Estimated time: 34 hours

This can be quite a time-consuming process. The objective is to build a list of attributes for each dimension in the model.

Make sure lots of breaks are included in this session.

Encourage people to walk around.

The supporting dot modeling entities worksheet should be used for this exercise. We started completing the entities worksheet in the first workshop. Now we can complete it by adding the details of all the attributes. It is recommended that one of the facilitators maintains these worksheets as the session proceeds.

The examples of this worksheet that were previously shown are reproduced with further examples of some attributes. Note that, although the form is shown as completed, some of the information can be added later in the logical modeling stage. Right now you should be aiming to record just the names of the attributes and the business metadata.

As previously, we have completed a couple of examples for the customer's details:

Dot Modeling Entities and Segments
Entity Name Retrospection Frequency
Customer True Monthly
Metadata
The customer entity contains all customer details. We require to monitor discontinuous existences and to be able to identify active and nonactive customers.
An active customer is defined to be one that has placed an order within the past 12 months. Any customer not having done so will be classified as inactive. Subsequent orders enable the customer to be reclassified as active, but the periods of inactivity must be preserved.
Attribute Name Customer Name PK? N
Retrospection Frequency Dependency
False Monthly None
Metadata
The name of the customer in the form of surname preceded by initials
Source Transofrmation Data Type
Customer Admin None Char 25
Attribute Name Customer Address PK? N
Retrospection Frequency Dependency
True Monthly Sales area hierarchy
Metadata
The customer's address
Source Transformation Data Type
Customer Admin None Char 75
Attribute Name Lifetime value indicator PK? N
Retrospection Frequency Dependency
True Monthly None
Metadata
The calculated lifetime value for the customer. Values are from 1 to 20 in ascending value
Source Transformation Data type
Customer Admin SQL package crm_life_value.sql Numeric (2)

The second example again is the wine entity that is used as a dimension in one or more dimensional dot models.

Dot Modeling Entities and Segments
Entity Name Retrospection Frequency  
Wine True Monthly
Metadata
The wine dimension holds information about the wines sold by the club. Details include prices and costs.
Attribute Name Wine.Bottle_Cost PK? N
Retrospection Frequency Dependency    
True Monthly None
Metadata
The cost of one bottle of wine, net of all discounts received and including all taxes, duties , and transportaion charges.
Source Transformation Data Type
Stock (Goods Inwards) None Numeric (7,3)
Attribute Name Wine.ABV PK? N
Retrospection Frequency Dependency
False Monthly None
Metadata
The alcohol content of a wine expressed as a percentage of the total volume to one decimal place.
Source Transformation Data type
Stock (Goods Inwards) None Numeric (3,1)

One worksheet, or set of worksheets, is completed for each entity in the model. This includes the customers' details, products (e.g., wine), and market segments. The identifying attribute should be included where known. The attribute name, as it will be known in the data warehouse, should be recorded. The name must make it clear, to business people, what the attribute represents. For each attribute, metadata supporting the attribute should also be recorded. At this level we are looking for a concise but precise description of the attribute in business terms.

The facilitator should call for nominations of attributes from the workshop participants.

Each attribute must earn its place in the model. Whoever suggests an attribute should demonstrate that it has the capability to contribute toward the business strategy. In addition, questions should be posed, in the form of queries (in natural language, not SQL) that will show how the attribute will be used.

As has been stated, this exercise can be very time-consuming and somewhat tedious . It should be punctuated with frequent breaks.

Dimensional Analysis of the Facts

Estimated time: 30 minutes

Next, for each measurable fact attribute, we examine each dimension to determine the extent to which the standard arithmetic functions can be applied. This enables us to distinguish between fully summable facts and semi summable additive facts.

For instance:

  • Supermarket quantities can be summed by product but not by customer.

  • Bank balances can be summed at a particular point in time but not across time.

  • Return on investment, and any other percentages, cannot be added at all, but maximums, minimums, and averages may be OK.

  • So called factless facts, such as attendance, can be counted but cannot be summed or averaged, etc.

So each of the measurable facts should undergo an examination to determine which of the standard arithmetical functions can be safely applied to which dimensions.

The worksheet to be used to perform this exercise is called the fact usage worksheet, and an example is as shown below (again, a fuller model is contained in Appendix B).

Dot Modeling Fact Usage
Model Name: Wine Club Sales
Fact Name: Value Frequency Daily
Dimensions Sum Count Ave Min Max
1. Customer ? ? ? ? ?
2. Hobby ? ? ? ? ?
3. Wine ? ? ? ? ?
4. Supplier ? ? ? ? ?
Hierarchies and Groupings

Estimated time: 1 hour

This exercise is simply to provide some metadata to describe the relationships between the different levels in the dimensional hierarchy. Similar to before, the metadata is required to provide a precise description of the relationship in business terms.

The worksheet used to capture this information is called the hierarchies and groupings worksheet, and an example is shown in Figure 5.9.

Figure 5.9. Example of a hierarchy.
graphics/05fig09.gif

The hierarchy or grouping is identified by recording the names of the dimensions in the hierarchy diagram at the top of the worksheet. The example above shows Sales Area ? Customer grouping. The higher (senior) level in the hierarchy is always placed at the top of the diagram, and the lower (junior) level is placed at the bottom.

Retrospection

Estimated time: 12 hours

We covered retrospection earlier in this chapter. However, we need to be able to present the concept to the workshop attendees so that they can assign values for retrospection to each of the data warehouse components. Whereas it is important for us to fully comprehend the design issues surrounding retrospection, usually it is not necessary for our customers to have the same level of understanding. We should present to them something that enables them to grasp the business significance of the subject without becoming embroiled in the technical detail.

Anyway, here's a recap:

Retrospection is the ability to look backward into the past. Each of the database objects above can take one of three possible values for retrospection:

  1. True.   True retrospection means that the data warehouse must faithfully record and report upon the changing values of an object over time. This means that true temporal support must be provided for the data warehouse object in question.

  2. False.   False retrospection means that, while the object can change its value, only the latest value is required to be held in the data warehouse. Any previous values are permanently lost. This means that temporal support is not required.

  3. Permanent.   Permanent retrospection means that the values will not change during the life of the data warehouse. This means that temporal support is not required.

The meaning of retrospection varies slightly when applied to different types of warehouse objects.

When applied to dimensions, the value for retrospection relates to the existence of the dimension in question. For instance, if we need to know how many customers we have right now, then we must be able to distinguish between current customers and previous customers. If we want to be able to ask:

How many customers do we have today compared to exactly one year go?

then we need to know exactly when a customer became a customer and when they ceased being a customer. Some customers may have a discontinuous lifespan. For some periods of time they may be active customers, and during the intervening periods they may not be. If it is required to track this information faithfully, then retrospection = true would apply to the customer dimension.

Another example might be the supplier dimension. While we might be interested to know whether a supplier is a current or dead supplier, it is perhaps not so crucial to know when they became a supplier and when they ceased. So, in this case, retrospection = false may well be more appropriate.

Still another example of dimensions is region. In some applications, the regions might be expected to always exist. In these cases, there is no point in tracking the existence of the dimensions, so we allow for a status of retrospection = permanent to accommodate this situation.

Insofar as hierarchies and attributes are concerned, retrospection relates to the values that are held by these objects.

If a customer moves from one region to another, or changes some important attribute such as, say, number of children, then, depending on the application, it may be important to be able to trace the history of these changes to ensure that queries return accurate results. If this is the case, then retrospection = true should be applied to the attribute or hierarchy.

In other cases it may be required to record only the latest value, such as the customer's spouse's name. In this case, retrospection = false would apply.

In still other cases the values may never change. An example of this is date of birth. In these cases a value of retrospection = permanent would be more applicable .

The value given, to each data warehouse object, for retrospection will become very important when the project moves into the design phase. It must be remembered that the requirements relating to retrospection are business, and not technical, requirements. It is all about accuracy of results of queries submitted by the users.

The measurable fact attributes, once entered into the warehouse, never change.

These attributes have an implied value of: Retrospection = Permanent

Therefore, we have to examine each dimension, dimensional attribute, and hierarchy to establish which value for retrospection should be applied. This means that the worksheets, previously completed, must be revisited.

How to Determine the Value for Retrospection

In order to establish the value for retrospection of each data warehouse object, the designer must investigate the use of the object within the organization.

One way of doing this is to question the appropriate personnel as follows:

If this object were to change, would you want to be able the track the history accurately?

Experience has shown that the response to this type of question is, almost invariably, in the affirmative . As a result, every attribute becomes a candidate for special treatment with respect to time, and every dimension, relationship, and attribute will be assigned a value for retrospection of true. In view of the fact that provision of temporal support is expensive in terms of resources and adversely affects performance, it is very important that such support is provided only where it will truly add value to the information provided by the data warehouse. So we need to adopt an approach to ascertaining the real need for temporal support on an object by object basis. The method should, as far as possible, provide an objective means of evaluation.

One approach is to ask questions that do not invite a simple yes or no response. Try asking the following questions:

For entities and dimensions: How would the answer to the following question help you in making decisions regarding your business objectives:

How many customers do we have today compared to the same time last year?

Other questions along similar lines such as:

How many customers have we had for longer than one year, or two years?

should also be phrased and the answer should be expressed in terms that show how the ability to answer such questions would provide a clear advantage.

For relationships, the questions might be:

How many customers do we have in each sales area today compared to the same time last year?

For attributes:

How many of our customers have moved in the past year?

The questions need to focus on the time requirements, preferably with a time series type of answer. The responder should be able to demonstrate how the answers to the questions would be helpful in the pursuit of their goals.

Granularity and the Dot_Time Table

Estimated time: 12 hours

We have discussed the time granularity of the measurable facts. Now we must discuss the time granularity of the dimensions, hierarchies, and attributes. We also have to decide on the contents of the Dot_Time table.

For dimensions, hierarchies, and attributes, the time granularity relates to the frequency with which changes to values, and changes to existence, are notified and applied to the data warehouse. So if, for example, customer updates are to be sent to the warehouse on a monthly basis, then the time granularity is monthly.

In order for complete accuracy to be assured, the granularity of time for the capture of changes in the dimensions must be the same as the granularity of time for the capture of the measurable fact events. In practice, this is hardly ever possible. There is an inevitable delay in time between the change event and the capture of that change in the source system. There is, usually, a further delay between the implementation of the change in the source system and the subsequent capture of the change in the data warehouse.

The challenge for data warehouse designers is to minimize the delays in the capture of changes.

The Dot Time worksheet is simply a list of all the headings, relating to time, by which the users will need to group the results from their queries. The requirements will vary from one application to another. Below is an example of the worksheet.

Dot ModelingTime
Model Name: Example
Name Description Data Type
Day name Standard day names String
Day number Day number 17 (Monday = 1) Numeric
Bank holiday flag Y = bank holiday Character (Y/N)
Month number Standard months 0112 Numeric
Month name Standard month names String
Quarter Standard quarters Q1, Q2, Q3, Q4 Numeric
Year Year numbers Numeric
Weekend day Is it a Saturday or Sunday? Character (Y/N)
Fiscal month no. April = 01, March = 12 Numeric
Fiscal quarter Q1 = April June Numeric
24-hour opening Y = open for 24 hours Character (Y/N)
Weather conditions Indicator for weather Character

It is also important to establish how much history the application will wish to hold. This has two main benefits:

  1. It tells the designers how many records will have to be created in order to populate the Dot Time table.

  2. It gives the designers some indication of the ultimate size of the database when fully populated .

Also, check whether future data will be needed, such as forecasts and budgets , as these will have to be built into the Dot Time table as well.

Workshop Wrap-Up

Estimated time: 30 minutes

It is important to summarize the process that has been conducted over the previous two days. We now have a complete, business-driven, conceptual information model that the designers can use to help to build the data warehouse.

We now have all the components of our conceptual model and lots of instructive data to help us to move forward and develop the logical model, which is the next step in the process.

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net