DOT MODELING

only for RuBoard - do not distribute or recompile

The remainder of this chapter is devoted to the explanation of a methodology for the development of conceptual models for data warehouses. The methodology is called dot modeling.

Dot modeling is based on the simplified requirements for dimensional models that were described in the introduction to this chapter. It is a complete methodology that enables nontechnical people to build their own conceptual model, one that reflects their personal perception of their organization in dimensional terms. It also provides a structured way of constructing a logical (currently relational) model from the conceptual model.

The method was invented in July 1997 and has been used in real projects since then. It has received positive reviews from nontechnical people in environments where it has been deployed. The name was given by a user. Dot is not an acronym; it comes from the fact that the center of the behavioral part of the model, the facts, is represented by a dot. The method evolved from dimensional concepts and has since been adapted to the requirements of the customer-centric GCM.

We start by modeling behavior. Figure 5.1 represents the design of a two-dimensional tabular report. This kind of report is familiar to everyone and is a common way of displaying information, for example, as a spreadsheet.

Figure 5.1. Example of a two-dimensional report.

The intersection of the axes in this example, as shown by the dot, would yield some information about the sale of a particular product to a particular customer. The information represented by the dot is usually numeric. It could be an atomic value, such as the monetary value of the sale, or it could be complex and may include other values such as the unit quantity and profit on the sale.

Where a further dimension, such as time, must be included in the report, one might envisage the report being designed as several pages, where each page represents a different time period. This could be displayed as shown in Figure 5.2.

Figure 5.2. Example of a three-dimensional cube.

Now the dot represents some information about the sale of a particular product to a particular customer at a particular time. The information contained in the dot is still the same as before. It is either atomic or complex and is usually numeric. All that has changed is that there are more dimensions by which the information represented by the dot may be analyzed.

It follows, therefore, that the dot will continue to represent the same information irrespective of how many dimensions are needed to analyze and report upon it. However, it is not possible to represent more than three dimensions diagrammatically using this approach. In effect, the dot is trapped inside this three-dimensional diagram. In order to enable further dimensions of analysis to be represented diagrammatically, the dot must be removed to a different kind of structure where such constraints do not apply. This is the rationale behind the development of the dot modeling methodology. In dot modeling the dot is placed in the center of the diagram and the dimensions are arranged around it as shown in Figure 5.3.

Figure 5.3. Simple multidimensional dot model.

The model readily adopts the well-understood radial symmetry of the dimensional star schema.
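The progression from report to cube to dot can be illustrated with a small sketch. It assumes that each dot is a record of fact values and that every dimension is simply one more component of the key that identifies it. The customer, product, and time values and the `summarize` helper are hypothetical and are used only for illustration.

```python
from collections import defaultdict

# Hypothetical sales facts. Each "dot" carries the same fact attributes
# (quantity, value) however many dimensions are used to locate it.
sales = [
    ({"customer": "C1", "product": "Merlot", "time": "1997-07"}, {"quantity": 2, "value": 30.0}),
    ({"customer": "C1", "product": "Merlot", "time": "1997-08"}, {"quantity": 1, "value": 15.0}),
    ({"customer": "C2", "product": "Rioja", "time": "1997-07"}, {"quantity": 3, "value": 36.0}),
]


def summarize(facts, dimensions):
    """Total the fact attributes over any chosen list of dimensions."""
    totals = defaultdict(lambda: {"quantity": 0, "value": 0.0})
    for coordinates, measures in facts:
        key = tuple(coordinates[d] for d in dimensions)
        totals[key]["quantity"] += measures["quantity"]
        totals[key]["value"] += measures["value"]
    return dict(totals)


# The two-dimensional report of Figure 5.1 and the cube of Figure 5.2
# differ only in the number of dimensions in the key.
two_d = summarize(sales, ["customer", "product"])
three_d = summarize(sales, ["customer", "product", "time"])
```

Adding a fourth or fifth dimension would change only the list passed to `summarize`, which mirrors the point that the dot itself is unaffected by the number of dimensions around it.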

The Components of a Behavioral Dot Model

There are three basic components to a dot model diagram:

Dot. The dot represents the facts. The name of the subject area of the dimensional model is applied to the facts. In the Wine Club, the facts are represented by sales.

Dimension names. Each of the dimensions is shown on the model and is given a name.

Connectors. Connectors are placed between the facts and dimensions to show first-level dimensions. Similarly, connectors are placed between dimensions and groupings to show the hierarchical structure.

Emphasis has been placed on simplicity so there are virtually no notational rules on the main diagram. It is sensible to place the dot near the center of the diagram and for the dimensions to radiate from the dot. This encourages a readable dimensional shape to emerge.
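As a rough illustration, the three components can be held in a very small data structure. The sketch below is not part of the method itself; the `DotModel` class and the particular dimensions and grouping shown for the Wine Club are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DotModel:
    dot: str                                        # name of the facts (the subject area)
    connectors: list = field(default_factory=list)  # (source, target) pairs

    def connect(self, source, target):
        """Record a connector from the dot or a dimension to a dimension or grouping."""
        self.connectors.append((source, target))

    def first_level_dimensions(self):
        """Dimensions connected directly to the dot."""
        return [t for s, t in self.connectors if s == self.dot]

    def groupings_of(self, dimension):
        """Groupings connected to a dimension, forming its hierarchy."""
        return [t for s, t in self.connectors if s == dimension]


# Hypothetical dimensions and grouping for the Wine Club sales model.
wine_club = DotModel(dot="sales")
wine_club.connect("sales", "customer")
wine_club.connect("sales", "wine")
wine_club.connect("sales", "time")
wine_club.connect("customer", "region")
```

Because the only rule is that connectors start at the dot or at a dimension, the radial shape falls out of the data rather than being imposed by notation.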

The behavioral dot model for the Wine Club is reproduced in Figure 5.4.

The attributes for the facts and dimensions are not shown on the diagram. Attributes are described on supporting worksheets. Similarly, the temporal requirements are represented on supporting worksheets rather than on the diagram.

The method uses a set of worksheets. The worksheets are included in the appendices. Some of the worksheets are completed during the conceptual design stage of the development and some are completed during the logical design stage. The first worksheet is the data model worksheet itself. It contains the following:

Name of the application, or model (e.g., The Wine Club-Sales)
Diagram, as shown in Figure 5.4
List of the fact attributes (i.e., quantity and value in the Wine Club)

Figure 5.4. Representation of the Wine Club using a dot model.

For each fact attribute, some information describing the fact is recorded as what is commonly known as metadata. Its purpose is to document the business definition of the attribute. This solves the problem of different people within an organization holding differing views about the semantics of particular attributes. The descriptions should be phrased in business terms.
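The data model worksheet could be captured in a form like the following sketch. The class names and the `undocumented` check are hypothetical, and the business definition shown for quantity is invented wording used only to make the example complete.

```python
from dataclasses import dataclass, field


@dataclass
class FactAttribute:
    name: str
    metadata: str = ""  # business definition, phrased in business terms


@dataclass
class DataModelWorksheet:
    model_name: str
    diagram_ref: str
    fact_attributes: list = field(default_factory=list)

    def undocumented(self):
        """Names of fact attributes that still lack a business definition."""
        return [a.name for a in self.fact_attributes if not a.metadata.strip()]


worksheet = DataModelWorksheet(
    model_name="The Wine Club-Sales",
    diagram_ref="Figure 5.4",
    fact_attributes=[
        # Hypothetical definition, for illustration only.
        FactAttribute("quantity", "Number of bottles included in the sale"),
        FactAttribute("value"),  # definition not yet agreed by the business
    ],
)
```

A check such as `worksheet.undocumented()` gives a simple way to find attributes whose meaning has not yet been agreed.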

A second worksheet, the entities worksheet, is used to record the following:

Behavioral dimensions
Customer circumstances
Derived segments

This part of the method holds some of the more complex information in the model. The model name is given on each page to ensure that parts of the document set are not mistakenly mixed up with other models' documents. The purpose of the entities worksheet is to help the designers of the system understand the requirements, and so assist them in the logical design.

For each entity the following items of information are recorded:

Name of the dimension as it is understood by the business people. For example, customer.
Retrospection of the entity's existence.
Existence attribute for the entity. For entities with permanent retrospection, an example of which might be region in the Wine Club, there is no requirement to record the existence of an entity, because, once established, the entity will exist as long as the database exists. With other entities, however, an attribute to represent existence would be needed so that, for instance, the Wine Club would be able to determine which wines were currently stocked.
Frequency of the capture of changes to the existence of the dimension. This will help to establish whether the dimension will be subject to errors of temporal synchronization.
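The rule about existence attributes lends itself to a simple check. The sketch below assumes that retrospection takes one of three values; only permanent is named in the text above, and the other two are assumptions, as are the class names and the `stocked` attribute.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Retrospection(Enum):
    PERMANENT = "permanent"  # named in the text
    TRUE = "true"            # assumed value: history is faithfully recorded
    FALSE = "false"          # assumed value: history is not recorded


@dataclass
class EntityWorksheetEntry:
    name: str
    retrospection: Retrospection
    existence_attribute: Optional[str] = None
    capture_frequency: Optional[str] = None

    def problems(self):
        """Report a missing existence attribute for non-permanent entities."""
        issues = []
        if self.retrospection is not Retrospection.PERMANENT and not self.existence_attribute:
            issues.append(f"{self.name}: existence attribute needed")
        return issues


# Region has permanent retrospection, so no existence attribute is needed.
region = EntityWorksheetEntry("region", Retrospection.PERMANENT)
# Wine needs one so that currently stocked wines can be determined.
wine = EntityWorksheetEntry("wine", Retrospection.TRUE, existence_attribute="stocked",
                            capture_frequency="daily")
incomplete = EntityWorksheetEntry("supplier", Retrospection.TRUE)
```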

For each dimension, a set of attributes is also defined on a separate worksheet. The existence attribute has already been described. The following description refers to the properties of other attributes. So for each attribute, the following information is recorded:

Name of the dimension that owns it.
Name of the attribute. This is the name as the business (nontechnical) people would refer to it.
Retrospection. Whether or not the historical values of this attribute should be faithfully recorded.
Frequency. This is the frequency with which the data is recorded in the data warehouse. This is an important component in the determination of the accuracy of the data warehouse.
Dependency. This relates to causality and identifies other attributes that this attribute is dependent upon.
Identifying attribute. This indicates whether the attribute is the identifying attribute, or whether it forms part of a composite identifying attribute.
Metadata. A business description of the attribute.
Source. This is a mapping back to the source system. It describes where the attribute actually comes from.
Transformations. Any processing that must be applied to the attribute before it is eligible to be brought into the data warehouse. Examples of transformations are the restructuring of dates to the same format and the substitution of default values in place of nulls or blanks.
Data type. This is the data type and precision of the attribute.
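The two example transformations, restructuring dates to a common format and substituting defaults for nulls or blanks, might look like the following sketch. The list of source date formats and the function names are assumptions; a real worksheet would record the formats actually found in the source systems.

```python
from datetime import datetime

# Assumed source formats; these are illustrative, not taken from any real system.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def normalize_date(raw):
    """Restructure a date string from any known source format to YYYY-MM-DD."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def default_if_blank(raw, default):
    """Substitute a default value in place of a null or blank."""
    if raw is None or str(raw).strip() == "":
        return default
    return raw
```

For example, `normalize_date("31/12/1997")` and `normalize_date("19971231")` both yield the same string, so downstream comparisons on the attribute behave consistently.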

Information about dimensional hierarchies is captured on the hierarchies worksheet. Pictorially, the worksheet shows the names of the higher and lower components of the hierarchy. The following information is also captured:

Retrospection of the hierarchy
Frequency of capture
Metadata describing the nature of the hierarchy
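A hierarchies worksheet entry pairs a lower component with a higher one, so a chain of entries describes a roll-up path. The following sketch assumes each lower component has a single higher component. The `HierarchyEntry` class, the `rollup_path` helper, and the customer, sales area, and region levels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HierarchyEntry:
    lower: str              # lower component of the hierarchy
    higher: str             # higher component of the hierarchy
    retrospection: str
    capture_frequency: str
    metadata: str           # business description of the hierarchy


def rollup_path(entries, start):
    """Follow lower-to-higher links from a starting component to the top."""
    parents = {e.lower: e.higher for e in entries}
    path = [start]
    while path[-1] in parents:
        higher = parents[path[-1]]
        if higher in path:
            raise ValueError(f"cycle detected at {higher!r}")
        path.append(higher)
    return path


# Hypothetical hierarchy levels, for illustration only.
entries = [
    HierarchyEntry("customer", "sales area", "true", "monthly",
                   "Each customer is served by one sales area"),
    HierarchyEntry("sales area", "region", "permanent", "yearly",
                   "Sales areas are grouped into regions"),
]
path = rollup_path(entries, "customer")
```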

Dot and the GCM

An interesting development occurred when I was working with a major telecommunications company in the United Kingdom. Their business objective was to build a customer-centric information data model that covered their entire enterprise. There were several different behavioral dot models:

Call usage. The types of phone calls made, the duration, cost, etc.

Payments. Whether the customer paid on time, how often they had to be chased, etc.

Recurring revenue. Covered insurance, itemized billing, etc.

Nonrecurring revenue. Accessories and other one-off services requested.

Order fulfillment. How quickly orders placed by customers were satisfied by the company.

Service events. Customers recording when a fault has occurred in their equipment or service.

Contacts. Each contact made to a customer through offers, campaigns, etc.

In dimensional modeling terms this means several dimensional models, each having a different subject area. During a workshop session with this customer I was able to show how the whole model might look using a single customer-centric diagram, which I now refer to as joining the dots. The diagram is shown in Figure 5.5.

Figure 5.5. Customer-centric dot model.

Figure 5.5 shows seven separate dimensional models that share some dimensions. This illustrates that, even with very complex situations, it is still very easy to determine the individual dimensional models using the dot modeling notation because the radial shape of each individual model is still discernible.
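Joining the dots amounts to finding the dimensions that several models have in common. The sketch below uses three of the subject areas listed above; the dimensions attached to each are hypothetical, since the content of Figure 5.5 is not reproduced here.

```python
from collections import defaultdict

# Subject areas come from the list above; their dimensions are assumed.
models = {
    "call usage": {"customer", "time", "call type"},
    "payments": {"customer", "time", "payment method"},
    "service events": {"customer", "time", "fault type"},
}


def shared_dimensions(models):
    """Map each dimension used by more than one model to the models that use it."""
    owners = defaultdict(set)
    for model, dimensions in models.items():
        for dimension in dimensions:
            owners[dimension].add(model)
    return {d: sorted(m) for d, m in owners.items() if len(m) > 1}


shared = shared_dimensions(models)
```

Each model remains a self-contained radial shape around its own dot, and the shared dimensions are the points at which the shapes touch.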

The use of the dot model, in conjunction with the business-focused workshops described next, enables the softer business requirements, in the form of business objectives or key performance indicators, to be expressed in information terms so that the data warehouse can be designed to provide precisely what the business managers need.

A further requirement is that the model should enable the business people to build the conceptual abstraction themselves. They should be able to construct the diagrams, debate them, and replace them. My own experience, and that of other consultants in the field, is that business people have found dot models relatively easy to construct and use.
