INTRODUCTION

only for RuBoard - do not distribute or recompile

INTRODUCTION

In this chapter we provide an introduction to data warehousing. It is sensible as a starting point, therefore, to introduce data warehousing using first generation principles so that we can then go on to explore the issues in order to develop a second generation architecture.

Data warehousing relates to a branch of a general business subject known as decision support. So in order to understand what data warehousing is all about, we must first understand the purpose of decision support systems (DSS) in general.

Decision support systems have existed, in different forms, for many years . Long before the invention of any form of database management systems (DBMS), information was being extracted from applications to assist managers in the more effective running of their organizations.

So what is a decision support system?

The purpose of a decision support system is to provide decision makers in organizations with information. The information advances the decision makers ' knowledge in some way so as to assist them in making decisions about the organization's policies and strategy.

A DSS tends to have the following characteristics:

  • They tend to be aimed at the less well structured, underspecified problems that more senior managers typically face.

  • They possess capabilities that make them easy to use by noncomputer people interactively.

  • They are flexible and adaptable enough to accommodate changes in the environment and decision-making approach of the user .

The job of a DSS is usually to provide a factual answer to a question phrased by the user.

For instance, a sales manager would probably be concerned if her actual product sales were falling short of the target set by her boss. The question she would like to be able to ask might be:

Why are my sales not meeting my targets?

There are, as yet, no such computer systems available to answer such a question. Imagine trying to construct an SQL (Structured Query Language) query that did that !

Her questioning has to be more systematic such that the DSS can give factual responses. So the first question might be:

For each product, what are the cumulative sales and targets for the year?

A DSS would respond with a list of products and the sales figures. It is likely that some of the products are ahead of target and some are behind. A well- constructed report might highlight the offending products to make them easier to see. For instance, they could be displayed in red, or flashing. She could have asked:

What are the cumulative sales and targets for the year for those products where the actual sales are less than the target ?

Having discovered those products that are not achieving the target, she might ask what the company's market share is for those products, and whether the market share is decreasing . If it is, maybe it's due to a recently imposed price rise.

The purpose of the DSS is to respond to ad hoc questions like these, so that the user can ultimately come to a conclusion and make a decision.

A major constraint in the development of DSS is the availability of data ”that is, having access to the right data at the right time. Although the proliferation of database systems, and the proper application of the database approach, enables us to separate data from applications and provides for data independence, the provision of data represents a challenge. The introduction of sophisticated DBMSs has certainly eased the problems caused by traditional applications but, nonetheless, unavailability of data still persists as a problem for most organizations. Even today, data remains locked away in applications. The main reason is that most organizations evolve over time. As they do, the application systems increasingly fail to meet the functional requirements of the organization. As a result, the applications are continually being modified in order to keep up with the ever-changing business. There comes a time in the life of almost every application when it has been modified to the point where it becomes impossible or impractical to modify it further. At this point a decision is usually made to redevelop the application. When this happens, it is usual for the developers to take advantage of whatever improvements in technology have occurred during the life of the application. For instance, the original application may have used indexed sequential files because this was the most appropriate technology of the day. Nowadays, most applications obtain their data through relational database management systems (RDBMS). However, most large organizations have dozens or even hundreds of applications. These applications reach the end of their useful lives at various times and are redeveloped on a piecemeal basis. This means that, at any point in time, an organization is running applications that use many different types of software technology.

Further, large organizations usually have their systems on diverse hardware platforms. It is very common to see applications in a single company spread over the following:

  • Large mainframe

  • Several mid-range multi-processor machines

  • External service providers

  • Networked and stand-alone PCs

A DSS may require to access information from many of the applications in order to answer the questions being put to it by its users.

Introduction to the Case Study

To illustrate the issues, let us examine the operation of a (fictitious) organization that contains some of the features just described.

The organization is a mail order wine club. With great originality, it is called the Wine Club. As well as its main products (wines), it also sells accessories to wines such as:

Glassware ”goblets, decanters, glasses , etc.

Tableware ”ice buckets, corkscrews, salvers, etc.

Literature ”books and pamphlets on wine-growing regions , reviews, vintages, etc.

It has also recently branched out further into organizing trips to special events such as the Derby, the British Formula One Grand Prix, and the Boat Race. These trips generally involve the provision of a marquee in a prominent position with copious supplies of the club's wines and a luxury buffet meal. These are mostly one-day events, but there are an increasing number of longer trips such as those that take in the French wine-growing regions by coach tour.

The club's information can be modeled , by an entity attribute relationship (EAR) diagram. A high-level EAR diagram of the club's data is as shown in Figure 2.1

Figure 2.1. Fragment of data model for the Wine Club.
graphics/02fig01.gif

Accessory ( ProductCode, ProductDesc, SellingPrice, CostPrice)

Class ( ClassCode, ClassName, Region)

Color ( ColorCode, ColorDesc)

Customer ( CustomerCode, CustomerName, CustomerAddress, CustomerPhone)

CustomerOrder ( OrderCode, OrderDate, ShipDate, Status, TotalCost)

OrderItem ( OrderCode,ItemCode, Quantity, ItemCost)

ProductGroup ( GroupCode, Description)

Reservation ( CustomerCode, TripCode, Date, NumberOfPeople, Price)

Shipment ( ShipCode, ShipDate)

Shipper ( ShipperCode, ShipperName, ShipperAddress, ShipperPhone)

Stock ( LocationCode, StockOnHand )

Supplier ( SupplierCode, SupplierName, SupplierAddress, SupplierPhone)

Trip ( TripCode, Description, BasicCost)

TripDate ( TripCode, Date, Supplement, NumberOfPlaces)

Wine ( WineCode, Name , Vintage , ABV, PricePerBottle, PricePerCase)

The Wine Club has the following application systems in place:

Customer administration.   This enables the club to add new customers. This is particularly important after an advertising campaign, typically in the Sunday color supplements, when many new customers join the club at the same time. It is important that the new customers' details, and their orders, are promptly dealt with in order to create a good first impression . This application also enables changes to a customer's address to be recorded, as well as removing ex-customers from the database. There are about 100,000 active customers.

Stock control.   The goods inward system enables newly arrived stock to be added to the stock records. The club carries about 2,200 different wines and 250 accessories from about 150 suppliers.

Order processing.   The directors of the club place a high degree of importance on the fulfillment of customers' orders. Much emphasis is given to speed and accuracy. It is a stated policy that orders must be shipped within ten days of receipt of the order. The application systems that support order processing are designed to enable orders to be recorded swiftly so that they can be fulfilled within the required time. The club processes about 750,000 orders per year, with an average of 4.5 items per order.

Shipments.   Once an order has been picked, it is packed and placed in a pre-designated part of the dispatch area. Several shipments are made every day.

Trip bookings.   This is a new system that records customer bookings for planned trips. It operates quite independently of the other systems, although it shares the customer information held in the customer administration system.

The club's systems have evolved over time and have been developed using different technologies. The order processing and shipments systems are based on indexed-sequential files accessed by COBOL programs. The customer administration system is held on a relational database. All these systems are executed on the same mid range computer. The stock control system is a software package that runs on a PC network. The trip bookings system is held on a single PC that runs a PC-based relational database system.

There is a general feeling among the directors and senior managers that the club is losing its market share. Within the past three months, two more clubs have been formed and their presence in the market is already being felt. Also, recently, more customers than usual appear to be leaving the club and new customers are being attracted in fewer numbers than before.

The directors have held meetings to discuss the situation. The information upon which the discussions are based is largely anecdotal. They are all certain that a problem exists but find it impossible to quantify.

They also know that helpful information passes through their systems and should be available to answer questions. In reality, however, while it is not too difficult to get answers to the day-to-day operational questions, it is almost impossible to get answers to more strategic questions.

Strategic and Operational Information

It is very important to understand the difference between the terms strategic and operational.

In general, strategic matters deal with planning and policy making, and this is where a data warehouse can help. For instance, in the Wine Club the decision as to when a new product should be launched would be regarded as a strategic decision.

Examples pertaining to other types of organization include:

  • When a telecommunications company decides to introduce very cheap off-peak tariffs to attract callers away from the peak times, rather than install extra equipment to cope with increasing demand.

  • A large supermarket chain deciding to open its stores on Sundays.

  • A general 20 percent price reduction for one month in order to increase market share.

Whereas strategic matters relate to planning and policy, operational matters are generally more concerned with the day-to-day running of a business or organization. Operations can be regarded as the implementation of the organization's strategy (its policies and plans).

The day-to-day ordering of supplies, satisfying customers' orders, and hiring new employees are examples of operational procedures. These procedures are usually supported by computer applications and, therefore, they must be able to provide answers to operational questions such as:

  • How many unfulfilled orders are there?

  • On which items are we out of stock?

  • What is the position on a particular order?

Typically, operational systems are quite good at answering questions like these because they are questions about the situation as it exists right now. You could add the words right now to the end of each of those questions and they would still make sense. Questions such as these arise out of the normal operation of the organization.

The sort of questions the directors of the Wine Club wish to ask are:

  1. Which product lines are increasing in popularity and which are decreasing?

  2. Which product lines are seasonal?

  3. Which customers place the same orders on a regular basis?

  4. Are some products more popular in different parts of the country?

  5. Do customers tend to purchase a particular class of product?

These, clearly, are not right now types of questions and, typically, operational systems are not good at answering such questions.

Why is this?

The answer lies in the nature of operational systems. They are developed to support the operational requirements of the organization. Let's examine the operational systems of the Wine Club and see what they actually do. Each application's role in the organization can usually be expressed in one or two sentences.

The customer administration system contains details of current customers. The stock control system contains details of the stock currently held. The order processing system holds details of unfulfilled customer orders and the shipments system records details of fulfilled orders awaiting delivery to the customers.

Notice the use of words like details and current in those descriptions. They underline the right now nature of operational systems. You could say that the operational systems represent a snapshot of an organization at a point in time. The values held are constantly changing. At any point in time, dozens or even hundreds of inserts , updates and deletes may be executing on all, or any, parts of the systems. If you were to freeze the systems momentarily, then they would provide an accurate reflection of the state of the organization at precisely that moment. One second earlier, or one second later, the situation would have changed.

Now let us examine the five questions that the directors of the Wine Club need to ask in order to reach decisions about their future strategy.

What is it that the five questions have in common?

If you look closely you will see that each of the five questions is concerned with sales of products over time.

Looking at the first question:

Which product lines are increasing in popularity and which are decreasing?

This is obviously a sensible strategic business question. Depending on the answer, the directors might:

  • Expand their range of some products and shrink their range of other products

  • Offer a financial incentive on some products, such as reduced prices or discounts for volume purchases

  • Enhance the promotional or advertising techniques for the products that are decreasing in popularity

For the moment, let's focus in on sales of wine and assess whether the information required to ask such a question is available to the directors. Have a look back at the EAR diagram at the beginning of the case study. Remember we are looking for sales of products over time. The only way we can assess whether a product line is increasing or decreasing in popularity is to trace its demand over time.

If the order processing information was held in a relational database, we could devise an SQL query such as:

 Select Name, Sum(Quantity), Sum(ItemCost) Sales        From   CustomerOrder a,               OrderItem b,               Wine c        Where a.OrderCode = b.OrderCode        And    a.WineCode = c.WineCode        And    OrderDate = <today's date>        Group by Name 

If this query were to be executed at the end of the day, it would return the value of all the orders received for the day.

The ability for us to discover the value of orders received today is a good start. This is useful information, but what we would really like to do is to examine the trend over (say) the past six months, or to compare this month with the same month last year.

So what is the solution?

Let's say we were to execute our query every day and append the results for each day in a table. That way, over time, we would be able to build up the historical information that we need.

This is the beginning of a data warehouse.

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net