As the data warehousing analysis process works from top to bottom, the resolution of the information goes from high level to detailed, and the focus goes from business to technical. In this lesson, you will work through the steps of the data warehousing analysis process.
After this lesson, you will be able to:
- Differentiate between typical objectives in operational and analytical systems
- Describe and detail the steps involved in analyzing and implementing data warehousing
Estimated lesson time: 50 minutes
The information landscape is made up of several types of systems, each with a specific perspective. Systems generally fall into categories based on their configuration and the functions that they provide to the enterprise.
Operational systems are dedicated to tracking individual events or making tactical decisions. Examples of operational systems include:
Operational systems most often show a point-in-time view of the business and handle a large number of transactions. Many source systems can be considered operational systems. Operational systems examine the current state of some predetermined combined set of events, potentially across multiple processes. Figure 3.1 shows how the operational system processes satisfy the objective of reporting the state of a system: An inquiry evaluates information about the status of the system and the status is returned as the result of that inquiry.
Figure 3.1 Operational system processes
For example, an operational system for a bank:
Operational systems have the following characteristics:
Analytical systems provide information about multiple events. This information is used to drive strategic decision making. Examples of analytical systems include:
Figure 3.2 shows the objectives that characterize analytical system processes.
Figure 3.2 Analytical system processes
Analytical systems have the following characteristics:
When building an analytical system such as a data warehouse, you develop a system that must support some undefined or changing requirements over time. The higher the degree of changing requirements, the harder your success (or failure) is to measure. This lack of a finite definition is one of the major challenges in a data warehousing project. Maintaining focus on exactly what the business wants to accomplish can become difficult if this is not controlled.
It is critical that you correctly identify your business requirements. Analytical systems fulfill those requirements only if the decision-making process is improved.
In many organizations, the distinction between system perspectives can be blurred. For example, an integrated mail-order application may exhibit characteristics of an operational system (order entry, customer set-up, and account/order status) as well as some analytical characteristics (distribution by zip code, volume by month, and cost by format).
Operational systems deal with individual events, while analytical systems deal with multiple events and situations over time, as shown in the following table.
Operational Systems | Analytical Systems |
---|---|
Detect an event | Measure many different events |
Respond to the event | Measure the responses of the system to the events |
Inquire about the status of the event | Evaluate relationships between events, after the fact, on a large scale |
Analytical systems provide insight into ways to improve responses and elicit more desirable events from the environment; essentially, they show how to improve the processes, services, and product offerings of a business. Because analytical systems are potentially less clearly defined than operational systems, it is important to define your system specifications as completely as possible.
It is important to emphasize the business side of data warehousing. It is too easy to get caught up in the implementation details before the full business case is understood. The first direction any data warehousing effort should take is to understand why an organization may benefit from better use of analytical systems. Figure 3.3 illustrates the steps in this process. The rest of this lesson discusses these steps in detail.
Figure 3.3 The data warehousing analysis process
What factors affect the company? Business drivers are principal factors in determining what, when, and how strategic decisions are made. Once you understand how external forces affect the company, you can design a system that provides information that enables management to make decisions on how to respond to these forces.
By examining the external forces that affect (or drive) the business, you will begin to develop a high-level understanding of the business requirements. These external forces include:
A business objective is a clearly defined goal that will increase or retain opportunities. In response to external drivers, management establishes business objectives, such as:
Understanding these external drivers and the objectives set in place to respond to them will give you the context, priority, and initial scope of your data warehouse design effort.
After you identify the needs of a business, begin the initial collection of information. You will use this information, which is general and summary in nature, to paint a clearer picture of the business processes that your particular client uses to gather and disseminate information.
Hold a Business Definition Session
Use these sessions with senior management from various business units across the enterprise to:
Documentation is collected throughout the entire process. During this phase, the focus is on enterprise-level materials, which, along with the sessions, enable an accurate assessment. As information about the process is developed, planning can be taken to a more detailed level.
Continuous management involvement is critical to the development of a data warehouse because it helps to:
Hold an Information Technology Environmental Review Session
Use these sessions with senior management from the IT group to:
The information collected during this phase takes the form of enterprise-level documentation and facilitated session notes. The analysis of this information results in findings and recommendations in the form of an assessment.
Collect and Review Enterprise Documentation
Collect and review documentation to gather and analyze client business and IT planning and infrastructure documentation. This information is typically requested prior to the actual start of the process. It provides the team with a baseline understanding that optimizes session effectiveness.
Look for the following documentation from your client:
Analyze and Assess Information
In order to zero in on the information that will prove most useful for identifying data warehouse objectives, consider the following aspects of your particular client s business:
Develop a Project Plan for the Requirements Definition and Architecture Analysis Phase
You develop a project plan to:
Having defined the scope of the initial iteration, you can develop a project plan for the next phase of the process and a management checkpoint.
Once you have defined the business objectives, it is a fairly simple task to identify the analyses needed to support them. It is then up to management (with the help of the project team) to establish priorities for these analyses and select those that will be included in the initial iteration of the data warehouse.
With any data warehouse project there exists an ideal amount of preparation:
The ideal is to engage in a thorough enough analysis that one can feel confident about the first iteration of the warehouse but not so much that the implementation never occurs. The best way to assess the correct amount of preparation is to decide on which objectives the first iteration of the data warehouse will serve.
The following table illustrates how a specific client may prioritize various analyses.
Analysis Description | Low | Medium | High |
---|---|---|---|
Sales | X | ||
Customers | X | ||
Competitors | X | ||
Product movement | X | ||
Store, department, category, and product mix | X | ||
Promotion | X | ||
Labor resource | X | ||
Gross profit/margin | X |
By selecting only those items that fall into the high-priority column, you limit, and therefore make more manageable, the scope of the data warehouse project. Of course, if you can include other priority items without greatly increasing the scope of the project, it makes sense to do so.
When you have completed the information-gathering phase, it is time to ensure that you are on the right track before beginning the identification of roles and processes.
Include Business Sponsors, Client Project Managers, and Key Stakeholders
Participation by business sponsors, client project managers, and key stakeholders is essential to project success; the data warehouse must sell itself to management at every opportunity.
Review and Approve Phase Deliverables
During the review and approval process, the project team gets to show its wares and gauge the reaction of management. It is possible that some reworking will be necessary and an additional checkpoint scheduled.
Identify and Discuss Open Issues
When the open issues are identified and discussed, management has its turn to ask questions. The project team should acknowledge all known issues and be ready to receive additional ones from management. A log of open issues (updated with current status) should be maintained by the project team lead.
Review and Approve Project Plans for the Next Phase
Reviewing and approving project plans for the next phase should be a mere formality. However, the team must be prepared to discuss and, if necessary, justify timings and resource requirements. A well-crafted plan should stand up to close scrutiny.
After the first management checkpoint, you must identify the roles and processes involved with the business. Conduct interviews with business analysts for their insight on important measurements. Do not overlook the opportunity to conduct interviews with subject matter experts at all levels of the enterprise.
Identifying Roles of Players in the Analysis
By identifying all the potential players who have anything (however remote) to do with your chosen analysis area, you ensure that you get an enterprise perspective. For each role, identify those individuals who represent the needs of that role. Figure 3.4 shows the different individuals that might play a part in a sales data warehouse.
Figure 3.4 Different roles in a subject area
In the following scenario, assume that you are working with a company that has a chain of convenience stores. The stores have begun to expand their product offerings, and it has become difficult for buyers to keep up. In response to buyers feedback, the corporation has decided to institute a category management program for products. The product analyst has been tasked with analyzing product movement, sales analysis over time, and profit margin analysis.
Once individuals activities have been characterized, interviewees are asked to get more specific about their role by identifying distinct analysis processes/needs. These are then reconciled against the initial prioritized business objectives established in the opportunity evaluation phase.
Figure 3.5 shows that the product analyst is tasked with providing the category managers with historical sales analysis to better manage sales and inventory performance. By analyzing the product movement, the category managers can ensure that they are stocking and merchandising the right product mix at the right time, thereby reducing inventory costs and maximizing profit margins. This information can then be used for strategic product planning, buying, and delivery.
Figure 3.5 Identifying role analysis processes
Understanding Role Information Requirements
It is often easier to understand each analysis and its underlying information requirements when the analysis is phrased as a question. As the interviewee explains each analysis, a specific question for that analysis is developed. For example,
"How many bags of our product, by size, by flavor, were sold each month this year, in the Eastern division, by store, broken down by store size, compared to the same month last year?"
In the above example, the product movement analysis is expressed as a comparative evaluation of a brand and type of snack food in a specific area over time.
The business analysis and interview phases produce a list of key performance indicators (KPIs).
You need to map out the interaction and elements of the business process. Figure 3.6 is one example of how to diagram such a business process.
Figure 3.6 Identifying key performance indicators
This figure shows what the KPIs are and reveals the previously unidentified discount and promotion KPIs.
Information Objects and Events
Information objects are persistent entities that have attributes, which are changed by events. For example, a product (information object) is placed on back-order (event), then it is restocked (event), and finally, it is shipped (event).
Identifying Information Objects and Events
You decide that you must keep track of product movement in each store on a daily basis. Each store has a point-of-sale (POS) register that is networked to a central server for all the stores.
Figure 3.7 illustrates the information objects, which are:
Figure 3.7 illustrates the events, which are:
Figure 3.7 Identifying information objects and events
In this exercise, you will examine the text from a sample interview and extract the pertinent information objects, events, and key performance indicators.
Scenario
The following is an excerpt from the transcript of an interview with a product analyst:
Interviewer: Describe the product sales analysis that you do.
Analyst: I look at a number of reports and identify trends and significant changes in the normal monthly product activity by our customers.
Interviewer: Describe the activity for a customer.
Analyst: Our employees (salespeople) take orders from customers and send them to the company by fax and telephone. The data entry staff will take orders from customers, enter each product ordered, and figure the total. The customer will be shipped the order by one of our shipping companies.
Interviewer: What metrics do you use to evaluate sales?
Analyst: Units sold, sales revenue, and product gross profit.
Interviewer: How do you calculate product gross profit?
Analyst: We take the product unit price and multiply by the number of units sold. Then we subtract any discounts granted to customers on a particular product purchase. The freight will then be calculated and added to the amount.
Interviewer: How does the customer get these discounts?
Analyst: The discounts are given to a customer as a promotional campaign for a group of products.
Interviewer: How do you determine trends?
Analyst: I look at activity from month to month and from one month in a year to the same month in the previous year.
Interviewer: How about these significant changes that you mentioned?
Analyst: I look at how a product is selling in a given month to one customer compared to all the other customers in the region or state. For example, in the U.S., the regions are the actual states. Because we sell to many countries, the geographical equivalent of a state is a region. So, I can also compare the total activity across cities within regions and countries for a product.
List the nouns that were converted.
Answers
List the verbs that were converted.
Answers
List the nouns or noun pairs that imply a process measurement (such as liquidity index, ROI, and breakeven point).
Answers
Current management reports provide a wealth of information about the analytical requirements of the business. Use the current management reports to test whether the information objects and events that you have identified completely cover the business analysis needs. Figure 3.8 shows an example of a current management report that includes facts, dimensions, and hierarchies, which you will learn to identify in the following topic. This typical example of a management report contains sales volumes for several cities over a three-month period. Evaluating this report uncovers the following:
Figure 3.8 Evaluating current management reports
Identifying dimensions and events can be broken down into two tasks. The first task is choosing the measures. A fact represents the measures that are recorded for each occurrence of an event. Examples of measures are:
Each combination of dimensions and facts represents a measurable business requirement.
As the relationships between roles, processes, and information are discovered, the common points among these three components must be understood. This understanding provides the basis for a conceptual data model, or usage diagram. Without a clear understanding of these common points, it is unlikely that the conceptual data model will be scalable across the enterprise. Figure 3.9 shows how analysis can provide a conceptual model that scales across the enterprise. The arrows in this figure represent the events, such as the sale of a product.
Figure 3.9 Looking for common points between roles, processes, and information
The second task is choosing the dimensions, defined as the entities (and their associated hierarchy) with which events interact. Examples of dimensions are:
In this exercise, you will examine a sample management report and extract the pertinent dimensions, facts, and hierarchies. During an interview with the product analyst, you were provided with a report that the product analyst uses to track monthly product units sold for a specific geographical location. The report is shown in Figure 3.10:
Figure 3.10 Sample management report
List the lowest-level dimension for the column headings.
List the lowest-level dimension for the column headings.
List the dimensions in the report heading
Answers
List the intersection of the lowest-level dimension column and row headings.
Answers
List all aggregated column headings.
List all aggregated row headings.
List all higher-level hierarchies in the report heading for the Time dimension.
Answers
At this point you have developed a good conceptual model of your data warehouse. You know what information the warehouse must provide. The final step is to identify where the information will come from and what you must do to it to make it suitable for the data warehouse. You should choose data that provides immediate realization of value. Simplify the requirements for extracting data by:
After identifying the data sources, you perform transformations when moving data from the data sources to the data warehousing system. Data sources and transformations are important components early in the data warehousing life cycle.
Figure 3.11 shows data sources and transformations in the data warehousing life cycle.
Figure 3.11 Sources and transformations
The data sources shown in Figure 3.11 are:
These are operational systems that are used to run the business.
Transformations can take place any time that data is moved between two points.
Documentation
Deliverables from this step in the process are:
Figure 3.12 High-level source context diagram
Data warehouse analysis has some unique challenges:
The broad scope, multiple dependencies, and fuzzy goal line make analysis essential. Thorough analysis provides a clear starting point for conceptual design and a clear mandate for the data warehouse project.
The success of the requirements gathering process rests with the following:
Involve as much of the business sector as possible. The true value of the data warehouse is fully realized when it is extended across the enterprise. Because the intent is to build an enterprise data warehouse incrementally, even those business areas that will not be involved in the early iterations feel that they are part of the process and will eventually receive benefit.
The intent of the data warehouse is to build business value through insight from information. This requires that it start with the business and move to the technology, not the other way around.
Interviews with management and analysts from within the initial iteration subject area are used to collect the information necessary to develop a conceptual model for the iteration. Interviews with the staff that are experts regarding the subject matter in the legacy application and with the managers of the network infrastructure are used to collect information on how to implement the data warehouse environment and populate the database. Issues, concerns, and constraints that might affect the success of the project are collected from all interviewees.
The information collected during the sessions and interviews must be reviewed with the source. This does two things: It ensures that the information was collected accurately and that no invalid assumptions have been made, and it provides an opportunity to reinforce the lines of communication with management and users.
Ultimately, the users will decide the success (or failure) of the data warehouse. Involve them as early as possible and maintain contact with them throughout the process. Periodic status meetings and progress reports are two good ways to keep users in the loop. Involving users provides two significant benefits: