Chapter 1: Constructing a Data Warehouse


What is Data Warehousing?

William H. Inmon, known as the father of data warehousing, defines a data warehouse as "a subject-oriented, integrated, non-volatile and time-variant collection of data in support of management decisions" (Building the Data Warehouse). See Appendix A, "Data Warehouse Industry References and SAP BW Training."

This simple concept has recently become a multibillion-dollar industry. New breeds of vendors are introducing tools and technologies at an alarming rate to deliver data warehouse solutions. Because this industry moves so quickly, it is difficult to select a set of data warehouse technologies that will still be in use in the coming years. Constructing a data warehouse requires three key steps:

  1. Extract data from transaction systems.

  2. Manipulate extracted data to generate reports.

  3. Make such reports accessible to the decision-makers.
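The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's tooling: the in-memory `TRANSACTIONS` list stands in for a transaction system, and the function names `extract`, `build_report`, and `publish` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for an OLTP transaction table.
TRANSACTIONS = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 80.0},
    {"order_id": 3, "region": "East", "amount": 50.0},
]

def extract(source):
    """Step 1: copy rows out of the transaction system."""
    return [dict(row) for row in source]

def build_report(rows):
    """Step 2: manipulate the extracted rows into a sales-by-region report."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))

def publish(report):
    """Step 3: render the report as text a decision-maker can read."""
    return "\n".join(f"{region}: {total:.2f}" for region, total in report.items())

report = build_report(extract(TRANSACTIONS))
print(publish(report))  # East: 170.00 / West: 80.00 on two lines
```

In a real warehouse each function would be replaced by dedicated extraction, transformation, and presentation tools; keeping the steps separate is what lets each one change independently.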

Though the concept behind a data warehouse is simple, constructing, deploying, and managing a data warehouse that changes as business processes change is a huge challenge. Knowledge workers, decision-makers, executives, and analysts expect fast, high-quality, up-to-the-minute information about strategic and tactical business operations. The need to access information quickly is blurring the line between data warehouses and On-Line Transaction Processing (OLTP) environments. E-commerce is one example. When you process a credit card transaction (even at a gas station), chances are that your transaction request will pass through a very large data warehouse (to detect fraud using data mining techniques) before an authorization number is issued.
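To make the credit card example more concrete, here is a minimal sketch of checking a transaction against warehouse history before an authorization is issued. It assumes a hypothetical `CARD_HISTORY` list of past purchase amounts for one card, and it uses a simple standard-deviation threshold as a stand-in for real data mining models; the function names and the threshold of 3.0 are illustrative assumptions.

```python
from statistics import mean, stdev

# Hypothetical history pulled from the warehouse: past purchase amounts for one card.
CARD_HISTORY = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5]

def is_suspicious(amount, history, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations above the card's mean.

    A simple statistical stand-in for a data mining model (assumption).
    """
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount > mu  # every past purchase was identical
    return (amount - mu) / sigma > threshold

def authorize(amount, history):
    """Issue a decision only after the warehouse-based check has run."""
    return "DECLINE" if is_suspicious(amount, history) else "APPROVE"

print(authorize(44.0, CARD_HISTORY))   # APPROVE
print(authorize(900.0, CARD_HISTORY))  # DECLINE
```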

Because of this trend toward tight integration between OLTP systems and data warehouses, business application vendors, especially ERP vendors, are working hard to provide their customers with robust, scalable, and reliable business intelligence solutions that go far beyond traditional data warehouses.

Traditional data warehouses are usually after-the-fact phenomena, because reporting requirements and On-Line Analytical Processing (OLAP) needs are often not brought to the table during the requirement-gathering and implementation phases of OLTP applications.

Often, reporting requirements are brought to the attention of the deployment team only during the final reviews of OLTP application deployment plans. Management teams want business performance and operations reports available soon after a new OLTP application goes live. Only then are reporting and data warehouse projects launched.

This after-the-fact reporting requirement often affects OLTP performance because of system configuration, especially when SAP R/3 is your corporate transaction processing system. If the SAP R/3 OLTP system is not configured properly, business content may not capture the data needed for analysis. Based on your data needs, you may configure additional workflows or update rules in SAP R/3 to capture the data elements required by analytical applications. If such workflows and update rules are not implemented correctly, they may degrade OLTP transaction performance. Therefore, in SAP R/3, the data capture schemes for analytical applications must be designed as part of the configuration of OLTP business transactions. You will learn about this subject in more detail in Chapter 2, "Evolution of SAP Business Information Warehouse."

Today, accurate and quick access to corporate information sources is key to the success of a business. Most organizations have launched, or will launch, an enterprise data warehouse project with the same priority level as their OLTP applications, because such a warehouse is needed to manage corporate-wide data and knowledge effectively. While corporate operations managers want to know the state of business operations within the corporation, executives are primarily interested in benchmarking business performance against industry Key Performance Indicators (KPIs). Building dynamic enterprise data warehouses that reach far beyond enterprise boundaries, called extraprise data warehouses, is an enormous challenge.

Business Intelligence

Data warehouses drive the corporate information supply chain to support corporate business intelligence processes. Business intelligence, introduced by Howard Dresner of the Gartner Group in 1989, is a set of concepts and methodologies to improve decision-making in business through the use of facts and fact-based systems. These fact-based systems include the following:

  • Executive Information Systems

  • Decision Support Systems

  • Enterprise Information Systems

  • Management Support Systems

  • OLAP

  • Data and Text Mining

  • Data Visualization

  • Geographic Information Systems

Each subsystem under the business intelligence umbrella has a limited scope and view of data and serves a very specific function. Having so many subsystems beneath this umbrella may lead to data puddles: data sets that flow from one subsystem to another. Each subsystem alters the content of such a data set to meet its own business needs and then pushes it out to the next application. By the time the data set reaches the decision-maker, no one knows its origin or the transformations applied to it along the way, and what remains is a stream of small, untraceable data objects called data puddles. These data puddles lead to severe data quality and data management problems across an enterprise. In the next section, I discuss how data warehouse architectures resolve problems like data puddles.
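One way to keep a data set from turning into a data puddle is to make it carry its own origin and transformation history. The following Python sketch assumes a hypothetical `TracedDataSet` class and hypothetical step names; it is not a feature of any particular product, only an illustration of recording lineage at each hand-off between subsystems.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TracedDataSet:
    """Hypothetical data set that records its origin and every transformation."""
    rows: tuple
    origin: str
    lineage: tuple = field(default_factory=tuple)

    def transform(self, step_name, fn):
        """Apply fn to the rows and return a new data set with the step recorded."""
        return TracedDataSet(
            rows=tuple(fn(self.rows)),
            origin=self.origin,
            lineage=self.lineage + (step_name,),
        )

# Hypothetical hand-offs between two subsystems.
raw = TracedDataSet(rows=(100, 250, 75), origin="orders@OLTP")
filtered = raw.transform("drop_small_orders", lambda rs: [r for r in rs if r >= 100])
scaled = filtered.transform("convert_to_thousands", lambda rs: [r / 1000 for r in rs])

print(scaled.origin, scaled.lineage, scaled.rows)
# orders@OLTP ('drop_small_orders', 'convert_to_thousands') (0.1, 0.25)
```

Because each `transform` call returns a new object with the step appended, the decision-maker at the end of the chain can still see where the data came from and what was done to it.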

Data Warehouse Categories

Terms used to define data warehouse objects vary from one vendor to another and can be divided into the following categories:

  • Data Warehouse. This is conceptually the same as defined by Inmon. It could be one large physical instance or a collection of several physical data object instances (detailed and aggregated), each serving a special purpose conforming to a grander corporate vision.

  • Data Mart. Data marts are small, stand-alone data warehouses limited to a subject area (for example, Sales Analysis), as shown in Figure 1-1. Data marts can be views extracted from a corporate data warehouse. Such data marts, called dependent data marts, contain qualified data and conform to corporate data standards. Data marts built directly against the transaction systems, called independent data marts, can often be deployed in less time because they do not first have to conform to central data warehouse standards and processing, which usually take a lot of time. However, independent data marts often cause data quality problems because they do not conform to corporate data standards.

    Figure 1-1: Independent and Dependent Data Marts.

  • Operational Data Store (ODS). The operational data store is a central data repository that holds very detailed, transaction-level data. Data warehouses and data marts are built by fetching data from the ODS instead of from the transaction systems. The ODS is also a data consolidation and integration point for several transaction systems. Instead of relying on application-to-application interfaces, all transaction systems can access the ODS to view consolidated, detailed transaction-level information, such as a person's name and street address for shipping an order. A shipping application may need this detail to print a shipping label, but a warehouse or analytical application does not necessarily need it, because there you may be more interested in analyzing by company or Zip code than by individual names and street addresses. The ODS thus becomes the data hub for both data warehouses and transaction systems.

  • Extraprise Data Warehouse. Extraprise data warehouses represent the future of data warehousing. Along with typical decision support operations, such data warehouses become an integral component of business-critical enterprise applications (such as order administration, order fulfillment, and Customer Relationship Management) across the globe, and they also meet business-to-business and business-to-consumer information needs. I discuss this in detail in Chapter 2.
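The relationship between an ODS and a dependent data mart can be sketched as follows. This is a minimal illustration with hypothetical field names and functions: a transaction system reads full name and street detail from the ODS to print a shipping label, while a dependent data mart derived from the same ODS keeps only Zip code totals for analysis.

```python
from collections import defaultdict

# Hypothetical ODS rows: detailed, transaction-level data consolidated from
# several transaction systems. Field names are illustrative only.
ODS_ORDERS = [
    {"order_id": 1, "name": "A. Smith", "street": "1 Main St", "zip": "10001", "amount": 30.0},
    {"order_id": 2, "name": "B. Jones", "street": "9 Oak Ave", "zip": "94105", "amount": 45.0},
    {"order_id": 3, "name": "C. Lee", "street": "5 Elm Rd", "zip": "10001", "amount": 25.0},
]

def shipping_label(ods_rows, order_id):
    """Transaction-system use: read full name and address detail from the ODS."""
    order = next(row for row in ods_rows if row["order_id"] == order_id)
    return f"{order['name']}, {order['street']}, {order['zip']}"

def sales_by_zip_mart(ods_rows):
    """Dependent data mart: derived from the ODS, keeping only Zip-level totals."""
    totals = defaultdict(float)
    for row in ods_rows:
        totals[row["zip"]] += row["amount"]
    return dict(totals)

print(shipping_label(ODS_ORDERS, 2))  # B. Jones, 9 Oak Ave, 94105
print(sales_by_zip_mart(ODS_ORDERS))  # {'10001': 55.0, '94105': 45.0}
```

Because both consumers read from the same ODS, the mart's totals can always be traced back to the detailed rows, which is what distinguishes a dependent data mart from one built directly against each transaction system.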




Business Information Warehouse for SAP (Prima Tech's SAP Book Series)
ISBN: 0761523359
Year: 1999
Pages: 174
Authors: Naeem Hashmi
