Data Warehouse Performance Metrics


Once you have a data warehouse, sooner or later you will start receiving feedback about poor performance. Data warehouse performance means different things to different users. For example, an end user may decide a data warehouse is performing poorly because a report that displays requested data takes several minutes to run. This slow response can have several causes: the end-user workstation does not have enough memory to display graphics, the database engine is not configured properly, or the database server has a slow processor. Perhaps the user encounters slow response only at a certain time of day. What does all this mean, and how do we assess data warehouse performance?

A data warehouse has several characteristics, or dimensions, responsible for its performance. Often a single dimension accounts for poor data warehouse performance. For example, narrow network bandwidth delays both moving data from transaction systems and delivering information to end users, even when data transformation and end-user query execution pose no performance problems. To end users, that narrow bandwidth is perceived as poor data warehouse performance.

Figure 1-4 shows the characteristics of a high-performance data warehouse. Such a warehouse supports a large number of users, most of whom issue relatively simple queries without performance problems, and it has a low operations/maintenance time requirement, which keeps it highly available. In this environment, everyone is satisfied with data warehouse performance. In reality, this seldom happens.

Figure 1-4: Data Warehouse Performance Characteristics.

The number of users, however, is not by itself a good indicator of data warehouse performance. One hundred users may be issuing simple queries without any problem until one analyst starts a complex analytical job that joins a large number of tables for a sales trend analysis. Such a job can tie up the data warehouse server for hours. Data warehouse architects must design governing features that control such runaway, resource-consuming processes during peak hours.
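As a simple illustration, the Python sketch below shows one generic way a governing layer could enforce a wall-clock budget on queries. The timeout value and the `run_governed_query` wrapper are hypothetical; real database engines offer their own resource-governing facilities, and this is only a sketch of the idea.

```python
import sqlite3
import threading

# Assumption: a site-specific policy caps query time during peak hours.
PEAK_QUERY_TIMEOUT_SECONDS = 30

def run_governed_query(conn: sqlite3.Connection, sql: str, params=()):
    """Run a query, interrupting it if it exceeds the time budget."""
    # Connection.interrupt() is documented as safe to call from another thread.
    timer = threading.Timer(PEAK_QUERY_TIMEOUT_SECONDS, conn.interrupt)
    timer.start()
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        timer.cancel()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.execute("INSERT INTO sales VALUES ('EMEA', 100.0)")
    print(run_governed_query(
        conn, "SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

A query that overruns the budget is aborted with an error rather than blocking the server for hours; the cost is that long-running analytical work must be rescheduled to off-peak windows.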

Improving data warehouse behavior along any individual performance dimension translates into significant cost. For example, if data extraction from OLTP systems takes longer than expected because company business has grown, what can be done to fetch all newly changed data quickly? Assume that the OLTP systems are taken offline for four hours each day for regular operations and maintenance, and that new transaction data for the data warehouse is expected to be extracted during the same window. At current transaction volumes, however, the extraction needs one additional hour to complete.

Due to the high business activity that results from global operations, the business cannot afford to freeze order processing for more than four hours per day. So what choices do you have?

Keep the OLTP systems down one additional hour to complete data extraction, and lose new business? Or extract whatever data you can during the maintenance window and then stop, leaving the data warehouse with incomplete data?

The consequence of loading incomplete order information into the data warehouse is that the planning, finance, sales, and marketing organizations will not have a full view of corporate operations, such as product inventories and what to stock to meet consumer demand. Missing this information results in lost business revenue as well.

The performance issue here is that extracting complete data sets from OLTP systems and loading them into the data warehouse in one step can consume significant OLTP and data warehouse resources: locks on source and target data objects, network bandwidth, and CPU and memory.

One possible solution is to take the data extraction process out of the daily OLTP maintenance window and break the large extraction into multiple smaller tasks. Each task is scheduled several times during regular OLTP business operations to extract new data and move it into an operational data store. The data warehouse is then refreshed once or twice a day by combining all incremental data sets from the operational data store, without touching the OLTP systems. Of course, this approach requires planning, negotiating options with business organizations, and work to modify the data extraction code for the new scheme, all of which costs money.
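The following Python sketch outlines this scheme in miniature. It assumes hypothetical `orders`, `ods_orders`, and `dw_orders` tables and a `last_changed` timestamp on each OLTP row; it shows the shape of the incremental flow, not a production extractor.

```python
import sqlite3
from datetime import datetime, timezone

def extract_increment(oltp, ods, since):
    """Scheduled several times a day: copy rows changed since the last run
    into the operational data store (ODS)."""
    now = datetime.now(timezone.utc).isoformat()
    rows = oltp.execute(
        "SELECT order_id, status, last_changed FROM orders "
        "WHERE last_changed > ? AND last_changed <= ?", (since, now)).fetchall()
    ods.executemany("INSERT OR REPLACE INTO ods_orders VALUES (?, ?, ?)", rows)
    ods.commit()
    return now  # new high-water mark for the next scheduled task

def refresh_warehouse(ods, dw):
    """Once or twice a day: apply the consolidated increments from the ODS
    to the warehouse; the OLTP systems are not touched at all."""
    rows = ods.execute(
        "SELECT order_id, status, last_changed FROM ods_orders").fetchall()
    dw.executemany("INSERT OR REPLACE INTO dw_orders VALUES (?, ?, ?)", rows)
    dw.commit()
    ods.execute("DELETE FROM ods_orders")  # increments have been consumed
    ods.commit()
```

Because each `extract_increment` run moves only a small delta, no single task holds locks or bandwidth long enough to disturb OLTP operations, and the warehouse refresh reads only from the ODS.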

Note 

In the data warehousing world, you will run into business- and culture-related issues more often than technical ones. Success of a data warehouse requires creative thinkers with superb negotiation and people skills who can resolve organizational issues while keeping business profitability in mind.

Change capture in OLTP applications is a complex process. Application vendors are implementing methods to move changes to data warehouses without blocking OLTP activity. SAP has implemented sophisticated change capture methods in its R/3 OLTP applications. I will discuss SAP BW change capture techniques in detail in Chapters 2, 8, and 13.
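SAP's actual delta mechanisms are covered in those chapters. Purely as a generic illustration of the idea, the sketch below uses database triggers to maintain a change-log table that an extractor can read later, without rescanning or locking the OLTP tables; the table and trigger names are hypothetical.

```python
import sqlite3

# Generic change-capture illustration only -- not SAP's delta mechanism.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, qty INTEGER);
CREATE TABLE order_changes (order_id INTEGER, op TEXT, changed_at TEXT);

-- Triggers record every insert/update in the change log as it happens.
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
  INSERT INTO order_changes VALUES (NEW.order_id, 'I', datetime('now'));
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
  INSERT INTO order_changes VALUES (NEW.order_id, 'U', datetime('now'));
END;
""")

conn.execute("INSERT INTO orders VALUES (1, 10)")
conn.execute("UPDATE orders SET qty = 12 WHERE order_id = 1")
# The extractor reads (and then clears) only the compact change log.
print(conn.execute("SELECT * FROM order_changes").fetchall())
```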

Data scrubbing and transformation often take much of the time spent preparing data for a data warehouse. When, where, and how such transformations are applied can affect data warehouse refresh time. For example, suppose you need to qualify an incoming order by validating its product ID, customer ID, sales organization, and order quantity before loading it into the data warehouse. In this case, it is worth performing such qualification during the change capture process in the OLTP system rather than in the data warehouse, and then doing additional transformations and aggregations in the data warehouse.
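A minimal sketch of such qualification logic follows; the lookup sets and field names are hypothetical stand-ins for the reference data an OLTP-side extractor would consult.

```python
# Assumption: reference lookups available on the OLTP side.
VALID_PRODUCT_IDS = {"P-100", "P-200"}
VALID_CUSTOMER_IDS = {"C-01", "C-02"}
VALID_SALES_ORGS = {"US-WEST", "EU-NORTH"}

def qualify_order(order: dict) -> bool:
    """Return True only if the order passes all reference checks."""
    return (order.get("product_id") in VALID_PRODUCT_IDS
            and order.get("customer_id") in VALID_CUSTOMER_IDS
            and order.get("sales_org") in VALID_SALES_ORGS
            and isinstance(order.get("quantity"), int)
            and order["quantity"] > 0)

incoming = [
    {"product_id": "P-100", "customer_id": "C-01",
     "sales_org": "US-WEST", "quantity": 5},
    {"product_id": "P-999", "customer_id": "C-01",
     "sales_org": "US-WEST", "quantity": 5},   # unknown product: rejected
]
clean = [o for o in incoming if qualify_order(o)]
print(len(clean), "of", len(incoming), "orders qualified")
```

Rejected rows never travel across the network or occupy warehouse load time, which is precisely why this check belongs in the change capture step.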

Loading large data volumes into a data warehouse is an enormous challenge. In today's 24-7 environment, it is not easy to take a data warehouse down for several hours a day to refresh it with new changes. Reducing load time for large volumes requires a combination of high-performance hardware and software components that load and process data in parallel. Another option is to drop table indexes prior to loading: load the data into the warehouse, then re-create the indexes.
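The drop-load-recreate pattern looks roughly like the following sketch, shown here with SQLite purely for illustration; the table, index, and staged rows are hypothetical, and production bulk loaders apply the same idea at far larger scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (day TEXT, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_fact_sales_day ON fact_sales(day)")

staged_rows = [("2024-01-0%d" % i, "EMEA", 100.0 * i) for i in range(1, 8)]

conn.execute("DROP INDEX idx_fact_sales_day")       # 1. drop indexes
conn.executemany(                                   # 2. bulk-load the data
    "INSERT INTO fact_sales VALUES (?, ?, ?)", staged_rows)
conn.execute(                                       # 3. re-create indexes once
    "CREATE INDEX idx_fact_sales_day ON fact_sales(day)")
conn.commit()
```

Rebuilding an index once over the full table is generally cheaper than maintaining it row by row during a large load, which is where the time savings come from.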

Network topology is one area that needs special consideration when measuring data warehouse performance. A high-speed network connection between the OLTP and data warehouse servers will definitely reduce the time it takes to transport large data volumes. If your data warehouse uses a multi-tiered architecture, put data-volume-intensive application servers on a high-speed network channel with few network hops. Moreover, put OLTP and OLAP users on separate network routes so that OLAP users do not cause network contention for OLTP users.

The next chapter discusses how SAP BW addresses such data warehouse performance metrics when implementing an extraprise data warehouse.

