Growth Management


By conservative estimate, the data in a BI decision-support environment doubles every two years. The good news is that the cost per query for most BI decision-support environments goes down with properly managed growth. The bad news is that the overall cost climbs, assuming that more and more business people use the BI decision-support environment as time progresses. The three key growth areas to watch are data, usage, and hardware.
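Doubling every two years compounds quickly. The following is a minimal sketch, assuming a steady doubling period; the two-year default is the text's conservative estimate, and the function name and the 500 GB starting size in the usage note are hypothetical.

```python
def projected_size(current_size_gb: float, years: float,
                   doubling_period_years: float = 2.0) -> float:
    """Project data volume, assuming it doubles every doubling_period_years.

    The 2-year default mirrors the conservative estimate in the text;
    real growth is rarely this smooth.
    """
    return current_size_gb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical 500 GB environment, projected at two-year intervals.
    for year in range(0, 7, 2):
        print(year, projected_size(500, year))
```

For the hypothetical 500 GB environment, the projection reaches 4,000 GB after six years, an eightfold increase.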

Growth in Data

Growth in data means not only adding new rows to the tables but also expanding the BI target databases with additional columns and new tables. Adding new columns to a dimension table is not as involved as adding new dimension tables to an existing star or snowflake schema, which usually requires the following:

  • Unloading the fact table

  • Adding another foreign key to the fact table to relate to the new dimension table

  • Recalculating the facts at a lower (more detailed) granularity to reflect the new dimension

  • Reloading the fact table
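The four steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production procedure. The table names (sales_fact, sales_detail, product_dim, channel_dim) and the new channel_id dimension are hypothetical. The sketch also assumes a detail-level staging table that already carries the new dimension's key, because facts cannot be recalculated to a lower granularity from the old, coarser fact rows alone. The "unload" is simulated by renaming the old fact table aside so that it can serve as a reconciliation check before it is dropped.

```python
import math
import sqlite3


def add_channel_dimension(conn: sqlite3.Connection) -> None:
    """Rebuild the hypothetical sales_fact table at the product x channel grain."""
    cur = conn.cursor()

    # 1. "Unload" the existing fact table by setting it aside under a new name.
    cur.execute("ALTER TABLE sales_fact RENAME TO sales_fact_unloaded")

    # 2. Recreate the fact table with an additional foreign key to the new
    #    (hypothetical) channel dimension table.
    cur.execute(
        """CREATE TABLE sales_fact (
               product_id INTEGER REFERENCES product_dim(product_id),
               channel_id INTEGER REFERENCES channel_dim(channel_id),
               amount     REAL
           )"""
    )

    # 3 + 4. Recalculate the facts at the lower granularity from detail data
    #        (assumed to carry channel_id) and reload them in one statement.
    cur.execute(
        """INSERT INTO sales_fact (product_id, channel_id, amount)
           SELECT product_id, channel_id, SUM(amount)
           FROM sales_detail
           GROUP BY product_id, channel_id"""
    )
    conn.commit()

    # Reconcile: the grand total should not change just because the grain did.
    old_total = cur.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales_fact_unloaded").fetchone()[0]
    new_total = cur.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales_fact").fetchone()[0]
    if not math.isclose(old_total, new_total):
        # Keep the unloaded copy for inspection instead of discarding it.
        raise ValueError(f"totals differ: {old_total} vs {new_total}")

    cur.execute("DROP TABLE sales_fact_unloaded")
    conn.commit()
```

Keeping the old table until the totals reconcile is one way to make the unload/reload cycle safe to rerun; a real warehouse would more likely unload to flat files with its own bulk utilities.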

BI target databases need a large amount of disk space, with workspace and indices taking up as much as 25 to 40 percent of that space. In the relational world, the data is only a fraction of the overall database size; a major portion of it is index space. Indexing is required to provide better response time when enormous volumes of data are read.


When calculating space requirements, it might be prudent to use the standard engineering maxim: calculate how large the BI target databases will be (including indices), and then triple those numbers.
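The space rule of thumb can be written as a short calculation. This sketch assumes the 25 to 40 percent figure for workspace and indices is a share of the total database size, so the raw data makes up the remaining share; the function name and the 600 GB input in the usage note are hypothetical.

```python
def disk_space_estimate(raw_data_gb: float,
                        overhead_fraction: float = 0.40,
                        safety_factor: float = 3.0) -> float:
    """Estimate disk space to provision for a BI target database.

    raw_data_gb       -- size of the data itself (hypothetical input)
    overhead_fraction -- share of the *total* space taken by workspace and
                         indices; the text cites 25 to 40 percent
    safety_factor     -- the "triple it" maxim
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    # If overhead is a fraction f of the total, the data is the remaining (1 - f).
    size_with_indices = raw_data_gb / (1.0 - overhead_fraction)
    return size_with_indices * safety_factor
```

With 600 GB of raw data and 40 percent overhead, the database including indices comes to about 1,000 GB, so the maxim suggests provisioning about 3,000 GB; at 25 percent overhead the figures are 800 GB and 2,400 GB.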

As data volumes increase, there needs to be a plan to aggregate and summarize the data as it ages. Business analysts rarely require the same level of granularity for very old data as they do for recent data. Therefore, the granularity should become coarser as data ages on a moving calendar. For example, assume the business people want to store ten years of historical data. They require monthly summary data by department for two years but are satisfied with monthly summaries by region for the remaining eight years. Before a new month is loaded into the BI target database, the department-level data for the 24th (oldest) month is summarized into regional totals and rolled off into another fact table, so that the 23rd month becomes the 24th month, the 22nd month becomes the 23rd month, and so on.
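The rolling summarization in this example can be sketched with plain dictionaries standing in for the two fact tables. This is a minimal sketch, not the book's procedure: the function and parameter names are hypothetical, months are assumed to be "YYYY-MM" strings (so string order equals calendar order), and purging regional summaries beyond 96 months is an assumption drawn from the ten-year retention in the example.

```python
from collections import defaultdict


def load_month(detail, archive, month, dept_totals, dept_region,
               detail_window=24, archive_window=96):
    """Load one month of department-level totals into a rolling window.

    detail:      {month: {department: amount}}, recent months at department grain
    archive:     {month: {region: amount}}, older months at region grain
    month:       'YYYY-MM' string (assumed format)
    dept_region: hypothetical mapping of department -> region
    """
    # Before loading, roll the oldest department-level month off into regional totals.
    while len(detail) >= detail_window:
        oldest = min(detail)
        regional = defaultdict(float)
        for dept, amount in detail.pop(oldest).items():
            regional[dept_region[dept]] += amount
        archive[oldest] = dict(regional)

    # Assumed retention: 24 detail + 96 summary months = ten years in total.
    while len(archive) > archive_window:
        del archive[min(archive)]

    # Load the new month at department grain.
    detail[month] = dict(dept_totals)
```

Rolling off before loading, as the text describes, means the department-level store never holds more than detail_window months, even briefly.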

The following list contains some of the new technologies available to support the massive data volumes and the analysis capabilities of these huge databases:

  • Parallel technologies

  • Multidimensional databases

  • New indexing technologies

  • Relational online analytical processing (ROLAP) tools

  • Distributed database maintenance tools and utilities

Growth in Usage

Another key growth area is usage. Organizations that have built successful BI applications have often uncovered a pent-up need for information throughout the organization. This need translates into more business people using the existing BI applications and asking for new ones. The number of business people accessing the BI target databases can easily double or triple every year, which drives up usage exponentially. Because different business people want to see different data and look at it in different ways, they want to slice and dice the data by new business dimensions, which in turn increases the data volume. Although data volume is usually the more critical factor in determining processor requirements, the number of people accessing the BI target databases is also an important factor.
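To see how quickly "double or triple every year" compounds, here is a minimal sketch assuming a constant annual growth factor (real adoption curves are rarely this regular); the function name and the starting head count in the usage note are hypothetical.

```python
def projected_users(current_users: int, years: int, annual_factor: int) -> list[int]:
    """Users at the start (year 0) and end of each year, assuming the user
    base multiplies by annual_factor every year (a simplifying assumption)."""
    return [current_users * annual_factor ** year for year in range(years + 1)]
```

Starting from a hypothetical 50 users, doubling yields [50, 100, 200, 400] over three years, while tripling yields [50, 150, 450, 1350]; the gap between the two scenarios is itself a reason to revisit capacity plans frequently.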

Technicians often view growth in usage as something negative. Managers, however, see it as something positive, as long as there is a return on investment (ROI). Purchasing new hardware or upgrading existing hardware to handle the growth may not be a concern if the organization is making a sizable profit from the better decision-making that the BI applications enable. Therefore, growth in usage may mean that the BI strategy is working.


BI target databases are by nature read-only and grow-only. Therefore, the key is to stop trying to conserve disk space if the BI applications and BI data are helping the organization make a profit.

Growth in Hardware

Given the information about the growth in data and the growth in usage, it should be obvious that scalability of the BI hardware architecture is key. But before you plan five years ahead, remember that the hardware cost is only one part of the total BI cost. Look at a planning horizon of 12 to 24 months; it is best to start small but also plan for long-term growth. Consider the following factors:

  • Keep in mind the capacity threshold of your BI platform. If you exceed that capacity, you have to add more processors, I/O channels, independent disk controllers, and other high-speed components to keep the BI decision-support environment at an acceptable level of performance.

  • Of all the BI hardware, the BI server platform is the most important. When ordering new hardware of any kind, there must be enough lead time to have the equipment delivered, tested, and prepared for the development and production environments.

  • Parallel technology is an absolute must for very large databases (VLDBs). The ability to store data across striped disks and the ability to have multiple independent disk controllers play an enormously important role in the performance of processes running against the BI target databases.

  • The Transmission Control Protocol/Internet Protocol (TCP/IP) is appropriate for most hardware platforms. TCP/IP is rapidly becoming a standard for scalability, growth considerations, and multiplatform environments.

  • Consider the advantages of a data mart approach with separate BI target databases. This approach permits scalability in smaller and less expensive increments.
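The capacity-threshold and lead-time factors can be combined into a rough planning check. This is a minimal sketch under stated assumptions: load grows at a constant compound monthly rate, the 80 percent threshold and 60-month search limit are hypothetical defaults, and all function names are illustrative rather than taken from any capacity-planning tool.

```python
def months_until_threshold(current_load: float, capacity: float,
                           monthly_growth: float, threshold: float = 0.80,
                           max_months: int = 60):
    """Return the first month in which projected load reaches
    threshold * capacity, or None if that does not happen within max_months.

    Assumes constant compound monthly growth (hypothetical model).
    """
    limit = capacity * threshold
    load = current_load
    for month in range(max_months + 1):
        if load >= limit:
            return month
        load *= 1 + monthly_growth
    return None


def order_deadline(months_to_threshold: int, lead_time_months: int) -> int:
    """Latest month to order hardware so it can be delivered, tested,
    and prepared before the threshold is reached."""
    return max(0, months_to_threshold - lead_time_months)
```

For example, a server at a hypothetical 50 percent utilization with 3 percent monthly growth crosses an 80 percent threshold in month 16, inside the 12-to-24-month planning horizon; with a four-month procurement lead time, the order would need to go out by month 12.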



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202
