Data Warehouse vs. Big Database


One of the key mistakes people make is labeling their database as a data warehouse solely based on its size. Over the past decade , three phenomena have occurred resulting in major increases in average database size:

  • The cost of space versus the value of the data has decreased.

  • Companies now value the data as a critical business asset.

  • Companies have merged into large multi-national entities.

In other words, the cost of keeping data online is cheap, the perceived value of that data is now very high, and the size of companies and their data needs have grown. As such, many of today's OLTP and ODS databases routinely grow into the 100 “800 gigabyte (GB) range. But that does not make them data warehouses. For example, SAP and PeopleSoft enterprise resource planning (ERP) databases of 400 GB or more are not uncommon, yet they are not data warehouses, even at these extremely large sizes. Remember, size alone does not a data warehouse make.

The simplest way to avoid labeling a large database as a data warehouse is to add some DBA-centric questions and answers to the description of the nature of that database. For each subject area in your data warehouse, simply ask the physical DBA to provide estimates for the following seven items:

  • The number of tables

  • Average big table row count

  • Average big table size in GB

  • Largest table's row count

  • Largest table's size in GB

  • Largest transaction rollback needed in GB

  • Largest temporary segment needed in GB

Data warehouses generally have fewer, larger tables, whereas non-data warehouse databases usually possess more, smaller tables. Of additional interest are the temporary and rollback segment needs of the database. Data warehouses tend to need them as large as the largest object (for rebuilds), whereas non-data warehouse databases only need them large enough for the largest transaction.

Use the criteria outlined in Table 1-2 for your evaluation.

Table 1-2. General Database Application Characteristics
 

OLTP

ODS

OLAP

DM / DW

Number of Tables

100 “1000's

100 “1000's

10 “100's

10 “100's

Average Table's Row Count

10's of Thousands

10's of Thousands

10 “100's of Millions

100 “1000's of Millions

Average Table's Size in GB

10's of MB

10's of MB

10's of GB

10 “100's of GB

Largest Table's Row Count

10 “100's of Millions

10 “100's of Millions

10 “100's of Millions

100 “10,000's of Millions

Largest Table's Size in GB

10's of GB

10's of GB

10's of GB

10 “100's of GB

Rollback Segment's Size in GB

100's of MB

100's of MB

N/A

10 “100's of GB

Temp Segment's Size in GB

100's of MB

100's of MB

N/A

10 “100's of GB

Continuing with our previous example, suppose your requirements are as follows :

  • 8 tables

  • 500 million rows per big table

  • 50 GB per big table

  • 2 billion rows for largest table

  • 160 GB for largest table

  • 160 GB to rebuild largest table

  • 60 GB to rebuild largest index

From this example, we can again discern that we have a data mart or data warehouse. First, we have very few tables. A typical OLTP or ERP database would have hundreds or even thousands of tables. Second, the row counts of our smallest big table and largest table have the right order of magnitude. Row counts expressed with lots of zeros or in powers of ten greater than ten (e.g., 10 10 ) are more likely to be in data warehouses. Finally, look at our rollback and temporary segments' needs. They're as big as some entire databases!

While it may seem like I've once again painted an example tailored to the conclusion, I've actually found the process to be this straightforward and easy in most cases as well. Unfortunately, these days, people tend to call any very large database a data warehouse. Once again, it's okay for people to call their projects whatever they like. But as pointed out, the techniques in this book only apply to the DM/DW column of Table 1-2.



Oracle DBA Guide to Data Warehousing and Star Schemas
Oracle DBA Guide to Data Warehousing and Star Schemas
ISBN: 0130325848
EAN: 2147483647
Year: 2003
Pages: 79
Authors: Bert Scalzo

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net