Every company of a reasonable size has some major systems that run the business. These systems are known as OLTP (online transaction processing) systems and are often responsible for the vital processes such as handling orders and invoices. Because of their key role, they usually end up storing the most critical information that the business relies on, such as the list of how much money customers owe or how much the company owes in tax.
Most OLTP systems handle many thousands of individual transactions in a day. The goals of transaction systems are primarily to provide consistency of the information and the ability to support additions and modifications to typically small pieces of data at a time. These requirements are fairly standard across many OLTP systems and have led to the broad adoption of a specific approach to organizing the data in these databases.
The data model for these systems is usually produced through a process of entity-relationship (ER) modeling, which leads to a normalized structure in which each entity has its own separate table that is related to the others, as shown in Figure 1-1. The normalized data model is a great fit for OLTP's requirements because it ensures that every piece of information exists only in one place and can be updated easily and efficiently.
Figure 1-1. OLTP database schema
These data models typically contain dozens or even hundreds of separate tables, most of which are connected to the others through large numbers of relationships. The normalized relational database has become such a common feature of systems that many database administrators (DBAs) and application designers can glance at a new report and automatically form a picture in their heads of a normalized data model that would fit.
Many people use reports directly from their company's enterprise resource planning (ERP) system or other major systems all the time, but the kind of information that can easily be retrieved is restricted by the design and purpose of a transaction system. Using operational systems for standard reporting works well for operational-level data such as reports on specific customer records or order transactions, but trying to understand your entire business by analyzing detailed transactions is unlikely to prove successful.
Why OLTP Reporting and Analysis Fails to Deliver
The really interesting questions that business users would like to answer almost always touch much more data than single transactions or records, such as "Which product category sold best in the northwest last year?" followed by "So, what kinds of customers were buying that product category in the region?."
OLTP systems are the systems that run a business. The OLTP system is a "live" picture of the current state of the business that is changing underneath the users as they do their analysis. If they run one report that shows the totals by region, then another report that shows the details, the totals might not correspond if more data has been entered in between running the reports. Also, trying to use these systems to actually understand the business as it runs is a risky proposition because it will almost certainly affect the performance and availability of system resources.
Every interesting query against the OLTP schema shown in Figure 1-1 will likely involve lots of different tables and joins with filters against the data. The performance of those queries is probably not going to be good for any database of reasonable size, regardless of the hardware and software you are using. Even optimizing the tables for this kind of query is usually not an option: Remember that OLTP systems must first and foremost provide fast, atomic updates.
One of the most important reasons that OLTP systems fail to deliver BI is related to the restricted ways that users can access the information, which is usually via static or parameterized reports that were designed and published by the IT department. Because of the complexity of the database and the performance implications of a user possibly launching a huge, poorly designed query that takes eight hours to complete on the live OLTP system, the users are restricted to accessing specific sets of information in a prescribed way.
The promise of "end-user reporting" tools that people could use to create their own reports on their desktops never really materialized. Even when reporting tools started to get user-friendly Windows interfaces, the complexity of the schema in the transaction systems defeated most attempts to provide access directly to users. Ultimately, they are still restricted by the database design and the operational requirements for the transaction system.
Despite all the drawbacks we have just described, there is an even more compelling problem with trying to use an OLTP system directly as the vehicle for intelligent analysis. Every organization we have ever worked with has valuable information that is spread out in different areas, from the HR department's system to the spreadsheet that contains next year's budget. The solution to the problem of providing access to information must lie outside a single transaction system. The solution lies in a separate system: a data warehouse.