Although data warehousing has matured as a technology over the last several years, data warehouse developers face new challenges as the business and technology environments change. The database must be built for performance and reliability and must be able to scale with the business as it grows.
As large data warehouses are growing to many terabytes in size, with increasingly higher availability requirements, it is critical to maintain good performance for large numbers of geographically distributed users. Backup and recovery procedures must be established, and both the data content and the use or activity in the warehouse must be managed.
The decision support workload is highly variable. In an OLTP system, an application is tuned to process many identical update transactions as quickly as possible. In a data warehouse, performance must be tuned to process as many variable queries as possible.
Usage patterns provide the foundation for tuning the warehouse for better performance. Who is using what data? What levels of summarization are they looking at? What data is not being used? Is the data structured in the most efficient manner; is it indexed on the correct columns? Can the summary tables be used for most queries? If many queries join data from one table with another, it might be beneficial to denormalize the data, prejoining the data. Workload information helps determine where indexes should be added, where tables should be combined, and where summaries should be created.
Eventually, it may no longer be necessary or practical to keep all the detail data on-line for immediate access. The data may be purged without keeping a copy. Or the data can be archived and moved to some low-cost medium such as tape or CD-ROM, where it can later be retrieved if necessary.
Metadata is data that describes the other data and operations on that data. Metadata can be used for either technical or business purposes. As data flows from the operational systems into the warehouse, it is extracted, transformed, and summarized. Technical metadata is needed to describe this process and is essential for proper "drill down" to finer levels of detail. Business metadata allows end users to determine what data is available in the warehouse or data mart and how the data can be accessed. Metadata provides the integration and uniformity of data across the corporation. It is the place where the different departments describe their use of the term "product." Metadata is stored in a repository, which is typically a set of tables in an Oracle database. It can then be shared by any user or tool.
In 2000 the Object Management Group (OMG) published the "Common Warehouse Metamodel (CWM) Specification," which defines a metadata format for all data warehouse and business intelligence products. The specification was developed jointly by several companies, including Oracle and IBM Corporation. Since the publication of the specification, many of the data warehousing products have evolved to adhere to the standard.
One of the biggest technology issues facing enterprises today is the explosion of data volumes, which is expected to occur over the next several years. Data warehouses are "the" very large databases. The size of databases is increasing more rapidly now than ever before. There are many reasons for this growth, including the following:
With hardware improvements and storage costs continuing to decline each year, it is economically feasible to keep more and more detailed historical data. You may now be able to store a record of every product a customer bought in the supermarket, not just the fact that they bought five items for a total cost of $25.75.
Businesses are storing more and more data for longer periods of time.
Data is stored multiple times for different purposes. Indexes and materialized views are created to improve query performance, but these access structures require additional storage space, further increasing the size of the database.
Unstructured data can be integrated with traditional business intelligence applications. Storing multimedia data increases the database size. To store one hour of video requires about 1 GB of storage. To store one minute of audio requires a little less than 1 MB. Images can range from 20 KB to as much as 60 MB depending on the type and quality of the image.
Documents can be tagged with XML-based metadata and stored in Oracle 9i. Clickstream data is expected to add terabytes of data to the warehouse.
But remember that a data warehouse should not be viewed as a repository for archived data; this is not its purpose.
Ensuring the availability of the data warehouse is becoming more and more mission critical for many businesses. As data warehouses are becoming more operational in nature, feeding information back to the OLTP systems in businesses that operate globally, users need access to the data warehouse 24/7.
Oracle 9i Database is designed to eliminate the need for planned downtime and withstand any failure: system failure, storage failure, site failure, or human error. If a server goes down, your applications keep running. Real application clusters (RAC) make applications scalable and highly available. A single database can be run on a group of servers clustered together. As additional servers are added to the cluster, applications can scale to support increased throughput, with no modification.
Data Guard can be used to maintain a transactionally consistent copy of the data warehouse, which can be used to ensure operations continue with minimal interruption if there were a disaster, human error, or data corruption.
The ability to publish reports on the Web makes information available to many more people. As data warehouses and business intelligence tools make more and better data available, the number of end users continues to grow. The demand for better performance is more important than ever. In addition, the types of queries are increasing in complexity.
With the average user only staying on a Web page five seconds, information must be up-to-date and available instantaneously. In most cases it must be precomputed and cached.
Data warehouses are being used to support new types of e-business initiatives, including customer relationship management (CRM) and supply chain management. CRM helps attract new customers and develop customer loyalty, important in the retention of existing customers. A data ware-house contains the information about a company's customers and is often the integration point for sales, marketing, and customer care applications.