|< Day Day Up >|| |
The term data warehouse first appeared in (Inmon, 1993), defined as a "subject oriented, integrated, nonvolatile, and time variant collection of data in support of management's decisions. "Data warehouses are closely related to OLAP technology, first introduced by Dr. E.F. Codd in 1993 (Codd, Codd, & Salley, 1993) to characterize the requirements of aggregation, consolidation, view production, formulae application and data synthesis in many dimensions. A data warehouse is a repository of information mainly coming from online transactional processing (OLTP) systems that provides data for analytical processing and decision support.
A multi dimensional view of data is not anything new (in fact, it is a very old concept): managers observe the evolution of interesting data organized in dimensions, such as products, clients, promotions, sell points, and, of course, time. The need of having simply and rapidly every historical information of operational systems has pushed companies to look for new ways of structuring and accessing their data to have advantage over their competitors. There is an agreement in that traditional database systems are not appropriate for multidimensional data analysis. Traditional OLTP systems are optimized for providing high performance in processing a lot of concurrent transactions. These transactions usually affect very few records. Meanwhile, multidimensional systems have to answer to complex queries (sometimes unpredictable) that need a huge number of records (Cabibbo & Torlone, 1998). In fact, as Ralph Kimball points out (Kimball, 1996), OLTP is profoundly different from dimensional data warehousing in their users, their data content and structures, their hardware and software, their administration and management, and their daily rhythms.
Because OLTP and OLAP environments are profoundly different, the techniques used for operational database design are inappropriate for data warehouse design (Kimball, 1996; Kimball, Reeves, & Thornthwaite, 1998).
The development of a data warehouse needs the integration of data mainly from legacy systems. The process of developing a data warehouse is, like any other task that implies some kind of preexisting resources integration, profoundly complex. This process is "labor-intensive, error-prone, and generally frustrating, leading a number of warehousing projects to be abandoned midway through development" (Srivastava & Chen, 1999).
In this respect, in recent years, there have been many proposals for some particular aspects involved in the data warehouse design process. However, although many solutions have been developed "for interesting subproblems like handling multidimensional data as typical requirement for data warehouses, view maintenance for aggregated data, data integration etc., combining these partial and often very abstract and formal solutions to an overall design methodology and warehousing strategy is still left over to the practitioners" (Gatziu, Jeusfeld, Staudt, & Vassiliou, 1999).
Despite the obvious importance of having a methodological support for the development of OLAP systems, the design process has received very little attention from the scientific community and product providers. Models usually utilized for operational database design (like E/R model) should not be used without further ado for analytical environment design. Attending only to technical reasons, databases obtained from E/Rmodels are inappropriate for decision support systems, in which query performance and data loading (including incremental loading) are important (Kimball, 1996). The multidimensional paradigm should be used not only in database queries but also during its design and maintenance. "To use the multidimensional paradigm during all development phases it is necessary to define dedicated conceptual, logical and physical data models for the paradigm and to develop a sound methodology which gives guidelines how to create and transform these models during the development process" (Dinter, Sapia, Blaschka, & Höfling, 1999). In (Wu & Buchmann, 1997) authors claim data warehouse design methodologies and tools "with the appropriate support for aggregation hierarchies, mapping between the multidimensional and the relational models, and cost models for partitioning and aggregation that can be used from the early design stages. "
There are a few proposals for data warehouse design (Cabibbo & Torlone, 1998; Debevoise, 1999; Giovinazzo, 2000; Golfarelli & Rizzi, 1999; Inmon, 1993). They are usually very focused from its initial phases on the relational model. Most of them are incomplete or starts from ideal assumptions such as that every information is included in a corporate E/R scheme. There are also many other partial proposals focused on issues such as model translations, view materialization, indexes, etc.
The problem with all these works is that they propose to use a new different methodology for data warehouse design, so organizations must use at least two totally different methodologies: one for OLTP environments and one for OLAP environments. We think it is better to integrate data warehouse design in existing methodologies, modifying and adding new activities, so that the training and learning curve for data warehouse design was less difficult.
In this chapter we present a methodology for the development of multidimensional data warehouses. The methodology is based on an existing traditional methodology and differs from the traditional one in three of its processes: analysis, design and construction. It allows the creation of multidimensional or relational databases, and it is supported by a CASE tool.
The rest of the chapter is organized as follows: we briefly present IDEA, the multidimensional model used in the methodology. Next, we outline a methodology for the development of multidimensional data warehouses. Then, a modeling example is developed. Finally, we present some conclusions.
|< Day Day Up >|| |