10.8 The CWMI Architecture

Internet-Enabled Business Intelligence
By William A. Giovinazzo
Chapter 10.  Common Warehouse Metadata

CWMI is an IEBI metadata framework. To cover all the components of an IEBI system, it includes more than just the metadata describing the target data structures. It also includes the metadata for the data warehouse processes and the data sources. The data warehouse processes addressed by CWMI deal with the creation and management of the data. Again, the goal of CWMI is to provide a means of metadata interchange among analytical tools. To meet this variety of needs, CWMI consists of a number of submetamodels, as shown in Figure 10.8.

  • Foundation Metamodels for the representation of model elements representing shared concepts and structures.

  • Data Resources Metamodels for the representation of object-oriented, relational, record, multidimensional, and XML data sources.

  • Data Analysis Metamodels for the representation of data transformation, OLAP, data mining, visualization, and business nomenclature.

  • Warehouse Management Metamodels for the representation of data warehouse processes, including the representation of the results of the operations.

Figure 10.8. The CWMI metamodel.

CWMI uses packages. Packages are a mechanism by which we can control the complexity of CWMI by creating logical groupings of interrelated classes. The developer can then focus attention on the individual metamodel packages, using them independently of the others. Although the packages are independent, by virtue of their integration into the overall CWMI architecture, they can share a common purpose. In the sections that follow, we explore each submetamodel and its contribution to the overall CWMI architecture.
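To keep the grouping straight, the layers and their packages can be written down as a simple lookup table. The sketch below is only a summary of how this section organizes the packages; the dictionary and the layer_of helper are illustrative names of my own, not part of CWMI.

```python
# Layer and package names follow the organization of this section.
# The dictionary itself is an illustrative summary, not a CWMI artifact.
CWMI_LAYERS = {
    "Foundation": [
        "Business Information", "Data Types", "Expression",
        "Keys Indexes", "Type Mapping", "Software Deployment",
    ],
    "Data Resource": [
        "Object Model", "Relational", "Record", "Multidimensional", "XML",
    ],
    "Data Analysis": [
        "Transformation", "OLAP", "Data Mining",
        "Information Visualization", "Business Nomenclature",
    ],
    "Warehouse Management": ["Warehouse Process", "Warehouse Operation"],
}


def layer_of(package: str) -> str:
    """Return the layer that contains the named package."""
    for layer, packages in CWMI_LAYERS.items():
        if package in packages:
            return layer
    raise KeyError(package)


print(layer_of("OLAP"))  # Data Analysis
```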

10.8.1 CWMI FOUNDATION

The CWMI Foundation layer contains the metamodels of concepts and structures that are common to the other CWMI packages. The Foundation metamodels sit between the more general world of the object model and the IEBI-specific metamodels of other CWMI packages. The object models at the very base of this structure are general metamodels that can be applied to any number of diverse areas. The Foundation metamodels, however, act as the basis for the entire CWMI architecture. The other CWMI metamodels can extend the Foundation metamodels to meet a specific need. As a result, the packages within this layer are less specific to IEBI and have a more general flavor than the metamodels in the other categories. Let's examine each in a bit more detail.

10.8.1.1 Business Information

The Business Information metamodel is meant to provide a means to define business-oriented information. Some metadata relates not to the things represented within the IEBI system, but to the IEBI system itself. The Business Information metamodel represents this type of data. This is not the representation of a complete IEBI metamodel, but of the business information around the data warehouse and IEBI system.

In the Business Information metamodel, we find the classes Document, ResponsibleParty, and Description. The ResponsibleParty class contains information pertaining to the parties responsible for the IEBI system, including who they are and how they might be contacted. Likewise, the Document class provides information pertaining to the documentation of the IEBI system itself, while the Description class provides general information describing the system.

10.8.1.2 Data Types

In establishing a metamodel for data types, we again encounter the challenge that the diversity of IEBI tools presents. Since there are so many different IEBI tools, it would be difficult to establish in advance the data types necessary to meet all their needs. The reasons these tools are incompatible are as varied as the tools themselves. In some instances, it may be as simple as an ISV attempting to differentiate itself in the market. In other cases, dependencies on hardware or implementation language may contribute to the variation. The Foundation metamodel therefore does not define a specific set of data types.

There is a recognized need to define data types that may be specific to a particular environment. The desire for the exchange of data between systems of differing types is also recognized. To meet these needs, the Foundation metamodel first provides the definition of several generally common data types, which are included more to serve as an example in the appropriate use of the metamodel than anything else. Second, the Foundation metamodel contains data types that are necessary for the exchange of information among diverse tools and systems.

10.8.1.3 Expression

The Expression metamodel provides a means for the other packages within CWMI, as well as IEBI tools, to define expressions in a common form. By describing expressions in a common form, the expression can be exchanged between systems. This makes it possible for systems to share transformations and mappings as well as how data elements within the system are derived. Another important aspect is the ability to provide lineage tracking. If we are going to use a particular transformation or equation in our system, we would like to know its source. The CWMI model, by storing the data in a common form, provides this capability.

What is interesting about the Expression metamodel is that it defines all expressions in terms of expression trees. Take the simple expression X + B. We can rephrase this to be sum(X, B). We can then construct a hierarchy from this expression similar to the one shown in Figure 10.9 (a), or we can create an even deeper structure using this same hierarchical approach. As an example, let's look at one of my personal favorite expressions, y = mx + b. We could rephrase this to be sum(multiply(m,x),b), in which case we would have a tree structure similar to the one shown in Figure 10.9 (b). By the way, for those of you not familiar with this equation, this is the equation of a line. I used it extensively in the days when I wrote computer graphics programs.

Figure 10.9. Expression trees.
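The tree form of y = mx + b can be made concrete with a few lines of Python. This is a minimal sketch, assuming a node that holds an operation name and a list of operands; the Node class, the OPS table, and the evaluate function are names I have made up for illustration and are not taken from the CWMI Expression metamodel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """One node of an expression tree: an operation and its operands."""
    op: str
    args: List[Union["Node", str, float]] = field(default_factory=list)


# Hypothetical operation table; "sum" and "multiply" follow the text.
OPS = {
    "sum": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def evaluate(node, env: Dict[str, float]) -> float:
    """Evaluate a tree bottom-up, looking variable names up in env."""
    if isinstance(node, Node):
        values = [evaluate(arg, env) for arg in node.args]
        return OPS[node.op](*values)
    if isinstance(node, str):
        return env[node]
    return node


# y = mx + b written as sum(multiply(m, x), b)
line = Node("sum", [Node("multiply", ["m", "x"]), "b"])
print(evaluate(line, {"m": 2.0, "x": 3.0, "b": 1.0}))  # 7.0
```

Because the tree is plain data, it can be serialized and handed to another tool, which is the point of describing expressions in a common form.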

10.8.1.4 Keys Indexes

The Keys Indexes metamodel describes keys and indexes. Okay, I know you're shocked. The term keys here applies to the data elements that specify a particular instance of an object. An index is the means by which these elements are sorted. We are still in the Foundation layer, so this metamodel simply defines expressions of base concepts, such as unique constraints and relationships. The other packages within the CWMI structure or other IEBI tools build on these base elements.

The Keys Indexes metamodel is of great importance to the data warehouse architect. My previous book, Object-Oriented Data Warehouse Design, discusses the importance of keys and establishing abstract keys in the data warehouse. At the same time, we do not wish to lose the keys that are part of the system of record. The decision maker would like to have the ability to drill down to the atomic level of the data. In some instances, he or she might even desire to trace the data back to the system of record. This makes it important for the data warehouse architect to bring the keys over with the original data. As we established earlier, we need more than just the data. We need the metadata as well. The Keys Indexes metamodel is the basis by which we can communicate this data.
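The idea of assigning an abstract key while carrying the system-of-record key along can be sketched as follows. This is only an illustration under assumed names: WarehouseRow, CustomerDimension, and the "billing" source system are hypothetical and do not come from CWMI.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class WarehouseRow:
    warehouse_key: int   # abstract key assigned in the warehouse
    source_system: str   # system of record the row came from
    source_key: str      # original key, carried over unchanged
    customer_name: str


class CustomerDimension:
    """Assigns abstract keys and enforces uniqueness of the source key."""

    def __init__(self) -> None:
        self._by_source: Dict[Tuple[str, str], WarehouseRow] = {}

    def load(self, source_system: str, source_key: str, name: str) -> WarehouseRow:
        ident = (source_system, source_key)
        if ident in self._by_source:
            # Unique constraint on (source_system, source_key).
            raise ValueError(f"duplicate source key {ident}")
        row = WarehouseRow(len(self._by_source) + 1, source_system, source_key, name)
        self._by_source[ident] = row
        return row

    def trace(self, warehouse_key: int) -> Tuple[str, str]:
        """Map a warehouse key back to (source_system, source_key)."""
        for row in self._by_source.values():
            if row.warehouse_key == warehouse_key:
                return row.source_system, row.source_key
        raise KeyError(warehouse_key)


dim = CustomerDimension()
row = dim.load("billing", "C-1001", "Acme")
print(dim.trace(row.warehouse_key))  # ('billing', 'C-1001')
```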

10.8.1.5 Type Mapping

The Type Mapping metamodel is used when different systems have data types that are not quite the same. The Type Mapping metamodel provides a means by which these differing types can be mapped between systems. The data between these systems can then be exchanged. The metamodel provides the data warehouse architect with the ability to create multiple mappings between two data types and to specify which of the two is preferred.
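One way to picture this is a list of candidate mappings with a preferred flag. The sketch below is one reading of the paragraph above, not the CWMI definition; the TypeMapping class, the resolve function, and the example type names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TypeMapping:
    source_type: str
    target_type: str
    is_preferred: bool = False


# Hypothetical mappings between the type names of two systems.
MAPPINGS: List[TypeMapping] = [
    TypeMapping("NUMBER", "FLOAT"),
    TypeMapping("NUMBER", "DECIMAL", is_preferred=True),
    TypeMapping("VARCHAR2", "VARCHAR"),
]


def resolve(source_type: str, mappings: List[TypeMapping]) -> str:
    """Pick the preferred target type, or the first candidate if none is marked."""
    candidates = [m for m in mappings if m.source_type == source_type]
    if not candidates:
        raise LookupError(f"no mapping for {source_type}")
    preferred = [m for m in candidates if m.is_preferred]
    return (preferred or candidates)[0].target_type


print(resolve("NUMBER", MAPPINGS))    # DECIMAL
print(resolve("VARCHAR2", MAPPINGS))  # VARCHAR
```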

10.8.1.6 Software Deployment

The Software Deployment metamodel describes how software is used within the IEBI system. As we examine the metamodel, we see such objects as Deployed Components, Machine Objects, Data Managers, and Data Providers. All of these objects work together to provide a complete picture of how and where the software is being used.

Let's look at this a bit more closely. The Deployed Component Object defines a specific component on a specific computer within the IEBI system. If, for example, we are working in an environment with multiple dependent data marts, we might have the same multidimensional analysis tool operating on two different systems. Each instance of the tool will have a separate deployed component. The systems upon which these separate instances are running will each be described by their own instance of the Machine object.

There are multiple Deployed Component subclasses. A database management system (DBMS) is a Data Manager. These objects are associated with data containers, entities such as schemas, relational catalogs, and files that provide access to data. Another subclass of Deployed Component is a Data Provider. Providers are the means by which data within the Data Manager is accessed. We would expect a Data Provider to incorporate Java Database Connectivity (JDBC), which provides a client with access to the database.
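The relationships among these objects can be sketched with a handful of classes. The class names mirror the objects named in the text, but the attributes (component, containers, manager, protocol) and the machine names are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Machine:
    name: str


@dataclass
class DeployedComponent:
    """A specific component installed on a specific machine."""
    component: str
    machine: Machine


@dataclass
class DataManager(DeployedComponent):
    # Data containers such as schemas, relational catalogs, and files.
    containers: List[str] = field(default_factory=list)


@dataclass
class DataProvider(DeployedComponent):
    manager: DataManager   # the Data Manager this provider gives access to
    protocol: str = "JDBC"


# The same analysis tool deployed on two data marts: one deployed
# component per machine, each machine described by its own object.
mart_a, mart_b = Machine("mart-a"), Machine("mart-b")
olap_a = DeployedComponent("olap-tool", mart_a)
olap_b = DeployedComponent("olap-tool", mart_b)

dbms = DataManager("dbms", mart_a, containers=["sales_schema"])
driver = DataProvider("jdbc-driver", mart_b, manager=dbms)
print(driver.manager.containers)  # ['sales_schema']
```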

Using the Software Deployment metamodel as a base, we can see how a complete environment can be described within the system. Again, we need to remind ourselves that we are not just working with a BI system, but with an IEBI system in an environment with multiple systems operating in conjunction with one another. Such an environment is much more complex than a transaction-processing system, where one system operates within clearly defined parameters. It is even more complex than simple BI, where the BI system does not reach outside of the organization. IEBI is a system of many diverse systems that may or may not reside within the same organization. We all know the theme of this chapter by now: Where there is data, there is metadata. The Software Deployment metamodel provides the structure for this metadata.

10.8.2 CWMI DATA RESOURCE

The Data Resource layer includes metamodels for the definition of data resources: relational, record, object-oriented, multidimensional, and XML. These are all base-level data resources from which we draw our data. As we progress up the CWMI framework, we progress from the general to the specific. The Foundation layer is very general; it simply provides a basis for the construction of other objects. The Data Resource layer provides us with a metamodel for the description of our data source. In the following subsections, we examine each metamodel within this layer.

10.8.2.1 Object Model

The Object Model contains the features, and only those features, of UML that are necessary for the creation of CWMI metamodel classes. Other CWMI packages use the Object Model for the creation of their own metamodel classes. By making the Object Model a subset of the UML, the CWMI packages can take advantage of the benefits of the UML without being encumbered by the weight of its full breadth and scope.

The Object Model attempts to thread the eye of the proverbial needle, providing simplicity while sharing a common functionality. It is divided into four subpackages: Core, Behavioral, Instance, and Relationship. The Core package acts as the basis for the other packages, providing the elements necessary for the common functionality. The other three packages are based on the Core package. The functionality of each package is independent of the others, so the implementation of one, such as the Instance, does not require the implementation of another, such as the Behavioral.

Each package within the Object Model collects classes and associations that describe some subset of CWMI types. The Behavioral metamodel, for example, collects the classes and associations that describe behavior. It acts as a foundation for recording the invocation of defined behaviors. In like manner, the Relationship metamodel collects the classes and associations for the description of relationships between the objects within the CWMI repository. The Instance metamodel provides for the inclusion of an actual instance of the data with the metadata.

The Instance metamodel may be a bit confusing without an example, so let's take a moment to look at this more closely. The Instance metamodel is useful in situations like the one shown in Figure 10.10. My previous book, Object-Oriented Data Warehouse Design, discusses the use of a self-referencing data structure to represent corporate structures. The structure shows two companies related to one another. The relationship has two ends, the parent company and the child. Each instance of the Corporate Structure association has a string-valued attribute describing the relationship. This relationship can be shown as CWMI Object Model metaclasses: Class, Attribute, Data Type, Association, and Association End.

Figure 10.10. Instance metamodel.

10.8.2.2 Relational

The Relational package deals with relational data resources. It describes data sources from which data is retrieved via Structured Query Language (SQL), Open Database Connectivity (ODBC), or JDBC. The top-level container of the Relational package is the Catalog, the unit managed by a data resource. Inside are the catalog schemas, which are composed of tables. These tables are composed of columns of specific data types. The Relational package also addresses indexing, primary keys, and foreign keys. As we see throughout the higher levels of the CWMI architecture, the Relational package extends the structures established in both the Foundation layer and the Object Model package.
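The containment hierarchy described above, catalog to schema to table to column, can be sketched directly. The class names follow the text; the attribute names and the sample catalog are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)
    primary_key: List[str] = field(default_factory=list)


@dataclass
class Schema:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class Catalog:
    """Top-level container: catalog -> schemas -> tables -> columns."""
    name: str
    schemas: List[Schema] = field(default_factory=list)

    def qualified_columns(self) -> Iterator[str]:
        """Yield every column as catalog.schema.table.column."""
        for schema in self.schemas:
            for table in schema.tables:
                for column in table.columns:
                    yield f"{self.name}.{schema.name}.{table.name}.{column.name}"


customer = Table(
    "customer",
    [Column("customer_id", "INTEGER"), Column("name", "VARCHAR")],
    primary_key=["customer_id"],
)
catalog = Catalog("dw", [Schema("sales", [customer])])
print(list(catalog.qualified_columns()))
# ['dw.sales.customer.customer_id', 'dw.sales.customer.name']
```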

10.8.2.3 Record

The Record metamodel is part of the Data Resource layer. Its purpose, along with the other packages within this layer, is to describe a data resource. One such resource type can be a record. The CWMI model uses the Record metamodel to cover a great many different types of data resources. A record can include a variety of structures and is not limited to what one might traditionally think of as a record. These resources can extend beyond what is found in files and databases to include structured data types within languages or documents. The Record metamodel can be used to describe any structure that has a hierarchical nature. The only exceptions are structures that are used solely within a specific language. Such structures or record types are best addressed in an extension to the CWMI architecture.

10.8.2.4 Multidimensional

Just as we have a Relational metamodel to represent relational data resources, we also have a metamodel for multidimensional data resources. It is used to represent multidimensional resources that are actually implemented by multidimensional database systems (MDBS). The MDBS world differs greatly from its cousin, the relational database management system (RDBMS). In the relational world, we are used to certain standards and constructs. Unfortunately, this is not the case with MDBS. Such OLAP concepts as dimensions and hierarchies are implemented within the MDBS engine. These engines are proprietary, and there is no published standard on the representation of multidimensional databases. The metamodel is therefore general in nature. Extensions can be made to provide for the specific OLAP tools.

10.8.2.5 XML

The final data resource, while not yet the most important data resource, is certainly growing in significance. In Chapter 9, we discussed the importance of XML in the exchange of information. It is quickly being accepted by many as the standard language of exchange between systems. As such, XML is an important data resource for IEBI. The XML metamodel describes XML data resources. While the current version of the metamodel is based on XML 1.0, the XML metamodel will be revised as modifications to XML are adopted by the W3C.

The XML metamodel is composed of a schema. An XML schema is composed of element types. Element types are definitions and declarations of XML attributes and content models. A specific element type definition can define an attribute, content model, or both. An attribute can have a default of required, implied, default, or fixed, while content models can be either empty, any, mixed, or element. A content model of the type element consists of specified element type references, element content models, or both. Mixed content models are composed of character data and element type references. Finally, content models of the type any can consist of any element types.
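The four content models correspond to the element type declarations of an XML 1.0 DTD, which the following sketch renders. The ContentType and ElementType classes are illustrative names of my own; attribute defaults are left out to keep the example short, and element content is rendered only as a simple sequence.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ContentType(Enum):
    EMPTY = "empty"
    ANY = "any"
    MIXED = "mixed"
    ELEMENT = "element"


@dataclass
class ElementType:
    name: str
    content: ContentType
    children: List[str] = field(default_factory=list)  # element type references

    def declaration(self) -> str:
        """Render the element type as an XML 1.0 DTD declaration."""
        if self.content is ContentType.EMPTY:
            model = "EMPTY"
        elif self.content is ContentType.ANY:
            model = "ANY"
        elif self.content is ContentType.MIXED:
            # Character data, optionally interleaved with referenced elements.
            if self.children:
                model = "(#PCDATA | " + " | ".join(self.children) + ")*"
            else:
                model = "(#PCDATA)"
        else:
            if not self.children:
                raise ValueError("element content needs at least one reference")
            model = "(" + ", ".join(self.children) + ")"
        return f"<!ELEMENT {self.name} {model}>"


print(ElementType("note", ContentType.MIXED, ["b"]).declaration())
# <!ELEMENT note (#PCDATA | b)*>
print(ElementType("order", ContentType.ELEMENT, ["customer", "item"]).declaration())
# <!ELEMENT order (customer, item)>
```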

10.8.3 DATA ANALYSIS

The Data Analysis layer of the CWMI packages deals with the use of the data. Whereas the Data Resources layer dealt with the source of the data, the Data Analysis layer deals with what is done with the data once it is extracted from this source. In this layer, we see metamodel packages for the representation of the data transformations, OLAP, data mining, information visualization, and business nomenclature. Each is concerned with the use of the data as it comes from a data source.

10.8.3.1 Transformation

As we discussed in Chapter 3, the first step in the BI loop is the extraction, transformation, and loading (ETL) of the data. Transformation is the process that converts the format and content of the data to be consistent with the data warehouse. As one can well imagine, the transformation process is a core function of the BI loop. We must be able to share the metadata concerning this process between systems.

The Transformation metamodel provides the mechanism for the exchange of metadata concerning the transformation process. The Transformation metamodel associates a transformation with the data sources and targets. The sources and targets can be object-oriented or relational data types. The granularity of the data can be a class, attribute, table, or column. We relate the source and target data through the transformation. The relationship can be described at a coarse, or high, level of granularity, where the specifics of how one data element relates to another are unknown. This is known as a black box transformation. We can also be more specific in the description of the relationship and define how a specific piece of data relates to another. In this case, the specific mapping of the data is described. This is referred to as a white box transformation.
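The difference between the two kinds of transformation shows up when we ask where a target element came from. The sketch below assumes a single Transformation class whose optional mappings attribute distinguishes the two cases; the class, its attributes, and the table and column names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Transformation:
    name: str
    sources: List[str]
    targets: List[str]
    # White box only: target element -> source elements it is derived from.
    mappings: Dict[str, List[str]] = field(default_factory=dict)

    @property
    def is_white_box(self) -> bool:
        return bool(self.mappings)

    def lineage(self, target: str) -> List[str]:
        """Sources feeding a target; all sources when the detail is unknown."""
        if self.is_white_box:
            return self.mappings.get(target, [])
        return list(self.sources)


black = Transformation("load_sales", ["orders", "customers"], ["sales_fact"])
white = Transformation(
    "load_sales",
    ["orders.amount", "orders.tax", "customers.id"],
    ["sales_fact.total", "sales_fact.customer_key"],
    mappings={
        "sales_fact.total": ["orders.amount", "orders.tax"],
        "sales_fact.customer_key": ["customers.id"],
    },
)
print(black.lineage("sales_fact"))        # ['orders', 'customers']
print(white.lineage("sales_fact.total"))  # ['orders.amount', 'orders.tax']
```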

10.8.3.2 OLAP

The OLAP metamodel package is used to describe the features most common to OLAP systems. First and foremost, the metamodel must include a means to describe a multidimensional view of the data. The OLAP metamodel must also support time-series and what-if scenario analyses. The metamodel must also support the ability to drill down and roll up data along a hierarchy. The OLAP metamodel provides for a mapping of these structures onto an actual implementation, as described in the CWMI Relational and Multidimensional packages.
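Rolling up and drilling down along a hierarchy can be illustrated with a small time dimension. This is a sketch of the operations themselves, not of the OLAP metamodel; the PARENT table, the sales figures, and the function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical time hierarchy: each day rolls up to a month, each month to a year.
PARENT = {
    "2002-01-15": "2002-01",
    "2002-01-20": "2002-01",
    "2002-02-03": "2002-02",
    "2002-01": "2002",
    "2002-02": "2002",
}

SALES_BY_DAY = {"2002-01-15": 100, "2002-01-20": 50, "2002-02-03": 25}


def roll_up(facts: Dict[str, int], parent: Dict[str, str]) -> Dict[str, int]:
    """Aggregate each member's value into its parent, one level up."""
    totals: Dict[str, int] = defaultdict(int)
    for member, value in facts.items():
        totals[parent[member]] += value
    return dict(totals)


def drill_down(member: str, facts: Dict[str, int], parent: Dict[str, str]) -> Dict[str, int]:
    """Return the detail rows one level below the given member."""
    return {m: v for m, v in facts.items() if parent[m] == member}


by_month = roll_up(SALES_BY_DAY, PARENT)
by_year = roll_up(by_month, PARENT)
print(by_month)  # {'2002-01': 150, '2002-02': 25}
print(by_year)   # {'2002': 175}
print(drill_down("2002-01", SALES_BY_DAY, PARENT))
# {'2002-01-15': 100, '2002-01-20': 50}
```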

10.8.3.3 Data Mining

As we discussed in Chapter 3, data mining is the process of finding patterns that are hidden in the data. Perhaps we search for the demographics of the people who visit our Web site who are most likely to buy. We discover these patterns by examining a known data set, such as the demographics of people who actually purchased products. We then search data with unknown results for those patterns. The Data Mining metamodel provides the structure for the metadata that describes this process.

In previous chapters, we noted that the data mining process entails the construction of a model that is controlled by settings. The model is associated with its own set of attributes. Within CWMI the Data Mining model is defined by the MiningModel. The settings for this model are defined by ModelSettings, and the attributes are defined by the ApplicationInputSpecification. The MiningModelResult defines the metadata for the results of the data mining operation.

10.8.3.4 Information Visualization

Ultimately, we need to get the data out of the system. Information Visualization is concerned with data output, the Decision Support System (DSS) level of the BI loop presented in Chapter 3. DSS is a very broad category with systems ranging from simple reporting to graphics tools that display information from a variety of viewpoints. Since visualization is such a diverse category, the Information Visualization package is generic, with container-like structures.

10.8.3.5 Business Nomenclature

In section 10.3, we discussed the different types of metadata. Up to this point we have been discussing mainly what is traditionally thought of as metadata: data structure and format. The Business Nomenclature metamodel looks at the data from the business perspective. It is concerned with how the data was derived, from which data sources it was derived, as well as the DSS tools used to examine the data.

The objective of CWMI is to provide for the exchange of metadata. To the business strategist, this form of metadata is perhaps the most critical. Key to understanding the validity of a piece of data is to know the origin of the data. "Where did you get your numbers?" As data moves from the source systems through the data warehouse to the business strategist, the business metadata travels with it. The Business Nomenclature metamodel provides the vehicle for the exchange of this data by each system in this process.

10.8.4 WAREHOUSE MANAGEMENT

The final layer of metamodel packages is the Warehouse Management layer, which is the topmost layer in the CWMI architecture. It represents the warehouse processes as well as the results of these operations.

10.8.4.1 Warehouse Process Package

The Warehouse Process package defines the processes within a transformation. A WarehouseProcess object relates a transformation and the events used to trigger the transformation. The transformation process itself can be viewed as either a complete process using the TransformationActivity object or at a more granular level with the TransformationStep object. The WarehouseProcess object is either of the subtype WarehouseActivity for the representation of TransformationActivity or of the subtype WarehouseStep for TransformationStep.

The WarehouseProcess that represents the transformation process is related to one or more events identified by WarehouseEvents. There are three types of Warehouse events. A Schedule event occurs at specific points in time or at regular intervals, such as every two days. Events can also be external events, which are events that occur outside of the data warehouse. Internal events are events that occur within the data warehouse.
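The relationship between a process and its triggering events can be sketched as below. WarehouseProcess is named in the text; the three event classes, their attributes, and the next_run method are assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Union


@dataclass
class ScheduleEvent:
    """Fires at first_run and then at every interval after it."""
    first_run: datetime
    interval: timedelta

    def next_run(self, now: datetime) -> datetime:
        """Return the first scheduled time at or after now."""
        if now <= self.first_run:
            return self.first_run
        # Ceiling division: whole intervals needed to reach or pass now.
        periods = -((self.first_run - now) // self.interval)
        return self.first_run + periods * self.interval


@dataclass
class ExternalEvent:
    description: str  # something that happens outside the data warehouse


@dataclass
class InternalEvent:
    description: str  # something that happens within the data warehouse


@dataclass
class WarehouseProcess:
    transformation: str
    events: List[Union[ScheduleEvent, ExternalEvent, InternalEvent]] = field(
        default_factory=list
    )


process = WarehouseProcess(
    "load_sales",
    [
        ScheduleEvent(datetime(2002, 1, 1), timedelta(days=2)),
        ExternalEvent("supplier price file arrives"),
    ],
)
print(process.events[0].next_run(datetime(2002, 1, 4, 12)))
# 2002-01-05 00:00:00
```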

10.8.4.2 Warehouse Operation

The Warehouse Operation metamodel package deals with the daily operations of the data warehouse. The data contained within this metamodel deals not with the structure of the warehouse or the data contained within it, but with the operation of the warehouse itself. We see in this package such operational considerations as Transformation Executions, Measurements, and Change Requests.

The Transformation Execution package describes the most recent executions of transformations. This data is used to determine the timeliness of the data within the data warehouse. It also can be used to record the history of the data warehouse. The history of the warehouse includes a record of when data was incorporated into the warehouse, the transformation processes, and the originating system. The Measurement package provides for the application of measurements to model objects. This could include such things as the anticipated or planned size of the object. The Change Request package provides for the recording of proposed changes to the data warehouse. Data warehouse architects can also use this metamodel to keep a record of which changes were actually made to the data warehouse and which were rejected.
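Determining timeliness from execution records might look like the following. This is a minimal sketch, assuming each execution is recorded with a finish time and a success flag; the TransformationExecution attributes and the last_success and is_stale helpers are hypothetical names, not CWMI definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TransformationExecution:
    transformation: str
    source_system: str    # originating system
    finished_at: datetime
    succeeded: bool


def last_success(
    history: List[TransformationExecution], transformation: str
) -> Optional[TransformationExecution]:
    """Most recent successful execution of the named transformation, if any."""
    runs = [e for e in history if e.transformation == transformation and e.succeeded]
    return max(runs, key=lambda e: e.finished_at, default=None)


def is_stale(
    history: List[TransformationExecution],
    transformation: str,
    now: datetime,
    max_age: timedelta,
) -> bool:
    """True when the data has never loaded or the last load is older than max_age."""
    latest = last_success(history, transformation)
    return latest is None or now - latest.finished_at > max_age


history = [
    TransformationExecution("load_sales", "billing", datetime(2002, 1, 1, 2), True),
    TransformationExecution("load_sales", "billing", datetime(2002, 1, 2, 2), False),
]
print(is_stale(history, "load_sales", datetime(2002, 1, 2, 9), timedelta(days=1)))
# True
```

The failed run on January 2 does not count, so the last good load is 31 hours old and exceeds the one-day limit.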


Internet-Enabled Business Intelligence
ISBN: 0130409510
Year: 2002
Pages: 113
