Populating the Meta Data Repository


Populating a meta data repository is usually not a manual effort. A repository receives most of its meta data from many different sources, and these sources are controlled by people other than the meta data administrator, as illustrated in Figure 14.1. A meta data migration process has to be developed to extract the meta data from these sources, associate (link) related meta data components, and populate the meta data repository. The different sources for the meta data are briefly discussed below.

  • Word processing files can be manuals, procedures, and other less formal documents that contain data definitions and business rules. Embedded in these business rules could be policies about the data, processing rules, data domains (in code translation manuals), and sundry notes about the history and ownership of the data or the processes performed on the data. Some word processing files also contain valuable technical documentation describing data and process rules enforced by programs.

    Be cautious with word processing files. These files are rarely maintained, and the information contained in them could be out of date and no longer applicable.

  • Spreadsheets contain calculations and macros, which are executed on the data in the private spreadsheets of business analysts after they have downloaded the data from the various operational systems. These calculations and macros could be the source for transformation rules, cleansing rules, derivations, aggregations, and summarizations.

  • CASE tools contain the names, definitions, sizes, lengths, relationships, cardinality information, referential integrity rules, and notes about data that has been modeled either for an operational system or for a BI application. In addition, CASE tools usually can store the technical names of tables and columns, as well as primary keys and foreign keys. Some of the more sophisticated CASE tools have modules to include meta data for process components, such as programs, screen displays, and report layouts.

  • Internal DBMS dictionaries are an integral part of all DBMSs, since the dictionaries control the database structures. In relational databases, these dictionaries are usually implemented as system catalog tables (for example, SYSTABLES), and they store the names, definitions, sizes, lengths, relationships, and volumes of database structures, such as storage groups, tablespaces, tables, columns, primary keys, foreign keys, and indices.

  • ETL tools would not function without instructions (technical meta data) for the required transformations. The internal dictionaries of ETL tools store the source-to-target mapping as well as all the transformation algorithms, which are applied to the source data during the ETL process.

  • OLAP tools store specifications about data derivations (calculations), aggregations, and summarizations in their internal directories. These specifications allow the OLAP tool to perform its drill-down and roll-up functions. Some OLAP products have the capability to drill across into another database under the same DBMS or even into another database under a different DBMS to extract detailed data for a query.

  • Data mining tools store the descriptions of the analytical data models against which the data mining operations are executed.

Figure 14.1. Sources for the Meta Data Repository

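The extract, link, and populate steps of a meta data migration process can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, with SQLite's internal dictionary (sqlite_master and PRAGMA table_info) standing in for a DBMS dictionary. The customer table, the business_definitions mapping, and the column_meta_data repository table are hypothetical names invented for this sketch.

```python
import sqlite3

# Hypothetical source database standing in for an operational system.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_nm TEXT NOT NULL)"
)

# Hypothetical business definitions, as they might be keyed in from a
# word processing file or exported from a CASE tool.
business_definitions = {
    ("customer", "cust_id"): "Unique identifier assigned to a customer",
    ("customer", "cust_nm"): "Legal name of the customer",
}


def extract_technical_meta_data(conn):
    """Extract: read table and column meta data from the internal dictionary."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, _notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append((table, column, col_type, bool(pk)))
    return rows


def populate_repository(repo, technical_rows, definitions):
    """Link each column to its business definition, then populate the repository."""
    repo.execute(
        "CREATE TABLE IF NOT EXISTS column_meta_data ("
        "table_name TEXT, column_name TEXT, data_type TEXT, "
        "is_primary_key INTEGER, business_definition TEXT, "
        "PRIMARY KEY (table_name, column_name))"
    )
    for table, column, data_type, is_pk in technical_rows:
        # Link: a column without a known definition is stored with NULL.
        definition = definitions.get((table, column))
        repo.execute(
            "INSERT OR REPLACE INTO column_meta_data VALUES (?, ?, ?, ?, ?)",
            (table, column, data_type, int(is_pk), definition),
        )
    repo.commit()


# Hypothetical repository database.
repository = sqlite3.connect(":memory:")
populate_repository(
    repository, extract_technical_meta_data(source), business_definitions
)
```

A real migration process would need one extractor per meta data source (CASE tool, ETL tool, OLAP tool, and so on), each feeding the same link and populate steps. The single extractor here only shows the shape of the process.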

If any meta data contained in these meta data sources is about to change, the meta data administrator must be notified before the change occurs. He or she will then have to determine whether the meta data repository can accommodate that change or whether the meta data repository has to be modified or enhanced. Therefore, in order to maintain a healthy and useful meta data repository, the meta data administrator must collaborate with the ETL team, the OLAP team, the data mining expert, the data administrator, the database administrator, and the business people on the BI projects. In addition, and more importantly, the meta data administrator must have full cooperation from the operational systems people who maintain the operational source files and source databases.

Changes made to operational source systems are frequently not communicated to the meta data administrator in time to make the necessary changes to the meta data repository. Because these changes also affect the ETL process, and in some cases the structures of the BI target databases, the ETL team is also impacted. This breakdown in communication between the operational systems people and the BI decision-support staff can cause severe delays.

Ideally, meta data repositories should be active repositories, similar to the DBMS dictionaries. In an active meta data repository, changes would be made only to the meta data repository, and the repository would propagate the changes into the appropriate target tool or DBMS. However, the meta data repository products currently on the market are still passive repositories. That means changes must be made both to the meta data repository and to the appropriate target tool or DBMS, and these changes must be kept synchronized, either manually or with programs.
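One way to keep a passive repository synchronized "with programs" is a periodic check that compares what the repository records against what the DBMS dictionary reports. The following is a minimal sketch, assuming Python's built-in sqlite3 module with SQLite standing in for the target DBMS. The sales table, the recorded mapping, and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3


def catalog_columns(conn):
    """Return {(table, column): declared type} from SQLite's internal dictionary."""
    found = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        for _cid, column, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            found[(table, column)] = col_type
    return found


def find_drift(repository_columns, dbms_columns):
    """Compare the repository's view of the columns with the DBMS's view."""
    repo_keys, dbms_keys = set(repository_columns), set(dbms_columns)
    return {
        "missing_from_repository": sorted(dbms_keys - repo_keys),
        "missing_from_dbms": sorted(repo_keys - dbms_keys),
        "type_mismatch": sorted(
            key
            for key in repo_keys & dbms_keys
            if repository_columns[key] != dbms_columns[key]
        ),
    }


# Hypothetical target database that was changed without updating the repository.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL, region TEXT)")

# Hypothetical (stale) repository contents for the same table.
recorded = {
    ("sales", "sale_id"): "INTEGER",
    ("sales", "amount"): "INTEGER",
    ("sales", "channel"): "TEXT",
}

drift = find_drift(recorded, catalog_columns(target))
for kind, columns in drift.items():
    print(kind, columns)
```

The check only reports differences. Deciding which side is correct, and applying the fix to the repository or to the target DBMS, remains a task for the meta data administrator.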



Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202
