Meta Data Classifications


Since BI projects can generate a great number of meta data components, it is useful to classify these components and to prioritize them for incremental implementation.

Groupings of Meta Data Components

Meta data components can be sorted into four meta data groupings or classifications: ownership, descriptive characteristics, rules and policies, and physical characteristics (Figure 7.2). The meta data repository should be able to store the meta data components of all four classifications, as listed below.

Figure 7.2. Meta Data Classifications


Ownership
  • Data owner: Data is owned by the organization. However, since the organization is a legal entity and not a person, someone in the organization must take on the authority and responsibility to set policy, determine rules, and establish standards for the organizational data. This authority and responsibility can be distributed among line-of-business managers or assigned to a data ownership committee (whose members will most likely be some or all of the line-of-business managers). An example of distributed data ownership is a manager of the human resource department who has the authority and responsibility to establish policies, rules, and standards for payroll data but not for product data. With data ownership by committee, the committee establishes policies, rules, and standards for all data by consensus, by delegation to a committee member, or by some other committee rule.

  • Application owner: Traditionally, ownership has been assigned to a system as a whole. Since a system is usually composed of an application and its data, "system ownership" implies that the same person has authority to set policy, determine rules, and establish standards for both data and functionality (the application). That may be a valid condition for operational systems where data is originated, but it is not valid for BI applications because most business people using the BI applications are not the same individuals who originate the operational data. Therefore, BI information consumers may own the BI application, but most of them will not own the data.

Descriptive Characteristics
  • Name: Every data object, data element, and business process should have a unique name.

  • Definition: Every data object, data element, and business process should have a brief definition explaining what it is.

  • Type and length: Every data element should have an official type and length declared for it, even if the data elements in the source systems or the columns or cells on the target databases may deviate from it. That deviation would also be defined as meta data under the data element, the column, or the cell where it occurred.

  • Domain: Every data element should have a declared set of allowable values, even if the set is all-inclusive, such as "any character, number, or sign."

  • Notes: Additional facts of interest about data or processes should be included. This is a catchall for free-form comments, such as "Dispute between engineering and marketing regarding the meaning of Product Subcomponent Type Code was turned over to the BI steering committee for resolution."
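The descriptive characteristics above can be pictured as one record per data element. The following is a minimal sketch only, assuming a simple in-memory repository; the class name `DataElementMeta`, its field names, and the `register` helper are hypothetical and do not come from any particular meta data repository product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataElementMeta:
    """Descriptive meta data for one data element (hypothetical structure)."""
    name: str                           # unique name
    definition: str                     # brief explanation of what it is
    data_type: str                      # official type, e.g. "CHAR"
    length: int                         # official length
    domain: Optional[frozenset] = None  # allowable values; None means all-inclusive
    notes: List[str] = field(default_factory=list)  # free-form comments

    def allows(self, value: str) -> bool:
        """Return True if the value fits the declared length and domain."""
        if len(value) > self.length:
            return False
        return self.domain is None or value in self.domain


def register(repo: dict, element: DataElementMeta) -> None:
    """Add an element to the repository, enforcing the unique-name rule."""
    if element.name in repo:
        raise ValueError(f"duplicate name: {element.name}")
    repo[element.name] = element


# Illustrative entry; the definition text is invented for the example.
repo = {}
register(repo, DataElementMeta(
    name="Customer Type Code",
    definition="Code that classifies a customer.",
    data_type="CHAR",
    length=1,
    domain=frozenset({"A", "B", "C"}),
))
```

A deviation in a source system or target column (for example, a longer physical length) would be recorded as additional meta data rather than by changing the official declaration.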

Rules and Policies
  • Relationships: Data objects are related to each other through business activities. The meta data repository should be able to store information about these relationships.

  • Business rules and business policies: These components can apply to data as well as to processes. They can be technical data conversion rules, business data domain rules, business data integrity rules, or processing rules.

  • Security: Requirements for security can apply to data, meta data, processes, databases, applications (programs and screens), tools, and Web sites.

  • Cleanliness: Metrics about the ETL reconciliation totals and about the quality of the BI data should be stored. The metrics can be expressed as reliability percentages of a data load (e.g., 89 percent of the customer type code values are valid) or as record counts stating the number of records filtered (rejected) and the number of records passed through during the ETL process.

  • Applicability: Data does not live forever. Occasionally, new data is invented and captured, and old data is retired and no longer used. Since the BI target databases store many years of history, some columns or cells will not have values for all time periods because the data was not applicable or did not exist during certain time periods. If spikes appear on trend analysis graphs, the meta data repository should be consulted to determine the applicability of that particular piece of data.

  • Timeliness: Business people will want to know when the source data was last updated and which of the versions of the operational systems were used for the update. Not all operational systems run daily or on the same day of the month. One operational system may run on the last calendar day of the month while another may run on the last business day of the month. Some operational systems do not "close out the month" until they complete an adjustment run four to ten days after the last calendar day of the month.
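The cleanliness metrics described above reduce to simple arithmetic over a data load. The sketch below assumes that a value is "valid" when it belongs to the declared domain of its data element; the function name `load_metrics` and the shape of its result are hypothetical.

```python
def load_metrics(values, domain):
    """Compute cleanliness metrics for one column of a data load.

    values: the data values encountered during the ETL run
    domain: the set of allowable values declared for the data element
    """
    passed = sum(1 for v in values if v in domain)
    rejected = len(values) - passed
    # An empty load is reported as 0.0 here; that is an arbitrary choice.
    reliability = 100.0 * passed / len(values) if values else 0.0
    return {
        "passed": passed,          # records passed through
        "rejected": rejected,      # records filtered (rejected)
        "reliability_pct": round(reliability, 1),
    }


# 89 valid and 11 invalid customer type codes out of 100 records.
sample = ["A"] * 89 + ["X"] * 11
metrics = load_metrics(sample, {"A", "B", "C"})
```

For this sample, `metrics["reliability_pct"]` is 89.0, matching the "89 percent valid" style of metric, and the two record counts add up to the reconciliation total of 100.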

Physical Characteristics
  • Origin (source): Since BI target databases store only existing operational data (internally generated and externally purchased), the origin or source for each data element should be documented. One column in the BI target database can be populated with data elements from multiple sources. For example, the column Account Balance in the Account table could be populated from the data element Demand Deposit Account Balance in the Checking Account source database and from the data element Time Deposit Account Daily Balance in the Savings Account Transaction file. Conversely, one source data element can feed multiple columns in the BI target database. For example, the data element Type Code may be used for two purposes in the operational system. The data values "A", "B", and "C" of Type Code may be used to populate the column Customer Type Code in the Customer table, and the data values "N", "O", and "P" of the same Type Code may be used to populate the column Product Type Code in the Product table.

  • Physical location: Several meta data components (e.g., tables, columns, dataset names) should describe where the data resides in the BI decision-support environment.

  • Transformation: Very few data elements can be moved from source to target without any type of transformation. At a minimum, the data type and length may have to change, or single-character codes may have to be translated into multi-character mnemonics. In the worst case, lengthy business rules may require more complicated transformations involving editing, filtering, combining, separating, or translating data values.

  • Derivation: This component stores the calculation for derived columns. While derived columns are customarily not stored in operational systems, it is the norm to store them in BI target databases.

  • Aggregation and summarization: Similar to derivation, aggregation and summarization rules should be stored as meta data.

  • Volume and growth: The size and growth of BI target databases are often enormous. Therefore, projected as well as actual volumes should be documented as meta data in terms of the number of rows and the percentage of expected growth.
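The origin examples above (several sources feeding one column, and one source element feeding several columns) can be stored as lineage records. This is a minimal sketch under stated assumptions: the `Lineage` structure and the two lookup functions are hypothetical, and because the text does not name the operational system that holds Type Code, the placeholder "operational system" is used for it.

```python
from typing import NamedTuple, Optional


class Lineage(NamedTuple):
    source_store: str            # source database or file
    source_element: str          # data element in the source
    values: Optional[frozenset]  # data values routed by this rule; None = all values
    target_table: str
    target_column: str


LINEAGE = [
    # Two sources populate the same target column.
    Lineage("Checking Account", "Demand Deposit Account Balance", None,
            "Account", "Account Balance"),
    Lineage("Savings Account Transaction", "Time Deposit Account Daily Balance", None,
            "Account", "Account Balance"),
    # One source element is split across two target columns by data value.
    Lineage("operational system", "Type Code", frozenset("ABC"),
            "Customer", "Customer Type Code"),
    Lineage("operational system", "Type Code", frozenset("NOP"),
            "Product", "Product Type Code"),
]


def sources_for(table, column):
    """List every (source store, source element) that feeds a target column."""
    return [(r.source_store, r.source_element) for r in LINEAGE
            if (r.target_table, r.target_column) == (table, column)]


def target_for(element, value):
    """Return the (table, column) a source value is routed to, or None."""
    for r in LINEAGE:
        if r.source_element == element and (r.values is None or value in r.values):
            return (r.target_table, r.target_column)
    return None
```

With these records, `sources_for("Account", "Account Balance")` returns both balance sources, and `target_for("Type Code", "O")` returns the Product Type Code column. A value outside both value sets, such as "Z", returns `None`, which an ETL process might treat as a rejected record.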

Business people most frequently access the meta data components in the descriptive characteristics classification as well as the rules and policies classification (Figure 7.3). Technicians typically access the meta data components in the physical characteristics classification (Figure 7.4).

Figure 7.3. Meta Data Usage by Business People


Figure 7.4. Meta Data Usage by Technicians


Prioritization of Meta Data Components

Capturing all meta data components may not be necessary or practical for all BI projects. However, capturing none is unacceptable. As a rule, meta data should be a deliverable of every BI project. It helps business people recognize their old data, trace what happened to it (transformation), locate it in the new BI target databases, and determine how to use it properly. In other words, business people benefit greatly from having meta data available to help them navigate through the BI decision-support environment.

Not all meta data components have the same value to all business people or all BI applications. It might be useful to prioritize the meta data components into three groups: mandatory, important (beneficial but not mandatory), and optional. Table 7.2 shows a recommended prioritization scheme for capturing meta data components in a meta data repository.

Table 7.2. Prioritization of Meta Data Components

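[The check marks in the Mandatory, Important, and Optional columns were not preserved in this copy.]

Meta Data                        | Mandatory | Important | Optional
Owner                            |           |           |
Business data name               |           |           |
Technical data name              |           |           |
Definition                       |           |           |
Type and length                  |           |           |
Content (domain)                 |           |           |
Relationships                    |           |           |
Business rules and policies      |           |           |
Security                         |           |           |
Cleanliness                      |           |           |
Applicability                    |           |           |
Timeliness                       |           |           |
Origin (source)                  |           |           |
Physical location (BI databases) |           |           |
Transformation                   |           |           |
Derivation                       |           |           |
Aggregation                      |           |           |
Summarization                    |           |           |
Volume and growth                |           |           |
Notes                            |           |           |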


All mandatory meta data components, and as many important meta data components as possible, should be captured and stored in the meta data repository. Optional meta data components could be postponed to future BI application releases.
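The capture rule just stated can be sketched as a small planning function. The priority assignments in the example are placeholders with generic component names; they are not taken from Table 7.2. The function name `plan_release` and the `important_capacity` parameter (how many important components the current release can absorb) are assumptions made for illustration.

```python
def plan_release(priorities, important_capacity):
    """Split meta data components into captured now vs. postponed.

    priorities: dict mapping component name to "mandatory", "important",
                or "optional"
    important_capacity: how many important components fit in this release
    """
    mandatory = [c for c, p in priorities.items() if p == "mandatory"]
    important = [c for c, p in priorities.items() if p == "important"]
    optional = [c for c, p in priorities.items() if p == "optional"]

    fit = max(0, important_capacity)
    # All mandatory components, plus as many important ones as possible.
    captured = mandatory + important[:fit]
    # Everything else waits for a future BI application release.
    postponed = important[fit:] + optional
    return captured, postponed


# Placeholder priorities, not the book's assignments.
example = {
    "component_a": "mandatory",
    "component_b": "important",
    "component_c": "important",
    "component_d": "optional",
}
captured, postponed = plan_release(example, important_capacity=1)
```

Here `captured` holds `component_a` and `component_b`, while `component_c` and `component_d` are postponed. Dictionary order stands in for any finer ranking among the important components.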



Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
Year: 2003
Pages: 202