Metadata

Canaxia's in-house applications were previously built in a monolithic manner: each application performed every step of the process required to complete a business function. In addition, different IT areas within Canaxia developed applications tactically, with no thought given to reuse or a shared enterprise view. This resulted in multiple sources for customer, account, billing, and manufacturing data, with no single database being the authoritative source. Each application had its own rendition of name, address, and contact information.

Kello James would like the data architecture to accommodate the establishment of an authoritative source for each data element. He has realized this will be difficult since each data element within each database is inconsistent in its usage and content. Canaxia has been good about using standard data modeling techniques and, in fact, incorporated this practice early in the application development life cycle. However, Canaxia now realizes that different modeling philosophies have emerged over time.

The various approaches to modeling have resulted in many anomalies within the Canaxia data architecture including, but not limited to, the following:

  • Identification

  • Semantic

  • Synonym

  • Homonym

Canaxia has no unified method for uniquely identifying an entity. In the sales lead database, each prospect is labeled with a sequentially assigned number, whereas in the customer database, customers are uniquely identified by their Social Security numbers. In the billing database, customers are assigned another unique identifier. No data source exists to identify how each record correlates across data sources.

The databases also differ semantically: they use different data values for the same data element. The Canaxia sales database records gender as male and female, whereas the billing database records gender as 0 or 1. Semantics become relevant whenever different values are associated with the same data element. In addition, several of the databases use different field names for the same data element (synonym).
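The semantic and synonym anomalies described above can be reconciled with simple translation tables. The following is a minimal sketch, not Canaxia's actual design: the field names (sex, gender_cd, cust_name), the canonical codes M and F, and the assignment of billing codes 0 and 1 to a gender are all assumptions made for illustration.

```python
# Hypothetical value mappings. The text does not say which billing code
# (0 or 1) stands for which gender, so this assignment is an assumption.
CANONICAL_GENDER = {
    "sales": {"male": "M", "female": "F"},
    "billing": {"0": "M", "1": "F"},
}

# Hypothetical synonym table: local field name -> shared data element name.
FIELD_SYNONYMS = {
    "sales": {"sex": "gender", "cust_name": "customer_name"},
    "billing": {"gender_cd": "gender", "name": "customer_name"},
}


def normalize_record(source: str, record: dict) -> dict:
    """Rename fields to shared element names and translate gender values."""
    synonyms = FIELD_SYNONYMS.get(source, {})
    normalized = {}
    for field_name, value in record.items():
        element = synonyms.get(field_name, field_name)
        if element == "gender":
            value = CANONICAL_GENDER[source][str(value).lower()]
        normalized[element] = value
    return normalized


sales_row = normalize_record("sales", {"sex": "Female", "cust_name": "Ada"})
billing_row = normalize_record("billing", {"gender_cd": 1, "name": "Ada"})
```

After normalization, both rows carry the same element names and the same gender value, so they can be compared or merged directly.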

A data dictionary stores data definitions. A data repository manages additional information, including data ownership, the systems that use the data, the organizations the data serve, and so on.


Sometimes the same field name may exist for different data elements (homonym). For example, the term race could refer to a person and his or her genetic makeup, or it could refer to the action of a person, such as a race for election.

Homonyms occur where data exist within departmental applications. This will be the most difficult problem to solve. Usually the solution is to keep the data sources distributed but give them common definitions. This approach is sometimes referred to as federated metadata.

Metadata is information about data.


Federated Metadata

Federated metadata, when defined in a consistent manner, describes data the same way in each physical table in which the data are stored (Figure 11-5). The definition includes the name of the field, its length, format, and a list of valid values. When data have the same format, it is easier for exchanges of information to occur across systems.

Figure 11-5. Federated metadata.


A federated metadata approach can be implemented even when information is physically partitioned across disparate data sources. In the ideal world, all related data would exist within a single physical location, making application development easier. Reality dictates that information will be spread across multiple heterogeneous platforms and topologies. Defining this up front will simplify application integration. Data sharing can be further extended through reusable services using a service-oriented approach or by using a broker-based architecture. To access data, each application would send a request to the service. The service would execute the request and return the appropriate response to the application. This approach protects the integrity of data and guarantees that the data retrieved are both accurate and consistent.

To be successful in defining metadata, the first step is to define the appropriate owner of each data element. Usually data elements can be classified by who is the authoritative source. For example, Canaxia assigns each employee an employee number. The employee number is used by more than one department, including security, manufacturing, and legal, but the human resources department issues the employee number and is, therefore, the authoritative source. Sometimes a data element is used by more than one department, but no department is its authoritative source. For example, each employee of Canaxia within the United States has a Social Security number that is used in multiple applications, including payroll, employee benefits, and security. However, no department within Canaxia issues Social Security numbers.

Table 11-2 illustrates the use of the Social Security number within Canaxia's metadata repository.

Metadata also stores information about the location of databases and their aggregated content. Storing database information in the metadata repository allows information to be quickly located and shared. Additionally, Canaxia will benefit by using this approach to become compliant with the Health Insurance Portability and Accountability Act (HIPAA), as well as to manage its data privacy concerns in a unified enterprise manner. A Big Five accounting firm that performs audits of Canaxia's business can additionally certify that data are being stored and used in an appropriate manner. At some future time, Canaxia can also extend this view to its customers and the public at large without grief or fear of wrongful disclosure.

The people at Canaxia realize that a metadata repository can help them overcome many of the business and technology problems they are experiencing, including the following:

  • Building the enterprise data model

  • Design reviews

  • Use of XML DTDs and/or schemas for data exchange validation

Table 11-2. Canaxia's use of metadata.

Field               Values
------------------  ---------------------------------------------------------
Data Element Name   Social Security number
Element Definition  A 9-digit number assigned to an individual by the Social
                    Security Administration
Business Format     999-99-9999
Business Length     11 positions
Business Type       Number
Exchange Format     999999999
Exchange Length     9 bytes
Exchange Type       Character
Storage Format      999999999
Storage Length      9 bytes
Storage Type        Character or variable character
Scope               United States
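A repository entry such as the one in Table 11-2 can be treated as a small data structure that validates values and converts between the business and storage formats. The sketch below is illustrative only: the DataElement class, its method names, and the convention that 9 in a format string means "one digit" are assumptions, not something the repository is stated to provide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical repository entry; format strings use 9 for one digit."""
    name: str
    definition: str
    business_format: str
    storage_format: str
    scope: str

    @staticmethod
    def _pattern(picture: str) -> re.Pattern:
        # Turn a picture such as 999-99-9999 into a regular expression.
        parts = [r"\d" if ch == "9" else re.escape(ch) for ch in picture]
        return re.compile("".join(parts))

    def to_storage(self, business_value: str) -> str:
        """Validate a business-format value and strip its punctuation."""
        if not self._pattern(self.business_format).fullmatch(business_value):
            raise ValueError(f"{business_value!r} is not a valid {self.name}")
        return re.sub(r"\D", "", business_value)

    def to_business(self, storage_value: str) -> str:
        """Validate a storage-format value and reinsert the punctuation."""
        if not self._pattern(self.storage_format).fullmatch(storage_value):
            raise ValueError(f"{storage_value!r} is not a valid {self.name}")
        digits = iter(storage_value)
        return "".join(
            next(digits) if ch == "9" else ch for ch in self.business_format
        )


SSN = DataElement(
    name="Social Security number",
    definition="A 9-digit number assigned to an individual by the "
               "Social Security Administration",
    business_format="999-99-9999",
    storage_format="999999999",
    scope="United States",
)

stored = SSN.to_storage("123-45-6789")
displayed = SSN.to_business(stored)
```

Because every application would convert through the same definition, a value written by one system can be read back by another without format drift.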

Using the metadata repository to store data element definitions effectively builds the enterprise data model. If the repository is kept up to date, the resulting enterprise data model is also up to date. By being able to start from either direction, the repository can serve as the tool that will empower an enterprise to determine how changes in data affect other processes. It will also help realize the goal of data reliability, reusability, and sharing across organizational boundaries.

Throughout the remainder of this text, the term schema refers to tables, views, triggers, stored procedures, and other "objects" that are stored within a database.


In small projects, design reviews are typically conducted by members of self-organizing teams. In larger projects, this model breaks down. The ideal design review of a data architecture would uncover any inconsistencies of data usage within and across applications and would reveal whether data are stored redundantly. The metadata repository will allow quick discovery of such scenarios. The other problem typically associated with a data architecture design review is the need to produce documentation that no one will look at once the review has been conducted. With a repository, this step in the project life cycle will go quickly, because the repository states uniformly which other applications receive data from this application and whether a requirement can be met by an existing federated metadata element.

An industry trend is to use XML as a data transfer format for exchanging information between disparate applications. XML allows data to become self-describing. The metadata repository will also allow XML schemas to be created on demand, since the repository knows each data element, its format, and its list of valid values. As the repository is updated, new schemas can be generated automatically, as appropriate.
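To make the idea of schema generation concrete, here is a minimal sketch that turns one repository entry into an XML Schema fragment using Python's standard xml.etree.ElementTree module. The function name element_to_xsd, the element name SocialSecurityNumber, and the choice to express the format as a regular-expression pattern are assumptions for illustration; a real repository tool would have its own conventions.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize schema tags with the xs: prefix


def element_to_xsd(name, pattern=None, valid_values=None):
    """Build a one-element XML Schema from a (hypothetical) repository entry.

    If valid_values is given, the element is restricted to that list;
    otherwise it is restricted to the given regular-expression pattern.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=name)
    simple_type = ET.SubElement(element, f"{{{XS}}}simpleType")
    restriction = ET.SubElement(
        simple_type, f"{{{XS}}}restriction", base="xs:string"
    )
    if valid_values:
        for value in valid_values:
            ET.SubElement(restriction, f"{{{XS}}}enumeration", value=value)
    elif pattern:
        ET.SubElement(restriction, f"{{{XS}}}pattern", value=pattern)
    return ET.tostring(schema, encoding="unicode")


# Exchange format of nine digits, expressed as a pattern facet.
ssn_xsd = element_to_xsd("SocialSecurityNumber", pattern=r"\d{9}")

# A list of valid values becomes a set of enumeration facets.
gender_xsd = element_to_xsd("Gender", valid_values=["M", "F"])
```

Regenerating these fragments whenever a repository entry changes is one way the schemas could be kept in step with the metadata.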

XML schemas do not cover the semantics of the data. For additional information, visit www.agiledata.org/essays/advancedXML.html.


As part of its architecture, Canaxia has defined a metadata services broker (see Table 11-2) that adheres to the Open Applications Group Integration Specification (OAGIS). This specification defines a virtual, content-based business object model that allows an enterprise application to construct a virtual object wrapper around itself. Communication among software components occurs through the metadata services broker using a business object document to a virtual-object interface, as shown in Figure 11-6.

Figure 11-6. OAGIS virtual business object model.


The business object document uses the metadata contained within the repository and is contained within an XML schema. By taking this approach, the business object document will contain not only data related to the business service request but also data related to the business data area. Each business service request (BSR) contains a unique verb/noun combination that drives the contents of the business data area (BDA). Examples could include post journal or sync password, which all systems can understand. In object-oriented terms, the combination of the BSR and BDA maps directly to the object name, method, and arguments in a method invocation.
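The mapping from BSR and BDA to a method invocation can be sketched in a few lines. This is not the OAGIS document format or Canaxia's broker; the class names, the in-memory registry, and the argument names (account, amount, user, password) are hypothetical and exist only to show the noun selecting the object, the verb selecting the method, and the BDA supplying the arguments.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessObjectDocument:
    """Simplified stand-in for a business object document."""
    verb: str                                      # BSR verb, e.g. "post"
    noun: str                                      # BSR noun, e.g. "journal"
    data_area: dict = field(default_factory=dict)  # BDA contents


class JournalService:
    """Hypothetical object wrapped behind the noun 'journal'."""
    def __init__(self):
        self.entries = []

    def post(self, account: str, amount: float) -> str:
        self.entries.append((account, amount))
        return f"posted {amount} to {account}"


class PasswordService:
    """Hypothetical object wrapped behind the noun 'password'."""
    def __init__(self):
        self.passwords = {}

    def sync(self, user: str, password: str) -> str:
        self.passwords[user] = password
        return f"synced password for {user}"


class MetadataServicesBroker:
    """Routes a document: noun -> object, verb -> method, BDA -> arguments."""
    def __init__(self):
        self._objects = {}

    def register(self, noun: str, target: object) -> None:
        self._objects[noun.lower()] = target

    def dispatch(self, bod: BusinessObjectDocument):
        target = self._objects[bod.noun.lower()]
        method = getattr(target, bod.verb.lower())
        return method(**bod.data_area)


broker = MetadataServicesBroker()
journal = JournalService()
broker.register("journal", journal)
broker.register("password", PasswordService())

result = broker.dispatch(
    BusinessObjectDocument("post", "journal",
                           {"account": "4010", "amount": 125.0})
)
```

A sync password request would be dispatched the same way, with the broker looking up the password object and calling its sync method with the BDA contents.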

For more information on OAGIS, visit www.oag.org.


Metadata can easily become dysfunctional. It is important to keep metadata streamlined and easy to work with so that it stays current. Access to metadata should never be unreasonably withheld from developers or other interested parties.



A Practical Guide to Enterprise Architecture
ISBN: 0131412752
Year: 2005
Pages: 148
