A Brief History of Metadata


The business systems industry has changed metadata from being a data dictionary (documentation about the fields used within an application) to being a style of development that recognizes that the definitions of the data are subject to change in the same way as the rest of the data in an application. We have moved from metadata being an afterthought to being an architectural principle. Let's review how this came to be.

Data Dictionaries

Once upon a time, metadata was documentation for the data model (often the file and transaction layouts) of an application, as shown in Figure 6.2. Initially, the data dictionaries were used after the fact (at the end of the project or later) and were the equivalent of "as built" drawings of buildings. Arthur Andersen's (now Accenture) "Lexicon" is generally regarded as the first of these products.

Figure 6.2: As built metadata.

The information was primarily about fields and records: which fields were on which records and which records were accessed by which programs. Moving this documentation to a central location made a world of difference as applications were becoming more and more complex.

The Data Dictionary Database

It didn't take long for people to figure out that putting this documentation into a database, as shown in Figure 6.3, would make it far more usable. "Where used" reports and various other analyses were now easy to create.

Figure 6.3: The data dictionary.
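
As a rough illustration of why a database made "where used" reports easy, here is a minimal sketch using Python's built-in sqlite3 module. The table names, record names, and program names are all hypothetical; a real dictionary product would have had a much richer model.

```python
import sqlite3

# Hypothetical dictionary schema: which fields are on which records,
# and which programs access which records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record_field (record_name TEXT, field_name TEXT);
    CREATE TABLE program_record (program_name TEXT, record_name TEXT);
""")
conn.executemany("INSERT INTO record_field VALUES (?, ?)", [
    ("ORDER-REC", "CUSTOMER-ID"),
    ("ORDER-REC", "ORDER-DATE"),
    ("INVOICE-REC", "CUSTOMER-ID"),
])
conn.executemany("INSERT INTO program_record VALUES (?, ?)", [
    ("ORDENTRY", "ORDER-REC"),
    ("BILLING", "INVOICE-REC"),
    ("BILLING", "ORDER-REC"),
])

def where_used(field_name):
    """Return the programs that touch any record containing the field."""
    rows = conn.execute("""
        SELECT DISTINCT pr.program_name
        FROM record_field rf
        JOIN program_record pr ON pr.record_name = rf.record_name
        WHERE rf.field_name = ?
        ORDER BY pr.program_name
    """, (field_name,))
    return [name for (name,) in rows]

print(where_used("ORDER-DATE"))  # ['BILLING', 'ORDENTRY']
```

Once the documentation is held as rows, the report is a single join rather than a manual search through program listings.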

The "Active" Dictionary

Next was the "active" dictionary, an innovation that made it possible to (1) create the record layouts and COBOL copybooks from the dictionary and (2) install procedures to make sure they were updated only from this dictionary (Figure 6.4).

Figure 6.4: Active Data Dictionary.

Once this was done, the architects of this innovation felt a need to distance themselves from the old "as built" approach and dubbed this the "active" dictionary. Changes to the dictionary became changes to the system (admittedly after some delay).
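
To make the idea of generating layouts from the dictionary concrete, the following is a minimal sketch in Python. The dictionary contents, the record name, and the generate_copybook helper are assumptions made for illustration; they do not describe any particular dictionary product.

```python
# Hypothetical dictionary entries: (level, field name, COBOL picture clause).
DICTIONARY = {
    "ORDER-REC": [
        ("05", "CUSTOMER-ID", "9(6)"),
        ("05", "ORDER-DATE", "9(8)"),
        ("05", "ORDER-TOTAL", "9(7)V99"),
    ],
}

def generate_copybook(record_name):
    """Render a record layout from the dictionary as copybook text."""
    # The 01 level starts in column 8 (Area A); fields start in column 12.
    lines = [f"       01  {record_name}."]
    for level, field, picture in DICTIONARY[record_name]:
        lines.append(f"           {level}  {field:<20} PIC {picture}.")
    return "\n".join(lines)

print(generate_copybook("ORDER-REC"))
```

Because programs copy in the generated text rather than hand-written layouts, changing an entry in the dictionary and regenerating is what turns a dictionary change into a system change.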

Computer-Aided Software Engineering Tools

Computer-aided software engineering (CASE) extended the concept with the generation of application source code (or at least shells of application source programs), as indicated in Figure 6.5. These tools generally worked from metadata, sometimes held in a format proprietary to the tool and sometimes using the native capability of the database management system.

Figure 6.5: CASE tools generating metadata and some code.

Data Definition Language

The term data definition language (DDL) was coined along with the implementation of relational databases. It distinguished statements that manipulated metadata (DDL statements) from those that manipulated regular or instance data (data manipulation language [DML] statements). Structured query language (SQL), the most prevalent such language, includes both kinds of statement, as shown in Figure 6.6.

Figure 6.6: Data definition language and data manipulation language.
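
The distinction can be seen in a few lines. This is a minimal sketch using SQLite through Python's sqlite3 module; the order_line table is a made-up example, and PRAGMA table_info is specific to SQLite rather than part of standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: changes the metadata (the definition of the table).
conn.execute("CREATE TABLE order_line (product TEXT, qty INTEGER, price REAL)")

# DML: changes or reads the instance data held in that table.
conn.execute("INSERT INTO order_line VALUES ('widget', 3, 2.50)")
rows = conn.execute("SELECT product, qty * price FROM order_line").fetchall()

# The DBMS keeps the metadata queryable as well; PRAGMA table_info
# lists the columns that the DDL statement defined.
columns = [row[1] for row in conn.execute("PRAGMA table_info(order_line)")]

print(rows)     # [('widget', 7.5)]
print(columns)  # ['product', 'qty', 'price']
```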

Schemas

The term schema began to be applied to database design, particularly data models, in the 1990s. However, it was primarily the rise of object databases and extensible markup language (XML) that led to the current popularity of the term.

Object-oriented database management systems introduced the concept of schema evolution. This can be understood as the answer to the question, "How can you update the definition of the data in a database if there is data in the database that depends on the old schema?" Until this point, little effort had been spent on trying to update the schema while the system was running, because the prevailing wisdom was "dump and restore." What would later be called "schema evolution" was "dump, reformat, change the DDL, and restore."
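
The "dump, reformat, change the DDL, and restore" routine can be sketched as follows, again using SQLite through Python. The customer table and the decision to split a name column in two are hypothetical; the point is only the order of the four steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada Lovelace')")

# 1. Dump: pull the existing instance data out of the old schema.
dumped = conn.execute("SELECT id, name FROM customer").fetchall()

# 2. Reformat: reshape each row to fit the new definition
#    (here, a hypothetical split of name into first and last).
reformatted = [(cid, *name.split(" ", 1)) for cid, name in dumped]

# 3. Change the DDL: replace the old table definition with the new one.
conn.execute("DROP TABLE customer")
conn.execute(
    "CREATE TABLE customer (id INTEGER, first_name TEXT, last_name TEXT)"
)

# 4. Restore: load the reformatted rows into the new schema.
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", reformatted)

result = conn.execute("SELECT * FROM customer").fetchall()
print(result)  # [(1, 'Ada', 'Lovelace')]
```

Every step runs while nothing else is using the table, which is why this approach did not count as updating the schema of a running system.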

Standard Generalized Markup Language Document Type Definition

Standard generalized markup language (SGML; a tagged language for document-centric systems on which hypertext markup language [HTML] and XML were based) introduced its own type of DDL called the document type definition (DTD) (Figure 6.7). The DTD was both a schema (in that it constrained what could be stored in an SGML document) and a grammar (in that it defined what sequences of elements needed to be present). These DTDs were metadata for the data in the SGML documents.

DTD
  <!ELEMENT order (line*)>
  <!ELEMENT line (product, qty, price)>

SGML
  <order>
    <line>
      <product> .... </product>
      <qty> ... </qty>
      <price> ... </price>
    </line>
  </order>

Figure 6.7: DTD schema with an SGML document.
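
To illustrate the "grammar" half of that description, here is a minimal sketch in Python that checks the sequence of child elements against the two content models in Figure 6.7. It is a hand translation of those two declarations into regular expressions, not a DTD parser, and it reads the sample with an XML parser on the assumption that the document is also well-formed XML. The conforms function and the sample values are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

# Content models from Figure 6.7, written as regular expressions over
# space-terminated child tag names (a hand translation, not a DTD parser).
CONTENT_MODELS = {
    "order": r"(line )*",
    "line": r"product qty price ",
}

def conforms(element):
    """Check that each element's children appear in the declared sequence."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # elements without a declared model are not checked here
    child_tags = "".join(child.tag + " " for child in element)
    if not re.fullmatch(model, child_tags):
        return False
    return all(conforms(child) for child in element)

good = ET.fromstring(
    "<order><line><product>pen</product><qty>2</qty>"
    "<price>1.50</price></line></order>"
)
bad = ET.fromstring(
    "<order><line><qty>2</qty><product>pen</product>"
    "<price>1.50</price></line></order>"
)
print(conforms(good), conforms(bad))  # True False
```

The second document contains the right elements but in the wrong order, so it is rejected; that ordering rule is what makes the DTD a grammar as well as a schema.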

Extensible Markup Language Document Type Definition

XML, which became a World Wide Web Consortium (W3C) Recommendation in 1998, is a derivative of SGML that is intended to bring semantic markup to the World Wide Web. Chapter 11 covers XML in more detail; for now we will outline the XML initiatives that relate to metadata. XML began life with DTDs that were expressly present to define the schema of the document or message. The XML DTDs were very similar to the SGML DTDs.

XML Schema Definition

In the last few years, the shortcomings of DTDs have become apparent. One shortcoming was the lack of any real ability to control, at a detailed level, the conformance of a document to the schema (for example, a DTD cannot require that the content of an element be a number or a date). The developers of XML schema definition (XSD) decided not to continue the DTD tradition of writing the schema in a different syntax from the instance data, and therefore XSD is itself expressed in XML.

Figure 6.8 is part of an XSD that would define an equivalent structure to the DTD in Figure 6.7. We won't go into the syntax of XSD (or DTD); it is sufficient to know that there has been a progression from treating metadata as being different from regular data to treating it as being the same as regular data.

<xs:element name="line">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="product"/>
      <xs:element ref="qty"/>
      <xs:element ref="price"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Figure 6.8: Part of an XSD document.
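
One practical consequence of expressing the schema in XML is that the same tools that read instance documents can read the schema. The sketch below uses Python's standard xml.etree.ElementTree module to query the fragment from Figure 6.8. The enclosing xs:schema element is an addition made here so that the xs: prefix is declared; it is not part of the figure.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The fragment from Figure 6.8, wrapped in an xs:schema root so that the
# xs: prefix is declared (the wrapper is an addition for this sketch).
xsd_text = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="line">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="product"/>
        <xs:element ref="qty"/>
        <xs:element ref="price"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(xsd_text)
ns = {"xs": XS}

# Query the schema exactly as we would query an instance document.
line = schema.find("xs:element[@name='line']", ns)
children = [
    e.get("ref")
    for e in line.findall("xs:complexType/xs:sequence/xs:element", ns)
]
print(children)  # ['product', 'qty', 'price']
```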

Metaobject Facility, XML Metadata Interchange, and Common Warehouse Metadata Interchange

There has been a great deal of effort recently to standardize different aspects of metadata. The unified modeling language (UML) was built on and incorporates the metaobject facility (MOF), essentially the metamodel for software architectures. XML metadata interchange (XMI) and the common warehouse metadata interchange (CWMI; published by the Object Management Group as the Common Warehouse Metamodel, CWM) are standards that allow metadata to be converted from one format to another.

Resource Description Framework

Resource description framework (RDF) has recently emerged, essentially as metadata for content. We'll save most of the discussion of RDF for Chapter 14, but to put it into a metadata context, consider that although XML has a schema (its metadata) we don't necessarily know what the schema means. We need metadata that would be rich enough to store an ontology, and that is where RDF comes in.
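
RDF expresses everything as subject-predicate-object statements, and that uniformity is what lets it carry meaning alongside structure. The following is a minimal sketch in plain Python, with no RDF library: the ex: names are hypothetical, while rdf:type and rdfs:subClassOf are terms from the RDF and RDF Schema vocabularies.

```python
# Each statement is a (subject, predicate, object) triple. The ex: names are
# hypothetical; rdf:type and rdfs:subClassOf come from RDF and RDF Schema.
triples = {
    ("ex:order42", "rdf:type", "ex:PurchaseOrder"),
    ("ex:PurchaseOrder", "rdfs:subClassOf", "ex:Contract"),
    ("ex:order42", "ex:hasLine", "ex:line1"),
}

def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def types_of(resource):
    """Direct types plus their superclasses, following rdfs:subClassOf."""
    found, pending = set(), objects(resource, "rdf:type")
    while pending:
        cls = pending.pop()
        if cls not in found:
            found.add(cls)
            pending.extend(objects(cls, "rdfs:subClassOf"))
    return sorted(found)

print(types_of("ex:order42"))  # ['ex:Contract', 'ex:PurchaseOrder']
```

Statements about the data (the order has a line) and statements about the vocabulary (a purchase order is a kind of contract) sit in the same store, so a program can work out that the order is a contract even though no statement says so directly.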




Semantics in Business Systems: The Savvy Manager's Guide
ISBN: 1558609172
Year: 2005
Pages: 184
Authors: Dave McComb
