
20.8. Documents and data

For many decades, data processing got the big budgets while document processing got a room in the basement with a copying machine. While the data processors relished their importance to the organization, the document processors basked in their importance to humanity. They were preservers of human knowledge, not just high-speed bean counters.

No wonder the two never got along!

Markup languages are changing all that. With XML, documents and databases both store data and can share it, so document processing and data processing can be performed at the same time, by the same people.

20.8.1 It's all data!

In an XML document, the text that isn't markup is data. You can edit it directly with an XML editor or a plain text editor. With a stylesheet and a rendering system, you can display it in various ways.

In a database, you can't touch the data directly. You can enter and revise it only through forms controlled by the database program. Rendition, however, works much as it does for XML documents, except that the stylesheet is usually called something like a "report template".

The important thing is that, in both cases, the data can be kept in the abstract, untainted by the style information for rendering it. This is very different from word processing documents, of course, which normally keep their data in rendered form. Even WordML is a rendition, despite its use of XML.
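A minimal Python sketch can make this separation concrete. The contact record and the two render functions are hypothetical; the functions merely stand in for real stylesheets (such as XSLT). The point is only that the same abstract data yields two different renditions without itself being changed.

```python
from html import escape
import xml.etree.ElementTree as ET

# Hypothetical XML data: the content only, with no rendering information.
SOURCE = """<contact>
  <name>Ann Lee</name>
  <phone>555-0100</phone>
</contact>"""

def render_as_text(root):
    """First hypothetical 'stylesheet': a plain-text report line."""
    return f"{root.findtext('name')}: {root.findtext('phone')}"

def render_as_html(root):
    """Second hypothetical 'stylesheet': an HTML fragment from the same data."""
    name = escape(root.findtext("name"))
    phone = escape(root.findtext("phone"))
    return f"<p><b>{name}</b> ({phone})</p>"

root = ET.fromstring(SOURCE)
print(render_as_text(root))  # Ann Lee: 555-0100
print(render_as_html(root))  # <p><b>Ann Lee</b> (555-0100)</p>
```

Neither function modifies the parsed data, so a third rendition could be added later without touching the source.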

20.8.2 Data-centric vs. document-centric

Documents, data, and processes are sometimes characterized as "data-centric" in contrast to "document-centric". Since all XML documents (except empty ones) contain data, these terms are actually a misleading shorthand. Worse, they are applied in two very different contexts:

  • how much the XML resembles relational data; and,

  • whether you have to deal with the whole document at once.

20.8.2.1 How relational is it?

The data-centric misnomer is common among database hackers trying to describe structures that map easily onto relational tables and primitive datatypes. Structures that don't are called document-centric.
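As a rough illustration of what "maps easily onto relational tables" means, the following Python sketch turns each item element of a hypothetical purchase order into a row of primitive values and loads it into an in-memory SQLite table. The element names are borrowed from the purchase-order example in this section; the sketch assumes every item contains exactly these four subelements, which real data would need to be validated for.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase-order fragment whose structure maps directly onto a table.
ORDER = """<order>
  <item><quantity>2</quantity><itemNum>A-17</itemNum>
        <description>Widget</description><price>4.50</price></item>
  <item><quantity>1</quantity><itemNum>B-02</itemNum>
        <description>Gadget</description><price>12.00</price></item>
</order>"""

def items_to_rows(xml_text):
    """Turn each <item> into a tuple of primitive values: one table row per element."""
    root = ET.fromstring(xml_text)
    return [
        (int(item.findtext("quantity")),
         item.findtext("itemNum"),
         item.findtext("description"),
         float(item.findtext("price")))
        for item in root.iter("item")
    ]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (quantity INTEGER, itemNum TEXT, description TEXT, price REAL)"
)
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)", items_to_rows(ORDER))
total = conn.execute("SELECT SUM(quantity * price) FROM item").fetchone()[0]
conn.close()
print(total)  # 21.0
```

A memo with paragraphs and inline emphasis would not reduce to fixed columns this simply, which is why such structures get called document-centric.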

The intended meaning of data-centric is that the document structure – really element structure, since a document is essentially just the largest element – is fully predictable.

An element has a fully predictable structure if it and its subelements are constrained to contain either:

  • type-sequenced elements (e.g., a sequence of elements of the types: quantity, itemNum, description, price),

  • data characters only (i.e., #PCDATA), or

  • nothing at all.

Fully predictable elements can easily be visualized as forms. A business transaction document such as a purchase order is more likely to be fully predictable than a memo.
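The definition above can be approximated in code, with one important simplification. Looking at a single document instance, this Python sketch can detect mixed content (data characters interleaved with subelements), but it cannot confirm that subelements follow a fixed type sequence; that would require a schema. The function name and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_text(s):
    """True if s contains non-whitespace data characters."""
    return s is not None and s.strip() != ""

def is_fully_predictable(elem):
    """Approximate, instance-only check: no element mixes text with subelements.

    It does not verify that subelements follow a fixed type sequence.
    """
    children = list(elem)
    if children:
        # Element content: any non-whitespace text around the children is mixed content.
        if has_text(elem.text) or any(has_text(c.tail) for c in children):
            return False
        return all(is_fully_predictable(c) for c in children)
    # Data characters only, or nothing at all: both are acceptable.
    return True

po = ET.fromstring("<po><item><quantity>2</quantity><price>4.50</price></item></po>")
memo = ET.fromstring("<memo><p>Please <emph>read</emph> this today.</p></memo>")
print(is_fully_predictable(po))    # True
print(is_fully_predictable(memo))  # False
```

The purchase order passes because every element holds either subelements or text; the memo fails because its paragraph mixes text with an emph element.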

In addition to "data-centric", the misnomer "highly structured" is sometimes used. However, "highly predictable" would be more precise, particularly as many documents that aren't fully predictable are still much closer to predictable than to freeform.

20.8.2.2 How granular is it?

Another (mis)use of data-centric is to characterize the storage and/or access of documents at the level of individual elements, rather than the entire document at once (document-centric). Once again, the usage is misleading because what it describes has nothing to do with data per se, and because it implies a contradiction between data and documents that does not exist.
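The difference in granularity is one of access, not of content, as this Python sketch suggests. It uses parsing granularity as a stand-in for storage granularity (an assumption; a real system might store individual elements in a database). The same hypothetical catalog is read once as a whole document and once element by element, and the data is identical either way.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
DOC = b"""<catalog>
  <part id="p1"><name>Bolt</name></part>
  <part id="p2"><name>Nut</name></part>
</catalog>"""

# Document-level access: parse the whole document, then work with it.
whole = ET.fromstring(DOC)
print(len(whole.findall("part")))  # 2

# Element-level access: handle each <part> as soon as it is complete, then discard it.
names = []
for event, elem in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if elem.tag == "part":
        names.append((elem.get("id"), elem.findtext("name")))
        elem.clear()  # release the element's content once it has been processed
print(names)  # [('p1', 'Bolt'), ('p2', 'Nut')]
```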

20.8.3 Document processing vs. data processing

While "data-centric" and "document-centric" aren't rigorous terms for characterizing information, they are quite meaningful when applied to processing. XML, however, is starting to break down the historic separation of the two paradigms, because it can preserve abstract data (like a database) while still being interchanged and processed as a character string (like a document). Applications can now intermix data processing and document processing techniques to get the job done.
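A toy Python sketch shows both paradigms applied to one artifact: a hypothetical order is first edited as a character string (document processing) and then parsed and summed as abstract data (data processing).

```python
import xml.etree.ElementTree as ET

# Hypothetical order, held as a character string.
ORDER = "<order><item><price>4.50</price></item><item><price>12.00</price></item></order>"

# Document processing: treat the XML as text and edit it.
revised = ORDER.replace("<order>", '<order status="approved">', 1)

# Data processing: treat the same XML as abstract data and compute with it.
root = ET.fromstring(revised)
total = sum(float(p.text) for p in root.iter("price"))
print(root.get("status"), total)  # approved 16.5
```

Editing markup by plain string replacement is fragile in practice; it is used here only to show that both views apply to the same XML.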

20.8.4 Comparing documents to data

Since documents contain data, what are people doing when they compare or contrast documents and data?

They are being human. Which is to say, they are using a simplified expression for the complex and subtle relationship shown in Table 20-1. They are comparing the typical kind of data that is found in XML and word processing (WP) documents with business process (BP) transactional data (operational data), which usually resides in databases.

Table 20-1. Typical traits of data

                  XML data            BP data       WP data
Presentability    Abstraction         Abstraction   Rendition
Source            Written             Captured      Written
Structure         Hierarchy + links   Tables        Paragraphs
Purpose           Processing          Processing    Presentation
Location          Document            Database      Document

Note that the characteristics in the table are typical, not fixed. For example, XML data can be a rendition, as HTML and WordML are. In addition, XML data could:

  • Be captured from a data entry form or a program (rather than written);

  • Consist of simple fields like those in a relational table (rather than a deeply nested hierarchy with links among the nodes); and

  • Be intended for presentation as well as processing.

Caution

The true relationship between documents and data isn't as widely understood as it ought to be, even among experts. That is in part because the two domains existed independently for so long. This fact can complicate communication.






XML in Office 2003: Information Sharing with Desktop XML
ISBN: 013142193X
Year: 2003
Pages: 176
