Section 20.1.  Structured vs. unstructured

Structured is arguably the most commonly used word to characterize the essence of markup languages. It is also the most ambiguous and most often misused word.

There are four common meanings:

structured = abstract

XML documents are frequently referred to as structured while other text, such as renditions in notations like RTF, is called unstructured. Separating "structure from style" is considered the hallmark of a markup language. But in fact, renditions can have a rich structure, composed of elements like pages, columns, and blocks. The real distinction being made is between "abstract" and "rendered".

structured = managed

This is one of the meanings that folks with a database background usually have in mind. Structured information is managed as a common resource and is accessible to the entire enterprise. Unfortunately, there are also departmental and individual databases, and their content isn't "structured" in quite the same sense.

structured = predictable

This is the other meaning that people with a relational database background often have in mind. Structured data is captured from business transactions, comes in easily identified granules, and has metadata that identifies its semantics. In contrast, freeform data is normally buried in reports, with no metadata, and therefore must be "parsed" (by reading it!) to determine what it is and what it means. If an essentially freeform document has islands of structured data within it, the document might be termed semi-structured. See 20.8, "Documents and data", for more on this.

structured = possessing structure

This is the dictionary meaning, and the one used in this book. There is usually the (sometimes unwarranted) implication that the structure is fine-grained (rich, detailed), making components accessible at efficient levels of granularity. A structure can be very simple – a single really big component – but nothing is unstructured. All structure is well-defined and "predictable" (in the sense of consistent); it just may not be very granular.
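
To make the "predictable" and "semi-structured" senses concrete, here is a minimal Python sketch. It assumes a made-up memo whose element names (memo, para, orderNumber, amount) are invented for illustration and do not come from any real schema. The prose remains freeform, but the marked-up islands can be retrieved by name without anyone having to read the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document: freeform prose containing two
# "islands" of identified data. All element names are assumptions.
MEMO = """<memo>
  <para>We shipped order <orderNumber>A-1047</orderNumber> last week,
  and the customer was billed <amount currency="USD">250.00</amount>
  as agreed on the phone.</para>
  <para>Nothing else to report.</para>
</memo>"""


def extract_islands(xml_text, names):
    """Return (element name, attributes, text) for each marked-up island."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree in document order; we keep only the elements
    # whose names identify structured data and ignore the surrounding prose.
    return [(el.tag, dict(el.attrib), el.text)
            for el in root.iter() if el.tag in names]


islands = extract_islands(MEMO, {"orderNumber", "amount"})
for tag, attrs, text in islands:
    print(tag, attrs, text)
```

The element names act as the metadata that identifies the semantics of each granule. The second paragraph has no such islands, so software can learn nothing from it beyond the fact that it is a paragraph.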

These distinctions aren't academic. It is very important to know which meaning of "structured" a vendor has in mind.

What if your publishing system has bottlenecks because you are maintaining four rendered versions of your documents in different representations? It isn't much of a solution to "structure" them in a database so that modifying one version warns you to modify the others.

You'll want to have a single "structured" – that is, abstract – version from which the others can be rendered. And if you find that your document has scores of pages unrelieved by sub-headings, you may want to "structure" it more finely so that both human readers and software can deal with it in smaller chunks.
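
As a rough illustration of the single-source idea, the following Python sketch renders one abstract document into two renditions. The document vocabulary (doc, section, title, para) and the function names render_text and render_html are assumptions made for this example, not part of any product discussed in this book.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical abstract document: it says what the parts are,
# not how they should look. The vocabulary is invented for this sketch.
ABSTRACT = """<doc>
  <title>Quarterly report</title>
  <section>
    <title>Sales</title>
    <para>Sales rose in the third quarter.</para>
  </section>
  <section>
    <title>Outlook</title>
    <para>We expect steady growth.</para>
  </section>
</doc>"""


def render_text(root):
    """Plain-text rendition: upper-case title, underlined section headings."""
    lines = [root.findtext("title").upper(), ""]
    for section in root.findall("section"):
        heading = section.findtext("title")
        lines += [heading, "-" * len(heading)]
        lines += [p.text for p in section.findall("para")]
        lines.append("")
    return "\n".join(lines)


def render_html(root):
    """HTML rendition of the same abstract source."""
    parts = [f"<h1>{escape(root.findtext('title'))}</h1>"]
    for section in root.findall("section"):
        parts.append(f"<h2>{escape(section.findtext('title'))}</h2>")
        parts += [f"<p>{escape(p.text)}</p>" for p in section.findall("para")]
    return "\n".join(parts)


root = ET.fromstring(ABSTRACT)
text_rendition = render_text(root)
html_rendition = render_html(root)
print(text_rendition)
print(html_rendition)
```

A change made once in the abstract source appears in every rendition the next time they are generated, so there are no parallel versions to keep in step. Because the source is divided into sections, software can also address each one separately, which is the finer-grained sense of "structure".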

Keep these different meanings in mind when you read about "structured" and "unstructured". In this book, we try to confine our use of the word to its dictionary meaning, occasionally (when it is clear from the context) with the implication of "fine-grained".

XML in Office 2003: Information Sharing with Desktop XML
ISBN: 013142193X
Year: 2003
Pages: 176
