5.1 Background


5.1.1 History

Declarative content takes its name from the declarative style of programming, in which the author specifies the desired result without specifying how to achieve it. Declarative programming was, and still is, a discipline in its own right. One of its early goals was to develop techniques for specifying the layout of documents, and constraints on that layout, using markup languages, without specifying the algorithms by which the layout is achieved. This motivation gave rise to the name markup content.

Document layout software vendors each developed their own document structure, format, and encoding, which gave rise to interoperability issues. For example, word processors used proprietary markup languages to describe how each part of a document was to look. Standardization efforts were launched to address this problem. The first widely accepted declarative content language was the Standard Generalized Markup Language (SGML), defined in ISO 8879 [SGML]. SGML served as the foundation on which Hyper Text Markup Language (HTML) [HTML], eXtensible Markup Language (XML) [XML], and many other markup content formats were built. As of the writing of this text, some vendors, such as Adobe, still support SGML. Today, in the context of iTV and the Internet, declarative content and markup content refer to the same type of content.

5.1.2 Anatomy

The basic building block of all documents written using markup languages is the element. It is essentially a portion of the document that can be associated with layout attributes such as color, size, alignment, and so on. An element typically consists of three parts: a start tag, content, and an end tag. The start tag is written "<paren-name>", and the end tag is written "</paren-name>" (adding the '/' character), where paren-name is the name of the element. This notation is an extension of the parentheses notation commonly used to group words, in which the parentheses are given a name. For example, the markup representation of "(see Figure 5)" would be "<paren-name>see Figure 5</paren-name>", where paren-name is the name given to the pair of parentheses surrounding the text.

Each element is associated with a list of attributes, each with a name and a value assigned by the document author. Attribute-value pairs appear before the final ">" of an element's start tag. As an example, one could use "<paren-name color='red'>see Figure 5</paren-name>" to indicate that the text should be printed in red.
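The three parts of an element, and its attributes, can be inspected programmatically. The following is a minimal sketch, assuming the example element is also well-formed XML so that Python's standard xml.etree.ElementTree parser can read it; it is not an SGML parser, and the variable names are chosen here for illustration only.

```python
import xml.etree.ElementTree as ET

# The example element from the text; "paren-name" is the text's placeholder name.
source = "<paren-name color='red'>see Figure 5</paren-name>"

element = ET.fromstring(source)

print(element.tag)     # element name, taken from the start tag
print(element.text)    # content between the start tag and the end tag
print(element.attrib)  # attribute name/value pairs from the start tag
```

The parser recovers the element name, the content, and the attribute list as separate pieces, which is what allows a program to treat the color attribute differently from the text it applies to.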

In addition to SGML, the developers of the Internet were familiar with the LISt Processing (LISP) language, used by Artificial Intelligence (AI) programmers. In LISP, the parentheses notation is commonly used to specify expressions, declarations, functions, and program blocks. On the one hand, the SGML markup notation had the advantage that the parentheses were given names, and that opening parentheses (i.e., tags) could be associated with attributes. On the other hand, some were concerned about the efficiency of the encoding, because every element name is repeated in its end tag. Eventually, with the understanding that computing power would increase exponentially over time and so marginalize the encoding efficiency issue (e.g., Moore's law is commonly cited as predicting a doubling of power every 18 months), the markup notation was selected as the foundation for standardization and subsequent development.
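To make the comparison between the two notations concrete, the sketch below renders the example element in a parenthesized, LISP-like form. The function to_sexpr and the "(:name value)" convention for attributes are assumptions introduced here for illustration; they are not part of SGML or of any LISP standard.

```python
import xml.etree.ElementTree as ET

def to_sexpr(element):
    """Render an element as a parenthesized, LISP-like expression.

    Text that follows a child element (its "tail") is ignored in this sketch.
    """
    parts = [element.tag]
    for name, value in element.attrib.items():
        parts.append(f'(:{name} "{value}")')
    if element.text and element.text.strip():
        parts.append(f'"{element.text.strip()}"')
    for child in element:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"

markup = "<paren-name color='red'>see Figure 5</paren-name>"
sexpr = to_sexpr(ET.fromstring(markup))

print(markup, len(markup))
print(sexpr, len(sexpr))
```

The parenthesized form is shorter because the closing parenthesis does not repeat the element name, which is the encoding efficiency concern mentioned above; the markup form, in turn, makes it explicit which element each end tag closes.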

To enable automation of document processing, however, some standard structure was needed. In the same way that forms enable organizations to improve the efficiency of their operations, document structure enables computers to process documents automatically. As it turns out, SGML already induces an element tree structure with a single root element. Additional constraints on the structure that enable interoperability are specified by the Document Type Definition (DTD), which associates with each element the list of its attributes and the list of its possible child elements. As with forms, knowledge of the DTD enables writing efficient programs, such as search engines, that process large volumes of documents.
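The role of the DTD can be illustrated with a toy checker. This is a sketch under stated assumptions: the CONSTRAINTS table, the element names doc and para, and the check function are hypothetical, and the table captures only allowed attributes and allowed child elements. A real DTD is written in its own declaration syntax and also constrains the order and number of children.

```python
import xml.etree.ElementTree as ET

# Hypothetical constraints in the spirit of a DTD: for each element name,
# the attributes it may carry and the child elements it may contain.
CONSTRAINTS = {
    "doc": {"attributes": set(), "children": {"para"}},
    "para": {"attributes": {"align"}, "children": {"paren-name"}},
    "paren-name": {"attributes": {"color"}, "children": set()},
}

def check(element, constraints):
    """Return a list of violations found in the subtree rooted at element."""
    rule = constraints.get(element.tag)
    if rule is None:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for name in element.attrib:
        if name not in rule["attributes"]:
            problems.append(f"<{element.tag}> has unexpected attribute '{name}'")
    for child in element:
        if child.tag not in rule["children"]:
            # A disallowed child is reported once; its subtree is not descended.
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(check(child, constraints))
    return problems

valid = ET.fromstring(
    "<doc><para align='left'>Text "
    "<paren-name color='red'>see Figure 5</paren-name></para></doc>"
)
invalid = ET.fromstring("<doc><paren-name size='9'>see Figure 5</paren-name></doc>")

print(check(valid, CONSTRAINTS))
print(check(invalid, CONSTRAINTS))
```

Because the checker relies only on the table and never on the particular document, the same program can process any number of documents that share the same constraints, which is the efficiency argument made above.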

The parts of a document written in an SGML-based language are as follows:

  • Declaration: Specifies the characters and delimiters to be used by the document.

  • DTD: Specifies constraints on the syntax of the document in terms of the elements, their attributes, and their tree structure.

  • Semantics: Describes the semantics of the markup and introduces additional constraints beyond the DTD.

  • Instance: The document itself, the syntax and semantics of which are specified in the preceding parts.

Typically, for documents written in a given markup language, the first three parts are fixed. In other words, two documents in which the first three parts are identical are written in the same language; the reverse, however, might not be true, as it is possible for a markup language to use multiple character sets or DTDs.

5.1.3 Architecture

SGML processors have a three-layered architecture, depicted in Figure 5.1. In the first layer, the parser takes as input a markup document and, optionally, a DTD, and produces a document structure, which is typically the DOM [DOM]. In the second layer, the document structure is fed into flow analysis, resulting in a flow tree. In the third layer, the flow tree is fed into a layout engine, resulting in a layout structure known as an area tree. This structure is a tree in which parent rectangles typically subsume their children's rectangles, and sibling rectangles do not overlap. Subsequently, each leaf in the area tree is rendered by a media handler; the media handler assigned to each area, and the rendering it performs, vary according to the type of content within the rectangle. Finally, the remaining details of the GUI are drawn on the graphics device.

Figure 5.1. A simplified conceptual architecture of an SGML document processor.
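The layers described above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the Area class, the functions parse, flow, layout, and render, the LINE_HEIGHT and INDENT constants, and the rule of stacking children vertically are hypothetical simplifications, and the parser is Python's standard xml.dom.minidom rather than an SGML parser.

```python
from dataclasses import dataclass, field
from xml.dom import minidom

LINE_HEIGHT = 10  # hypothetical height of one line of text
INDENT = 5        # hypothetical inset of children within their parent

@dataclass
class Area:
    """A rectangle in the area tree."""
    name: str
    x: int
    y: int
    width: int
    height: int
    children: list = field(default_factory=list)

def parse(markup):
    """Layer 1: the parser produces a DOM document structure."""
    return minidom.parseString(markup).documentElement

def flow(node):
    """Layer 2 (toy flow analysis): keep elements and non-blank text, in order."""
    if node.nodeType == node.TEXT_NODE:
        text = node.data.strip()
        return ("text", text, []) if text else None
    if node.nodeType != node.ELEMENT_NODE:
        return None  # comments and other node types are dropped
    kids = [f for f in map(flow, node.childNodes) if f is not None]
    return ("box", node.tagName, kids)

def layout(item, x, y, width):
    """Layer 3 (toy layout engine): stack children vertically inside the parent."""
    kind, name, kids = item
    if kind == "text":
        return Area(name, x, y, width, LINE_HEIGHT)
    area = Area(name, x, y, width, 0)
    cursor = y
    for kid in kids:
        child = layout(kid, x + INDENT, cursor, width - 2 * INDENT)
        area.children.append(child)
        cursor += child.height  # next sibling starts below, so siblings never overlap
    area.height = cursor - y    # parent grows to subsume all of its children
    return area

def render(area):
    """Stand-in for media handlers: describe each leaf area as a line of text."""
    if not area.children:
        return [f"{area.name!r} at ({area.x},{area.y}) size {area.width}x{area.height}"]
    lines = []
    for child in area.children:
        lines.extend(render(child))
    return lines

markup = "<doc><para>first</para><para>second <em>third</em></para></doc>"
area_tree = layout(flow(parse(markup)), x=0, y=0, width=100)
for line in render(area_tree):
    print(line)
```

Each stage consumes only the output of the previous one, so a different layout rule or a different set of media handlers could be substituted without touching the parser.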



ITV Handbook: Technologies and Standards
ISBN: 0131003127
Year: 2003
Pages: 170
