4.3 Element Type Declarations


Elements are the main structural units of an XML document. An element's content is declared as a group of one or more subelements or subgroups, as character data, as EMPTY, or as ANY. Element type declarations identify the names of elements, the nature of their content, and how to use them. They have the following generic syntax:

 <!ELEMENT   elementName   elementContents > 

Sometimes the content is text. At other times, the content consists of other elements that are arranged in a certain order or used a certain number of times. The list of contents in an element type declaration is called the content model.

4.3.1 Element Structures

The element rules build a hierarchy of elements that describe how one element relates to another element. XML developers use a variety of names to describe the various relationships between elements. For example, elements can be referred to as subelements, parents, children, siblings, ancestors, descendants, trees, branches, leaves, or roots. All of these are tree terms.

A parent-child relationship exists when an element type declaration gives the name of the element and the children that element may have. The content model portion of the element declaration defines the parent-child relationship. The DTD can specify the precise ordering of the child elements in the document and the number of times that the document can contain the child element. Similarly, the DTD may group elements to create more detailed rules.

When an element is contained within another element, it is referred to as a descendant of that element, and the containing element is referred to as its ancestor. Thus the root element is the ancestor of all other elements in the document. XML does not require you to declare the root element first, because element type declarations may appear in any order, but listing it first makes the hierarchy easier to follow.
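
The tree terms above can be made concrete with a short Python sketch that uses the standard library's xml.etree.ElementTree module. The instance document and its element names (Book, Chapter, TITLE, Para) are hypothetical and chosen only for illustration; the helper functions are not part of any XML API.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the element names are illustrative only.
doc = ET.fromstring("<Book><Chapter><TITLE>One</TITLE><Para/></Chapter></Book>")

def descendants(element):
    """Return the names of every element nested inside `element`, at any depth."""
    return [node.tag for node in element.iter() if node is not element]

def ancestors(root, target_tag):
    """Return the names of the elements enclosing the first `target_tag`, outermost first."""
    def walk(node, path):
        if node.tag == target_tag:
            return path
        for child in node:
            found = walk(child, path + [node.tag])
            if found is not None:
                return found
        return None
    return walk(root, [])

# Children are the direct descendants only.
children_of_chapter = [child.tag for child in doc.find("Chapter")]
```

Here Book is the root and therefore appears in the ancestor list of every other element, while TITLE and Para are siblings because they share the parent Chapter.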

4.3.2 Element Content Models

Content models describe the relationship of elements and child elements by using keywords and symbols. Table 4-1 lists the four types of element content models, which indicate what the element may contain.

Elements that you define as character data or EMPTY constitute terminals, so they can have no further descendants. For example:

 <!-- Element A is a nonterminal. -->
 <!ELEMENT A (B)>
 <!-- Element B is a terminal. -->
 <!ELEMENT B (#PCDATA)>
 <!-- Element C is a terminal. -->
 <!ELEMENT C EMPTY>

Table 4-1. Element Content Model

Element Content     Meaning
(other elements)    A list of elements, enclosed in parentheses, that can be nested within this element.
ANY                 This element may contain character data and zero or more elements declared in this DTD, in any combination. Form: <!ELEMENT elementName ANY>
EMPTY               This element contains no data or elements. Form: <!ELEMENT elementName EMPTY>
#PCDATA             This element contains parsed character data. Form: <!ELEMENT elementName (#PCDATA)>

The ANY Content Model

The keyword ANY is shorthand for mixed content that can contain all declared elements from the DTD. Although the ANY model is very useful, excessive use of this content model can make it difficult to limit document structures. XML document designers can use ANY as a placeholder or where extensibility is important. Listing ANY is useful for root elements of unstructured documents. For example, your DTD might include the following element type declaration:

 <!DOCTYPE Chapter [
    <!ELEMENT Chapters ANY>
    <!ELEMENT Chapter (#PCDATA | NUMBER | TITLE)*>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT TITLE (#PCDATA)>
 ]>

Your XML code would then include an element such as the following:

 <Chapter>
    <NUMBER>10 </NUMBER>
    <TITLE> Cliché </TITLE>
    All good things come to those who wait
 </Chapter>

The EMPTY Content Model

You use the keyword EMPTY to declare an empty element. EMPTY means that the element has no child elements or character data. Such an element can carry attributes but has no content. You can use an EMPTY element as a flag. Declaring an element to be EMPTY means that all instances of it must be empty. Note that an element declared with #PCDATA or with optional child elements may also happen to be empty in a particular instance, even though it is not declared EMPTY. With the EMPTY content model, your DTD might include the following element declaration:

 <!ELEMENT Part EMPTY> 

Your XML code would then include an element such as the following:

 <Part/> 
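
As a rough illustration of what "empty" means to a parser, the following Python sketch uses the standard xml.etree.ElementTree module. Note that ElementTree does not validate against a DTD, so the sketch only inspects what the parser reports; the id attribute and the is_empty helper are hypothetical additions for this example.

```python
import xml.etree.ElementTree as ET

def is_empty(element):
    """True when the element has no child elements and no character data."""
    return len(element) == 0 and element.text is None

# The empty-element tag and a start tag immediately followed by an end tag
# are two spellings of the same empty element.
short_form = ET.fromstring('<Part id="p1"/>')
long_form = ET.fromstring('<Part id="p1"></Part>')

# A single space is character data, so this instance is not empty.
with_space = ET.fromstring('<Part id="p1"> </Part>')
```

Both empty forms still expose their attributes, which is why an EMPTY element can serve as a flag or carry data in attribute values.
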
The #PCDATA Content Model

The presence of #PCDATA in an element type declaration means that the element can contain any valid character data. PCDATA is text occurring in a context in which markup and entity references such as "&amp;" may occur. Apart from the requirement that literal "<" and "&" characters be escaped, no restriction constrains what the text can contain. For example, you might declare elements containing character data as containing only character data:

 <!ELEMENT A (#PCDATA)> 

Alternatively, you might declare them as containing a mixture of character data and elements:

 <!ELEMENT A (#PCDATA | B | C)*> 

The term "PCDATA," which stands for "Parsed Character DATA," is inherited from SGML. It means that the XML processor parses the text that follows the element's start tag, looking for further markup.
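
To see what "parsed" means in practice, the following Python sketch feeds a hypothetical instance of the mixed-content element A to the standard xml.etree.ElementTree parser. The sample text and the is_well_formed helper are assumptions made for illustration; ElementTree does not check the instance against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of <!ELEMENT A (#PCDATA | B | C)*>.
a = ET.fromstring("<A>salt &amp; pepper <B>bold</B> and <C/> more</A>")

# Character data before the first child element, with &amp; already resolved.
leading_text = a.text
# Child elements the parser found inside the character data, in document order.
child_names = [child.tag for child in a]
# Character data that follows each child element.
trailing_text = [child.tail for child in a]
# All character data in document order, including text inside children.
all_text = "".join(a.itertext())

def is_well_formed(text):
    """Report whether the parser accepts `text` as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Because the parser scans the text for markup, an unescaped ampersand such as the one in <A>salt & pepper</A> is rejected, whereas the escaped form is delivered to the application as a plain "&".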


A computer finds "mixed content" (content in which both PCDATA and elements are allowed) inherently more complex to parse than either element-only or PCDATA-only content. Furthermore, if mixed content did not exist, then XML could have rules saying the following:

  • Arbitrary white space (spaces, new lines, and so on) is always allowed between two start tags, between an end tag and a start tag, and between two end tags.

  • A parser never gives this white space to an application.

In other words, it would be easy to format XML with pretty indentation without having to worry about changing significant white space or breaking signatures. The possibility of mixed content, together with the requirement that all white space in content be given to the application, makes white space problematic, however. See Chapter 9 for further discussion of this problem.
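
The effect of indentation can be observed with a small Python sketch based on the standard xml.etree.ElementTree module. The two documents below are hypothetical and differ only in white space between tags; the character_data helper is an illustrative name, not a library function.

```python
import xml.etree.ElementTree as ET

# The same hypothetical element structure, once compact and once indented.
compact = ET.fromstring("<A><B>x</B><C/></A>")
indented = ET.fromstring("<A>\n  <B>x</B>\n  <C/>\n</A>")

def character_data(element):
    """Collect every run of character data the parser reports under `element`."""
    return list(element.itertext())
```

The compact form yields only the text "x", while the indented form also yields the line breaks and spaces between the tags. Anything computed over the content, such as a digest for a signature, can therefore differ between the two.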


4.3.3 Frequency Indicators

The XML Recommendation [XML] specifies optional characters that follow an element name or list and that govern the frequency of that element or list item in the document. Table 4-2 lists the frequency indicators that can apply to an element content model. The absence of such an operator means that the element or content particle must appear exactly once.

The XML Recommendation allows at most one frequency indicator after each element name or parenthesized group. You can, however, use frequency indicators on the items within a group and again on the group as a whole, which makes it possible to nest groups of elements.
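
Frequency indicators behave much like the repetition operators of regular expressions. The following Python sketch hand-translates one content model into a pattern for the standard re module and tests lists of child element names against it. The ARTICLE declaration and its child names are hypothetical, and the translation is only an illustration of the idea, not how an XML processor is required to work.

```python
import re

# Hypothetical declaration, written as it would appear in a DTD:
#   <!ELEMENT ARTICLE (TITLE, AUTHOR+, ABSTRACT?, (HEADING, PARA*)*)>
# Hand-translated: each child name becomes a token followed by one space,
# and the frequency indicators carry over unchanged.
ARTICLE = re.compile(r"TITLE (AUTHOR )+(ABSTRACT )?(HEADING (PARA )*)*")

def allowed(children):
    """Check a list of child element names against the ARTICLE model."""
    return ARTICLE.fullmatch("".join(name + " " for name in children)) is not None
```

For instance, allowed(["TITLE", "AUTHOR"]) holds because ABSTRACT and the repeated group are optional, whereas allowed(["TITLE"]) fails because the plus sign requires at least one AUTHOR. The inner asterisk on PARA and the outer asterisk on the group show indicators used both within a group and on the group itself.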

4.3.4 Multiple Elements Within an Element

Generally, content models are built on a grouping of multiple elements. You can group elements by sequence, by alternative, or by both means. Table 4-3 lists the symbols that you can use to order multiple elements within an element's contents. You use the symbols to separate the list of child elements. Elements that contain only other elements have element-only content; elements that contain both other elements and #PCDATA have mixed content.

Table 4-2. Element Frequency Indicators

Indicator            Meaning
(none)               This element must appear once and only once.
+ (plus sign)        This element can occur once or several times.
* (asterisk)         This element can occur zero, one, or several times.
? (question mark)    This element can occur zero times or once.

Table 4-3. Ordering Multiple Elements

Symbol              Meaning
| (vertical bar)    Select one from several elements: OR, as in "this or that." For example, (THIS | THAT | THOSE) means that only one of the choices of THIS, THAT, or THOSE can occur.
, (comma)           Each subsequent element follows the preceding element. Strictly ordered; analogous to an AND. For example, (YEAR, MONTH, DAY) means that a YEAR element must be followed by a MONTH element, followed by a DAY element. (YEAR, MONTH, DAY, DAY) means the same except that the content must contain exactly two DAY elements.
(space)             Not a separator in XML. White space may surround a comma or vertical bar, but a list of names separated only by spaces, such as (ElementA ElementB ElementC), is not a legal content model. XML has no "any order" connector (the SGML "&" connector was not carried over). To accept elements in any order, use a repeated choice such as (ElementA | ElementB | ElementC)*, which no longer enforces that each element appears exactly once.
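
The comma and the vertical bar correspond to concatenation and alternation in regular expressions. The Python sketch below hand-translates a few content models into patterns for the standard re module. The model helper, the QUARTER element, and the A, B, and C names are hypothetical and serve only to illustrate the connectors.

```python
import re

def model(pattern):
    """Compile a hand-translated content model; child names are space-terminated tokens."""
    compiled = re.compile(pattern)
    return lambda children: compiled.fullmatch("".join(n + " " for n in children)) is not None

# (YEAR, MONTH, DAY): a strict sequence.
date = model(r"YEAR MONTH DAY ")
# (THIS | THAT | THOSE): exactly one of the alternatives.
pick_one = model(r"(THIS |THAT |THOSE )")
# (YEAR, (MONTH | QUARTER), DAY?): a hypothetical model combining both connectors.
combined = model(r"YEAR (MONTH |QUARTER )(DAY )?")
# (A | B | C)*: a repeated choice accepts the names in any order, any number of times.
any_order = model(r"(A |B |C )*")
```

With these definitions, date(["MONTH", "YEAR", "DAY"]) fails because the sequence is out of order, and pick_one(["THIS", "THAT"]) fails because a choice admits only one alternative.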

4.3.5 General Guidelines for Element Type Declarations

The content model (everything inside the parentheses) describes what an element may contain. The ANY content model can prove useful during document conversion, but you should avoid using it in a production environment because it disables content checking in the affected element.

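  • If the content model is mixed, place #PCDATA first and separate the element names that follow with vertical bars. When any element names are listed, the group must end with an asterisk, as in (#PCDATA | B | C)*, so every item in it is optional and repeatable.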

  • Begin element names with either a letter, an underscore ("_"), or a colon (":"). Follow that character with a combination of letters, digits, periods ("."), colons, underscores, or hyphens ("-"). Do not include any white space. No element name should begin with "xml" in any capitalization. While technically the first character in a name may be a colon, in practice you should use colons only with namespaces, as explained in Chapter 3.

  • To make processing easier and the DTD easier to read, use parentheses to group elements into recognizable sets.

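  • Do not separate child elements with spaces alone. XML provides no separator for child elements that may appear in any order, and a space-separated list such as (FIRST SECOND THIRD) is not a legal content model. To allow any order, use a repeated choice such as (FIRST | SECOND | THIRD)*, bearing in mind that it no longer controls how many times each child appears.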

  • Separate child elements with commas to force them to appear in a specific order in the XML document; for example, (FIRST, SECOND, THIRD).

  • To indicate a choice, separate child elements with vertical bars ("|"); for example, (FIRST | SECOND | THIRD).
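
The naming guideline above can be expressed as a short check. The following Python sketch is an ASCII-only simplification: the XML Recommendation also admits many non-ASCII letters in names, which this pattern does not cover, and the function name acceptable_element_name is hypothetical.

```python
import re

# ASCII-only simplification of the naming rule; the XML Recommendation also
# admits many non-ASCII letters, which this sketch does not cover.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

def acceptable_element_name(name):
    """Apply the naming guideline: legal characters only, and no reserved "xml" prefix."""
    if NAME.fullmatch(name) is None:
        return False
    return not name.lower().startswith("xml")
```

Under this check, "Chapter" and "_part-2.a" are accepted, while "2ndPart" (leading digit), "my element" (white space), and "XmlData" (reserved prefix) are rejected.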



Secure XML: The New Syntax for Signatures and Encryption
ISBN: 0201756056
Year: 2005
Pages: 186
