Elements are the main structural units of an XML document. An element's content is defined as a group of one or more subelements or subgroups, character data, EMPTY, or ANY. Element type declarations identify the names of elements, the nature of their content, and how to use them. They have the following generic syntax:

    <!ELEMENT elementName elementContents>

Sometimes the content is text. At other times, the content consists of other elements that are arranged in a certain order or used a certain number of times. The list of contents in an element type declaration is called the content model.

4.3.1 Element Structures

The element rules build a hierarchy of elements that describes how one element relates to another. XML developers use a variety of names to describe the various relationships between elements. For example, elements can be referred to as subelements, parents, children, siblings, ancestors, descendants, trees, branches, leaves, or roots. All of these terms come from tree structures.

A parent-child relationship exists when an element type declaration gives the name of the element and the children that element may have. The content model portion of the element declaration defines the parent-child relationship. The DTD can specify the precise ordering of the child elements in the document and the number of times that the document can contain the child element. Similarly, the DTD may group elements to create more detailed rules.

When an element is contained within another element, at any depth, it is referred to as a descendant of that element, and the containing element is referred to as its ancestor. Thus the root element is the ancestor of all other elements in the document. By convention, you list the root element first in the DTD; for a valid document, the name in the document type declaration must match the root element.

4.3.2 Element Content Models

Content models describe the relationship of elements and child elements by using keywords and symbols. Table 4-1 lists the three types of element content models that are indicators for what the element may contain.
Elements that you define as character data or EMPTY constitute terminals, so they can have no further descendants. For example:

    <!-- Element A is a nonterminal. -->
    <!ELEMENT A (B)>
    <!-- Element B is a terminal. -->
    <!ELEMENT B (#PCDATA)>
    <!-- Element C is a terminal. -->
    <!ELEMENT C EMPTY>
The ANY Content Model

The keyword ANY is shorthand for mixed content that can contain all declared elements from the DTD. Although the ANY model is very useful, excessive use of this content model can make it difficult to limit document structures. XML document designers can use ANY as a placeholder or where extensibility is important. ANY is also useful for root elements of unstructured documents. For example, your DTD might include the following element type declarations:

    <!DOCTYPE Chapter [
    <!ELEMENT Chapters ANY>
    <!ELEMENT Chapter (#PCDATA | NUMBER | TITLE)*>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT TITLE (#PCDATA)>
    ]>

Your XML code would then include an element such as the following:

    <Chapter>
      <NUMBER>10 </NUMBER>
      <TITLE> Cliché </TITLE>
      All good things come to those who wait
    </Chapter>

The EMPTY Content Model

You use the keyword EMPTY to declare an empty element. EMPTY means that the element has no child elements or character data. Such an element can carry attributes but no content. You can use an EMPTY element as a flag. Declaring an element to be EMPTY means that all instances of it must be empty. Note that an element declared with #PCDATA or optional child elements may also be empty in a particular instance. With the EMPTY content model, your DTD might include the following element declaration:

    <!ELEMENT Part EMPTY>

Your XML code would then include an element such as the following:

    <Part/>

The #PCDATA Content Model

The presence of #PCDATA in an element type declaration means that the element can contain any valid character data. PCDATA is text occurring in a context in which markup and entity references such as "&amp;" may occur. Beyond that, no restriction constrains what the text can contain.
For example, you might declare an element as containing only character data:

    <!ELEMENT A (#PCDATA)>

Alternatively, you might declare it as containing a mixture of character data and elements:

    <!ELEMENT A (#PCDATA | B | C)*>

The term "PCDATA," which stands for "Parsed Character DATA," is inherited from SGML. It means that the XML processor parses the text that follows the element's start tag, looking for further markup.
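To make the mixed-content declaration concrete, the following sketch pairs it with a matching instance in a single document. The EMPTY declarations for B and C are assumptions added for illustration only; the text above does not say how those elements are declared.

```xml
<?xml version="1.0"?>
<!DOCTYPE A [
  <!-- A holds character data interleaved with any number of B and C children. -->
  <!ELEMENT A (#PCDATA | B | C)*>
  <!-- B and C are hypothetical terminals, declared EMPTY for this sketch. -->
  <!ELEMENT B EMPTY>
  <!ELEMENT C EMPTY>
]>
<A>Some text, <B/> an ampersand written as &amp; and <C/> more text.</A>
```

In a mixed-content declaration, #PCDATA comes first and the group ends with an asterisk, so the DTD can name which child elements may appear but cannot fix their order or count.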
4.3.3 Frequency Indicators

The XML Recommendation [XML] specifies optional characters that follow an element name or list and that govern the frequency of that element or list item in the document. Table 4-2 lists the frequency indicators that can apply to an element content model. The absence of such an indicator means that the element or content particle must appear exactly once. The XML Recommendation requires that you use only one frequency indicator with each element name or group in parentheses. You can, however, use frequency indicators within a group and again on the group as a whole, which makes it possible to nest groups of elements.

4.3.4 Multiple Elements Within an Element

Generally, content models are built on a grouping of multiple elements. You can group elements by sequence, by alternative, or by both means. Table 4-3 lists the symbols that you can use to order multiple elements within an element's contents. You use the symbols to separate the list of child elements. Elements that contain only other elements have element-only content; elements that contain both other elements and #PCDATA have mixed content.
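The frequency indicators and grouping symbols can be sketched together in one element-only content model. In XML 1.0, "?" means zero or one, "*" means zero or more, and "+" means one or more; a comma separates items that must appear in sequence, and "|" separates alternatives. The element names below (Book, Title, Subtitle, Author, Chapter, Appendix, Index) are hypothetical and chosen only for this illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!-- Sequence: exactly one Title, an optional Subtitle, one or more Authors,
       any number of Chapter or Appendix elements in any order,
       then an optional Index. -->
  <!ELEMENT Book (Title, Subtitle?, Author+, (Chapter | Appendix)*, Index?)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Subtitle (#PCDATA)>
  <!ELEMENT Author (#PCDATA)>
  <!ELEMENT Chapter (#PCDATA)>
  <!ELEMENT Appendix (#PCDATA)>
  <!ELEMENT Index EMPTY>
]>
<Book>
  <Title>Example Title</Title>
  <Author>First Author</Author>
  <Author>Second Author</Author>
  <Chapter>Chapter text</Chapter>
  <Appendix>Appendix text</Appendix>
  <Chapter>More chapter text</Chapter>
</Book>
```

The instance omits Subtitle and Index, which the "?" indicator permits, and repeats the nested choice group three times, which the "*" on the group permits. Title carries no indicator, so it must appear exactly once.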
4.3.5 General Guidelines for Element Type Declarations

Element content models (everything inside the parentheses) describe what each element may contain. The ANY content model can prove useful during document conversion, but you should avoid using it in a production environment because it disables content checking in the affected element.
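As a sketch of this guideline, an element that was declared with ANY during conversion can be given an explicit content model once its structure is known, so that a validating processor checks its children again. The element names Record, Id, and Note are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE Record [
  <!-- During conversion, Record might have been declared with the ANY keyword. -->
  <!-- Production form: exactly one Id followed by zero or more Note elements. -->
  <!ELEMENT Record (Id, Note*)>
  <!ELEMENT Id (#PCDATA)>
  <!ELEMENT Note (#PCDATA)>
]>
<Record>
  <Id>42</Id>
  <Note>Checked against the explicit content model.</Note>
</Record>
```

With the explicit model in place, a Record that lacks an Id or contains an undeclared child fails validation, whereas under ANY any mix of declared elements and text would have been accepted.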