The ELEMENT Statement

For a document to be valid, every element it uses must be declared in the DTD with an <!ELEMENT> declaration. The format for declaring an element in a DTD is shown here:

 <!ELEMENT ElementName Rule> 

The Rule component defines the rule for the content contained in the element. These rules define the logical structure of the XML document and can be used to check the document's validity. The rule can be one of the predefined content declarations, a list of one or more child elements (either in a fixed sequence or as a set of alternatives, optionally grouped with parentheses), or a combination of text and child elements.

The Predefined Content Declarations

Three generic content declarations are predefined for XML DTDs: PCDATA, ANY, and EMPTY.

PCDATA

The PCDATA declaration can be used when the content within an element is only text—that is, when the content contains no child elements. Our sample document contains several such elements, including title, a, h1, and b. These elements can be declared as follows. (The pound sign identifies a special predefined name.)

 <!ELEMENT title (#PCDATA)>
 <!ELEMENT a (#PCDATA)>
 <!ELEMENT h1 (#PCDATA)>
 <!ELEMENT b (#PCDATA)>

NOTE
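An element declared as (#PCDATA) can also be empty. Both <title></title> and <title/> satisfy the declaration for title, because #PCDATA allows zero or more characters of text.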
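For an element declared as (#PCDATA), a validator's check amounts to confirming that the element has no child elements. Any text is acceptable, including none at all. The following Python sketch illustrates that single rule with the standard library's xml.etree.ElementTree. The function name is_pcdata_only is hypothetical, and the sketch ignores everything else a real validating parser checks.

```python
import xml.etree.ElementTree as ET

def is_pcdata_only(element: ET.Element) -> bool:
    """Return True if the element satisfies a (#PCDATA) declaration.

    (#PCDATA) allows character data only, so the element must have no
    child elements. Empty content is also acceptable.
    """
    # len() on an Element counts its direct child elements.
    return len(element) == 0

print(is_pcdata_only(ET.fromstring("<title>Northwind Traders</title>")))  # True
print(is_pcdata_only(ET.fromstring("<title/>")))                          # True
print(is_pcdata_only(ET.fromstring("<h1>Welcome <b>Home</b></h1>")))      # False
```

The last call fails because h1 contains a b child element, which a (#PCDATA) declaration does not permit.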

ANY

An element declared with ANY can contain text, child elements, or both, in any order. The html element, for example, could use the ANY declaration as follows:

 <!ELEMENT html ANY> 

This ANY declaration would allow the body and head elements to be included in the html element in an XML document:

 <html><head/><body/></html> 

The following XML would also be valid:

 <html>This is an HTML document.<head/><body/></html> 

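The following XML would also be valid, provided that AnotherTag is declared elsewhere in the DTD (under XML 1.0, every child element must have its own declaration, even inside ANY content):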

 <html>This is an HTML document.<head/><body/><AnotherTag/></html> 

The ANY declaration allows any mix of text and declared elements to appear between the element's tags, provided the content is well-formed XML. Although this flexibility might seem useful, it defeats the purpose of the DTD, which is to define the structure of the XML document so that the document can be validated. In brief, the structure of an element that uses ANY cannot be validated. A parser can check only that its content is well formed and that its child elements are declared.
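Under XML 1.0, the only remaining constraint on ANY content is that each child element's type must be declared. Text and children can otherwise appear in any order and number. The following Python sketch illustrates that one check. The DECLARED set is a hypothetical stand-in for the names a parsed DTD would provide, and any_content_ok is an illustrative helper, not a library function.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of element names declared in the DTD.
DECLARED = {"html", "head", "body", "title", "base", "basefont", "a", "table"}

def any_content_ok(element: ET.Element, declared: set[str]) -> bool:
    """Check an element declared ANY: each child element must be declared.

    Order, count, and surrounding text are not constrained.
    """
    return all(child.tag in declared for child in element)

doc = ET.fromstring(
    "<html>This is an HTML document.<head/><body/><AnotherTag/></html>"
)
print(any_content_ok(doc, DECLARED))                   # False: AnotherTag is undeclared
print(any_content_ok(doc, DECLARED | {"AnotherTag"}))  # True once it is declared
```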

EMPTY

It is possible for an element to have no content—that is, no child elements or text. The img element is an example of this scenario. The following is its definition:

 <!ELEMENT img EMPTY> 

The base, br, and basefont elements are also correctly declared using EMPTY in our sample DTD.
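XML 1.0 defines EMPTY strictly: the element must have no content at all, not even white space. Both <img/> and <img></img> qualify. A minimal Python sketch of that check follows. The helper name is hypothetical. ElementTree's default parser discards comments and processing instructions, so unlike a real validator, this sketch cannot detect them inside the element.

```python
import xml.etree.ElementTree as ET

def is_empty_content(element: ET.Element) -> bool:
    """Check an element declared EMPTY: no child elements and no text.

    Comments and processing instructions are dropped by the default
    parser, so they are not detected here.
    """
    # element.text is None (or "") when nothing appears between the tags.
    return len(element) == 0 and not element.text

print(is_empty_content(ET.fromstring('<img src="Northwind.jpg"/>')))        # True
print(is_empty_content(ET.fromstring('<img src="Northwind.jpg"></img>')))   # True
print(is_empty_content(ET.fromstring('<img src="Northwind.jpg"> </img>')))  # False
```

The last call fails because the single space between the tags counts as content.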

One or More Elements

Instead of using the ANY declaration for the html element, you should define the content so that the html element can be validated. The following is a declaration that specifies the content of the html element and is the same as the one given by XML Authority:

 <!ELEMENT html (head, body)> 

This (head, body) declaration signifies that the html element will have two child elements: head and body. You can list one child element within the parentheses or as many child elements as are required. You must separate each child element in your declaration with a comma.

For the XML document to be valid, the order in which the child elements are declared must match the order of the elements in the XML document. Each comma is read as "followed by," so the preceding declaration tells us that the html element will have a head child element followed by a body child element. Building on the preceding declaration, the following is valid XML:

 <html><head></head><body/></html> 

However, the following statement would not be valid:

 <html><body></body><head/></html> 

This markup is invalid because the declaration requires the html element to contain exactly two child elements, head followed by body, with one instance of each. Here, body comes before head.

The following two statements would also be invalid:

 <html><body></body></html>
 <html><head/><body/><head/><body/></html>

The first statement is missing the head element, and in the second statement the head and body elements are listed twice.
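A content model behaves much like a regular expression over child-element names. The ordering rules above can therefore be illustrated by translating the model into a Python regular expression and matching it against the sequence of child tags. This is an illustrative modeling choice, not a description of how any particular parser works. In the sketch, a comma becomes concatenation, a vertical bar becomes alternation, and ?, +, and * keep their meaning. The helper names are hypothetical, and #PCDATA is not handled.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model: str) -> str:
    """Translate an element-content model such as "(head, body)" into a regex.

    Each child name becomes the token "<name>", so a run of children can be
    matched as one string. Assumes a valid DTD model, in which "," and "|"
    are never mixed inside the same group.
    """
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == "(":
            parts.append("(?:")                # DTD group -> non-capturing group
        elif token == ",":
            continue                           # sequence -> plain concatenation
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)                # same meaning in both notations
        else:
            # Wrap each name so a following marker applies to the whole token.
            parts.append("(?:<" + re.escape(token) + ">)")
    return "".join(parts)

def children_match(xml_text: str, model: str) -> bool:
    """Return True if the root element's child sequence matches the model.

    Only the names of direct child elements are compared; text is ignored.
    """
    root = ET.fromstring(xml_text)
    sequence = "".join("<" + child.tag + ">" for child in root)
    return re.fullmatch(model_to_regex(model), sequence) is not None

samples = [
    "<html><head></head><body/></html>",
    "<html><body></body><head/></html>",
    "<html><body></body></html>",
    "<html><head/><body/><head/><body/></html>",
]
for sample in samples:
    print(children_match(sample, "(head, body)"), sample)
```

Only the first sample prints True. The other three fail for the reasons given above: wrong order, a missing head element, and repeated elements.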

Reoccurrence

You will want every html element to include one head and one body child element, in the order listed. Other elements, such as the body and table elements, will have child elements that might be included multiple times within the main element or might not be included at all. XML provides three markers that can be used to indicate the reoccurrence of a child element, as shown in the following table:

XML Element Markers

Marker   Meaning
?        The element either does not appear or appears only once (0 or 1).
+        The element must appear at least once (1 or more).
*        The element can appear any number of times, or it might not appear at all (0 or more).

Putting no marker after the child element indicates that the element must be included and that it can appear only one time.
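The table above can be restated as minimum and maximum occurrence counts. The Python sketch below considers one child element in isolation, which is a simplification: in a full content model, the child's position matters too. The OCCURRENCES mapping and the occurrence_allowed function are hypothetical names used only for illustration.

```python
import math

# Minimum and maximum occurrences for each DTD occurrence marker.
# "" stands for "no marker": the child is required exactly once.
OCCURRENCES = {
    "": (1, 1),
    "?": (0, 1),
    "+": (1, math.inf),
    "*": (0, math.inf),
}

def occurrence_allowed(count: int, marker: str = "") -> bool:
    """Return True if a child appearing `count` times satisfies `marker`."""
    low, high = OCCURRENCES[marker]   # KeyError for an unknown marker
    return low <= count <= high

print(occurrence_allowed(3, "+"))  # True: a table with three rows satisfies tr+
print(occurrence_allowed(0, "+"))  # False: tr+ needs at least one row
print(occurrence_allowed(2, "?"))  # False: base? allows at most one base
```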

The head element can contain an optional base child element. To declare base as optional, follow it with a question mark in the head element's declaration:

 <!ELEMENT head (title, base?)> 

The body element can contain a basefont element and an a element, both of which are also optional. In our example, the table element is used to format the page, so it should be required and should appear exactly once in the body element. You can now rewrite the declaration of the body element as follows:

 <!ELEMENT body (basefont?, a?, table)> 

The table element can have as many rows as are needed to format the page but must include at least one row. The table element should now be written as follows:

 <!ELEMENT table (tr+)> 

The same condition holds for the tr element: each row must contain at least one cell (td element), as shown here:

 <!ELEMENT tr (td+)> 

The a, ul, and ol elements might not be included in the p element, or they might be included many times, as shown here:

 <!ELEMENT p (font+, img, br, a*, ul*, ol*)> 

Because the br element formats text around an image, the img and br tags should always be used together.

Grouping child elements

Fortunately, XML provides a way to group elements. For example, you can rewrite the p element as follows:

 <!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)> 

This declaration specifies that a group consisting of an img element, optionally followed by a br element, can appear zero or more times in the p element.

One problem remains in this declaration. As mentioned, the comma separator can be read as the words "followed by." Thus, whichever of the font, img, br, a, ul, and ol child elements appear in a p element must appear in that order. This is not exactly what you want. Instead, you want to be able to use these elements in any order, and to use some elements in some paragraphs and other elements in other paragraphs. For example, you would like to be able to write the following code:

 <p>
   <font size="5">
     <b>Three Reasons to Shop Northwind Traders</b>
   </font>
   <ol>
     <li><a href="Best.htm">Best Prices</a></li>
     <li><a href="Quality.htm">Quality</a></li>
     <li><a href="Service.htm">Fast Service</a></li>
   </ol>
   <!--The following img element is not in the correct order.-->
   <img src="Northwind.jpg"></img>
 </p>

As you can see, the img element is not in the correct order—it should precede the ol element, since the declaration imposes a strict ordering on the elements.

NOTE
Also, numerous elements are declared but are not included (for example, ul). The missing elements are not a problem because you have declared each element with an asterisk (*), indicating that there can be zero or more of each element.
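To see why a validator rejects this paragraph, the p declaration can be hand-translated into a Python regular expression over the paragraph's child-element names, written as <name> tokens. The tokenization scheme and variable names are illustrative assumptions. Only the direct children of p are examined, so text and nested elements such as li are ignored.

```python
import re
import xml.etree.ElementTree as ET

# (font*, (img, br?)*, a*, ul*, ol*) written as a regex over "<name>" tokens:
# "," becomes concatenation, and each marker keeps its meaning.
P_MODEL = re.compile(
    r"(?:<font>)*"          # font*
    r"(?:<img>(?:<br>)?)*"  # (img, br?)*
    r"(?:<a>)*"             # a*
    r"(?:<ul>)*"            # ul*
    r"(?:<ol>)*"            # ol*
)

PARAGRAPH = """<p>
  <font size="5"><b>Three Reasons to Shop Northwind Traders</b></font>
  <ol>
    <li><a href="Best.htm">Best Prices</a></li>
    <li><a href="Quality.htm">Quality</a></li>
    <li><a href="Service.htm">Fast Service</a></li>
  </ol>
  <img src="Northwind.jpg"></img>
</p>"""

children = "".join(f"<{child.tag}>" for child in ET.fromstring(PARAGRAPH))
print(children)                                          # <font><ol><img>
print(P_MODEL.fullmatch(children) is not None)           # False: img follows ol
print(P_MODEL.fullmatch("<font><img><ol>") is not None)  # True: img moved before ol
```

Because every part of the model is optional, the missing ul and a elements cause no failure. Only the position of img does.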

To allow a "reordering" of elements, you could rewrite the declaration as follows:

 <!ELEMENT p (font*, (img, br?)*, a*, ul*, ol*)+> 

The plus sign (+) at the very end of the declaration applies to the whole parenthesized group, indicating that one or more copies of the group can occur within a p element.

The preceding XML code could thus be interpreted as two sets of child elements, as shown here:

 <p>
   <!--The elements that follow are the first set of (font*, (img, br?)*, a*, ul*, ol*) elements (missing the (img, br), a, and ul elements).-->
   <font size="5">
     <b>Three Reasons to Shop Northwind Traders</b>
   </font>
   <ol>
     <li><a href="Best.htm">Best Prices</a></li>
     <li><a href="Quality.htm">Quality</a></li>
     <li><a href="Service.htm">Fast Service</a></li>
   </ol>
   <!--The img element that follows is a second set of (font*, (img, br?)*, a*, ul*, ol*) elements containing only an img element.-->
   <img src="Northwind.jpg"></img>
 </p>

This new declaration is better, but it is still awkward. Every member of the group is optional, so a single copy of the group can match nothing at all. In addition, the same child element (font, for example) can be matched either by the current copy of the group or by a new one. XML 1.0 requires content models to be deterministic, so a validating parser may reject this declaration as ambiguous. There is another option.

Creating an unordered set of child elements

In addition to using commas to separate elements, you can use a vertical bar (|). The vertical bar separates alternatives: exactly one of the listed child elements (or groups) appears at that point, not several of them. The preceding declaration can thus be rewritten as follows:

 <!ELEMENT p (font | (img, br?) | a | ul | ol)+> 

This declaration specifies that each copy of the group contains exactly one of the following: a font child element, an (img, br?) group, an a child element, a ul child element, or an ol child element. The plus sign (+) applies to the whole group, so the choice is made one or more times, and each repetition can pick a different alternative. With this declaration, you can use child elements in any order, as many times as needed.

NOTE
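On its own, a choice group such as (a | b) lets only one of its alternatives appear, and only once, so each listed element occurs either once or not at all. Adding ?, +, or * after the group overrides that limit by letting the choice be made zero or one, one or more, or zero or more times.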

According to the new declaration, our XML code will be interpreted as follows:

 <p>
   <!--First group, containing a single font element-->
   <font size="5">
     <b>Three Reasons to Shop Northwind Traders</b>
   </font>
   <!--Second group, containing the single child element ol-->
   <ol>
     <li><a href="Best.htm">Best Prices</a></li>
     <li><a href="Quality.htm">Quality</a></li>
     <li><a href="Service.htm">Fast Service</a></li>
   </ol>
   <!--Third group, containing a single child element img-->
   <img src="Northwind.jpg"></img>
 </p>
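The choice-based declaration can be hand-translated into a Python regular expression in the same way, with | becoming regex alternation and the trailing + repeating the whole choice. The pattern name, the helper function, and the abridged paragraph (one list item instead of three) are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# (font | (img, br?) | a | ul | ol)+ as a regex over "<name>" tokens:
# "|" becomes regex alternation, and the trailing + repeats the whole choice.
P_CHOICE = re.compile(r"(?:<font>|<img>(?:<br>)?|<a>|<ul>|<ol>)+")

def child_tokens(xml_text: str) -> str:
    """Join the names of the root element's direct children as <name> tokens."""
    return "".join(f"<{child.tag}>" for child in ET.fromstring(xml_text))

# An abridged version of the sample paragraph: font, then ol, then img.
paragraph = (
    '<p><font size="5"><b>Three Reasons to Shop Northwind Traders</b></font>'
    '<ol><li><a href="Best.htm">Best Prices</a></li></ol>'
    '<img src="Northwind.jpg"></img></p>'
)
print(P_CHOICE.fullmatch(child_tokens(paragraph)) is not None)       # True
print(P_CHOICE.fullmatch(child_tokens("<p></p>")) is not None)       # False: + needs one
print(P_CHOICE.fullmatch(child_tokens("<p><br/></p>")) is not None)  # False: br must follow img
```

Unlike the comma-separated version, this pattern accepts img after ol, because each repetition of the group can choose any alternative.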

Suppose you also want to include text within the p element. To do this, you need to add #PCDATA to the group. You must use the vertical bar separator, because #PCDATA cannot be combined with child elements that are separated by commas. You also cannot have a subgroup such as (img, br?) within a group that includes #PCDATA. You can solve this problem by creating a new element named ImageLink that contains the subgroup and adding it to the p element as follows:

 <!ELEMENT ImageLink (img, br?)>
 <!ELEMENT p (#PCDATA | font | ImageLink | a | ul | ol)*>

Web browsers that do not understand XML will ignore the ImageLink element. When you use #PCDATA within a group of child elements, it must be listed first and preceded by a pound sign (#). The group must also end with an asterisk (*) rather than a plus sign, so that text and the listed child elements can appear any number of times, in any order.
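The syntax rules for mixed content come from production [51] (Mixed) of the XML 1.0 specification. Under that production, #PCDATA comes first, only vertical bars separate the names, no subgroups are allowed, and a group that lists element names ends with )*. The Python sketch below transcribes that production as a regular expression. The Name pattern is a simplified ASCII approximation of XML names, \s stands in for XML white space, and the helper name is hypothetical.

```python
import re

# Simplified XML Name: a letter, "_" or ":" followed by name characters.
NAME = r"[A-Za-z_:][\w.:-]*"

# XML 1.0 production [51] (Mixed), transcribed as a regex:
#   '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'  |  '(' S? '#PCDATA' S? ')'
MIXED_DECL = re.compile(
    rf"\(\s*#PCDATA(?:\s*\|\s*{NAME})*\s*\)\*"
    rf"|\(\s*#PCDATA\s*\)"
)

def is_mixed_content_model(model: str) -> bool:
    """Return True if `model` is a syntactically valid mixed-content model."""
    return MIXED_DECL.fullmatch(model.strip()) is not None

for model in [
    "(#PCDATA | font | ImageLink | a | ul | ol)*",  # text plus named children
    "(#PCDATA)",                                    # text only
    "(#PCDATA | font | ImageLink | a | ul | ol)+",  # + is not part of the grammar
    "(font | #PCDATA)*",                            # #PCDATA must come first
    "(#PCDATA, font)*",                             # commas are not allowed
    "(#PCDATA | (img, br?))*",                      # subgroups are not allowed
]:
    print(is_mixed_content_model(model), model)
```

Only the first two models print True. This sketch checks the declaration's syntax only. It does not check whether a document's content matches the declaration.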

You can use the DTD to make certain sections of the document appear in a certain order and include a specific number of child elements (as was done with the html element). You can also create sections of the document that contain an unspecified number of child elements in any order. DTDs are extremely flexible and can enable you to develop a set of rules that matches your requirements.



Developing XML Solutions
Developing XML Solutions (DV-MPS General)
ISBN: 0735607966
Year: 2000
Pages: 115
Authors: Jake Sturm
