#PCDATA

Say that we want to let the <CUSTOMER> element store some plain text; in particular, say that we want to store the name of a customer. All nonmarkup text is referred to as parsed character data in a DTD, and it's abbreviated as #PCDATA in element declarations. Parsed character data explicitly means text that does not contain markup; it's just simple character data. (It's called "parsed" because the XML processor still scans it for markup and entity references.)

The parsed character data is where you store the actual content of the document as plain text. Note, however, that this is the only way to describe text content using DTDs; you can't say anything more about the actual type of that content.

For example, even though you might be storing numbers, that data is only plain text as far as DTDs are concerned. This lack of precision is one of the reasons that XML schemas, the alternative to DTDs, were developed. With schemas, you can specify much more about the type of data you're storing, such as whether it's in an integer, a floating point, or even a date format, and XML processors can check to make sure that the data matches the format in which it's supposed to be expressed. I'll take a look at schemas in Chapter 5, "Creating XML Schemas."
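
To give a rough idea of the difference, here is a minimal sketch of a schema fragment that assigns types to a few elements. The element names and the choice of types are assumptions made for illustration (SHIP_DATE in particular is hypothetical and does not appear in this chapter's documents); the details of schemas come later.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema fragment, for illustration only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <!-- A DTD can only say (#PCDATA) here; a schema can require an integer. -->
    <xs:element name="NUMBER" type="xs:integer"/>
    <!-- A decimal value, such as a price. -->
    <xs:element name="PRICE" type="xs:decimal"/>
    <!-- xs:date expects the form YYYY-MM-DD. -->
    <xs:element name="SHIP_DATE" type="xs:date"/>
</xs:schema>
```

With a DTD, all three of these elements would simply be declared as (#PCDATA), and any text at all would be accepted.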

Here's how I declare the <CUSTOMER> element so that it can contain PCDATA (and only PCDATA):

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
        .
        .
        .
        </CUSTOMER>
    </DOCUMENT>

Now I can add text to a <CUSTOMER> element in the document, like this:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            Sam Smith
        </CUSTOMER>
    </DOCUMENT>

Note that elements that have been declared to hold PCDATA can hold only PCDATA; you cannot, for example, place another element in the <CUSTOMER> element the way it has been declared now:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            Sam Smith
            <CREDIT_RATING>
                Lousy
            </CREDIT_RATING>
        </CUSTOMER>
    </DOCUMENT>

The content model that supports both PCDATA and other elements inside an element is called the mixed-content model, and I'll take a look at it in a few pages. (The ANY content model also lets an element hold both text and other elements, of course.)
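
As a brief preview, and only as a sketch, a mixed-content declaration that would accept the <CREDIT_RATING> element inside <CUSTOMER> might look like this. The declaration for <CREDIT_RATING> is an assumption added here so that the example is complete.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- Mixed content: text and CREDIT_RATING elements, in any order and number. -->
<!ELEMENT CUSTOMER (#PCDATA | CREDIT_RATING)*>
<!ELEMENT CREDIT_RATING (#PCDATA)>
]>
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
        <CREDIT_RATING>Lousy</CREDIT_RATING>
    </CUSTOMER>
</DOCUMENT>
```

Note that in a mixed-content declaration, #PCDATA comes first and the whole group is followed by *.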

There's another thing to note here now that we're dealing with multiple declarations: The order in which you declare elements doesn't matter, so this DTD, in which I've declared the <DOCUMENT> element after the <CUSTOMER> element, works just as well:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT CUSTOMER (#PCDATA)>
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    ]>
    <DOCUMENT>
        <CUSTOMER>
        .
        .
        .
        </CUSTOMER>
    </DOCUMENT>

Order of Element Declarations

Although the order of element declarations is not supposed to matter (and, in practice, that's the way I've always seen it), it is possible that some XML processors will demand that you declare an element before using it in another declaration.

It's also possible to declare elements in such a way that they can contain multiple children. In fact, you can specify the exact types of child elements that an element can enclose, as well as in what order those child elements must appear. I'll take a look at that now.

Dealing with Multiple Children

When you want to declare an element that can contain multiple children, you have several options. DTDs use a syntax to deal with multiple children that is much like working with regular expressions in languages such as Perl, if you're familiar with that. Here's the syntax you can use (here, a and b are child elements of the element you're declaring):

  • a+: One or more occurrences of a.

  • a*: Zero or more occurrences of a.

  • a?: a or nothing.

  • a, b: a followed by b.

  • a | b: a or b, but not both.

  • (expression): Surrounding an expression with parentheses means that it is treated as a unit and may have the suffix operators ?, *, or +.

If you're not familiar with this kind of syntax, it's not much use asking why things are set up this way; this syntax has been around a long time, and the W3C adopted it for DTDs because many people were familiar with it. If this looks totally strange to you, it's just one of the skills you'll have to master when writing DTDs, but, fortunately, it soon becomes second nature. I'll take a look at each of these possibilities in detail now.
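
Before going through the operators one by one, here is a small sketch that combines several of them in one declaration. All the element names here (CONTACTS, CONTACT, PHONE, EMAIL, NOTE) are hypothetical and are used only for this illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE CONTACTS [
<!ELEMENT CONTACTS (CONTACT)*>
<!-- One NAME, then one or more PHONE or EMAIL elements, then an optional NOTE. -->
<!ELEMENT CONTACT (NAME, (PHONE | EMAIL)+, NOTE?)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT PHONE (#PCDATA)>
<!ELEMENT EMAIL (#PCDATA)>
<!ELEMENT NOTE (#PCDATA)>
]>
<CONTACTS>
    <CONTACT>
        <NAME>Sam Smith</NAME>
        <PHONE>555-0100</PHONE>
        <EMAIL>sam@example.com</EMAIL>
    </CONTACT>
</CONTACTS>
```

Here the parentheses group the choice PHONE | EMAIL so that the + applies to the group as a whole, which is why a phone number followed by an email address is acceptable.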

One or More Children

If you have to specify that the <DOCUMENT> element can contain only between 12 and 15 <CUSTOMER> elements, you'll have a problem when working with DTDs: the DTD syntax won't allow you to do that without getting very complex. You can, however, specify that the <DOCUMENT> element must contain one or more <CUSTOMER> elements like this, using the + operator:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)+>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            Sam Smith
        </CUSTOMER>
        <CUSTOMER>
            Fred Smith
        </CUSTOMER>
    </DOCUMENT>

In this case, the XML processor now knows that you want the <DOCUMENT> element to contain one or more <CUSTOMER> elements, which makes sense if you want a useful document that actually contains some data. In this way, we've been able to specify the syntax of the <DOCUMENT> element in some more detail.
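
To see why a range such as 12 to 15 gets complex, here is one way it could be spelled out, as a sketch: twelve required <CUSTOMER> elements followed by up to three optional ones. The optional part is nested, rather than written as three separate CUSTOMER? items, on the assumption that the processor expects content models in which each element can match only one position.

```xml
<!-- Twelve required CUSTOMER elements, then zero to three more. -->
<!ELEMENT DOCUMENT (CUSTOMER, CUSTOMER, CUSTOMER, CUSTOMER,
                    CUSTOMER, CUSTOMER, CUSTOMER, CUSTOMER,
                    CUSTOMER, CUSTOMER, CUSTOMER, CUSTOMER,
                    (CUSTOMER, (CUSTOMER, CUSTOMER?)?)?)>
<!ELEMENT CUSTOMER (#PCDATA)>
```

This works, but it clearly doesn't scale; a schema can express the same limits far more directly.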

Zero or More Children

Besides specifying one or more child elements, you can declare elements so that they can enclose zero or more of a particular child element. This is useful if you want to allow an element to have a particular child element, or any number of such elements, but you don't want to force it to have that particular child element.

For example, a <CHAPTER> element might be capable of containing a <FOOTNOTE> element, or even several <FOOTNOTE> elements, but you wouldn't necessarily want to force all <CHAPTER> elements to have <FOOTNOTE> elements. Using the * operator, you can do that.

The * operator means that the indicated child element can appear any number of times in the declared element (including zero times). Here's how I indicate that the <DOCUMENT> element can contain any number of <CUSTOMER> elements:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            Sam Smith
        </CUSTOMER>
        <CUSTOMER>
            Fred Smith
        </CUSTOMER>
    </DOCUMENT>
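
The <CHAPTER> and <FOOTNOTE> case mentioned earlier could be sketched the same way. The <BOOK> root element and the sample text are assumptions added only to make the example a complete document.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE BOOK [
<!ELEMENT BOOK (CHAPTER)+>
<!-- A chapter may hold any number of footnotes, including none. -->
<!ELEMENT CHAPTER (FOOTNOTE)*>
<!ELEMENT FOOTNOTE (#PCDATA)>
]>
<BOOK>
    <CHAPTER>
        <FOOTNOTE>First note.</FOOTNOTE>
        <FOOTNOTE>Second note.</FOOTNOTE>
    </CHAPTER>
    <CHAPTER></CHAPTER>
</BOOK>
```

Both chapters satisfy the declaration: the first has two footnotes, and the second has none.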

Zero or One Child

Besides using + to specify one or more occurrences of a particular child element and * to specify zero or more occurrences of a child element, you can use ? to specify zero or one occurrence of a child element. In other words, using ? indicates that a particular child element may be present in the element you're declaring, but it need not be.

For example, a <CHAPTER> element might be capable of containing one <OPENING_QUOTATION> element, but you wouldn't necessarily want to force all <CHAPTER> elements to have an <OPENING_QUOTATION> element. Using the ? operator, you can do that.

Here's an example. In this case, I'm allowing the <DOCUMENT> element to contain only zero or one <CUSTOMER> element (rather a limited clientele):

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)?>
    <!ELEMENT CUSTOMER (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            Sam Smith
        </CUSTOMER>
    </DOCUMENT>
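
To make the limit concrete, here is a sketch of a document that the same declaration should not accept. It is well-formed, but a validating XML processor would be expected to report it as invalid because <DOCUMENT> holds two <CUSTOMER> elements.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)?>
<!ELEMENT CUSTOMER (#PCDATA)>
]>
<!-- Not valid: the ? operator allows at most one CUSTOMER element. -->
<DOCUMENT>
    <CUSTOMER>
        Sam Smith
    </CUSTOMER>
    <CUSTOMER>
        Fred Smith
    </CUSTOMER>
</DOCUMENT>
```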

We've advanced a little in DTD power now by allowing multiple child elements, but so far, we've allowed only child elements of the same type in any one declared element. That's about to change.

DTD Sequences

You can specify exactly what child elements a particular element can contain, and in what order, by using a sequence. A sequence is a comma-separated list of element names that tells the XML processor what elements must appear and in what order.

For example, say that we wanted to change the <CUSTOMER> element so that instead of containing only PCDATA, it can contain other elements. Here, I'll let the <CUSTOMER> element contain one <NAME> element, one <DATE> element, and one <ORDERS> element, in exactly that order. The resulting declaration looks like this:

 <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)> 

I can break this down further, of course. For example, I can specify that the <NAME> element must contain exactly one <LAST_NAME> element and one <FIRST_NAME> element, in that order, like this:

 <!ELEMENT NAME (LAST_NAME,FIRST_NAME)> 

Whitespace doesn't matter, of course, so the same declaration could look like this:

 <!ELEMENT   NAME        (LAST_NAME,      FIRST_NAME)> 

Being able to specify the exact order that the elements in your document must take can be great when you're working with software that relies on such an order.
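
As a sketch of what that enforcement means in practice, the following document is well-formed but should fail validation, because the child elements of <NAME> appear in the wrong order. Using <NAME> as the root element is a simplification made just for this example.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE NAME [
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
]>
<!-- Not valid: FIRST_NAME appears before LAST_NAME. -->
<NAME>
    <FIRST_NAME>Sam</FIRST_NAME>
    <LAST_NAME>Smith</LAST_NAME>
</NAME>
```

A validating parser should flag the <NAME> element here; with the libxml2 command-line tool, for instance, running xmllint --valid --noout on the file is one way to check. Swapping the two child elements makes the document valid again.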

Here's how I'll elaborate the ch02_01.xml document to include the previous two sequences as well as a third one that makes sure that the <ITEM> element contains exactly one <PRODUCT> element, one <NUMBER> element, and one <PRICE> element, in that order. The resulting DTD enforces the syntax of the ch02_01.xml document that we developed in the previous chapter; you can see the whole document, complete with working DTD, here:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Oranges</PRODUCT>
                    <NUMBER>24</NUMBER>
                    <PRICE>.98</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Jones</LAST_NAME>
                <FIRST_NAME>Polly</FIRST_NAME>
            </NAME>
            <DATE>October 20, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Bread</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Apples</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Weber</LAST_NAME>
                <FIRST_NAME>Bill</FIRST_NAME>
            </NAME>
            <DATE>October 25, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Asparagus</PRODUCT>
                    <NUMBER>12</NUMBER>
                    <PRICE>.95</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Lettuce</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
    </DOCUMENT>

You can use the same element in a sequence a number of times, if you want. For example, here's how I make sure that the <CUSTOMER> element holds exactly three <NAME> elements:

 <!ELEMENT CUSTOMER (NAME,NAME,NAME)> 

Here's another important note: You can use the +, *, and ? operators that we've already seen inside sequences. For example, here's how I specify that a <CUSTOMER> element can hold one or more <NAME> elements, an optional <CREDIT_RATING> element, and any number of <DATE> elements, followed by one <ORDERS> element:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME+,CREDIT_RATING?,DATE*,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ELEMENT CREDIT_RATING (#PCDATA)>
    ]>
    <DOCUMENT>
        <CUSTOMER>
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
                .
                .
                .

Using +, *, and ? inside sequences provides you with a lot of flexibility because now you can specify how many times an element can appear in a sequence, and even if it can be absent altogether.
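
To show that flexibility at work, here is a small, self-contained sketch of a document that should validate against a <CUSTOMER> declaration of the form (NAME+,CREDIT_RATING?,DATE*,ORDERS). The customer data is made up for illustration, and the two customers deliberately use different combinations of the optional elements.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!ELEMENT CUSTOMER (NAME+,CREDIT_RATING?,DATE*,ORDERS)>
<!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
<!ELEMENT LAST_NAME (#PCDATA)>
<!ELEMENT FIRST_NAME (#PCDATA)>
<!ELEMENT CREDIT_RATING (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT ORDERS (ITEM)*>
<!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
<!ELEMENT PRODUCT (#PCDATA)>
<!ELEMENT NUMBER (#PCDATA)>
<!ELEMENT PRICE (#PCDATA)>
]>
<DOCUMENT>
    <!-- Two names, a credit rating, no dates, and an empty order list. -->
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Sam</FIRST_NAME>
        </NAME>
        <NAME>
            <LAST_NAME>Smith</LAST_NAME>
            <FIRST_NAME>Fred</FIRST_NAME>
        </NAME>
        <CREDIT_RATING>Good</CREDIT_RATING>
        <ORDERS></ORDERS>
    </CUSTOMER>
    <!-- One name, no credit rating, two dates, and one item. -->
    <CUSTOMER>
        <NAME>
            <LAST_NAME>Jones</LAST_NAME>
            <FIRST_NAME>Polly</FIRST_NAME>
        </NAME>
        <DATE>October 20, 2003</DATE>
        <DATE>October 27, 2003</DATE>
        <ORDERS>
            <ITEM>
                <PRODUCT>Bread</PRODUCT>
                <NUMBER>12</NUMBER>
                <PRICE>.95</PRICE>
            </ITEM>
        </ORDERS>
    </CUSTOMER>
</DOCUMENT>
```

The order within the sequence still matters: a <DATE> element placed before <CREDIT_RATING>, or a <CUSTOMER> with no <ORDERS> element at all, would not match the declaration.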



Real World XML
Real World XML (2nd Edition)
ISBN: 0735712867
EAN: 2147483647
Year: 2005
Pages: 440
Authors: Steve Holzner
