Attribute Types | Real World XML (2nd Edition)

So far, I've used just the CDATA attribute type when declaring attributes. In fact, that's probably the most common declaration type for attributes because it allows you to use simple text for the attribute's value. However, you can specify a number of different attribute types, and I'll take a look at them here. These types are not (not yet, anyway) detailed enough to indicate specific data types such as float, int, or double, but they can provide you with some ability to check the syntax of a document.

CDATA

The simplest attribute type you can have is CDATA, which is simple character data. That means the attribute may be set to a value which is any string of text, as long as the string does not contain markup. The requirement that you can't use markup explicitly excludes any string that includes the characters <, ", or &. If you want to use those characters, use their predefined entity references (&lt;, &quot;, and &amp;) instead; these entity references will be parsed and replaced with the corresponding characters. (Because these attribute values are parsed, which is why you have to be careful about including anything that looks like markup, you use the term CDATA for this type, not PCDATA, which is character data that has already been parsed.)

We've already seen a number of examples of attributes declared with the CDATA type, as here:

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        OWES CDATA "0"
        LAYAWAY CDATA "0"
        DEFAULTS CDATA "0">
    ]>
        .
        .
        .
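To make the escaping rule for CDATA values concrete, here is a minimal sketch. The NOTE attribute and the simplified, empty CUSTOMER element are assumptions made for this illustration only; they do not appear in the chapter's listings.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER is simplified to an empty element for this sketch -->
<!ELEMENT CUSTOMER EMPTY>
<!-- NOTE is a hypothetical CDATA attribute used only here -->
<!ATTLIST CUSTOMER
    NOTE CDATA #IMPLIED>
]>
<DOCUMENT>
    <!-- The three predefined entity references stand in for <, ", and & -->
    <CUSTOMER NOTE="Owes &lt; $5 to &quot;Smith &amp; Sons&quot;"/>
</DOCUMENT>
```

After parsing, an application should see the attribute value with the entity references replaced by the characters they stand for.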

The CDATA type is the most general type of attribute; from here, we get into more specific types, such as the enumerated type.

Enumerated

The enumerated type does not use a keyword like the other attribute types do. Instead, it provides a list (or enumeration) of possible values, separated by vertical bars (|). Each possible value must be a valid XML name token (that is, made up of name characters such as letters, digits, hyphens, underscores, colons, and periods).

Here's an example; in this case, I'm declaring an attribute named CREDIT_OK that can have one of only two possible values, "TRUE" or "FALSE", and which has the default value "TRUE":

Listing ch04_15.xml

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        CREDIT_OK (TRUE | FALSE) "TRUE">
    ]>
    <DOCUMENT>
        <CUSTOMER CREDIT_OK = "FALSE">
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Oranges</PRODUCT>
                    <NUMBER>24</NUMBER>
                    <PRICE>.98</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER>
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>

Using enumerations like this is great if you want to set the possible range of values that an attribute can take. For example, you might want to restrict an attribute named WEEKDAY to these possible values: "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", or "Saturday".
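As a rough sketch of the WEEKDAY idea, the declaration might look like the following. The DELIVERY element is a hypothetical name chosen for this illustration; it is not part of the chapter's customer document.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (DELIVERY)*>
<!-- DELIVERY is a hypothetical element used only for this sketch -->
<!ELEMENT DELIVERY EMPTY>
<!ATTLIST DELIVERY
    WEEKDAY (Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday) #REQUIRED>
]>
<DOCUMENT>
    <!-- Valid: the value is one of the enumerated choices -->
    <DELIVERY WEEKDAY="Monday"/>
    <!-- Not valid, so left commented out: the value is not in the list
    <DELIVERY WEEKDAY="Funday"/>
    -->
</DOCUMENT>
```

Note that enumerated values are case-sensitive, so "monday" would not match "Monday" in this declaration.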

NMTOKEN

Document authors commonly use another attribute type: NMTOKEN. An attribute of this type can take only values that are made up of proper XML name characters (that is, made up of one or more letters, digits, hyphens, underscores, colons, and periods). In particular, note that NMTOKEN values cannot include whitespace.

Using NMTOKEN attribute values can be useful in some applications. Note, for example, that XML names are very close to those that are legal for variables in C++, Java, and JavaScript, which means that you could even use those names in underlying applications in fancy ways. NMTOKEN values also mean that attribute values must consist of a single word because whitespace of any kind is not allowed; that can be a useful restriction.

Here's an example. In this case, I'm declaring an attribute named SHIP_STATE to hold the two-letter state code to which an order was shipped; declaring the attribute with the NMTOKEN type rules out the possibility of values that are longer than a single term:

Listing ch04_16.xml

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        SHIP_STATE NMTOKEN #REQUIRED>
    ]>
    <DOCUMENT>
        <CUSTOMER SHIP_STATE = "CA">
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                <ITEM>
                    <PRODUCT>Oranges</PRODUCT>
                    <NUMBER>24</NUMBER>
                    <PRICE>.98</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
        <CUSTOMER SHIP_STATE = "LA">
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>
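To see where the NMTOKEN restriction bites, here is a small sketch that contrasts acceptable and unacceptable values. The CUSTOMER element is simplified to an empty element for brevity, and the sample values other than "CA" are made up for this illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER is simplified to an empty element for this sketch -->
<!ELEMENT CUSTOMER EMPTY>
<!ATTLIST CUSTOMER
    SHIP_STATE NMTOKEN #REQUIRED>
]>
<DOCUMENT>
    <!-- Valid: a single token made up of name characters -->
    <CUSTOMER SHIP_STATE="CA"/>
    <!-- Also valid: unlike an XML name, a name token may begin with a digit -->
    <CUSTOMER SHIP_STATE="2nd-region"/>
    <!-- Not valid, so left commented out: the value contains whitespace
    <CUSTOMER SHIP_STATE="New York"/>
    -->
</DOCUMENT>
```

Keep in mind that NMTOKEN checks only the characters used; it does not verify that "CA" is a real state code. For that, an enumerated type would be the closer fit.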

NMTOKENS

You can even specify that an attribute value must be made up of NMTOKENs separated by whitespace if you use the NMTOKENS attribute type. For example, here I'm giving the attribute CONTACT_NAME the type NMTOKENS to allow attribute values to hold first and last names, separated by whitespace:

Listing ch04_17.xml

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        CONTACT_NAME NMTOKENS #IMPLIED>
    ]>
    <DOCUMENT>
        <CUSTOMER CONTACT_NAME = "George Starr">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CONTACT_NAME = "Ringo Harrison">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CONTACT_NAME = "Paul Lennon">
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>
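One caveat worth sketching: every token in an NMTOKENS value still has to consist of name characters only, so some real-world names will not fit. The example below assumes a simplified, empty CUSTOMER element, and the contact names are invented for illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER is simplified to an empty element for this sketch -->
<!ELEMENT CUSTOMER EMPTY>
<!ATTLIST CUSTOMER
    CONTACT_NAME NMTOKENS #IMPLIED>
]>
<DOCUMENT>
    <!-- Valid: two name tokens separated by whitespace -->
    <CUSTOMER CONTACT_NAME="George Starr"/>
    <!-- Valid: hyphens, underscores, and periods are name characters -->
    <CUSTOMER CONTACT_NAME="Mary-Jane Van_Dyke Jr."/>
    <!-- Not valid, so left commented out: an apostrophe is not a name character
    <CUSTOMER CONTACT_NAME="Pat O'Neil"/>
    -->
</DOCUMENT>
```

If values like the last one need to be allowed, the CDATA type is probably the safer choice.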

ID

There's another very important attribute type that you can declare: ID. XML gives special meaning to an element's ID value because that's the value that applications typically use to identify elements. For that reason, XML processors are supposed to make sure that no two elements have the same value for the attribute that is of the type ID in a document (and you can give elements only one attribute of this type). The actual value you assign to the attribute of this type must be a proper XML name.

Applications can use the ID value of elements to uniquely identify those elements. Note, however, that you don't have to name the attribute ID, as you do in HTML, because simply specifying an attribute's type to be the ID type makes it into an ID attribute. Here's an example in which I add an ID attribute named CUSTOMER_ID to the <CUSTOMER> elements in this document:

Listing ch04_18.xml

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        CUSTOMER_ID ID #REQUIRED>
    ]>
    <DOCUMENT>
        <CUSTOMER CUSTOMER_ID = "C1232231">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CUSTOMER_ID = "C1232232">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CUSTOMER_ID = "C1232233">
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>

Note that you cannot use the ID type with #FIXED attributes (because all #FIXED attributes have the same value). You usually use the #REQUIRED keyword instead.

ID Values Must Be Proper Names

One thing to realize is that because ID values must be proper XML names, they can't be simple numbers such as 12345; these values cannot start with a digit.
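The two ID rules discussed here (values must be proper XML names, and values must be unique within the document) can be sketched as follows. The CUSTOMER element is simplified to an empty element, and the ID values are invented for this illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER is simplified to an empty element for this sketch -->
<!ELEMENT CUSTOMER EMPTY>
<!ATTLIST CUSTOMER
    CUSTOMER_ID ID #REQUIRED>
]>
<DOCUMENT>
    <!-- Valid: starts with a letter and is unique in the document -->
    <CUSTOMER CUSTOMER_ID="C12345"/>
    <!-- Also valid: an underscore is an allowed first character -->
    <CUSTOMER CUSTOMER_ID="_12345"/>
    <!-- Not valid, so left commented out: the value starts with a digit
    <CUSTOMER CUSTOMER_ID="12345"/>
    -->
    <!-- Not valid, so left commented out: the value repeats an earlier ID
    <CUSTOMER CUSTOMER_ID="C12345"/>
    -->
</DOCUMENT>
```

A common workaround for numeric keys is to prefix them with a letter, as the C in the listing's C1232231 does.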

IDREF

The IDREF attribute type represents an attempt to let you use attributes to specify something about a document's structure: in particular, something about the relationship that exists between elements. IDREF attributes hold the ID value of another element in the document.

For example, say you wanted to set up a parent-child relationship between elements that was not reflected in the normal nesting structure of the document. In that case, you could set an IDREF attribute of an element to the ID of its parent. An application could then check the attribute with the IDREF type to determine the child's parent.

Here's an example. In this case, I'm declaring two attributes, a CUSTOMER_ID attribute of type ID and an EMPLOYER_ID attribute of type IDREF that holds the ID value of the customer's employer:

Listing ch04_19.xml

    <?xml version = "1.0" standalone="yes"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        CUSTOMER_ID ID #REQUIRED
        EMPLOYER_ID IDREF #IMPLIED>
    ]>
    <DOCUMENT>
        <CUSTOMER CUSTOMER_ID = "C1232231">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CUSTOMER_ID = "C1232232" EMPLOYER_ID="C1232231">
            .
            .
            .
        </CUSTOMER>
        <CUSTOMER CUSTOMER_ID = "C1232233">
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>

An XML processor can pass on the ID and IDREF structure of a document to an underlying application, which can then use that information to reconstruct the relationships of the elements in the document.
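Because an IDREF value has to match an ID value that actually appears in the document, a validating parser can catch dangling references. Here is a minimal sketch of that check, again assuming a simplified, empty CUSTOMER element; the ID C9999999 is invented for the illustration.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (CUSTOMER)*>
<!-- CUSTOMER is simplified to an empty element for this sketch -->
<!ELEMENT CUSTOMER EMPTY>
<!ATTLIST CUSTOMER
    CUSTOMER_ID ID #REQUIRED
    EMPLOYER_ID IDREF #IMPLIED>
]>
<DOCUMENT>
    <CUSTOMER CUSTOMER_ID="C1232231"/>
    <!-- Valid: EMPLOYER_ID matches the CUSTOMER_ID of the element above -->
    <CUSTOMER CUSTOMER_ID="C1232232" EMPLOYER_ID="C1232231"/>
    <!-- Not valid, so left commented out: no element has the ID C9999999
    <CUSTOMER CUSTOMER_ID="C1232233" EMPLOYER_ID="C9999999"/>
    -->
</DOCUMENT>
```

The referenced element does not have to come earlier in the document; it only has to exist somewhere in it.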

ENTITY

You can also specify that an attribute be of type ENTITY, which means that the attribute can be set to the name of an entity you've declared. For example, say that I declared an entity named SNAPSHOT1 that referred to an external image file. I could then create a new attribute named, say, IMAGE, that I could set to the entity name SNAPSHOT1; here's how that looks:

    <?xml version = "1.0" standalone="no"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        IMAGE ENTITY #IMPLIED>
    <!ENTITY SNAPSHOT1 SYSTEM "image.gif">
    ]>
    <DOCUMENT>
        <CUSTOMER IMAGE="SNAPSHOT1">
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                .
                .
                .
                <ITEM>
                    <PRODUCT>Lettuce</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
    </DOCUMENT>

This points out how to use the ENTITY attribute type (but actually it's not a complete example because there are specific ways to set up entities to refer to external, non-XML data that we'll see at the end of this chapter). In general, the ENTITY attribute type is a useful one if you declare your own entities. For example, you might want to declare entities named SIGNATURE_HOME, SIGNATURE_WORK, and so on that hold your name and home address, work address, and so on. If you then declare an attribute named, say, SIGNATURE of the ENTITY type, you can assign the SIGNATURE_HOME or SIGNATURE_WORK entities to the SIGNATURE attribute in the document.

ENTITIES

As with the NMTOKEN attribute type, which has a plural type, NMTOKENS, the ENTITY attribute type has a plural type, ENTITIES. Attributes of this type can hold lists of entity names separated by whitespace.

Here's an example. In this case, I'm declaring two entities, SNAPSHOT1 and SNAPSHOT2, and an attribute named IMAGES that you can assign both SNAPSHOT1 and SNAPSHOT2 to at the same time:

    <?xml version = "1.0" standalone="no"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!ATTLIST CUSTOMER
        IMAGES ENTITIES #IMPLIED>
    <!ENTITY SNAPSHOT1 SYSTEM "image.gif">
    <!ENTITY SNAPSHOT2 SYSTEM "image2.gif">
    ]>
    <DOCUMENT>
        <CUSTOMER IMAGES="SNAPSHOT1 SNAPSHOT2">
            <NAME>
                <LAST_NAME>Smith</LAST_NAME>
                <FIRST_NAME>Sam</FIRST_NAME>
            </NAME>
            <DATE>October 15, 2003</DATE>
            <ORDERS>
                <ITEM>
                    <PRODUCT>Tomatoes</PRODUCT>
                    <NUMBER>8</NUMBER>
                    <PRICE>.25</PRICE>
                </ITEM>
                .
                .
                .
                <ITEM>
                    <PRODUCT>Lettuce</PRODUCT>
                    <NUMBER>6</NUMBER>
                    <PRICE>.50</PRICE>
                </ITEM>
            </ORDERS>
        </CUSTOMER>
    </DOCUMENT>

As with the NMTOKENS attribute type, you use the plural ENTITIES type when you want to assign a number of entities to the same attribute. For example, you might have multiple entities defined that represent a customer's usernames, and you want to assign all of them to an attribute named USERNAMES. Because entities can be quite complex and even can include other entities, this is one way to store detailed data in a document simply using attributes.

NOTATION

The final attribute type is NOTATION. When you declare an attribute of this type, you can assign values to it that have been declared as notations.

A notation specifies the format of non-XML data, and you use it to describe external entities. One popular type of notation is Multipurpose Internet Mail Extension (MIME) types such as image/gif, application/xml, text/html, and so on.

List of MIME Types

You can get a list of the registered MIME types at ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types.

Here's an example. In this case, I'll declare two notations, GIF and JPG, that stand for the MIME types image/gif and image/jpeg. Then I'll set up an attribute that can be assigned either of these values.

To declare a notation, you use the <!NOTATION> declaration in a DTD, like this:

    <!NOTATION NAME SYSTEM "EXTERNAL_ID">

Here, NAME is the name of the notation and EXTERNAL_ID is the external ID you want to use for the notation, often a MIME type.

You can also use the PUBLIC keyword for public notations if you supply an FPI as a quoted string (see the rules for constructing FPIs in the previous chapter), like this:

    <!NOTATION NAME PUBLIC "FPI" "EXTERNAL_ID">

Here's how I create the GIF and JPG notations:

    <?xml version = "1.0" standalone="no"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!NOTATION GIF SYSTEM "image/gif">
    <!NOTATION JPG SYSTEM "image/jpeg">
        .
        .
        .

Now I'm free to create an attribute named, say, IMAGE_TYPE, of type NOTATION that you can assign either the GIF or JPG notation to:

    <?xml version = "1.0" standalone="no"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!NOTATION GIF SYSTEM "image/gif">
    <!NOTATION JPG SYSTEM "image/jpeg">
    <!ATTLIST CUSTOMER
        IMAGE NMTOKEN #IMPLIED
        IMAGE_TYPE NOTATION (GIF | JPG) #IMPLIED>
    ]>
        .
        .
        .

At this point, I'm free to use the IMAGE_TYPE attribute:

    <?xml version = "1.0" standalone="no"?>
    <!DOCTYPE DOCUMENT [
    <!ELEMENT DOCUMENT (CUSTOMER)*>
    <!ELEMENT CUSTOMER (NAME,DATE,ORDERS)>
    <!ELEMENT NAME (LAST_NAME,FIRST_NAME)>
    <!ELEMENT LAST_NAME (#PCDATA)>
    <!ELEMENT FIRST_NAME (#PCDATA)>
    <!ELEMENT DATE (#PCDATA)>
    <!ELEMENT ORDERS (ITEM)*>
    <!ELEMENT ITEM (PRODUCT,NUMBER,PRICE)>
    <!ELEMENT PRODUCT (#PCDATA)>
    <!ELEMENT NUMBER (#PCDATA)>
    <!ELEMENT PRICE (#PCDATA)>
    <!NOTATION GIF SYSTEM "image/gif">
    <!NOTATION JPG SYSTEM "image/jpeg">
    <!ATTLIST CUSTOMER
        IMAGE NMTOKEN #IMPLIED
        IMAGE_TYPE NOTATION (GIF | JPG) #IMPLIED>
    ]>
    <DOCUMENT>
        <CUSTOMER IMAGE="image.gif" IMAGE_TYPE="GIF">
            .
            .
            .
        </CUSTOMER>
    </DOCUMENT>

Note that this example brings up an interesting point: Here, I've just set the value of an attribute, IMAGE, to the name of an image file, image.gif. But how do you actually make an unparsed entity like an image part of a document? There's a way of doing that explicitly; now that we know about notations, we're ready to do things that way.

xml:space and xml:lang

This completes our coverage of creating attributes. But don't forget that there are also two attributes that are in some sense predefined in XML (we've already covered those): xml:space, which you can use to preserve the whitespace in an element; and xml:lang, which you can use to specify the language used in an element and its attributes. They're not really predefined because you have to declare them if you want to use them, but you shouldn't use these attribute names for anything but their intended use.
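Since these two attributes have to be declared before a validating parser will accept them, here is a minimal sketch of what such declarations might look like. The NOTE element is a hypothetical name used only for this illustration, and the choice of defaults is an assumption rather than a recommendation.

```xml
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE DOCUMENT [
<!ELEMENT DOCUMENT (NOTE)*>
<!-- NOTE is a hypothetical element used only for this sketch -->
<!ELEMENT NOTE (#PCDATA)>
<!ATTLIST NOTE
    xml:space (default | preserve) "default"
    xml:lang  CDATA #IMPLIED>
]>
<DOCUMENT>
    <!-- Asks the application to keep the whitespace and marks the text as English -->
    <NOTE xml:space="preserve" xml:lang="en">Keep   this   spacing.</NOTE>
</DOCUMENT>
```

When xml:space is declared, it is given as an enumerated type whose only possible values are default and preserve.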