Oddly, the DTD uses a syntax that looks confusingly similar to XML syntax but is completely different. The simplest DTD declaration is the element declaration. It declares the existence of an element type and specifies the content it may have. It has the form
<!ELEMENT name (contents)>
<!ELEMENT> Content Rule Symbols
ANY tells the validating processor that nothing is known about this element's content: it may hold any mix of declared elements and character data. For describing an element itself, this is a waste of bytes. However, it serves a need when an element about which nothing is yet known must be introduced to the processor so that it can appear as content in the declaration of a larger element. The keyword is written without a leading # and without parentheses.
EMPTY defines the element as a leaf element with no content. (Leaf nodes are those without child nodes.) This does not mean that the element has no meaning. Such an element might carry a great informational cargo in its attributes. Even a bare element with neither content nor attributes can be meaningful through its mere presence and its context. Like ANY, the keyword is written without a leading # and without parentheses.
#PCDATA is parsed character data. This is text without markup except entity references. Of the following two lines, only the first is PCDATA:
He's coming!
He <bold>is</bold> coming!
An element containing only PCDATA is a leaf node and is more specifically referred to as a node with character content.
elementname The appearance of a declared element name in the content rule marks the element being declared as a parent element, that is, an element with element content.
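The four kinds of content rule described above can be sketched in one small document. This is an illustration only: the element names (memo, title, br, scratch) are hypothetical and do not come from any real vocabulary. The declarations sit in an internal DTD subset so that the block is a complete document a validating parser could check.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element names, for illustration only. -->
<!DOCTYPE memo [
  <!ELEMENT memo    (title)>     <!-- element content: exactly one title -->
  <!ELEMENT title   (#PCDATA)>   <!-- character content only -->
  <!ELEMENT br      EMPTY>       <!-- leaf element, no content allowed -->
  <!ELEMENT scratch ANY>         <!-- any declared elements and text -->
]>
<memo><title>He's coming!</title></memo>
```

Note that the keywords EMPTY and ANY stand alone, while #PCDATA and element names are enclosed in parentheses.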
<!ELEMENT> Content Rule Operators
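It is not unusual for an element to contain multiple elements, or to contain both element content and character content (mixed content). Any combination can be expressed with a small set of operators: the comma (,) for sequence, the vertical bar (|) for choice, and the occurrence suffixes ? (zero or one), * (zero or more), and + (one or more). An item with no suffix must appear exactly once.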
(dot,star) is a dot followed by a star.
(dot|star) is either a dot or a star.
(dot|(dot,star)) is either a dot or a dot and a star.
(dot,(dot|star)) is either dot-dot or dot-star.
(Q,A)+ One or more Q/A pairs
(Q,A*) Q with an unspecified number of As
(Q,A,A+)* Any number (even zero) of Qs, each followed by two or more As
(Q, ((A,A)|(A,A,A,A,A)) ) Q followed by two or five As.
((A|B|C)*) Any series of any number of As, Bs and/or Cs in any order.
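As a rough sketch of how these operators combine in practice, the following document declares a quiz made of question and answer groups and then supplies an instance that satisfies the rule. The element names quiz and em are assumptions made for the example; Q and A follow the patterns above. In XML 1.0, mixed content must take the shape shown for Q: #PCDATA first, alternatives separated by |, and a trailing *.

```xml
<?xml version="1.0"?>
<!DOCTYPE quiz [
  <!-- One or more groups; each group is a Q followed by one or more As. -->
  <!ELEMENT quiz (Q, A+)+>
  <!-- Mixed content: text interleaved with any number of em elements. -->
  <!ELEMENT Q  (#PCDATA | em)*>
  <!ELEMENT A  (#PCDATA)>
  <!ELEMENT em (#PCDATA)>
]>
<quiz>
  <Q>Is he <em>really</em> coming?</Q>
  <A>Yes.</A>
  <A>He is.</A>
  <Q>When?</Q>
  <A>Soon.</A>
</quiz>
```

The instance contains the sequence Q, A, A, Q, A, which matches (Q, A+)+. Removing the last A would leave a Q with no answer and make the document invalid.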
The DTD format is criticized for its inability to type content. It cannot specify whether content is text, numeric data, or any other type. For a technology called a document type definition, very little typing is defined. The major exception occurs in the attribute list.
The type of each attribute must be specified. A parameter in the declaration defines its type. However, if you expect to find familiar types like boolean or float or even number, you are out of luck. Take a deep breath and enter a complicated world.
CDATA This attribute contains string data. This might be a line of text, a URL, or even a number (55) or a boolean (true). It is a very general specification that tells us very little.
ID This attribute contains a unique identifier. Although a number often makes an easy and reliable unique identifier, it is not accepted by the XML rules: the identifier must be a proper XML name and follow the rules for names considered earlier. Uniqueness is independent of context. A <player> with the ID Jocko and a <game> with the ID Jocko conflict, because a given ID value may appear only once in the whole XML document, whatever element carries it. An ID attribute cannot have a literal or #FIXED default value; its default must be #IMPLIED or #REQUIRED.
IDREF This attribute links to another element somewhere in this file. It does so by referring to the unique ID of that element. Although infrequently used, it permits the creation of data structures more complex than are afforded by XML's basic tree hierarchy. Complex directed graphs, circular references, multiply-linked lists, and relational database models can be represented with these internal references.
IDREFS indicates an attribute that is a sequence of IDREF symbols joined by white space. It gives us the power to make and manipulate simple lists.
ENTITY compels this attribute to be set to the name of a previously declared unparsed external entity. As we have seen, entities are references to files that are not necessarily XML files (e.g., a JPEG image or an MP3 sound). Naturally, references to such entities cannot be fully expanded by the XML parser, so they are maintained as references.
ENTITIES is just a list of entities joined by white space. These plural types are necessary because, unlike its child elements, all the attributes of an element must have unique names.
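NMTOKEN restricts this attribute to a single name token. A name token is built from the same characters as an XML name, but it may begin with any of them, including a digit, a hyphen, or a period.
NMTOKENS allows a series of name tokens separated by white space.
Enumerated lists consist of a series of name tokens enclosed in parentheses and separated by the | (or) symbol. This list defines the set of values from which the value of this attribute must be selected.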
NOTATION can precede an enumerated list and restrict it to notations. These are typically used to describe the properties (e.g., media types) of the files referred to by external unparsed entities.
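Most of these attribute types can be seen side by side in one hedged sketch. The <player> and <game> elements and the ID Jocko echo the example above; every other name (league, nickname, rivals, photo, position, tags, winner, court, jockoPhoto, jocko.jpg) is hypothetical. The NOTATION attribute type is left out for brevity, although a notation is declared because the unparsed entity needs one.

```xml
<?xml version="1.0"?>
<!DOCTYPE league [
  <!ELEMENT league (player+, game*)>
  <!ELEMENT player (#PCDATA)>
  <!ELEMENT game   EMPTY>

  <!-- An unparsed external entity and the notation that describes it. -->
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY jockoPhoto SYSTEM "jocko.jpg" NDATA jpeg>

  <!ATTLIST player
    id       ID                         #REQUIRED
    nickname CDATA                      #IMPLIED
    rivals   IDREFS                     #IMPLIED
    photo    ENTITY                     #IMPLIED
    position (guard | forward | center) #IMPLIED
    tags     NMTOKENS                   #IMPLIED>
  <!ATTLIST game
    winner   IDREF                      #REQUIRED
    court    NMTOKEN                    #IMPLIED>
]>
<league>
  <player id="jocko" nickname="The Rock" rivals="pat"
          photo="jockoPhoto" position="guard"
          tags="rookie 2nd-team">Jocko</player>
  <player id="pat">Pat</player>
  <game winner="jocko" court="7B"/>
</league>
```

The values 2nd-team and 7B are legal name tokens even though they begin with a digit, so they would not be accepted as ID values. Changing winner to a value that matches no ID in the document, or giving both players the same id, would make the document invalid.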
A very useful feature of DTD attribute typing is the ability to define a default value for an attribute. Even more useful is a syntax that offers three special options. Typically the default value is a simple literal like "true" or "5". But the declaration can also use one of the following symbols.
#REQUIRED in place of a default value informs the processor that a valid XML file must supply a proper value for this attribute every time the element appears. There is no default.
#IMPLIED in place of a default value informs a processor that the final application is better able to supply a default value than is the XML processor. In essence, the processor is instructed that there is no default, but unlike the #REQUIRED case, an omission of this attribute is not an error.
#FIXED preceding a default value informs the processor that this attribute must always have this value. If omitted from the XML, it will be supplied. If present, it must agree with the default or an error has occurred that invalidates the XML.
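The four ways of declaring a default can be compared in a short sketch. The element name order and its attribute names are assumptions made purely for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE order [
  <!ELEMENT order EMPTY>
  <!ATTLIST order
    number   NMTOKEN        #REQUIRED
    note     CDATA          #IMPLIED
    currency CDATA          #FIXED "USD"
    gift     (true | false) "false">
]>
<!-- Only the required attribute is written out. -->
<order number="A-1001"/>
```

A processor that reads these declarations should report this element as if currency="USD" and gift="false" had been written, while note stays absent. Omitting number, or writing currency="EUR", would make the document invalid.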