A schema or Document Type Definition (DTD) contains the rules by which an XML document must abide. Schemas and DTDs are discussed in detail in a moment, but first, think of them as a set of rules. If the XML document conforms to the rules, it is said to be a valid document.
More precisely, XML documents are said to be valid if their content can be
As with the other sections in this chapter, let's begin by answering some basic questions. Before jumping into how to validate documents, you must understand why you would want to validate a document.
Providing the rules and vocabulary for a document helps to communicate the grammar associated with the document. By describing what is valid content for the document, you develop a vocabulary that can be extended and
Without explicitly
Consider a system where you receive a data feed from a customer. How would your customer know what is valid data and what is a valid structure of the document? You could give
As a benefit, XML tools can also use DTDs or schemas to assist the developer in creating the document. For example, the XML editor included in Visual Studio .NET uses schemas to provide code completion and
Now you're ready to look at DTDs and schemas in more detail.
A DTD is simply a syntax for declaring the grammar and vocabulary of an XML document. The DTD enables the developer to
This section covers DTDs only enough to convey their use and existence. DTDs are quickly losing ground to XML Schemas as the preferred validation mechanism for XML documents. This book focuses on the use of XML Schemas for validation over the use of DTDs.
A DTD declares the structure and content of an XML file. It defines the content model of a document. Consider the following XML document:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<SITES>
<LINKS>
<LINK>http://www.Microsoft.com</LINK>
<LINK>http://www.xmlandasp.net</LINK>
</LINKS>
</SITES>
Suppose that business rules are associated with the structure of this document, and that these rules are not immediately obvious. For example, suppose that you only accept one LINKS node as a child of the SITES node. Furthermore, there can be zero or more LINK elements as a child of the LINKS element. Finally, each LINK element contains text content. Here is the sample DTD that
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
]>
To explain this a little further, begin by declaring the XML processing instruction, version, and the standalone attribute. The
The first element declaration, <!ELEMENT SITES (LINKS)?>, declares a child element of the SITES node. The child node is named LINKS, and the question mark (?) declares that the element appears zero times or one time. Using this definition, the following XML is valid:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<SITES/>
This is because the LINKS element can occur zero times or one time, but cannot occur more than once.
The following line uses the asterisk (*) to declare that a child element can occur zero or more times.
<!ELEMENT LINKS (LINK)*>
If you want to declare that at least one LINK element must be a child of the LINKS element, use the + notation to signify a cardinality of one or more:
<!ELEMENT LINKS (LINK)+>
The following line in the sample DTD declares that the LINK element's content is made up of PCDATA, or parsed character data:
<!ELEMENT LINK (#PCDATA)>
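The cardinality operators behave much like their regular-expression counterparts. The following is a minimal Python sketch of that idea, not a DTD validator: it hand-translates the three declarations from the sample DTD into regular expressions over child element names. The names CONTENT_MODELS, children_match, and is_valid are hypothetical, and Python's ElementTree parser is used only to read the document, since it does not validate.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written (assumed) translation of the sample DTD's content models
# into regular expressions over a space-terminated list of child names.
CONTENT_MODELS = {
    "SITES": r"(LINKS )?",  # <!ELEMENT SITES (LINKS)?>   zero or one
    "LINKS": r"(LINK )*",   # <!ELEMENT LINKS (LINK)*>    zero or more
    "LINK": r"",            # <!ELEMENT LINK (#PCDATA)>   no child elements
}
# The + variant, <!ELEMENT LINKS (LINK)+>, would become r"(LINK )+".


def children_match(element):
    """Return True if element and all its descendants fit CONTENT_MODELS."""
    pattern = CONTENT_MODELS.get(element.tag)
    if pattern is None:
        return False  # element name was never declared
    names = "".join(child.tag + " " for child in element)
    if re.fullmatch(pattern, names) is None:
        return False
    return all(children_match(child) for child in element)


def is_valid(xml_text):
    """Check the whole document against the hand-translated content models."""
    return children_match(ET.fromstring(xml_text))
```

With these rules, an empty SITES element passes, while a SITES element that contains two LINKS children fails because ? allows at most one.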
You can also declare a sequence of child elements. Suppose that you want to add an ARTICLES element as a child of the SITES root node. Furthermore, the LINKS element must always precede the ARTICLES node. To add this, you must change the sample DTD to the following:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS,ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)>
<!ELEMENT ARTICLE (#PCDATA)>
]>
The following XML document is now valid using this DTD definition:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS,ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)>
<!ELEMENT ARTICLE (#PCDATA)>
]>
<SITES>
<LINKS>
<LINK>http://www.Microsoft.com</LINK>
<LINK>http://www.xmlandasp.net</LINK>
</LINKS>
<ARTICLES>
<ARTICLE>This is where an article may go</ARTICLE>
</ARTICLES>
</SITES>
The following XML, however, is not valid because the sequence was defined so that LINKS must precede ARTICLES:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS,ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)>
<!ELEMENT ARTICLE (#PCDATA)>
]>
<SITES>
<ARTICLES>
<ARTICLE>This is where an article may go</ARTICLE>
</ARTICLES>
<LINKS>
<LINK>http://www.Microsoft.com</LINK>
<LINK>http://www.xmlandasp.net</LINK>
</LINKS>
</SITES>
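To see why child order matters in a sequence, the comma in (LINKS,ARTICLES)? can be read as "followed by." The sketch below is an illustration only, assuming a hand-written regular expression in place of a real validating parser; SITES_SEQUENCE and sites_children_in_order are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT SITES (LINKS,ARTICLES)?> read as a regular expression:
# LINKS followed by ARTICLES, and the pair as a whole is optional.
SITES_SEQUENCE = re.compile(r"(LINKS ARTICLES )?")


def sites_children_in_order(xml_text):
    """Return True if the root's children follow the LINKS,ARTICLES sequence."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return SITES_SEQUENCE.fullmatch(names) is not None
```

A root whose children are LINKS then ARTICLES passes this check. The reversed order fails, and so does a root with LINKS alone, because the ? applies to the whole pair rather than to each element.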
Suppose that you want to have a choice between two different element types. Instead of requiring both links and articles as children of the root node, suppose that you want one or the other. To represent this, use the OR notation (|).
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS | ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)>
<!ELEMENT ARTICLE (#PCDATA)>
]>
<SITES>
<ARTICLES>
<ARTICLE>This is where an article may go</ARTICLE>
</ARTICLES>
</SITES>
This document is
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS | ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)>
<!ELEMENT ARTICLE (#PCDATA)>
]>
<SITES>
<ARTICLES>
<ARTICLE>This is where an article may go</ARTICLE>
</ARTICLES>
<LINKS>
<LINK>http://www.xmlandasp.net</LINK>
<LINK>http://www.microsoft.com</LINK>
</LINKS>
</SITES>
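Assuming the choice is written with the | operator, as in (LINKS | ARTICLES)?, the root may contain one LINKS element, one ARTICLES element, or nothing, but never both. A rough Python sketch of that rule follows; the regular expression is a hand translation and the names SITES_CHOICE and sites_children_are_a_choice are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# <!ELEMENT SITES (LINKS | ARTICLES)?> read as a regular expression:
# either LINKS or ARTICLES, at most once.
SITES_CHOICE = re.compile(r"(LINKS |ARTICLES )?")


def sites_children_are_a_choice(xml_text):
    """Return True if the root holds at most one of LINKS or ARTICLES."""
    root = ET.fromstring(xml_text)
    names = "".join(child.tag + " " for child in root)
    return SITES_CHOICE.fullmatch(names) is not None
```

Under this reading, a document with only an ARTICLES child passes, and a document that contains both ARTICLES and LINKS fails. Changing the ? to + would allow any mix of the two, one or more times.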
So far, only elements have been a focus. What if you want to specify an attribute? For example, what if you want to associate a
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<!DOCTYPE SITES [
<!ELEMENT SITES (LINKS | ARTICLES)+>
<!ELEMENT LINKS (LINK*)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLES (ARTICLE*)*>
<!ELEMENT ARTICLE (#PCDATA)>
<!ATTLIST LINK
name CDATA #REQUIRED
>
]>
<SITES>
<ARTICLES>
<ARTICLE>This is where an article may go</ARTICLE>
</ARTICLES>
<LINKS>
<LINK name="xmlandasp.net">http://www.xmlandasp.net</LINK>
<LINK name="Microsoft">http://www.microsoft.com</LINK>
</LINKS>
</SITES>
The #REQUIRED modifier for the attribute specifies that the attribute is required.
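A validating parser reports any LINK element that lacks the required attribute. The effect can be approximated in application code, as in this small Python sketch. It is an illustration only: ElementTree does not validate, and links_missing_name is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def links_missing_name(xml_text):
    """Return the text of every LINK element that has no name attribute."""
    root = ET.fromstring(xml_text)
    return [link.text for link in root.iter("LINK") if "name" not in link.attrib]
```

An empty list means every LINK carries a name attribute, which is what #REQUIRED demands.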
As you can see, DTDs can be useful for defining the structure and content model of an XML document.
The first drawback to using DTDs is that they are not written in XML, so an XML parser cannot parse them as XML documents. DTDs use a syntax that's difficult for parsers to represent. Two types of DTDs exist: internal and external. In this section, only internal DTDs are used, just to keep things simple. External DTDs are DTDs that are external to the XML document and are referenced from within the XML document. The XML parser cannot represent external DTDs, so working with DTDs becomes
As you might have seen throughout the DTD examples, the content of elements was specified as #PCDATA and the content of the attribute as CDATA. This is because DTDs do not support typing of data. You cannot require that the content of an element be a number rather than an arbitrary string because everything in an XML document is a string, according to DTDs. You cannot specify the acceptable length of a string, nor can you specify restrictions on the string's contents.
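Because the DTD cannot express such restrictions, any check on the content itself has to live in application code. The following Python sketch shows one possible check, assuming (hypothetically) that every LINK should hold an absolute http or https URL; the function name non_url_links and that rule are illustrative, not part of the DTD.

```python
from urllib.parse import urlparse
import xml.etree.ElementTree as ET


def non_url_links(xml_text):
    """Return the text of LINK elements that are not absolute http(s) URLs."""
    root = ET.fromstring(xml_text)
    bad = []
    for link in root.iter("LINK"):
        parsed = urlparse((link.text or "").strip())
        # A DTD accepts any character data here, so the rule is enforced by hand.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(link.text)
    return bad
```

Both kinds of LINK content are equally valid as far as the DTD is concerned, so only a check like this one can tell them apart.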
Another drawback to using DTDs is that
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE ROOT [
<!ELEMENT ROOT (LINKS,ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT ARTICLES (ARTICLE)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLE (LINKS)>
]>
<ROOT>
<LINKS>
<LINK>NewRiders.com</LINK>
</LINKS>
<ARTICLES>
<ARTICLE>
<LINKS>
<LINK>Microsoft.com</LINK>
<LINK>Xmlandasp.net</LINK>
</LINKS>
</ARTICLE>
</ARTICLES>
</ROOT>
Notice that both the ROOT and ARTICLE elements declare a child element of type LINKS.
You can reuse definitions, but you cannot redefine names. Suppose that, instead of reusing the LINKS definition, you want to redefine it. You might try something like the following, but this code is invalid because the name is already declared:
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<!DOCTYPE ROOT [
<!ELEMENT ROOT (LINKS,ARTICLES)?>
<!ELEMENT LINKS (LINK)*>
<!ELEMENT ARTICLES (ARTICLE)*>
<!ELEMENT LINK (#PCDATA)>
<!ELEMENT ARTICLE (LINKS)>
<!ELEMENT LINKS (#PCDATA)>
]>
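A quick way to spot this kind of conflict before handing a DTD to a validating parser is to scan the declarations for repeated element names. The sketch below is a rough text scan under that assumption, not a DTD parser, and duplicate_element_names is a hypothetical helper.

```python
import re
from collections import Counter


def duplicate_element_names(dtd_text):
    """Return the element names declared more than once, sorted by name."""
    # Capture the name that follows each <!ELEMENT keyword.
    names = re.findall(r"<!ELEMENT\s+([^\s>]+)", dtd_text)
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Run against the DTD above, the scan reports LINKS, the one name that is declared twice.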
Finally, using namespaces with DTDs is difficult. It is not
XML Schemas are gaining ground on DTDs because schemas can easily represent different names, extend existing definitions, and easily use namespaces.
The
Validation Using Internet Explorer
Internet Explorer doesn't validate documents with either DTDs or schemas by default. iexmltls.exe, however, is a free add-on to IE that
To validate against a DTD using the XmlValidatingReader class in .NET, set the ValidationType property of the XmlValidatingReader object to ValidationType.DTD, as shown here:
Sub Validate()
    ' Wrap a text reader in a validating reader that checks the document against its DTD.
    Dim xmlReader As New System.Xml.XmlTextReader("c:\temp\xmlfile.xml")
    Dim vReader As New System.Xml.XmlValidatingReader(xmlReader)
    vReader.ValidationType = System.Xml.ValidationType.DTD
    AddHandler vReader.ValidationEventHandler, AddressOf ValidateCallback
    ' Reading to the end of the document triggers validation of every node.
    While vReader.Read()
    End While
    vReader.Close()
End Sub

' Called once for each validation error or warning that the reader finds.
Public Sub ValidateCallback(ByVal sender As Object, ByVal args As System.Xml.Schema.ValidationEventArgs)
    System.Diagnostics.Debug.WriteLine(args.Message)
End Sub
You can also use the .NET base classes or MSXML to validate an XML document programmatically. Chapter 2 introduces the XmlValidatingReader class for validating XML documents, and the class is discussed in greater detail in Chapter 6, "Exploring the System.Xml Namespace." Chapter 5, "MSXML Parser," discusses how to validate documents by using MSXML.