Validating XML


NOTE

The information in this section applies to ColdFusion MX 7 and later releases.


Any XML parser will tell you whether your XML document is well formed (that is, whether all tags are closed and nothing is improperly nested). But just because a document is well formed does not mean it adheres to the structure your business logic expects.

To know that your XML document is not only well formed, but also valid in accordance with your business rules, you must validate the document. Validating an XML document ensures that tags are nested in their proper hierarchy (so that something that's supposed to be a child is not actually a parent); that there are no extraneous attributes; and that all required elements and attributes are present.
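To make the distinction concrete, here is a small sketch of a document that any parser would accept as well formed but that validation should reject. It assumes the company directory structure used in the rest of this section, in which each employee holds first-name, last-name, and department; the badge attribute is hypothetical and stands in for any attribute the structure does not allow.

```xml
<?xml version="1.0"?>
<!-- Well formed: every tag is closed and properly nested. -->
<!-- Not valid for a company directory, for three reasons noted below. -->
<company>
  <first-name>
    <!-- 1. employee is nested inside first-name, reversing parent and child -->
    <!-- 2. badge is a hypothetical attribute that the structure does not define -->
    <employee ssn="123-45-6789" badge="17">
      <last-name>Johnson</last-name>
      <!-- 3. the required department element is missing -->
    </employee>
  </first-name>
</company>
```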

There are two major methods for validating XML: the older DTD (Document Type Definition) standard, and the newer and more flexible XML Schema standard. This chapter does not go into much depth on DTDs; XML Schema is a more capable standard that is widely accepted.

NOTE

Some other validation standards have appeared over the years. Chief among them are XDR (XML-Data Reduced), which was Microsoft's attempt at a proprietary schema language, and RELAX NG, another XML syntax for defining schemas. However, XML Schema will most likely be the XML validation standard for some time, thanks to its flexibility, wide-ranging support, and status as a W3C recommendation.


DTDs

DTDs (Document Type Definitions) are a holdover from the days of SGML. The DTD describes the elements and attributes available in a document, how they can be nested, and which ones are required and which are optional. But its lack of support for namespaces and inheritance, as well as its somewhat confusing syntax, make the DTD too limited for most programmers' needs.
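For comparison only, the following is a rough sketch of how a DTD might describe the company directory structure that the schema in the next section defines. The DTD is written here as an internal subset, which is one of several ways to attach it. Note that the ssn attribute can only be declared as generic character data; a DTD has no way to require the digit pattern that the schema enforces later.

```xml
<?xml version="1.0"?>
<!DOCTYPE company [
  <!-- company holds zero or more employee elements -->
  <!ELEMENT company (employee*)>
  <!-- each employee holds these three children, in this order -->
  <!ELEMENT employee (first-name, last-name, department)>
  <!ELEMENT first-name (#PCDATA)>
  <!ELEMENT last-name (#PCDATA)>
  <!ELEMENT department (#PCDATA)>
  <!-- ssn is optional character data; its format cannot be constrained -->
  <!ATTLIST employee ssn CDATA #IMPLIED>
]>
<company>
  <employee ssn="123-45-6789">
    <first-name>Ed</first-name>
    <last-name>Johnson</last-name>
    <department>Human Resources</department>
  </employee>
</company>
```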

With this in mind, the remainder of this chapter is devoted to examining and understanding the XML Schema standard, which does everything DTDs can do and more.

XML Schemas

An XML schema is the definition of the required structure of an XML document. Assuming the markup in Listing 14.1, Listing 14.11 shows a schema defined for that document:

Listing 14.11. CompanyDirectory.xsd: An XML Schema Document

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company" type="CompanyType" />
  <xsd:complexType name="CompanyType">
    <xsd:sequence>
      <xsd:element name="employee" type="EmployeeType" minOccurs="0" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <xsd:element name="first-name" type="xsd:string" />
      <xsd:element name="last-name" type="xsd:string" />
      <xsd:element name="department" type="xsd:string" />
    </xsd:sequence>
    <xsd:attribute name="ssn" type="SSNType" />
  </xsd:complexType>
  <xsd:simpleType name="SSNType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{2}-\d{4}" />
      <xsd:length value="11" />
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Listing 14.11 is a schema that defines a top-level company element and its lower-level employee elements. Let's examine its parts.

Walkthrough of a Basic Schema

The first line is the schema header. Notice that XML Schema's tags are in a separate namespace, which by convention is bound to the prefix xsd. The separate namespace is always necessary to prevent the collision of XML Schema tags and your own tags.

An XML schema always has xsd:schema as its root element. (Although the prefix may be different, the element name must always be schema.)
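As a brief illustration of the point about prefixes, the sketch below binds the same XML Schema namespace to xs instead of xsd. The prefix is an arbitrary choice; the namespace URI is what identifies the tags as XML Schema tags. To keep the fragment self-contained, employee is given the built-in string type here, which is a simplification and not what Listing 14.11 does.

```xml
<!-- Same namespace URI as Listing 14.11, bound to a different prefix -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="company" type="CompanyType" />
  <xs:complexType name="CompanyType">
    <xs:sequence>
      <!-- Simplified for this sketch: employee holds plain text -->
      <xs:element name="employee" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```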

The first thing inside the schema is typically the global elements defined for the schema. (They do not have to be first, but it's usually the most logical place for them.) In this schema, there is only one global element:

 <xsd:element name="company" type="CompanyType" /> 

A global element is declared as a direct child of xsd:schema, rather than inside a type definition. Usually the only global element will be the root element for the document (in this case, company).

Elements in an XML schema are defined by using the xsd:element tag. In Listing 14.11, the following line of code declares the XML document's root company element:

 <xsd:element name="company" type="CompanyType" /> 

When you declare an element in an XML schema, you must tell the validator the name of the element and the type of the element's content. Here, the element's name is company, and its type is CompanyType, which is a custom type defined later in the schema.

Next in the schema is a custom type definition, introduced by the element xsd:complexType. A complex type is any element type that contains element or attribute definitions of its own. In this definition, the complex type contains only a single element, represented by an xsd:element nested inside of an xsd:sequence.

The xsd:sequence denotes that a set of elements must occur in the specified order. A grouping tag such as xsd:sequence is required here even though there is only one element; this is a quirk of the XML Schema language.

Now let's take a closer look at the xsd:element tag used to define the employee element:

 <xsd:element name="employee" type="EmployeeType" minOccurs="0" maxOccurs="unbounded" /> 

The company element does not define minOccurs and maxOccurs attributes because it is a global element: occurrence attributes are not allowed on global declarations, and as the root element, company occurs exactly once anyway. An employee element, by contrast, does not have to be present at all, and there is no upper limit on the number of employee elements that can be nested within a company.

Table 14.2 shows the possible values for minOccurs and maxOccurs and their effects.

Table 14.2. minOccurs and maxOccurs Values

PROPERTY  | VALUE     | EFFECT
minOccurs | 0         | This element does not have to be present.
minOccurs | 1...n     | This element must be present at least this many times.
maxOccurs | 0         | This element cannot be present.
maxOccurs | 1...n     | This element can be present, at most, this many times.
maxOccurs | unbounded | This element can be present any number of times.


minOccurs and maxOccurs both default to 1, meaning that if neither is specified, the element is required and may only be present once.
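The defaults and the values in Table 14.2 can be seen side by side in a short sketch. ContactType and its name, nickname, and phone elements are hypothetical and do not appear in Listing 14.11; they exist only to show three common occurrence settings.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ContactType">
    <xsd:sequence>
      <!-- Neither attribute given: both default to 1, so exactly one name -->
      <xsd:element name="name" type="xsd:string" />
      <!-- minOccurs="0" with the default maxOccurs: zero or one nickname -->
      <xsd:element name="nickname" type="xsd:string" minOccurs="0" />
      <!-- Default minOccurs with maxOccurs="3": one to three phone elements -->
      <xsd:element name="phone" type="xsd:string" maxOccurs="3" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```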

The type of the employee element is another custom type, defined by the next xsd:complexType in the schema. EmployeeType contains three elements and one attribute:

<xsd:complexType name="EmployeeType">
  <xsd:sequence>
    <xsd:element name="first-name" type="xsd:string" />
    <xsd:element name="last-name" type="xsd:string" />
    <xsd:element name="department" type="xsd:string" />
  </xsd:sequence>
  <xsd:attribute name="ssn" type="SSNType" />
</xsd:complexType>

The three xsd:element tags are inside an xsd:sequence tag. This means they must always occur in the same order in which they appear in the schema. To allow the elements to occur in any order, we could instead use xsd:choice:

<xsd:choice minOccurs="3" maxOccurs="3">
  <xsd:element name="first-name" type="xsd:string" />
  <xsd:element name="last-name" type="xsd:string" />
  <xsd:element name="department" type="xsd:string" />
</xsd:choice>

This allows any three of these elements, in any order and in any combination. Be careful with the xsd:choice syntax, because nothing stops a document from containing three first-name elements and no last-name or department element at all. Using xsd:sequence is typically your best bet.
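If the goal is "each element exactly once, in any order," XML Schema also offers xsd:all, which the listings in this chapter do not use. The following is a minimal sketch of EmployeeType rewritten with it. The ssn attribute is typed as a plain string here only to keep the fragment self-contained.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="EmployeeType">
    <!-- xsd:all: every child may appear in any order, but at most once -->
    <xsd:all>
      <xsd:element name="first-name" type="xsd:string" />
      <xsd:element name="last-name" type="xsd:string" />
      <xsd:element name="department" type="xsd:string" />
    </xsd:all>
    <!-- Simplified for this sketch; Listing 14.11 uses SSNType -->
    <xsd:attribute name="ssn" type="xsd:string" />
  </xsd:complexType>
</xsd:schema>
```

Unlike the xsd:choice version, this rejects a document with three first-name elements, because each child of xsd:all can occur no more than once.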

Also notice that the type attributes of the elements do not refer to a custom type. The first-name, last-name, and department elements all have a type of xsd:string, which is a built-in XML Schema type.

A full list of XML Schema types is beyond the scope of this book, but Table 14.3 lists the most common ones. There are many more built-in XML schema types, which are described in the XML Schema Primer at the W3C website. See "More XML Resources" at the end of this chapter for more information.

Table 14.3. Built-in XML Schema Types

TYPE NAME              | DESCRIPTION
xsd:string             | A string of characters with no special formatting.
xsd:integer            | A generic integer value.
xsd:positiveInteger    | A generic integer value greater than zero.
xsd:negativeInteger    | A generic integer value less than zero.
xsd:nonPositiveInteger | A generic integer value that is zero or less than zero.
xsd:nonNegativeInteger | A generic integer value that is zero or greater than zero.
xsd:long               | A 64-bit integer value.
xsd:unsignedLong       | An unsigned 64-bit integer value.
xsd:int                | A 32-bit integer value.
xsd:unsignedInt        | An unsigned 32-bit integer value.
xsd:decimal            | An exact decimal number value.
xsd:float              | A 32-bit floating-point single-precision value.
xsd:double             | A 64-bit floating-point double-precision value.
xsd:dateTime           | A date/time value in the format 2005-01-18T13:20:00.000-05:00 (representing January 18, 2005, at 1:20 p.m., 5 hours behind UTC).
xsd:boolean            | true, false, 1, or 0 (no other value will be accepted as valid, according to the schema).
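A few of the types in Table 14.3 are shown in use below. PayrollType and its four elements are hypothetical and are not part of Listing 14.11; the sketch only illustrates how a built-in type is named in the type attribute.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="PayrollType">
    <xsd:sequence>
      <!-- Accepts values such as 2005-01-18T13:20:00.000-05:00 -->
      <xsd:element name="hire-date" type="xsd:dateTime" />
      <!-- Exact decimal, suitable for money -->
      <xsd:element name="salary" type="xsd:decimal" />
      <!-- Zero or any positive whole number -->
      <xsd:element name="dependents" type="xsd:nonNegativeInteger" />
      <!-- Accepts true, false, 1, or 0 -->
      <xsd:element name="full-time" type="xsd:boolean" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```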


After the three string elements in EmployeeType, there is an xsd:attribute tag describing the ssn attribute. Notice that the xsd:attribute is outside the xsd:sequence tag (because attributes are never ordered), and that its type attribute refers to SSNType, which is defined by an xsd:simpleType:

<xsd:simpleType name="SSNType">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-\d{2}-\d{4}" />
    <xsd:length value="11" />
  </xsd:restriction>
</xsd:simpleType>

In this case, SSNType is a special kind of xsd:string that restricts its value to a valid SSN using a regular expression. (See Chapter 13, "Using Regular Expressions," for the full discussion of regular expression syntax.)

NOTE

xsd:complexType defines an element type that may contain other elements and attributes, whereas xsd:simpleType defines a type whose value is plain text, based on an existing simple type. Element values can be either simple or complex types because elements can contain other elements, but attribute values must always be simple types because an attribute can hold only a string value.
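The pairing of a simple type with an attribute can be sketched as follows. This is a cut-down variant of Listing 14.11, not a replacement for it: EmployeeType is reduced to one child element, and use="required" is added as an assumption to show how an attribute can be made mandatory (in Listing 14.11 the ssn attribute is optional, which is the default).

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Simple type: a restricted string, usable for attributes and elements -->
  <xsd:simpleType name="SSNType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-\d{2}-\d{4}" />
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Complex type: contains a child element and an attribute -->
  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <xsd:element name="first-name" type="xsd:string" />
    </xsd:sequence>
    <!-- use="required" is an addition for this sketch -->
    <xsd:attribute name="ssn" type="SSNType" use="required" />
  </xsd:complexType>
</xsd:schema>
```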


The xsd:pattern and xsd:length tags are facets of the simple type's restriction. (Facet is an XML Schema term meaning "a kind of restriction.") Other facets that can be used inside an xsd:restriction tag are summarized in Table 14.4.

Table 14.4. Facets Used Inside an xsd:restriction Element

XSD ELEMENT        | DESCRIPTION | EXAMPLE
xsd:length         | Specifies that a string value must be a specific length. | <xsd:length value="8" />
xsd:minLength      | Specifies that a string value must be at least a certain length. | <xsd:minLength value="3" />
xsd:maxLength      | Specifies that a string value must be at most a certain length. | <xsd:maxLength value="7" />
xsd:pattern        | Specifies that a value must adhere to a given regular expression pattern. | <xsd:pattern value="[0-9]{5}(-[0-9]{4})?" />
xsd:enumeration    | Specifies that a value must be in a given set of values. Note that there is one xsd:enumeration tag for every value in the allowed set. | <xsd:enumeration value="0" /> <xsd:enumeration value="1" />
xsd:whiteSpace     | Specifies how extraneous whitespace is handled before other facets are validated. Can be preserve (to preserve all whitespace); replace (to replace each tab, line feed, and carriage return with a space); or collapse (to do the same replacement, then reduce each run of spaces to a single space and trim the front and back). | <xsd:whiteSpace value="collapse" />
xsd:minInclusive   | Specifies that a value must be greater than or equal to a given value. | <xsd:minInclusive value="1" />
xsd:maxInclusive   | Specifies that a value must be less than or equal to a given value. | <xsd:maxInclusive value="20" />
xsd:minExclusive   | Specifies that a value must be greater than a given value. | <xsd:minExclusive value="0" />
xsd:maxExclusive   | Specifies that a value must be less than a given value. | <xsd:maxExclusive value="21" />
xsd:totalDigits    | Specifies that a numeric value can have, at most, this many digits. | <xsd:totalDigits value="12" />
xsd:fractionDigits | Specifies that a numeric value can have, at most, this many digits after the decimal place. | <xsd:fractionDigits value="2" />


Note that you can combine multiple facets to have granular control over your data. For example, if you wanted to ensure that a value was between three and eight characters long, you could use the following xsd:simpleType declaration:

<xsd:simpleType name="SKUType">
  <xsd:restriction base="xsd:string">
    <xsd:minLength value="3" />
    <xsd:maxLength value="8" />
  </xsd:restriction>
</xsd:simpleType>
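Two more combinations, offered as a sketch, show facets from Table 14.4 applied to other base types. DepartmentType and PriceType are hypothetical names; the department values are borrowed from the sample company data in this chapter.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Only the listed department names are accepted -->
  <xsd:simpleType name="DepartmentType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Human Resources" />
      <xsd:enumeration value="Accounting" />
    </xsd:restriction>
  </xsd:simpleType>
  <!-- A non-negative amount with at most 8 digits, 2 of them after the decimal point -->
  <xsd:simpleType name="PriceType">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="0" />
      <xsd:totalDigits value="8" />
      <xsd:fractionDigits value="2" />
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```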

As you can see, even complicated schemas consist of simple building blocks. For more information on the XML Schema language and how to build schemas, see "More XML Resources" at the end of this chapter.

Validating XML in ColdFusion

Validating XML in ColdFusion is a simple process, assuming you've already written the appropriate schema or DTD. Both XmlParse() and the new function XmlValidate() provide the capability to target a schema and validate a document's contents.

Validating by Using XmlValidate()

You will most likely use XmlValidate() as your primary method of validating documents because it gives a good deal of information regarding any errors that may occur. XmlValidate() takes two arguments, as shown in Listing 14.12.

Listing 14.12. ValidateXML.cfm: Using XmlValidate() to Validate a Document

<cffile action="READ"
    file="#ExpandPath('SingleCompany.xml')#"
    variable="xmlDocument">
<cfset xmlObject = XmlParse(xmlDocument, true)>
<cfset errorStruct = XmlValidate(xmlObject, "CompanyDirectory.xsd")>
<cfdump var="#errorStruct#">

The first argument is a reference to the XML document that needs validating, and the second argument is a reference to the schema document against which the XML document must be validated. Note that each argument can take a number of inputs:

  • Both arguments can take a string representation of the XML markup.

  • Both arguments can take a filename.

  • Both arguments can take a URL.

  • The first argument (the XML document) can also take a parsed XML object.

XmlValidate() returns a structure describing whether validation succeeded, as well as any errors that may have occurred during validation. Most often you will use errorStruct.status, which is YES or NO, depending on whether validation was successful, to determine whether or not to continue processing the document. (The other keys in the structure are usually only helpful during debugging.)

Note that because the schema is passed to XmlValidate() as an argument, you can choose at run time which schema a given document is validated against. The next section covers the alternative, in which the document itself names its schema.

Embedding Schema Information within a Document

You will most often pass a schema to the XmlValidate() function in order to control how a document is validated, but it is sometimes necessary to embed a reference to the schema within the document itself. This is done by using syntax like this:

<company xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:noNamespaceSchemaLocation="Listing14-12.xsd">
  <employee ssn="123-45-6789">
    <first-name>Ed</first-name>
    <last-name>Johnson</last-name>
    <department>Human Resources</department>
  </employee>
  <employee ssn="541-29-8376">
    <first-name>Maria</first-name>
    <last-name>Smith</last-name>
    <department>Accounting</department>
  </employee>
  <employee ssn="568-73-1924">
    <first-name>Eric</first-name>
    <last-name>Masters</last-name>
    <department>Accounting</department>
  </employee>
</company>

This syntax example points the parser to the schema at Listing14-12.xsd.

Note that this method is far from foolproof, because when the calling program validates this document using the provided schema, there is no guarantee that the schema is actually the correct one. The person who created the XML file could point the location at any schema of his or her choosing, and the document would still be reported as valid against it.

To make use of a schema location provided within the source document, you would call XmlValidate() without a second argument, like this:

 XmlValidate(xmlObject); 

This tells ColdFusion to use the embedded schema location.

Validating by Using XmlParse()

It is also possible to pass a validator parameter to the XmlParse() function and have ColdFusion validate the document when it is parsed:

 <cfset xmlObject = XmlParse(xmlDocument, true, "CompanyDirectory.xsd")> 

Using this form of XmlParse() will cause ColdFusion to throw an error if the document is not valid against the schema. It is usually better to call XmlValidate() separately, because the structure it returns provides additional information about what went wrong.



Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
Year: 2006
Pages: 240
Authors: Ben Forta, et al
