Instance Documents and Validation | The Official XMLSPY Handbook

An XML content model defines a technical blueprint for a family, or class, of XML documents. An XML document that is intended to belong to a particular family of documents (that is, one that is meant to conform to all the rules defined within a content model) is called an instance document. The term emphasizes that the document is just one instance, or member, of a particular family of documents (for example, a particular markup language). The distinction between an instance document and its associated content model is analogous to the relationship between an object and its class definition in object-oriented programming languages, or between a row of data in a database table and that table's schema definition. Both a class definition and a database schema define a template, or model, that is subsequently used to create objects or data sharing the same characteristics. Suppose that you want to convey, in XML format, the information contained in a collection of books. You first must design a general content model that describes a book; then you can create one XML instance document for each book in your collection.
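To make the book example concrete, the following Python sketch treats a content model as a template and builds one instance document per book, much as a class produces objects. It is only an illustration under simple assumptions: the "content model" here is a hypothetical tuple of required child elements (`BOOK_MODEL`), not a DTD or XML Schema, and the element names and sample records are invented for the example. It uses only the standard library's `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model for the "book" family: a <book> root element
# whose children must appear in exactly this order.
BOOK_MODEL = ("title", "author", "year")


def make_instance(record: dict) -> str:
    """Build one instance document for one book, following BOOK_MODEL."""
    book = ET.Element("book")
    for name in BOOK_MODEL:
        ET.SubElement(book, name).text = str(record[name])
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(book, encoding="unicode")


# Sample data: one record per book in the collection.
collection = [
    {"title": "Moby-Dick", "author": "Herman Melville", "year": 1851},
    {"title": "Walden", "author": "Henry David Thoreau", "year": 1854},
]

# One instance document per book, all shaped by the same model.
documents = [make_instance(record) for record in collection]
for doc in documents:
    print(doc)
```

Every string in `documents` shares the same structure (a `<book>` root with `title`, `author`, and `year` children) because all of them were built from the same model; only the data differ.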

Although the content model is an optional feature of XML, it is an extremely powerful one. A content model is a necessary requirement for document validation: the process by which an XML instance document is systematically checked and verified to conform to all the rules and restrictions defined within a content model. In the absence of a content model, there are no rules to enforce upon an XML document. XML by itself places no restrictions on which data elements a document may contain or how the document is structured, beyond the basic rules for constructing XML documents (such as the requirements that a document be well-formed and adhere to certain naming conventions, along with the other rules described in Chapter 2).
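The following Python sketch illustrates the point with the standard library's `xml.etree.ElementTree` parser, which does not validate: with no content model, the only test a document must pass is well-formedness. The helper name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no content model is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Both documents are well-formed, even though the second makes little sense
# as a book: without a content model, nothing forbids it.
print(is_well_formed("<book><title>Walden</title></book>"))        # True
print(is_well_formed("<book><price>cheap</price><book/></book>"))  # True
# Improperly nested tags break a basic XML rule, so parsing fails.
print(is_well_formed("<book><title>Walden</book></title>"))        # False
```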

Document validation is an automatic process as far as the developer is concerned. All major commercial-grade XML parsers (also called XML processors) today are validating XML parsers. This means that they contain both a parser, which reads an XML document, and an XML validator, which is a software component that can understand and enforce content models expressed in one of the major schema languages, such as DTD or XML Schema. You only have to specify the instance document to be validated and the content model containing the rules to validate it against, and then instruct the XML processor to perform the validation. Typically, the processor's validator either reports a success message, if your instance document is determined to be valid, or reports an error message if it catches a violation of one of the constraints specified in your content model. XMLSPY's built-in XML processor likewise includes an XML validator that accepts a content model (expressed either as a DTD or as an XML Schema) together with an XML instance document, and validates the document against all the rules defined in the content model. In the following section, I show you how to use a DTD to validate XML documents within the XMLSPY editing environment.
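The check a validator performs can be sketched in a few lines of Python. This is a toy stand-in for a real DTD or XML Schema validator, not XMLSPY's implementation: the hypothetical `CONTENT_MODEL` dictionary maps each element name to the exact sequence of child elements it must contain, and `validate` collects every violation it finds, mirroring the success-or-error behavior described above. Real schema languages also support optional and repeated elements, choices, attributes, and data types, none of which this sketch handles.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: element name -> exact sequence of child
# elements it must contain. An empty tuple means "text only, no children".
ROOT = "book"
CONTENT_MODEL = {
    "book": ("title", "author", "year"),
    "title": (),
    "author": (),
    "year": (),
}


def validate(instance: str, model: dict) -> list:
    """Check an instance document against the model; return error messages."""
    try:
        root = ET.fromstring(instance)
    except ET.ParseError as exc:
        # A document that is not well-formed cannot be validated at all.
        return [f"not well-formed: {exc}"]
    errors = []
    if root.tag != ROOT:
        errors.append(f"root element is <{root.tag}>, expected <{ROOT}>")
    for elem in root.iter():  # visits the root and every descendant
        if elem.tag not in model:
            errors.append(f"<{elem.tag}> is not declared in the content model")
            continue
        children = tuple(child.tag for child in elem)
        if children != model[elem.tag]:
            errors.append(
                f"<{elem.tag}> contains {children}, expected {model[elem.tag]}"
            )
    return errors


def report(instance: str) -> str:
    """Return a success message or the combined error messages."""
    errors = validate(instance, CONTENT_MODEL)
    return "valid" if not errors else "invalid: " + "; ".join(errors)


good = "<book><title>Walden</title><author>Thoreau</author><year>1854</year></book>"
bad = "<book><title>Walden</title><year>1854</year></book>"
print(report(good))  # valid
print(report(bad))
```

The second document fails because `<book>` lacks its `<author>` child; the returned message names both the actual and the expected child sequences, much as a real validator points to the violated constraint.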

Using content models to validate XML instance documents is important for the following reasons:

Consistent program behavior: As a general rule, data input to any software application should be validated, both for security and to ensure proper program behavior. By validating input against a content model, you greatly reduce the risk of unexpected input, so your software application (which uses XML documents) is less likely to crash and more likely to produce consistent, meaningful results.
Less validation code: Using a content model to restrict program input and output shifts the burden of writing and maintaining large amounts of validation code from the application developer to the XML validator. With an XML validator in place, programmers can write applications that create or consume XML documents with confidence that the data has the expected structure. One of the most common problems with implementing and maintaining separate application data validation modules is that any change to program input or output must be reconciled with the validation module, and any desynchronization between the application's external interface and the validation module results in errors. As an application grows in complexity, so do the number of possible program inputs and the potential for errors.

For the reasons just stated, it’s not a good idea to use XML documents without an associated content model. Content model development should take precedence over editing XML instance documents in the XML application development process. Only after having spent at least some time developing and refining a content model for a family of XML documents should you start editing instance documents.