Back in a previous lifetime when I worked as a consultant for Digital Equipment Corporation, we all loved using the VMS operating system. However, we also joked that it was so capable and flexible that it gave us a thousand different ways to hang ourselves. The W3C XML Schema language is very similar. You can write a nearly infinite number of schemas to describe any particular instance document. Every one of these schemas can provide an accurate description of the document, be loaded into a parser, and validate the document.
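To make that point concrete, here is a minimal sketch. The Contact document and its element names are invented purely for illustration, and the two schemas show only two of the many possible ways to describe it. Consider this small instance document:

```xml
<Contact>
  <Name>Pat Smith</Name>
  <Phone>555-0100</Phone>
</Contact>
```

One schema might declare everything inline, nesting an anonymous type inside the element declaration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical example: the type is defined in place and has no name -->
  <xs:element name="Contact">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string"/>
        <xs:element name="Phone" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Another might define a named type first and then declare the element in terms of it:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical example: ContactType is a named, reusable type -->
  <xs:complexType name="ContactType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Phone" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Contact" type="ContactType"/>
</xs:schema>
```

Both schemas accept the Contact document above, yet they read quite differently, and these are only two of the possible variations.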
If there were only one way to write a schema, I'm not sure I would devote a whole chapter to the topic. However, because all of this variation is possible, a certain level of instruction is called for in order to help you understand schemas. Understanding the schemas you use is important for a number of reasons. When used, schemas are the ultimate authority on the format of your XML instance documents. If you encounter parsing or validation errors, you may have to refer to the schema as well as the instance document to determine the source of the error. If you are like most people who use XML, your trading partners or application vendors will tell you which schemas to use. Because there is sometimes a perception that schemas are fully definitive and self-documenting, the schema may be the only documentation you'll get regarding how your instance documents are supposed to look.
In this chapter I attempt to convey to you the basics of what you need to know to read and make sense of schemas written in W3C's XML Schema language. This chapter focuses just on understanding the suckers. In Chapter 5 we'll write code to use schemas for validation. Later in this chapter we'll talk about creating schemas, and we'll revisit the topic in Chapter 12. However, for the most part I'll avoid discussing the advantages and disadvantages of the various schema design options. The topic of best practices in schema design is fairly broad and sometimes contentious. Since I anticipate that most of you reading this will not be designing schemas, I'll limit the discussion to a more pragmatic scope.
If you want a good introduction to most of the features of the schema language, you can't do better than W3C's Primer (see the Resources section at the end of this chapter). However, this chapter takes a different tack. While the Primer tries to describe most of the important schema features, I'll focus only on features and examples you are likely to see in schemas describing business documents (as opposed to wedding invitations, Web pages, or design documents for weapons systems). Although the Primer uses a purchase order as one of its main examples, many of the schema features discussed are not commonly used in business documents (not yet, anyway).
Although I have good words to say about the Primer, which is Part 0 of the XML Schema Recommendation, I don't have such good words to say about Parts 1 and 2, Structures and Datatypes, respectively. I find parts of them nearly impenetrable, and I have a master's degree in computer science! There also seems to me to be a great deal of functionality that is unnecessarily overlapping or outright duplicative. I don't want to come across as too harsh here since I have worked on standards committees myself and know firsthand what can happen when a document is written by committee. However, even when compared with other W3C Recommendations, Parts 1 and 2 of the Schema Recommendation don't measure up very well.
There are schema languages other than the W3C XML Schema language. However, despite my mixed feelings about the W3C XML Schema Recommendation, I'm not going to talk about those other schema languages. It's not that I think that RELAX NG or any of its cousins are technically deficient or harder to use than the W3C XML Schema language. Far from it. The truth is that the market just isn't very interested in them. Almost everyone looks to the W3C as the final authority on all things XML. They have spoken.