Infosets

While we're on the subject of creating XML documents, it's worth looking at another XML specification: the XML Information Set specification, which you'll find at www.w3.org/TR/xml-infoset.

XML documents excel at storing data, and this has led developers to wonder whether XML will ultimately be able to solve an old problem: directly comparing and classifying the data in multiple documents. Consider the World Wide Web as it stands today: There can be thousands of documents on a particular topic, but how can you possibly compare them? For example, a search for the term XML turns up millions of matches, but it would be extraordinarily difficult to write a program that compares the data in those documents, because that data isn't stored in any remotely compatible format.

The idea behind XML information sets, also called infosets, is to set up an abstract way of looking at an XML document so that it can be compared to others. To have an infoset, an XML document may not use colons in tag and attribute names unless they are used to support namespaces. A document does not need to be valid to have an infoset, but it does need to be well formed.

An XML document's information set consists of two or more information items (the information set for any well-formed XML document contains at least the document information item and one element information item). An information item is an abstract representation of some part of an XML document, and each information item has a set of properties, some of which are considered core and some of which are considered peripheral. An XML information set can contain 15 different types of information items:
There is always one document information item in the information set. Here's a list of the core properties of the document information item:
The document information item can also have these properties:
The other information items, such as element information items and processing instruction information items, have similar property lists. Currently, no applications create and work with infosets. However, W3C documentation often refers to the information stored in an XML document as its infoset, so it's an important term to know. The closest you can come to working with infosets right now is working with canonical XML documents (see the next topic).
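
To make the idea of information items and their properties a little more concrete, here is a minimal Python sketch that parses a document with the standard library's xml.dom.minidom and lists DOM nodes as rough stand-ins for information items. This is an illustration only, not an implementation of the Infoset specification: the function names (is_well_formed, information_items), the mapping from DOM node types to item kinds, and the property names in the dictionaries are all assumptions made for this example, and only a handful of item types are covered.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError

# Assumed, simplified mapping from DOM node types to infoset-style item kinds.
ITEM_KINDS = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "characters",
}


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


def information_items(node):
    """Yield (kind, properties) pairs for node and its descendants.

    The property names are illustrative, not the specification's own.
    """
    kind = ITEM_KINDS.get(node.nodeType)
    if kind == "document":
        yield kind, {"children": len(node.childNodes)}
    elif kind == "element":
        yield kind, {"name": node.tagName, "namespace": node.namespaceURI}
        # Report each attribute of the element as its own item.
        for name, value in node.attributes.items():
            yield "attribute", {"name": name, "value": value}
    elif kind == "processing instruction":
        yield kind, {"target": node.target, "content": node.data}
    elif kind in ("comment", "characters"):
        yield kind, {"content": node.data}
    # Walk the children in document order.
    for child in node.childNodes:
        yield from information_items(child)


sample = '<?xml version="1.0"?><!-- note --><doc id="1"><?target data?>Hi</doc>'
if is_well_formed(sample):
    for kind, properties in information_items(minidom.parseString(sample)):
        print(kind, properties)
```

Note that the sketch checks well-formedness only; it never validates the document against a DTD, which matches the point above that a document needs to be well formed, but not valid, to have an infoset.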