XML parsers pass on only certain information, as dictated by the core XML Information Set specification, which you can find at www.w3.org/TR/xml-infoset (see New Riders Inside XML for more information on XML Information Sets), whereas XSLT processors adhere to the XSLT tree model. These models, and what they consider important, are different, which can lead to problems.
For example, two XML items are part of the core information set but are not available in XSLT: notations and skipped entity references (entity references that the XML parser has chosen not to expand). In practice, this means that even if the XML parser passes on information about these items, the XSLT processor can't do anything with it. However, notations are rarely used, and very few XML parsers generate skipped entity references, so this is not a significant problem.
On the other hand, XML parsers can strip comments out of XML documents, which is something you should know about, because the XSLT model is supposed to include them.
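To see how a parser can silently drop comments before any later stage gets to look at them, here is a minimal sketch using Python's standard xml.etree.ElementTree module (Python 3.8 or later). It is not an XSLT processor, and the sample document and the count_comments helper are made up for illustration, but it shows the same kind of loss at the parser boundary: the default parser discards comments, and you have to ask for them explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document containing one comment.
DOC = "<planets><!-- inner planets --><planet>Mercury</planet></planets>"


def count_comments(root):
    """Count the comment nodes that survived parsing."""
    return sum(1 for node in root.iter() if node.tag is ET.Comment)


# The default parser drops comments before the tree is built.
default_root = ET.fromstring(DOC)

# A TreeBuilder created with insert_comments=True keeps them in the tree.
keeping_parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(DOC, parser=keeping_parser)

print(count_comments(default_root))  # comments seen with the default parser
print(count_comments(kept_root))     # comments seen when explicitly kept
```

If a stylesheet that matches comments never seems to fire, a parser setting along these lines is one place to look.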
In addition, DTD information is not passed on from the XML parser to the XSLT processor (perhaps because W3C is planning more widespread use of XML schemas in XSLT 2.0, although there's still no official mechanism to connect XML schemas with XML documents yet). That's not usually a problem, because it's up to the XML parser to validate the XML document, except in one case: when an attribute is declared of type ID. In XML, you can declare an attribute with any name to be of type ID, so the XSLT processor has no idea which attributes are of this type unless the processor has access to the DTD. This is important when you're using stylesheets that are embedded in XML documents, because then the XSLT processor needs to be able to know which element in the document holds the stylesheet you want to use to transform the document. In this case, some XSLT processors, like Saxon, go beyond the XSLT recommendation and scan the DTD, if there is one, to see which attributes are of type ID.
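The following sketch illustrates what scanning the DTD for ID attributes can look like at the parser level. It uses Python's standard xml.parsers.expat module and its AttlistDeclHandler callback; it is not how Saxon or any particular XSLT processor does it, and the sample document and the find_id_attributes function are assumptions invented for this example.

```python
import xml.parsers.expat

# Hypothetical document whose internal DTD subset declares "code" as type ID.
DOC = """<?xml version="1.0"?>
<!DOCTYPE planets [
  <!ELEMENT planets (planet*)>
  <!ELEMENT planet (#PCDATA)>
  <!ATTLIST planet code  ID    #REQUIRED
                   color CDATA #IMPLIED>
]>
<planets><planet code="p1" color="gray">Mercury</planet></planets>"""


def find_id_attributes(xml_text):
    """Return (element, attribute) pairs declared with type ID in the DTD."""
    id_attributes = []

    def on_attlist(element_name, attribute_name, attribute_type,
                   default, required):
        # Expat reports each declared attribute separately, with its type.
        if attribute_type == "ID":
            id_attributes.append((element_name, attribute_name))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return id_attributes


print(find_id_attributes(DOC))
```

Note that nothing in the attribute's name marks it as an ID; only the declaration in the DTD does, which is why a processor that never sees the DTD cannot tell.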
There are a few more items that you also might want to know about. For example, the XSLT processing model makes namespace prefixes available in the input tree, but it gives you very little control over them in the output tree, where they are handled automatically. Also, the XSLT processing model defines a base URI for every node in a tree, which is the URI of the external entity from which the node was derived. (In the XSLT 1.1 working draft, that's been extended to support the XML Base specification, as you'll see near the end of this chapter.) However, in the XML information set, base URIs are considered peripheral, which means that the XML parser may not pass that information on to the XSLT processor.
All in all, you should know that XSLT processors use XML parsers to read XML documents, and that the junction between those packages is not a seamless one. If you find you're missing some necessary information in an XSLT transformation, that's something to bear in mind. In fact, the differences between the XML infoset and the XSLT tree model are one of the areas that XSLT 2.0 is supposed to address. Among other things, XSLT 2.0 is supposed to make it easier to recover ID and key information from the source document, as well as to recover information from the source document's XML declaration, such as XML version and encoding.
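As a rough illustration of the XML declaration information just mentioned, the sketch below uses the XmlDeclHandler callback of Python's standard xml.parsers.expat module to show that the parser itself does see the version and encoding, even though that information does not appear in the XSLT tree model. The sample document and the read_xml_declaration function are hypothetical names used only for this example.

```python
import xml.parsers.expat

# Hypothetical document, supplied as bytes so the declared encoding applies.
DOC = b'<?xml version="1.0" encoding="UTF-8"?><planets/>'


def read_xml_declaration(xml_bytes):
    """Return the version and encoding stated in the XML declaration."""
    declaration = {}

    def on_xml_decl(version, encoding, standalone):
        # Called once by expat when it reads the XML declaration.
        declaration["version"] = version
        declaration["encoding"] = encoding

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_bytes, True)
    return declaration


print(read_xml_declaration(DOC))
```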