Out-of-Line Markup | Effective XML: 50 Specific Ways to Improve Your XML

Books like this one sometimes include complete XML documents as examples, sometimes just fragments thereof. To keep the examples correct, it's very useful if the fragments are automatically extracted from complete documents that can be validated, checked for well-formedness, and verified to be correct. This requires some way to mark off the portions of the document that will be extracted and included. Since processing instructions are ignored by validators and most other processes that don't expect to see them, they make an excellent tool for this purpose. For example, the following document uses <?begin-extract?> and <?end-extract?> processing instructions to select the checkmating move.

 <?xml version="1.0"?>
 <game>
   <move>f3</move>
   <move>e5</move>
   <move>g4</move>
   <?begin-extract?><move>Qh4++</move><?end-extract?>
 </game>

Here an empty marker element would not work as well, because it would have to be permitted by the DTD or schema wherever an extract might begin or end; otherwise it would interfere with validation. Furthermore, a processing instruction only describes how this document will be used by one very particular process (the auto-assembly of a different document). It is not in any way a key part of the document's own information.
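To make the point concrete, here is a minimal Python sketch of a process that does not expect the extract markers. It relies on the standard library's xml.etree.ElementTree, whose default parser discards processing instructions when it builds the tree. The function name list_moves and the constant DOC are hypothetical names chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# The chess example from the text, with the extract markers in place.
DOC = """<?xml version="1.0"?>
<game>
  <move>f3</move>
  <move>e5</move>
  <move>g4</move>
  <?begin-extract?><move>Qh4++</move><?end-extract?>
</game>"""


def list_moves(xml_text):
    """Return the text of every move element (hypothetical helper).

    ElementTree's default parser drops processing instructions,
    so the extract markers never appear in the resulting tree.
    """
    root = ET.fromstring(xml_text)
    return [move.text for move in root.findall("move")]


if __name__ == "__main__":
    print(list_moves(DOC))
```

The code that reads the moves needs no special handling for the markers; only the extraction tool has to know they exist.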

An alternative approach would be to use XPointers to extract the relevant subtrees or ranges. Then the document need not carry additional markup at all. The problem with this (besides the lack of effective tool support for XPointer) is that XPointer operates on a processed form of the document created by the XPath data model. In this case, we really do want to operate on the lexical layer and extract particular characters, not particular nodes. We might even use a non-XML-aware tool like a regular expression to do this. (Just look for the text between <?begin-extract?> and <?end-extract?>.) Thus, unlike most other cases involving overlapping markup, it may well be acceptable if a pair of extract instructions begins in the middle of one element and finishes somewhere outside that element.
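The regular-expression approach might be sketched as follows in Python. This is only one possible implementation, not the tool actually used to produce the book; the names EXTRACT and extract_fragments are hypothetical, and the sketch assumes the markers are written exactly as <?begin-extract?> and <?end-extract?> with no extra whitespace or pseudo-attributes.

```python
import re

# Non-greedy match of everything between one begin marker and the
# next end marker. DOTALL lets an extract span several lines.
EXTRACT = re.compile(
    r"<\?begin-extract\?>(.*?)<\?end-extract\?>",
    re.DOTALL,
)


def extract_fragments(text):
    """Return the raw characters between each pair of extract markers.

    This works on the lexical layer: the text is never parsed as XML,
    so a fragment need not be well-formed on its own.
    """
    return EXTRACT.findall(text)


if __name__ == "__main__":
    game = (
        "<game><move>g4</move>"
        "<?begin-extract?><move>Qh4++</move><?end-extract?></game>"
    )
    print(extract_fragments(game))

    # An extract that starts inside one element and ends in another.
    straddling = "<a>one <?begin-extract?>two</a><b>three<?end-extract?></b>"
    print(extract_fragments(straddling))
```

Because the pattern matches characters rather than nodes, the second call returns a fragment that cuts across element boundaries, which is exactly the case a node-based tool such as XPointer handles awkwardly.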