Microsoft released several versions of the MSXML product. The original beta version 1.0 was quickly superseded by version 2.0, which was supplied with the final release of Internet Explorer 5.
Version 3, MSXML3, was first released in March 2000 and became a production release in October 2000. It is included as a standard part of Internet Explorer 6.
The current version is MSXML4. However, MSXML3 has not been superseded, because it is the last version that retains support for Microsoft's obsolete WD-xsl dialect. WD-xsl was first shipped in 1998 before XSLT 1.0 was finalized, and you still occasionally come across stylesheets written in this variant of the language: You can recognize them because they use the namespace URI http://www.w3.org/TR/WD-xsl. (Microsoft still, confusingly, refers to WD-xsl by the name "XSL," which means something quite different in W3C.)
You can find download links for both MSXML3 and MSXML4 by going to http://msdn.microsoft.com/xml.
MSXML is not just an XSLT processor; it also includes Microsoft's XML parser and DOM implementation. The main difference between MSXML3 and MSXML4 has nothing to do with the XSLT engine; it is concerned with support for XML Schema, which is outside our scope here.
The objects, methods, properties, and events available with the MSXML3 parser are listed in the Help file that comes with the SDK. I have included here only the parts of the interface that are relevant to XSLT and XPath processing.
The objects of particular interest to XSLT and XPath processing are listed below:
Object | Description |
---|---|
IXMLDOMDocument | The root of an XML document |
IXMLDOMNode | Any node in the DOM |
IXMLDOMNodeList | A collection of Node objects |
IXMLDOMParseError | Details of the last parse error that occurred |
IXMLDOMSelection | A selection of nodes |
IXSLProcessor | An execution of an XSLT stylesheet |
IXSLTemplate | A compiled XSLT stylesheet in memory |
These objects are described in the sections that follow.
The IXMLDOMDocument class inherits all the properties and methods of IXMLDOMNode. IXMLDOMDocument2 is a later version of the interface, introducing a few extra properties and methods. This section lists the additional methods and properties of relevance to XSLT and XPath processing, in other words, all the methods and properties that are not also present on IXMLDOMNode, which is described on page 802.
The methods particularly relevant to XPath and XSLT processing are described in detail below.
The validate() and setProperty() methods actually belong to the IXMLDOMDocument2 interface, which is an extension to IXMLDOMDocument introduced with the MSXML2 product.
Name | Returns | Description |
---|---|---|
abort | (Nothing) | When a document is being loaded asynchronously, abort() can be called at any time to abandon the process |
load | Boolean | Loads document from the specified XML source. The argument is normally a string containing a URL. Clears out any existing content of the Document object, and replaces it with the result of parsing the XML source. Returns True if successful, False otherwise |
loadXML | Boolean | Loads the document from a string containing the text of an XML document. Clears out any existing content of the Document object, and replaces it with the result of parsing the XML string. Returns True if successful, False otherwise |
save | (Nothing) | Saves the document to a specified destination. The destination is usually a filename, given as a string. The effect is to serialize the Document in XML format as a file. It is also possible to specify various other objects as a destination, for example, it can be another Document object, in which case the document is duplicated |
setProperty | (Nothing) | Sets various system properties. The most important properties are SelectionLanguage and SelectionNamespaces. SelectionLanguage takes the value "XPath" (the MSXML4 default) or "XSLPattern" (the default for MSXML3). This affects the syntax used in the expression passed to the selectNodes() and selectSingleNode() methods. If you want to use XPath 1.0 syntax with MSXML3, you must set this property to "XPath". The value "XSLPattern" refers to the old Microsoft-specific WD-xsl dialect. The value of SelectionNamespaces should be a space-separated list of namespace declarations, for example xmlns:a='http://a.com/' xmlns:b='http://b.com/'. These define the namespace prefixes that can be used within any expression passed to the selectNodes() and selectSingleNode() methods |
validate | (Nothing) | Validates the document, using the current DTD or schema |
Name | Type | Description |
---|---|---|
async | Boolean | True if the document is to be loaded asynchronously |
parseError | IXMLDOMParseError | The last parser error |
readyState | Long | Current state of readiness for use. Used when loading asynchronously. The values are Uninitialized (0), Loading (1), Loaded (2), Interactive (3), and Completed (4). |
validateOnParse | Boolean | Requests validation of the document against its DTD or schema |
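The following is a minimal sketch of how the SelectionLanguage and SelectionNamespaces properties might be set before calling selectNodes(). The helper name useXPath and its nsMap argument are hypothetical, introduced only for illustration; setProperty(), load(), selectNodes(), and the ProgID are the ones used elsewhere in this appendix. The demonstration function assumes it is called from a page running in Internet Explorer.

```javascript
// Hypothetical helper (not part of MSXML): switch a DOM document to
// XPath 1.0 syntax and declare the prefixes used in later expressions.
// nsMap maps prefix -> namespace URI, for example { a: 'http://a.com/' }.
function useXPath(objDoc, nsMap) {
    var decls = [];
    for (var prefix in nsMap) {
        if (nsMap.hasOwnProperty(prefix)) {
            decls.push("xmlns:" + prefix + "='" + nsMap[prefix] + "'");
        }
    }
    // MSXML3 defaults to the old XSLPattern syntax, so ask for XPath explicitly
    objDoc.setProperty('SelectionLanguage', 'XPath');
    if (decls.length > 0) {
        // the property value is a space-separated list of declarations
        objDoc.setProperty('SelectionNamespaces', decls.join(' '));
    }
    return objDoc;
}

// Illustration only: call this from a page running in Internet Explorer.
function demoSelectInInternetExplorer() {
    var objXML = new ActiveXObject('MSXML2.DOMDocument.3.0');
    objXML.async = false;                      // wait for load() to finish
    if (!objXML.load('tables_data.xml')) {
        return null;                           // see parseError for details
    }
    useXPath(objXML, {});
    return objXML.selectNodes('/tables/table[number-of-legs = 4]');
}
```

Setting the properties once on the document is enough; they apply to every later selectNodes() and selectSingleNode() call made on that document or its nodes.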
This object represents a node in the document tree. Note that the tree conforms to the DOM model, which is not always the same as the XPath model described in Chapter 2: For example, the way namespaces are modeled is different, and text nodes are not necessarily normalized.
There are subclasses of IXMLDOMNode for all the different kinds of node found in the tree. I have not included descriptions of all these, since they are not directly relevant to XSLT and XPath processing. The only subclass I have included is IXMLDOMDocument, which can be regarded as representing either the whole document or its root node, depending on your point of view.
The methods available on IXMLDOMNode that are relevant to XSLT and XPath processing are listed below. Most often, these methods will be applied to the root node (the DOM Document object) but they can be applied to any node.
Name | Returns | Description |
---|---|---|
selectNodes | IXMLDOMNodeList | Executes an XPath expression and returns a list of matching nodes |
selectSingleNode | IXMLDOMNode | Executes an XPath expression and returns the first matching node |
transformNode | String | Applies a stylesheet to the subtree rooted at this node, returning the result as a string. The argument identifies the XSLT stylesheet. This will usually be a Document, but it may be a Node representing an embedded stylesheet within a Document. The serialized result of the transformation is returned as a string of characters (the <xsl:output> encoding is ignored) |
transformNodeToObject | (Nothing) | Applies a stylesheet to the subtree, placing the result into a supplied document or stream. The difference from transformNode() is that the destination of the transformation is supplied as a second argument. This will usually be a Document. It may also be a Stream |
The most useful properties are listed below. Properties whose main purpose is to navigate through the document are not listed here, because navigation can be achieved more easily using XPath expressions.
Name | Type | Description |
---|---|---|
baseName | String | The local name of the node, excluding any namespace prefix |
namespaceURI | String | The namespace URI |
nodeName | String | The name of the node, including its namespace prefix if any. Note that unlike the XPath model, unnamed nodes are given conventional names such as "#document", "#text", and "#comment" |
nodeTypeString | String | Returns the type of node in string form. For example, "element", "attribute", or "comment" |
nodeValue | Variant | The value stored in the node. This is not the same as the XPath string-value; for elements, it is always null |
prefix | String | The prefix for the namespace applying to the node |
text | String | Text contained by this node (like the XPath string-value) |
xml | String | XML representation of the node and its descendants |
This object represents a list of nodes. For our present purposes, we are interested in this object because it is the result of the selectNodes() method.
An IXMLDOMNodeList is returned as a result of the selectNodes() method: It contains the list of nodes selected by the supplied XPath expression. You can process all the nodes in the list either by using the nextNode() method or by direct indexing using the item property.
Name | Returns | Description |
---|---|---|
item | IXMLDOMNode | item(N) gets the node at position N |
nextNode | IXMLDOMNode | Gets the next node |
reset | (Nothing) | Resets the current position |
Name | Type | Description |
---|---|---|
length | Long | Identifies the number of nodes in the collection |
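The two ways of processing a node list can be sketched as follows. The function names collectText and collectTextByIndex are hypothetical; they rely only on the reset(), nextNode(), item(), and length members listed above and on the text property of IXMLDOMNode. The sketch assumes that item(N) counts from zero.

```javascript
// Hypothetical helper: collect the text of every node in a list
// by stepping through it with nextNode().
function collectText(objNodeList) {
    var result = [];
    objNodeList.reset();                   // start again from the first node
    var objNode = objNodeList.nextNode();
    while (objNode != null) {              // nextNode() returns null at the end
        result.push(objNode.text);
        objNode = objNodeList.nextNode();
    }
    return result;
}

// The same result obtained by direct indexing with item(N),
// assuming positions are numbered from zero.
function collectTextByIndex(objNodeList) {
    var result = [];
    for (var i = 0; i < objNodeList.length; i++) {
        result.push(objNodeList.item(i).text);
    }
    return result;
}
```

Either function could be applied to the result of a call such as objXML.selectNodes('//table-name').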
This object is accessible through the parseError property of the IXMLDOMDocument interface.
Name | Type | Description |
---|---|---|
errorCode | Long | The error code |
filepos | Long | The character position of the error within the XML document |
line | Long | The line number of the error |
linepos | Long | The character position in the line containing the error |
reason | String | Explanation of the error |
srcText | String | The XML text in error |
url | String | The URL of the offending document |
This object represents a selection of nodes. It is returned as the result of the selectNodes() method when the target document implements the IXMLDOMDocument2 interface.
It's simplest to think of this object as a stored expression that returns a list of nodes on demand. It's rather like a relational view: You don't need to know whether the results are actually stored, or whether they are obtained as required.
This interface extends the IXMLDOMNodeList interface.
Name | Returns | Description |
---|---|---|
clone | IXMLDOMSelection | Produces a copy of this IXMLDOMSelection |
getProperty | String | Returns the value of a named property such as SelectionLanguage |
item | IXMLDOMNode | item(N) gets the node at position N |
matches | IXMLDOMNode | Tests whether the given node is a member of the set of nodes (returns null if no match, otherwise the node from which the selection succeeds) |
nextNode | IXMLDOMNode | Gets the next node |
reset | (Nothing) | Resets the current position |
Name | Type | Description |
---|---|---|
expr | String | The XPath expression that determines the nodes selected. This can be changed at any time; doing so implicitly resets the current list of nodes, replacing it with a new list |
context | IXMLDOMNode | Establishes the context node for evaluating the expression. Changing the context node implicitly resets the current list of nodes, replacing it with a new list |
length | Long | Identifies the number of nodes in the collection |
An IXSLProcessor object represents a single execution of a stylesheet to transform a source document.
The object is normally created by calling the createProcessor() method of an IXSLTemplate object.
The transformation is achieved by calling the transform() method.
Name | Returns | Description |
---|---|---|
addParameter | (Nothing) | Sets the value of a stylesheet parameter. The first argument is the local name of the parameter, the second is the parameter value, and the third is the namespace URI (usually ""). The value can be a boolean, a number, a string, a Node, or a NodeList |
reset | (Nothing) | Resets the state of the processor and aborts the current transform |
setStartMode | (Nothing) | Sets the initial mode. There are two arguments, representing the local name and the namespace URI parts of the mode name |
transform | Boolean | Starts or resumes the XSLT transformation process |
transform() => Boolean
This method applies the stylesheet (from which this XSLProcessor was derived) to the source document identified in the input property. The result of the transformation is accessible through the output property.
If the transformation is completed, the return value is True. If the source document is being loaded asynchronously, it is possible for the transform() method to return False, which means that it needs to wait until more input is available. In this case, it is possible to resume the transformation by calling transform() again later. The current state of the transformation can be determined from the readyState property.
Name | Type | Description |
---|---|---|
input | Variant | XML source document to transform. This is normally supplied as a DOM Document, but it may also be a Node . The input can also be supplied as an IStream |
output | Variant | Output of the transformation. If you don't supply an output object, the processor will create a String to hold the output, which you can read using this property. If you prefer, you can supply an object such as a DOM Document, a DOM Node, or an IStream to receive the output |
ownerTemplate | IXSLTemplate | The XSLTemplate object used to create this processor object |
readyState | Long | The current state of the transformation. This will be READYSTATE_COMPLETE (4) when the transformation is finished |
startMode | String | Name of the initial mode. See setStartMode() method above |
startModeURI | String | Namespace of the initial mode. See setStartMode() method above |
stylesheet | IXMLDOMNode | The current stylesheet being used |
An IXSLTemplate object represents a compiled stylesheet in memory. If you want to use the same stylesheet more than once, then creating an IXSLTemplate and using it repeatedly is more efficient than passing the raw stylesheet to transformNode() each time.
Name | Returns | Description |
---|---|---|
createProcessor | IXSLProcessor | Creates an IXSLProcessor object. This method should only be called after the stylesheet property has been set to associate the IXSLTemplate object with a stylesheet. It creates an IXSLProcessor object, which can then be used to initiate a transformation of a given source document |
Name | Type | Description |
---|---|---|
stylesheet | IXMLDOMNode | Identifies the stylesheet from which this IXSLTemplate is derived |
Setting this property causes the specified stylesheet to be compiled; this IXSLTemplate object is the reusable representation of the compiled stylesheet.
The DOM Node representing the stylesheet will normally be a DOM Document object, but it may be an Element representing an embedded stylesheet.
The document identified by the stylesheet property must be a free-threaded document object.
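As a rough sketch of how IXSLTemplate and IXSLProcessor fit together, the stylesheet can be compiled once and a new processor created for each transformation. The function name runTransform and its params argument are hypothetical, introduced for illustration; the demonstration function assumes Internet Explorer with MSXML3 installed and reuses the file names from the example later in this appendix.

```javascript
// Hypothetical helper: run one transformation from a compiled stylesheet.
// objTemplate is an IXSLTemplate whose stylesheet property is already set;
// params is a plain object mapping parameter names to values.
function runTransform(objTemplate, objSource, params) {
    var objProcessor = objTemplate.createProcessor();
    objProcessor.input = objSource;
    for (var name in params) {
        if (params.hasOwnProperty(name)) {
            // third argument is the namespace URI of the parameter name
            objProcessor.addParameter(name, params[name], '');
        }
    }
    if (!objProcessor.transform()) {
        return null;            // not finished: the source is still loading
    }
    return objProcessor.output; // a string, as no output object was supplied
}

// Illustration only: call this from a page running in Internet Explorer.
function demoTemplateInInternetExplorer() {
    // the stylesheet must be loaded into a free-threaded document
    var objXSL = new ActiveXObject('MSXML2.FreeThreadedDOMDocument.3.0');
    objXSL.async = false;
    objXSL.load('tables_list.xsl');

    var objTemplate = new ActiveXObject('MSXML2.XSLTemplate.3.0');
    objTemplate.stylesheet = objXSL;    // compiles the stylesheet

    var objXML = new ActiveXObject('MSXML2.DOMDocument.3.0');
    objXML.async = false;
    objXML.load('tables_data.xml');

    return runTransform(objTemplate, objXML, {});
}
```

The template object can be kept and passed to runTransform() as often as needed; only the processor is created afresh for each run.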
The example in this section shows one way of controlling a transformation using MSXML from within JavaScript on an HTML page.
This example demonstrates the way that you can load, parse, and transform an XML document using client-side JScript in Internet Explorer 5 or higher. The files are in a folder named msxml_transform.
The example shows an HTML page with two buttons on it. The user can click on either of the buttons to select how the data should be displayed. The effect of clicking either button is to apply the corresponding stylesheet to the source XML document.
XML Source
The XML source file for this example is tables_data.xml. It defines several tables (real tables, the kind you sit at to have your dinner), each looking like this:
    <tables>
      <table>
        <table-name>Conference</table-name>
        <number-of-legs>4</number-of-legs>
        <table-top-material type="laminate">Ash</table-top-material>
        <table-shape>Oblong</table-shape>
        <retail-price currency="USD">1485</retail-price>
      </table>
      ...
    </tables>
Stylesheet
There are two stylesheets, tables_list.xsl and tables_catalog.xsl. Since this example is designed to show the JScript used to control the transformation rather than the XSLT transformation code itself, I won't list them here.
HTML page
The page default.htm contains some simple styling information for the HTML page, then the JScript code that loads the XML and XSL documents, checks for errors, and performs the transformation. Notice that the transformFiles function takes the name of a stylesheet as a parameter, which allows you to specify the stylesheet you want to use at runtime:
    <html>
    <head>
    <style type="text/css">
      body {font-family:Tahoma,Verdana,Arial,sans-serif; font-size:14px}
      .head {font-family:Tahoma,Verdana,Arial,sans-serif; font-size:18px; font-weight:bold}
    </style>
    <script language="JScript">
    function transformFiles(strStylesheetName) {
      // get a reference to the results DIV element
      var objResults = document.all['divResults'];
      // create two new document instances
      var objXML = new ActiveXObject('MSXML2.DOMDocument.3.0');
      var objXSL = new ActiveXObject('MSXML2.DOMDocument.3.0');
      // set the parser properties; loading synchronously ensures that
      // parseError is meaningful as soon as load() returns
      objXML.async = false;
      objXSL.async = false;
      objXML.validateOnParse = true;
      objXSL.validateOnParse = true;
      // load the XML document and check for errors
      objXML.load('tables_data.xml');
      if (objXML.parseError.errorCode != 0) {
        // error found so show error message and stop
        objResults.innerHTML = showError(objXML);
        return false;
      }
      // load the XSL stylesheet and check for errors
      objXSL.load(strStylesheetName);
      if (objXSL.parseError.errorCode != 0) {
        // error found so show error message and stop
        objResults.innerHTML = showError(objXSL);
        return false;
      }
      // all must be OK, so perform transformation
      var strResult = objXML.transformNode(objXSL);
      // and display the results in the DIV element
      objResults.innerHTML = strResult;
      return true;
    }
Provided that there are no errors, the function performs the transformation using the XML file tables_data.xml and the stylesheet whose name is specified as the strStylesheetName parameter when the function is called.
The result of the transformation is inserted into the <div> element that has the id attribute value "divResults". You'll see later where this is defined in the HTML.
If either of the load calls fails, perhaps due to a badly formed document, a function named showError is called. This function takes a reference to the document where the error was found, and returns a string describing the nature of the error. This error message is then displayed on the page instead of the result of the transformation:
    function showError(objDocument) {
      // create the error message
      var strError = 'Invalid XML file !<BR />'
        + 'File URL: ' + objDocument.parseError.url + '<BR />'
        + 'Line No.: ' + objDocument.parseError.line + '<BR />'
        + 'Character: ' + objDocument.parseError.linepos + '<BR />'
        + 'File Position: ' + objDocument.parseError.filepos + '<BR />'
        + 'Source Text: ' + objDocument.parseError.srcText + '<BR />'
        + 'Error Code: ' + objDocument.parseError.errorCode + '<BR />'
        + 'Description: ' + objDocument.parseError.reason;
      return strError;
    }
    // -->
    </script>
The remainder of the file is the HTML that creates the visible part of the page. The opening <body> element specifies an onload attribute that causes the transformFiles() function in our script section to run once the page has finished loading:
    ...
    </head>
    <body onload="transformFiles('tables_list.xsl')">
    <p><span class="head">Transforming an XML Document using the client-side code</span></p>
    ...
Because it uses the value "tables_list.xsl" for the parameter to the function, this stylesheet is used for the initial display. This shows the data in tabular form.
The next thing in the page is the code that creates the two HTML <button> elements, marked Catalog and Simple List. The onclick attributes of each one simply execute the transformFiles() function again, each time specifying the appropriate stylesheet name:
    ...
    View the tables as a
    <button onclick="transformFiles('tables_catalog.xsl')">Catalog</button>
    or as a
    <button onclick="transformFiles('tables_list.xsl')">Simple List</button>
    <hr />
Finally, at the end of the code, you can see the definition of the <div> element into which the function inserts the results of the transformation:
    <!--to insert the results of parsing the object model -->
    <div id="divResults"></div>
    </body>
    </html>
Output
When the page is first displayed, it looks like Figure C-1.
Click the Catalog button, and you will see an alternative graphical presentation of the same data, achieved by applying the other stylesheet.
Microsoft claims full compliance with XSLT 1.0 and XPath 1.0, although there are one or two gray areas where its interpretation of the specification may cause stylesheets to be less than 100% portable. These include:
Handling of whitespace nodes. The normal way of supplying input to Microsoft's XSLT processor is in the form of a DOM, and the default option in MSXML3 for building a DOM is to remove whitespace text nodes as the text is parsed. The result is that <xsl:preserve-space> in the stylesheet has no effect, because by the time the XSLT processor gets to see the data, there are no whitespace text nodes left to preserve. If you want conformant behavior in this area, set the preserveWhiteSpace property of the DOMDocument object to True before loading the document. The same applies to the stylesheet; if you want to use <xsl:text> to control output of whitespace, particularly when generating output in a space-sensitive format such as comma-separated values, then load the stylesheet with preserveWhiteSpace set to True.
Normalization of text nodes. XSLT and XPath specify that adjacent text nodes in the tree are always merged into a single node. MSXML3 uses a DOM as its internal data structure, and the DOM does not impose the same rule. Although MSXML3 does a good job at creating a correct XPath view of the underlying DOM tree, this is one area where the mapping is incomplete. The two common cases where adjacent text nodes are not merged are firstly, when one of the text nodes represents the contents of a CDATA section in the source XML, and secondly, when one of them represents the expanded text of an entity reference (other than the built-in entity references such as "&lt;"). This makes it dangerous to use a construct such as <xsl:value-of select="text()"/> because MSXML3 will return only the first of the text nodes, that is, the text up to the start of an entity or CDATA boundary. It's safer to output the value of an element by writing <xsl:value-of select="."/>.
The <xsl:message> instruction has no effect when running a transformation in the browser, unless you specify terminate="yes".
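For the whitespace issue described above, a minimal sketch of a loader that keeps whitespace text nodes might look like this. The function name loadPreservingWhitespace is hypothetical; the property is spelled preserveWhiteSpace in the MSXML interface, and the important point is that it is set before load() is called.

```javascript
// Hypothetical helper: load a document (source or stylesheet) without
// discarding whitespace-only text nodes.
function loadPreservingWhitespace(objDoc, strUrl) {
    objDoc.async = false;               // wait until loading has finished
    objDoc.preserveWhiteSpace = true;   // must be set before load()
    return objDoc.load(strUrl);         // true if the document parsed cleanly
}

// Illustration only: call this from a page running in Internet Explorer.
function demoWhitespaceInInternetExplorer() {
    var objXML = new ActiveXObject('MSXML2.DOMDocument.3.0');
    var objXSL = new ActiveXObject('MSXML2.DOMDocument.3.0');
    if (!loadPreservingWhitespace(objXML, 'tables_data.xml')) {
        return null;
    }
    if (!loadPreservingWhitespace(objXSL, 'tables_list.xsl')) {
        return null;
    }
    return objXML.transformNode(objXSL);
}
```

Both the source document and the stylesheet are loaded through the same helper here, so that <xsl:preserve-space> and <xsl:text> see the whitespace they expect.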