MSXML34 | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

MSXML3/4

Microsoft released several versions of the MSXML product. The original beta version 1.0 was quickly superseded by version 2.0, which was supplied with the final release of Internet Explorer 5.

Version 3, MSXML3, was first released in March 2000 and became a production release in October 2000. It is included as a standard part of Internet Explorer 6.

The current version is MSXML4. However, MSXML3 has not been superseded, because it is the last version that retains support for Microsoft's obsolete WD-xsl dialect. WD-xsl was first shipped in 1998, before XSLT 1.0 was finalized, and you still occasionally come across stylesheets written in this variant of the language. You can recognize them because they use the namespace URI http://www.w3.org/TR/WD-xsl. (Microsoft still, confusingly, refers to WD-xsl by the name "XSL," which means something quite different in W3C usage.)

You can find download links for both MSXML3 and MSXML4 by going to http://msdn.microsoft.com/xml.

MSXML is not just an XSLT processor; it also includes Microsoft's XML parser and DOM implementation. The main difference between MSXML3 and MSXML4 has nothing to do with the XSLT engine; it concerns support for XML Schema, which is outside our scope here.

The objects, methods, properties, and events available with the MSXML3 parser are listed in the Help file that comes with the SDK. I have only included here the parts of the interface that are relevant to XSLT and XPath processing.

Objects

The objects of particular interest to XSLT and XPath processing are listed below:

Object	Description
IXMLDOMDocument	The root of an XML document
IXMLDOMNode	Any node in the DOM
IXMLDOMNodeList	A collection of Node objects
IXMLDOMParseError	Details of the last parse error that occurred
IXMLDOMSelection	A selection of nodes
IXSLProcessor	An execution of an XSLT stylesheet
IXSLTemplate	A compiled XSLT stylesheet in memory

These objects are described in the sections that follow.

IXMLDOMDocument and IXMLDOMDocument2

The IXMLDOMDocument class inherits all the properties and methods of IXMLDOMNode. IXMLDOMDocument2 is a later version of the interface, introducing a few extra properties and methods. This section lists the additional methods and properties of relevance to XSLT and XPath processing, in other words, those that are not also present on IXMLDOMNode, which is described in its own section below.

Additional Methods

The methods particularly relevant to XPath and XSLT processing are described in detail below.

The validate() and setProperty() methods actually belong to the IXMLDOMDocument2 interface, which is an extension to IXMLDOMDocument introduced with the MSXML2 product.

Name	Returns	Description
abort	(Nothing)	When a document is being loaded asynchronously, abort() can be called at any time to abandon the process
load	Boolean	Loads document from the specified XML source. The argument is normally a string containing a URL. Clears out any existing content of the Document object, and replaces it with the result of parsing the XML source. Returns True if successful, False otherwise
loadXML	Boolean	Loads the document from a string containing the text of an XML document. Clears out any existing content of the Document object, and replaces it with the result of parsing the XML string. Returns True if successful, False otherwise
save	(Nothing)	Saves the document to a specified destination. The destination is usually a filename, given as a string. The effect is to serialize the Document in XML format as a file. It is also possible to specify various other objects as a destination, for example, it can be another Document object, in which case the document is duplicated
setProperty	(Nothing)	Sets various system properties. The most important are SelectionLanguage and SelectionNamespaces. SelectionLanguage takes the value "XPath" (the MSXML4 default) or "XSLPattern" (the MSXML3 default) and determines the syntax used in the expression passed to the selectNodes() and selectSingleNode() methods. If you want to use XPath 1.0 syntax with MSXML3, you must set this property to "XPath"; the value "XSLPattern" refers to the old Microsoft-specific WD-xsl dialect. SelectionNamespaces takes a space-separated list of namespace declarations, for example xmlns:a='http://a.com/' xmlns:b='http://b.com/'. These define the namespace prefixes that can be used within any expression passed to the selectNodes() and selectSingleNode() methods
validate	(Nothing)	Validates the document, using the current DTD or schema

Additional Properties

Name	Type	Description
async	Boolean	True if the document is to be loaded asynchronously
parseError	IXMLDOMParseError	The last parser error
readyState	Long	Current state of readiness for use. Used when loading asynchronously. The values are Uninitialized (0), Loading (1), Loaded (2), Interactive (3), and Completed (4).
validateOnParse	Boolean	Requests validation of the document against its DTD or schema
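
As a rough illustration of how these methods and properties fit together, the sketch below loads a document synchronously and then switches selectNodes() and selectSingleNode() to XPath 1.0 syntax. The helper name loadForXPath and its parameters are invented for this example; doc is assumed to be an MSXML DOMDocument created elsewhere, for instance with new ActiveXObject('MSXML2.DOMDocument.3.0').

```javascript
// Sketch only: loadForXPath is a hypothetical helper, not part of MSXML.
// doc is assumed to be an MSXML DOMDocument, created for example with
//   new ActiveXObject('MSXML2.DOMDocument.3.0')
function loadForXPath(doc, url, namespaces) {
  // Load synchronously so that load() returns only when parsing has finished
  doc.async = false;
  if (!doc.load(url)) {
    // load() returned False: parseError describes what went wrong
    throw new Error('Cannot load ' + url + ': ' + doc.parseError.reason);
  }
  // MSXML3 defaults to XSLPattern, so request XPath 1.0 explicitly
  doc.setProperty('SelectionLanguage', 'XPath');
  if (namespaces) {
    // for example "xmlns:a='http://a.com/' xmlns:b='http://b.com/'"
    doc.setProperty('SelectionNamespaces', namespaces);
  }
  return doc;
}
```

A caller could then pass an XPath 1.0 expression to selectNodes() on the returned document, using any prefixes declared in the namespaces argument.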

IXMLDOMNode

This object represents a node in the document tree. Note that the tree conforms to the DOM model, which is not always the same as the XPath model described in Chapter 2: For example, the way namespaces are modeled is different, and text nodes are not necessarily normalized.

There are subclasses of IXMLDOMNode for all the different kinds of node found in the tree. I have not included descriptions of all these, since they are not directly relevant to XSLT and XPath processing. The only subclass I have included is IXMLDOMDocument, which can be regarded as representing either the whole document or its root node, depending on your point of view.

Methods

The methods available on IXMLDOMNode that are relevant to XSLT and XPath processing are listed below. Most often, these methods will be applied to the root node (the DOM Document object) but they can be applied to any node.

Name	Returns	Description
selectNodes	IXMLDOMNodeList	Executes an XPath expression and returns a list of matching nodes
selectSingleNode	IXMLDOMNode	Executes an XPath expression and returns the first matching node
transformNode	String	Applies a stylesheet to the subtree rooted at this node, returning the result as a string. The argument identifies the XSLT stylesheet. This will usually be a Document, but it may be a Node representing an embedded stylesheet within a Document. The serialized result of the transformation is returned as a string of characters (the <xsl:output> encoding is ignored)
transformNodeToObject	(Nothing)	Applies a stylesheet to the subtree, placing the result into a supplied document or stream. The difference from transformNode() is that the destination of the transformation is supplied as a second argument. This will usually be a Document. It may also be a Stream

Properties

The most useful properties are listed below. Properties whose main purpose is to navigate through the document are not listed here, because navigation can be achieved more easily using XPath expressions.

Name	Type	Description
baseName	String	The local name of the node, excluding any namespace prefix
namespaceURI	String	The namespace URI
nodeName	String	The name of the node, including its namespace prefix if any. Note that unlike the XPath model, unnamed nodes are given conventional names such as "#document", "#text", and "#comment"
nodeTypeString	String	Returns the type of node in string form. For example, "element", "attribute", or "comment"
nodeValue	Variant	The value stored in the node. This is not the same as the XPath string-value; for elements, it is always null
prefix	String	The prefix for the namespace applying to the node
text	String	Text contained by this node (like the XPath string-value)
xml	String	XML representation of the node and its descendants

IXMLDOMNodeList

This object represents a list of nodes. For our present purposes, it is of interest because it is the result of the selectNodes() method: it contains the list of nodes selected by the supplied XPath expression. You can process all the nodes in the list either by calling the nextNode() method repeatedly or by direct indexing using item().

Methods

Name	Returns	Description
item	IXMLDOMNode	item(N) gets the node at position N
nextNode	IXMLDOMNode	Gets the next node
reset	(Nothing)	Resets the current position

Properties

Name	Type	Description
length	Long	Identifies the number of nodes in the collection
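
The two styles of iteration can be sketched as follows. The helper names are invented for this example, and nodeList is assumed to be the IXMLDOMNodeList returned by selectNodes(); the sketch also assumes that item() counts from zero and that nextNode() returns null once the list is exhausted.

```javascript
// Sketch only: both helpers are hypothetical and collect the text
// property of every node in an IXMLDOMNodeList.

// Style 1: walk the list with nextNode()
function textsByNextNode(nodeList) {
  var values = [];
  var node;
  nodeList.reset();                       // start again from the first node
  while ((node = nodeList.nextNode()) != null) {
    values[values.length] = node.text;    // append without relying on push()
  }
  return values;
}

// Style 2: direct indexing with item() and length
function textsByIndex(nodeList) {
  var values = [];
  for (var i = 0; i < nodeList.length; i++) {
    values[values.length] = nodeList.item(i).text;
  }
  return values;
}
```

Both functions return the same array for the same list; the indexed form is convenient when you need the position as well as the node.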

IXMLDOMParseError

This object is accessible through the parseError property of the IXMLDOMDocument interface.

Properties

Name	Type	Description
errorCode	Long	The error code
filepos	Long	The character position of the error within the XML document
line	Long	The line number of the error
linepos	Long	The character position in the line containing the error
reason	String	Explanation of the error
srcText	String	The XML text in error
url	String	The URL of the offending document

IXMLDOMSelection

This object represents a selection of nodes. It is returned as the result of the selectNodes() method when the target document implements the IXMLDOMDocument2 interface.

It's simplest to think of this object as a stored expression that returns a list of nodes on demand. It's rather like a relational view: You don't need to know whether the results are actually stored, or whether they are obtained as required.

This interface extends the IXMLDOMNodeList interface.

Methods

Name	Returns	Description
clone	IXMLDOMSelection	Produces a copy of this IXMLDOMSelection
getProperty	String	Returns the value of a named property such as SelectionLanguage
item	IXMLDOMNode	item(N) gets the node at position N
matches	IXMLDOMNode	Tests whether the given node is a member of the set of nodes (returns null if no match, otherwise the node from which the selection succeeds)
nextNode	IXMLDOMNode	Gets the next node
reset	(Nothing)	Resets the current position

Properties

Name	Type	Description
expr	String	The XPath expression that determines the nodes selected. This can be changed at any time; doing so implicitly resets the current list of nodes, replacing it with a new list
context	IXMLDOMNode	Establishes the context node for evaluating the expression. Changing the context node implicitly resets the current list of nodes, replacing it with a new list
length	Long	Identifies the number of nodes in the collection
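
To make the "stored expression" idea concrete, here is a minimal sketch that evaluates one expression, then reuses the same selection object with a different one. The helper name countTwice is invented for this example, and doc is assumed to be a document implementing IXMLDOMDocument2, so that selectNodes() returns an IXMLDOMSelection.

```javascript
// Sketch only: countTwice is a hypothetical helper. doc is assumed to be an
// MSXML DOMDocument implementing IXMLDOMDocument2, so that selectNodes()
// returns an IXMLDOMSelection rather than a plain node list.
function countTwice(doc, firstExpr, secondExpr) {
  var selection = doc.selectNodes(firstExpr);
  var firstCount = selection.length;
  // Changing expr implicitly replaces the current list of nodes
  selection.expr = secondExpr;
  var secondCount = selection.length;
  return [firstCount, secondCount];
}
```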

IXSLProcessor

An IXSLProcessor object represents a single execution of a stylesheet to transform a source document.

The object is normally created by calling the createProcessor() method of an IXSLTemplate object.

The transformation is achieved by calling the transform() method.

Methods

Name	Returns	Description
addParameter	(Nothing)	Sets the value of a stylesheet parameter. The first argument is the local name of the parameter, the second is the parameter value, and the third is the namespace URI (usually ""). The value can be a boolean, a number, or a string, or a Node or NodeList
reset	(Nothing)	Resets the state of the processor and aborts the current transform
setStartMode	(Nothing)	Sets the initial mode. There are two arguments, representing the local name and the namespace URI parts of the mode name
transform	Boolean	Starts or resumes the XSLT transformation process

transform() => Boolean

This method applies the stylesheet (from which this XSLProcessor was derived) to the source document identified in the input property. The result of the transformation is accessible through the output property.

If the transformation is completed, the return value is True. If the source document is being loaded asynchronously, it is possible for the transform() method to return False, which means that it needs to wait until more input is available. In this case, it is possible to resume the transformation by calling transform() again later. The current state of the transformation can be determined from the readyState property.

Properties

Name	Type	Description
input	Variant	XML source document to transform. This is normally supplied as a DOM Document, but it may also be a Node . The input can also be supplied as an IStream
output	Variant	Output of the transformation. If you don't supply an output object, the processor will create a String to hold the output, which you can read using this property. If you prefer, you can supply an object such as a DOM Document, a DOM Node, or an IStream to receive the output
ownerTemplate	IXSLTemplate	The XSLTemplate object used to create this processor object
readyState	Long	The current state of the transformation. This will be READYSTATE_COMPLETE (4) when the transformation is finished
startMode	String	Name of the initial mode. See setStartMode() method above
startModeURI	String	Namespace of the initial mode. See setStartMode() method above
stylesheet	IXMLDOMNode	The current stylesheet being used

IXSLTemplate

An IXSLTemplate object represents a compiled stylesheet in memory. If you want to use the same stylesheet more than once, then creating an IXSLTemplate and using it repeatedly is more efficient than applying the raw stylesheet repeatedly with transformNode().

Methods

Name	Returns	Description
createProcessor	IXSLProcessor	Creates an IXSLProcessor object, which can then be used to initiate a transformation of a given source document. This method should only be called after the stylesheet property has been set to associate the IXSLTemplate object with a stylesheet

Properties

Name	Type	Description
stylesheet	IXMLDOMNode	Identifies the stylesheet from which this IXSLTemplate is derived

Setting this property causes the specified stylesheet to be compiled; this IXSLTemplate object is the reusable representation of the compiled stylesheet.

The DOM Node representing the stylesheet will normally be a DOM Document object, but it may be an Element representing an embedded stylesheet.

The document identified by the stylesheet property must be a free-threaded document object.
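
The following sketch shows how IXSLTemplate and IXSLProcessor might be combined to compile a stylesheet once and run it several times. The helper names compileStylesheet and runTransform are invented for this example. The ProgIDs mentioned in the comments are the ones I would expect to use with MSXML3, but treat them as assumptions and check them against the SDK Help file.

```javascript
// Sketch only: both helpers are hypothetical. The objects passed in are
// assumed to be created elsewhere, for example:
//   template      : new ActiveXObject('MSXML2.XSLTemplate.3.0')
//   stylesheetDoc : new ActiveXObject('MSXML2.FreeThreadedDOMDocument.3.0')
//                   (free-threaded, already loaded with the stylesheet)
//   sourceDoc     : a loaded DOMDocument holding the source XML

// Compile once: setting the stylesheet property compiles the stylesheet
function compileStylesheet(template, stylesheetDoc) {
  template.stylesheet = stylesheetDoc;
  return template;
}

// Run as often as needed: each run gets its own IXSLProcessor
function runTransform(template, sourceDoc, params) {
  var processor = template.createProcessor();
  processor.input = sourceDoc;
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // local name, value, namespace URI (no namespace here)
      processor.addParameter(name, params[name], '');
    }
  }
  if (!processor.transform()) {
    // False means the processor is still waiting for asynchronous input
    throw new Error('Transformation did not complete');
  }
  // No output object was supplied, so the result is available as a string
  return processor.output;
}
```

Calling runTransform() repeatedly with the same template avoids recompiling the stylesheet, which is the saving that transformNode() cannot offer.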

Putting it Together

The example in this section shows one way of controlling a transformation using MSXML from within JavaScript on an HTML page.

Using Client-Side JScript to Transform a Document

This example demonstrates the way that you can load, parse, and transform an XML document using client-side JScript in Internet Explorer 5 or higher. The files are in a folder named msxml_transform.

The example shows an HTML page with two buttons on it. The user can click on either of the buttons to select how the data should be displayed. The effect of clicking either button is to apply the corresponding stylesheet to the source XML document.

XML Source

The XML source file for this example is tables_data.xml. It defines several tables (real tables, the kind you sit at to have your dinner), each looking like this:

  <tables>
    <table>
      <table-name>Conference</table-name>
      <number-of-legs>4</number-of-legs>
      <table-top-material type="laminate">Ash</table-top-material>
      <table-shape>Oblong</table-shape>
      <retail-price currency="USD">1485</retail-price>
    </table>
    ...
  </tables>

Stylesheet

There are two stylesheets, tables_list.xsl and tables_catalog.xsl. Since this example is designed to show the JScript used to control the transformation rather than the XSLT transformation code itself, I won't list them here.

HTML page

The page default.htm contains some simple styling information for the HTML page, then the JScript code that loads the XML and XSL documents, checks for errors, and performs the transformation. Notice that the transformFiles function takes the name of a stylesheet as a parameter, which allows you to specify the stylesheet you want to use at runtime:

  <html>
  <head>
  <style type="text/css">
  body {font-family:Tahoma,Verdana,Arial,sans-serif; font-size:14px}
  .head {font-family:Tahoma,Verdana,Arial,sans-serif;
    font-size:18px; font-weight:bold}
  </style>
  <script language="JScript">
  <!--
  function transformFiles(strStylesheetName) {
    // get a reference to the results DIV element
    var objResults = document.all['divResults'];
    // create two new document instances
    var objXML = new ActiveXObject('MSXML2.DOMDocument.3.0');
    var objXSL = new ActiveXObject('MSXML2.DOMDocument.3.0');
    // set the parser properties; load synchronously so that
    // parseError can be checked as soon as load() returns
    objXML.async = false;
    objXSL.async = false;
    objXML.validateOnParse = true;
    objXSL.validateOnParse = true;
    // load the XML document and check for errors
    objXML.load('tables_data.xml');
    if (objXML.parseError.errorCode != 0) {
      // error found so show error message and stop
      objResults.innerHTML = showError(objXML);
      return false;
    }
    // load the XSL stylesheet and check for errors
    objXSL.load(strStylesheetName);
    if (objXSL.parseError.errorCode != 0) {
      // error found so show error message and stop
      objResults.innerHTML = showError(objXSL);
      return false;
    }
    // all must be OK, so perform transformation
    var strResult = objXML.transformNode(objXSL);
    // and display the results in the DIV element
    objResults.innerHTML = strResult;
    return true;
  }

Provided that there are no errors, the function performs the transformation using the XML file tables_data.xml and the stylesheet whose name is specified as the strStylesheetName parameter when the function is called.

The result of the transformation is inserted into the <div> element that has the id attribute value "divResults". You'll later see where this is defined in the HTML.

If either of the load calls fails, perhaps due to a badly formed document, a function named showError is called. This function takes a reference to the document where the error was found, and returns a string describing the nature of the error. This error message is then displayed on the page instead of the result of the transformation:

  function showError(objDocument) {
    // create the error message
    var strError = 'Invalid XML file !<BR />'
      + 'File URL: ' + objDocument.parseError.url + '<BR />'
      + 'Line No.: ' + objDocument.parseError.line + '<BR />'
      + 'Character: ' + objDocument.parseError.linepos + '<BR />'
      + 'File Position: ' + objDocument.parseError.filepos + '<BR />'
      + 'Source Text: ' + objDocument.parseError.srcText + '<BR />'
      + 'Error Code: ' + objDocument.parseError.errorCode + '<BR />'
      + 'Description: ' + objDocument.parseError.reason;
    return strError;
  }
  // -->
  </script>

The remainder of the file is the HTML that creates the visible part of the page. The opening <body> element specifies an onload attribute that causes the transformFiles() function in our script section to run once the page has finished loading:

  ...
  </head>
  <body onload="transformFiles('tables_list.xsl')">
  <p><span class="head">Transforming an XML Document using
  the client-side code</span></p>
  ...

Because it uses the value "tables_list.xsl" for the parameter to the function, this stylesheet is used for the initial display. This shows the data in tabular form.

The next thing in the page is the code that creates the two HTML <button> elements, marked Catalog and Simple List. The onclick attribute of each one simply executes the transformFiles() function again, each time specifying the appropriate stylesheet name:

  ...
  View the tables as a &nbsp;
  <button onclick="transformFiles('tables_catalog.xsl')">Catalog</button>
  &nbsp; or as a &nbsp;
  <button onclick="transformFiles('tables_list.xsl')">Simple List</button>
  <hr />

Finally, at the end of the code, you can see the definition of the <div> element into which the function inserts the results of the transformation:

  <!-- to insert the results of parsing the object model -->
  <div id="divResults"></div>
  </body>
  </html>

Output

When the page is first displayed, it looks like Figure C-1.

Figure C-1

Click the Catalog button, and you will see an alternative graphical presentation of the same data, achieved by applying the other stylesheet.

Restrictions

Microsoft claims full compliance with XSLT 1.0 and XPath 1.0, although there are one or two gray areas where its interpretation of the specification may cause stylesheets to be less than 100% portable. These include:

Handling of whitespace nodes. The normal way of supplying input to Microsoft's XSLT processor is in the form of a DOM, and the default option in MSXML3 for building a DOM is to remove whitespace text nodes as the text is parsed. The result is that <xsl:preserve-space> in the stylesheet has no effect, because by the time the XSLT processor gets to see the data, there are no whitespace text nodes left to preserve. If you want conformant behavior in this area, set the preserveWhiteSpace property of the DOMDocument object to True before loading the document. The same applies to the stylesheet: if you want to use <xsl:text> to control output of whitespace, particularly when generating output in a space-sensitive format such as comma-separated values, then load the stylesheet with preserveWhiteSpace set to True.
Normalization of text nodes. XSLT and XPath specify that adjacent text nodes in the tree are always merged into a single node. MSXML3 uses a DOM as its internal data structure, and the DOM does not impose the same rule. Although MSXML3 does a good job of creating a correct XPath view of the underlying DOM tree, this is one area where the mapping is incomplete. The two common cases where adjacent text nodes are not merged are, first, when one of the text nodes represents the contents of a CDATA section in the source XML, and second, when one of them represents the expanded text of an entity reference (other than the built-in entity references such as &lt;). This makes it dangerous to use a construct such as <xsl:value-of select="text()"/>, because MSXML3 will return only the first of the text nodes, that is, the text up to the start of an entity or CDATA boundary. It's safer to output the value of an element by writing <xsl:value-of select="."/>.
The <xsl:message> instruction has no effect when running a transformation in the browser, unless you specify terminate="yes".
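
For the whitespace restriction, the workaround can be sketched like this. The helper name loadKeepingWhitespace is invented for this example, and doc is assumed to be an MSXML DOMDocument that has not yet been loaded. Microsoft's documentation spells the property preserveWhiteSpace, which is the spelling used here.

```javascript
// Sketch only: loadKeepingWhitespace is a hypothetical helper.
// doc is assumed to be an MSXML DOMDocument that has not been loaded yet;
// the same call applies to source documents and to stylesheets.
function loadKeepingWhitespace(doc, url) {
  doc.async = false;
  // Must be set before load(): otherwise whitespace text nodes are
  // stripped while parsing and the XSLT processor never sees them
  doc.preserveWhiteSpace = true;
  if (!doc.load(url)) {
    throw new Error('Cannot load ' + url + ': ' + doc.parseError.reason);
  }
  return doc;
}
```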