Section 21.1. Obtaining XML Documents | JavaScript: The Definitive Guide

21.1. Obtaining XML Documents

Chapter 20 showed how to use the XMLHttpRequest object to obtain an XML document from a web server. When the request is complete, the responseXML property of the XMLHttpRequest object refers to a Document object that is the parsed representation of the XML document. This is not the only way to obtain an XML Document object, however. The subsections that follow show how you can create an empty XML document, load an XML document from a URL without using XMLHttpRequest, parse an XML document from a string, and obtain an XML document from an XML data island.

As with many advanced client-side JavaScript features, the techniques for obtaining XML data are usually browser-specific. The following subsections define utility functions that work in both Internet Explorer (IE) and Firefox.

21.1.1. Creating a New Document

You can create an empty (except for an optional root element) XML Document in Firefox and related browsers with the DOM Level 2 method document.implementation.createDocument( ). You can accomplish a similar thing in IE with the ActiveX object named MSXML2.DOMDocument. Example 21-1 defines an XML.newDocument( ) utility function that hides the differences between these two approaches. An empty XML document isn't useful by itself, but creating one is the first step of the document loading and parsing techniques that are shown in the examples that follow this one.

Example 21-1. Creating an empty XML document

    /**
     * Create a new Document object. If no arguments are specified,
     * the document will be empty. If a root tag is specified, the document
     * will contain that single root tag. If the root tag has a namespace
     * prefix, the second argument must specify the URL that identifies the
     * namespace.
     */
    XML.newDocument = function(rootTagName, namespaceURL) {
        if (!rootTagName) rootTagName = "";
        if (!namespaceURL) namespaceURL = "";
        if (document.implementation && document.implementation.createDocument) {
            // This is the W3C standard way to do it
            return document.implementation.createDocument(namespaceURL,
                                                          rootTagName, null);
        }
        else { // This is the IE way to do it
            // Create an empty document as an ActiveX object
            // If there is no root element, this is all we have to do
            var doc = new ActiveXObject("MSXML2.DOMDocument");
            // If there is a root tag, initialize the document
            if (rootTagName) {
                // Look for a namespace prefix
                var prefix = "";
                var tagname = rootTagName;
                var p = rootTagName.indexOf(':');
                if (p != -1) {
                    prefix = rootTagName.substring(0, p);
                    tagname = rootTagName.substring(p+1);
                }
                // If we have a namespace, we must have a namespace prefix
                // If we don't have a namespace, we discard any prefix
                if (namespaceURL) {
                    if (!prefix) prefix = "a0"; // What Firefox uses
                }
                else prefix = "";
                // Create the root element (with optional namespace) as a
                // string of text
                var text = "<" + (prefix?(prefix+":"):"") +  tagname +
                    (namespaceURL
                     ?(" xmlns:" + prefix + '="' + namespaceURL +'"')
                     :"") +
                    "/>";
                // And parse that text into the empty document
                doc.loadXML(text);
            }
            return doc;
        }
    };
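To see what text the IE branch of Example 21-1 hands to loadXML( ), it can help to pull the string-building step out into a function of its own. The sketch below does that. The name rootElementText( ) and the example.com namespace URL are illustrative assumptions, not part of the XML utility module.

```javascript
// Hypothetical standalone copy of the string-building step in the IE
// branch of XML.newDocument(). It touches no browser objects, so it can
// be run anywhere to inspect the generated root element.
function rootElementText(rootTagName, namespaceURL) {
    // Split an optional namespace prefix from the tag name
    var prefix = "";
    var tagname = rootTagName;
    var p = rootTagName.indexOf(":");
    if (p != -1) {
        prefix = rootTagName.substring(0, p);
        tagname = rootTagName.substring(p + 1);
    }
    // A namespace requires a prefix; without a namespace, drop any prefix
    if (namespaceURL) {
        if (!prefix) prefix = "a0";
    }
    else prefix = "";
    // Build an empty root element, with a namespace declaration if needed
    return "<" + (prefix ? (prefix + ":") : "") + tagname +
        (namespaceURL ? (" xmlns:" + prefix + '="' + namespaceURL + '"') : "") +
        "/>";
}

console.log(rootElementText("svg:svg", "http://www.w3.org/2000/svg"));
// prints <svg:svg xmlns:svg="http://www.w3.org/2000/svg"/>
console.log(rootElementText("root", "http://example.com/ns"));
// prints <a0:root xmlns:a0="http://example.com/ns"/>
console.log(rootElementText("a:root", ""));
// prints <root/>
```

The last call shows that a prefix supplied without a namespace URL is silently discarded.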

21.1.2. Loading a Document from the Network

Chapter 20 showed how to use the XMLHttpRequest object to dynamically issue HTTP requests for text-based documents. When used with XML documents, the responseXML property refers to the parsed representation as a DOM Document object. XMLHttpRequest is nonstandard but widely available and well understood, and is usually the best technique for loading XML documents.

There is another way, however. An XML Document object created using the techniques shown in Example 21-1 can load and parse an XML document using a less well-known technique. Example 21-2 shows how it is done. Amazingly, the code is the same in both Mozilla-based browsers and in IE.

Example 21-2. Loading an XML document synchronously

    /**
     * Synchronously load the XML document at the specified URL and
     * return it as a Document object
     */
    XML.load = function(url) {
        // Create a new document with the previously defined function
        var xmldoc = XML.newDocument( );
        xmldoc.async = false;  // We want to load synchronously
        xmldoc.load(url);      // Load and parse
        return xmldoc;         // Return the document
    };

Like XMLHttpRequest, this load( ) method is nonstandard. It differs from XMLHttpRequest in several important ways. First, it works only with XML documents; XMLHttpRequest can be used to download any kind of text document. Second, it is not restricted to the HTTP protocol. In particular, it can be used to read files from the local filesystem, which is helpful during the testing and development phase of a web application. Third, when used with HTTP, it generates only GET requests and cannot be used to POST data to a web server.

Like XMLHttpRequest, the load( ) method can be used asynchronously. In fact, this is the default mode of operation unless the async property is set to false. Example 21-3 shows an asynchronous version of the XML.load( ) method.

Example 21-3. Loading an XML document asynchronously

    /**
     * Asynchronously load and parse an XML document from the specified URL.
     * When the document is ready, pass it to the specified callback function.
     * This function returns immediately with no return value.
     */
    XML.loadAsync = function(url, callback) {
        var xmldoc = XML.newDocument( );
        // If we created the XML document using createDocument, use
        // onload to determine when it is loaded
        if (document.implementation && document.implementation.createDocument) {
            xmldoc.onload = function( ) { callback(xmldoc); };
        }
        // Otherwise, use onreadystatechange as with XMLHttpRequest
        else {
            xmldoc.onreadystatechange = function( ) {
                if (xmldoc.readyState == 4) callback(xmldoc);
            };
        }
        // Now go start the download and parsing
        xmldoc.load(url);
    };
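In environments that provide Promise, a callback-style loader such as XML.loadAsync( ) from Example 21-3 can be adapted so that callers chain on the result instead of passing a callback. The following is only a sketch: promisifyLoader( ) and fakeLoadAsync( ) are hypothetical names, and the fake loader exists solely so that the code runs outside a browser. In a page you would pass XML.loadAsync instead.

```javascript
// Hypothetical helper: wrap any loader with the signature (url, callback)
// so that it returns a Promise for the loaded document.
function promisifyLoader(loadAsync) {
    return function(url) {
        return new Promise(function(resolve, reject) {
            try {
                // The loader invokes resolve(doc) when the document is ready
                loadAsync(url, resolve);
            } catch (e) {
                // Errors thrown while starting the load reject the Promise
                reject(e);
            }
        });
    };
}

// Stand-in loader (an assumption for this sketch): it "loads" a plain
// object asynchronously instead of a real XML Document.
function fakeLoadAsync(url, callback) {
    setTimeout(function() {
        callback({ url: url, documentElement: { nodeName: "root" } });
    }, 0);
}

var loadXML = promisifyLoader(fakeLoadAsync);
loadXML("data.xml").then(function(doc) {
    console.log(doc.documentElement.nodeName);  // prints root
});
```

Note that this wrapper reports only errors thrown synchronously by the loader. Example 21-3 has no failure callback, so a load that never completes leaves the Promise pending.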

21.1.3. Parsing XML Text

Sometimes, instead of parsing an XML document loaded from the network, you simply want to parse an XML document from a JavaScript string. In Mozilla-based browsers, a DOMParser object is used; in IE, the loadXML( ) method of the Document object is used. (If you paid attention to the XML.newDocument( ) code in Example 21-1, you've already seen this method used once.)

Example 21-4 shows a cross-platform XML parsing function that works in Mozilla and IE. For platforms other than these two, it attempts to parse the text by loading it with an XMLHttpRequest from a data: URL.

Example 21-4. Parsing an XML document

    /**
     * Parse the XML document contained in the string argument and return
     * a Document object that represents it.
     */
    XML.parse = function(text) {
        if (typeof DOMParser != "undefined") {
            // Mozilla, Firefox, and related browsers
            return (new DOMParser( )).parseFromString(text, "application/xml");
        }
        else if (typeof ActiveXObject != "undefined") {
            // Internet Explorer.
            var doc = XML.newDocument( );  // Create an empty document
            doc.loadXML(text);            // Parse text into it
            return doc;                   // Return it
        }
        else {
            // As a last resort, try loading the document from a data: URL
            // This is supposed to work in Safari. Thanks to Manos Batsis and
            // his Sarissa library (sarissa.sourceforge.net) for this technique.
            var url = "data:text/xml;charset=utf-8," + encodeURIComponent(text);
            var request = new XMLHttpRequest( );
            request.open("GET", url, false);
            request.send(null);
            return request.responseXML;
        }
    };
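Neither parsing path in Example 21-4 throws an exception when the text is not well-formed, so callers may want to inspect the returned document. The sketch below shows one way to do this, assuming the usual conventions: MSXML documents expose a parseError object whose errorCode is nonzero on failure, and DOMParser reports failure with a parsererror element in the returned document. The function name xmlParseError( ) and the stub object are illustrative assumptions.

```javascript
// Hypothetical helper: return a description of the parse error found in
// doc, or null if the document appears to have parsed cleanly.
function xmlParseError(doc) {
    // A missing document (e.g. a null responseXML) counts as a failure
    if (!doc) return "no document was returned";
    // MSXML (IE): parseError.errorCode is nonzero when parsing failed
    if (doc.parseError && doc.parseError.errorCode !== 0) {
        return doc.parseError.reason;
    }
    // DOMParser: failure is reported as a <parsererror> element
    if (typeof doc.getElementsByTagName == "function") {
        var errors = doc.getElementsByTagName("parsererror");
        if (errors.length > 0) {
            return errors[0].textContent || "XML parse error";
        }
    }
    return null;
}

// Stand-in for a failed MSXML parse so the sketch runs outside a browser.
// The errorCode and reason values here are made up for illustration.
var failed = { parseError: { errorCode: 1, reason: "illustrative failure" } };
console.log(xmlParseError(failed));  // prints illustrative failure
```

In a page, the call would look like xmlParseError(XML.parse(text)). A well-formed document that legitimately contains an element named parsererror would be misreported by this check.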

21.1.4. XML Documents from Data Islands

Microsoft has extended HTML with an <xml> tag that creates an XML data island within the surrounding "sea" of HTML markup. When IE encounters this <xml> tag, it treats its contents as a separate XML document, which you can retrieve using document.getElementById( ) or other HTML DOM methods. If the <xml> tag has a src attribute, the XML document is loaded from the URL specified by that attribute instead of being parsed from the content of the <xml> tag.

If a web application requires XML data, and the data is known when the application is first loaded, there is an advantage to including that data directly within the HTML page: the data is already available, and the web application does not have to establish another network connection to download the data. XML data islands can be a useful way to accomplish this. It is possible to approximate IE data islands in other browsers using code like that shown in Example 21-5.

Example 21-5. Getting an XML document from a data island

    /**
     * Return a Document object that holds the contents of the <xml> tag
     * with the specified id. If the <xml> tag has a src attribute, an XML
     * document is loaded from that URL and returned instead.
     *
     * Since data islands are often looked up more than once, this function caches
     * the documents it returns.
     */
    XML.getDataIsland = function(id) {
        var doc;
        // Check the cache first
        doc = XML.getDataIsland.cache[id];
        if (doc) return doc;
        // Look up the specified element
        doc = document.getElementById(id);
        // If there is a "src" attribute, fetch the Document from that URL
        var url = doc.getAttribute('src');
        if (url) {
            doc = XML.load(url);
        }
        // Otherwise, if there was no src attribute, the content of the <xml>
        // tag is the document we want to return. In Internet Explorer, doc is
        // already the document object we want. In other browsers, doc refers to
        // an HTML element, and we've got to copy the content of that element
        // into a new document object
        else if (!doc.documentElement) {// If this is not already a document...
            // First, find the document element within the <xml> tag. This is
            // the first child of the <xml> tag that is an element, rather
            // than text, comment, or processing instruction
            var docelt = doc.firstChild;
            while(docelt != null) {
                if (docelt.nodeType == 1 /*Node.ELEMENT_NODE*/) break;
                docelt = docelt.nextSibling;
            }
            // Create an empty document
            doc = XML.newDocument( );
            // If the <xml> node had some content, import it into the new document
            if (docelt) doc.appendChild(doc.importNode(docelt, true));
        }
        // Now cache and return the document
        XML.getDataIsland.cache[id] = doc;
        return doc;
    };
    XML.getDataIsland.cache = {}; // Initialize the cache

This code does not perfectly simulate XML data islands in non-IE browsers. The HTML standard requires browsers to parse (but ignore) tags such as <xml> that they don't know about. This means that browsers don't discard XML data within an <xml> tag. It also means that any text within the data island is displayed by default. An easy way to prevent this is with the following CSS stylesheet:

 <style type="text/css">xml { display: none; }</style>

Another incompatibility is that non-IE browsers treat the content of XML data islands as HTML rather than XML content. If you use the code in Example 21-5 in Firefox, for example, and then serialize the resulting document (you'll see how to do this later in the chapter), you'll find that the tag names are all converted to uppercase because Firefox thinks they are HTML tags. In some cases, this may be problematic; in many other cases, it is not. Finally, notice that XML namespaces break if the browser treats the XML tags as HTML tags. This means that inline XML data islands are not suitable for things like XSL stylesheets (XSL is covered in more detail later in this chapter) because those stylesheets always use namespaces.

If you want the network benefits of including XML data directly in an HTML page, but don't want the browser incompatibilities that come with using XML data islands and the <xml> tag, consider encoding your XML document text as a JavaScript string and then parsing the document using code like that shown in Example 21-4.
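One way to produce such a string is to generate it on the server when the page is built. The following is a minimal sketch, assuming a JavaScript-capable server environment with JSON.stringify( ) available. The function name xmlToScriptLiteral( ), the variable menuDoc, and the sample menu document are hypothetical.

```javascript
// Hypothetical server-side helper: convert XML text into a JavaScript
// string literal that is safe to place inside an inline <script> element.
function xmlToScriptLiteral(xmlText) {
    return JSON.stringify(xmlText)
        // Escape "<" so the literal can never contain "</script>" or "<!--"
        .replace(/</g, "\\u003c")
        // Older JavaScript engines reject these separators in string literals
        .replace(/\u2028/g, "\\u2028")
        .replace(/\u2029/g, "\\u2029");
}

var xml = '<?xml version="1.0"?>\n<menu><item price="4.50">Soup</item></menu>';
var literal = xmlToScriptLiteral(xml);

// The generated page would then include a line like this one inside a
// <script> element, to be parsed in the browser with Example 21-4:
console.log("var menuDoc = XML.parse(" + literal + ");");
```

Because the XML travels as an ordinary string, the browser's HTML parser never sees its tags, so tag-name case and namespaces survive intact.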