Hack 95 Create Well-Formed XML with JavaScript

   


Use JavaScript to ensure that you write correct, well-formed XML in web pages.

Sometimes you need to create some XML from within a browser. It is easy to write bad XML without realizing it. Writing correct XML with all its bells and whistles is not easy, but in this type of scenario you usually only need to write basic XML.

There is a kind of hierarchy of XML:

  1. Basic: Elements only; no attributes, entities, character references, escaped characters, or encoding issues

  2. Plain: Basic plus attributes

  3. Plain/escaped: Plain with special XML characters escaped

  4. Plain/advanced: Plain/escaped with CDATA sections and processing instructions

The list continues with increasing levels of sophistication (and difficulty).
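To make the levels concrete, here is a small sketch with one sample string per level. The element names, attribute values, and the stylesheet name are invented for illustration only.

```javascript
// One illustrative sample per level; all names and values are made up.
var samples = {
    // Basic: elements only
    basic: '<note><to>Ann</to></note>',
    // Plain: basic plus attributes
    plain: '<note id="n1"><to>Ann</to></note>',
    // Plain/escaped: the special characters & and < are escaped
    plainEscaped: '<note id="n1"><to>Ann &amp; Bob</to>' +
        '<body>1 &lt; 2</body></note>',
    // Plain/advanced: adds a processing instruction and a CDATA section
    plainAdvanced: '<?xml-stylesheet href="note.css" type="text/css"?>' +
        '<note id="n1"><body><![CDATA[1 < 2 & 3 > 2]]></body></note>'
};
```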

This hack covers the basic and plain styles (with some enhancements), and you can adapt the techniques to move several more steps up the ladder if you like.

The main issues with writing basic XML are to get the elements closed properly and to keep the code simple. Here is how.

7.6.1 The Element Function

Here is a JavaScript function for writing elements:

// Bare bones XML writer - no attributes
function element(name,content){
    var xml
    if (!content){
        xml='<' + name + '/>'
    }
    else {
        xml='<'+ name + '>' + content + '</' + name + '>'
    }
    return xml
}

This basic hack even writes the empty-element form when there is no element content. What is especially nice about this hack is that you can use it recursively, like this:

var xml = element('p', 'This is ' +
    element('strong','Bold Text') + 'inline')

Both inner and outer elements are guaranteed to be closed properly. You can display the result for testing like this:

alert(xml)

You can build up your entire XML document by combining bits like these, and all the elements will be properly nested and closed.

The element() function does not do any pretty-printing, because it has no way to know where line breaks should go. If that is important to you, just create a variant function:

function elementNL(name, content) {
    return element(name,content) + '\n'
}

More sophisticated variations are possible but rarely needed.
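As an illustration of combining the two functions, the following sketch builds a small document with one element per line. The list and item element names are hypothetical, and console.log() stands in for alert() so the snippet also runs outside a browser.

```javascript
// Repeated here so the sketch runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

function elementNL(name, content) {
    return element(name, content) + '\n';
}

// Hypothetical document: a list with one item per line.
var items = elementNL('item', 'First') +
            elementNL('item', 'Second') +
            elementNL('item');           // no content: written as <item/>

// The leading '\n' puts the first item on its own line.
var doc = elementNL('list', '\n' + items);
console.log(doc);
```

Every element comes out closed, and the line breaks appear only where elementNL() was used.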

7.6.2 Adding Attributes

At the next level up, the most pressing problems are to format the attribute string properly, to escape single and double quotes embedded in the attribute values, and to do the least amount of quote escaping so that the result will be as readable as possible.

We modify the element() function to optionally accept an associative array containing the attribute names and values. In other languages, an associative array may be called a dictionary or a hash.

// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

The function formatAttributes() handles formatting and escaping the attributes.

To fix up the quotes, we use the following algorithm if there are embedded quotes (single or double):

  1. Whichever type of quote occurs first in the string, use the other kind to enclose the attribute value.

  2. Only escape occurrences of the kind of quote used to enclose the attribute value. We don't need to escape the other kind.

Here is the code:

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {}
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        att_value = attributes[att]

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double
        if (quot_pos != -1 && quot_pos < apos_pos) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}
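One edge case is worth noting. When a value contains double quotes but no apostrophes, indexOf() returns -1 for the apostrophe, so the test quot_pos < apos_pos fails and formatAttributes() encloses the value in double quotes, escaping each embedded one. For a value such as say "hi", the result is att="say &quot;hi&quot;", which is well-formed but not the least possible escaping. The sketch below is one possible variant, not part of the original hack; chooseQuote() and formatAttribute() are hypothetical names, and split()/join() is used in place of a RegExp.

```javascript
var APOS = "'", QUOTE = '"';
var ESCAPED_QUOTE = {};
ESCAPED_QUOTE[QUOTE] = '&quot;';
ESCAPED_QUOTE[APOS] = '&apos;';

// Hypothetical helper: pick the delimiter for one attribute value.
function chooseQuote(value) {
    var apos_pos = value.indexOf(APOS);
    var quot_pos = value.indexOf(QUOTE);
    if (apos_pos === -1) {
        return APOS;   // no apostrophes: single quotes need no escaping
    }
    if (quot_pos === -1) {
        return QUOTE;  // apostrophes only: double quotes need no escaping
    }
    // Both kinds present: enclose with the kind that does not come first.
    return quot_pos < apos_pos ? APOS : QUOTE;
}

// Hypothetical helper: format a single name/value pair.
function formatAttribute(name, value) {
    var q = chooseQuote(value);
    // Escape only occurrences of the enclosing quote character.
    var escaped = value.split(q).join(ESCAPED_QUOTE[q]);
    return ' ' + name + '=' + q + escaped + q;
}

console.log(formatAttribute('att', 'say "hi"'));
console.log(formatAttribute('att', "it's"));
```

With this variant, a value that contains only one kind of quote is never escaped at all; values containing both kinds are handled as in the algorithm above.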

Here is code to test everything we've seen so far:

function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))
}

Open the file jswriter.html (Example 7-18) in a browser that supports JavaScript (the script is also stored in jswriter.js so you can easily include it in any HTML or XHTML document).

Example 7-18. jswriter.html
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Testing the Well-formed XML Hack</title>
<script type='text/javascript'>
// XML writer with attributes and smart attribute quote escaping
function element(name,content,attributes){
    var att_str = ''
    if (attributes) { // tests false if this arg is missing!
        att_str = formatAttributes(attributes)
    }
    var xml
    if (!content){
        xml='<' + name + att_str + '/>'
    }
    else {
        xml='<' + name + att_str + '>' + content + '</'+name+'>'
    }
    return xml
}

var APOS = "'", QUOTE = '"'
var ESCAPED_QUOTE = {}
ESCAPED_QUOTE[QUOTE] = '&quot;'
ESCAPED_QUOTE[APOS] = '&apos;'

/*
   Format a dictionary of attributes into a string suitable
   for inserting into the start tag of an element.  Be smart
   about escaping embedded quotes in the attribute values.
*/
function formatAttributes(attributes) {
    var att_value
    var apos_pos, quot_pos
    var use_quote, escape, quote_to_escape
    var att_str
    var re
    var result = ''

    for (var att in attributes) {
        att_value = attributes[att]

        // Find first quote marks if any
        apos_pos = att_value.indexOf(APOS)
        quot_pos = att_value.indexOf(QUOTE)

        // Determine which quote type to use around
        // the attribute value
        if (apos_pos == -1 && quot_pos == -1) {
            att_str = ' ' + att + "='" + att_value + "'"
            result += att_str
            continue
        }

        // Prefer the single quote unless forced to use double
        if (quot_pos != -1 && quot_pos < apos_pos) {
            use_quote = APOS
        }
        else {
            use_quote = QUOTE
        }

        // Figure out which kind of quote to escape
        // Use nice dictionary instead of yucky if-else nests
        escape = ESCAPED_QUOTE[use_quote]

        // Escape only the right kind of quote
        re = new RegExp(use_quote,'g')
        att_str = ' ' + att + '=' + use_quote +
            att_value.replace(re, escape) + use_quote
        result += att_str
    }
    return result
}

function test() {
    var atts = {att1:"a1",
        att2:"This is in \"double quotes\" and this is " +
         "in 'single quotes'",
        att3:"This is in 'single quotes' and this is in " +
         "\"double quotes\""}

    // Basic XML example
    alert(element('elem','This is a test'))

    // Nested elements
    var xml = element('p', 'This is ' +
        element('strong','Bold Text') + 'inline')
    alert(xml)

    // Attributes with all kinds of embedded quotes
    alert(element('elem','This is a test', atts))

    // Empty element version
    alert(element('elem','', atts))
}
</script>
</head>

<body onload='test()'>
</body>
</html>

When the page loads, you will see the following in four successive alert boxes, as shown in Figure 7-1. The lines have been wrapped for readability.


First alert:

<elem>This is a test</elem>


Second alert:

<p>This is <strong>Bold Text</strong>inline</p>


Third alert:

<elem att1='a1'
att2='This is in "double quotes" and this is
in &apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;">This is a test</elem>


Fourth alert:

<elem att1='a1'
att2='This is in "double quotes" and this is in
&apos;single quotes&apos;'
att3="This is in 'single quotes' and this is in
&quot;double quotes&quot;"/>

Figure 7-1. jswriter.html in Firefox


7.6.3 Extending the Hack

You may want to escape the other special XML characters. You can do this by adding calls such as:

content = content.replace(/</g, '&lt;')

Take care not to replace the quotes in attribute values, since formatAttributes() already handles them. Because the parameters to element() and formatAttributes() are strings, they are easy to manipulate as you like.
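A minimal sketch of such escaping for element content follows. The helper name escapeText() is hypothetical, and the sketch assumes that escaping is applied to raw text before it is passed to element(), not to the content parameter inside element(): content may already contain nested markup produced by other element() calls, and escaping it there would turn those tags into text.

```javascript
// Hypothetical helper for element content (not attribute values).
function escapeText(text) {
    return text
        .replace(/&/g, '&amp;')   // must come first, or the ampersands
                                  // added below would be escaped again
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

// Repeated here so the sketch runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Escape the raw text only; leave the markup from element() alone.
var simple = element('p', escapeText('AT&T: 1 < 2'));
var nested = element('p', escapeText('a < b and ') +
    element('strong', escapeText('b > c')));

console.log(simple);
console.log(nested);
```

The same idea applies to attribute values: escape & and < there before the value reaches formatAttributes(), and leave the quotes to that function.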

7.6.4 Creating Large Chunks of XML

If you create long strings of XML, say with more than a few hundred string fragments, you may find the performance to be slow. That's normal, and happens because JavaScript, like most other languages, has to allocate memory for each new string every time you concatenate more fragments.

The standard way around this is to accumulate the fragments in a list, then join the list back to a string at the end. This process is generally very fast, even for very large results.

Here is how you can do it:

var results = []
results.push(element("p","This is some content"))
results.push(element('p', 'This is ' +
    element('strong','Bold Text') + 'inline'))
// ... Append more bits

var end_result = results.join(' ')
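The same pattern can be wrapped in a function when the fragments come from a loop. The sketch below is only an illustration: buildList() and the list and item element names are hypothetical. It joins with an empty string so that no whitespace is added between the child elements, whereas the example above separates the fragments with a single space.

```javascript
// Repeated here so the sketch runs on its own.
function element(name, content) {
    if (!content) {
        return '<' + name + '/>';
    }
    return '<' + name + '>' + content + '</' + name + '>';
}

// Hypothetical example: one <item> per entry, joined once at the end.
function buildList(values) {
    var parts = [];
    for (var i = 0; i < values.length; i++) {
        // String() keeps numbers such as 0 from being treated as "no content"
        parts.push(element('item', String(values[i])));
    }
    return element('list', parts.join(''));
}

var xml = buildList([0, 1, 2]);
console.log(xml);
```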

7.6.5 See Also

  • JavaScript: The Definitive Guide, by David Flanagan (O'Reilly)

Tom Passin



XML Hacks: 100 Industrial-Strength Tips and Tools
ISBN: 0596007116