Section 27.4. Manipulating Documents with the DOM | Web Design in a Nutshell: A Desktop Quick Reference (In a Nutshell (OReilly))

27.4. Manipulating Documents with the DOM

The majority of your DOM Scripting work will likely center around reading from and writing to the document. But before you can do that, you need to understand how to traverse the node tree.

27.4.1. Finding Your Way Around

The DOM offers many ways to move around and find what you want. As any HTML or XML document is essentially a collection of elements arranged in a particular hierarchy, we traverse the DOM using the elements as markers. In fact, a lot of what we do in the DOM is akin to wayfinding: using elements (especially uniquely id-ed ones) as signposts that tell us where we are in a document and help us go deeper into it without losing our way.

Let's take the following snippet of (X)HTML, for example:

    <div id="content">
      <h1>This is a heading</h1>
      <p>This is a paragraph.</p>
      <h2>This is another heading</h2>
      <p>This is another paragraph.</p>
      <p>Yet another paragraph.</p>
    </div>

If you wanted to find the H2 in this snippet, you would need to use two of JavaScript's interfaces to the DOM, getElementById( ) and getElementsByTagName( ):

    var the_div = document.getElementById( 'content' );
    var h2s     = the_div.getElementsByTagName( 'h2' );
    var the_h2  = h2s[0];

In the first line, you use the DOM to find an element on the page with an id equal to content and assign it to the variable the_div.

If you don't already have an element reference to begin with, default to the document object, which refers to the page. getElementById( ) is always used with document.

Once you have your container div assigned to the_div, you can proceed to find the h2 you want, using getElementsByTagName( ) (line 2). That method returns an array-like list of elements (here, the h2s), referred to as a collection. Finally, you know that the desired h2 is the first one in that collection and, as Chapter 26 showed, the first element in an array has an index of 0. Therefore, the h2 we want is h2s[0]. There's a shorter way to write all of this, too:

    var the_h2 = document.getElementById( 'content' )
                         .getElementsByTagName( 'h2' )[0];

On a side note, if you ever want a collection of all of the elements in a given document, you can use the universal selector (*) in combination with the getElementsByTagName( ) method:

 var everything = document.getElementsByTagName( '*' );

Although not a horribly efficient means of collecting information from a page, this approach can be useful in certain instances.

The methods getElementById( ) and getElementsByTagName( ) are both quite useful, but sometimes you need to be able to move around without knowing the id, or even the type, of the element you are accessing. For this reason, there are numerous properties available to move about the document easily: parentNode, firstChild, lastChild, nextSibling, and previousSibling. Each does exactly what you would expect: it gives you access to the node filling the specified role. To access the first paragraph after the h2, for instance, you could write the following (assuming no whitespace text node sits between the two elements, an issue covered in the sidebar "The Empty Text Node Problem"):

    var the_el = document.getElementById( 'content' )
                         .getElementsByTagName( 'h2' )[0].nextSibling;
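Because nextSibling returns whatever node comes next, including a whitespace text node, a small helper that skips ahead to the next element node can make this kind of traversal more predictable. The following is a minimal sketch; nextElement( ) is a hypothetical helper name, not a DOM method.

```javascript
// Hypothetical helper: returns the next sibling that is an element node
// (nodeType 1), or null if no such sibling exists.
function nextElement( node ){
  var sibling = node.nextSibling;
  while( sibling && sibling.nodeType != 1 ){
    sibling = sibling.nextSibling;
  }
  return sibling || null;
}

// Browser usage, assuming the example markup with id="content" is on the page:
// var h2    = document.getElementById( 'content' ).getElementsByTagName( 'h2' )[0];
// var the_p = nextElement( h2 );
```

The helper only reads nextSibling and nodeType, so it leaves the document itself unchanged.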

Another property available to you is childNodes, which is a collection of the element's children (element and text nodes). In certain instances, it may also be useful to test for child nodes before attempting to access them programmatically, using the hasChildNodes( ) method.

If you are moving around in the DOM using parentNode, nextSibling, and the like and want to know more about the node you have reached, a few properties can help. The first is nodeType, which, as you would expect, returns the type of node you are targeting. There are three commonly used nodeTypes, which are numbered:

1: Element node
2: Attribute node
3: Text node
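As a rough illustration of how these numbers might be used in a script, the sketch below maps each of the three values to a readable label. The names NODE_LABELS and describeNodeType( ) are hypothetical, chosen only for this example.

```javascript
// The three commonly used nodeType values and a label for each.
var NODE_LABELS = { 1: 'element', 2: 'attribute', 3: 'text' };

// Hypothetical helper: returns a readable label for a node's type,
// falling back to the raw number for any other type.
function describeNodeType( node ){
  return NODE_LABELS[ node.nodeType ] || 'other (' + node.nodeType + ')';
}

// Browser usage:
// alert( describeNodeType( document.getElementById( 'content' ) ) );
```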

The Empty Text Node Problem

It is important to keep in mind how much your (X)HTML source affects the DOM. Browsers with a strict interpretation of the DOM, such as the Mozilla family, will include whitespace used to indent your elements as text nodes. This behavior, although correct, can wreak havoc on your scripts if you are looking to use such properties as firstChild, lastChild, nextSibling, and so on. Take the following code snippet for example:

    <ul>
      <li>This is a list item</li>
      <li>This is another list item</li>
    </ul>

In the code, the DOM sees a ul with five children:

A text node with a carriage return and two spaces
A list item (li)
Another text node with a carriage return and two spaces
Another list item
A final carriage return

If you're not paying attention to this and hope to grab the first list item with firstChild, you are likely to be surprised when an element is not returned.

To avoid this problem, you can either eliminate all unnecessary whitespace from your document or run a small function such as the following on any element whose children you want to access:

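    function stripWS( el ){
      // Loop backward: childNodes is a live collection, so removing a node
      // while counting up would skip the node that follows it.
      for( var i = el.childNodes.length - 1; i >= 0; i-- ){
        var node = el.childNodes[i];
        // Remove text nodes (nodeType 3) that contain only whitespace.
        if( node.nodeType == 3 &&
            !/\S/.test( node.nodeValue ) )
          node.parentNode.removeChild( node );
      }
    }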

A similar function called cleanWhitespace( ) is available as an Element method in the Prototype library (prototype.conio.net), which also includes several other helpful functions and methods.
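If you would rather leave the document untouched, another option is to ignore the whitespace nodes when reading instead of removing them. The following is a minimal sketch of that idea; elementChildren( ) is a hypothetical helper name.

```javascript
// Hypothetical helper: returns a plain array of el's child element nodes,
// skipping text nodes (including whitespace-only ones) without removing them.
function elementChildren( el ){
  var result = [];
  for( var i = 0; i < el.childNodes.length; i++ ){
    if( el.childNodes[i].nodeType == 1 ){
      result.push( el.childNodes[i] );
    }
  }
  return result;
}

// Browser usage, assuming the ul from the example is the first one on the page:
// var items = elementChildren( document.getElementsByTagName( 'ul' )[0] );
// items[0] is then the first li, whatever whitespace surrounds it.
```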

You can also find the name of the node using the property nodeName. In the case of element and attribute nodes, nodeName is the element name or attribute name, respectively. Even if you are using XHTML, which requires lowercase tag and attribute names, the value returned by nodeName may be in uppercase, so it is considered a best practice to consistently convert the returned value to either upper- or lowercase when using it in a comparison. Using the (X)HTML example from earlier:

    <div id="content">
      <h1>This is a heading</h1>
      <p>This is a paragraph.</p>
      <h2>This is another heading</h2>
      <p>This is another paragraph.</p>
      <p>Yet another paragraph.</p>
    </div>

we could trigger an alert( ) using nodeType and nodeName whenever an element node is encountered, letting us know what it is:

    var the_div, children, i, name;
    the_div = document.getElementById( 'content' );
    if( the_div.hasChildNodes(  ) ){
      children = the_div.childNodes;
      // Use a numeric index: for...in on a collection also visits
      // non-node properties such as length.
      for( i = 0; i < children.length; i++ ){
        if( children[i].nodeType == 1 ){
          name = children[i].nodeName.toLowerCase(  );
          alert( 'Found an element named "'+name+'"' );
        }
      }
    }

As you can see, there are many ways to "walk" the DOM: the direct route of getElementById( ), the meandering path of getElementsByTagName( ), and the step-by-step method of parent, child, sibling traversal. There are also numerous tools at your disposal to aid in orienting yourself, such as id attributes, but also node types and names.
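The parent, child, sibling style of traversal can be generalized into a recursive walk over a whole subtree. The following is a minimal sketch that relies only on firstChild, nextSibling, and nodeType; walkElements( ) is a hypothetical name, not part of the DOM.

```javascript
// Hypothetical helper: visits root and all of its descendants in document
// order, calling callback( node, depth ) for each element node it finds.
function walkElements( root, callback, depth ){
  depth = depth || 0;
  if( root.nodeType == 1 ){
    callback( root, depth );
  }
  for( var child = root.firstChild; child; child = child.nextSibling ){
    walkElements( child, callback, depth + 1 );
  }
}

// Browser usage, assuming the example markup with id="content" is on the page:
// walkElements( document.getElementById( 'content' ), function( node, depth ){
//   alert( depth + ': ' + node.nodeName.toLowerCase() );
// } );
```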

27.4.2. Reading and Manipulating Document Structure

Once you can find your way around the DOM, it is easy to collect and alter the content of elements on the page. The content you collect can be in the form of other elements, attribute values, and even text content. The primary means of doing this is by using what are sometimes referred to as the "getters and setters" of the DOM: innerHTML, nodeValue, getAttribute( ), and setAttribute( ).

27.4.2.1. innerHTML

When compared to the surgical precision of other DOM methods and properties, innerHTML has all the subtlety of a sledgehammer. Originally part of the Internet Explorer DOM (i.e., not part of the W3C DOM), but now widely supported, this element property can be used to get and set all of the markup and content within the targeted element. The main problem with using innerHTML to get content is that the collected content is treated as though it is a string, so it's pretty much only good for moving large amounts of content from one place to another.

Using the example above, you could collect all of the contents of the content div by writing:

 var contents = document.getElementById( 'content' ).innerHTML;

Similarly, you could replace the contents of the div by setting its innerHTML equal to a string of text that includes HTML:

    var contents = 'This is a <em>new</em> sentence.';
    document.getElementById( 'content' ).innerHTML = contents;

It is also possible to append content to an element using innerHTML:

    var div = document.getElementById( 'content' );
    div.innerHTML += '<p>This is a paragraph added using innerHTML.</p>';
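Because innerHTML parses whatever string it receives as markup, text that is meant to appear literally (a sentence that mentions a tag, for example) needs its special characters escaped first. The following is a minimal sketch of one way to do that; escapeHTML( ) is a hypothetical helper, not a built-in function.

```javascript
// Hypothetical helper: escapes the characters that innerHTML would otherwise
// interpret as markup. The ampersand is replaced first so that the entities
// produced by the later replacements are not escaped a second time.
function escapeHTML( text ){
  return String( text )
    .replace( /&/g, '&amp;' )
    .replace( /</g, '&lt;' )
    .replace( />/g, '&gt;' )
    .replace( /"/g, '&quot;' );
}

// Browser usage, assuming an element with id="content" exists:
// document.getElementById( 'content' ).innerHTML +=
//   '<p>' + escapeHTML( 'Use <em> for emphasis' ) + '</p>';
```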

27.4.2.2. nodeValue

Another property you can use to get and set the content of your document is nodeValue. The nodeValue property is just what it sounds like: the value of an attribute or text node. Assuming the following (X)HTML snippet:

    <a id="easy" href="http://www.easy-designs.net">Easy Designs</a>

you could use nodeValue to get the value of the text node in the link and assign it to a variable named text:

 var text = document.getElementById( 'easy' ).firstChild.nodeValue;

This property works in the other direction as well:

    document.getElementById( 'easy' ).firstChild.nodeValue = 'Easy Designs, LLC';

In the above example, we set the text of the link equal to Easy Designs, LLC, but we could just as easily have used concatenation to add the , LLC to the text:

 document.getElementById( 'easy' ).firstChild.nodeValue += ', LLC';
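These examples work because the link's only child is a text node. If the link also contained inline elements, firstChild might be an element node, whose nodeValue is null. The following is a minimal sketch of a helper that gathers the text from every text node inside an element; getText( ) is a hypothetical name.

```javascript
// Hypothetical helper: concatenates the nodeValue of every text node
// found inside node, in document order.
function getText( node ){
  if( node.nodeType == 3 ){
    return node.nodeValue;
  }
  var text = '';
  for( var child = node.firstChild; child; child = child.nextSibling ){
    text += getText( child );
  }
  return text;
}

// Browser usage, assuming the link with id="easy" from the example:
// var text = getText( document.getElementById( 'easy' ) );
```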

27.4.2.3. getAttribute( )/setAttribute( )

You can collect the value of an element's attributes using the getAttribute( ) method. Assuming the same (X)HTML as the example above, you could use getAttribute( ) to collect the value of the anchor's href attribute and place it in a variable called href:

 var href = document.getElementById( 'easy' ).getAttribute( 'href' );

The value returned by getAttribute( ) is the nodeValue of the attribute named as the argument.

Similarly, you can add new attribute values or change existing ones using the setAttribute( ) method. If you want to point the href at a specific page on easy-designs.net, you could do so using setAttribute( ):

    var link = document.getElementById( 'easy' );
    link.setAttribute( 'href', 'http://www.easy-designs.net/index.php' );

You could also add a title to the link using setAttribute( ):

 link.setAttribute( 'title', 'The Easy Designs, LLC homepage' );
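When several attributes need to be set at once, the repeated setAttribute( ) calls can be wrapped in a loop. The following is a minimal sketch; setAttributes( ) is a hypothetical helper, and it inherits any browser-specific limits of setAttribute( ) itself.

```javascript
// Hypothetical helper: sets several attributes on el from a plain object
// of name/value pairs, then returns el.
function setAttributes( el, attrs ){
  for( var name in attrs ){
    if( Object.prototype.hasOwnProperty.call( attrs, name ) ){
      el.setAttribute( name, attrs[name] );
    }
  }
  return el;
}

// Browser usage, assuming the link with id="easy" from the example:
// setAttributes( document.getElementById( 'easy' ), {
//   href:  'http://www.easy-designs.net/index.php',
//   title: 'The Easy Designs, LLC homepage'
// } );
```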

This brings us to our next topic: creating document structure using the DOM.

HTML Versus XML

There are a few differences between the HTML DOM and the XML DOM that can cause some confusion. Both are equally valid approaches, although the XML DOM is preferred for its forward compatibility.

Using the HTML DOM, you have quick access to element attributes, with each available as a property of that element:

 link.href

To read the same attribute using the XML DOM, you would need to use the getAttribute( ) method:

 link.getAttribute( 'href' );

Not only that, but accessing attributes as properties in the HTML DOM gives you the ability to both get and set the value of the attribute:

    var old_href = link.href;
    link.href = '/new/file.html';

There are also a few instances when you must use the HTML DOM method for cross-browser compatibility. Internet Explorer, for instance, does not allow read/write access to the class attribute using getAttribute( ) or setAttribute( ). Instead, you must use the className property of an element. Luckily, this property is well supported on other browsers.

When accessing the for attribute (used to associate label elements with form controls), you are in a similar situation. IE does not understand label.getAttribute('for'), but, instead, forces you to use label.htmlFor.

You can use both the HTML DOM and the XML DOM approach in (X)HTML documents. If you plan on serving your XHTML files as XML, however, HTML DOM properties (which also include innerHTML) will not work.
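For (X)HTML documents served as HTML, one way to hide the class and for special cases is to route those two names through their property equivalents and use getAttribute( ) and setAttribute( ) for everything else. The following is a minimal sketch; PROPERTY_NAMES, readAttr( ), and writeAttr( ) are hypothetical names chosen for this example.

```javascript
// Attribute names that need their HTML DOM property equivalents
// for cross-browser access.
var PROPERTY_NAMES = { 'class': 'className', 'for': 'htmlFor' };

// Returns the property name to use for an attribute, or null if the
// attribute can go through getAttribute( )/setAttribute( ).
function propertyFor( name ){
  return Object.prototype.hasOwnProperty.call( PROPERTY_NAMES, name ) ?
    PROPERTY_NAMES[name] : null;
}

// Hypothetical helper: reads an attribute value.
function readAttr( el, name ){
  var prop = propertyFor( name );
  return prop ? el[prop] : el.getAttribute( name );
}

// Hypothetical helper: writes an attribute value.
function writeAttr( el, name, value ){
  var prop = propertyFor( name );
  if( prop ){
    el[prop] = value;
  } else {
    el.setAttribute( name, value );
  }
}
```

Because it relies on HTML DOM properties, this approach is not suitable for documents served as XML.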

27.4.3. Creating Document Structure

JavaScript has a host of methods available for creating markup on the fly. We've already seen that setAttribute( ) can be used to add new attributes, in addition to modifying existing ones, but by using createElement( ) and createTextNode( ) methods, we can do so much more.

27.4.3.1. createElement( )

As you'd expect, the createElement( ) method (which is used on the document object) creates and returns a new element. To build a div for example, do the following:

 var new_div = document.createElement( 'div' );

This assigns your newly created element to the variable new_div. To actually see the newly created div on the page, we should probably put some content in it.

name and Element Creation in IE

Internet Explorer has a few DOM-related peculiarities, some of which we've already discussed. This one is particularly odd, however. In IE, elements that are generated through the DOM are incapable of being assigned a name attribute in any standard way. The following example should work (it does in all other browsers):

    var input = document.createElement( 'input' );
    input.setAttribute( 'name', 'fname' );

You might think that this is another special case like that of class or for, but the following HTML DOM method doesn't work either:

 input.name = 'fname';

This is not normally a problem unless you are working with forms. If you are planning to use a generated form field to collect a response that will be used only by JavaScript, you should be fine as you can address a field by giving it an id. On the other hand, if you are planning on the form being submitted to the server, the value for the generated field will not go with it as it has no name to associate with the value.

The solution? A bastardization of the createElement( ) method that only works in IE:

 var input = document.createElement( '<input name="fname">' );

In a standards-compliant browser, this fails because it tries to generate an element named <input name="fname">, which is not valid. So what do we do? We work around it.

The createNamedElement( ) method, when added to the document object as seen here, will allow you to generate named elements with the DOM that will work everywhere:

    document.createNamedElement = function( type, name ){
      var element;
      try {
        element = document.createElement(
          '<' + type + ' name="' + name + '">' );
      } catch( e ){}
      if( !element || !element.name ){
        element = document.createElement( type );
        element.name = name;
      }
      return element;
    }

Here's a quick breakdown of what the script does:

Creates the variable element.
Tries createElement( ) the IE way (using try...catch to trap any errors).
If element is still empty or element.name is not set when the script reaches the conditional, the script generates the element the correct way.
Returns element.

To use it, simply call document.createNamedElement( ), passing it the element you want to generate and the name you want to give it:

    input = document.createNamedElement( 'input', 'fname' );

For more information, view the discussion at easy-reader.net/archives/2005/09/02/death-to-bad-dom-implementations.

27.4.3.2. createTextNode( )

Using the createTextNode( ) method (also on the document object), we can generate a text node to attach to our newly created div. Let's assign the new text node to the variable text:

 var text = document.createTextNode( 'This is a new div' );

We now have two newly created nodes, but they aren't connected. To do that, we need to use the DOM to make the text node a child of the div. We can accomplish this in a number of ways.

27.4.3.3. appendChild( )

The most common means of making one node a child of another is to use the appendChild( ) method.

 new_div.appendChild( text );

appendChild( ) is a method available to any element node, and it takes only a single argument: the node you want to insert. With appendChild( ), you can also skip the intermediate step of assigning the text node to a variable, and directly append the new text node to the div:

 new_div.appendChild( document.createTextNode( 'This is a new div' ) );
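The three steps shown so far (create an element, create a text node, append one to the other) can be bundled into a single function. The following is a minimal sketch; buildElement( ) is a hypothetical helper, and the document object is passed in as the doc parameter so the function does not depend on a global.

```javascript
// Hypothetical helper: creates an element of the given type containing a
// single text node, and returns the element.
function buildElement( doc, type, text ){
  var el = doc.createElement( type );
  el.appendChild( doc.createTextNode( text ) );
  return el;
}

// Browser usage:
// var new_div = buildElement( document, 'div', 'This is a new div' );
```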

Of course, this only puts those two nodes together, so we still need to put our div into the body of the document to have it show up in the browser. Using appendChild( ), we can add the div to the body of the page, but appendChild( ) simply does what it says: appends the argument to the target element. The div would become the last element in the body. What if we wanted our new div to be the first element in the body?

Scalpel Versus Sledgehammer

As we discussed earlier, innerHTML originated from the Internet Explorer DOM, but now enjoys widespread (though occasionally begrudging) acceptance. When compared to the surgical precision of the W3C's node-based content insertion and manipulation methods, innerHTML just feels, well, imprecise.

That said, there are some times where innerHTML can make your life a little easier. Take the insertion of special characters, for instance. Let's say we wanted to insert grammatically correct curly quotes into the content of a paragraph. Using a node-based approach would look like this:

    var p, text;
    p = document.createElement( 'p' );
    text = 'Here we have some \u201Cquotes.\u201D';
    p.appendChild( document.createTextNode( text ) );

Chances are, you know the HTML entities a lot better than you know the Unicode character codes, so having to look them up each time you want to use a special character is a bit of an annoyance. Using innerHTML, you could simply set the content of the p using a more comfortable syntax:

 p.innerHTML = 'Here we have some &#8220;quotes.&#8221;';

Similarly, adding numerous text nodes and inline elements as content within an element, such as a paragraph, can be an arduous process. Even something as simple as creating a sentence with an emphasis can be annoyingly convoluted:

    p.appendChild( document.createTextNode( 'This content is ' ) );
    var em = document.createElement( 'em' );
    em.appendChild( document.createTextNode( 'emphasized' ) );
    p.appendChild( em );
    p.appendChild( document.createTextNode( '.' ) );

The above example uses a lot of shortcuts, but it is still a ton of steps. Using innerHTML would make the process a whole lot easier:

 p.innerHTML += ' This content is <em>emphasized</em>.';

Of course, one benefit of using the DOM method is that the inserted nodes, if assigned to a variable, remain accessible through that variable even after being inserted into the document. So if we used the node-based example above, we could quickly swap out that em element (assigned to the em variable) for another text node:

    em.parentNode.replaceChild(
      document.createTextNode( 'not emphasized' ), em );

The choice of approach is ultimately up to you. There certainly are benefits to each, though it should be stressed again that documents served as XML must use the DOM node method.

27.4.3.4. insertBefore( )

To make our new div the first element in the body rather than the last, we can use the insertBefore( ) method.

    var body = document.getElementsByTagName( 'body' )[0];
    body.insertBefore( new_div, body.firstChild );

insertBefore( ) takes two arguments: the first is the node you want to insert, and the second is the node you want to insert it in front of. In our example, we are inserting the new div in front of the firstChild of the body element.
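The DOM has no matching insertAfter( ) method, but one can be sketched by combining insertBefore( ) with nextSibling. In the minimal sketch below, insertAfter( ) is a hypothetical helper; it relies on insertBefore( ) appending the node when its second argument is null.

```javascript
// Hypothetical helper: inserts new_node immediately after ref_node.
// When ref_node is the last child, its nextSibling is null, and
// insertBefore( ) then adds new_node at the end of the parent.
function insertAfter( new_node, ref_node ){
  return ref_node.parentNode.insertBefore( new_node, ref_node.nextSibling );
}

// Browser usage, assuming new_div and body as defined in the example above:
// insertAfter( new_div, body.firstChild );
```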

27.4.3.5. replaceChild( )

Let's suppose that instead of inserting our new div before the firstChild of the body element, we wanted it to replace the firstChild.

To do so, we could use the replaceChild( ) method:

 body.replaceChild( new_div, body.firstChild );

Like insertBefore( ), replaceChild( ) takes two arguments: the first is the node you want to insert, and the second is the existing node it will replace.

27.4.3.6. removeChild( )

Because we're on the topic of node manipulation, we should take a look at the removeChild( ) method as well.

Using removeChild( ), we can remove a single node or an entire node tree from the document. This method takes a single argument: the node you want to remove. Suppose we wanted to remove the text node from our div. We could accomplish that easily using removeChild( ):

    new_div.removeChild( new_div.firstChild );

We could even use removeChild( ) to delete the entire body from the page, which would not be a good thing, but demonstrates its power:

 body.parentNode.removeChild( body );
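A more everyday use of removeChild( ) is emptying an element before refilling it. The following is a minimal sketch; removeAllChildren( ) is a hypothetical helper name.

```javascript
// Hypothetical helper: removes every child node of el, leaving el itself
// in place. Returns the number of nodes removed.
function removeAllChildren( el ){
  var count = 0;
  while( el.firstChild ){
    el.removeChild( el.firstChild );
    count++;
  }
  return count;
}

// Browser usage, assuming an element with id="content" exists:
// removeAllChildren( document.getElementById( 'content' ) );
```

Looping on firstChild avoids indexing into the live childNodes collection while it shrinks.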

27.4.3.7. cloneNode( )

One final method available to you when working with DOM nodes is cloneNode( ). Using this powerful method, you can replicate an individual node (by supplying the method with an argument of false) or the node and all of its descendant nodes (by supplying it with an argument of true). Here is an example of cloneNode( ) in use:

    var ul = document.createElement( 'ul' );
    var li = document.createElement( 'li' );
    li.className = 'check';
    for( var i=0; i < 5; i++ ){
      var new_li = li.cloneNode( true );
      new_li.appendChild( document.createTextNode( 'list item ' + ( i + 1 ) ) );
      ul.appendChild( new_li );
    }

The advantage may not be immediately apparent from this example, but it lies in performance: cloning a node is a much faster process than building a new one from scratch.