Section H. Text nodes | ppk on JavaScript. Modern, Accessible, Unobtrusive JavaScript Explained by Means of Eight Real-World Example Scripts, 2006

H. Text nodes

In general, text nodes are easy to work with. The W3C DOM defines a few methods for getting and changing texts, but the Core string methods and properties we discussed in 5F are more useful and versatile.

nodeValue

Many element nodes hold a text node:

<p id="test">I am a JavaScript hacker.</p>

Often you want to read or change the text in the text node. You generally do this through the nodeValue of the text node, which is usually the firstChild of an element node:

var x = document.getElementById('test');
alert(x.firstChild.nodeValue);
x.firstChild.nodeValue = 'I never hack text nodes.';

You access the correct element node, move to its first child (the text node), and then access its nodeValue. The alert shows the text 'I am a JavaScript hacker.' The third line assigns a new value to the text node, and of course this change is immediately visible in the browser.
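Because nodeValue is an ordinary string, the Core string methods can be applied to it directly. The following is a minimal sketch rather than code from the example scripts: replaceInTextNode() is a hypothetical helper, and fakeTextNode is a plain object standing in for x.firstChild so that the snippet also runs outside a browser.

```javascript
// Hypothetical helper: rewrite part of a text node's content with the
// Core string method replace(). Returns the node's new text.
function replaceInTextNode(textNode, search, replacement) {
    textNode.nodeValue = textNode.nodeValue.replace(search, replacement);
    return textNode.nodeValue;
}

// Stand-in for x.firstChild; in a browser you would pass the real text node.
var fakeTextNode = {nodeType: 3, nodeValue: 'I am a JavaScript hacker.'};
var result = replaceInTextNode(fakeTextNode, 'am', 'was once');
// result is now 'I was once a JavaScript hacker.'
```

With a string as the search argument, replace() changes only the first match, so a regular expression would be needed to replace every occurrence.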

Data Methods

See the W3C DOM compatibility tables at www.quirksmode.org for the W3C DOM data methods. In general, the Core string methods are better suited to working with text nodes.

The x.firstChild works only if the text node is actually the first child of the element. If that's not the case, accessing the text node is somewhat harder:

<p id="test"><br />I am a JavaScript hacker.</p>

Now x.firstChild is the <br /> element node, whose nodeValue is null. You have to access the text node as x.lastChild or x.childNodes[1].

Fortunately, this is a rare case; most common text containers like <p>, <li>, or <a> contain exactly one node: a text node.
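For the rare case where the text node is not the first child, one option is to search the children for it instead of hard-coding lastChild or childNodes[1]. This is only a sketch: findFirstTextNode() is a hypothetical helper, and fakeParagraph is a plain object standing in for the paragraph with the <br /> so that the snippet runs outside a browser.

```javascript
// Hypothetical helper: return the first text node (nodeType 3) among
// an element's children, or null if there is none.
function findFirstTextNode(element) {
    var children = element.childNodes;
    for (var i = 0; i < children.length; i++) {
        if (children[i].nodeType == 3) {
            return children[i];
        }
    }
    return null;
}

// Stand-in for the paragraph that starts with a <br />.
var fakeParagraph = {
    childNodes: [
        {nodeType: 1, nodeValue: null},                        // the <br />
        {nodeType: 3, nodeValue: 'I am a JavaScript hacker.'}  // the text
    ]
};
var textNode = findFirstTextNode(fakeParagraph);
if (textNode) {
    textNode.nodeValue = 'I never hack text nodes.';
}
```

In a browser you would call findFirstTextNode(document.getElementById('test')). Note that the helper returns the first text node of any kind, including one that holds only whitespace.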

Empty text nodes

Normal text nodes are easy to work with. Unfortunately, there are also empty text nodes. They are by far the most useless and annoying feature of the W3C DOM, but you'll encounter them in every HTML document you work with.

Consider this HTML snippet:

<body>
<h1>Hello world!</h1>
<p>I am a JavaScript hacker!</p>
</body>

How many child nodes does the <body> have? Two, right? The <h1> and the <p>?

Wrong.

The <body> has five child nodes. Two of them are element nodes, the other three are empty text nodes. There is text between the tags: a hard return between the <body> and the <h1>, between the </h1> and the <p>, and between the </p> and the </body>. Since spaces, hard returns, and tabs are text content, the W3C DOM creates text nodes to hold them.
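To see the count for yourself, you can tally an element's child nodes by nodeType. The sketch below uses a hypothetical helper, countChildNodes(), and a plain object (fakeBody) that mimics the <body> of the snippet as the non-Explorer browsers build it; in a browser you would pass document.body instead.

```javascript
// Hypothetical helper: tally an element's child nodes by node type.
function countChildNodes(parent) {
    var counts = {elements: 0, text: 0, other: 0};
    for (var i = 0; i < parent.childNodes.length; i++) {
        var type = parent.childNodes[i].nodeType;
        if (type == 1) counts.elements++;
        else if (type == 3) counts.text++;
        else counts.other++;
    }
    return counts;
}

// Stand-in for the <body> of the snippet above.
var fakeBody = {
    childNodes: [
        {nodeType: 3, nodeValue: '\n'},  // between <body> and <h1>
        {nodeType: 1, nodeName: 'H1'},
        {nodeType: 3, nodeValue: '\n'},  // between </h1> and <p>
        {nodeType: 1, nodeName: 'P'},
        {nodeType: 3, nodeValue: '\n'}   // between </p> and </body>
    ]
};
var counts = countChildNodes(fakeBody);
// counts.elements is 2 and counts.text is 3
```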

No Empty Text Nodes in Explorer

Explorer Windows does not support empty text nodes. This is an excellent idea, but unfortunately all other browsers disagree, and thus incompatibilities are born.

Empty Text Nodes are not Empty

Empty text nodes are not really empty; they contain whitespace characters. Nonetheless, they are useless in an HTML document, since HTML interprets a sequence of whitespace characters as either a space or a hard return, whichever suits the document best.

As far as their practical usefulness goes, these text nodes might as well be empty.
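A script can recognize such a node by checking that it is a text node whose nodeValue contains nothing but whitespace. The sketch below rests on an assumption that is not stated in the text: "whitespace" is taken to mean space, tab, line feed, form feed, and carriage return. The name isEmptyTextNode() is hypothetical.

```javascript
// Hypothetical helper: true for a text node that holds only whitespace.
// Assumption: whitespace means space, tab, line feed, form feed,
// and carriage return.
function isEmptyTextNode(node) {
    return node.nodeType == 3 && !/[^ \t\n\f\r]/.test(node.nodeValue);
}

// Plain objects standing in for real nodes:
var gap = {nodeType: 3, nodeValue: '\n\t'};
var words = {nodeType: 3, nodeValue: ' Hello world! '};
var heading = {nodeType: 1, nodeValue: null};

var gapIsEmpty = isEmptyTextNode(gap);          // true
var wordsAreEmpty = isEmptyTextNode(words);     // false
var headingIsEmpty = isEmptyTextNode(heading);  // false: not a text node
```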

Figure 8.17. DOM tree with empty text nodes.

I purposely omitted empty text nodes from the DOM overview in 8A because they would have made my explanations too complicated and dense. In fact, they make working with the DOM complicated and dense, too.

For instance, take this script:

var x = document.getElementsByTagName('p')[0];
x.parentNode.insertBefore(x,x.previousSibling);

This seems simple, right? Take the paragraph and insert it before its previous sibling: the <h1>. It works fine in Explorer.

Unfortunately, in all the other browsers, the <p>'s previous sibling is not the <h1> but the empty text node between the </h1> and the <p>. The DOM tree changes, but not the way you'd like it to change.

Figure 8.18. The DOM tree has changed, but the change is invisible.

One way to remove these incompatibilities is to turn all empty text nodes off. This could be done by removing all whitespace from your HTML:

<body><h1>Hello world!</h1><p>I am a JavaScript hacker!</p></body>

Now the <body> really has only two child nodes. Nonetheless, working in HTML files without any whitespace becomes annoying in a hurry.

Living with empty text nodes

Therefore, effectively empty text nodes are here to stay, and we have to learn to live with them. Their first victims are the previousSibling and nextSibling properties, which are rather useless since they usually refer to empty text nodes. The same happens to childNodes[]. This nodeList, too, is riddled with empty text nodes and therefore rarely used.

Let's try to get the previousSibling in the last code example to work. We want to insert the <p> before the <h1>. We already saw that this works in Explorer, but not in the other browsers:

var x = document.getElementsByTagName('p')[0];
x.parentNode.insertBefore(x,x.previousSibling);

Similarly, this works in the other browsers but not in Explorer:

var x = document.getElementsByTagName('p')[0];
x.parentNode.insertBefore(x,x.previousSibling.previousSibling);

The following has a better chance to work across the board:

var x = document.getElementsByTagName('p')[0];
var previousElement = x.previousSibling;
while (previousElement.nodeType == 3)
    previousElement = previousElement.previousSibling;
x.parentNode.insertBefore(x,previousElement);

We take the element's previousSibling. If this turns out to be a text node, we take the previous sibling of the previousSibling, and we continue this check until we encounter a non-text node.

There's a hidden assumption here: the parent node of the <p> (the <body> here, but it might be another node in other situations) never contains real text nodes that actually contain text. If you're absolutely certain that this is the case, the last example will work. But as soon as any Web developer or CMS accidentally adds a real text node as a sibling of the <p>, the script will misfire.
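One way to make that assumption explicit is to skip only the text nodes that contain nothing but whitespace, and to stop at anything else. The sketch below is a variation on the loop, not the approach used in the example scripts. previousMeaningfulSibling() is a hypothetical helper, the whitespace test assumes space, tab, line feed, form feed, and carriage return, and the plain objects stand in for real nodes.

```javascript
// Hypothetical helper: walk backwards past whitespace-only text nodes.
// Returns the first sibling that is anything else (an element, a text
// node with real content, a comment), or null if there is none.
function previousMeaningfulSibling(node) {
    var candidate = node.previousSibling;
    while (candidate && candidate.nodeType == 3 &&
            !/[^ \t\n\f\r]/.test(candidate.nodeValue)) {
        candidate = candidate.previousSibling;
    }
    return candidate;
}

// Stand-ins for: <h1>, a stray real text node, an empty text node, <p>.
var h1 = {nodeType: 1, nodeName: 'H1', previousSibling: null};
var stray = {nodeType: 3, nodeValue: 'Stray text', previousSibling: h1};
var gap = {nodeType: 3, nodeValue: '\n', previousSibling: stray};
var p = {nodeType: 1, nodeName: 'P', previousSibling: gap};

var found = previousMeaningfulSibling(p);
var safeToMove = !!found && found.nodeType == 1;
// found is the stray text node, so safeToMove is false and the script
// can decline to move the <p> instead of misfiring.
```

The helper also returns null when there is no earlier sibling at all, a case in which the plain while loop would throw an error.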

By far the safest solution is not to use previousSibling at all:

var x = document.getElementsByTagName('p')[0];
var y = document.getElementsByTagName('h1')[0];
x.parentNode.insertBefore(x,y);

This makes sure that your script always uses the correct elements, regardless of empty text nodes.
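If you move elements like this in several places, the pattern can be wrapped in a small function. This is only a sketch: moveBefore() is a hypothetical name, and the guard against missing elements is an addition that the code above does not have.

```javascript
// Hypothetical helper: move node so that it sits directly before
// reference. Returns false, instead of throwing an error, when either
// element is missing or reference is not in a document tree.
function moveBefore(node, reference) {
    if (!node || !reference || !reference.parentNode) {
        return false;
    }
    reference.parentNode.insertBefore(node, reference);
    return true;
}

// In a browser:
// moveBefore(document.getElementsByTagName('p')[0],
//            document.getElementsByTagName('h1')[0]);
```

The sketch calls insertBefore() on the parent of the reference element rather than on x.parentNode, so it also works when the two elements do not share a parent.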

In fact, the best defense against empty text nodes is to not use previousSibling, nextSibling, and childNodes[]. In the eight example scripts, I use nextSibling only twice (and previousSibling and childNodes[] not at all). In both cases I use nextSibling when I add a new element to the document, and it doesn't really matter whether it's inserted before an empty text node or not. In cases where I have to change the order of elements that are already in the document, I always search for all relevant elements through getElementsByTagName().

Constructed DOM Trees

The only situation in which properties like nextSibling are useful is when you construct a document tree, or a portion of one, entirely in JavaScript. Then it contains no empty text nodes, and all properties work as you expect them to work.
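As an illustration, here is a sketch that builds a small list entirely through DOM methods. buildList() is a hypothetical helper, and taking the document as a parameter (doc) is an assumption made only so that the function can be tried with a stand-in object; in a browser you would pass document.

```javascript
// Hypothetical helper: build a <ul> with one <li> per item, using only
// createElement(), createTextNode(), and appendChild().
function buildList(doc, items) {
    var ul = doc.createElement('ul');
    for (var i = 0; i < items.length; i++) {
        var li = doc.createElement('li');
        li.appendChild(doc.createTextNode(items[i]));
        ul.appendChild(li);
    }
    return ul;
}

// In a browser:
// var list = buildList(document, ['one', 'two', 'three']);
// list.firstChild is the first <li>, and list.firstChild.nextSibling is
// the second <li>, because no whitespace was ever parsed into the tree.
```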