Children versus Child Elements versus Content | Effective XML: 50 Specific Ways to Improve Your XML

An element's content is everything between the element's start-tag and its end-tag. For example, consider this DocBook para element.

 <para>
   As far as we know, the Fibonacci series was first discovered
   by Leonardo of Pisa around 1200 C.E. Leonardo was trying to
   answer the question, <!-- Scritti di Leonardo Pisano. Rome:
   Baldassarre, 1857. Volume I, pages 283 - 284. Fibonacci,
   Leonardo. --> <quote lang="la"><foreignphrase>Quot paria
   coniculorum in uno anno ex uno pario germinatur?</foreignphrase></quote>,
   or, in English, <quote>How many pairs of
   rabbits are born in one year from one pair?</quote> To solve
   Leonardo&rsquo;s problem, first estimate that rabbits have
   a one month gestation period, and can first mate at the age
   of one month, so that each doe has its first litter at two
   months. Then make the simplifying assumption that each litter
   consists of exactly one male and one female.
 </para>

The content of this para element consists of some text (including white space), a comment, some more text, a quote child element, some more plain text, another quote child element, some more plain text, the &rsquo; entity reference, and finally some more text. All of that together, including all the content of child elements such as quote, is the para element's content.

The para element has two child elements, both named quote. However, these are not the only children of the element. This element also contains a comment, lots of character data, and an entity reference. These are considered to be children of the para element as well, although different APIs and systems vary in exactly how they represent these and how many text children there are. At one extreme, each separate character can be a separate child. At the other extreme, each text node contains the maximum contiguous run of text after all entity references are resolved, so the para element has exactly four text node children: one before the comment, one between the comment and the first quote element, one between the two quote elements, and one after the second quote element.

On the flip side, the foreignphrase element and other content inside the quote elements are not children of the para element, although they are descendants of it.
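The distinction between children, child elements, and descendants can be checked with a DOM parser. The following is a minimal sketch using Python's standard xml.dom.minidom on an abbreviated version of the para element. The DOCTYPE that declares the rsquo entity is an assumption added here so that the snippet parses on its own, and the counts reflect how this particular parser reports text nodes; other APIs may split the text differently.

```python
from xml.dom import Node, minidom

# Abbreviated para element. The internal DTD subset declaring rsquo is an
# assumption made for this sketch; the original listing relies on DocBook.
XML = (
    '<!DOCTYPE para [<!ENTITY rsquo "&#8217;">]>'
    '<para>Leonardo was trying to answer the question, '
    '<!-- Scritti di Leonardo Pisano --> '
    '<quote lang="la"><foreignphrase>Quot paria coniculorum in uno anno '
    'ex uno pario germinatur?</foreignphrase></quote>, or, in English, '
    '<quote>How many pairs of rabbits are born in one year from one pair?'
    '</quote> To solve Leonardo&rsquo;s problem, first estimate.</para>'
)

para = minidom.parseString(XML).documentElement
para.normalize()  # merge any adjacent text nodes into maximal runs

children = para.childNodes
child_elements = [n for n in children if n.nodeType == Node.ELEMENT_NODE]
text_children = [n for n in children if n.nodeType == Node.TEXT_NODE]
comment_children = [n for n in children if n.nodeType == Node.COMMENT_NODE]

# getElementsByTagName("*") returns every descendant element, not just children.
descendant_elements = para.getElementsByTagName("*")

print(len(children))             # 7
print(len(child_elements))       # 2
print(len(text_children))        # 4
print(len(comment_children))     # 1
print(len(descendant_elements))  # 3
```

The foreignphrase element shows up among the three descendant elements but not among the two child elements, and the resolved entity reference is folded into the last text node rather than appearing as a child of its own.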

The common reason for confusing children with child elements is forgetting about the very real possibility of mixed content. However, even when a document has a more record-like structure, the difference between children and child elements can be important. For example, consider the following presentation element.

 <presentation>
   <title>DOM</title>
   <date>Thursday, November 21, 2002</date>
   <host>Software Development 2002 East</host>
   <copyright>2000-2002 Elliotte Rusty Harold</copyright>
   <last_modified>November 26, 2002</last_modified>
   <author_name>Elliotte Rusty Harold</author_name>
   <author_url>http://www.elharo.com/</author_url>
   <author_email>elharo@metalab.unc.edu</author_email>
   <abstract>Elliotte Rusty Harold's DOM tutorial</abstract>
 </presentation>

It may look like this element has only child elements. However, if you're counting child nodes, you have to count the white space too. There are at least ten text node children containing only white space: one before each of the nine child elements and one after the last. Furthermore, what about the title, date, host, and similar elements? Each of them has a child node containing character data but no child elements. Bottom line: Elements are not the only kind of children.
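To make the white space visible, here is a short sketch that parses the presentation element with Python's standard xml.dom.minidom and counts its child nodes. It assumes each child element sits on its own indented line; the variable names are illustrative only, and a different parser or API could report the white space in more pieces.

```python
from xml.dom import Node, minidom

XML = """<presentation>
  <title>DOM</title>
  <date>Thursday, November 21, 2002</date>
  <host>Software Development 2002 East</host>
  <copyright>2000-2002 Elliotte Rusty Harold</copyright>
  <last_modified>November 26, 2002</last_modified>
  <author_name>Elliotte Rusty Harold</author_name>
  <author_url>http://www.elharo.com/</author_url>
  <author_email>elharo@metalab.unc.edu</author_email>
  <abstract>Elliotte Rusty Harold's DOM tutorial</abstract>
</presentation>"""

presentation = minidom.parseString(XML).documentElement
children = presentation.childNodes

# Separate the element children from the white-space-only text children.
elements = [n for n in children if n.nodeType == Node.ELEMENT_NODE]
whitespace_only = [
    n for n in children
    if n.nodeType == Node.TEXT_NODE and not n.data.strip()
]

print(len(children))         # 19
print(len(elements))         # 9
print(len(whitespace_only))  # 10

# A leaf element such as title has a text child but no child elements.
title = elements[0]
title_child_elements = [
    n for n in title.childNodes if n.nodeType == Node.ELEMENT_NODE
]
print(len(title.childNodes), len(title_child_elements))  # 1 0
```

Code that walks childNodes and expects only elements will trip over the ten white-space nodes, which is why filtering on node type, as above, is a common precaution.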