Text versus Character Data versus Markup | Effective XML: 50 Specific Ways to Improve Your XML

XML documents are composed of text. You'll never find anything in an XML document that is not text. This text is divided into two nonintersecting sets: character data and markup. Markup consists of all the tags, comments, processing instructions, entity references, character references, CDATA section delimiters, XML declarations, text declarations, document type declarations, and white space outside the root element. Everything else is character data. For example, here's a DocBook para element that mixes markup with character data.

  <para>As far as we know, the Fibonacci series was first discovered
  by Leonardo of Pisa around 1200 C.E. Leonardo was trying to
  answer the question, <!-- Scritti di Leonardo Piasano. Rome:
  Baldassarre, 1857. Volume I, pages 283 - 284. Fibonacci,
  Leonardo. --> <quote lang="la"><foreignphrase>Quot paria
  coniculorum in uno anno ex uno pario germinatur?</foreignphrase></quote>,
  or, in English, <quote>How many pairs of
  rabbits are born in one year from one pair?</quote> To solve
  Leonardo&rsquo;s problem, first estimate that rabbits have
  a one month gestation period, and can first mate at the age
  of one month, so that each doe has its first litter at two
  months. Then make the simplifying assumption that each litter
  consists of exactly one male and one female.</para>

The markup includes the <para> and </para> tags, the <quote> and </quote> tags, the <foreignphrase> and </foreignphrase> tags, the comment, and the &rsquo; entity reference. Everything else is character data.
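The division can be sketched in a few lines of Python. This is only a rough illustration, not a real XML tokenizer: the regular expression and the names MARKUP and split_markup are assumptions introduced here, and the pattern ignores hard cases such as a > inside an attribute value, tag-like text inside a CDATA section, or a document type declaration with an internal subset.

```python
import re

# Hypothetical, simplified pattern for the markup constructs listed above.
# Order matters: comments, PIs, and CDATA delimiters are tried before tags.
MARKUP = re.compile(
    r"<!--.*?-->"            # comments
    r"|<\?.*?\?>"            # processing instructions
    r"|<!\[CDATA\[|\]\]>"    # CDATA section delimiters
    r"|</?[^<>]+>"           # start-tags, end-tags, empty-element tags
    r"|&#?[\w.:-]+;",        # entity references and character references
    re.DOTALL,
)

def split_markup(xml_fragment):
    """Return (list of markup tokens, remaining character data)."""
    markup = MARKUP.findall(xml_fragment)
    character_data = MARKUP.sub("", xml_fragment)
    return markup, character_data

fragment = (
    '<para>Leonardo&rsquo;s problem, <!-- source: 1857 -->'
    '<quote lang="la">Quot paria?</quote></para>'
)
markup, character_data = split_markup(fragment)
print(markup)
print(repr(character_data))
```

Note that the entity reference lands in the markup list, so the character data here reads "Leonardos problem". Nothing has been substituted for the reference yet, because that is the parser's job.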

Sometimes the "everything else" part is called parsed character data or PCDATA after the PCDATA keyword used in DTDs to declare elements like interfacename.

 <!ELEMENT interfacename (#PCDATA)>

However, that's not perfectly accurate. Generally speaking, the parsed character data is what's left after the parser has replaced entity and character references by the characters they represent. It contains both character data and markup.
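To see what a parser actually reports, here is a minimal sketch using Python's standard xml.parsers.expat module. The sample document, its internal DTD subset declaring rsquo, and the function name parsed_character_data are assumptions made for illustration. DocBook's own DTD is not loaded here.

```python
import xml.parsers.expat

# Hypothetical sample: rsquo is declared inline so the example is
# self-contained; a real DocBook document would get it from the DTD.
DOC = (
    '<!DOCTYPE para [<!ENTITY rsquo "&#x2019;">]>'
    '<para>Leonardo&rsquo;s problem &#38; its answer</para>'
)

def parsed_character_data(xml_text):
    """Collect the text the parser reports as element content."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver the text in several pieces, so gather and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)

print(parsed_character_data(DOC))
```

The string returned contains a right single quotation mark and an ampersand where the source document had only references. That is the sense in which parsed character data is what remains after the parser has replaced entity and character references.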