XML documents are composed of text. You'll never find anything in an XML document that is not text. This text is divided into two nonintersecting sets: character data and markup. Markup consists of all the tags, comments, processing instructions, entity references, character references, CDATA section delimiters, XML declarations, text declarations, document type declarations, and white space outside the root element. Everything else is character data. For example, here's the DocBook para element with the markup identified by boldface text and the character data in a plain font.
<para>
  As far as we know, the Fibonacci series was first discovered by Leonardo of Pisa around 1200 C.E. Leonardo was trying to answer the question,
  <!-- Scritti di Leonardo Pisano. Rome: Baldassarre, 1857. Volume I, pages 283 - 284. Fibonacci, Leonardo. -->
  <quote lang="la"><foreignphrase>Quot paria coniculorum in uno anno ex uno pario germinatur?</foreignphrase></quote>, or, in English,
  <quote>How many pairs of rabbits are born in one year from one pair?</quote>
  To solve Leonardo&rsquo;s problem, first estimate that rabbits have a one month gestation period, and can first mate at the age of one month, so that each doe has its first litter at two months. Then make the simplifying assumption that each litter consists of exactly one male and one female.
</para>
The markup includes the <para> and </para> tags, the <quote> and </quote> tags, the <foreignphrase> and </foreignphrase> tags, the comment, and the &rsquo; entity reference. Everything else is character data.
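The split between markup and character data can be made concrete with a rough sketch. The regular expression below is an assumption made for illustration only: it recognizes just the kinds of markup that appear in the para example (comments, tags, and entity or character references) and is not a substitute for a real XML parser. The names MARKUP and split_markup are hypothetical.

```python
import re

# Rough pattern for the markup kinds used in the para example:
# comments, start- and end-tags, and entity or character references.
# It ignores CDATA sections, processing instructions, and declarations,
# and it would be fooled by a '>' inside an attribute value.
MARKUP = re.compile(r"<!--.*?-->|<[^>]+>|&#?\w+;", re.DOTALL)

def split_markup(text):
    """Return (markup_pieces, character_data_pieces) for a snippet."""
    markup = MARKUP.findall(text)
    # re.split leaves empty strings between adjacent matches; drop them.
    chardata = [piece for piece in MARKUP.split(text) if piece]
    return markup, chardata

sample = (
    '<quote lang="la"><foreignphrase>Quot paria coniculorum'
    '</foreignphrase></quote> To solve Leonardo&rsquo;s problem'
)
markup, chardata = split_markup(sample)
print(markup)    # the tags and the &rsquo; reference
print(chardata)  # everything else
```

Note that the lang="la" attribute is reported as part of the <quote> start-tag, so it counts as markup rather than character data.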
Sometimes the "everything else" part is called parsed character data, or PCDATA, after the #PCDATA keyword used in DTDs to declare elements like interfacename:
<!ELEMENT interfacename (#PCDATA)>
However, that usage is not perfectly accurate. Generally speaking, parsed character data is what's left after the parser has replaced entity and character references with the characters they represent. In that sense it is derived from both character data and markup, since the references themselves are markup.
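A small sketch shows the replacement step in action. It uses Python's standard xml.etree.ElementTree module purely for illustration; the element content is invented for this example, and any conforming XML parser should report the same text.

```python
import xml.etree.ElementTree as ET

# Hypothetical content for an interfacename element. The source text
# contains two predefined entity references (&lt; and &gt;) and one
# character reference (&#38;), all of which are markup.
doc = "<interfacename>Comparable&lt;T&gt; &#38; Cloneable</interfacename>"

element = ET.fromstring(doc)

# By the time the application sees the text, the parser has already
# replaced each reference with the character it represents.
print(element.text)  # Comparable<T> & Cloneable
```

The string the application receives contains characters that came from literal character data alongside characters that came from references, with nothing left to tell them apart.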