XML provides a very general method for expressing data structure and content. To maintain portability, it consists solely of a stream of text that avoids any assumptions about the systems exchanging information. It is so general and so adaptable that the first thing an XML document may declare, in its opening XML declaration, is its character encoding: which characters it uses and how they are represented as bytes.
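The sketch below shows the encoding declaration at work. It uses Python's standard `xml.etree.ElementTree` as a stand-in parser; this is an assumption for illustration only, not Flash's parser. The same author name is written out in two encodings. The two byte streams differ, but each declaration tells the parser how to turn its bytes back into characters.

```python
import xml.etree.ElementTree as ET

TEXT = "Émile Zola"

def encode_doc(encoding: str) -> bytes:
    """Build a tiny XML document whose declaration names its own encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><author>{TEXT}</author>'
    return doc.encode(encoding)

latin1_bytes = encode_doc("ISO-8859-1")
utf8_bytes = encode_doc("UTF-8")

# The byte streams differ: 'É' is one byte in Latin-1 but two bytes in UTF-8.
print(latin1_bytes != utf8_bytes)        # True

# The parser reads each declaration and decodes accordingly,
# so both streams yield the same characters.
print(ET.fromstring(latin1_bytes).text)  # Émile Zola
print(ET.fromstring(utf8_bytes).text)    # Émile Zola
```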
XML itself can structure data, but it says nothing about the meaning of the data. XML merely provides a notation in which the meaning can be unambiguously expressed. XML cares solely about syntax; the semantics are up to us.
However, XML provides a technique by which protocols about data elements can be shared. An optional piece of information within an XML file is a reference to an external data definition. Industry associations, academic conferences, and less formal groups can shape XML vocabularies around their needs and store the shape in such an external file. Schoolteachers and astronomers and chemical salesmen and librarians are all forming agreements on ways to express the information they care about. Your own application might have special data requirements, but by using the referencing system that XML provides, you will be able to clearly expose this data to man and machine.
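The reference to an external data definition is just another piece of text in the file. In the minimal sketch below, Python's `xml.dom.minidom` reads it as a stand-in parser. The file name `book.dtd` is hypothetical. The parser records the reference without fetching or applying the file.

```python
from xml.dom import minidom

# Hypothetical document; "book.dtd" is an illustrative name, not a published vocabulary.
SOURCE = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE book SYSTEM "book.dtd">'
    "<book><title>The Hound of the Baskervilles</title></book>"
)

doc = minidom.parseString(SOURCE)

# The DOCTYPE line names the root element and points at the external definition.
print(doc.doctype.name)      # book
print(doc.doctype.systemId)  # book.dtd
```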
An XML document is a tree of elements. A single element is at the root and encapsulates the entire document. Within the element are attributes that describe it, its plain text content, and its child elements. Each of these is optional: an element could contain nothing at all and simply exist as a named presence. Each child element has exactly the same options, and so on down to the child elements of the child elements.
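The tree structure just described can be walked recursively, because every child offers the same options as its parent. Here is a minimal sketch that uses Python's `xml.etree.ElementTree` as a stand-in parser. The sample document and its element names, such as `illustrated`, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book record: one root, an attribute, text content, and an empty child.
SOURCE = """<book id="hound">
  <title>The Hound of the Baskervilles</title>
  <author>Sir Arthur Conan Doyle</author>
  <illustrated></illustrated>
</book>"""

def outline(elem, depth=0):
    """Return one line per element: its name, its attributes, and any text content."""
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag
    if elem.attrib:
        line += f" {elem.attrib}"
    if text:
        line += f": {text!r}"
    lines = [line]
    for child in elem:  # each child element has exactly the same options
        lines.extend(outline(child, depth + 1))
    return lines

print("\n".join(outline(ET.fromstring(SOURCE))))
```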
The actual look of XML is familiar to anyone who has used its cousin, HTML. Each element of data is blocked out by tags. A start tag <TAG> introduces the element and an end tag </TAG> marks its end. Between them lies the arbitrary content of the element:
<author>Sir Arthur Conan Doyle</author>
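The tag notation shown above can also be produced in code. In this minimal sketch, Python's `xml.etree.ElementTree` serves as a stand-in for any XML library. Serializing an element wraps its content in a start tag and a matching end tag, and parsing reverses the process.

```python
import xml.etree.ElementTree as ET

# Build the element in code: a name plus arbitrary content.
author = ET.Element("author")
author.text = "Sir Arthur Conan Doyle"

# Serializing wraps the content in a start tag and a matching end tag.
markup = ET.tostring(author, encoding="unicode")
print(markup)  # <author>Sir Arthur Conan Doyle</author>

# Parsing reverses the process: the tag names the element, and the content comes back as text.
parsed = ET.fromstring(markup)
print(parsed.tag, "|", parsed.text)  # author | Sir Arthur Conan Doyle
```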
Simplicity is its strength. But, as noted, XML by itself says nothing about the meaning of the data. For example, the XML line
<title>The Hound of the Baskervilles</title>
looks like this line of HTML:
<title>The Hound of the Baskervilles</title>
This is because the two lines are identical, thanks to the common SGML roots of XML and HTML. However, in HTML the words have a specific meaning. A title element has certain properties: it is unique to a page, it synopsizes the page, it is used by browsers and by search engines, and so on. In XML, a title is just another element. XML by itself knows no more about a title than it does about a tittle.
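The point about titles and tittles can be checked directly. This minimal sketch uses Python's `xml.etree.ElementTree` as a stand-in parser, and the `describe` helper is hypothetical. Apart from the element name itself, the parser reports the two elements identically, because it attaches no meaning to either name.

```python
import xml.etree.ElementTree as ET

def describe(markup):
    """Return (name, text, attribute count, child count) for a one-element document."""
    elem = ET.fromstring(markup)
    return elem.tag, elem.text, len(elem.attrib), len(elem)

title = describe("<title>The Hound of the Baskervilles</title>")
tittle = describe("<tittle>The Hound of the Baskervilles</tittle>")
print(title)   # ('title', 'The Hound of the Baskervilles', 0, 0)
print(tittle)  # ('tittle', 'The Hound of the Baskervilles', 0, 0)

# Everything except the name is the same: the parser has no special case for "title".
print(title[1:] == tittle[1:])  # True
```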
Users create vocabularies in XML. Vocabularies can be formalized and published both for human eyes and for XML processors, or they can be implicit in the file. The explicit, public vocabularies will be encountered in Chapter 6 on DTDs. Here, we assume that by some external form of agreement, the meaning of element tags is understood by the Flash application and by the source of the XML.
Obviously, without an external vocabulary, XML can provide no validation of the content of an element. When a vocabulary is formalized in a DTD, we can use that file to validate the integrity of our XML and its adherence to the published rules of the vocabulary. We should note that even with a formal DTD vocabulary, we cannot check the data type of element contents. All the character data between the start and end tags is just a string of text to the XML parser. It makes no attempt to interpret the text.
None of these are numbers:
<pagecount>233</pagecount>
<copyright>1901</copyright>
<isbn>0-7897-2242-9</isbn>
They are simply lines of text. In fact, an XML element can contain only text (and other elements).
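Because the parser returns only text, any typing has to happen in the application. This minimal sketch uses Python's `xml.etree.ElementTree` as a stand-in parser. The conversion choices reflect a hypothetical agreement: `pagecount` and `copyright` are treated as integers, while the ISBN stays a string, since its hyphens and a possible `X` check digit are not numeric.

```python
import xml.etree.ElementTree as ET

SOURCE = """<book>
  <pagecount>233</pagecount>
  <copyright>1901</copyright>
  <isbn>0-7897-2242-9</isbn>
</book>"""

book = ET.fromstring(SOURCE)

# The parser hands back a string for every element's content.
raw = {child.tag: child.text for child in book}
print(raw)  # {'pagecount': '233', 'copyright': '1901', 'isbn': '0-7897-2242-9'}

# Hypothetical agreement: the application decides which strings are numbers.
pages = int(raw["pagecount"])
year = int(raw["copyright"])
isbn = raw["isbn"]  # kept as text on purpose
print(pages + 1, year, isbn)  # 234 1901 0-7897-2242-9
```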
WELL-FORMED VS. VALID XML
XML documents that obey the rules of XML syntax (details in the next chapter) are referred to as well-formed. A well-formed document that references an external DTD (or, coming soon, Schema) and conforms to the data design expressed within it can be called valid.
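Checking well-formedness needs only the document itself, with no DTD or Schema involved. The minimal sketch below uses Python's `xml.etree.ElementTree` as a stand-in parser, and the `is_well_formed` helper is hypothetical. It treats any parse error as "not well-formed".

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the text parses as XML; no DTD or Schema is consulted."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<title>The Hound of the Baskervilles</title>"))    # True
# Tag names are case-sensitive, so this end tag does not match its start tag.
print(is_well_formed("<title>The Hound of the Baskervilles</TITLE>"))    # False
# A document must have exactly one root element.
print(is_well_formed("<title>The Hound</title><author>Doyle</author>"))  # False
```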
A parser that reads both the XML file and its DTD (or Schema) and tests the former against the latter is a validating parser. Flash does not have a validating parser. This is not uncommon for a presentation-level system. After all, if the file is not valid by the time it is seen in a Flash-enabled browser, it is too late to fix. Validation is more commonly a step in the creation and maintenance of XML systems than in their display.
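A nonvalidating parser accepts any well-formed document, even one that breaks the rules of its own DTD. The minimal sketch below uses Python's `xml.etree.ElementTree`, which is also nonvalidating, as a stand-in for Flash's parser. To stay self-contained, it declares the rules in an internal DTD rather than an external file. The rules say a book holds exactly one title, yet the document supplies an author instead.

```python
import xml.etree.ElementTree as ET

# The DTD requires <book> to contain one <title>; the document ignores that rule.
SOURCE = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title)>
  <!ELEMENT title (#PCDATA)>
]>
<book><author>Sir Arthur Conan Doyle</author></book>"""

# No error is raised: well-formedness is all this parser checks.
book = ET.fromstring(SOURCE)
print([child.tag for child in book])  # ['author']
```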
Unfortunately, some conveniences, such as the expansion of symbols and the assignment of default values, cannot be performed without an external design, so a nonvalidating parser also omits these functions.
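The missing conveniences look like this in practice. This minimal sketch uses Python's `xml.etree.ElementTree` as a stand-in nonvalidating parser that never reads external DTD files. The file `book.dtd`, the entity `&publisher;`, the `lang` attribute, and its default of `"en"` are all hypothetical. Because the external definition is never read, the entity reference cannot be expanded, and any default attribute value has to be supplied by the application.

```python
import xml.etree.ElementTree as ET

def try_parse(markup):
    """Parse markup, returning None instead of raising if it cannot be parsed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None

# Hypothetical: book.dtd would declare &publisher;, but this parser never
# reads the external file, so the reference is an undefined entity here.
SOURCE = '<!DOCTYPE book SYSTEM "book.dtd"><book>&publisher;</book>'
print(try_parse(SOURCE) is None)  # True

# Likewise, a default value for "lang" declared in book.dtd is never applied,
# so the application falls back on its own hypothetical default.
book = try_parse('<!DOCTYPE book SYSTEM "book.dtd"><book></book>')
lang = book.get("lang", "en")
print(lang)  # en
```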