Quotes aren't required for attributes; for example, .
Tags (elements) can't overlap; for example, Galt . |
Tags can overlap; for example, Galt . |
An empty element must be specifically denoted; for example,
. There can be only one root element in an XML document.
|
A single tag is considered an empty element; for example,
.
|
An XML document has three basic parts: the prolog, the content (body), and the epilog.
XML Prolog
The prolog of an XML document is optional and comprises three parts:
- XML declaration
- Comments
- Document type declaration (which references or contains a DTD)
XML Declaration
The XML declaration is used to specify global information about the current XML document. Although not strictly required, it's considered good form to include one, and when it is present it must be the very first thing in the document. An XML declaration specifies that the document is an XML document in much the same way that the <html> tag specifies that a document is an HTML document. The following is an example of an XML declaration:
As you can see in the preceding example, the declaration specifies additional information about the XML document. The first attribute, version, specifies the version of XML that the document uses. Version 1.0 is by far the most widely used; XML 1.1 exists but is rarely needed.
The second attribute used in the XML declaration, encoding, specifies the type of character encoding used in the document. As mentioned earlier, XML is fully Unicode compliant, and the UTF-8 designation means that the document is stored in UTF-8, one of the standard Unicode encodings.
The final attribute, standalone, indicates whether the document is self-contained (yes) or depends on markup declarations held in an external document such as an external DTD (no).
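As a rough illustration, the declaration described above probably resembles the one embedded in the following Python sketch. The standalone value of "yes" is an assumption, because the text does not say which value the original example used, and the helper name read_declaration is hypothetical. The sketch uses the standard library's xml.parsers.expat module to report the three attributes as the parser sees them.

```python
import xml.parsers.expat

# Assumed example declaration; standalone="yes" is an illustrative choice.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?><employees/>'


def read_declaration(xml_bytes):
    """Return the version, encoding, and standalone values of the XML declaration."""
    info = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no), or -1 (clause omitted).
        info["version"] = version
        info["encoding"] = encoding
        info["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return info


declaration = read_declaration(SAMPLE)
print(declaration)
```

If the standalone clause is left out of the declaration, this sketch reports it as None rather than guessing a default.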
Comments
After the XML declaration has been added to the prolog, you can add comments and processing instructions to the XML document, if necessary.
Comments enable you to add descriptive information to the XML that can help you and other developers understand and document the file. XML comments use the same format as HTML comments: they begin with <!-- and end with -->.
Processing instructions enable you to supply special instructions to the XML parser. The most common usage of processing instructions is to link your XML file to a style sheet so that it can control the presentation of the XML. Extensible Stylesheet Language (XSL) is covered later in this chapter in the section titled "XSL."
Processing instructions use the form <?target instruction?>
where target is the object that the instructions are directed toward (normally the XML parser) and instruction is the instruction (or instructions) to send. The following code snippet illustrates the processing instruction required to link a style sheet to your XML file:
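To make the comment and processing-instruction syntax concrete, here is a minimal Python sketch using the standard library's xml.dom.minidom. The sample document, the comment text, and the style sheet name employees.xsl are all hypothetical; xml-stylesheet is the target conventionally used to associate a style sheet with an XML document.

```python
from xml.dom import Node, minidom

# Hypothetical document: the comment text and the employees.xsl file name
# are placeholders chosen for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical employee list -->
<?xml-stylesheet type="text/xsl" href="employees.xsl"?>
<employees><employee>John Galt</employee></employees>"""

doc = minidom.parseString(SAMPLE)

# Walk the nodes that sit at document level, before the root element.
prolog = []
for node in doc.childNodes:
    if node.nodeType == Node.COMMENT_NODE:
        prolog.append(("comment", node.data.strip()))
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        # target is the name after "<?"; data is everything up to "?>".
        prolog.append(("pi", node.target, node.data))

for entry in prolog:
    print(entry)
```

Note that the parser hands the instruction text to the application as an opaque string; it is up to the consuming program, such as a browser, to interpret the type and href values.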
Document Type Declaration
A document type declaration references or contains a document type definition (DTD), which defines rules that the structure of the XML document must abide by. For example, a DTD can be used to define whether a particular element is required, which child elements go inside which parent elements, and the type of data for an element.
There are two types of DTD: internal and external. An internal DTD is included in the XML file itself, whereas an external DTD is a separate file, usually named with the .dtd extension.
The following code snippet illustrates how you can include a reference to an external DTD in the prolog of your XML document:
Although a DTD isn't required, it's normally a good idea to use one so that the rules your XML document must follow are published and can be shared with other developers who need to interface with your XML data. It can be especially helpful to use a DTD when exchanging information with another organization so that they can test the validity of their XML against the DTD. Validity is covered later in this chapter in the section titled "Validity."
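The following Python sketch shows what a reference to an external DTD can look like and how a parser exposes it. The root element name employees and the file name employees.dtd are hypothetical. Python's standard library parsers are non-validating, so this sketch only reads the declaration; it does not open the DTD file or check the document against it.

```python
from xml.dom import minidom

# Hypothetical document; employees.dtd is a placeholder and is never opened.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE employees SYSTEM "employees.dtd">
<employees><employee>John Galt</employee></employees>"""

doc = minidom.parseString(SAMPLE)

# The document type declaration names the root element and the DTD location.
root_name = doc.doctype.name
dtd_location = doc.doctype.systemId

print(root_name, dtd_location)
```

The name given in the declaration must match the root element of the document, which is why both read employees here.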
XML Content
The body of an XML document is where the actual data is stored, and you'll find that it looks much like an HTML document. The body of an XML document is composed of elements and attributes.
An element, often loosely called a tag, describes a unit of data. There are a few things you need to know about elements. First, each element consists of three parts: an opening tag, some data, and a closing tag. For example:
John Galt
If you're familiar with HTML, this should look familiar: the opening tag comes first, John Galt is the data, and the closing tag comes last.
You also need to note that to create a well-formed XML document, you must have a single root element that contains all the other elements. For example:
John Galt
Hank Rearden
In the preceding example, the outermost element, which contains both of the other elements, is the root element.
NOTE
You can denote an empty element (that is, one that has no data) by using a shorthand version of the element in the format <element/>.
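The points above about opening and closing tags, the root element, and the empty-element shorthand can be sketched in Python with the standard library's xml.etree.ElementTree. The element names employees, employee, and office are hypothetical stand-ins, since the names used in the original examples are not known.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the data values come from the text.
SAMPLE = """
<employees>
  <employee>John Galt</employee>
  <employee>Hank Rearden</employee>
  <office/>
</employees>
"""

root = ET.fromstring(SAMPLE.strip())

# The root element contains every other element in the document.
root_tag = root.tag

# Each employee element is an opening tag, some data, and a closing tag.
names = [employee.text for employee in root.findall("employee")]

# The shorthand <office/> is an empty element: it carries no data.
office_text = root.find("office").text

print(root_tag, names, office_text)
```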
Much as they do in HTML, attributes enable you to specify additional information about an element. Using attributes in your XML is fairly simple. You add them as name/value pairs in the opening tag of an element in the format name="value". For example:
John Galt
In the preceding example, ssn is the name of the attribute, whereas "111-22-3333" is the value of the attribute. It's important to note that you must follow the format shown here. Unlike HTML, either single or double quotes must enclose the value. The following snippet illustrates incorrect attribute syntax:
John Galt
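As a minimal sketch of the quoting rule, the following Python code parses an element with a double-quoted attribute, the same element with single quotes, and a version with the quotes left off. The element name employee and the helper name parse_ssn are hypothetical, and the unquoted form is an assumption about what the incorrect snippet looked like; only the ssn attribute and its value come from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; the ssn attribute and value come from the text.
DOUBLE_QUOTED = '<employee ssn="111-22-3333">John Galt</employee>'
SINGLE_QUOTED = "<employee ssn='111-22-3333'>John Galt</employee>"
UNQUOTED = "<employee ssn=111-22-3333>John Galt</employee>"  # not well-formed


def parse_ssn(xml_text):
    """Return the ssn attribute value, or None if the XML cannot be parsed."""
    try:
        return ET.fromstring(xml_text).get("ssn")
    except ET.ParseError:
        return None


results = {
    "double": parse_ssn(DOUBLE_QUOTED),
    "single": parse_ssn(SINGLE_QUOTED),
    "unquoted": parse_ssn(UNQUOTED),
}
print(results)
```

Both quoting styles yield the same attribute value, whereas the unquoted form is rejected by the parser outright.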
XML Epilog
The epilog of an XML document is used for any additional comments or processing instructions that must be included after the closing root element.
Well-Formedness
As mentioned earlier, XML documents must follow very strict rules governing their syntax. Documents that follow these rules are said to be well-formed, meaning that an XML parser can read them.
To be well-formed, an XML document must meet the following requirements:
Table 21.2. Common XML Entities
Character | Entity
> | &gt;
< | &lt;
" | &quot;
' | &apos;
& | &amp;
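In practice you rarely type these entities by hand; most XML libraries can apply them for you. The following Python sketch uses escape and unescape from the standard library's xml.sax.saxutils. Those functions handle &, <, and > on their own, so the two quote characters are supplied through an extra mapping. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, <, and > by default; quotes need an extra mapping.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}

raw = 'Tom & "Jerry" <friends>'  # made-up sample text

# Replace each special character with its entity from Table 21.2.
escaped = escape(raw, QUOTE_ENTITIES)

# Reverse the substitution to recover the original text.
restored = unescape(escaped, QUOTE_CHARACTERS)

print(escaped)
print(restored)
```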
An XML document that meets all these rules is said to be well-formed, meaning that it should be easily readable by an XML parser. Documents that aren't well-formed create fatal errors that must be corrected before an XML parser can properly read the document.
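A quick way to see these fatal errors is to hand a parser a few broken documents. The sketch below, which uses Python's xml.etree.ElementTree, is only an illustration: the helper name is_well_formed and the sample fragments are hypothetical, and the cases shown cover rules mentioned in this chapter rather than the full list of requirements.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, each paired with a short description.
cases = {
    "properly nested": "<name><first>John</first><last>Galt</last></name>",
    "overlapping tags": "<b><i>Galt</b></i>",
    "two root elements": "<employee/><employee/>",
    "unquoted attribute": "<employee ssn=111-22-3333/>",
    "unescaped ampersand": "<company>Taggart & Sons</company>",
    "escaped ampersand": "<company>Taggart &amp; Sons</company>",
}

report = {label: is_well_formed(text) for label, text in cases.items()}
for label, ok in report.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```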
Validity
Every XML document must be well-formed to be useful, but XML documents that use a DTD can be tested for validity, which is to say that the XML complies with the rules defined in the DTD.
It's important to remember that a well-formed XML document isn't necessarily valid, but a valid XML document is necessarily well-formed.
XSL
As you've learned, XML specifies the structure of data, but provides no control of the presentation of it, which is where XSL (Extensible Stylesheet Language) comes into play.
XSL enables you to define formatting rules that control the presentation of your XML documents. By using an XSLT (XSL Transformations) processor, you can convert your XML to HTML, making it possible to display your XML documents in a user-friendly, aesthetically pleasing format in virtually any medium.
XSL goes way beyond the scope of this chapter. For more information about this important topic, please visit http://www.w3.org/TR/xsl/.
XML Parsers
An XML parser is a tool that enables you to programmatically access and process the contents of an XML document. There are two types of parsers: non-validating and validating parsers. As you might expect from its name, a non-validating parser reads the XML, but doesn't check whether the XML is valid based on a DTD.
On the other hand, a validating parser uses the DTD associated with a document to confirm that the XML follows the rules defined in the DTD. Microsoft Internet Explorer 6.x contains a validating parser. If you attempt to view an invalid XML file that specifies a DTD, you'll receive an error message specifying the error.
As a developer, you'll find the XML parser very important because it not only checks your XML document for well-formedness and validity, but also provides a way for you to programmatically manipulate XML documents using application programming interfaces (APIs). There are currently two popular APIs: Document Object Model (DOM) and Simple API for XML (SAX). Each API has its benefits.
DOM
The Document Object Model API is designed as a platform-neutral interface that enables you to manipulate the content, structure, and style of XML documents using a tree structure. The root element of the document becomes the root node of the tree, and each XML element in the document becomes a node in the tree based on its hierarchy in the XML document.
When using DOM, the entire XML document is loaded and the tree structure is built in memory, making it quick and easy to navigate through the hierarchy of an XML document.
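The tree-based approach can be sketched with Python's xml.dom.minidom, a lightweight DOM implementation in the standard library. This is not the Domino API covered later in the chapter; the element names and the added name Jane Doe are hypothetical and serve only to show navigation and modification of the in-memory tree.

```python
from xml.dom import minidom

# Hypothetical document; element names are placeholders.
SAMPLE = (
    b"<employees>"
    b'<employee ssn="111-22-3333">John Galt</employee>'
    b"<employee>Hank Rearden</employee>"
    b"</employees>"
)

# The whole document is parsed and held in memory as a tree of nodes.
doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Navigate the tree: each employee element is a child node of the root.
employees = root.getElementsByTagName("employee")
names = [node.firstChild.data for node in employees]
first_ssn = employees[0].getAttribute("ssn")

# Modify the tree: build a new element node and attach it to the root.
new_employee = doc.createElement("employee")
new_employee.appendChild(doc.createTextNode("Jane Doe"))  # hypothetical name
root.appendChild(new_employee)

employee_count = len(root.getElementsByTagName("employee"))
print(names, first_ssn, employee_count)
```

Because the entire tree lives in memory, any node can be reached or changed at any time, at the cost of memory use that grows with the size of the document.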
DOM, as it applies specifically to Domino, is covered in somewhat more depth later in this chapter. For more specific information about DOM, please see http://www.w3.org/DOM/.
SAX
The Simple API for XML (SAX) was developed to provide a simple, lightweight API for handling XML documents. Although DOM is currently the W3C recommendation, SAX is rapidly becoming the de facto standard for server-to-server XML processing.
The fundamental difference between DOM and SAX is that SAX uses an event-driven model rather than a tree-based model. As an XML document is processed, events are generated for each element and passed to an event handler for processing.
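The event-driven model can be sketched with Python's xml.sax module. Again, this is not the Domino API discussed later; the class name EventRecorder and the element names are hypothetical. The handler simply records each event in the order the parser reports it.

```python
import xml.sax

# Hypothetical document; element names are placeholders.
SAMPLE = (
    b"<employees>"
    b"<employee>John Galt</employee>"
    b"<employee>Hank Rearden</employee>"
    b"</employees>"
)


class EventRecorder(xml.sax.ContentHandler):
    """Record start, text, and end events as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        self._text = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect it until the end tag.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if text:
            self.events.append(("text", text))
        self.events.append(("end", name))
        self._text = []


recorder = EventRecorder()
xml.sax.parseString(SAMPLE, recorder)

for event in recorder.events:
    print(event)
```

No tree is ever built: once an event has been handled, the parser moves on, which keeps memory use low but means the handler must keep track of any context it needs.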
SAX, as it applies specifically to Domino, is covered in more depth later in this chapter. For more specific information about SAX, please see http://www.saxproject.org/.