The XML Document


XML is all of the following:

  • License free, platform independent, and well supported.

  • A World Wide Web Consortium Recommendation (XML 1.0, W3C Recommendation, February 1998). This makes it a global standard.

  • An easier solution for transferring legacy or other business-to-business (B2B) information. Proprietary platforms, data formats, schemas, business rules, and so on can now be linked through a common, universal thread and processed consistently by the receiver.

  • Used to describe the structure of information as well as the information itself. Think of it as a data description language.

  • A family or suite of technologies, as listed in Table 1.1.

These are the stages that a document goes through within the W3C:

  1. Working Draft. A Working Draft generally represents work in progress and a commitment by W3C to pursue work in a particular area. The label "Working Draft" does not imply consensus within W3C about the document.

  2. Candidate Recommendation. A Candidate Recommendation is a stable Working Draft that the Director has proposed to the community for implementation experience and feedback.

  3. Proposed Recommendation. A Proposed Recommendation is a Candidate Recommendation that has benefitted from implementation experience and that has been sent to the Advisory Committee for review.

  4. Recommendation. A Recommendation reflects consensus within W3C, as represented by the Director's approval. W3C considers that the technology or ideas specified by a Recommendation are appropriate for widespread deployment and the W3C promotes its adoption.

A graphical representation of this process is given in Figure 1.3. A definitive explanation of the steps is given by the W3C in its process document located at http://www.w3.org/Consortium/Process-20010208/tr.html.

Figure 1.3. The XML process.


Table 1.1. The W3C XML Technologies

| Name | URL | Status (Nov 2000) | Description |
|---|---|---|---|
| Extensible Markup Language 1.0 | http://www.w3c.org/TR/REC-xml | Recommendation | The base syntax of XML. |
| XML Information Set | http://www.w3c.org/TR/xml-infoset | Working Draft | Describes an abstract data set that contains the useful information available from an XML document. |
| Namespaces in XML | http://www.w3c.org/TR/REC-xml-names | Recommendation | An addendum to XML 1.0 that provides a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. |
| XML Base (XBase) | http://www.w3c.org/TR/xmlbase | Candidate Recommendation | Another addendum, similar to HTML BASE, for defining base URIs for parts of XML documents. |
| XML Inclusions (XInclude) | http://www.w3c.org/TR/xinclude | Working Draft | Specifies a processing model and syntax for merging a number of XML Infosets into a single composite Infoset. |
| Document Object Model (DOM) Level 2 | http://www.w3c.org/TR/DOM-level-2 | Recommendation | Programmatic interfaces that allow manipulation and creation of XML documents. |
| XML Schema Part 1: Structures | http://www.w3c.org/TR/xmlschema-1 | Candidate Recommendation | An XML-based language for describing XML elements and attributes as hierarchical types. |
| XML Schema Part 2: Datatypes | http://www.w3c.org/TR/xmlschema-2 | Candidate Recommendation | An XML-based language and set of textual data types for describing user-generated data types. |
| XML Path Language 1.0 (XPath) | http://www.w3c.org/TR/xpath | Recommendation | A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer. |
| XML Pointer Language (XPointer) | http://www.w3c.org/TR/xptr | Candidate Recommendation | A language that uses XPath expressions to identify URI fragments. |
| XML Linking Language (XLink) | http://www.w3c.org/TR/xlink | Candidate Recommendation | Allows elements to be inserted into XML documents to create and describe links between resources. |
| Extensible Stylesheet Language Transformations 1.0 (XSLT) | http://www.w3c.org/TR/xslt | Recommendation | A language that provides ways to transform one XML document into another or into a different form. |

The XML Process

Let's take a look at how the different components of the XML system listed in Table 1.1 fit together. I call this the XML process. Figure 1.4 shows all the parts of the process from Table 1.1 that we'll be concerned with and their relationship to one another.

Figure 1.4. The XML process.


Here is the flow of data in the process:

  1. Data from a data source is delivered to an XML engine. This can be accomplished via a recordset, cursor, or plain text file.

  2. A DTD document can be applied at this point to verify the correct structure of the document created.

  3. An XSLT document can be applied to modify the structure of the document to produce another, different XML document. This is usually done to produce documents for business-to-business data transfers with different data requirements. Also, HTML pages can be generated for end-user data viewing. A side benefit of this is that format and content are separated.

  4. The generated XML document can have a DTD applied at this point if one was not already applied.

  5. An XSLT document can be applied at this point to modify the structure of the document. This can take place regardless of whether or not one was applied earlier.

  6. After these last two steps, the transformed document reaches its final form.
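Step 1 of this flow can be sketched in Python with the standard library's xml.etree.ElementTree module. This is a minimal illustration, not the SQL Server mechanism: it assumes the recordset has already been fetched into a list of dictionaries, and the ROWS data and the rows_to_xml helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for a recordset or cursor from a data source.
ROWS = [
    {"PERSONID": "p1", "LAST": "Shelton", "FIRST": "Rick"},
    {"PERSONID": "p2", "LAST": "Tenney", "FIRST": "Corey"},
]

def rows_to_xml(rows):
    """Wrap each row in a PERSON element under a single RESUMES root."""
    root = ET.Element("RESUMES")
    for row in rows:
        # The key column becomes an attribute; the other columns become child elements.
        person = ET.SubElement(root, "PERSON", PERSONID=row["PERSONID"])
        name = ET.SubElement(person, "NAME")
        ET.SubElement(name, "LAST").text = row["LAST"]
        ET.SubElement(name, "FIRST").text = row["FIRST"]
    return ET.tostring(root, encoding="unicode")

print(rows_to_xml(ROWS))
```

The DTD and XSLT steps (2 through 5) are outside Python's standard library; third-party packages such as lxml provide both DTD validation and XSLT.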

We'll return to this diagram several times to illustrate where we are in the process.

Components of an XML Document

Figure 1.5 shows where we currently are in the XML process.

Figure 1.5. The document in the XML process.


Listing 1.1 is a sample XML document that I will use to illustrate and explain its various components.

Listing 1.1 Sample XML Document
 <?xml version="1.0" standalone="no"?>
 <!-- ******* Resumes for Potential Hires ******* -->
 <RESUMES xmlns='http://www.myorg.net/tags'>
   <PERSON PERSONID="p1">
     <NAME>
       <LAST>Shelton</LAST>
       <FIRST>Rick</FIRST>
     </NAME>
     <ADDRESS>
       <STREET>911 Intranet Ave.</STREET>
       <CITY>Canberra</CITY>
       <COUNTRY>Australia</COUNTRY>
       <PC>A34G-90</PC>
     </ADDRESS>
     <TEL>(+612) 111-2345</TEL>
     <EMAIL>shelton@somewhere.com</EMAIL>
   </PERSON>
   <PERSON PERSONID="p2">
     <NAME>
       <LAST>Tenney</LAST>
       <FIRST>Corey</FIRST>
     </NAME>
     <ADDRESS>
       <STREET>211 Yardwork Circle</STREET>
       <CITY>Roy, UT</CITY>
       <COUNTRY>USA</COUNTRY>
       <ZIP>64067</ZIP>
     </ADDRESS>
     <TEL/>
     <EMAIL>tenney@yardwork.com</EMAIL>
   </PERSON>
 </RESUMES>

XML Declaration

Although it isn't required, an XML document can begin with a statement of the form <?xml ...?>. This is the XML declaration, which specifies that the data that follows is an XML document; if it is present, it must be the very first thing in the document. It can carry three attributes: version, encoding, and standalone. version is required, and the others are optional. Why are these important? If the version of XML ever changes (changes never happen, right?), there must be a mechanism in place for developers to determine what version they are working with. XSLT 1.0, for example, already has a version 1.1 in working-draft form. The encoding and standalone attributes are important from a Unicode (character encoding) and DTD perspective, respectively. In the sample document, the following declaration

 <?xml version="1.0" standalone="no"?> 

gives us the additional information that this XML document conforms to version 1.0 of the XML standard and that it may depend on markup declarations stored outside the document, such as an external DTD. (This document cannot stand alone.) Of course, if all of this document's DTD declarations were defined within it, the standalone attribute could be "yes".
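The effect of the declaration can be seen with Python's xml.etree.ElementTree parser. In this sketch (the parse_or_none helper and the sample CITY documents are hypothetical), the encoding attribute tells the parser how to decode the raw bytes, and a declaration that is not at the very start of the document is rejected.

```python
import xml.etree.ElementTree as ET

def parse_or_none(doc):
    """Return the root element, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError:
        return None

# The encoding attribute tells the parser how to decode these Latin-1 bytes.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><CITY>M\xfcnchen</CITY>'.encode("latin-1")
city = parse_or_none(latin1_doc)
print(city.tag, city.text)  # CITY München

# The declaration must be the very first thing in the document:
# even one leading space makes the parser reject it.
misplaced = ' <?xml version="1.0"?><CITY/>'
print(parse_or_none(misplaced))  # None
```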

After all this discussion about DTDs, you might think that DTDs are required; they're not. Later in this chapter, in the "Document Type Definition" and "The DOCTYPE Declaration" sections, we'll go into much more detail on DTDs.

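By the way, a statement in the form <?...?> is known as a processing instruction. Processing instructions are passed through the XML parser to the application that receives the document, telling it to perform tasks other than just interpreting the document data. For those of you who are familiar with C and C++, these are similar to preprocessor directives. You can even create your own processing instructions, as long as the application that processes the document understands them. (Strictly speaking, the XML declaration only looks like a processing instruction: the XML 1.0 specification defines it separately and reserves the target name xml.)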
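As a sketch of how an application sees processing instructions, the following Python code uses xml.dom.minidom to list them. It assumes an xml-stylesheet instruction (a W3C-defined instruction for associating a stylesheet with a document); the file name resumes.xsl and the processing_instructions helper are hypothetical. The XML declaration does not appear in the result, because the parser does not report it as a processing instruction.

```python
from xml.dom import minidom, Node

DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="resumes.xsl"?>
<RESUMES/>"""

def processing_instructions(text):
    """Return (target, data) for each top-level processing instruction."""
    doc = minidom.parseString(text)
    return [(node.target, node.data)
            for node in doc.childNodes
            if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE]

print(processing_instructions(DOC))
# [('xml-stylesheet', 'type="text/xsl" href="resumes.xsl"')]
```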

Markup Delimiters

An XML document is composed of one or more elements, which are, in turn, composed of two (sometimes one) tags. The opening tag places the element name between a less-than symbol (<) and a greater-than symbol (>). The closing tag looks like the opening tag without any attributes, except that a forward slash (/) is placed before the element name. Our sample document shows that

 <LAST>Shelton</LAST> 

is one of its elements. The document has several more elements, one of which has the opening tag <STREET> and the closing tag </STREET>.

Element Names

The following are guidelines for choosing element names:

  • Element names start with a letter or an underscore character (_).

  • Names can be virtually any length, but use common sense.

  • Names are case sensitive. For example, <ADDRESS> is a different name from <Address>.

  • Names should not start with xml in any combination of case (xml, XML, Xml, and so on); the XML specification reserves such names.

  • A name can contain any number of the following:

    • Letters

    • Numbers

    • Underscores

    • Periods

    • Hyphens
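The guidelines above can be turned into a quick check. This is a simplified, ASCII-only sketch with a hypothetical follows_guidelines helper; the full XML name rules also accept many non-ASCII letters and the colon used by namespace prefixes, so treat it as an approximation.

```python
import re

# ASCII-only approximation of the naming guidelines: a letter or underscore
# first, then letters, digits, underscores, periods, or hyphens.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

def follows_guidelines(name):
    """Check a proposed element name against the naming guidelines."""
    if not _NAME.match(name):
        return False
    # Names beginning with "xml" in any mix of case are reserved.
    return not name.lower().startswith("xml")

for candidate in ["ADDRESS", "_id", "first-name", "v1.2", "2ND", "XmlData", "my tag"]:
    print(candidate, follows_guidelines(candidate))
```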

Root Element

The root element delineates the starting and ending points of the document data. It is the outermost element of the document. In our example, <RESUMES> </RESUMES> is the root element.
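When a parser reads a document, the root element is the object it hands back. In this Python sketch, a trimmed copy of Listing 1.1 (the RESUMES string is an abbreviation, not the full listing) is parsed with xml.etree.ElementTree. Because of the default namespace declaration on <RESUMES>, discussed under Namespaces, ElementTree reports the tag with its namespace URI in braces.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of Listing 1.1 (addresses and contact details omitted).
RESUMES = """<RESUMES xmlns='http://www.myorg.net/tags'>
  <PERSON PERSONID="p1"><NAME><LAST>Shelton</LAST></NAME></PERSON>
  <PERSON PERSONID="p2"><NAME><LAST>Tenney</LAST></NAME></PERSON>
</RESUMES>"""

root = ET.fromstring(RESUMES)  # fromstring returns the root element
print(root.tag)  # {http://www.myorg.net/tags}RESUMES

# Iterating over the root visits its child elements, the two PERSON records.
print([person.get("PERSONID") for person in root])  # ['p1', 'p2']
```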

Empty Elements

There are cases in which an element has no data associated with it. An example would be a null field in a database transferred to an XML document. The field exists whether or not it contains data, so we need an empty element for it. This is represented by a single tag with a forward slash (/) after the name. Remember that in the markup delimiters discussion, I said that sometimes an element consists of only one tag. In our example, there is no telephone number for the second person, so we use the empty element <TEL/>.
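An empty element can also be written as an opening tag immediately followed by its closing tag. This Python sketch, using xml.etree.ElementTree, shows that a parser treats <TEL/> and <TEL></TEL> the same way: no text and no children.

```python
import xml.etree.ElementTree as ET

# <TEL/> and <TEL></TEL> are two spellings of the same empty element.
short_form = ET.fromstring("<TEL/>")
long_form = ET.fromstring("<TEL></TEL>")
print(short_form.text, long_form.text)  # None None
print(len(short_form), len(long_form))  # 0 0

# When serializing, ElementTree writes an empty element as a single tag.
print(ET.tostring(ET.Element("TEL"), encoding="unicode"))  # <TEL />
```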

Attributes

XML element attributes are similar in appearance and function to attributes contained in HTML tags. They modify or further define the tag with which they are associated. They are contained inside the tag and consist of an attribute name and an accompanying value. The PERSON tag in our resumes document has an attribute, as shown in the following example:

 <PERSON PERSONID="p1"> 

The attribute name is PERSONID, and its value in this instance is p1. The attribute value must be enclosed in quotes.
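The quoting rule is enforced by the parser. In this Python sketch (the person_id helper is hypothetical), xml.etree.ElementTree accepts single or double quotes around the value and rejects an unquoted one.

```python
import xml.etree.ElementTree as ET

def person_id(tag_text):
    """Return the PERSONID attribute, or None if the markup is rejected."""
    try:
        return ET.fromstring(tag_text).get("PERSONID")
    except ET.ParseError:
        return None

print(person_id('<PERSON PERSONID="p1"/>'))  # p1
print(person_id("<PERSON PERSONID='p1'/>"))  # p1 (single quotes work too)
print(person_id("<PERSON PERSONID=p1/>"))    # None: unquoted values are errors
```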

Comments

XML comments are identical in form and function to HTML comments. They begin with <!-- and end with -->. The second line of our document

 <!-- ******* Resumes for Potential Hires ******* --> 

is an XML comment.
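Comments are part of the document a parser reads, even though they carry no data. Python's xml.etree.ElementTree discards comments by default, so this sketch uses xml.dom.minidom, which keeps them as comment nodes; the comments helper is hypothetical.

```python
from xml.dom import minidom, Node

DOC = """<!-- ******* Resumes for Potential Hires ******* -->
<RESUMES/>"""

def comments(text):
    """Return the text of every comment at the top level of the document."""
    doc = minidom.parseString(text)
    return [node.data for node in doc.childNodes
            if node.nodeType == Node.COMMENT_NODE]

print(comments(DOC))  # [' ******* Resumes for Potential Hires ******* ']
```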

Entity References

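XML predefines five entity references for characters that have special meaning in markup. These characters are shown in Table 1.2. The ampersand (&) and the less-than sign (<) must never appear literally between tags; the other three can be escaped the same way, and a quote character must be escaped inside an attribute value delimited by the same kind of quote.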

Table 1.2. Entity Substitutions

| Character | Entity reference to use |
|---|---|
| & (ampersand) | &amp; |
| ' (apostrophe) | &apos; |
| < (less than) | &lt; |
| > (greater than) | &gt; |
| " (quote) | &quot; |

A closer look shows why using any of these characters literally would interfere with parsing. The ampersand begins an entity reference, like those in the Entity column of Table 1.2. The remaining characters delimit the tags themselves (< and >) and attribute values (' and ").
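Escaping by hand is error-prone, so in practice a library usually does it. This Python sketch uses xml.sax.saxutils.escape, which replaces &, <, and > by default; the mapping for the two quote characters is passed in explicitly. Parsing the result with xml.etree.ElementTree turns the entity references back into the original characters. The sample string is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() always handles &, < and >; this extra mapping adds the two quotes.
QUOTES = {"'": "&apos;", '"': "&quot;"}

original = 'Tom & Jerry\'s "<b>"'
escaped = escape(original, QUOTES)
print(escaped)  # Tom &amp; Jerry&apos;s &quot;&lt;b&gt;&quot;

# A parser turns the entity references back into the original characters.
note = ET.fromstring("<NOTE>" + escaped + "</NOTE>")
print(note.text == original)  # True
```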

CDATA

When you have a section of an XML document that you want an XML parser to leave untouched and unprocessed, place it in a CDATA section. These sections are treated as plain text. They are excellent for sections of code in which you don't want the parser to interpret special characters such as those listed in Table 1.2. CDATA sections are delimited with <![CDATA[ and ]]>, so the one sequence a CDATA section cannot contain is ]]> itself.

Watch the location and order of the brackets. Unlike declarations such as <!DOCTYPE, a CDATA section has a bracket between the ! and the keyword.

In the following example, the parser treats the & character and the <element> tag as plain text, and no interpretation occurs:

 <![CDATA[Here is a section on CDATA & its syntax with <element> tags]]> 
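Here is the CDATA example above run through Python's xml.etree.ElementTree, as a sketch. The parser returns the section's contents as ordinary text. ElementTree does not preserve the CDATA markup when it writes the element back out; it escapes the special characters instead. The wrapping <CODE> element is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = "<CODE><![CDATA[Here is a section on CDATA & its syntax with <element> tags]]></CODE>"

code = ET.fromstring(DOC)
print(code.text)  # Here is a section on CDATA & its syntax with <element> tags
print(len(code))  # 0: <element> was not parsed as a child element

# On output, ElementTree escapes the text rather than writing a CDATA section.
print(ET.tostring(code, encoding="unicode"))
# <CODE>Here is a section on CDATA &amp; its syntax with &lt;element&gt; tags</CODE>
```
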
Namespaces

Namespaces are a recent addition to XML. They are not mandatory, but they are advisable. Their main purpose is to ensure that element names are unique. Think about the following situation. I could define a tag <NETSTORAGE> in a document about types of disk storage. Another XML author could easily define the same tag, with a different meaning, in a different document. If these documents were ever combined (quite likely), these elements would collide.

Here's an example. You have an XHTML document with standard XHTML tags. You want to add some elements of your own design to the document, but some of your tags have the same names as XHTML tags. This would really mess up an HTML interpreter and would definitely give undesired results. Namespace assignments help avoid this situation by assigning a scope to tags. If this seems far-fetched, remember that you can define two elements with the same name but different meanings, each in a different document. The odds of you (or the company you work for) eventually combining these documents can be pretty high.

Namespace declarations are written as attributes of an element, usually the root element. A declaration can be assigned a name (scope), called a prefix, by appending a colon and the desired name to the xmlns attribute.

In this example, I assigned the name res to the namespace 'http://www.myorg.net/restags':

 <RESUMES xmlns:res='http://www.myorg.net/restags'>

With this declaration in place, I prefix all tags with this name, as in the following fragment. The prefix must match the declaration exactly, including its case:

 <res:PERSON PERSONID="p1">
   <res:NAME>
     <res:LAST>Shelton</res:LAST>
     <res:FIRST>Rick</res:FIRST>
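A namespace-aware parser replaces each prefix with the URI it was declared for. In this Python sketch (the parse_or_none helper is hypothetical), xml.etree.ElementTree reports prefixed tags in {URI}name form, and a document that uses an undeclared prefix is rejected. Because names are case sensitive, RES does not match a declaration of res.

```python
import xml.etree.ElementTree as ET

RES = "http://www.myorg.net/restags"

def parse_or_none(doc):
    """Return the root element, or None if the parser rejects the document."""
    try:
        return ET.fromstring(doc)
    except ET.ParseError:
        return None

good = """<RESUMES xmlns:res='http://www.myorg.net/restags'>
  <res:PERSON PERSONID="p1">
    <res:NAME><res:LAST>Shelton</res:LAST><res:FIRST>Rick</res:FIRST></res:NAME>
  </res:PERSON>
</RESUMES>"""

root = parse_or_none(good)
print(root[0].tag)  # {http://www.myorg.net/restags}PERSON
print(root.find("res:PERSON/res:NAME/res:LAST", {"res": RES}).text)  # Shelton

# Prefixes are case sensitive: RES is not the declared prefix res.
bad = "<RESUMES xmlns:res='http://www.myorg.net/restags'><RES:PERSON/></RESUMES>"
print(parse_or_none(bad))  # None
```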

Again, look at the XML document and the root element declaration. This example illustrates the default namespace for this document:

 <RESUMES xmlns='http://www.myorg.net/tags'> 

All tags in this document that do not have a namespace prefix are assumed to belong to this default namespace. In the following example, the prefix isn't necessary:

 <PERSON PERSONID="p1">
   <NAME>
     <LAST>Shelton</LAST>
     <FIRST>Rick</FIRST>

This might seem like overkill right now, but in Chapter 3 when we talk about stylesheet transformations, you will see that this construct is used more often than not.

It is also possible to have more than one namespace declaration in the same XML document:

 <RESUMES xmlns="http://www.myorg.net"
          xmlns:phonenumbers="http://www.phones.net">

With this declaration, look at the document fragment in Listing 1.2.

Listing 1.2 Resumes XML Document Fragment
 <ADDRESS>
   <STREET>911 Intranet Ave.</STREET>
   <CITY>Canberra</CITY>
   <COUNTRY>Australia</COUNTRY>
 </ADDRESS>
 <phonenumbers:TEL>(+612) 111-2345</phonenumbers:TEL>
 <EMAIL>shelton@somewhere.com</EMAIL>

Because the first namespace declaration has no name appended to it, it is the default namespace, and no prefix is necessary for the document's tags except the <TEL> tag. The second namespace, declared with the prefix phonenumbers, applies to the <TEL> tag. This is not a trick: the document writer knew beforehand that the <TEL> tag belonged to a different namespace than the other tags in the document.
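The same idea applies when a document mixes a default namespace with a prefixed one. This Python sketch wraps a shortened Listing 1.2 in a <RESUMES> root carrying both declarations and parses it with xml.etree.ElementTree. The search prefixes r and p are arbitrary choices: only the namespace URIs they map to have to match the document.

```python
import xml.etree.ElementTree as ET

DOC = """<RESUMES xmlns="http://www.myorg.net"
         xmlns:phonenumbers="http://www.phones.net">
  <ADDRESS><CITY>Canberra</CITY></ADDRESS>
  <phonenumbers:TEL>(+612) 111-2345</phonenumbers:TEL>
  <EMAIL>shelton@somewhere.com</EMAIL>
</RESUMES>"""

root = ET.fromstring(DOC)
for child in root:
    print(child.tag)
# {http://www.myorg.net}ADDRESS
# {http://www.phones.net}TEL
# {http://www.myorg.net}EMAIL

# Searches use prefixes of your own choosing, mapped to the namespace URIs.
ns = {"r": "http://www.myorg.net", "p": "http://www.phones.net"}
print(root.find("p:TEL", ns).text)    # (+612) 111-2345
print(root.find("r:EMAIL", ns).text)  # shelton@somewhere.com
```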

Well-Formed Documents

For an XML parser to process an XML document properly, the document must, at a minimum, be well-formed. To meet this requirement, the document must conform to the following minimum criteria:

  • The document must have exactly one root element that contains all the other elements. Neither a DTD nor an XML declaration is required; if the declaration is present, it must come first. Examine the root element and the XML declaration in our sample document, as shown in Listing 1.1.

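  • Attribute values must be enclosed in quotes. It doesn't matter whether these are single or double quotes, as long as the opening and closing quotes match. Look at the first <PERSON> element in the sample document.

  • All elements must have both an opening and a closing tag unless they are empty elements.

  • An empty element written as a single tag must end with />. An example is the empty <TEL/> element for the second person in our document.

  • Tags must be properly nested. The following tag order is not allowed: <font><strong> </font></strong> (the inner element must be closed first, as in <font><strong> </strong></font>).

  • Markup characters such as < and & cannot appear literally in element text; use the entity references in Table 1.2 or a CDATA section instead.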
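The rules above are what a non-validating parser enforces. This Python sketch uses a hypothetical is_well_formed helper built on xml.etree.ElementTree, which raises ParseError for any well-formedness error. It does not validate the document against a DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a non-validating parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

checks = {
    "<TEL/>": True,                             # empty element
    "<PERSON PERSONID='p1'></PERSON>": True,    # single-quoted attribute value
    "<PERSON PERSONID=p1></PERSON>": False,     # unquoted attribute value
    "<font><strong> </strong></font>": True,    # properly nested
    "<font><strong> </font></strong>": False,   # improperly nested
    "<NOTE>1 < 2</NOTE>": False,                # markup character in text
    "<LAST>Tenney</LAST": False,                # unterminated closing tag
    "<A/><B/>": False,                          # two top-level elements
}
for doc, expected in checks.items():
    assert is_well_formed(doc) == expected, doc
print("all checks passed")
```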



XML and SQL Server 2000
ISBN: 0735711127
Year: 2005
Pages: 104
Authors: John Griffin
