Chapter 2: Brief Introduction to XML | Cross-Platform Web Services Using C# & JAVA (Charles River Media Internet & Web Design)

Building an XML Document

To read an XML document, an application uses a parser to get the data contained in the document. A parser usually consists of a large API that allows the programmer to choose which elements to look at in the document. Microsoft’s .NET architecture provides a developer with several classes for accessing the data in a document, and the Apache group develops a parser called Xerces™ that works cross platform.

With Web Services, an application passes an XML document across the Internet using one of several transport protocols. Therefore, either a client-side or a server-side program must parse the XML to get to the data within the document. The following sections describe the parts of an XML document that a parser encounters.

Processing Instruction

The first part of an XML document is normally the XML declaration, which uses the syntax of a Processing Instruction (PI) and is often loosely called one. It tells the parser that the data is an XML document and which version of XML is in use (in practice, almost always 1.0). The start of the document looks like the following.

    <?xml version="1.0" ?>

The version is nearly always 1.0; XML 1.1 was published later but is rarely used. When the declaration is present, it must be the very first thing in the document, so it also tells the parser where to begin looking for XML.

Root Element

To have a useful document, data needs to be present. To begin describing data, a root element must be present. This is the outermost element in the document. An element is simply a tag that looks much like an HTML tag, but in the case of XML the programmer chooses the name of the tag. For this example, BOOK is the root element.

    <?xml version="1.0" ?>
    <BOOK>
    </BOOK>

The opening tag is the word BOOK surrounded by angle brackets (< and >). The tag with the slash, in this case </BOOK>, is the closing tag. An XML document must have exactly one root element, and this element must enclose every other element in the document.
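To make the one-root rule concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The language and the helper name is_well_formed are choices made for this illustration, not something the chapter prescribes. It parses the BOOK document and then shows that a second top-level element is rejected.

```python
import xml.etree.ElementTree as ET

# The example document from the text: a declaration plus one root element.
doc = '<?xml version="1.0" ?>\n<BOOK>\n</BOOK>'

root = ET.fromstring(doc)
print(root.tag)  # prints BOOK, the name of the root element


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document.

    is_well_formed is a hypothetical helper written for this sketch.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element is accepted; two top-level elements are not.
print(is_well_formed("<BOOK></BOOK>"))
print(is_well_formed("<BOOK></BOOK><BOOK></BOOK>"))
```

The parser reports the second BOOK as an error rather than silently ignoring it, which is why a document that needs several books must wrap them in a single outer element.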

Later in the book you’ll see that the root element begins the definition of a SOAP document or a WSDL file.

Empty Elements

With the small amount of data present in this document, the separate closing tag isn’t really necessary. An empty element, which has no content and no separate closing tag, makes the data more succinct by just using a / at the end of the tag. If we take the previous example, make BOOK an empty element, and give it a TITLE attribute, we have the following result.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services"/>
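The following Python sketch, again using xml.etree.ElementTree as an assumed parser, suggests that the empty-element form and the open-plus-close form carry the same information as far as a program is concerned. The summary function is a hypothetical helper introduced only for the comparison.

```python
import xml.etree.ElementTree as ET

empty_form = ET.fromstring('<BOOK TITLE="Cross Platform Web Services"/>')
paired_form = ET.fromstring('<BOOK TITLE="Cross Platform Web Services"></BOOK>')


def summary(elem):
    """Collect the parts of an element a program would normally inspect:
    its name, its attributes, its text, and its number of children."""
    return (elem.tag, dict(elem.attrib), elem.text or "", len(elem))


# Both spellings produce the same name, attributes, text, and child count.
print(summary(empty_form) == summary(paired_form))
```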

Attributes

An attribute is additional information added to an element as a name and a quoted value. In this case, the TITLE attribute is part of the opening BOOK tag and contains the title of a book. Attributes always appear in the opening tag (or in an empty-element tag) and can be in any element in the document, not just in the root element as in the examples thus far.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services">
    </BOOK>

Because the XML standard allows both elements and attributes, you and the developers of an XML language, such as SOAP, have a great deal of flexibility in design.

As shown in examples later in the chapter, attributes often define namespaces or locations, such as the next SOAP node, for the XML document. Be sure to see the definition of namespaces later in this chapter.

Attribute Centric Data

So far this document doesn’t give a user much information about the book. Adding more attributes describes the data more fully, and the following is one possible result.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services"
          PAGECOUNT="400"
          AUTHOR="Brian Hochgurtel"
          PUBLISHER="Charles River Media"/>

This document is attribute centric because the information all resides within attributes.
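As a rough sketch of how a program might read an attribute-centric document, the Python below uses xml.etree.ElementTree (an assumption of this example, not the chapter's own tooling). The ISBN lookup refers to a hypothetical attribute that is deliberately absent from the document, to show what a missing attribute looks like.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<BOOK TITLE="Cross Platform Web Services"
      PAGECOUNT="400"
      AUTHOR="Brian Hochgurtel"
      PUBLISHER="Charles River Media"/>"""

book = ET.fromstring(doc)

# Every attribute value arrives as a string; convert where needed.
title = book.get("TITLE")
pages = int(book.get("PAGECOUNT"))

# ISBN is a hypothetical attribute not present in the document;
# get() returns None for an attribute that is absent.
isbn = book.get("ISBN")

print(title, pages, isbn)
```

Note that an element cannot carry two attributes with the same name, so this layout works best when each piece of data occurs once per element.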

Element Centric Data

Now the information is more descriptive, but another possibility is to format the data with elements that are children of BOOK, such as the following.

    <?xml version="1.0" ?>
    <BOOK>
      <TITLE>Cross Platform Web Services</TITLE>
      <PAGECOUNT>400</PAGECOUNT>
      <AUTHOR>Brian Hochgurtel</AUTHOR>
      <PUBLISHER>Charles River Media</PUBLISHER>
    </BOOK>

Because the data in this example belongs completely in elements, the document is considered element centric.
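Reading an element-centric document means walking child elements instead of looking up attributes. The Python sketch below, which assumes xml.etree.ElementTree as the parser, collects each child's name and text into a dictionary.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<BOOK>
  <TITLE>Cross Platform Web Services</TITLE>
  <PAGECOUNT>400</PAGECOUNT>
  <AUTHOR>Brian Hochgurtel</AUTHOR>
  <PUBLISHER>Charles River Media</PUBLISHER>
</BOOK>"""

book = ET.fromstring(doc)

# Iterating over an element yields its direct children in document order.
fields = {child.tag: child.text for child in book}
print(fields["TITLE"])

# findtext() looks up a single child by name and returns its text.
print(book.findtext("PAGECOUNT"))
```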

Elements and Attributes in the Same Document

The document in the previous section is element centric because all the data resides in elements, and the one in the section before it is attribute centric because the data resides in attributes. XML, however, does not require a document to be one or the other; the two styles can be mixed, as shown in the following example.

    <?xml version="1.0" ?>
    <BOOK TITLE="Cross Platform Web Services">
      <PAGECOUNT>400</PAGECOUNT>
      <AUTHOR>Brian Hochgurtel</AUTHOR>
      <PUBLISHER>Charles River Media</PUBLISHER>
    </BOOK>

Nested Elements

The elements chosen for this document only allow for one book to be in the document. If several books need to be in the document, more nesting needs to occur. By nesting BOOK under a different root element named LIBRARY, several occurrences of BOOK can occur in the document, as shown in the following example.

    <?xml version="1.0" ?>
    <LIBRARY>
      <BOOK TITLE="Cross Platform Web Services">
        <PAGECOUNT>500</PAGECOUNT>
        <AUTHOR>Brian Hochgurtel</AUTHOR>
        <PUBLISHER>Charles River Media</PUBLISHER>
      </BOOK>
      <BOOK TITLE="Learning Visual Basic Through Applications">
        <PAGECOUNT>418</PAGECOUNT>
        <AUTHOR>Clayton E. Crooks II</AUTHOR>
        <PUBLISHER>Charles River Media</PUBLISHER>
      </BOOK>
    </LIBRARY>

LIBRARY is now the root element. BOOK is a child of LIBRARY but is still the parent of PAGECOUNT, AUTHOR, and PUBLISHER. This document now has the ability to describe multiple books, and perhaps other items, that may fit within a LIBRARY such as a magazine.
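The parent and child relationships described above can be sketched in Python with xml.etree.ElementTree (an assumed parser for this illustration). The book_record function is a hypothetical helper that merges a BOOK's attributes and child elements into one dictionary, so it does not matter which style a given field uses.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<LIBRARY>
  <BOOK TITLE="Cross Platform Web Services">
    <PAGECOUNT>500</PAGECOUNT>
    <AUTHOR>Brian Hochgurtel</AUTHOR>
    <PUBLISHER>Charles River Media</PUBLISHER>
  </BOOK>
  <BOOK TITLE="Learning Visual Basic Through Applications">
    <PAGECOUNT>418</PAGECOUNT>
    <AUTHOR>Clayton E. Crooks II</AUTHOR>
    <PUBLISHER>Charles River Media</PUBLISHER>
  </BOOK>
</LIBRARY>"""


def book_record(book):
    """Merge a BOOK's attributes and child elements into one dictionary."""
    record = dict(book.attrib)
    for child in book:
        record[child.tag] = child.text
    return record


library = ET.fromstring(doc)

# findall("BOOK") returns only the direct BOOK children of LIBRARY.
records = [book_record(book) for book in library.findall("BOOK")]

for record in records:
    print(record["TITLE"], "by", record["AUTHOR"])
```

Selecting children by name also means that a different kind of child added to LIBRARY later would simply be skipped by this loop.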

Using Namespaces

Namespaces ensure that the element names used in your XML document are unique, and have many of the same properties as the namespaces used in the C# example shown earlier in the chapter.

The namespace declaration typically occurs in the root element (the outermost element), although XML allows it on any element, and it uses a URL as a unique identifier. Realize that there is no required content at the URL. It’s just an identifier that helps make the element names unique. The following example declares a namespace in the root element.

    <BOOK xmlns:WEBSERVICES="www.advocatemedia.com/XML"></BOOK>

Then all the child elements of BOOK begin with the namespace prefix. Note that the xmlns keyword must be lowercase because XML is case sensitive, and that no whitespace may follow the </ of a closing tag.

    <?xml version="1.0" ?>
    <BOOK xmlns:WEBSERVICES="www.advocatemedia.com/XML">
      <WEBSERVICES:TITLE>Cross Platform Web Services</WEBSERVICES:TITLE>
      <WEBSERVICES:PAGECOUNT>400</WEBSERVICES:PAGECOUNT>
      <WEBSERVICES:AUTHOR>Brian Hochgurtel</WEBSERVICES:AUTHOR>
      <WEBSERVICES:PUBLISHER>Charles River Media</WEBSERVICES:PUBLISHER>
    </BOOK>
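To see what a namespace-aware parser does with the prefix, consider the following Python sketch using xml.etree.ElementTree (an assumed parser for this illustration). It writes the declaration with lowercase xmlns, which namespace-aware parsers require. The "ws" prefix in the lookup is an arbitrary name chosen for this sketch; only the identifier string has to match.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<BOOK xmlns:WEBSERVICES="www.advocatemedia.com/XML">
  <WEBSERVICES:TITLE>Cross Platform Web Services</WEBSERVICES:TITLE>
  <WEBSERVICES:PAGECOUNT>400</WEBSERVICES:PAGECOUNT>
  <WEBSERVICES:AUTHOR>Brian Hochgurtel</WEBSERVICES:AUTHOR>
  <WEBSERVICES:PUBLISHER>Charles River Media</WEBSERVICES:PUBLISHER>
</BOOK>"""

book = ET.fromstring(doc)

# The parser replaces each prefix with the identifier it stands for,
# so the first child's name is reported as {identifier}TITLE.
print(book[0].tag)

# The prefix used in a lookup is local to this mapping and need not
# match the prefix that appears in the document.
ns = {"ws": "www.advocatemedia.com/XML"}
print(book.findtext("ws:TITLE", namespaces=ns))

# A lookup without the namespace finds nothing, because an unqualified
# TITLE is a different name from the qualified one.
print(book.find("TITLE"))
```

BOOK itself carries no prefix in this document, so it stays outside the namespace even though it is the element that declares it.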

Note

Namespaces are an important concept to understand because many of the XML standards underlying Web Services use them, usually to represent elements that are vendor dependent or to support primitive types from schemas.