Introducing XML | How to Do Everything with Microsoft Office InfoPath 2003 (How to Do Everything)

Most of us look at web pages and don’t think about how the page was designed or the complexity of the Hypertext Markup Language (HTML) behind the scenes. HTML is one of the core technologies behind the web pages you view on the Internet, and it provides a good starting point for this discussion of XML.

If you navigate to a web page within Internet Explorer and then select View | Source, you can see the HTML markup that was used to create that page. A web browser takes this HTML markup and uses it to display the page, using the content and formatting settings that are present in the HTML. A sample HTML page is shown in Figure 2-1.

Figure 2-1: A sample HTML page

For example, you may have noticed that when you navigate to certain pages, the title of your web browser window changes to describe the page you are visiting. This doesn’t happen automatically; within the HTML document is a special set of “tags,” enclosed in angle brackets, that determines which text is the title of the page, as shown here:

<title>Product Listing Page</title>

When your browser reads this set of tags, it knows that the information within the tags contains the title of the page, “Product Listing Page,” and that it can display that title at the top of the browser window.

Originally, the Web worked entirely through HTML pages that contained different tags to display text and other objects, control formatting, and so on. The majority of web pages are still written using HTML in this manner. However, while HTML provides a standard method for developing web pages, it is limited in how it can be used.

XML, on the other hand, is a much more flexible markup format and can suit a wide range of uses, from creating web pages and sites to creating data and document exchange formats and files. This flexibility comes from the fact that XML is designed to be used in a number of different ways and is “self-describing,” meaning that XML can communicate both the content and the content’s structure or format.

To see this in action, first consider the web page produced by the HTML shown in Figure 2-1. You can see in Figure 2-2 that the page has a list of products that are available for sale.

Figure 2-2: A typical product listing page

If you were to view part of the HTML behind this page, it might look something like this:

</table>
<p><b>Xtreme Mountain
  Bike</b><br>
  Crazy Cycles<br>
  $299.99<br>
  Mountain Bike</p>
<ul>
  <li>rust-free alloy
    frame</li>
  <li>metallic paint
    finish</li>
  <li>comfort-grip handlebars</li>
  <li> cushioned saddle
    seat<br>
    </li>
</ul>
<p><b>Endorphin Racing
  Bike</b><br>
  Crazy Cycles<br>
  $699.99<br>
  Racing Bike<br>
  </p>
<ul>
  <li> rust-free alloy
    frame</li>
  <li>glossy paint finish</li>
  <li>cushioned saddle
    seat<br>
    </li>
</ul>

While this provides all the information required to display the list of products in your browser, it really doesn’t tell you anything about them. For example, if you were to send this information to someone else, how would they know what the numbers and description mean? Are the numbers the price of the product? Or the manufacturer’s cost? Likewise, is the text the actual product name or just a description of the product?

XML allows you to specify all of these details in a well-formed file that can easily be understood and interpreted by a wide variety of systems. (“Well-formed” means that the file has all the required start and end tags, as well as all the other elements that make up a basic XML file.) If you were to take the product information from the web page in Figure 2-3 and put it into an XML file, it might look something like this:

Figure 2-3: Structure of an XSN file

<?xml version="1.0" encoding="UTF-8"?>
<Products>
      <Product name="Xtreme Mountain Bike">
            <manufacturer>Crazy Cycles</manufacturer>
            <type usage="mountain"/>
            <features>
                  <Item>rust-free alloy frame</Item>
                  <Item>metallic paint finish</Item>
                  <Item>comfort-grip handlebars</Item>
                  <Item>cushioned saddle seat</Item>
            </features>
      </Product>
</Products>
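
The requirement that every start tag have a matching end tag is something an XML parser checks for you. As a minimal sketch outside InfoPath, assuming Python and its standard `xml.etree.ElementTree` module, the hypothetical helper `is_well_formed` below accepts a shortened version of the product markup and rejects a copy whose `</manufacturer>` end tag has been left out:

```python
import xml.etree.ElementTree as ET

# A shortened version of the product markup, with matching start and end tags.
WELL_FORMED = "<Product><manufacturer>Crazy Cycles</manufacturer></Product>"

# The same markup, but the closing </manufacturer> tag is missing.
MALFORMED = "<Product><manufacturer>Crazy Cycles</Product>"


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))  # True
print(is_well_formed(MALFORMED))    # False
```

Any XML-aware tool performs a check of this kind before it does anything else with a file.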

Before you get too far into the XML shown here, you need to understand some terms. XML documents are primarily made up of different elements. An element has a start tag and end tag, marked with angle brackets, and between the tags is some content. In the preceding example, there is a manufacturer element, and the content within the element is Crazy Cycles (the company that manufactures the bike).

<manufacturer>Crazy Cycles</manufacturer>

If you look at the previous example XML document a little closer, you see that it is made up of a number of these elements. Just as an XML document is made up of different data elements, each of those elements can also have attributes. An attribute is a property that is associated with an element and describes the element content. For example, you might have an attribute associated with the name element of the product that defines the SKU (or product number) for that particular product, like the attribute shown here:

<?xml version="1.0" encoding="UTF-8"?>
<Products>
      <Product>
            <name SKU="22122">Xtreme Mountain Bike</name>
            <manufacturer>Crazy Cycles</manufacturer>
            <price>299.99</price>
            <type>mountain</type>
            <features>
                  <Item>rust-free alloy frame</Item>
                  <Item>metallic paint finish</Item>
                  <Item>comfort-grip handlebars</Item>
                  <Item>cushioned saddle seat</Item>
            </features>
      </Product>
</Products>
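
Because the element and attribute names travel with the data, a program can ask for each value by name instead of guessing from its position on the page. The following is a minimal sketch, again assuming Python's standard `xml.etree.ElementTree` module rather than anything InfoPath itself provides. The function name `read_products` and the keys of the dictionaries it returns are hypothetical choices made for this illustration:

```python
import xml.etree.ElementTree as ET

# The product document from the text, with a closing </Products> tag.
PRODUCTS_XML = """<Products>
      <Product>
            <name SKU="22122">Xtreme Mountain Bike</name>
            <manufacturer>Crazy Cycles</manufacturer>
            <price>299.99</price>
            <type>mountain</type>
            <features>
                  <Item>rust-free alloy frame</Item>
                  <Item>metallic paint finish</Item>
                  <Item>comfort-grip handlebars</Item>
                  <Item>cushioned saddle seat</Item>
            </features>
      </Product>
</Products>"""


def read_products(xml_text):
    """Return one dictionary per <Product>, looked up by element name."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.findall("Product"):
        name = product.find("name")
        products.append({
            "name": name.text,            # element content
            "sku": name.get("SKU"),       # attribute on the name element
            "manufacturer": product.findtext("manufacturer"),
            "price": product.findtext("price"),
            "features": [item.text for item in product.find("features")],
        })
    return products


for entry in read_products(PRODUCTS_XML):
    print(entry["name"], entry["sku"], entry["price"])
```

Note that every value comes back as text. Nothing in the document itself says that the price is a number, which is the gap a schema fills.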

How do you know what data is contained within an XML file? Well, you could always look at the XML file and try to work out the structure. But XML can also be used to create a schema that defines both the structure and the type of data that is contained within an XML document.

The preceding example doesn’t actually have a proper schema defined. But if it did, you could use it to describe how the product information is stored in the XML file, including the type of data that would be included in each element (string, integer, float, and so forth). The following code is actually part of a separate schema file for the example Products XML file:

<xsd:schema xmlns:xsd="http://osborne.com/HTDEInfoPath/ProductsXMLSchema">
      <xsd:element name="products">
            <xsd:complexType>
                  <xsd:sequence maxOccurs="unbounded">
                        <xsd:element ref="product"/>
                  </xsd:sequence>
            </xsd:complexType>
      </xsd:element>
      <xsd:element name="product">
            <xsd:complexType>
                  <xsd:sequence>
                        <xsd:element ref="name"/>
                        <xsd:element ref="price"/>
                        <xsd:element ref="manufacturer"/>
                  </xsd:sequence>
                  <xsd:attribute name="sku" type="xsd:string"/>
            </xsd:complexType>
      </xsd:element>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="price" type="xsd:float"/>
      <xsd:element name="manufacturer" type="xsd:string"/>
      <xsd:element name="type">

In addition to describing the different elements that can be used, this schema file also defines what type of data is associated with each element (for example, “float” for numbers, “string” for text, and so on), so when it comes time to use a field in a form, InfoPath will know exactly what type of data can be entered into it.
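
To give a feel for what a declared data type buys you, here is a minimal sketch in Python. It is not XML Schema validation, and it is not how InfoPath works internally. The `FIELD_TYPES` table and the `check_product` function are hypothetical stand-ins that mirror the string and float declarations in the schema fragment, and the lowercase `product` wrapper element follows the schema's naming:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the schema's type declarations
# (xsd:string becomes str, xsd:float becomes float).
FIELD_TYPES = {"name": str, "price": float, "manufacturer": str}


def check_product(product_xml):
    """Return a list of problems; an empty list means every field converted."""
    product = ET.fromstring(product_xml)
    problems = []
    for field, expected_type in FIELD_TYPES.items():
        text = product.findtext(field)
        if text is None:
            problems.append(f"missing <{field}>")
            continue
        try:
            expected_type(text)  # try converting the text to the declared type
        except ValueError:
            problems.append(
                f"<{field}> is not a valid {expected_type.__name__}: {text!r}"
            )
    return problems


GOOD = (
    "<product><name>Xtreme Mountain Bike</name>"
    "<price>299.99</price>"
    "<manufacturer>Crazy Cycles</manufacturer></product>"
)
BAD = (
    "<product><name>Xtreme Mountain Bike</name>"
    "<price>cheap</price>"
    "<manufacturer>Crazy Cycles</manufacturer></product>"
)

print(check_product(GOOD))  # []
print(check_product(BAD))   # one problem reported for <price>
```

A real schema validator covers far more than this (element order, required attributes, repetition), but the idea is the same: the declared type lets a tool reject a value such as "cheap" in a price field before it is ever stored.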