There's no better way to learn about XML than to start looking at it. If you've never used XML, but you've written some HTML, then this should look somewhat familiar.
Some Basic XML
Here's a simple chunk of XML for you to enjoy.
<?xml version="1.0"?>
<hello>
   <there>
      <!-- Finally, real data here. -->
      <world target="everyone">think XML</world>
      <totalCount>694.34</totalCount>
      <goodbye />
   </there>
</hello>
Hey, I didn't say it was going to be interesting. As I mentioned before, it's just data, but it is useful data, and here's why.
So there you have it: some clean, clear XML data.
Some Basic (and Meaningful) XML
Let's see what that comma-delimited data from Northwind Traders I listed previously could look like in XML.
<?xml version="1.0"?>
<productList>
   <supplier ID="652" fullName="Beverages R Us">
      <product ID="1" available="Yes">
         <productName>Chai</productName>
         <category>Beverages</category>
         <unitPrice>18.00</unitPrice>
      </product>
   </supplier>
   <supplier ID="9874" fullName="We Sell Food">
      <product ID="2" available="No">
         <productName>Chang</productName>
         <category>Beverages</category>
         <unitPrice>19.00</unitPrice>
      </product>
      <product ID="3" available="Yes" onSale="true">
         <productName>Aniseed Syrup</productName>
         <category>Condiments</category>
         <unitPrice>12.00</unitPrice>
      </product>
   </supplier>
</productList>
Moving the data to XML has greatly increased the size of the content. But with an increase in size comes an increase in processing value. I was immediately able to get some benefit from the hierarchical structure of XML. In the original data, supplier was just another column. But in the XML version, all the data is now grouped into supplier sections, which makes sense (at least, if that is how I was planning to use the data).
You can also see that I followed The Rule. Every opening tag has a matching closing tag. Whatever you do, don't forget The Rule.
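If you want to see The Rule enforced, here is a minimal sketch in Python (my choice for illustration; any language with an XML parser would do). It assumes nothing beyond the standard library's xml.etree.ElementTree module, and the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Every opening tag has a matching closing tag.
good = "<hello><there><goodbye /></there></hello>"

# <there> is never closed, so The Rule is broken.
bad = "<hello><there></hello>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The parser refuses the second string outright; it does not try to guess where the missing closing tag should have gone.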
Now, you're saying to yourself, "Tim, I could have grouped the data by supplier once I loaded the comma-delimited data into my program." And to that I say, "You're right." I told you that XML was just another data format. By itself, the XML content is not all that sexy. It's really the tools that you use with your XML data that make it zoom. Because XML uses a consistent yet generic structure to manage data, it was a snap to develop tools that could process consistent yet generic data in ways that look interesting and specific.
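To give a rough idea of how little code a generic XML tool needs to take advantage of that supplier grouping, here is a small Python sketch. It is only an illustration, not part of the Northwind sample: the function name products_by_supplier is hypothetical, and the XML is a trimmed copy of the product list above.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the <productList> sample from this section.
PRODUCT_LIST = """<?xml version="1.0"?>
<productList>
   <supplier fullName="Beverages R Us">
      <product available="Yes">
         <productName>Chai</productName>
      </product>
   </supplier>
   <supplier fullName="We Sell Food">
      <product available="No">
         <productName>Chang</productName>
      </product>
      <product available="Yes" onSale="true">
         <productName>Aniseed Syrup</productName>
      </product>
   </supplier>
</productList>"""


def products_by_supplier(xml_text):
    """Map each supplier's fullName to the list of its product names."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for supplier in root.findall("supplier"):
        names = [p.findtext("productName") for p in supplier.findall("product")]
        grouped[supplier.get("fullName")] = names
    return grouped


for supplier_name, product_names in products_by_supplier(PRODUCT_LIST).items():
    print(supplier_name, product_names)
```

No sorting or grouping logic was needed; the hierarchy in the document already did that work.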
What About the Human-Readable Part?
One of the tools used with XML is the double acronym XSLT, which stands for XSL Transformations (XSL stands for eXtensible Stylesheet Language). XSLT is a hard-to-use scripting language that lets you transform some XML data into whatever other data or output format you want. It's just one of a handful of XSL-related languages created to manipulate XML data in complex ways. Ready for some hands-on XSL fun? Take the useful chunk of XML listed previously (the <productList> sample), and replace the first "?xml" line with the following two lines:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="hello.xsl"?>
Save all of that beautiful XML text to a file on your desktop as hello.xml. Next, put the following XSLT script into another file on your desktop named hello.xsl. (Keep the comma-separated header list inside <xsl:text> on a single line in the file; don't break it, even if it wraps on the page.)
<?xml version="1.0"?>
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0">
   <xsl:template match="/">
      <xsl:text>
ProductID,ProductName,SupplierID,Category,UnitPrice,Available
      </xsl:text>
      <BR/>
      <xsl:apply-templates/>
   </xsl:template>
   <xsl:template match="supplier">
      <xsl:variable name="supID" select="@ID"/>
      <xsl:for-each select="product">
         "<xsl:value-of select="@ID"/>",
         "<xsl:value-of select="productName"/>",
         "<xsl:value-of select="$supID"/>",
         "<xsl:value-of select="category"/>",
         "<xsl:choose>
            <xsl:when test="@onSale='true'">On Sale</xsl:when>
            <xsl:otherwise>
               $<xsl:value-of select="unitPrice"/>
            </xsl:otherwise>
         </xsl:choose>",
         "<xsl:value-of select="@available"/>"
         <BR/>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
I told you it was hard to use, and even harder to look at. OK, now for the show. I have Internet Explorer 6 installed on my system, but this should work with most current browsers. Open the hello.xml file in your browser, and voilà, the following beautifully formatted text should appear.
ProductID,ProductName,SupplierID,Category,UnitPrice,Available
"1","Chai","652","Beverages","$18.00","Yes"
"2","Chang","9874","Beverages","$19.00","No"
"3","Aniseed Syrup","9874","Condiments","On Sale","Yes"
Now that's more like it. XML and XSLT together have made this advance in data technology possible. (I did cheat a little in this example. You will notice the <BR/> entries in the XSLT script that don't appear in the final output. I added these just to make it look right in your browser.) But seriously, while I was able to generate a comma-separated data set with XSLT, more common tasks for XSLT include generating nicely formatted HTML based on XML data, or generating a new XML document with a specific alternative view of the original data.

How does it work? Basically, the <xsl:template> elements tell the XSLT processor to look for tags in the XML document that match some pattern (like "supplier"). When it finds a match, it applies everything inside the <xsl:template> tags to that matching XML tag and its contents. The pattern specified in the "match" attributes uses an XML technology called XPath, a system to generically search for matching tags within your XML document.
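If the idea of path-based matching is still fuzzy, this Python sketch may help. It is not XSLT, and Python's xml.etree.ElementTree understands only a limited subset of XPath, but the expressions it does accept look the same. The price_label function is a hypothetical stand-in that mirrors the On Sale test made in the script; the XML is a trimmed piece of the product list.

```python
import xml.etree.ElementTree as ET

XML = """<productList>
   <supplier fullName="We Sell Food">
      <product available="No">
         <productName>Chang</productName>
         <unitPrice>19.00</unitPrice>
      </product>
      <product available="Yes" onSale="true">
         <productName>Aniseed Syrup</productName>
         <unitPrice>12.00</unitPrice>
      </product>
   </supplier>
</productList>"""

root = ET.fromstring(XML)


def price_label(product):
    """Sale items show the text 'On Sale'; everything else shows a price."""
    if product.get("onSale") == "true":
        return "On Sale"
    return "$" + product.findtext("unitPrice")


# Path with an attribute predicate: every <product>, at any depth,
# whose onSale attribute is "true".
on_sale = [p.findtext("productName")
           for p in root.findall(".//product[@onSale='true']")]

# Plain child path: each <product> directly inside a <supplier>.
labels = [price_label(p) for p in root.findall("supplier/product")]

print(on_sale)
print(labels)
```

The paths play the same role as the "match" and "select" attributes in the script: they pick out the elements, and the surrounding code decides what to emit for each one.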
Sounds confusing? Well, it is, and don't get me started on how long it took to write that short little XSLT script. XSLT scripting is, blissfully, beyond the scope of this book. Of course, there are tools available to make the job easier. But XSLT is useful only if the XML data it manipulates is correct. You could write an XSL Transformation to report on data inconsistencies found in an XML document, but it won't work if some of the tags in your document are misspelled or arranged in an inconsistent manner. For that, you need another advancement in XML technology: XSD.
XSD (XML Schema Definition) lets you define the schema (the "language" or "vocabulary") of your particular XML document. Remember, XML is a wide-open generic standard; you can define the tags any way you want and nobody will care, at least until you have to process the tags with your software. If they aren't correct, then your processing will likely fail. XSD lets you define the rules that your XML document must follow if it is to be considered a valid document for your purposes. (DTD, or Document Type Definition, is a similar, though older, technology. It's widely supported by XML tools, but it is not as flexible as XSD. There are also other schema definition languages similar to XSD, but because XSD is built right into .NET, we'll focus on that.)
XSD schemas are every bit as endearing as XSLT scripts. Let's create an XSD for our original sample <productList> XML listed previously. First, we need to change the top of the XML to let it know that an XSD schema file is available. Change this:
<?xml version="1.0"?>
<productList>

to this:

<?xml version="1.0"?>
<productList xmlns="SimpleProductList"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:schemaLocation="SimpleProductList hello.xsd">
These directives tell the XML parser to look in hello.xsd for the schema. They also define a namespace; more on that later. The hello.xsd file contains the following schema.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   targetNamespace="SimpleProductList"
   xmlns="SimpleProductList"
   elementFormDefault="qualified">
   <!-- The default xmlns and elementFormDefault="qualified" place the
        type names and child elements in the target namespace. -->
   <xs:element name="productList" type="ProductListType"/>
   <xs:complexType name="ProductListType">
      <xs:sequence>
         <xs:element name="supplier" type="SupplierType"
            maxOccurs="unbounded"/>
      </xs:sequence>
   </xs:complexType>
   <xs:complexType name="SupplierType">
      <xs:sequence>
         <xs:element name="product" type="ProductType"
            maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:integer"/>
      <xs:attribute name="fullName" type="xs:string"/>
   </xs:complexType>
   <xs:complexType name="ProductType">
      <xs:sequence>
         <xs:element name="productName" type="xs:string"/>
         <xs:element name="category" type="xs:string"/>
         <xs:element name="unitPrice" type="xs:decimal"/>
      </xs:sequence>
      <xs:attribute name="ID" type="xs:integer"/>
      <xs:attribute name="available" type="YesOrNoType"/>
      <xs:attribute name="onSale" type="xs:boolean"/>
   </xs:complexType>
   <xs:simpleType name="YesOrNoType">
      <xs:restriction base="xs:string">
         <xs:enumeration value="Yes"/>
         <xs:enumeration value="No"/>
      </xs:restriction>
   </xs:simpleType>
</xs:schema>
It looks nasty, doesn't it? Actually, it's more straightforward than XSLT. Basically, the schema says that for each element (or "tag" or "node") in my XML document, here are the sub-elements and attributes they contain, and the data type of each of them. You can even create your own pseudo-data types (actually, limiting factors on existing data types), as I did with the "YesOrNoType" data type, which limits the related value to the strings "Yes" and "No."
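To make those rules a little more concrete, here is a hand-rolled Python sketch of two of them: the "Yes" or "No" restriction on the available attribute, and the decimal type on unitPrice. This is not XSD validation (Python's standard library has no XSD validator), and Decimal accepts a few spellings that xs:decimal does not, so treat it only as an approximation of what a real validator checks for you. The name check_product is made up for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Mirrors the two enumeration values in YesOrNoType.
ALLOWED_AVAILABLE = {"Yes", "No"}


def check_product(product):
    """Return a list of problems found in one <product> element."""
    problems = []

    # The attribute is optional in the schema, so only check it if present.
    available = product.get("available")
    if available is not None and available not in ALLOWED_AVAILABLE:
        problems.append(f"available={available!r} is not Yes or No")

    # A missing <unitPrice> gives None, which Decimal rejects with TypeError.
    price = product.findtext("unitPrice")
    try:
        Decimal(price)
    except (InvalidOperation, TypeError):
        problems.append(f"unitPrice={price!r} is not a decimal number")

    return problems


good = ET.fromstring(
    '<product available="Yes"><unitPrice>18.00</unitPrice></product>')
bad = ET.fromstring(
    '<product available="Maybe"><unitPrice>cheap</unitPrice></product>')

print(check_product(good))
print(check_product(bad))
```

Writing checks like this by hand gets old quickly, which is the whole argument for describing the rules once in a schema and letting a validator do the work.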
You can look at the XML file with the attached XSD schema in your browser, but it wouldn't be all that interesting. It just shows you the XML. But schemas will be useful when you need to assess the quality of XML data coming into your software applications from external sources.
The product list in the XML shown earlier is nice, but someone else could come up with a product list document that is just as nice, but with different naming and formatting rules. For instance, they might create a document that looks like this:
<?xml version="1.0"?>
<allProducts>
   <vendor vendorName="Beverages R Us">
      <item available="Yes">
         <itemName>Chai</itemName>
         <group>Beverages</group>
         <priceEach>18.00</priceEach>
      </item>
   </vendor>
</allProducts>
The data is all the same, but the tags are different. Such a document would be incompatible with software written to work with our original document. Running the document through our XSD would quickly tell us that we have a bogus data set, but it would be nicer if something told us that from the start.

Enter namespaces. Namespaces provide a convenient method to say, "This particular tag in the XML document uses this XSD-defined language." Notice the start of the XSD schema shown previously.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

This line sets up a namespace with the prefix xs by using the xmlns attribute. (The ":xs" part tells XML what you want to call your namespace in the rest of the document.) The value of the attribute is a URI (Uniform Resource Identifier), just a unique value that you are sure no one else is going to use. Typically, you use a web site address for your own company; the web site doesn't have to exist. You could even put your phone number there, just as long as it is unique.
The most common way to use a namespace is to prefix the relevant tags in your XML document with the new namespace name, as in xs:schema instead of just schema. This tells the parser, "If you are checking my syntax against an XSD schema, then use the one that I defined for the xs namespace." You can also use a "default" namespace for a given element and all its descendants by including the xmlns attribute in the outermost element. Then all elements within that outermost element will use the specified namespace. I used this method in one of the preceding examples.
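Here is a short Python sketch of what a default namespace means to the software reading the document. It uses the standard xml.etree.ElementTree module; the prefix "p" in the lookup table is an arbitrary label I picked for this example, and only the URI it maps to has to match the document.

```python
import xml.etree.ElementTree as ET

# The outermost element declares a default namespace, so every element
# inside it belongs to "SimpleProductList" without needing a prefix.
XML = """<productList xmlns="SimpleProductList">
   <supplier fullName="Beverages R Us">
      <product available="Yes">
         <productName>Chai</productName>
      </product>
   </supplier>
</productList>"""

root = ET.fromstring(XML)

# The parser stores the namespace as part of each element's name.
print(root.tag)

# Searches must name the namespace too; "p" is just a local alias for it.
ns = {"p": "SimpleProductList"}
names = [e.text
         for e in root.findall("p:supplier/p:product/p:productName", ns)]

# A search that ignores the namespace finds nothing.
unqualified = root.findall("supplier")

print(names)
print(unqualified)
```

The same element name in a different namespace is, as far as the parser is concerned, a different element. That is exactly what lets software tell our product list apart from someone else's.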
For basic XML files that will be used only by your program, you may not need to bother with namespaces. They really come in handy when you are creating XML data that uses some publicly published standard. There are also instances where a single XML file might contain data related to two or more distinct uses of XML. In this case, different parts of your XML file could refer to different namespaces.
As with other parts of the XML world, XSD and namespaces are not all that easy to use, but they are flexible and powerful. As usual, there are tools, including tools in Visual Studio, that let you build all of this without having to think about the details.
As I keep saying, XML is just data, and if your program and data don't understand each other, you might as well go back to chisel and stone. XML and its related technologies provide a method to help ensure your data is ready to use in your application.