Manipulating XML | Architecting Web Services

As your usage of XML continues to evolve, more and more tools will become available to make working with it easier. Many people think developers may someday not work with XML directly, even those of us who enjoy getting our hands dirty! While that day may be a while off, vendors are already making strides at making the usage of XML more efficient. When XML was first introduced, developers had to use compilers and parsers to do anything with XML. Not only is this overkill for a simple scenario (like changing one value of one node in a document), but it also requires a high level of technical expertise.

One community to jump on the XML bandwagon was Web developers, who work with technologies like HTML, CSS (Cascading Style Sheets), and scripts. XML helped them to work with data, but it required a lot of learning on their part to even touch it programmatically. Of course, XML is very similar to HTML, but it defines data, not presentation. This community needed a way to get that data into a presentable form. Some of the initial efforts to bridge this gap took the form of the XSL specification.

XSLT

XSLT (eXtensible Stylesheet Language Transformations) has a somewhat confusing definition. It is actually one of the three specifications that make up the XSL standard. XSL was originally submitted to the W3C in 1997, and XSLT was broken out into its own specification in April 1999. XSLT is now recognized as the language of XSL for transforming XML documents into other XML documents. You can think of XSLT as the language that makes up the content of documents known as XSL stylesheets. The processors that support the XSLT language are considered to be XSL processors.

The second component of XSL is called XSL Formatting Objects and specifies formatting semantics related to presentation-specific functionality. XSL Formatting Objects fall outside the scope of this discussion. The third component of the standard is XPath, which is a non-XML syntax for locating portions of an XML document. You will use XPath in your XSLT when identifying nodes in your XML document.

XSLT became a recommendation of the W3C in November 1999. You can find the entire specification at http://www.w3.org/TR/xslt.html. Several fine books cover this standard in depth (for example, XSLT by Michael Kay), so I will provide an overview, focusing on the topics that have significance to working with Web services.

Conceptually, XSLT should really be thought of as a scripting language, similar to JavaScript or VBScript. However, it is physically more similar to ColdFusion because it is a tag-based sequence of rules. Like these languages, XSLT is never actually compiled into a binary. Rather, it is a set of rules that are interpreted in real time when applied to an XML document.

Note

Some processors are starting to do precompiling of XSL stylesheets, but this is outside the capabilities of XSLT itself.

An XSL processor is the engine that takes the XML data source(s), loads the XSL document, maps the XML to it, and produces a resulting XML document. (See Figure 4-16.) This process is usually referred to as a transformation, but it can also be referred to as mapping or overlaying. This result is then displayed, saved, or delivered without any reference or relationship to the original XML or XSL.

Figure 4-16: XSL mapping of XML

XSLT does not replace the DOM or SAX API. In fact, implementations likely utilize these APIs behind the scenes in their XSL processors. The DOM is especially valuable because XSL borrows its model of XML as a tree. Because XML can be sourced in many ways, XSLT references DOMs through a logical model. This keeps applications from having to send the XML through a document state.

Because XSLT is a rules-based language, the language has two key components: elements and expressions. By identifying the elements in a document and recognizing their expressions, you can extract, modify, and enhance the source into another document.

Elements

Like XML itself, elements are the building blocks of XSLT. These elements are used to identify, describe, qualify, and act upon the data in your source. More than 35 elements are defined in XSLT, but I will not cover them all here. I will discuss only those elements you are most likely to need in working with Web services. These include the following:

stylesheet
template
apply-templates
element
attribute
value-of
text
comment
param
variable
for-each
if
choose, when, and otherwise
copy
copy-of
sort

For a more complete listing, you can look on the W3C site or in one of the available books dedicated to XSLT.

stylesheet

The fundamental element for all stylesheets is the stylesheet element. This is the root that declares the namespace and establishes the fact that this document is a stylesheet. For organizational purposes, XSL Stylesheets should either be kept separate from non-XSLT XML data documents or they should have a very distinct naming convention. The syntax with the current namespace is as follows:

 <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 </xsl:stylesheet>

Note

Some stylesheets declare the transform element as the root. The transform element is synonymous with stylesheet and is permissible as a substitute according to the specification. Also, all code examples in this section assume the presence of this stylesheet node as the root.
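To make that assumption concrete, here is a minimal sketch of a complete stylesheet. The template and value-of elements inside it are covered in the sections that follow. The hotelAvailability/date path comes from the hotel data used later in this section, the report element name is purely illustrative, and the output element (a standard XSLT element not discussed here) is included only to request readable XML output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask for XML output, indented for readability -->
  <xsl:output method="xml" indent="yes"/>

  <!-- One template rule that applies at the root of the source document -->
  <xsl:template match="/">
    <!-- "report" is a hypothetical element name for the result -->
    <report>
      <xsl:value-of select="hotelAvailability/date"/>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

The shorter fragments in the rest of this section would sit where the single template sits here.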

template

The template element is used to map XSLT code with the appropriate data in an XML document. It does this by acting as a container and defining the appropriate rules. Therefore, the content in the template body is executed whenever the rule applies. At first this might appear to be redundant with the stylesheet element, but a single stylesheet can have many templates, so it is not just a one-to-one relationship.

 <xsl:template match="/" name="rootTemplate" priority="1">
    ...
 </xsl:template>

The rule itself is defined by the match attribute, which holds a pattern identifying the nodes in the XML document to which the template applies. A value of "/" matches the root node of the document, which is where the processor begins, so this template is normally the first one to run. When more than one template matches the same node, the priority attribute resolves the conflict: the rule with the highest priority value is executed. Other templates are applied to a document only if this "master" template references them, directly or indirectly.

The name attribute is optional. A named template can be invoked directly through the call-template element; the apply-templates element, by contrast, finds templates through their match patterns.
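The following sketch illustrates both points under some assumptions: the template name showChain is hypothetical, the hotel and chain elements come from the hotel data used later in this section, and the bracketed qualifier in hotel[distance] is explained in the section on expressions. A named template is normally invoked with the call-template element, while the priority attribute settles which of two matching rules applies.

```xml
<!-- Both rules below can match a hotel element. For a hotel that
     has a distance child, the rule with priority 2 is chosen. -->
<xsl:template match="hotel" priority="1">
  <xsl:call-template name="showChain"/>
</xsl:template>

<xsl:template match="hotel[distance]" priority="2">
  <xsl:call-template name="showChain"/>
  <xsl:text> (distance known)</xsl:text>
</xsl:template>

<!-- A named template is invoked by name, not by pattern.
     The current node is unchanged inside it. -->
<xsl:template name="showChain">
  <xsl:value-of select="chain"/>
</xsl:template>
```

Because call-template does not change the current node, the chain path inside showChain is still evaluated relative to the hotel being processed.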

apply-templates

This element tells the processor to process a set of nodes, chosen by the select attribute, using whichever templates match them. This is very useful whenever the same formatting is needed in several places. Rather than repeating the rules every time they are used, you can define them once in a template and reference it. The following templates define the formatting for the name of a customer and reuse it:

 <xsl:template match="customerData">
    ...
    <xsl:apply-templates select="customer"/>
    ...
 </xsl:template>
 <xsl:template match="customer">
    <xsl:value-of select="name/family"/>
    <xsl:text>,</xsl:text>
    <xsl:value-of select="name/given"/>
 </xsl:template>

element

Unlike the DOM and SAX, the creation of elements is defined as a part of the XSLT standard. The element element is the mechanism for doing so. The nice thing about the implementation of the element tag is that it allows for the naming of the element to be automated and not just hard coded. The content of this element can also be treated as a template for the value, attributes, and children of the element. The following code:

 <xsl:element name="bedSize">King</xsl:element>

produces this output:

 <bedSize>King</bedSize>

attribute

The attribute element is used to generate new attributes for elements in the resulting document. Attributes can be defined for new elements or existing elements. There actually is no distinction between the two, since even existing elements are recreated in the new document. The following code creates an attribute named "condition" and assigns it a value of "new":

 <xsl:attribute name="condition">new</xsl:attribute>

Tip

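I have noticed that XSLT processors behave more consistently when the attribute element is applied before any values in elements. This is more than a convention: the XSLT specification treats adding an attribute after an element already has content as an error, which a processor may either report or handle by ignoring the attribute.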
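As a small sketch of that ordering, the fragment below builds an element, adds its attribute first, and only then writes its content. It assumes the current node is a hotel element with a chain child, as in the hotel data used later in this section.

```xml
<xsl:element name="hotel">
  <!-- Attributes first, then text or child elements -->
  <xsl:attribute name="condition">new</xsl:attribute>
  <xsl:value-of select="chain"/>
</xsl:element>
```

For a hotel whose chain is Milton, this writes <hotel condition="new">Milton</hotel> to the result.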

value-of

To reference the value of an element in your source document, you can use the value-of element. This allows you to select the value of any element relative to the current context (a combination of the template element and this one) and reproduce it. The element whose value you reference is given in the select attribute. This is always an empty element, meaning it has no separate end tag, because it has no contents for the XSL processor to act on. In the following code, we are selecting the value of the bedSize child of the current node:

 <xsl:value-of select="bedSize"/>

text

The text element can be used to wrap around plain text in XSL. Usually this information can be exposed directly, but there are cases in which you will want to exempt it from the tag-based structure and syntax of XSLT. This can help especially with protecting the stylesheet from special characters that might be in the text data. The following code will produce a simple text node with the value of "Size":

 <xsl:text>Size</xsl:text>

This could be utilized to rewrite the earlier sample for the attribute tag to produce the exact same result, as shown here:

 <xsl:attribute name="condition"><xsl:text>new</xsl:text></xsl:attribute>

comment

To add comments to your transformed document, use the comment tag. The contents of the tag are the actual comments, and there are no attributes for this element, as you can see in the following example:

 <xsl:comment>This data is provided courtesy of company XYZ.</xsl:comment>

This generates the following standard comment line in XML:

 <!--This data is provided courtesy of company XYZ.-->

param

The param element is one of two elements available for binding values to variables. This element is unique in that the assigned value acts as only a default value and can be influenced by external references (through the with-param element). Hence, the param element is similar to a public property in the programming paradigm. The param element can be either declared at the top of the document and referenced globally or declared at the top of a template and referenced locally. The name attribute is required. The following code shows the proper usage for the param element:

 <xsl:param name="sport">Hockey</xsl:param>

A param element can be referenced, in the scope in which it is defined, through the value-of element. This is done by declaring the name of the parameter with a '$' prefix, as shown here:

 <xsl:value-of select="$sport"/>
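To show how an external reference can influence the default, here is a sketch that passes a value through the with-param element. The customerData and customer elements are borrowed from the apply-templates example, and the value Soccer is purely illustrative.

```xml
<xsl:template match="customerData">
  <!-- Override the default value for this one call -->
  <xsl:apply-templates select="customer">
    <xsl:with-param name="sport">Soccer</xsl:with-param>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="customer">
  <!-- "Hockey" is used only when no value is passed in -->
  <xsl:param name="sport">Hockey</xsl:param>
  <xsl:value-of select="$sport"/>
</xsl:template>
```

Here each customer produces Soccer. If the with-param element were removed, the default of Hockey would apply instead.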

variable

The variable element is the other element available for binding values to variables. It is different in that it is private: its value cannot be overridden from outside the scope in which it is defined, and once bound it cannot be changed. The variable is also different from param in that it can be declared at any point in a template, not exclusively among the first children of the template. The scope of variable works the same way as param, and its usage is the same, as shown here:

 <xsl:variable name="sport">Hockey</xsl:variable>

This example shows the variable with a content template. An alternative implementation utilizes the select attribute to declare the value, making the variable a closed tag, as shown here:

 <xsl:variable name="sport" select="'Hockey'"/>

Take care that you properly delineate string values, because the double quotes alone will treat the value not as a string, but as a node reference. This subtle difference (seen in the following example) produces not an error, but very likely an empty value if no Hockey node is defined:

 <xsl:variable name="sport" select="Hockey"/>

for-each

The for-each element allows you to loop through a series of elements that match a specific condition. For example, if you have a series of orders and want to run a certain template on every order to check for some specific condition, the for-each element facilitates that.

The for-each element has only one attribute, select. This is where the condition is defined for this template. The content between the start and end tags of the for-each element constitutes the template for the condition. The following is an example of the start and end tags for a for-each element:

 <xsl:for-each select="hotelAvailability/hotel">
    ...
 </xsl:for-each>

if

The if element is used whenever a condition needs to be tested in your XSLT. Like other programmatic if conditions, this is an all-or-nothing test in which the action is either taken (in this case, applying the content of the element as a template) or ignored. The if element can be nested inside itself and can be run in succession. There is no else condition available, so if you want to determine a series of exclusive conditions, you may prefer using the choose element. The following code demonstrates the syntax for the start and end tag of the if element:

 <xsl:if test="nonsmoking='Yes'">
    ...
 </xsl:if>

choose, when, and otherwise

The choose element allows you to choose from a number of possible alternatives. It always contains one or more when elements and an optional otherwise element. When you declare a choose, you build a series of conditional tests that allow you to select the appropriate action for each condition. These tests are administered through the when elements. The otherwise element serves as a catch-all if none of the tests is passed. This logic is very similar to that of a case or switch statement, and it is more compact than a series of separate if elements.

The choose element has no attributes and can contain only when and otherwise child elements. The following code shows the basic syntax of the choose element:

 <xsl:choose>    ... </xsl:choose>

Like the if element, the when statements have only a test attribute, which declares the expression. Each when statement is evaluated in order, and the resulting outcome is always a Boolean answer (true or false). The following example defines a condition in which the value of the bedSize element equals the string 'King':

 <xsl:when test="bedSize='King'">
    ...
 </xsl:when>

The otherwise element, like choose itself, has no attributes. Only one otherwise element can exist as a child of a choose element, and its content is referenced only when all when tests have failed. Because no condition needs to be defined, the otherwise syntax is very basic, as seen here:

 <xsl:otherwise>    ... </xsl:otherwise>

Let's look at a complete choose statement that takes a different action based on the evaluation. It will evaluate the price of the room and offer appropriate comments, as seen here:

 <xsl:choose>
    <xsl:when test="cost &lt; 20">
       <xsl:comment>Too cheap! Move on!</xsl:comment>
    </xsl:when>
    <xsl:when test="cost &lt; 100">
       <xsl:comment>Could be a bargain!</xsl:comment>
    </xsl:when>
    <xsl:otherwise>
       <xsl:comment>Too much! Move on!</xsl:comment>
    </xsl:otherwise>
 </xsl:choose>

Note

&lt; is the code representation of the < special character in XML documents.

It is important to note that this routine acts just as a case or switch statement. If the price of the hotel room is $19, only the first when condition is triggered. If you do not want an exclusive selection, use either the if element or multiple choose statements.

copy

The copy element allows you to copy the current node into the result document at the point where the instruction appears. This statement copies only the node itself, not its attributes or children; anything you place inside the copy element becomes the content of the copied node. However, you can reference an attribute set through the copy element's use-attribute-sets attribute. For a recursive copy, use the copy-of element.

One use of this function might be when you want to work through a group of nodes and push them down a level. This is accomplished by using a for-each statement, building the node you want at that level, and making the original node a child. Consider the following document for hotel availability:

 <hotelAvailability>
    <date>2001-08-05</date>
    <hotel>
       <chain>Milton</chain>
       <distance>5</distance>
       <bedSize>King</bedSize>
       <cost>79.99</cost>
       <nonsmoking>Yes</nonsmoking>
    </hotel>
    <hotel>
       <chain>Harriot</chain>
       <distance>10</distance>
       <bedSize>Queen</bedSize>
       <cost>59.99</cost>
       <nonsmoking>Yes</nonsmoking>
    </hotel>
 </hotelAvailability>

You can transform the document so that it provides pricing information by chain like this:

 <hotelPricing>
    <date>2001-08-05</date>
    <chain>Milton
       <cost>79.99</cost>
    </chain>
    <chain>Harriot
       <cost>59.99</cost>
    </chain>
 </hotelPricing>

This is how you can apply the copy element to get there:

 ...
 <xsl:template match="/" name="rootTemplate">
    <hotelPricing>
       <date>
          <xsl:value-of select="/hotelAvailability/date"/>
       </date>
       <xsl:for-each select="hotelAvailability/hotel/chain">
          <xsl:copy>
             <xsl:value-of select="."/>
             <cost>
                <xsl:value-of select="../cost"/>
             </cost>
          </xsl:copy>
       </xsl:for-each>
    </hotelPricing>
 </xsl:template>
 ...

You may not recognize some expressions in this selection of relative nodes. We will look at these and other expressions in the next section.

copy-of

copy-of is just like the copy element, but with a larger scope. This means that all of the element's children are captured with it, which can make a huge difference in the output between the two elements. For example, the following XML code is extracted if we use the copy-of instead of the copy element in our hotel pricing scenario:

 <hotelPricing>
    <date>2001-08-05</date>
    <chain>Milton</chain>
    <chain>Harriot</chain>
 </hotelPricing>

Notice that the cost element is not present. This is because copy-of is a closed tag, so you cannot make references in each node to do anything to the document segment copied over. Because this limits what we can do in our XSLT, let's see how that would look:

 ...
 <xsl:template match="/" name="rootTemplate">
    <hotelPricing>
       <date>
          <xsl:value-of select="/hotelAvailability/date"/>
       </date>
       <xsl:for-each select="hotelAvailability/hotel/chain">
          <xsl:copy-of select="."/>
       </xsl:for-each>
    </hotelPricing>
 </xsl:template>
 ...

In this scenario, the data document has to use another transformation to add the cost data from the chain nodes. You might be wondering why you would ever use the copy-of instead of the copy element. In this example, the chain did not have any attributes or child nodes. If it has, say, an attribute called standing, then the output of a copy-of reflects that, as shown here:

 <hotelPricing>
    <date>2001-08-05</date>
    <chain standing="good">Milton</chain>
    <chain standing="great">Harriot</chain>
 </hotelPricing>

Under the same scenario, the previous copy output would not have changed with the addition of this attribute. The same results apply if chain has one or more child nodes.

sort

The sort element is a very powerful tool that can order the construction of elements based on one or more keys. It can be used only as a child of a for-each or apply-templates element, because a sort needs a defined set, or group, of elements. In fact, the sort element is an empty element, which might be a bit misleading. Rather than thinking of sort as a collection (as you might in other languages), you should think of it as a property of either the for-each or apply-templates element.

The attributes of the sort element are select, order, lang, case-order, and data-type. Select obviously allows you to select the node on which you want to sort. Order reflects your requirement to list the values in descending or ascending order. Lang can be used to define the language of this particular sort key. Case-order refers to whether uppercase or lowercase values have priority in the sort. The default value can vary between implementations. The data-type attribute refers to the type of the key being sorted. The valid values are text (the default), number, and QName (qualified name).

The ability to sort on multiple keys is built into the element. Every successive sort child node reflects a subcategory on which to sort. Consider the following set of hotel data:

 <hotel>
    <chain>Milton</chain>
    <cost>79.99</cost>
    ...
 </hotel>
 <hotel>
    <chain>Harriot</chain>
    <cost>59.99</cost>
    ...
 </hotel>
 <hotel>
    <chain>Pyatt</chain>
    <cost>69.99</cost>
    ...
 </hotel>
 <hotel>
    <chain>Heabody</chain>
    <cost>89.99</cost>
    ...
 </hotel>

If you simply want to rank these entries by chain in alphabetical order, you can use the following template:

 <xsl:template match="/" name="rootTemplate">
    <xsl:for-each select="hotelAvailability/hotel">
       <xsl:sort select="chain"/>
       <chain><xsl:value-of select="chain"/></chain>
    </xsl:for-each>
 </xsl:template>

To take this one step further, you can sort based on price, then chain name. That template looks more like this:

 <xsl:template match="/" name="rootTemplate">
    <xsl:for-each select="hotelAvailability/hotel">
       <xsl:sort select="cost" data-type="number" order="ascending"/>
       <xsl:sort select="chain" data-type="text"/>
       <hotel>
          <chain><xsl:value-of select="chain"/></chain>
          <price><xsl:value-of select="cost"/></price>
       </hotel>
    </xsl:for-each>
 </xsl:template>

This takes advantage of a few of the attributes of the sort element to be more explicit in the sort definition. Numbers and text can sort differently, so you want to make sure that your data is treated appropriately. As you might expect, this template produces the following XML:

 <hotel>
    <chain>Harriot</chain>
    <price>59.99</price>
 </hotel>
 <hotel>
    <chain>Pyatt</chain>
    <price>69.99</price>
 </hotel>
 <hotel>
    <chain>Milton</chain>
    <price>79.99</price>
 </hotel>
 <hotel>
    <chain>Heabody</chain>
    <price>89.99</price>
 </hotel>

Expressions

Expressions are used in the XSLT language to reference data. This data can take the form of simple equations, like 2 + 2, or the form of a path reference. The key is that an expression can be resolved as only one value for a given situation (document). For a mathematical expression, the laws of mathematics dictate the value. Other expressions may not be as firmly rooted in natural laws, but they are just as absolute in a given set of data.

The most common kind of expression in XSLT is to identify a specific node in an XML data set. Expressions are also used to define conditions for processing routines and to generate text for result sets. This is where mathematical routines can sometimes be useful. Any element that supports the select, match, or test attribute (that is, value-of, apply-templates, for-each, and so on) identifies a certain node, or group of common nodes, to work with. Remember that XSLT treats XML as a node-based tree, just like the DOM, so using paths to specify an element follows that structure. The expressions used in XSLT are actually paths that are defined in the XPath specification. This is part of the reason XPath is a component of the XSL standard.

We will use several expression types that cover a range of techniques and approaches. There is almost always more than one way to reference a node, so I recommend that you find a method, or set of methods, that you are comfortable with and reuse those for consistency. We will look at only some of the most frequently used or needed expressions in this section. For a more thorough discussion of the valid expressions in XSLT, I recommend getting a book dedicated to XSL/XPath or digging through the specification for XPath itself at http://www.w3.org/TR/xpath.html.

Paths

Whenever you identify a path in a data set, understanding the context of that path is critical to success. Context simply refers to the current frame of reference, in this case your current location in a data set. If you give directions based on your current location to someone starting from a different location, your directions will not get that person to the correct destination. Even worse, you may tell the person to turn left on a road where only a right turn is possible. Hopefully, the person will at least have built-in error handling that prevents a crash!

There are some methods for identifying a location globally. This is done by simply starting at the root level of the document. This is great for providing a foolproof method for locating a node, but it is not always appropriate. Perhaps the node you want is specifically related to the current node. Any scenario in which you want only one specific node out of several unordered instances requires relative pathing.

Relative pathing refers to locating a node based on your current location. Global pathing or absolute pathing refers to locating a node from the root level.

Caution

The pathing expressions are somewhat similar to the pathing syntax for HTML. However, there are discrepancies, so don't assume that, if it works for HTML, it works for XSLT.

When I talk about the current location or the current node, I am talking about a logical location, not a physical one. XSLT does not actually walk a model like the DOM, but rather references it through a series of rules. To "be in a location" actually refers to being in the scope of a certain rule. The hotel example had the following XSLT:

 <xsl:template match="/" name="rootTemplate">
    <xsl:for-each select="hotelAvailability/hotel">
       <xsl:sort select="chain"/>
       ...
    </xsl:for-each>
    ...

The first line establishes the template scope at the root level. Anything we reference at this point references the entire document. The for-each rule goes down to the hotel level in hotelAvailability. The context is now at the hotel level, and anything we reference is related to the hotel node(s). The third line defines a sort, but it doesn't change the location. It is just a rule telling how to treat the data at the current level. However, as soon as we close the for-each tag, we are back up to the root level of the document.

Now that you have an understanding of locations, let's look at some of the expressions that allow us to navigate the data.

/

The slash (/) is the character for starting at the root. This is how all global pathing references start because it allows you to back out of your current context. All other paths look at the child elements of only your current element. It is important to realize that this does not actually define a node. Rather, this is treated as a virtual node that is always the parent of your document root. This allows you to treat the root as a container for the entire document. Following that approach, this example takes you to the explicit path of the date node in the hotelPricing document:

 /hotelPricing/date

Without a virtual container, this path would just be /date. That could get confusing if you ever wanted to work with the root node directly. Furthermore, if you remove the slash in front of hotelPricing, it returns the appropriate node only if you are already at the root of the document.

{node}/{node}

The use of the slash between nodes simply communicates a parent-child relationship. This is how we can traverse the children of our data document. The slash can also be used to traverse up the tree through parent relationships when combined with the appropriate syntax.

{node}//{node}

The use of a double slash describes an ancestor-descendant relationship. That means an instance of one node preceding another, regardless of the levels between them. This would be a superset of the parent-child relationships. The expression hotelAvailability//date applied to the following code results in both instances of the date node:

 <hotelAvailability>
    <date/>
    <hotel>
       <date/>
    </hotel>
 </hotelAvailability>

If no node is listed before the double slash, the search starts at the root of the document and identifies matching descendants anywhere in it. To search only below the current node, start the expression with a period: .//date.
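The difference in starting point can be sketched against the small document above. The fragment uses count(), a standard XPath function not otherwise covered in this section, simply to report how many nodes each expression selects.

```xml
<xsl:template match="/">
  <!-- Both date elements: the child of hotelAvailability and
       the one nested inside hotel (2 for the sample above) -->
  <xsl:value-of select="count(hotelAvailability//date)"/>

  <xsl:for-each select="hotelAvailability/hotel">
    <!-- .//date searches only below the current hotel (1) -->
    <xsl:value-of select="count(.//date)"/>
    <!-- //date starts again from the top of the document (2) -->
    <xsl:value-of select="count(//date)"/>
  </xsl:for-each>
</xsl:template>
```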

{node}

This expression selects any matching children of the current node.

.

The period (.) simply selects the current node. There is no difference between the expression ./{node} and the expression {node}.

..

A double period (..) allows you to reference the parent node. This can be used recursively by using the slash in between consecutive pairs of periods: ../../date.

@{attribute}

This syntax allows you to reference a named attribute. This can be combined with other expressions to reference the attributes of other elements outside your current context. For example, you can use //@measurement to retrieve the measurement attribute from all the elements in a document.

*

Like in many other languages, the asterisk (*) represents a wildcard value. This is helpful whenever you want an entire group of entities with various names or if you don't know the specific name. To retrieve all the attributes of the current element, you can use @*. This wildcard can be used in paths as well as node names.
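A short sketch of the attribute and wildcard syntax follows. It assumes, hypothetically, that each distance element in the hotel data carries a measurement attribute; the sample data in this section does not actually include one. The name() function is a standard XPath function that returns the name of a node.

```xml
<xsl:for-each select="hotelAvailability/hotel">
  <!-- The (hypothetical) measurement attribute of this hotel's distance -->
  <xsl:value-of select="distance/@measurement"/>

  <!-- Every attribute of the distance element, whatever its name -->
  <xsl:for-each select="distance/@*">
    <xsl:value-of select="name()"/>
    <xsl:text>=</xsl:text>
    <xsl:value-of select="."/>
  </xsl:for-each>

  <!-- Every child element of the hotel, whatever its name -->
  <xsl:for-each select="*">
    <xsl:value-of select="name()"/>
    <xsl:text> </xsl:text>
  </xsl:for-each>
</xsl:for-each>
```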

[{#}]

This notation is used to select an element by its position among matching elements. This is useful only in a document in which data is specifically ordered or when you are writing a routine to traverse a document like you might an array. The expression /hotelAvailability/hotel[2] selects the second hotel node in your document. You have to be careful with this approach, however, because, if there is no second hotel, you do not get an error; the expression simply selects nothing, and any output that depends on it comes out empty.

{node}[{node}]

Used in this syntax, the brackets utilize the node as a child qualifier of the previous node. In other words, you are looking only for those nodes that have one or more children of another node. If you want to consider only hotels that have a distance listed, you can alter the for-each element to select hotelAvailability/hotel[distance].

{node}[{node} or {attribute} equality]

This usage of the brackets allows you to further qualify nodes based on node or attribute values. This can be very helpful in filtering data before even getting to the value-of level of the actual nodes. To look at only those hotels less than ten miles from your destination, you can take the previous example and enhance it: hotelAvailability/hotel[distance<10]. When the expression appears inside an attribute of a stylesheet, the < character must be written as &lt;.
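The three uses of brackets can be sketched side by side against the two-hotel availability document shown in the copy section (Milton at a distance of 5, Harriot at a distance of 10).

```xml
<xsl:template match="/">
  <!-- Position: chain of the second hotel (Harriot in the sample) -->
  <xsl:value-of select="hotelAvailability/hotel[2]/chain"/>

  <!-- Existence: every hotel that lists a distance (both in the sample) -->
  <xsl:for-each select="hotelAvailability/hotel[distance]">
    <xsl:value-of select="chain"/>
  </xsl:for-each>

  <!-- Value test: hotels less than ten miles away (Milton only).
       The comparison operator is escaped inside the attribute. -->
  <xsl:for-each select="hotelAvailability/hotel[distance &lt; 10]">
    <xsl:value-of select="chain"/>
  </xsl:for-each>
</xsl:template>
```

Note that Harriot, at exactly 10, is excluded by the last test; writing distance &lt;= 10 would include it.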

Axis Notation

All of the path expressions discussed so far have actually been the shorthand representation. If you have worked with HTML reference paths, you are probably familiar with the shorthand syntax, but most of these paths can also be expressed through the axis notation. This is a more explicit notation that can be helpful when dealing with a bit more complicated path, or if you want to expand the current context without changing your location. Because axis notations are so explicit, their expression is fairly straightforward. All of these commands precede a double colon (::) and the listing of a node or group of nodes like parent::{node}. These commands are listed in Table 4-3 along with a brief description.

Table 4-3: Axis Names
NAME	DESCRIPTION
ancestor::	Any ancestor node
ancestor-or-self::	Any ancestor or current node
attribute::	Any attribute
child::	Any direct child
descendant::	Any descendant (children, grandchildren, and so on)
descendant-or-self::	Any descendant or the current node
following::	Any nodes in the document coming after the current node, excluding its descendants
following-sibling::	Any direct siblings coming after the current node
namespace::	The namespace nodes in scope for the current node
parent::	The parent node
preceding::	Like following, but only those previous to the current node
preceding-sibling::	Like following-sibling, but only those previous to the current node
self::	References current node
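As a sketch of how the two notations line up, the fragment below pairs a few axis expressions with their shorthand. It assumes the current node is a chain element inside a hotel, with the standing attribute from the copy-of example.

```xml
<xsl:for-each select="hotelAvailability/hotel/chain">
  <!-- Long form of ../cost (equivalent here because the parent is a hotel) -->
  <xsl:value-of select="parent::hotel/child::cost"/>

  <!-- Long form of @standing -->
  <xsl:value-of select="attribute::standing"/>

  <!-- No shorthand exists for this axis: the cost sibling after chain -->
  <xsl:value-of select="following-sibling::cost"/>

  <!-- Number of ancestor elements (hotel and hotelAvailability) -->
  <xsl:value-of select="count(ancestor::*)"/>
</xsl:for-each>
```

The shorthand is usually easier to read; the axis form earns its place for relationships such as siblings and ancestors that the shorthand cannot express.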

Usage

Now that you can build a template for your XML data, how exactly do you execute the transformation? Well, like most things we have been discussing, there is more than one option. However, we are really interested in only one of them for our applications. The method that won't apply to us is the internal reference to a template. Just like XML documents can reference their schemas or namespaces, it can also reference a stylesheet for itself. This is done through a single line of code that must be listed directly after the XML header tag, as seen below:

 <?xml-stylesheet type="text/xsl" href="myhotels.xsl"?>
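To show where this line sits relative to the rest of the document, here is a small sketch that parses a document carrying the reference and reads it back with Python's standard minidom module. The hotel data and the helper stylesheet_instructions are hypothetical. Only the stylesheet line itself mirrors the one above.

```python
from xml.dom import minidom

# Hypothetical document; only the xml-stylesheet line mirrors the text.
DOC = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="myhotels.xsl"?>'
    '<hotelAvailability><hotel><name>Harbor Inn</name></hotel></hotelAvailability>'
)

def stylesheet_instructions(document):
    """Return the data of each xml-stylesheet instruction found before the root."""
    found = []
    for node in document.childNodes:
        if node is document.documentElement:
            break  # stop once the root element begins
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append(node.data)
    return found

print(stylesheet_instructions(DOC))
```

The parser treats the line as a processing instruction that sits between the XML declaration and the root element. It is not part of the data itself.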

Tip

This is a great way to test your stylesheets through your browser. You may have trouble getting this to work if you view the XML and XSL through local files. These documents need to be served over an HTTP server so that the browser recognizes the stylesheet as XSL. Without this step, you will not get any errors, just a blank screen. You can view the source in your browser to confirm that you are actually getting the XML.

However, we are much more concerned with the ability to apply external stylesheets. The whole idea of Web services is to expose processes through XML. The provider cannot make any assumption as to what a consumer will do with that data, so providers are woefully unprepared to provide a presentation-oriented response to a request. After all, the consumer may not even be presenting the data to a user, so the provider can't make that assumption.

The Web service provider should also not rely upon the consumer to do any necessary data-level transformations. The data that providers return in the response should be suitable "as is" for the consumer to work with and manipulate as necessary.

The internal reference is really appropriate only for the direct exposure of XML data to the browser (or other client). We will instead use stylesheets to take a provider's data and turn it into either something usable by our applications or presentable to our users. That means referencing an external stylesheet, perhaps even dynamically. Doing this requires a programmatic transformation that provides both the data and the stylesheet to the XSL processor, resulting in an XML string or object.

The syntax for this can vary from processor to processor and obviously language to language. For the Microsoft XML processor, using Visual Basic to transform an XML document into an XML string with an XSL document looks like this:

 strHotels = sourceDOM.transformNode(myTemplateDOM)

You would use this method only if you were done with the XML at this point and simply wanted to pass along a string to a consumer of your service or application. If you had to perform more work on the data, you would want to keep it in an object form instead of loading this string into the DOM again (which is too costly). The key is to identify what you need to do with the result set. The syntax for maintaining the result in a DOM looks a little different:

 sourceDOM.transformNodeToObject myTemplateDOM, myResultDOM

As you can see from these two examples, the process of transforming documents is relatively simple. Obviously, some things have to be done prior to making either of these calls, but I will leave the details to you for now. We will be diving into the entire process in code later, as well as discussing best practices. The intent of this chapter is to give you a basic understanding of how we can manipulate and work with XML and how to take advantage of these technologies.

When you start using XSL in your applications, you need to keep a couple of things in mind. First and foremost, a stylesheet is itself an XML document, so any markup you write in it must be well-formed XML. There is the perception that XSL can be used to produce any HTML you like, and that is technically not correct. You can generate HTML tags, but they must follow the rules of XML, which HTML does not always comply with.

The main differentiator is a much stricter compliance with the hierarchy of the tag system. HTML is very forgiving in that the code <b><i>Hello</b></i> is perfectly valid. However, this will never pass an XML processor. Any HTML produced by XSL has to take the form <b><i>Hello</i></b>. While this isn't a technical challenge, it might be an adjustment challenge to HTML developers. There are also special characters in HTML that are no longer valid, like the non-breaking space (&nbsp;), for which XML-compliant alternatives will have to be found. My suggestion is to start adhering to the XHTML Basic specification produced by the W3C: http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/. It was formalized in December 2000 as an XML-compliant version of HTML. Your target browser(s) may not support the entire specification, so make sure you take the time to test. This shouldn't be as much of a problem in the future, though, and referencing it now should at least help you to make the adjustment from HTML to XML-compliance.
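A quick way to see the difference is to hand a few snippets to an XML parser. The sketch below uses Python's standard ElementTree module. The four snippets and the helper is_well_formed are illustrative assumptions: they show tags closed in the wrong order versus properly nested tags, and the HTML non-breaking space entity versus its numeric character reference.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Illustrative snippets (assumptions chosen for this sketch).
overlapping = "<b><i>Hello</b></i>"          # tags closed in the wrong order
nested = "<b><i>Hello</i></b>"               # tags closed innermost first
named_entity = "<p>Room&nbsp;rate</p>"       # HTML entity, not predefined in XML
numeric_reference = "<p>Room&#160;rate</p>"  # character reference for the same character

for snippet in (overlapping, nested, named_entity, numeric_reference):
    print(snippet, is_well_formed(snippet))
```

The numeric character reference is one XML-compliant alternative to the named entity. It works without any document type declaration.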

The other consideration you need to make is positioning your XSL to eliminate "catastrophic" errors, those that cause the XSL processor to completely stop the current activity. This is a bad situation to be in because you are at the mercy of the XSL processor's error handler, which may not be very cooperative!

These errors usually occur by referencing nodes that do not exist. Whenever a node's existence is in doubt, make sure you take the time to check. This is typically done through the DOM (childNodes) or SAX APIs before applying an XSL to the data. XSL processors don't generally have the ability to "touch" nodes without referencing them, so testing in the template itself is problematic.
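The kind of check described above might look like the following sketch, which verifies that every path a stylesheet is assumed to reference matches at least one node before any transformation is attempted. It uses Python's standard ElementTree module instead of the MSXML DOM. The path list, the sample documents, and the helper missing_paths are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paths that a stylesheet is assumed to reference.
REQUIRED_PATHS = ["hotel", "hotel/name", "hotel/distance"]

def missing_paths(xml_text, required_paths):
    """Return the required paths that match no node under the root element."""
    root = ET.fromstring(xml_text)
    return [path for path in required_paths if root.find(path) is None]

complete = (
    "<hotelAvailability><hotel>"
    "<name>Harbor Inn</name><distance>3.5</distance>"
    "</hotel></hotelAvailability>"
)
incomplete = (
    "<hotelAvailability><hotel>"
    "<name>Harbor Inn</name>"
    "</hotel></hotelAvailability>"
)

for label, doc in (("complete", complete), ("incomplete", incomplete)):
    missing = missing_paths(doc, REQUIRED_PATHS)
    if missing:
        print(label, "skip transformation, missing:", missing)
    else:
        print(label, "safe to transform")
```

Note that this only confirms that each path matches at least one node. A stricter check would test every hotel element individually.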