How ColdFusion Parses XML | Inside ColdFusion MX

In the preceding chapter, we talked briefly about how to parse XML with ColdFusion, but we did not give you much background on how ColdFusion handles XML documents internally. This section explains what ColdFusion does with XML behind the scenes, so that you can use XML more effectively in ColdFusion and avoid common problems.

ColdFusion MX currently uses open-source XML tools from the Apache XML Project (http://xml.apache.org). It uses the Crimson XML parser for Java, and it uses Xalan-Java as its XSLT processor. ColdFusion reaches both through Sun's Java API for XML Processing (JAXP), whose parser interfaces live in the javax.xml.parsers package.

The Crimson XML parser supports two different styles of XML parsing: the Simple API for XML (SAX) and the Document Object Model (DOM).

SAX and DOM were created so that programmers could work with XML without writing their own parser in their programming language of choice. Both serve the same purpose: giving you access to the information stored in XML documents from any programming language that has a parser. However, they take very different approaches to exposing that information. ColdFusion MX uses the DOM model to work with XML, but it is worth understanding both models. Doing so shows how XML parsing works in general and why ColdFusion uses DOM. We also cover SAX because there are times when you, as a developer, might need to use it instead of DOM.

SAX

SAX's approach to parsing documents is very different from DOM's. SAX chooses to give you access to the information in your XML document not as a tree of nodes (like we saw in Chapter 19) but as a sequence of events. Although SAX's event-based parsing makes it very fast, it also creates some problems for developers:

You must create a custom object model.
You must create a class that listens to SAX events and properly creates your object model.

SAX is very simple; it doesn't expect the parser to do much. All SAX requires is that the parser read the XML document and fire events as it encounters tags and other content. You are responsible for interpreting these events by writing an XML document handler class. That class makes sense of the tag events and creates objects in your own object model. So you have to write the following:

Your custom object model to "hold" all the information in your XML document
A document handler that listens to SAX events (which are generated by the SAX parser as it's reading your XML document) and makes sense of these events to create objects in your custom object model

SAX can be really fast at runtime if your object model is simple. In most cases it is faster than DOM, and it needs less memory, because it never builds a tree-based model of the entire document. On the other hand, you do have to write a SAX document handler to interpret all the SAX events, which can be a lot of work.

SAX fires an event for every start tag and every end tag. It also fires character-data events for the text in PCDATA and CDATA sections. Your document handler listens for these events, interprets them in some meaningful way, and builds your custom object model from them. The sequence in which the events are fired is very important. SAX can also report processing instructions, DTD declarations, comments, and so on. The idea is the same: your handler has to make sense of each event and of the order in which the events arrive.

ColdFusion has no built-in support for a SAX parser, but you can work with one either through CFOBJECT or by writing a CFX tag. You might need a SAX parser when your XML documents are many megabytes in size or when performance and speed are critical.
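The following minimal Java sketch makes the two SAX pieces concrete. Java fits here because ColdFusion MX runs on Java, and a CFX tag or a class called through CFOBJECT would be written in it. SaxUser is a hypothetical custom object model and SaxUserHandler is a hypothetical document handler; only SAXParserFactory (JAXP) and DefaultHandler (SAX) are real library classes. The sketch runs on a modern JDK, whose bundled parser is not Crimson, and it leaves out any CFX or CFOBJECT wiring.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical custom object model: one plain object per <sysUser> element.
class SaxUser {
    String status, userName, firstName, lastName;
}

// Hypothetical document handler: interprets SAX events, in order, to build SaxUser objects.
class SaxUserHandler extends DefaultHandler {
    final List<SaxUser> users = new ArrayList<>();
    private SaxUser current;                                // user being built, or null
    private final StringBuilder text = new StringBuilder(); // character data seen so far

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("sysUser")) {
            current = new SaxUser();
            current.status = atts.getValue("Status");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                     // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        text.setLength(0);
        if (current == null) {
            return;                                         // e.g. </Users>: no user is open
        }
        switch (qName) {
            case "UserName":  current.userName = value; break;
            case "FirstName": current.firstName = value; break;
            case "LastName":  current.lastName = value; break;
            case "sysUser":   users.add(current); current = null; break;
            default:          break;
        }
    }
}

class SaxUsersDemo {
    // Same users as Listing 20.1, written more compactly.
    static final String USERS_XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<Users>\n"
            + "  <sysUser Status=\"Active\"><UserName>robis</UserName>"
            + "<FirstName>Robi</FirstName><LastName>Sen</LastName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>lieage1</UserName>"
            + "<FirstName>Dan</FirstName><LastName>Hahn</LastName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>jgull</UserName>"
            + "<FirstName>Jenghis</FirstName><LastName>Kat</LastName></sysUser>\n"
            + "</Users>\n";

    static List<SaxUser> parseUsers(String xml) {
        try {
            SaxUserHandler handler = new SaxUserHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.users;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        for (SaxUser u : parseUsers(USERS_XML)) {
            System.out.println(u.userName + ": " + u.firstName + " " + u.lastName
                    + " (" + u.status + ")");
        }
    }
}
```

The handler depends on event order. A UserName value is attached to the right user only because a sysUser start event arrived before it.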

It's interesting to note that some DOM parser implementations are actually built using a SAX parser.
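JAXP's identity transformer offers one way to see a DOM tree being built from SAX events. Given a SAXSource and an empty DOMResult, it pulls SAX events from a parser and assembles a Document from them. This only illustrates the idea. It does not describe how Crimson or any particular DOM parser is implemented, and SaxToDomDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SaxToDomDemo {
    // The identity Transformer reads SAX events from the source and
    // builds a DOM tree from them inside the DOMResult.
    static Document buildDomFromSax(String xml) {
        try {
            SAXSource source = new SAXSource(new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult();   // no node supplied: a new Document is created
            TransformerFactory.newInstance().newTransformer().transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build DOM from SAX events", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDomFromSax(
                "<Users><sysUser Status=\"Active\"/><sysUser Status=\"Active\"/></Users>");
        System.out.println(doc.getDocumentElement().getNodeName() + " has "
                + doc.getElementsByTagName("sysUser").getLength() + " sysUser elements");
    }
}
```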

DOM

Of the two parsing methods, we are going to focus on DOM the most because ColdFusion MX uses it as a model for its XML document objects. The DOM gives you access to the information stored in your XML document as a hierarchical object model often called a tree. DOM creates a tree of nodes (based on the structure and information in your XML document), and you can access your information by interacting with this tree of nodes. The textual information in your XML document gets turned into a bunch of tree nodes. For example, let's say we have an XML document that contains a list of users from our ICF user database (see Listing 20.1).

Listing 20.1 Users.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <Users>
     <sysUser Status="Active">
         <UserName>robis</UserName>
         <FirstName>Robi</FirstName>
         <LastName>Sen</LastName>
     </sysUser>
     <sysUser Status="Active">
         <UserName>lieage1</UserName>
         <FirstName>Dan</FirstName>
         <LastName>Hahn</LastName>
     </sysUser>
     <sysUser Status="Active">
         <UserName>jgull</UserName>
         <FirstName>Jenghis</FirstName>
         <LastName>Kat</LastName>
     </sysUser>
 </Users>

The DOM would represent this data as shown in Figure 20.1.

Figure 20.1. How an XML document is represented as a DOM tree.


Regardless of the kind of information in your XML document (tabular data, a list of items, binary data, or just a document), DOM creates a tree of nodes when you create a document object from the XML document. Thus, DOM forces you to use a tree model to access the information in your XML document. Because XML is hierarchical in nature, a tree model is a natural way to model XML.

Figure 20.1 is perhaps overly simplistic because, in DOM, each element node actually contains a list of other nodes as its children. These child nodes might be text nodes or other element nodes. At first glance, it might seem unnecessary to get the value of an element node by looking through its list of child nodes (for example, in <FirstName>Robi</FirstName>, Robi is the value). If each element held only a single value, this would indeed be unnecessary. However, an element can contain both text and other elements, which is why DOM makes you do extra work just to get the value of an element node.

When your XML document contains pure data, it might seem appropriate to "lump" each element's content into one string and have DOM return that string as the element's value. That approach does not work well when the content of your XML is a document, because in documents the sequence of elements and text is very important. For pure data (like a database table), the sequence of elements usually does not matter. DOM, however, cannot tell which kind of content it is reading. It therefore preserves the sequence of everything it reads and treats all XML as if it were a document, hence the name Document Object Model.
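The following Java sketch shows both points. It uses the standard JAXP DocumentBuilderFactory and the W3C DOM interfaces; DomChildrenDemo is a hypothetical name. The value Robi lives in a Text node that is the child of the FirstName element. In mixed content, DOM keeps text and element children in document order.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomChildrenDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Lists an element's child nodes in document order, labeling each by node type.
    static String describeChildren(Element parent) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append("text[").append(n.getNodeValue()).append(']');
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("element[").append(n.getNodeName()).append(']');
            } else {
                sb.append("other[").append(n.getNodeName()).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The element's value is a child Text node, not a property of the element.
        System.out.println(describeChildren(parse("<FirstName>Robi</FirstName>").getDocumentElement()));
        // Mixed content: text and elements keep their original order.
        System.out.println(describeChildren(parse("<p>Hello <b>world</b>!</p>").getDocumentElement()));
    }
}
```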

ColdFusion represents its XML document object in the same form as the DOM tree. If we run the script in Listing 20.2 from a browser (Internet Explorer 5.0 or above, or Netscape 6.0), we can view the XML document object.

Listing 20.2 Code to View the XML Document Object

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
     <title>Create USER XML</title>
 </head>
 <body>
 <cfquery datasource="icf" name="getUsers">
 SELECT User.UserID, User.UserName, User.UserPassword, User.UserFirstName,
 User.UserLastName, User.UserEmail, User.UserStatus, User.UserLevelID
 FROM User
 </cfquery>
 <cfxml variable="Users"><Users><cfoutput query="getUsers">
     <sysUser Status="#Trim(XMLFormat(getUsers.UserStatus))#">
         <UserName>#Trim(XMLFormat(getUsers.UserName))#</UserName>
         <FirstName>#Trim(XMLFormat(getUsers.UserFirstName))#</FirstName>
         <LastName>#Trim(XMLFormat(getUsers.UserLastName))#</LastName>
     </sysUser>
     </cfoutput>
 </Users>
 </cfxml>
 <p>This is a simple XML document that's been generated by the ColdFusion code.</p>
 <cfdump var="#Users#">
 </body>
 </html>
 <cffile action="write"
     file="#ExpandPath(".")#\users.xml"
     output="#ToString(Users)#">

In Figure 20.2, you can see how the structure displayed by cfdump mirrors the DOM tree of the XML document.

Figure 20.2. The ColdFusion XML document object.


Working with the XML Document Object

In the first section of this chapter, we actually built several XML documents and used some of ColdFusion's built-in functions to parse and access data in the XML document object created by ColdFusion without really knowing how the XML document was modeled in ColdFusion. Now that we have an understanding of the DOM and know that ColdFusion represents the DOM tree created by Crimson as a ColdFusion structure, we can use that knowledge to do some more sophisticated manipulation of XML documents in ColdFusion.

In the preceding chapter, we looked at how to extract data from a ColdFusion XML document structure, but there are more ways to do it than we showed you. Let's look at some different methods of accessing the same data in an XML document.

There, we accessed a specific node in an XML document using this sort of notation:

 myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren[1].XmlText

This statement references the very first username in the Users.xml document. In ColdFusion, however, there is more than one way to get the same information. Let's look at the example in Listing 20.3.

Listing 20.3 usersparse.cfm

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
     <title>Parse XML file users.xml Chapter 20:Listing 20.3</title>
 </head>
 <body>
 <cffile action="read"
     file="#ExpandPath(".")#\users.xml"
     variable="XMLFileText">
 <cfset myXMLDocument=XmlParse(XMLFileText)>
 <cfoutput>
 #myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren[1].XmlText#<br>
 #myXMLDocument.Users.sysUser[1].UserName.XmlText#<br>
 #myXMLDocument.Users.sysUser[1]["UserName"].XmlText#<br>
 #myXMLDocument["Users"].sysUser[1]["UserName"].XmlText#<br>
 #myXMLDocument.XmlRoot.sysUser[1].XmlChildren[1]["XmlText"]#<br>
 </cfoutput>
 </body>
 </html>

In this example, we read a file in and use the XMLParse function to parse the XML and turn it into a ColdFusion XML document object. We then use five different methods of accessing the username in the Users.xml document! ColdFusion lets you reference nodes in several different ways, as follows:

You can use an array index to specify one of several elements with the same name (for example, myXMLDocument.Users.sysUser[1], or myXMLDocument.Users.sysUser[2] to get the second user).
You can refer to all elements with a specific name by leaving off the array index. For example, myXMLDocument.Users.sysUser refers to an array of all three sysUser elements.
You can use the XmlChildren array to specify an element without using its name (for example, myXMLDocument.XmlRoot.XmlChildren[1]).
You can use associative array (bracket) notation to specify an element name that contains a period or colon (for example, myotherdoc.XmlRoot["Type1.Case1"]).
You can also use the DOM methods in place of specific structure entry names.
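For comparison with the ColdFusion notations above, the following Java sketch reaches the same username through the raw DOM API. The helper names are hypothetical. Three differences are worth noting:

- DOM's getChildNodes() also returns the whitespace text nodes between elements, so the childElement helper skips non-element nodes. As far as we know, ColdFusion's XmlChildren array holds only element nodes.
- A DOM NodeList is indexed from 0, whereas ColdFusion arrays start at 1.
- getElementsByTagName searches all descendants, not only direct children the way a dotted ColdFusion name does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomAccessDemo {
    static final String USERS_XML = "<Users>\n"
            + "  <sysUser Status=\"Active\"><UserName>robis</UserName>"
            + "<FirstName>Robi</FirstName><LastName>Sen</LastName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>lieage1</UserName>"
            + "<FirstName>Dan</FirstName><LastName>Hahn</LastName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>jgull</UserName>"
            + "<FirstName>Jenghis</FirstName><LastName>Kat</LastName></sysUser>\n"
            + "</Users>\n";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Rough analogue of XmlChildren[n]: the n-th (1-based) child element,
    // skipping the whitespace text nodes that getChildNodes() would also return.
    static Element childElement(Node parent, int n) {
        int seen = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && ++seen == n) {
                return (Element) c;
            }
        }
        return null;
    }

    // Like myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren[1].XmlText
    static String viaChildren(String xml) {
        Element root = parse(xml).getDocumentElement();          // XmlRoot
        return childElement(childElement(root, 1), 1).getTextContent();
    }

    // Like myXMLDocument.Users.sysUser[cfIndex].UserName.XmlText
    static String viaNames(String xml, int cfIndex) {
        NodeList users = parse(xml).getElementsByTagName("sysUser");
        Element user = (Element) users.item(cfIndex - 1);        // NodeList starts at 0
        return user.getElementsByTagName("UserName").item(0).getTextContent();
    }

    // Like ArrayLen(myXMLDocument.Users.sysUser)
    static int countUsers(String xml) {
        return parse(xml).getElementsByTagName("sysUser").getLength();
    }

    public static void main(String[] args) {
        System.out.println(viaChildren(USERS_XML) + " / " + viaNames(USERS_XML, 2)
                + " / " + countUsers(USERS_XML) + " users");
    }
}
```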

You might be wondering which method of getting at XML document data is best, but for the most part it's a matter of preference. In some cases, such as when an element name contains a period or a colon, you have to use a specific notation to reach your data. In most cases, the syntax we introduced in Chapter 19, in the section "XML Syntax Rules," is the best because it causes the fewest problems, although it can also be the least obvious syntax.

In some cases, you will need to preserve the case of an XML document. For example, an XML document might contain both a <Users></Users> tag set and a <users></users> tag set. As we learned in the preceding chapter, this is completely valid because XML is case-sensitive. ColdFusion, however, is not. You might therefore need to use the CASESENSITIVE="TRUE" attribute of the cfxml tag, or pass True as the second argument to the XmlNew or XmlParse function, when you create your ColdFusion XML document object.
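The underlying DOM parser always matches element names case-sensitively, which is why the two tag sets are different elements. The following small Java sketch counts elements by exact name; CaseDemo and its sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CaseDemo {
    // One <Users> element and two <users> elements: three distinct-by-case names.
    static final String MIXED_CASE_XML =
            "<Data><Users>A</Users><users>b</users><users>c</users></Data>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // getElementsByTagName compares names exactly, including case.
    static int countTag(String xml, String tagName) {
        return parse(xml).getElementsByTagName(tagName).getLength();
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"Users", "users", "USERS"}) {
            System.out.println(tag + ": " + countTag(MIXED_CASE_XML, tag));
        }
    }
}
```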

Now that you understand how to access data in the XML document object, let's look at how we can modify our XML document objects.

Adding, Deleting, and Modifying XML Elements

Often when you are working with an XML document, you might want to change or edit the document in some form. You could just create a whole new document, but this is not efficient. And if you are using an XML document object to store persistent data such as global application settings, creating a whole new document would not be useful.

ColdFusion MX offers a variety of ways to manipulate XML document objects. Let's first look at how to add data to an existing XML document, using the techniques you have already learned for finding and matching elements in an XML document object. This builds on your knowledge of structures and arrays from Chapter 7, "Complex Data Types." When working with XML document objects, you will mostly use standard array and structure functions to add, edit, or delete data and elements. Much of the syntax will already be familiar to you, apart from a few XML-specific functions.

Adding Elements

Let's say we want to add an XML comment to the first child element in the XML document. We could do this with the following statement inside a CFSCRIPT block:

 myXMLDocument.XmlRoot.XmlChildren[1].XmlComment = "This is a comment";

Try changing Listing 20.2 to look like the following.

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 <html>
 <head>
     <title>Create USER XML</title>
 </head>
 <body>
 <cfquery datasource="icf" name="getUsers">
 SELECT User.UserID, User.UserName, User.UserPassword, User.UserFirstName,
 User.UserLastName, User.UserEmail, User.UserStatus, User.UserLevelID
 FROM User
 </cfquery>
 <cfxml variable="myXMLDocument"><Users><cfoutput query="getUsers">
     <sysUser Status="#Trim(XMLFormat(getUsers.UserStatus))#">
         <UserName>#Trim(XMLFormat(getUsers.UserName))#</UserName>
         <FirstName>#Trim(XMLFormat(getUsers.UserFirstName))#</FirstName>
         <LastName>#Trim(XMLFormat(getUsers.UserLastName))#</LastName>
     </sysUser>
     </cfoutput>
 </Users>
 </cfxml>
 <p>This is a simple XML document that's been generated by the ColdFusion code.</p>
 <cfscript>
 myXMLDocument.XmlRoot.XmlChildren[1].XmlComment = "This is a comment";
 </cfscript>
 <cfdump var="#myXMLDocument#">
 </body>
 </html>
 <cffile action="write"
     file="#ExpandPath(".")#\users.xml"
     output="#ToString(myXMLDocument)#">

Notice how we added the CFSCRIPT block with the following code:

 myXMLDocument.XmlRoot.XmlChildren[1].XmlComment = "This is a comment";

View the page in your browser. When you see the results of the CFDUMP, click on the "version" link to see an expanded display of the XML document object. You will notice that your XML document has been changed to include an XML comment, and if you write the new XML document to a file, you will see the comment inside the file as well. In general, to add new data to an XML document object, you follow this same form. First, use the access methods you have already learned to point to a specific node (in our example, myXMLDocument.XmlRoot.XmlChildren[1]). Then follow it with an element structure key name or XML property name (in this example, XmlComment). To add a new element rather than a property value, you assign the result of the XmlElemNew function instead of a string.
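At the DOM level, a comment is a node of its own. The following Java sketch appends a Comment node inside the first sysUser element and serializes the document with the JAXP identity transformer; CommentDemo is a hypothetical name. We are assuming, not asserting, that ColdFusion's XmlComment property ends up as comment nodes like this one when the document is written out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class CommentDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Writes the DOM tree back out as XML text.
    static String serialize(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    // Appends a comment node as the last child of the first sysUser element.
    static String addComment(String xml, String comment) {
        Document doc = parse(xml);
        Element firstUser = (Element) doc.getElementsByTagName("sysUser").item(0);
        firstUser.appendChild(doc.createComment(comment));
        return serialize(doc);
    }

    public static void main(String[] args) {
        System.out.println(addComment(
                "<Users><sysUser><UserName>robis</UserName></sysUser></Users>",
                "This is a comment"));
    }
}
```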

You must be careful, though, to point to the right node. If you leave off a numeric index, as in myXMLDocument.XmlRoot.XmlChildren, ColdFusion matches the expression to the first node in the list (in ColdFusion's case, the first element of the XmlChildren array) and assigns your expression to it.

For example, if myXMLDocument.XmlRoot.XmlChildren[1] and myXMLDocument.XmlRoot.XmlChildren[2] exist, the following expression replaces myXMLDocument.XmlRoot.XmlChildren[1] with a new element named neel:

 myXMLDocument.XmlRoot.XmlChildren = XmlElemNew(myXMLDocument, "neel");

If you simply want to create a new element in the XML document structure, you can do something like this:

 myXMLDocument.XmlRoot.XmlChildren[1].UserEmail = XmlElemNew(myXMLDocument, "UserEmail");

This makes ColdFusion create a new element in the XML document object, because whenever ColdFusion encounters an element name in an expression that does not match an existing element, it creates a new element. Be careful, though: simply misspelling an element name can create unwanted elements. Also, if the element name in the expression does not match the name you pass to XmlElemNew, you will get an error. Therefore, something like the following causes an error:

 myXMLDocument.XmlRoot.XmlChildren[1].User_Email =XmlElemNew(myXMLDocument, "UsersEmail");

When you are adding new elements to an existing XML document object, you need to be careful not to generate errors this way.

Another nice feature when adding elements to an existing XML document object is that ColdFusion can build parent tags for you. If you write an expression like the following:

 myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren[1].workPhone = XmlElemNew(myXMLDocument, "workPhone");

ColdFusion automatically creates the needed parent nodes and creates the child node as well.

So far, all the methods we have used to add or edit elements in an XML document object have used CFSCRIPT, but you can do the same thing with functions (see Appendix B, "Function Reference") in ordinary CFML tags. You do this by using array functions to insert or append elements to the XML document object. For example, to add a new userPhone element to the first sysUser element, we can use the following:

 <cfset ArrayAppend(myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren, XmlElemNew(myXMLDocument, "userPhone"))>

You will note that this is more or less the same as the cfscript version, if not as elegant. Now let's say we actually want to insert a new element into our XML object called securityLevel as a child of Users. We can use the following:

 <cfset ArrayInsertAt(myXMLDocument.XmlRoot.XmlChildren, 1,  XmlElemNew(myXMLDocument,"securityLevel"))>

This element is inserted into the first position of the root element's XmlChildren array, which pushes the first sysUser element into second position, the second sysUser element into third, and so on.

Note that the syntax in this instance is parentElement.XmlChildren. You must use this syntax when you are adding a new element to the array of elements.
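In raw DOM terms, ArrayAppend on parentElement.XmlChildren roughly corresponds to appendChild. ArrayInsertAt at position 1 roughly corresponds to insertBefore the parent's first child. The following Java sketch adds a userPhone element at the end of the first sysUser element and a securityLevel element at the front. The class name is hypothetical; the calls are standard JAXP and W3C DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AppendInsertDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Comma-separated names of the parent's element children, in order.
    static String childNames(Element parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(c.getNodeName());
            }
        }
        return sb.toString();
    }

    static String appendAndInsert(String xml) {
        Document doc = parse(xml);
        Element firstUser = (Element) doc.getElementsByTagName("sysUser").item(0);
        // Roughly ArrayAppend(firstUser.XmlChildren, XmlElemNew(doc, "userPhone"))
        firstUser.appendChild(doc.createElement("userPhone"));
        // Roughly ArrayInsertAt(firstUser.XmlChildren, 1, XmlElemNew(doc, "securityLevel"))
        firstUser.insertBefore(doc.createElement("securityLevel"), firstUser.getFirstChild());
        return childNames(firstUser);
    }

    public static void main(String[] args) {
        System.out.println(appendAndInsert("<Users><sysUser><UserName>robis</UserName>"
                + "<FirstName>Robi</FirstName><LastName>Sen</LastName></sysUser></Users>"));
    }
}
```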

If you have multiple child elements with the same name and you want to insert a new element in a specific position, use the XmlChildPos function to determine the location in the XmlChildren array where you want to insert the new element. For example, the following code determines the location of mydoc.employee.name[1] and inserts a new name element as the second name element:

 <cfscript>
 nameIndex = XmlChildPos(mydoc.employee, "name", 1);
 ArrayInsertAt(mydoc.employee.XmlChildren, nameIndex + 1, XmlElemNew(mydoc, "name"));
 </cfscript>
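Plain DOM has no XmlChildPos function. The following Java sketch therefore writes hypothetical childPos and insertAt helpers that imitate XmlChildPos and ArrayInsertAt on an XmlChildren-style list of element children, using 1-based positions and skipping whitespace text nodes. It reproduces the mydoc.employee example: find the first name element and insert a new name element after it. Returning -1 when nothing matches is a choice made for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ChildPosDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Hypothetical analogue of XmlChildPos(parent, name, n): 1-based position, among
    // the parent's element children, of the n-th child called name; -1 if none.
    static int childPos(Element parent, String name, int n) {
        int pos = 0, seen = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            pos++;
            if (c.getNodeName().equals(name) && ++seen == n) {
                return pos;
            }
        }
        return -1;
    }

    // Hypothetical analogue of ArrayInsertAt(parent.XmlChildren, pos, child).
    static void insertAt(Element parent, int pos, Element child) {
        int i = 0;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && ++i == pos) {
                parent.insertBefore(child, c);
                return;
            }
        }
        parent.appendChild(child);   // pos is past the last element child
    }

    // Inserts a new <name> right after the first one; returns the element texts in order.
    static String insertSecondName(String xml, String newName) {
        Document doc = parse(xml);
        Element employee = doc.getDocumentElement();
        int nameIndex = childPos(employee, "name", 1);
        Element name = doc.createElement("name");
        name.setTextContent(newName);
        insertAt(employee, nameIndex + 1, name);
        StringBuilder sb = new StringBuilder();
        for (Node c = employee.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                sb.append(sb.length() > 0 ? "," : "").append(c.getTextContent());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertSecondName("<employee><name>Anna</name><name>Bo</name></employee>", "New"));
    }
}
```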

You can also change elements using the same techniques. Let's say we want to make the second sysUser element inactive. To do this, we can change its Status attribute just as we would change any data in a ColdFusion structure, using the following:

 <cfset StructInsert(myXMLDocument.XmlRoot.sysUser[2].XmlAttributes, "Status", "Inactive", true)>

Or you could change the attribute's value this way:

 <cfset myXMLDocument.XmlRoot.sysUser[2].XmlAttributes.Status = "Inactive">

This is more like a standard direct-assignment expression.
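The DOM counterpart of assigning to XmlAttributes.Status is Element.setAttribute. It overwrites an existing attribute or creates a missing one, so it needs no separate "allow overwrite" flag. The following Java sketch deactivates the second user, counting from 1 as ColdFusion does. AttributeDemo is a hypothetical name, and the embedded data repeats Listing 20.1.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AttributeDemo {
    static final String USERS_XML = "<Users>"
            + "<sysUser Status=\"Active\"><UserName>robis</UserName></sysUser>"
            + "<sysUser Status=\"Active\"><UserName>lieage1</UserName></sysUser>"
            + "<sysUser Status=\"Active\"><UserName>jgull</UserName></sysUser>"
            + "</Users>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Sets Status on the cfIndex-th sysUser (1-based) and returns every user's Status.
    static String setStatus(String xml, int cfIndex, String status) {
        NodeList users = parse(xml).getElementsByTagName("sysUser");
        ((Element) users.item(cfIndex - 1)).setAttribute("Status", status); // overwrites
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < users.getLength(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(((Element) users.item(i)).getAttribute("Status"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(setStatus(USERS_XML, 2, "Inactive"));
    }
}
```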

Deleting Elements

Just as there are many ways to add elements to an XML document object, there are also many approaches to deleting XML elements from an object.

For example, if we need to delete a specific element from an XML document object, we can use the ArrayDeleteAt function just like you would with any other array.

Let's say we want to delete the LastName element from the first sysUser element in our Users.xml object. Because LastName is the third child of sysUser, the following line deletes it:

 <cfset ArrayDeleteAt(myXMLDocument.XmlRoot.XmlChildren[1].XmlChildren, 3)>

Sometimes you will need to determine an element's position in the XmlChildren array dynamically, such as in a shopping cart where people can delete specific line items. You can do this with the XmlChildPos function, which works as its name suggests: it returns the index of a specific child element in its parent's XmlChildren array. For example, to delete the second sysUser element, we can do the following:

 <cfset indexval = XmlChildPos(myXMLDocument.XmlRoot, "sysUser", 2)>
 <cfset ArrayDeleteAt(myXMLDocument.XmlRoot.XmlChildren, indexval)>
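In DOM, deleting an element means asking its parent to remove it. The following Java sketch finds the second sysUser element, as the XmlChildPos call above does, and removes it with removeChild. DeleteUserDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DeleteUserDemo {
    static final String USERS_XML = "<Users>\n"
            + "  <sysUser Status=\"Active\"><UserName>robis</UserName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>lieage1</UserName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>jgull</UserName></sysUser>\n"
            + "</Users>\n";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Comma-separated UserName values still in the document.
    static String userNames(Document doc) {
        NodeList names = doc.getElementsByTagName("UserName");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.getLength(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(names.item(i).getTextContent());
        }
        return sb.toString();
    }

    // Removes the cfIndex-th sysUser (1-based) and reports who is left.
    static String deleteUser(String xml, int cfIndex) {
        Document doc = parse(xml);
        Node user = doc.getElementsByTagName("sysUser").item(cfIndex - 1);
        user.getParentNode().removeChild(user);   // roughly ArrayDeleteAt(XmlRoot.XmlChildren, indexval)
        return userNames(doc);
    }

    public static void main(String[] args) {
        System.out.println(deleteUser(USERS_XML, 2));
    }
}
```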

If you want to delete multiple children with the same name, you can use the StructDelete or ArrayClear function with the element name. This deletes all of the parent's child elements that have that name.

If you want to get rid of all your sysUser elements as well as their children, you can use the following:

 <cfset StructDelete(myXMLDocument.Users, "sysUser")>

Or, using the array version, you can use this statement:

 <cfset ArrayClear(myXMLDocument.Users.sysUser)>

Once again, this is very similar to how you would delete data from a regular structure or array.
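Deleting every child with a given name takes a little care in raw DOM, because the NodeList returned by getElementsByTagName or getChildNodes is live. If you remove nodes while walking the list forward by index, the remaining items shift and some are skipped. The following Java sketch avoids this by collecting the matching sysUser children first and then removing them, which is roughly what StructDelete(myXMLDocument.Users, "sysUser") does for you. DeleteAllDemo is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DeleteAllDemo {
    static final String USERS_XML = "<Users>\n"
            + "  <sysUser Status=\"Active\"><UserName>robis</UserName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>lieage1</UserName></sysUser>\n"
            + "  <sysUser Status=\"Active\"><UserName>jgull</UserName></sysUser>\n"
            + "</Users>\n";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes every direct child element of parent called name; returns how many were removed.
    static int removeChildrenNamed(Element parent, String name) {
        List<Node> doomed = new ArrayList<>();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
                doomed.add(c);            // collect first; don't mutate while iterating
            }
        }
        for (Node c : doomed) {
            parent.removeChild(c);
        }
        return doomed.size();
    }

    // How many elements called name remain after removing them from the root.
    static int remainingAfterRemoval(String xml, String name) {
        Document doc = parse(xml);
        removeChildrenNamed(doc.getDocumentElement(), name);
        return doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) {
        System.out.println("remaining sysUser elements: " + remainingAfterRemoval(USERS_XML, "sysUser"));
    }
}
```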

You can also use this same method to delete an attribute. For example:

 <cfset StructDelete(myXMLDocument.Users.sysUser[1].XmlAttributes, "Status")>
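The DOM counterpart of StructDelete on XmlAttributes is Element.removeAttribute, which has no effect if the attribute is already absent. The following Java sketch removes Status from the first user. Note that DOM's getAttribute then returns an empty string rather than null. RemoveAttributeDemo is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemoveAttributeDemo {
    static final String USERS_XML = "<Users>"
            + "<sysUser Status=\"Active\"><UserName>robis</UserName></sysUser>"
            + "<sysUser Status=\"Active\"><UserName>lieage1</UserName></sysUser>"
            + "<sysUser Status=\"Active\"><UserName>jgull</UserName></sysUser>"
            + "</Users>";

    static NodeList parseUsers(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("sysUser");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes Status from the cfIndex-th sysUser (1-based); returns every user's Status.
    static String statusesAfterRemoval(String xml, int cfIndex) {
        NodeList users = parseUsers(xml);
        ((Element) users.item(cfIndex - 1)).removeAttribute("Status");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < users.getLength(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(((Element) users.item(i)).getAttribute("Status")); // "" when absent
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(statusesAfterRemoval(USERS_XML, 1));
    }
}
```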

By using these same techniques, you can edit, delete, or add any data in an XML document object, whether it is an XmlComment (like the one we added at the beginning of this section) or an attribute.