XML Processing | REALbasic Cross-Platform Application Development

I have gotten a little ahead of myself and started talking about XML document encoding before saying anything about XML documents themselves. There is not enough space to devote to a full tutorial on XML, but I'd like to take a quick walk-through just in case you are entirely unfamiliar with it.

An XML document is made up of nodes and organized in the form of a tree. There are several kind of nodes, which I will outline next.

Document Node

The Document node is the node that represents the document itself. It doesn't actually appear in a document; it's more of an abstraction, but it becomes relevant when processing XML using REALbasic because you will instantiate a node that is a reference to the entire document. An XML document node generally contains two nodes: the XML declaration that appears at the top of the page and the root document node that contains all the remaining nodes.

XML DeclarationThis node is required and is the first node in the file. It labels the file as an XML file, says what version of XML it is, and specifies the encoding that is required if it is not in UTF-8 or UTF-16. This is technically a processing instruction.
```
<?xml version="1.0" encoding="UTF-8"?>
```
Element NodeThe element node is the primary node used. It is the kind of node that serves as the document root, and it is the node that organizes the document into a tree because it can contain additional nodes of different types. An element node is expressed in XML in terms of tags. If the element does not have any child elements, it can be expressed using this format:
```
<ChildlessNode/> 
```
Although childless elements are not rare, elements with children are definitely more common, and they are defined using two tags, a start tag and an end tag, like so:
```
<Node>...</Node>
```
Attribute NodeAttribute nodes are used to provide additional information about individual elements. They look like this:
```
<Node FirstAttr="1" SecondAttr="2">...</Node>
```
Processing InstructionA processing instruction is a node that is intended to be used by some application while parsing a file. It contains an instruction of some sort and looks like this:
```
<?Processing instruction?>
```
Comment NodeComments do not appear in any visual renderings of an XML document and are intended to communicate information to individuals working with the raw XML files:
```
<!--Last changed by Mark Choate, 10.3.2005 
```
```
<Node>This is some text.</Node>
```
CDATA SectionA CDATA section contains text, but it is not parsed, which means that it can contain XML markup that will be ignored by the processor:
```
<Node><!CDATA[This text will <b>not</b> be parsed]]></Node>
```
DOCTYPE declarationThis is an optional node that follows the XML declaration. It declares the type of document that it is, as defined in a Document Type Declaration (DTD), which can be either in a separate document or embedded in the XML page. The DTD defines the names of legal elements, what their attributes should be, and so on, and can be used to validate a document.
```
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

The heavy lifting is done by the element node and associated attribute nodes, plus text nodes. Now that you know the basic structure of an XML document, the next step is to figure out what to do with them. There are two base classes that can be used to process XML documents using REALbasic, XMLReader and XMLDocument. For those of you coming from the Java world, XMLReader works like SAX and XMLDocument like DOM. If this is all new to you, a little explanation is warranted.

The XMLDocument class parses and loads the XML document into memory. It preserves the tree structure of the document by instantiating the elements and attributes as subclasses of the XMLNode class, which is why it is called DOM, which stands for Document Object Model; it's an object model in memory of the XML document itself.

XMLReader reads through the XML document and fires an event whenever a new node is encountered. To use the XMLReader class, you will need to decide what you want to do when a particular event is encountered and implement the events just like any other event you encounter in REALbasic.

One question many people have relates to which is bettershould you use XMLReader or XMLDocument? The general rule when choosing XMLReader and XMLDocument is often given as this: For very large documents use XMLReader; for smaller ones use XMLDocument. The thinking behind this is that because XMLDocument loads an entire document into memory, it can be unwieldy with large documents. In a general sense I suppose this is true, but a better determinant is what you plan to do with the XML document. If you want to open an XML document, manipulate it, or transform it into a different format, you need to use XMLDocument. Not only does the object model make it easy to traverse nodes, it also provides access to XSLT and XPATH queries.

XMLReader, on the other hand, is not very good at transforming one XMLDocument to another. Because it is event driven, your program will need to maintain state as the file is parsed in order to understand the context of particular elements. If the destination XML document format is structured differently from the source document, it can become quite clumsy because you may have to cache a lot of data until you have the information you need to write the XML document out into XML format. XMLDocument, because of its tree structure, provides nonsequential access to the nodes of the document. In XMLReader, you get them one at a time and have to decide what to do with them at that point.

With the RSSReader application, I will be taking XML documents in different formats and using them to create an in-memory representation of the information in each document. I will use the mwNode class as the base class of my object model, and all the different flavors of RSS will get mapped into this object model. Considering the guidelines I just shared, this means that XMLReader is an ideal tool for this process, despite the fact that RSS files are small. You can do it with XMLDocument, too, and I'll re-create the process using it after showing the XMLReader version.

XMLDocument

The XMLDocument class is based on W3C's Document Object Model (DOM) API, although it is not a complete implementation. Nevertheless, that it is related to DOM will explain some of the quirkiness of the API itself, which often seems to try to do easy things the hard way. The reason is that the DOM API is intended for implementation in many programming languages. Because it had to be implemented in many programming languages, it is not exactly optimized for any particular language.

The XMLDocument class itself is the best example. It is a subclass of XMLNode, but it is the base class you will always use when you want to take a DOM-like approach. It is a factory class in the sense that you will use an instance of XMLDocument to create the elements of your XML document rather than using the more typical New operator. Here's an example: You can instantiate an XMLDocument object in three ways: either without any parameters, with a String representation of an XML document, or with a FolderItem that represents an XML file. In this example, I will create a simple XML document using the XMLDocument class:

Dim xDoc as XMLDocument Dim tmpNode as XMLElement xDoc = New XMLDocument() tmpNode = xDoc.AppendChild(xDoc.CreateElement("html"))

This shows the typical way that elements and other XML nodes are created. With the exception of XMLDocument itself, the other nodes are created by calling the appropriate methods on XMLDocument, which in this case was CreateElement. You will need to create elements this way because one of the properties that an element and other nodes must have is a reference to the OwnerDocument. Creating the instance using the xDoc.CreateElement method does that for you and associates the child element with this particular document.

Although XMLDocument is an XMLNode, it is more of an abstraction than an actual node because it doesn't actually appear anywhere when you convert the object model into a string. In the previous example, the first element I created was an html element, which will serve as the root element for the document. The XMLDocument.DocumentElement property is a reference to this root element (and it must be an element, and not some other node type).

If I were to call xDoc.toString on this example, I would find that the document actually has two nodes: the XML declaration and the html node:

<?xml version="1.0" encoding="UTF-8"?> <html/>

Even though it is not visibly present, the XMLDocument node can be thought of as the node that contains both the XML declaration and the root element itself.

In this example, I assume the presence of a FolderItem pointing to an atom XML file (see the example in the XMLReader section):

xDoc = New XMLDocument(anXmlFile as FolderItem)

You now have a fully populated document. The way to traverse and manipulate this document is by using the XMLNode class, which I have referred to but have yet to discuss. A document is a collection of nodes, and the parent class for all of these nodes is XMLNode. It provides you with methods to access child nodes, get and set attributes, and similar activities. A summary of properties and methods follows:

XMLNode Properties

When dealing with XMLNodes, you will primarily be dealing with subclasses of XMLNode. This, after all, is one of the benefits of object-oriented programming. Only XMLNode subclasses appear in an XML document, never plain XMLNodes. The role that XMLNode plays is twofold. First, it implements various methods that are shared by all nodesa typical example of code reuse. Second, it serves as an abstract class so that all the other subclasses can be referenced as XMLNodes when you do not know what kind of node to expect. Not knowing what kind of node to expect happens all the time because you usually do not know the specific organization and structure of a document before you encounter it. For example, I may have a body element in an html document. The children of the body element can be XMLElement, XMLComment, XMLText, and so on. I need to be able to get a reference to that child without knowing the specific subclass; because they all share the same parent class, I can reference them as XMLNodes.

One interesting aspect of XML in REALbasic is that there are 13 node types, which is determined by accessing the following property:

XMLNode.Type as Integer

There are also XMLNode subclasses that correspond to some of the types. What's interesting is that only a few node types are also represented by subclasses. Following is the complete list of node types, and the class constants you can use to access them:

Constant	Value
XMLNodeType.Element_Node	1
XMLNodeType.Attribute_Node	2
XMLNodeType.Text_Node	3
XMLNodeType.CData_Section_Node	4
XMLNodeType.Entity_Reference_Node	5
XMLNodeType.Entity_Node	6
XMLNodeType.Processing_Instruction_Node	7
XMLNodeType.Comment_Node	8
XMLNodeType.Document_Node	9
XMLNodeType.Document_Type_Node	10
XMLNodeType.Document_Fragment_Node	11
XMLNodeType.Notation_Node	12
XMLNodeType.Other_Node	13

XMLNode subclasses are

XMLDocument XMLElement XMLComment XMLProcessingInstruction XMLAttribute XMLTextNode XMLCDATASection

There is a correspondence between the node subclasses and the factory methods available to XMLDocument, which are as follows:

XMLDocument.CreateAttribute(name as String) as XMLAttribute XMLDocument.CreateAttribute(name as String, URI as String) as XMLAttribute XMLDocument.CreateElement(name as String) as XMLElement XMLDocument.CreateElement(name as String, URI as String) as XMLElement XMLDocument.CreateComment(data as String) as XMLComment XMLDocument.CreateCDATASection(data as String) as XMLCDATASection XMLDocument.CreateProcessingInstruction(target as String, data as String)   as XMLProcessingInstruction XMLDocument.CreateTextNode(data as String) as XMLTextNode

So, why isn't there an XMLNode subclass for each type of node? I have no idea. In fact, the XMLDocument class limits the amount of control you have over your XML documents. It's an excellent tool for parsing documents and using XPATH to search through, but it is not particularly good at creating XML files. In all fairness, however, this has nothing to do with REALbasic and everything to do with the DOM API. In every other language I'm familiar with, people complain about DOM-like implementations and come up with alternatives that seem much more natural to that particular language. JDOM is a good example from the Java world. It parses XML and creates an object model that works more like a typical Java object, without the factory methods and awkward dealings with the XMLDocument class.

The XMLNode class contains some basic properties shared by the subclasses, although not all subclasses use the properties in the same way. For example, some (but not all) nodes use the Value property:

XMLNode.Value as String

Basically, it's used to hold text for TextNodes and CDATASections. The following cluster of properties deal with namespaces.

XMLNode.Name as String XMLNode.LocalName as String XMLNode.Prefix as String XMLNode.NamespaceURI as String

A namespace is a form of shorthand for XML documents so that you can have unique element names, but also ensure that they are relatively short. To that end, a fully qualified name is a combination of NamespaceURI and LocalName. These usually take on the form of longish URLS, like this:

http://purl.org/atom/ns#type

The first part up to and including the # sign is the NamespaceURI and the last part, type, is the LocalName. Namespaces can be declared by using the xmlns property in an XMLElement. This will come up again shortly, so I will not dwell on the details now, but suffice it to say that when you declare a namespace, you define a prefix that will be used in place of the NamespaceURI. This is why you will see the following in Atom documents:

atom:type

The atom portion is the prefix and the type is the LocalName. Altogether, this makes up the Name of the element. The Name can vary, depending on whether you decide to use namespaces. The fully qualified name I shared before is a Name, too. Because of this, you need to make sure you know what you are looking at and whether it is a Name, a LocalName, or whatever.

Here are some self-explanatory methods for navigating through an XMLDocument. If you plan to do a lot of navigating, you should use XPATH.

XMLNode.Parent as XMLNode XMLNode.OwnerDocument as XMLDocument XMLNode.ChildCount as Integer XMLNode.FirstChild as XMLNode XMLNode.LastChild as XMLNode XMLNode.NextSibling as XMLNode XMLNode.PreviousSibling as XMLNode

You can get a string representation of the file (useful for saving it):

XMLNode.ToString as String

Finally, you can get rather detailed error information using the un-object-oriented LastError property.

XMLNode.LastError as Integer

The language reference provides a comprehensive list of LastError codes.

XMLNode Methods

The following methods provide a means for adding, inserting, deleting, and comparing child nodes as well as getting a reference to them.

XMLNode.Child(index as Integer) as XMLNode XMLNode.AppendChild(NewChild as XMLNode) as XMLNode XMLNode.Insert(NewChild as XMLNode, RefChild as XMLNode)    as XMLNode XMLNode.RemoveChild(OldChild as XMLNode) XMLNode.ReplaceChild(NewChild as XMLNode, OldChild as XMLNode) XMLNode.Clone(deep as Boolean) as XMLNode XMLNode.Compare(NodeToCompare as XMLNode) as Integer

Remember when I mentioned XPATH? It is time to look at it now that the XQL method has been encountered:

XMLNode.XQL(Query as String, [NamespaceMap as String]) as XMLNodeList

XPATH is a query language for XML, much like SQL is a query language for databases. It's big and powerful and complicated (at times), so I cannot take time to give you all you need to know about XPATH to use it, but I can get you started. There are a lot of good books and online resources you can turn to when you are ready for more advanced topics.

XPATH identifies nodes within an XML document using a path, much like the path you use when working with files and folders. This makes sense because both the file system and an XML document are organized like trees. A folder can contain files or other folders, and an element can contain text nodes or other elements, and so on. Not only are the paths used by XPATH conceptually similar to paths used when working with files, they look like them, too (if you look at the world from a Unix perspective, at least).

Here is an RSS file that I will use as an example. It comes from the REALsoftware website and is based on the RDF version of RSS, known as RSS1.0. Note that some of the URLs have been shortened to fit on the printed page.

Listing 6.1. REALsoftware RDF-based version of RSS

<?xml version="1.0" encoding="utf-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"         xmlns="http://purl.org/rss/1.0/">     <channel rdf:about="http://www.realsoftware.com/rss/realsoftware.xml">          <title>REAL Software News</title>          <link>http://www.realsoftware.com</link>          <description>              News and Events for REALbasic Users              </description>          <image rdf:resource="http://www.realsoftware.com/images/rssGraphic.gif" />          <language>en-us</language>          <items>              <rdf:Seq>                  <li rdf:resource="http://www.realsoftware.com/pr_rb553.html" />                  <li rdf:resource="http://www.realsoftware.com/pr_vbwp.html" />                  <li rdf:resource="http://www.realsoftware.com/pr_opp100.html" />                  <li rdf:resource="http://www.realsoftware.com/pr_showcase_launch.html" />                  <li rdf:resource="http://www.realsoftware.com/realbasic/" />                  <li rdf:resource="http://www.realsoftware.com/rb_awards1.html" />                  <li rdf:resource="http://www.realbasic.com/realworld/" />                  <li rdf:resource="http://www.realsoftware.com/macintouch2003.html" />              </rdf:Seq>          </items>     </channel>     <image rdf:about="http://www.realsoftware.com/images/rssGraphic.gif">          <title>REAL Software, Inc.</title>          <url>http://www.realsoftware.com/images/rssGraphic.gif</url>          <link>http://www.realsoftware.com/</link>     </image> <!-- Content Below Here ! -->     <item rdf:about="http://www.realsoftware.com/news/pr/2005/2005r3/lnx/">        <pubDate>Tue, 13 Sep 2005 08:05:00 CDT</pubDate>        <title>REAL Software Ships REALbasic 2005 for Linux;                    Brings Rapid Application Development to Linux                 </title>        <link>http://www.realsoftware.com/news/pr/2005/2005r3/lnx/</link>        <description>REAL Software, Inc., providers of REALbasic, cross-platform that really works, announced today the company is shipping REALbasic 2005 for Linux.  REALbasic 2005 for Linux is a rapid application development (RAD) environment that enables professional and non-professional programmers alike to quickly create software for Linux.          </description>     </item>     <item rdf:about="http://www.realsoftware.com/news/pr/2005/2005r3/mac/">        <pubDate>Tue, 13 Sep 2005 08:05:00 CDT</pubDate>        <title>REALbasic 2005 for Macintosh Release 3 Ships;                 Improves 3D GraphicsCapability and Reliability;                 Helps Bring More Software to the Macintosh </title>     <link>http://www.realsoftware.com/news/pr/2005/2005r3/mac/</link>        <description>REAL Software Inc., providers of REALbasic, cross-platform that really works, announced today that REALbasic 2005 for Macintosh Release 3 is shipping.  REALbasic 2005 for Macintosh is a rapid application development (RAD) environment that enables professional and non-professional programmers alike to quickly create software for the Macintosh, and even Windows and Linux, by clicking a checkbox.</description>     </item>     <item rdf:about="http://www.realsoftware.com/news/pr/2005/2005r3/win/">        <pubDate>Tue, 13 Sep 2005 08:05:00 CDT</pubDate>        <title>REALbasic 2005 for Windows Release 3 Ships; Improves Visual Basic Compatibility, Reliability </title>        <link>http://www.realsoftware.com/news/pr/2005/2005r3/win/</link>        <description>REAL Software, Inc., providers of REALbasic, cross-platform that really works, announced today that REALbasic 2005 for Windows Release 3 is shipping. REALbasic 2005 for Windows is a rapid application development (RAD) environment that enables professional and non-professional programmers alike to quickly create software that runs on Windows, even Macintosh and Linux. REALbasic creates self-contained executables and eliminates the common problems and installation issues associated with DLLs or external frameworks typical of other cross-platform development environments.</description>     </item> </rdf:RDF>

At the most basic level, XPATH deals in element names. That means that if you want to access the channel element, you need to construct an XPATH that identifies the containment path from the root RDF node down to the channel. Also note that you will be using the localName of the element to create the path. The path to use is this:

/RDF/channel

The way this would use this XPATH in REALbasic would be something like this:

Dim xdoc as XMLDocument Dim xnl as XMLNodeList Dim count as Integer Dim aNode as XMLNode xdoc = new XMLDocument(anRSSFolderItem) xnl = xdoc.XQL("/RDF/channel") count = xnl.length //I know...why length? aNode = xnl.Item(0) MsgBox aNode.ToString

Using this path in REALbasic causes an XMLNodeList to be returned that contains the following element:

<channel rdf:about="http://www.realsoftware.com/rss/realsoftware.xml" >   <title>REAL Software News</title>   <link>http://www.realsoftware.com</link>   <description>News and Events for REALbasic Users</description>   <image rdf:resource="http://www.realsoftware.com/images/rssGraphic.gif"/>   <language>en-us</language>   <items>     <rdf:Seq>       <li rdf:resource="http://www.realsoftware.com/pr_rb553.html"/>       <li rdf:resource="http://www.realsoftware.com/pr_vbwp.html"/>       <li rdf:resource="http://www.realsoftware.com/pr_opp100.html"/>       <li rdf:resource="http://www.realsoftware.com/pr_showcase_launch.html"/>       <li rdf:resource="http://www.realsoftware.com/realbasic/" />       <li rdf:resource="http://www.realsoftware.com/rb_awards1.html"/>       <li rdf:resource="http://www.realbasic.com/realworld/"/>       <li rdf:resource="http://www.realsoftware.com/macintouch2003.html"/>     </rdf:Seq>   </items> </channel>

If you were processing this file and you wanted to find out the title of this particular channel, you would need to create an XPATH expression that returned only the channel title. It would look like this:

/RDF/channel/title

The results would be an XMLNodeList with one element:

<title>REAL Software News</title>

Now, suppose you wanted to find the titles for all the items in this RSS feed. You would use the following XPATH expression:

/RDF/item/title

Then you would receive an XMLNodeList with several title elements listed:

[View full width]
<title>REAL Software Ships REALbasic 2005 for Linux; Brings Rapid Application Development  to Linux</title> <title>REALbasic 2005 for Macintosh Release 3 Ships; Improves 3D Graphics Capability and Reliability; Helps Bring More Software to the Macintosh </title> <title>REALbasic 2005 for Windows Release 3 Ships; Improves Visual Basic Compatibility, Reliability </title>

Finally, suppose you want to find the title of anything in the file, whether it's a channel or an item. There's a special XPATH expression for that. If you want to find an element anywhere in a document, regardless of the preceding path, precede the element name with // like so:

//title

Voila! The results include every title mentioned in the document:

[View full width]
<title>REAL Software News</title> <title>REAL Software, Inc.</title> <title>REAL Software Ships REALbasic 2005 for Linux; Brings Rapid Application Development  to Linux</title> <title>REALbasic 2005 for Macintosh Release 3 Ships; Improves 3D Graphics Capability and Reliability; Helps Bring More Software to the Macintosh </title> <title>REALbasic 2005 for Windows Release 3 Ships; Improves Visual Basic Compatibility, Reliability </title>

XML to HTML Transformation

XPATH is closely related to XSLT, which is a process of transforming one XML document into another document with a different format. The relationship comes from the fact that XPATH was developed for use with XSLT, and you will see familiar elements in the following example.

First off, here is an example of an XSLT stylesheet that is used to convert an RSS xml file into html, so that it can be viewed in a browser. This is something that will be quite useful in our RSSReader application. I will not go into excessive detail about how to write a stylesheet, but you should note that a stylesheet consists of a number of templates that get matched to text according to an XPATH expression. The output of that matching is written within the template tag. Often, the apply-templates tag is also called, which simply tells the parser to continue parsing the XML file and looking for other XPATH expressions to evaluate:

Listing 6.2. XSLT style sheet converted to HTML from RSS XML File

<!-- RSSview.xsl: retrieve RSS feed(s) and convert to HTML. --> <!-- Based on XSL distributed with Apache Forrest from the Apache Foundation --> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"       xmlns:dc="http://purl.org/dc/elements/1.1/"   version="1.0">   <xsl:output method="html"/>   <xsl:template match="/" >     <html><head><title>Today's RSS Feed</title></head>    <body>       <xsl:apply-templates/>    </body></html>   </xsl:template>   <xsl:template match="channel">     <xsl:apply-templates />   </xsl:template>   <!-- Named template outputs HTML a element with href link and RSS         description as title to show up in mouseOver message. -->   <xsl:template name="anchor">     <xsl:element name="a" >        <xsl:attribute name="href">          <xsl:apply-templates select="*[local-name()= 'link;]"/>        </xsl:attribute>        <xsl:attribute name="title">          <xsl:apply-templates select="*[local-name()='description']"/>        </xsl:attribute>        <xsl:value-of select="*[local-name()='title' ]" />     </xsl:element>   </xsl:template>   <!-- Output RSS channel name as HTML a link inside of h1 element. -->   <xsl:template match="*[local-name()='channel' ]">     <xsl:element name="h1">        <xsl:call-template name="anchor"/>     </xsl:element>     <!-- Following line for RSS .091 -->     <xsl:apply-templates select="*[local-name()='item']"/>   </xsl:template>   <!-- Output RSS item as HTML a link inside of p element. -->   <xsl:template match="*[local-name()='item']">     <xsl:element name="p">       <xsl:call-template name="anchor"/>       <xsl:text> </xsl:text>       <xsl:if test="dc:date"> <!-- Show date if available -->         <xsl:text>( </xsl:text>         <xsl:value-of select="dc:date"/>         <xsl:text>) </xsl:text>       </xsl:if>     </xsl:element>   </xsl:template> </xsl:stylesheet>

Using REALbasic's XML classes, you can process this the following way:

Dim style as FolderItem Dim source as FolderItem Dim stylesheet as XMLStyleSheet Dim xdoc as XMLDocument style = getOpenFolderItem(FileTypes1.XSL) source = getOpenFolderITem(FileTypes1.XML) stylesheet = New XMLStyleSheet(style) xdoc = New XMLDocument(source) MsgBox xdoc.Transform(stylesheet)

The output display in the MsgBox would look like this:

[View full width]
<html xmlns:dc="http://purl.org/dc/elements/1.1/">   <head>     <META http-equiv="Content-Type"   content="text/html; charset=UTF-8">     <title>Today's RSS Feed</title>   </head>   <body>     <h1>       <a href="http://www.realsoftware.com"   title="News and Events for REALbasic Users">REAL Software News</a>     </h1>          REAL Software, Inc.          http://www.realsoftware.com/images/rssGraphic.gif          http://www.realsoftware.com/      <p>        <a href="http://www.realsoftware.com/news/pr/2005/2005r3/lnx/ "        title="REAL Software, Inc., providers of REALbasic, cross-platform that really works, announced today the company is shipping REALbasic 2005 for Linux.">         REAL Software Ships REALbasic 2005 for Linux;         Brings Rapid Application Development to Linux</a>     </p>     <p>        <a href="http://www.realsoftware.com/news/pr/2005/2005r3/mac/ "        title="REAL Software Inc., providers of REALbasic,        cross-platform that really works, announced today that REALbasic        2005 for Macintosh Release 3 is shipping.  REALbasic 2005 for Macintosh is a        rapid application development (RAD) environment that enables professional        and non-professional programmers alike to quickly create software for the        Macintosh, and even Windows and Linux, by clicking a checkbox.">        REALbasic 2005 for Macintosh Release 3 Ships; Improves 3D Graphics        Capability and Reliability; Helps Bring More Software to the Macintosh        </a>     </p>     <p>        <a href="http://www.realsoftware.com/news/pr/2005/2005r3/win/ "         title="REAL Software, Inc., providers of REALbasic, cross-platform that really works, announced today that REALbasic 2005 for Windows Release 3 is shipping. REALbasic 2005 for Windows is a rapid application development (RAD) environment that enables professional and non-professional programmers alike to quickly create software that runs on Windows, even Macintosh and Linux. REALbasic creates self-contained executables and eliminates the common problems and installation issues associated with DLLs or external frameworks typical of other cross-platform development environments.">REALbasic 2005 for Windows Release 3 Ships; Improves Visual  Basic Compatibility, Reliability </a>     </p>   </body> </html>

Using this technique, you can see how the RSSReader application will download an XML file, convert it to HTML, and then display it in the HTMLViewer control.

XMLReader

The XMLReader class is harder to use because of the challenges around maintaining state as you parse the document. The best way to experiment with and learn about the XMLReader class is to use it with a variety of XML documents and observe how and when the events are fired. The following class, called rbXmlReader, captures data about every event that gets triggered while parsing an XML document. When it is finished parsing the document, it displays the raw text of the XML document in one field with some of the data in bold (I'll explain why later), and in another field, it displays a summary of which events fired, and some associated information.

In the sample application, the class runs and displays the results in two separate EditFields. EditField1 shows the source XML data, and EditField2 shows the output generated by the events. One bit of functionality that I have added to the example, which serves no purpose other than to show you that it can be done, is that in EditField1 all the character data is highlighted in bold. Be sure to read the comments in the Characters event to learn about all the things that can go wrong when you try this.

Figure 6.1. The rbXMLReader application.

Class rbXmlReader

The displayString property is the buffer that collects data as each event is triggered. Because XML documents are hierarchical and elements can be nested within other elements, the indent property is used to track the current depth; in other words, how many elements have been nested at a given point in time. This gives the class information about how to indent the results.

Property displayString(-1) As String Property indent as integer

The next group of items are the events that get triggered while parsing the document. In each case, the name of the event is sent to the buffer to be displayed. If information is passed to the event that is important, that information is recorded as well. The events apply both to XML documents and to Document Type Definitions, or DTDs, which specify a document type for XML and determine whether it is valid. DTDs can be internal or external, and the XMLReader can read either one. If the DTD is external, the standalone attribute of the XML declaration must be False; otherwise it can be true.

As you scan through this class, be sure to take a look at the values that are passed for each event and how they are referenced, because that is the data you will be working with.

Listing 6.3. Sub XmlDecl(version as String, xmlEncoding as String, standalone as Boolean) Handles Event

// The first event triggered setDisplayString("XML Declaration: " + version + ", " + xmlEncoding)

Listing 6.4. Sub StartPrefixMapping(prefix as String, uri as String) Handles Event

// Whether namespace prefixes are mapped is decided by which // Constructor is used when instantiating the object. // To map prefixes, instantiate the rbXmlReader object this way: // aReader = New rbXMLReader("UTF-8", "") // The second argument specifies a character or characters to // use to separate the localName from the URI. // This is a convenience feature, not part of the namespace spec. setDisplayString("Start Prefix Mapping:" + prefix + ":" + uri)

Listing 6.5. Sub EndPrefixMapping(prefix as String)

// In many instances, namespaces are declared once, so this // event isn't triggered until all the elements have been // processed. In the atom.xml file used in the example, // namespaces are declared in event <entry> element and every // <div> element, so this event is triggered often. setDisplayString("End Prefix Mapping: " + prefix)

Listing 6.6. Sub StartDocument() Handles Event

setDisplayString("Start Document: " + str(ticks))

Listing 6.7. Sub EndDocument() Handles Event

// All elements have been parsed so display the string // in EditField2 setDisplayString("End Document") Window1.EditField2.Text = Join(displayString, EndOfLine)

Listing 6.8. Sub StartElement(name as String, attributeList as XmlAttributeList) Handles Event

// Increment the indent level each time a start element is // encountered incIndent setDisplayString("Start Element: " + name + ", " _ + str(attributeList.count) + "attributes") // If there is at least one attribute, then call // the handleAttributes method If attributeList.count > 0 Then     handleAttributes(attributeList) End If

Listing 6.9. Sub EndElement(name as String) Handles Event

setDisplayString("End Element: " + name) // Decrement the indent level each time an end element is // encountered. decIndent

Listing 6.10. Sub StartCDATA() Handles Event

// CDATA is a special kind of text data that does not // get parsed and therefore you do not need to escape any // values. It is enclosed in the followed tag: // <!CDATA[ some data ]]> SetDisplayString("Start CDATA:")

Listing 6.11. Sub EndCDATA() Handles Event

SetDisplayString("End CDATA:")

Listing 6.12. Sub Characters(s as String) Handles Event

// Characters represent text that is wrapped inside open and // close elements. The SAX standard treats all white space as // character data, too, so if your XML document is formatted // nicely and indented, then the EndOfLine characters and spaces // will trigger this event as well. The SAX specification itself // also allows for the possibility of Character events being // fired successively. When using other SAX or SAX-like parsers // be aware of this one string of text may show up in two // events. In my experience, REALbasic will not fire sequential // Character events. setDisplayString("Characters: " + s) // The following code applies a bold style to the character data // in the display of the original XML file. This can be tricky! // The "lines" referred to by the XMLReader class are not // necessarily the same lines according to an EditField if you // have the multiline property set to True. In order for them // to synch up, you need to have multiline set to False. // In Windows, one more step is required: you also need to // set the vertical scrollbar property to True. If you don't // All you will see is one line, and not all the rest. In // this context, setting multiline to False really means that // the EditField should rely on the line endings in the original // document, which happen to be the line endings that XMLReader // relies on as well. The XMLReader.currentLineNumber and // XMLReader.currentColumnNumber are properties that are // always available throughout the parsing process, so you // could use a similar technique to implement syntax // highlighting. Dim line, start, length as Integer If (Trim(s) <> "") Then line = Window1.EditField1.CharPosAtLineNum(currentLineNumber - 1) start = line + CurrentColumnNumber - 1 length = Len(s) If length > 1 Then Window1.EditField1.StyledText.Bold(start, length) = True End if End If

Listing 6.13. Sub Comment(data as String) Handles Event

// XML Comments: <!--a comment  setDisplayString("Comment: " + data)

Listing 6.14. Sub ProcessingInstruction(target as String, data as String) Handles Event

// Processing instructions look like: <?instruction?> setDisplayString("Processing Instruction: " + target + " = " + data)

Listing 6.15. Sub Default(s as String) Handles Event

// A Default method that can be used for all events // by calling XmlReader.SetDefaultHandler = True setDisplayString("Default Handler")

Listing 6.16. Function ExternalEntityRef(context as String, base as String, systemId as String, publicId as String) As Boolean Handles Event

setDisplayString("ExternalEntity")

Listing 6.17. Sub SkippedEntity(entityName as String, is_parameter_entity as Boolean) Handles Event

setDisplayString("Skipped Entity: " + entityName)

Listing 6.18. Sub StartDoctypeDecl(doctypeName as String, systemId as String, publicId as String, has_internal_subset as Boolean)

//The remaining events related to DTDs setDisplayString("Start Doctype Declaration: " + doctypeName + ", " + systemID + ", " + publicID)

Listing 6.19. Function NotStandalone() As Boolean Handles Event

// If NotStandalone is triggered, it means an external // DTD must be processed setDisplayString("Not Stand-alone")

Listing 6.20. Sub EndDoctypeDecl() Handles Event

setDisplayString("End Doctype Declaration")

Listing 6.21. Sub EntityDecl(entityName as String, is_parameter_entity as Boolean, value as String, base as String, systemId as String, publicId as String, notationName as String) Handles Event

setDisplayString("Entity Declaration: name-"_ + entityName + ", value-" + value + ", " + base + ", " _ + systemID + ", " + publicID + ", " + notationName)

Listing 6.22. Sub ElementDecl(name as String, content as XmlContentModel) Handles Event

setDisplayString("Element Declaration: " + name)

Listing 6.23. Sub AttlistDecl(elname as String, attname as String, att_type as String, dflt as String, isrequired as Boolean) Handles Event

setDisplayString("AttlistDecl: " + elname + ", " + attname)

Listing 6.24. Sub NotationDecl(notationName as String, base as String, systemId as String, publicId as String) Handles Event

setDisplayString("Notation")

Listing 6.25. Sub setDisplayString(myString as String)

// Buffer for output Dim indentString as String Dim x as Integer For x = 0 To indent     indentString = indentString + "+" Next displayString.Append(str(currentLineNumber) + "/" _ + str(currentColumnNumber) + indentString + myString)

Listing 6.26. Sub incIndent()

// Increment Indent Level (when StartElement is called) indent = indent + 1

Listing 6.27. Sub decIndent()

// Decrement Indent Level (when End Element is called) // Make sure the number is not less than zero. If indent > 0 Then     indent = indent - 1 End If

Listing 6.28. Sub handleAttributes(attList as xmlAttributeList)

// Convert attributes into a String Dim x as integer Dim attString as string For x = 0 to attList.Count - 1     attString = attString + "Key: " + attList.key(x) _     + EndOfLine + "Value: " + attList.value(x) _     + endOfLine Next     setDisplayString("Attribute List: " _     + endOfLine + attString + "---")

Input XML File

Next, I will parse an XML file with this class and examine the results. The input file used is an Atom file, and Atom is considered to be a replacement of older RSS forms such as RSS2.0 and RSS1.0. The following is a real atom.xml file generated by blogger.com. Take a look at the <feed> and <div> elementsboth have an xmlns attribute, which is the attribute that is used to define namespaces. The namespace used for <feed> is:

xmlns="http://purl.org/atom/ns#"

The namespace used for <div> is:

xmlns="http://www.w3.org/1999/xhtml"

These namespaces do not specify a prefix, which means that they are the default namespace. Because namespaces apply to the element in which they are declared and those following it, each element after the declaration uses the same namespace. When the <div> namespace is used, it becomes the default namespace so that everything within the <div> element uses the same namespace, which happens to be the namespace used for HTML elements.

Namespaces will play an important role in this application because we will be dealing with three RSS file formats that all use similar element names. For example, there is a <title> element in all three formats. There may be situations where I want to process an Atom <title> element differently from an RSS1.0 <title> element. (I also want to have one XmlReader subclass rather than one for each format.) Because of this, processing this XML document by having the XmlReader class expand namespaces means that I will readily identify the source for any given <title> element.

Listing 6.29. Use the XmlReader class to expand namespaces so you can identify the source

[View full width]

<?xml version="1.0"   encoding="UTF-8"   standalone="yes" ?> <?xml-stylesheet href="http://www.blogger.com/styles/atom.css"   type="text/css"?> <feed xmlns="http://purl.org/atom/ns#" version="0.3"    xml:lang="en-US">     <link href="https://www.blogger.com/atom/11574818" rel="service.post"         title="Choate Business "type="application/atom+xml"/>     <link href="https://www.blogger.com/atom/11574818" rel="service.feed"         title="Choate Business" type="application/atom+xml"/>     <title mode="escaped" type="text/html">Choate Business</title>     <tagline mode="escaped" type="text/html"> </tagline>     <link href="http://choate.info/Blog/Business/" rel="alternate"         title="Choate Business" type="text/html"/>     <id>tag:blogger.com,1999:blog-11574818</id>     <modified>2005-08-07T17:48:56Z</modified>     <generator url="http://www.blogger.com/" version="5.15">Blogger</generator>     <info mode="xml" type="text/html">         <div xmlns="http://www.w3.org/1999/xhtml">This is an Atom         formatted XML site feed. It is intended to be viewed in a         Newsreader or syndicated to another site. Please visit the <a         href="http://help.blogger.com/bin/answer.py?answer=697">Blogger Help</a>         for more info.</div>     </info>     <convertLineBreaks xmlns="http://www.blogger.com/atom/ns#">false</convertLineBreaks>     <entry xmlns="http://purl.org/atom/ns#">            <link href="https://www.blogger.com/atom/11574818/112343655183745944"            rel="service.edit" title="African-American's Business Ownership is            Growing Dramatically" type="application/atom+xml"/>            <author>                <name>Mark S. Choate</name>            </author>            <issued>2005-08-07T13:38:00-04:00</issued>            <modified>2005-08-07T17:48:56Z</modified>            <created>2005-08-07T17:42:31Z</created>            <link     href="http://choate.info/Blog/Business/2005/08/african-americans-business-ownership .html" rel="alternate" title="African-American's Business Ownership is Growing Dramatically"   type="text/html"/>           <id>tag:blogger.com,1999:blog-11574818.post-112343655183745944</id>           <title mode="escaped" type="text/html">African-American's Business Ownership is Growing Dramatically</title>           <content type="application/xhtml+xml" xml:base="http://choate.info/Blog/Business/" xml:space="preserve" >               <div xmlns="http://www.w3.org/1999/xhtml">According to a recent report from the Small Business Administration, there has been a 45% increase in the number of African-American owned businesses since 1997. Likewise, Hispanic- owned businesses have increased by 31% and there has also been a 20% increase in women-owned businesses during that same time period.</div>         </content>         <draft xmlns="http://purl.org/atom-blog/ns#">false</draft>     </entry>     <entry xmlns="http://purl.org/atom/ns#">            <link href="https://www.blogger.com/atom/11574818/111832600521939707" rel="service.edit" title="Economic downturns and new business growth" type="application/atom+xml"/>            <author>                <name>Mark S. Choate</name>            </author>            <issued>2005-06-09T09:56:00-04:00</issued>            <modified>2005-06-09T14:14:54Z</modified>            <created>2005-06-09T14:06:45Z</created>            <link href="http://choate.info/Blog/Business/2005/06/economic- downturns-and-new-business_09.html" rel="alternate" title="Economic downturns and new business growth" type="text/html"/>            <id>tag:blogger.com,1999:blog-11574818.post-111832600521939707</id>            <title mode="escaped" type="text/html">Economic downturns and new business  growth</title>            <content type="application/xhtml+xml" xml:base="http://choate.info/Blog/Business/" xml:space="preserve" >             <div xmlns="http://www.w3.org/1999/xhtml">It's not uncommon for there to be an upswing in the creation of new businesses in an economic downturn. One reason for this is that often experienced professionals are laid off and receive good severance packages which provides them the opportunity to start their own business. Growth in sole-proprietorship has been on a steady increase since 2000, according to the SBA and this may provide some support to the idea that businesses get started in an downturn. And since small businesses represent upwards of 75% of all new job, that's very good news. <span >The numbers themselves are impressive. Income for sole proprietors was $728.4 billion in 2000 and that figure has grown to $902.8 billion in 2004, a 19.3% increase. Bankruptcies are also at a lower rate than 2000, which was the peak of the Internet bubble. Perhaps the most interesting figure in the NFIB Business Optimism Index, which was 104.6 in 2004, and only 100.3 in 2000. Small businesses appear to be doing better than they have in the last five years, which indicates a strong, thriving economy despite the gloom often distributed by the news media. Source: SBA <i>The Small Business Advocate</i> , Vol. 24, No. 6</span>             </div>         </content>         <draft xmlns="http://purl.org/atom-blog/ns#">false</draft>    </entry> </feed>

Output

What follows is the output of the preceding atom.xml file when processed by the rbXmlReader class. Before sharing the entire document output, I have included a snippet of what the output would be like if I was not expanding namespaces. On the fifth line you will see a reference to the <feed> element:

1/0+Start Document: 1.012971e+7 1/0+XML Declaration: 1.0, UTF-8 2/0+Processing Instruction: xml-stylesheet = href="http://www.blogger.com/styles/atom.css" type="text/css" 4/0++Start Element: feed, 3 attributes 4/0++Attribute List: Key: xmlns  Value: http://purl.org/atom/ns# Key: version  Value: 0.3 Key: xml:lang  Value: en-US

Now compare that output with the output that expands namespaces. First, there is a new event labeled Start Prefix Mapping, which tells you that namespaces will be expanded. The reference to prefixes refers to the fact that you use prefixes to identify namespaces, unless the element comes from the default namespace. So, for example, it would have been possible in the atom.xml file used in this example to declare two namespaces at the beginning of the document, instead of declaring a new default namespace every time a <feed> or <div> element was encountered. The consequence of this is that you would have to declare a prefix, too, such as:

xmlns:atom="http://purl.org/atom/ns#"

Then you would refer to atom elements using a prefix, such as:

<atom:feed>

Here is sample output with expanded namespaces:

1/0+Start Document: 9.723852e+6 1/0+XML Declaration: 1.0, UTF-8 2/0+Processing Instruction: xml-stylesheet = href="http://www.blogger.com/styles/atom.css" type="text/css" 4/0+Start Prefix Mapping: :http://purl.org/atom/ns# 4/0++Start Element: http://purl.org/atom/ns#feed, 2 attributes 4/0++Attribute List: Key: version  Value: 0.3 Key: http://www.w3.org/XML/1998/namespacelang  Value: en-US ---

Note that the <feed> element reference now includes the expanded namespace identifier:

http://purl.org/atom/ns#feed

What follows is the complete output for the rbXmlReader class. The first number of each line refers to the line number of the original xml file where this element or node was encountered. The second number, following the /, refers to the column, or how many characters in on that particular line the node occurs. You will also see a lot of "Characters:" followed by nothing. Those come about as a result that all characters that appear in the document are processed, even if it is just a newline character used to make the various elements more readable.

Listing 6.30. rbXmlReader class output

[View full width]

1/0+Start Document: 1.017785e+7 1/0+XML Declaration: 1.0, UTF-8 2/0+Processing Instruction: xml-stylesheet = href="http://www.blogger.com/styles/atom.css" type="text/css" 4/0+Start Prefix Mapping: :http://purl.org/atom/ns# 4/0++Start Element: http://purl.org/atom/ns#feed, 2 attributes 4/0++Attribute List: Key: version  Value: 0.3 Key: http://www.w3.org/XML/1998/namespacelang  Value: en-US --- 4/70++Characters: 5/0++Characters: 5/1+++Start Element: http://purl.org/atom/ns#link, 4 attributes 5/1+++Attribute List: Key: href  Value: https://www.blogger.com/atom/11574818 Key: rel  Value: service.post Key: title  Value: Choate Business Key: type  Value: application/atom+xml --- 5/124+++End Element: http://purl.org/atom/ns#link 5/124++Characters: 6/0++Characters: 6/1+++Start Element: http://purl.org/atom/ns#link, 4 attributes 6/1+++Attribute List: Key: href  Value: https://www.blogger.com/atom/11574818 Key: rel  Value: service.feed Key: title  Value: Choate Business Key: type  Value: application/atom+xml --- 6/124+++End Element: http://purl.org/atom/ns#link 6/124++Characters: 7/0++Characters: 7/1+++Start Element: http://purl.org/atom/ns#title, 2 attributes 7/1+++Attribute List: Key: mode  Value: escaped Key: type  Value: text/html --- 7/40+++Characters: Choate Business 7/55+++End Element: http://purl.org/atom/ns#title 7/63++Characters: 8/0++Characters: 8/1+++Start Element: http://purl.org/atom/ns#tagline, 2 attributes 8/1+++Attribute List: Key: mode  Value: escaped Key: type  Value: text/html --- 8/42+++End Element: http://purl.org/atom/ns#tagline 8/52++Characters: 9/0++Characters: 9/1+++Start Element: http://purl.org/atom/ns#link, 4 attributes 9/1+++Attribute List: Key: href  Value: http://choate.info/Blog/Business/ Key: rel  Value: alternate Key: title  Value: Choate Business Key: type  Value: text/html --- 9/106+++End Element: http://purl.org/atom/ns#link 9/106++Characters: 10/0++Characters: 10/1+++Start Element: http://purl.org/atom/ns#id, 0 attributes 10/5+++Characters: tag:blogger.com,1999:blog-11574818 10/39+++End Element: http://purl.org/atom/ns#id 10/44++Characters: 11/0++Characters: 11/1+++Start Element: http://purl.org/atom/ns#modified, 0 attributes 11/11+++Characters: 2005-08-07T17:48:56Z 11/31+++End Element: http://purl.org/atom/ns#modified 11/42++Characters: 12/0++Characters: 12/1+++Start Element: http://purl.org/atom/ns#generator, 2 attributes 12/1+++Attribute List: Key: url  Value: http://www.blogger.com/ Key: version  Value: 5.15 --- 12/57+++Characters: Blogger 12/64+++End Element: http://purl.org/atom/ns#generator 12/76++Characters: 13/0++Characters: 13/1+++Start Element: http://purl.org/atom/ns#info, 2 attributes 13/1+++Attribute List: Key: mode  Value: xml Key: type  Value: text/html --- 13/35+++Characters: 14/0+++Characters: 14/2+++Start Prefix Mapping: :http://www.w3.org/1999/xhtml 14/2++++Start Element: http://www.w3.org/1999/xhtmldiv, 0 attributes 14/44++++Characters: This is an Atom formatted XML site feed. It is intended to be viewed in a Newsreader or syndicated to another site. Please visit the 14/177+++++Start Element: http://www.w3.org/1999/xhtmla, 1 attributes 14/177+++++Attribute List: Key: href  Value: http://help.blogger.com/bin/answer.py?answer=697 --- 14/236+++++Characters: Blogger Help 14/248+++++End Element: http://www.w3.org/1999/xhtmla 14/252++++Characters: for more info. 14/267++++End Element: http://www.w3.org/1999/xhtmldiv 14/267+++End Prefix Mapping: 14/273+++Characters: 15/0+++Characters: 15/1+++End Element: http://purl.org/atom/ns#info 15/8++Characters: 16/0++Characters: 16/1++Start Prefix Mapping: :http://www.blogger.com/atom/ns# 16/1+++Start Element: http://www.blogger.com/atom/ns#convertLineBreaks, 0 attributes 16/60+++Characters: false 16/65+++End Element: http://www.blogger.com/atom/ns#convertLineBreaks 16/65++End Prefix Mapping: 16/85++Characters: 17/0++Characters: 17/1++Start Prefix Mapping: :http://purl.org/atom/ns# 17/1+++Start Element: http://purl.org/atom/ns#entry, 0 attributes 17/41+++Characters: 18/0+++Characters: 18/2++++Start Element: http://purl.org/atom/ns#link, 4 attributes 18/2++++Attribute List: Key: href  Value: https://www.blogger.com/atom/11574818/112343655183745944 Key: rel  Value: service.edit Key: title  Value: African-American's Business Ownership is Growing Dramatically Key: type  Value: application/atom+xml --- 18/190++++End Element: http://purl.org/atom/ns#link 18/190+++Characters: 19/0+++Characters: 19/2++++Start Element: http://purl.org/atom/ns#author, 0 attributes 19/10++++Characters: 20/0++++Characters: 20/3+++++Start Element: http://purl.org/atom/ns#name, 0 attributes 20/9+++++Characters: Mark S. Choate 20/23+++++End Element: http://purl.org/atom/ns#name 20/30++++Characters: 21/0++++Characters: 21/2++++End Element: http://purl.org/atom/ns#author 21/11+++Characters: 22/0+++Characters: 22/2++++Start Element: http://purl.org/atom/ns#issued, 0 attributes 22/10++++Characters: 2005-08-07T13:38:00-04:00 22/35++++End Element: http://purl.org/atom/ns#issued 22/44+++Characters: 23/0+++Characters: 23/2++++Start Element: http://purl.org/atom/ns#modified, 0 attributes 23/12++++Characters: 2005-08-07T17:48:56Z 23/32++++End Element: http://purl.org/atom/ns#modified 23/43+++Characters: 24/0+++Characters: 24/2++++Start Element: http://purl.org/atom/ns#created, 0 attributes 24/11++++Characters: 2005-08-07T17:42:31Z 24/31++++End Element: http://purl.org/atom/ns#created 24/41+++Characters: 25/0+++Characters: 25/2++++Start Element: http://purl.org/atom/ns#link, 4 attributes 25/2++++Attribute List: Key: href  Value: http://choate.info/Blog/Business/2005/08/african-americans-business-ownership.html Key: rel  Value: alternate Key: title  Value: African-American's Business Ownership is Growing Dramatically Key: type  Value: text/html --- 25/202++++End Element: http://purl.org/atom/ns#link 25/202+++Characters: 26/0+++Characters: 26/2++++Start Element: http://purl.org/atom/ns#id, 0 attributes 26/6++++Characters: tag:blogger.com,1999:blog-11574818.post-112343655183745944 26/64++++End Element: http://purl.org/atom/ns#id 26/69+++Characters: 27/0+++Characters: 27/2++++Start Element: http://purl.org/atom/ns#title, 2 attributes 27/2++++Attribute List: Key: mode  Value: escaped Key: type  Value: text/html --- 27/41++++Characters: African-American's Business Ownership is Growing Dramatically 27/102++++End Element: http://purl.org/atom/ns#title 27/110+++Characters: 28/0+++Characters: 28/2++++Start Element: http://purl.org/atom/ns#content, 3 attributes 28/2++++Attribute List: Key: type  Value: application/xhtml+xml Key: http://www.w3.org/XML/1998/namespacebase  Value: http://choate.info/Blog/Business/ Key: http://www.w3.org/XML/1998/namespacespace  Value: preserve --- 28/106++++Characters: 29/0++++Characters: 29/3++++Start Prefix Mapping: :http://www.w3.org/1999/xhtml 29/3+++++Start Element: http://www.w3.org/1999/xhtmldiv, 0 attributes 29/45+++++Characters: According to a recent report from the Small Business Administration, there has been a 45% increase in the number of African-American owned  businesses since 1997. Likewise, Hispanic-owned businesses have increased by 31% and there has also been a 20% increase in women-owned businesses during that same time period. 29/359+++++End Element: http://www.w3.org/1999/xhtmldiv 29/359++++End Prefix Mapping: 29/365++++Characters: 30/0++++Characters: 30/2++++End Element: http://purl.org/atom/ns#content 30/12+++Characters: 31/0+++Characters: 31/2+++Start Prefix Mapping: :http://purl.org/atom-blog/ns# 31/2++++Start Element: http://purl.org/atom-blog/ns#draft, 0 attributes 31/47++++Characters: false 31/52++++End Element: http://purl.org/atom-blog/ns#draft 31/52+++End Prefix Mapping: 31/60+++Characters: 32/0+++Characters: 32/1+++End Element: http://purl.org/atom/ns#entry 32/1++End Prefix Mapping: 32/9++Characters: 33/0++Characters: 34/0++Characters: 34/1++Start Prefix Mapping: :http://purl.org/atom/ns# 34/1+++Start Element: http://purl.org/atom/ns#entry, 0 attributes 34/41+++Characters: 35/0+++Characters: 35/2++++Start Element: http://purl.org/atom/ns#link, 4 attributes 35/2++++Attribute List: Key: href  Value: https://www.blogger.com/atom/11574818/111832600521939707 Key: rel  Value: service.edit Key: title  Value: Economic downturns and new business growth Key: type  Value: application/atom+xml --- 35/171++++End Element: http://purl.org/atom/ns#link 35/171+++Characters: 36/0+++Characters: 36/2++++Start Element: http://purl.org/atom/ns#author, 0 attributes 36/10++++Characters: 37/0++++Characters: 37/3+++++Start Element: http://purl.org/atom/ns#name, 0 attributes 37/9+++++Characters: Mark S. Choate 37/23+++++End Element: http://purl.org/atom/ns#name 37/30++++Characters: 38/0++++Characters: 38/2++++End Element: http://purl.org/atom/ns#author 38/11+++Characters: 39/0+++Characters: 39/2++++Start Element: http://purl.org/atom/ns#issued, 0 attributes 39/10++++Characters: 2005-06-09T09:56:00-04:00 39/35++++End Element: http://purl.org/atom/ns#issued 39/44+++Characters: 40/0+++Characters: 40/2++++Start Element: http://purl.org/atom/ns#modified, 0 attributes 40/12++++Characters: 2005-06-09T14:14:54Z 40/32++++End Element: http://purl.org/atom/ns#modified 40/43+++Characters: 41/0+++Characters: 41/2++++Start Element: http://purl.org/atom/ns#created, 0 attributes 41/11++++Characters: 2005-06-09T14:06:45Z 41/31++++End Element: http://purl.org/atom/ns#created 41/41+++Characters: 42/0+++Characters: 42/2++++Start Element: http://purl.org/atom/ns#link, 4 attributes 42/2++++Attribute List: Key: href  Value: http://choate.info/Blog/Business/2005/06/economic-downturns-and-new-  business_09.html Key: rel  Value: alternate Key: title  Value: Economic downturns and new business growth Key: type  Value: text/html --- 42/185++++End Element: http://purl.org/atom/ns#link 42/185+++Characters: 43/0+++Characters: 43/2++++Start Element: http://purl.org/atom/ns#id, 0 attributes 43/6++++Characters: tag:blogger.com,1999:blog-11574818.post-111832600521939707 43/64++++End Element: http://purl.org/atom/ns#id 43/69+++Characters: 44/0+++Characters: 44/2++++Start Element: http://purl.org/atom/ns#title, 2 attributes 44/2++++Attribute List: Key: mode  Value: escaped Key: type  Value: text/html --- 44/41++++Characters: Economic downturns and new business growth 44/83++++End Element: http://purl.org/atom/ns#title 44/91+++Characters: 45/0+++Characters: 45/2++++Start Element: http://purl.org/atom/ns#content, 3 attributes 45/2++++Attribute List: Key: type  Value: application/xhtml+xml Key: http://www.w3.org/XML/1998/namespacebase  Value: http://choate.info/Blog/Business/ Key: http://www.w3.org/XML/1998/namespacespace  Value: preserve --- 45/106++++Characters: 46/0++++Characters: 46/3++++Start Prefix Mapping: :http://www.w3.org/1999/xhtml 46/3+++++Start Element: http://www.w3.org/1999/xhtmldiv, 0 attributes 46/45+++++Characters: It's not uncommon for there to be an upswing in the creation of new businesses in an economic downturn. One reason for this is that often experienced  professionals are laid off and receive good severance packages which provides them the  opportunity to start their own business. Growth in sole-proprietorship has been on a  steady increase since 2000, according to the SBA and this may provide some support to the  idea that businesses get started in a downturn. And since small businesses represent  upwards of 75% of all new job, that's very good news. 46/596+++++Characters: 47/0+++++Characters: 48/0++++++Start Element: http://www.w3.org/1999/xhtmlspan, 1 attributes 48/0++++++Attribute List: Key: class  Value: fullpost --- 48/23++++++Characters: The numbers themselves are impressive. Income for sole pro- prietors was $728.4 billion in 2000 and that figure has grown to $902.8 billion in 2004, a  19.3% increase. Bankruptcies are also at a lower rate than 2000, which was the peak of the  Internet bubble. Perhaps the most interesting figure in the NFIB Business Optimism Index,  which was 104.6 in 2004, and only 100.3 in 2000. Small businesses appear to be doing  better than they have in the last five years, which indicates a strong, thriving economy  despite the gloom often distributed by the news media. 48/581++++++Characters: 49/0++++++Characters: 50/0++++++Characters: Source: SBA 50/12+++++++Start Element: http://www.w3.org/1999/xhtmli, 0 attributes 50/15+++++++Characters: The Small Business Advocate 50/42+++++++End Element: http://www.w3.org/1999/xhtmli 50/46++++++Characters:, Vol. 24, No. 6 50/62++++++End Element: http://www.w3.org/1999/xhtmlspan 50/69+++++Characters: 51/0+++++Characters: 51/3+++++End Element: http://www.w3.org/1999/xhtmldiv 51/3++++End Prefix Mapping: 51/9++++Characters: 52/0++++Characters: 52/2++++End Element: http://purl.org/atom/ns#content 52/12+++Characters: 53/0+++Characters: 53/2+++Start Prefix Mapping: :http://purl.org/atom-blog/ns# 53/2++++Start Element: http://purl.org/atom-blog/ns#draft, 0 attributes 53/47++++Characters: false 53/52++++End Element: http://purl.org/atom-blog/ns#draft 53/52+++End Prefix Mapping: 53/60+++Characters: 54/0++End Element: http://purl.org/atom/ns#feed 54/0+End Prefix Mapping: 55/0+End Document

State Management with XMLReader

As you can see from the previous example, XMLReaders fire events in succession. In some RSS formats, there is a channel element, and each channel has a title element. The channel also has various items, and each item has a title element. The question then becomes, when using XmlReader, how do you distinguish between different titles? Earlier, I showed how using namespaces can help you distinguish between an Atom title versus an RSS1.0 title, but that doesn't help you to distinguish channel and item titles because the distinguishing characteristic is the context in which the title appears. If it is inside a channel element, it's a channel title. If it's inside an item element, it's an item title.

The answer to this problem is that you have to come up with a means of tracking your state. You need to know how to tell where you are in a document so that you can take the appropriate action. I am going to show you two examples of how to approach this. The first is the simplest of the two, and it tracks your current path in a document so that you can easily find out whether this title applies to a channel or an item.

The class keeps track of the elements that have been processed so that you can check the getState method to get the current path, which will be something like

channel/title

channel/item/title

This will help you know where you are in the parsing process. It works better if you are not expanding namespaces, simply because that helps keep the paths to a reasonable level. You also have the option of tracking attributes as well, and there are a few parameters that can be set that change how this works. Read the comments for more specific details of how this is accomplished.

Listing 6.31. Class mwXmlReader

// These constants determine how attributes are tracked: // The path should only contain elements names Constant kNameOnly = 0 // The path should also contain one attribute, like: // channel/item[id=4]/title // You will need to Constant kNameAndOneAttribute = 1 Constant kNameAndAllAttributes = 2 // Apply one of the class constants to determine // how you will handle tracking attributes Property incrStyle as Integer // The name of the attribute whose value // your application will place in the path. // Used with kNameAndOneAttribute style Property attrName as String // The state is maintained by an array. // "Join" is used to create the path when // it is requested Property state(-1) as String // The current depth of the hierarchy Property depth as Integer // Helper properties Property prevState as String Property inStartElement as Boolean

Listing 6.32. Sub StartElement(name as String, attributeList as XmlAttributeList) Handles Event

//Increment state and append path data whenever // a StartElement is fired. incrState name, attributeList // Fire the StartElement event so that // Subclasses are instances can use it. StartElement name, attributeList

Listing 6.33. Sub EndElement(name as String) Handles Event

// Decrement state decrState // Fire the EndElement event EndElement name

Listing 6.34. Function getState() As String

// Return the path that represents the current state. If ubound(Me.state) > -1 Then     Return Join(Me.state, "/") End if

Listing 6.35. Sub decrState()

// Lower depth count Me.depth = Me.depth - 1 Me.inStartElement = False prevState = Me.state.pop

Listing 6.36. Sub incrState(aName as String, anAttr as String)

// Method that is called when only one attribute // is being tracked. Me.State.append(aName + "["+ anAttr+"]")

Listing 6.37. Sub incrState(aName as string)

// Method called when incrStyle = kNameOnly Me.state.Append(aName)

Listing 6.38. Sub incrState(aName as string, attr_list as XmlAttributeList)

// Method called from StartElement // It calls overridden methods based upon the value // assigned to incrStyle Me.depth = Me.depth + 1 Me.inStartElement = True Select Case incrStyle     Case kNameOnly         incrState(aName)     Case kNameAndOneAttribute         incrState(aName, attr_list.Value(attrName))     Case kNameAndAllAttributes         Me.state.appEnd(aName + "[" _         + attrToString(attr_list) + "]") End select

Listing 6.39. Function attrToString(attr_list as XMLAttributeList) As string

// Convert attribute key/values into a String // to be used in the path, similar to that used // by XPATH Dim x,y as Integer Dim s as String y = attr_list.Count - 1 For x = 0 to y     s = s+ " "     s = s + attr_list.key(x) + "=" + chr(34) // "     s = s + attr_list.value(x) + chr(34) //" Next Return s

Listing 6.40. Sub setStateTrackingStyle(aStyle as integer)

incrStyle = aStyle

Listing 6.41. Sub setStateTrackingStyle(aStyle as integer, aName as string)

If aStyle = kNameAndOneAttribute Then     incrStyle = aStyle     attrName = aName Else     incrStyle = kNameAndAllAttributes End if

You must also declare new events for StartElement and EndElement so that any subclasses of this class will have access to those events. This also means you'll be able to drag this class onto a Window and implement code for the events that way.

Now this class can be used as the parent class to any XmlReader object, and you can use these methods to make it easier to maintain state. In the next section, I will subclass XmlReader and write a class that instantiates Group and Item classes from data in an Atom file. This will give you a practical example of how to use the features I just implemented in mwXmlReader, and show you how you can easily instantiate objects using mwXmlReader by adding a stack to assist with managing your state.

There is one thing to be aware of that impacts how mwXmlReader is used and subclassed. mwXmlReader is a subclass of XmlReader and there are two Constructors for XMLReader. If the default Constructor is used, XMLReader does not expand namespaces. However, if the other Constructor is used, it does expand namespaces. Either way, the point is that the Constructor being used impacts how the instance of the class works. Because of this, you need to be careful when subclassing XMLReader.

In the next section, I am going to subclass mwXmlReader and add another layer of functionality. In an earlier attempt at doing this, I implemented a Constructor that took no parameters, but set a few internal properties when mwXmlReader was launched. Then I created the mwXmlReader subclass call rssXmlReader, and it had its own Constructor that set a few properties. When I tried to use it I would get strange errorsthe program would crash without warning or error message, and even the debugger acted funny.

The problem, of course, was with the Constructors. The solution I eventually took is that I didn't override any of XMLReaders Constructors, and all the problems magically went away. Another way to avoid that kind of problem is to make sure that the Constructor calls Super.Constructor() so that the Constructor of the parent class is called. It's painfully obvious now, but what I was doing was overriding the Constructor of XMLReader so that it was never getting called, and this made it totally nonfunctional.

XML and Object Instantiation

In this section, I am going to subclass mwXmlReader and use the new subclass to parse an Atom file into an object tree, consisting of Groups and Items, both of which are subclasses of mwNode introduced in the previous chapter. This illustrates an important distinction between the SAX-like XMLReader class and the DOM-like XMLDocument class. If you have an XML file that needs to get transformed into another XML format, use XMLDocument and XSLT. If, however, the XML is going to be used to instantiate objects, XMLReader is often the better choice (not always, though).

To parse the XML file and instantiate objects, I need to create some objects. I use a set of classes that subclass mwNode. I use mwNode subclasses because mwNode is good at representing treelike data structures, and XML is a treelike data structure. Also, because I have already implemented a ListBox that will view mwNode objects, it will be an easy step to take an XML document, use it to create a tree of mwNode objects, and then display it in a ListBox.

Whether I am using an RSS XML file of some sort, or an Atom file, they all have a common structure. With RSS, there is a channel, and this channel contains a number of items, which are the links and summaries to articles. With Atom, there is a feed, and this feed contains a number of entries, which are links and summaries to articles. That means channels and feeds are branch nodes, and items and entries are leaf nodes. At the same time, there is a lot of similarity between the branches and the leaves. They all have links and they all have titles, for example. This will get translated into three subclasses. The Group subclass represents branches (channels and feeds) and the Item subgroup represents the leaves (items and entries). Both share a common Super class, Meme, that implements common functionality and subclasses mwNode.

The Meme class is pretty self-explanatory. It consists of a collection of properties that are relevant to the object model and implements a series of getter and setter methods (which could be implemented as a Computed Property, now that REALbasic has that feature).

The implementations of Groups and Items comes next. The interesting elements are the Constructors and the toXML function that creates an XML version of the objects themselves using the XMLDocument class. The comments explain what is taking place.

Listing 6.42. Class Meme Inherits mwNode

Property Guid as string Property Title as string Property ParentGuid as string Property Description as String Property Creator as String Property CreationDate as string Property ModIfiedDate as string Property MetaID as string

Listing 6.43. Sub Meme.setTitle(aTitle as String)

Me.Title = aTitle

Listing 6.44. Sub Meme.setDescription(aDescription as String)

Me.Description = aDescription

Listing 6.45. Sub Meme.setParent(aParentGuid as String)

Me.ParentGuid = aParentGuid

Listing 6.46. Sub Meme.setAncestor(aAncestorGuid as String)

Me.Ancestor=aAncestorGuid

Listing 6.47. Sub Meme.setCreationDate(tstamp as String)

// expects sql date time Me.CreationDate = tstamp

Listing 6.48. Sub Meme.setModIfiedDate(tstamp as String)

Me.ModIfiedDate = tstamp

Listing 6.49. Sub Meme.setCreator(aName as String)

Me.Creator = aName

Listing 6.50. Sub Meme.setGuid(aGuid as String)

If right(aGuid, 1) <> "/" Then aGuid = aGuid + "/" End If Me.Guid = aGuid

Listing 6.51. Sub Meme.setMetaID(aMetaID as String)

Me.MetaID = aMetaID

Listing 6.52. Function Meme.getCreationDate() As String

Return Me.CreationDate

Listing 6.53. Function Meme.getCreator() As String

Return Me.Creator

Listing 6.54. Function Meme.getParent() As String

Return Me.ParentGuid

Listing 6.55. Function Meme.getDescription() As String

Return Me.Description

Listing 6.56. Function Meme.getGuid() As String

If right(Me.Guid, 1) <> "/" Then Me.Guid = Me.Guid + "/" End If Return Me.Guid

Listing 6.57. Function Meme.getMetaID() As String

Return Me.MetaID

Listing 6.58. Function Meme.getModifiedDate() As String

Return Me.ModifiedDate

Listing 6.59. Function Meme.getTitle() As String

Return Me.Title

Listing 6.60. Sub Meme.setCreationDate(d as Date)

dim s as string If d<>nil Then Me.CreationDate = d.sqLDatetime Else d = New date Me.creationDate = d.sqLDateTime End If

Listing 6.61. Sub Meme.setModIfiedDate(d as Date)

dim s as string If d<>nil Then Me.modIfiedDate = d.sqLDatetime Else d = New date Me.modIfiedDate = d.sqLDateTime End If

Listing 6.62. Function Meme.isValid() As Boolean

dim mErr as MemeException If Me.Title = "" Then mErr = New MemeException raise mErr End If If Me.MetaID="" Then mErr = New MemeException raise mErr End If If Me.Guid = "" Then mErr = New MemeException raise mErr End If If Me.Type= kUNDEFINED Then mErr = New MemeException raise mErr End If Return true

Listing 6.63. Sub Meme.setParent(aParent as mwNode)

If aParent <> nil Then     Me.Parent = aParent     If aParent isa Group Then         Me.setParent(Group(aParent).getGuid)     End If End If

Listing 6.64. Class Group Inherits Meme

Property Sequence As mwNodeSequence

Listing 6.65. Sub Group.Constructor()

// The original mwNode class established that branch // nodes had a Type property of -1 and that a leaf node // had a Type property of 1. This value is set in the // constructor as well as the creation and modified time // The Sequence property is instantiated as well and // this defines the sequence of child elements and // provides methods for accessing them. Dim d as New Date Me.SetType(-1) Me.setCreationDate(d.SQLDateTime) Me.setModIfiedDate(d.SQLDateTime) Me.Sequence = New mwNodeSequence

Listing 6.66. Function Group.toXml(includeChildren as Boolean) as XMLDocument

// XMLDocument is used to create an XML representation of // this class. Dim xml_doc as XMLDocument Dim mxml as XmlNode Dim seq as XmlNode Dim tmp as XmlNode Dim ele_name as string Dim mex as MemeException Dim x,y as integer xml_doc = new XMLDocument ele_name = "Group" // The "Root" element will be "Group". The element is // created by using the XMLDocument factory method CreateElement // and a reference to the "Group" element is assigned to mxml mxml = xml_doc.AppendChild(xml_doc.CreateElement(ele_name)) If mxml <> Nil Then    // The Group element attributes are set using    // values from the Group object using the    // XMLElement SetAttribute method    mxml.SetAttribute("MetaID", me.getMetaID)       mxml.SetAttribute("Guid", me.getGuid)       mxml.SetAttribute("Title", me.getTitle)       mxml.SetAttribute("Creator", me.getCreator)       mxml.SetAttribute("Description", me.getDescription)       mxml.SetAttribute("CreationDate", me.getCreationDate)       mxml.SetAttribute("ModifiedDate", me.getModifiedDate)       mxml.SetAttribute("Parent", me.getParent)       mxml.SetAttribute("Ancestor", me.getAncestor)       mxml.SetAttribute("Type", str(me.getType))       If includeChildren Then         // A child element called "Sequence"         // is added to the XML document.           seq = mxml.AppendChild(xml_doc.CreateElement("Sequence")         // The children are called sequentially         // and their toXml method is called         y = Me.Sequence.GetChildCount - 1         For x = 0 to y         tmp = Me.Sequence.Child(x).toXml()         // Since the toXml() method creates and returns         // an XMLDocument object, you will need to import         // the returned documents nodes into the parent         // document. The following line imports the root         // element, referenced by the DocumentElement         // property, into the xml_doc XMLDocument.         seq.Append(xml_doc.importNode(tmp.DocumentElement, true))         Next       End If    // This document is returned. If it was called    // from a parent Group, then the xml_doc.DocumentElement    // node and subnodes will be imported into that parent    // Group's XML document and so on until one large document    // exists representing every node in the tree.     Return xml_doc Else       mex = new MemeException       mex.Message="Error creating XML"       raise mex End If

Listing 6.67. Class Item Inherits Meme

Property Sequence As mwNodeSequence

Listing 6.68. Sub Item.Constructor()

// See Group Dim d as New Date Me.SetType(1) Me.setCreationDate(d.SQLDateTime) Me.setModIfiedDate(d.SQLDateTime) Me.Sequence = New mwNodeSequence

Listing 6.69. Sub Item.toXml(includeItemSequence as Boolean) as XMLDocument

Dim xml_doc as XMLDocument Dim mxml as XmlNode Dim ele_name as string Dim mex as MemeException xml_doc = new XMLDocument ele_name = "Item" mxml = xml_doc.AppendChild(xml_doc.CreateElement(ele_name)) If mxml <> Nil Then     mxml.SetAttribute("MetaID", me.getMetaID)       mxml.SetAttribute("Guid", me.getGuid)       mxml.SetAttribute("Title", me.getTitle)       mxml.SetAttribute("Creator", me.getCreator)       mxml.SetAttribute("Description", me.getDescription)       mxml.SetAttribute("CreationDate", me.getCreationDate)       mxml.SetAttribute("ModifiedDate", me.getModifiedDate)       mxml.SetAttribute("Parent", me.getParent)       mxml.SetAttribute("Ancestor", me.getAncestor)       mxml.SetAttribute("Type", str(me.getType))     Return xml_doc Else       mex = new MemeException       mex.Message="Error creating XML"       raise mex End If

Listing 6.70. Class mwNodeSequence

// This class manages the sequence of child nodes. // It has two properties, an array of mwNodes and // a dictionary. This is probably overkill for RSS // documents since they are not very large, but // with much larger works, this can help to find // child elements quickly. The child array represents // the sequence of child nodes. The dict property lets // you look up each node by Guid without having to // scan each node sequentially. Property child(-1) As mwNode Property dict as Dictionary

Listing 6.71. Sub mwNodeSequence.Constructor()

dict = New Dictionary()

Listing 6.72. Sub mwNodeSequence.appendChild(aNode as mwNode)

If aNode Isa Meme Then     If not(aNode is nil) Then        Me.child.append(aNode)        If dict = Nil Then            dict = New Dictionary        End If        dict.value(Meme(aNode).getGuid) = aNode     End If End If

Listing 6.73. Function mwNodeSequence.getChild(pos as integer) As mwNode

If pos >= 0 and pos <= ubound(Me.child) Then     Return child(pos) Else     Return Nil End If

Listing 6.74. Function mwNodeSequence.getChildCount() As Integer

Return ubound(Me.child) + 1

Listing 6.75. Function mwNodeSequence.getChildren() As mwNode()

Return child

Listing 6.76. Function mwNodeSequence.getLastChild() As mwNode

Return Me.child(ubound(child))

Listing 6.77. Sub mwNodeSequence.insertChild(pos as integer, aNode as mwNode)

Dim bound as Integer bound = ubound(Child) If aNode Isa Meme Then     If Not(aNode  Is Nil) Then        If pos <= bound Then        Me.child.insert(pos, aNode )     Else        Me.child.appEnd(aNode)     End If     If dict = Nil Then        dict = New Dictionary     End If     dict.value(Meme(aNode).getGuid) = aNode     End If End If

Listing 6.78. Function mwNodeSequence.hasChild(aGuid as string) As Boolean

If Me.dict.HasKey(aGuid) Then     If Me.dict.value(aGuid) <> nil Then        Return true     Else        Return false     End If Else     Return false End If

Listing 6.79. Function mwNodeSequence.getChild(aGuid as string) As mwNode

If dict.HasKey(aGuid) Then     Return dict.value(aGuid) Else     Return nil End If

Listing 6.80. Sub mwNodeSequence.removeChild(aNode as mwNode)

Dim res as Integer res = Me.child.IndexOf(aNode) If res > -1 Then     Me.child.Remove(res) End If

Listing 6.81. Sub mwNodeSequence.removeChild(pos as Integer)

If pos > -1 Then     Me.child.Remove(pos) End If

Listing 6.82. Class rssXmlReader Inherits mwXmlReader

//The rssXmlReader class parses Atom files // and instantiates Group and Item objects. // It subclasses mwXMLReader, which helps it // to track it's current state, plus it implements // a stack to make managing state even easier. // A stack is a data structure that is "Last in First Out". // When a new feed or entry element is encountered, // a Group or Item will be instantiated and "pushed" onto // the stack. The object at the top of the stack is the // current object and the other methods all call the current // object. When the EndElement event for a feed or entry // element is encountered, then the current object is popped // off of the stack so that the object that was below it in // the stack is now the current object. Property Root As mwNode Property Stack(-1) As Meme Property CurrentElement As String

Listing 6.83. Sub rss.XmlReader.StartElement(name as string, attributeList as xmlAttributeList) Handles Event

Dim path as String Dim aNode as mwNode Dim aMeme as Meme Me.CurrentElement = name // Get the current path, inherited from mwXmlReader path = Me.GetState If path <> "" Then     Select Case path     Case "feed"        // The Push method instantiates a Group        // and pushes it onto the stack.        aNode = Push(name, attributeList)     Case "feed/entry"        // In this case, the Push method instantiats an Item        // and pushes it on the stack.        aNode = Push(name, attributeList)     Else        If name = "link" Then        // Both Feeds and Entries have links, but because I        // am using a stack, it doesn't matter to me whether        // the current element is a feed or an entry. All I        // I need to do is access the object at the top of        // of the stack and call the setGuid method, which        // is where the link URL will be stored.            If UBound(Stack) > -1 Then               aMeme = Stack(UBound(Stack))            aMeMe.setGuid(attributeList.Value("href"))            End If        End If     End Select End If Exception err     break

Listing 6.84. Sub rss.XmlReader.EndElement(name as string) Handles Event

// Pop the objects off the stack. Select Case name Case "feed"    Pop Case "entry"     Pop End Select Exception err     break

Listing 6.85. Sub rss.XmlReader.Characters(s as String) Handles Event

// Once again, because of the stack I do not // have to keep track in this method which // element I am in. All I need to do is reference // the current object that is at the top of the stack // and call the appropriate methods. This is why // Groups and Items are both subclasses of Meme and why // Meme implements the methods that get or set properties. If Trim(s) <> "" Then     If uBound(Stack) > -1 Then      Select Case Me.CurrentElement     Case "title"        Stack(UBound(Stack)).setTitle(s)     Case "id"        Stack(UBound(Stack)).setMetaID(s)     Case "div"        Stack(UBound(Stack)).setDescription(s)     Case "name"        Stack(UBound(Stack)).setCreator(s)     Case "modified"        Stack(UBound(Stack)).setModIfiedDate(s)     Case "created"        Stack(UBound(Stack)).setCreationDate(s)     End Select     End If End If

Listing 6.86. Function rss.XmlReader.Push(aName as String, aList as XmlAttributeList) As mwNode

Dim g as Group Dim i as Item Dim path as String path = Me.getState // Instantiate a Group or an Item based on the current state If path  = "feed" Then     g = New Group     g.Name = aName     // If this is the first object, then assign     // a reference to the Root object and then     // add it to the stack.     // Once the document is parsed, you can access the     // objects through the Root property.     If ubound(Stack) = -1 Then        Me.Root = g        Me.Stack.appEnd(g)     Else     // Get a reference to the object at the top of     // the stack and append the new object to it.     Group(Me.Stack(ubound(Stack))).Sequence.AppEndChild(g)     // Once you have appended the object to the parent object,     // place the new object on the stack.        Me.Stack.appEnd(g)     End If     Return g ElseIf path  = "feed/entry" Then     i = New Item     i.Name = aName     g = Group(Me.Stack(ubound(Stack)))     i.setParent(g.GetGuid)     If ubound(Stack) = -1 Then        Me.Root = i        Me.Stack.appEnd(i)     Else        g.Sequence.AppEndChild(i)        Me.Stack.appEnd(i)     End If     Return i End If

Listing 6.87. Sub rss.XmlReader.Pop()

// Remove the last object that was appended to the array If Ubound(Stack) > -1 Then     Stack.Remove(Ubound(Stack)) End If

Now you have seen a few examples of how to read and manipulate XML documents. In the next section I cover regular expressions, which are used to find patterns of text in any kind of document.