Available Parsers

Now that you've seen the two main parsing methods, let's talk about a few of the parsers you can use for processing, in particular Microsoft's MSXML, the Apache Group's Xerces, and a few others. Each of these comes as a library that you can add to your own applications (for example, a server that accepts XML documents via HTTP POST) or ships as part of existing applications.

MSXML

The first parser, which can perform both SAX and DOM-based parsing, is Microsoft's MSXML. This parser, which is currently at version 3.0, supports several standards and can handle most of your parsing needs. Some of the supported standards include

  • XSLT
  • XML Path Language (XPath)
  • SAX2

MSXML is available as a Microsoft Windows library file (DLL) that can be embedded into applications for processing. The first version of this parser appeared in Microsoft Internet Explorer 4 and late versions of Windows 95. You should know what version of the parser you are using because different versions support different standards. For this reason, we included Table 3-4, which contains the parser version, filename, and file version.

In addition to these library files, Microsoft also embedded its parser in several of its applications. Table 3-5 provides a list of these applications, their corresponding Internet Explorer versions, if applicable, and the MSXML version.

Beginning with the beta release of MSXML 3.0, the installation of the msxml3 DLL was performed in a "side-by-side" mode—the original, older version was left intact. Additionally, Microsoft Internet Explorer and Microsoft Windows 95 and later versions would continue using the older parser library.

To upgrade your system to utilize the version 3.0 library, it's necessary to run the xmlinst.exe application that can be downloaded from the Microsoft Download Center (http://msdn.microsoft.com/downloads/default.asp?URL=/code/sample.asp?url=/msdn-files/027/001/469/msdncompositedoc.xml). The application, when passed the appropriate arguments, will remove the registry entries for the older parser and replace them with the version 3.0 entries, thereby updating the system.

Table 3-4 MSXML Versions

Parser Version | File Name | File Version
1.0 | msxml DLL | 4.71.1712.5
1.0a | msxml DLL | 4.72.2106.4
1.0 Service Pack 1 (SP1) | msxml DLL | 4.72.3110.0
2.0 | msxml DLL | 5.0.2014.0206
2.0a | msxml DLL | 5.0.2314.1000
2.0b | msxml DLL | 5.0.2614.3500
2.5 Beta 2 | msxml DLL | 5.0.2919.38
2.5a | msxml DLL | 5.0.2919.6303
2.5 | msxml DLL | 5.0.2920.0
2.5 Service Pack 1 (SP1) | msxml DLL | 8.0.5226
2.6 January 2000 Web Release | msxml2 DLL (January Web Release) | 7.50.4920.0
2.6 Beta 2 | msxml2 DLL | 8.0.5207.3
2.6 | msxml2 DLL | 8.0.6518.1
3.0 March 2000 Web Release | msxml3 DLL (March Web Release) | 7.50.5108.0
3.0 May 2000 Web Release | msxml3 DLL (May Web Release) | 8.0.7309.3
3.0 July 2000 Web Release | msxml3 DLL (July Web Release) | 8.0.7520.1
3.0 September 2000 Web Release | msxml3 DLL (September Web Release) | 8.0.7728.0
3.0 Release | msxml3 DLL | 8.0.7820.0

Table 3-5 XML Versions Shipped with Microsoft Products

Application | Internet Explorer | MSXML Version
Not applicable | Internet Explorer 4.0 | 1.0
Windows 95, OEM Service Release 2.5 | Internet Explorer 4.0a | 1.0a
Not applicable | Internet Explorer 4.01, Service Pack 1 (SP1), or Internet Explorer 5.0 | 2.0
Microsoft Office 2000 | Internet Explorer 5.0a | 2.0a
Windows 98, Second Edition | Internet Explorer 5.0b | 2.0b
Windows 95, Windows 98, or Windows NT 4.0 | Internet Explorer 5.01 | 2.5a
Windows 2000 | Internet Explorer 5.01 | 2.5
Windows 2000 | Internet Explorer 5.01, Service Pack 1 (SP1) | 2.5 Service Pack 1 (SP1)
Windows 95, Windows 98, Windows NT 4, Windows 2000, or Windows 2000 Service Pack 1 (SP1) | Internet Explorer 5.5 | 2.5 Service Pack 1 (SP1)
Microsoft SQL Server 2000, Beta 2 |  | 2.6 Beta 2
SQL Server 2000 |  | 2.6
Microsoft BizTalk Server (technology preview and beta) |  | 2.6

Embedding MSXML in Your Applications

After getting MSXML installed you'll want to start using it in your applications. This section covers how to use MSXML from Microsoft Visual Basic, starting with DOM and then SAX2. The DOM example code will build the tree in memory and will then access nodes in the memory structure. The SAX2 example responds to events while the XML document is parsed.

Using DOM

By choosing DOM you assume that you won't have any problem storing the entire XML tree in memory. You will also be able to make changes to the XML and save those as a file. In this example you will create a simple DOM object that catalogs the salaries of all the employees and gives everyone a 4 percent raise. The XML code is as follows:

 <payroll>
   <employee>
     <last_name>Smith</last_name>
     <first_name>John</first_name>
     <salary>60000</salary>
     <performance>Excellent</performance>
     <title>Accounts Payable Manager</title>
   </employee>
 </payroll>

First you'll need to instantiate the DOM Document object.

 Dim xmldoc As DOMDocument
 Set xmldoc = New DOMDocument
 xmldoc.validateOnParse = True
 xmldoc.async = False

The first two statements probably look straightforward. The third statement asks the parser to validate the document when it's parsed. The last statement declares that the document shouldn't be loaded asynchronously. This means that you don't have to worry about whether the XML is fully loaded before you attempt to read and manipulate the document.

Now that you've created the object, load the XML. You can do this by loading a URL, a local file, or the XML text itself. The load( ) method takes either a URL or a file path as its argument. This code snippet loads the XML from the intranet.

 xmldoc.load("http://my.intra.net/salary.xml") 

To load the XML directly you use the xmldoc.loadXML( ) method.

 xmldoc.loadXML("<payroll><employee><last_name>Smith</last_name>" & _
     "<first_name>John</first_name><salary>60000</salary>" & _
     "<performance>Excellent</performance>" & _
     "<title>Accounts Payable Manager</title></employee></payroll>")

For our example it doesn't make much sense to use the loadXML( ) method because the data is already available at a URL. However, remember that XML makes a good data structure for your internal programming needs. If another part of your application system needs to store some data in a tree structure, it might be efficient for it to generate XML and pass it to you as a string. You would then load that string into your DOMDocument using loadXML( ).
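For readers working in Java, the same idea of handing XML around as a string can be sketched with the JAXP classes that ship with the JDK. This is only an illustration under stated assumptions: it uses JAXP rather than MSXML, and the class name StringXmlLoader and the method name loadXml are hypothetical, chosen to mirror loadXML( ).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are not from the original text.
class StringXmlLoader {

    // Parses XML held in a string into a DOM tree, much as loadXML( ) does in MSXML.
    static Document loadXml(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap the checked parser exceptions so callers need no throws clause.
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<payroll><employee><last_name>Smith</last_name>"
                   + "<salary>60000</salary></employee></payroll>";
        Document doc = loadXml(xml);
        // Read back the text of the first salary element.
        String salary = doc.getElementsByTagName("salary").item(0).getTextContent();
        System.out.println("Salary: " + salary);
    }
}
```

Wrapping the string in an InputSource keeps the parser from treating it as a URL or file name, which is the distinction between load( ) and loadXML( ) described above.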

Now that you have the DOMDocument created and loaded with real data, you can start giving people raises. The first step is to set up a loop to iterate through all the nodes. In this example you need to locate the salary nodes and increase the value by 4 percent.

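 ' Element names are case-sensitive, so the tag name must be lowercase.
 Set ElemList = xmldoc.getElementsByTagName("salary")
 For i = 0 To (ElemList.length - 1)
     ' Index with i so every salary is updated; Trim$ removes the
     ' leading space that Str$ adds to positive numbers.
     ElemList.item(i).text = Trim$(Str$(Val(ElemList.item(i).text) * 1.04))
 Next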

With the salary increases in place, the next step is to write the new XML out to disk by using the following code:

 xmldoc.save("NewSalary.xml") 
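As a point of comparison, writing a DOM tree to disk in Java can be sketched with an identity transform from the JDK's JAXP classes. This is a minimal sketch, not MSXML's save( ) method: the class name DomSaver and its three helper methods are hypothetical, and the payroll string is sample data.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names here are illustrative only.
class DomSaver {

    // Builds a DOM tree from XML text.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Serializes the in-memory tree to a file with an identity transform.
    static void save(Document doc, File target) {
        try {
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(target));
        } catch (Exception e) {
            throw new IllegalStateException("Could not save document", e);
        }
    }

    // Reads a file back into a DOM tree so the saved result can be checked.
    static Document load(File source) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("Could not load document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<payroll><employee><salary>62400</salary></employee></payroll>");
        File out = new File("NewSalary.xml");
        save(doc, out);
        System.out.println("Saved " + out.getAbsolutePath());
    }
}
```

A transformer created without a stylesheet copies its input to its output unchanged, which is why it can serve as a serializer here.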

You may not want to give everyone the same raise. In the following example, only employees who perform at an "Excellent" level get raises. The selectNodes( ) method takes advantage of Extensible Stylesheet Language (XSL) patterns, which are covered in more detail later in the book. Because element names and text values are case-sensitive, the pattern must match the document exactly, and it ends with salary so that the salary nodes, rather than the employee nodes, are updated.

 Set ElemList = xmldoc.selectNodes("payroll/employee[performance='Excellent']/salary")
 For i = 0 To (ElemList.length - 1)
     ElemList.item(i).text = Trim$(Str$(Val(ElemList.item(i).text) * 1.04))
 Next
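The same selective raise can be sketched in Java, assuming the XPath support in the JDK's javax.xml.xpath package rather than MSXML's XSL patterns. The class name SelectiveRaise, the method name raiseExcellent, and the second employee in the sample data are hypothetical additions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; a sketch of filtering nodes before updating them.
class SelectiveRaise {

    // Raises by 4 percent only the salaries of employees rated "Excellent".
    static Document raiseExcellent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // The predicate filters employees; the final step selects their salary.
            NodeList salaries = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/payroll/employee[performance='Excellent']/salary",
                          doc, XPathConstants.NODESET);
            for (int i = 0; i < salaries.getLength(); i++) {
                Node salary = salaries.item(i);
                int current = Integer.parseInt(salary.getTextContent().trim());
                salary.setTextContent(Long.toString(Math.round(current * 1.04)));
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process payroll XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<payroll>"
            + "<employee><salary>60000</salary><performance>Excellent</performance></employee>"
            + "<employee><salary>50000</salary><performance>Good</performance></employee>"
            + "</payroll>";
        NodeList salaries = raiseExcellent(xml).getElementsByTagName("salary");
        for (int i = 0; i < salaries.getLength(); i++) {
            System.out.println(salaries.item(i).getTextContent());
        }
    }
}
```

With the sample data, the first salary becomes 62400 and the second stays at 50000, because only the first employee matches the predicate.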

Using SAX

In the previous example you were able to read all of your data into memory, manipulate it, and save a new XML file. You typically turn to SAX when the document is too large to hold in memory, and in exchange you lose the ability to modify the data and save it. This section covers how to use the SAX interface of MSXML by writing a content handler. The parser calls the content handler's methods as it encounters each node.

For this example you'll use the same XML file as before. Instead of giving people raises, your code will calculate the total payroll for your company. The first step is creating a class that implements the IVBSAXContentHandler interface. This can be done with Microsoft Visual Basic 6 as follows:

  1. Download and install MSXML if it isn't already installed.
  2. Create a new project (for these examples, create a standard .EXE).
  3. On the Project menu, choose References, and in the "Available References" list select Microsoft XML, v3.0.
  4. Create a new class and select IVBSAXContentHandler from the interface's drop-down list.

This example implements two methods: startElement( ) and characters( ). When the opening salary tag is encountered, a Boolean is set so that when the next characters call is made, the salary value is grabbed and added to the total.

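 Option Explicit
 Implements IVBSAXContentHandler

 ' Module-level state; Visual Basic initializes these to 0 and False.
 ' Double is used because a payroll total overflows an Integer.
 Private totalPayroll As Double
 Private salaryTag As Boolean

 Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, _
         strLocalName As String, strQName As String, _
         ByVal attributes As MSXML2.IVBSAXAttributes)
     ' Element names are case-sensitive, so match the lowercase tag.
     If strLocalName = "salary" Then
         salaryTag = True
     End If
 End Sub

 Private Sub IVBSAXContentHandler_characters(text As String)
     ' Add the text only when it belongs to a salary element.
     If salaryTag Then
         totalPayroll = totalPayroll + Val(text)
         salaryTag = False
     End If
 End Sub

 ' Visual Basic requires every member of an implemented interface to be
 ' present; the remaining members (with empty bodies) are omitted here.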

Xerces

Another popular parser is Xerces, from the Apache Group, an open source movement that made its name with a Web server. You can download it from http://xml.apache.org. Like MSXML, Xerces comes as a library, and it is available in three forms: a C++ library, a set of Java classes, and COM and Perl bindings that wrap the C++ implementation.

Xerces supports DOM Level 1 and 2 and SAX2. While it does not support some of the additional standards that MSXML does, if you need a common parser across various platforms and environments, Xerces might be the choice for you.

Developing applications with Xerces is similar to developing applications with any open source package: you download it, follow the installation instructions, write a sample to make sure everything is set up correctly, and then consult online documentation and newsgroups for information about how to build your particular application. In this example you'll learn how to get started with the Java implementation of Xerces.

You might find some minor differences between the C++ and Java implementations, so be aware of this during testing.

The first step is to download the Xerces distribution and make sure that xerces.jar is registered on your class path. While you are at the http://xml.apache.org site you might want to browse the API documentation and FAQs at http://xml.apache.org/xerces-j/api.html and http://xml.apache.org/xerces-j/faqs.html. To test your installation you can try writing a very simple application.

 import org.apache.xerces.parsers.DOMParser;
 import org.apache.xml.serialize.OutputFormat;
 import org.apache.xml.serialize.XMLSerializer;
 import org.w3c.dom.Document;
 import org.w3c.dom.Node;
 import org.w3c.dom.NodeList;
 import org.xml.sax.SAXException;
 import java.io.IOException;

 public class SalaryRaise {
     public static void main(String args[]) {
         DOMParser parser = new DOMParser();
         try {
             parser.parse(args[0]);
             Document document = parser.getDocument();
             // Element names are case-sensitive: match the lowercase tag.
             NodeList salaries = document.getElementsByTagName("salary");
             for (int i = 0; i < salaries.getLength(); i++) {
                 // The salary figure lives in the element's child text node.
                 Node salaryText = salaries.item(i).getFirstChild();
                 String orgSalary = salaryText.getNodeValue();
                 int newSalary = (int) Math.round(1.04 * Integer.parseInt(orgSalary.trim()));
                 salaryText.setNodeValue("" + newSalary);
             }
             System.out.println("Parsing done!");
             // Serialize the modified tree; Document.toString() does not produce XML.
             new XMLSerializer(System.out, new OutputFormat(document)).serialize(document);
         } catch (SAXException e) {
             System.err.println(e);
         } catch (IOException e) {
             System.err.println(e);
         }
     }
 }

This application will parse the document and print Parsing done! and the new XML, or it will print an error. If you are able to compile and run this example, your installation succeeded. You've also taken your first steps in using the DOM functionality of Xerces. The next example uses SAX to total the salaries of all employees. As with the Visual Basic example, you need to implement a content handler to receive the incoming events:

 import org.apache.xerces.parsers.SAXParser;
 import org.xml.sax.Attributes;
 import org.xml.sax.SAXException;
 import org.xml.sax.helpers.DefaultHandler;
 import java.io.IOException;

 // DefaultHandler is a class, so it is extended rather than implemented.
 public class PayrollCalculator extends DefaultHandler {
     private int totalPayroll = 0;
     private boolean inSalaryElement = false;

     public PayrollCalculator(String input) {
         SAXParser parser = new SAXParser();
         parser.setContentHandler(this);
         try {
             parser.parse(input);
         } catch (SAXException e) {
             System.out.println(e);
         } catch (IOException e) {
             System.out.println(e);
         }
     }

     public void startElement(String uri, String local, String qName, Attributes atts) {
         if (local.equals("salary")) {
             inSalaryElement = true;
         }
     }

     // SAX delivers text as a slice of a character array, not as a String.
     // This version assumes each salary arrives in a single characters() call.
     public void characters(char[] ch, int start, int length) {
         if (inSalaryElement) {
             String text = new String(ch, start, length).trim();
             totalPayroll = totalPayroll + Integer.parseInt(text);
             inSalaryElement = false;
         }
     }

     public int getTotalPayroll() { return totalPayroll; }

     public static void main(String args[]) {
         PayrollCalculator calc = new PayrollCalculator(args[0]);
         System.out.println("Total payroll == " + calc.getTotalPayroll());
     }
 }
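One caveat with a flag-based handler is that SAX parsers are allowed to deliver the text of a single element in several characters() calls. A slightly more defensive variation, sketched here under stated assumptions, buffers the text and converts it when the closing tag arrives. It uses the JDK's JAXP SAXParserFactory instead of the Xerces SAXParser class so that it runs without extra libraries, and the class name BufferedPayrollHandler and the method name total are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; a sketch of buffering text until the element closes.
class BufferedPayrollHandler extends DefaultHandler {
    private long totalPayroll = 0;
    private StringBuilder salaryText = null;   // non-null only inside <salary>

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        // The default JAXP factory is not namespace-aware, so compare qName.
        if (qName.equals("salary")) {
            salaryText = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Collect every chunk; the parser may split the text arbitrarily.
        if (salaryText != null) {
            salaryText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("salary")) {
            totalPayroll += Long.parseLong(salaryText.toString().trim());
            salaryText = null;
        }
    }

    long getTotalPayroll() { return totalPayroll; }

    // Parses the XML text and returns the sum of all salary elements.
    static long total(String xml) {
        try {
            BufferedPayrollHandler handler = new BufferedPayrollHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getTotalPayroll();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse payroll XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<payroll>"
            + "<employee><salary>60000</salary></employee>"
            + "<employee><salary>45000</salary></employee>"
            + "</payroll>";
        System.out.println("Total payroll == " + total(xml));
    }
}
```

With the two sample employees, the program reports a total of 105000. Nothing is kept in memory beyond the running total and the text of the salary currently being read.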

Other Parsers

MSXML and Xerces are not the only parsers available. Within the Java community alone, programmers have a multitude of choices. Because DOM Level 1 and SAX2 are the heavyweights, you should ensure that your parser supports these methods. Table 3-6 contains a list of parsers written in Java and tells whether they are validating or not and what standards they support.

Table 3-6 Other DOM and SAX Parsers Written in Java

Parser | URL | Validating | Standards Supported
IBM's XML for Java | http://www.alphaworks.ibm.com/formula/xml | Yes | DOM Level 1, DOM Level 2, SAX 1, SAX 2, and Namespaces
Microstar's Ælfred | http://www.opentext.com/services/content_management_services/xml_sgml_solutions.html#aelfred_and_sax | Namespace validation only | DOM Level 1, DOM Level 2, SAX 1, and SAX 2
Sun's Java API for XML | http://java.sun.com/products/xml | Yes | DOM Level 1, SAX 1, and Namespaces
Oracle's XML Parser for Java | http://technet.oracle.com/ | Yes | DOM Level 1, SAX 1, and Namespaces
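When evaluating a Java parser, you can also ask the installed implementation what it claims to support instead of relying only on a table. The sketch below assumes the parser is reached through the JDK's JAXP factories. The class name ParserCapabilities and its method names are hypothetical, while hasFeature( ) and the SAX2 namespaces feature URI are standard DOM and SAX identifiers.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.DOMImplementation;
import org.xml.sax.XMLReader;

// Hypothetical class; a sketch of probing a parser for standards support.
class ParserCapabilities {

    // Asks the DOM implementation whether it claims a feature at a given level,
    // for example ("Core", "2.0") for DOM Level 2 Core.
    static boolean domSupports(String feature, String version) {
        try {
            DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().getDOMImplementation();
            return impl.hasFeature(feature, version);
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a DOM parser", e);
        }
    }

    // Reports whether a namespace-aware SAX2 reader has namespace processing on.
    static boolean saxNamespacesEnabled() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getFeature("http://xml.org/sax/features/namespaces");
        } catch (Exception e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM Level 1 Core: " + domSupports("Core", "1.0"));
        System.out.println("DOM Level 2 Core: " + domSupports("Core", "2.0"));
        System.out.println("SAX2 namespaces:  " + saxNamespacesEnabled());
    }
}
```

The answers reflect whichever parser the factories locate at run time, so the output can differ from one installation to another.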



XML Programming
XML Programming Bible
ISBN: 0764538292
EAN: 2147483647
Year: 2002
Pages: 134