JAXP | XML

The Java API for XML Processing (JAXP) defines a generic API for processing XML documents and transforming an XML source. The javax.xml.parsers package contains the classes and interfaces needed to parse XML documents in SAX 2.0 and DOM 2.0 modes, while the javax.xml.transform package contains the classes and interfaces needed to transform XML data (a source) into another format (a result). An XML document can manifest itself in several ways: a stream (File, InputStream, or Reader), SAX events, or a DOM tree representation. The aim of JAXP is to give applications a portable way to parse and transform XML documents. WebLogic Server ships with the JAXP 1.1 classes and interfaces by default, and this implementation of JAXP is configured to use WebLogic's built-in parsers and transformers.

The JAXP specification demands that you set the appropriate system properties in order to plug in a custom XML parser (or transformer). However, WebLogic deviates from this model. Instead of using system properties, WebLogic allows you to determine the actual parsers used in two ways:

You can create an XML Registry that lets you configure server-specific and document-specific parsers and transformers. XML Registries are domain resources configured using the Administration Console.
You can define application-scoped XML settings in the WebLogic-specific weblogic-application.xml deployment descriptor for an EAR. Here you can specify parser and transformer factories that apply to an enterprise application and all its constituent modules.

Both of these mechanisms allow you to configure the parsers and transformers used by your applications at deployment time. This means that you can transparently alter the configuration without changing a single line of code.
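For comparison, the standard JAXP lookup that WebLogic replaces is keyed on system properties named after the factory classes. The following is a minimal sketch, assuming a plain JVM outside WebLogic; the class name JaxpFactoryReport is hypothetical. It prints each property's current value (null if unset) next to the factory implementation that JAXP actually selected.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which JAXP factory implementations are active.
public class JaxpFactoryReport {

    /** Returns "property=value -> implementation class" for one factory. */
    public static String describe(String property, Object factory) {
        // System.getProperty returns null when the property is unset; JAXP
        // then falls back to its other lookup steps and platform default.
        return property + "=" + System.getProperty(property)
                + " -> " + factory.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe("javax.xml.parsers.SAXParserFactory",
                SAXParserFactory.newInstance()));
        System.out.println(describe("javax.xml.parsers.DocumentBuilderFactory",
                DocumentBuilderFactory.newInstance()));
        System.out.println(describe("javax.xml.transform.TransformerFactory",
                TransformerFactory.newInstance()));
    }
}
```

Because the application code only ever calls newInstance( ), the same class works unchanged whether the implementation is chosen by a system property or by a container-level setting.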

In the following sections, we'll show you how to use WebLogic's JAXP to create SAX and DOM parsers, and a transformer based on an XSLT stylesheet. We will also look at WebLogic's support for resolving external entities.

18.1.1 SAX

The Simple API for XML (SAX) is an event-based API for parsing XML documents. The SAX interface defines various events that are triggered while an XML parser is reading an XML document. A custom handler can listen for these events and process the XML document in an easy, event-oriented fashion. A SAX handler implements the relevant callback methods and responds to various events such as when it encounters start and end tags, character data, or a processing instruction. As the SAX parser reads through an XML document, it invokes the methods on the handler class to mark each particular event. In this regard, the SAX interface is unlike other XML APIs because it relies on a push model; the data is pushed to the application as it is encountered.

Example 18-1 shows how to retrieve a SAX parser using the JAXP interface. In order to parse an XML document with SAX, you need to create an instance of a SAX parser. The JAXP interface defines a SAXParserFactory, which manufactures a SAX parser for you. The newInstance( ) method in the SAXParserFactory class creates a new factory object. Call the newSAXParser( ) method on this factory instance to create a new SAX parser.

Example 18-1. Retrieving and using a SAX parser

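import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
//...
 //Obtain an instance of SAXParserFactory
 SAXParserFactory spf = SAXParserFactory.newInstance( );
 //Obtain a SAX parser from the factory
 SAXParser sp = spf.newSAXParser( );
 //Any well-formed document will do; an empty string is rejected by the parser
 String xml = "<a><b/></a>";
 //Parse the XML document
 sp.parse(new InputSource(new StringReader(xml)),
   new DefaultHandler( ) {
     public void startElement(String uri, String localName,
                              String qName, Attributes attr) {
       //Print the qualified name of each start tag
       System.err.println(qName);
     }
   });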

You can now use the SAX parser to parse an XML document. In this example, the SAX handler registered with the parser responds whenever a start tag is encountered. For each start tag, it simply prints out the name of the tag.
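To make the push model more concrete, here is a minimal, self-contained sketch of a handler that responds to several kinds of events rather than just start tags. The class name SaxEventTrace and the sample document are assumptions made for illustration; the handler simply records each callback in the order the parser delivers it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: records SAX events as they are pushed.
public class SaxEventTrace {

    /** Parses the XML text and returns the events in delivery order. */
    public static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver text in several chunks; this sketch
                // records each non-blank chunk separately.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Prints the recorded events in document order.
        System.out.println(trace("<book><title>WebLogic</title></book>"));
    }
}
```

Note that the handler never asks the parser for data. It only reacts to callbacks, which is why SAX suits incremental processing of large documents.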

By default, WebLogic Server uses its built-in SAX parser factory: weblogic.apache.xerces.jaxp.SAXParserFactoryImpl. Later, we'll see how you can override this setting and configure WebLogic to use an alternative SAX parser.

18.1.2 DOM

An XML document is a tree of elements: data elements can be nested within other elements and can optionally have attributes attached. The Document Object Model (DOM) defines an object hierarchy that represents the structure of an XML document as a recursive tree of elements. This means that you need to read and parse the entire XML document in order to build this data structure. Because a DOM representation of an XML document needs to be held in memory, many DOM implementations can be memory-intensive.

As its name suggests, DOM parsing uses a hierarchical, object-based model. XML parsing in DOM mode is useful when you need frequent, random access to different parts of the document, or if you want to manipulate its structure in complex ways. However, the DOM API is not suitable for applications that need to parse XML data incrementally; in these cases, you should use SAX.

Example 18-2 shows how you can use the JAXP interface to create a DOM parser. In order to parse an XML document in DOM mode, you need to obtain an instance of DocumentBuilder. Once again, the JAXP interface provides a factory, called the DocumentBuilderFactory, which manufactures a DOM parser for you. The newInstance( ) method on the DocumentBuilderFactory class creates a new DOM parser factory. Call the newDocumentBuilder( ) method on the DocumentBuilderFactory instance to create a new DOM parser.

Example 18-2. Retrieving and using a DOM parser

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
//...
//Obtain an instance of the DocumentBuilderFactory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance( );
// Get a DOM parser
DocumentBuilder db = dbf.newDocumentBuilder( );
System.err.println(db.getDOMImplementation( ).getClass( ).getName( ));
//Parse the document
Document doc = db.parse(/*..*/);
System.err.println(doc.getDocumentElement( ).getNodeName( ));

Once you create a DocumentBuilder object, you can parse an XML document and build a DOM tree. The parse( ) method returns an org.w3c.dom.Document object, which represents the hierarchical view of the XML document. Now that you have built the DOM tree in memory, you can access the elements within the Document, or perhaps even modify the structure. In the preceding example, we simply print the name of the root element of the document.
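As an illustration of accessing and modifying the in-memory tree, the following sketch parses a small document from a string, appends a new element, and reads it back. The class name DomEditExample and the domain/server element names are hypothetical and chosen only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: random access to, and modification of, a DOM tree.
public class DomEditExample {

    /** Builds a DOM tree from XML text. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder db =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not build DOM tree", e);
        }
    }

    /** Appends a server element to the root and returns the new server count. */
    public static int addServer(Document doc, String name) {
        Element server = doc.createElement("server");
        server.setAttribute("name", name);
        doc.getDocumentElement().appendChild(server);
        return doc.getElementsByTagName("server").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<domain><server name='admin'/></domain>");
        System.out.println(doc.getDocumentElement().getNodeName()); // domain
        System.out.println(addServer(doc, "managed1"));             // 2

        // Random access: jump straight to the last server element.
        NodeList servers = doc.getElementsByTagName("server");
        Element last = (Element) servers.item(servers.getLength() - 1);
        System.out.println(last.getAttribute("name"));              // managed1
    }
}
```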

By default, WebLogic uses the built-in DOM parser factory. Later, we will see how you can configure WebLogic to use a third-party DOM parser.

18.1.3 XSL Transformers

XSL Transformations (XSLT) are XML-formatted rules that define how one XML document can be transformed into another. You typically would assemble templates specifying a particular transformation and place these in an XSLT stylesheet. Then, an XSLT processor uses the templates defined in the stylesheet to process matching elements in the input XML source and writes the template conversion into an output tree. The source and sink (or output target) of an XSL Transformation can be a stream, a DOM tree, or SAX events. JAXP provides a standard way for acquiring an XSL transformer. The same XSL transformer can process XML data that originates from a variety of sources and can write the output of the transformation to a variety of sinks. For instance, using the same XSLT stylesheet, you could easily transform a stream of input SAX events to a DOM tree representation of the output XML document.

The JAXP interface enables you to capture XSLT stylesheets in two ways:

A Transformer object that is created via the newTransformer( ) factory method on an instance of the TransformerFactory.
A Templates instance that encapsulates runtime-compiled information about the XSLT stylesheet. You can create a Templates object via the newTemplates( ) method on an instance of the TransformerFactory.

A Templates object can be shared across multiple concurrent threads and may be used multiple times during its lifetime. However, an XSL Transformer isn't thread-safe and must not be shared across multiple concurrent threads. Typically, a single Templates object will spawn multiple Transformer instances, one for each thread that needs to use the XSLT stylesheet. Example 18-3 shows how you can create an XSL transformer for a given stylesheet.

Example 18-3. Using JAXP to access a transformer

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult; 
import org.w3c.dom.Document;
//...
TransformerFactory tf = TransformerFactory.newInstance( );
Transformer t = 
 tf.newTransformer(new StreamSource(new java.io.File("foo.xsl")));

// apply stylesheet to a DOM tree and write the 
// output of transformation to System.out
t.transform(new DOMSource(doc), new StreamResult(System.out));
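The source and result types can be varied independently of the stylesheet. As a minimal sketch of the SAX-to-DOM case mentioned earlier, the following example feeds a SAXSource into a transformer and collects the output in a DOMResult. The class name SaxToDomTransform and the inline stylesheet are assumptions made so that the example is self-contained; a real application would more likely load the stylesheet from a file.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: SAX events in, DOM tree out.
public class SaxToDomTransform {

    // Illustrative stylesheet: rewrites <greeting> as <message>.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/greeting'>"
        + "<message><xsl:value-of select='.'/></message>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms the XML text and returns the output as a DOM document. */
    public static Document transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // The transformer reads the input as a stream of SAX events...
            SAXSource source = new SAXSource(new InputSource(new StringReader(xml)));
            // ...and builds the output tree in an initially empty DOMResult.
            DOMResult result = new DOMResult();
            t.transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Document out = transform("<greeting>hello</greeting>");
        System.out.println(out.getDocumentElement().getNodeName()); // message
    }
}
```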

Alternatively, you could create a Templates object that captures all the transformation instructions specified in the stylesheet. For example, you could create a Templates instance in the init( ) method of a servlet:

import javax.xml.transform.Templates;
//...
TransformerFactory tf = TransformerFactory.newInstance( );
Templates tmpl = 
 tf.newTemplates(new StreamSource(new java.io.File("foo.xsl")));

Then, when you need to apply the XSLT stylesheet to an XML document (for instance, in the doPost( ) method of a servlet), you need only to create a new Transformer instance using the same Templates object:

// Needs to be applied to an XML document
Transformer t = tmpl.newTransformer( );
t.transform(domSource, streamResult);
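The division of labor between Templates and Transformer can be sketched outside a servlet as well. In the following minimal example, a thread pool stands in for concurrent servlet requests: the stylesheet is compiled once into a shared Templates object, and every task creates its own Transformer. The class name SharedTemplates, the inline stylesheet, and the pool size are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: one shared Templates, one Transformer per task.
public class SharedTemplates {

    // Illustrative stylesheet producing plain text.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/name'>Hello, <xsl:value-of select='.'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiled once; a Templates object may be shared between threads.
    static final Templates TEMPLATES = compile();

    static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not compile stylesheet", e);
        }
    }

    /** Applies the stylesheet using a Transformer private to this call. */
    public static String apply(String xml) {
        try {
            Transformer t = TEMPLATES.newTransformer(); // never shared
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    /** Transforms all documents on a small thread pool, preserving order. */
    public static List<String> applyConcurrently(List<String> docs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : docs) {
                Callable<String> task = () -> apply(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("Concurrent transformation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("<name>Ada</name>")); // Hello, Ada
    }
}
```

Creating a Transformer from an existing Templates object avoids re-parsing the stylesheet on every request, which is the main reason to prefer this arrangement when the same stylesheet is applied repeatedly.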

By default, WebLogic uses its built-in factory for processing XSLT stylesheets: org.apache.xalan.processor.TransformerFactoryImpl.

18.1.4 External Entity Resolution

External entities are portions of text external to the XML file being parsed. An external entity resembles a macro replacement facility. The replacement text can either be parsed, which means the text is incorporated into the XML document, or unparsed, which means the declaration points to external data. A typical external entity declaration uses the SYSTEM keyword followed by the URI of the substitution text, in the form <!ENTITY header SYSTEM "uri">, where header is the entity name. The URI may be absolute, or the declaration could specify a relative URL instead.

Whenever a parser encounters an entity reference &header;, it replaces the reference with the actual contents of the document located at the specified URI. A DTD reference is another example of an external entity reference. An XML document uses a DOCTYPE declaration to reference a DTD:

<!DOCTYPE oreilly SYSTEM "http://www.oreilly.com/dtds/wl.dtd">

The preceding declaration indicates that the root element for the document is oreilly, and its DTD is located at http://www.oreilly.com/dtds/wl.dtd. In addition, the DOCTYPE declaration may use a public identifier, which is a publicly declared name that identifies the associated DTD. When a validating XML parser processes an XML document that includes a DOCTYPE declaration, the parser needs to fetch the DTD file referenced by the URI.

The SAX interface allows you to associate a custom entity resolver with the parser, which can intercept parser requests for external entities (including an external DTD) and determine how the entity references are resolved. To do this, you need to register an instance of the org.xml.sax.EntityResolver interface with the XMLReader before you begin to read from an XML source. Whenever the SAX parser encounters an entity reference, it will invoke the resolveEntity( ) method on the registered EntityResolver instance.

SAX's XMLReader interface is encapsulated by the JAXP SAXParser class. You can call the getXMLReader( ) method on a SAXParser to obtain the underlying XMLReader.

Typically, an application uses an EntityResolver to substitute a reference to a remote URI with a local copy. The resolveEntity( ) method returns an input source for the XML entity based on either a character or a byte stream. If the method returns null, the parser tries to open a connection to the URI referenced by the system identifier. A custom entity resolver is very useful if your application parses XML documents that need to be retrieved from a database or other nonstandard locations.
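The following is a minimal sketch of such a resolver, assuming the local copies are held in an in-memory map rather than in files or a database. The class name LocalCopyResolver, the header entity's URI (http://www.example.com/header.txt), and the one-line DTD are hypothetical and exist only to make the example self-contained. Identifiers that are not registered cause resolveEntity( ) to return null, so the parser falls back to opening the URI itself.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: serves external entities from local copies.
public class LocalCopyResolver implements EntityResolver {

    private final Map<String, String> localCopies = new HashMap<>();
    private int hits = 0;

    /** Maps a system identifier to locally held substitution text. */
    public void register(String systemId, String content) {
        localCopies.put(systemId, content);
    }

    /** Number of entity requests answered from a local copy. */
    public int getHits() {
        return hits;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = localCopies.get(systemId);
        if (content == null) {
            return null; // unknown entity: let the parser open the URI itself
        }
        hits++;
        InputSource source = new InputSource(new StringReader(content));
        source.setSystemId(systemId);
        return source;
    }

    /** Parses the document with the resolver installed; returns its text content. */
    public static String parseText(String xml, EntityResolver resolver) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // The resolver must be registered before parsing begins.
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        LocalCopyResolver resolver = new LocalCopyResolver();
        resolver.register("http://www.oreilly.com/dtds/wl.dtd",
                "<!ELEMENT oreilly (#PCDATA)>");
        resolver.register("http://www.example.com/header.txt", "WebLogic");
        String xml =
              "<!DOCTYPE oreilly SYSTEM \"http://www.oreilly.com/dtds/wl.dtd\" ["
            + "<!ENTITY header SYSTEM \"http://www.example.com/header.txt\">]>"
            + "<oreilly>&header; Guide</oreilly>";
        System.out.println(parseText(xml, resolver));
    }
}
```

Because both the DTD and the header entity are registered, the parse in main( ) should complete without any network access.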

WebLogic's XML Registry provides several enhancements that improve the performance of external entity resolution:

It allows you to map an external entity to a local (or remote) URI that contains a copy of the substitution text for the entity.
It allows you to retrieve a copy of the remote resource associated with the external entity and cache the text, either in memory or on disk.
It allows you to specify the duration after which a cached item becomes stale and needs to be refreshed.

Each entry in the XML Registry uses a public or system identifier to identify the external entity. Each entry also may specify various caching options for entity resolution. For instance, you could instruct the server to fetch an external entity the first time it is required, and cache it for 120 seconds. Whenever an XML parser reads a document that uses an entity reference configured in the registry, WebLogic Server fetches a local or a cached copy of the substitution text.
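The idea of caching an entity and refreshing it once it becomes stale can be sketched as an ordinary EntityResolver. This is not WebLogic's implementation, only an illustration of the concept: the class name ExpiringEntityCache, the fetcher function that loads the text for a system identifier, and the injectable clock are all assumptions introduced for this example.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical example class: caches entity text and refetches it when stale.
public class ExpiringEntityCache implements EntityResolver {

    private static final class Entry {
        final String text;
        final long fetchedAt;

        Entry(String text, long fetchedAt) {
            this.text = text;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // loads text for a system ID
    private final long ttlMillis;                   // age at which an entry is stale
    private final LongSupplier clock;               // current time in milliseconds

    public ExpiringEntityCache(Function<String, String> fetcher,
                               long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        long now = clock.getAsLong();
        Entry entry = cache.get(systemId);
        // Fetch on first use, and again once the cached copy has gone stale.
        if (entry == null || now - entry.fetchedAt >= ttlMillis) {
            entry = new Entry(fetcher.apply(systemId), now);
            cache.put(systemId, entry);
        }
        InputSource source = new InputSource(new StringReader(entry.text));
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        // Cache each entity for 120 seconds, using the system clock.
        ExpiringEntityCache resolver = new ExpiringEntityCache(
                id -> "<!ELEMENT oreilly (#PCDATA)>", // stand-in for a real fetch
                120_000L, System::currentTimeMillis);
        InputSource src =
                resolver.resolveEntity(null, "http://www.oreilly.com/dtds/wl.dtd");
        System.out.println(src.getSystemId());
    }
}
```

With the XML Registry, the equivalent behavior is configured administratively rather than coded, so application code does not need a resolver like this one.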
