The details of how to load a catalog file when processing a document vary from parser to parser and tool to tool. Not all XML processors support catalogs, but most of the important ones do. The Gnome Project's xsltproc, Michael Kay's Saxon, and the XML Apache Project's Xerces-J and Xalan-J all support catalogs. Notably lacking from this list are the C++ versions of Xerces and Xalan as well as Microsoft's MSXML. Of course, it isn't hard to integrate catalog processing into your own applications with just a little bit of open source code.
Daniel Veillard's libxml2 XML parser for C supports catalogs, as does his libxslt processor that sits on top of libxml2. libxml2 reads the catalog location from the XML_CATALOG_FILES environment variable, which contains a white-space-separated list of file names. This can be set in all the usual ways. For example, in bash or other Bourne shell derivatives, to specify that libxml should use the catalog file found at /opt/xml/catalog.xml you would simply type the following:
% XML_CATALOG_FILES=/opt/xml/catalog.xml
% export XML_CATALOG_FILES
In Windows, you'd set this environment variable in the System control panel.
Listing more than one file name in this environment variable tells libxml to try several different catalogs in sequence. For example, the setting below requests that libxml first look in the file catalog.xml in the current working directory and then in the file /opt/xml/docbook/docbook.cat.
% XML_CATALOG_FILES="catalog.xml /opt/xml/docbook/docbook.cat"
% export XML_CATALOG_FILES
If you expect to use the same catalog file consistently, you could set XML_CATALOG_FILES in your .bashrc or .cshrc file.
Once this environment variable is set, libxml will consult the catalog for all documents it loads, whether you're calling the library from your own C++ source code, calling it from the XSLT stylesheet with the document() function, or using the command line tools xmllint and xsltproc.
If you're having trouble with the catalog, you can put libxml in debug mode by setting the XML_DEBUG_CATALOG environment variable. (No value is required. It just needs to be set.) libxml will then tell you when it recognizes a catalog entry and what it's actually loading when. I often find this useful for discovering small, nonobvious mismatches between the IDs used in the instance documents and those used in the catalog. For instance, when I was writing this item, libxml helped me uncover a mismatch between the public ID in the catalog and the documents. The catalog was using -//OASIS//DTD DocBook XML V4.2.0//EN and the source documents were using -//OASIS//DTD DocBook XML V4.2//EN. The strings really do have to match exactly: 4.2 is not the same as 4.2.0 when resolving public IDs.
Saxon, Xalan, and Other Java-Based XSLT Processors
Most XML parsers and XSLT processors written in Java can use Norm Walsh's catalog library (now donated to the XML Apache Project). You can download it from http://xml.apache.org/dist/commons/. Download the file resolver-1.0.jar (the version number may have changed) and add it to your classpath. Next create a CatalogManager.properties file in a directory that is included in your classpath. The resolver will look in this file to determine the locations of the catalog files. Example 47-2 shows a properties file that loads the catalog named catalog.xml from the current working directory and the standard DocBook catalog from the absolute path /opt/xml/docbook/docbook.cat.
Example 47-2 A CatalogManager.properties File for Norm Walsh's Catalog Resolver
catalogs=catalog.xml;/opt/xml/docbook/docbook.cat
relative-catalogs=true
static-catalog=yes
catalog-class-name=org.apache.xml.resolver.Resolver
verbosity=1
If you're having trouble, turn up the verbosity to 4 to provide more detailed error messages about exactly which files the resolver is loading when.
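To make the format of that file concrete, here is a minimal sketch that reads the settings from Example 47-2 with java.util.Properties and splits the catalogs value on semicolons. It only illustrates the file format. It is not the resolver's own loading code, and the class name CatalogPropertiesSketch and the method catalogFiles are invented for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class CatalogPropertiesSketch {

    // The settings from Example 47-2, inlined so the sketch is self-contained.
    static final String EXAMPLE =
          "catalogs=catalog.xml;/opt/xml/docbook/docbook.cat\n"
        + "relative-catalogs=true\n"
        + "static-catalog=yes\n"
        + "catalog-class-name=org.apache.xml.resolver.Resolver\n"
        + "verbosity=1\n";

    // Hypothetical helper: returns the individual catalog paths
    // listed in the semicolon-separated "catalogs" property.
    static List<String> catalogFiles(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return Arrays.asList(props.getProperty("catalogs", "").split(";"));
    }

    public static void main(String[] args) throws IOException {
        // Print one catalog path per line, in lookup order.
        for (String path : catalogFiles(EXAMPLE)) {
            System.out.println(path);
        }
    }
}
```

The order of the entries matters because the catalogs are listed in the order in which they should be consulted.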
You tell Saxon to use the Apache Commons catalog with several command line options, as shown below.
% java com.icl.saxon.StyleSheet -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver chapter1.xml docbook.xsl
Xalan is similar.
% java org.apache.xalan.xslt.Process -ENTITYRESOLVER org.apache.xml.resolver.tools.CatalogResolver -URIRESOLVER org.apache.xml.resolver.tools.CatalogResolver -in chapter1.xml -xsl docbook.xsl
jd.xslt works the same except that it uses lowercase argument names.
% java jd.xml.xslt.Stylesheet -entityresolver org.apache.xml.resolver.tools.CatalogResolver -uriresolver org.apache.xml.resolver.tools.CatalogResolver chapter1.xml docbook.xsl
In all three cases, what you're really doing is telling the processor where to find an instance of the SAX EntityResolver interface and the TrAX URIResolver interface. The org.apache.xml.resolver.tools.CatalogResolver class can also be used for these purposes in your own SAX and TrAX programs.
In TrAX, both the Transformer and TransformerFactory classes have setURIResolver methods that allow you to provide a resolver that's used to look up URIs used by the document() function and the xsl:import and xsl:include elements. Setting the URIResolver for a Transformer just changes that one Transformer object. Setting the URIResolver for a TransformerFactory sets the default resolver for all Transformer objects created by that factory.
To use catalogs, just pass in an instance of the org.apache.xml.resolver.tools.CatalogResolver class.
URIResolver resolver = new CatalogResolver();
TransformerFactory factory = TransformerFactory.newInstance();
factory.setURIResolver(resolver);
The location of the catalog file is determined by a CatalogManager.properties file as shown in Example 47-2.
You will of course need to add the resolver.jar file to your classpath to make this work.
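If you want to see the URIResolver hook at work without installing the resolver jar, the following sketch may help. It uses only the JDK's built-in TrAX implementation and swaps in a hypothetical one-entry map where CatalogResolver would normally consult a real catalog. The class name UriResolverSketch, the example.com URL, and the LOCAL_COPIES map are all assumptions made for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class UriResolverSketch {

    // Hypothetical stand-in for a catalog: maps a remote URI to local XML content.
    static final Map<String, String> LOCAL_COPIES =
        Map.of("http://www.example.com/remote.xml", "<greeting>local copy</greeting>");

    static String transform() throws TransformerException {
        // In a real program this would be: new CatalogResolver()
        URIResolver resolver = (href, base) -> {
            String xml = LOCAL_COPIES.get(href);
            // Returning null asks the processor to resolve the URI itself.
            return xml == null ? null : new StreamSource(new StringReader(xml), href);
        };

        // A stylesheet that pulls in a second document with document().
        String stylesheet =
              "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select=\"document('http://www.example.com/remote.xml')/greeting\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setURIResolver(resolver); // default for Transformers from this factory

        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform());
    }
}
```

Because the resolver answers for the remote URI, the transformation should never touch the network. Replacing the lambda with a CatalogResolver instance gives the same behavior, driven by your catalog files instead of a hard-coded map.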
SAX programs access catalogs through the EntityResolver interface, which, conveniently, org.apache.xml.resolver.tools.CatalogResolver also implements. To add catalog support to your own application just pass an instance of this class to the setEntityResolver method of the XMLReader class before beginning to parse a document.
EntityResolver resolver = new CatalogResolver();
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setEntityResolver(resolver);
That's all there is to it. From here on, you just parse the XML as usual. Whenever the XMLReader loads a DTD fragment from either a system or a public ID it will first consult the catalogs identified by the CatalogManager.properties file.
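The same hook can be demonstrated with nothing but the JDK. The sketch below is not the Apache resolver. It is a hypothetical one-entry lookup table, with invented names EntityResolverSketch, CATALOG, and resolvedLocally, that shows when the parser calls the EntityResolver and why the public ID has to match the catalog entry exactly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class EntityResolverSketch {

    // Hypothetical one-entry "catalog": public ID -> local DTD text.
    static final Map<String, String> CATALOG =
        Map.of("-//OASIS//DTD DocBook XML V4.2//EN", "<!ELEMENT book (#PCDATA)>");

    // Parses a tiny document that declares the given public ID and reports
    // whether the DTD was served from the local table.
    static boolean resolvedLocally(String publicId)
            throws ParserConfigurationException, SAXException, IOException {
        List<String> hits = new ArrayList<>();

        EntityResolver resolver = (pub, sys) -> {
            String dtd = CATALOG.get(pub); // exact string match, as in a catalog
            if (dtd == null) {
                // A real catalog resolver would return null here so the parser
                // falls back to the system ID. This sketch supplies an empty DTD
                // instead, purely to avoid network access.
                return new InputSource(new StringReader(""));
            }
            hits.add(pub);
            return new InputSource(new StringReader(dtd));
        };

        String document =
              "<!DOCTYPE book PUBLIC \"" + publicId + "\" "
            + "\"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd\">"
            + "<book>Hello</book>";

        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        parser.setEntityResolver(resolver);
        parser.parse(new InputSource(new StringReader(document)));
        return !hits.isEmpty();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(resolvedLocally("-//OASIS//DTD DocBook XML V4.2//EN"));
        System.out.println(resolvedLocally("-//OASIS//DTD DocBook XML V4.2.0//EN"));
    }
}
```

The second call uses the mismatched 4.2.0 identifier, so the lookup misses even though the two strings differ by only two characters.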