10.5 Converting XML+MathML to TeX

MathML and TeX play complementary roles as formats for representing mathematics. MathML is well suited to displaying formulas in Web pages, while TeX is a superior solution for creating high-quality printed output. Ideally, an author should be able to move back and forth between TeX and MathML as needed.

In the majority of cases, authors will be translating existing TeX and LaTeX documents for display on the Web. Some of the tools available for this type of translation were described in Section 10.4. However, there are also situations in which the reverse transformation, that is, converting MathML into TeX, can be useful. This allows authors to use TeX as a formatting engine for typesetting mathematical documents that were originally created for display on the Web. In this section, we briefly look at some techniques for converting MathML equations, either individually or embedded in another XML document type (such as XHTML), into TeX.

XSLT Stylesheets

As discussed in Section 8.4, XSLT transformations provide a flexible and powerful method for converting arbitrary XML data into other formats. In particular, you can use XSLT for transforming MathML equations into LaTeX. Vasil Yaroshevich has implemented this approach in the form of an XSLT MathML Library. This is a collection of six XSLT stylesheets that together specify how to translate an arbitrary presentation MathML expression into LaTeX. Currently, the stylesheets handle conversion from presentation MathML only, but support for content MathML is likely to be added in the future.

The XSLT MathML Library consists of the following six stylesheets:

  • mmltex.xsl

  • tokens.xsl

  • glayout.xsl

  • scripts.xsl

  • tables.xsl

  • entities.xsl

The templates for transforming token elements, layout schemata, scripts, tables, and entities are placed in separate stylesheets for the sake of modularity. However, for transforming a given MathML document, you only need to refer to a top-level stylesheet called mmltex.xsl. This contains commands for importing the template definitions from all the other stylesheets.

To use the XSLT MathML Library for translating MathML into LaTeX, you have two options. First, you can try it interactively on the Web page set up by the author of the library: http://www.raleigh.ru/MathML/mmltex/online.php (Figure 10.8). You can enter any MathML expression in the text area provided and then click a button to view the corresponding LaTeX markup in the same page. This is useful for translating individual equations and experimenting with how the stylesheets work.

Figure 10.8: Converting presentation MathML into LaTeX using XSLT transformations.

If you are processing a larger volume of files, you can also download all the stylesheets in zipped form from the following URL: http://www.raleigh.ru/MathML/mmltex/mmltex.zip. Once you unzip the files, you can use an XSLT processor, such as Xalan or Saxon, to do the transformations locally on your own machine.

The ORCCA Converter

Another approach for translating MathML equations into LaTeX is to use a custom program written in a general-purpose programming language such as Java. In Section 10.3, we discussed the online TeX to MathML converter created by Stephen Watt's group at ORCCA. This uses a map file containing templates that define the correspondence between specific TeX constructs and their MathML counterparts. A Java program then applies the templates in the map file to an arbitrary LaTeX equation and produces MathML as output.

The ORCCA group has also created a converter that uses the same methodology for the reverse transformation. This too uses a map file, but this time for translating presentation MathML into LaTeX. An online demo of this converter is available at the following URL: http://www.orcca.on.ca/MathML/texmml/mmltotex.html. You can type arbitrary presentation MathML into a text area and then click a button to view the LaTeX output (Figure 10.9) in the same Web page. The demo page also gives you the option of uploading a MathML file from your computer and receiving the LaTeX output in a separate file.

Figure 10.9: Converting presentation MathML into LaTeX using the online converter at ORCCA.
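
To make the target of these conversions concrete, the following LaTeX document shows hand-written equivalents for a few small presentation MathML expressions. These are illustrative translations only. They are not the recorded output of the XSLT MathML Library or of the ORCCA converter, whose exact bracing and spacing may well differ.

```latex
\documentclass{article}
\begin{document}

% <mrow><msup><mi>a</mi><mn>1</mn></msup><mo>+</mo>
%       <msub><mi>b</mi><mn>2</mn></msub></mrow>
\[ a^{1} + b_{2} \]

% <mfrac> with numerator z+1 and denominator z-2
\[ \frac{z + 1}{z - 2} \]

% <mroot> with base x+1 and index 3
\[ \sqrt[3]{x + 1} \]

% <msubsup> with base A, subscript i, superscript j
\[ A_{i}^{j} \]

% <mtable> with two rows of three cells each
\[ \begin{array}{ccc} 1 & 2 & 3 \\ a & b & c \end{array} \]

\end{document}
```

Each MathML layout element corresponds to a single LaTeX construct here, which is why a template-driven approach (XSLT templates or a map file) is a natural fit for this kind of translation.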

xmltex

The ORCCA converter and the XSLT MathML Library, discussed above, are both limited to translating individual MathML expressions into LaTeX. For large-scale document processing, for example as part of a publisher's workflow, it is useful to have a way of processing entire XML documents that contain embedded MathML while still using TeX as a formatting engine. This is clearly a much more challenging task than converting individual formulas. However, the foundation for this type of conversion has been provided by David Carlisle, in the form of a program called xmltex.

xmltex is a parser for XML documents that is written entirely in TeX. You can configure xmltex to trigger specific TeX commands when it encounters a particular type of element, attribute, processing instruction, or entity in the input XML document. xmltex thus serves as a valuable bridge between the worlds of TeX and XML. It allows TeX's powerful typesetting capabilities to be applied not just to TeX documents but to arbitrary XML documents. You can download the xmltex program along with documentation for it at the following URL: http://www.dcarlisle.demon.co.uk/xmltex/manual.html.

xmltex can process XML documents that combine elements from different namespaces; for example, XHTML documents that contain embedded MathML. The xmltex program by itself does not have any knowledge of specific XML formats. All information about a specific XML format must be specified in additional package files (with a .xmt extension). A separate xmt file is required for each XML document type, such as XHTML, DocBook, TEI, or MathML. By including a command of the following form in a catalog file, you can associate the namespace for a specific document type with a particular xmt file:

    \NAMESPACE{URL}{xmt-file}

When xmltex processes an XML document and encounters elements from a particular namespace, it loads the xmt file corresponding to that namespace. For example, the following command specifies that the mathml2.xmt package should be loaded whenever the input XML document contains an element that belongs to the MathML namespace:

    \NAMESPACE{http://www.w3.org/1998/Math/MathML}{mathml2.xmt}

The mathml2.xmt package is included with the standard xmltex distribution. It contains TeX commands for typesetting most of the common presentation MathML elements.

The catalog file, which specifies which xmt file should be associated with a particular namespace, has a .cfg file extension. You can define a specific catalog file for each document. So, for example, to typeset an XML document called test.xml, you would create a catalog file called test.cfg. If a document-specific catalog file is not found, the default configuration file xmltex.cfg is used.

Let's look at a simple example of using xmltex to typeset an XHTML document that contains MathML. We can set up an xhtml.xmt file that defines LaTeX commands that correspond to each XHTML element used in this document. Example 10.3 shows the contents of this file.

Example 10.3: An xmt package that defines LaTeX commands for specific XHTML elements.

start example
    \DeclareNamespace{xhtml}{http://www.w3.org/1999/xhtml}

    \XMLelement{xhtml:html}
      {}
      {\documentclass{article}
       \begin{document}
      }
      {\end{document}}

    \XMLelement{xhtml:head}
      {}
      {}
      {}

    \XMLelement{xhtml:body}
      {}
      {}
      {}

    \XMLelement{xhtml:h1}
      {}
      {\xmlgrab}
      {\title{#1}
       \maketitle}

    \XMLelement{xhtml:p}
      {}
      {\par}
      {\par}
end example

The first line in the xhtml.xmt file specifies the namespace associated with all XHTML elements in the file. Each \XMLelement{name} defines LaTeX commands to be used when an element called name is encountered. The file contains LaTeX commands that correspond to the XHTML elements html, head, body, h1, and p. Example 10.4 shows an XHTML+MathML document called test.xml, which uses only these XHTML elements. Figure 10.10 shows how this file looks when displayed by Mozilla.

Example 10.4: An XHTML+MathML document called test.xml.

start example
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head></head>
    <body>
    <h1>Using TeX to Typeset MathML</h1>

    <h2>Subscript and Superscript</h2>
    <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <msup><mi>a</mi><mn>1</mn></msup>
        <mo>+</mo>
        <msub><mi>b</mi><mn>2</mn></msub>
      </mrow>
    </math>
    </p>

    <h2>Fraction</h2>
    <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mfrac>
        <mrow><mi>z</mi><mo>+</mo><mn>1</mn></mrow>
        <mrow><mi>z</mi><mo>-</mo><mn>2</mn></mrow>
      </mfrac>
    </math>
    </p>

    <h2>Radical</h2>
    <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mroot>
        <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
        <mn>3</mn>
      </mroot>
    </math>
    </p>

    <h2>Subscript-superscript pair</h2>
    <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <msubsup>
        <mi>A</mi>
        <mi>i</mi>
        <mi>j</mi>
      </msubsup>
    </math>
    </p>

    <h2>Table</h2>
    <p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mtable>
        <mtr>
          <mtd><mn>1</mn></mtd><mtd><mn>2</mn></mtd><mtd><mn>3</mn></mtd>
        </mtr>
        <mtr>
          <mtd><mi>a</mi></mtd><mtd><mi>b</mi></mtd><mtd><mi>c</mi></mtd>
        </mtr>
      </mtable>
    </math>
    </p>

    </body>
    </html>
end example

Figure 10.10: The file test.xml viewed in Mozilla.
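
The h2 subheadings in Example 10.4 could be handled by a declaration modeled on the h1 entry of Example 10.3. The following is a hypothetical addition to xhtml.xmt, not part of the original file. The choice of \section* as the LaTeX counterpart of h2 is an assumption made for illustration.

```latex
% Hypothetical addition to xhtml.xmt (not in the original Example 10.3).
% Assumes the xhtml prefix was declared with \DeclareNamespace as in that example.
\XMLelement{xhtml:h2}
  {}                % no attributes are declared for this element
  {\xmlgrab}        % collect the element content so the end code receives it as #1
  {\section*{#1}}   % assumption: typeset each h2 as an unnumbered section heading
```

As with h1, \xmlgrab is used because the heading text must be passed as an argument to a LaTeX command rather than simply typeset in place.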

TeX cannot process XML files directly, only TeX files. Hence, to run xmltex on the XML document shown in Example 10.4, you first need to create a text file, called test.tex, with the following lines in it:

    \def\xmlfile{test.xml}
    \input xmltex.tex

For the xhtml.xmt file to be automatically loaded whenever the XHTML namespace is encountered, you must create a catalog file called test.cfg that contains the following line:

    \NAMESPACE{http://www.w3.org/1999/xhtml}{xhtml.xmt}
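
Because test.xml mixes XHTML and MathML elements, both namespaces need a mapping. The sketch below shows a catalog that lists the two mappings side by side. It assumes you want them in the same document-specific file; if the default configuration in your installation already maps the MathML namespace to mathml2.xmt, the second line may be unnecessary.

```latex
% test.cfg: a sketch of a document-specific catalog for test.xml.
% XHTML elements are handled by the xhtml.xmt file of Example 10.3.
\NAMESPACE{http://www.w3.org/1999/xhtml}{xhtml.xmt}
% MathML elements are handled by the mathml2.xmt package shipped with xmltex.
% Assumption: this mapping is not already provided by the default xmltex.cfg.
\NAMESPACE{http://www.w3.org/1998/Math/MathML}{mathml2.xmt}
```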

You can then run the following command in your LaTeX installation to parse the XML file using xmltex:

    latex test.tex 

The result is a DVI file (Figure 10.11) called test.dvi that contains the typeset output produced by TeX. Alternatively, you can run the pdflatex command to generate a PDF file of the typeset output, as shown here:

    pdflatex test.tex 

Figure 10.11: The DVI file produced by typesetting test.xml using xmltex.

By suitably defining xmt package files for specific document types, you can typeset any XML document using TeX's formatting capabilities. A good example of this approach is the PassiveTeX project of Sebastian Rahtz. He has created a fotex.xmt package and a style file that together provide a fairly complete implementation of the XSL-FO format. As discussed in Section 8.4, XSL-FO is a W3C standard for specifying the detailed layout and formatting of XML documents. You can use an XSLT stylesheet to transform a document in an arbitrary XML format, such as XHTML or DocBook, into XSL-FO. Once an XSL-FO document is obtained, you can use Rahtz's package to typeset the document in TeX and directly create PDF files from the typeset output.

You can find more information about PassiveTeX at the following URL: http://www.tei-c.org.uk/Software/passivetex/. This site provides sample input files and XSLT stylesheets for converting XML documents in TEI format into XSL-FO and then processing them with TeX to get PDF files as output. The site also provides an example of typesetting a fairly complex XML document that contains MathML.

The PassiveTeX project is a good prototype for how TeX can be used for typesetting arbitrary XML documents. The same approach can be applied to any other XML format, including XHTML+MathML. Of course, the task of writing a macro package that will translate all elements of a given XML format into their TeX equivalents can be quite challenging. However, once the initial implementation is done, the process is flexible and robust enough for large-scale adoption as part of a publisher's production workflow. This discussion shows that TeX can continue to play an important role for generating high-quality printed output, using XHTML+MathML documents as a source.



The MathML Handbook
ISBN: 1584502495
Year: 2003
Pages: 127
Authors: Pavi Sandhu
