10.4 Converting Complete Documents


The tools discussed so far make it easy to convert individual formulas written in TeX or LaTeX syntax into MathML. You can use any of these tools to convert an entire TeX paper into an HTML document that contains MathML equations. However, the conversion involves several steps, each of which must be handled differently. First, you must convert the text in the TeX document into HTML using one of the other tools available for this purpose, such as LaTeX2HTML. Then, you must translate each equation into MathML individually and paste it into the appropriate place in the HTML document. Following all these steps can be laborious and time-consuming, especially for large and complex documents. A much simpler and faster option is to use a tool that can translate a complete TeX or LaTeX document into HTML and MathML in one go. The two most prominent tools of this type are TeX4ht and TtM.

We specifically omit discussion of LaTeX2HTML, a popular program for converting LaTeX documents to HTML. This is because there is currently no ready-made solution for customizing LaTeX2HTML so that it converts mathematical formulas into MathML instead of images. A trial project for doing this was initiated by Ross Moore in 1998 (see http://www.geom.umn.edu/~ross/webtex/webtex for details). He developed a prototype method for generating MathML output from LaTeX2HTML by using Perl subroutines. However, this project is no longer under active development. Hence, most users who want to convert LaTeX documents into HTML+MathML will find it much easier to use TeX4ht or TtM instead.

TeX4ht

TeX4ht, developed by Eitan Gurari of Ohio State University, is a powerful and versatile system for converting TeX documents into HTML or XML formats. In its default mode, TeX4ht converts a TeX or LaTeX document into HTML with all mathematical formulas saved as images. However, TeX4ht can be readily configured to produce other document types such as XHTML, DocBook, or Text Encoding Initiative (TEI). It can also convert formulas present in the original document into MathML instead of images.

The TeX4ht system has two main components: a set of style files and a postprocessor. The process of converting a TeX document takes place in two stages: first, TeX processes the original document using the style files provided by TeX4ht. The result is a DVI file that contains "hooks" or special instructions meant for the TeX4ht postprocessor. In the next stage, the postprocessor acts on the DVI file and interprets the hooks in the file to produce the final output.

Since TeX itself handles the conversion from the original TeX document to the DVI file, TeX4ht has access to the full power of TeX for typesetting the document. In particular, TeX4ht can use TeX's capabilities for handling fonts, macros, variables, and so on to control the form of the output. TeX4ht can also handle most user-defined macros that occur in a LaTeX document.

TeX4ht has a number of nice features. You can place separate sections of the LaTeX document on separate Web pages, with appropriate hyperlinks connecting them. You can also create HTML versions of tables of contents, bibliographies, and so on. The LaTeX Web Companion by Michel Goossens and Sebastian Rahtz (see Appendix B for details) provides detailed instructions on customizing the output of TeX4ht.

For converting TeX documents into XHTML+MathML documents, you do not have to customize TeX4ht yourself, since most of the work has already been done. Paul Gartside of the University of Pittsburgh has created a modified form of TeX4ht called TeX4moz. This contains some additional scripts and configuration files that customize the output of TeX4ht to produce XHTML+MathML files that can be displayed by Mozilla. More information on TeX4moz is given on Gartside's Mathzilla Web site: http://pear.math.pitt.edu/mathzilla.

Installing TeX4ht

Since TeX4ht uses TeX to handle the first stage of processing the source document, you must have a working installation of TeX already present on your system. If you do not already have TeX installed, you can download all the relevant files from the TeX User's Group Web site: http://www.tug.org. This site also contains a wealth of information on all aspects of TeX, ranging from tutorials for beginners to specialized information for more advanced users.

On Windows

To install TeX4moz on Windows, follow these steps:

  1. Download TeX4moz. You can get the Windows version in zipped form at the following URL: http://pear.math.pitt.edu/mathzilla/tex4mozDownload.html.

  2. Create a directory called c:\tex4ht and unzip all the files into this directory.

  3. Modify the files tex4ht.env and moz4ht.env by editing the lines starting with tc:\path\tfm! so that they specify the directory in which the TeX font metric (tfm) files are located on your machine. For example, if you are using MiKTeX, which keeps its tfm files in c:\texmf\fonts\tfm, change that line in each .env file to tc:\texmf\fonts\tfm\!. The ! at the end of the line indicates that TeX4ht should search all subdirectories of the specified path for the font metric files.

  4. Rename the htlatex.tab, httex.tab, mztex.tab, and mzlatex.tab files to change the file extension from .tab to .bat.

  5. Add the c:\tex4ht directory to your path. To do this on Windows 2000/XP, open the System control panel, click the Advanced tab, click the Environment Variables button, select Path in the list of system variables, and click Edit. In the dialog that comes up, add c:\tex4ht as one of the values of the Path variable, and then click OK.

  6. Move tex4ht.sty and all the .4ht files to the c:\tex4ht directory. Alternatively, you can modify the environment variable TEXINPUTS to point to c:\tex4ht, using the same procedure outlined in Step 5.
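
Step 3 is the easiest one to get wrong by hand, so it may help to see it as a small script. The following Python sketch is only an illustration and not part of TeX4moz: the function name set_tfm_directory is hypothetical, and it assumes that the line to be changed begins with exactly the placeholder text tc:\path\tfm quoted in Step 3. Check your own .env files before relying on it.

```python
# Placeholder text that Step 3 says to look for in tex4ht.env and moz4ht.env.
PLACEHOLDER_PREFIX = r"tc:\path\tfm"


def set_tfm_directory(env_text, tfm_dir):
    """Return env_text with each placeholder line pointing at tfm_dir.

    The new line has the form t<directory>\\! where the trailing "!"
    asks TeX4ht to search all subdirectories as well.
    """
    new_line = "t" + tfm_dir.rstrip("\\") + "\\!"
    updated = []
    for line in env_text.splitlines():
        if line.startswith(PLACEHOLDER_PREFIX):
            updated.append(new_line)   # replace the placeholder line
        else:
            updated.append(line)       # leave every other line untouched
    return "\n".join(updated)
```

You would read each .env file, pass its contents through this function together with your own tfm directory (for example, c:\texmf\fonts\tfm), and write the result back. Keep a backup copy of the original files.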

On Unix

There are two ways of installing TeX4ht on Unix. You can install it in your local directory, in which case it is available for use only by you. Alternatively, if you have root access, you can do a root installation, in which case the program will be available to all users who have access to that machine. The installation on Unix requires the following steps:

  1. Download the archive that contains the package files.

  2. Untar and decompress the archive.

  3. Run the installer.

For a local installation, one additional step is required: you need to modify the environment variables PATH and TEXINPUTS so that they include the directory in which the TeX4moz files are installed. You can change the value of these variables by editing your shell's startup file (for example, .profile or .cshrc, depending on the shell you use).
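
If you would rather not change your startup file, the same two variables can be set for a single run from a script. The following Python sketch shows one way this might be done; the function name tex4ht_environment and the directory used in the usage note are hypothetical. It also assumes a Kpathsea-based TeX distribution, in which an empty entry in TEXINPUTS stands for the default search path, so that the standard TeX input directories remain visible.

```python
import os


def tex4ht_environment(install_dir, base_env=None):
    """Return a copy of the environment with install_dir added to the
    front of PATH and TEXINPUTS."""
    env = dict(os.environ if base_env is None else base_env)
    # Put the TeX4moz scripts first on the executable search path.
    env["PATH"] = install_dir + os.pathsep + env.get("PATH", "")
    # If TEXINPUTS was unset, this leaves a trailing separator, which
    # Kpathsea-based TeX systems read as "then the default path".
    env["TEXINPUTS"] = install_dir + os.pathsep + env.get("TEXINPUTS", "")
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when calling mzlatex, for example with tex4ht_environment("/home/me/tex4moz").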

Unlike on Windows, there is no need to change the file extensions of any files. You can get the zipped files as well as the detailed installation instructions at the following URL: http://pear.math.pitt.edu/mathzilla/tex4mozDownload.html.

Running TeX4ht

To process a document using TeX4ht, you run a command of the following form:

    mzlatex filename 

The output file is optimized for viewing in Mozilla. It is an XHTML file that contains a DOCTYPE declaration referencing the XHTML DTD and has a .xml file extension. As it stands, this file cannot be rendered in IE. However, you can easily modify the file so that it is viewable in IE using either MathPlayer or IBM techexplorer: just add a statement that references the Universal MathML stylesheet, as explained in Section 7.2.
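
The stylesheet reference can also be added by a short script. The Python sketch below is an illustration only: the function name add_stylesheet_reference is hypothetical, and the processing instruction it inserts assumes the W3C-hosted copy of the Universal MathML stylesheet. Section 7.2 remains the authoritative description of the statement to use.

```python
# Assumed form of the stylesheet statement (W3C-hosted copy).
STYLESHEET_PI = (
    '<?xml-stylesheet type="text/xsl" '
    'href="http://www.w3.org/Math/XSL/mathml.xsl"?>'
)


def add_stylesheet_reference(xml_text, pi=STYLESHEET_PI):
    """Insert the stylesheet processing instruction into xml_text.

    The instruction goes directly after the XML declaration if there is
    one, otherwise at the very top. A document that already references
    a stylesheet is returned unchanged.
    """
    if "<?xml-stylesheet" in xml_text:
        return xml_text
    if xml_text.startswith("<?xml"):
        end = xml_text.index("?>") + 2      # end of the XML declaration
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text
```

Applied to the contents of the .xml file produced by mzlatex, this returns the text to save back to disk.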

Let us look at an example of using TeX4ht to translate a TeX document into XHTML+MathML. Example 10.2 shows a LaTeX document that contains some mathematical formulas.

Example 10.2: A LaTeX document called article.tex that contains inline and display equations.

    \documentclass{article}
    \begin{document}
    \title{Electronic Structure of a Two-Dimensional Metal}
    \maketitle
    The effect of the magnetic field can be included in the
    electronic structure calculation by using the Peierls
    substitution $$ t_l\rightarrow t_l e^{i{2\pi \over
    \phi_o}\int_{i,j}^{i',j'} {\bf A} \cdot d{\bf l}}$$
    where $\phi_o=hc/e$ and $\bf A$ are the flux quantum and
    the vector potential, respectively.
    For simplicity, we choose the Landau gauge ${\bf A}=-
    B(y,0,0)$. By following a standard procedure, we rewrite the
    Hamiltonian as a function of magnetic field in {\bf k}-
    space. It is straightforward to compute the thermodynamic
    quantities from the field $$ \Omega = -{2 \over \beta}
    \sum_{i=1}^{4 \tilde q} \sum_{\bf k} {\rm ln} [1+e^{-\beta
    (E_i({\bf k})-\mu)}]$$ where $\beta$, $E_i({\bf k})$ and
    $\mu$ denote the inverse temperature, the dispersion
    relation of the $i$-th magnetic subband and the chemical
    potential, respectively.
    The field dependence of the chemical potential is
    calculated by inverting the constraint equation for
    occupation $$ N = 2 \sum_{i=1}^{4\tilde q} \sum_{\bf k} {1
    \over e^{\beta (E_i({\bf k})-\mu)}+1} $$ where the factor 2
    comes from the spin degeneracy. Because there are six
    electrons per unit cell distributed among four bands at zero
    magnetic field, the total occupancy factor $N/N_{max}$ is
    3/4 where $N_{max} = 2\sum_{\bf k}\sum_{i=1}^{4\tilde q}1$.
    Once the chemical potential and thermodynamic potential are
    calculated as a function of fields, it is straightforward to
    compute the magnetization $M=-dF/dB$ from the free energy
    $F=\Omega+\mu N$.
    \end{document}

To process this document using TeX4moz, run the following command:

    mzlatex article.tex 

Several auxiliary files are created in the same directory as the input file, and a large number of messages are displayed on the screen, just as when you process the document using TeX. This is, of course, because TeX4ht itself calls TeX. Once the TeX processing is over, the final output document, called article.xml, is created. This is an XHTML+MathML document that contains the appropriate DOCTYPE declarations needed so it can be viewed in Mozilla. Figure 10.4 shows how article.xml looks when viewed in Mozilla.

Figure 10.4: Converting the LaTeX document article.tex into XHTML+MathML using TeX4ht.

Compare this with the output produced by processing the same input document using LaTeX. This is shown in Figure 10.5. You can see that the quality of the rendering produced by Mozilla is comparable to that of the TeX output.

Figure 10.5: The DVI file produced by processing article.tex using TeX.

TtM

TtM is a commercial program for converting TeX or LaTeX documents into HTML+MathML documents. It was developed and is maintained by Ian Hutchinson of MIT. TtM is available for Linux and Windows. The Linux version is available for free, while the Windows version sells for $40.

TtM is a modified version of another program, called TtH, which converts LaTeX documents into HTML. The difference between the two is that TtH renders formulas in the original LaTeX document using ordinary HTML markup rather than images, while TtM converts the formulas into MathML. Both TtM and TtH use the Symbol font, available to most browsers, to represent special characters and symbols.

TtM supports many of the complex features of LaTeX, including macros, tables, and bibliographies. Some special types of TeX input that do not have a clear counterpart in HTML are not translated. TtM will generate a warning or error message if it encounters any TeX or LaTeX constructs it does not recognize. These messages are directed to stderr, which typically means they are displayed on the terminal. However, on Unix systems, these messages can also be redirected to a file.
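
When TtM is called from a script rather than typed at a terminal, the same separation of output and messages can be kept by sending standard error to a log file. The Python sketch below is one possible way to do this. The function name convert_file and its parameters are hypothetical, and the command defaults to a bare ttm, which is assumed to be on your path.

```python
import subprocess


def convert_file(tex_path, html_path, log_path, command=("ttm",)):
    """Run the converter with input, output, and messages tied to files.

    This mirrors the shell form: ttm < input > output 2> log
    The converter's exit status is returned to the caller.
    """
    with open(tex_path) as source, \
            open(html_path, "w") as output, \
            open(log_path, "w") as log:
        result = subprocess.run(
            list(command),
            stdin=source,     # the TeX or LaTeX document
            stdout=output,    # the translated document
            stderr=log,       # warnings and error messages
            check=False,
        )
    return result.returncode
```

After the call, an empty log file suggests that TtM recognized everything in the input. Otherwise the log lists the constructs it could not translate.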

Installing TtM

You can get the source for TtM as a zipped archive from the following URL: http://hutchinson.belmont.ma.us/tth/mml/. You must extract the files from this archive and then run the installer program to get the executable file. You can then place the executable file in any directory located on your path. Detailed installation instructions are provided with the product.

Running TtM

To convert a given TeX or LaTeX document into HTML+MathML, you run TtM with the name of the input file:

    ttm test.tex

By default, the output is written to a file with the same name as the input file but with the extension .html. Hence, the above command would produce an output file called test.html. Alternatively, TtM can read the document from standard input and write the result to standard output, so you can explicitly specify a different name for the output file using the redirection operators < and >, as shown here:

    ttm < test.tex > output.html

TtM provides various "switches" or command-line options to customize the output. Some of the important options are:

  • -w*: determines the style of HTML that is produced. If the option is -w0, a title element is not added to the output. If the option is specified as -w1, head and body tags are inserted into the output.

  • -c: prefixes the output with the header "Content-type: text/HTML", which specifies the MIME type of the file.

  • -e*: determines how PostScript figures in the original LaTeX document are handled. If the option is specified as -e1, all figures in the input document are converted to GIF images using the ps2gif utility, which must be present on the user's machine. If the option is specified as -e2, images are converted to inline GIF images. If the option is specified as -e0, the conversion to GIF does not take place and the figures are omitted from the output document.

  • -v: causes warnings and error messages to be produced in a more verbose format, which is useful for debugging.

  • -L filename: identifies the input document as LaTeX and specifies the name of auxiliary files for generating tables of contents and bibliographies.
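
If you call TtM from a script, it can be convenient to assemble these switches programmatically. The Python sketch below is an illustration only: the function name ttm_command and its parameter names are hypothetical, and it covers just the options listed above. The -L switch is emitted only in its bare form, because the exact spelling of the form that takes a file name should be checked against the TtM manual.

```python
def ttm_command(html_style=None, content_type=False, figure_mode=None,
                verbose=False, latex=False):
    """Build an argument list for TtM from the switches described above."""
    args = ["ttm"]
    if html_style is not None:
        args.append(f"-w{html_style}")    # -w0, -w1: style of HTML produced
    if content_type:
        args.append("-c")                 # prefix a Content-type header
    if figure_mode is not None:
        args.append(f"-e{figure_mode}")   # -e0, -e1, -e2: figure handling
    if verbose:
        args.append("-v")                 # more detailed messages
    if latex:
        args.append("-L")                 # treat the input as LaTeX
    return args
```

For example, ttm_command(html_style=1, verbose=True) returns the list ["ttm", "-w1", "-v"], which can be passed directly to subprocess.run.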

Unlike TeX4ht and other tools for converting TeX to HTML, such as LaTeX2HTML, TtM does not require a TeX or LaTeX installation to be present on the user's machine. This makes TtM much more self-contained and portable. It is, however, still advisable to run TeX or LaTeX on the input file before the TtM translation so that you can ensure the file is free of syntax errors. If you do not have access to a TeX installation, you can debug the input document using the error messages produced by TtM alone. However, this can be difficult for users who do not have much experience with TeX.

A LaTeX installation is necessary if you are translating a LaTeX document that includes tables of contents or bibliographies. If you want to generate content of this type in LaTeX, the source file has to be processed multiple times. In the first pass, the TeX program generates forward references that are stored in auxiliary files. The information in the auxiliary files is then read by TeX in subsequent passes to generate the final table of contents or bibliography.

TtM can use the auxiliary files generated by LaTeX to generate hypertext versions of a table of contents and bibliography present in the input document. TtM looks for the auxiliary files in the same directory as the input file. By default, TtM expects each auxiliary file to have the same name as the input file but with a different extension. However, you can specify a different name for the auxiliary files by using the command-line option -L filename. You can also use the -L option without a filename to instruct TtM that the input file is in LaTeX format. This enables TtM to interpret LaTeX constructs in the input document even if the document lacks a \documentclass line to identify itself as a LaTeX document.

Figure 10.6 shows, as viewed in Amaya, the HTML+MathML document obtained by processing the document of Example 10.2 using TtM. Compare this with the output produced by TeX4ht and LaTeX from the same document, shown in Figures 10.4 and 10.5.

Figure 10.6: The HTML+MathML file produced by translating article.tex using TtM. The resulting file is viewed here in Amaya.

TtM is extremely fast and efficient. Conversion of even large TeX files takes less than a second. As mentioned earlier, TtM is self-contained and does not call TeX or LaTeX when processing a document. Because it is fast and self-contained, you can run TtM on a server via a CGI script to do real-time conversion of LaTeX documents over the Web. You can see an example of such conversion at the following URL: http://hutchinson.belmont.ma.us/tth/mml/ttmform.html (Figure 10.7). This page contains a text area for entering arbitrary LaTeX input. You can then click a button to submit the input to the server, where it is processed using TtM. The output produced by TtM is then displayed on the same Web page. You can use this page to experiment with different types of TeX and LaTeX input and see what output TtM generates.

Figure 10.7: A conversion Web page for translating LaTeX documents using TtM.
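
The core of such a server-side script is small. The following Python sketch is not the code behind the page shown in Figure 10.7, but a minimal illustration of the idea under stated assumptions: the function name cgi_response is hypothetical, ttm is assumed to be on the server's path, and the step of extracting the submitted text from the Web form is left out.

```python
import subprocess


def cgi_response(latex_source, command=("ttm", "-L")):
    """Return the text a CGI script would send back for one request.

    latex_source is the text submitted through the form. It is piped to
    the converter on standard input, and whatever the converter writes
    to standard output becomes the body of the response.
    """
    result = subprocess.run(
        list(command),
        input=latex_source,
        capture_output=True,
        text=True,
        check=False,
    )
    # A CGI response is a header block, a blank line, then the body.
    header = "Content-type: text/HTML\r\n\r\n"
    return header + result.stdout
```

The -L switch is passed because a fragment typed into a form will usually lack a \documentclass line. The header written here is the same one that the -c switch is described as producing, so a script could use that switch instead. A real deployment would also need to limit the size of the input and the running time of the converter.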






The MathML Handbook
The MathML Handbook (Charles River Media Internet & Web Design)
ISBN: 1584502495
EAN: 2147483647
Year: 2003
Pages: 127
Authors: Pavi Sandhu
