Hack 48 Process XML Documents with XSL-FO and FOP

   


Use Apache's FOP engine together with XSL-FO to generate PDF output.

Apache's FOP, or Formatting Objects Processor (http://xml.apache.org/fop/), is an open source Java application that reads an XSL-FO (http://www.w3.org/TR/xsl/) tree and renders it, primarily as PDF. Other output formats are also available, including Printer Control Language (PCL), PostScript (PS), Scalable Vector Graphics (SVG), an XML representation of the area tree, on-screen display through Java's Abstract Window Toolkit (AWT), FrameMaker's Maker Interchange Format (MIF), and plain text.

XSL-FO defines formatting objects that help describe blocks, paragraphs, pages, tables, and so on. These formatting objects are aided by a large set of formatting properties that control fonts, text alignment, spacing, etc., many of which match the properties used in CSS (http://www.w3.org/Style/CSS/). XSL-FO's formatting objects and properties provide a framework for creating attractive, printable pages.

XSL-FO is a huge, richly detailed XML vocabulary for formatting documents for presentation. "XSL-FO" is the common name for the W3C's XSL specification, which is nearly 400 pages long. XSL-FO and XSLT (whose finished spec is fewer than 100 pages) were once part of the same specification but were split into two specs in April 1999. XSLT became a W3C Recommendation in November 1999; XSL-FO did not reach Recommendation status until October 2001.

To get you started, we'll go over a few simple examples. The first example, time.fo (Example 3-33), is an XSL-FO document that formats the contents of the elements in time.xml.

Example 3-33. time.fo
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
 <fo:layout-master-set>
  <fo:simple-page-master master-name="Time"
       page-height="11in" page-width="8.5in" margin-top="1in"
       margin-bottom="1in" margin-left="1in"
       margin-right="1in">
   <fo:region-body margin-top=".5in"/>
   <fo:region-before extent="1.5in"/>
   <fo:region-after extent="1.5in"/>
  </fo:simple-page-master>
 </fo:layout-master-set>
 <fo:page-sequence master-reference="Time">
  <fo:flow flow-name="xsl-region-body">

  <!-- Heading -->
  <fo:block font-size="24px" font-family="sans-serif"
    line-height="26px" space-after.optimum="20px"
    text-align="center" font-weight="bold"
    color="#0050B2">Time</fo:block>

  <!-- Blocks for hour/minute/second/atomic status -->
  <fo:block font-size="12px" font-family="sans-serif"
    line-height="16px"
    space-after.optimum="10px" text-align="start">Hour: 11 </fo:block>
  <fo:block font-size="12px" font-family="sans-serif"
    line-height="16px"
    space-after.optimum="10px" text-align="start">Minute: 59</fo:block>
  <fo:block font-size="12px" font-family="sans-serif"
    line-height="16px"
    space-after.optimum="10px" text-align="start">Second: 59</fo:block>
  <fo:block font-size="12px" font-family="sans-serif"
    line-height="16px"
    space-after.optimum="10px" text-align="start">Meridiem: p. m.</fo:block>
  <fo:block font-size="12px" font-family="sans-serif"
    line-height="16px"
    space-after.optimum="10px" text-align="start">Atomic? true</fo:block>

  </fo:flow>
 </fo:page-sequence>
</fo:root>

3.19.1 XSL-FO Basics

The root element of an XSL-FO document is (surprise) root. The namespace name is http://www.w3.org/1999/XSL/Format, and the conventional prefix is fo. The first child of root is the layout-master-set element, where basic page layout is defined. The simple-page-master element holds a few formatting properties such as page-width and page-height, plus some margin settings. (For more complex layouts, such as different first, odd, and even pages, you could add a page-sequence-master that chooses among several simple-page-master elements.) The region elements inside simple-page-master divide the page into areas: region-body holds the main content, while region-before and region-after reserve space at the top and bottom of the page (in the usual left-to-right, top-to-bottom writing mode) for headers and footers. The master-name attribute on simple-page-master is what the master-reference attribute on the page-sequence element points to; that link is how a page sequence picks its page layout.
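To make the master-name/master-reference link concrete, here is a small Java sketch that checks whether every page-sequence in an FO document refers to a page master that is actually declared. It uses only the JDK's namespace-aware DOM parser, not FOP; the class name MasterLinkCheck and the trimmed-down sample document are hypothetical, and the check is an illustration of the relationship rather than something FOP requires you to run.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: not part of FOP, just a DOM walk over an FO document.
final class MasterLinkCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // A cut-down FO document: one page master, one page sequence that uses it.
    static final String SAMPLE_FO =
        "<fo:root xmlns:fo=\"" + FO_NS + "\">"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"Time\" page-height=\"11in\" page-width=\"8.5in\">"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"Time\">"
      + "<fo:flow flow-name=\"xsl-region-body\"><fo:block>Time</fo:block></fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>";

    /** Returns every page-sequence master-reference that matches no declared master-name. */
    static List<String> unresolvedReferences(String foXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(foXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }

        // Collect the master-name of every simple-page-master and page-sequence-master.
        Set<String> declared = new HashSet<>();
        for (String kind : new String[] {"simple-page-master", "page-sequence-master"}) {
            NodeList masters = doc.getElementsByTagNameNS(FO_NS, kind);
            for (int i = 0; i < masters.getLength(); i++) {
                declared.add(((Element) masters.item(i)).getAttribute("master-name"));
            }
        }

        // Report each page-sequence whose master-reference is not among them
        // (getAttribute returns "" when the attribute is missing).
        List<String> missing = new ArrayList<>();
        NodeList sequences = doc.getElementsByTagNameNS(FO_NS, "page-sequence");
        for (int i = 0; i < sequences.getLength(); i++) {
            String ref = ((Element) sequences.item(i)).getAttribute("master-reference");
            if (!declared.contains(ref)) {
                missing.add(ref);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(unresolvedReferences(SAMPLE_FO)); // prints []
    }
}
```

Misspelling the reference (for example, master-reference="Tim") makes the method return [Tim], which is the same mismatch FOP would trip over when it tries to find a layout for the page sequence.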

The page-sequence element contains a flow element, which holds the text that flows onto the pages; its flow-name of xsl-region-body directs that text into the body region. Inside the flow is a series of block elements, each of which has properties for the text it contains (blocks are used for formatting things like headings, paragraphs, and figure captions). The properties specify formatting such as font size, font family, text alignment, and spacing.
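The block structure can be seen by walking the flow and printing each block's properties next to its text. The Java sketch below does this with the JDK's DOM parser; the class name BlockDump and the shortened copy of time.fo's flow are hypothetical, and only two properties (font-size and text-align) are reported to keep the output short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: lists fo:block elements with a couple of their properties.
final class BlockDump {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    // Shortened version of time.fo: one heading block and two data blocks.
    static final String SAMPLE_FO =
        "<fo:root xmlns:fo=\"" + FO_NS + "\">"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"Time\"><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"Time\">"
      + "<fo:flow flow-name=\"xsl-region-body\">"
      + "<fo:block font-size=\"24px\" text-align=\"center\" font-weight=\"bold\">Time</fo:block>"
      + "<fo:block font-size=\"12px\" text-align=\"start\">Hour: 11 </fo:block>"
      + "<fo:block font-size=\"12px\" text-align=\"start\">Minute: 59</fo:block>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>";

    /** Describes each fo:block as "font-size text-align: text", in document order. */
    static List<String> describeBlocks(String foXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(foXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }

        List<String> result = new ArrayList<>();
        NodeList blocks = doc.getElementsByTagNameNS(FO_NS, "block");
        for (int i = 0; i < blocks.getLength(); i++) {
            Element block = (Element) blocks.item(i);
            // Trim and collapse runs of whitespace so the text is easy to compare.
            String text = block.getTextContent().trim().replaceAll("\\s+", " ");
            result.add(block.getAttribute("font-size") + " "
                + block.getAttribute("text-align") + ": " + text);
        }
        return result;
    }

    public static void main(String[] args) {
        describeBlocks(SAMPLE_FO).forEach(System.out::println);
        // prints:
        // 24px center: Time
        // 12px start: Hour: 11
        // 12px start: Minute: 59
    }
}
```

Because getTextContent includes descendant text, nested blocks would also show up inside their parent's line; time.fo has no nesting, so each block appears exactly once.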

3.19.2 Generating a PDF

FOP is pretty easy to use. To generate a PDF from this XSL-FO file, download and install FOP from http://xml.apache.org/fop/download.html. At the time of this writing, FOP is at Version 0.20.5. In the main directory, you'll find a fop.bat file for Windows and a fop.sh file for Unix; you can run FOP with these scripts.

To create a PDF from time.fo, enter this command:

fop time.fo time-fo.pdf

time.fo is the input file and time-fo.pdf is the output file. FOP will let you know of its progress with a report like this:

[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] FOP 0.20.5
[INFO] Using org.apache.xerces.parsers.SAXParser as SAX2 Parser
[INFO] building formatting object tree
[INFO] setting up fonts
[INFO] [1]
[INFO] Parsing of document complete, stopping renderer

Figure 3-26 shows the result of formatting time.fo with FOP, displayed in Adobe Reader.

Figure 3-26. time-fo.pdf in Adobe Reader 6


You can also incorporate XSL-FO markup into an XSLT stylesheet, then transform and format a document with just one FOP command. Example 3-34 shows a stylesheet (time-fo.xsl) that incorporates XSL-FO.

Example 3-34. time-fo.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="xml" encoding="utf-8" indent="yes"/>

<xsl:template match="/">
 <fo:root>
  <fo:layout-master-set>
   <fo:simple-page-master master-name="Time"
        page-height="11in" page-width="8.5in" margin-top="1in"
        margin-bottom="1in" margin-left="1in"
        margin-right="1in">
    <fo:region-body margin-top=".5in"/>
    <fo:region-before extent="1.5in"/>
    <fo:region-after extent="1.5in"/>
   </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="Time">
   <fo:flow flow-name="xsl-region-body">
    <xsl:apply-templates select="time"/>
   </fo:flow>
  </fo:page-sequence>
 </fo:root>
</xsl:template>

<xsl:template match="time">
 <!-- Heading -->
 <fo:block font-size="24px" font-family="sans-serif"
      line-height="26px" space-after.optimum="20px"
      text-align="center" font-weight="bold" color="#0050B2">
     Time
 </fo:block>

 <!-- Blocks for hour/minute/second/atomic status -->
 <fo:block font-size="12px" font-family="sans-serif"
      line-height="16px"
      space-after.optimum="10px" text-align="start">
     Hour: <xsl:value-of select="hour"/>
 </fo:block>
 <fo:block font-size="12px" font-family="sans-serif"
      line-height="16px"
      space-after.optimum="10px" text-align="start">
     Minute: <xsl:value-of select="minute"/>
 </fo:block>
 <fo:block font-size="12px" font-family="sans-serif"
      line-height="16px"
      space-after.optimum="10px" text-align="start">
     Second: <xsl:value-of select="second"/>
 </fo:block>
 <fo:block font-size="12px" font-family="sans-serif"
      line-height="16px"
      space-after.optimum="10px" text-align="start">
     Meridiem: <xsl:value-of select="meridiem"/>
 </fo:block>
 <fo:block font-size="12px" font-family="sans-serif"
      line-height="16px"
      space-after.optimum="10px" text-align="start">
     Atomic? <xsl:value-of select="atomic/@signal"/>
 </fo:block>
</xsl:template>

</xsl:stylesheet>

The same XSL-FO markup that you saw in time.fo is interspersed with templates and instructions that transform time.xml. Now, with this command, you can generate a PDF like the one that you generated with time.fo:

fop -xsl time-fo.xsl -xml time.xml -pdf time-fo.pdf
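With -xsl and -xml, FOP first applies the stylesheet to the XML document and then formats the resulting FO tree. When debugging a stylesheet, it can help to look at that intermediate FO yourself. The Java sketch below runs the same kind of transformation with the JDK's built-in javax.xml.transform API. Two assumptions: time.xml is not shown in this hack, so TIME_XML is a hypothetical document whose structure is inferred from the XPath expressions in time-fo.xsl, and STYLESHEET is a cut-down version of Example 3-34 with fewer blocks and properties.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: the XSLT half of "fop -xsl ... -xml ...", using only the JDK.
final class TimeToFo {
    // Assumed shape of time.xml, inferred from the stylesheet's select expressions.
    static final String TIME_XML =
        "<time>"
      + "<hour>11</hour><minute>59</minute><second>59</second>"
      + "<meridiem>p. m.</meridiem><atomic signal=\"true\"/>"
      + "</time>";

    // Cut-down time-fo.xsl: same page setup, only three blocks, minimal properties.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">"
      + "<xsl:template match=\"/\">"
      + "<fo:root><fo:layout-master-set>"
      + "<fo:simple-page-master master-name=\"Time\"><fo:region-body/></fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference=\"Time\">"
      + "<fo:flow flow-name=\"xsl-region-body\"><xsl:apply-templates select=\"time\"/></fo:flow>"
      + "</fo:page-sequence></fo:root>"
      + "</xsl:template>"
      + "<xsl:template match=\"time\">"
      + "<fo:block font-size=\"24px\">Time</fo:block>"
      + "<fo:block font-size=\"12px\">Hour: <xsl:value-of select=\"hour\"/></fo:block>"
      + "<fo:block font-size=\"12px\">Atomic? <xsl:value-of select=\"atomic/@signal\"/></fo:block>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the XSLT stylesheet to the XML document and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT step failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints an XSL-FO document containing blocks such as "Hour: 11".
        System.out.println(transform(TIME_XML, STYLESHEET));
    }
}
```

The printed string is an ordinary XSL-FO document, much like time.fo; saved to a file, it could be handed to FOP with the first command shown in this hack.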

3.19.3 See Also

  • XSL-FO, by Dave Pawson (O'Reilly)



XML Hacks
XML Hacks: 100 Industrial-Strength Tips and Tools
ISBN: 0596007116
EAN: 2147483647
Year: 2006
Pages: 156
