Chapter 1. Introducing XSL-FO | Definitive XSL-FO

Information that is presented in a browser window on the screen must be presented very differently in printed form. Fixed-size folios or pages necessarily require different navigation methods compared to the dynamic tools we use to navigate through a long selection of information in a browser. In the past, this has required our organizations to maintain two versions of the information: one for use in a variable-width, effectively infinite-length browser window, and the other for the printed page.

Perhaps we authored the information directly in the HyperText Markup Language (HTML) for use on the screen and then took the very same information into either a word processing application or a publishing tool to produce the hardcopy version. Both of these approaches are presentation-oriented: we must capture our information twice, using the different constructs each tool provides to format the appearance for the two reading audiences. Our maintenance effort for keeping the information up to date is doubled.

Consider the need to present training information to two audiences, one online and the other in printed form. The HTML presentation of a snippet of some training material in a web browser could look as in Figure 1-1 where an entire module is rendered as a single web page and the content shown is found somewhere in the middle of the page. Note the use of hyperlinks on the page in three of the bullets, the middle hyperlink having the focus as reflected in the status bar at the bottom of the window. When viewing the screen, the reader does not need to know where a hyperlink is terminated, or even whether it is terminated somewhere on the same page, because the act of interacting with the hyperlink dynamically moves the reader to the target address.

Figure 1-1. Training information in an HTML window

The markup used for the hyperlinks in Example 1-1 captures the presentation of this information using the anchor vocabulary of the web browser (HTML) to express how the information is rendered on the screen.

Note on lines 14 and 15, in the markup for the highlighted hyperlink, how the title of the section referenced by the anchor is part of the anchor itself. The same duplication of information would be necessary in a word processing document. When authoring a complete corpus of training material, the maintenance of all such references can be a nightmare, as any change in a title must be reflected everywhere the title is used.

Think about how this web page would be printed. If you had a color printer, you would recognize the presence of the hyperlink and how that underscored text is different from the argument to the function documented at the top of the window. This fragment is only a small part of a rendering of the module of training material, so the entire module would probably span dozens of pages. It would be very frustrating to know there is a link on your printed page to somewhere in your document, and not be able to quickly traverse it.

Example 1-1 An example of HTML markup

Line 01     <li>e.g. merge-property-values(<u><i>property-name</i></u>)
     02       <ul>
     03         <li>this looks for the property value directly from the
     04 particular sibling object specification corresponding to a given
     05 state of the user interface </li>
     06       </ul>
     07     </li>
     08   </ul>
     09   <li>summarized in <a href="module-B.htm#funcname">Functions
     10 summarized by name</a></li>
     11 </ul>
     12 <p>Numerous property data types</p>
     13 <ul>
     14   <li>summarized in <a href="module-D.htm#datatypes">Property
     15 data types</a></li>
     16   <li>simple-valued and compound-valued values</li>
     17 </ul>
     18 <p>Numerous functions available for expressions</p>
     19 <ul>
     20   <li>summarized in <a href="module-B.htm#funcname">Functions
     21 summarized by name</a></li>
     22   <li>numeric functions can be applied to length values by
     23 reducing the "unit power" and adding it back again after</li>

An Extensible Markup Language (XML) document is fundamentally different from an HTML document or a word processing document in that we can author our information in any vocabulary of element types and attributes that describes the data but not the presentation of that data. For this example, we could choose to create a hyperlink in our training material using a simple empty reference element without any "clickable text" in the actual authored markup, as shown in Example 1-2.

Note that the same hyperlink, on line 15 in XML markup, is an empty ref element. At the time the hyperlink is authored, the need to reference a particular location is all that is captured in the idref attribute, without any indication of how the hyperlink is presented to the user. Through indirection, the displayed text for presenting the hyperlink can be derived at production time from the title child element of the hyperlink's target element on line 32. We get the same presentation, but if any changes are made to any of the titles, all references to these titles will be properly presented. The W3C has developed the Extensible Stylesheet Language Transformations (XSLT) Recommendation to do such rearranging of instances of XML information into instances of other vocabularies.

Example 1-2 An example of XML markup

Line 01 <course>
     02   <title>Practical Formatting Using XSL-FO</title>
     03
     04   <module id="basic">
     05     <title>Basic concepts of XSL-FO</title>
     06     <lesson id="vocab">
     07       <title>Formatting object XML vocabulary</title>
     08       <frame id="propexp">
     09         <title>Property value expressions</title>
     10         ...
     11           <point>summarized in <ref idref="funcname"/></point>
     12         </points>
     13         <para>Numerous property data types</para>
     14         <points>
     15           <point>summarized in <ref idref="datatypes"/></point>
     16           <point>simple-valued and compound-valued values</point>
     17         </points>
     18         <para>Numerous functions available for expressions</para>
     19         <points>
     20           <point>summarized in <ref idref="funcname"/></point>
     21         ...
     22       </frame>
     23     </lesson>
     24   </module>
     25   ...
     26       <frame id="funcname">
     27         <title>Functions summarized by name</title>
     28         ...
     29       </frame>
     30       ...
     31       <frame id="datatypes">
     32         <title>Property data types</title>
     33         ...
     34       </frame>
     35   ...
     36 </course>

While the basic presentation of the material in both screen and paper formats is similar, the navigation tools need to be different when the presentation is designed for the paper medium. The browser environment can be recreated by transforming the XML vocabulary into the HTML vocabulary, creating HTML a (anchor) elements for the hyperlinks. The paper medium needs to support semantics not found in HTML, such as page numbers and page number citations, in order to allow the reader to properly traverse the hyperlinks in a collection of pages.
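As a rough illustration of the screen-oriented transformation, the following XSLT 1.0 sketch turns each empty ref element of Example 1-2 into an HTML anchor whose text is looked up from the target frame's title. The key name frame-by-id is hypothetical, and the sketch assumes the whole course is written to a single HTML page so that a bare fragment identifier is a sufficient link target; the stylesheet actually used for the training material is not shown in this chapter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every frame by its id attribute so that a ref can find its
       target without relying on a DTD that declares id as type ID.
       The key name "frame-by-id" is a hypothetical choice. -->
  <xsl:key name="frame-by-id" match="frame" use="@id"/>

  <!-- Assumption: single-page HTML output, so "#id" is enough for href.
       The clickable text is derived from the target frame's title. -->
  <xsl:template match="ref">
    <a href="#{@idref}">
      <xsl:value-of select="key('frame-by-id', @idref)/title"/>
    </a>
  </xsl:template>

</xsl:stylesheet>
```

Because the link text is computed at transformation time, changing the title on line 32 of Example 1-2 changes every generated anchor that points to that frame, with no edits to the ref elements themselves.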

The Extensible Stylesheet Language Formatting Objects (XSL-FO) that is described in this book defines an XML vocabulary representing such pagination semantics. This is a powerful vocabulary for producing high-quality printable output as a collection of fixed-sized pages. The page layout shown in Figure 1-2 is produced using the XSL-FO vocabulary for the presentation. The frame is presented on its own page and it contains references to frame elements elsewhere in the publication.
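To give a feel for what such pagination semantics look like, here is a minimal XSL-FO instance sketched under several assumptions: the page dimensions, margins, the master name frame-page, and the id last-page are all illustrative choices, and the two frame titles are borrowed from Example 1-2. It is not the instance that produced the figure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <fo:layout-master-set>
    <!-- Fixed page geometry; the dimensions here are assumptions. -->
    <fo:simple-page-master master-name="frame-page"
                           page-width="8.5in" page-height="11in"
                           margin-top="1in" margin-bottom="1in"
                           margin-left="1in" margin-right="1in">
      <!-- Leave room at the bottom of the body for the footer region. -->
      <fo:region-body margin-bottom="0.5in"/>
      <fo:region-after extent="0.3in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="frame-page">

    <!-- Footer repeated on every page: current page and total pages. -->
    <fo:static-content flow-name="xsl-region-after">
      <fo:block text-align="end">
        Page <fo:page-number/> of
        <fo:page-number-citation ref-id="last-page"/>
      </fo:block>
    </fo:static-content>

    <fo:flow flow-name="xsl-region-body">
      <fo:block id="propexp" font-weight="bold">Property value expressions</fo:block>
      <fo:block>Numerous property data types</fo:block>

      <!-- The next frame is forced onto its own page. -->
      <fo:block id="datatypes" break-before="page"
                font-weight="bold">Property data types</fo:block>

      <!-- Empty block at the end of the flow; citing its page number
           yields the total page count for the footer. -->
      <fo:block id="last-page"/>
    </fo:flow>

  </fo:page-sequence>
</fo:root>
```

The total page count is not a built-in object in this sketch; it is obtained by citing the page on which the final, empty block lands, which is a common idiom rather than the only possible approach.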

Figure 1-2. Training information in a printable page

Just as XSLT is used to produce HTML from the XML of the training material, XSLT can be used in this example to transform the XML instance into an instance of the XSL-FO vocabulary. An XSL-FO formatting tool interprets the instance of XSL-FO to render page images. Note, below the middle of the page, how the hyperlink is presented as both the title of the referenced frame and the page number on which that frame is found. Note also that the current page number and the total page count are shown on the right of the page footer. The reader is now equipped to traverse the hyperlink in a way not possible when simply printing the HTML.
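The print-oriented handling of a hyperlink described above might be sketched in XSLT 1.0 as follows. This is only a fragment of a stylesheet: it does not generate the enclosing fo:root, page masters, or flow. The key name frame-by-id and the "(page N)" wording are hypothetical, and starting each frame on a new page is an assumption based on the description of the printed layout.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical key for finding a frame from a ref's idref value. -->
  <xsl:key name="frame-by-id" match="frame" use="@id"/>

  <!-- Each frame begins a new page and carries its id into the output
       so that links and page number citations can locate it. -->
  <xsl:template match="frame">
    <fo:block id="{@id}" break-before="page">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- A ref becomes the target's title followed by the number of the
       page on which the target is placed by the formatter. -->
  <xsl:template match="ref">
    <fo:basic-link internal-destination="{@idref}">
      <xsl:value-of select="key('frame-by-id', @idref)/title"/>
    </fo:basic-link>
    <xsl:text> (page </xsl:text>
    <fo:page-number-citation ref-id="{@idref}"/>
    <xsl:text>)</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet never knows the page number itself. It only emits a citation of the target's id, and the formatter fills in the number once it has laid out the pages.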

For producers of XML-based web services, XSL-FO is a way to meet the needs of users who are unsatisfied with, or unwilling to accept, screen renderings and difficult-to-use printed browser pages. XSL-FO makes it possible to produce lengthy information in paginated form on demand.

For web designers, XSL-FO allows printable versions of web pages to be made available to site visitors as downloadable print files, generated from the same source of information from which the HTML pages are generated. Some web sites even mimic print-like multiple pages to create more ad views per document, but such elaboration often confuses the printed output from the browser, thus necessitating making a paginated version available.

Just as we learned the HTML vocabulary to be able to control the presentation of our information in a web browser, we will learn the XSL-FO vocabulary to be able to control the layout and presentation of our information in a printable form. We will learn the new semantics, such as page number citations, and the ways to represent them in XSL-FO for the formatter to give us the results we need.