Unlike CSS, which simply applies styles directly to elements in a single pass through the document, XSL gives you the opportunity to do major reorganization of your document. This capability comes at the cost of simplicity. To ease the burden on developers, the designers of XSL have split the process into two parts: transformation and formatting.
The transformation alters the structure of the input document and adds presentational information in a hybrid format composed of formatting objects. A formatting object is a container for content that associates styles and rendering instructions with the content. It is compact and easy for a person to understand. The formatting objects are arranged in a tree that retains the structure used in building the final presentation.
The result of this transformation is a temporary file that you feed to an XSL-FO formatter. Through a complex series of steps, the formatter calculates the final geometry and appearance of the presentation and churns out a file suitable for printing or viewing on a screen. When it's finished, the formatting objects are flushed from memory and you can discard the temporary FO file. It is important to understand that you are not meant to write your own XSL-FO markup. Make all of your stylistic corrections in the XSLT stylesheet and let the tools do the rest.
Inside the formatter, a complex operation in multiple phases takes place, illustrated in Figure 8-1. We start with a result tree (recall from Chapter 7 that a result tree is the product of an XSLT transformation) in the XSL-FO namespace. In phase one of formatting, the formatter translates this document into an object representation in memory in a process called objectification. This structure, a formatting object tree, is structurally similar to the result tree, but with some details changed. For example, all the character data will be replaced with fo:character node objects. Making the tree more verbose like this lets later processing run more efficiently.
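To make objectification concrete, here is a minimal sketch in Python. It assumes a toy result tree, and both the function name objectify and the tuple representation are hypothetical; no real formatter is claimed to store its tree this way. It shows only the one detail mentioned above: character data expanding into one fo:character object per character.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# A toy result tree in the XSL-FO namespace (illustrative only).
RESULT_TREE = (
    f'<fo:block xmlns:fo="{FO_NS}" color="blue">'
    "Hi <fo:inline>you</fo:inline>"
    "</fo:block>"
)


def objectify(elem):
    """Return (name, properties, children) with text turned into fo:character objects."""
    name = elem.tag.replace("{" + FO_NS + "}", "fo:")
    children = []

    def add_chars(text):
        # Each character of text becomes its own fo:character object.
        for ch in text or "":
            children.append(("fo:character", {"character": ch}, []))

    add_chars(elem.text)
    for child in elem:
        children.append(objectify(child))
        add_chars(child.tail)  # text that follows the child element
    return (name, dict(elem.attrib), children)


fo_tree = objectify(ET.fromstring(RESULT_TREE))
```

The resulting tree is larger than the markup it came from, which is the verbosity the text describes: every later phase can treat a character as just another object with properties.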
Figure 8-1. The XSL-FO formatting process
After objectification comes phase two, refinement. The formatter begins to calculate the actual geometry of areas in the presentation, replacing relative values and constraints with concrete numbers. For example, a table cell whose width is specified as a percentage will now show that width as a calculated number of pixels. The product of this phase is a refined formatting object tree.
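The arithmetic behind refinement can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a formatter's actual algorithm: the function refine_width is hypothetical, it handles only three kinds of length, and it works in points rather than pixels to keep the numbers simple.

```python
def refine_width(spec, parent_width_pt=None):
    """Resolve a width specification to an absolute length in points."""
    spec = spec.strip()
    if spec.endswith("%"):
        if parent_width_pt is None:
            raise ValueError("a percentage needs a parent width to resolve against")
        return parent_width_pt * float(spec[:-1]) / 100.0
    if spec.endswith("pt"):
        return float(spec[:-2])
    if spec.endswith("in"):
        return float(spec[:-2]) * 72.0  # 72 points per inch
    raise ValueError(f"unsupported length: {spec!r}")


# A table 6 inches wide, with a cell that takes 25% of it.
table_width = refine_width("6in")
cell_width = refine_width("25%", table_width)
```

The relative value cannot be resolved until the width of the containing object is known, which is why refinement has to walk the tree from the outside in.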
Next comes the formatting phase, in which formatting object properties are translated into a virtual picture of the document called an area tree. This in-memory representation of the output is composed of overlapping and nested canvases called areas. Each area is a region of the document with a set of traits that describe how its content should be rendered. An area can be as large as a sidebar or as small as a single glyph.
Finally, the formatter enters its rendering phase. The area tree translates directly into some output format such as PDF or troff that exactly conforms to the original specifications. At last, you see something emerge from the black box, a tangible product that hopefully looks the way you wanted it to.
The normal user will not need to know how the phases from objectification to rendering actually work. She has put down her constraints in the XSLT stylesheet and the rest is automatic. The actual work that takes place inside the formatter is more of interest to developers who write and maintain such software.
8.1.2 Formatting Objects
Formatting objects come in a variety of types representing what you can do typographically in a document. There are FOs for blocks and inlines, section containers and individual character containers, page and style settings. The complete set of formatting objects is the vocabulary for formatting abstractions possible in XSL.
Each FO represents a semantic specification for a particular part of a document. It has default properties that you can override as necessary. These defaults depend on the context, such as whether the current writing mode is horizontal or vertical. An FO can contain other FOs, creating a structure that propagates inheritance of properties.
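The way nesting propagates properties can be sketched as a lookup that climbs the tree. This is a simplified Python illustration: the class FO and its lookup method are hypothetical, and the sketch treats every property as inheritable, whereas the XSL recommendation marks only some properties as inherited.

```python
class FO:
    """A toy formatting object with a parent link and its own properties."""

    def __init__(self, name, parent=None, props=None):
        self.name = name
        self.parent = parent
        self.props = props or {}

    def lookup(self, prop, default=None):
        """Return the nearest value of prop, searching from this FO up to the root."""
        node = self
        while node is not None:
            if prop in node.props:
                return node.props[prop]
            node = node.parent
        return default


flow = FO("fo:flow", props={"font-size": "10pt", "color": "black"})
block = FO("fo:block", parent=flow, props={"font-size": "12pt"})
inline = FO("fo:inline", parent=block)

inline_size = inline.lookup("font-size")   # overridden on the enclosing block
inline_color = inline.lookup("color")      # inherited from the flow
```

An override on an inner object wins over the value set further out, and anything not set anywhere falls back to the default.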
Some properties have a direct result on the formatting of an FO instance. For example, setting the color will have an immediate and obvious effect on text. Other properties set constraints that may or may not come into play. Widow and orphan settings for paragraphs are only applicable when the paragraph straddles a page break. In some cases, properties may override other properties when a rule of aesthetics would be violated.
An FO creates or helps to create an abstract positional framework called an area. For every area, an FO specifies its position on the page, how to arrange and display its children, and whether any other decoration, such as a border, is required. For example, a single character will be formatted into an area large enough to hold one glyph.
8.1.3 Print, Screen, and Beyond
Being device-agnostic means XSL has to support both scrollable online presentation and the printed page. Fortunately, the online and print worlds have much in common. The basic way of fitting content in flows is the same, and both support the notion of floating objects (sidebars and footnotes), marginalia, and headers and footers. But there is much that is different.
Online representations typically involve one long, uninterrupted column of content, whereas pagination produces discrete regions with breaks in content and vertical margins. Pagination has other complexities, such as page layout types and line spacing for vertical justification.
The area model treats visual document structure in a generic way that is compatible with online and print needs. HTML frames can be derived easily from abstract structures, as can page layouts. It can be as complex as you need it to be.
At the top level of the FO tree is a page layout section which declares page layout master templates. These masters can be combined in sequences in a wide variety of ways. Content is stored in page sequence objects. These are the clay from which the document is sculpted.
Many layout settings are available to control page-related formatting. They include hyphenation, widow and orphan control, columnar layout, and so on.
A special concept is the viewport. A viewport is the physical region in which you can put content. In a web browser, the dimensions of the window determine the viewport area. Since the window can be resized, the viewport is variable. In print, a piece of paper is the viewport. This geometry does not usually change, except in cases where you rotate the view from portrait to landscape mode.
The XSL recommendation includes a set of aural properties for FOs. This is a radically different medium than print and screen, but an important one nonetheless. Speech synthesis is a maturing technology already in use by visually challenged people. We can expect its use only to increase, so the XSL designers have planned its support from the beginning.
The aural medium has many analogies to typographical styles. Where a visual medium would use space to separate objects, an aural processor would use an interval of silence. Emphasized text could be rendered in higher pitch, different volume, or altered speed. In place of typefaces, you might think of the kind of voice used, such as a child's, older man's, or a robot's.
Though I would love to explore these properties in this book, constraints of scope and the printed medium prevent me from doing an adequate job. So I will leave that for another writer and concentrate on visual formatting instead.
There are a variety of implementations of XSL-FO formatters available, from free and open source to commercial products. At the time of this writing, here is what I found available:
Implementations vary in how well they conform to the XSL specification. The recommendation defines three levels of conformance. It says that the basic level "includes the set of formatting objects and properties needed to support a minimum level of pagination or aural rendering." Extended adds to that everything else in the standard except shorthand properties, which collect a group of properties in one assignment, like border in CSS. Complete conformance has all of that plus shorthands.
This division does not frequently work in the real world. As an example, page-oriented formatters usually do not have any support for aural (speech synthesis) formatting, so even the official basic conformance is rare. Most users are probably interested in using XSL for one medium anyway, so it is more useful in my opinion to think of conformance within each medium. For example, does a formatter correctly produce a page layout from the formatting objects in the print medium?
Often developers leave out features that are difficult to implement and that they think will not be in high demand. Bidirectional support needed for scripts such as Hebrew or Arabic is not present in most formatters. Arbortext's formatter does not support mixed language hyphenation. FOP has numerous issues with tables, such as requiring fixed widths of columns. In time, many of these features will be incorporated, but it will depend on demand and developer time.
The XSL-FO recommendation has left out some important features that may be added in future versions. For example, index generation is still rather primitive. In the meantime, vendors are adding their own extensions using XML namespaces. Some examples of this are the fox:outline extensions in FOP to generate PDF bookmarks and rx:page-index in XEP to help create page numbers in indexes.
The formatter I like to use for XSL-FO is the free and open source FOP. It has pretty decent print-oriented support and outputs to PDF, MIF, PostScript, and PCL. The formatter is written in Java and its install package includes Apache's XML parser Xerces and XSLT engine Xalan, which you can download from http://xml.apache.org. The FOP download includes the parser and XSLT classes in one package. It also contains a shell script, fop, that runs Java and supplies the arguments to the formatter.
Below is an example of FOP as invoked from the command line. The argument -xsl db.xsl gives the name of the XSLT stylesheet. The next argument, -xml chap1.xml, supplies the source file name. The third, -pdf chap1.pdf, tells which format to output (PDF) and what to call the output file.
> fop -xsl db.xsl -xml chap1.xml -pdf chap1.pdf
[INFO] FOP 0.20.4
[INFO] building formatting object tree
[INFO]
[INFO]
[INFO]
...
[INFO]
[INFO]
[INFO]
[INFO] Parsing of document complete, stopping renderer
The numbers in brackets give the number of the page being rendered once the FO tree has been constructed.
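If you run FOP from a build script rather than by hand, the same invocation can be assembled programmatically. The following Python sketch uses only the three flags shown above; the helper names fop_command and run_fop are hypothetical, and the script assumes the fop shell script is on your PATH.

```python
import shutil
import subprocess


def fop_command(stylesheet, source, output_pdf):
    """Build the argument list for the fop script, using the flags shown above."""
    return ["fop", "-xsl", stylesheet, "-xml", source, "-pdf", output_pdf]


def run_fop(stylesheet, source, output_pdf):
    """Run fop if it is on the PATH; return its exit code, or None if it is missing."""
    if shutil.which("fop") is None:
        return None
    result = subprocess.run(fop_command(stylesheet, source, output_pdf))
    return result.returncode


# Only builds the command; call run_fop(...) to actually execute it.
command = fop_command("db.xsl", "chap1.xml", "chap1.pdf")
```

Passing the arguments as a list, rather than as one string, avoids quoting problems when a file name contains spaces.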
If this process doesn't work, there could be several reasons. First, the source document may not be well-formed, and you will see output like this:
[INFO] FOP 0.20.4
[ERROR] The markup in the document preceding the root element must be well-formed.
Check the files with a parser to rule out this possibility. Second, the XSLT stylesheet may have an error. It may not be well-formed XML, or it may be structurally invalid XSLT. Here, I misspelled an element name in the XSLT and got a cryptic error message:
[INFO] FOP 0.20.4
[ERROR] null
Check all the elements in the xsl namespace to make sure they are valid. You might want to test the stylesheet with an XSLT engine that has verbose output to see where in the file the error occurs. xsltproc from gnome.org has very good parse error messages.
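Ruling out well-formedness problems before running the formatter takes only a few lines with any XML parser. Here is a minimal sketch using the parser in Python's standard library; the function name well_formedness_error is hypothetical. It checks well-formedness only, so a stylesheet with a misspelled XSLT element name will still pass.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return the parser's error message, or None if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # the message includes a line and column position
    return None


good = well_formedness_error("<chapter><title>One</title></chapter>")
bad = well_formedness_error("<chapter><title>One</chapter>")
```

Run the same check on both the source document and the stylesheet, since either one can stop the transformation before FOP reports anything useful.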
Third, the formatting object tree may contain invalid markup:
[INFO] FOP 0.20.4
[INFO] building formatting object tree
[ERROR] Unknown formatting object http://www.w3.org/1999/XSL/Format^robot
[ERROR] java.lang.NullPointerException
This occurred because I changed the name of a formatting object from fo:root to fo:robot. Although FOP's error messages aren't the easiest to read, at least it did manage to tell me that it was a formatting object naming problem.
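A naming problem like this one can be caught before the formatter runs by comparing element names against the FO vocabulary. The Python sketch below is illustrative only: the function unknown_fo_elements is hypothetical, and the KNOWN_FO set is a deliberately small sample of the formatting objects defined by the recommendation, so a real check would need the complete list.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"

# A partial sample of formatting object names, not the full vocabulary.
KNOWN_FO = {
    "root", "layout-master-set", "simple-page-master", "region-body",
    "page-sequence", "flow", "block", "inline", "character",
}


def unknown_fo_elements(fo_text):
    """List local names of elements in the FO namespace that are not in KNOWN_FO."""
    prefix = "{" + FO_NS + "}"
    unknown = []
    for elem in ET.fromstring(fo_text).iter():
        if isinstance(elem.tag, str) and elem.tag.startswith(prefix):
            local = elem.tag[len(prefix):]
            if local not in KNOWN_FO:
                unknown.append(local)
    return unknown


broken = f'<fo:robot xmlns:fo="{FO_NS}"><fo:block>text</fo:block></fo:robot>'
problems = unknown_fo_elements(broken)
```

Elements outside the FO namespace are ignored, so vendor extensions in their own namespaces are not reported as errors.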