Chapter 8. Styling at the Character Level

8.1 General Character Properties
8.2 Fonts

XSL-FO provides a number of features aimed at formatting text and dealing with characters, which provide fine-grained control over presentation. You can manage content on a character-by-character basis, or you can apply properties to larger chunks of text.

In this chapter, I discuss the options available for formatting at the character level and when you should use this level of formatting. I also introduce font usage.

Be aware that, as formatters are introduced, the available fonts are not likely to match those available for desktop publishing packages or word processors. Most packages allow you to add fonts, either purchased or downloaded. See the vendor literature for instructions on adding new fonts and for the list of included fonts.

8.1 General Character Properties

In many cases, what can be done at the character level could also be done at the inline level. This gives the stylesheet designer the choice of using either one. In some cases, the choice will be very clear. If you need to style only a single character, it makes sense to use the fo:character element. If it's necessary to style a block of text that is all inline, use the fo:inline element. Many of the characteristics available at the inline level are equally applicable at the character level, so the number of them to remember doesn't increase dramatically! The only properties unique to fo:character are treat-as-word-space, character, glyph-orientation-horizontal, glyph-orientation-vertical, and suppress-at-line-break. I'll describe each of these in this chapter.

The fo:character element is always empty. The character to be formatted is specified as an attribute of the element. While it is possible to write more than one character, using the character="ABC" form, the working draft says that this attribute specifies the Unicode character to be presented. This implies a single character but does not explicitly require it.

The fo:character element is the fundamental unit of formatting for the formatting engine, creating an area on the page whose size is determined by the font metrics for the glyph representing the Unicode code point specified. Presently, the Antenna House implementation permits this to be seen by indicating the borders around each character-level element when specified as a unit. This indicates, to some extent, the work being done by the formatter when laying out page after page!

8.1.1 Usage

The need to format a specific character in some special way is a rare occurrence, though it is handy to have that capability. When a single character has to stand out in some way, marked off from other inline text, it first needs to be identified in the source XML document, then given enough characteristics for the stylesheet writer to apply the appropriate formatting. This might be, for instance, a single character in another language or an example character that, though inline, is required to be shown in some detail. For example, you might specify that this character be outlined in a large font, perhaps with a different background color. This can be done with the properties discussed in Chapter 7, using border or background-color and color attributes. Example 8-1 shows a single character in 20-point text, with a pale blue background and the character in red:

Example 8-1. A color-contrasted character

<fo:character
    character="&#x067;"
    font-size="20pt"
    background-color="skyblue"
    color="red"/>

This creates output as shown in Figure 8-1.

Figure 8-1. A color-contrasted character


Now, let's move on to those characteristics specific to fo:character. The basic form of fo:character is shown in Example 8-2.

Example 8-2. fo:character example

    <fo:character
        character="&#x067;"/>  <!-- [1] -->

[1] Shows the primary, required attribute: the actual character to be displayed. This may be entered as either an actual character or, as shown, as a numeric character reference. The x indicates that it is a hexadecimal number. If the special character is to be referred to, you might want to add an id attribute to identify it.

A single character may need to be presented with an orientation other than the normal one for the inline-progression-direction. In a horizontal writing mode, this is specified using the glyph-orientation-horizontal attribute.

A more unusual property, treat-as-word-space, has the specific function of using characters, not whitespace, to separate words. Unicode has a group of code points, U+2000 to U+200A, which may be used as word separators in certain circumstances. These are known, respectively, as:

U+2000, EN QUAD
U+2001, EM QUAD
U+2002, EN SPACE
U+2003, EM SPACE
U+2004, THREE-PER-EM SPACE (a.k.a. thick space)
U+2005, FOUR-PER-EM SPACE (a.k.a. mid space)
U+2006, SIX-PER-EM SPACE (sometimes known as thin space)
U+2007, FIGURE SPACE (equivalent to the digit width of a font)
U+2008, PUNCTUATION SPACE (equivalent to the narrow punctuation of a font)
U+2009, THIN SPACE (one fifth of an em)
U+200A, HAIR SPACE (often the thinnest space available)

So, why are we interested in them? The information is provided to tell the formatter that these are not printable characters that provide glyphs, but that they are used to separate two words. This is done using the treat-as-word-space attribute on fo:character, as shown in Example 8-3. A colleague informs me that in Thai texts, the interword gaps are not always whitespace; perhaps this is a case where treat-as-word-space is needed.

Example 8-3. treat-as-word-space example

<fo:inline>Words with<fo:character
    character="&#x2001;"
    treat-as-word-space="true"
    />spaces. </fo:inline>

(Note that I'm cheating here! The specification says that this group of characters is treated as if true has been specified anyway.) This gives the output shown in Figure 8-2.

Figure 8-2. treat-as-word-space example

The suppress-at-line-break attribute takes one of a fixed set of values. It determines what happens to the character when it falls at either end of a line, that is, next to a line break. The character may be retained or suppressed, depending on the value. The values are:

auto: Action is dependent on the character.
suppress: The character is suppressed.
retain: The character is retained.

When the action is character dependent, the value of the character is inspected to determine whether it is a normal space character, U+0020, in which case the character is suppressed. suppress implies that if it is at either end of a line, the glyph is not presented. retain implies that the character will always be retained and presented.
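As a minimal sketch of the retain value, the fragment below forces an ordinary space to be kept even if a line break falls next to it. The surrounding text is invented for illustration, the fragment is assumed to sit inside the fo:flow of a complete document, and how visible the effect is may vary between formatters.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- With auto, a U+0020 next to a line break would be suppressed;
       retain asks the formatter to keep and present it. -->
  First part of a long sentence<fo:character
      character="&#x20;"
      suppress-at-line-break="retain"/>second part of the sentence.
</fo:block>
```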

8.1.2 Writing Mode

If you write from the top left to the bottom right as in English, writing mode is not an issue. Writing mode is only necessary when you need to specify writing direction to a formatter. The W3C is serious about internationalization, so this has been accounted for in XSL-FO. The terminology is explained in Chapter 3, but I want to bring this down to its impact on character orientation. Remember that when we are dealing with characters, some properties are already available. Starting at the top of the hierarchy, we have the block-progression-direction. At right angles to this is the inline-progression-direction. These are both set by the writing-mode attribute. If we take the Latin example of left to right, top to bottom, then when we come down to the character level, we finish with a default glyph orientation. This has the top of the glyph oriented towards the top of the page.

Writing modes specify the manner in which text flows down the page. Unicode defines a default directionality for each character. Thus, Arabic is automatically presented right to left. If you want it presented left to right, you need to use the fo:bidi-override element, setting its direction and unicode-bidi properties. This also allows you to embed left-to-right text into right-to-left data (e.g., English words embedded in an Arabic stream).
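In XSL-FO, an explicit direction override is expressed with the fo:bidi-override formatting object, which carries the direction and unicode-bidi properties. The following is a sketch only: the text content is invented, and the fragment is assumed to sit inside the fo:flow of a complete document.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  Normal left-to-right text, then
  <!-- direction sets the forced direction; unicode-bidi="bidi-override"
       tells the formatter to ignore each character's own directionality. -->
  <fo:bidi-override direction="rtl" unicode-bidi="bidi-override">forced right to left</fo:bidi-override>
  and back to normal.
</fo:block>
```

With unicode-bidi="embed" instead, the content would open a new embedding level in the given direction while the characters inside keep their usual directionality.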

glyph-orientation (horizontal or vertical) sets the orientation of a glyph with respect to this default glyph orientation. The horizontal and vertical in the property names refer to the writing mode in which the property applies, not to the direction of rotation. Latin scripts, where the glyphs are laid out in lines running left to right and stacked from top to bottom, use a horizontal writing mode.

For vertical writing mode, where text is laid out top to bottom, use the glyph-orientation-vertical attribute. The rotation obtained should be one from the following list: 0, 90, 180, or -90. The formatter is supposed to round to the nearest one of these values. Used for single characters, the result is that the character is rotated relative to the zero at the top of the page. Using the clock face analogy, 90 degrees has the top of the glyph oriented to 3 on the clock face. From that, it's easy to work out that the top of the glyph is at 6 on the clock face for 180 degrees and at 9 on the clock face for -90 (or +270) degrees. Figure 8-3 should explain it clearly.

Figure 8-3. Writing mode


Note that this applies to horizontal writing mode. Although the vertical writing mode is similar at the character level, the effect is different when applied at block or inline level using the closely related properties set by the writing-mode property. For fo:character level operation, it is fairly straightforward.

When vertical writing mode is used, the actual rotations are identical for single characters to that shown in Figure 8-3. Character sequencing and layout are modified due to common layout properties, as specified in the font tables.
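The rotations described above might be written as follows for a horizontal writing mode. This is a sketch under assumptions: the angle values use the deg unit defined for the angle datatype, the letter A is chosen arbitrarily, and a given formatter may not support glyph rotation at all.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Default orientation: top of the glyph toward the top of the page. -->
  <fo:character character="A"/>
  <!-- Top of the glyph toward 3 on the clock face. -->
  <fo:character character="A" glyph-orientation-horizontal="90deg"/>
  <!-- Top of the glyph toward 6 on the clock face. -->
  <fo:character character="A" glyph-orientation-horizontal="180deg"/>
  <!-- Top of the glyph toward 9 on the clock face. -->
  <fo:character character="A" glyph-orientation-horizontal="-90deg"/>
</fo:block>
```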

8.1.3 Superscript and Subscript

Although not specific to fo:character, the ability to use superscript and subscript is handy at the character level. Typical uses might include references to footnotes, glossary items, or where a single character reference is needed. This is achieved readily, as shown in Example 8-4. Because the size of the superscripted character is often set smaller than the main content, the font-size property would normally be applied here, too.

Example 8-4. Superscript

<fo:character
    character="1"
    baseline-shift="super"
    font-size="8pt"
    font-family="'MS Serif'"/>

Here the character attribute supplies the digit 1, and the baseline-shift attribute is set to the value super. This provides the shift upwards relative to the reference baseline suitable for a superscript. Example 8-5 combines this with an inline.

Example 8-5. Character-level superscript

<fo:inline>See note  <fo:character
    character="1"
    baseline-shift="super"
    font-family="'MS Serif'"/> </fo:inline>

This provides the output shown in Figure 8-4.

Figure 8-4. Character-level superscript

Subscript is similar but has the alternate attribute value baseline-shift="sub". If the extent of the shift is not right for you, the alternative is to specify the amount by which the character is required to be shifted, using the baseline-shift="120%" form. The specification also permits an absolute value stated as a length, but I don't recommend it unless the document is fixed in terms of font size, etc. The percentage solution works relative to the size of the font and, hence, would scale if the font were changed. Note that a negative value, such as baseline-shift="-120%", provides the subscript version.
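A subscript counterpart to the superscript examples might look like the sketch below. The chemical formula and the -30% figure are illustrative assumptions, not values taken from the specification; the fragment is assumed to sit inside the fo:flow of a complete document.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Keyword form: the formatter chooses the size of the shift. -->
  H<fo:character character="2" baseline-shift="sub" font-size="8pt"/>O
  <!-- Percentage form: a negative value shifts the character downward. -->
  H<fo:character character="2" baseline-shift="-30%" font-size="8pt"/>O
</fo:block>
```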

8.2 Fonts

Font selection considerations are primarily dependent on the formatter used. Each formatter may have certain built-in fonts readily available to the stylesheet writer; other fonts may be added either by the user or by the provider. Fonts may be found on the Web, purchased from font foundries, or developed specifically for a task. The two prerequisites for use within a stylesheet are that the font metrics are available to the formatter in the style selected and that the font chosen is identified correctly by the stylesheet.

While discussion of the general characteristics of fonts is beyond the scope of this book, a good reference is the approach used by LaTeX. One I found particularly useful is Chapter 7 of The LaTeX Companion by Goossens, Mittelbach, and Samarin (Addison-Wesley).

Some of the preparatory tasks when writing the stylesheet for a document are to determine if the source file contains characters not likely to be found in the primary font chosen for the document, find a font that contains an entry for that character, and ensure that all characters can be mapped to a presentation form. The result of not doing these tasks is likely to be a missing character glyph, often difficult to spot in anything other than a small document. It's often easier to search the source XML document for entities than to peruse the formatted document for the missing character glyph.

The reference section of XSL refers to a font model called OpenType, developed by Microsoft and Adobe. The reference is to the OpenType specification v1.2 (see http://www.microsoft.com/truetype/tt/tt.htm). This technology model has been adopted by XSL-FO.

Some of the terminology takes getting used to. As in all XML, the starting point is a given Unicode code point, such as U+0041, its name, LATIN CAPITAL LETTER A, the actual glyph representing that code point, and the font chosen to represent it, Arial Black, for instance. The XSL document defines a glyph as " . . . a recognizable abstract graphic symbol which is independent of any specific design." Therefore, we can tell that a particular symbol represents the letter A, even when it's in a strange font that we may have never seen before. The recognition is critical. A font is simply a collection of glyphs. The font designer determines which set of glyphs to include in his font.

The font tables for any particular font include the information necessary to map characters to glyphs, to determine the size of glyph areas, and to position the glyph area precisely on the page to align with its neighbors. Alignment of an individual character with its neighbors uses reference points, called alignment points, such that any two glyphs that are direct neighbors appear correct when viewed together. Vertical alignment information is also provided. For instance, Western glyphs are aligned on the bottoms of the capital letters while other scripts have differing alignment points. The table also holds information about the writing modes supported by the font. If you intend to present a vertically oriented piece of text, ensure that this writing mode is available.

Each font table consists of the font characteristics, such as the font-weight and font-style. The formatter uses this information to place the individual glyph precisely on the page, aligned both to the sides of its neighbors and to any lines of text above or below it.

The space a character takes up is defined as the design space. It is the box within which the character fits, and within which given reference points are measured in the design space coordinate system. Each line and curve of the glyph is drawn within this box. This allows a single 20-point character in a line of 12-point characters while maintaining legible presentation.

In XSL-FO, font selection is based on the font properties: font-family, font-style, font-variant, font-weight, font-stretch, and font-size. The font-family specifies a font set from which the stylesheet designer wishes to select characters. This addresses the issue of font coverage. A font set is a list of fonts that will be tried, in turn, to find a glyph for the particular character. The first available font that contains the character is used. The fonts listed should be the same style and size. The two types of values for this attribute are a specific family name, such as Baskerville, and a generic family, one of serif, sans-serif, cursive, fantasy, or monospace. When specifying one of these generic families, do not use quotes. These act as a fallback mechanism, coming into action when none of the named font families contains the character you are seeking.

The Serif family are typically proportionately spaced and have finishing strokes. Examples include Times New Roman, Bodoni, Garamond from the Latin family, and Bitstream Cyberbit from the Hebrew and Arabic families.

Sans Serif fonts have stroke endings that are plain. Examples include MS Verdana, Univers, and Futura from the Latin families, and Helvetica Cyrillic, ER Univers, and Lucida Sans Unicode from the Cyrillic fonts.

The font-family attribute is specified by a list as in Example 8-6.

Example 8-6. Font example

    <fo:inline
        font-family="Arial, Garamond, serif">  <!-- [1] -->
    </fo:inline>

[1] Shows the font-family selection in use.

This selects the Arial font as first choice, Garamond as second choice, and falls back to any available serif font. Where the font family name contains spaces, enclose it in the alternate quote marks (single quotes inside a double-quoted attribute value), for example, 'Times Extravaganza Fabuleux'.
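The quoting rule for family names that contain spaces can be sketched as follows. The family name is the made-up one used in the text, so a real formatter would presumably skip it and move on to the next entry in the list.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    font-family="'Times Extravaganza Fabuleux', Garamond, serif">
  <!-- Single quotes delimit the multiword family name inside the
       double-quoted attribute value; the generic family serif is unquoted. -->
  Text set in the first listed family that is available.
</fo:block>
```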

font-style selects one style from a small number of variants. These are normal, the upright form of the character; italic, often used for emphasis; oblique, slightly different visually from italic and sometimes known as slanted; and backslant, the inverse of oblique.

Use is shown in Example 8-7 in a simple inline.

Example 8-7. Font style

<fo:inline
    font-style="normal">This is normal.</fo:inline>
<fo:inline
    font-style="italic">Italic.</fo:inline>
<fo:inline
    font-style="oblique">Oblique.</fo:inline>
<fo:inline
    font-style="backslant">Backslant.</fo:inline>

The font-variant attribute selects the small capital variant of the font, which could be generated by the formatter or provided by the font. Example 8-8 shows its use.

Example 8-8. Font variant

<fo:inline
    font-variant="small-caps">A small-caps example</fo:inline>

Although content is mixed case, the formatter converts these to small capitals as shown in Figure 8-5.

Figure 8-5. Small capitals

The font-weight attribute follows this pattern, the options being one from a reasonably complete list, including: normal, bold, bolder, and lighter, coupled with a numeric alternative, ranging from 100 to 900, in steps of 100. bolder and lighter provide the option of making relative changes. These are all relative to the inherited font-weight, which is useful in a text that may be modified. Each change increases or decreases the weight by 100 on this 100 to 900 scale, until either limit is reached. The keyword equivalents set font-weight="normal" at 400 and font-weight="bold" at 700. Trying adjacent numeric values will often not produce a noticeable difference, and some fonts may only have the two basic types, normal and bold, so be aware of this when designing a stylesheet.
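The keyword, numeric, and relative forms of font-weight might be combined as in the sketch below. The text content is invented, and whether the 600 weight looks any different from bold depends on the weights the chosen font actually provides.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" font-weight="normal">
  Normal text,
  <fo:inline font-weight="bold">bold text,</fo:inline>
  <fo:inline font-weight="600">numeric weight 600,</fo:inline>
  <!-- bolder is resolved relative to the weight inherited from the block. -->
  <fo:inline font-weight="bolder">one step bolder than the parent.</fo:inline>
</fo:block>
```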

The font-stretch attribute selects a condensed or expanded face, making characters narrower or wider. Currently, this is not well supported in implementations. It is nevertheless useful, because it is one aspect of layout that enables content to precisely fill a line. Its use is identical to the previous attributes, setting font-stretch to a value in a range from ultra-condensed to ultra-expanded, again with the relative options of wider and narrower, which are interpreted relative to the inherited value. The relative values and the initial value are: normal, wider, and narrower. The specified stretch range includes: ultra-condensed, extra-condensed, condensed, semi-condensed, normal (the mid-range default), semi-expanded, expanded, extra-expanded, and ultra-expanded. These should satisfy most requirements.
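A sketch of font-stretch in use, mixing named and relative values. Given the limited implementation support mentioned above, a formatter may well render all three inlines identically.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" font-stretch="normal">
  <fo:inline font-stretch="condensed">Condensed,</fo:inline>
  <fo:inline font-stretch="expanded">expanded,</fo:inline>
  <!-- narrower is resolved relative to the value inherited from the block. -->
  <fo:inline font-stretch="narrower">narrower than the parent.</fo:inline>
</fo:block>
```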

The font-size attribute provides basic sizing. The size actually rendered depends on a potentially complex mix of font availability and other modifications. Watch out for the font sizes available to the formatter and for the parent element's font-size value inherited by the element under scrutiny.

As with the previous related attributes, this one can be provided in a number of ways, each having its uses. The full list of options includes absolute-size, relative-size, a length specification, and a percentage. The meaning of these relates back to the CSS origins and may not be immediately obvious.

Let's start with a simple length specification, shown in Example 8-9.

Example 8-9. Simple font size example

<fo:inline
    font-size="12pt">Length specification.</fo:inline>

I've chosen the point length specification here. Other absolute length alternatives are the pixel (px), the pica (pc, 1 pica is equal to 12 points), inches (in), centimeters (cm), and millimeters (mm). These forms of specification enable content to be presented to the user with an absolute size. When using XSL-FO in combination with XSLT to style an XML document, there are potential problems when using this form as anything other than an initial value to specify basic body text sizes. These relate to nested structures within the source document, where the visual aspects of the presented text size are important. Using the analogy of XHTML, headings will probably want to scale down from the largest (XHTML has its H1 element) through to any minor headings (XHTML has its H6 element). If absolute lengths are specified directly for any given level of nesting or class of text content, a number of factors come into play. If the content is modified to introduce, say, another level of usage, it is often easier to specify the font size relative to its direct neighbors. So, use absolute values (length specifications) with caution.

The length specification of a font-size shouldn't be confused with absolute-size, which includes the options: xx-small, x-small, small, medium, large, x-large, and xx-large.

These are not relative. They are keywords that the formatter maps to specific computed sizes. The specification suggests a scaling factor of 1.2 between adjacent values, though the actual sizes may, in fact, vary from font to font. These options are known as absolute sizes, while the 12-point specification is known as a length. So, we might see <fo:inline font-size="xx-small">Pretty small text</fo:inline>.
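To make the 1.2 scaling concrete, the sketch below annotates a few absolute-size keywords with the sizes they would compute to. It rests on a loudly stated assumption: that the formatter maps medium to 12pt and applies the 1.2 factor exactly. Neither is guaranteed, so treat the figures as arithmetic rather than as promised output.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Assumption: medium = 12pt, factor of 1.2 between adjacent keywords. -->
  <fo:inline font-size="small">small: 12 / 1.2 = 10pt</fo:inline>
  <fo:inline font-size="medium">medium: 12pt</fo:inline>
  <fo:inline font-size="large">large: 12 * 1.2 = 14.4pt</fo:inline>
  <fo:inline font-size="x-large">x-large: 14.4 * 1.2 = 17.28pt</fo:inline>
</fo:block>
```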

The relative alternative is shown in Example 8-10.

Example 8-10. Relative font changes

<fo:inline
    font-size="smaller">Relative Specification of font size.</fo:inline>

This form uses the previous range of values, xx-small to xx-large, such that each increment causes a change of value in one of the two directions, obviously limited to either end. This is useful when styling one piece of content with respect to its parent. Experience has proven the ratio to be successful.

The use of relative specification of size is a looser version of the larger or smaller specification in that it can range over a larger change in value. You might use this to specify the font sizes in a document as part of a range of variables that are then used throughout the document as needed. This permits a single change to alter the whole document.

The basis for this is shown in Example 8-11.

Example 8-11. Attribute sets for font variants

<xsl:variable name="base-font">12</xsl:variable>
<xsl:attribute-set name="head1">
  <xsl:attribute name="font-size"><xsl:value-of
    select="concat(round($base-font * 1.2),'pt')"/></xsl:attribute>
  <xsl:attribute name="font-weight">bold</xsl:attribute>
  <xsl:attribute name="font-family">Helvetica</xsl:attribute>
  ....
  <!-- other attributes as required. -->
</xsl:attribute-set>

With various attribute sets named like this, a clear specification is set up that may then be used throughout the document quickly and easily, as shown in Example 8-12.

Example 8-12. Attribute set usage

<fo:block
    xsl:use-attribute-sets="head1"
    keep-with-next.within-page="always">
  <xsl:apply-templates/>
</fo:block>

This makes use of the attribute set, adding, as needed, any other properties, without explicitly stating the font within the body of the stylesheet. This provides the global version equivalent to using the percentage option locally, setting any block or inline to a fixed percentage of its parent element's font size.
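The local percentage option mentioned here might look like the following sketch. The 12pt base and the 120% figure are assumptions chosen to echo the 1.2 multiplier used in the attribute set.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="12pt">
  Body text at 12pt, with
  <!-- The percentage is taken from the parent's font size: 12pt * 1.2 = 14.4pt. -->
  <fo:inline font-size="120%">an inline at 120% of its parent</fo:inline>
  and body text again.
</fo:block>
```

Changing the font-size on the block rescales the inline with it, which is the local equivalent of changing the base-font variable.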

The specification gives the formatter the right to determine the tolerance when trying to determine what 13.56894-point text should look like. Be aware of this and don't expect too much, even for scalable fonts.

The font-size-adjust property, which is used to tweak the aspect value of a font, is not used often. The aspect value is defined in the specification as the ratio of the x-height of a font to its font size. The x-height is the height of a lowercase letter x within a font. By adjusting for this ratio, substituted fonts can appear clearer. The calculations are provided in the specification, and the CSS specification provides clear indications of usage, enabling you to substitute fonts that have differing aspect values. The specification refers to the examples in the CSS2 specification:

For example, if 14px Verdana (with an aspect value of 0.58) was unavailable and an available font had an aspect value of 0.46, the font-size of the substitute would be 14 * (0.58/0.46) = 17.65px.
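Applied to a stylesheet, the quoted example might be expressed as in the sketch below. The fallback family and the text content are assumptions; the 0.58 figure is the Verdana aspect value quoted above.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    font-family="Verdana, serif"
    font-size="14px"
    font-size-adjust="0.58">
  <!-- If Verdana is unavailable and the substitute has an aspect value of
       0.46, the formatter is asked to scale it to 14 * (0.58 / 0.46),
       roughly 17.65px, so that the x-height is preserved. -->
  Text whose x-height should survive font substitution.
</fo:block>
```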

If this level of adjustment is necessary on any scale, it may be worthwhile to purchase the required font in the needed sizes.

The decoration (as it's called in the specification) of text provides for a more readily usable set of modifications to the presented content. This uses the text-decoration property, where the list of options includes: none, underline, no-underline, overline, no-overline, line-through, no-line-through, blink, and no-blink.

You may wonder about the last two options. These can be directly attributed to CSS2 and are of little use in paper-based output, as they're targeted for screen use. A quick word on the negations in this list of properties: when a sequence of content is presented to a user, you may need to use an on/off sequence. For instance, when identifying content for deletion, you could mark the first four words with line-through, remove the strike through with the no-line-through property, then reapply it to words 7 through 10. The negation options provide for this. Example 8-13 illustrates nested inlines using this feature.

Example 8-13. Negating decorations

<fo:inline
  text-decoration="line-through">Continuing,
    <fo:inline
      text-decoration="no-line-through">with
    </fo:inline> font-weight bold </fo:inline>

The nested inlines use the no-line-through property to turn off the strike-through for the word with. The other properties work the same way, decorating the content as described. The output is shown in Figure 8-6.

Figure 8-6. Negating decorations

The text-shadow property provides the type of shadow often seen on containing boxes or window buttons on a graphical user interface. It is specified by an optional color and three length values, which specify the horizontal and vertical offsets from the character (to the right and down being positive) and an implementation-dependent blur radius. The syntax might be as shown in Example 8-14.

Example 8-14. Text shadow

<fo:inline
    text-shadow="red 0.3px -0.3px 5px">Eclipse </fo:inline>

This provides a red shadow 0.3px to the right, 0.3px up, and a blur radius of 5px.

The final property associated with characters, though also applicable to other elements, is text-transform. This enables the capitalization and case of selected content to be modified. The capitalize value changes the case of the first letter of each word in bicameral scripts. The Latin alphabet is an example of a bicameral script; it has an uppercase and a lowercase. Unicameral scripts (such as Arabic and Hebrew) have only one case. The uppercase and lowercase values ensure content is fixed to that particular case. As usual, the transformation may be turned off with the none value, if this is required.
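The four text-transform values might be exercised as in this sketch. The text content is invented, and the fragment is assumed to sit inside the fo:flow of a complete document.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:inline text-transform="capitalize">first letter of each word raised,</fo:inline>
  <fo:inline text-transform="uppercase">forced to uppercase,</fo:inline>
  <fo:inline text-transform="lowercase">FORCED TO LOWERCASE,</fo:inline>
  <!-- none switches off any transformation inherited from an ancestor. -->
  <fo:inline text-transform="none">Left Exactly As Written.</fo:inline>
</fo:block>
```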

8.2.1 Shorthand Attribute Specification

The font attribute is a slightly lazy way of specifying a nearly complete set of font-related traits. The properties that it sets, or resets to their defaults, are: font-family, font-size, font-weight, font-style, font-variant, line-height, font-size-adjust, and font-stretch.

Use of the font attribute is demonstrated in Example 8-15.

Example 8-15. Font short form usage

font="italic 11pt/1.5 Times"

The syntax of this attribute is a space-separated list of values in the following order, extracting the relevant part of the specification:

[ [ <font-style> || <font-variant> || <font-weight> ]?   <font-size> [ / <line-height> ]? <font-family> ]

The || indicates that you can have any of the options in any order, so you don't necessarily need the font-style before the font-weight, although these do have to occur before the font-size. Note the sequencing and the exception, shown in Example 8-15, of the inclusion of the line-height value. This is preceded by a forward slash and must follow the font-size. line-height may be a multiplier or an explicit length. Be aware that the shorthand sets all these values to their default values prior to applying those specified. So, if you don't set a value, it has a default applied. If you want to maintain a value (perhaps set by a parent element), ensure that it is included. Be careful: silently overriding line-height or font-size-adjust with this property can easily give rise to confusion. A further side effect is that two properties that cannot be set through the shorthand, font-stretch and font-size-adjust, are also reset to their default values.
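To make the reset behavior visible, the sketch below pairs the shorthand from Example 8-15 with what is assumed to be its longhand equivalent, using the initial values normal and none for the properties the shorthand does not mention. The outer block exists only to give the fragment a single root.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Shorthand form. -->
  <fo:block font="italic 11pt/1.5 Times">Shorthand.</fo:block>
  <!-- Assumed longhand equivalent: unspecified properties fall back to
       their initial values rather than inheriting from the parent. -->
  <fo:block
      font-style="italic"
      font-variant="normal"
      font-weight="normal"
      font-size="11pt"
      line-height="1.5"
      font-family="Times"
      font-stretch="normal"
      font-size-adjust="none">Longhand.</fo:block>
</fo:block>
```

If a parent had set font-weight="bold", the shorthand block would drop back to normal weight, which is the kind of silent override the text warns about.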
