Section 5.5. Dealing with non-XML objects



5.5 Dealing with non-XML objects

So far, we have been concerned with our stylesheet's HTML output. But a typical web page is more than just HTML. Web sites invariably contain images; some of them embed Flash animations or Java applets; many sites also present some of their content in PDF. This section will show you how to use XSLT to generate such objects automatically from the site's XML source.

5.5.1 Accessing images

No matter how powerful XSLT is, not all external objects need to be generated by the stylesheet. Those that are static (i.e., those that do not change when the content of the site changes; see 1.3.4) are produced once and for all by the site's graphic designer. However, the stylesheet can still benefit from a method to access such static objects. Why? There are two reasons.

Checking existence. First, the stylesheet must be able to check if these objects actually exist. This kind of validation cannot be done by a schema simply because schemas check source XML, and source XML has no knowledge of the static images or other objects used for web page layout or decoration. It is the stylesheet actually building the site that stores references to all such objects, and so it can verify that they really are there and that the site will therefore not give any nasty formatting surprises.

Retrieving objects' properties. Second, to properly embed external objects (both static and dynamic), the stylesheet must be able to extract some information from them. For images, you'll want to know their dimensions so you can put them in the height and width attributes of the corresponding img element (this speeds up loading the page, especially a complex one). For static images, you can simply hardcode these values into the stylesheet, but this is tedious manual work, inefficient and prone to errors. Let's see if there is a way to do this automagically.

Getting dimensions. XSLT itself is not quite up to the challenge. Once again, we need to write a couple of extension functions; they would take an image pathname or URI and return its dimensions. Example 5.11 shows a Java class implementing this. (Not exactly the most elegant piece of code, but it does its job.)

Example 5.11. The `graph` class provides methods that return width and height of an image (works with the PNG, GIF, and JPEG formats).

  package com.projectname.xslt;

  import java.awt.*;
  import java.awt.image.ImageObserver;

  public class graph {
    static ImageObserver observer;
    static Image img;

    // Load the image and block until loading completes (or fails).
    private static void init (String name) {
      Toolkit tk = Toolkit.getDefaultToolkit ();
      img = tk.getImage (name);
      observer = new ImageObserver () {
        public boolean imageUpdate (Image img, int flags,
                                    int x, int y, int w, int h) {
          // Keep receiving updates until the image is complete or aborted.
          return (flags & (ALLBITS | ABORT)) == 0;
        }
      };
      try {
        MediaTracker imageTracker = new MediaTracker (new Frame ());
        imageTracker.addImage (img, 0);
        imageTracker.waitForID (0);
      } catch (Exception e) {
        System.err.println (e.getMessage ());
      }
    }

    public static int geth (String name) {
      init (name);
      return img.getHeight (observer);
    }

    public static int getw (String name) {
      init (name);
      return img.getWidth (observer);
    }
  }
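Example 5.11 creates an AWT Frame, and Java refuses to do that (with a HeadlessException) when it runs in headless mode, as it often does on a build server. As a hedged alternative, and not the code used in this book, the following sketch asks the standard javax.imageio API for the dimensions stored in the image header. The class name ImageSize is hypothetical; getw and geth merely mirror the method names of Example 5.11.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper; declare it public in its own file before calling it
// from a stylesheet as an extension class.
final class ImageSize {

    // Returns {width, height} of the first image stored in the file.
    static int[] dimensions(String name) {
        try (ImageInputStream in = ImageIO.createImageInputStream(new File(name))) {
            if (in == null) {
                throw new IOException("Cannot open " + name);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("Unsupported image format: " + name);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                return new int[] { reader.getWidth(0), reader.getHeight(0) };
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            // Unchecked, so a failure surfaces as an error in the caller.
            throw new UncheckedIOException(e);
        }
    }

    static int getw(String name) {
        return dimensions(name)[0];
    }

    static int geth(String name) {
        return dimensions(name)[1];
    }
}
```

Unlike Example 5.11, this sketch reports a missing or unreadable file by throwing an exception instead of printing a message and carrying on.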

Image insertion template. Now in XSLT, all we need to do is declare a namespace prefix (e.g., graph) for com.projectname.xslt.graph so we can use its methods. As for file existence checks, we already have the files:exists() method (Example 5.6, page 221).

Example 5.12 shows a callable template for inserting static images. You call this template with an image filename (e.g., img/logo.png ) and its description (e.g., Company logo ) as parameters; the template will check if the image file exists, retrieve its dimensions, and create the corresponding img element.

Example 5.12. The `image` template checks if an image exists and inserts it into HTML.

  <xsl:template name="image">
    <xsl:param name="filename"/>
    <xsl:param name="alt"/>
    <xsl:variable name="path" select="concat($im-path, $filename)"/>
    <xsl:if test="not(files:exists($path))">
      <xsl:message terminate="yes">
        Error: Image <xsl:value-of select="$path"/> not found
      </xsl:message>
    </xsl:if>
    <img
        src="{$path}"
        width="{graph:getw($path)}"
        height="{graph:geth($path)}"
        alt="{$alt}" border="0"/>
  </xsl:template>

Building an image gallery. Example 5.13 combines the methods of the graph class with the dir() method from the files class ( 5.3.2.1 ) to present all images in the $dir directory in an automated gallery.

The graph:geth() and graph:getw() functions will be especially useful for generated images (discussed in the next section) whose dimensions cannot be known in advance.

Example 5.13. This loop uses extension functions to access and link all images from `$dir` .

  <xsl:for-each select="tokenize(string(files:dir($dir)), '\n')">
    <xsl:if test=". and (
        ends-with(., '.png') or
        ends-with(., '.gif') or
        ends-with(., '.jpg')
        )">
      <img
          src="{.}" alt="{.}"
          width="{graph:getw(.)}"
          height="{graph:geth(.)}"/>
    </xsl:if>
  </xsl:for-each>
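Examples 5.12 and 5.13 lean on files:exists() and files:dir(), whose source (Example 5.6 and 5.3.2.1) is not repeated in this section. For orientation only, here is a minimal sketch of what such methods might look like. The class name FilesSketch is hypothetical, and the sketch assumes that dir() returns one pathname per line, which is what the tokenize() call in Example 5.13 expects. The real files class may differ.

```java
import java.io.File;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical stand-in for the files class; declare it public in its own
// file before calling it from a stylesheet.
final class FilesSketch {

    // True if the path names an existing regular file.
    static boolean exists(String path) {
        return new File(path).isFile();
    }

    // Newline-separated, sorted list of the entries in a directory.
    // Each entry keeps the directory prefix so that it can be handed
    // directly to a function that opens the file.
    static String dir(String path) {
        String[] names = new File(path).list();
        if (names == null) {
            // Not a directory, or it cannot be read.
            return "";
        }
        return Arrays.stream(names)
                .sorted()
                .map(name -> new File(path, name).getPath())
                .collect(Collectors.joining("\n"));
    }
}
```

Returning an empty string for an unreadable directory keeps the xsl:for-each loop from failing; the test on the context node in Example 5.13 then skips the single empty token.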

5.5.2 Creating images

XSLT was created as an XML-to-XML transformation tool, but in practice, only the input to an XSLT stylesheet has to be well-formed XML. The output of a transformation can be in any textual format. So, for example, it is easy to write a stylesheet to transform an XML spreadsheet into a comma-separated text file. But what if we need binary files, such as images or Flash?

Two-stage conversion. An obvious approach is to program the stylesheet to output an intermediate textual (or, better yet, XML) format that has at least a one-way mapping to the required binary format. Then, we can call an external conversion utility to create a binary file from that intermediate format. This is the general idea that we'll explore in more detail below in reference to bitmap images. Other binary formats used on the Web are briefly discussed in 5.5.3.

What is the best intermediate format for images? Remember that we are not interested in all images that can be displayed on a site; our focus is on those that (may) need to be updated when the site's source is updated. This means that in a great majority of cases, we'll be creating images consisting primarily of text, perhaps combined with some static backgrounds or overlays.

Why put text into images? Isn't it bad web design? Often it is; a piece of text petrified in a static bitmap suffers from bad accessibility in text-only user agents, not to mention that it is annoyingly unscalable in most graphic browsers. On the other hand, accessibility can be improved by proper metadata markup (use an alt attribute with the same text as in the image), whereas from the designer's viewpoint, it is sometimes critical to ensure the pixel-precise rendition of a textual element such as a menu label or a heading.

Corporate sites care a lot about design consistency which improves branding and recognizability, and font consistency is an important aspect of it. With the current HTML state of the art, the simplest and most reliable way to typeset some textual element using a specific font face is to cast it into a bitmap. Besides, a lot of web surfers still do not have fully antialiased text display, so if you want a bit of text to stand out quality-wise, a properly rasterized bitmap is the way to go.

Finally, sometimes you may need to work from an existing image, for example to overlay a text string on top of an ad banner. This means we must be prepared to include textual images in our XML/XSLT ecosystem should such a need arise.

5.5.2.1 Choosing format

There's really no shortage of text-based formats that can be converted into images. TeX, XSL-FO, PostScript: all of them can be used for producing a bitmap image with text in it. But if we narrow our search by excluding non-XML formats as well as those requiring complex renderer setup, the best open format for our task appears to be SVG. ^[14]

^[14] Latest version as of this writing: www.w3.org/TR/SVG11/.

SVG (Scalable Vector Graphics) is an XML-based vector graphics format designed by the W3C to be used on the Web and to fit well with other W3C standards. SVG is not really suitable for complex typography; for example, you cannot flow a long text string into a paragraph with automatic line breaks. However, for things like buttons or headings on a web site, it fits the bill perfectly.

Strictly speaking, SVG is intended to be supported directly by the browsers, but you cannot count on that support just yet. ^[15] This does not mean, however, that we cannot successfully use SVG for automatically generating traditional bitmap images (for example, in PNG format) to be served from the web site.

^[15] Browser plugins for viewing SVG exist.

I have already hinted at the possibility by including a call to the create-image template in the menu template (Example 5.5). Now let's see what this create-image looks like.

5.5.2.2 Choosing a rasterizer

There are several SVG rasterizers available, including both commercial and open source products. In this book, we use the Batik suite developed by the Apache XML Project. ^[16] Batik claims complete support of the static features of SVG (i.e., excluding animation); it is written in Java and includes a rasterizer (program converting SVG to a bitmap) as well as a font conversion utility that makes it possible to use TrueType fonts in SVG. Install the latest version of Batik to run the examples in this section.

^[16] xml.apache.org/batik/. Batik will run on any Java-enabled system (JDK 1.4 or better required).

Another free SVG renderer is a part of the Imagemagick suite of graphic tools. ^[17] You are supposed to be able to run

 convert image.svg image.png

(where convert is the image conversion utility in Imagemagick) to rasterize an SVG document. However, Imagemagick's SVG support is less robust and uses nonstandard font handling, so we'll stick with Batik for our rasterizing needs. You may still need Imagemagick to postprocess your bitmap files ( 5.5.2.6 ), and it is generally a good piece of software to have around, so install it too. ^[18]

^[17] www.imagemagick.org

^[18] That is, if you don't have it installed already: Imagemagick is included in most Linux distributions. On the Imagemagick web site, you'll find binaries for Windows, Linux, Mac OS X, and other platforms.

5.5.2.3 Preparing fonts

To create an image containing text, we start by choosing and preparing the font(s) to be used by that text. SVG uses its own scalable font format; fortunately, Batik includes a font conversion utility for TrueType fonts. If you have a font file called pushkin.ttf , run this command:

 java org.apache.batik.svggen.font.SVGFont pushkin.ttf -id font_id \
      > pushkin.svg

(Make sure all of Batik's .jar files are in the Java CLASSPATH before issuing this command.) This will create the file pushkin.svg, which is our font converted to SVG format. The -id command-line option sets the internal identifier of the new font that we'll use later to refer to it from our SVG files.

A good font editor capable of working directly with SVG fonts (as well as TrueType, OpenType, and PostScript Type 1 fonts) is PfaEdit. ^[19] It can be used for font conversions as well as for editing character outlines, adjusting kern pairs, reencoding fonts, and so on.

^[19] pfaedit.sf.net

5.5.2.4 Creating SVG

Now that we have the tools and the font, let's write a sample SVG file to test our setup. We will use the free cursive Pushkin font ^[20] to render the string "Scalable Vector Graphics." After we test rasterizing of a manually created SVG file, we'll look into how to enable our XSLT stylesheet to do the same automatically.

^[20] Created by Paratype based on the handwriting of Alexander Pushkin, see www.fonts.ru/news/pushkin.html.

Another language to learn? SVG is a complex format; what is shown in Example 5.14 is only a "Hello World." Fortunately, you don't need to learn all the details of the SVG specification. If what you have in mind is a complex graphic composition, you can use any SVG-capable vector editor, ^[21] save the result into SVG, and use it as a template for your stylesheet-generated images. The only thing you need to know for this is how to deal with fonts and basic text layout, and this is what Example 5.14 illustrates.

^[21] Adobe Illustrator can import and export SVG. A small but promising native-SVG editor is Inkscape, www.inkscape.org.

Example 5.14. A sample SVG file.

 <?xml version="1.0" encoding="iso-8859-1"?>
 <svg
     width="500px" height="100px"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
   <defs>
     <font-face font-family="Pushkin">
       <font-face-src>
         <font-face-uri xlink:href="pushkin.svg#font_id"/>
       </font-face-src>
     </font-face>
   </defs>
   <text style="
       fill: #000000;
       font-family: Pushkin;
       font-size: 25pt;"
       x="10px" y="50px">Scalable Vector Graphics</text>
 </svg>

Defining the canvas. The root element, svg , sets the size of the canvas that we'll be painting on. Since you cannot know in advance how much space will be taken by your text string, make sure you have ample room even for the longest label or heading you will need to create. This element also declares the SVG and XLink namespaces, the latter necessary for linking up the font file.

Linking the font. The defs element, generally used for all sorts of document setup and definitions, here contains only a reference to the font stored in a separate SVG file. Make sure the xlink:href attribute contains the correct relative URI of the pushkin.svg file we created previously. The selector part after # in that URI must match the -id value that we specified when converting the font into SVG.

Creating the text. As you can see from the text element, SVG uses CSS for specifying text properties. This is good news, since you'll be able to leverage your CSS experience. The two non-CSS attributes in the text element, x and y , specify the position of the text string in pixels relative to the coordinate origin (in SVG, it is in the top left corner).
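The three parts just described (canvas, font link, and text) can also be put together programmatically. The sketch below is only an illustration in Java of the structure of Example 5.14; in this book the SVG is produced by the stylesheet, not by Java code. The class SvgLabel and its parameters are hypothetical. Note the escaping step: any label that comes from site content may contain characters such as & or <.

```java
// Hypothetical helper that fills the SVG skeleton of Example 5.14.
final class SvgLabel {

    // Escapes the characters that are special in XML text and attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds an SVG document: canvas size, font link, and one text string.
    // The caller is expected to save the result as UTF-8.
    static String svg(String label, String fontUri, String fontFamily,
                      int width, int height) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<svg width=\"" + width + "px\" height=\"" + height + "px\"\n"
             + "    xmlns=\"http://www.w3.org/2000/svg\"\n"
             + "    xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n"
             + "  <defs>\n"
             + "    <font-face font-family=\"" + escape(fontFamily) + "\">\n"
             + "      <font-face-src>\n"
             + "        <font-face-uri xlink:href=\"" + escape(fontUri) + "\"/>\n"
             + "      </font-face-src>\n"
             + "    </font-face>\n"
             + "  </defs>\n"
             + "  <text style=\"fill: #000000; font-family: " + escape(fontFamily)
             + "; font-size: 25pt;\" x=\"10px\" y=\"50px\">"
             + escape(label) + "</text>\n"
             + "</svg>\n";
    }
}
```

A call such as SvgLabel.svg("Scalable Vector Graphics", "pushkin.svg#font_id", "Pushkin", 500, 100) yields a document equivalent to Example 5.14 apart from the declared encoding.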

5.5.2.5 Running conversion

Save Example 5.14 into a file, say test.svg , and type this command:

 java org.apache.batik.apps.rasterizer.Main test.svg

For this to work, the batik-rasterizer.jar file from the Batik distribution must be in your Java CLASSPATH . Note that the Batik documentation suggests running the rasterizer by launching its .jar file:

 java -jar /full/path/to/batik-rasterizer.jar test.svg

However, our variant specifying the Java path to the rasterizer's Main class works just as well and has the big advantage of not having to worry about the .jar file pathname that might be different across environments.

When the command is finished, you have a brand new test.png file in the same directory (Figure 5.1).

Figure 5.1. The Batik-rasterized version of Example 5.14. The image is magnified to demonstrate anti-aliasing; the actual size is approximately 400 by 60 pixels.

graphics/05fig01.gif

Wow! Antialiasing is flawless, kerning pairs are correctly kerned, and overall, the text looks just right. Even if you reduce the font size to a barely readable minimum, letters still look even and smooth. Since we didn't specify any background color, the background is transparent, and the antialiased contour pixels are actually half-black, half-transparent (instead of half-black, half-white). This is only possible in PNG with its alpha channel transparency, and the end result is that the image will look smoothly antialiased over any background, be it solid color or pattern.

Designers beware. Yes, it does look smoothly antialiased, but only in a standards-compliant browser, such as Mozilla. Microsoft's Internet Explorer, unfortunately, does not support alpha transparency in PNG. To create a page that would be correctly displayed in all modern browsers, you need to explicitly add to your SVG a nontransparent background rectangle whose color is the same as that of the web page background. And if you want a complex patterned background under your image, you're out of luck: without alpha transparency, you can't have both antialiasing and a fancy background at the same time.
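The remedy described above is a background rectangle in the SVG itself. If the PNG has already been rasterized, the same effect can be approximated afterwards by compositing it over a solid color. The sketch below does this with Java 2D; it is an alternative under assumptions, not the method used in this book, and the class name Flatten is hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper: removes alpha transparency by painting the image
// over an opaque background of the given color.
final class Flatten {

    static BufferedImage onto(BufferedImage src, Color background) {
        // TYPE_INT_RGB has no alpha channel, so the result is fully opaque.
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.setColor(background);
            g.fillRect(0, 0, out.getWidth(), out.getHeight());
            // The default composite rule blends each source pixel with the
            // background in proportion to the pixel's alpha value.
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

A half-transparent black contour pixel composited over white comes out mid-gray, which is exactly the "half-black, half-white" pixel that a browser without alpha support needs. The image can be loaded with ImageIO.read() and saved with ImageIO.write().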

5.5.2.6 Postprocessing

The image may not require any postprocessing; it is quite usable as is (bar the PNG transparency problem in MSIE). Sometimes, however, you may want to fiddle with it some more. For example, it may be necessary to trim the margins of the image down to the bounding rectangle of the text string. With Imagemagick, this is done by

 mogrify -trim test.png

After that operation, the dimensions of the image are unpredictable, so you'll need to use the graph:getw() and graph:geth() extension functions ( 5.5.1 ) if you want the corresponding img element in your HTML to specify exact width and height.

You can also scale the image, reduce the number of colors, convert it to other formats, and so on. The complete list of capabilities available via Imagemagick's command-line tools is quite impressive. For example, the following commands add a nice drop shadow to our image (Figure 5.2):

 convert -blur 7x7 test.png test-shadow.png
 composite -geometry +2+4 test-shadow.png test.png test.png

Figure 5.2. Drop shadow added to Figure 5.1 by Imagemagick.

graphics/05fig02.gif

5.5.2.7 Once more, with XSLT

Java as a launchpad. To automate the process we just ran manually, the first thing we need is a way to run external applications from within the stylesheet. Once again, Java comes to our rescue. Add the run() method shown in Example 5.15 to the files class in Example 5.6 (page 221). This method takes a command line as an argument, executes it, and returns the same argument string (with a newline appended).

Why so much code for such a simple task? It turns out that Java's Runtime.exec() method shuts off the executed program's console output, so we must explicitly grab its output (both stdout and stderr) and print it if we want to read what the program has to say. Unfortunately, stdout and stderr are grabbed and reprinted separately, which may sometimes lead to weird ordering of output lines. No output will be lost, though.

Image generation template. All the components of the create-image implementation should be obvious by now. Still, an outline (with most of the static SVG code dropped for readability) is given in Example 5.16. The template first creates the SVG file, then runs Batik to rasterize it and Imagemagick's mogrify to trim edges.

The files:run() function is called from within an xsl:value-of which, in turn, is inside xsl:message. Since files:run() returns the command line it received as argument, you will see the actual command line being executed in your terminal as a useful debugging hint.

You can write a similar template (or extend this one by adding more parameters to control its SVG output) for generating other images on your site. Examples might include headings, sequence navigation buttons ( 3.9.4 ), or even a graphic copyright notice with your logo (e.g., to be added as a semitransparent watermark to the photos you publish on your site).

Example 5.15. A Java method to run a command (add to the `files` class).

  // Requires: import java.io.*;
  public static String run (String s) {
    try {
      String str;
      Process p = Runtime.getRuntime ().exec (s);
      BufferedReader is = new BufferedReader (new InputStreamReader
                                  (p.getInputStream ()));
      BufferedReader es = new BufferedReader (new InputStreamReader
                                  (p.getErrorStream ()));
      try {
        // Echo the program's stdout first, then its stderr.
        while ((str = is.readLine ()) != null) {
          System.out.println (str);
        }
        while ((str = es.readLine ()) != null) {
          System.out.println (str);
        }
      } catch (IOException e) {
        System.exit (0);
      }
    } catch (IOException e1) {
      System.err.println (e1);
      System.exit (1);
    }
    // Return the command line so the caller can display it.
    return (s+'\n');
  }
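Example 5.15 reads stdout to the end before it touches stderr, which is why the two may appear out of order. If that matters, one option (a sketch under assumptions, not the code used in this book) is to let java.lang.ProcessBuilder merge stderr into stdout before reading. The class name Runner and the method capture() are hypothetical. Unlike run(), capture() takes the command as a list of arguments and returns the program's output, not the command line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper: runs a command and returns everything it printed.
final class Runner {

    static String capture(List<String> command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // stderr joins stdout
                    .start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                // One stream to read, so lines keep the order in which
                // the program wrote them.
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            int status = p.waitFor();
            if (status != 0) {
                throw new IllegalStateException(
                        "Exit status " + status + " from " + command + "\n" + out);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Reading a single merged stream also avoids a stall that is possible with two separate readers: a program that writes a lot to stderr can block while we are still waiting on its stdout. Checking the exit status lets a failed rasterizer run stop the build instead of passing unnoticed.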

Not exactly fly. The only problem with Batik's SVG rasterization is that it is not very fast and therefore may not be suitable for on-the-fly image generation on the server (especially if you need to create more than one image at once). This is why in our menu template (Example 5.5), create-image is only called if the $images stylesheet parameter is set to yes, since regenerating all menu buttons on each page update may be quite time-consuming.

5.5.3 Creating other binary formats

Now that you've got an idea of how a bitmap image can be generated from an XML source, I don't need to go into much detail on creating other objects. The only problem you can run into is choosing an appropriate intermediate format and a fast and reliable renderer for it.

Example 5.16. A callable template creating an image via SVG.

  <xsl:template name="create-image">
    <xsl:param name="label"/>
    <xsl:param name="filename"/>
    <xsl:variable name="svg" select="concat($filename, '.svg')"/>
    <xsl:variable name="png" select="concat($filename, '.png')"/>
    <xsl:result-document href="{$svg}" format="xml">
      <svg ...>
        <!-- SVG preamble... -->
        <text ...><xsl:value-of select="$label"/></text>
      </svg>
    </xsl:result-document>
    <xsl:message>
      <xsl:value-of select="
          files:run(concat(
          'java org.apache.batik.apps.rasterizer.Main ',
          $svg))"/>
    </xsl:message>
    <xsl:message>
      <xsl:value-of select="
          files:run(concat('mogrify -trim ', $png))"/>
    </xsl:message>
  </xsl:template>

5.5.3.1 Flash

Macromedia Flash, sometimes called ShockWave Flash (SWF), is an open format ^[22] for animated vector graphics, widely used on the Web. In most cases, Flash objects include textual elements that you might want to update from time to time, ideally by linking to an external data source. Such objects would benefit from becoming part of an XML-based web design workflow.

^[22] www.openswf.org

Macromedia's own Flash creation software ^[23] includes (in "professional" versions) functionality to update animation objects, such as text or links, using data from XML documents or dynamic XML sources. However, you might be using some other Flash authoring software (there is plenty of it on different platforms). Moreover, you might want to generate simple animations in a completely automatic fashion from the stylesheet, as we did for images. A search for a comprehensive text-based "Flash source" format reveals two candidates that may make this possible.

^[23] www.macromedia.com/software/flash

One is SWFML (SWF Markup Language), developed by Saxess. ^[24] This language is XML-based and the renderer offered by Saxess, called X-Wave, is written in Java. This is a commercial product (a preview edition is available). SWFML's coverage of SWF features, both static and animated, is probably sufficient for the majority of applications. ^[25]

^[24] www.saxess.com

^[25] The animated Flash menu on www.kirsanov.com was implemented via SWFML. You can download the complete source code of that XML-based site at www.kirsanov.com/dk-site.zip.

Another textual equivalent of Flash is the open source Ming library ^[26] whose functions can be used from several languages including PHP, Python, and Perl. None of these languages is XML, of course, but this should not stop you: If you can write a script to generate the Flash animation you need, so can your XSLT stylesheet. On the upside, Ming is faster than X-Wave and, via PHP, can be easily incorporated into a web site setup to generate Flash bits on the fly. SWF features covered by Ming include all of the essentials: shapes, text, links, bitmaps, and audio.

^[26] ming.sf.net

A disadvantage of both SWFML and Ming is that they cannot reuse existing Flash content. That is, you cannot draw a nice animation in a GUI Flash editor, export it into a textual format, use it as a template to fill in your content by the stylesheet, and then reassemble it back into binary SWF. You can, however, embed a stylesheet-generated movie into another movie, thereby combining a manually drawn "template" and automatically generated "content" into a seamless animated object.

5.5.3.2 PDF

Adobe's PDF (Portable Document Format) is a stripped-down and compressed version of PostScript, which is a textual page description language. You don't want to write PostScript code manually, however, as it is a really low-level machine-oriented language.

What then are your options if you want to generate nice-looking PDF documents from XML sources? The two major high-level page description languages worth considering are XSL-FO and TeX. Yes, I do mean TeX; although a marriage between TeX and XML may seem strange, it is feasible and may have its advantages.

In fact, the choice between TeX and XSL-FO can be tricky. TeX ^[27] is much faster (compared to the existing Java-based XSL-FO formatters such as FOP ^[28] or XEP ^[29]), but XSL-FO is XML. TeX offers better control over typography and page layout, but XSL-FO has much more straightforward i18n support (again, because it is based on XML).

^[27] www.tug.org

^[28] xml.apache.org/fop

^[29] www.renderx.com. This book was produced using XEP.

Choose TeX if any of the following is true:

you've already had (positive) experience with it;
you are satisfied by the standard LaTeX document styles and don't want to tweak anything; or
(on the contrary) your typographic requirements are very high and you're not afraid to spend some time instructing TeX to do exactly what you want.

Otherwise, XSL-FO ^[30] may be a better choice for you. It doesn't offer such a huge library of free styles, packages, and add-ons as does TeX, but creating a simple new style from scratch is actually doable in XSL-FO (in reasonable time) even without any previous experience. Another advantage of XSL-FO is its better integration with XSLT; for example, Saxon 6 can pass its transformation result directly to FOP without serialization.

^[30] www.w3.org/Style/XSL

