Section 7.2.  Apache Cocoon


7.2 Apache Cocoon

No book on XML in web design can ignore Cocoon. [2] On the other hand, no book can do Cocoon justice unless it is entirely devoted to this complex piece of software. [3]


[3] A list of books devoted to Cocoon is at

Cocoon is not only large; it is also quite unlike other web development platforms. Therefore, be prepared for an avalanche of new terms and concepts. I will try to make this section easier to digest by focusing on those aspects of Cocoon that are the most relevant in the context of this book. In any case, this section is only a starter.

Cocoon step by step. The foundation of Cocoon is the same basic idea you're already familiar with: Mark up the content in XML and transform it into HTML using XSLT. Cocoon adds a lot of new terms to this equation, but you can explore them incrementally when (and if) you need them.

In particular, we can quickly deploy Cocoon to run a static XML/XSLT web site, such as our sample site from previous chapters, with minimal changes (7.2.7). You can therefore enjoy the immediate benefits of server-side transformation without delving too deeply into Cocoon specifics. Later, led by curiosity or by the demands of more dynamic web site functionality, you can always extend your setup by adding advanced Cocoon components.

A bit of history. Cocoon started in 1999 as a simple servlet for server-side XML-to-HTML transformation, similar to the Saxon servlet (7.1.1). It quickly outgrew its modest beginnings and has since turned into a complex framework for building XML-based dynamic sites, in which XSLT is only one of many tools for processing data.

Cocoon version 2, [4] first released in 2001, is now quite mature and runs many web sites around the world. Hosting providers have started offering Cocoon-enabled accounts. Overall, Cocoon is a rich platform with a sound conceptual foundation, free of the code-centric entanglements of ASP or PHP. Cocoon's development community is active and willing to help newbies.

[4] Examples in this book were tested in version 2.0.4.

However, Cocoon's feature set and its documentation are not always in step. Cocoon usually offers more than one way to perform any given task (reminiscent of Perl, another famous web development platform), and it is not always easy to figure out how these different approaches compare.

7.2.1 Cocoon applicability

While Cocoon is in many respects similar to the conventional XML/XSLT setup we have worked on throughout this book, in other respects it is quite different.

Enter logic. The holy grail of Cocoon is a clean separation of content, logic, and style. If you compare that to the content vs. style distinction we discussed in Chapter 1, you'll notice that logic (i.e., the programming code that implements a dynamic site's functionality) is the new item that Cocoon puts on the agenda.

In theory, content, logic, and style are completely orthogonal to each other. In practice, if you have a working two-layer system separating content and style (something we have focused on in this book), adding some programming logic to it is relatively easy: Most probably it will only affect content. On the other hand, traditional web development technologies largely deal with style and logic only (1.5.1). There, it may be much harder to cleanly separate content from the other two aspects if the system wasn't designed with that separation in mind from the start.

Even though Cocoon's attempts to separate logic from style are not always successful (as we'll see in , this separation is not always possible), the content-logic-style triad is a natural fit for dynamic web sites. With that in mind, we can identify two situations where using Cocoon has clear advantages: dynamic sites that want to use XML for content markup, and XML sites that need to run on the server with acceptable speed and scalability.

Dynamic sites with XML

Cocoon lets you combine full-scale dynamic processing (such as interfacing with a database) with XML content markup, without programming the entire system from scratch. Moreover, XML-based technologies such as RSS and SOAP enable new kinds of dynamic web-based applications that are straightforward to implement in Cocoon.

Building blocks. The range of prepackaged components available to a Cocoon developer is modest compared to the vast libraries of older languages like Perl. Still, the basics of a dynamic web site are well covered: For example, Cocoon provides facilities for session tracking, sending email messages, and site search. Plus, you have the full power of XSLT and Java at your disposal.

Java, again. Being Java-based, Cocoon naturally leans toward programming the logic of a web site in Java. Therefore, if you have a working dynamic web site made with Perl, PHP, or ASP, migrating it to Cocoon may not be trivial. If, however, you are using a Java-based technology such as JSP, or if you are starting a new dynamic site from scratch and are not predisposed against Java, Cocoon is a much more attractive option.

Server-side XML

No doubt, the fastest of all XML web site setups is offline transformation (1.4.1): Nothing can beat a lean and mean native-binary web server spewing out plain HTML and graphic files prepared offline. When you burden the server with on-the-fly XML processing, performance inevitably suffers.

The XSLT bottleneck. Still, Cocoon delivers acceptable performance and scalability for medium-duty web sites. Most of the processing time for each request is consumed by XSLT, but Cocoon reduces this time significantly by caching the results of transformations. You can speed things up further by using a faster processor (for example, Saxon is notably faster than Xalan, which comes with Cocoon, and even faster Java processors exist; see 6.4.1) and by removing all static content, such as images, from Cocoon and serving it with a fast conventional web server.

Size matters. Another important performance factor is the size of documents. Memory requirements for processing grow very fast as your documents get bigger, and the practical document size limit is easily reachable even on capable systems. Cocoon steps around this problem by representing XML data not as static trees but as linear streams of events that go down pipelines. This significantly lowers memory requirements and speeds up processing.
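To make the contrast with trees concrete, here is a minimal sketch that uses the JDK's standard SAX API. This is plain JAXP, not Cocoon's own classes, and the class and method names are made up for illustration. The handler receives each start tag as an event and keeps only a running count per element name. Its own memory use therefore depends on the number of distinct element names, not on the size of the document.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingCounter {

    /** Counts elements by name while the parser streams through the document. */
    public static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // One event per start tag: only the running totals are kept,
                // never a tree of the document itself.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String page = "<page><para>One</para><para>Two</para><note/></page>";
        Map<String, Integer> counts = countElements(page);
        System.out.println("para elements: " + counts.get("para"));
    }
}
```

Running `main` prints `para elements: 2`. A DOM parser given the same input would first build the complete tree in memory, whatever the program then did with it.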

For XSLT processing, however, building a tree representation of XML is inevitable: XPath allows arbitrary access to the entire document tree. In some cases you can use Cocoon's pipeline tools to break large documents into parts before transformation and reassemble the transformed pieces later.

Cocoon's downsides

There are a number of caveats that you should consider before deciding to build your web site with Cocoon:

  • It is likely that the main obstacle to Cocoon deployment will be the server (re)configuration it requires. If your web server does not already run Cocoon, you will have to install quite a lot of software on it: a Java runtime, the Tomcat servlet engine, and Cocoon itself. Hosting administrators are often reluctant to install new software on their servers because of security and performance considerations. If you control your web server, you are of course free to experiment, but remember that such a drastic change in configuration is risky and should be thoroughly tested before deployment.

  • As mentioned before, Cocoon is big and complex, with a steep learning curve (even if you are already familiar with XML and XSLT). In my opinion, this complexity is well worth the effort, as the power you get with Cocoon is immense, even though it may take time to implement your first dynamic Cocoon web site. On the other hand, installing a simple static XML/XSLT web site under Cocoon isn't at all difficult (see 7.2.6 for a how-to).

  • Besides being big and complex, Cocoon is still an unstable, evolving piece of software. Certain architectural aspects may seem awkward to a person coming from either a traditional web development background or an XML/XSLT background. I will detail the most noticeable discrepancies between Cocoon's approach to XSLT and the way we used the language in previous examples. None of these discrepancies is too serious, but they require some getting used to.

The following sections cover the most important architectural principles of Cocoon and present some of its often used components. If you prefer to "learn as you type," you can skip to 7.2.6 for a step-by-step guide to setting up a minimal Cocoon site, or even to 7.2.7 where we'll see what it takes to adapt our sample Foobar site to run under Cocoon.

7.2.2 Pipelines

Cocoon's architecture is based on the concept of a pipeline. A pipeline is a device that is triggered by an HTTP request (such as a user browsing to a URL on a Cocoon server) and responds to the request by preparing the corresponding resource. Generally, this preparation consists of generation (e.g., reading an XML file), transformation (e.g., XSLT processing), and serialization (e.g., rendering the XSLT transformation output as an HTML document that is sent to the requesting client).

DOM vs. SAX. With the possible exception of the very first stage (generation), the entire pipeline is dynamic: Data being poured down the pipeline is never frozen in a static file but is always in a state of flux. The linear stream of XML data between pipeline stages consists of so-called SAX [5] events. The first version of Cocoon used DOM [6] trees to represent XML data, but performance considerations forced the developers to switch to SAX.
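The three stages can be imitated with the JAXP classes in the standard JDK. The following sketch is not Cocoon code, and the class name, sample page, and stylesheet are hypothetical. A SAX parser plays the generator and feeds events into an XSLT transformer. The transformer's result is then written out as text according to the stylesheet's output method. Note that the JDK's XSLT processor still builds its own internal tree from those events, as explained in 7.2.1 for XSLT in general.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class MiniPipeline {

    /** Generate (SAX parse), transform (XSLT), serialize (write characters). */
    public static String run(String sourceXml, String stylesheetXml) {
        try {
            // Generation: a SAX parser that will emit events for the source document.
            SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            parserFactory.setNamespaceAware(true);
            XMLReader generator = parserFactory.newSAXParser().getXMLReader();
            SAXSource events = new SAXSource(generator,
                    new InputSource(new StringReader(sourceXml)));

            // Transformation: compile the stylesheet with the JDK's built-in XSLT processor.
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheetXml)));

            // Serialization: the output method declared in the stylesheet
            // (here, html) decides how the result is written as text.
            StringWriter out = new StringWriter();
            xslt.transform(events, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Pipeline failed", e);
        }
    }

    public static void main(String[] args) {
        String source = "<page><title>Hello</title></page>";
        String style =
                "<xsl:stylesheet version='1.0' "
              + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='html'/>"
              + "<xsl:template match='/page'>"
              + "<html><body><h1><xsl:value-of select='title'/></h1></body></html>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(run(source, style));
    }
}
```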



The three main categories of components available to a Cocoon developer are generators, transformers, and serializers.

  • Generators produce an XML stream and send it down a pipeline. The simplest generator, which is the default in Cocoon, is the file generator that parses an XML document stored in a file. Others include the search generator (searching files in a given directory with a given query and returning a list of results), the directory generator (similar to our files:dir() extension function, ), and the status generator (producing an XML representation of the status of the Cocoon engine, useful for debugging). With the exception of the file generator (which can read arbitrary XML), most generators produce a specific XML vocabulary as output.

  • Transformers draw in an XML stream, convert it to some other form of XML, and spit it out further down the pipeline. The most widely used transformer is the XSLT transformer, which applies a given stylesheet to its input and sends the transformation result to the output. Other transformers can be used for encoding URLs, converting relative paths in href attributes to absolute, aggregating content from several documents, filtering, logging, and so on.

    All transformers expect some specific XML vocabulary as input. For the XSLT transformer, the input vocabulary is defined by the stylesheet it uses; for others, it is usually hardcoded into the transformer itself but may sometimes be affected by the parameters of a transformer call.

    For example, the i18n transformer translates (parts of) a document from one language to another (using a fixed dictionary that you must prepare for your data in advance). It expects to see specific elements from the namespace. It reacts by translating the content of these elements as specified in their attributes and passing the rest of the data along unchanged.

  • Serializers convert a stream of XML data into a stream of bytes. [7] The most obvious way to serialize an XML document is to represent it in the same way as it would appear in a file; this is what the default XML serializer does. Similarly, the HTML, XHTML, and text serializers are analogous to the corresponding output methods in XSLT.

    [7] The term serializer implies a conversion from a tree-like XML structure into a stream of bytes. Given that inside Cocoon, XML already exists as a serial stream of SAX events rather than a tree, this term may be slightly misleading, even if backed by tradition.

    Cocoon also includes serializers for converting the XSL-FO vocabulary ( ) into PostScript or PDF, as well as a number of serializers for rasterizing SVG ( ) into various bitmap image formats (these components use Batik, ).

  • For serving static non-XML content such as images, Cocoon offers readers. A reader is an entire pipeline compressed into a one-line instruction; it reads the specified resource and immediately sends it to the client without attempting to parse, transform, or serialize it.

  • In addition to these basic components, a pipeline may also include aggregators (7.2.4), actions (7.2.5), and other components used less frequently.
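A transformer stage can be sketched as a standard SAX filter (org.xml.sax.helpers.XMLFilterImpl). Here the filter sits between a parser, which acts as the generator, and an identity transformer, which acts as the serializer. It loosely imitates the i18n transformer described above. Each element named text in a designated namespace is replaced by the dictionary entry named in its key attribute, and everything else passes through unchanged. The namespace URI urn:example:translate, the element name, and the attribute name are hypothetical. Cocoon's real i18n transformer has its own vocabulary and API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** A pass-through SAX filter that swaps marked-up phrases for dictionary entries. */
public class TranslateFilter extends XMLFilterImpl {

    public static final String NS = "urn:example:translate";   // hypothetical namespace

    private final Map<String, String> dictionary;
    private int suppressDepth = 0;   // > 0 while inside an element being replaced

    public TranslateFilter(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (suppressDepth > 0) {
            suppressDepth++;                              // nested inside a replaced element
        } else if (NS.equals(uri) && "text".equals(localName)) {
            String key = atts.getValue("key");
            String phrase = key == null ? "" : dictionary.getOrDefault(key, key);
            char[] translated = phrase.toCharArray();
            super.characters(translated, 0, translated.length);   // emit the translation
            suppressDepth = 1;                            // and drop the element itself
        } else {
            super.startElement(uri, localName, qName, atts);      // pass through unchanged
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (suppressDepth > 0) {
            suppressDepth--;
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (suppressDepth == 0) {                         // drop the replaced element's content
            super.characters(ch, start, length);
        }
    }

    /** Runs parser -> this filter -> serializer and returns the resulting XML text. */
    public static String translate(String xml, Map<String, String> dictionary) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);              // so that 'uri' is filled in
            TranslateFilter filter = new TranslateFilter(dictionary);
            filter.setParent(factory.newSAXParser().getXMLReader());

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                                 new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Translation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<p xmlns:t='" + NS + "'>Greeting: <t:text key='hello'>hello</t:text></p>";
        System.out.println(translate(page, Map.of("hello", "Hallo")));
    }
}
```

Because the filter only ever sees one event at a time, it never needs the whole document in memory. This is the property that makes such stages cheap to chain in a pipeline.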

7.2.3 Sitemap

A sitemap is the control center of a Cocoon-based web site. It is an XML document describing the configuration of the pipelines and specifying when and how these pipelines are to be activated.

A sitemap consists of two main parts. First, it declares all the components it is going to use; second, it builds pipelines from these components and associates them with various types of requests coming from web clients.

The component declaration part is, although important, not too exciting. You just look up the correct syntax for each component in the Cocoon documentation and copy it to your sitemap with appropriate modifications (see 7.2.6 for a complete sitemap example). It is defining pipelines and associating them with requests that is much more interesting.

Matching requests to pipelines

A sitemap component called a matcher wraps up a pipeline definition and associates it with specific requests. The most frequently used matcher is the wildcard URI matcher; it responds to URI requests that match a wildcard pattern. When triggered, a matcher launches the pipeline that is defined inside the map:match element, [8] for example:

[8] In the examples in this chapter, the prefix map corresponds to the namespace URI

 <map:match pattern="*.html">
   <map:generate src="source/{1}.xml"/>
   <map:transform src="style.xsl"/>
   <map:serialize/>
 </map:match>

This matcher catches all URI requests that have the form *.html. Within the matcher, the expression {1} refers to whatever replaces the first asterisk in the request pattern, in this case the filename without extension.

The map:generate element starts the pipeline by reading in the corresponding XML source file from the source/ subdirectory. Next, map:transform applies the stylesheet, style.xsl, to this source to transform it into HTML. Finally, the default serializer, map:serialize, outputs the result as XHTML that is sent to the requesting client.

Different pipelines may use one source document but different transformation stylesheets and serializers. For example, a virtual URI of the form *.pdf may activate another pipeline ending with the PDF serializer that will produce a PDF version of the same web page that is also available as *.html .

Pipelining from afar. The generator need not necessarily read a local file. This example is taken from the Cocoon documentation:

 <map:match pattern="news/slashdot.html">
   <map:generate src=""/>
   <map:transform src="stylesheets/news/slashdot.xsl"/>
   <map:serialize/>
 </map:match>

This pipeline is triggered when a web surfer requests the specific URI /news/slashdot.html from the Cocoon-powered web site. Instead of relying on its own resources, however, Cocoon leeches the XML version of the front page from , transforms it with a local stylesheet, and sends the result to the client.

Other types of generators can connect to databases (including remote ones) and retrieve data from them, or simply generate some XML data on the spot (such as search results or a Cocoon status report). A virtual URI may include, in arbitrary syntax, any parameters that the generator will use, such as a query string for document search or a zip code for a database query.

Readers for static content. For static resources, a matcher can use a reader instead of a complete pipeline:

 <map:match pattern="img/*.png">
   <map:read mime-type="image/png" src="img/{1}.png"/>
 </map:match>

Other wildcard options. A plain * matches anything excluding the directory separator character, /. Conversely, a double asterisk (**) in a wildcard pattern matches a string that can include /. Thus,

 <map:match pattern="**/*.html">
   <map:generate src="source/{1}/{2}.xml"/>
   <map:transform src="style.xsl"/>
   <map:serialize/>
 </map:match>

will match news/archive/foobar.html and take the corresponding source from source/news/archive/foobar.xml . Here, {1} refers to the ** in the pattern that matches the entire directory path ( news/archive ), and {2} is the single * which represents the filename ( foobar ).
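The matching and substitution rules just described can be sketched in a few lines of Java by translating a wildcard pattern into a regular expression. This illustrates the rules as stated here, not Cocoon's own matcher code, which may differ in edge cases such as the handling of special characters. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardMatch {

    /** Compiles a sitemap-style pattern: "**" may span '/', "*" may not. */
    public static Pattern compile(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' && i + 1 < wildcard.length() && wildcard.charAt(i + 1) == '*') {
                regex.append("(.*)");       // **: any string, slashes included
                i++;                        // skip the second asterisk
            } else if (c == '*') {
                regex.append("([^/]*)");    // *: any string without a slash
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));   // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** Returns the values for {1}, {2}, ... if the whole URI matches the pattern. */
    public static Optional<List<String>> match(String wildcard, String uri) {
        Matcher m = compile(wildcard).matcher(uri);
        if (!m.matches()) {
            return Optional.empty();
        }
        List<String> groups = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            groups.add(m.group(g));
        }
        return Optional.of(groups);
    }

    /** Fills {n} placeholders (as in a src attribute) with the matched values. */
    public static String expand(String template, List<String> groups) {
        String result = template;
        for (int n = 1; n <= groups.size(); n++) {
            result = result.replace("{" + n + "}", groups.get(n - 1));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> groups = match("**/*.html", "news/archive/foobar.html").orElseThrow();
        System.out.println(expand("source/{1}/{2}.xml", groups));
    }
}
```

Running `main` prints `source/news/archive/foobar.xml`. The pattern `*.html` would not match `news/page.html` at all, because its single asterisk cannot cross the slash.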

Other types of matchers can respond to a specific client hostname or hostname mask, HTTP request parameters, etc. More complex URI patterns can be recognized by the regexp matcher; see the examples in the Cocoon distribution for ideas on how to use it.

Virtual URIs vs. abbreviated addresses. Note that any pipeline processing remains invisible to the user. If someone surfs, e.g., to on the Cocoon server, the browser will display the received HTML document with that same in the URL line. The user will have no way of knowing that no static HTML file corresponding to this URI exists on the server and that the XML source of the page may be stored in some very different place or generated on the fly. Such a URI is therefore called a virtual URI in Cocoon.

This correspondence between virtual URIs visible from outside the server and the actual locations of source XML documents on the server might remind you of our address abbreviation technique (3.5.3). Indeed, Cocoon's virtual URIs serve a similar purpose, but there is also an important difference: While our abbreviated addresses were only used in source XML markup and remained unseen to the site's visitors, a virtual URI in Cocoon is used by both XML content authors and web surfers.

For one thing, this means that if a virtual URI changes, all documents linking to it will have to be updated. Abbreviations, on the other hand, were designed in part to avoid exactly this problem. Besides, a virtual URI is still a URI: Even if stripped of the filename extension, it has to carry certain technical baggage, such as the protocol specification.

These observations suggest that virtual URIs complement, rather than compete with, the address abbreviations used in XML markup. In fact, with Cocoon we can use both (7.2.7). For this to work, however, your transformation stylesheet must have a way to know both the virtual URI (for creating links in HTML) and the source location (for accessing the source) for each abbreviated address it encounters.

7.2.4 Aggregation

Aggregation is the Cocoon term for combining data from different sources. It usually refers only to static sources represented by XML documents; inserting dynamic values into templates is another matter (we'll get to it in 7.2.5). As with most other concepts, Cocoon offers several different approaches to aggregation.

  • The XInclude transformer is an implementation of the XInclude standard. [9] It filters its input looking for specific inclusion instructions and replacing each one with the content of the external resource referred to in the instruction. Such an instruction must have the form of an element in a designated namespace, as in


     <x:include xmlns:x=""
                href="header.xml#xpointer(//section/head[2])"/>

    As you can see from this example, XInclude URIs may use the XPointer language [10] for extracting specific fragments from the referenced resource. XPointer is a superset of XPath, so if you are familiar with XPath you shouldn't have problems with XPointer.


    Here's how you might add an XInclude transformer to a pipeline in your sitemap:

     <map:match pattern="*.html">
       <map:generate src="source/{1}.xml"/>
       <map:transform type="xinclude"/>
       <map:transform type="xslt" src="style.xsl"/>
       <map:serialize/>
     </map:match>

    Now, the XSLT transformer will receive as input an XML document in which the include elements from the XInclude namespace are replaced by whatever they refer to.

  • The CInclude transformer [11] is similar to XInclude. Its main advantage stems from a limitation: Because it does not support XPointer, it does not have to build a complete tree of the included document and can simply pour it in as a stream. You can only embed entire documents with CInclude, but it is significantly faster than XInclude. Also, CInclude offers a caching implementation that speeds it up further when the same document is included more than once.


    Here's how a CInclude instruction might look in a document:

     <c:include xmlns:c="" src="header.xml" element="head"/>

    Here, the element attribute specifies the element which will envelop the included resource.

  • A sitemap aggregator works at the pipeline level and therefore, unlike XInclude and CInclude, does not require that any instructions be placed in the source files. Here's an example sitemap fragment:

     <map:match pattern="leaflet.pdf">
       <map:aggregate element="leaflet">
         <map:part element="cover" src="source/cover-page.xml"/>
         <map:part element="page" src="cocoon:/page2.xml"/>
       </map:aggregate>
       <map:transform src="leaflet-style.xsl"/>
       <map:serialize type="pdf"/>
     </map:match>

    Here, the map:aggregate element works as a generator; it combines two resources into one and passes the result down the pipeline. As with CInclude, the element attributes produce wrapper elements around the included content, so the high-level structure of the created document will be:

     <leaflet>
       <cover>
         <!-- contents of source/cover-page.xml -->
       </cover>
       <page>
         <!-- contents of cocoon:/page2.xml -->
       </page>
     </leaflet>

    A nice thing about Cocoon is that it allows you to reuse any pipeline as a data source for other pipelines. The map:part with the cocoon:/ URI in the above aggregation example does just that: It searches the current sitemap for a pipeline matching the request page2.xml, runs it, and uses its output as the second part of the aggregated document. For this to make sense, the referenced pipeline must produce not HTML, but an XML vocabulary that fits the aggregated document's schema and would be correctly processed by leaflet-style.xsl.

document()ing in Cocoon

As you may remember from Chapter 5, the XSLT document() function is one of the cornerstones of our transformation stylesheet. It is much used for "aggregation," that is, for looking up arbitrary values in the master document or in other page documents during transformation of a page. In Cocoon, however, document() is a bit of a controversy. What's the problem?

Stay in line. The pipeline metaphor implies that each of the components in the middle of a pipeline has exactly one inflow tube and exactly one outflow tube. This applies to the XSLT transformer as well, which means your stylesheet must only take one document as input and produce one document as output.

Any compilation of data from different sources is supposed to be done either by dynamic code (7.2.5) or by specialized aggregators (7.2.4). Both usually come before the XSLT transformation stage in a pipeline, which means Cocoon leans towards the "Compile, then transform" scenario described in 1.5.2 (even though Cocoon is flexible enough to implement any dynamic workflow).
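In plain Java, the "Compile, then transform" order can be sketched with standard JAXP classes. First the sources are combined into one document, with each wrapped in its own element as in the map:aggregate example in 7.2.4. Then that single document is handed to the stylesheet. This is only an analogy and does not show how Cocoon implements aggregation. The class name, wrapper names, and sample data are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CompileThenTransform {

    /** Compile: wrap each part in its own element under one root, in insertion order. */
    static Document aggregate(String rootName, Map<String, String> parts) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document result = builder.newDocument();
        Element root = result.createElementNS(null, rootName);
        result.appendChild(root);
        for (Map.Entry<String, String> part : parts.entrySet()) {
            Document source = builder.parse(new InputSource(new StringReader(part.getValue())));
            Element wrapper = result.createElementNS(null, part.getKey());
            // importNode copies the part's root element into the aggregated document.
            wrapper.appendChild(result.importNode(source.getDocumentElement(), true));
            root.appendChild(wrapper);
        }
        return result;
    }

    /** Then transform: the stylesheet sees exactly one input document. */
    public static String run(String rootName, Map<String, String> parts, String stylesheetXml) {
        try {
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
            StringWriter out = new StringWriter();
            xslt.transform(new DOMSource(aggregate(rootName, parts)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Aggregation or transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> parts = new LinkedHashMap<>();   // keeps the parts in order
        parts.put("cover", "<title>Foobar Leaflet</title>");
        parts.put("page", "<para>Welcome!</para>");
        String style =
                "<xsl:stylesheet version='1.0' "
              + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
              + "<xsl:output method='text'/>"
              + "<xsl:template match='/leaflet'>"
              + "<xsl:value-of select='cover/title'/> / <xsl:value-of select='page/para'/>"
              + "</xsl:template>"
              + "</xsl:stylesheet>";
        System.out.println(run("leaflet", parts, style));
    }
}
```

Running `main` prints `Foobar Leaflet / Welcome!`. The stylesheet never opens a second document. Everything it needs has already been compiled into its single input.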

Aggregating entire documents with map:aggregate or the CInclude transformer is too coarse for the tasks where we would normally call document() with an XPath node-set selector. XInclude can use XPointer to extract arbitrarily granular fragments for inclusion. However, it requires adding instructions to the source XML, which is simply impossible in many cases (where would you XInclude a menu label if there is no menu in a page document?). It looks like nothing in Cocoon can give us quite the functionality of a stylesheet's document() call.

Logic or style? Moreover, the authors of the Cocoon FAQ [12] claim that the use of the document() function in XSLT breaks the separation between style and logic: The stylesheet must only be concerned with styling (i.e., formatting), while aggregating sources belongs in the realm of programming logic. [13]


[13] The FAQ goes on to say, "Understand that the document() function was designed before XInclude with XPointer facilities existed. Had such capabilities been available, perhaps the document() function, which essentially mimics XInclude and XPointer, would have never been added to XSLT." Wrong. XInclude and document() solve entirely different problems. An XInclude instruction in the source may affect the content, while document() provides access to arbitrary resources from within the processing layer (be that "style" or "logic" processing).

This is at least arguable. For example, if building a menu for a web page is part of styling that page, why isn't fetching data needed for the menu from another document? The fact is, in any sufficiently complex web site, the stylesheet needs to see the context of the entire site in order to format a page, and the master document accessed via document() calls is the best way to provide that context.

Another example is the presentation of links. The source of a page can only provide the address for a link. When creating an HTML rendition, however, we might want to use some information from the linked document, such as its title, that could be shown in the link's floating tooltip. This is another task best performed in XSLT using document().
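The link-title lookup can be sketched outside Cocoon and run through the JDK's built-in XSLT 1.0 processor from Java. The element names (link, page, title), the convention that an href of about refers to about.xml, and the in-memory page store are all hypothetical. The URIResolver set on the Transformer answers document() requests, standing in for files on disk so that the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LinkTitles {

    // For each <link href="x">, load x.xml with document() and use its title as a tooltip.
    static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' "
          + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='link'>"
          + "<a href='{@href}.html' title='{document(concat(@href, \".xml\"))/page/title}'>"
          + "<xsl:apply-templates/>"
          + "</a>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Transforms pageXml; document() calls are answered from the in-memory pages map. */
    public static String render(String pageXml, Map<String, String> pages) {
        try {
            Transformer xslt = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Resolve by the last path segment, whether or not the processor
            // has already made the href absolute.
            xslt.setURIResolver((href, base) -> {
                String name = href.substring(href.lastIndexOf('/') + 1);
                String xml = pages.get(name);
                return xml == null ? null : new StreamSource(new StringReader(xml));
            });
            StringWriter out = new StringWriter();
            xslt.transform(new StreamSource(new StringReader(pageXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<para>See <link href='about'>About us</link>.</para>";
        Map<String, String> pages = Map.of("about.xml", "<page><title>About Foobar</title></page>");
        System.out.println(render(page, pages));
    }
}
```

The lookup of the linked page's title happens entirely inside the stylesheet. Nothing in the source page had to change to make the tooltip possible.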

On the other hand, inclusion of orthogonal content ( ) could indeed be moved out of the stylesheet into a different processing stage. For instance, in a page document each block element with an idref attribute (Example 3.1, page 141) might be replaced by an XInclude instruction. Whether XInclude (with its XPointer support) is better than idref (with its abbreviated block references) is an open question, though.
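To try the XInclude variant without Cocoon, you can let the JDK's own XML parser expand inclusion instructions while parsing. The sketch below writes a page and a reusable block to a temporary directory and includes the block as a whole document. It does not attempt the XPointer fragment addressing shown in 7.2.4. The file names, contents, and class name are hypothetical. The namespace URI is that of the W3C XInclude 1.0 Recommendation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeBlocks {

    public static final String XI = "http://www.w3.org/2001/XInclude";

    /** Writes the files to a temporary directory, then parses mainFile with XInclude on. */
    public static Document parse(Map<String, String> files, String mainFile) {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            for (Map.Entry<String, String> file : files.entrySet()) {
                Files.writeString(dir.resolve(file.getKey()), file.getValue());
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);   // XInclude processing needs namespaces
            factory.setXIncludeAware(true);    // replace include elements while parsing
            // Relative hrefs are resolved against the main file's location.
            return factory.newDocumentBuilder().parse(dir.resolve(mainFile).toFile());
        } catch (Exception e) {
            throw new IllegalStateException("XInclude processing failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
                "contact.xml", "<block>Call us at the office.</block>",
                "page.xml", "<page xmlns:x='" + XI + "'><x:include href='contact.xml'/></page>");
        Document page = parse(files, "page.xml");
        System.out.println(page.getDocumentElement().getTextContent());
    }
}
```

Whatever reads the parsed page afterwards, whether a stylesheet or anything else, sees the block as if it had been typed into the page. This is exactly the "instruction in the source" trade-off discussed above.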

Forking trouble. Fortunately, the case against document() is purely ideological, not technical. A stylesheet with a document() call will work just fine under Cocoon, as we'll see in 7.2.7. It is the other violation of the "one inflow, one outflow" principle, the xsl:result-document instruction, that may cause some trouble under Cocoon.

Forking the output of a stylesheet into multiple result documents effectively breaks the pipeline system by sprouting new branches that are unseen and uncared for from the sitemap perspective. What's worse, xsl:result-document may simply fail unless the pathname of the output document is absolute.

This is because an XSLT processor uses the pathname of the main result document to resolve the relative pathnames of all non-main ones. Inside Cocoon, however, the processor is run in "stream mode" without any real input or output files. Therefore, for xsl:result-document to work, you must provide an absolute pathname for the output file in its href attribute.
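The resolution rule can be modeled with java.net.URI. This is a simplified illustration of the reasoning above, not any XSLT processor's actual code. The class name, method name, and file: URIs are hypothetical. A relative href is resolved against the main result's URI when there is one, and it cannot be resolved at all when the output is a stream with no file behind it.

```java
import java.net.URI;

public class ResultDocumentPaths {

    /**
     * Resolves an output href against the URI of the main result document.
     * Relative hrefs need that base; absolute ones do not.
     */
    public static URI resolve(String mainResultUri, String href) {
        URI target = URI.create(href);
        if (target.isAbsolute()) {
            return target;                       // works even with no main output file
        }
        if (mainResultUri == null) {             // e.g. output streamed into a pipeline
            throw new IllegalStateException(
                    "Relative href '" + href + "' has no base URI to resolve against");
        }
        return URI.create(mainResultUri).resolve(target);
    }

    public static void main(String[] args) {
        // Offline: the main output file provides the base.
        System.out.println(resolve("file:/site/out/index.html", "pages/about.html"));
        // Streaming: only an absolute href can be resolved.
        System.out.println(resolve(null, "file:/site/out/pages/about.html"));
    }
}
```

Both calls in `main` print `file:/site/out/pages/about.html`. Calling `resolve(null, "pages/about.html")` throws, which mirrors the failure described above.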

This is not a particularly elegant solution, as it forces you to hard-code absolute paths into the stylesheet, making it potentially unportable. The bottom line is that it is better to avoid xsl:result-document altogether if you are planning to run your stylesheet under Cocoon.

7.2.5 Dynamic processing

Although this book is not about dynamic web sites, it is worth spending a few pages on Cocoon's approaches to dynamic processing, from the viewpoint of integrating XSLT transformations with a dynamic web site engine.

  • Legacy code. For refugees from other web development platforms, Cocoon offers ways to accommodate old code. You can use special generators (7.2.2) if you want to reuse HTML templates with embedded JSP [14] or PHP [15] code. These generators launch regular JSP or PHP interpreters to process the embedded scripts and then feed the resulting HTML down the pipeline.

    [14] Java Server Pages, see


    This approach is obviously very crude, from an XML-enlightened perspective anyway. It still produces HTML, which is nearly useless for our purposes even if it comes down a Cocoon pipeline. So it is provided only as a temporary workaround for pages that haven't yet been converted to the more advanced technologies supported by Cocoon.

  • XSP. The primary mechanism for programming web site applications in Cocoon is called XSP (eXtensible Server Pages, in line with all the other "Server Pages" out there: ASP, JSP...). This is, basically, a syntax for embedding programming code into XML. The programming language used by XSP is usually Java, although Cocoon also supports JavaScript.

    Embedding Java code into an XSP page document is not too different from embedding, say, PHP code into an HTML file. One important difference is that you don't work with messy HTML, but with your clean semantic XML that will be transformed into HTML only after all XSP processing is finished ("Compile, then transform," ).

    Another difference is that instead of PHP's <? ?> or ASP's <% %> , bits of embedded code in an XSP document are contained in elements from the XSP namespace. To use an XSP page in Cocoon, you call a special kind of generator that executes the embedded code and sends the resulting pure XML down the pipeline.

  • Logicsheets. An interesting addition to XSP is the concept of logicsheets ("stylesheets for logic"). This is an attempt to apply XML's fundamental concept of separating content from presentation to programming code.

    A logicsheet is an XSLT stylesheet that returns its input unchanged except for specific elements (usually from a namespace unique to this logicsheet) that are replaced by bits of XSP code. Thus, another level of abstraction is introduced: In your source XML, you provide a general outline of what the code must do, and the logicsheet fills in a specific implementation.

    The set of element types that a logicsheet responds to is called the taglib ("library of tags") of that logicsheet. [16] Cocoon provides several built-in logicsheets for tasks such as database access, sending emails from a web page, and form input validation.

    [16] A taglib is somewhat similar to an API (Application Programming Interface) in traditional programming.

  • Actions. Finally, you can implement some of your dynamic functionality at the sitemap level using actions. One of the reasons for adding actions to the already rich dynamic landscape in Cocoon was that, to quote the Cocoon documentation, other approaches "still mix content and logic to a certain degree." This is perhaps a polite way of putting it: Even with logicsheets, you often still have to edit your content (i.e., the XML source) if you want to change some aspects of the site's logic.

    Actions, on the other hand, require no changes to the page documents whatsoever. An action is a Java class that must be declared in a sitemap just like any other sitemap component (generators, transformers, etc.). After that, you can call your action using the map:act element from within any pipeline. An action can set parameters of a pipeline, switch pipeline parts on or off, and perform any actions that are sufficiently external to web pages (such as validating the output of a form).
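To return to logicsheets for a moment, the idea can be shown with a plain XSLT stylesheet run from Java through JAXP. This runs outside Cocoon, so nothing here compiles or executes the generated code. The stylesheet copies its input unchanged except for one taglib element, which it replaces with code. The taglib namespace urn:example:util and its time element are hypothetical. The namespace http://apache.org/xsp is assumed to be the XSP namespace, and xsp:expr is assumed to be the XSP element that inserts the value of an expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class LogicsheetSketch {

    // Identity transform plus one template that expands a taglib element into XSP code.
    static final String LOGICSHEET =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
          + " xmlns:util='urn:example:util'"          // hypothetical taglib namespace
          + " xmlns:xsp='http://apache.org/xsp'"      // assumed XSP namespace
          + " exclude-result-prefixes='util'>"
          // Copy every node and attribute through unchanged...
          + "<xsl:template match='@*|node()'>"
          + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
          + "</xsl:template>"
          // ...except util:time, which becomes a Java expression for XSP to evaluate.
          + "<xsl:template match='util:time'>"
          + "<xsp:expr>new java.util.Date()</xsp:expr>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the logicsheet to a page and returns the page with code filled in. */
    public static String apply(String xspPage) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(LOGICSHEET)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xspPage)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Logicsheet failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<page xmlns:util='urn:example:util'>"
                + "<para>Generated at <util:time/></para></page>";
        System.out.println(apply(page));
    }
}
```

The page author writes only `<util:time/>`. Swapping the logicsheet would change the generated implementation without touching the page, which is the extra level of abstraction described above.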

7.2.6 Cocoon primer

Suppose you have a very simple site consisting of a stylesheet (style.xsl) and one or more page documents (say, page.xml). No master documents, no graphic generation, no extension Java classes: All of that is left for the next section. You have authored the pages and tested them with the stylesheet offline; all you want to do now is install them under Cocoon so they are transformed on the server.

We will now build a setup without a dedicated web server, using the Tomcat servlet engine as the server. This is simpler to get going, yet perfectly adequate for experimentation.

  1. Install Java. [17] (You've probably done that long ago. It's here just for completeness.)

    [17] Sun's Java implementation is at; IBM's is at; there are others.

  2. Download and install Tomcat. [18] It's easy: unzip the distribution archive into a directory, set the JAVA_HOME environment variable to the location of your Java installation, and run startup.bat (Windows) or startup.sh (Unix) from the bin subdirectory of Tomcat.


  3. Check that Tomcat is running. Open your web browser and go to localhost:8080. The front page of your local Tomcat installation should appear.

  4. Stop Tomcat by running bin/shutdown.{bat|sh} and launch it again with bin/startup.{bat|sh}. You will have to restart Tomcat with these scripts whenever you change the configuration of Tomcat or Cocoon (updating sitemaps, sources, or stylesheets does not require a restart).

  5. Download the latest version of Cocoon. [19] Unzip the distribution archive and place the cocoon.war file into the webapps subdirectory of Tomcat. Restart Tomcat.


  6. Check that Cocoon is running by browsing to localhost:8080/cocoon. The first page view after a restart may be slow while Cocoon loads its classes and configures itself.

  7. Now go to webapps/cocoon under Tomcat (called the "Cocoon directory" from now on) and create a subdirectory for your site, e.g., eg.

  8. Put your stylesheet, style.xsl, into eg. Create a subdirectory eg/source and put page.xml and the rest of the page documents there.

  9. Create the file sitemap.xmap, shown in Example 7.1, in eg. This is the sitemap of your sample site.

  10. We also need to mount our new site in the main sitemap of the Cocoon installation. Ascend to the Cocoon directory and edit the sitemap.xmap file there by adding, after the start tag of the first map:pipeline element:

     <map:match pattern="eg/**">
       <map:mount check-reload="yes" src="eg/sitemap.xmap" uri-prefix="eg"/>
     </map:match>

    Now the sitemap for eg will receive all requests for URLs under eg/, but with everything up to and including eg/ removed from the URL. That is, if you surf to /cocoon/eg/page.html, the root sitemap will strip the eg/ prefix and pass the remainder, page.html, down to the eg sitemap for matching.

    Example 7.1. eg/sitemap.xmap : A basic sitemap for a Cocoon site.
      <?xml version="1.0" encoding="utf-8"?>
      <map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
        <map:components>
          <map:matchers default="wildcard"/>
          <map:generators default="file"/>
          <map:transformers default="xslt"/>
          <map:serializers default="html"/>
        </map:components>
        <map:pipelines>
          <map:pipeline>
            <map:match pattern="*.html">
              <map:generate src="source/{1}.xml"/>
              <map:transform src="style.xsl"/>
              <map:serialize/>
            </map:match>
          </map:pipeline>
        </map:pipelines>
      </map:sitemap>

  11. That is all! Direct your browser to


    Cocoon will transform the page and (after some pause) your browser will show you the resulting HTML. Subsequent loads of the same page will be much faster, thanks to caching.
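The sitemap in Example 7.1 matches only requests ending in .html, so if your pages link to a CSS file or images stored under eg, those requests will find no match. The fragments below sketch one way to serve such files unchanged with map:read. They assume hypothetical eg/css and eg/images subdirectories and rely on a reader named resource being declared in Cocoon's root sitemap, as it is in the stock distribution; adjust the patterns and MIME types to your own files.

```xml
<!-- In map:components of eg/sitemap.xmap, next to the other defaults -->
<map:readers default="resource"/>

<!-- In map:pipeline of eg/sitemap.xmap, after the *.html match -->
<map:match pattern="css/*.css">
  <!-- {1} is the file name matched by * (e.g., "main") -->
  <map:read src="css/{1}.css" mime-type="text/css"/>
</map:match>
<map:match pattern="images/*.png">
  <map:read src="images/{1}.png" mime-type="image/png"/>
</map:match>
```

A reader bypasses the generate/transform/serialize chain entirely: the file is streamed as-is with the given MIME type.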

7.2.7 Foobar under Cocoon

Our sample Foobar Corporation site will not quite run out of the box under Cocoon. However, the changes required to make it work are not too drastic.

  • To start, our stylesheet uses many XSLT 2.0 and XPath 2.0 facilities, whereas the Xalan processor shipped with Cocoon supports only XSLT 1.0. You will therefore need to install Saxon 7 as the default XSLT processor under Cocoon; instructions for doing this can be found on the Cocoon Wiki web site. [20]


  • Extension Java classes that we developed in Chapter 5 are also easy to install: just copy the entire com hierarchy into the WEB-INF/classes directory under Cocoon.

    By the way, at least some of our extension functions can be replaced by Cocoon components. Thus, instead of the files:dir() function, you can use the much more powerful directory generator [21] that produces an XML representation of a directory listing. This will require, however, a fair amount of redesign of the stylesheet to move part of its functionality to the sitemap.


  • Directories and URLs. You can install the site in a subdirectory under Cocoon, as we did in the primer (7.2.6), but you could also remove all the Cocoon samples and documentation and install your site right in the root directory of Cocoon, replacing the default root sitemap with the sitemap of your site. This will get rid of the site's subdirectory name in the URL (i.e., the URL will end with /cocoon/ rather than /cocoon/foobar/). You can also configure Tomcat to make Cocoon its root servlet, thus removing the /cocoon/ from the URL as well.

  • The master document (page 49) does not need to be changed in any significant way. You'll only need to define an environment whose src-path is the absolute path of the site's root directory (for example, /var/tomcat/webapps/cocoon/foobar/), and whose target-path is the relative URL of the site as seen from outside of Cocoon (for example, /cocoon/foobar/, or simply / if you have configured Tomcat and Cocoon as described in the previous item). Make this environment the default (by changing the value of the $env parameter in the shared library, _lib.xsl) so that you don't have to specify the environment identifier when the stylesheet is run by Cocoon.

  • The shared XSLT library (5.1.1, page 187) needs to be modified. The saxon:systemId() extension function that we used for finding out the pathname of the source document will not work because Cocoon runs its XSLT processor on a stream of SAX events, not on a file. We must devise another way to let the stylesheet know which page of those listed in the master document it is working on.

    Since the stylesheet is run from the Cocoon sitemap, we can pass this information from the sitemap to the stylesheet via a parameter. The pipeline might look like this:

     <map:match pattern="**.html">
       <map:generate src="{1}.xml"/>
       <map:transform src="style.xsl">
         <map:parameter name="request" value="{1}"/>
       </map:transform>
       <map:serialize/>
     </map:match>

    Here, {1} refers to the part of the URL matched by ** in the pattern, that is, the URL without the server part and the site prefix (the latter stripped by the root sitemap) and without .html. Thus, for a URL like


    the $request parameter will be en/team/contact, which is exactly what we want to have in the stylesheet. All you need to do is rewrite the definitions of the $lang and $current variables in _lib.xsl so that they rely on the $request parameter instead of calling saxon:systemId().

  • Schematron validation only makes sense for offline transformation: server-side validation of each page on each request is too expensive (even with caching). Besides, a developer is supposed to fix any validation problems before a document gets to the server. As mentioned in 1.4.2, you will likely need a working offline setup in addition to the server setup anyway. So we won't even try to port our Schematron setup to Cocoon; you will have to validate your documents offline before uploading them to Cocoon.

  • Image generation (5.5.2) is another thing you won't want to run under Cocoon. It will work (if you run the stylesheet with the $images parameter set to yes), but it is far too slow for on-the-fly generation on the server. Generate images offline and upload them to Cocoon along with the rest of the files.
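The _lib.xsl change described above for the $request parameter might look roughly like the following. This is a sketch, not the book's actual code: it assumes that the first step of $request is always the language code and that the rest of the path identifies the page; how the real $current is then matched against the master document is not shown. It uses XSLT 2.0 features (typed parameters via as), so it needs Saxon rather than Xalan.

```xml
<!-- Hypothetical excerpt of _lib.xsl -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Set by the sitemap via map:parameter, e.g. "en/team/contact";
       defaults to an empty string when the stylesheet runs offline -->
  <xsl:param name="request" as="xs:string" select="''"/>

  <!-- Language code: the first step of the path ("en") -->
  <xsl:variable name="lang" as="xs:string"
      select="substring-before($request, '/')"/>

  <!-- Page path within the language ("team/contact");
       an assumed shape for $current, for illustration only -->
  <xsl:variable name="current" as="xs:string"
      select="substring-after($request, '/')"/>
</xsl:stylesheet>
```

Keeping a default value for $request means the offline setup can keep working, provided the offline runs pass the parameter explicitly.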

As you can see, the changes required for migrating our (initially offline) setup to Cocoon are not too serious, even though we used a number of unorthodox approaches in our stylesheet. We did drop a few components, not because we could not get them to run, but simply because it made little sense to use them on the server.

There are many ways in which a site can be refactored to make it fit the Cocoon architecture better and to optimize its server-side performance. (One possibility I already mentioned is reimplementing the orthogonal content mechanism using Cocoon facilities.) However, I don't think this refactoring is truly a necessity for a static site, since preserving the ability to run the complete transformation process offline has its benefits. For a dynamic XML site, the situation is different; it is preferable to develop the site from the start on the Cocoon platform, but perhaps to separate some of the auxiliary subsystems (such as validation and image generation) to be run offline.

