The History of XSL | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

Like most of the XML family of standards, XSLT was developed by the World Wide Web Consortium (W3C), a coalition of companies orchestrated by Tim Berners-Lee, the inventor of the Web. There is an interesting page on the history of XSL, and styling proposals generally, at http://www.w3.org/Style/History/.

Writing history is a tricky business. Sharon Adler, the chair of the XSL Working Group, tells me that her recollections of what happened are very different from the way I describe them. This just goes to show that the documentary record is a very crude snapshot of what people were actually thinking and talking about. Unfortunately, though, it's all that we've got.

Prehistory

HTML was originally conceived by Berners-Lee as a set of tags to mark the logical structure of a document: headings, paragraphs, links, quotes, code sections, and the like. Soon, people wanted more control over how the document looked; they wanted to achieve the same control over the appearance of the delivered publication as they had with printing and paper. So, HTML acquired more and more tags and attributes to control presentation: fonts, margins, tables, colors, and all the rest that followed. As it evolved, the documents being published became more and more browser-dependent, and it was seen that the original goals of simplicity and universality were starting to slip away.

The remedy was widely seen as separation of content from presentation. This was not a new concept; it had been well developed through the 1980s in the development of Standard Generalized Markup Language (SGML).

Just as XML was derived as a greatly simplified subset of SGML, so XSLT has its origins in an SGML-based standard called DSSSL (Document Style Semantics and Specification Language). DSSSL (pronounced Dissel) was developed primarily to fill the need for a standard device-independent language to define the output rendition of SGML documents, particularly for high-quality typographical presentation. SGML was around for a long time before DSSSL appeared in the early 1990s, but until then the output side had been handled using proprietary and often extremely expensive tools, geared toward driving equally expensive phototypesetters, so that the technology was really taken up only by the big publishing houses.

Michael Sperberg-McQueen and Robert F. Goldstein presented an influential paper at the WWW '94 conference in Chicago under the title A Manifesto for Adding SGML Intelligence to the World-Wide Web. You can find it at http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Autools/sperberg-mcqueen/sperberg.html.

The authors presented a set of requirements for a stylesheet language, which is as good a statement as any of the aims that the XSL designers were trying to meet. As with other proposals from around that time, the concept of a separate transformation language had not yet appeared, and a great deal of the paper is devoted to the rendition capabilities of the language. There are many formative ideas, however, including the concept of fallback processing to cope with situations where particular features are not available in the current environment.

It is worth quoting some extracts from the paper here:

Ideally, the stylesheet language should be declarative, not procedural, and should allow stylesheets to exploit the structure of SGML documents to the fullest. Styles must be able to vary with the structural location of the element: paragraphs within notes may be formatted differently from paragraphs in the main text. Styles must be able to vary with the attribute values of the element in question: a quotation of type "display" may need to be formatted differently from a quotation of type "inline".

At the same time, the language has to be reasonably easy to interpret in a procedural way: implementing the stylesheet language should not become the major challenge in implementing a Web client.

The semantics should be additive: It should be possible for users to create new stylesheets by adding new specifications to some existing (possibly standard) stylesheet. This should not require copying the entire base stylesheet; instead, the user should be able to store locally just the user's own changes to the standard stylesheet, and they should be added in at browse time. This is particularly important to support local modifications of standard DTDs.

Syntactically, the stylesheet language must be very simple, preferably trivial to parse. One obvious possibility: formulate the stylesheet language as an SGML DTD, so that each stylesheet will be an SGML document. Since the browser already knows how to parse SGML, no extra effort will be needed.

We recommend strongly that a subset of DSSSL be used to formulate stylesheets for use on the World Wide Web; with the completion of the standards work on DSSSL, there is no reason for any community to invent their own style-sheet language from scratch. The full DSSSL standard may well be too demanding to implement in its entirety, but even if that proves true, it provides only an argument for defining a subset of DSSSL that must be supported, not an argument for rolling our own. Unlike home-brew specifications, a subset of a standard comes with an automatically predefined growth path. We expect to work on the formulation of a usable, implementable subset of DSSSL for use in WWW stylesheets, and invite all interested parties to join in the effort.

In late 1995, a W3C-sponsored workshop on stylesheet languages was held in Paris. In view of the subsequent role of James Clark as editor of the XSLT Recommendation, it is interesting to read the notes of his contribution on the goals of DSSSL, which can be found at http://www.w3.org/Style/951106_Workshop/reportl.html#clark.

Here are a few selected paragraphs from these notes.

DSSSL contains both a transformation language and a formatting language. Originally the transformation was needed to make certain kinds of styles possible (such as tables of contents). The query language now takes care of that, but the transformation language survives because it is useful in its own right.

The language is strictly declarative, which is achieved by adopting a functional subset of Scheme. Interactive stylesheet editors must be possible.

A DSSSL stylesheet very precisely describes a function from SGML to a flow object tree. It allows partial stylesheets to be combined ("cascaded" as in CSS): some rule may override some other rule, based on implicit and explicit priorities, but there is no blending between conflicting styles.

James Clark closed his talk with the remark:

Creating a good, extensible style language is hard!

One suspects that the effort of editing the XSLT 1.0 Recommendation didn't cause him to change his mind.

The First XSL Proposal

Following these early discussions, the W3C set up a formal activity to create a stylesheet language proposal. The remit for this group specified that it should be based on DSSSL.

As an output of this activity came the first formal proposal for XSL, dated 27 August 1997. Entitled A Proposal for XSL, it lists 11 authors: James Clark (who works for himself), five from Microsoft, three from Inso Corporation, one from ArborText, and one (Henry Thompson) from the University of Edinburgh. The document can be found at http://www.w3.org/TR/NOTE-XSL.html.

The section describing the purpose of the language is worth reading.

XSL is a stylesheet language designed for the Web community. It provides functionality beyond CSS (e.g. element reordering). We expect that CSS will be used to display simply structured XML documents and XSL will be used where more powerful formatting capabilities are required or for formatting highly structured information such as XML structured data or XML documents that contain structured data.

Web authors create content at three different levels of sophistication given as follows:

markup: relies solely on a declarative syntax

script: additionally uses code "snippets" for more complex behaviors

program: uses a full programming language

XSL is intended to be accessible to the "markup" level user by providing a declarative solution to most data description and rendering requirements. Less common tasks are accommodated through a graceful escape to a familiar scripting environment. This approach is familiar to the Web publishing community as it is modeled after the HTML/JavaScript environment.

The powerful capabilities provided by XSL allow:

formatting of source elements based on ancestry/descendency, position, and uniqueness

the creation of formatting constructs including generated text and graphics

the definition of reusable formatting macros

writing-direction independent stylesheets

extensible set of formatting objects

The authors then explained carefully why they had felt it necessary to diverge from DSSSL and described why a separate language from CSS (Cascading Style Sheets) was thought necessary.

They then stated some design principles:

XSL should be straightforwardly usable over the Internet.
XSL should be expressed in XML syntax.
XSL should provide a declarative language to do all common formatting tasks.
XSL should provide an "escape" into a scripting language to accommodate more sophisticated formatting tasks and to allow for extensibility and completeness.
XSL will be a subset of DSSSL with the proposed amendment. (As XSL was no longer a subset of DSSSL, they cannily proposed amending DSSSL so it would become a superset of XSL.)
A mechanical mapping of a CSS stylesheet into an XSL stylesheet should be possible.
XSL should be informed by user experience with the FOSI stylesheet language.
The number of optional features in XSL should be kept to a minimum.
XSL stylesheets should be human-legible and reasonably clear.
The XSL design should be prepared quickly.
XSL stylesheets shall be easy to create.
Terseness in XSL markup is of minimal importance.

As a requirements statement, this doesn't rank among the best. It doesn't read like the kind of list you get when you talk to users and find out what they need. It's much more the kind of list that designers write when they know what they want to produce, including a few political concessions to the people who might raise objections. But if you want to understand why XSLT became the language it did, this list is certainly evidence of the thinking.

The language described in this first proposal contains many of the key concepts of XSLT as it finally emerged, but the syntax is virtually unrecognizable. It was already clear that the language should be based on templates that handled nodes in the source document matching a defined pattern, and that the language should be free of side effects, to allow "progressive rendering and handling of large documents." I'll explore the significance of this requirement in more detail on page 36, and discuss its implications for the way stylesheets are designed in Chapter 9. The basic idea is that if a stylesheet is expressed as a collection of completely independent operations, each of which has no external effect other than generating part of the output from its input (for example, it cannot update global variables), then it becomes possible to generate any part of the output independently if that particular part of the input changes. Whether the XSLT language actually achieves this objective is still an open question.
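The idea of side-effect-free rules can be illustrated with a small sketch. This is Python rather than XSLT, and the rule names (render_title, render_para) and the sample document are hypothetical, chosen only for illustration. Each rule depends on nothing but the node it matches, so when one source node changes, only the corresponding fragment of output has to be regenerated.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical template rules: each is a pure function of the node it matches.
def render_title(node):
    return "<h1>" + html.escape(node.text or "") + "</h1>"

def render_para(node):
    return "<p>" + html.escape(node.text or "") + "</p>"

RULES = {"title": render_title, "para": render_para}

def render(node):
    """Apply the rule matching this node; reads no state outside the node."""
    return RULES[node.tag](node)

doc = ET.fromstring(
    "<doc><title>History</title><para>One</para><para>Two</para></doc>"
)
fragments = [render(child) for child in doc]

# One source node changes: only its fragment needs regenerating.
doc[2].text = "Changed"
fragments[2] = render(doc[2])

# The patched output matches a complete re-run of the rules.
assert fragments == [render(child) for child in doc]
print(fragments)
```

If a rule were allowed to update a shared counter or global variable, the final assertion could no longer be relied on, because the result of one rule would depend on which other rules had already run.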

Microsoft shipped its first technology preview 5 months after this proposal appeared, in January 1998.

To enable W3C to make an assessment of the proposal, Norman Walsh produced a requirements summary, which was published in May 1998. It is available at http://www.w3.org/TR/WD-XSLReq. It largely confirms the thinking already outlined.

The bulk of his paper is given over to a long list of the typographical features that the language should support, following the tradition that the formatting side of the language originally got a lot more column inches than did the transformation side.

Following this activity, the first Working Draft of XSL (not to be confused with the Proposal) was published on 18 August 1998, and the language started to take shape, gradually converging on the final form it took in the 16 November 1999 Recommendation through a series of Working Drafts, each of which made radical changes, but kept the original design principles intact.

Important

A Recommendation is the most definitive of documents produced by the W3C. It's not technically a standard, because standards can only be published by government-approved standards organizations. But I will often refer to it loosely as "the standard" in this book.

The Microsoft WD-XSL Dialect

Before the Recommendation came out, however, Microsoft took a fateful decision to ship an early implementation of their XSLT processor as an add-on to Internet Explorer 4, and later as a built-in feature of IE5. Unfortunately, Microsoft was too early, and the XSLT standard changed and grew. When the XSLT Recommendation version 1.0 was finally published on 16 November 1999, it had diverged significantly from the initial Microsoft product.

Many of the differences, such as changes of keywords, are very superficial, but some run much deeper; for example, changes in the way the equals operator is defined.

Fortunately, the Microsoft IE5 dialect of XSL (which I refer to as WD-xsl) is now almost completely obsolete. Microsoft no longer actively promotes it, and their more recent products are very closely aligned with the W3C specifications. It's still possible, however, that you will come across stylesheets written in this language, or developers who aren't aware of the differences. You can recognize stylesheets written in this dialect by the namespace URI on the <xsl:stylesheet> element, which is «http://www.w3.org/TR/WD-xsl».
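The namespace check described above is easy to automate. The following is a minimal sketch in Python using only the standard library. The function name stylesheet_dialect and the two sample stylesheets are hypothetical. The WD-xsl URI is the one quoted above, and http://www.w3.org/1999/XSL/Transform is the namespace defined by the XSLT 1.0 Recommendation. The sketch looks only at the outermost element, so it will report "unknown" for a simplified stylesheet whose outermost element is a literal result element.

```python
import xml.etree.ElementTree as ET

WD_XSL_NS = "http://www.w3.org/TR/WD-xsl"           # the obsolete IE5 dialect
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"    # W3C XSLT Recommendation

def stylesheet_dialect(xml_text):
    """Classify a stylesheet by the namespace URI of its outermost element."""
    root = ET.fromstring(xml_text)
    # ElementTree writes qualified names as "{namespace-uri}local-name".
    if root.tag.startswith("{"):
        namespace = root.tag[1:].split("}", 1)[0]
    else:
        namespace = ""
    if namespace == WD_XSL_NS:
        return "WD-xsl"
    if namespace == XSLT_NS:
        return "XSLT"
    return "unknown"

old_style = '<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"/>'
new_style = ('<xsl:stylesheet version="1.0" '
             'xmlns:xsl="http://www.w3.org/1999/XSL/Transform"/>')

print(stylesheet_dialect(old_style))
print(stylesheet_dialect(new_style))
```

Because the test is on the namespace URI rather than the prefix or the element name, it works equally well if the stylesheet uses a prefix other than xsl.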

Saxon

At this point it might be a good idea to clarify how I got involved in the story. In 1998 I was working for the British computer manufacturer ICL, a part of Fujitsu. Fujitsu, in Japan, had developed an object database system, later marketed by Computer Associates as Jasmine, and I was trying to find applications for this technology in content management applications for large publishers. We developed a few successful large systems with this technology, but found that it didn't scale downwards to the kind of project that wanted something working in 6 weeks rather than 6 months. So I was asked to look at what we could do with XML, which was just appearing on the horizon.

I came to the conclusion that XML looked like a good thing, but that there wasn't any software. So I developed the very first early versions of Saxon to provide a proof-of-concept demonstration. At that stage Saxon was just a Java library, not an XSLT processor, but as the XSL standards developed I found that my own ideas were converging more and more with what the W3C working group was doing, and I started implementing the language as it was being specified. ICL had decided that its marketing resources were spread thinly over too many products, and so the management took the imaginative decision to make the technology available as open source. Seventeen days after the XSLT 1.0 specification was published in November 1999, I announced the first conformant implementation. And on the day it was published, I started work on the first edition of this book.

When the book was published, the XSL Working Group invited me to join and participate in the development of XSLT 1.1. Initially, being based in the United Kingdom and with limited time available for the work, my involvement was fairly sporadic. But early in 2001 I changed jobs and joined Software AG, which wanted me to take a full role in the W3C work. The following year James Clark pulled out of the Working Group, and I stepped into his shoes as editor.

The reason I'm explaining this sequence of events is that I hope it will help you to understand the viewpoint from which this book is written. When I wrote the first edition I was an outsider, and I felt completely free to criticize the specification when I felt it necessary. I have tried to retain an objective approach in the present edition, but as editor of the language spec it is much more difficult to be impartial. I've tried to keep a balance: it wouldn't be fair to use the book as a platform to push my views over those of my colleagues of the working group, but at the same time, I've made no effort to be defensive about decisions that I would have made differently if they had been left to me.

Software AG continued to support my involvement in the W3C work (on the XQuery group as well as the XSL group), as well as the development of Saxon and the writing of this book, until February 2004, at which point I left to set up my own company, Saxonica.

Beyond XSLT 1.0

After XSLT 1.0 was published, the XSL Working Group responsible for the language decided to split the requirements for enhancements into two categories: XSLT 1.1 would standardize a small number of urgent features that vendors had already found necessary to add to their products as extensions, while XSLT 2.0 would handle the more strategic requirements that needed further research.

A working draft of XSLT 1.1 was published on December 12, 2000. It described three main enhancements to the XSLT 1.0 specification:

Multiple output documents: an <xsl:document> instruction, modeled on extensions provided initially in Saxon and subsequently in other products including xt, Xalan and Oracle, allowing a source document to be split into multiple output documents. This instruction has become <xsl:result-document> in XSLT 2.0.
Temporary trees: the ability to treat a tree created by one phase of processing as input to a subsequent phase of processing. This enhancement was modeled on the node-set() extension function introduced first in xt and subsequently copied in other products. It is retained largely unchanged in XSLT 2.0.
Standard bindings to extension functions: written in Java and ECMAScript. XSLT 1.0 allowed a stylesheet to call external functions, but did not say how such functions should be written, with the result that extension functions written for Xalan would not work with xt or Saxon, or vice versa. XSLT 1.1 defined a general framework for binding extension functions written in any language, with specific mappings for Java and ECMAScript (the official name for JavaScript). This feature of the XSLT 1.1 draft has been dropped completely from XSLT 2.0. It proved highly controversial, particularly as it coincided with Microsoft's U-turn in its Java strategy.

For a number of reasons XSLT 1.1 never got past the working draft stage. This was partly because of the controversy surrounding the Java language bindings, but more particularly because it was becoming clearer that XSLT 2.0 would be a fairly radical revision of the language, and the working group didn't want to do anything in 1.1 that would get in the way of achieving the 2.0 goals. There were feelings, for example, that the facility for temporary trees might prejudice the ability to support sequences in 2.0, a fear which as it happens proved largely unfounded.

XQuery

By the time work on XSLT 2.0 was starting, the separate XQuery working group in W3C had created a draft of its own language.

While the XSL working group had identified the need for a transformation language to support a self-contained part of the formatting process, XQuery originated from the need to search large quantities of XML documents stored in a database.

Different people had different motivations for wanting an XML Query Language, and many of these motivations were aired at a workshop held in December 1998. You can find all 66 position papers presented at this workshop at http://www.w3.org/TandS/QL/QL98/pp.html. Quite how a consensus emerged from this enormous variety of views is difficult to determine in retrospect. But it's interesting to see how the participants saw the relationship with XSL, as it was then known. The Microsoft position paper states the belief that a query language could be developed as an extension of XSLT, but in this it is almost alone. Many of the participants came from a database background, with ideas firmly rooted in the tradition of SQL and object database languages such as OQL, and to these people, XSL didn't look remotely like a query language. But in the light of subsequent events, it's interesting to read the position paper from the XSL Working Group, which states in its summary:

The query language should use XSL patterns as the basis for information retrieval.
The query language should use XSL templates as the basis for materializing query results.
The query language should be at least as expressive as XSL is, currently.
Development of the pattern and transformation languages should remain in the XSL working group.
A coordination group should ensure either that a single query language satisfies all working group requirements or that all W3C query languages share an underlying query model.

(Remember that XPath had not yet been identified as a separate language, and that the expressions that later became XPath were then known as patterns.)

This offer to coordinate, and the strong desire to ensure consistency among the different W3C specifications, can be seen as directly leading to the subsequent collaboration between the two working groups to define XPath 2.0.

The XQuery group started meeting in September 1999. The first requirements document was published the following January (http://www.w3.org/TR/2000/WD-xmlquery-req-20000131). It included a commitment to compatibility with XML Schema, and a rather cautiously worded promise to "take into consideration the expressibility and search facilities of XPath when formulating its algebra and query syntax." July 2000 saw a revised requirements document that included a selection of queries that the language must be able to express. The first externally visible draft of the XQuery language was published in February 2001 (see http://www.w3.org/TR/2001/WD-xquery-20010215/) and it was at this stage that the collaboration between the two working groups began in earnest.

The close cooperation between the teams developing the two languages contrasts strangely with the somewhat adversarial position adopted by parts of the user community. XSLT users were quick to point out that XSLT 1.0 satisfied every single requirement in the first XQuery requirements document, and could solve all the use cases published in the second version in August 2000. At the same time, users on the XQuery side of the fence have often been dismissive about XSLT, complaining about its verbose syntax and sometimes arcane semantics. Even today, when the similarities of the two languages at a deep level are clearly apparent, there is very little overlap between their user communities: I find that most users of the XQuery engine in Saxon have no XSLT experience. The difference between XSLT and XQuery is in many ways a difference of style rather than substance, but users often feel strongly about style.

XSLT 2.0 and XPath 2.0

The requirements for XSLT 2.0 and XPath 2.0 were published on 14 February 2001. In the case of the XPath 2.0 requirements, the document was written jointly by the two working groups. You can find the documents at the following URLs:

http://www.w3.org/TR/2001/WD-xslt20req-20010214
http://www.w3.org/TR/2001/WD-xpath20req-20010214

Broadly, the requirements fall into three categories:

Features that are obviously missing from the current standards and that would make users' lives much easier, for example, facilities for grouping related nodes, extra string-handling and numeric functions, and the ability to read text files as well as XML documents.
Changes desired by the XML Query working group. The difficulty at this stage was that the Query group did not just want additions to the XPath language; they wanted fundamental changes to its semantics. Many members of the XQuery group felt they could not live with some of the arbitrariness of the way XPath handled data types generally, and node-sets in particular, for example the fact that «a = 1» tests whether there is some «a» that equals one, whereas «a - 1 = 0» tests whether the first «a» equals one.
Features designed to exploit and integrate with XML Schema. The W3C XML Schema specification had reached an advanced stage (it became a Candidate Recommendation on 20 October 2000), and implementations were starting to appear in products. The thinking was that if the schema specified that a particular element contains a number or a date (for example), then it ought to be possible to use this knowledge when comparing or sorting dates within a stylesheet.
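The discrepancy between «a = 1» and «a - 1 = 0» mentioned in the second category follows from two separate conversion rules in XPath 1.0: comparing a node-set with a number succeeds if any node matches, whereas arithmetic first converts the node-set to a number using its first node only. The sketch below is a small Python model of those two rules, not an XPath engine. The function names are hypothetical, and a node-set is represented simply as a list of the string values of its nodes in document order.

```python
import math
import re

# Lexical form accepted by the XPath 1.0 number() function:
# optional whitespace, optional minus sign, digits with an optional fraction.
_NUMBER = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")

def to_number(text):
    """Model of number(string): NaN when the string is not a valid number."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan

def nodeset_equals_number(values, n):
    """Model of «nodeset = n»: true if ANY node's numeric value equals n."""
    return any(to_number(v) == n for v in values)

def nodeset_to_number(values):
    """Model of number(nodeset): uses the FIRST node only; NaN if empty."""
    return to_number(values[0]) if values else math.nan

# String values of two <a> elements, in document order.
a = ["2", "1"]

print(nodeset_equals_number(a, 1))      # models «a = 1»
print(nodeset_to_number(a) - 1 == 0)    # models «a - 1 = 0»
```

With these two values the first test is true and the second is false, even though the expressions look algebraically equivalent. This is the kind of behavior the XQuery group found hard to accept.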

It has sometimes been suggested that the adoption of XML Schema was forced on XSLT by the XQuery group. I don't think there is any truth in this. The big three database companies (IBM, Oracle, and Microsoft) were very active in both working groups, along with middle-tier players such as BEA and Software AG. Most of these companies saw XML Schema as a strategic way forward, and they carried both groups along with them. What is almost certainly true, however, is that when the XSL Working Group made the decision to work with XML Schema, this was based on a rather vague idea of the potential benefits; none of the members at that stage had a clear idea as to the detailed implications of this decision on the design of the language.

The development of XSLT 2.0 has been a long-drawn-out process. The timescale was dictated largely by the pace at which agreement could be reached with the XQuery group on the details of XPath 2.0. This took a long time: firstly, because of the number of people involved; secondly, because of the very different places where people were coming from (the database community and the document community have historically been completely isolated from each other, and it took a lot of talking before people started to understand each other's positions); and finally, because of the sheer technical difficulty of finding a workable design that offered the right balance between backwards compatibility and rigorous, consistent semantics. A great deal of the credit for finding a way through these obstacles goes to Mary Fernandez, who chaired the joint XPath task force with remarkable patience and persistence.

So much for the history. Let's look now at the essential characteristics of XSLT 2.0 as a language.