Setting Out the Production Rules | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

Now we get to a more interesting area. The XML Recommendation contains syntax production rules, and these are marked up in some detail. A sequence of production rules is contained within a <scrap> element, and each rule is a <prod> element. Here is an example of a <scrap> that contains a single production rule:

  <scrap lang='ebnf' id='document'>
    <head>Document</head>
    <prod id='NT-document'>
      <lhs>document</lhs>
      <rhs>
        <nt def='NT-prolog'>prolog</nt>
        <nt def='NT-element'>element</nt>
        <nt def='NT-Misc'>Misc</nt>*
      </rhs>
    </prod>
  </scrap>

This is of course the production rule for an XML document, which appears in the specification as shown in Figure 10-2.

Figure 10-2

In some cases the production rules within a <scrap> are grouped into <prodgroup> elements, but this grouping is ignored in the output.

Here are the top-level template rules:

  <!-- scrap: series of formal grammar productions -->
  <!-- set up a <table> and handle children -->
  <xsl:template match="scrap">
    <xsl:apply-templates select="head"/>
    <table class="scrap" summary="Scrap">
      <xsl:apply-templates select="bnf | prod | prodgroup"/>
    </table>
  </xsl:template>

  <!-- create one <tbody> for each group -->
  <xsl:template match="prodgroup">
    <tbody>
      <xsl:apply-templates/>
    </tbody>
  </xsl:template>

  <!-- prod: a formal grammar production -->
  <!-- if not in a prodgroup, needs a <tbody> -->
  <!-- has a weird content model; makes a table but there are no
       explicit rules; many different things can start a new row -->
  <!-- process the first child in each row, and it will process the
       others -->
  <xsl:template match="prod">
    <tbody>
      <xsl:apply-templates
        select="lhs
                | rhs[preceding-sibling::*[1][name()!='lhs']]
                | com[preceding-sibling::*[1][name()!='rhs']]
                | constraint[preceding-sibling::*[1][name()!='rhs']]
                | vc[preceding-sibling::*[1][name()!='rhs']]
                | wfc[preceding-sibling::*[1][name()!='rhs']]"/>
    </tbody>
  </xsl:template>

  <xsl:template match="prodgroup/prod">
    <xsl:apply-templates
      select="lhs
              | rhs[preceding-sibling::*[1][name()!='lhs']]
              | com[preceding-sibling::*[1][name()!='rhs']]
              | constraint[preceding-sibling::*[1][name()!='rhs']]
              | vc[preceding-sibling::*[1][name()!='rhs']]
              | wfc[preceding-sibling::*[1][name()!='rhs']]"/>
  </xsl:template>

To understand this, let's first ignore the horrendous select expression that appears in the last two rules.

The rule for the <scrap> element processes the <head> element to produce a heading, and then outputs an HTML table, whose contents are generated by processing all the <prodgroup> and <prod> elements contained in the <scrap>.
The rule also allows for a <scrap> to contain <bnf> elements. However, the document we're working with doesn't contain any, so we can ignore this.
The rules are being rather pedantic by ensuring that the rows of the table are always contained in a <tbody> element. In practice, Web browsers don't insist on a <tbody> being present, and many HTML authors don't bother writing one, but technically the HTML specification requires it, and the W3C takes great pains to make sure that the documents it publishes are valid HTML. This means that when there is a <prodgroup> present, the <tbody> is generated at the level of the <prodgroup>; when there is a <prod> that is not contained in a <prodgroup> (that is, it is contained directly in the <scrap>), the <tbody> is generated when processing the <prod> element; and when a <prod> is contained in a <prodgroup>, no additional <tbody> is produced.
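To make this concrete, the output might have roughly the following shape. This is only a sketch of the structure: the cell contents are left out, and the comments are added here for illustration rather than produced by the stylesheet.

```xml
<!-- <prod> elements directly inside <scrap>: one <tbody> per <prod> -->
<table class="scrap" summary="Scrap">
  <tbody>
    <tr valign="baseline"><!-- cells for the first production --></tr>
  </tbody>
  <tbody>
    <tr valign="baseline"><!-- cells for the second production --></tr>
  </tbody>
</table>

<!-- <prod> elements inside a <prodgroup>: one <tbody> for the whole group -->
<table class="scrap" summary="Scrap">
  <tbody>
    <tr valign="baseline"><!-- cells for the first production in the group --></tr>
    <tr valign="baseline"><!-- cells for the second production in the group --></tr>
  </tbody>
</table>
```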

Now let's look at the monster select expression. A production rule (<prod>) has one left-hand side (<lhs>), one or more right-hand sides (<rhs>), and zero or more annotations (<vc>, <wfc>, or <com>). A <vc> element is used to refer to a validity constraint, a <wfc> element to refer to a well-formedness constraint, and a <com> element to refer to a comment. The XML specification does not use <constraint> elements, so we can ignore those.

A rule with one <lhs> element, two <rhs> elements, and three <wfc> annotations would be laid out in an HTML table like this:

[17]	lhs1	::=	rhs1
			rhs2	wfc1
				wfc2
				wfc3

As the comment says, the select expression is processing the children of the <prod> element that start a new row: here, lhs1, rhs2, wfc2, and wfc3. More precisely, the selected elements include every <lhs> element, any <rhs> element that is not immediately preceded by an <lhs> element, and any <vc>, <wfc>, or <com> element that is not immediately preceded by an <rhs> element. So, this template selects the elements that will start a new row, and calls <xsl:apply-templates> to process them.
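The selection can be checked in isolation with a small test stylesheet. The following is a minimal sketch, not part of the original stylesheet: it assumes a hypothetical input whose elements carry their own labels as text (real <wfc> elements are references with a def attribute), and it lists only the <lhs>, <rhs>, and <wfc> branches of the expression.

```xml
<!-- Hypothetical input document, invented for this illustration:
<prod><lhs>lhs1</lhs><rhs>rhs1</rhs><rhs>rhs2</rhs><wfc>wfc1</wfc><wfc>wfc2</wfc><wfc>wfc3</wfc></prod>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Print the text of each child of <prod> that would start a new table row -->
  <xsl:template match="prod">
    <xsl:for-each select="lhs
                          | rhs[preceding-sibling::*[1][name()!='lhs']]
                          | wfc[preceding-sibling::*[1][name()!='rhs']]">
      <xsl:value-of select="."/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Run against the input shown in the comment, this prints lhs1, rhs2, wfc2, and wfc3 on separate lines: rhs1 is skipped because it follows an <lhs>, and wfc1 is skipped because it follows an <rhs>.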

We'll now look at the template rules that will match these elements. First the <lhs> :

  <!-- lhs: left-hand side of formal productions -->
  <xsl:template match="lhs">
    <tr valign="baseline">
      <td>
        <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
          <xsl:attribute name="class">
            <xsl:text>diff-</xsl:text>
            <xsl:value-of select="ancestor-or-self::*/@diff"/>
          </xsl:attribute>
        </xsl:if>
        <xsl:if test="../@id">
          <a name="{../@id}" id="{../@id}"/>
        </xsl:if>
        <xsl:number count="prod" level="any" from="spec" format="[1]"/>
        <xsl:text>&#xa0;&#xa0;&#xa0;</xsl:text>
      </td>
      <td>
        <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
          <xsl:attribute name="class">
            <xsl:text>diff-</xsl:text>
            <xsl:value-of select="ancestor-or-self::*/@diff"/>
          </xsl:attribute>
        </xsl:if>
        <code><xsl:apply-templates/></code>
      </td>
      <td>
        <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
          <xsl:attribute name="class">
            <xsl:text>diff-</xsl:text>
            <xsl:value-of select="ancestor-or-self::*/@diff"/>
          </xsl:attribute>
        </xsl:if>
        <xsl:text>&#xa0;&#xa0;&#xa0;::=&#xa0;&#xa0;&#xa0;</xsl:text>
      </td>
      <xsl:apply-templates select="following-sibling::*[1][name()='rhs']"/>
    </tr>
  </xsl:template>

There's a great deal of clutter in this rule. The code outputs a table row (a <tr> element) and the first three cells in that row (<td> elements).

For each <td> element, there is a six-line <xsl:if> instruction that is concerned solely with coloring change-marked sections in the code: changes from one version to the next are marked by the presence of a «diff» attribute on this or some ancestor element, and the coloring happens only if the stylesheet parameter $show.diff.markup is enabled. This clutter could be reduced dramatically by replacing the six lines with a call such as the following, which returns the relevant attribute node when required, or an empty sequence otherwise:

  <xsl:call-template name="handle-diff"/>

The first cell contains an optional hyperlink anchor, and a sequence number. The call on <xsl:number> using «level="any"» is a good example of how to generate a sequence of numbers that runs through the document. It creates a sequential number for each <lhs> element, that is, for each production rule. (Unfortunately, it is actually commented out in the current version of the stylesheet, supposedly because of a bug in one particular XSLT processor, and a less convenient technique is used instead. I decided on this occasion to publish the code as the author would have wanted it to be.)
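The numbering behavior is easy to try on its own. Here is a minimal sketch under stated assumptions: the input document, with its <div1> sections and production names, is invented for the illustration, and the output is plain text rather than table cells.

```xml
<!-- Hypothetical input document, invented for this illustration:
<spec>
  <div1><scrap><prod><lhs>document</lhs></prod></scrap></div1>
  <div1><scrap><prod><lhs>Char</lhs></prod><prod><lhs>S</lhs></prod></scrap></div1>
</spec>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Number each production by counting every <prod> at or before this
       point in the document, starting from the enclosing <spec> -->
  <xsl:template match="lhs">
    <xsl:number count="prod" level="any" from="spec" format="[1]"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text, including whitespace between elements -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

With this input the productions come out as [1] document, [2] Char, and [3] S: the numbering continues across the two sections rather than restarting in each one, which is the effect of «level="any"».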

In the second cell, the template calls <xsl:apply-templates/> to process the contents of the <lhs> element, which will generally just be the name of the syntactic term being defined. In the third cell it outputs the «::=» that separates the term from its definition. In various places it inserts nonbreaking space characters («&#xa0;») to achieve visual separation between the parts of the rule.

After producing these three cells, the template calls:

  <xsl:apply-templates select="following-sibling::*[1][name()='rhs']"/>

This selects the immediately following sibling element, provided it is an <rhs> element, and applies the appropriate template rule. Actually, I think the <lhs> element is always followed immediately by an <rhs> element, so this could have been written rather more straightforwardly as:

  <xsl:apply-templates select="following-sibling::rhs[1]"/>

As I mentioned before, I would normally write the predicate as «[self::rhs]» rather than «[name()='rhs']» to avoid namespace problems, and more particularly, to allow the optimizer to use indexes if it can.
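The simpler expression is equivalent only under the assumption just stated, that an <rhs> always comes immediately after the <lhs>. The following sketch shows where the two forms diverge; the input, in which a <com> sits between the <lhs> and the <rhs>, is a hypothetical case constructed for the comparison and may never occur in the real document.

```xml
<!-- Hypothetical input document, invented for this illustration:
<prod><lhs>a</lhs><com>note</com><rhs>b</rhs></prod>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="lhs">
    <!-- The immediately following sibling, kept only if it is an <rhs>:
         here the next sibling is <com>, so nothing is selected -->
    <xsl:value-of select="count(following-sibling::*[1][self::rhs])"/>
    <xsl:text> </xsl:text>
    <!-- The first following <rhs>, however far away: selects <rhs>b</rhs> -->
    <xsl:value-of select="count(following-sibling::rhs[1])"/>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

For this input the output is "0 1". Writing the original test as «following-sibling::*[1][self::rhs]» therefore keeps its exact meaning while avoiding the comparison on name().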

As we will see, this <xsl:apply-templates> causes the other two cells to be added to the table row.

So let's look at the template rule for the <rhs> element. There are two cases to consider here: if the <rhs> immediately follows an <lhs> element, then it will appear in the same table row as the <lhs> element, but in all other cases, it will appear in a new row of its own, preceded by three empty table cells. I would probably have chosen to handle these two cases in two separate template rules, distinguishing the first case using a match pattern such as «match="rhs[preceding-sibling::*[1][self::lhs]]"», but the writer of this stylesheet chose to handle both cases in a single rule, like this:

  <!-- rhs: right-hand side of a formal production -->
  <!-- make a table cell; if it's not the first after an LHS, make a
       new row, too -->
  <xsl:template match="rhs">
    <xsl:choose>
      <xsl:when test="preceding-sibling::*[1][name()='lhs']">
        <td>
          <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
            <xsl:attribute name="class">
              <xsl:text>diff-</xsl:text>
              <xsl:value-of select="ancestor-or-self::*/@diff"/>
            </xsl:attribute>
          </xsl:if>
          <code><xsl:apply-templates/></code>
        </td>
        <xsl:apply-templates
          select="following-sibling::*[1][name()='com' or
                                          name()='constraint' or
                                          name()='vc' or
                                          name()='wfc']"/>
      </xsl:when>
      <xsl:otherwise>
        <tr valign="baseline">
          <td/><td/><td/>
          <td>
            <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
              <xsl:attribute name="class">
                <xsl:text>diff-</xsl:text>
                <xsl:value-of select="ancestor-or-self::*/@diff"/>
              </xsl:attribute>
            </xsl:if>
            <code><xsl:apply-templates/></code>
          </td>
          <xsl:apply-templates
            select="following-sibling::*[1][name()='com' or
                                            name()='constraint' or
                                            name()='vc' or
                                            name()='wfc']"/>
        </tr>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

Once again, the code is cluttered by the <xsl:if> instructions that generate change highlighting when required. It also contains a lot of repetition between the two branches of the <xsl:choose>.

What the code does is this:

If the <rhs> is to appear on the same row as the <lhs>, it outputs a table cell (<td> element), colored to reflect any change markings necessary, whose contents are produced by calling <xsl:apply-templates> to process the children of the <rhs> element. It then calls <xsl:apply-templates> to process the following sibling <vc>, <wfc>, <constraint>, or <com> element if there is one.
If the <rhs> is to appear on a new row, it creates a new table row (<tr> element), and within this row it first outputs three blank table cells (<td> elements). It then outputs a table cell representing the <rhs> element itself, and calls <xsl:apply-templates> to process the following sibling element, as in the previous case.
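The two-rule alternative mentioned earlier might be sketched as follows. This is an illustration rather than code from the stylesheet: the change-highlighting logic is left out for brevity, and the named template «rhs.cell» is a hypothetical helper introduced here to hold what the two cases share.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Case 1: an <rhs> directly after the <lhs>. The <lhs> rule has already
       opened the row, so only the cell is needed. The predicate gives this
       pattern a default priority of 0.5, so it wins over the rule below. -->
  <xsl:template match="rhs[preceding-sibling::*[1][self::lhs]]">
    <xsl:call-template name="rhs.cell"/>
  </xsl:template>

  <!-- Case 2: any other <rhs> starts a row of its own, padded with three
       empty cells (default priority 0) -->
  <xsl:template match="rhs">
    <tr valign="baseline">
      <td/><td/><td/>
      <xsl:call-template name="rhs.cell"/>
    </tr>
  </xsl:template>

  <!-- Hypothetical helper: the cell itself, followed by the annotation
       that shares its row, if there is one -->
  <xsl:template name="rhs.cell">
    <td>
      <code><xsl:apply-templates/></code>
    </td>
    <xsl:apply-templates
      select="following-sibling::*[1][self::com or self::constraint or
                                      self::vc or self::wfc]"/>
  </xsl:template>
</xsl:stylesheet>
```

The choice between the two cases is then made by template matching rather than by an <xsl:choose>, and the shared logic appears only once.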

Some people prefer to avoid empty table cells by writing «<td>&#xa0;</td>», but that's really necessary only if the table has borders or a background color.

Finally, the last column contains the representation of a <vc>, <wfc>, <constraint>, or <com> element if there is one. The rules for these elements are all very similar, and I will show only one of them. The structure is very similar to that for the <rhs> element:

  <!-- vc: validity check reference in a formal production -->
  <xsl:template match="vc">
    <xsl:choose>
      <xsl:when test="preceding-sibling::*[1][name()='rhs']">
        <td>
          <xsl:if test="@diff and $show.diff.markup != 0">
            <xsl:attribute name="class">
              <xsl:text>diff-</xsl:text>
              <xsl:value-of select="@diff"/>
            </xsl:attribute>
          </xsl:if>
          <a>
            <xsl:attribute name="href">
              <xsl:call-template name="href.target">
                <xsl:with-param name="target" select="key('ids', @def)"/>
              </xsl:call-template>
            </xsl:attribute>
            <xsl:text>[VC: </xsl:text>
            <xsl:apply-templates select="key('ids', @def)/head" mode="text"/>
            <xsl:text>]</xsl:text>
          </a>
        </td>
      </xsl:when>
      <xsl:otherwise>
        <tr valign="baseline">
          <td/><td/><td/><td/>
          <td>
            <xsl:if test="@diff and $show.diff.markup != 0">
              <xsl:attribute name="class">
                <xsl:text>diff-</xsl:text>
                <xsl:value-of select="@diff"/>
              </xsl:attribute>
            </xsl:if>
            <a>
              <xsl:attribute name="href">
                <xsl:call-template name="href.target">
                  <xsl:with-param name="target" select="key('ids', @def)"/>
                </xsl:call-template>
              </xsl:attribute>
              <xsl:text>[VC: </xsl:text>
              <xsl:apply-templates select="key('ids', @def)/head" mode="text"/>
              <xsl:text>]</xsl:text>
            </a>
          </td>
        </tr>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

After studying the previous rule, the basic structure should be familiar. But there is some extra code included in this rule, because the <vc> element is represented as a hyperlink to the description of a validity constraint held outside the table itself. The link is represented in the XML by a def attribute, and this is used directly to construct the HTML internal hyperlink. The displayed text of the link is formed by retrieving the element whose ID is equal to this def attribute, and displaying its text.
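The rule relies on a key named «ids» whose declaration is not shown in this section. The following minimal sketch shows how such a lookup might work; the <xsl:key> declaration is an assumption about how the key is defined, the input document is hypothetical, and the output is reduced to the link text alone.

```xml
<!-- Hypothetical input document, invented for this illustration:
<spec>
  <prod><lhs>doctypedecl</lhs><rhs>definition</rhs><vc def="vc-roottype"/></prod>
  <vcnote id="vc-roottype"><head>Root Element Type</head><p>Description</p></vcnote>
</spec>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed declaration: index every element that carries an id attribute -->
  <xsl:key name="ids" match="*[@id]" use="@id"/>

  <!-- Follow the def attribute to the element with that id and show its
       heading as the text of the reference -->
  <xsl:template match="vc">
    <xsl:text>[VC: </xsl:text>
    <xsl:value-of select="key('ids', @def)/head"/>
    <xsl:text>]</xsl:text>
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>

  <!-- Suppress all other text -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

For the input in the comment this prints "[VC: Root Element Type]".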

So much for formatting the production rules! This is by far the most complicated part of this stylesheet; the rest should be plain sailing. But before we move on, we should ask whether all this logic could have been written in a more straightforward way in XSLT 2.0.

I see this problem as an example of a positional grouping problem. Grouping problems are all concerned with turning a one-dimensional sequence of elements into a hierarchy, and the problem of arranging data in a table can often be understood as a grouping problem in which the hierarchic levels are the table, the rows, and the individual cells.

All grouping problems can be solved by answering two questions:

How do you identify an element that can be used to represent the group as a whole (usually the first element of the group)?
How do you then identify the remaining members of the same group?

We already have answers to these questions in the existing stylesheet: the group is a row of the table, and we have an XPath expression that selects elements that will be the first in a new row. The other elements in the row are then the following siblings, up to the next element that's a "new row" element.

So here is my XSLT 2.0 solution to this problem. First, in the two template rules for «match="prod"» and «match="prodgroup/prod"», we'll replace the complicated <xsl:apply-templates> instruction with a simple call on the named template «show.prod», with no parameters. This template looks like this:

  <xsl:template name="show.prod">
    <xsl:for-each-group select="*" group-starting-with="
        lhs
        | rhs[preceding-sibling::*[1][not(self::lhs)]]
        | com[preceding-sibling::*[1][not(self::rhs)]]
        | constraint[preceding-sibling::*[1][not(self::rhs)]]
        | vc[preceding-sibling::*[1][not(self::rhs)]]
        | wfc[preceding-sibling::*[1][not(self::rhs)]]">
      <tr valign="baseline">
        <xsl:apply-templates select="." mode="padding"/>
        <xsl:apply-templates select="current-group()"/>
      </tr>
    </xsl:for-each-group>
  </xsl:template>

Now, we define a set of simple template rules to produce the empty cells in each row, depending on the type of the first element in the row:

  <xsl:template match="lhs" mode="padding"/>

  <xsl:template match="rhs" mode="padding">
    <td/><td/><td/>
  </xsl:template>

  <xsl:template match="com | constraint | vc | wfc" mode="padding">
    <td/><td/><td/><td/>
  </xsl:template>

And finally we provide one template rule for each kind of element, which simply outputs the content of the appropriate cells in the table. There is no longer any need for it to worry about what comes afterwards: that's taken care of by the iteration in the master «show.prod» template.

  <xsl:template match="lhs">
    <td>
      <xsl:call-template name="show.diff"/>
      <xsl:if test="../@id">
        <a name="{../@id}" id="{../@id}"/>
      </xsl:if>
      <xsl:number count="prod" level="any" from="spec" format="[1]"/>
      <xsl:text>&#xa0;&#xa0;&#xa0;</xsl:text>
    </td>
    <td>
      <xsl:call-template name="show.diff"/>
      <code><xsl:apply-templates/></code>
    </td>
    <td>
      <xsl:call-template name="show.diff"/>
      <xsl:text>&#xa0;&#xa0;&#xa0;::=&#xa0;&#xa0;&#xa0;</xsl:text>
    </td>
  </xsl:template>

  <xsl:template match="rhs">
    <td>
      <xsl:call-template name="show.diff"/>
      <code><xsl:apply-templates/></code>
    </td>
  </xsl:template>

  <xsl:template match="vc">
    <td>
      <xsl:call-template name="show.diff"/>
      <a>
        <xsl:attribute name="href">
          <xsl:call-template name="href.target">
            <xsl:with-param name="target" select="key('ids', @def)"/>
          </xsl:call-template>
        </xsl:attribute>
        <xsl:text>[VC: </xsl:text>
        <xsl:apply-templates select="key('ids', @def)/head" mode="text"/>
        <xsl:text>]</xsl:text>
      </a>
    </td>
  </xsl:template>

As before, I left out the logic for <wfc>, <com>, and <constraint> elements, to avoid repetition. But I think you'll agree that the <xsl:for-each-group> instruction, while still requiring some thought, makes this tricky problem a lot easier to tackle than it was in XSLT 1.0.

For completeness, here is the «show.diff» template:
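The grouping itself can be tried in isolation with a reduced stylesheet. This is a minimal sketch under stated assumptions: the input is a hypothetical <prod> whose children carry their own labels as text, only the <lhs>, <rhs>, and <wfc> branches of the pattern are kept, and each group is written as a line of text instead of a table row.

```xml
<!-- Hypothetical input document, invented for this illustration:
<prod><lhs>lhs1</lhs><rhs>rhs1</rhs><rhs>rhs2</rhs><wfc>wfc1</wfc><wfc>wfc2</wfc><wfc>wfc3</wfc></prod>
-->
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="prod">
    <!-- A new group, that is a new row, starts at each element matched
         by the pattern -->
    <xsl:for-each-group select="*" group-starting-with="
        lhs
        | rhs[preceding-sibling::*[1][not(self::lhs)]]
        | wfc[preceding-sibling::*[1][not(self::rhs)]]">
      <!-- Write the members of the row, separated by spaces -->
      <xsl:value-of select="current-group()" separator=" "/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For this input the four lines of output are "lhs1 rhs1", "rhs2 wfc1", "wfc2", and "wfc3", which corresponds to the four rows of the table layout shown earlier.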

  <xsl:template name="show.diff" as="attribute()?">
    <xsl:if test="ancestor-or-self::*/@diff and $show.diff.markup != 0">
      <xsl:attribute name="class"
                     select="concat('diff-', ancestor-or-self::*/@diff)"/>
    </xsl:if>
  </xsl:template>
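One caveat is worth a sketch. If more than one element on the ancestor-or-self axis carries a «diff» attribute, the second argument to concat() is a sequence of more than one item, which an XSLT 2.0 processor may reject as a type error, whereas the XSLT 1.0 <xsl:value-of> silently used the first attribute in document order. The variant below is an assumption about how one might guard against this, not the author's code; the default value given to the parameter is likewise assumed.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed default: change highlighting is switched off -->
  <xsl:param name="show.diff.markup" select="0"/>

  <xsl:template name="show.diff" as="attribute()?">
    <!-- Take the first diff attribute in document order (the outermost one),
         which is the one the XSLT 1.0 code would have used -->
    <xsl:variable name="diff" select="(ancestor-or-self::*/@diff)[1]"/>
    <xsl:if test="$diff and $show.diff.markup != 0">
      <xsl:attribute name="class" select="concat('diff-', $diff)"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

If the innermost marking should win instead, the predicate «[1]» could be replaced by «[last()]».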