Displaying the Family Tree Data | NetBeans™ IDE Field Guide: Developing Desktop, Web, Enterprise, and Mobile Applications (2nd Edition)

What we want to do now is to write a stylesheet that displays the data in a GEDCOM file in HTML format. We want the display to look something like the following screenshot (see Figure 11-3).

Figure 11-3

This shows all the details of one individual, with links to related individuals so that you can browse around the family tree. Of course one could attempt many more ambitious ways of displaying this data, and I would encourage you to do so: you can start with the small Kennedy data set included in the download for this book, and then continue with any other GEDCOM data set, perhaps one of your own family tree.

Since we will have one HTML page for each individual in the file, we have to think about how to create multiple HTML pages from a single XML input document. There are at least three ways of doing this:

A bulk publishing process, in which you convert the XML input document into a set of HTML pages, and then publish these as static pages on the web server. This has the benefit that you only incur the cost of transformation once. It minimizes your dependence on the facilities available from your Internet Service Provider, and it will work with any browser. However, it can take a lot of space on the server, and can take a long time to upload if you have a slow connection.
Generating HTML pages on demand in the server, using Java servlets or ASP pages. Again this will work with any browser, but this time you need to find an Internet Service Provider who allows you to run servlets or ASP pages.
Downloading the entire XML file to the client, and generating the display there. This has the advantage that the data is only downloaded once, and the user can then browse it at leisure, with no further interaction with the server.

Unfortunately, at the time of writing, the two major browsers (Netscape and Internet Explorer) both support XSLT 1.0 transformations, but neither yet supports XSLT 2.0. To get around this problem, I use a fallback stylesheet for this case that uses XSLT 1.0 only.

Another disadvantage is security; you have no way of filtering the data, for example to remove details of living persons, and you have no way to stop your entire XML file being copied by the user (for example, the user can View Source, or can poke around in the browser cache).

The only real difference between the three cases, as far as the stylesheet is concerned, is that the hyperlinks will be generated differently.

We'll handle the differences by writing a generic stylesheet module containing all the common code for the three cases, and then importing this into stylesheets that handle the variations. But we'll start by writing a stylesheet that displays one individual on one HTML page, and then we'll worry about the hyperlinks later.

The Stylesheet

We're ready to write a stylesheet, person.xsl, that generates an HTML page showing the information relevant to a particular individual. This stylesheet will need to accept the Id of the required individual as a stylesheet parameter. If no value is supplied, we'll choose the first <IndividualRec> record in the file. Here's how it starts:

  <xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ged="http://www.wrox.com/569090/gedcom"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xs ged"
    version="2.0">
  <!-- import the schema for the GEDCOM 6.0 vocabulary -->
  <xsl:import-schema namespace="" schema-location="gedSchema.xsd"/>
  <!-- import the schema for the target XHTML vocabulary -->
  <xsl:import-schema namespace="http://www.w3.org/1999/xhtml"
    schema-location="http://www.w3.org/2002/08/xhtml/xhtml1-transitional.xsd"/>
  <xsl:output method="xhtml" indent="yes" encoding="iso-8859-1"/>
  <!-- parameter to supply the Id of the person to be displayed.
       Default value is the Id of the first person in the data set -->
  <xsl:param name="id" select="/*/IndividualRec[1]/@Id" as="xs:string"/>

The stylesheet defines four namespaces: the XSLT namespace, the schema namespace, a local namespace which is used only for the functions defined in this stylesheet module, and the XHTML namespace for the result tree. The schema and ged namespaces aren't needed in the output file, so the exclude-result-prefixes attribute is set to prevent them appearing.

I've chosen to generate the output in XHTML, so I've specified «method="xhtml"» in the <xsl:output> declaration, and I've imported the XHTML schema. This means that any attempt to generate incorrect XHTML can be reported immediately, as the stylesheet is running, and the offending instruction in the stylesheet can be pinpointed. I decided to use the transitional XHTML schema rather than the strict version, frankly out of laziness: the strict version is very strict indeed, and extra work would be needed on this stylesheet to make its output conform.

There's now a fair bit of preamble before we do any useful work. This is all designed to make the subsequent processing easier and faster. First we define some keys:

  <!-- keys to allow records to be found by their Id -->
  <xsl:key name="indi" match="IndividualRec" use="@Id"/>
  <xsl:key name="fam" match="FamilyRec" use="@Id"/>
  <!-- a key that locates the family record for a given child -->
  <xsl:key name="family-of-child" match="FamilyRec" use="Child/Link/@Ref"/>
  <!-- a key that locates the family records for a given parent -->
  <xsl:key name="families-of-parent" match="FamilyRec"
    use="element(*,ParentType)/Link/@Ref"/>
  <!-- a key to allow events to be found for a given individual -->
  <xsl:key name="events-for-person" match="EventRec"
    use="Participant/Link/@Ref"/>

The main purpose of the keys is to make navigation around the structure faster. For a data model like GEDCOM, with many cross-references from one record to another, this can make a big difference. The first two keys allow records to be found given their unique identifiers (they are indexed on their Id attributes). The other three keys are there essentially to follow inverse relationships: a family contains links to the children in the family, and the first of these three keys enables us to find quickly the family with a link to a given child (in our data there will never be more than one, though GEDCOM allows it: for example a child may be linked both to her birth parents and to her adoptive parents).
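To see the keys at work without the full data set, here is a minimal sketch. The sample records are invented for illustration, the stylesheet is not schema-aware, and it assumes the processor is started at a named template (called main here, which is my own choice) rather than applied to a source document. It uses the three-argument form of key() so that the lookup searches the temporary tree held in the variable.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical sample data, cut down to the elements the keys use -->
  <xsl:variable name="sample">
    <GEDCOM>
      <IndividualRec Id="I1"/>
      <IndividualRec Id="I2"/>
      <IndividualRec Id="I3"/>
      <FamilyRec Id="F1">
        <HusbFath><Link Ref="I1"/></HusbFath>
        <WifeMoth><Link Ref="I2"/></WifeMoth>
        <Child><Link Ref="I3"/></Child>
      </FamilyRec>
    </GEDCOM>
  </xsl:variable>

  <!-- Forward lookup: a record by its Id -->
  <xsl:key name="indi" match="IndividualRec" use="@Id"/>
  <!-- Inverse lookup: the family that lists a given child -->
  <xsl:key name="family-of-child" match="FamilyRec" use="Child/Link/@Ref"/>

  <xsl:template name="main">
    <!-- The third argument of key() names the tree to search -->
    <xsl:variable name="family"
      select="key('family-of-child', 'I3', $sample)"/>
    <xsl:value-of select="$family/@Id"/>
    <xsl:text> </xsl:text>
    <xsl:value-of
      select="key('indi', $family/HusbFath/Link/@Ref, $sample)/@Id"/>
  </xsl:template>

</xsl:stylesheet>
```

Run this way, it should write F1 followed by I1: the family found for child I3, and then the Id of the father recorded in that family.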

Having defined these keys, we now define some functions to make it easier to navigate around the data.

  <!-- a function to get all the events for a given individual -->
  <xsl:function name="ged:events-for-person" as="element(EventRec)*">
    <xsl:param name="person" as="element(IndividualRec)"/>
    <xsl:sequence select="$person/key('events-for-person', $person/@Id)"/>
  </xsl:function>
  <!-- a function to get the families in which a given individual is a spouse -->
  <xsl:function name="ged:families-of-spouse" as="element(FamilyRec)*">
    <xsl:param name="person" as="element(IndividualRec)"/>
    <xsl:sequence select="$person/key('families-of-parent', $person/@Id)"/>
  </xsl:function>
  <!-- a function to get all the events for a couple -->
  <xsl:function name="ged:events-for-couple" as="element(EventRec)*">
    <xsl:param name="couple" as="element(FamilyRec)"/>
    <xsl:sequence
      select="if ($couple/HusbFath and $couple/WifeMoth)
              then (ged:events-for-person(
                      $couple/key('indi', $couple/HusbFath/Link/@Ref))
                    intersect
                    ged:events-for-person(
                      $couple/key('indi', $couple/WifeMoth/Link/@Ref)))
              else ()"/>
  </xsl:function>

This checks that the family record does indeed identify a couple (both parents are present), and then finds all the events in which both parties participate. Note the use of the intersect operator to find the nodes that are present in two given node-sets.
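The behavior of intersect is easy to try on its own. The following is a minimal sketch with invented event records (the flat Participant/@Ref layout is a simplification of mine, not the GEDCOM 6.0 structure), started at a named template called main. The point to notice is that intersect compares nodes by identity, not by value: an event is kept only if the very same element appears in both sequences.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical, simplified event records -->
  <xsl:variable name="events">
    <Events>
      <EventRec Id="E1" Type="birth">
        <Participant Ref="I1"/>
      </EventRec>
      <EventRec Id="E2" Type="marriage">
        <Participant Ref="I1"/>
        <Participant Ref="I2"/>
      </EventRec>
      <EventRec Id="E3" Type="death">
        <Participant Ref="I2"/>
      </EventRec>
    </Events>
  </xsl:variable>

  <xsl:template name="main">
    <!-- All events for each of the two individuals -->
    <xsl:variable name="his"
      select="$events/Events/EventRec[Participant/@Ref = 'I1']"/>
    <xsl:variable name="hers"
      select="$events/Events/EventRec[Participant/@Ref = 'I2']"/>
    <!-- Only the events in which both take part -->
    <xsl:value-of select="($his intersect $hers)/@Id" separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```

With this data the only shared event is the marriage, so the output should be E2.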

  <!-- function to get the marriage event for a couple -->
  <xsl:function name="ged:marriage-event" as="element(EventRec)?">
    <xsl:param name="couple" as="element(FamilyRec)"/>
    <xsl:variable name="marriage-vitals" as="element(EventRec)*"
      select="ged:events-for-couple($couple)[@VitalType='marriage']"/>
    <xsl:variable name="marriage" as="element(EventRec)*"
      select="$marriage-vitals[@Type='marriage']"/>
    <xsl:sequence
      select="if ($marriage)
              then $marriage[1]
              else if ($marriage-vitals)
              then $marriage-vitals[1]
              else ()"/>
  </xsl:function>

This function is trying to accommodate some of the variety possible in the model. It first finds all the events for the couple (using the previously-defined function) that have the VitalType attribute set to «marriage»: this will include events such as engagement or (in older times) the granting of a marriage license. Then it selects the subset of these that are actually «marriage» events. There may still be more than one marriage event for the same couple (these might be different records of the same event, or there may actually have been more than one event, for example a civil marriage and a religious ceremony). So we choose the first marriage event if there is one, or the first event whose VitalType is «marriage» if not.

  <!-- function to get the birth date of an individual -->
  <xsl:function name="ged:birth-date" as="element(*,DateType)">
    <xsl:param name="person" as="element(IndividualRec)"/>
    <xsl:variable name="birth-events"
      select="ged:events-for-person($person)[@Type='birth']"/>
    <xsl:sequence
      select="if (exists($birth-events/Date))
              then ($birth-events/Date)[1]
              else (ged:events-for-person($person)[@VitalType='birth']/Date)[1]"/>
  </xsl:function>

This function uses similar logic, finding the actual birth event if it exists, or the first event with a VitalType of «birth» otherwise.

  <!-- function to get the estimated marriage date for a couple -->
  <xsl:function name="ged:estimated-marriage-date" as="element(*,DateType)">
    <xsl:param name="couple" as="element(FamilyRec)"/>
    <xsl:for-each select="$couple">
      <xsl:variable name="marriage-vitals"
        select="if (HusbFath and WifeMoth)
                then (ged:events-for-person(key('indi', HusbFath/Link/@Ref))
                      intersect
                      ged:events-for-person(key('indi', WifeMoth/Link/@Ref)))
                     [@VitalType='marriage']
                else ()"/>
      <xsl:variable name="marriage" as="element(EventRec)*"
        select="$marriage-vitals[@Type='marriage']"/>
      <xsl:variable name="marriage-date" as="element(*, DateType)?"
        select="if ($marriage/Date)
                then ($marriage/Date)[1]
                else if ($marriage-vitals/Date)
                then ($marriage-vitals/Date)[1]
                else ()"/>
      <xsl:choose>
        <xsl:when test="$marriage-date">
          <xsl:sequence select="$marriage-date"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="childbirth-dates" as="element(*, DateType)*"
            select="Child/Link/@Ref/key('indi',.)/ged:birth-date(.)"/>
          <xsl:for-each select="$childbirth-dates">
            <xsl:sort select="ged:date-sort-key(.)"/>
            <xsl:if test="position() eq 1"><xsl:sequence select="."/></xsl:if>
          </xsl:for-each>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:function>

This function attempts to determine when a couple (identified by a FamilyRec) were married. This is done solely so that an individual's partners can be listed in the right order, so the date does not have to be precise. The logic looks complicated, but all it does is find a dated marriage event if it can, and if it can't, return the date of birth of the oldest child. The call on ged:date-sort-key() is a forward reference to a function that we'll see later.

The body of the function is wrapped in an <xsl:for-each> instruction in order to set the context node to the value of the supplied $couple argument. It's a matter of personal style, but I find it much more convenient to write path expressions that assume the existence of a context node than to precede every path expression with «$couple/».
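The two styles can be compared side by side in a minimal sketch. Everything here is illustrative: the ex namespace, the function names, and the sample FamilyRec are my own inventions, and the stylesheet is not schema-aware. Inside a stylesheet function there is no context item, which is why the second version needs the <xsl:for-each> before it can use relative paths.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:local">

  <xsl:output method="text"/>

  <!-- Hypothetical sample record -->
  <xsl:variable name="sample">
    <FamilyRec Id="F1">
      <HusbFath><Link Ref="I1"/></HusbFath>
      <WifeMoth><Link Ref="I2"/></WifeMoth>
    </FamilyRec>
  </xsl:variable>

  <!-- Style 1: every path starts from the parameter -->
  <xsl:function name="ex:parents-explicit" as="xs:string*">
    <xsl:param name="couple" as="element(FamilyRec)"/>
    <xsl:sequence
      select="$couple/(HusbFath, WifeMoth)/Link/@Ref/string(.)"/>
  </xsl:function>

  <!-- Style 2: for-each makes $couple the context node,
       so relative paths can be used inside it -->
  <xsl:function name="ex:parents-contextual" as="xs:string*">
    <xsl:param name="couple" as="element(FamilyRec)"/>
    <xsl:for-each select="$couple">
      <xsl:sequence select="(HusbFath, WifeMoth)/Link/@Ref/string(.)"/>
    </xsl:for-each>
  </xsl:function>

  <xsl:template name="main">
    <xsl:value-of select="ex:parents-explicit($sample/FamilyRec)"/>
    <xsl:text> | </xsl:text>
    <xsl:value-of select="ex:parents-contextual($sample/FamilyRec)"/>
  </xsl:template>

</xsl:stylesheet>
```

Both functions should return the same two Ids, so the output reads I1 I2 on each side of the bar.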

The next three functions are concerned with date formatting. First a function to convert a date from GEDCOM format into ISO format:

  <!-- function to convert a standard GEDCOM date (DD MMM YYYY) to an xs:date -->
  <xsl:function name="ged:date-to-ISO" as="xs:date">
    <xsl:param name="date" as="StandardDate"/>
    <xsl:variable name="iso-date">
      <xsl:analyze-string select="$date"
        regex="\s*([0-9]+)\s+([A-Z]+)\s+([0-9]+)\s*$">
        <xsl:matching-substring>
          <xsl:number value="regex-group(3)" format="0001"/>
          <xsl:text>-</xsl:text>
          <xsl:number value="index-of(('JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
                                       'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'),
                                      regex-group(2))" format="01"/>
          <xsl:text>-</xsl:text>
          <xsl:number value="regex-group(1)" format="01"/>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:variable>
    <xsl:sequence select="xs:date($iso-date)"/>
  </xsl:function>

This function only works on a date in standard GEDCOM format «DD MMM YYYY». If you pass it a date in an extended form, such as «BEF 1870», the stylesheet will fail with a type error.
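If you want to experiment with the conversion without a schema-aware processor, a more forgiving variant is sketched below. It is not the function used in person.xsl: it takes a plain xs:string, lives in a made-up ex namespace, and returns an empty sequence for anything that is not in the «DD MMM YYYY» form instead of failing. Note the doubled curly braces, needed because the regex attribute is an attribute value template. A string that matches the pattern but is not a real date (for example, 31 FEB) would still raise an error in the xs:date() constructor.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:local">

  <xsl:output method="text"/>

  <!-- Hypothetical variant: empty result for non-standard dates -->
  <xsl:function name="ex:date-to-iso" as="xs:date?">
    <xsl:param name="date" as="xs:string"/>
    <xsl:variable name="months" as="xs:string+"
      select="('JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
               'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC')"/>
    <xsl:analyze-string select="$date"
      regex="^\s*([0-9]{{1,2}})\s+([A-Z]{{3}})\s+([0-9]{{4}})\s*$">
      <xsl:matching-substring>
        <xsl:variable name="month"
          select="index-of($months, regex-group(2))"/>
        <!-- An unknown month name produces no result -->
        <xsl:if test="exists($month)">
          <xsl:sequence select="xs:date(concat(
              regex-group(3), '-',
              format-number($month, '00'), '-',
              format-number(number(regex-group(1)), '00')))"/>
        </xsl:if>
      </xsl:matching-substring>
      <!-- No non-matching-substring: unmatched input yields nothing -->
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template name="main">
    <xsl:value-of select="ex:date-to-iso('2 JAN 1931')"/>
    <xsl:text> / </xsl:text>
    <xsl:value-of select="count(ex:date-to-iso('BEF 1870'))"/>
  </xsl:template>

</xsl:stylesheet>
```

Started at the named template main, this should write 1931-01-02 for the standard date and a count of 0 for the extended one.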

I've not attempted here to handle the problems of non-Gregorian calendars (which arise all the time with genealogical data). If the GEDCOM date represents a date in the Julian (or Old Style) calendar, then in theory it ought to be shifted by ten or eleven days when converting it to an ISO date, because ISO dates are supposed always to be Gregorian.

  <!-- function to format a standard GEDCOM date for display -->
  <xsl:function name="ged:format-date" as="xs:string">
    <xsl:param name="date" as="StandardDate"/>
    <xsl:sequence select="format-date(ged:date-to-ISO($date), '[D] [MNn] [Y]')"/>
  </xsl:function>

To format a date into the form «2 January 1931», we first convert the date to standard ISO representation (the xs:date type), and then call XSLT's format-date() function.

  <!-- function to get a sort key for GEDCOM dates -->
  <xsl:function name="ged:date-sort-key" as="xs:string">
    <xsl:param name="date" as="element(*, DateType)"/>
    <xsl:sequence select="
      if (data($date) instance of StandardDate)
      then string(ged:date-to-ISO($date))
      else substring($date, string-length($date) - 3)"/>
  </xsl:function>

When we sort on dates, we ideally want to be able to sort standard dates such as «2 JAN 1931» chronologically, but we also want to be able to fit non-standard dates such as «BEF 1870» into the sequence as best we can. To achieve this, I've chosen a sort key that uses the ISO conversion of the date in the case of standard dates (for example, «1931-01-02»), and that uses the last four characters otherwise.
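The effect of such a mixed sort key can be seen in a small, non-schema-aware sketch. The function ex:sort-key and its namespace are hypothetical, and because there are no type annotations to rely on, it recognizes standard dates with a regular expression instead of an «instance of» test.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="urn:example:local">

  <xsl:output method="text"/>

  <xsl:variable name="months" as="xs:string+"
    select="('JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
             'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC')"/>

  <!-- Hypothetical sort key: YYYY-MM-DD for standard dates,
       the last four characters (the year) for anything else -->
  <xsl:function name="ex:sort-key" as="xs:string">
    <xsl:param name="date" as="xs:string"/>
    <xsl:variable name="d" select="normalize-space($date)"/>
    <xsl:variable name="year"
      select="substring($d, string-length($d) - 3)"/>
    <xsl:sequence select="
      if (matches($d, '^[0-9]{1,2} [A-Z]{3} [0-9]{4}$'))
      then concat($year, '-',
             format-number(index-of($months, tokenize($d, ' ')[2]), '00'),
             '-',
             format-number(number(tokenize($d, ' ')[1]), '00'))
      else $year"/>
  </xsl:function>

  <xsl:template name="main">
    <xsl:for-each select="('2 JAN 1931', 'BEF 1870', '14 MAR 1900')">
      <xsl:sort select="ex:sort-key(.)"/>
      <xsl:value-of select="."/>
      <xsl:if test="position() ne last()">; </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The three keys are 1931-01-02, 1870, and 1900-03-14, so the dates should come out in the order BEF 1870; 14 MAR 1900; 2 JAN 1931. Because the comparison is a string comparison, a bare year sorts just ahead of any full date in the same year, which is about as good a placement as one can hope for.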

Sometimes we just want to display the year:

  <!-- function to get the year from a GEDCOM date -->
  <xsl:function name="ged:get-year" as="xs:string">
    <xsl:param name="date" as="element(*, DateType)"/>
    <xsl:sequence select="substring($date, string-length($date) - 3)"/>
  </xsl:function>

Finally, there's one more function we will be using, which converts a string so that the initial letter is a capital:

  <!-- a function to capitalize the initial letter of a string -->
  <xsl:function name="ged:initial-cap" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence select="concat(upper-case(substring($input, 1, 1)),
                                 substring($input, 2))"/>
  </xsl:function>

And that's the end of the preliminaries. Now we can get on with some actual template rules.

  <xsl:template match="/">
    <xsl:if test="not(/* instance of schema-element(GEDCOM))">
      <xsl:message terminate="yes">Input document is not a validated GEDCOM 6.0 file</xsl:message>
    </xsl:if>
    <xsl:result-document validation="strict">
      <xsl:variable name="person" select="key('indi', $id)"/>
      <xsl:apply-templates select="$person"/>
    </xsl:result-document>
  </xsl:template>

The root template rule starts by testing to see if the outermost element of the source document is a GEDCOM element. It doesn't just test the name of the element: the sequence type descriptor «schema-element(GEDCOM)» also checks that the type annotation is appropriate. If the user supplies a source document that hasn't been validated, then this test will fail, even if the document is actually valid, and the stylesheet will proceed no further. If this check weren't present here, some strange and difficult-to-diagnose failures could occur later on, because we are relying on the type annotations being present in the input data.

The entire transformation is then wrapped inside an <xsl:result-document> instruction. This instruction is usually used only when producing multiple result trees, but in this case we're using it for the primary result tree, in order to request validation. It's not actually specifying what the type of the result document must be, only that it must be what it says it is: «validation="strict"» will cause a failure if the outermost element in the result tree isn't defined in some imported schema, or if the result tree isn't valid against that definition. In this case the intent is to check that the result is valid XHTML.

The outline of the HTML page is produced when we process the selected <IndividualRec> element, as one might expect:

  <xsl:template match="IndividualRec">
    <html>
      <head>
        <xsl:call-template name="css-style"/>
        <xsl:variable name="name">
          <xsl:apply-templates select="IndivName[1]"/>
        </xsl:variable>
        <title><xsl:value-of select="$name"/></title>
      </head>
      <body bgcolor="{if (Gender='M') then 'cyan' else 'pink'}">
        <!-- Show name and parentage -->
        <h1><xsl:apply-templates select="IndivName[1]"/></h1>
        <xsl:if test="IndivName[2]">
          <p>
            <span class="label">Also known as: </span>
            <xsl:for-each select="IndivName[position() ge 2]">
              <xsl:apply-templates select="."/>
              <xsl:if test="position() ne last()">, </xsl:if>
            </xsl:for-each>
          </p>
        </xsl:if>
        <xsl:call-template name="show-parents"/>
        <hr/>
        <table>
          <tr>
            <!-- Show events and attributes -->
            <td width="50%" valign="top">
              <xsl:call-template name="show-events"/>
            </td>
            <td width="20%"/>
            <!-- Show partners and children -->
            <td width="30%" valign="top">
              <xsl:call-template name="show-partners"/>
            </td>
          </tr>
        </table>
        <hr/>
        <!-- Show notes -->
        <xsl:for-each select="Note">
          <p class="text"><xsl:apply-templates mode="note"/></p>
          <xsl:if test="position() eq last()"><hr/></xsl:if>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

This template rule works through the process of generating the output page. Some observations:

The title in the HTML header is generated by first creating a variable, and then copying the value of the variable to the <title> element. This is deliberate: it takes advantage of the standard template rules for generating a personal name, but the <xsl:value-of> instruction then removes the tags such as <b> that appear in the generated name, because these clutter the displayed title in some browsers.
The background color of the page depends on the value of the person's Gender attribute. You might consider this to be an aesthetic abomination, in which case you are welcome to change it, but I left it in because it illustrates another XSLT technique. A more technical criticism is that strict XHTML doesn't allow the <body> element to have a bgcolor attribute: this will be reported as an error if you try to import the strict XHTML schema instead of the transitional one.
The main task of generating the content of the page is split up and delegated to separate named templates, simply for reasons of modularity.
There is no attempt to display all the data that GEDCOM allows to be included in, or referenced from, an <IndividualRec> record, for example citations of sources, multimedia objects such as photographs, etc. If such data is present it will simply be skipped.
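The first observation, about the title, is worth trying in isolation. In this minimal sketch (the name and the <page> wrapper element are invented for illustration), the same variable is written out twice: once with <xsl:value-of>, which yields only its string value, and once with <xsl:copy-of>, which preserves the markup.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>

  <xsl:template name="main">
    <!-- A temporary tree containing text and markup -->
    <xsl:variable name="name">
      <xsl:text>John </xsl:text>
      <b>Kennedy</b>
    </xsl:variable>
    <page>
      <!-- value-of takes the string value: the <b> tags disappear -->
      <title><xsl:value-of select="$name"/></title>
      <!-- copy-of keeps the nodes as they are -->
      <h1><xsl:copy-of select="$name"/></h1>
    </page>
  </xsl:template>

</xsl:stylesheet>
```

The <title> should contain the plain text John Kennedy, while the <h1> keeps the <b> element around the surname.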

I've chosen to use an internal CSS stylesheet to define font sizes and the like, and the task of generating this is delegated to the template named css-style. This generates fixed output, as follows:

  <xsl:template name="css-style">
    <style type="text/css">
      H1 {
        font-family: Verdana, Helvetica, sans-serif;
        font-size: 18pt;
        font-weight: bold;
        color: #FF0080;
      }
      H2 {
        font-family: Verdana, Helvetica, sans-serif;
        font-size: 14pt;
        font-weight: bold;
        color: black;
      }
      H3 {
        font-family: Lucida Sans, Helvetica, sans-serif;
        font-size: 11pt;
        font-weight: bold;
        color: black;
      }
      SPAN.label {
        font-family: Lucida Sans, Helvetica, sans-serif;
        font-size: 10pt;
        font-weight: normal;
        font-style: italic;
        color: black;
      }
      P,LI,TD {
        font-family: Lucida Sans, Helvetica, sans-serif;
        font-size: 10pt;
        font-weight: normal;
        color: black;
      }
      P.text {
        font-family: Comic Sans MS, Helvetica, sans-serif;
        font-size: 10pt;
        font-weight: normal;
        color: black;
      }
    </style>
  </xsl:template>

It would have been quite possible, of course, to attach these attributes to the various HTML elements individually, or to incorporate them using XSLT attribute sets, but this way seems cleaner, and shows how XSLT and CSS can complement each other. In fact, it might have been even better to use an external CSS stylesheet, since a user displaying many of these HTML pages would then get more benefit from caching.

The next template displays the parents of the current individual, as hyperlinks:

  <xsl:template name="show-parents">
    <xsl:variable name="parental-family" as="element(FamilyRec)?"
      select="key('family-of-child', @Id)[1]"/>
    <xsl:variable name="father" as="element(IndividualRec)?"
      select="key('indi', $parental-family/HusbFath/Link/@Ref)"/>
    <xsl:variable name="mother" as="element(IndividualRec)?"
      select="key('indi', $parental-family/WifeMoth/Link/@Ref)"/>
    <p>
      <xsl:if test="$father">
        <span class="label">Father: </span>
        <xsl:apply-templates select="$father/IndivName" mode="link"/>&#xa0;
      </xsl:if>
      <xsl:if test="$mother">
        <span class="label">Mother: </span>
        <xsl:apply-templates select="$mother/IndivName" mode="link"/>&#xa0;
      </xsl:if>
    </p>
  </xsl:template>

The template starts by locating the <FamilyRec> element in which this person appears as a child. It does this using the «family-of-child» key defined earlier. Then it selects the <IndividualRec> records for the father and mother, these being the records pointed to by the <HusbFath> and <WifeMoth> fields of the <FamilyRec> record: this time the «indi» key is used.

If the data is not all present, for example if there is no <FamilyRec> element, or if the <FamilyRec> is missing a <HusbFath> or <WifeMoth> (no pedigree goes back to infinity), then the «$father» and/or «$mother» variables will simply identify an empty sequence. The subsequent <xsl:if> instructions ensure that when this happens, the relevant label is omitted from the output.

The actual hyperlinks are generated by using <xsl:apply-templates> with «mode="link"»: this gets reused for all the other links on the page, and we'll see later how it works. The «&#xa0;» character reference outputs a non-breaking space. It's actually simpler to do this than to output an ordinary space, which would require an <xsl:text> element. If you don't like numeric character references you can define an entity called «nbsp» in the <!DOCTYPE> declaration and then use «&nbsp;» in place of «&#xa0;».
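For readers who have not declared an entity in a stylesheet before, here is a minimal sketch of the «nbsp» approach. The template content is invented for illustration; the choice of us-ascii as the output encoding is mine, made only so that the non-breaking space has to be written as a character reference and is therefore visible in the output.

```xslt
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!-- Declare nbsp so it can be used inside the stylesheet -->
  <!ENTITY nbsp "&#xa0;">
]>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" omit-xml-declaration="yes" encoding="us-ascii"/>

  <xsl:template name="main">
    <!-- The XML parser expands &nbsp; before the XSLT processor
         sees the stylesheet, so this is the same as writing &#xa0; -->
    <p>Father:&nbsp;<xsl:text>unknown</xsl:text></p>
  </xsl:template>

</xsl:stylesheet>
```

The entity is purely a convenience for the stylesheet author: it is resolved when the stylesheet is parsed, and the result document contains the non-breaking space character itself (serialized here as a numeric character reference, whose exact form depends on the processor).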

The next named template is used to display the list of events for an individual, such as birth, marriage and death.

  <!-- Show the events for an individual -->
  <xsl:template name="show-events">
    <xsl:variable name="subject" as="element(IndividualRec)" select="."/>
    <xsl:for-each select="ged:events-for-person(.)">
      <xsl:sort select="ged:date-sort-key(Date)"/>
      <h3><xsl:value-of select="ged:initial-cap(@Type)"/></h3>
      <p>
        <xsl:for-each select="Participant[Link/@Ref ne $subject/@Id]">
          <span class="label"><xsl:value-of select="ged:initial-cap(Role)"/>: </span>
          <xsl:apply-templates select="Link/@Ref/key('indi',.)/IndivName[1]"
            mode="link"/>
          <br/>
        </xsl:for-each>
        <xsl:if test="Date">
          <span class="label">Date: </span><xsl:apply-templates select="Date"/><br/>
        </xsl:if>
        <xsl:if test="Place">
          <span class="label">Place: </span><xsl:apply-templates select="Place"/><br/>
        </xsl:if>
      </p>
      <xsl:for-each select="Note">
        <p class="text"><xsl:apply-templates mode="note"/></p>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

The events are located using the ged:events-for-person() function, and they are presented in an attempt at date order, achieved by calling the ged:date-sort-key() function that we saw earlier.

For each event the template displays the name of the event (in title case, for example «Birth»), the list of participants other than the subject of this page, the date and place of the event, and any notes recorded about the event. In each case this is done by applying the appropriate template rules.

The only part of the HTML display that remains is the right-hand panel, where we show information about a person's partner(s) and children. If multiple partners are recorded for an individual, we use headings such as "Partner 1", "Partner 2"; if there is only one, we omit the number.

The template looks like this:

  <xsl:template name="show-partners">
    <xsl:variable name="subject" as="element(IndividualRec)" select="."/>
    <xsl:variable name="partnerships" as="element(FamilyRec)*"
      select="ged:families-of-spouse(.)"/>
    <xsl:for-each select="$partnerships">
      <xsl:sort select="ged:date-sort-key(ged:estimated-marriage-date(.))"/>
      <xsl:variable name="partner" as="element(IndividualRec)?"
        select="key('indi', element(*, ParentType)/Link/@Ref) except $subject"/>
      <xsl:variable name="partner-seq" as="xs:integer?"
        select="if (count($partnerships) eq 1) then () else position()"/>
      <xsl:if test="$partner">
        <h2>Partner <xsl:value-of select="$partner-seq"/></h2>
        <p><xsl:apply-templates select="$partner/IndivName[1]" mode="link"/></p>
      </xsl:if>
      <xsl:if test="Child">
        <h3>Children:</h3>
        <p>
          <xsl:for-each select="Child">
            <xsl:sort select="ChildNbr"/>
            <xsl:sort select="ged:date-sort-key(Link/@Ref/key('indi',.)/ged:birth-date(.))"/>
            <xsl:variable name="child" as="element(IndividualRec)"
              select="Link/@Ref/key('indi',.)"/>
            <xsl:value-of select="ged:get-year(ged:birth-date($child))"/>
            <xsl:text> </xsl:text>
            <xsl:apply-templates select="$child/IndivName[1]" mode="link"/><br/>
          </xsl:for-each>
        </p>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

As before, we try to list the partners in chronological order, based on the year of marriage. If this isn't known, there's not much we can do about it (I could have tried to use the <FamilyNbr> field, but it's not present in the data we are using). For each partnership, we list the partner's name, as a hyperlink, and then the children's names, again as hyperlinks. The children are found from the <Child> fields of the <FamilyRec> record, and are listed in order of year of birth where this is known.

The next group of template rules is used to create the HTML hyperlinks:

  <xsl:template match="IndivName" mode="link">
    <a>
      <xsl:attribute name="href">
        <xsl:call-template name="make-href"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </a>
  </xsl:template>
  <xsl:template match="NamePart[@Type='surname']">
    <xsl:text> </xsl:text>
    <span class="surname"><xsl:apply-templates/></span>
    <xsl:text> </xsl:text>
  </xsl:template>
  <xsl:template name="make-href">
    <xsl:value-of select="concat(../@Id, '.html')"/>
  </xsl:template>

The «make-href» template is the only place where the form of a link is defined: in this case it consists of a relative URL reference to another HTML file, with a filename based on the individual's Id attribute, for example I27.html. This has been very deliberately isolated into a template all of its own, for reasons that will become clear later.

The stylesheet ends with the template rules for formatting dates, places, and notes:

  <xsl:template match="PlaceName[PlacePart]">
    <xsl:variable name="sorted-parts" as="element()*">
      <xsl:perform-sort select="PlacePart">
        <xsl:sort select="@Level" order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:value-of select="$sorted-parts" separator=", "/>
  </xsl:template>

The above rule sorts the parts of a place name by the value of their Level attribute, and then outputs them in a comma-separated list. Note that we no longer need to specify that this is a numeric sort: the system can work this out from the schema.
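Here is a minimal, non-schema-aware sketch of the same technique. The place name and its Level values are invented for illustration (I have simply given the most local part the highest number), and because there is no schema to supply the type of @Level, the sort has to say explicitly that it is numeric.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical place name, with parts deliberately out of order -->
  <xsl:variable name="place">
    <PlaceName>
      <PlacePart Level="1">USA</PlacePart>
      <PlacePart Level="3">Brookline</PlacePart>
      <PlacePart Level="2">Massachusetts</PlacePart>
    </PlaceName>
  </xsl:variable>

  <xsl:template name="main">
    <!-- perform-sort returns the selected nodes in sorted order -->
    <xsl:variable name="sorted-parts" as="element()*">
      <xsl:perform-sort select="$place/PlaceName/PlacePart">
        <xsl:sort select="@Level" data-type="number" order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <!-- separator is inserted between the string values of the items -->
    <xsl:value-of select="$sorted-parts" separator=", "/>
  </xsl:template>

</xsl:stylesheet>
```

With this data the output should be Brookline, Massachusetts, USA.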

  <xsl:template match="Date[data(.) instance of StandardDate]">
    <xsl:value-of select="ged:format-date(data(.))"/>
  </xsl:template>
  <xsl:template match="Date">
    <xsl:value-of select="."/>
  </xsl:template>

The above two rules handle standard dates and non-standard dates respectively. We rely on the type annotation to distinguish the two cases. Note the call on «data(.)»: we want to test the type of the simple content of the <Date> element, not the type of the element itself. So we need to call the data() function to get the content.

The final rule, below, is for text nodes within a <Note> element. This uses the <xsl:analyze-string> instruction to replace newline characters by <br/> elements, so that the line endings are preserved in the browser's display.

  <xsl:template match="text()" mode="note">
    <xsl:analyze-string select="." regex="\n">
      <xsl:matching-substring>
        <br/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
  </xsl:transform>

Putting it Together

We've now got a stylesheet that can generate an HTML page for a single chosen individual. We don't yet have a working Web site!

As I suggested earlier, there are three ways you can work. You can do a batch conversion of the entire data file into a collection of linked static HTML pages held on the web server, you can generate each page on demand from the server, or you can generate pages dynamically at the client. I'll show how to do all three; and in the second case, I'll describe two different implementations of the architecture, one using Java servlets and one using Microsoft ASP pages.

Publishing Static HTML

To generate HTML files for all the individuals in the data file, we need some kind of script that processes each individual in turn and produces a separate output file for each one. Here we can take advantage of the XSLT 2.0 capability to produce multiple output files from one input file. Many XSLT 1.0 products had a similar capability, but unfortunately each product used different syntax.

We'll need a new template for processing the root element, and because this must override the template defined in person.xsl, we'll need to use <xsl:import> to give the new template higher precedence.

Here is the complete stylesheet, publish.xsl, to do the bulk conversion. As well as generating an HTML page for each individual, it also creates an index page listing all the individuals grouped first by surname, then by the rest of the name.

  <xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="2.0">
  <xsl:import href="person.xsl"/>
  <xsl:param name="dir" select="'.'"/>
  <xsl:template match="/">
    <xsl:for-each select="*/IndividualRec">
      <xsl:result-document href="{$dir}/{@Id}.html" validation="strict">
        <xsl:apply-templates select="."/>
      </xsl:result-document>
    </xsl:for-each>
    <xsl:result-document href="{$dir}/index.html" validation="strict">
      <xsl:call-template name="make-index"/>
    </xsl:result-document>
  </xsl:template>
  <xsl:template name="make-index">
    <html>
      <head>
        <title>Index of names</title>
      </head>
      <body>
        <h1>Index of names</h1>
        <xsl:for-each-group select="/*/IndividualRec/IndivName/NamePart[@Level=1]"
          group-by=".">
          <xsl:sort select="current-grouping-key()"/>
          <h2><xsl:value-of select="current-grouping-key()"/></h2>
          <xsl:for-each select="current-group()">
            <p>
              <xsl:apply-templates select="ancestor::IndividualRec/IndivName[1]"
                mode="link"/>
            </p>
          </xsl:for-each>
        </xsl:for-each-group>
      </body>
    </html>
  </xsl:template>
  </xsl:transform>

You can run this stylesheet using any XSLT 2.0 schema-aware processor. At the time of writing, the only processor that will run this stylesheet as written is the schema-aware version of Saxon, which you can obtain at http://www.saxonica.com/.

You will also need to download the example files from the Wrox web site. Create a new directory, copy the stylesheets and the XML data file into it, make this the current directory, and then run the command:

  java com.saxonica.Transform -val -t -o index.html kennedy6.xml publish.xsl

This assumes that the source files are in the current directory. The -val option is necessary to ensure that the source file is validated against its schema; the -t option is useful because it shows you exactly where the generated output files have been written.

If you want to generate the HTML files in a different directory, you can specify this on the command line, for example:

  java ... dir=d:\jfk

The new directory should fill with HTML files. Double-click on the index.html file, and you should see an index of names. Click on any of the names to see the screen shown in Figure 11-3, in glorious color. Then browse the data by following the relationships.

Generating HTML Pages From a Servlet

An alternative to bulk-converting the XML data into static HTML pages is to generate each HTML page on request. This requires execution of a stylesheet on the server, which in principle can be controlled using ASP pages, Java servlets, or even raw CGI programs. However, as many of the available XSLT processors are written in Java, it turns out to be convenient to use servlets.

If you aren't familiar with servlet programming, it's probably best to skip this section, because there isn't space here to start from first principles. There are plenty of good books on the subject.

All the Java XSLT 1.0 processors (there are at least five) implement the JAXP API, which is described in Appendix D. This means you can write a servlet that works with any processor. Although the JAXP API currently only supports XSLT 1.0, there's very little difference at the API level between a 1.0 processor and a 2.0 processor, so you can use this API with minor tweaks to run an XSLT 2.0 processor such as Saxon version 8.

In fact, most of the XSLT processors come with some kind of packaged servlet interface, though it's often best to customize it to suit the particular requirements of the application. As there are a lot of variations depending on the environment you are working in, I won't try to give a complete working solution for this situation, but will just sketch out the design.

A particular feature of this application is that there are lots of requests to get data from the same source document, using the same stylesheet, but with different parameters. So ideally we want to hold both the source document and the stylesheet in memory on the server: we don't want to incur the overhead of parsing and validating the full XML document to display each individual.

We would like to accept incoming requests from the browser in the form:

  http://www.myserver.com/examples/servlet/GedServlet?tree=kennedy6&id=I1

The parameters included in the URL are firstly, the name of the data set to use (we'd like the server to be able to handle several concurrently), and secondly, the identifier of the individual to display.

Important

When the above URL is included in an XML document, the «&» must be represented as «&amp;». Most HTML browsers will accept either «&» or «&amp;». But strictly, «&amp;» is correct according to the HTML specification, and that is what our stylesheet will actually generate.
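The difference between the raw URL and its escaped form inside markup can be shown with a small Java sketch. The class and method names here are hypothetical and are not part of the servlet developed in this chapter; an XSLT serializer performs this escaping for you, so the sketch only illustrates what the two forms look like.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ServletLinkSketch {

    /** Builds the raw request URL in the form the servlet expects. */
    static String servletUrl(String tree, String id) {
        return "/examples/servlet/GedServlet"
                + "?tree=" + URLEncoder.encode(tree, StandardCharsets.UTF_8)
                + "&id=" + URLEncoder.encode(id, StandardCharsets.UTF_8);
    }

    /** Escapes a value for use inside a double-quoted XML or HTML attribute. */
    static String escapeAttribute(String value) {
        return value.replace("&", "&amp;")   // must come first
                    .replace("<", "&lt;")
                    .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String url = servletUrl("kennedy6", "I1");
        // The raw URL, as typed into the browser's address bar
        System.out.println(url);
        // The same URL as it should appear inside an href attribute
        System.out.println("<a href=\"" + escapeAttribute(url) + "\">I1</a>");
    }
}
```

The ampersand is replaced first so that the entity references introduced by the later replacements are not escaped a second time.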

So the first thing that we need to do is to generate hyperlinks in this format. We can do this by writing a new stylesheet module that imports person.xsl and overrides the template that generated the hyperlinks. We'll call this ged-servlet.xsl.

The ged-servlet.xsl stylesheet module looks like this. It has an extra parameter, which is the name of the tree we are interested in, because the same servlet ought to be able to handle requests for data from different family trees. And it overrides the «make-href» template with one that generates hyperlinks in the required format:

  <xsl:transform
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

  <xsl:import href="person.xsl"/>

  <xsl:param name="tree"/>

  <xsl:template name="make-href">
    <xsl:value-of select="concat('/examples/servlet/GedServlet?tree=',
                                 $tree, '&amp;id=', ../@Id)"/>
  </xsl:template>

  </xsl:transform>

If you configure the servlet in a different location from this, you will need to modify this stylesheet to use a different URL.

The stylesheet and the servlet interface could also be extended to generate an index of names, as in the previous example, but as that's a simple task I'll leave you to work that out for yourself.

More tricky is writing the servlet. The code below uses the JAXP interface with a minor extension to request the XSLT processor to perform validation of source documents.

  import java.io.*;
  import javax.servlet.*;
  import javax.servlet.http.*;
  import java.util.Hashtable;
  import org.w3c.dom.Document;
  import javax.xml.parsers.*;
  import javax.xml.transform.*;
  import javax.xml.transform.stream.*;
  import net.sf.saxon.FeatureKeys;

  public class GedServlet extends HttpServlet {

The init() method of a servlet is called when the servlet is first initialized. In this method we set a couple of system properties. The first property ensures that the XSLT processor we use is the schema-aware version of Saxon. The second property selects Crimson as the XML parser (the Tomcat servlet engine comes with its own built-in parser, which causes problems with this example). In a production environment, it would be appropriate to read the values of these system properties from the web.xml configuration file.

  public void init(javax.servlet.ServletConfig conf)
  throws javax.servlet.ServletException {
      super.init(conf);
      System.setProperty("javax.xml.transform.TransformerFactory",
                         "com.saxonica.SchemaAwareTransformerFactory");
      System.setProperty("javax.xml.parsers.SAXParserFactory",
                         "org.apache.crimson.jaxp.SAXParserFactoryImpl");
  }

The service() method of the servlet responds to an individual request from a user browser. It sets a StreamSource to the file identified by the tree parameter in the URL. It looks in its local data to see if the compiled stylesheet is already there; if not, it creates it. It then creates a Transformer , sets a couple of stylesheet parameters, and calls the JAXP transform() method to run the transformation, sending the result to the servlet output destination (which of course causes the result to appear at the browser).

  /**
   * Respond to an HTTP request
   */
  public void service(HttpServletRequest req, HttpServletResponse res)
  throws ServletException, IOException
  {
      res.setContentType("text/html");
      try {
          // A request with clear=yes discards the cached stylesheet
          String clear = req.getParameter("clear");
          if (clear!=null && clear.equals("yes")) {
              resetData();
          }
          // The tree parameter names the source document, for example kennedy6.xml
          String family = req.getParameter("tree");
          Source source = new StreamSource(
              new File(getServletContext().getRealPath(
                  "/" + family + ".xml")));
          Result result = new StreamResult(res.getOutputStream());
          Templates style = getStyleSheet();
          Transformer transformer = style.newTransformer();
          transformer.setParameter("id", req.getParameter("id"));
          transformer.setParameter("tree", family);
          transformer.transform(source, result);
      } catch (Exception err) {
          PrintStream ps = new PrintStream(res.getOutputStream());
          ps.println("Error applying stylesheet: " + err.getMessage());
      }
  }

When the stylesheet is first invoked, it is prepared and stored in memory as a Templates object. This method causes Saxon to validate the source document by setting the SCHEMA_VALIDATION property in the TransformerFactory: this attribute is specific to Saxon.

  /**
   * Get the prepared stylesheet from memory; prepare it if necessary
   */
  private synchronized Templates getStyleSheet()
  throws TransformerConfigurationException {
      if (stylesheet == null) {
          File sheet = new File(getServletContext().getRealPath(
              "/ged-servlet.xsl"));
          TransformerFactory factory = TransformerFactory.newInstance();
          factory.setAttribute(FeatureKeys.SCHEMA_VALIDATION, Boolean.TRUE);
          stylesheet = factory.newTemplates(new StreamSource(sheet));
      }
      return stylesheet;
  }

  /**
   * Reset data held in memory
   */
  private synchronized void resetData() {
      stylesheet = null;
  }

  private Templates stylesheet = null;
  }

The XML file holding the family tree data must be in a file tree.xml, where tree identifies the specific family tree, in our case kennedy6.xml. This must be in the home directory for the web application containing the servlet, as defined by the configuration parameters for your web server. The two stylesheet modules person.xsl and ged-servlet.xsl, and the schema gedSchema.xsd, must also be in this directory.

The servlet keeps in memory a copy of the compiled stylesheet (the JAXP Templates object): it makes this copy the first time it is needed.
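The pattern of compiling once and then creating a fresh Transformer for each request can be tried outside a servlet container. The following is a minimal sketch under stated assumptions: it uses a tiny inline XSLT 1.0 stylesheet and invented sample data so that it runs with the XSLT processor built into the JDK, rather than the schema-aware Saxon configuration used by the servlet. The class name and the display() helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplatesReuseSketch {

    // Hypothetical miniature stylesheet: outputs the name of the individual
    // selected by the global "id" parameter.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='id'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/*/IndividualRec[@Id=$id]/IndivName'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Invented sample data, loosely shaped like the GEDCOM XML used in this chapter.
    static final String DATA =
          "<GEDCOM>"
        + "<IndividualRec Id='X1'><IndivName>First Example Person</IndivName></IndividualRec>"
        + "<IndividualRec Id='X2'><IndivName>Second Example Person</IndivName></IndividualRec>"
        + "</GEDCOM>";

    private static Templates stylesheet;

    /** Compiles the stylesheet on first use and returns the cached copy afterwards. */
    static synchronized Templates getStyleSheet() {
        if (stylesheet == null) {
            try {
                stylesheet = TransformerFactory.newInstance()
                        .newTemplates(new StreamSource(new StringReader(STYLESHEET)));
            } catch (TransformerException e) {
                throw new IllegalStateException("Cannot compile stylesheet", e);
            }
        }
        return stylesheet;
    }

    /** Runs one transformation; a Transformer is cheap to create and used only once. */
    static String display(String id) {
        try {
            Transformer transformer = getStyleSheet().newTransformer();
            transformer.setParameter("id", id);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(DATA)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(display("X1"));
        System.out.println(display("X2"));
    }
}
```

A Templates object is designed to be shared between threads, whereas a Transformer is not, which is why the servlet caches the former and creates the latter on every request.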

It would also make sense to keep in memory a DOM Document object representing each family tree, but I haven't attempted to do that in this demonstration.
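For readers who want to try this, one possible shape for such a cache is sketched below. It is an assumption-laden illustration rather than part of the demonstration servlet: the TreeCacheSketch class and its loader function are hypothetical, and the loader here supplies XML text directly so that the example runs on its own. In the servlet, the loader would read tree.xml from the web application directory, and the cached Document would be passed to the transformer wrapped in a javax.xml.transform.dom.DOMSource.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TreeCacheSketch {

    private final Map<String, Document> trees = new ConcurrentHashMap<>();
    private final Function<String, String> loader;   // hypothetical: tree name -> XML text
    private final AtomicInteger parses = new AtomicInteger();

    TreeCacheSketch(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached Document for a tree, parsing it on first request only. */
    Document get(String tree) {
        return trees.computeIfAbsent(tree, this::parse);
    }

    /** Discards all cached trees, in the spirit of the servlet's resetData() method. */
    void clear() {
        trees.clear();
    }

    /** Number of parses performed so far; only here to make the caching visible. */
    int parseCount() {
        return parses.get();
    }

    private Document parse(String tree) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            parses.incrementAndGet();
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(loader.apply(tree))));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot load tree " + tree, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in loader that returns an empty document for any tree name
        TreeCacheSketch cache = new TreeCacheSketch(name -> "<GEDCOM/>");
        Document first = cache.get("kennedy6");
        Document second = cache.get("kennedy6");
        System.out.println("same object: " + (first == second));
        System.out.println("parses: " + cache.parseCount());
    }
}
```

Two caveats apply. DOM implementations are not guaranteed to be safe for concurrent access, even for reading, so a production servlet would need to check this for the parser in use. And whether a pre-built DOM is still validated against its schema by a schema-aware processor is something to confirm in that processor's documentation.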

Installing and Configuring the Servlet

To run servlets you need to install a servlet container such as Tomcat, available from www.apache.org. For production use, Tomcat normally runs as an add-on to the Apache web server, but for testing purposes, it also has an HTTP server of its own built in. There's no space here to go into all the details of installing a servlet container like Tomcat, but for quick reference, this section shows where I put the application files to get this example working.

Figure 11-4 shows the directory structure after installing Tomcat 4 (the details, of course, may vary).

Figure 11-4

Notice the four files in the examples directory: the two XSLT modules, the XML data file, and the XML Schema. Open up the examples directory, and we find the GedServlet.class file, representing the compiled servlet code (see Figure 11-5).

Figure 11-5

Finally (I won't show you this one), the WEB-INF/lib directory contains the JAR files for the XSLT processor.

To start Tomcat up, double-click on the startup.bat file in the bin directory. This brings up an old-fashioned console that displays progress messages. Then, assuming you have defaulted everything in your configuration, open up your browser and enter the URL:

http://localhost:8080/examples/servlet/GedServlet?tree=kennedy6&id=I1

If things fail (and they probably will, because servlets can be delicate animals) then you will probably see a summary message in the browser window, but the detailed diagnostics will be on the Tomcat console, or in its log files. Good luck!

Generating HTML using ASP Pages

If you work in a Microsoft environment, an alternative to writing Java servlets to perform server-side transformation is to control the process using an ASP page. In this case you have a choice: you can use Microsoft's MSXML3 parser and XSLT processor, or you can use the newer .NET processor. While MSXML3 is best known for its ability to do client-side transformations, it is equally effective as a server-side engine, and many people have reported that its performance is better than the .NET engine. Of course, this may change over time. MSXML3 uses COM interfaces so it can be called from an ASP page in the same way as any other COM object, while the .NET processor (in package System.Xml.Xsl ) fits more cleanly into the ASP.NET environment. The different APIs to control transformation using the two Microsoft processors are described in Appendix C.

At the time of writing, Microsoft doesn't have an XSLT 2.0 implementation. There have been reports of an XSLT 1.0 processor running on top of their XQuery engine, which could potentially evolve into an XSLT 2.0 processor in the fullness of time. But as this processor doesn't yet exist, I will leave the ASP version of this application as an exercise for the reader.

Generating HTML in the Browser

Finally, let's look at another way to display the family tree: namely, to download the whole XML file to the browser as a single chunk, and then use client-side scripts to invoke stylesheet processing whenever the user clicks on a hyperlink.

The problem with this approach is that at the time of writing, there is no XSLT 2.0 processor available in either Internet Explorer or Netscape: both support XSLT 1.0 client-side transformation, but not yet 2.0. Hopefully this situation will soon change, though there is always a drawback in running client-side applications because not all your users will be using the latest browser versions.

However, this book would not be complete if it didn't show you how to run transformations client-side, and for that purpose I have written an XSLT 1.0 version of the stylesheet.

My first attempt to do this was to produce the 1.0 version of the stylesheet as an overlay on the 2.0 version: that is, I wrote an XSLT 1.0 module in which every top-level declaration in the 2.0 stylesheet that contained constructs that would only run under XSLT 2.0 was replaced by a functionally equivalent 1.0 construct. My thinking was that the forwards compatibility rules in XSLT 1.0 would ensure that no errors were raised because of constructs in the unused part of the stylesheet. Unfortunately, it didn't prove possible to do this. To see why, look at the rule:

  <xsl:template match="Date[data(.) instance of StandardDate]">
    <xsl:value-of select="ged:format-date(data(.))"/>
  </xsl:template>

This uses XSLT 2.0 constructs (the data() function and the "instance of" operator) within the match pattern, and there is no way of overriding this with an XSLT 1.0 template rule in a way that an XSLT 1.0 processor will understand. So one would have to adopt a different strategy: move the shared components to a common module, and import this from modules containing the code that's specific to 1.0 and 2.0 respectively. I didn't want to distort my XSLT 2.0 code to this extent, so I simply copied the common code into the 1.0 module by cut-and-paste to create a freestanding XSLT 1.0 stylesheet, which is named person10.xsl. This stylesheet simply leaves out many of the more interesting aspects of the 2.0 version: for example, dates are output as they appear in the GEDCOM data, and no attempt is made to sort children or spouses in chronological order.

The next thing we need to do is to adapt the stylesheet to run in the browser. To do this, we need to write an HTML page containing JavaScript to invoke the transformation.

This particular example runs in Internet Explorer 6.

If the XML file is large (family trees produced by serious genealogists often run to several megabytes) then this approach means the user is going to have to wait rather longer to see the first page of data. But the advantage is that once it's downloaded, browsing around the file can be done offline: there is no need to go back to the server to follow each link from one individual to another. This gives the user a lightning-fast response to navigation requests, and reduces the processing load and the number of hits on the server. Another benefit, given that many genealogists only have access to the limited web space provided by a commercial ISP, is that no special code needs to be installed on the server.

This time, the transformation is controlled from JavaScript code on an HTML page, famtree.html. The page itself reads as follows. The <script> elements contain client-side JavaScript code.

  <html>
  <head>
    <title>Family Tree</title>
    <style type="text/css">
      ... as before ...
    </style>
    <script>
      var source = null;
      var style = null;
      var transformer = null;

      function init() {
        source = new ActiveXObject("MSXML2.DOMDocument");
        source.async = false;
        source.load('kennedy.xml');
        style = new ActiveXObject("MSXML2.FreeThreadedDOMDocument");
        style.async = false;
        style.load('ms-person.xsl');
        transformer = new ActiveXObject("MSXML2.XSLTemplate");
        transformer.stylesheet = style.documentElement;
        refresh("I1");
      }

      function refresh(indi) {
        var xslproc = transformer.createProcessor();
        xslproc.Input = source;
        xslproc.addParameter("id", indi, "");
        xslproc.transform();
        displayarea.innerHTML = xslproc.output;
      }
    </script>
    <script for="window" event="onload">
      init();
    </script>
  </head>
  <body>
    <div id="displayarea"></div>
  </body>
  </html>

The CSS style definitions have moved from the XSLT stylesheet to the HTML page, but they are otherwise unchanged.

The init() function on this page is called when the page is loaded. It creates two DOM objects, one for the source XML and one for the stylesheet, and loads these using the relative URLs kennedy.xml and ms-person.xsl. It then compiles the stylesheet into an object which is rather confusingly called an XSLTemplate; this corresponds directly with the TrAX Templates object. Finally it calls the refresh() function to display the individual with identifier I1.

I've taken a bit of a short cut here. There's no guarantee that a GEDCOM file will contain an individual with this identifier. A more carefully constructed application would display the first individual in the file, or an index of people.

The refresh() function creates an executable instance of the stylesheet by calling the createProcessor() method on the XSLTemplate object. It then sets the value of the global id parameter in the stylesheet, and applies the stylesheet to the source document by calling the transform() method. The HTML constructed by processing the stylesheet is then written to the contents of the <div id="displayarea"> element in the body of the HTML page.

We can use the same stylesheet as before, again with modifications to the form of the hyperlinks. This time we want a hyperlink to another individual, I2 say, to take the form:

  <a href="Javascript:refresh('I2')">Jaqueline Lee Bouvier</a>

When the user clicks on this hyperlink, the refresh() function is executed, which causes a new execution of the compiled stylesheet, against the same source document, but with a different value for the id parameter. The effect is that the contents of the page switch to display a different individual.

The ms-person.xsl stylesheet is written by importing the person10.xsl stylesheet presented earlier, and then overriding the aspects we want to change. This time there are two changes: we want to change the form of the hyperlink, and we want to leave out the generation of the CSS style, because the necessary definitions are already present on the HTML page. Here is the stylesheet:

  <xsl:transform
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:import href="person10.xsl"/>

  <!-- Change the way hyperlinks are generated -->

  <xsl:template name="make-href">
    <xsl:variable name="apos">'</xsl:variable>
    <xsl:value-of
       select="concat('Javascript:refresh(', $apos, ../@Id, $apos, ')')"/>
  </xsl:template>

  <!-- Suppress the generation of a CSS stylesheet -->

  <xsl:template name="css-style"/>

  </xsl:transform>

One slight infelicity in the resulting stylesheet is that it generates a full HTML page, complete with <html>, <head>, and <body> elements, and then inserts this as the content of a <div> element within an existing HTML page. Fortunately Internet Explorer tolerates this abuse of the HTML specification rather well.

Unfortunately the script shown here works only with Internet Explorer and not with Netscape. If you want to write the application in a way that is portable between the two browsers, there is a library you can use to do this: Sarissa, from http://sarissa.sourceforge.net/.