Appendix A. Case Studies

CONTENTS
  •  A.1 Lists
  •  A.2 MARC Records: The ATLAS Project from ATLA-CERTR at Emory University
  •  A.3 The Harvard-Kyoto Classics Project with Vedic Literature

This appendix provides examples that use many of the XPath and XSLT functions, as well as the XSLT elements, presented in this book. They are drawn directly from, or derived from, real-world applications of XSLT, and they were chosen because they are likely to be relevant to most readers' needs. Each example is included on the CD.

There are many more uses of XSLT than are represented here. Be sure to review the examples of the <xsl:key> top-level element and key() function in Jeni Tennison's contributed Appendix B, "Grouping Using the Muenchian Method," as well as the unique application of XSLT to the classic N-Queens problem in artificial intelligence, by Oren Ben-Kiki, in Appendix C.

In this appendix, we've included a sample of work with library records stored in the Machine-Readable Cataloging (MARC) format, part of a significant XML project currently underway at Emory University by the American Theological Library Association's Center for Electronic Resources in Theology and Religion. MARC records are an early standard for electronic card catalogues and have been used for over 20 years by libraries around the world. In this example, MARC records converted to XML with an excellent shareware tool by Bob Pritchett at Logos Software (http://www.logos.com/marc/marc.asp) are mined to create links to the individual images of each page of the journal article that a given MARC record references. Another example represents a common situation encountered with complex documents that combine markup from several DTDs and contain two different texts. It shows how to reformat the tags and separate the versions for selective publication.

The first set of examples represents a comprehensive range of ways to work with lists in XSLT and XPath to create HTML and plain text lists of varying formats and complexity.

A.1 Lists

Each of these list examples is more complex than the one before, beginning with simple HTML output built from literal result elements (LREs) and building toward layered sub-lists formatted with <xsl:number>. All examples except the last one work with the simple list structure shown in Example A-1.

A.1.1 Simple HTML Lists from XML with Literal Result Elements

The stylesheet in Example A-1 converts the input list to basic HTML output with ordered and unordered lists.

Example A-1 Making HTML lists.
INPUT:
<?xml version="1.0"?>
<list type="ordered" prefix="number">
  <item>item 1 in list level 1</item>
  <item>item 2 in list level 1</item>
  <item>item 3 in list level 1</item>
  <item>item 4 in list level 1, with a sub-list:
    <list type="unordered" prefix="bullet">
      <item>item 1 in list level 2</item>
      <item>item 2 in list level 2</item>
      <item>item 3 in list level 2</item>
    </list>
  </item>
  <item>item 5 in list level 1</item>
  <item>item 6 in list level 1</item>
</list>
STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       version="1.0">
<!-- This stylesheet takes an XML list
     and converts it to HTML list format -->
<xsl:template match="/">
  <html>
  <body>
  <xsl:apply-templates/>
  </body>
  </html>
</xsl:template>
<xsl:template match="list[@type='ordered']">
  <ol>
    <xsl:apply-templates/>
  </ol>
</xsl:template>
<xsl:template match="list[@type='unordered']">
  <ul>
    <xsl:apply-templates/>
  </ul>
</xsl:template>
<xsl:template match="item">
  <li>
    <xsl:apply-templates/>
  </li>
</xsl:template>
</xsl:stylesheet>
OUTPUT:
<html>
<body>
<ol>
<li>item 1 in list level 1</li>
<li>item 2 in list level 1</li>
<li>item 3 in list level 1</li>
<li>item 4 in list level 1, with a sub-list:
<ul>
<li>item 1 in list level 2</li>
<li>item 2 in list level 2</li>
<li>item 3 in list level 2</li>
</ul>
</li>
<li>item 5 in list level 1</li>
<li>item 6 in list level 1</li>
</ol>
</body>
</html>

In this example, we match <list> elements directly, based on the value of their type attribute (@type), ordered or unordered, and insert an HTML ordered (<ol>) or unordered (<ul>) list as an LRE. The <item> elements are then transformed into <li> elements. Using <xsl:apply-templates> ensures that the children of each matched element are processed. The first <xsl:template> rule creates the <html> and <body> wrappers that make the result browser-readable HTML, as shown in the sample output. Notice also that this stylesheet produces HTML without an <xsl:output> element whose method is set to html: when no output method is specified and the first element of the result is <html> (with no namespace), an XSLT 1.0 processor uses the html output method by default.

A.1.2 ASCII/Text-only Lists from XML Input

Using the <xsl:output> element with its method attribute set to text, we can take the same XML document instance as in Example A-1 and produce plain text from it, again as a numbered list. In Example A-2, we add position() and <xsl:value-of> to generate the sequential numbering that, in the HTML version, the browser supplied when rendering the <ol> tags. Notice especially how <xsl:text> is used to lay out the text output, much as <pre> preserves formatting in HTML.

Since we are not using LREs and the output is text only, the <list> elements contribute nothing to the output themselves; their type attribute values serve only to determine whether the <item>s selected by each template are formatted as ordered or unordered. A predicate ([]) test on the attribute value selects the corresponding <item> children of either kind of <list>. Splitting the start and end tags of <xsl:text> across two lines places a line break inside the element, which produces a hard return for each item.
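Splitting <xsl:text> across lines works, but the line break is easy to lose when a stylesheet is reindented or reflowed. A minimal alternative sketch, intended to be equivalent to the two templates of Example A-2, writes the newline as the character reference &#10; (a line feed) instead; nothing else changes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<!-- Ordered items: number, period, two spaces, text, then a newline -->
<xsl:template match="list[@type='ordered']/item">
  <xsl:value-of select="position()"/>
  <xsl:text>.  </xsl:text>
  <xsl:apply-templates/>
  <!-- the line feed is written as a character reference -->
  <xsl:text>&#10;</xsl:text>
</xsl:template>

<!-- Unordered items: a newline first, then the indented dash -->
<xsl:template match="list[@type='unordered']/item">
  <xsl:text>&#10;    -  </xsl:text>
  <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>
```

Applied to the input of Example A-1, this should produce the same text as the result shown for Example A-2, and the newlines survive any reindenting of the stylesheet.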

Example A-2 XSLT for an XML list converted to an ASCII list.
STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<!-- This stylesheet takes an XML list
     and converts it to a text list format -->
<xsl:template match="list[@type='ordered']/item">
      <xsl:value-of select="position()"/><xsl:text>.  </xsl:text><xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="list[@type='unordered']/item">
<xsl:text>
</xsl:text>
      <xsl:text>    -  </xsl:text><xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
RESULT:
1.  item 1 in list level 1
2.  item 2 in list level 1
3.  item 3 in list level 1
4.  item 4 in list level 1, with a sub-list:
    -  item 1 in list level 2
    -  item 2 in list level 2
    -  item 3 in list level 2
5.  item 5 in list level 1
6.  item 6 in list level 1

Several things are different in this example. First, with <xsl:output> set to method="text", no XML tags appear in the output, so we must generate the numbers for the numbered (ordered) portion of the list ourselves. For each <item> in the <list> whose type attribute is ordered, we use <xsl:value-of> with position() to calculate the number for that item. With <xsl:text>, we make each number be followed by a period and two spaces, for neatly formatted numbering. The list content itself is then processed as the children of each <item> (remember, text() nodes can be children of elements, so when <xsl:apply-templates> processes the children of each <item>, it is the text contained in its tags that is processed). Notice that we have suppressed whitespace-only text nodes using the <xsl:strip-space> element. This matters for the numbering: without it, the whitespace-only text nodes between the <item> elements would also be in the node list being processed, and position() would count them too.

For the unordered list, we rely on <xsl:text> in a similar way to produce the dashes and spaces. Otherwise, <xsl:apply-templates> ensures the output of the actual text content (the text() child nodes) of each <item> child of a <list> element whose type attribute is unordered.

A.1.3 Additional Text-only Formatting from XML with <xsl:number>

With <xsl:number>, we can format the output list in ways that go beyond the basic functionality introduced in the preceding examples. We get essentially the same output as in Example A-2, but need fewer <xsl:text> elements to do so. The <xsl:output> method is still text, and the match on each <item> is still a path expression with a predicate ([]) test on the attribute value to determine whether it is ordered or unordered. Inside the templates, however, we now specify formatting options for the numbers, as shown in Example A-3.

Example A-3 XSLT for XML list converted to ASCII list using <xsl:number>.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            version="1.0">
<xsl:output method="text"/>
<!-- Strip whitespace-only text nodes so that position()
     counts only the <item> elements -->
<xsl:strip-space elements="*"/>
<!-- This stylesheet takes an XML list
     and converts it to a text list format -->
<xsl:template match="list[@type='ordered']/item">
<xsl:number value="position()" format="1. "/>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="list[@type='unordered']/item">
<xsl:text>
</xsl:text>
      <xsl:text>    -  </xsl:text><xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>

Notice that we have left the unordered template unchanged, since an unordered list has no need for <xsl:number>. The format value of <xsl:number> is simply a 1 followed by a period and a space; the formatting that required careful use of <xsl:text> in the previous example is simplified here. You could also specify a format such as "A. " to get uppercase alphabetical numbering (for more, see Chapter 9, Section 9.7). As before, splitting the <xsl:text> element across two lines provides the hard return for each item. Example A-4 applies the "A. " format to the same list.
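The value="position()" attribute is optional. In a minimal sketch, assuming the same input as Example A-1, <xsl:number> is given no value attribute; it then counts the current <item> together with its preceding sibling <item> elements (the default level is single), so the number comes from the source document rather than from the node list being processed. The lowercase Roman token i is used here only to show another format.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>

<xsl:template match="list[@type='ordered']/item">
  <!-- No value attribute: xsl:number counts this item and the
       item elements that precede it among its siblings -->
  <xsl:number format="i. "/>
  <xsl:apply-templates/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

<xsl:template match="list[@type='unordered']/item">
  <xsl:text>&#10;    -  </xsl:text>
  <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>
```

The first lines of the result would read "i. item 1 in list level 1" and "ii. item 2 in list level 1". Because the count is taken from the source tree, the numbering stays correct even if <xsl:apply-templates> is later changed to select only some of the items.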

Example A-4 Using <xsl:number> with formatting.
STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<!-- This stylesheet takes an XML list
     and converts it to a text list format -->
<xsl:template match="list[@type='ordered']/item">
<xsl:number value="position()" format="A. "/>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="list[@type='unordered']/item">
<xsl:text>
</xsl:text>
      <xsl:text>    -  </xsl:text><xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
OUTPUT:
A. item 1 in list level 1
B. item 2 in list level 1
C. item 3 in list level 1
D. item 4 in list level 1, with a sub-list:
    -  item 1 in list level 2
    -  item 2 in list level 2
    -  item 3 in list level 2
E. item 5 in list level 1
F. item 6 in list level 1

A.1.4 Multi-level Text Outline

The XML list in Example A-5 has a more deeply layered structure, with sub-, sub-sub-, and further embedded sub-lists. The same tools, applied in more detail, can produce a representative text rendering of this structure. We begin with a slightly more complex input XML source, and the stylesheet generates a list formatted to look like an outline.

In this example, we're doing a more complex formatting of the output to look like a traditional outline in a text document. Using the format attribute of <xsl:number> makes this possible. In addition, we are also using <xsl:text> to create or "force" indentation of the sub-levels of the outline.

If you look closely at the XSLT stylesheet, you will notice that, apart from a successively longer path in the match attribute of <xsl:template>, each template repeats a fairly regular structure. The <xsl:text> element is used to increase the indent level; then <xsl:number> takes its value from position() and uses a format value whose numbering style changes at each level, as the outline form requires.
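Writing one template per level works, but it stops at the deepest level you wrote a template for. As a hedged alternative (not the approach used in Example A-5), a single template can compute the nesting depth with count(ancestor::list), pick a numbering token for that depth, and build the indentation with substring(). Because the format attribute of <xsl:number> is an attribute value template, the token can be chosen at run time. The token cycle and the six-space indent are assumptions chosen to resemble Example A-5.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="text"/>

<!-- Start with the top-level items; each item then handles its own sub-list -->
<xsl:template match="/">
  <xsl:apply-templates select="list/item"/>
</xsl:template>

<xsl:template match="item">
  <!-- 1 for top-level items, 2 for their children, and so on -->
  <xsl:variable name="depth" select="count(ancestor::list)"/>
  <!-- numbering token for this depth, cycling through I, A, 1, a, i -->
  <xsl:variable name="token"
                select="substring('IA1ai', ($depth - 1) mod 5 + 1, 1)"/>
  <!-- six spaces of indentation per level below the first
       (capped at the length of this literal) -->
  <xsl:value-of
      select="substring('                                            ', 1, ($depth - 1) * 6)"/>
  <!-- no value attribute: the item is counted among its sibling items -->
  <xsl:number format="{$token}. "/>
  <!-- the item's own text, without the whitespace around any sub-list -->
  <xsl:value-of select="normalize-space(text())"/>
  <xsl:text>&#10;</xsl:text>
  <!-- then the items of this item's sub-list, if it has one -->
  <xsl:apply-templates select="list/item"/>
</xsl:template>

</xsl:stylesheet>
```

With the input of Example A-5, items nested deeper than five levels simply restart the I, A, 1, a, i cycle instead of falling through to a template written for a shallower level.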

Example A-5 Outline formatting for a deeply nested list.
INPUT:
<?xml version="1.0"?>
<list>
  <item>item 1 in list level 1</item>
  <item>item 2 in list level 1</item>
  <item>item 3 in list level 1</item>
  <item>item 4 in list level 1, with a sub-list:
    <list>
      <item>item 1 in list level 2</item>
      <item>item 2 in list level 2</item>
      <item>item 3 in list level 2, with a sub-list:
        <list>
          <item>item 1 in list level 3</item>
          <item>item 2 in list level 3</item>
          <item>item 3 in list level 3, with a sub-list:
            <list>
              <item>item 1 in list level 4</item>
              <item>item 2 in list level 4</item>
              <item>item 3 in list level 4, with a sub-list:
                <list>
                  <item>item 1 in list level 5</item>
                  <item>item 2 in list level 5</item>
                  <item>item 3 in list level 5, with a sub-list:
                    <list>
                      <item>item 1 in list level 6</item>
                      <item>item 2 in list level 6</item>
                      <item>item 3 in list level 6</item>
                    </list>
                  </item>
                  <item>item 4 in list level 5</item>
                  <item>item 5 in list level 5</item>
                </list>
              </item>
              <item>item 4 in list level 4</item>
              <item>item 5 in list level 4</item>
            </list>
          </item>
          <item>item 4 in list level 3</item>
          <item>item 5 in list level 3</item>
        </list>
      </item>
      <item>item 4 in list level 2</item>
      <item>item 5 in list level 2</item>
    </list>
  </item>
  <item>item 5 in list level 1</item>
  <item>item 6 in list level 1</item>
</list>
STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<!-- This stylesheet takes an XML list
     and converts it to a text list outline format
     regardless of list type, using list levels -->
<!-- Trim the whitespace around each item's own text, so that
     line breaks come only from the templates below -->
<xsl:template match="text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
<!-- A nested item matches several of the patterns below, and all
     multi-step patterns share the same default priority (0.5), so
     explicit priorities make the deepest matching pattern win -->
<!-- 1st list level -->
<xsl:template match="list/item" priority="1">
<xsl:number value="position()" format="I. "/>
<xsl:apply-templates/>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- 2nd list level: a new line, then the indentation -->
<xsl:template match="list/item/list/item" priority="2">
<xsl:text>
      </xsl:text><xsl:number value="position()" format="A. "/>
<xsl:apply-templates/>
</xsl:template>
<!-- 3rd list level -->
<xsl:template match="list/item/list/item/list/item" priority="3">
<xsl:text>
            </xsl:text><xsl:number value="position()" format="1. "/>
<xsl:apply-templates/>
</xsl:template>
<!-- 4th list level -->
<xsl:template match="list/item/list/item/list/item/list/item" priority="4">
<xsl:text>
                  </xsl:text><xsl:number value="position()" format="a. "/>
<xsl:apply-templates/>
</xsl:template>
<!-- 5th list level -->
<xsl:template match="list/item/list/item/list/item/list/item/list/item" priority="5">
<xsl:text>
                        </xsl:text><xsl:number value="position()" format="i. "/>
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
OUTPUT:
I. item 1 in list level 1
II. item 2 in list level 1
III. item 3 in list level 1
IV. item 4 in list level 1, with a sub-list:
      A. item 1 in list level 2
      B. item 2 in list level 2
      C. item 3 in list level 2, with a sub-list:
            1. item 1 in list level 3
            2. item 2 in list level 3
            3. item 3 in list level 3, with a sub-list:
                  a. item 1 in list level 4
                  b. item 2 in list level 4
                  c. item 3 in list level 4, with a sub-list:
                        i. item 1 in list level 5
                        ii. item 2 in list level 5
                        iii. item 3 in list level 5
                        iv. item 4 in list level 5
                        v. item 5 in list level 5
                  d. item 4 in list level 4
                  e. item 5 in list level 4
            4. item 4 in list level 3
            5. item 5 in list level 3
      D. item 4 in list level 2
      E. item 5 in list level 2
V. item 5 in list level 1
VI. item 6 in list level 1

A.2 MARC Records: The ATLAS Project from ATLA-CERTR at Emory University

MARC records for electronic card catalogues in libraries have been in use for some two to three decades. They have become most prevalent in the last 15 years, as digital equipment has become more powerful and more affordable for libraries, one of the most invaluable and most chronically underfunded knowledge stores of humankind.

The MARC format is very simple, but remarkably powerful for representing and categorizing data. This is accomplished with a carefully counted syntax: the beginning of each record is a directory of fixed-length numeric entries, each identifying one part of the record (title, author, etc.) by a three-digit tag and giving how many characters that part contains and where in the sequence of characters it starts.

MARC is not XML; there are no <> tags in MARC, so MARC records must be converted to XML. Again, because the string-tagging system in MARC is so orderly, this conversion is not difficult. Each part of a library lookup record is identified in MARC by its starting position in the sequence of characters forming the record (an XML start tag corresponds to this) and by its length in characters (an XML end tag marks this point of closure). There is also a three-digit string that identifies what kind of information is represented (title, etc.), like an XML element name.
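To make the counted syntax concrete: in the MARC 21 format, each directory entry is 12 characters long, holding a 3-character tag, a 4-digit field length, and a 5-digit starting character position (counted from the base address of the data). The minimal sketch below splits one entry with substring(), the kind of XPath string work that a conversion tool spares you. The entry value in the $entry parameter is a made-up example, not taken from a real record.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="text"/>

<!-- Hypothetical directory entry: tag 245, length 0082, start 00123 -->
<xsl:param name="entry" select="'245008200123'"/>

<xsl:template match="/">
  <!-- characters 1-3: the tag, kept as a string to preserve leading zeros -->
  <xsl:text>tag=</xsl:text>
  <xsl:value-of select="substring($entry, 1, 3)"/>
  <!-- characters 4-7: field length in characters -->
  <xsl:text> length=</xsl:text>
  <xsl:value-of select="number(substring($entry, 4, 4))"/>
  <!-- characters 8-12: starting character position -->
  <xsl:text> start=</xsl:text>
  <xsl:value-of select="number(substring($entry, 8, 5))"/>
</xsl:template>

</xsl:stylesheet>
```

Run against any input document, it prints tag=245 length=82 start=123.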

Fortunately, while it is possible to wrap a MARC record in XML tags and use a complex set of XPath functions and expressions (e.g., count(), substring-after(), and so on), there is already software that does this. We highly recommend a freeware program called marcxml.exe (alas, Windows only) from Bob Pritchett at Logos, Inc. (http://www.logos.com/marc/marc.asp).

The following archive and access procedure was developed at Emory University by the American Theological Library Association's Center for Electronic Resources in Theology and Religion (ATLA-CERTR). ATLA-CERTR has undertaken a project in which over 50 years of issues from 50 journals in philosophy, ethics, religion, and so forth are being scanned as images for archival integrity and are also being keyed in with the Text Encoding Initiative (TEI) DTD in XML, a service provided by Pacific Data Conversion Corporation. The initial point of access is through MARC records from the comprehensive catalogue of resources maintained by ATLA (a 50 x 50 journal/year subset of the one-of-a-kind resource for theology, ethics, religion, philosophy and biblical studies carefully maintained by the Chicago-based nonprofit organization). To accomplish this task and to maintain a completely standards-based XML solution, the MARC records required translation into XML, as well as further processing.

The Logos software is run from an MS-DOS command line that gives the program name, the input MARC filename, and the output XML filename of your choice. It runs very quickly, even on extremely large MARC files, and produces well-formed XML. A MARC record output in XML by this tool is shown as the input in Example A-6.

The three-digit numbers in the tag attribute values are MARC identifiers for the kind of information (for example, 773 identifies the host journal in which the article appears), and the <subfield>s identify citation details and so forth. One of the first projects at ATLA-CERTR was to test an Oracle database back-end with the MARC records and links to the images of the journal articles. In the case of the record in Example A-6, the pages of this article were 7 through 19 (see the <subfield> whose code is g, inside <data-field tag="773" ind1="0"> at the end of <record>). These numbers were not individually marked up, and their format sometimes varied, so the page numbers were not easy to recognize. Moreover, a large database could contain many page 7s, 8s, 9s, and so forth, so a unique identifier was needed to distinguish each page of this record from pages with the same numbers in other records.

In addition, to make the pages browsable in an ordinary web browser, we needed forward and backward links, and each page had to be formatted with information from the author and title fields as well. Conveniently, the MARC specification sets aside a control field for a unique identifier for each record (one of the very first strings in each record): the field with tag "001", as in the following line taken from Example A-6:

<control-field tag="001">ario19990010001002</control-field> 

First, however, we needed to select the subset of records corresponding to our initial set of test-scanned article page images, drawn from a few journals of various types.

Using the stylesheet shown in Example A-6, we fed over 500MB of data into an XSLT processor to extract the less than 100K of records needed for our initial test. The first template matches the document element, <marc>, which is the standard output of the Logos marcxml.exe conversion program. With <xsl:copy>, we copied that element and processed only those <record> children whose ISSN[1] (contained in a <subfield> child of a <data-field> element, as tested in the predicate ([])) matched one of our chosen ISSNs. Then, to be sure we got all the attributes as well (remember, <xsl:copy> does not copy attributes), we used <xsl:copy-of> with a select of @* to copy every attribute. These copying templates all use a mode named copy. With this simple stylesheet, we got the subset we needed for this particular test. However, the original challenge still remained: to create an individual HTML page for each page of the article cited in the MARC record, with each of those pages made unique among pages with the same numbers in other journals.
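Example A-6 calls <xsl:apply-templates> once per ISSN, so the output is grouped by journal and each ISSN is hard-coded in its own select. A hedged alternative makes a single pass in document order: the wanted ISSNs are kept in one space-delimited parameter and each record is tested against it with contains(). It assumes, as in the sample record, that the ISSN sits in the x subfield of the 773 field; the parameter name issns is illustrative.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
<xsl:output method="xml" indent="yes"/>

<!-- Space-delimited list of wanted ISSNs; the leading and trailing
     spaces let contains() match whole values only -->
<xsl:param name="issns" select="' 0002-7189 0021-9231 0006-0895 0095-571X '"/>

<xsl:template match="marc">
  <xsl:copy>
    <!-- one pass, in document order: keep a record if the ISSN in
         its 773 x subfield appears in $issns -->
    <xsl:apply-templates mode="copy"
        select="record[data-field[@tag='773']/subfield[@code='x']
                       [contains($issns, concat(' ', normalize-space(.), ' '))]]"/>
  </xsl:copy>
</xsl:template>

<!-- Copy an element, all of its attributes, and its content -->
<xsl:template mode="copy" match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates mode="copy"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because issns is a top-level parameter, most XSLT processors let you supply a different list at run time instead of editing the stylesheet.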

Example A-6 Processing an XML version of a MARC record.
INPUT:
<?xml version="1.0"?>
<record type="naa">
      <control-field tag="001">ario19990010001002</control-field>
      <control-field tag="003">ATLA</control-field>
      <control-field tag="005">19990802145817.0</control-field>
      <control-field tag="008">990802s1998    xx       000  0eng d</control-field>
      <data-field tag="040">
            <subfield code="a">ATLA</subfield>
            <subfield code="c">ATLA</subfield>
      </data-field>
      <data-field tag="100" ind1="1">
            <subfield code="a">Malone, Patricia.</subfield>
      </data-field>
      <data-field tag="245" ind1="1" ind2="0">
            <subfield code="a">Religious Education and Prejudice among Students Taking the Course Studies of Religion :</subfield>
            <subfield code="b">[bibliog, tables]</subfield>
      </data-field>
      <data-field tag="650" ind2="4">
            <subfield code="a">Religions</subfield>
            <subfield code="x">Study.</subfield>
      </data-field>
      <data-field tag="650" ind2="4">
            <subfield code="a">Prejudices.</subfield>
      </data-field>
      <data-field tag="650" ind2="4">
            <subfield code="a">Toleration, Religious.</subfield>
      </data-field>
      <data-field tag="650" ind2="4">
            <subfield code="a">Students</subfield>
            <subfield code="x">Religious life.</subfield>
      </data-field>
      <data-field tag="651" ind2="4">
            <subfield code="a">Australia</subfield>
            <subfield code="x">Education.</subfield>
      </data-field>
      <data-field tag="773" ind1="0">
            <subfield code="a">British Journal of Religious Education</subfield>
            <subfield code="g">21 (Aut 1998), p. 7-19</subfield>
            <subfield code="x">0141-6200</subfield>
      </data-field>
</record>
STYLESHEET:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version="1.0">
<!-- the output method is set with the method attribute -->
<xsl:output method="xml" indent="yes"/>
<!-- Semeia, vols 73-76, issues 1 -->
<!-- JBL, Vol 114, 115 issues 1-4 0021-9231 -->
<!-- Biblical Archaeologist, Vol 60, issues 1-4 0006-0895 -->
<!-- JAAR, Vol 65, issue 1 0095-571X -->
<xsl:template match="marc">
<xsl:copy>
<xsl:apply-templates mode="copy"  select="record[data-field/subfield='0002-7189']"/>
<xsl:apply-templates mode="copy"  select="record[data-field/subfield='0021-9231']"/>
<xsl:apply-templates mode="copy"  select="record[data-field/subfield='0006-0895']"/>
<xsl:apply-templates mode="copy"  select="record[data-field/subfield='0095-571X']"/>
</xsl:copy>
</xsl:template>
<xsl:template mode="copy" match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates mode="copy"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

For example, a "start" page was required, as well as a header on each page, so that each page image would be displayed in a browser page that also showed basic reference information in human-readable form, including the title, journal, citation, and author.

The XSLT stylesheet in Example A-7, developed primarily by G. Ken Holman of Crane Softwrights Ltd., is presented in parts below, along with commentary. Assume a starting MARC XML file similar to the sample record in Example A-6. The stylesheet uses XT's xt:document extension element (you could also use saxon:output) to produce multiple output files from a single MARC XML input file.

This stylesheet accomplishes several major tasks. The first template has to set up the basic breakout of page ranges and the "homepage" for all the articles in the chosen subset (in our first demo, for instance, this was per journal issue, within a per-journal directory structure). This first step is performed with the xt:document element, which creates the homepage (index.htm) and individual article pages for each journal.[2] Several qualifications must be considered, because MARC records do not require any specific order or sorting, and the structure of the text() nodes can vary. After the template matches the root node (with /), and once the name of the output document is declared to be index.htm, the contents of that index.htm file are created by the instructions inside the xt:document element.

Example A-7 Processing MARC XML files into HTML pages.
<?xml version="1.0"?>
<!-- extension-element-prefixes marks xt:document as an extension
     element rather than a literal result element -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                xmlns:xt="http://www.jclark.com/xt"
                extension-element-prefixes="xt">
<xsl:output method="html"/>
<xsl:template match="/">
  <xt:document href="index.htm">
    <!-- make index page -->
    <!-- each entry creates a set of files -->
    <xsl:for-each select="//record/*[@tag='773']">
      <xsl:variable name="cite_elem" select="*[@code='g']"/>
      <xsl:variable name="citation" select="normalize-space($cite_elem)"/>
      <xsl:variable name="numseqs" select="substring-after($citation,'p. ')"/>
      <xsl:variable name="start" select="number(substring-before($numseqs,'-'))"/>
      <xsl:variable name="end" select="number(substring-after($numseqs,'-'))"/>
      <!-- xsl:variable, not xsl:param: xsl:param is not allowed inside xsl:for-each -->
      <xsl:variable name="basename" select="concat(../control-field[@tag='001'], '-')"/>
      <!-- check number validity before linking -->
      <xsl:if test="string($start) != 'NaN' and string($end) != 'NaN'">
        <p>
          <a href="{$basename}{$start}.htm">
            <!-- the title (245) is a sibling of the current 773 field -->
            <xsl:value-of select="../*[@tag='245']"/>
          </a>
        </p>
      </xsl:if>
    </xsl:for-each>
  </xt:document>
  <!-- make each set of pages -->
  <xsl:for-each select="//record/*[@tag='773']">
    <xsl:variable name="cite_elem" select="*[@code='g']"/>
    <xsl:variable name="citation" select="normalize-space($cite_elem)"/>
    <xsl:variable name="numseqs" select="substring-after($citation,'p. ')"/>
    <xsl:variable name="start" select="number(substring-before($numseqs,'-'))"/>
    <xsl:variable name="end" select="number(substring-after($numseqs,'-'))"/>
    <xsl:choose>
      <xsl:when test="string($start) = 'NaN' or $start &lt; 0">
        <xsl:message>start value unacceptable</xsl:message>
      </xsl:when>
      <xsl:when test="string($end) = 'NaN' or $end &lt; 0">
        <xsl:message>end value unacceptable</xsl:message>
      </xsl:when>
      <xsl:when test="$start &gt; $end">
        <xsl:message>start value greater than end value</xsl:message>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="makepages">
          <xsl:with-param name="basename" select="concat(../control-field[@tag='001'], '-')"/>
          <!-- with-param name="prev" use default -->
          <xsl:with-param name="start" select="$start"/>
          <xsl:with-param name="end" select="$end"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>
<xsl:template name="makepages">
  <!-- make a set of pages from start to end -->
  <xsl:param name="basename"/>
  <xsl:param name="prev"/>               <!-- number of previous page -->
  <xsl:param name="start" select="-1"/>  <!-- number of this page -->
  <xsl:param name="end" select="-1"/>    <!-- number of last page -->
  <!-- produce this document from set -->
  <xt:document href="{$basename}{$start}.htm">
    <!-- item information: the context node is still the 773 field -->
    <p>
      <xsl:value-of select="."/>
    </p>
    <xsl:choose>
      <!-- previous link -->
      <xsl:when test="not($prev)">
        <p>
          <a href="index.htm">Index page</a>
        </p>
      </xsl:when>
      <xsl:otherwise>
        <p>
          <a href="{$basename}{$prev}.htm">
            Page
            <xsl:value-of select="$prev"/>
          </a>
        </p>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:choose>
      <!-- next link -->
      <xsl:when test="$start &lt; $end">
        <p>
          <a href="{$basename}{$start + 1}.htm">
            Page
            <xsl:value-of select="$start + 1"/>
          </a>
        </p>
      </xsl:when>
      <xsl:otherwise>
        <p>
          <a href="index.htm">Index page</a>
        </p>
      </xsl:otherwise>
    </xsl:choose>
    <p>
      Page
      <xsl:value-of select="$start"/>
    </p>
    <!-- image -->
    <img src="{$basename}{$start}.gif" alt="Page {$start} Image"/>
  </xt:document>
  <!-- produce next document from set, only if available -->
  <xsl:if test="$start &lt; $end">
    <xsl:call-template name="makepages">
      <xsl:with-param name="basename" select="$basename"/>
      <xsl:with-param name="prev" select="$start"/>
      <xsl:with-param name="start" select="$start + 1"/>
      <xsl:with-param name="end" select="$end"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
</xsl:stylesheet>

We begin with what we know. First, the citation data is contained in the <data-field> whose tag attribute is 773. We select each such field with <xsl:for-each>. Since only the citation fields carry this tag value, we can safely choose any element (*) whose tag attribute is 773, as follows:

<xsl:for-each select="//record/*[@tag='773']"> 

Next, we create a series of variables that build on one another to narrow down, and remove ambiguity from, the particular string of page numbers that interests us. Each string function takes string arguments (a node-set argument is converted to the string value of its first node), so storing each intermediate result in a variable keeps every step simple and easy to check. First, a variable called cite_elem is made with <xsl:variable> to identify the particular element whose text() node we're going to use. In this case, it is the <subfield> child of the <data-field> element with tag 773 whose code attribute is g; in MARC parlance, the g subfield holds the citation details for an article. So we select any child element (*) of the current node whose code attribute is g, using a predicate test ([]):

<xsl:variable name="cite_elem" select="*[@code='g']" /> 

Next, we need to make sure there are no extra spaces in the part of the citation with the page numbers, so we apply normalize-space() to the element referenced by the cite_elem variable and store the result in a variable called citation: normalize-space($cite_elem). Note that when you pass a variable as an argument to a function, you must not put it in quotes (that would make it a literal string); instead, you precede its name with the $ token.

<xsl:variable name="citation" select="normalize-space($cite_elem)" /> 

Now that the citation variable has been "cleaned up," we can begin to extract the page-number sequences we'll use. In a variable called numseqs, we take the substring-after() the string 'p. ' (after normalization, exactly one space follows the period). This gives numseqs a value that is the string of page numbers (such as 7-19).

<xsl:variable name="numseqs" select="substring-after($citation,'p. ')" /> 

Next, we pare this down even further, using the dash (-) to split it into the starting number, stored in a variable called start, and the ending number, stored in a variable called end. We use substring-before() for the number preceding the dash and substring-after() for the number following it, and convert each result with number().

<xsl:variable name="start" select="number(substring-before($numseqs,'-'))" />
<xsl:variable name="end" select="number(substring-after($numseqs,'-'))" />
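The variable chain above can be tried on its own. This is a minimal sketch, not the project's stylesheet: it assumes converted records shaped like the sample in the comment (the exact MARC-to-XML layout is an assumption here) and simply prints the start and end page of each 773 citation as plain text.

```xml
<?xml version="1.0"?>
<!-- Sketch: pull the start and end page out of each MARC 773 $g subfield.
     Assumed (hypothetical) input shape:
       <record>
         <data-field tag="773">
           <subfield code="g">Vol. 12, no. 1 (1998), p. 7-19</subfield>
         </data-field>
       </record> -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//record/*[@tag='773']">
      <!-- the g subfield (citation details) of this 773 field -->
      <xsl:variable name="cite_elem" select="*[@code='g']"/>
      <!-- trim and collapse whitespace in its string value -->
      <xsl:variable name="citation" select="normalize-space($cite_elem)"/>
      <!-- everything after "p. ", for example "7-19" -->
      <xsl:variable name="numseqs" select="substring-after($citation, 'p. ')"/>
      <!-- numbers on either side of the dash; NaN if not numeric -->
      <xsl:variable name="start" select="number(substring-before($numseqs, '-'))"/>
      <xsl:variable name="end" select="number(substring-after($numseqs, '-'))"/>
      <xsl:value-of select="concat('start=', $start, ' end=', $end)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample subfield in the comment, this prints start=7 end=19. A citation without "p. " or without a dash yields NaN for the affected value, which is the case the validity test described next is meant to catch.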

We can now work with the start and end numbers, confident that the whitespace normalization and substring functions have isolated the digits we want. If a record contains a typo or an unexpected page-number format, however, number() returns NaN (not a number), so we use <xsl:message> to raise an alert in that case. The test checks that $start and $end are "not NaN"; in other words, the double negative affirms a positive: if they are not non-numbers, they are numbers. (XPath 1.0 has no NaN literal; the usual idioms are string($start) != 'NaN', because a NaN value converts to the string "NaN", or $start = $start, which is false only for NaN.) An <xsl:if> test determines this, and if both values pass, literal result elements (LREs) create the basic HTML tags we need for the links to the first page of each article. We then create a parameter holding the record's unique identifier from the control field; it becomes the base name of the page files used in the links from the index page (concat() appends a dash).

  <xsl:param name="basename" select="concat(../control-field[@tag=001], '-')" /> 

If the NaN test passes, the LRE for the hypertext link points to the file whose name is built from the basename and the start number, followed by .htm:

<a href="{$basename}{$start}.htm"> 

The two attribute value templates (AVTs), marked with {}, supply the basename and the starting page number. For user-friendliness, the link text itself is the title of the article, retrieved by <xsl:value-of>, which selects the contents of the element (*) whose tag attribute is 245, the MARC field for the title.

  <xsl:value-of select="*[@tag='245']" /> 

The content of the xt:document element creates an index page, whose contents are links, which use the titles of the articles from each MARC record to link to the first page image of each respective article.
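The pieces described so far combine into a complete index-page stylesheet. The following is a minimal sketch under stated assumptions, not the ATLA project's code: it writes the index to the main output instead of through xt:document, declares basename as a variable (xsl:param is not allowed inside xsl:for-each), and, because the loop runs over 773 fields, reaches the title (245) and the control field (001) through the parent record (..).

```xml
<?xml version="1.0"?>
<!-- Sketch: index page with one link per MARC record, pointing at the file
     for the first page of the cited article. Element names (record,
     control-field, data-field) follow the text; the layout is assumed. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- the current node in each pass is a 773 field -->
        <xsl:for-each select="//record/*[@tag='773']">
          <xsl:variable name="citation"
              select="normalize-space(*[@code='g'])"/>
          <xsl:variable name="numseqs"
              select="substring-after($citation, 'p. ')"/>
          <xsl:variable name="start"
              select="number(substring-before($numseqs, '-'))"/>
          <xsl:variable name="end"
              select="number(substring-after($numseqs, '-'))"/>
          <!-- record identifier from control field 001, plus a dash -->
          <xsl:variable name="basename"
              select="concat(../control-field[@tag='001'], '-')"/>
          <xsl:choose>
            <!-- number() yields NaN for non-numeric text; string(NaN) is "NaN" -->
            <xsl:when test="string($start) = 'NaN' or string($end) = 'NaN'">
              <xsl:message>
                <xsl:text>Unusable page range in: </xsl:text>
                <xsl:value-of select="$citation"/>
              </xsl:message>
            </xsl:when>
            <xsl:otherwise>
              <p>
                <!-- link text: the article title from sibling field 245 -->
                <a href="{$basename}{$start}.htm">
                  <xsl:value-of select="../*[@tag='245']"/>
                </a>
              </p>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Records whose page range cannot be parsed produce a message from the processor and no link, so the index lists only articles whose page images can actually be reached.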

We now move on to creating the individual HTML files that hold the image of each scanned article page. Another <xsl:for-each> establishes the iteration. It begins with the same set of variables as before (they must be redeclared, because a local variable is visible only to the following siblings of its declaration and their descendants) to establish exactly which range of page numbers to use for making the pages.

<xsl:for-each select="//record/*[@tag='773']">
  <xsl:variable name="cite_elem" select="*[@code='g']" />
  <xsl:variable name="citation" select="normalize-space($cite_elem)" />
  <xsl:variable name="numseqs" select="substring-after($citation,'p. ')" />
  <xsl:variable name="start" select="number(substring-before($numseqs,'-'))" />
  <xsl:variable name="end" select="number(substring-after($numseqs,'-'))" />

We then use another <xsl:choose> and a set of <xsl:when> elements to make sure the page number variables are valid. If they are, the <xsl:otherwise> element is selected and we create the individual HTML files for each page number with a call to a named template.

<xsl:call-template name="makepages"> 

We add <xsl:with-param> elements to pass the basename and the page numbers to the named template:

<xsl:with-param name="basename" select="concat(../control-field[@tag=001], '-')" />
<xsl:with-param name="start" select="$start" />
<xsl:with-param name="end" select="$end" />

Each page needs "next" and "previous" HTML links for "turning the virtual pages" of the journal article. With $basename, $start, and $end passed in for naming and creating the individual HTML output files with xt:document, it is time to look at the named template that actually does the work.

<xsl:template name="makepages"> 

Remember that when name is used with <xsl:template>, a match attribute is not required.

The $basename and $prev parameters are declared without values, because each call supplies them on the fly with <xsl:with-param> inside <xsl:call-template>. A parameter declared with neither a select attribute nor content, and not passed by the caller, defaults to an empty string; the first call omits $prev, which is how the template later recognizes the first page.

<xsl:param name="basename" />
<xsl:param name="prev" />

The $start and $end parameters default to -1, a sentinel value that matters only if a caller forgets to pass them. Each recursive call passes $start + 1, so by comparing $start with $end, the processor knows when "enough" pages have been made.

<xsl:param name="start" select="-1" /> <!-- number of this page -->
<xsl:param name="end" select="-1" />

Next, we make the actual pages, again with xt:document. Because <xsl:call-template> does not change the current node, the current node inside makepages is still the 773 field being processed by the <xsl:for-each> that called it.

<xt:document href="{$basename}{$start}.htm">
  <p>
    <xsl:value-of select="." /> <!-- the current 773 field -->
  </p>

The href of the xt:document extension element concatenates the $basename passed in with <xsl:with-param>, the current $start page number, and an .htm extension; on the first call, this names the file for the first page of the article. Then <p> LREs make the citation information (from the 773 <data-field>) appear as a human-readable reference on each HTML page created.

Next, with <xsl:when> inside an <xsl:choose> element, we test to see if there is a $prev value:

<xsl:when test="not($prev)">
  <p>
    <a href="index.htm">Index page</a>
  </p>
</xsl:when>

If there is not, we just make a link to the homepage (index.htm) since we are on the first page of the article and the only place to go back to is the index page of articles. If there is a $prev value, we skip to the <xsl:otherwise> element to make a link to the previous page.

<xsl:otherwise>
  <p>
    <a href="{$basename}{$prev}.htm">
      Page <xsl:value-of select="$prev" />
    </a>
  </p>
</xsl:otherwise>

The link text, which the browser shows in blue, is the page number preceded by the literal word Page; <xsl:value-of> supplies the page number itself.

We then make the next page link for readers to continue reading additional pages (e.g., from page 8 to page 9), using a new <xsl:choose> element.

<xsl:choose> <!-- next link -->
  <xsl:when test="$start &lt; $end">
    <p>
      <a href="{$basename}{$start + 1}.htm">
        Page <xsl:value-of select="$start + 1" />
      </a>
    </p>
  </xsl:when>

We use <xsl:when> to make sure that $start is less than $end, in other words, that we haven't reached the last page. Remember that makepages is called from an <xsl:for-each> in the previous template and then calls itself once per page, so it needs a termination condition. If $start is less, we make a link with LREs by concatenating the $basename MARC record identifier (which distinguishes this run of pages from those of an article in another issue or volume), $start incremented by 1, and an .htm suffix. The human-readable link text is, again, the word "Page" followed by the number of the next page, retrieved with <xsl:value-of>, which selects $start + 1.

We've not finished making pages yet, only parts of their content; we haven't even put in the image files! The next step is to account for what to do when $start is not less than $end. In other words, what do we do when we are actually on the last ($end) page of an article represented by a MARC record? We already set up a similar link for $prev on the first page: just as the first page can only go "back" to the main index of articles for that journal, the last page can only go "forward" to it.

<xsl:otherwise>
  <p>
    <a href="index.htm">Index page</a>
  </p>
</xsl:otherwise>

In the final part of this xt:document instruction, the image of the scanned article page itself is inserted. We take the $basename identifier and the $start variable and append the .gif suffix. For the pages after the first, $start is simply incremented by 1 on each recursive call; remember that makepages is called once for each MARC record from the <xsl:for-each> and then calls itself once for each page of the article cited in that record. Above the scanned image, a <p> LRE prints the word "Page" and the current page number.

  <p>
    Page <xsl:value-of select="$start" />
  </p> <!-- item information -->
  <img src="{$basename}{$start}.gif" alt="Page {$start} Image" /> <!-- image -->
</xt:document>

The scanned article page image is displayed with the HTML <img> element, below the citation (e.g., journal name, date, page numbers) for the reader's convenience. In practice, the images are stored in directories named and structured by journal ISSN and issue.

We use the <xsl:call-template> element recursively inside an <xsl:if> element to make the rest of the pages. The <xsl:if> tests whether pages remain, that is, whether $start is still less than $end; if so, makepages is called again with $start incremented by 1 and the current page passed as $prev. This portion of the template is recursive, because the template calls itself, and the <xsl:if> test provides the stopping criterion.

<xsl:if test="$start &lt; $end">
  <xsl:call-template name="makepages">
    <xsl:with-param name="basename" select="$basename" />
    <xsl:with-param name="prev" select="$start" />
    <xsl:with-param name="start" select="$start + 1" />
    <xsl:with-param name="end" select="$end" />
  </xsl:call-template>
</xsl:if>
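To check the previous/next logic of makepages without the XT extension, the recursion can be run on its own. This is a sketch, not the project's stylesheet: it writes every "page" as a <div> into one HTML result instead of a separate file per page (with XT, each <div> would be wrapped in xt:document as described above), and the record identifier 1234- and the page range 7 to 9 are made-up values.

```xml
<?xml version="1.0"?>
<!-- Sketch of the makepages recursion without the XT extension: each page
     becomes a div in a single HTML result. The record id "1234-" and the
     pages 7 to 9 are made-up example values. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- first call: no prev, so makepages treats it as the first page -->
        <xsl:call-template name="makepages">
          <xsl:with-param name="basename" select="'1234-'"/>
          <xsl:with-param name="start" select="7"/>
          <xsl:with-param name="end" select="9"/>
        </xsl:call-template>
      </body>
    </html>
  </xsl:template>

  <xsl:template name="makepages">
    <xsl:param name="basename"/>
    <!-- empty string when not passed, so not($prev) is true -->
    <xsl:param name="prev"/>
    <xsl:param name="start" select="-1"/>
    <xsl:param name="end" select="-1"/>
    <div class="page">
      <!-- previous link: the index on the first page -->
      <xsl:choose>
        <xsl:when test="not($prev)">
          <a href="index.htm">Index page</a>
        </xsl:when>
        <xsl:otherwise>
          <a href="{$basename}{$prev}.htm">Page <xsl:value-of select="$prev"/></a>
        </xsl:otherwise>
      </xsl:choose>
      <!-- next link: the index on the last page -->
      <xsl:choose>
        <xsl:when test="$start &lt; $end">
          <a href="{$basename}{$start + 1}.htm">Page <xsl:value-of select="$start + 1"/></a>
        </xsl:when>
        <xsl:otherwise>
          <a href="index.htm">Index page</a>
        </xsl:otherwise>
      </xsl:choose>
      <img src="{$basename}{$start}.gif" alt="Page {$start} Image"/>
    </div>
    <!-- recurse until the last page has been written -->
    <xsl:if test="$start &lt; $end">
      <xsl:call-template name="makepages">
        <xsl:with-param name="basename" select="$basename"/>
        <xsl:with-param name="prev" select="$start"/>
        <xsl:with-param name="start" select="$start + 1"/>
        <xsl:with-param name="end" select="$end"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document (the root template ignores its content), it should produce three page blocks. Page 7 links back to index.htm and forward to 1234-8.htm. Page 8 links to 1234-7.htm and 1234-9.htm. Page 9 links to 1234-8.htm and back to index.htm. Because not($prev) treats the number 0 as false, a previous page numbered 0 would be mistaken for "no previous page"; journal page numbers normally start at 1, so this rarely matters.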

A.3 The Harvard-Kyoto Classics Project with Vedic Literature

The academic work done with XML is varied in many ways and is still in its exploratory stages. An international collaboration between Dr. Michael Witzel of Harvard University and colleagues at Kyoto University, Japan, led by H. Nakani and M. Tokunaga, is working to reconstruct an entire catalogue of the classics of Asian, East Asian, and South Asian sacred literature under the title "Towards a Reconstitution of Classical Studies." It will be done in XML with unprecedented complexity, richness of links, and other aspects of XML technology.

A multitude of ancient texts have been entered into computer formats over the past decades, beginning with the valiant and noble efforts of Lehman and Ananthanarayanan in 1971, with the Rig Veda and Shatapatha Brahmana, two ancient texts of India dating as far back as pre-1800 b.c.e. Formats were a problem, and even when markup was used, differing DTDs and tag names were employed: some texts used SGML/TEI (Text Encoding Initiative) tags, some plain text, and some HTML. In addition, there are multiple versions of the texts, particularly of important texts like the Rig Veda (RV). In this applied example, we will see that XML and XSLT can still be applied readily when the content itself is not known in advance, as long as the logical structure of the tags is.

In Example A-8a, we see one such case, where the version by Lubotsky, designed to remove changes in spelling due to sound combinations of words, is combined with a version maintained in the TITUS project at Frankfurt University by Jost Gippert (http://titus.uni-frankfurt.de/texte/texte.htm). These versions have been woven together with TEI <div> tags, nonconforming IDs (remember, an XML ID must be a valid XML name, so it cannot begin with a digit), and HTML tags.

In this project, we wanted to separate the "L" (Lubotsky) version from the "T" (TITUS) version, for some high-precision searching in pure XML, with no HTML tags, but with proper IDs (which could be validated if needed) and tag names that reflected the actual common naming among scholars (like paada for each little part of a verse, also called a mantra). Notice that if you open this source file in a browser, it looks like Figure A-1, with definition lists (<dl>) used to format the verse numbers and so forth.

Example A-8a XML input from the Rig Veda.
<?xml version="1.0"?>
<html>
  <body bgcolor="#ffffff">
    <div class="Rgveda">
      <hr size="8" />
      <br />
      <font size="5"><b>Mandala I</b></font>
      <br />
      <hr width="200" />
      <br />
      <div1 class="maNDala" id="1">
        <dl>
          <div2 class="hymn" id="1.1">
            <div3 class="verse" id="1.1.1">
              <a name="1.1.1"></a>
              <dt> 1.1.1 </dt>
              <dd>
                <ol class="mantra" type="a">
                  <li class="T"> agni;m ILe puro;hitaM yajJa;sya deva;m Rtvi;jam /
                    <ul>
                      <li class="L"> agni;m ILe puro;hitam </li>
                      <li class="L"> yajJa;sya deva;m Rtvi;jam / </li>
                    </ul>
                  </li>
                </ol>
                <ol type="a" start="3" class="mantra">
                  <li class="T"> ho;tAraM ratnadhA;tamam //
                    <ul>
                      <li class="L"> ho;tAram ratnadhA;tamam // </li>
                    </ul>
                  </li>
                </ol>
              </dd>
            </div3>
            <div3 class="verse" id="1.1.2">
              <a name="1.1.2"></a>
              <dt> 1.1.2 </dt>
              <dd>
                <ol class="mantra" type="a">
                  <li class="T"> agni;H pU;rvebhir R;Sibhir I;Dyo nU;tanair uta; /
                    <ul>
                      <li class="L"> agni;H pU;rvebhiH R;SibhiH </li>
                      <li class="L"> I;DyaH nU;tanaiH uta; / </li>
                    </ul>
                  </li>
                </ol>
                <ol type="a" start="3" class="mantra">
                  <li class="T"> sa; devA;M; e;ha; vakSati //
                    <ul>
                      <li class="L"> sa; devA;n A; iha; vakSati // </li>
                    </ul>
                  </li>
                </ol>
              </dd>
            </div3>
          </div2>
        </dl>
      </div1>
    </div>
  </body>
</html>

Figure A-1. Browser view of the hybrid XML-HTML Rig Veda.


This HTML formatting is not necessary for raw processing in XML, where we wanted to add tags for specific rhythms of meter, the deities addressed in the hymn, the author, and so forth, and to search individual word strings. This did not require the HTML format, the extra <dl>, <dt>, and <dd> tags, or the "T" version. So, a simple series of XPath matches gives us the "L" versions in the output by selecting them with xsl:apply-templates. In addition, we can easily remove the "T," or TITUS, versions: each is the text of an <li class="T"> inside an ordered list (<ol>), while the "L" versions are in unordered lists (<ul>) nested inside those items. If the template for <dd> applies templates only to its descendant <ul> elements, the <ol> wrappers and the "T" text are never processed, so they are simply left out of the output. We still preserve the TEI <div> tag structure, however. The first template matches on the root and processes all children with <xsl:apply-templates>, as shown in Example A-8b.

Example A-8b Stylesheet to create HTML.
<?xml version="1.0"?>
<xsl:transform version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="no"/>
  <xsl:strip-space elements="*"/>
  <!-- 1 -->
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>
  <!-- 2 -->
  <xsl:template match="li">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="text()"/>
    </xsl:copy>
  </xsl:template>
  <!-- 3 -->
  <xsl:template match="div3">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="dt"/>
      <xsl:apply-templates select="dd"/>
    </xsl:copy>
  </xsl:template>
  <!-- 4 -->
  <xsl:template match="dd">
    <xsl:copy>
      <xsl:apply-templates select=".//ul"/>
    </xsl:copy>
  </xsl:template>
  <!-- 5 -->
  <xsl:template match="dl">
    <xsl:copy>
      <xsl:apply-templates select=".//div2"/>
    </xsl:copy>
  </xsl:template>

The second template matches <li> (list item) elements and copies them with <xsl:copy>. Its two <xsl:apply-templates> instructions carry over the attributes and the text nodes, but not any nested lists.

  <!-- 6 -->
  <xsl:template match="div">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select=".//div1"/>
    </xsl:copy>
  </xsl:template>
  <!-- 7 -->
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>

The <div3> element is matched in the third template and copied, and its definition list children, <dt> and <dd>, are processed along with the attributes.

Those <dd> elements are matched and copied in the fourth template, which processes only their descendant <ul> elements. This preserves the "L" versions, which are contained in the <ul> tags, and passes them to the output result tree while skipping the <ol> lists that hold the "T" text.

With the fifth template matching on <dl>, the main definition list is matched and preserved with <xsl:copy>, and all its <div2> descendants are processed with <xsl:apply-templates>. This maintains the basic TEI tag format (though it is not a valid TEI document in this use).

The <div> and <div1> tags are then processed to the output tree in the sixth template, along with their attributes (we'll use these for IDs later).

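Finally, the seventh template is a generic identity template: it copies any other element, attribute, or text node (for example, <html>, <body>, <div1>, <div2>, <dt>, and <ul>) and processes its children, so text and attributes reach the output.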

The result is a Lubotsky-only version of the sample file, shown in Example A-8c. This is well-formed XML, which will also display in a browser.

Next, we want to remove all the HTML tags: <html>, <body>, <dl>, and so forth. Further, we want to change the abstract TEI <div> tags to the terms scholars use for the levels of division in the Rig Veda (book for div1, hymn for div2, and verse for div3) and label the individual unordered list segments (<ul>) with the more common term paada, meaning foot. In Sanskrit and Vedic, a foot of divine meter is in a sense considered a footstep of the gods, so the term applies. If the word seems familiar, the Western podiatrist, who treats feet, takes the name from a Greek word that shares the same Indo-European root.

Example A-8c Resulting HTML file after "cleaning" input tags.
<?xml version="1.0" encoding="utf-8"?>
<html>
  <body bgcolor="#ffffff">
    <div class="Rgveda">
      <div1 class="maNDala" id="1">
        <dl>
          <div2 class="hymn" id="1.1">
            <div3 class="verse" id="1.1.1">
              <dt> 1.1.1 </dt>
              <dd>
                <ul>
                  <li class="L"> agni;m ILe puro;hitam </li>
                  <li class="L"> yajJa;sya deva;m Rtvi;jam / </li>
                </ul>
                <ul>
                  <li class="L"> ho;tAram ratnadhA;tamam // </li>
                </ul>
              </dd>
            </div3>
            <div3 class="verse" id="1.1.2">
              <dt> 1.1.2 </dt>
              <dd>
                <ul>
                  <li class="L"> agni;H pU;rvebhiH R;SibhiH </li>
                  <li class="L"> I;DyaH nU;tanaiH uta; / </li>
                </ul>
                <ul>
                  <li class="L"> sa; devA;n A; iha; vakSati // </li>
                </ul>
              </dd>
            </div3>
          </div2>
        </dl>
      </div1>
    </div>
  </body>
</html>

We also need to recreate the id attributes with an alphabetic prefix of rv at each level, using <xsl:attribute> and <xsl:text> to put rv in front of the <xsl:value-of> of the existing id attribute. In each case, the LREs <book>, <hymn>, and <verse> replace div1, div2, and div3, substituting familiar names for the more abstract TEI element-type names that scholars unfamiliar with that otherwise versatile academic DTD might not recognize. The <xsl:apply-templates> element processes the children of each matched element to the result tree. See Example A-9a.

Now, we want to remove the dt elements that remain. We do this with an empty <xsl:template> that matches on them and puts nothing in their place. Following that, the <paada> LRE replaces the remaining HTML <ul> tags for the individual verse portions.

The result output is shown in Example A-9b, now nearly ready for detailed research, with only the basic tags, so more complex XSLT stylesheets can be added (you can see other such stylesheets in a prior publication by one of the authors at http://www1.shore.net/~india/ejvs, http://www.asiatica.org/publications/ijts/default.asp, and http://nautilus.shore.net/~india/ejvs/ejvs0601/ejvs0601.html). This new stripped-down version is a much smaller and correspondingly speedier file to use. In case you're wondering at this point, agni is the Vedic word for fire, and this hymn is a famous praise of fire in the rituals. The first line says, "Agni I call upon, the priest." The fire was considered a priest because, as the smoke rose to the sky, it "carried" the message of the ritual to the deities (see http://vedavid.org/diss/ for more).

Example A-9a XHTML-to-XML conversion and calculation of id attributes with XSLT.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="div1">
    <book>
      <xsl:attribute name="id">
        <xsl:text>rv</xsl:text><xsl:value-of select="@id" />
      </xsl:attribute>
      <xsl:apply-templates />
    </book>
  </xsl:template>
  <xsl:template match="div2">
    <hymn>
      <xsl:attribute name="id">
        <xsl:text>rv</xsl:text><xsl:value-of select="@id" />
      </xsl:attribute>
      <xsl:apply-templates />
    </hymn>
  </xsl:template>
  <xsl:template match="div3">
    <verse>
      <xsl:attribute name="id">
        <xsl:text>rv</xsl:text><xsl:value-of select="@id" />
      </xsl:attribute>
      <xsl:apply-templates select="*" />
    </verse>
  </xsl:template>
  <xsl:template match="dt" />
  <xsl:template match="ul">
    <paada>
      <xsl:apply-templates />
    </paada>
  </xsl:template>
</xsl:stylesheet>
Example A-9b Resulting XML file.
<?xml version="1.0"?>
<book id="rv1">
  <hymn id="rv1.1">
    <verse id="rv1.1.1">
      <paada> agni;m ILe puro;hitam yajJa;sya deva;m Rtvi;jam / </paada>
      <paada> ho;tAram ratnadhA;tamam // </paada>
    </verse>
    <verse id="rv1.1.2">
      <paada> agni;H pU;rvebhiH R;SibhiH I;DyaH nU;tanaiH uta; / </paada>
      <paada> sa; devA;n A; iha; vakSati // </paada>
    </verse>
  </hymn>
</book>
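The <xsl:attribute>, <xsl:text>, and <xsl:value-of> combination in Example A-9a can also be written with attribute value templates, where {@id} is evaluated against the element the template matched. A sketch of the same stylesheet in that style, which should produce the same result as Example A-9b when run on Example A-8c:

```xml
<?xml version="1.0"?>
<!-- Same renaming as Example A-9a, with each id written as an
     attribute value template on the literal result element. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- div1/div2/div3 become book/hymn/verse; "rv" prefixes the old id -->
  <xsl:template match="div1">
    <book id="rv{@id}"><xsl:apply-templates/></book>
  </xsl:template>

  <xsl:template match="div2">
    <hymn id="rv{@id}"><xsl:apply-templates/></hymn>
  </xsl:template>

  <xsl:template match="div3">
    <verse id="rv{@id}"><xsl:apply-templates select="*"/></verse>
  </xsl:template>

  <!-- drop the verse-number labels -->
  <xsl:template match="dt"/>

  <!-- each remaining unordered list becomes one paada -->
  <xsl:template match="ul">
    <paada><xsl:apply-templates/></paada>
  </xsl:template>
</xsl:stylesheet>
```

The AVT form is shorter; the <xsl:attribute> form remains useful when an attribute's name or value needs instructions such as <xsl:if> or <xsl:number> to build it.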

Now, the only other thing needed to make this a more usable text is to mark the individual <paada> elements in more detail. In Vedic parlance, the few-syllable lines that make up a mantra are lettered a, b, c, d, and so on. These verses have a through d sections (some go up to g and h), and only every other one is marked, for example, a and c: in the sample, the first <paada> of each verse holds lines a and b, and the second holds line c. To aid identification in our new workhorse text of the Rig Veda, we're going to add id attributes to the <paada>s and format them with <xsl:number>. Example A-10a presents the stylesheet. As usual, the first template matches on the root, ensuring that the entire input XML document instance is processed.

Example A-10a Using XSLT to enhance data identification in XML: basic source copying.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="paada">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:value-of select="../@id" />
        <!-- an attribute may contain only text, so no xsl:copy here -->
        <xsl:number format="a" value="position() - 2"
                    letter-value="alphabetic" />
      </xsl:attribute>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="*|@*|text()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

We begin by matching on <paada>, which is copied with <xsl:copy>. Then the <xsl:attribute> instruction adds an attribute named id. To get the proper verse id as the base of the id for each <paada>, we select its value (we could also use AVTs ({}) here; see Chapter 6, Section 6.6.1): the simple path ../@id, passed to <xsl:value-of>, gets the parent's id attribute. Next, we want to calculate the sub-identifiers a, c, e, and so on for each <paada>, based on its position. The <xsl:number> instruction allows us to format the number as a letter. The value is the current position minus two: the identity template's select="*|@*|text()" hands the processor the verse's id attribute first and then a whitespace-only text node before each <paada>, so the first <paada> is at position 3 and the second at position 5, giving 1 (a) and 3 (c). This relies on the input being indented as in Example A-9b. The children are processed to the output XML document instance with <xsl:apply-templates>.

The last template ensures that any unmatched elements, attributes, and text nodes are copied to the output.
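Because the position() - 2 arithmetic depends on whitespace-only text nodes in the input, it breaks if the input is not indented or if whitespace is stripped. A more robust alternative, sketched below, derives the letter from the number of preceding <paada> siblings instead. It keeps the same assumption as the original, that each <paada> after the first begins two lines later (a, c, e, ...), so it is a sketch for this sample, not a general rule for Vedic meters.

```xml
<?xml version="1.0"?>
<!-- Variant of Example A-10a: number each paada from its preceding
     siblings, so the verse's id attribute and whitespace-only text
     nodes no longer affect the result. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="paada">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:value-of select="../@id"/>
        <!-- 1st paada: 1 = "a", 2nd: 3 = "c", 3rd: 5 = "e" -->
        <xsl:number format="a"
            value="count(preceding-sibling::paada) * 2 + 1"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- identity template for everything else -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Run on Example A-9b, this should give the same ids as Example A-10b (rv1.1.1a, rv1.1.1c, and so on), whether or not the input is indented.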

The resulting output XML document instance, ready for detailed book, hymn, verse, and now paada identification, is shown in Example A-10b.

Just to take this one step further, let's use XSLT to search the new text we've created. A simple template does this. Along with the standard <xsl:output>, we remove whitespace-only text nodes with <xsl:strip-space> and match on the path to a <paada> to start the template (you could imagine replacing "verse" with an author, for instance, to get all <paada>s by that author). Then, a simple <xsl:if> test with the contains() function searches for a <paada> containing ILe (contains() is case-sensitive). When one is found, <xsl:copy-of> copies its ancestor <verse>, so we get the entire verse as the context for our match, as shown in Example A-11a.

Example A-11b is the output result of the search. It is important to remember, however, that XSLT is not a query language, nor was it intended to be one. It works well for many query-like tasks, but, as the saying goes, when all you have is a hammer, everything looks like a nail. At some point, querying with XSLT and XPath runs into practical limits, including processing power. The reader might notice that, in effect, we're using XPath with XSLT here to "query" in a database sense. The W3C's XML query language, XQuery, which grew out of earlier proposals such as XQL, is being designed together with the next versions of XPath and XSLT. Until then, content-based selections like this one from a large resource remain quite efficient, depending on how much detail the tagging of your source provides.

Example A-10b Resulting XML document instance.
<book id="rv1">
  <hymn id="rv1.1">
    <verse id="rv1.1.1">
      <paada id="rv1.1.1a"> agni;m ILe puro;hitam yajJa;sya deva;m Rtvi;jam / </paada>
      <paada id="rv1.1.1c"> ho;tAram ratnadhA;tamam // </paada>
    </verse>
    <verse id="rv1.1.2">
      <paada id="rv1.1.2a"> agni;H pU;rvebhiH R;SibhiH I;DyaH nU;tanaiH uta; / </paada>
      <paada id="rv1.1.2c"> sa; devA;n A; iha; vakSati // </paada>
    </verse>
  </hymn>
</book>
Example A-11a A simple content-based search query with XSLT.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*" />
  <xsl:template match="verse/paada">
    <xsl:if test="contains(., 'ILe')">
      <xsl:copy-of select="ancestor::verse"/>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
Example A-11b Resulting XML document instance from XSLT search query.
<?xml version="1.0" encoding="utf-8"?>
<verse id="rv1.1.1">
  <paada id="rv1.1.1a"> agni;m ILe puro;hitam yajJa;sya deva;m Rtvi;jam / </paada>
  <paada id="rv1.1.1c"> ho;tAram ratnadhA;tamam // </paada>
</verse>
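Example A-11a copies a verse once for every matching <paada>, so a verse with two matching lines would appear twice. Several matching verses would also produce output with more than one root element, which is not a well-formed XML document. A variant that tests at the verse level and wraps the hits in a single element avoids both problems. The search string is a top-level parameter (the name term and the wrapper element <results> are illustrative choices). The test sits in a select expression rather than a match pattern because XSLT 1.0 does not allow variable references in match patterns.

```xml
<?xml version="1.0"?>
<!-- Variant of Example A-11a: each matching verse is copied once, and
     all hits share one root element so the result is well-formed. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- search string; contains() is case-sensitive -->
  <xsl:param name="term" select="'ILe'"/>

  <xsl:template match="/">
    <results>
      <!-- verses having at least one paada that contains the term -->
      <xsl:copy-of select="//verse[paada[contains(., $term)]]"/>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

Run on Example A-10b, this should return verse rv1.1.1 inside <results>. Most processors let you set a top-level parameter such as term when the transformation is run, so the same stylesheet can search for other words.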

Remember that, using XSLT, we can add more detailed categories, like who wrote a hymn, its meter, and other information. This makes it possible to further contextualize the search with XPath, such as requesting all <paada>s composed by Agastya, in the jagati meter, dedicated to Agni, containing the word tanuu. This and other plans are in the works from Harvard and Kyoto, including a use of Topic Maps and XLink.
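Once such categories are tagged, a query like the one just described becomes a single XPath expression. The markup here is hypothetical: Example A-10b has no author, deity, or meter attributes, so the attribute names and values below are assumptions made only to illustrate the shape of such a query.

```xml
<?xml version="1.0"?>
<!-- Hypothetical markup, NOT present in Example A-10b: assumes
     <hymn author="..." deity="..."> and <verse meter="...">. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <results>
      <!-- paadas by Agastya, to Agni, in jagati meter, containing tanuu -->
      <xsl:copy-of select="//hymn[@author='Agastya' and @deity='Agni']
                             /verse[@meter='jagati']
                             /paada[contains(., 'tanuu')]"/>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

Each predicate narrows one level of the book/hymn/verse/paada hierarchy, which is why the renaming and id work in the earlier examples pays off for searching.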

[1] Journals and books have unique numerical identifiers called ISSN and ISBN numbers, respectively.

[2] The sample XML file on the CD only contains several MARC records, as the ATLA database is quite large and only a few records are required to make this example illustrative.



XSLT and XPATH: A Guide to XML Transformations
ISBN: 0130404462
Year: 2005
