With XSLT, translate data in a source file by looking up the translation in a lookup table, using FIPS codes as an example.

While writing XSLT transformations, sometimes you need to convert phrases or data elements from the source file. For example, you might be transforming data from one schema to another, and the target schema might use different enumerated values. The source data might contain event-time, while the target schema requires eventTime. XSLT techniques to make these conversions are well known, and even though they may not exactly be hacks, they are well worth including here.

The approach is to create a lookup table that pairs the input and output phrases. There are two variations:

- The lookup table is an external XML file.
- The lookup table is embedded into the XSLT stylesheet.

With either, the lookup can be done with or without the help of keys, which will often speed up access. These variations are illustrated in this hack.

3.27.1 The FIPS Code Example

For a concrete example, this hack translates FIPS (Federal Information Processing Standards) numerical codes into city and state names. FIPS codes are published by the United States government. For example, the state of Indiana has the FIPS code 18, and the city of Bethel Village, which is in Indiana, has a code of 5050. The hack changes these codes into their natural language names.

Here is part of the source document (fips_lu_data.xml in Example 3-49).

Example 3-49. fips_lu_data.xml

    <places>
      <place>
        <state>17</state>
        <city>14000</city>
      </place>
      <place>
        <state>17</state>
        <city>57381</city>
      </place>
      <!-- ... -->
    </places>

We use just a few cities and states in order to have a short example. Here is the lookup table (fips.xml in Example 3-50).

Example 3-50. fips.xml

    <fips>
      <state fips="17" name="ILLINOIS">
        <city fips="57381" name="PALOS HEIGHTS"/>
        <city fips="35307" name="HINSDALE"/>
        <city fips="20149" name="DIXMOOR"/>
        <city fips="84090" name="YOUNGSDALE"/>
        <city fips="14000" name="CHICAGO"/>
        <city fips="70629" name="SOUTH CHICAGO HEIGHTS"/>
      </state>
      <state fips="18" name="INDIANA">
        <city fips="1810" name="ANTIOCH"/>
        <city fips="36000" name="INDIANAPOLIS"/>
        <city fips="5050" name="BETHEL VILLAGE"/>
        <city fips="17740" name="DENHAM"/>
      </state>
      <state fips="26" name="MICHIGAN">
        <city fips="74010" name="SIMMONS"/>
        <city fips="22000" name="DETROIT"/>
        <city fips="43180" name="KINCHELOE"/>
        <city fips="73260" name="SHERMAN TWP"/>
      </state>
    </fips>

The easiest approach is to make the lookup table an external file, and not to use keys. The following stylesheet illustrates this variation (fips_no_keys.xsl in Example 3-51).

Example 3-51. fips_no_keys.xsl

    <?xml version="1.0"?>
    <!--= = = = = = = = = = = = = = = = = = = = = = = = = = = =
    fips_no_keys.xsl

    Purpose:        Demonstrate using a lookup table located in
                    an external document.
    Author:         Thomas B Passin
    Creation date:  7 March 2004

    Demonstrates using a lookup table and the use of document()
    to refer to nodes in the table.
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = =-->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- indent='yes' is used just to try to get a more readable
         output. It has no effect on the functionality of the output.
    -->
    <xsl:output encoding='utf-8' indent='yes'/>

    <!-- It is better to declare these global variables here rather
         than to just use the expressions inline.
         The lookup table is contained in the file "fips.xml".
    -->
    <xsl:variable name='cities'
        select='document("fips.xml")/fips/state/city'/>
    <xsl:variable name='states'
        select='document("fips.xml")/fips/state'/>

    <xsl:template match='places'>
      <places>
        <xsl:apply-templates select='place'/>
      </places>
    </xsl:template>

    <!-- This template demonstrates two methods to specify which part
         of the lookup table to use. Note the use of current(), which
         lets us get the context-derived value into the lookup table
         predicate. Otherwise the use of "city" or "state" would be
         taken to be elements in the lookup table, not in the source
         document. The variable is another way to achieve the same
         thing.
    -->
    <xsl:template match='place'>
      <xsl:variable name='city-fips' select='city'/>
      <place>
        <state><xsl:value-of
            select='$states[@fips=current()/state]/@name'/></state>
        <city><xsl:value-of
            select='$cities[@fips=$city-fips]/@name'/></city>
      </place>
    </xsl:template>

    </xsl:stylesheet>

3.27.2 Putting the Lookup Table in the Stylesheet

If the lookup table is relatively short, you can put it into the stylesheet itself. You need to put the top-level element of the table in a namespace, and you need to declare that namespace on the stylesheet element. You refer to nodes within the stylesheet itself using document("") (note the empty string). The revised stylesheet looks like this (fips_internal_codes.xsl in Example 3-52).

Example 3-52. fips_internal_codes.xsl

    <?xml version="1.0" encoding='utf-8'?>
    <!--= = = = = = = = = = = = = = = = = = = = = = = = = = = =
    fips_internal_codes.xsl

    Purpose:        Demonstrate using a lookup table located
                    within the stylesheet itself.
    Author:         Thomas B Passin
    Creation date:  7 March 2004

    Demonstrates inserting the lookup table using a namespace, and
    the use of document("") to refer to nodes in the stylesheet itself.
    = = = = = = = = = = = = = = = = = = = = = = = = = = = = =-->

    <!-- Note the use of exclude-result-prefixes to prevent the "lu"
         namespace from appearing in the output document (where it
         would be harmless but mildly annoying).
    -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:lu='http://example.com/lookup'
        exclude-result-prefixes='lu'>

    <!-- indent='yes' is used just to try to get a more readable
         output. It has no effect on the functionality of the output.
    -->
    <xsl:output encoding='utf-8' indent='yes'/>

    <!-- It is better to declare these global variables here rather
         than to just use the expressions inline.
    -->
    <xsl:variable name='cities'
        select='document("")/xsl:stylesheet/lu:fips/state/city'/>
    <xsl:variable name='states'
        select='document("")/xsl:stylesheet/lu:fips/state'/>

    <xsl:template match='places'>
      <places>
        <xsl:apply-templates select='place'/>
      </places>
    </xsl:template>

    <!-- This template demonstrates two methods to specify which part
         of the lookup table to use. Note the use of current(), which
         lets us get the context-derived value into the lookup table
         predicate. Otherwise the use of "city" or "state" would be
         taken to be elements in the lookup table, not in the source
         document. The variable is another way to achieve the same
         thing.
    -->
    <xsl:template match='place'>
      <xsl:variable name='city-fips' select='city'/>
      <place>
        <state><xsl:value-of
            select='$states[@fips=current()/state]/@name'/></state>
        <city><xsl:value-of
            select='$cities[@fips=$city-fips]/@name'/></city>
      </place>
    </xsl:template>

    <!-- The internal lookup table. The exact namespace used does not
         matter as long as there is one.
    -->
    <lu:fips>
      <state fips="17" name="ILLINOIS">
        <city fips="57381" name="PALOS HEIGHTS"/>
        <city fips="35307" name="HINSDALE"/>
        <city fips="20149" name="DIXMOOR"/>
        <city fips="84090" name="YOUNGSDALE"/>
        <city fips="14000" name="CHICAGO"/>
        <city fips="70629" name="SOUTH CHICAGO HEIGHTS"/>
      </state>
      <state fips="18" name="INDIANA">
        <city fips="1810" name="ANTIOCH"/>
        <city fips="36000" name="INDIANAPOLIS"/>
        <city fips="5050" name="BETHEL VILLAGE"/>
        <city fips="17740" name="DENHAM"/>
      </state>
      <state fips="26" name="MICHIGAN">
        <city fips="74010" name="SIMMONS"/>
        <city fips="22000" name="DETROIT"/>
        <city fips="43180" name="KINCHELOE"/>
        <city fips="73260" name="SHERMAN TWP"/>
      </state>
    </lu:fips>

    </xsl:stylesheet>

3.27.3 Running the Hack

In the following, we use the Instant Saxon XSLT processor [Hack #32]. Assuming that Instant Saxon is on your path, and that both data and stylesheet are in the current working directory, type the following command:

    saxon -o fips_out.xml fips_lu_data.xml fips_no_keys.xsl

Here the input data is in fips_lu_data.xml, and the external lookup table fips.xml is in the same directory as the stylesheet. If you move the data file or the stylesheet, adjust the paths on the command line. If you move fips.xml, adjust the document() call in the stylesheet, because a relative URI there is resolved against the location of the stylesheet.

If you have a large lookup table, you can use fips_keys.xsl instead of fips_no_keys.xsl to improve performance.

To use the internal lookup table, type this command:

    saxon -o fips_out.xml fips_lu_data.xml fips_internal_codes.xsl

All the variations give the same results (Example 3-53).

Example 3-53. fips_out.xml

    <?xml version="1.0" encoding="utf-8"?>
    <places>
      <place>
        <state>ILLINOIS</state>
        <city>CHICAGO</city>
      </place>
      <place>
        <state>ILLINOIS</state>
        <city>PALOS HEIGHTS</city>
      </place>
      <place>
        <state>MICHIGAN</state>
        <city>DETROIT</city>
      </place>
      <place>
        <state>INDIANA</state>
        <city>BETHEL VILLAGE</city>
      </place>
    </places>

Tom Passin
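The text refers to a keyed variant, fips_keys.xsl, but its listing does not appear in this section. The following is a minimal sketch of how such a variant could look in XSLT 1.0, assuming the external fips.xml of Example 3-50 sits next to the stylesheet. It is not necessarily the author's file, and the key names state-by-fips and city-by-fips and the variable name lookup are made up for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical keyed variant; key and variable names are assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output encoding="utf-8" indent="yes"/>

  <!-- Index the lookup table's state and city elements by their fips
       attribute. Keys are built per document, so these apply to
       fips.xml once it becomes the context document. -->
  <xsl:key name="state-by-fips" match="state" use="@fips"/>
  <xsl:key name="city-by-fips" match="city" use="@fips"/>

  <!-- Root node of the external lookup table. -->
  <xsl:variable name="lookup" select="document('fips.xml')"/>

  <xsl:template match="places">
    <places>
      <xsl:apply-templates select="place"/>
    </places>
  </xsl:template>

  <xsl:template match="place">
    <!-- Capture the source values before the context changes. -->
    <xsl:variable name="state-fips" select="string(state)"/>
    <xsl:variable name="city-fips" select="string(city)"/>
    <place>
      <!-- In XSLT 1.0, key() searches the document that contains the
           context node, so switch the context to the lookup table. -->
      <xsl:for-each select="$lookup">
        <state><xsl:value-of
            select="key('state-by-fips', $state-fips)/@name"/></state>
        <city><xsl:value-of
            select="key('city-by-fips', $city-fips)/@name"/></city>
      </xsl:for-each>
    </place>
  </xsl:template>

</xsl:stylesheet>
```

Like the stylesheets above, this sketch matches a city by its city code alone and should give the same output as Example 3-53 for the sample data. The xsl:for-each over a single node exists only to change the context document; the two variables carry the source values across that change, playing the role that current() and $city-fips play in the versions without keys.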