Hack 56 Use Lookup Tables with XSLT to Translate FIPS Codes

   


With XSLT, translate data in a source file by looking up the translation in a lookup table, using FIPS codes as an example.

While writing XSLT transformations, sometimes you need to convert phrases or data elements from the source file. For example, you might be transforming data from one schema to another, and the target schema might use different enumerated values. The source data might contain event-time, while the target schema requires eventTime.

XSLT techniques to make these conversions are well known, and even though they may not exactly be hacks, they are well worth including here. The approach is to create a lookup table that pairs the input and output phrases. There are two variations:

  1. The lookup table is an external XML file.

  2. The lookup table is embedded into the XSLT stylesheet.

With either variation, you can do the lookup with or without keys; keys often speed up access, especially when the table is large. These variations are illustrated in this hack.

3.27.1 The FIPS Code Example

For a concrete example, this hack translates FIPS (Federal Information Processing Standards) numerical codes into city and state names. FIPS codes are published by the United States government. For example, the state of Indiana has the FIPS code 18, and the city of Bethel Village, which is in Indiana, has a code of 5050. The hack changes these codes into their natural language names.

Here is part of the source document (fips_lu_data.xml in Example 3-49).

Example 3-49. fips_lu_data.xml
<places>
    <place>
        <state>17</state>
        <city>14000</city>
    </place>
    <place>
        <state>17</state>
        <city>57381</city>
    </place>
    <!-- ... -->
</places>

We use just a few cities and states in order to have a short example. Here is the lookup table (fips.xml in Example 3-50).

Example 3-50. fips.xml
<fips>
    <state fips="17" name="ILLINOIS">
        <city fips="57381" name="PALOS HEIGHTS"/>
        <city fips="35307" name="HINSDALE"/>
        <city fips="20149" name="DIXMOOR"/>
        <city fips="84090" name="YOUNGSDALE"/>
        <city fips="14000" name="CHICAGO"/>
        <city fips="70629" name="SOUTH CHICAGO HEIGHTS"/>
    </state>
    <state fips="18" name="INDIANA">
        <city fips="1810" name="ANTIOCH"/>
        <city fips="36000" name="INDIANAPOLIS"/>
        <city fips="5050" name="BETHEL VILLAGE"/>
        <city fips="17740" name="DENHAM"/>
    </state>
    <state fips="26" name="MICHIGAN">
        <city fips="74010" name="SIMMONS"/>
        <city fips="22000" name="DETROIT"/>
        <city fips="43180" name="KINCHELOE"/>
        <city fips="73260" name="SHERMAN TWP"/>
    </state>
</fips>

The easiest approach is to make the lookup table an external file, and not to use keys. The following stylesheet illustrates this variation (fips_no_keys.xsl in Example 3-51).

Example 3-51. fips_no_keys.xsl
<?xml version="1.0"?>
<!--= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    fips_no_keys.xsl
    Purpose:
        Demonstrate using a lookup table located
        in an external document.
    Author: Thomas B Passin
    Creation date: 7 March 2004

    Demonstrates using a lookup table and the use of
    document() to refer to nodes in the table.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!--
    indent='yes' is used just to try to get a more
    readable output.  It has no effect on the
    functionality of the output.
-->
<xsl:output encoding='utf-8' indent='yes'/>

<!--
    It is better to declare these global variables here
    rather than to just use the expressions inline.
    The lookup table is contained in the file "fips.xml".
-->
<xsl:variable name='cities'
    select='document("fips.xml")/fips/state/city'/>
<xsl:variable name='states'
    select='document("fips.xml")/fips/state'/>

<xsl:template match='places'>
<places>
    <xsl:apply-templates select='place'/>
</places>
</xsl:template>

<!--
    This template demonstrates two methods to specify
    which part of the lookup table to use.  Note the use
    of current(), which lets us get the context-derived
    value into the lookup table predicate.  Otherwise the
    use of "city" or "state" would be taken to be elements
    in the lookup table, not in the source document.

    The variable is another way to achieve the same
    thing.
-->
<xsl:template match='place'>
    <xsl:variable name='city-fips' select='city'/>
    <place>
        <state><xsl:value-of
            select='$states[@fips=current()/state]/@name'/></state>
        <city><xsl:value-of
            select='$cities[@fips=$city-fips]/@name'/></city>
    </place>
</xsl:template>

</xsl:stylesheet>
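
The keyed variation (fips_keys.xsl, mentioned under Running the Hack) is not listed in this section. As a rough sketch of how keys could be applied to the external table, and not the author's actual file, the stylesheet below declares two xsl:key indexes and looks codes up with key(). The key names state-by-fips and city-by-fips and the variable names are made up for illustration. In XSLT 1.0, key() searches only the document that contains the context node, so the sketch uses xsl:for-each to switch the context to fips.xml before calling it. Like fips_no_keys.xsl, it assumes that city codes are unique across the whole table.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a keyed lookup against the external table fips.xml.
     Key and variable names are illustrative assumptions. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output encoding="utf-8" indent="yes"/>

  <!-- Index state and city elements by their fips attribute.
       Keys are built per document, so these apply to fips.xml
       once it becomes the context document. -->
  <xsl:key name="state-by-fips" match="state" use="@fips"/>
  <xsl:key name="city-by-fips" match="city" use="@fips"/>

  <!-- Root node of the lookup document. -->
  <xsl:variable name="lookup" select="document('fips.xml')"/>

  <xsl:template match="places">
    <places>
      <xsl:apply-templates select="place"/>
    </places>
  </xsl:template>

  <xsl:template match="place">
    <!-- Capture the codes while the context is still the source
         document; inside the for-each the context changes. -->
    <xsl:variable name="state-fips" select="state"/>
    <xsl:variable name="city-fips" select="city"/>
    <place>
      <!-- $lookup holds a single node, so this loop runs once and
           serves only to make fips.xml the context for key(). -->
      <xsl:for-each select="$lookup">
        <state><xsl:value-of
            select="key('state-by-fips', $state-fips)/@name"/></state>
        <city><xsl:value-of
            select="key('city-by-fips', $city-fips)/@name"/></city>
      </xsl:for-each>
    </place>
  </xsl:template>

</xsl:stylesheet>
```

For the sample data, this should produce the same output as fips_no_keys.xsl. If city codes could repeat across states, the city key would need to combine the state and city codes, for example with concat() in the use attribute.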

3.27.2 Putting the Lookup Table in the Stylesheet

If the lookup table is relatively short, you can put it into the stylesheet itself. You need to add a namespace to the top-level element of the table, and you need to add that namespace to the stylesheet element. You refer to nodes within the stylesheet itself using document("") (note the empty string).

With those changes, the complete stylesheet looks like this (fips_internal_codes.xsl in Example 3-52).

Example 3-52. fips_internal_codes.xsl
<?xml version="1.0" encoding='utf-8'?>
<!--= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
    fips_internal_codes.xsl
    Purpose:
        Demonstrate using a lookup table located
        within the stylesheet itself.
    Author: Thomas B Passin
    Creation date: 7 March 2004

    Demonstrates inserting the lookup table using a
    namespace, and the use of document("") to refer
    to nodes in the stylesheet itself.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =-->
<!--
    Note the use of exclude-result-prefixes to prevent
    the "lu" namespace from appearing in the output
    document (where it would be harmless but mildly
    annoying).
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lu='http://example.com/lookup'
    exclude-result-prefixes='lu'>

<!--
    indent='yes' is used just to try to get a more readable
    output.  It has no effect on the functionality of the
    output.
-->
<xsl:output encoding='utf-8' indent='yes'/>

<!--
    It is better to declare these global variables here rather
    than to just use the expressions inline.
-->
<xsl:variable name='cities'
    select='document("")/xsl:stylesheet/lu:fips/state/city'/>
<xsl:variable name='states'
    select='document("")/xsl:stylesheet/lu:fips/state'/>

<xsl:template match='places'>
<places>
    <xsl:apply-templates select='place'/>
</places>
</xsl:template>

<!--
    This template demonstrates two methods to specify which
    part of the lookup table to use.  Note the use of
    current(), which lets us get the context-derived value
    into the lookup table predicate.  Otherwise the use of
    "city" or "state" would be taken to be elements in the
    lookup table, not in the source document.

    The variable is another way to achieve the same
    thing.
-->
<xsl:template match='place'>
    <xsl:variable name='city-fips' select='city'/>
    <place>
        <state><xsl:value-of
            select='$states[@fips=current()/state]/@name'/></state>
        <city><xsl:value-of
            select='$cities[@fips=$city-fips]/@name'/></city>
    </place>
</xsl:template>

<!--
    The internal lookup table.  The exact namespace used does
    not matter as long as there is one.
-->
<lu:fips>
    <state fips="17" name="ILLINOIS">
        <city fips="57381" name="PALOS HEIGHTS"/>
        <city fips="35307" name="HINSDALE"/>
        <city fips="20149" name="DIXMOOR"/>
        <city fips="84090" name="YOUNGSDALE"/>
        <city fips="14000" name="CHICAGO"/>
        <city fips="70629" name="SOUTH CHICAGO HEIGHTS"/>
    </state>
    <state fips="18" name="INDIANA">
        <city fips="1810" name="ANTIOCH"/>
        <city fips="36000" name="INDIANAPOLIS"/>
        <city fips="5050" name="BETHEL VILLAGE"/>
        <city fips="17740" name="DENHAM"/>
    </state>
    <state fips="26" name="MICHIGAN">
        <city fips="74010" name="SIMMONS"/>
        <city fips="22000" name="DETROIT"/>
        <city fips="43180" name="KINCHELOE"/>
        <city fips="73260" name="SHERMAN TWP"/>
    </state>
</lu:fips>

</xsl:stylesheet>
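
Keys can also be combined with an embedded table. The following is a minimal sketch under assumptions, not a listing from the hack: the key names are invented, and the table is abridged to two states to keep the example short. The match patterns name the lu:fips wrapper so that only table entries are indexed, and xsl:for-each over document('') makes the stylesheet itself the context document for key().

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only: keyed lookup against a table embedded in the
     stylesheet. Key names are illustrative; the table is abridged. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lu="http://example.com/lookup"
    exclude-result-prefixes="lu">

  <xsl:output encoding="utf-8" indent="yes"/>

  <!-- Index only the state and city elements inside lu:fips. -->
  <xsl:key name="state-by-fips" match="lu:fips/state" use="@fips"/>
  <xsl:key name="city-by-fips" match="lu:fips/state/city" use="@fips"/>

  <xsl:template match="places">
    <places>
      <xsl:apply-templates select="place"/>
    </places>
  </xsl:template>

  <xsl:template match="place">
    <!-- Read the codes before the context leaves the source document. -->
    <xsl:variable name="state-fips" select="state"/>
    <xsl:variable name="city-fips" select="city"/>
    <place>
      <!-- document('') is the stylesheet's own root node; the loop
           runs once and makes it the context for key(). -->
      <xsl:for-each select="document('')">
        <state><xsl:value-of
            select="key('state-by-fips', $state-fips)/@name"/></state>
        <city><xsl:value-of
            select="key('city-by-fips', $city-fips)/@name"/></city>
      </xsl:for-each>
    </place>
  </xsl:template>

  <!-- Abridged copy of the lookup table from fips.xml. -->
  <lu:fips>
    <state fips="17" name="ILLINOIS">
      <city fips="57381" name="PALOS HEIGHTS"/>
      <city fips="14000" name="CHICAGO"/>
    </state>
    <state fips="18" name="INDIANA">
      <city fips="5050" name="BETHEL VILLAGE"/>
    </state>
  </lu:fips>

</xsl:stylesheet>
```

With the abridged table, any code that is not listed simply yields an empty element, so the full table from Example 3-52 would be needed to translate every place in the sample data.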

3.27.3 Running the Hack

The following commands use the Instant Saxon XSLT processor [Hack #32]. Assuming that Instant Saxon is on your path, and that both the data and the stylesheet are in the current working directory, type the following command:

saxon -o fips_out.xml fips_lu_data.xml fips_no_keys.xsl

Here the input data is in fips_lu_data.xml, and the external lookup table fips.xml is in the same directory as the stylesheet. If you move fips.xml elsewhere, adjust the path in the document() call in the stylesheet; a relative path there is resolved against the location of the stylesheet, not the current working directory. If you have a large lookup table, you can use fips_keys.xsl instead of fips_no_keys.xsl to improve performance. To use the internal lookup table, type this command:

saxon -o fips_out.xml fips_lu_data.xml fips_internal_codes.xsl

All the variations give the same results (Example 3-53).

Example 3-53. fips_out.xml
<?xml version="1.0" encoding="utf-8"?>
<places>
   <place>
      <state>ILLINOIS</state>
      <city>CHICAGO</city>
   </place>
   <place>
      <state>ILLINOIS</state>
      <city>PALOS HEIGHTS</city>
   </place>
   <place>
      <state>MICHIGAN</state>
      <city>DETROIT</city>
   </place>
   <place>
      <state>INDIANA</state>
      <city>BETHEL VILLAGE</city>
   </place>
</places>

Tom Passin



XML Hacks: 100 Industrial-Strength Tips and Tools
ISBN: 0596007116
Year: 2006
Pages: 156
