8.6. Converting XML

You might want to convert an XML document into something else, such as an HTML document, a text file, or an XML file in a different format. The standard method for converting an XML document to another format is XSLT (eXtensible Stylesheet Language Transformations). XSLT is complex, so we do not cover every detail of its vocabulary here. If you want to learn more about XSLT, you can find the full specification at http://www.w3.org/TR/xslt.

If XSLT doesn't do what you want, you might need to resort to other solutions. The XML_Transformer PEAR class is one possible solution. With XML_Transformer, you can do XML transformations with PHP without the need for XSLT or external libraries.

8.6.1. XSLT

To use the XSLT functions in PHP, you need a recent version of the libxslt library, which implements the necessary functions for transformations. On UNIX, you enable the extension by adding --with-xsl to your configure line and recompiling. On Windows, copy the libxslt.dll file from the dlls directory of the PHP distribution to a location on your path (for example, c:\winnt\system32), and then uncomment the extension=php_xsl.dll line in the php.ini file.
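Before going further, it can help to confirm that the extension is actually available. The following is a minimal sketch, not part of the original example; the variable name $hasXsl and the messages are only illustrative:

```php
<?php
// Check that the XSL extension is loaded and its main class exists.
$hasXsl = extension_loaded('xsl') && class_exists('XSLTProcessor');

if ($hasXsl) {
    echo "The XSL extension is available.\n";
} else {
    echo "The XSL extension is missing; check php.ini or your configure line.\n";
}
```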

As explained earlier, you can use XSLT to transform your XML documents into another format. We're going to transform a file similar to our RSS file into an X(HT)ML file by applying a stylesheet to the XML document. Every XSLT transformation uses a stylesheet, which maps the elements in the source XML file to a template for each element. The first part of the XSL stylesheet contains options for input and output. We want to output the result as an HTML document with the media type 'text/html' in the ISO-8859-1 encoding. The namespace prefix for XSL is declared as xsl, meaning that every element related to XSL has the prefix xsl: in front of the tag name (for example, xsl:output):

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output encoding='ISO-8859-1'/>
   <xsl:output method='html' indent='yes' media-type='text/html'/>

The templates follow the header section shown earlier. The match attribute of the xsl:template element selects the elements in the document to which the template applies. The first template matches all "rdf" elements in the document. Because this is the root element of our document, the template is applied only once. When an element is matched by a template, the contents of the xsl:template are copied to the output document, except for elements belonging to the XSL namespace, which are instructions to the processor:

 <xsl:template match="rdf">
   <html>
     <head>
       <title><xsl:value-of select="channel/title"/></title>
     </head>
     <body>
       <xsl:apply-templates/>
     </body>
   </html>
 </xsl:template>

The <xsl:value-of /> tag "returns" the value of the element or attribute specified in its select attribute. In the template shown here, the content of the title child of the channel element is inserted into the <title /> tag in the output document. References are usually relative to the element that has been matched.

If you want to include the contents of an attribute, rather than an element, you need to add @ as a prefix; for example, to select the "href" attribute in <a href="http://www.example.org"></a>, you can use <xsl:value-of select="@href"/> (provided that the element matched by the template is the "a" element).
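As a rough illustration of attribute selection, the following sketch applies a tiny stylesheet to the <a> element mentioned above. The inline stylesheet and the variable names are assumptions made for this example and are not part of the RSS stylesheet; the PHP classes are the ones used later in this section:

```php
<?php
// Hypothetical one-template stylesheet: match <a> and output its href attribute.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="a">
    <xsl:value-of select="@href"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$style = new DOMDocument();
$style->loadXML($xslSource);

$source = new DOMDocument();
$source->loadXML('<a href="http://www.example.org"></a>');

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// With method="text", the result is just the selected attribute value.
$href = trim($proc->transformToXml($source));
echo $href, "\n";
```

Replacing select="@href" with select="." would select the (empty) text content of the element instead of the attribute.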

Another special tag in the previous snippet, the <xsl:apply-templates /> tag, tells the XSL processor to continue processing child elements.

 <xsl:template match="channel">
   <h1><xsl:value-of select="title"/></h1>
   <p><xsl:value-of select="description"/></p>
   <xsl:apply-templates select="items"/>
 </xsl:template>

If you don't want to process all child elements of the currently matched element, you can choose which ones to process with the select attribute of the <xsl:apply-templates /> tag, which works much like the match attribute of the <xsl:template /> tag. In the previous template, we continue processing child elements of the type "items" only, skipping "title", "link", and "description".

 <xsl:template match="Seq">
   <ul>
     <xsl:apply-templates />
   </ul>
 </xsl:template>

 <xsl:key name="l" match="item" use="@about"/>

 <xsl:template match="li">
   <li>
     <a href="#{generate-id(key('l',@resource))}">
       <xsl:value-of select="key('l',@resource)/title"/>
     </a>
   </li>
 </xsl:template>

 <xsl:template match="item">
   <hr />
   <a name="{generate-id()}">
     <h2><xsl:value-of select="title"/></h2>
     <p>
       <xsl:value-of select="description"/>
     </p>
     <p>
       <xsl:element name="a">
         <xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute>
         <xsl:text>[more]</xsl:text>
       </xsl:element>
     </p>
   </a>
 </xsl:template>
 </xsl:stylesheet>

The rest of the stylesheet creates cross-links between the li children of the "items" tag and the corresponding <item/> elements. The XSLT magic used for this is beyond the scope of this chapter. Other interesting XSL elements in the template for "item" are <xsl:element/> and <xsl:attribute/>, which enable you to use a value from the source document as an attribute of an output element. Writing <a href="<xsl:value-of select="link"/>"> would not be valid, because XSL stylesheets are themselves XML documents, and XML does not allow a tag inside an attribute value. Instead, you need to create an element in the output document with <xsl:element name="a"/> and add the attributes with <xsl:attribute name="href"/>, as shown in the previous template.
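To see <xsl:element/> and <xsl:attribute/> in isolation, the following sketch builds a single link from a stripped-down <item>. The inline stylesheet and variable names are assumptions made for this illustration and only mirror the relevant part of the "item" template:

```php
<?php
// Hypothetical stylesheet: build an <a> element whose href comes from <link>.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/item">
    <xsl:element name="a">
      <xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute>
      <xsl:text>[more]</xsl:text>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
XSL;

$style = new DOMDocument();
$style->loadXML($xslSource);

$source = new DOMDocument();
$source->loadXML('<item><link>http://qa.php.net/</link></item>');

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// The result is an <a> element with the link text copied into href.
$html = $proc->transformToXml($source);
echo $html;
```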

The modified RSS file is included here with all the namespace prefixes removed, because they would have made the example unnecessarily complex:

 <?xml version="1.0" encoding="UTF-8"?>
 <rdf>
   <channel about="http://www.php.net/">
     <title>PHP: Hypertext Preprocessor</title>
     <link>http://www.php.net/</link>
     <description>The PHP scripting language web site</description>
     <items>
       <Seq>
         <li resource="http://qa.php.net/" />
         <li resource="http://www.php.net/news.rss" />
       </Seq>
     </items>
   </channel>
   <item about="http://qa.php.net/">
     <title>PHP 4.3.0RC4 Released</title>
     <link>http://qa.php.net/</link>
     <description>
       Despite our best efforts, it was necessary to make one more
       release candidate, hence PHP 4.3.0RC4.
     </description>
   </item>
   <item about="http://www.php.net/news.rss">
     <title>PHP news feed available</title>
     <link>http://www.php.net/news.rss</link>
     <description>
       The news of PHP.net is available now in RSS 1.0 format via our
       new news.rss file.
     </description>
   </item>
 </rdf>

Now that we have both the stylesheet and the XML source file, we can use PHP to apply the stylesheet to the XML document. We use the XSLT functions with the files php.net.xsl and php.net-stripped.rss, and echo the output to the screen:

 <?php
 // Load the stylesheet and hand it to the XSLT processor
 $dom = new DOMDocument();
 $dom->load("php.net.xsl");
 $proc = new XSLTProcessor();
 $proc->importStylesheet($dom);

 // Load the source document and apply the stylesheet to it
 $xml = new DOMDocument();
 $xml->load('php.net-stripped.rss');
 $string = $proc->transformToXml($xml);
 echo $string;
 ?>

Tip

After you have loaded a stylesheet with $dom->load() and imported it into the processor, you can use the same processor to transform multiple XML documents by calling $proc->transformToXml($xml) once for each document. This saves the overhead of parsing the XSLT stylesheet again.
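The tip can be sketched as follows. The inline stylesheet, the sample documents, and the variable names are assumptions made for this illustration; in practice, you would load your own files:

```php
<?php
// Hypothetical stylesheet that outputs the title of an <item> as plain text.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/item">
    <xsl:value-of select="title"/>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Parse and import the stylesheet once.
$style = new DOMDocument();
$style->loadXML($xslSource);
$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// Reuse the same processor for every source document.
$sources = array(
    '<item><title>PHP 4.3.0RC4 Released</title></item>',
    '<item><title>PHP news feed available</title></item>',
);

$titles = array();
foreach ($sources as $sourceXml) {
    $doc = new DOMDocument();
    $doc->loadXML($sourceXml);
    $titles[] = trim($proc->transformToXml($doc));
}

print_r($titles);
```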


When you call this script through your browser, the result is something like what is displayed in Figure 8.2.

Figure 8.2. Output of the XSLT transformation.


In addition to the transformToXml() method, two more XSLT processing methods are available to convert documents: transformToDoc() and transformToUri(). transformToDoc() returns a DOMDocument that can then be processed further with the standard DOM functions described earlier. transformToUri() writes the result to a URI, given as the second parameter to the method, as shown here:

 <?php $proc->transformToUri($xml, "/tmp/crap.html"); ?> 
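For comparison, here is a minimal sketch of transformToDoc(), whose result can be inspected with the DOM functions before it is serialized. The inline stylesheet, the sample data, and the variable names are assumptions made for this illustration:

```php
<?php
// Hypothetical stylesheet that turns a list of items into a <ul>.
$xslSource = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>
  <xsl:template match="/items">
    <ul>
      <xsl:for-each select="item">
        <li><xsl:value-of select="title"/></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
XSL;

$style = new DOMDocument();
$style->loadXML($xslSource);

$source = new DOMDocument();
$source->loadXML(
    '<items>'
    . '<item><title>PHP 4.3.0RC4 Released</title></item>'
    . '<item><title>PHP news feed available</title></item>'
    . '</items>'
);

$proc = new XSLTProcessor();
$proc->importStylesheet($style);

// transformToDoc() returns a DOMDocument instead of a string.
$result = $proc->transformToDoc($source);

// The result can be examined with the usual DOM methods.
$count = $result->getElementsByTagName('li')->length;
$first = $result->getElementsByTagName('li')->item(0)->textContent;

echo "Number of list items: $count\n";
echo $result->saveXML();
```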



    PHP 5 Power Programming
    ISBN: 013147149X
    Year: 2003
    Pages: 240
