|
3.4. Transform XML SpreadsheetsXML spreadsheets provide Excel data in a format that can be easily used by other applications or transformed into presentation documents, such as HTML web pages. For either task, you often need to modify the content of the XML spreadsheet, and the best way to do that is with XSLT. You can use XSLT to perform a wide variety of transformations, such as:
Note: Now that you've saved a workbook as XML, what can you do with it? XSLT lets you transform XML into other formats, extract or highlight information, or make other changes. 3.4.1. How to do itTo transform an XML spreadsheet, follow these general steps:
There are three different ways to perform the transformation. The following table compares each of those techniques, and the sections that follow describe each of the preceding steps in more detail.
3.4.2. How to create XSLTXSLT is a simple language containing looping, decision-making, evaluation, branching, and functional statements. It follows the same conventions as XML and its sole purpose is to interpret and transform valid XML documents into some other text. Excel qualifies the names of the XML nodes it creates with namespaces from the Microsoft Offices schemas. An Excel workbook defines the following namespaces: <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" xmlns:html="http://www.w3.org/TR/REC-html40" xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml"> Notice that the default namespace (xmlns, highlighted in bold ) is urn:schemas-microsoft-com:office:spreadsheet . This is the same as the namespace for the ss prefix (xmlns:ss, also in bold ). You use this ss namespace prefix when referring to workbook nodes in your XSLT file. Different nodes in the XML spreadsheet use different default namespaces. For instance, the DocumentProperties node uses the following default namespace: <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> Therefore, when referring to the DocumentProperties node or its children, define a prefix for the namespace urn:schemas-microsoft-com:office:office in your XSLT, and use that prefix to refer to those nodes. If you don't do this, your XSLT won't recognize the nodes in the XML spreadsheet. This is a very common problem, and it should be the first place you look if your XSLT outputs nothing. It is convenient to copy the namespace definitions from the XML spreadsheet Worksheet node to your XSLT stylesheet. For instance, the following XSLT example uses the copied ss namespace to locate nodes in an XML spreadsheet: <?xml version="1.0"?> <!-- Strip.xslt transforms a XML Spreadsheet to its bare essentials --> <xsl:stylesheet version="1.0" xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"> <xsl:output method="xml" indent="yes" /> <xsl:template match="ss:Workbook"> <xsl:processing-instruction name="mso-application">prog </xsl:processing-instruction> <xsl:element name="ss:Workbook"> <xsl:copy-of select="ss:Styles" /> <xsl:for-each select="ss:Worksheet"> <xsl:if test="count(ss:Table/ss:Row/ss:Cell/ss:Data) > 0"> <xsl:copy-of select="." /> </xsl:if> </xsl:for-each> </xsl:element> </xsl:template> </xsl:stylesheet> The preceding transformation copies worksheets that contain data, and formatting styles used by cells in those worksheets, into a new XML spreadsheet file. Empty worksheets, document properties, and other items are simply omitted. Excel still recognizes the resulting output as an XML spreadsheet since it conforms to the Excel schema and contains the mso-application processing instruction. To see how this transformation works:
Figure 3-2. An XML spreadsheet with empty worksheets removed by a transformation3.4.3. How to transform in codeAs mentioned previously, there are several ways to transform XML. Transforming XML from Visual Basic code uses the Microsoft XML object library to call the Microsoft XML Parser (msxml4.dll). The Microsoft XML object library also provides a means to create new XML files; navigate between nodes; copy, delete, and add nodes; and more. To perform a transformation in code, follow these steps:
For example, the following code loads the TestStrip.xml XML spreadsheet and Strip.xslt transformation, processes the transformation, and saves the result: Sub Strip( ) ' Requires reference to Microsoft XML Dim xdoc As New DOMDocument, xstyle As New DOMDocument Dim xml As String xdoc.Load (ThisWorkbook.path & "\TestStrip.xml") xstyle.Load (ThisWorkbook.path & "\Strip.xslt") xml = xdoc.transformNode(xstyle) SaveFile xml, "Out.xml" End Sub Sub SaveFile(content As String, fileName As String) ' Requires reference to Microsoft Scripting Runtime Dim fso As New FileSystemObject, strm As TextStream fileName = ThisWorkbook.path & "\" & fileName If fso.FileExists(fileName) Then fso.DeleteFile(fileName) Set strm = fso.CreateTextFile(fileName) strm.Write (content) strm.Close End Sub The preceding SaveFile helper procedure is necessary because the transformNode method returns a string containing the XML created by the transformation. Once the XML is saved, you can open the file by double-clicking on it in Windows Explorer, or by using the following code: Application.Workbooks.Open ("out.xml") 3.4.4. How to transform from the command lineYou can also perform transformations using the command-line transformation utility (msxsl.exe). Msxsl.exe is available from Microsoft for free in the MSDN download area. It is a small shell executable that simply calls the Microsoft XML Parser to perform the transformation. For example, the following command line transforms the TestStrip.xml file, using the Strip.xslt transformation shown previously, and writes the output to Out.xml : msxsl TestStrip.xml Strip.xslt -o Out.xml The output is the same as that created by using the DOMDocument object's TRansformNode method shown in the preceding section. The command-line utility allows you to automate transformations using batch files rather than Visual Basic code. 3.4.5. How to transform in the browserAnother way to perform a transformation is to include an xml-stylesheet processing instruction in the XML spreadsheet. The mso-application instruction supersedes other instructions, so you must replace that processing instruction in order to have a browser perform the translation. The following XML shows the changes you must make to the XML spreadsheet file: deletions are shown in strikethrough and additions are shown in bold : <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="worksheet.xslt"?> <?mso-application prog?> <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" xmlns:o="urn:schemas-microsoft-com:office:office" ... Now when a user opens the XML file, the browser transforms and displays the file (Figure 3-3). Note: Not all browsers support this technique. Be sure to test your transformations in non-Microsoft browsers. Figure 3-3. Transforming an XML spreadsheet in the browser3.4.6. How it worksThe transformation shown in Figure 3-3 converts cells in a worksheet to HTML table elements. It also displays document properties of the workbook. The transformation is performed by the following XSLT fragment: <xsl:template match="ss:Workbook"> <html> <body> <h1>Display XML Spreadsheets as HTML Tables</h1> Author: <xsl:value-of select="o:DocumentProperties/o:Author" /> <br /> LastSaved: <xsl:value-of select="o:DocumentProperties/o:LastSaved" /> <br /> Number of worksheets: <xsl:value-of select="count(ss:Worksheet)" /> <xsl:for-each select="ss:Worksheet"> <h2><xsl:value-of select="@ss:Name" /></h2> <table border="1" frame="box"> <xsl:for-each select="ss:Table/ss:Row"> <tr> <xsl:for-each select="ss:Cell/ss:Data"> <td><xsl:value-of select="." /></td> </xsl:for-each> </tr> </xsl:for-each> </table> </xsl:for-each> </body> </html> The advantage of using a processing instruction to perform the transformation is that you don't significantly alter the spreadsheet. You can switch the file back to an Excel XML spreadsheet simply by removing the xml-stylesheet instruction and replacing the mso-application instruction, as shown in the following XSLT fragment: <xsl:template match="ss:Workbook"> <xsl:processing-instruction name="xml-stylesheet"> type="text/xsl" href="Worksheet.xslt"</xsl:processing-instruction> <xsl:copy-of select="." /> </xsl:template> To transform the file back into an XML spreadsheet, simply change the xsl:processing-instruction element: <xsl:processing-instruction name="mso-application"> prog</xsl:processing-instruction> When a user requests an XML file that includes an xml-stylesheet processing instruction, the file is downloaded and the transformation is processed on the user's machine. That takes more time than if the XML file had already been transformed, but any changes to the XSLT are automatically reflected, because the transformation is performed dynamically. 3.4.7. How to reset file associationsIn order for the mso-application processing instruction to work correctly when the user opens the file from Windows Explorer, the XML file type must be associated with the Microsoft Office XML Editor (MsoXmlEd.Exe). Some applications, such as Mozilla, can change this association and break the ability to easily open these files from Explorer. If this happens, you can reset the XML file type by following these steps:
3.4.8. What about...
|
|