9.5 XSL: The Internet's Rosetta Stone | Internet-Enabled Business Intelligence



	Internet-Enabled Business Intelligence By William A. Giovinazzo

	Chapter 9. eXtensible Markup Language

9.5 XSL: The Internet's Rosetta Stone

In 1799, near the Lower Egyptian town of Rosetta, French troops discovered a dark stone slab that turned out to be of great importance in understanding ancient cultures. The stone, known today as the Rosetta Stone, bore three different scripts: hieroglyphic, demotic, and Greek. Each script recorded the same text. In 1822, French Egyptologist Jean-François Champollion was able to decipher this stone, making it possible to translate papyri and other stones. This translation provided the clues essential to deciphering other ancient Egyptian inscriptions.

Today we are in a new age. Information is no longer etched in stone but sent electronically through wires and fiber-optic cables. Throughout this chapter, we have discussed how XML can be used as a means of providing structure to this data so that it is more easily communicated. We have not discussed, however, the weightier matter of how XML data is translated from one format to another. Of course, we are always left with the alternative of translating this data programmatically. While this is certainly an option, it defeats the purpose of XML being a simple way to communicate information between systems.

The solution, as one can no doubt surmise from the title of this section, is XSL, the eXtensible Stylesheet Language. XSL turns out to be the Rosetta Stone of the Internet age. It contains two XML applications, which operate independently of one another: one to transform documents (XSL Transformations, or XSLT) and one to format the resultant output (XSL Formatting Objects). It is important to note the independence of the two applications. In many environments, the representation or format of the data is not relevant. We have seen this in the B2B space, where XML is used merely to translate the data from one system's format to another. There is no human review of the data, so formatting is unnecessary.

Eventually, the data needs to be displayed; this is especially the case in the BI arena. XSL therefore deals with the display of information, providing transformations to HTML as well as XML. Transformations to XML documents are not restricted to the original set of tags established in the original document's DTD. Transformations can generate completely new sets of tags in the resultant XML or HTML document.

The transformation can occur on either the server or the client. The transformation can be performed in advance and stored on the server. We will see an example of this when reviewing a real-world use of XML in an IEBI environment. The transformation can also occur when requested on the client. Figure 9.10 presents the simplicity of an XSL transformation. As shown in the diagram, the transformation receives as input the XSL style sheet, the input document, and optionally the DTD of the input document. The input document can be written in any XML-based markup language. The transformation process requires that the input document parse into a tree structure. This means that all input documents must be well-formed XML; validation against a DTD is optional. HTML or SGML input that is not well-formed XML must first be converted, for example to XHTML.

Figure 9.10. XSL transformation.

graphics/09fig10.gif
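The requirement that the input document parse into a tree can be checked before any style sheet is applied. The following is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree parser. The helper name is_well_formed and the two sample strings are assumptions made for illustration; they do not come from the report used in this chapter.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses into a single XML element tree."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample: every tag is properly nested and closed.
good = "<report><title>Third Quarter</title></report>"
# Hypothetical sample: HTML-style markup with an unclosed <br> tag,
# which cannot be turned into a tree.
bad = "<report><title>Third Quarter<br></title></report>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that fails this check gives the transformation no tree to walk, so it would have to be repaired before any template rule could be compared with its nodes.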

The second input is actually a specific type of XML document, an XSL style sheet. This style sheet contains a series of transformation rules, each referred to as a template. Each rule is a template element containing a pattern and a description of the output. The pattern is the value of the match attribute of the template element. This pattern is compared with the nodes of the input XML document. Nodes matching the pattern of a template element are formatted according to the content of that template element. The output is then put into an XSL transformation output tree. XSL's use of the input tree structure is what necessitates the input document's compliance with XML's well-formedness rules: a document whose tags are not properly nested and closed cannot be read as a tree.
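To make the idea of a rule as a pattern plus an output description concrete, here is a small Python sketch. It is an illustration only, not how an XSL processor is implemented: the TemplateRule class, the find_rule helper, and the sample document are all hypothetical, and the pattern is reduced to a bare element name, whereas real XSL patterns are written in a subset of XPath and can be far richer.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TemplateRule:
    match: str                           # pattern compared with each node
    output: Callable[[ET.Element], str]  # description of the output


def find_rule(node: ET.Element, rules: List[TemplateRule]) -> Optional[TemplateRule]:
    """Return the first rule whose pattern equals the node's element name."""
    for rule in rules:
        if rule.match == node.tag:
            return rule
    return None


# Hypothetical style sheet with a single rule, and a hypothetical input.
rules = [TemplateRule("title", lambda n: f"<b>{n.text}</b>")]
doc = ET.fromstring("<summary><title>Overview</title><note>Draft</note></summary>")

output = []
for node in doc.iter():          # visit every node of the input tree
    rule = find_rule(node, rules)
    if rule is not None:         # only matching nodes produce output here
        output.append(rule.output(node))

print(output)  # ['<b>Overview</b>']
```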

While this explanation may seem a bit nebulous, let's look to our quarterly report for clarification. Let us say that we want to take the quarterly report and display it on our Web site. Of course, we would want to highlight the executive summary on a page of its own. Figure 9.11 presents the XSL style sheet for extracting the executive summary from the XML document to create an HTML file.

We now have the basic XSL style sheet. The next step in the process is to use this style sheet to generate an HTML file. The XSL process works with the input tree structure. It works its way through this structure, comparing each node in the structure with the templates of the style sheet. To visualize this process a bit more clearly, we present the structure of the quarterly report in Figure 9.12. The XSL transformation process begins with the root node of the XML input document. It compares this node with all the template rules in the style sheet. We see that the root node matches the pattern of the first template in our style sheet, match="/". The output of this template begins with the opening <html> tag, which is written to the output document. We also see that the template contains an apply-templates instruction. This causes the transformation to process the children of the current node; the closing </html> tag is written once they have been processed.

Figure 9.11. XSL style sheet.

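 <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- Root node: wrap all output in the html element. -->
   <xsl:template match="/">
     <html>
       <xsl:apply-templates/>
     </html>
   </xsl:template>

   <!-- header: no output of its own; process its children. -->
   <xsl:template match="header">
     <xsl:apply-templates/>
   </xsl:template>

   <!-- rpt_title: write the report title inside the head element. -->
   <xsl:template match="rpt_title">
     <head>
       <xsl:apply-templates/>
     </head>
   </xsl:template>

   <!-- executive_sum: open the body, then process its children. -->
   <xsl:template match="executive_sum">
     <body>
       <xsl:apply-templates/>
     </body>
   </xsl:template>

   <!-- title: write the text of the current node in bold. -->
   <xsl:template match="title">
     <b>
       <xsl:value-of select="."/>
     </b>
   </xsl:template>

   <!-- paragraph: write the text of the current node unchanged. -->
   <xsl:template match="paragraph">
     <xsl:value-of select="."/>
   </xsl:template>

 </xsl:stylesheet>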

Figure 9.12. Quarterly report structure.

graphics/09fig12.gif

The next node in the structure is the header node, which matches the second template in our style sheet. The content of this template is apply-templates, which causes the transformation to process the children of this node. The first of these children is the report title, which is contained in the rpt_title node. This node matches the third template in the style sheet. The content of this template writes the report title to the head of the HTML page. Once the template has finished, the transformation proceeds to the next node in the XML structure.

Figure 9.13. HTML version of quarterly report.

 <html>
   <head>
     BIG TIME CORPORATION THIRD QUARTER REPORT
   </head>
   <body>
     <b> Another Record Breaking Quarter </b>
     more paragraphs
   </body>
 </html>

The next node in the header branch of the quarterly report is the executive summary. As we move into the executive summary branch, the <body> tag is inserted into the HTML document. Nested within this insertion is another statement to apply the templates, which drives us down into the child nodes in the structure. The next child node is the title node, which matches the pattern of the next template in the style sheet. This template's output causes the text "Another Record-Breaking Quarter" to be written out, surrounded by HTML bold tags, <b> </b>. A template is applied in the same way to each paragraph node within the executive summary. The difference is that the only output generated is the text of the node from the original XML document. Figure 9.13 presents the results of the transformation process.
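The walk through the tree described above can be imitated in a few lines of Python. This is a sketch under stated assumptions, not an XSL processor. The input document is hypothetical: its element names are taken from the patterns in Figure 9.11, while the root element name quarterly_report and the exact nesting are guesses at the structure in Figure 9.12. The TEMPLATES table and both function names are likewise invented for illustration, and text that follows a child element is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; element names follow the patterns in Figure 9.11.
REPORT = """
<quarterly_report>
  <header>
    <rpt_title>BIG TIME CORPORATION THIRD QUARTER REPORT</rpt_title>
    <executive_sum>
      <title>Another Record Breaking Quarter</title>
      <paragraph>more paragraphs</paragraph>
    </executive_sum>
  </header>
</quarterly_report>
"""

# pattern -> (text written before the node's content, text written after it)
TEMPLATES = {
    "rpt_title": ("<head>", "</head>"),
    "executive_sum": ("<body>", "</body>"),
    "title": ("<b>", "</b>"),
}


def apply_templates(node, out):
    """Write the matching template's output around the node's content."""
    # Nodes without a template (header, paragraph) add no markup of their own.
    before, after = TEMPLATES.get(node.tag, ("", ""))
    out.append(before)
    if node.text and node.text.strip():
        out.append(node.text.strip())
    for child in node:  # process the children of the current node
        apply_templates(child, out)
    out.append(after)


def transform(xml_text):
    out = ["<html>"]  # the rule for the root node opens the document
    apply_templates(ET.fromstring(xml_text), out)
    out.append("</html>")
    return "".join(out)


print(transform(REPORT))
```

Apart from whitespace, the string printed by this sketch has the same shape as Figure 9.13: the title lands in the head, and the bolded summary title and paragraph text land in the body.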

In reviewing Figure 9.13, we see that we have easily transformed an XML document into an HTML file. This process can be repeated for the rest of the quarterly report. Using attributes where appropriate, we can insert links between pages. As each quarterly report is generated, it is automatically posted to the corporate Web site, where employees, investors, market analysts, and even competitors can view it.

In this example, we placed a quarterly report on a Web site. Big deal, eh? What's this have to do with business intelligence? Well, let's not miss the vision because of this simplified example. In Figure 9.14 we show how we might be able to take advantage of XML in an Internet-enabled data warehouse. Think of what we can do with such a transformation process when applying it to BI. We discussed at the beginning of this chapter how the use of XML in a B2B exchange can simplify the extraction of the data from the daily transaction. We also showed that one of the advantages of XML is that we can take two documents, an XML document and a DTD, and easily convert them to a memory structure. This is presented in Figure 9.14. When we wish to convert different types of documents or make modifications to the current document, we change not the conversion process but the input to that process.

Figure 9.14. XML in an Internet-enabled data warehouse.

graphics/09fig14.gif

Now, consider the nature of an exchange. It is still a transaction-oriented system whose data is structured and formatted to facilitate the processing of individual transactions, no different from any other source system. If the transactions are based on XML, however, the ETL process is greatly simplified. First, we know the structure of the XML documents flowing through the exchange, since we probably have the DTDs of these documents. If we don't, how can our exchange deal with them? We then use XSL to extract from these files the data we need for our data warehouse. The XSL transformation process goes through the document, searching for the nodes that are of interest, and writes them to the output XML document.
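The extraction step can be pictured with another short Python sketch. It stands in for what an XSL style sheet would do in this role and is not taken from any real exchange: the purchase order, its element names, the sales_facts output format, and the extract_sales function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order as it might flow through an exchange.
ORDER = """
<purchase_order id="PO-1001">
  <buyer>Big Time Corporation</buyer>
  <shipping><method>ground</method></shipping>
  <line_item><sku>A-17</sku><quantity>3</quantity><price>19.50</price></line_item>
  <line_item><sku>B-42</sku><quantity>1</quantity><price>250.00</price></line_item>
</purchase_order>
"""


def extract_sales(order_xml: str) -> ET.Element:
    """Copy only the nodes the warehouse needs into a new XML tree."""
    order = ET.fromstring(order_xml)
    sales = ET.Element("sales_facts", order_id=order.get("id", ""))
    # Search the document for the nodes of interest and skip the rest.
    for item in order.iter("line_item"):
        fact = ET.SubElement(sales, "fact")
        for name in ("sku", "quantity", "price"):
            ET.SubElement(fact, name).text = item.findtext(name)
    return sales


print(ET.tostring(extract_sales(ORDER), encoding="unicode"))
```

Note that the output uses a different set of tags from the input, and that nodes irrelevant to the warehouse, such as the shipping method, never reach it. Supporting a new document type would mean changing the list of nodes of interest rather than the loading process itself.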

