Breaking Down the Conversion | Applied XML Solutions

In this chapter, we'll write software to convert the XML order from Listing 5.2 into the EDIFACT order in Listing 5.1. In the next chapter, we will see how to build the reverse transformation, from EDIFACT to XML.

If we analyze the conversion, we realize at least three steps are involved:

The converter must read the XML document.
The converter must convert between the two structures. Specifically, it must transform XML elements into EDI segments. This might involve splitting an XML element into several segments or grouping several elements into a single segment. It must also translate codes (such as ISBN) into their EDIFACT equivalents (IB).
It must write the EDIFACT document according to the rules of the EDIFACT syntax.

Which tools are available to help us? An XML parser can take care of the first step, but what about the next two? It turns out that an XSLT processor can help with the second step. Missing is the capability to write a document according to the EDIFACT rules.

Indeed, if we compare this transformation with the XML-to-HTML and XML-to-XML transformations from Chapter 4, the major difference is that our output format (EDIFACT) is not in the XML family (XML or HTML).

Text Conversion

The simplest solution is to use <xsl:output method="text"/> to generate the EDIFACT document. We could write rules similar to the following:

<!-- Emits a PRI segment such as PRI+AAA:24.99::SRP' for each Price element -->
<xsl:template match="Price">
   <xsl:text>PRI+AAA:</xsl:text>
   <xsl:value-of select="."/>
   <xsl:text>::SRP'</xsl:text>
</xsl:template>

However, in practice, this is difficult and error-prone. The EDIFACT syntax is not complicated, but it would not be easy to write XSLT templates that properly handle compression (removing empty fields where doing so is unambiguous). Nor would it be easy to implement the escape rules (a question mark before the +, :, ', and ? characters).
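To make these two rules concrete, here is a minimal sketch of escaping and compression written in ordinary code rather than in XSLT templates. Python is used only for brevity, and the function names (escape, join_compressed, composite, segment) are hypothetical helpers invented for this illustration. The sketch assumes the default EDIFACT service characters and treats compression as dropping trailing empty fields only.

```python
RELEASE = "?"          # release (escape) character
RESERVED = "+:'?"      # characters that must be preceded by the release character


def escape(value: str) -> str:
    """Put a question mark before each reserved character in a data value."""
    return "".join(RELEASE + ch if ch in RESERVED else ch for ch in value)


def join_compressed(parts, separator: str) -> str:
    """Join parts, dropping trailing empty ones, which can be omitted unambiguously."""
    parts = list(parts)
    while parts and parts[-1] == "":
        parts.pop()
    return separator.join(parts)


def composite(*components: str) -> str:
    """Build a composite data element from raw (unescaped) component values."""
    return join_compressed([escape(c) for c in components], ":")


def segment(tag: str, *elements: str) -> str:
    """Build a segment from a tag and already formatted data elements."""
    return join_compressed([tag, *elements], "+") + "'"


print(segment("PRI", composite("AAA", "24.99", "", "SRP")))  # PRI+AAA:24.99::SRP'
print(composite("220", "", ""))                              # 220
print(escape("WHAT'S UP? 1+1"))                              # WHAT?'S UP?? 1?+1
```

The empty field in the middle of the PRI composite must be kept (hence the double colon), whereas trailing empty fields simply disappear. Expressing that distinction in templates that emit plain text is the awkward part.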

Introducing a Formatter

The trick then is to use XSLT for what it is good at, namely converting XML documents into other XML documents. In addition, we should complement it with our own software to deal with the idiosyncrasies of the EDIFACT format. Such software is called a formatter.

How does it work in practice? We can define an XML vocabulary that parallels the syntactic components of EDIFACT: segment, composite data element, and simple data element. For example, the segment

 PRI+AAA:24.99::SRP'

would be rendered, in its XML form, as the following:

<Segment tag="PRI">
   <Composite>
      <Simple>AAA</Simple>
      <Simple>24.99</Simple>
      <Simple/>
      <Simple>SRP</Simple>
   </Composite>
</Segment>

The XSLT processor cannot easily produce raw EDIFACT, but it can easily produce this XML-ized version. Furthermore, writing a formatter that takes the XML-ized version and writes it according to the EDIFACT syntax isn't that difficult. It's mostly a matter of putting the plus signs, colons, and apostrophes in the right places. Figure 5.5 illustrates this.

Figure 5.5. Completing XSL with a custom formatter.


This gives us an interface into XSLT. The XSLT processor will generate the XML-ized version and our own formatter will turn it into proper EDIFACT. This technique enables us to harness the transformative power of XSLT for any file format, including EDIFACT, X12, RTF, Adobe Illustrator, and anything else (with the appropriate formatters).
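The formatter half of this pipeline can be sketched in a few lines. The following is an illustration only, not the formatter developed in this chapter: it is written in Python for brevity, the function names (format_segment, format_message) are hypothetical, and it assumes the Segment, Composite, and Simple vocabulary shown above together with the default EDIFACT separators.

```python
import xml.etree.ElementTree as ET

RESERVED = "+:'?"  # each of these is escaped with a preceding question mark


def escape(text):
    """Escape reserved characters; a missing value (None) becomes an empty string."""
    return "".join("?" + ch if ch in RESERVED else ch for ch in (text or ""))


def trim(parts):
    """Drop trailing empty parts (compression)."""
    parts = list(parts)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def format_segment(segment):
    """Turn one <Segment> element into an EDIFACT segment string."""
    elements = []
    for child in segment:
        if child.tag == "Composite":
            components = [escape(s.text) for s in child.findall("Simple")]
            elements.append(":".join(trim(components)))
        else:  # a <Simple> placed directly under the segment
            elements.append(escape(child.text))
    return "+".join([segment.get("tag")] + trim(elements)) + "'"


def format_message(xml_text):
    """Format every <Segment> found in the XML-ized document, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(format_segment(s) for s in root.iter("Segment"))


xmlized = """
<Segment tag="PRI">
   <Composite>
      <Simple>AAA</Simple>
      <Simple>24.99</Simple>
      <Simple/>
      <Simple>SRP</Simple>
   </Composite>
</Segment>
"""
print(format_message(xmlized))  # PRI+AAA:24.99::SRP'
```

Because the formatter knows nothing about orders or prices, the same code could serve any message whose style sheet emits this vocabulary. All the document-specific knowledge stays in XSLT.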

At this point, you might wonder, why bother? Do we have to use XSLT? Isn't it easier to write an ad hoc Java application that parses the XML document and turns it immediately into EDIFACT? No XSLT, no need to XML-ize EDIFACT, and no problem!

In practice, using XSLT has several advantages, such as the following:

XSLT processors are optimized for transformation and, in most cases, they are more efficient than ad hoc solutions. Furthermore, when the XSLT processor is improved, our application benefits from a free performance boost.
It is faster to debug transformations written in XSLT than Java code because XSLT is not compiled.
In practice, we need to convert several documents: the purchase order, corresponding invoice, order acknowledgement, and more. Using XSLT, we can build a generic transformation engine that can be adapted to any document.
In my experience, it is easier to teach nonprogrammers style sheet coding than it is to teach them Java coding.
In my experience, it is easier to maintain XSLT style sheets than the corresponding Java code because style sheets are declarative in nature.

Listing 5.3 is the XML-ized version of Listing 5.1; compare the two. Figure 5.6 illustrates the structure of this document. As you can see, it's flat like EDIFACT.

Warning

Listing 5.3 is an intermediate format for our application. Using it as an XML order would not make a lot of sense, if only because it is a flat structure like EDIFACT.

Listing 5.3 XML-ized Version of the EDIFACT Order

<?xml version="1.0"?>
<Message>
   <Segment tag="UNH">
      <Simple>1</Simple>
      <Composite>
         <Simple>ORDERS</Simple>
         <Simple>D</Simple>
         <Simple>96A</Simple>
         <Simple>UN</Simple>
      </Composite>
   </Segment>
   <Segment tag="BGM">
      <Composite>
         <Simple>220</Simple>
      </Composite>
      <Simple>AGL153</Simple>
      <Simple>9</Simple>
      <Simple>AB</Simple>
   </Segment>
   <Segment tag="DTM">
      <Composite>
         <Simple>137</Simple>
         <Simple>20000310</Simple>
         <Simple>102</Simple>
      </Composite>
   </Segment>
   <Segment tag="DTM">
      <Composite>
         <Simple>61</Simple>
         <Simple>20000410</Simple>
         <Simple>102</Simple>
      </Composite>
   </Segment>
   <Segment tag="NAD">
      <Simple>BY</Simple>
      <Composite><Simple/></Composite>
      <Composite><Simple/></Composite>
      <Composite>
         <Simple>PLAYFIELD BOOKS</Simple>
      </Composite>
      <Composite>
         <Simple>34 FOUNTAIN SQUARE PLAZA</Simple>
      </Composite>
      <Simple>CINCINNATI</Simple>
      <Simple>OH</Simple>
      <Simple>45202</Simple>
      <Simple>US</Simple>
   </Segment>
   <Segment tag="NAD">
      <Simple>SE</Simple>
      <Composite><Simple/></Composite>
      <Composite><Simple/></Composite>
      <Composite>
         <Simple>QUE</Simple>
      </Composite>
      <Composite>
         <Simple>201 WEST 103RD STREET</Simple>
      </Composite>
      <Simple>INDIANAPOLIS</Simple>
      <Simple>IN</Simple>
      <Simple>46290</Simple>
      <Simple>US</Simple>
   </Segment>
   <Segment tag="LIN">
      <Simple>1</Simple>
   </Segment>
   <Segment tag="PIA">
      <Simple>5</Simple>
      <Composite>
         <Simple>0789722429</Simple>
         <Simple>IB</Simple>
      </Composite>
   </Segment>
   <Segment tag="QTY">
      <Composite>
         <Simple>21</Simple>
         <Simple>5</Simple>
      </Composite>
   </Segment>
   <Segment tag="PRI">
      <Composite>
         <Simple>AAA</Simple>
         <Simple>24.99</Simple>
         <Simple/>
         <Simple>SRP</Simple>
      </Composite>
   </Segment>
   <Segment tag="LIN">
      <Simple>2</Simple>
   </Segment>
   <Segment tag="PIA">
      <Simple>5</Simple>
      <Composite>
         <Simple>0789724308</Simple>
         <Simple>IB</Simple>
      </Composite>
   </Segment>
   <Segment tag="QTY">
      <Composite>
         <Simple>21</Simple>
         <Simple>10</Simple>
      </Composite>
   </Segment>
   <Segment tag="PRI">
      <Composite>
         <Simple>AAA</Simple>
         <Simple>42.50</Simple>
         <Simple/>
         <Simple>SRP</Simple>
      </Composite>
   </Segment>
   <Segment tag="UNS">
      <Simple>S</Simple>
   </Segment>
   <Segment tag="CNT">
      <Composite>
         <Simple>3</Simple>
         <Simple>2</Simple>
      </Composite>
   </Segment>
   <Segment tag="UNT">
      <Simple>17</Simple>
      <Simple>1</Simple>
   </Segment>
</Message>

Figure 5.6. The structure of the XML-ized message closely mimics the EDIFACT syntax.
