Additional Resources

   

Again, the technique in this chapter is not limited to EDIFACT. Any file format can be XML-ized, as was explained in Chapter 5. In fact, there is often value in designing an XML model from an existing format. However, how much work it takes to parse the input depends on the file format.

You should start by searching for existing parsers. For example, if you deal with Excel spreadsheets, you can turn to the OpenExchange DLL (available from http://www.gotovbs.com) or, in Delphi only, to the TXLSRead and TXLSWrite components (available from http://www.axolot.com/components/xlsreadwrite.htm). If you work with RTF or PostScript, you should consider PCYACC (available from http://www.abxsoft.com).

If you cannot find an existing parser, you must write your own. For some formats, such as EDIFACT, I find it simpler to write the parser from scratch. In my experience, this is true for old legacy formats. Over the years, their syntax has accumulated so many exceptions that writing code to handle each exception directly is faster.
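To give a feel for what "from scratch" means, here is a minimal sketch of a hand-written EDIFACT splitter in Python. It is an illustration only, not the parser developed in this chapter. It assumes the default EDIFACT separators (`'` ends a segment, `+` separates data elements, `:` separates components, `?` is the release character) and ignores the UNA segment, which can redefine them. The function names are hypothetical.

```python
def split_escaped(text, separator, release="?"):
    """Split text on separator, skipping separators escaped by the release char."""
    parts, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            # Keep the release character with the character it escapes,
            # so that later splits (elements, components) still see the escape.
            current.append(ch + text[i + 1])
            i += 2
        elif ch == separator:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def unescape(text, release="?"):
    """Remove release characters, keeping the characters they protect."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == release and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_interchange(data):
    """Return a list of (tag, elements); each element is a list of components."""
    segments = []
    for raw in split_escaped(data, "'"):
        raw = raw.strip("\r\n")  # tolerate line breaks between segments
        if not raw:
            continue
        elements = split_escaped(raw, "+")
        tag = elements[0]
        parsed = [
            [unescape(component) for component in split_escaped(element, ":")]
            for element in elements[1:]
        ]
        segments.append((tag, parsed))
    return segments


sample = "UNH+1+ORDERS:D:96A:UN'\nNAD+BY+ACME ?+ SONS::91'"
for tag, elements in parse_interchange(sample):
    print(tag, elements)
```

The escaped `?+` in the NAD segment stays inside one component instead of starting a new data element. A real parser would also need to handle the exceptions mentioned above, which is where most of the hand-written code ends up.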

On the other hand, if you are lucky enough to work with a more modern format, chances are a compiler-compiler will be useful. A compiler-compiler is a tool used to help write parsers. The idea is that you write a high-level description of the format and the tool compiles it into an actual parser.

This approach has two advantages. First, you work at a higher level of abstraction, so development is faster. Second, the generated parsers are very efficient. The downside is that these tools work best with formats that were designed rigorously, which excludes many legacy formats.
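The idea of compiling a high-level description into working code can be shown with a toy example. The Python sketch below is not one of the tools listed in this section and covers only the scanning (tokenizing) half of the job. It takes a declarative table of token names and patterns and turns it into a scanner function. The token names, the `TOKEN_SPEC` table, and `compile_scanner` are all hypothetical and chosen for illustration.

```python
import re

# The high-level description: each token is a name plus the pattern defining it.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("EQUALS", r"="),
    ("SKIP", r"[ \t]+"),
]


def compile_scanner(spec):
    """Build a scanner function from the declarative token description."""
    # Combine all patterns into one regular expression with named groups.
    master = re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, pattern in spec)
    )

    def scan(text):
        tokens = []
        pos = 0
        while pos < len(text):
            match = master.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
            if match.lastgroup != "SKIP":  # whitespace is recognized but dropped
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return scan


scan = compile_scanner(TOKEN_SPEC)
print(scan("width = 42.5"))
```

Changing the format means editing the table, not the scanning loop. That is the higher level of abstraction mentioned above, although a real compiler-compiler also generates the code that handles the grammar.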

Some of the most interesting tools are as follows:

  • YACC is one of the oldest compiler-compilers. A PC version, PCYACC, is available commercially from Abraxas (http://www.abxsoft.com). The product ships with several pre-built parsers, including RTF, VRML, HTML, and PostScript.

  • Bison is a GNU replacement for YACC. It is available from http://www.gnu.org.

  • ANTLR is a powerful open-source compiler-compiler. It is available from http://www.antlr.org.

  • Visual Parse++ is a commercial product from Sandstone that offers a graphical development environment. It is available from http://www.sand-stone.com.

For more information on this topic, read Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. This book is dubbed the "Dragon book" and enjoys an almost religious following. However, at close to 800 pages set in small type, it is not for the faint of heart.

If you want a shorter introduction, I recommend Compiler Construction by Niklaus Wirth (of Pascal fame). At 180 pages, it is an easy read.

   


Applied XML Solutions
ISBN: 0672320541
Year: 1999
Pages: 142
