
Chapter 8. Converting Flat Files to and from XML

Ohio is the flat place between Hoboken and Malibu.

- John Fleischman

This chapter presents utilities for converting flat files to and from XML documents. It discusses a generalized grammar for describing the logical organization of flat files and uses that grammar in the design.


Flat File to XML: Functionality and Operation

Requirements

This utility converts a flat file containing one or more logical documents to one or more XML instance documents, each representing a single logical document. Here's a summary of the required functionality.

  • Inputs: A flat file with fixed or variable length records, in which each field has a fixed length. A specific field in each record contains a record identifier. The input flat file may contain more than one logical document. The second input is a file description document (as discussed in Chapter 6) that describes the flat file and the grammar of the XML document to be produced.

  • Processing: An Element is created in the output document for each logical group of records in the flat file, and each input record is written to an Element that is a child of that group Element. Each field in the record is written to a child Element of the record Element. The organization of records into groups and of fields into records, along with their Element names, is derived from the file description document. Field content is converted to schema language data types as specified in the file description document. Empty fields, that is, those with zero length after whitespace has been trimmed, do not create Elements in the output document. Processing breaks optionally occur on a change in a trading partner field.

  • Output: One or more XML instance documents, each in a single file. The root Element name is derived from the grammar in the file description document. The file name is formed by appending a three-digit sequence number to the root Element name and adding the extension .xml. If break on trading partner has been specified, the documents for each trading partner are placed in a separate subdirectory. The subdirectories are named according to the trading partner IDs in the partner break field.
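The output naming rule can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the utility's code: the class and method names are hypothetical, a null partner ID is used here to mean that no break on trading partner was requested, and the partner ID value in main is made up for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class OutputNaming {
    /**
     * Builds the path of one output document.
     * A null partnerId means no break on trading partner (an assumption of this sketch).
     */
    static Path outputPath(String outputDir, String partnerId,
                           String rootName, int sequence) {
        // At most 999 output documents are supported per input file
        if (sequence < 1 || sequence > 999) {
            throw new IllegalArgumentException("sequence out of range: " + sequence);
        }
        // Root Element name + three-digit sequence number + .xml
        String fileName = String.format(Locale.ROOT, "%s%03d.xml", rootName, sequence);
        // With a partner break, each partner gets its own subdirectory
        Path dir = (partnerId == null)
                ? Paths.get(outputDir)
                : Paths.get(outputDir, partnerId);
        return dir.resolve(fileName);
    }

    public static void main(String[] args) {
        System.out.println(outputPath("out", null, "FlatInvoice", 1));
        System.out.println(outputPath("out", "BQ003", "FlatInvoice", 2));
    }
}
```

Validating the sequence number up front keeps the three-digit pattern from silently widening to four digits.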

Running the Utility

This section provides instructions for running the flat file to XML conversion utility from the command line.

For Java:

java FlatToXML InputFile.dat OutputDirectory FileDescription.XML

or

java FlatToXML -h

For C++ on Win32:

FlatToXML InputFile.dat OutputDirectory FileDescription.XML

or

FlatToXML -h

Options follow the parameters except for the help option, which may be specified by itself.

Parameters:

  • First: File specification of the input flat file (required). The specification may include the full or relative path name. If no path name is specified, the file is assumed to reside in the current working directory. The full file name must be specified, but there is no restriction on the extension name.

  • Second: Path specification of the output directory (required). The directory must exist. Either a relative or an absolute path name may be specified. The trailing directory separator character is optional. If no break on trading partner is specified, all the created XML files are placed in this directory. If break on trading partner has been specified, then a subdirectory for each trading partner is created beneath this directory.

  • Third: File specification of the file description document (required). If no path name is specified, the file is assumed to reside in the current working directory. The full file name must be specified, but there is no restriction on the extension name.

Options:

  • -v (Validate): Validate the created XML documents before writing them to disk. The documents are validated against the schema specified in the file description document.

  • -h (Help): Display a help message and exit without further processing.
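The command-line rules (three required parameters, options after them, and the help option allowed by itself) might be handled along these lines. This is a minimal sketch; the class name FlatToXmlArgs and its fields are hypothetical and are not taken from the utility's source.

```java
final class FlatToXmlArgs {
    final String inputFile;
    final String outputDir;
    final String fileDescription;
    final boolean validate;
    final boolean help;

    private FlatToXmlArgs(String inputFile, String outputDir, String fileDescription,
                          boolean validate, boolean help) {
        this.inputFile = inputFile;
        this.outputDir = outputDir;
        this.fileDescription = fileDescription;
        this.validate = validate;
        this.help = help;
    }

    static FlatToXmlArgs parse(String[] args) {
        // The help option may be specified by itself
        if (args.length == 1 && "-h".equals(args[0])) {
            return new FlatToXmlArgs(null, null, null, false, true);
        }
        if (args.length < 3) {
            throw new IllegalArgumentException(
                    "Expected: InputFile OutputDirectory FileDescription [options]");
        }
        boolean validate = false;
        boolean help = false;
        // Options follow the three positional parameters
        for (int i = 3; i < args.length; i++) {
            if ("-v".equals(args[i])) {
                validate = true;
            } else if ("-h".equals(args[i])) {
                help = true;
            } else {
                throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return new FlatToXmlArgs(args[0], args[1], args[2], validate, help);
    }

    public static void main(String[] args) {
        FlatToXmlArgs parsed = parse(
                new String[] {"Invoices.dat", "out", "FileDescription.XML", "-v"});
        System.out.println(parsed.inputFile + " validate=" + parsed.validate);
    }
}
```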

Restrictions:

Unless otherwise noted, all numeric limits may be modified by changing parameters in the program source and appropriate type definitions in the file description document schemas.

  • A field may have a maximum of 1,023 bytes.

  • A record may be no longer than 16,383 bytes.

  • A maximum of 100 fields per record is supported.

  • There is no absolute limit on the number of records; the number is only practically limited by system memory.

  • Each field must be assigned a unique Element name.

  • Element names are limited to 127 characters.

  • It is recommended, though not required, that field grammar Elements be specified in their record description Element in ascending order by offset.

  • Path lengths for complete file specifications are limited to 127 characters.

  • Schema location URIs are limited to 127 characters.

  • A maximum of 999 output XML documents from an input flat file is supported.

  • A maximum of 100 different trading partner destinations in an input flat file is supported.

  • The field indicating a break on trading partner must be in the beginning record of a logical document.

  • Trading partner IDs must be valid directory names for the operating system where the utility is run.
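A few of these limits can be expressed as constants with a simple check on each field description. The sketch below is hypothetical (the class, constant, and method names are ours, not the utility's), and it assumes that a field must fit entirely within the maximum record length.

```java
final class FieldLimits {
    static final int MAX_FIELD_BYTES = 1023;
    static final int MAX_RECORD_BYTES = 16383;
    static final int MAX_FIELDS_PER_RECORD = 100;
    static final int MAX_NAME_CHARS = 127;

    /** Throws IllegalArgumentException if a field description breaks a limit. */
    static void checkField(String elementName, int offset, int length) {
        if (elementName.isEmpty() || elementName.length() > MAX_NAME_CHARS) {
            throw new IllegalArgumentException("bad Element name length: " + elementName.length());
        }
        if (length < 1 || length > MAX_FIELD_BYTES) {
            throw new IllegalArgumentException("bad field length: " + length);
        }
        // Assumption: the whole field must lie inside the longest allowed record
        if (offset < 0 || offset + length > MAX_RECORD_BYTES) {
            throw new IllegalArgumentException("field extends past maximum record length");
        }
    }

    /** Throws IllegalArgumentException if a record describes too many fields. */
    static void checkFieldCount(int fieldCount) {
        if (fieldCount < 1 || fieldCount > MAX_FIELDS_PER_RECORD) {
            throw new IllegalArgumentException("bad field count: " + fieldCount);
        }
    }

    public static void main(String[] args) {
        checkField("ShipToCountry", 136, 3);
        checkFieldCount(8);
        System.out.println("field description accepted");
    }
}
```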

Sample Input and Output: Invoice

As in Chapter 7, we're going to use invoices from Big Daddy's Gourmet Cocoa for our example. However, Big Daddy has now upgraded to a more capable order management and bookkeeping system. The new system supports a more comprehensive flat, hierarchical file structure than the CSV formats supported by the previous system.

The simple invoice example is composed of two levels of record groups. The group at the top level is the invoice itself, consisting of a header record, ship to address, one or more line item groups, and a summary record. The second group, the line item group, contains a line item record and an item description record.
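To make the top-level grouping concrete, here is a minimal Java sketch that splits a list of records into logical documents by starting a new document at each header record. It is an illustration only: the class and method names are hypothetical, the tag of the first record is passed in as a parameter (the utility derives the grouping from the file description document), and the records in main are abbreviated to a tag plus a label.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DocumentSplitter {
    /** Splits records into logical documents; each starts at a record beginning with firstTag. */
    static List<List<String>> split(List<String> records, String firstTag) {
        List<List<String>> documents = new ArrayList<>();
        List<String> current = null;
        for (String record : records) {
            if (record.startsWith(firstTag)) {
                // A header record opens a new logical document
                current = new ArrayList<>();
                documents.add(current);
            }
            if (current == null) {
                throw new IllegalArgumentException("record found before first " + firstTag);
            }
            current.add(record);
        }
        return documents;
    }

    public static void main(String[] args) {
        // Abbreviated, made-up records: tag plus a short label
        List<String> records = Arrays.asList(
                "HDR-1", "SHP-1", "LIN-1a", "DSC-1a", "SUM-1",
                "HDR-2", "SHP-2", "LIN-2a", "DSC-2a", "SUM-2");
        for (List<String> document : split(records, "HDR")) {
            System.out.println(document);
        }
    }
}
```

The same approach could be applied one level down, opening a new line item group at each line item record.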

This particular file has variable length records because that is what Big Daddy's system produces, but we could just as easily specify fixed length records. Table 8.1 shows the logical layout of the invoice file.

Figure 8.1 shows the sample input invoice flat file.

Table 8.1. Logical Layout for the Invoice

| Group | Record | Record Tag | Field Number | Field Name | Offset | Length | Data Type | Description |
|---|---|---|---|---|---|---|---|---|
| Invoice | Header | HDR | 1 | Record Tag | 0 | 3 | Alphanumeric | Record identifier |
| | | | 2 | Customer Number | 3 | 20 | Alphanumeric | Identifier we have assigned to the customer in our system |
| | | | 3 | Invoice Number | 23 | 20 | Alphanumeric | System-assigned invoice number |
| | | | 4 | Invoice Date | 43 | 8 | Date | Date of invoice, formatted YYYYMMDD |
| | | | 5 | PO Number | 51 | 20 | Alphanumeric | Customer purchase order number |
| | | | 6 | Due Date | 71 | 8 | Date | Date that invoice amount is due for payment, formatted as YYYYMMDD |
| Invoice | Ship To | SHP | 1 | Record Tag | 0 | 3 | Alphanumeric | Record identifier |
| | | | 2 | Ship to Name | 3 | 40 | Alphanumeric | Name of the receiving location for the shipped order |
| | | | 3 | Ship to Street 1 | 43 | 30 | Alphanumeric | First address line of the receiving location |
| | | | 4 | Ship to Street 2 | 73 | 30 | Alphanumeric | Second address line of the receiving location |
| | | | 5 | Ship to City | 103 | 20 | Alphanumeric | City of the receiving location |
| | | | 6 | Ship to State or Province | 123 | 3 | Alphanumeric | State or province of the receiving location |
| | | | 7 | Ship to Postal Code | 126 | 10 | Alphanumeric | Postal code of the receiving location |
| | | | 8 | Ship to Country | 136 | 3 | Alphanumeric | Country of the receiving location |
| Invoice/Line Item | Line | LIN | 1 | Record Tag | 0 | 3 | Alphanumeric | Record identifier |
| | | | 2 | Item ID | 3 | 20 | Alphanumeric | Our identifier for the ordered item |
| | | | 3 | Item Invoiced Quantity | 23 | 10 | Decimal number, space filled | The number of units to be invoiced |
| | | | 4 | Item Unit Price | 33 | 10 | Implied decimal, two places, zero filled | Unit price in U.S. dollars |
| | | | 5 | Extended Amount Due | 43 | 10 | Implied decimal, two places, zero filled | Total amount due for the invoiced item in U.S. dollars (unit price multiplied by the quantity invoiced) |
| Invoice/Line Item | Item Description | DSC | 1 | Record Tag | 0 | 3 | Alphanumeric | Record identifier |
| | | | 2 | Item Description | 3 | 80 | Alphanumeric | Description of the ordered item |
| Invoice | Summary | SUM | 1 | Record Tag | 0 | 3 | Alphanumeric | Record identifier |
| | | | 2 | Total Amount | 3 | 10 | Implied decimal, two places, zero filled | Total amount due on invoice in U.S. dollars |
| | | | 3 | Number of Lines | 13 | 10 | Integer, space filled | Total number of invoice lines |
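Given the offsets and lengths in Table 8.1, pulling a field out of a record amounts to a bounded substring followed by a trim. The Java sketch below is a minimal illustration, not the utility's code: the class and method names are hypothetical, it assumes one character per byte (a single-byte encoding), and it returns null for a field that is empty or lies beyond the end of a short variable length record.

```java
final class FieldSlicer {
    /** Returns the trimmed field, or null if it is blank or past the end of the record. */
    static String field(String record, int offset, int length) {
        if (offset >= record.length()) {
            return null;  // variable length record ended before this field
        }
        int end = Math.min(offset + length, record.length());
        String value = record.substring(offset, end).trim();
        // Empty fields do not create Elements, so signal them with null
        return value.isEmpty() ? null : value;
    }

    public static void main(String[] args) {
        // Build a header record using the field widths from Table 8.1
        String header = String.format("%-3s%-20s%-20s%-8s%-20s%-8s",
                "HDR", "BQ003", "2002041", "20021112", "AZ999345", "20021212");
        System.out.println(field(header, 3, 20));   // Customer Number
        System.out.println(field(header, 23, 20));  // Invoice Number
        System.out.println(field(header, 71, 8));   // Due Date
    }
}
```

Clamping the end index to the record length is what lets the same lookup serve both fixed and variable length records.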

Figure 8.1 Sample Input Flat File (Invoices.dat)

        10        20        30        40        50        60        70        80        90        100       110       120       130
HDRBQ003               2002041             20021112AZ999345            20021212
SHPYazoo Grocers - NE Distribution Center  12 Industrial Parkway NW                                    Portland            ME 04101
LINHCVAN                       120000000259000003108
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCMIN                       240000000253000006072
DSCInstant Hot Cocoa Mix - Mint flavor
SUM0000009180         2
HDRBQ003               2002042             20021112AW999346            20021212
SHPYazoo Grocers - SE Distribution Center  Dock 37                       3975 Hwy 75                   Atoka               OK 74525
LINHCVAN                       360000000259000009324
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCMIN                       720000000253000018216
DSCInstant Hot Cocoa Mix - Mint flavor
SUM0000027540         2
HDRAY001               2002043             200211122002-0967           20021212
SHPCorner Drug and Sundries                14 Main Street                                              Wichita             KS 67201
LINHCVAN                       240000000259000006216
DSCInstant Hot Cocoa Mix - Vanilla flavor
SUM0000006216         1
HDRBR095               2002044             200211124397-0498           20021212
SHPBig Box Discounters - Store # 97        37 MegaMall                                                 Azusa               CA 91702
LINHCMIN                      1200000000253000030360
DSCInstant Hot Cocoa Mix - Mint flavor
LINHCVAN                      3600000000259000093240
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCDUC                      2400000000259000062160
DSCInstant Hot Cocoa Mix - Dutch Chocolate flavor
SUM0000185760         3
HDRBR095               2002045             200211124345-0498           20021212
SHPBig Box Discounters - Store # 45        45 Highway 76                                               Branson             MO 65615
LINHCMIN                       720000000253000018216
DSCInstant Hot Cocoa Mix - Mint flavor
LINHCDUC                       960000000259000024864
DSCInstant Hot Cocoa Mix - Dutch Chocolate flavor
SUM0000043080         2
HDRDQ349               2002046             20021112987-43671           20021212
SHPMaple Leaf Grocers - DC #1              987 Yorkland Blvd                                           Willowdale          ON M2J 4Y8   CAN
LINHCMOC                     36000000000269000096840
DSCInstant Hot Cocoa Mix - Mocha flavor
SUM0000096840         1

Listed below are the first three XML documents produced by the utility from this input file.

FlatInvoice001.xml
<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>BQ003</CustomerNumber>
    <InvoiceNumber>2002041</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>AZ999345</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>
      Yazoo Grocers - NE Distribution Center
    </ShipToName>
    <ShipToStreet1>12 Industrial Parkway NW</ShipToStreet1>
    <ShipToCity>Portland</ShipToCity>
    <ShipToStateOrProvince>ME</ShipToStateOrProvince>
    <ShipToPostalCode>04101</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>12</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>31.08</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCMIN</ItemID>
      <ItemQuantity>24</ItemQuantity>
      <UnitPrice>2.53</UnitPrice>
      <ExtendedPrice>60.72</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Mint flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>91.80</TotalAmount>
    <NumberOfLines>2</NumberOfLines>
  </Summary>
</FlatInvoice>
FlatInvoice002.xml
<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>BQ003</CustomerNumber>
    <InvoiceNumber>2002042</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>AW999346</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>
      Yazoo Grocers - SE Distribution Center
    </ShipToName>
    <ShipToStreet1>Dock 37</ShipToStreet1>
    <ShipToStreet2>3975 Hwy 75</ShipToStreet2>
    <ShipToCity>Atoka</ShipToCity>
    <ShipToStateOrProvince>OK</ShipToStateOrProvince>
    <ShipToPostalCode>74525</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>36</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>93.24</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCMIN</ItemID>
      <ItemQuantity>72</ItemQuantity>
      <UnitPrice>2.53</UnitPrice>
      <ExtendedPrice>182.16</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Mint flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>275.40</TotalAmount>
    <NumberOfLines>2</NumberOfLines>
  </Summary>
</FlatInvoice>
FlatInvoice003.xml
<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>AY001</CustomerNumber>
    <InvoiceNumber>2002043</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>2002-0967</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>Corner Drug and Sundries</ShipToName>
    <ShipToStreet1>14 Main Street</ShipToStreet1>
    <ShipToCity>Wichita</ShipToCity>
    <ShipToStateOrProvince>KS</ShipToStateOrProvince>
    <ShipToPostalCode>67201</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>24</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>62.16</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>62.16</TotalAmount>
    <NumberOfLines>1</NumberOfLines>
  </Summary>
</FlatInvoice>
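The data type conversions visible in these documents (a YYYYMMDD date becoming 2002-11-12, a zero-filled implied decimal becoming 2.59, a space-filled quantity becoming 12) can be sketched with standard Java classes. This is a minimal illustration under our own assumptions: the class and method names are hypothetical, and the utility itself takes the target types from the file description document rather than from hard-coded calls.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class FieldConversions {
    /** Converts a YYYYMMDD date to the schema date form YYYY-MM-DD. */
    static String date(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd.trim(), DateTimeFormatter.BASIC_ISO_DATE).toString();
    }

    /** Converts a zero-filled number with an implied decimal point to a decimal string. */
    static String impliedDecimal(String digits, int places) {
        // Shifting the point left keeps trailing zeros, so 9180 becomes 91.80
        return new BigDecimal(digits.trim()).movePointLeft(places).toPlainString();
    }

    /** Converts a space-filled number to a plain numeric string. */
    static String spaceFilled(String text) {
        return new BigDecimal(text.trim()).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(date("20021112"));
        System.out.println(impliedDecimal("0000000259", 2));
        System.out.println(impliedDecimal("0000009180", 2));
        System.out.println(spaceFilled("        12"));
    }
}
```

Using BigDecimal rather than double avoids binary rounding and preserves the two decimal places of amounts such as 91.80.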