Requirements

This utility converts a flat file containing one or more logical documents to one or more XML instance documents, each representing a single logical document. Here's a summary of the required functionality.

- Inputs: A flat file with fixed or variable length records, with each field having a fixed length. A specific field in each record contains a record identifier. The input flat file may consist of more than one logical document. The second input is a file description document (as discussed in Chapter 6) that describes the flat file and the grammar of the XML document to be produced.
- Processing: An Element is created in the output document for each logical group of records in the flat file, and each input flat file record is written to an Element that is a child of that group Element. Each field in the record is written to a child Element of the record Element. The organization of records into groups, of fields into records, and their Element names are derived from the file description document. Field content is converted to schema language data types as specified in the file description document. Empty fields, that is, those with zero length after whitespace has been trimmed, do not create Elements in the output document. Processing breaks optionally occur on a change in a trading partner field.
- Output: One or more XML instance documents, each in a single file. The root Element name is derived from the grammar in the file description document. The file name is formed by appending a three-digit sequence number to the root Element name and adding the extension .xml. If break on trading partner has been specified, the documents for each trading partner are placed in a separate subdirectory. The subdirectories are named according to the trading partner IDs in the partner break field.

Running the Utility

This section provides instructions for running the flat file to XML conversion utility from the command line.

For Java:

    java FlatToXML InputFile.dat OutputDirectory FileDescription.XML

or

    java FlatToXML -h

For C++ on Win32:

    FlatToXML InputFile.dat OutputDirectory FileDescription.XML

or

    FlatToXML -h

Options follow the parameters except for the help option, which may be specified by itself.

Parameters:
- First: File specification of the input flat file (required). The specification may include the full or relative path name. If no path name is specified, the file is assumed to reside in the current working directory. The full file name must be specified, but there is no restriction on the extension name.
- Second: Path specification of the output directory (required). The directory must exist. Either a relative or an absolute path name may be specified. The trailing directory separator character is optional. If no break on trading partner is specified, all the created XML files are placed in this directory. If break on trading partner has been specified, then a subdirectory for each trading partner is created beneath this directory.
- Third: File specification of the file description document (required). If no path name is specified, the file is assumed to reside in the current working directory. The full file name must be specified, but there is no restriction on the extension name.

Options:
- -v (Validate): Validate the created XML documents before writing them to disk. The documents are validated against the schema specified in the file description document.
- -h (Help): Display a help message and exit without further processing.

Restrictions:

Unless otherwise noted, all numeric limits may be modified by changing parameters in the program source and appropriate type definitions in the file description document schemas.
- A field may be no longer than 1,023 bytes.
- A record may be no longer than 16,383 bytes.
- A maximum of 100 fields per record is supported.
- There is no absolute limit on the number of records; in practice, the number is limited only by system memory.
- Each field must be assigned a unique Element name.
- Element names are limited to 127 characters.
- It is recommended, though not required, that field grammar Elements be specified in their record description Element in ascending order by offset.
- Path lengths for complete file specifications are limited to 127 characters.
- Schema location URIs are limited to 127 characters.
- A maximum of 999 output XML documents from an input flat file is supported.
- A maximum of 100 different trading partner destinations in an input flat file is supported.
- The field indicating a break on trading partner must be in the beginning record of a logical document.
- Trading partner IDs must be valid directory names for the operating system where the utility is run.

Sample Input and Output: Invoice

As in Chapter 7, we're going to use invoices from Big Daddy's Gourmet Cocoa for our example. However, Big Daddy has now upgraded to a more capable order management and bookkeeping system. The new system supports a more comprehensive flat, hierarchical file structure than the CSV formats supported by the previous system.

The simple invoice example is composed of two levels of record groups. The group at the top level is the invoice itself, consisting of a header record, ship to address, one or more line item groups, and a summary record. The second group, the line item group, contains a line item record and an item description record. This particular file has variable length records. Although Big Daddy's system uses variable length records, we could just as easily specify a fixed length record. Table 8.1 shows the logical layout of the invoice file. Figure 8.1 shows the sample input invoice flat file.

Table 8.1. Logical Layout for the Invoice

| Group | Record | Record Tag | Field Number | Field Name | Offset | Length | Data Type | Description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Invoice | Header | HDR | 1 | Record Tag | | 3 | Alphanumeric | Record identifier |
| | | | 2 | Customer Number | 3 | 20 | Alphanumeric | Identifier we have assigned to the customer in our system |
| | | | 3 | Invoice Number | 23 | 20 | Alphanumeric | System-assigned invoice number |
| | | | 4 | Invoice Date | 43 | 8 | Date | Date of invoice, formatted YYYYMMDD |
| | | | 5 | PO Number | 51 | 20 | Alphanumeric | Customer purchase order number |
| | | | 6 | Due Date | 71 | 8 | Date | Date that invoice amount is due for payment, formatted as YYYYMMDD |
| Invoice | Ship To | SHP | 1 | Record Tag | | 3 | Alphanumeric | Record identifier |
| | | | 2 | Ship to Name | 3 | 40 | Alphanumeric | Name of the receiving location for the shipped order |
| | | | 3 | Ship to Street 1 | 43 | 30 | Alphanumeric | First address line of the receiving location |
| | | | 4 | Ship to Street 2 | 73 | 30 | Alphanumeric | Second address line of the receiving location |
| | | | 5 | Ship to City | 103 | 20 | Alphanumeric | City of the receiving location |
| | | | 6 | Ship to State or Province | 123 | 3 | Alphanumeric | State or province of the receiving location |
| | | | 7 | Ship to Postal Code | 126 | 10 | Alphanumeric | Postal code of the receiving location |
| | | | 8 | Ship to Country | 136 | 3 | Alphanumeric | Country of the receiving location |
| Invoice/Line Item | Line | LIN | 1 | Record Tag | | 3 | Alphanumeric | Record identifier |
| | | | 2 | Item ID | 3 | 20 | Alphanumeric | Our identifier for the ordered item |
| | | | 3 | Item Invoiced Quantity | 23 | 10 | Decimal number, space filled | The number of units to be invoiced |
| | | | 4 | Item Unit Price | 33 | 10 | Implied decimal, two places, zero filled | Unit price in U.S. dollars |
| | | | 5 | Extended Amount Due | 43 | 10 | Implied decimal, two places, zero filled | Total amount due for the invoiced item in U.S. dollars (unit price multiplied by the quantity invoiced) |
| Invoice/Line Item | Item Description | DSC | 1 | Record Tag | | 3 | Alphanumeric | Record identifier |
| | | | 2 | Item Description | 3 | 80 | Alphanumeric | Description of the ordered item |
| Invoice | Summary | SUM | 1 | Record Tag | | 3 | Alphanumeric | Record identifier |
| | | | 2 | Total Amount | 3 | 10 | Implied decimal, two places, zero filled | Total amount due on invoice in U.S. dollars |
| | | | 3 | Number of Lines | 13 | 10 | Integer, space filled | Total number of invoice lines |

Figure 8.1 Sample Input Flat File (Invoices.dat)

10 20 30 40 50 60 70 80 90 100 110 120 130
HDRBQ003 2002041 20021112AZ999345 20021212
SHPYazoo Grocers - NE Distribution Center 12 Industrial Parkway NW Portland ME 04101
LINHCVAN 120000000259000003108
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCMIN 240000000253000006072
DSCInstant Hot Cocoa Mix - Mint flavor
SUM0000009180 2
HDRBQ003 2002042 20021112AW999346 20021212
SHPYazoo Grocers - SE Distribution Center Dock 37 3975 Hwy 75 Atoka OK 74525
LINHCVAN 360000000259000009324
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCMIN 720000000253000018216
DSCInstant Hot Cocoa Mix - Mint flavor
SUM0000027540 2
HDRAY001 2002043 200211122002-0967 20021212
SHPCorner Drug and Sundries 14 Main Street Wichita KS 67201
LINHCVAN 240000000259000006216
DSCInstant Hot Cocoa Mix - Vanilla flavor
SUM0000006216 1
HDRBR095 2002044 200211124397-0498 20021212
SHPBig Box Discounters - Store # 97 37 MegaMall Azusa CA 91702
LINHCMIN 1200000000253000030360
DSCInstant Hot Cocoa Mix - Mint flavor
LINHCVAN 3600000000259000093240
DSCInstant Hot Cocoa Mix - Vanilla flavor
LINHCDUC 2400000000259000062160
DSCInstant Hot Cocoa Mix - Dutch Chocolate flavor
SUM0000185760 3
HDRBR095 2002045 200211124345-0498 20021212
SHPBig Box Discounters - Store # 45 45 Highway 76 Branson MO 65615
LINHCMIN 720000000253000018216
DSCInstant Hot Cocoa Mix - Mint flavor
LINHCDUC 960000000259000024864
DSCInstant Hot Cocoa Mix - Dutch Chocolate flavor
SUM0000043080 2
HDRDQ349 2002046 20021112987-43671 20021212
SHPMaple Leaf Grocers - DC #1 987 Yorkland Blvd Willowdale ON M2J 4Y8 CAN
LINHCMOC 36000000000269000096840
DSCInstant Hot Cocoa Mix - Mocha flavor
SUM0000096840 1

Listed below are the first three XML documents produced by the utility from this input file.

FlatInvoice001.xml

<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>BQ003</CustomerNumber>
    <InvoiceNumber>2002041</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>AZ999345</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>
      Yazoo Grocers - NE Distribution Center
    </ShipToName>
    <ShipToStreet1>12 Industrial Parkway NW</ShipToStreet1>
    <ShipToCity>Portland</ShipToCity>
    <ShipToStateOrProvince>ME</ShipToStateOrProvince>
    <ShipToPostalCode>04101</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>12</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>31.08</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCMIN</ItemID>
      <ItemQuantity>24</ItemQuantity>
      <UnitPrice>2.53</UnitPrice>
      <ExtendedPrice>60.72</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Mint flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>91.80</TotalAmount>
    <NumberOfLines>2</NumberOfLines>
  </Summary>
</FlatInvoice>

FlatInvoice002.xml

<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>BQ003</CustomerNumber>
    <InvoiceNumber>2002042</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>AW999346</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>
      Yazoo Grocers - SE Distribution Center
    </ShipToName>
    <ShipToStreet1>Dock 37</ShipToStreet1>
    <ShipToStreet2>3975 Hwy 75</ShipToStreet2>
    <ShipToCity>Atoka</ShipToCity>
    <ShipToStateOrProvince>OK</ShipToStateOrProvince>
    <ShipToPostalCode>74525</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>36</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>93.24</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCMIN</ItemID>
      <ItemQuantity>72</ItemQuantity>
      <UnitPrice>2.53</UnitPrice>
      <ExtendedPrice>182.16</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Mint flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>275.40</TotalAmount>
    <NumberOfLines>2</NumberOfLines>
  </Summary>
</FlatInvoice>

FlatInvoice003.xml

<?xml version="1.0" encoding="UTF-8"?>
<FlatInvoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="FlatInvoice.xsd">
  <Header>
    <RecordID>HDR</RecordID>
    <CustomerNumber>AY001</CustomerNumber>
    <InvoiceNumber>2002043</InvoiceNumber>
    <InvoiceDate>2002-11-12</InvoiceDate>
    <PONumber>2002-0967</PONumber>
    <DueDate>2002-12-12</DueDate>
  </Header>
  <ShipTo>
    <RecordID>SHP</RecordID>
    <ShipToName>Corner Drug and Sundries</ShipToName>
    <ShipToStreet1>14 Main Street</ShipToStreet1>
    <ShipToCity>Wichita</ShipToCity>
    <ShipToStateOrProvince>KS</ShipToStateOrProvince>
    <ShipToPostalCode>67201</ShipToPostalCode>
  </ShipTo>
  <LineItemGroup>
    <LineItem>
      <RecordID>LIN</RecordID>
      <ItemID>HCVAN</ItemID>
      <ItemQuantity>24</ItemQuantity>
      <UnitPrice>2.59</UnitPrice>
      <ExtendedPrice>62.16</ExtendedPrice>
    </LineItem>
    <ItemDescription>
      <RecordID>DSC</RecordID>
      <Description>
        Instant Hot Cocoa Mix - Vanilla flavor
      </Description>
    </ItemDescription>
  </LineItemGroup>
  <Summary>
    <RecordID>SUM</RecordID>
    <TotalAmount>62.16</TotalAmount>
    <NumberOfLines>1</NumberOfLines>
  </Summary>
</FlatInvoice>
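
To make the conversion rules concrete, here is a minimal Java sketch of how a line item record laid out per Table 8.1 might be turned into the child Elements shown in the listings above, and how an output file name might be formed from the root Element name. This is not the utility's actual source. The class, method, and enum names are hypothetical, the record tag is assumed to sit at offset 0, the sample record is rebuilt with the padding that Table 8.1 implies, and BQ003 is used as a trading partner ID purely for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; every name here is hypothetical, not the utility's API.
final class FlatFieldSketch {

    // Data types that appear in Table 8.1.
    enum Type { ALPHANUMERIC, DATE, DECIMAL, IMPLIED_DECIMAL_2, INTEGER }

    // One field description: Element name, zero-based offset, length, data type.
    static final class Field {
        final String element;
        final int offset;
        final int length;
        final Type type;

        Field(String element, int offset, int length, Type type) {
            this.element = element;
            this.offset = offset;
            this.length = length;
            this.type = type;
        }
    }

    // Line item layout from Table 8.1 (record tag offset assumed to be 0).
    static final Field[] LINE_ITEM = {
        new Field("RecordID", 0, 3, Type.ALPHANUMERIC),
        new Field("ItemID", 3, 20, Type.ALPHANUMERIC),
        new Field("ItemQuantity", 23, 10, Type.DECIMAL),
        new Field("UnitPrice", 33, 10, Type.IMPLIED_DECIMAL_2),
        new Field("ExtendedPrice", 43, 10, Type.IMPLIED_DECIMAL_2),
    };

    // Cuts one field out of a record and trims it. A variable length record
    // may end before the field does, so the end position is clamped.
    static String slice(String rec, int offset, int length) {
        if (offset >= rec.length()) {
            return "";
        }
        return rec.substring(offset, Math.min(rec.length(), offset + length)).trim();
    }

    // Converts trimmed, non-empty field text to the lexical form of its type.
    static String convert(String text, Type type) {
        switch (type) {
            case DATE:              // YYYYMMDD becomes YYYY-MM-DD (assumes 8 digits)
                return text.substring(0, 4) + "-" + text.substring(4, 6)
                        + "-" + text.substring(6, 8);
            case IMPLIED_DECIMAL_2: // the last two digits are the fraction
                return new BigDecimal(text).movePointLeft(2).toPlainString();
            case DECIMAL:
                return new BigDecimal(text).toPlainString();
            case INTEGER:
                return new BigInteger(text).toString();
            default:
                return text;
        }
    }

    // Maps a record to Element name/value pairs; empty fields produce no entry.
    static Map<String, String> toElements(String rec, Field[] layout) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Field f : layout) {
            String text = slice(rec, f.offset, f.length);
            if (!text.isEmpty()) {
                out.put(f.element, convert(text, f.type));
            }
        }
        return out;
    }

    // Root Element name plus a three-digit sequence number and .xml, placed in
    // a partner subdirectory when a partner ID is supplied (null means no break).
    static Path outputFile(Path outDir, String partnerId, String root, int seq) {
        Path dir = (partnerId == null) ? outDir : outDir.resolve(partnerId);
        return dir.resolve(String.format("%s%03d.xml", root, seq));
    }

    public static void main(String[] args) {
        // First LIN record of Figure 8.1, rebuilt with Table 8.1 padding.
        String rec = String.format("%-3s%-20s%10s%010d%010d", "LIN", "HCVAN", "12", 259, 3108);
        System.out.println(toElements(rec, LINE_ITEM));
        System.out.println(outputFile(Paths.get("out"), "BQ003", "FlatInvoice", 1));
    }
}
```

For the rebuilt record, toElements yields ItemQuantity 12, UnitPrice 2.59, and ExtendedPrice 31.08, which matches the first line item in FlatInvoice001.xml. BigDecimal is used rather than floating point so that an implied decimal such as 0000009180 keeps its trailing zero and comes out as 91.80.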