Describing the File Formats


Describing the CSV file format in Chapter 7 was fairly simple due to the restrictions we placed on it. The most significant of these was that every row has the same logical format. We're going to allow more variation in our flat file formats. Applications that use flat files to import or export data typically support several different logical record formats and group these records into repeating units. We'll need to specify more information about the grammar of our flat files than we did with CSV files. In addition to data types and other characteristics of fields, we'll need to specify the details of all the record types as well as how the records are grouped together.

As with the CSV format, the flat file format's file description document has three major sections, each represented by an Element that is an immediate child of the root Element.

  • PhysicalCharacteristics : the flat file characteristics

  • XMLOutputCharacteristics : XML output characteristics, required only when converting to XML

  • Grammar : flat file grammar

Figure 8.2. Sample Output Flat File (PurchaseOrders.Dat)

              10        20        30        40        50        60        70        80        90        100       110       120       130
    HDRBQ003               AZ999345            2002110120021115
    SHPYazoo Grocers - NE Distribution Center  12 Industrial Parkway, NW                                   Portland            ME 04101
    LINHCVAN                       1200000002590000000000
    DSCInstant Hot Cocoa Mix - Vanilla flavor
    LINHCMIN                       2400000002530000000000
    DSCInstant Hot Cocoa Mix - Mint flavor
    HDRBQ003               AW999346            2002110120021115
    SHPYazoo Grocers - SE Distribution Center  Dock 37                       3975 Hwy 75                   Atoka               OK 74525
    LINHCVAN                       3600000002590000000000
    DSCInstant Hot Cocoa Mix - Vanilla flavor
    LINHCMIN                       7200000002530000000000
    DSCInstant Hot Cocoa Mix - Mint flavor
    HDRAY001               2002-0967           2002110920021114
    SHPCorner Drug and Sundries                14 Main Street                                              Wichita             KS 67201
    LINHCVAN                       2400000002590000000000
    DSCInstant Hot Cocoa Mix - Vanilla flavor

Flat File Physical Characteristics

The PhysicalCharacteristics Element describes the file's physical characteristics. This Element is required for both the source and target conversion utilities.

Table 8.3 shows the child Elements of the PhysicalCharacteristics Element. All are required unless otherwise noted.

XML Output Characteristics

Characteristics governing the output XML documents are described in the XMLOutputCharacteristics Element. This Element is used only when converting from flat files to XML.

Table 8.4 shows the child Elements of the XMLOutputCharacteristics Element. All are required unless otherwise noted.

Flat File Grammar

The grammar of a flat file is described in the Grammar Element. Although the XML representation of groups of records in flat files may be fairly intuitive, a few diagrams might help make it clearer.

Figure 8.3 shows a typical stream of records in a flat file, using our cocoa invoice as an example. For brevity only the record tags appear in the figure.

Figure 8.3. Record Stream in the Invoice File


If we look only at the records, we can't deduce much about the logical structure of a document with any certainty. We would probably suspect that the HDR record starts a new document and that the LIN and DSC records form a repeating group. However, we can't know for certain just by looking at the file; we must verify our suspicions by consulting the file specification or the application designer. For our purposes, we use Table 8.1 as our specification. This allows us to interpret the stream as shown in Figure 8.4.

Figure 8.4. Record Stream in the Invoice File, with Groups Added


Figure 8.4, in essence, shows what is known as a syntax tree. Figure 8.5 converts the brackets into nodes in the tree. I show siblings at the same level in the diagram to make relationships more obvious.

Figure 8.5. Syntax Tree for the Invoice File


The logical structure in Figure 8.5 now finally starts to look like something we might see in XML. All we have to do to make the transformation complete is to change the text from record identifiers and descriptions to XML Element names (Figure 8.6).

Figure 8.6. Invoice Document in XML

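The step from a flat stream of record tags to a tree can be sketched in a few lines of Python. This is only an illustration, not the utilities' code: the tuple layout of INVOICE_GRAMMAR and the function names are my own inventions, every record is treated as optional, and a group is assumed to repeat for as long as the tag of its first record keeps turning up.

```python
# Hypothetical in-memory form of the invoice grammar (names are illustrative).
# ("record", tag, element_name) stands for one record type;
# ("group", tag, element_name, children) stands for a repeating group whose
# tag is the tag of its first record.
INVOICE_GRAMMAR = [
    ("record", "HDR", "Header"),
    ("record", "SHP", "ShipTo"),
    ("group", "LIN", "LineItemGroup", [
        ("record", "LIN", "LineItem"),
        ("record", "DSC", "ItemDescription"),
    ]),
    ("record", "SUM", "Summary"),
]


def parse_children(tags, pos, descriptions):
    """Match tags[pos:] against the descriptions; return (nodes, new_pos)."""
    nodes = []
    for description in descriptions:
        if description[0] == "record":
            _, tag, name = description
            # Assumption: a record that is absent is simply skipped.
            if pos < len(tags) and tags[pos] == tag:
                nodes.append(name)
                pos += 1
        else:
            _, tag, name, children = description
            # The group repeats while its first record's tag keeps appearing.
            while pos < len(tags) and tags[pos] == tag:
                child_nodes, pos = parse_children(tags, pos, children)
                nodes.append((name, child_nodes))
    return nodes, pos


def parse_stream(tags, root_name="FlatInvoice", root_tag="HDR"):
    """Turn a list of record tags into one tree per logical document."""
    documents = []
    pos = 0
    while pos < len(tags):
        if tags[pos] != root_tag:
            raise ValueError(f"unexpected record {tags[pos]!r} at position {pos}")
        nodes, pos = parse_children(tags, pos, INVOICE_GRAMMAR)
        documents.append((root_name, nodes))
    return documents
```

Calling parse_stream with a tag list such as HDR, SHP, LIN, DSC, LIN, DSC, SUM yields one FlatInvoice tuple whose children mirror the nesting described above, with one LineItemGroup entry per LIN record.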

Table 8.3. Child Elements of the PhysicalCharacteristics Element

| Child Element | Nested Child Element | Attribute | Schema Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|---|
| RecordFormat | | | | Specifies the physical format of the record | Only one of Fixed or Variable is allowed. |
| | Fixed | Length | positiveInteger | Specifies the physical record length | Maximum value reflects restriction on record length as noted in restrictions list in text. |
| | Variable | RecordTerminator | union of U, W, and hexBinary | Designates a UNIX-style line feed, Windows-style carriage return and line feed pair, or a hexadecimal value | U, W, or a two-character hexadecimal number from 00 through FF representing a single byte. |
| TagInfo | | | | Specifies the location of the record identifier within the record | The tag contents will be interpreted as an alphanumeric string, with leading and trailing white-space removed. Must be the same offset and length for every record type. |
| | | Offset | nonNegativeInteger | Specifies the offset from zero in bytes for the first position of the tag | Maximum value reflects restriction on record length as noted in restrictions list in text. |
| | | Length | positiveInteger | Specifies the length of the tag in bytes | Maximum value reflects restriction on field length as noted in restrictions list in text. |
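To make the RecordFormat and TagInfo settings concrete, here is a minimal Python sketch of how a reader might split a file into records and pull out each record's tag. The function names and parameters are hypothetical, and the sketch assumes the data is plain ASCII, as it is in our sample files.

```python
def split_records(data: bytes, fixed_length=None, terminator=None):
    """Split raw file bytes into records.

    Pass fixed_length for a Fixed record format, or terminator for a
    Variable one: "U", "W", or two hexadecimal digits such as "1E".
    """
    if (fixed_length is None) == (terminator is None):
        raise ValueError("specify exactly one of fixed_length or terminator")
    if fixed_length is not None:
        return [data[i:i + fixed_length]
                for i in range(0, len(data), fixed_length)]
    if terminator == "U":
        separator = b"\n"            # UNIX-style line feed
    elif terminator == "W":
        separator = b"\r\n"          # Windows-style carriage return and line feed
    else:
        separator = bytes.fromhex(terminator)   # a single byte given in hex
    records = data.split(separator)
    if records and records[-1] == b"":
        records.pop()                # drop the empty piece after the last terminator
    return records


def record_tag(record: bytes, offset: int, length: int) -> str:
    """Return the record identifier, trimmed of surrounding white-space."""
    return record[offset:offset + length].decode("ascii").strip()
```

With the TagInfo values used in this chapter's examples (Offset 0, Length 3), record_tag returns HDR, SHP, LIN, and so on for each record that split_records produces.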

Table 8.4. Child Elements of the XMLOutputCharacteristics Element

| Child Element | Attribute | Schema Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|
| SchemaLocationURL | value | anyURI | URL of the schema file for the output document. Will be written as the value of the root Element's noNamespaceSchemaLocation Attribute. | Optional. If not specified the noNamespaceSchemaLocation Attribute will not be written. An error will occur if output validation is requested and this Element is not present. |
| PartnerBreak | | | Information about a field that dictates a different trading partner when its content changes (for example, a customer number in the first field of the invoice). | Optional. Field contents are interpreted as an alphanumeric string and must be valid as a directory name for the operating system. If not specified, all output documents will be created in the output directory instead of creating a separate subdirectory for each trading partner. |
| | Offset | nonNegativeInteger | Offset from zero in bytes for the first position of the field. | Maximum value reflects restriction on record length as noted in the restrictions list in the text. |
| | Length | positiveInteger | Length of the field in bytes. | Maximum value reflects restriction on field length as noted in the restrictions list in the text. |
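The PartnerBreak behavior can be pictured with a short Python sketch. It is a guess at the logic rather than the utilities' code: the function name is hypothetical, and I assume the partner field is read from the first record of each logical document.

```python
from pathlib import Path


def partner_directory(output_dir, header_record, offset=None, length=None):
    """Pick the directory that receives the XML document for this record.

    offset and length come from the PartnerBreak Element. When the Element
    is absent (both are None), everything goes to the output directory.
    """
    if offset is None or length is None:
        return Path(output_dir)
    partner = header_record[offset:offset + length].strip()
    if not partner:
        raise ValueError("partner field is empty")
    # The field contents double as the subdirectory name.
    return Path(output_dir) / partner
```

For a header record whose customer number field holds BQ003, a PartnerBreak of Offset 3 and Length 20 gives a BQ003 subdirectory under the output directory.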

Now the transformation is complete. However, one other diagram may be helpful in fully understanding the file description documents and how the utilities use them. The logical structure of the grammar of our invoice file exactly matches the structure of the XML representation of the invoice document (Figure 8.7). The Element names in the file description document are shown in boldface type, while the invoice Elements they specify are shown in italics. Note that we define each Element in the invoice document only once and don't repeat the GroupDescription for each occurrence of the LineItemGroup Element.

Figure 8.7. Grammar Description of the Invoice Document


For a more detailed discussion of the analysis of flat file grammars, refer to the High-Level Design Considerations section. Table 8.5 shows the details of the Grammar Element and its child Nodes. All are required unless noted. The indentation in the Element column shows the approximate hierarchical relationships. The Allowable Child Elements column lists the specific details of the hierarchy.

Table 8.6 shows the data types supported for the flat file format. To those we developed for the CSV file format in Chapter 7 we add a new numeric and a new date data type.

For all types, a runtime error occurs if Truncatable is false and the length of the XML Element contents exceeds the field length.
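As a small illustration of that rule, here is how an alphanumeric field might be fitted to its width in Python. The function is a hypothetical sketch, and it counts characters rather than bytes, which amounts to the same thing only for single-byte data such as ASCII.

```python
def format_alphanumeric(value: str, length: int,
                        truncatable: bool = False, fill: str = " ") -> str:
    """Fit XML Element content into an alphanumeric field of a given length."""
    if len(value) > length:
        if not truncatable:
            # Corresponds to the runtime error raised when Truncatable is false.
            raise ValueError(
                f"content of length {len(value)} exceeds field length {length}")
        return value[:length]          # right-truncate to the field length
    return value.ljust(length, fill)   # left-justify and fill to the right
```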

I should make a note here about truncating versus rounding fractional digits. In these utilities I always truncate and never round. I've had enough bad experiences with floating point arithmetic that I'm taking the easy way out and just truncating. If you need to round fractional digits, you can use an XSLT transformation or whatever means you use to put the data into the proper XML source format. Or, if you want to modify the source code, you can take an approach similar to the one I discuss in the Enhancements and Alternatives section at the end of the chapter.
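The difference between the two approaches is easy to see with Python's decimal module, which avoids binary floating point altogether. This is just a sketch to show the arithmetic; the helper name is mine, and the utilities themselves are not written this way.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate_fraction(value: str, places: int) -> Decimal:
    """Drop fractional digits beyond the given number of places, never rounding."""
    quantum = Decimal(10) ** -places      # places=2 gives Decimal("0.01")
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)


truncated = truncate_fraction("2.599", 2)
# For comparison only: the same value rounded half up instead of truncated.
rounded = Decimal("2.599").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Here truncated is 2.59 while rounded is 2.60, which is exactly the one-cent difference that makes the choice matter for prices.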

Table 8.5. Flat File Grammar Characteristics in the Grammar Element

| Element | Allowable Child Elements | Attribute | Schema Language Data Type | Description | Allowable Values, Restrictions, or Comments |
|---|---|---|---|---|---|
| Grammar | RecordDescription, GroupDescription | | | Describes the grammar of both the flat file and the corresponding XML representation. | The first child Element of the Grammar Element must be a RecordDescription Element. It may be followed by any combination of RecordDescription or GroupDescription Elements. |
| | | ElementName | NMTOKEN | Specifies the name of the document's root Element. | When creating XML documents, the specified name is assigned to the document's root Element. When creating a flat file, the input XML document's root Element must match this name. Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the Header record's record identifier field. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| GroupDescription | RecordDescription, GroupDescription | | | Describes the grammar of a group of records. | Any combination of RecordDescription or GroupDescription Elements can follow the first RecordDescription Element. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing the group. | Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the record identifier field described for the first record in the group. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| RecordDescription | FieldDescription | | | Describes the grammar of an individual record and the corresponding XML representation. | A RecordDescription Element is required for each unique record type in the file. If a record type may appear at different positions, a RecordDescription is required for each position. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing a record. | Maximum length reflects restriction on length of Element names. |
| | | TagValue | token | The value of the record identifier field described by the TagInfo Element above. | Maximum length reflects restriction on field length. Do not include trailing spaces if the tag length is less than the length specified in the TagInfo Element. |
| FieldDescription | None | | | Describes the characteristics of a field in the flat file and the corresponding XML representation. | One FieldDescription Element is required for each field in the flat file record. If a range of characters within the record is not covered by a field description, it will be ignored for flat file source conversions and space filled for flat file target conversions. |
| | | ElementName | NMTOKEN | Specifies the name of the Element representing the field. | Maximum length reflects restriction on length of Element names. |
| | | FieldNumber | positiveInteger | Specifies the number of the field, starting at one. | Maximum value reflects restriction on the number of fields per record. |
| | | DataType | token | Specifies the data type of the field in the flat file. | The supported data types developed in this chapter are shown in Table 8.6. The Grammar data type code values are used. |
| | | Offset | nonNegativeInteger | Specifies the offset from zero in bytes for the first position of the field. | Maximum value reflects restriction on record length. |
| | | Length | positiveInteger | Specifies the length of the field in bytes. | Maximum value reflects restriction on field length. |
| | | Truncatable | boolean | Indicates whether or not truncation is permitted. See comments regarding truncation in Table 8.6. | Optional, defaults to false. |
| | | FillCharacter | union of single character string and hexBinary | When converting to flat files as the target, the field will be padded with this character if the source XML Element content is missing or shorter than the field length. | Optional, defaults to an ASCII space character. A single literal character or a two-character hexadecimal number from 00 through FF representing a single byte may be specified. |

Table 8.6. Flat File Data Types

| Flat File Data Type | Grammar Data Type Code | Schema Data Type | Actions with Flat File as Source | Actions with Flat File as Target | Actions with Flat File as Target if Truncatable Is True |
|---|---|---|---|---|---|
| Alphanumeric | AN | string | Leading and trailing white-space (any character with an integer value less than or equal to a space character) is trimmed. All other white-space within the string is preserved. | If the source is shorter than the field length, the field is left-justified and filled to the right with the fill character. | The string is right-truncated to the field length. |
| Real number | R | decimal | Leading zeroes and leading plus signs are removed. All whitespace is trimmed. | The number is right-justified within the field. Leading characters are set according to the fill character. If the fill character is a zero, the minus sign if present is placed in the left-most position. For all other fill characters the minus sign immediately precedes the most significant digit. | Fractional digits to the right of the decimal point are truncated until the Element contents are equal to the field length. An error occurs if digits to the left of the decimal exceed the field length. |
| Implied decimal number | Nx, where x represents the number of implied decimal places | decimal | Leading zeroes and leading plus signs are removed. All whitespace is trimmed. | The number is right-justified within the field. If the number of fractional digits in the source number exceeds x, the number is right-truncated to x fractional digits. Zeroes are added as fractional digits if the source number has fewer than x fractional digits. Leading characters are set according to the fill character. If the fill character is a zero, the sign character is placed in the left-most position. For all other fill characters the sign character immediately precedes the most significant digit. | Ignored, not truncatable. |
| Date in YYYYMMDD format | DYYYYMMDD | date | N/A | The date is left-justified within the field and filled with the specified fill character if the field is longer than 8 characters. | Ignored, not truncatable. |
| Date in MM/DD/YYYY format | DMMsDDsYYYY | date | Month and day may be either one or two digits each. | The date is left-justified within the field and filled with the specified fill character if the field is longer than 10 characters. | Ignored, not truncatable. |
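The implied decimal type is the least obvious entry in Table 8.6, so here is a minimal Python sketch of both directions. The function names are hypothetical. I handle only the minus sign, and I treat a value that is too wide for the field as an error; both are my assumptions, not documented behavior.

```python
from decimal import Decimal, ROUND_DOWN


def format_implied(value: str, places: int, length: int, fill: str = " ") -> str:
    """Write a decimal number as an Nx field (x = places) for a flat file target."""
    # Truncate extra fractional digits, or add zeroes if there are too few.
    number = Decimal(value).quantize(Decimal(10) ** -places, rounding=ROUND_DOWN)
    negative = number < 0
    # Shift the decimal point right so that only digits remain.
    digits = str(int(abs(number).scaleb(places)))
    if fill == "0":
        # Zero fill: the sign goes in the left-most position of the field.
        width = length - 1 if negative else length
        text = ("-" if negative else "") + digits.rjust(width, "0")
    else:
        # Any other fill: the sign immediately precedes the first digit.
        text = (("-" if negative else "") + digits).rjust(length, fill)
    if len(text) > length:
        raise ValueError(f"{value} does not fit in {length} characters")
    return text


def parse_implied(field: str, places: int) -> Decimal:
    """Read an Nx field from a flat file source back into a decimal number."""
    # int() discards leading zeroes and a leading plus sign.
    return Decimal(int(field.strip())).scaleb(-places)
```

With two implied places and a zero fill character, a unit price of 2.59 in a ten-character field becomes 0000000259, which is the form the prices take in the sample purchase order file.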

Example File Description Documents

Here are the file description documents for the flat file invoice and purchase order examples.

Sample InvoiceFlatSourceDescription.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <FlatSourceFileDescription
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="FlatSourceFileDescription.xsd">
      <PhysicalCharacteristics>
        <RecordFormat>
          <Variable RecordTerminator="W"/>
        </RecordFormat>
        <TagInfo Offset="0" Length="3"/>
      </PhysicalCharacteristics>
      <XMLOutputCharacteristics>
        <SchemaLocationURL value="FlatInvoice.xsd"/>
        <PartnerBreak Offset="3" Length="20"/>
      </XMLOutputCharacteristics>
      <Grammar ElementName="FlatInvoice" TagValue="HDR">
        <RecordDescription ElementName="Header" TagValue="HDR">
          <FieldDescription FieldNumber="1"
              ElementName="RecordID" DataType="AN"
              Offset="0" Length="3"/>
          <FieldDescription FieldNumber="2"
              ElementName="CustomerNumber" DataType="AN"
              Offset="3" Length="20"/>
          <FieldDescription FieldNumber="3"
              ElementName="InvoiceNumber" DataType="AN"
              Offset="23" Length="20"/>
          <FieldDescription FieldNumber="4"
              ElementName="InvoiceDate" DataType="DYYYYMMDD"
              Offset="43" Length="8"/>
          <FieldDescription FieldNumber="5"
              ElementName="PONumber" DataType="AN"
              Offset="51" Length="20"/>
          <FieldDescription FieldNumber="6"
              ElementName="DueDate" DataType="DYYYYMMDD"
              Offset="71" Length="8"/>
        </RecordDescription>
        <RecordDescription ElementName="ShipTo" TagValue="SHP">
          <FieldDescription FieldNumber="1"
              ElementName="RecordID" DataType="AN"
              Offset="0" Length="3"/>
          <FieldDescription FieldNumber="2"
              ElementName="ShipToName" DataType="AN"
              Offset="3" Length="40"/>
          <FieldDescription FieldNumber="3"
              ElementName="ShipToStreet1" DataType="AN"
              Offset="43" Length="30"/>
          <FieldDescription FieldNumber="4"
              ElementName="ShipToStreet2" DataType="AN"
              Offset="73" Length="30"/>
          <FieldDescription FieldNumber="5"
              ElementName="ShipToCity" DataType="AN"
              Offset="103" Length="20"/>
          <FieldDescription FieldNumber="6"
              ElementName="ShipToStateOrProvince"
              DataType="AN" Offset="123" Length="3"/>
          <FieldDescription FieldNumber="7"
              ElementName="ShipToPostalCode" DataType="AN"
              Offset="126" Length="10"/>
          <FieldDescription FieldNumber="8"
              ElementName="ShipToCountry"
              DataType="AN" Offset="136" Length="3"/>
        </RecordDescription>
        <GroupDescription ElementName="LineItemGroup"
            TagValue="LIN">
          <RecordDescription ElementName="LineItem" TagValue="LIN">
            <FieldDescription FieldNumber="1"
                ElementName="RecordID" DataType="AN"
                Offset="0" Length="3"/>
            <FieldDescription FieldNumber="2"
                ElementName="ItemID" DataType="AN"
                Offset="3" Length="20"/>
            <FieldDescription FieldNumber="3"
                ElementName="ItemQuantity" DataType="R"
                Offset="23" Length="10"/>
            <FieldDescription FieldNumber="4"
                ElementName="UnitPrice" DataType="N2"
                Offset="33" Length="10"/>
            <FieldDescription FieldNumber="5"
                ElementName="ExtendedPrice" DataType="N2"
                Offset="43" Length="10"/>
          </RecordDescription>
          <RecordDescription ElementName="ItemDescription"
              TagValue="DSC">
            <FieldDescription FieldNumber="1"
                ElementName="RecordID" DataType="AN"
                Offset="0" Length="3"/>
            <FieldDescription FieldNumber="2"
                ElementName="Description" DataType="AN"
                Offset="3" Length="80"/>
          </RecordDescription>
        </GroupDescription>
        <RecordDescription ElementName="Summary" TagValue="SUM">
          <FieldDescription FieldNumber="1"
              ElementName="RecordID" DataType="AN"
              Offset="0" Length="3"/>
          <FieldDescription FieldNumber="2"
              ElementName="TotalAmount" DataType="N2"
              Offset="3" Length="10"/>
          <FieldDescription FieldNumber="3"
              ElementName="NumberOfLines" DataType="N0"
              Offset="13" Length="10"/>
        </RecordDescription>
      </Grammar>
    </FlatSourceFileDescription>
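To show how a description like this drives the conversion, here is a short Python sketch that reads the FieldDescription Elements of one record type and slices a record accordingly. The helper names are mine, the XML fragment is the LineItem description from the invoice file, and the sample record is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The LineItem record description, copied from the invoice file description.
LINE_ITEM_XML = """\
<RecordDescription ElementName="LineItem" TagValue="LIN">
  <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/>
  <FieldDescription FieldNumber="2" ElementName="ItemID" DataType="AN" Offset="3" Length="20"/>
  <FieldDescription FieldNumber="3" ElementName="ItemQuantity" DataType="R" Offset="23" Length="10"/>
  <FieldDescription FieldNumber="4" ElementName="UnitPrice" DataType="N2" Offset="33" Length="10"/>
  <FieldDescription FieldNumber="5" ElementName="ExtendedPrice" DataType="N2" Offset="43" Length="10"/>
</RecordDescription>
"""


def load_fields(xml_text):
    """Return the record's Element name and its (name, type, offset, length) list."""
    record = ET.fromstring(xml_text)
    fields = [
        (fd.get("ElementName"), fd.get("DataType"),
         int(fd.get("Offset")), int(fd.get("Length")))
        for fd in record.findall("FieldDescription")
    ]
    return record.get("ElementName"), fields


def slice_record(line, fields):
    """Cut a flat file record into trimmed field strings keyed by Element name."""
    return {name: line[offset:offset + length].strip()
            for name, _data_type, offset, length in fields}


# A made-up LIN record: item HCVAN, quantity 12, unit price 2.59, extended 31.08.
SAMPLE_LINE = "LIN" + "HCVAN".ljust(20) + "12".rjust(10) + "0000000259" + "0000003108"
element_name, line_item_fields = load_fields(LINE_ITEM_XML)
line_item = slice_record(SAMPLE_LINE, line_item_fields)
```

The resulting dictionary still holds raw field text. Converting each value according to its DataType, for example turning the N2 fields into decimal numbers, would be the next step.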
Sample PurchaseOrderFlatTargetDescription.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <FlatTargetFileDescription
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation=
            "FlatTargetFileDescription.xsd">
      <PhysicalCharacteristics>
        <RecordFormat>
          <Variable RecordTerminator="W"/>
        </RecordFormat>
        <TagInfo Offset="0" Length="3"/>
      </PhysicalCharacteristics>
      <Grammar ElementName="PurchaseOrder" TagValue="HDR">
        <RecordDescription TagValue="HDR" ElementName="POHeader">
          <FieldDescription FieldNumber="1"
              ElementName="RecordID" DataType="AN"
              Offset="0" Length="3"/>
          <FieldDescription FieldNumber="2"
              ElementName="CustomerNumber" DataType="AN"
              Offset="3" Length="20"/>
          <FieldDescription FieldNumber="3"
              ElementName="PONumber" DataType="AN"
              Offset="23" Length="20"/>
          <FieldDescription FieldNumber="4"
              ElementName="PODate" DataType="DYYYYMMDD"
              Offset="43" Length="8"/>
          <FieldDescription FieldNumber="5"
              ElementName="RequestedDeliveryDate" DataType="DYYYYMMDD"
              Offset="51" Length="8"/>
        </RecordDescription>
        <RecordDescription TagValue="SHP" ElementName="ShipTo">
          <FieldDescription FieldNumber="1"
              ElementName="RecordID" DataType="AN"
              Offset="0" Length="3"/>
          <FieldDescription FieldNumber="2"
              ElementName="ShipToName" DataType="AN"
              Offset="3" Length="40" Truncatable="true"/>
          <FieldDescription FieldNumber="3"
              ElementName="ShipToStreet1" DataType="AN"
              Offset="43" Length="30" Truncatable="true"/>
          <FieldDescription FieldNumber="4"
              ElementName="ShipToStreet2" DataType="AN"
              Offset="73" Length="30" Truncatable="true"/>
          <FieldDescription FieldNumber="5"
              ElementName="ShipToCity" DataType="AN"
              Offset="103" Length="20"/>
          <FieldDescription FieldNumber="6"
              ElementName="ShipToStateOrProvince" DataType="AN"
              Offset="123" Length="3"/>
          <FieldDescription FieldNumber="7"
              ElementName="ShipToPostalCode" DataType="AN"
              Offset="126" Length="10"/>
          <FieldDescription FieldNumber="8"
              ElementName="ShipToCountry" DataType="AN"
              Offset="136" Length="3"/>
        </RecordDescription>
        <GroupDescription ElementName="LineItem" TagValue="LIN">
          <RecordDescription TagValue="LIN" ElementName="Item">
            <FieldDescription FieldNumber="1"
                ElementName="RecordID" DataType="AN"
                Offset="0" Length="3"/>
            <FieldDescription FieldNumber="2"
                ElementName="ItemID" DataType="AN"
                Offset="3" Length="20"/>
            <FieldDescription FieldNumber="3"
                ElementName="OrderedQty" DataType="R"
                Offset="23" Length="10" FillCharacter=" "/>
            <FieldDescription FieldNumber="4"
                ElementName="UnitPrice" DataType="N2"
                Offset="33" Length="10" FillCharacter="0"/>
            <FieldDescription FieldNumber="5"
                ElementName="ExtendedAmount" DataType="N2"
                Offset="43" Length="10" FillCharacter="0"/>
          </RecordDescription>
          <RecordDescription TagValue="DSC"
              ElementName="ItemDescription">
            <FieldDescription FieldNumber="1"
                ElementName="RecordID" DataType="AN"
                Offset="0" Length="3"/>
            <FieldDescription FieldNumber="2"
                ElementName="Description" DataType="AN"
                Offset="3" Length="80" Truncatable="true"/>
          </RecordDescription>
        </GroupDescription>
      </Grammar>
    </FlatTargetFileDescription>
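Going the other way, the Item record description is enough to build a LIN record from XML Element values. The Python below is a rough sketch under several assumptions: the field list is transcribed by hand from the description, the helper names are hypothetical, negative numbers and truncation are left out, and only the AN, R, and Nx types are covered.

```python
from decimal import Decimal, ROUND_DOWN

# (ElementName, DataType, Offset, Length, FillCharacter) for the Item record,
# transcribed from the purchase order description. AN fields use the default
# space fill.
ITEM_FIELDS = [
    ("RecordID", "AN", 0, 3, " "),
    ("ItemID", "AN", 3, 20, " "),
    ("OrderedQty", "R", 23, 10, " "),
    ("UnitPrice", "N2", 33, 10, "0"),
    ("ExtendedAmount", "N2", 43, 10, "0"),
]


def format_field(value, data_type, length, fill):
    """Format one Element value for its field (non-negative numbers only)."""
    if value == "":
        text = fill * length                 # missing content: fill the field
    elif data_type == "AN":
        text = value.ljust(length, fill)     # left-justified
    elif data_type == "R":
        text = value.rjust(length, fill)     # right-justified
    elif data_type.startswith("N"):
        places = int(data_type[1:])
        number = Decimal(value).quantize(Decimal(10) ** -places,
                                         rounding=ROUND_DOWN)
        if number < 0:
            raise ValueError("sign handling is left out of this sketch")
        text = str(int(number.scaleb(places))).rjust(length, fill)
    else:
        raise ValueError(f"unsupported data type {data_type!r}")
    if len(text) > length:
        raise ValueError(f"{value!r} exceeds field length {length}")
    return text


def build_record(values, fields):
    """Lay the formatted fields out at their offsets in a space-filled record."""
    record_length = max(offset + length for _n, _t, offset, length, _f in fields)
    record = [" "] * record_length           # uncovered ranges stay space filled
    for name, data_type, offset, length, fill in fields:
        record[offset:offset + length] = format_field(
            values.get(name, ""), data_type, length, fill)
    return "".join(record)


item = {"RecordID": "LIN", "ItemID": "HCVAN", "OrderedQty": "12",
        "UnitPrice": "2.59", "ExtendedAmount": "0"}
lin_record = build_record(item, ITEM_FIELDS)
```

With these values, lin_record has the same layout as the first LIN record in the sample purchase order file: the item ID padded to 20 characters, the quantity right-justified with spaces, and the two prices zero filled.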


Using XML with Legacy Business Applications
ISBN: 0321154940
Year: 2003
Pages: 181
