Describing the CSV file format in Chapter 7 was fairly simple due to the restrictions we placed on it. The most significant of these was that every row has the same logical format. We're going to allow more variation in our flat file formats. Applications that use flat files to import or export data typically support several different logical record formats and group these records into repeating units. We'll need to specify more information about the grammar of our flat files than we did with CSV files. In addition to data types and other characteristics of fields, we'll need to specify the details of all the record types as well as how the records are grouped together. As with the CSV format, the flat file format's file description document has three major sections, each represented by an Element that is an immediate child of the root Element.
Flat File Physical CharacteristicsThe PhysicalCharacteristics Element describes the file's physical characteristics. This Element is required for both the source and target conversion utilities. Table 8.3 shows the child Elements of the PhysicalCharacteristics Element. All are required unless otherwise noted. XML Output CharacteristicsCharacteristics governing the output XML documents are described in the XMLOutputCharacteristics Element. This Element is used only when converting from flat files to XML. Table 8.4 shows the child Elements of the XMLOutputCharacteristics Element. All are required unless otherwise noted. Flat File GrammarThe grammar of a flat file is described in the Grammar Element. Although the XML representation of groups of records in flat files may be fairly intuitive, a few diagrams might help make it clearer. Figure 8.3 shows a typical stream of records in a flat file, using our cocoa invoice as an example. For brevity only the record tags appear in the figure. Figure 8.3. Record Stream in the Invoice File
If we look only at the records we can't for certain deduce much about the logical structure of a document. We would probably suspect that the HDR record started a new document and that perhaps the LIN and DSC records were a repeating group. However, we don't know for certain just by looking at the document; we must verify our suspicions by consulting the file specification or the application designer. For our purposes, we use Table 8.1 as our specification. This allows us to interpret the stream as shown in Figure 8.4. Figure 8.4. Record Stream in the Invoice File, with Groups Added
Figure 8.4, in essence, shows what is known as a syntax tree . Figure 8.5 converts the brackets into nodes in the tree. I show siblings at the same level in the diagram to make relationships more obvious. Figure 8.5. Syntax Tree for the Invoice File
The logical structure in Figure 8.5 now finally starts to look like something we might see in XML. All we have to do to make the transformation complete is to change the text from record identifiers and descriptions to XML Element names (Figure 8.6). Figure 8.6. Invoice Document in XML
Table 8.3. Child Elements of the PhysicalCharacteristics Element
Table 8.4. Child Elements of the XMLOutputCharacteristics Element
Now the transformation is complete. However, one other diagram may be helpful in fully understanding the file description documents and how the utilities use them. The logical structure of the grammar of our invoice file exactly matches the structure of the XML representation of the invoice document (Figure 8.7). The Element names in the file description document are shown in boldface type, while the invoice Elements they specify are shown in italics. Note that we define each Element in the invoice document only once and don't repeat the GroupDescription for each occurrence of the LineItemGroup Element. Figure 8.7. Grammar Description of the Invoice Document
For a more detailed discussion of the analysis of flat file grammars, refer to the High-Level Design Considerations section. Table 8.5 shows the details of the Grammar Element and its child Nodes. All are required unless noted. The indentation in the Element column shows the approximate hierarchical relationships. The Allowable Child Elements column lists the specific details of the hierarchy. Table 8.6 shows the data types supported for the flat file format. To those we developed for the CSV file format in Chapter 7 we add a new numeric and a new date data type. For all types, a runtime error occurs if Truncatable is false and the length of the XML Element contents exceeds the field length. I should make a note here about truncating versus rounding fractional digits. In these utilities I always truncate and never round. I've had enough bad experiences with floating point arithmetic that I'm taking the easy way out and just truncating. If you need to round fractional digits, you can use an XSLT transformation or whatever means you use to put the data into the proper XML source format. Or, if you want to modify the source code, you can take an approach similar to the one I discuss in the Enhancements and Alternatives section at the end of the chapter. Table 8.5. Flat File Grammar Characteristics in the Grammar Element
Table 8.6. Flat File Data Types
Example File Description DocumentsHere are the file description documents for the flat file invoice and purchase order examples. Sample InvoiceFlatSourceDescription.xml<?xml version="1.0" encoding="UTF-8"?> <FlatSourceFileDescription xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="FlatSourceFileDescription.xsd"> <PhysicalCharacteristics> <RecordFormat> <Variable RecordTerminator="W"/> </RecordFormat> <TagInfo Offset="0" Length="3"/> </PhysicalCharacteristics> <XMLOutputCharacteristics> <SchemaLocationURL value="FlatInvoice.xsd"/> <PartnerBreak Offset="3" Length="20"/> </XMLOutputCharacteristics> <Grammar ElementName="FlatInvoice" TagValue="HDR"> <RecordDescription ElementName="Header" TagValue="HDR"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="CustomerNumber" DataType="AN" Offset="3" Length="20"/> <FieldDescription FieldNumber="3" ElementName="InvoiceNumber" DataType="AN" Offset="23" Length="20"/> <FieldDescription FieldNumber="4" ElementName="InvoiceDate" DataType="DYYYYMMDD" Offset="43" Length="8"/> <FieldDescription FieldNumber="5" ElementName="PONumber" DataType="AN" Offset="51" Length="20"/> <FieldDescription FieldNumber="6" ElementName="DueDate" DataType="DYYYYMMDD" Offset="71" Length="8"/> </RecordDescription> <RecordDescription ElementName="ShipTo" TagValue="SHP"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="ShipToName" DataType="AN" Offset="3" Length="40"/> <FieldDescription FieldNumber="3" ElementName="ShipToStreet1" DataType="AN" Offset="43" Length="30"/> <FieldDescription FieldNumber="4" ElementName="ShipToStreet2" DataType="AN" Offset="73" Length="30"/> <FieldDescription FieldNumber="5" ElementName="ShipToCity" DataType="AN" Offset="103" Length="20"/> <FieldDescription FieldNumber="6" ElementName="ShipToStateOrProvince" DataType="AN" Offset="123" Length="3"/> <FieldDescription FieldNumber="7" ElementName="ShipToPostalCode" DataType="AN" Offset="126" Length="10"/> <FieldDescription FieldNumber="8" ElementName="ShipToCountry" DataType="AN" Offset="136" Length="3"/> </RecordDescription> <GroupDescription ElementName="LineItemGroup" TagValue="LIN"> <RecordDescription ElementName="LineItem" TagValue="LIN"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="ItemID" DataType="AN" Offset="3" Length="20"/> <FieldDescription FieldNumber="3" ElementName="ItemQuantity" DataType="R" Offset="23" Length="10"/> <FieldDescription FieldNumber="4" ElementName="UnitPrice" DataType="N2" Offset="33" Length="10"/> <FieldDescription FieldNumber="5" ElementName="ExtendedPrice" DataType="N2" Offset="43" Length="10"/> </RecordDescription> <RecordDescription ElementName="ItemDescription" TagValue="DSC"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="Description" DataType="AN" Offset="3" Length="80"/> </RecordDescription> </GroupDescription> <RecordDescription ElementName="Summary" TagValue="SUM"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="TotalAmount" DataType="N2" Offset="3" Length="10"/> <FieldDescription FieldNumber="3" ElementName="NumberOfLines" DataType="N0" Offset="13" Length="10"/> </RecordDescription> </Grammar> </FlatSourceFileDescription> Sample PurchaseOrderFlatTargetDescription.xml<?xml version="1.0" encoding="UTF-8"?> <FlatTargetFileDescription xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation= "FlatTargetFileDescription.xsd"> <PhysicalCharacteristics> <RecordFormat> <Variable RecordTerminator="W"/> </RecordFormat> <TagInfo Offset="0" Length="3"/> </PhysicalCharacteristics> <Grammar ElementName="PurchaseOrder" TagValue="HDR"> <RecordDescription TagValue="HDR" ElementName="POHeader"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="CustomerNumber" DataType="AN" Offset="3" Length="23"/> <FieldDescription FieldNumber="3" ElementName="PONumber" DataType="AN" Offset="23" Length="20"/> <FieldDescription FieldNumber="4" ElementName="PODate" DataType="DYYYYMMDD" Offset="43" Length="8"/> <FieldDescription FieldNumber="5" ElementName="RequestedDeliveryDate" DataType="DYYYYMMDD" Offset="51" Length="8"/> </RecordDescription> <RecordDescription TagValue="SHP" ElementName="ShipTo"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="ShipToName" DataType="AN" Offset="3" Length="40" Truncatable="true"/> <FieldDescription FieldNumber="3" ElementName="ShipToStreet1" DataType="AN" Offset="43" Length="30" Truncatable="true"/> <FieldDescription FieldNumber="4" ElementName="ShipToStreet2" DataType="AN" Offset="73" Length="30" Truncatable="true"/> <FieldDescription FieldNumber="5" ElementName="ShipToCity" DataType="AN" Offset="103" Length="20"/> <FieldDescription FieldNumber="6" ElementName="ShipToStateOrProvince" DataType="AN" Offset="123" Length="3"/> <FieldDescription FieldNumber="7" ElementName="ShipToPostalCode" DataType="AN" Offset="126" Length="10"/> <FieldDescription FieldNumber="8" ElementName="ShipToCountry" DataType="AN" Offset="136" Length="3"/> </RecordDescription> <GroupDescription ElementName="LineItem" TagValue="LIN"> <RecordDescription TagValue="LIN" ElementName="Item"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="ItemID" DataType="AN" Offset="3" Length="20"/> <FieldDescription FieldNumber="3" ElementName="OrderedQty" DataType="R" Offset="23" Length="10" FillCharacter=" "/> <FieldDescription FieldNumber="4" ElementName="UnitPrice" DataType="N2" Offset="33" Length="10" FillCharacter="0"/> <FieldDescription FieldNumber="5" ElementName="ExtendedAmount" DataType="N2" Offset="43" Length="10" FillCharacter="0"/> </RecordDescription> <RecordDescription TagValue="DSC" ElementName="ItemDescription"> <FieldDescription FieldNumber="1" ElementName="RecordID" DataType="AN" Offset="0" Length="3"/> <FieldDescription FieldNumber="2" ElementName="Description" DataType="AN" Offset="3" Length="80" Truncatable="true"/> </RecordDescription> </GroupDescription> </Grammar> </FlatTargetFileDescription> |