As discussed in Chapter 6, we're going to use our own XML-based language to describe the formats of our non-XML files. Although the different formats have many characteristics in common, we'll have a different language for each. In addition, because some characteristics, such as those dealing with XML output, are relevant to only one direction of conversion, we'll have a separate language (and schema) for each direction. In this section I describe the organization, Elements, Attributes, and usage of the file description document. In the design section we'll review the schemas for the file description documents as well as the sample invoice and purchase order. The CSV file description documents have three major sections, each represented by an Element that is a direct child of the root Element.
The language that describes conversion from XML is a subset of the one that describes conversion to XML. Given this, the file description documents and the schemas that specify them are very similar. We'll review them in the upcoming section on high-level design. The following three subsections describe the major sections of the file description documents. Each major section is handled by an Element that is a child of the file description document's root Element (CSVSourceFileDescription or CSVTargetFileDescription, depending on the direction of conversion).

CSV Physical Characteristics

The CSV file's physical characteristics are described in the PhysicalCharacteristics Element. For CSV formats we need to specify the record terminator and the column and text delimiters. Table 7.3 shows the child Elements of the PhysicalCharacteristics Element. All are required.

XML Output Characteristics

Characteristics governing the output XML documents are described in the XMLOutputCharacteristics Element. This Element is used only when converting from CSV files to XML. Table 7.4 shows the child Elements of the XMLOutputCharacteristics Element.

Table 7.3. Child Elements of the PhysicalCharacteristics Element
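As a rough illustration of how the three physical characteristics might drive a parser, the following Python sketch splits CSV text into rows and columns. It is not the utility developed in this book. The function name split_records is hypothetical, Python's standard csv module stands in for the parsing logic, and passing the record terminator as a literal string is an assumption of the example rather than the encoding used in the file description document.

```python
import csv


def split_records(text, record_terminator, column_delimiter, text_delimiter):
    """Split raw CSV text into a list of rows, each a list of column strings.

    The three parameters mirror the children of PhysicalCharacteristics.
    Passing the terminator as a literal string (for example "\r\n") is an
    assumption of this sketch.
    """
    # Break the text into records first, dropping empty records such as the
    # one left after the final terminator. Terminators embedded inside
    # delimited text are not handled by this simple split.
    records = [r for r in text.split(record_terminator) if r != ""]
    # csv.reader accepts any iterable of strings, one record per item.
    reader = csv.reader(
        records,
        delimiter=column_delimiter,
        quotechar=text_delimiter,
    )
    return [row for row in reader]


rows = split_records('C100,"Smith, Inc.",42\r\nC200,Jones,7\r\n', "\r\n", ",", '"')
print(rows)
```

The text delimiter is what lets the comma inside "Smith, Inc." survive as part of a single column instead of being read as a column boundary.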
CSV File Grammar

The data types of the columns and other characteristics of the CSV file (that is, the grammar) are described in the Grammar Element. Note an important point here: we are introducing a new aspect of functionality beyond that provided in Chapter 3's basic utility. Because we are describing the grammar of the CSV file, we can also take the opportunity to specify the names of the Elements that we use in the corresponding XML representation. We are no longer restricted to the "Row" and "ColumnXX" Element names that we used in Chapters 2 and 3. While this adds a little complexity to the processing, it allows us to assign semantically meaningful names to the Elements in the XML document. Table 7.5 shows the Grammar Element and its children. All are required unless noted. Indentation in the Element column shows hierarchical parent/child relationships. The Allowable Child Elements column lists the specific details of the hierarchy.

Table 7.4. Child Elements of the XMLOutputCharacteristics Element
Table 7.5. CSV Grammar Characteristics in the Grammar Element
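To show what grammar-driven naming buys us, here is a minimal Python sketch that builds one row Element from a list of column values. The function row_to_element and the dictionary that maps field numbers to Element names are hypothetical stand-ins for the RowDescription and ColumnDescription entries, and skipping empty columns is an assumption of the sketch, not a rule stated in this chapter.

```python
import xml.etree.ElementTree as ET


def row_to_element(row_name, column_names, values):
    """Build one row Element whose children are named by the grammar.

    column_names maps a 1-based field number to an Element name, playing
    the role of the ColumnDescription entries.
    """
    row = ET.Element(row_name)
    for field_number, value in enumerate(values, start=1):
        name = column_names.get(field_number)
        if name is None or value == "":
            continue  # no description for this column, or nothing to write
        ET.SubElement(row, name).text = value
    return row


column_names = {1: "CustomerNumber", 2: "InvoiceNumber"}
element = row_to_element("InvoiceLine", column_names, ["C100", "INV-7"])
print(ET.tostring(element, encoding="unicode"))
```

With the fixed names from Chapters 2 and 3, the same row would have come out as a Row Element with generically numbered Column children; here the names carry the meaning of the data.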
Table 7.6 shows the CSV file data types developed in this chapter. Other types will be added in later chapters. The Enhancements and Alternatives section at the end of this chapter describes how to add new data types. For each type, the table gives the data as it appears in the CSV file, the corresponding schema language data type used in the XML representation, and comments about the data type.

Table 7.6. CSV File Data Types
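The mapping from a CSV value to its XML representation can be sketched in a few lines of Python. The type codes AN, R, and DMMsDDsYYYY are taken from the example file description documents in this section; the meanings assigned to them here (alphanumeric text, a real number, and a month/day/year date with single-character separators) and the target forms (plain text, a decimal, and an ISO 8601 date) are assumptions of this sketch, as is the function name to_xml_value.

```python
from datetime import date
from decimal import Decimal


def to_xml_value(data_type, csv_value):
    """Convert one CSV column value to the text used in the XML document.

    The interpretation of each type code is an assumption of this sketch.
    """
    if data_type == "AN":
        # Assumed alphanumeric: passed through unchanged.
        return csv_value
    if data_type == "R":
        # Assumed real number: validate, then write in plain decimal notation.
        number = Decimal(csv_value.strip())  # raises InvalidOperation if not numeric
        if not number.is_finite():
            raise ValueError("not a finite number: " + csv_value)
        return format(number, "f")
    if data_type == "DMMsDDsYYYY":
        # Assumed date: two-digit month, separator, two-digit day, separator,
        # four-digit year, for example 12/25/2003.
        if len(csv_value) != 10:
            raise ValueError("expected 10 characters: " + csv_value)
        month, day, year = csv_value[0:2], csv_value[3:5], csv_value[6:10]
        # date() rejects impossible dates such as month 13.
        return date(int(year), int(month), int(day)).isoformat()
    raise ValueError("unknown data type: " + data_type)


print(to_xml_value("DMMsDDsYYYY", "12/25/2003"))
print(to_xml_value("R", "12.50"))
```

Keeping the conversion in one function keyed by the type code means that adding a new data type is a matter of adding one more branch.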
Example File Description Documents

Here are the file description documents for the invoice and purchase order examples.

Sample InvoiceCSVSourceDescription.xml

<?xml version="1.0" encoding="UTF-8"?>
<CSVSourceFileDescription
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="CSVSourceFileDescription.xsd">
  <PhysicalCharacteristics>
    <RecordTerminator value="W"/>
    <ColumnDelimiter value=","/>
    <TextDelimiter value="&quot;"/>
  </PhysicalCharacteristics>
  <XMLOutputCharacteristics>
    <DocumentBreakColumn value="2"/>
    <PartnerBreakColumn value="1"/>
    <SchemaLocationURL value="CSVInvoice.xsd"/>
  </XMLOutputCharacteristics>
  <Grammar ElementName="Invoice">
    <RowDescription ElementName="InvoiceLine">
      <ColumnDescription FieldNumber="1" ElementName="CustomerNumber" DataType="AN"/>
      <ColumnDescription FieldNumber="2" ElementName="InvoiceNumber" DataType="AN"/>
      <ColumnDescription FieldNumber="3" ElementName="InvoiceDate" DataType="DMMsDDsYYYY"/>
      <ColumnDescription FieldNumber="4" ElementName="PONumber" DataType="AN"/>
      <ColumnDescription FieldNumber="5" ElementName="DueDate" DataType="DMMsDDsYYYY"/>
      <ColumnDescription FieldNumber="6" ElementName="ShipToName" DataType="AN"/>
      <ColumnDescription FieldNumber="7" ElementName="ShipToStreet1" DataType="AN"/>
      <ColumnDescription FieldNumber="8" ElementName="ShipToStreet2" DataType="AN"/>
      <ColumnDescription FieldNumber="9" ElementName="ShipToCity" DataType="AN"/>
      <ColumnDescription FieldNumber="10" ElementName="ShipToStateOrProvince" DataType="AN"/>
      <ColumnDescription FieldNumber="11" ElementName="ShipToPostalCode" DataType="AN"/>
      <ColumnDescription FieldNumber="12" ElementName="ShipToCountry" DataType="AN"/>
      <ColumnDescription FieldNumber="13" ElementName="ItemID" DataType="AN"/>
      <ColumnDescription FieldNumber="14" ElementName="ItemQuantity" DataType="R"/>
      <ColumnDescription FieldNumber="15" ElementName="UnitPrice" DataType="R"/>
      <ColumnDescription FieldNumber="16" ElementName="ItemDescription" DataType="AN"/>
      <ColumnDescription FieldNumber="17" ElementName="ExtendedPrice" DataType="R"/>
    </RowDescription>
  </Grammar>
</CSVSourceFileDescription>

Sample PurchaseOrderCSVTargetDescription.xml

<?xml version="1.0" encoding="UTF-8"?>
<CSVTargetFileDescription
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="CSVTargetFileDescription.xsd">
  <PhysicalCharacteristics>
    <RecordTerminator value="U"/>
    <ColumnDelimiter value=","/>
    <TextDelimiter value="&quot;"/>
  </PhysicalCharacteristics>
  <Grammar ElementName="PurchaseOrder">
    <RowDescription ElementName="POLine">
      <ColumnDescription FieldNumber="1" ElementName="CustomerNumber" DataType="AN"/>
      <ColumnDescription FieldNumber="2" ElementName="PONumber" DataType="AN"/>
      <ColumnDescription FieldNumber="3" ElementName="PODate" DataType="DMMsDDsYYYY"/>
      <ColumnDescription FieldNumber="4" ElementName="RequestedDeliveryDate" DataType="DMMsDDsYYYY"/>
      <ColumnDescription FieldNumber="5" ElementName="ShipToName" DataType="AN" DelimitText="true"/>
      <ColumnDescription FieldNumber="6" ElementName="ShipToStreet1" DataType="AN" DelimitText="true"/>
      <ColumnDescription FieldNumber="7" ElementName="ShipToStreet2" DataType="AN" DelimitText="true"/>
      <ColumnDescription FieldNumber="8" ElementName="ShipToCity" DataType="AN" DelimitText="true"/>
      <ColumnDescription FieldNumber="9" ElementName="ShipToStateOrProvince" DataType="AN"/>
      <ColumnDescription FieldNumber="10" ElementName="ShipToPostalCode" DataType="AN"/>
      <ColumnDescription FieldNumber="11" ElementName="ShipToCountry" DataType="AN"/>
      <ColumnDescription FieldNumber="12" ElementName="ItemID" DataType="AN"/>
      <ColumnDescription FieldNumber="13" ElementName="OrderedQty" DataType="R"/>
      <ColumnDescription FieldNumber="14" ElementName="UnitPrice" DataType="R"/>
      <ColumnDescription FieldNumber="15" ElementName="ItemDescription" DataType="AN" DelimitText="true"/>
    </RowDescription>
  </Grammar>
</CSVTargetFileDescription>

Note that in these documents and in the associated schemas, the URLs for the schema locations are all relative. They specify only the file name and not the full path location. So, a processor would expect these to all reside in the same path. I'll follow this convention in this and the next two chapters.
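To make the structure of a file description document concrete, the following Python sketch reads a trimmed-down target description and pulls out its physical characteristics and grammar. It uses the standard xml.etree.ElementTree module and is only an illustration, not the design developed later in this chapter. The function load_description and the shape of the dictionary it returns are hypothetical, and treating a missing DelimitText Attribute as false is an assumption.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the purchase order description, with the schema
# reference omitted so the example stands on its own.
DESCRIPTION = """\
<CSVTargetFileDescription>
  <PhysicalCharacteristics>
    <RecordTerminator value="U"/>
    <ColumnDelimiter value=","/>
    <TextDelimiter value="&quot;"/>
  </PhysicalCharacteristics>
  <Grammar ElementName="PurchaseOrder">
    <RowDescription ElementName="POLine">
      <ColumnDescription FieldNumber="1" ElementName="CustomerNumber" DataType="AN"/>
      <ColumnDescription FieldNumber="5" ElementName="ShipToName" DataType="AN" DelimitText="true"/>
    </RowDescription>
  </Grammar>
</CSVTargetFileDescription>
"""


def load_description(xml_text):
    """Return the physical characteristics and grammar as plain Python data."""
    root = ET.fromstring(xml_text)
    # Each child of PhysicalCharacteristics carries its setting in "value".
    physical = {
        child.tag: child.get("value")
        for child in root.find("PhysicalCharacteristics")
    }
    grammar = root.find("Grammar")
    row = grammar.find("RowDescription")
    columns = [
        {
            "field": int(col.get("FieldNumber")),
            "name": col.get("ElementName"),
            "type": col.get("DataType"),
            # Assumption: an absent DelimitText Attribute means false.
            "delimit": col.get("DelimitText") == "true",
        }
        for col in row.findall("ColumnDescription")
    ]
    return {
        "physical": physical,
        "document": grammar.get("ElementName"),
        "row": row.get("ElementName"),
        "columns": columns,
    }


description = load_description(DESCRIPTION)
print(description["physical"])
print(description["columns"])
```

Note that the parser hands back the text delimiter as a single quotation mark character; the &quot; entity is needed only in the document itself, because a bare quotation mark cannot appear inside a double-quoted Attribute value.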