New DataCell Methods and Derived Classes
For the data types supported for CSV files we add two new methods to the base class and create three new derived classes. Depending on the conversion requirements, the new derived classes may have their own implementation of the fromXML or toXML method, or they may use the implementation in the base class.
New DataCell Methods
The delimitText method checks whether the DelimitText Attribute on the cell's Grammar Element is set to true. It is primarily used by the XML to CSV utility to determine whether a column should have the text delimiter added to it before it is written to the output CSV file.
Logic for the DataCell delimitText Method
Arguments:
  None
Returns:
  Boolean - true if DelimitText Attribute is present and has a value of true, false otherwise

Delimit Text <- call Grammar Element's getAttribute on "DelimitText"
IF Delimit Text is null or Delimit Text is false
  Return false
ENDIF
Return true
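The logic above might look like the following in Java. This is a minimal sketch under assumptions, not the utility's actual code: the class name DelimitTextSketch, the helper columnGrammar, and the element name ColumnGrammar are hypothetical and exist only to make the example self-contained. One point worth noting is that the standard DOM getAttribute call returns an empty string, not null, when the attribute is absent, so the sketch tests explicitly for the value "true" rather than testing for null or "false".

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DelimitTextSketch {
    // Returns true only when the DelimitText attribute is present with the value "true".
    // DOM's getAttribute returns "" for a missing attribute, so a null test alone
    // would not be enough in Java.
    static boolean delimitText(Element grammarElement) {
        String value = grammarElement.getAttribute("DelimitText");
        return "true".equals(value);
    }

    // Hypothetical helper for the demo: builds a stand-in Grammar Element.
    // Pass null to leave the DelimitText attribute off entirely.
    static Element columnGrammar(String delimitTextValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element element = doc.createElement("ColumnGrammar");
            if (delimitTextValue != null) {
                element.setAttribute("DelimitText", delimitTextValue);
            }
            return element;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(delimitText(columnGrammar("true")));  // prints true
        System.out.println(delimitText(columnGrammar("false"))); // prints false
        System.out.println(delimitText(columnGrammar(null)));    // prints false
    }
}
```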
The trimLeadingZeroes method trims leading zeroes from numeric data types. It also trims leading and trailing whitespace. It is designed to properly trim leading zeroes and spaces from the following numeric representations while retaining the sign characters:
Note that this algorithm is somewhat permissive in that it will convert and pass data that isn't numeric or that has more than one sign character. Again, we depend on schema validation to detect most of those kinds of problems. We keep our code simple here by doing the minimum required.
Logic for the DataCell trimLeadingZeroes Method
Arguments:
  None
Returns:
  Error status or throws exception

Initialize Sign Character
Call trim (C++ base class method or native Java) to remove leading and trailing whitespace from Cell Buffer
IF Cell Buffer is empty after trimming
  Return
ENDIF
Position <- 0
IF first character is + or - sign character
  Sign Character <- first character
  Position <- 1
ENDIF
DO while Position < Buffer Length and Cell Buffer[Position] = "0"
  Position++
ENDDO
IF Position = Buffer Length because it contains only zeroes
  Decrement Position so that we have at least one zero
ENDIF
IF Sign Character is present
  Cell Buffer <- Sign Character + Cell Buffer substring starting at Position
ELSE
  Cell Buffer <- Cell Buffer substring starting at Position
ENDIF
Return success
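As an illustration, here is one way the trimming logic could be written in Java. It is a sketch under assumptions: it works on a plain String passed in and returned, where the real method operates on the Cell Buffer member, and the class name TrimLeadingZeroesSketch is hypothetical. It also adds one small guard that the pseudocode does not have, so that a buffer holding only a sign character is passed through unchanged instead of having the sign duplicated.

```java
final class TrimLeadingZeroesSketch {
    // Trims whitespace and leading zeroes, keeping a leading + or - sign.
    static String trimLeadingZeroes(String cellBuffer) {
        String s = cellBuffer.trim();
        if (s.isEmpty()) {
            return s;
        }
        String sign = "";
        int position = 0;
        char first = s.charAt(0);
        if (first == '+' || first == '-') {
            sign = String.valueOf(first);
            position = 1;
        }
        // Skip over every leading zero.
        while (position < s.length() && s.charAt(position) == '0') {
            position++;
        }
        // Only zeroes were found: back up so that one zero survives.
        // The second test is an added guard for a buffer that is just a sign.
        if (position == s.length() && position > sign.length()) {
            position--;
        }
        return sign + s.substring(position);
    }

    public static void main(String[] args) {
        System.out.println(trimLeadingZeroes("  -007 "));  // prints -7
        System.out.println(trimLeadingZeroes("000"));      // prints 0
        System.out.println(trimLeadingZeroes("+0012.50")); // prints +12.50
    }
}
```

Like the pseudocode, the sketch is permissive: it does not check that the remaining characters are numeric, and a value such as 0.5 comes out as .5, which the schema language decimal type still accepts.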
The DataCellAN class handles conversion between an alphanumeric data type and the schema language string data type. It uses the base class fromXML method but implements its own version of the toXML method.
Logic for the DataCellAN toXML Method
Arguments:
  None
Returns:
  Error status or throws exception

Call trim (C++ utility method or native Java) to remove leading and trailing whitespace from Cell Buffer
Return success
The DataCellReal class handles conversion between a real number (or decimal) data type and the schema language decimal data type. The main thing to note about this class is that the fromXML method trims spaces. If the source XML document is validated, a source Element with a schema language data type of decimal should never have spaces in it. However, we can't depend on validation being performed, and other methods that we'll develop later depend on there being no spaces in the Cell Buffer. So, we remove them in the fromXML method.
Logic for the DataCellReal fromXML Method
Arguments:
  None
Returns:
  Error status or throws exception

Trim Spaces
Remove leading plus sign if present
Return success
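These two steps are short enough that a sketch is mostly a matter of choosing library calls. The version below assumes that "Trim Spaces" means removing leading and trailing whitespace, and it works on a String argument in place of the Cell Buffer member. The class name RealFromXmlSketch is hypothetical.

```java
final class RealFromXmlSketch {
    // Trims surrounding whitespace, then drops a single leading plus sign.
    static String fromXML(String cellBuffer) {
        String s = cellBuffer.trim();
        if (s.startsWith("+")) {
            s = s.substring(1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(fromXML(" +12.5 ")); // prints 12.5
        System.out.println(fromXML("-3"));      // prints -3
    }
}
```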
Logic for the DataCellReal toXML Method
Arguments:
  None
Returns:
  Error status or throws exception

Call trimLeadingZeroes base class method to remove leading zeroes and leading and trailing spaces
Return success
The DataCellDateMMsDDsYYYY class handles conversion between the date data type in MM/DD/YYYY format and the schema language date data type in ISO 8601 date format, that is, YYYY-MM-DD. In the MM/DD/YYYY format both the MM and the DD may be either one or two digits in length.
Note: Various library functions are available in Java and C++ that might make the implementation a bit more efficient than the one presented here. However, in the interest of keeping things simple for myself, as I'm implementing in both languages, I've chosen basic algorithms that work equally well in either language. All positions are expressed as offsets from the first character at position zero.
Also, remember that we're only putting enough validation into these routines to avoid nasty runtime exceptions. Schema validation is our primary method for ensuring that we have good dates.
Logic for the DataCellDateMMsDDsYYYY fromXML Method
Arguments:
  None
Returns:
  Error status or throws exception

IF Buffer Length != 10
  Return error
ENDIF
Month <- Cell Buffer characters at offsets 5 and 6
Day <- Cell Buffer characters at offsets 8 and 9
Year <- Cell Buffer characters at offsets 0 through 3
Cell Buffer <- Month + forward slash + Day + forward slash + Year
Return success
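The offsets in this listing map directly onto Java's substring call, whose end index is exclusive. The sketch below is one possible rendering, assuming a String argument in place of the Cell Buffer member and an IllegalArgumentException in place of the error return. The class name DateFromXmlSketch is hypothetical. As in the pseudocode, only the length is checked, and everything else is left to schema validation.

```java
final class DateFromXmlSketch {
    // Converts an ISO 8601 date (YYYY-MM-DD) to MM/DD/YYYY by fixed offsets.
    static String fromXML(String cellBuffer) {
        if (cellBuffer.length() != 10) {
            throw new IllegalArgumentException("expected 10 characters: " + cellBuffer);
        }
        String year = cellBuffer.substring(0, 4);   // offsets 0 through 3
        String month = cellBuffer.substring(5, 7);  // offsets 5 and 6
        String day = cellBuffer.substring(8, 10);   // offsets 8 and 9
        return month + "/" + day + "/" + year;
    }

    public static void main(String[] args) {
        System.out.println(fromXML("2003-07-04")); // prints 07/04/2003
    }
}
```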
Logic for the DataCellDateMMsDDsYYYY toXML Method
Arguments:
  None
Returns:
  Error status or throws exception

Month State = 0
Day State = 1
Year State = 2
State <- Month
Cell Buffer <- trim leading and trailing whitespace from Cell Buffer
IF Buffer Length > 10
  Return error
ENDIF
TempChar <- First character in Cell Buffer
DO until end of Cell Buffer
  IF TempChar = Slash
    Increment State
  ELSE
    DO CASE of State
      Month:
        Append TempChar to Month
        BREAK
      Day:
        Append TempChar to Day
        BREAK
      Year:
        Append TempChar to Year
        BREAK
      other:
        Return error
    ENDDO
  ENDIF
  TempChar <- Next character in Cell Buffer
ENDDO
IF Length of Month is 1
  Month <- "0" + Month
ENDIF
IF Length of Day is 1
  Day <- "0" + Day
ENDIF
IF (Length of Month != 2) OR (Length of Day != 2) OR (Length of Year != 4)
  Return error
ENDIF
Cell Buffer <- Year + dash + Month + dash + Day
Return success
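The state machine above could be sketched in Java as follows. The assumptions are the same as before: a String argument stands in for the Cell Buffer member, an IllegalArgumentException stands in for the error return, and the class name DateToXmlSketch is hypothetical. The three states are held as indexes into an array of StringBuilders, which replaces the CASE statement but follows the same transitions.

```java
final class DateToXmlSketch {
    private static final int MONTH = 0;
    private static final int DAY = 1;
    private static final int YEAR = 2;

    // Converts M/D/YYYY or MM/DD/YYYY to ISO 8601 (YYYY-MM-DD).
    static String toXML(String cellBuffer) {
        String s = cellBuffer.trim();
        if (s.length() > 10) {
            throw new IllegalArgumentException("date too long: " + s);
        }
        StringBuilder[] parts = {
            new StringBuilder(), new StringBuilder(), new StringBuilder()
        };
        int state = MONTH;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '/') {
                state++;                    // a slash moves to the next field
            } else if (state > YEAR) {
                throw new IllegalArgumentException("too many fields: " + s);
            } else {
                parts[state].append(c);     // collect the current field
            }
        }
        String month = parts[MONTH].toString();
        String day = parts[DAY].toString();
        String year = parts[YEAR].toString();
        if (month.length() == 1) {
            month = "0" + month;            // pad single-digit month
        }
        if (day.length() == 1) {
            day = "0" + day;                // pad single-digit day
        }
        if (month.length() != 2 || day.length() != 2 || year.length() != 4) {
            throw new IllegalArgumentException("badly formed date: " + s);
        }
        return year + "-" + month + "-" + day;
    }

    public static void main(String[] args) {
        System.out.println(toXML("7/4/2003"));     // prints 2003-07-04
        System.out.println(toXML(" 12/25/1999 ")); // prints 1999-12-25
    }
}
```

The sketch checks only field counts and lengths, not that the characters are digits or that the date exists. That matches the approach taken throughout this section of relying on schema validation for correctness.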