C++ Implementation | Using XML with Legacy Business Applications

C++ Implementation

The C++ implementation is composed of the following files:

XMLToCSVBasic.cpp : the main routine for the C++ application
CSVRowWriter.cpp : a class with write and formatRow methods
CSVRowWriter.h : the header file for the CSVRowWriter class
handleCOMError.cpp : the function that displays information about COM errors
displayParseError.cpp : the function that displays information about parsing errors
BlasterIncludes.h : the header file that pulls in all the standard headers and libraries the C++ code needs, including the import of MSXML
BlasterPrototypes.h : the header file with prototypes for handleCOMError and handleParseError

The first thing to keep in mind about the C++ implementation is that MSXML is a COM component (COM being Microsoft's Component Object Model), and a certain amount of COM baggage and weirdness comes along with it. If you are already working in a COM environment, you will probably be comfortable with most of the COM-related concepts in this book. If you aren't, I have put together a quick reference in Appendix C that should help you with most of what you need to know to use a COM component like MSXML. If you fall into this category, it would probably be a good idea to flip to Appendix C and read it before you proceed with this section.

main in XMLToCSVBasic.cpp

The structure of the main routine is nearly identical to the Java implementation. This subsection covers a few things that need to be pointed out.

From the Logic for main

    Set up DOM XML environment (dependent on implementation)
    Load input XML Document (dependent on implementation)

Due to the COM considerations and the need to check for return values rather than relying on exceptions to be thrown, this translates into a bit more code in the C++ implementation than it did in Java.

From XMLToCSVBasic.cpp

    //  Set up the COM environment, since MSXML is a
    //  COM Component
    hResult = CoInitialize(NULL);
    if (FAILED(hResult))
    {
      cerr << "Failed to initialize COM environment" << endl;
      return 0;
    }

    //  We'll do our main processing within a try block
    //  We only expect to throw COM, MSXML parse
    //  and I/O exceptions
    try
    {
      //  Declare smart pointers for our MSXML COM Interfaces.
      //  We'll use the Document2 class for all our stuff so that
      //  we can do schema validation.
      //  We do these within the try block so that when we exit
      //  the program they will be out of scope.
      IXMLDOMDocument2Ptr spDocInput;
      IXMLDOMNodeListPtr spRowList;

      //  Create the COM DOM Document object
      hResult =
          spDocInput.CreateInstance(__uuidof(DOMDocument40));
      if (FAILED(hResult))
      {
        throw cCreateInstanceError;
      }

      //  Tell it we don't want to load the document
      //  asynchronously
      spDocInput->async = VARIANT_FALSE;

      //  Load input XML Document (dependent on implementation)
      hResult = spDocInput->load(cInputXMLName);

      //  Check for errors
      if (hResult != VARIANT_TRUE)
      {
        spParseError = spDocInput->GetparseError();
        cerr << "Parsing Error" << endl;
        displayParseError(spParseError);
        throw cParseError;
      }

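We first need to set up our COM environment. Next, we create spDocInput (an IXMLDOMDocument2Ptr) as a COM object. We set its async property to false because we want the document to be completely loaded before we proceed. Finally, we load the document using the DOM Level 3 Load semantics and check the value that load returns, since a document that fails to parse is reported through that return value rather than through an exception.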

Again, the rest of the main routine is pretty straightforward. Here's the part that does the most work.

    //  Initialize CSVRowWriter object
    RowWriter = new CSVRowWriter(&OutputCSV);

    //  NodeList of Rows <- Call Document's
    //    getElementsByTagName for all elements named Row
    spRowList = spDocInput->getElementsByTagName("Row");

    //  DO until Rows NodeList.item[index] is null
    //    Call CSVRowWriter write method, passing
    //      NodeList.item[index]
    //    Increment index
    //  ENDDO
    while (spRowList->item[iRows] != NULL)
    {
      IXMLDOMElementPtr spRow = spRowList->item[iRows];
      RowWriter->write(spRow);
      iRows++;
    }

Note that spRowList is an IXMLDOMNodeListPtr object.

write in CSVRowWriter.cpp

As in the Java implementation, this is where most of the work is done. The Column Array is implemented as an array of character pointers. We call "new" to get memory for each cell when we have something to put into it. We later call "delete" in the formatRow method to clear out the cell and free the memory after we have formatted it to the output buffer. The GetnodeValue method that we use on the Column Node's Text Node returns a COM VARIANT, so we use the _bstr_t COM helper class to treat it as character data. Here's the most interesting part:

From CSVRowWriter.cpp: write

    //  Columns NodeList <- Get Row's childNodes attribute
    spColumnList = spRow->childNodes;

    //  DO until Columns NodeList.item[index] is null
    while (spColumnList->item[iRowChildren] != NULL)
    {
      //  Get a shorthand name for this guy
      IXMLDOMNodePtr spColumn;
      spColumn = spColumnList->item[iRowChildren];

      //  Column Name <- get NodeName attribute
      strcpy(cColumnName,spColumn->nodeName);

      //  Column Number <- Derive from Column Name
      strcpy(cColumnNumber,&(cColumnName[6]));
      iColumnNumber = atoi(cColumnNumber);

      //  IF Column Number > Highest Column
      //    Highest Column <- Column Number
      //  ENDIF
      if (iColumnNumber > iHighestColumn)
      {
        iHighestColumn = iColumnNumber;
      }

      //  Get memory for this column entry, and init it
      if (cColumnArray[iColumnNumber] == NULL)
      {
        cColumnArray[iColumnNumber] = new char[MAXCOLUMNSIZE];
        memset(cColumnArray[iColumnNumber],'\0',MAXCOLUMNSIZE);
      }

      //  Column Array [Column Number] <- get nodeValue of
      //    item[index] firstChild Node
      //  GetnodeValue is returned as a COM VARIANT, so we
      //  use _bstr_t to get to the character text
      strcpy(cColumnArray[iColumnNumber],
         _bstr_t(spColumn->firstChild->GetnodeValue()));

      //  Increment index
      iRowChildren++;
    }
    //  ENDDO

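Note that in the DO loop we don't check the Node type. Unlike in the Java implementation, we encounter only ColumnXX Element Nodes in the Row's list of child Nodes (no Text Nodes). This is consistent with MSXML's default setting of preserveWhiteSpace to false, under which whitespace-only Text Nodes between elements are not kept in the tree.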
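The excerpt above copies into fixed-size buffers and indexes the Column Array without checking the derived number, so it relies on well-formed ColumnXX names and on values shorter than MAXCOLUMNSIZE. As a rough, portable sketch of the same bookkeeping with those checks added, the class below stores values by column number and then formats the row. ColumnRow, store, and format are hypothetical names, not the author's CSVRowWriter; the sketch assumes columns are numbered from 1 and leaves out CSV quoting and all of the MSXML calls.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical stand-in for the Column Array; not the book's CSVRowWriter.
class ColumnRow {
public:
    // maxColumns is the highest column number the row will accept.
    explicit ColumnRow(std::size_t maxColumns)
        : cells_(maxColumns + 1), highest_(0) {}

    // Store a value under an element name such as "Column07".
    // Returns false when the name does not yield a usable column number.
    bool store(const std::string& name, const std::string& value) {
        const std::string prefix = "Column";
        if (name.size() <= prefix.size() ||
            name.compare(0, prefix.size(), prefix) != 0) {
            return false;
        }
        // Same derivation as the excerpt: the digits start at offset 6.
        const int number = std::atoi(name.c_str() + prefix.size());
        if (number < 1) {
            return false;
        }
        const std::size_t index = static_cast<std::size_t>(number);
        if (index >= cells_.size()) {
            return false;  // would run past the end of the array
        }
        cells_[index] = value;
        if (index > highest_) {
            highest_ = index;
        }
        return true;
    }

    // Join columns 1..highest with commas, then clear the row for reuse.
    std::string format() {
        std::string line;
        for (std::size_t i = 1; i <= highest_; ++i) {
            if (i > 1) {
                line += ',';
            }
            line += cells_[i];
            cells_[i].clear();
        }
        highest_ = 0;
        return line;
    }

private:
    std::vector<std::string> cells_;
    std::size_t highest_;
};
```

With a ColumnRow of ten columns, storing "Smith" under Column01 and "Boston" under Column03 and then calling format() gives Smith,,Boston, with the missing Column02 appearing as an empty field. Using std::string cells also removes the need for the matching new and delete calls.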

Error Handling

As might be expected, error handling is a bit different in C++ than it is in Java, largely because of basic differences between the languages. The Java class libraries use exceptions extensively, for everything from I/O problems to, in our programs, SAX exceptions. C++ provides a facility for declaring, throwing, and catching exceptions, but by default does not use it very much: the standard C++ class libraries usually set error flags instead of throwing exceptions. As a consequence, I generally will not make much use of exceptions in the C++ code.

The only exception to this is in the main routines, and that is due to COM. Since MSXML is a COM object, calls to it can throw COM exceptions. However unlikely this may be, good programming practice requires that the main part of our code be enclosed in a try block followed by a catch block that handles COM exceptions. Within that try block I also throw a limited set of exceptions of my own as a graceful way to exit it. Consistent with the KISS principle, these are thrown as simple char values. The corresponding catch block uses a switch statement with a case for each value that takes the appropriate action.

The catch blocks in the main routine call functions to display information on COM exceptions and text about any other errors. However, we display information on parse errors right after we load, rather than doing it in the catch block. We do this so that we can limit the scope of our IXMLDOM smart pointers to the try block. This basic approach will be used in most of our other main programs.

Catch Blocks in XMLToCSVBasic.cpp

    //  General purpose catch block
    catch (char cException)
    {
      switch (cException)
      {
      case cCreateInstanceError:
        cerr << "Failed to create Document instance" << endl;
        cerr << "HRESULT = 0x" << hex << hResult << endl;
        break;
      case cParseError:
        //  Parse error details were already displayed after the load
        break;
      case cOutputFileError:
        cerr << "Failure in opening output file " <<
          cOutputCSVName << endl;
        break;
      default:
        cerr << "Unexpected char exception " << cException
          << endl;
      }
      iExitStatus = 1;
    }
    //  Do some simple error handling for COM exceptions
    catch (_com_error &e)
    {
      handleCOMError(e);
      iExitStatus = 1;
    }

The routine below displays COM exception information to cerr.

handleCOMError.cpp

    void handleCOMError(_com_error &e)
    {
      cerr << "COM Error" << endl;
      //  Show the HRESULT in hex, then restore decimal output
      cerr << "Code = 0x" << hex << e.Error() << dec << endl;
      cerr << "Message = " << e.ErrorMessage() << endl;

      //  Source and Description are not always supplied
      _bstr_t bstrSource(e.Source());
      if (bstrSource.length() != 0)
      {
        cerr << "Source = " << (LPCSTR) bstrSource << endl;
      }

      _bstr_t bstrDescription(e.Description());
      if (bstrDescription.length() != 0)
      {
        cerr << "Description = " << (LPCSTR) bstrDescription <<
             endl;
      }
    }

With any luck, you'll never see this bit of code executed during routine use. However, you may run into some COM exceptions during development. (I did.) Make sure to catch and report whatever scraps of information COM gives you. I stole the main ideas from the MSXML code samples but made a few changes. Not all exceptions report Source or Description, so I test before trying to output them to cerr.
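One detail to watch when reporting the error code: a stream does not interpret printf-style specifications such as %08l, so fixed-width hexadecimal output has to come from stream manipulators. The sketch below shows one way to do that in portable C++. formatStatus is a hypothetical helper, and std::int32_t stands in for the Windows HRESULT type so that the example compiles without the COM headers.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Format a 32-bit status code as "0x" followed by eight hex digits.
// std::int32_t is a stand-in for HRESULT; this helper is an illustration only.
std::string formatStatus(std::int32_t code) {
    std::ostringstream text;
    text << "0x" << std::hex << std::uppercase
         << std::setw(8) << std::setfill('0')
         // Go through the unsigned type so failure codes, which are
         // negative, print as their 32-bit pattern.
         << static_cast<std::uint32_t>(code);
    return text.str();
}
```

For example, formatStatus(255) yields 0x000000FF. Building the text in a separate ostringstream also leaves the formatting state of cerr untouched.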

The following routine displays detailed information about parsing errors using properties of the IXMLDOMParseError object (that's what an IXMLDOMParseErrorPtr points to).

From handleCOMError.cpp

    void handleParseError(IXMLDOMDocument2Ptr spDocInput)
    {
      IXMLDOMParseErrorPtr spParseError;
      spParseError = spDocInput->GetparseError();

      cerr << "Parse Error" << endl << endl;
      cerr << "File URL = " << spParseError->url << endl;
      cerr << "Position in file = " << spParseError->filepos <<
          endl;
      cerr << "Line = " << spParseError->line << endl;
      cerr << "Position in line = " << spParseError->linepos <<
          endl;
      cerr << "Reason = " << spParseError->reason << endl;
    }

Other than these differences, the overall approach to error handling in the C++ implementation is the same as that of the Java implementation. When we hit an error we try to exit as quickly and gracefully as possible, displaying as much information as might be helpful.