I said in Chapter 1 that we were going to take a fairly simple approach to error handling. I'm not going to change that direction, but we need to be a bit more specific about what we're going to do.
There is some debate in the XML world about just how strict validation should be. It centers on whether a somewhat lax model of validation should be supported for certain applications or types of problems, rather than a very strict one. For example, some people think that a string Element that exceeds the length set by its maxLength facet should just cause the APIs to issue a truncation warning instead of throwing an error and terminating parsing or validation. All the parsers I know of support only a valid or invalid state; there isn't any in-between. If parsers were to do something in-between, it would be nice if the Schema Recommendation defined those in-between levels. As currently written, it doesn't. This discussion is, of course, somewhat tangential to our main concerns, but it gives a flavor of the issues involved.
My guiding principle in error handling is to ensure that users get what they expect and only what they expect, or get nothing at all. Some may consider this a rather severe and unforgiving strategy, but the cost of manually correcting and reprocessing is probably less in the long term than inadvertently introducing bad data into an application. So, when in doubt I'll terminate the processing.
What this means for the conversions is that if the data I'm manipulating doesn't match what the algorithm expects, the utilities will terminate. For example, if I'm converting a date in MM/DD/YYYY format to the ISO date format of YYYY-MM-DD but the user specified a two-digit year (02 instead of 2002), the conversion attempt will terminate. A conversion will also terminate if a string would be truncated, unless we explicitly say we're going to allow truncation on that field. I'll be more specific about termination conditions for each of our supported data types, but this gives you the idea.
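To make the two termination conditions concrete, here is a minimal Java sketch. The names `ConversionException`, `StrictConversions`, `toIsoDate`, and `fitToLength` are hypothetical and exist only for this illustration; they are not the utilities developed later. The sketch also assumes that MM/DD/YYYY means exactly two digits, two digits, and four digits, and it checks only the shape of the date, not whether the month and day are valid calendar values.

```java
/** Hypothetical exception type; the name is an assumption for this sketch. */
final class ConversionException extends Exception {
    ConversionException(String message) {
        super(message);
    }
}

final class StrictConversions {
    private StrictConversions() {}

    /**
     * Converts MM/DD/YYYY to YYYY-MM-DD. Any other shape, including a
     * two-digit year, terminates the conversion with an exception.
     * Only the digit pattern is checked, not calendar validity.
     */
    static String toIsoDate(String usDate) throws ConversionException {
        if (usDate == null || !usDate.matches("\\d{2}/\\d{2}/\\d{4}")) {
            throw new ConversionException(
                "Date is not in MM/DD/YYYY format: " + usDate);
        }
        String month = usDate.substring(0, 2);
        String day = usDate.substring(3, 5);
        String year = usDate.substring(6, 10);
        return year + "-" + month + "-" + day;
    }

    /**
     * Returns the value unchanged if it fits. If it is too long, it is
     * truncated only when truncation was explicitly allowed for the field;
     * otherwise the conversion terminates.
     */
    static String fitToLength(String value, int maxLength, boolean allowTruncation)
            throws ConversionException {
        if (value.length() <= maxLength) {
            return value;
        }
        if (!allowTruncation) {
            throw new ConversionException(
                "Value exceeds " + maxLength + " characters: " + value);
        }
        return value.substring(0, maxLength);
    }
}
```

With this sketch, `toIsoDate("03/15/2002")` produces `2002-03-15`, while `toIsoDate("03/15/02")` throws rather than guessing a century. Truncation is opt-in per call, which mirrors the rule that it must be explicitly allowed for a field.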
In Java we'll follow the usual path and create and throw our own exceptions. In C++ we'll follow the conventions established in Chapter 1 and use return codes if we can force an orderly exit back to the main procedure. If this turns out to be awkward for a specific circumstance, we'll just call an error routine and exit. For both implementations we'll have standard error messages with text coded as constants in separate interfaces or header files.
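As a rough illustration of the Java side of this plan, the following sketch keeps the standard message text as constants in a separate interface and throws an application-defined exception that is caught once, near the top of the program. Every name here (`ErrorMessages`, `UtilityException`, `ConverterDriver`, and the message text itself) is a hypothetical placeholder, not the code we'll actually build.

```java
/** Hypothetical message constants, kept together in one interface. */
interface ErrorMessages {
    String BAD_YEAR = "Year must be four digits";
    String STRING_TRUNCATED = "String would be truncated on conversion";
}

/** Hypothetical exception pairing a standard message with the offending value. */
final class UtilityException extends Exception {
    UtilityException(String standardMessage, String offendingValue) {
        super(standardMessage + ": [" + offendingValue + "]");
    }
}

final class ConverterDriver {
    private ConverterDriver() {}

    /** Low-level check: throws instead of trying to repair the data. */
    static void checkYear(String year) throws UtilityException {
        if (!year.matches("\\d{4}")) {
            throw new UtilityException(ErrorMessages.BAD_YEAR, year);
        }
    }

    /**
     * Top-level routine: reports the error and returns a nonzero status
     * so the caller can terminate processing in an orderly way.
     */
    static int run(String year) {
        try {
            checkYear(year);
            return 0;
        } catch (UtilityException e) {
            System.err.println(e.getMessage());
            return 1;
        }
    }
}
```

Keeping the message text in one place means a wording change touches a single file, and the low-level routines never need to know how the error is finally reported.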
A final word. To keep the code simple, in many cases we'll assume that the XML data is what we expect it to be. We're not going to go to much trouble to validate it and check for errors. That's what schema validation is for. For example, when converting from an Element with a schema language date data type, we're going to assume that we're working with data that fits the pattern of YYYY-MM-DD. If we get data that is too short, the code may throw a runtime exception. However, when handling our non-XML legacy formats we'll put in enough logic to gracefully exit when we attempt to process bad data rather than throwing nasty exceptions.
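The trade-off on the XML side can be seen in a short Java sketch. The class and method names (`IsoDateReader`, `toUsDate`) are hypothetical. The code deliberately performs no checks of its own and assumes the value has already passed schema validation and fits the YYYY-MM-DD pattern.

```java
final class IsoDateReader {
    private IsoDateReader() {}

    /**
     * Converts YYYY-MM-DD to MM/DD/YYYY by fixed character positions.
     * There is no validation here: input that is too short makes
     * substring() throw an unchecked IndexOutOfBoundsException.
     */
    static String toUsDate(String isoDate) {
        String year = isoDate.substring(0, 4);
        String month = isoDate.substring(5, 7);
        String day = isoDate.substring(8, 10);
        return month + "/" + day + "/" + year;
    }
}
```

Given `2002-03-15` this returns `03/15/2002`. Given a value such as `2002-03`, the last `substring` call fails with a runtime exception, which is the behavior we are accepting for XML input in exchange for simpler code.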