One of the major uses of XML is the exchange of data between heterogeneous systems. Given almost any collection of data, it's straightforward to design some XML markup that fits it. Because XML is natively supported on essentially any platform of interest, you can send data encoded in such an XML application from point A to point B. You don't have to worry about whether point A and point B agree on how many bytes there are in a float, whether ints are big-endian or little-endian, whether strings are null-delimited or use an initial length byte, or any of the myriad other issues that arise when moving data between systems. As long as both ends of the connection agree on the XML application used, they can exchange information regardless of what software produced the data. One side can use Perl and the other Java. One can use Windows and the other Unix. One can run on a mainframe and the other on a Mac. The document can be passed over HTTP, e-mail, NFS, BEEP, Jabber, or sneakernet. Everything except the XML document itself can be ignored.

The details of the XML markup used depend heavily on the information being exchanged. If you're exchanging financial data, you might use the Open Financial Exchange (OFX) [http://www.ofx.net/ofx/]. If you're exchanging gene expression data, you might use the Gene Expression Markup Language (GEML) [http://www.rosettabio.com/products/conductor/geml/]. If you're exchanging news articles in a syndication service, you might use NewsML [http://www.xmlnews.org/NewsML/]. And if no standard XML application exists that fits your needs, you'll probably invent your own. But whatever XML application you choose, certain features will crop up again and again that can benefit from standardization. These include the envelope used to pass the data and the representations of basic data types, such as integer and date.
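To see why byte order matters for binary formats but not for XML text, here is a minimal Java sketch using `java.nio.ByteBuffer`. The class name `ByteOrderDemo`, its helper methods, and the `Quantity` element are illustrative assumptions, not part of any real protocol. A big-endian writer paired with a little-endian reader turns the int 42 into a very different number, while the same value sent as XML character data comes back unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class ByteOrderDemo {

    // Encode an int as 4 raw bytes in the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode 4 raw bytes as an int, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    // Send the value as XML text and read it back; byte order never enters into it.
    static int roundTripAsText(int value) {
        String xml = "<Quantity>" + value + "</Quantity>";
        byte[] wire = xml.getBytes(StandardCharsets.UTF_8);
        String received = new String(wire, StandardCharsets.UTF_8);
        // Crude tag stripping for brevity; real code would use an XML parser.
        return Integer.parseInt(received.replaceAll("</?Quantity>", ""));
    }

    public static void main(String[] args) {
        int value = 42;
        // A big-endian sender and a little-endian receiver disagree.
        byte[] wire = encode(value, ByteOrder.BIG_ENDIAN);
        System.out.println("binary: sent " + value + ", read " + decode(wire, ByteOrder.LITTLE_ENDIAN));
        System.out.println("text:   sent " + value + ", read " + roundTripAsText(value));
    }
}
```

With the byte orders mismatched, the binary read yields 704643072 (0x2A000000) instead of 42; the textual round trip returns 42.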
Envelopes

An envelope may not be needed if (a) only two systems are involved, (b) they talk only to each other, and (c) they always send the same type of message. In that case it's enough for one system to send the other the message in the agreed-upon XML format. However, when dozens, hundreds, or even thousands of different systems exchange many different kinds of messages in many different ways, it's useful to have some standards that are independent of the message content. Such standards offer some hope that when a message in an unrecognized format is received, it can still be processed in a reasonable fashion. For example, a system might receive a message ordering 1,000 "Frodo Lives" buttons but not know how to handle that order. However, it may be able to read enough information from the envelope to route the request to the program that does know how to process the order.

In XML-RPC, the envelope is essentially all the markup, and the data inside the envelope is all the text content. SOAP and RSS are a little more complex. In SOAP, the envelope is an XML document, and so is the data it carries. In some ways RSS, especially RSS 1.0, is the most complex of all because it's based on the relatively complex RDF syntax. RDF mixes the envelope and the data together so that you can't point to any one element in the document and say, "That's the envelope," or "That element is the data." Instead, pieces of both the envelope and the data are intermingled throughout the complete document. In all three cases, however, it's straightforward to extract the data from the envelope for further processing.

Data Representation

Another area ripe for standardization is the proper representation of low-level data such as dates and numbers. Nobody really cares how many bytes there are in an int, as long as there are enough to hold all of the values they want to hold. Nobody really cares whether dates are written Day-Month-Year or Month-Day-Year, as long as it's easy to tell which is which.
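The trouble with not being able to tell which is which is easy to reproduce. This minimal Java sketch reads the same slash-separated string with `java.time.format.DateTimeFormatter` under two different field orders, then reads the year-month-day form that the XML Schema `xsd:date` type uses. The class `DateOrderDemo` and its `parseWith` helper are hypothetical names chosen for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class DateOrderDemo {

    // Read a date string using an explicit field-order pattern.
    static LocalDate parseWith(String text, String pattern) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String ambiguous = "07/04/01";
        // The same characters yield different dates depending on the assumed order.
        System.out.println("Month-Day-Year: " + parseWith(ambiguous, "MM/dd/yy"));
        System.out.println("Day-Month-Year: " + parseWith(ambiguous, "dd/MM/yy"));

        // The YYYY-MM-DD form (the lexical form of xsd:date) has only one reading.
        System.out.println("YYYY-MM-DD:     " + LocalDate.parse("2001-07-04"));
    }
}
```

The two patterns produce 2001-07-04 and 2001-04-07. Note also that the two-letter `yy` pattern maps 01 to 2001, because `DateTimeFormatter` resolves two-digit years into the range 2000 through 2099; a 1901 reading is simply unavailable.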
It doesn't really matter how this information is passed, as long as there's one standard way of doing it that everyone can agree on and process without excessive hassle. In XML, all data of any type must be passed as text, but the proper textual representation of simple data types such as integer and date is trickier than most developers initially assume. For example, integers can be represented straightforwardly in the form 42, -76, +34562, 0, and so forth. The normal base-10 representation with an optional plus or minus sign is fully adequate for most needs. However, consider the number 28562476535, the dollar value of Bill Gates' Microsoft stock holdings alone as of July 24, 2002. This is a perfectly good integer, albeit a large one. But it exceeds 2,147,483,647, the largest value a signed 32-bit int can hold, so trying to use it in many applications will lead to a crash or some other form of error. Floating-point numbers are even worse. Two different computers can look at an unambiguous string such as 65431987467.324345192 and interpret it as two different numbers: neither a 32-bit float nor a 64-bit double can hold all 20 of its significant digits, so the result depends on which binary type each system rounds it into. Dates cause problems even for humans. Is 07/04/01 the Fourth of July, 2001? The Fourth of July, 1901? The seventh of April, 2001? Some other date? These are all very real issues that cause real problems in systems today.

XML itself doesn't standardize the text representation of data, but the W3C XML Schema Language does. In particular, schemas define the 44 simple data types shown in Table 2.1. By assigning these data types to particular elements, you can clearly state what a particular string means in a syntax everyone can understand. And if these data types aren't enough, the W3C XML Schema Language also lets you define new types that are combinations or restrictions of these basic types.

Table 2.1. Primitive Data Types Defined in the W3C XML Schema Language
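The numeric problems described above can be made concrete with a minimal Java sketch. The class `NumericTextDemo` and its `fitsInInt` helper are hypothetical. The sketch checks whether 28562476535 fits in a Java `int` (the same 32-bit range as `xsd:int`), then reads 65431987467.324345192 as a `float`, as a `double`, and as a `BigDecimal`, using `new BigDecimal(double)` to expose the exact binary value each type actually stored.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class NumericTextDemo {

    // True if the text parses as a 32-bit Java int (the xsd:int range).
    static boolean fitsInInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false; // out of range (or not an integer at all)
        }
    }

    public static void main(String[] args) {
        String big = "28562476535";
        System.out.println("fits in int?   " + fitsInInt(big));
        // xsd:integer is unbounded, so an arbitrary-precision type can always hold it.
        System.out.println("as BigInteger: " + new BigInteger(big));

        String fp = "65431987467.324345192";
        float f = Float.parseFloat(fp);    // 32-bit binary floating point
        double d = Double.parseDouble(fp); // 64-bit binary floating point
        // new BigDecimal(double) prints the exact binary value each type stored.
        System.out.println("as float:      " + new BigDecimal(f));
        System.out.println("as double:     " + new BigDecimal(d));
        // A decimal type keeps every digit of the original text.
        System.out.println("as BigDecimal: " + new BigDecimal(fp));
    }
}
```

`fitsInInt` returns false for 28562476535, and the float and double lines print different values, neither of which equals the original string. Mapping `xsd:integer` and `xsd:decimal` to arbitrary-precision types is one way to avoid both problems, at some cost in speed.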
Even without using schema validation or the full schema apparatus, you can use these data types in your own documents. Simply attach an xsi:type attribute to any element to identify the type of that element's content. The xsi prefix is mapped to the http://www.w3.org/2001/XMLSchema-instance namespace URI, and the value of xsi:type is a qualified name whose prefix (xsd in Example 2.1) is mapped to the http://www.w3.org/2001/XMLSchema namespace URI. Example 2.1 is an XML document that uses these data types to label different parts of an order document. Notice that some values that might naively be assumed to be numeric, such as the SKU and the ZIP code, are in fact strings; treated as an integer, the ZIP code 02882 would lose its leading zero.

Example 2.1 An XML Document That Labels Elements with Schema Simple Types

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<Order xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Customer id="c32" xsi:type="xsd:string">Chez Fred</Customer>
  <Product>
    <Name xsi:type="xsd:string">Birdsong Clock</Name>
    <SKU xsi:type="xsd:string">244</SKU>
    <Quantity xsi:type="xsd:positiveInteger">12</Quantity>
    <Price currency="USD" xsi:type="xsd:decimal">21.95</Price>
    <ShipTo>
      <Street xsi:type="xsd:string">135 Airline Highway</Street>
      <City xsi:type="xsd:string">Narragansett</City>
      <State xsi:type="xsd:NMTOKEN">RI</State>
      <Zip xsi:type="xsd:string">02882</Zip>
    </ShipTo>
  </Product>
  <Product>
    <Name xsi:type="xsd:string">Brass Ship's Bell</Name>
    <SKU xsi:type="xsd:string">258</SKU>
    <Quantity xsi:type="xsd:positiveInteger">1</Quantity>
    <Price currency="USD" xsi:type="xsd:decimal">144.95</Price>
    <Discount xsi:type="xsd:decimal">.10</Discount>
    <ShipTo>
      <GiftRecipient xsi:type="xsd:string">
        Samuel Johnson
      </GiftRecipient>
      <Street xsi:type="xsd:string">271 Old Homestead Way</Street>
      <City xsi:type="xsd:string">Woonsocket</City>
      <State xsi:type="xsd:NMTOKEN">RI</State>
      <Zip xsi:type="xsd:string">02895</Zip>
    </ShipTo>
    <GiftMessage xsi:type="xsd:string">
      Happy Father's Day to a great Dad!

      Love, Sam and Beatrice
    </GiftMessage>
  </Product>
  <Subtotal currency='USD' xsi:type="xsd:decimal">
    393.85
  </Subtotal>
  <Tax rate="7.0" currency='USD' xsi:type="xsd:decimal">28.20</Tax>
  <Shipping method="USPS" currency='USD' xsi:type="xsd:decimal">8.95</Shipping>
  <Total currency='USD' xsi:type="xsd:decimal">431.00</Total>
</Order>
```

Instead of labeling each element explicitly, a document can rely on a schema to indicate the types of its elements. However, right now the APIs for such things aren't finished, so it's best to label elements explicitly when the types are important.

XML-RPC uses only the int, boolean, double, dateTime, and base64 types, as well as a string type that's restricted to ASCII. Furthermore, it does not allow the NaN, INF, and -INF values for double. It does not use xsi:type attributes, relying instead on predefined semantics for particular elements. SOAP allows all 44 types and does use xsi:type attributes to label elements.
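Because SOAP messages, like Example 2.1, label elements with xsi:type, a receiving program can decide how to convert each element's text by looking at that attribute. The following is a minimal sketch using the JDK's DOM parser. The class `XsiTypeReader`, its `parse` and `typedValue` helpers, and the choice of Java types (`BigDecimal` for `xsd:decimal`, `BigInteger` for `xsd:integer` and `xsd:positiveInteger`, raw `String` for everything else) are assumptions for illustration, not a standard API. Only a handful of the 44 types are handled, and no range checks are performed.

```java
import java.io.ByteArrayInputStream;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XsiTypeReader {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    // Convert an element's text to a Java value according to its xsi:type label.
    // Sketch only: a few types, no range checks (e.g. positiveInteger > 0 is not enforced).
    static Object typedValue(Element e) {
        String raw = e.getTextContent();
        String type = e.getAttributeNS(XSI, "type"); // empty string if absent
        int colon = type.indexOf(':');
        String prefix = colon < 0 ? null : type.substring(0, colon);
        String local = type.substring(colon + 1);
        // xsi:type holds a QName, so resolve its prefix instead of trusting the string "xsd".
        if (!XSD.equals(e.lookupNamespaceURI(prefix))) {
            return raw;
        }
        switch (local) {
            case "decimal":
                return new BigDecimal(raw.trim()); // numeric types ignore surrounding whitespace
            case "integer":
            case "positiveInteger":
                return new BigInteger(raw.trim());
            default:
                return raw; // xsd:string preserves whitespace; other types left as text
        }
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getAttributeNS and prefix lookup
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Order xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
                + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>"
                + "<SKU xsi:type='xsd:string'>244</SKU>"
                + "<Quantity xsi:type='xsd:positiveInteger'>12</Quantity>"
                + "<Price currency='USD' xsi:type='xsd:decimal'>21.95</Price>"
                + "</Order>";
        NodeList children = parse(xml).getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                Element e = (Element) children.item(i);
                Object value = typedValue(e);
                System.out.println(e.getLocalName() + " -> "
                        + value.getClass().getSimpleName() + " " + value);
            }
        }
    }
}
```

Run against these three elements, the sketch should report SKU as a `String`, Quantity as a `BigInteger`, and Price as a `BigDecimal`. A real consumer would more likely map every built-in type, or rely on a data-binding or SOAP toolkit to do so.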