SOAP | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

In large part, XML-RPC was invented by a single person who really didn't know a lot about XML. Consequently he made many very questionable choices; and because XML-RPC did not go through any standardization process, there was nobody to fix his mistakes. For example, in XML-RPC the string type is defined as an "ASCII string". Now frankly, this is just plain dumb, as well as not a little ethnocentric. XML documents are Unicode, not ASCII. Modern programming languages like Java can handle Unicode without any trouble. Indeed a language that can't process Unicode really isn't suitable for processing XML. There is no good reason to limit XML-RPC strings to ASCII. I certainly wouldn't say you have to use non-ASCII characters in your XML-RPC documents; but if you want to use them, they should certainly be allowed. However, the inventor of XML-RPC also happened to be the vendor of an ASCII-limited database, so he inserted the ASCII-only constraint into XML-RPC rather than upgrade his database to support Unicode.

There are a lot of other issues like that with XML-RPC, some equally obvious, some more subtle. Nonetheless, clearly XML-RPC was a good idea in principle if not in execution. Consequently work began on a more serious effort to enable remote procedure calls by passing XML documents over HTTP. This effort is known as the Simple Object Access Protocol, or just SOAP. Whereas XML-RPC was a quick hack by one developer, SOAP was developed by a committee of XML experts from various companies including IBM and Microsoft.

You've undoubtedly heard the old saw about a camel being a horse designed by committee. The fact is, a camel is actually superbly adapted to its environment. SOAP is a much more robust protocol than XML-RPC. It is much better designed from an XML standpoint as well. It takes advantage of numerous features of XML, such as attributes, Unicode, and namespaces, that XML-RPC either ignores or actively opposes. XML-RPC is adequate for simple tasks, but if you get serious with it you rapidly hit a wall. SOAP can take you a lot further. Although there are some basic services available using XML-RPC, the future clearly lies with SOAP.

The biggest conceptual difference between SOAP and XML-RPC is that XML-RPC exchanges a limited number of parameters of six fixed types, plus structs and arrays. But SOAP allows you to send the server arbitrary XML elements, a much more flexible approach.

A SOAP Example

Let's investigate how the stock quote example would likely be implemented in SOAP. Encoded as a SOAP document, the request document looks quite different, but the same information is present, as demonstrated in Example 2.15.

Example 2.15 A SOAP Document That Requests the Current Stock Price of Red Hat

```xml
<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Body>
    <getQuote
         xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <symbol>RHAT</symbol>
    </getQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

The most obvious difference between this document and the XML-RPC equivalent in Example 2.6 is the use of namespaces. Namespaces allow the method request to be an arbitrary XML element. This goes way beyond merely passing a method name and some argument values. SOAP permits much more complex XML messages than does XML-RPC.

The server's response is equally flexible, as Example 2.16 demonstrates.

Example 2.16 A SOAP Response

```xml
<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Body>
    <Quote xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <Price>4.12</Price>
    </Quote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

These two examples are minimal SOAP documents. The root element of every SOAP document is Envelope, which must be in the http://schemas.xmlsoap.org/soap/envelope/ namespace, at least in SOAP 1.1. (SOAP 1.2 uses a different namespace URI.) Normally a prefix is used, and as always you can pick any prefix as long as the URI stays the same. In this chapter, I always assume that the prefix SOAP-ENV is mapped to that namespace URI. (This is the prefix that the SOAP 1.1 specification uses.)

Each SOAP-ENV:Envelope element contains exactly one SOAP-ENV:Body element. The content of this element is one or more XML elements specific to the service. These examples use Quote, getQuote, and Price elements in the http://namespaces.cafeconleche.org/xmljava/ch2/ namespace. Other services will use other elements from other namespaces. It's also permissible to use elements from no namespace at all, although using namespaces is highly recommended.
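To make these namespace rules concrete, here is a minimal sketch using the JAXP DOM API (SAX or JDOM would work just as well). It assumes the SOAP 1.1 message is held in a String, and the class and method names are made up for illustration. It checks namespace URIs and local names, never prefixes, so a message that binds soap instead of SOAP-ENV is accepted too.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapEnvelopeReader {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Parses a SOAP 1.1 message and returns the element children of its Body. */
    static List<Element> bodyEntries(String message) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // only the namespace URI counts, not the prefix
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(message.getBytes(StandardCharsets.UTF_8)));

        Element envelope = doc.getDocumentElement();
        if (!SOAP_ENV.equals(envelope.getNamespaceURI())
                || !"Envelope".equals(envelope.getLocalName())) {
            throw new IllegalArgumentException("Root is not a SOAP 1.1 Envelope");
        }

        // Locate the Body child of the Envelope
        Element body = null;
        for (Node n = envelope.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && SOAP_ENV.equals(n.getNamespaceURI())
                    && "Body".equals(n.getLocalName())) {
                body = (Element) n;
                break;
            }
        }
        if (body == null) throw new IllegalArgumentException("Envelope has no Body");

        // Collect the service-specific elements, skipping whitespace text nodes
        List<Element> entries = new ArrayList<>();
        for (Node n = body.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) entries.add((Element) n);
        }
        return entries;
    }
}
```

Given Example 2.15, the returned list holds a single getQuote element whose namespace URI is http://namespaces.cafeconleche.org/xmljava/ch2/.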

Posting SOAP Documents

Currently, most SOAP messages are passed over HTTP using POST, just like XML-RPC messages. (Other transport protocols such as SMTP, BEEP, and Jabber can be supported as well.) However, the HTTP headers used for SOAP differ from those used for XML-RPC in a couple of crucial ways:

The HTTP request header must contain a SOAPAction field.
If the SOAP request fails, the server should return an HTTP 500 Internal Server Error rather than 200 OK.

The SOAPAction field alerts web servers and firewalls that they're dealing with a SOAP message. This enables firewalls to filter SOAP requests more easily without looking at the request body. The value of the SOAPAction field is a double-quoted URI that somehow indicates the intent of the message. For instance, if Example 2.15 were POSTed to a servlet running on www.ibiblio.org under the control of the user elharo, then you might use the SOAPAction http://www.ibiblio.org/#elharo to indicate to the server and firewall which user was responsible for processing this request. This is shown in Example 2.17.

Example 2.17 A SOAP Request for the Current Stock Price of Red Hat

```
POST /xml/cgi-bin/SOAPHandler HTTP/1.1
Host: www.ibiblio.org
Content-Type: text/xml; charset="utf-8"
Content-Length: 267
SOAPAction: "http://www.ibiblio.org/#elharo"

<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Body>
    <getQuote
         xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <symbol>RHAT</symbol>
    </getQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

Conceptually, SOAPAction URIs are very similar to namespace URIs because they aren't meant to be resolved. They simply provide a convenient way of assigning unique identifiers to certain classes of SOAP messages. There's no particular standard for choosing them. You might use the full absolute URL that receives the SOAP request, or you might use some previously agreed-upon URI. You can even use nothing at all. But the SOAPAction header must be present in order for the request to be identified as a SOAP request.
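Here is a minimal sketch of sending such a request from Java with java.net.HttpURLConnection. The SoapPoster class and its two methods are hypothetical, and the endpoint and SOAPAction URI in the usage line are simply the values from Example 2.17. Splitting preparation from sending lets you inspect or adjust the headers before any bytes go over the wire; setFixedLengthStreamingMode() lets the connection supply Content-Length from the actual body.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class SoapPoster {
    /** Opens, but does not yet connect, a POST connection carrying the SOAP 1.1 headers. */
    static HttpURLConnection prepare(String endpoint, String soapActionUri) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=\"utf-8\"");
        // The SOAPAction value is a URI enclosed in double quotes
        conn.setRequestProperty("SOAPAction", "\"" + soapActionUri + "\"");
        return conn;
    }

    /** Writes the envelope and returns the HTTP status code (200 on success, 500 for a fault). */
    static int send(HttpURLConnection conn, String envelope) throws IOException {
        byte[] body = envelope.getBytes(StandardCharsets.UTF_8);
        conn.setFixedLengthStreamingMode(body.length); // Content-Length comes from the body size
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

Usage would look like `int status = SoapPoster.send(SoapPoster.prepare("http://www.ibiblio.org/xml/cgi-bin/SOAPHandler", "http://www.ibiblio.org/#elharo"), envelope);`.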

The server will normally send the response back to the client over the same socket the client used to send the request and then close the connection. Like any other HTTP response, a SOAP response begins with an HTTP return code, message, and header. Assuming the request was successful, the response code is 200 OK. Unlike the request, the response does not use any special header fields beyond those used by regular web browsers and servers. Example 2.18 demonstrates.

Example 2.18 A SOAP Document That Returns the Current Stock Price of Red Hat

```
HTTP/1.0 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: 260

<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Body>
    <Quote xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <Price>4.12</Price>
    </Quote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

Faults

It's a fact of life that requests fail. They may fail for reasons beyond the control of the SOAP provider. For example, you may launch your SOAP request into the ether just before the phone company severs the wire connecting you to the Internet while hooking up your neighbor's new DSL line. That sort of failure would make itself known at a lower layer, below XML and SOAP, probably as a SocketException if you were working in Java.

It's also possible for your request to arrive successfully at the server, only to find that the server doesn't recognize the URL you're posting to. In fact, the server might not even be configured to support SOAP requests. This sort of error would not cause a network-level exception such as SocketException. Instead, the server would return a 404 Not Found page in place of the expected SOAP response. Your code should be prepared to handle such events.
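One way to tell these cases apart, sketched under the assumption that the request is sent with java.net.HttpURLConnection: classify the reply by status code and Content-Type before trying to parse it. The three-way rule below (200 with text/xml is a response, 500 with text/xml is probably a fault, anything else is not SOAP) is a simplification chosen for this example, and the class name is hypothetical. Note that HttpURLConnection.getInputStream() throws an IOException for 4xx and 5xx replies, so the body of a fault has to be read from getErrorStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.util.Locale;

class SoapResponseCheck {
    enum Kind { RESPONSE, FAULT, NOT_SOAP }

    /** Classifies a reply using only its status code and Content-Type header. */
    static Kind classify(int status, String contentType) {
        boolean xml = contentType != null
            && contentType.toLowerCase(Locale.ROOT).startsWith("text/xml");
        if (status == HttpURLConnection.HTTP_OK && xml) return Kind.RESPONSE;
        // A SOAP fault arrives as 500 with an XML body; parse it to confirm it holds a Fault
        if (status == HttpURLConnection.HTTP_INTERNAL_ERROR && xml) return Kind.FAULT;
        return Kind.NOT_SOAP; // e.g. a 404 Not Found HTML page
    }

    /** Returns the reply body; for 4xx/5xx this is the error stream, which may be null. */
    static InputStream body(HttpURLConnection conn) throws IOException {
        return conn.getResponseCode() >= 400 ? conn.getErrorStream() : conn.getInputStream();
    }
}
```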

Finally, it's also possible for the SOAP responder itself to be reached and correctly invoked, but then be unable to process the request. This may occur because the request contained bad data (for example, a symbol for a stock that doesn't exist) or simply because the server code is buggy and encountered a problem. In these cases the SOAP server itself is responsible for producing the correct error message. This error message is a SOAP response with a SOAP-ENV:Envelope and a SOAP-ENV:Body , just like a normal response. However, the SOAP-ENV:Body must contain exactly one SOAP-ENV:Fault element and must not contain anything else.

The SOAP-ENV:Fault element contains up to four child elements:

faultcode

A faultcode element contains a qualified name such as SOAP-ENV:VersionMismatch to identify the fault.

faultstring

A faultstring element contains a plain text message to describe the fault for human readers.

faultactor

A faultactor element contains a URI to identify the node that generated the fault. It's used when a SOAP request is passed through a chain of handlers. This element is optional.

detail

A detail element is used when the fault is specifically related to the body of the request (for example, an unrecognized stock symbol) as opposed to the envelope. It contains child elements to describe the fault. This element is present only if the fault was related to the SOAP body as opposed to the SOAP header.

Caution

These four child elements of SOAP-ENV:Fault are not namespace qualified, which is a little surprising. They are not in the http://schemas.xmlsoap.org/soap/envelope/ namespace. They are not in some other namespace. They are in no namespace at all.

SOAP defines four specific fault codes in the http://schemas.xmlsoap.org/soap/envelope/ namespace to indicate common conditions in a generic way. These are

SOAP-ENV:VersionMismatch

The namespace of the SOAP-ENV:Envelope element indicates that this message is intended for a server implementing a different version of the SOAP protocol; for example, a SOAP 1.2 message has been sent to a SOAP 1.1 server.

SOAP-ENV:MustUnderstand

There's something in the header that the message says the server must understand before acting, but the server does not recognize it. (I'll talk about this soon in the section SOAP Headers.)

SOAP-ENV:Client

The client sent a message that is somehow defective. Perhaps it omitted a key piece of information the server needs. For example, the getQuote message was sent and understood, but the getQuote element did not have a symbol child. The client is to blame for the problem.

SOAP-ENV:Server

The client sent a correctly formed message with all the necessary information, but some error prevented the server from processing it. For example, the server may have needed to connect to a remote database to retrieve some information, but the database server had crashed. The server is to blame for the problem.

Example 2.19 is a fault that might be returned in response to a request for the nonexistent stock ABCD. The faultcode element is set to SOAP-ENV:Client to indicate that the client's request was incorrect. The faultstring element contains a brief string of plain, unmarked-up text that describes the problem more fully to a human reader. The detail content includes elements in the same namespace as the successful response, http://namespaces.cafeconleche.org/xmljava/ch2/. Because this request was processed by a single node, no faultactor element is necessary.

Example 2.19 A SOAP Fault Response

```
HTTP/1.0 500 Internal Server Error
Content-Type: text/xml; charset="utf-8"
Content-Length: 498

<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:stock="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Client</faultcode>
      <faultstring>
        There is no stock with the symbol ABCD.
      </faultstring>
      <detail>
        <stock:InvalidSymbol>ABCD</stock:InvalidSymbol>
      </detail>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```
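Reading a fault back is mostly a matter of respecting the caution about namespaces: the Fault element is found by its SOAP namespace, but its children must be matched with a null namespace URI. This sketch (class and method names hypothetical) also resolves the faultcode value, which is a qualified name, against the in-scope namespace declarations using DOM Level 3's lookupNamespaceURI(), so a server that uses some prefix other than SOAP-ENV still produces the same result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FaultReader {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /**
     * Returns {faultcode, faultstring}, with the faultcode written as "{namespaceURI}localName",
     * or null if the message contains no SOAP-ENV:Fault.
     */
    static String[] readFault(String message) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(message)));
        NodeList faults = doc.getElementsByTagNameNS(SOAP_ENV, "Fault");
        if (faults.getLength() == 0) return null;
        Element fault = (Element) faults.item(0);

        Element code = unqualifiedChild(fault, "faultcode");
        Element text = unqualifiedChild(fault, "faultstring");
        return new String[] {
            code == null ? null : resolve(code, code.getTextContent().trim()),
            text == null ? null : text.getTextContent().trim()
        };
    }

    // The Fault's children are in no namespace, so match a null namespace URI plus local name
    private static Element unqualifiedChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNamespaceURI() == null
                    && name.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Maps the QName's prefix to whatever URI is bound to it where the element appears
    private static String resolve(Element context, String qname) {
        int colon = qname.indexOf(':');
        String prefix = colon < 0 ? null : qname.substring(0, colon);
        return "{" + context.lookupNamespaceURI(prefix) + "}" + qname.substring(colon + 1);
    }
}
```

For Example 2.19, readFault() returns the code {http://schemas.xmlsoap.org/soap/envelope/}Client and the string "There is no stock with the symbol ABCD.", so a client can test for SOAP-ENV:Client without caring which prefix the server chose.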

Encoding Styles

The information encoded in the example SOAP documents to this point has been nothing more than Unicode text strings. When you want to encode other types, such as integers, arrays, and objects, you need to specify how the characters that make up the XML document should be deserialized into the local platform's understanding of those types. For example, if a Java program encounters the element <Price>4.12</Price>, should it convert it into a double? a float? a java.lang.String? a java.math.BigDecimal? a custom Price class? something else?
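To see why the choice matters, compare two of those options on the same string. This small sketch (the class name is hypothetical) decodes 4.12 as a java.math.BigDecimal, which keeps exactly the decimal digits that were sent, and as a double, which holds the nearest binary fraction and so is not exactly 4.12. For prices, that difference is often what decides the mapping.

```java
import java.math.BigDecimal;

class PriceDecoding {
    static double asDouble(String text) { return Double.parseDouble(text.trim()); }

    static BigDecimal asDecimal(String text) { return new BigDecimal(text.trim()); }

    public static void main(String[] args) {
        String price = "4.12"; // the content of <Price>4.12</Price>
        // Prints 4.12: BigDecimal stores the decimal digits exactly
        System.out.println(asDecimal(price));
        // Prints the exact value of the nearest double, a long expansion close to 4.12
        System.out.println(new BigDecimal(asDouble(price)));
    }
}
```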

Any element in a SOAP document can have a SOAP-ENV:encodingStyle attribute whose value is a URI pointing to some kind of schema that specifies what types are assigned to which elements. The most common language to use for this schema is the W3C XML Schema Language. However, other schema languages such as RELAX NG are also allowed.

Example 2.20 uses the SOAP-ENV:encodingStyle attribute on the getQuote element to point to a schema at the relative URL trading.xsd. This schema, shown in Example 2.21, defines the symbol element as having the custom type StockSymbol. It is used only for assigning types. Although it is not used for validation, with a little extra work it could be.

Example 2.20 A SOAP Document That Specifies the Encoding Style

```xml
<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <getQuote
         xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/"
         SOAP-ENV:encodingStyle="trading.xsd">
      <symbol>RHAT</symbol>
    </getQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

Example 2.21 A Schema That Assigns Type to Elements in the http://namespaces.cafeconleche.org/xmljava/ch2/ Namespace

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://namespaces.cafeconleche.org/xmljava/ch2/"
  xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/"
  elementFormDefault="qualified">

  <xsd:element name="getQuote">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="symbol" type="StockSymbol"
                     maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:simpleType name="StockSymbol">
    <xsd:restriction base="xsd:string">
      <!-- two to six upper case letters -->
      <xsd:pattern value="[A-Z][A-Z][A-Z]?[A-Z]?[A-Z]?[A-Z]?"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```
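The "little extra work" needed to validate against this schema is small with the javax.xml.validation API (Java 5 and later). The following sketch compiles a condensed copy of Example 2.21 and checks a getQuote element serialized on its own; pulling that element out of the SOAP envelope first is left out, and the class name is hypothetical. A symbol such as rhat or R fails the StockSymbol pattern and is reported as invalid.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class QuoteValidator {
    // Example 2.21, condensed into a single string
    static final String SCHEMA =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://namespaces.cafeconleche.org/xmljava/ch2/'"
      + " xmlns='http://namespaces.cafeconleche.org/xmljava/ch2/'"
      + " elementFormDefault='qualified'>"
      + "<xsd:element name='getQuote'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='symbol' type='StockSymbol' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "<xsd:simpleType name='StockSymbol'><xsd:restriction base='xsd:string'>"
      + "<xsd:pattern value='[A-Z][A-Z][A-Z]?[A-Z]?[A-Z]?[A-Z]?'/>"
      + "</xsd:restriction></xsd:simpleType></xsd:schema>";

    private static final Schema COMPILED = compile();

    private static Schema compile() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException ex) {
            throw new IllegalStateException("Schema does not compile", ex);
        }
    }

    /** Returns true if the standalone getQuote document is valid against the schema. */
    static boolean isValid(String getQuoteXml) throws IOException {
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXParseExceptions
            COMPILED.newValidator().validate(new StreamSource(new StringReader(getQuoteXml)));
            return true;
        } catch (SAXException ex) {
            return false;
        }
    }
}
```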

You can place the SOAP-ENV:encodingStyle attribute on any element in the document. It applies to that element and its descendants, and it overrides any encoding style declared on an ancestor. It is common to place it on the root SOAP-ENV:Envelope element.
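Because the nearest declaration wins, finding the encoding style in effect for a given element is a walk up the ancestor chain. A minimal DOM sketch, assuming the document was parsed namespace-aware (the class name is illustrative):

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class EncodingStyles {
    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Returns the nearest SOAP-ENV:encodingStyle on the element or an ancestor, or null. */
    static String effective(Element element) {
        // Stop when the walk reaches the Document node above the root element
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttributeNS(SOAP_ENV, "encodingStyle")) {
                return e.getAttributeNS(SOAP_ENV, "encodingStyle");
            }
        }
        return null; // no encoding style declared anywhere above this element
    }
}
```

For Example 2.20, effective() returns trading.xsd for the symbol element (inherited from getQuote) and null for the Envelope.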

SOAP singles out one encoding style for special treatment. If the SOAP-ENV:encodingStyle attribute has the value http://schemas.xmlsoap.org/soap/encoding/, then a predefined set of types is available that includes one element for each simple type defined in the W3C XML Schema Language and listed in Table 2.1. For example, assuming that the SOAP-ENC prefix is bound to the http://schemas.xmlsoap.org/soap/encoding/ URI (not the same as the namespace URI or the prefix for the SOAP envelope), then an int can be placed in a SOAP-ENC:int element in the following manner:

 <SOAP-ENC:int>12</SOAP-ENC:int>

Table 2.2 gives the complete list of types and their normal Java semantics, although this really just mirrors Table 2.1. In many cases, Java does not have a type that exactly matches one of the derived types; thus a broader Java type is used instead. For example, Java does not have an unsigned integer type, but all values of type xsd:unsignedInt can fit into a Java long. Java does not have a PositiveInteger class, but all xsd:positiveIntegers can be represented by a java.math.BigInteger. In some cases the mapping is obvious: an xsd:int is exactly a Java int, and an xsd:double is as close to a Java double as it's possible for a base-10 string to be. In others, different programs may use different Java types and objects to deserialize the same values. For example, an xsd:anyURI could reasonably be converted to a java.net.URL, a java.lang.String, or some custom URI class.

Table 2.2. Simple Value Elements Defined in SOAP

| SOAP Type | Java Type |
|---|---|
| `SOAP-ENC:string` | `java.lang.String` |
| `SOAP-ENC:boolean` | `boolean` |
| `SOAP-ENC:decimal` | `java.math.BigDecimal` |
| `SOAP-ENC:float` | `float` |
| `SOAP-ENC:double` | `double` |
| `SOAP-ENC:integer` | `java.math.BigInteger` |
| `SOAP-ENC:positiveInteger` | `java.math.BigInteger` |
| `SOAP-ENC:nonPositiveInteger` | `java.math.BigInteger` |
| `SOAP-ENC:negativeInteger` | `java.math.BigInteger` |
| `SOAP-ENC:nonNegativeInteger` | `java.math.BigInteger` |
| `SOAP-ENC:long` | `long` |
| `SOAP-ENC:int` | `int` |
| `SOAP-ENC:short` | `short` |
| `SOAP-ENC:byte` | `byte` |
| `SOAP-ENC:unsignedLong` | `double`, or `java.math.BigInteger` |
| `SOAP-ENC:unsignedInt` | `long` |
| `SOAP-ENC:unsignedShort` | `int` |
| `SOAP-ENC:unsignedByte` | `int` |
| `SOAP-ENC:duration` | custom class |
| `SOAP-ENC:dateTime` | `java.util.Date` |
| `SOAP-ENC:time` | `java.sql.Time` |
| `SOAP-ENC:date` | `java.sql.Date` |
| `SOAP-ENC:gYearMonth` | custom class |
| `SOAP-ENC:gYear` | custom class, `int`, or `java.math.BigInteger` |
| `SOAP-ENC:gMonthDay` | custom class |
| `SOAP-ENC:gDay` | custom class, or `int` |
| `SOAP-ENC:gMonth` | custom class, or `int` |
| `SOAP-ENC:hexBinary` | `byte[]` |
| `SOAP-ENC:base64Binary` | `byte[]` |
| `SOAP-ENC:anyURI` | `java.net.URL`, `java.lang.String`, or a custom class |
| `SOAP-ENC:QName` | `java.lang.String`, or a custom class |
| `SOAP-ENC:NOTATION` | `org.w3c.dom.Notation` |
| `SOAP-ENC:normalizedString` | `java.lang.String` |
| `SOAP-ENC:token` | `java.lang.String` |
| `SOAP-ENC:language` | `java.lang.String`, or a custom class |
| `SOAP-ENC:NMTOKEN` | `java.lang.String`, or a custom class |
| `SOAP-ENC:NMTOKENS` | `java.lang.String`, or a custom class |
| `SOAP-ENC:Name` | `java.lang.String` |
| `SOAP-ENC:NCName` | `java.lang.String` |
| `SOAP-ENC:ID` | `java.lang.String` |
| `SOAP-ENC:IDREF` | `java.lang.String` |
| `SOAP-ENC:IDREFS` | an array or list of `java.lang.String`s, or a custom class |
| `SOAP-ENC:ENTITY` | `org.w3c.dom.Entity` |
| `SOAP-ENC:ENTITIES` | an `org.w3c.dom.NodeList` containing `org.w3c.dom.Entity` objects |

These mappings are not written in stone. Some of the XML-like types such as SOAP-ENC:ENTITY and SOAP-ENC:IDREFS are particularly uncertain and may be implemented in different ways in different environments. However, this should give you a fairly good idea of the sorts of possible mappings between SOAP types and Java types. In addition to this list of simple types, the http://schemas.xmlsoap.org/soap/encoding/ encoding defines concepts of structs, references, byte arrays, and arrays.
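As a sketch of how a deserializer might apply a few of these mappings, the hypothetical SimpleTypeDecoder below switches on the local name of a SOAP-ENC element. It picks one Java type per SOAP type from the table (BigInteger rather than double for unsignedLong, for instance), handles only a subset of the types, and does not check range facets such as "positive". It does handle two lexical details that the obvious Java parsing methods get wrong: xsd:boolean also allows 1 and 0, and xsd:double spells infinity INF.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Base64;

class SimpleTypeDecoder {
    /** Decodes the text of a SOAP-ENC element, identified by its local name, into a Java value. */
    static Object decode(String localName, String lexical) {
        if (localName.equals("string")) return lexical; // strings keep their whitespace
        String s = lexical.trim();                       // other types ignore surrounding whitespace
        switch (localName) {
            case "boolean":       return parseBoolean(s);
            case "int":           return Integer.valueOf(s);
            case "long":          return Long.valueOf(s);
            case "short":         return Short.valueOf(s);
            case "byte":          return Byte.valueOf(s);
            case "unsignedShort": return Integer.valueOf(s); // 0..65535 fits in an int
            case "unsignedInt":   return Long.valueOf(s);    // 0..4294967295 fits in a long
            case "integer":
            case "positiveInteger":
            case "nonNegativeInteger":
            case "unsignedLong":  return new BigInteger(s);  // may exceed Long.MAX_VALUE
            case "decimal":       return new BigDecimal(s);
            case "double":        return parseDouble(s);
            case "base64Binary":  return Base64.getMimeDecoder().decode(s); // skips line breaks
            default:
                throw new IllegalArgumentException("No mapping chosen for SOAP-ENC:" + localName);
        }
    }

    // xsd:boolean allows true, false, 1, and 0; Boolean.parseBoolean() only recognizes "true"
    private static Boolean parseBoolean(String s) {
        if (s.equals("true") || s.equals("1")) return Boolean.TRUE;
        if (s.equals("false") || s.equals("0")) return Boolean.FALSE;
        throw new IllegalArgumentException("Not an xsd:boolean: " + s);
    }

    // xsd:double writes infinity as INF and -INF, which Double.valueOf() rejects
    private static Double parseDouble(String s) {
        if (s.equals("INF")) return Double.POSITIVE_INFINITY;
        if (s.equals("-INF")) return Double.NEGATIVE_INFINITY;
        return Double.valueOf(s); // also accepts "NaN"
    }
}
```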

Structs

A struct is simply an element that contains child elements but has no mixed content. For example, following is a Quote struct that contains Symbol and Price members:

```xml
<Quote xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <Symbol>RHAT</Symbol>
  <Price>4.12</Price>
</Quote>
```

In Java terms, by using the http://schemas.xmlsoap.org/soap/encoding/ encoding style, you're indicating that you want this element to be deserialized into an object of type Quote , which has two properties named Symbol and Price. In other words, the class definition looks something like this:

```java
public class Quote {
  private String symbol;
  private double price;

  public String getSymbol() { return symbol; }
  public double getPrice()  { return price; }
}
```

You may or may not have such a class in your system. If the SOAP request began its life as a Quote object that was subsequently converted to XML, transmitted across the Internet, and then turned back into a Java object, then perhaps you do have such a class. But what if the object began its life as a C struct or a C++ object, or was never anything except an XML document? In these cases, there may not be a convenient Quote class into which you can deserialize this compound object. Another possibility is to decode the name-value pairs into some form of Hashtable or HashMap. The names of the fields would be the keys, and the values of the fields would be the values.

What this encoding really tells you is roughly how the author intended this document to be handled. However, if you have some other way of making sense of this data, you are free to use it. You are not limited to any one deserialization form.
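The HashMap alternative takes only a few lines with DOM. This sketch (class name hypothetical) uses a LinkedHashMap, a HashMap subclass that remembers insertion order, and assumes the document was parsed with a namespace-aware DocumentBuilderFactory, since getLocalName() returns null otherwise. Every value stays a String; converting Price to a number is a separate decision, as the discussion of encoding styles showed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class StructDecoder {
    /** Maps each child element's local name to its trimmed text, in document order. */
    static Map<String, String> toMap(Element struct) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Node n = struct.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                // A repeated member name would overwrite the earlier value
                fields.put(n.getLocalName(), n.getTextContent().trim());
            }
        }
        return fields;
    }
}
```

Applied to the Quote struct above, the map contains Symbol=RHAT and Price=4.12.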

References

A reference type uses an href attribute to point to a value stored elsewhere in the SOAP request. This mirrors the situation in which two objects both contain references to the same object. For example, consider this trade request:

```xml
<Bid xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <Symbol>RHAT</Symbol>
  <Price>4.12</Price>
  <Account>777-7777</Account>
</Bid>
<Bid xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <Symbol>YHOO</Symbol>
  <Price>4.12</Price>
  <Account>777-7777</Account>
</Bid>
```

In both cases the account number is the same. Furthermore, it's not simply that the two numbers are equal: They indicate the same object. In Java terms it's the difference between the equals() method and the == operator. The first tests for equality, whereas the second tests for identity. If the local semantics demand that each Account element be deserialized as an Account object (perhaps with other fields filled in from a database rather than from the XML document), then you want some means of saying that this document should produce one Account object rather than two. This is done with a reference. Give the first Account element a unique id attribute, and use an href attribute in the second element to point to it, as shown here:

```xml
<Bid xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <Symbol>RHAT</Symbol>
  <Price>4.12</Price>
  <Account id="a1">777-7777</Account>
</Bid>
<Bid xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
  <Symbol>YHOO</Symbol>
  <Price>4.12</Price>
  <Account href="#a1"/>
</Bid>
```

This document represents two Bid objects. Each has three properties: Symbol , Price , and Account . The symbols are completely different. The prices are equal but not identical; that is, one can change without changing the other. There are two separate prices here that coincidentally have the same value. The Accounts, on the other hand, are identical. There is only one account here, used in two different places.
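Deserializing references so that identity survives takes two passes, because an href may point forward to an id that has not been seen yet. In this sketch the nested Account class is a hypothetical stand-in for whatever class the receiving program uses, and only Account elements are considered; a general decoder would index every element that carries an id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class AccountResolver {
    /** Hypothetical stand-in for the receiving program's own Account class. */
    static final class Account {
        final String number;
        Account(String number) { this.number = number; }
    }

    /** Returns one Account per Account element, in document order; href="#x" reuses id="x". */
    static List<Account> accounts(Element root, String namespace) {
        NodeList elements = root.getElementsByTagNameNS(namespace, "Account");

        // Pass 1: create exactly one object for each element that carries an id
        Map<String, Account> byId = new HashMap<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (e.hasAttribute("id")) {
                byId.put(e.getAttribute("id"), new Account(e.getTextContent().trim()));
            }
        }

        // Pass 2: references share the object; elements with neither attribute get a fresh one
        List<Account> result = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (e.hasAttribute("href")) {
                String href = e.getAttribute("href");
                Account target = href.startsWith("#") ? byId.get(href.substring(1)) : null;
                if (target == null) throw new IllegalArgumentException("Unresolved reference " + href);
                result.add(target);
            } else if (e.hasAttribute("id")) {
                result.add(byId.get(e.getAttribute("id")));
            } else {
                result.add(new Account(e.getTextContent().trim()));
            }
        }
        return result;
    }
}
```

For the second pair of Bid elements, both entries in the returned list are the same object, so `list.get(0) == list.get(1)` is true; for the first pair, the two accounts are merely equal in content.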

Arrays

In Java, arrays are a funny kind of object, and in SOAP they are too. An array is represented as an element whose type is SOAP-ENC:Array. For example, this is an array of three numbers:

```xml
<Bid xsi:type="SOAP-ENC:Array">
  <Price>4.52</Price>
  <Price>0.35</Price>
  <Price>34.68</Price>
</Bid>
```

In an array, the names of the elements don't really mean anything; only the positions matter. If the name of the array doesn't matter either, you can use a SOAP-ENC:Array element instead. For example, this is an array of three doubles, with no extra information:

```xml
<SOAP-ENC:Array>
  <SOAP-ENC:double>4.52</SOAP-ENC:double>
  <SOAP-ENC:double>0.35</SOAP-ENC:double>
  <SOAP-ENC:double>34.68</SOAP-ENC:double>
</SOAP-ENC:Array>
```

SOAP arrays are not as strongly typed as Java arrays are, at least by default. Whereas each array in Java must contain exclusively ints, or strings, or objects, a SOAP array can contain data of varying types. For example, the following array contains three items, each with a different name and type:

```xml
<Bid xsi:type="SOAP-ENC:Array">
  <Symbol  xsi:type="xsd:token">RHAT</Symbol>
  <Price   xsi:type="xsd:double">4.12</Price>
  <Account xsi:type="xsd:string">777-7777</Account>
</Bid>
```

Given this possibility, it can be difficult to decode a SOAP array into a Java array. The closest Java equivalent is an Object[] array. However, primitive values such as doubles would need to be wrapped in an instance of the matching type wrapper class, such as java.lang.Double. Another possibility is to use a java.util.Vector or java.util.ArrayList instead of a straight array, though this still doesn't remove the need for the type wrapper classes.
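Following the list-of-objects approach, here is a sketch that decodes each component according to its own xsi:type. It recognizes only xsd:double and xsd:int, boxing them as Double and Integer, and keeps everything else as a trimmed String; that set of supported types and the class name are assumptions for illustration. The xsd prefix is resolved with lookupNamespaceURI() because, as with element names, only the namespace URI bound to the prefix matters.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SoapArrayDecoder {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    static final String XSD = "http://www.w3.org/2001/XMLSchema";

    /** Decodes each child of the array element by its xsi:type into a list of objects. */
    static List<Object> decode(Element array) {
        List<Object> values = new ArrayList<>();
        for (Node n = array.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element component = (Element) n;
            String text = component.getTextContent().trim();
            String type = component.getAttributeNS(XSI, "type"); // e.g. "xsd:double", or ""
            int colon = type.indexOf(':');
            String prefix = colon < 0 ? null : type.substring(0, colon);
            String local = type.substring(colon + 1);
            boolean schemaType = !type.isEmpty()
                && XSD.equals(component.lookupNamespaceURI(prefix));
            if (schemaType && local.equals("double")) {
                values.add(Double.valueOf(text));  // primitive double boxed as java.lang.Double
            } else if (schemaType && local.equals("int")) {
                values.add(Integer.valueOf(text)); // primitive int boxed as java.lang.Integer
            } else {
                values.add(text);                  // token, string, or anything unrecognized
            }
        }
        return values;
    }
}
```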

If you want to restrict the type of array components, you can add a SOAP-ENC:arrayType attribute to the array element. The value of this attribute is the type of the individual component followed by square brackets containing the length of the array. This is more similar to C's array declaration syntax than Java's. For example, this array must contain exactly three doubles:

```xml
<Bid xsi:type="SOAP-ENC:Array" SOAP-ENC:arrayType="xsd:double[3]">
  <Price>4.52</Price>
  <Price>0.35</Price>
  <Price>34.68</Price>
</Bid>
```

Any array component not specifically typed otherwise can be a struct. Furthermore, any array component can be another array. However, this does not produce a multidimensional array. Instead, multidimensional arrays are created by stringing together the values from the second row after the values from the first row, the values from the third row after the values from the second row, and so on. The SOAP-ENC:arrayType attribute gives the size of each dimension, the number of rows first and then the number of columns. For example, this is a three-row by two-column array of doubles:

```xml
<SOAP-ENC:Array SOAP-ENC:arrayType="xsd:double[3,2]">
  <SOAP-ENC:double>1.1</SOAP-ENC:double>
  <SOAP-ENC:double>1.2</SOAP-ENC:double>
  <SOAP-ENC:double>2.1</SOAP-ENC:double>
  <SOAP-ENC:double>2.2</SOAP-ENC:double>
  <SOAP-ENC:double>3.1</SOAP-ENC:double>
  <SOAP-ENC:double>3.2</SOAP-ENC:double>
</SOAP-ENC:Array>
```

Although the XML representation is one-dimensional, the Java interpretation is two-dimensional. When deserialized, this forms the following Java array:

```java
double[][] array = {
  {1.1, 1.2},
  {2.1, 2.2},
  {3.1, 3.2}
};
```
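The row-after-row layout is easy to undo. This hypothetical ArrayShapes helper reads the dimensions out of a SOAP-ENC:arrayType value and refolds the flat list of components into a Java double[][]. It handles only the simple "type[rows,columns]" form shown here, not arrays of arrays.

```java
class ArrayShapes {
    /** Reads the dimensions from an arrayType value such as "xsd:double[3,2]". */
    static int[] dimensions(String arrayType) {
        int open = arrayType.lastIndexOf('[');
        int close = arrayType.lastIndexOf(']');
        String[] parts = arrayType.substring(open + 1, close).split(",");
        int[] dims = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            dims[i] = Integer.parseInt(parts[i].trim());
        }
        return dims;
    }

    /** Refolds components sent row after row into a rows-by-columns Java array. */
    static double[][] reshape(double[] flat, int rows, int columns) {
        if (flat.length != rows * columns) {
            throw new IllegalArgumentException("Expected " + rows * columns + " components");
        }
        double[][] result = new double[rows][columns];
        for (int i = 0; i < flat.length; i++) {
            result[i / columns][i % columns] = flat[i]; // row index, then column index
        }
        return result;
    }
}
```

With dimensions {3, 2} and the six values from the example, reshape() produces exactly the array literal shown above.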

In the interest of efficiency over potentially slow networks, SOAP allows partially transmitted and sparse arrays. A partially transmitted array (also known as a varying array) does not begin with position 0; it instead begins at a specified index. For example, it might have ten components indexed from 3 to 12 inclusive. In SOAP you indicate the position at which a partially transmitted array begins with a SOAP-ENC:offset attribute. The value of this attribute is the index of the first element in the array enclosed in square brackets. For example, the following array begins at 3:

```xml
<SOAP-ENC:Array SOAP-ENC:offset="[3]">
  <SOAP-ENC:string>Component 3</SOAP-ENC:string>
  <SOAP-ENC:string>Component 4</SOAP-ENC:string>
  <SOAP-ENC:string>Component 5</SOAP-ENC:string>
  <SOAP-ENC:string>Component 6</SOAP-ENC:string>
  <SOAP-ENC:string>Component 7</SOAP-ENC:string>
  <SOAP-ENC:string>Component 8</SOAP-ENC:string>
  <SOAP-ENC:string>...</SOAP-ENC:string>
</SOAP-ENC:Array>
```

Java doesn't have such arrays, although Pascal and some other languages do. In Java you would likely deserialize such an array by putting null values or zeroes in the places before the beginning of the array.
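A sketch of that null-padding approach: the hypothetical OffsetArrays.place() parses the bracketed SOAP-ENC:offset value and copies the transmitted components into a String[] starting at that index, leaving the earlier slots null. It assumes the array ends with the last transmitted component; if a SOAP-ENC:arrayType attribute declares a larger size, that size should be used for the allocation instead.

```java
import java.util.List;

class OffsetArrays {
    /** Parses a bracketed index such as "[3]" into 3. */
    static int index(String bracketed) {
        String s = bracketed.trim();
        if (!s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException("Not a bracketed index: " + bracketed);
        }
        return Integer.parseInt(s.substring(1, s.length() - 1).trim());
    }

    /** Places the components starting at the offset; slots before it remain null. */
    static String[] place(String offsetAttribute, List<String> components) {
        int offset = index(offsetAttribute);
        String[] array = new String[offset + components.size()];
        for (int i = 0; i < components.size(); i++) {
            array[offset + i] = components.get(i);
        }
        return array;
    }
}
```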

In a sparse array, a very large percentage of the components are 0 or null. In SOAP a sparse array would pass only the non-zero/non-null components. However, when the array is deserialized, the missing components are filled in with zeroes or nulls. The number of elements in a sparse array must be specified by a SOAP-ENC:arrayType attribute. The position of each element that is provided is given by a SOAP-ENC:position attribute. For example, following is a ten-element array that provides only the components at positions 2, 3, and 5:

```xml
<SOAP-ENC:Array SOAP-ENC:arrayType="xsd:string[10]">
  <SOAP-ENC:string SOAP-ENC:position="[2]">
    2nd component
  </SOAP-ENC:string>
  <SOAP-ENC:string SOAP-ENC:position="[3]">
    3rd component
  </SOAP-ENC:string>
  <SOAP-ENC:string SOAP-ENC:position="[5]">
    5th component
  </SOAP-ENC:string>
</SOAP-ENC:Array>
```

The equivalent Java code looks like this:

```java
String[] array = new String[10];
array[2] = "\n    2nd component\n  ";
array[3] = "\n    3rd component\n  ";
array[5] = "\n    5th component\n  ";
```
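Rather than hard-coding the positions, a decoder can read them from the message. This sketch (class name hypothetical, one-dimensional arrays only) allocates the full array from SOAP-ENC:arrayType and then stores each transmitted component at its SOAP-ENC:position, keeping the text content exactly as received; every slot that was not transmitted stays null.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class SparseArrays {
    static final String SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/";

    /** Decodes a one-dimensional sparse string array using arrayType and position attributes. */
    static String[] decode(Element array) {
        String arrayType = array.getAttributeNS(SOAP_ENC, "arrayType"); // e.g. "xsd:string[10]"
        int size = Integer.parseInt(arrayType.substring(
            arrayType.lastIndexOf('[') + 1, arrayType.lastIndexOf(']')).trim());
        String[] result = new String[size]; // untransmitted slots stay null

        for (Node n = array.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            Element component = (Element) n;
            String position = component.getAttributeNS(SOAP_ENC, "position").trim(); // e.g. "[2]"
            int index = Integer.parseInt(position.substring(1, position.length() - 1).trim());
            result[index] = component.getTextContent(); // whitespace kept as received
        }
        return result;
    }
}
```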

Byte Arrays

A byte array is just a string that somehow encodes binary data. The most common such encoding is base64. A schema or an xsi:type attribute is needed to identify the encoding. For example, the following is a base64-encoded byte array that provides a DSA digital signature (computed over an SHA-1 digest) for a document. Such a signature is 40 bytes long, which becomes 56 characters when encoded in base64.

```xml
<SignatureValue>
  AgGOvkMdqdKT7QyMuXPsuomkOqqEhGukKkj4Em7OKKQxYzheuseS8Q==
</SignatureValue>
```

In Java this would normally be deserialized into a byte array.
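With Java 8 or later, java.util.Base64 does the decoding. Base64 text inside an XML element is usually surrounded by line breaks and indentation, which the basic decoder rejects, so this small sketch strips all whitespace first; the class name is made up for illustration. For the SignatureValue above, the result is a 40-byte array.

```java
import java.util.Base64;

class SignatureDecoder {
    /** Decodes base64 element content after removing any whitespace in it. */
    static byte[] decode(String elementText) {
        // Base64.getDecoder() throws IllegalArgumentException on whitespace, so strip it
        return Base64.getDecoder().decode(elementText.replaceAll("\\s", ""));
    }
}
```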

SOAP Headers

In addition to the body of the request, each SOAP document can contain a header. This is not an HTTP header; rather, it is an additional child of the SOAP-ENV:Envelope element, specifically a SOAP-ENV:Header element. If a SOAP request is an envelope, then the body is the letter inside the envelope, and the header is the writing on the outside of the envelope that tells the post office where to deliver it, where to send it back if it can't be delivered, and how much you paid to get the letter delivered. In other words, a SOAP header provides meta-information about the request.

The sort of meta-information provided varies from request to request and from SOAP application to SOAP application. Some things that can be exchanged in headers include

Protocols the server must understand to process the request
A digital signature for the body of the message
A schema for the XML application used in the body
Credit card information to pay for the processing
A public key to be used to encrypt the response

Example 2.22 shows a buy request in which the header carries credit card information to pay for the request. In this case, the syntax used for the Payment element is specific to the XML application used in the body and even comes from the same namespace.

Example 2.22 A SOAP Request with Payment Information in the Header

```xml
<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Header>
    <Payment xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <Name>Elliotte Harold</Name>
      <Issuer>VISA</Issuer>
      <Number>5125456787651230</Number>
      <Expires>2005-12</Expires>
    </Payment>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <buy id="buy1"
         xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <symbol>MRBA</symbol>
      <shares>100</shares>
      <account>777-7777</account>
    </buy>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

Like the SOAP body, the SOAP header can use any XML application it cares to use to encode the data. It is not limited to a fixed vocabulary. Indeed, it can use more than one such vocabulary. The SOAP-ENV:Header element can contain multiple child elements from a hodgepodge of different namespaces. Each one of these elements, called a header entry, may be treated independently of the other header entries. Example 2.23 adds an additional header containing a digital signature for the request body. The syntax used for the Signature element is defined by XML-Signature Syntax and Processing [http://www.w3.org/TR/xmldsig-core/].

Example 2.23 A SOAP Request with Two Header Entries

```xml
<?xml version="1.0"?>
<SOAP-ENV:Envelope
 xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Header>
    <Payment xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <Name>Elliotte Harold</Name>
      <Issuer>VISA</Issuer>
      <Number>5125456787651230</Number>
      <Expires>2005-12</Expires>
    </Payment>
    <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
      <SignedInfo>
        <CanonicalizationMethod
          Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"/>
        <SignatureMethod
          Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1" />
        <Reference URI="file://J/xss4j/requestbody.xml">
          <DigestMethod
            Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />
          <DigestValue>3UxhLrdPpK3faRms5FOS6kAoeZI=</DigestValue>
        </Reference>
      </SignedInfo>
      <SignatureValue>
        ZeW/PYGT6A9iOqOrbMmeKOq1aQk+ars/QOC95Bj0xYrNAnLo/WK7+g==
      </SignatureValue>
    </Signature>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <buy id="buy1"
         xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <symbol>MRBA</symbol>
      <shares>100</shares>
      <account>777-7777</account>
    </buy>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
```

The mustUnderstand Attribute

An individual SOAP document tends to be tied closely to the service it is meant to talk to. You can't send a request for a stock quote to a server designed to provide basketball scores and expect to get sensible results back. To indicate what is required of a server, a SOAP request can place a SOAP-ENV:mustUnderstand attribute on each header entry. If this attribute has the value 1, then the service that receives the SOAP request must process the header entry. If it cannot, either because it does not understand the header entry or for some other reason, then it must fail the request and return a fault. If the SOAP-ENV:mustUnderstand attribute has the value 0, then processing the header entry is optional: the service should process it if it can, but failing to do so does not automatically lead to a fault. The default is 0.

Example 2.24 is a buy order that requires the receiver to understand the Payment header entry. If the server does not recognize that entry, it must not attempt to fulfill the order.

Example 2.24 A SOAP Request with a mustUnderstand Attribute

<?xml version="1.0"?>
<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
  <SOAP-ENV:Header>
    <Payment xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/"
             SOAP-ENV:mustUnderstand="1">
      <Name>Elliotte Harold</Name>
      <Issuer>VISA</Issuer>
      <Number>5125456787651230</Number>
      <Expires>2005-12</Expires>
    </Payment>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <buy xmlns="http://namespaces.cafeconleche.org/xmljava/ch2/">
      <symbol>MRBA</symbol>
      <shares>100</shares>
      <account>777-7777</account>
    </buy>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
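A receiving service might enforce the mustUnderstand rule roughly as follows. This is a minimal sketch under two simplifying assumptions: "understanding" a header entry is modeled as recognizing its namespace URI, and the actor attribute is ignored. The class name and the set of understood namespaces passed to it are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: finds mandatory header entries a service cannot handle.
class MustUnderstandChecker {

    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the local names of header entries marked mustUnderstand="1"
    // whose namespace is not in understoodNamespaces. A non-empty result
    // means the request must fail with a fault.
    static List<String> unhandledMandatoryEntries(String soapMessage,
                                                  Set<String> understoodNamespaces) {
        List<String> unhandled = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapMessage)));
            NodeList headers = doc.getElementsByTagNameNS(SOAP_ENV, "Header");
            if (headers.getLength() == 0) {
                return unhandled;
            }
            for (Node n = headers.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue;
                }
                Element entry = (Element) n;
                // getAttributeNS returns "" when the attribute is absent,
                // which matches the default of 0 (optional).
                boolean mandatory = "1".equals(entry.getAttributeNS(SOAP_ENV, "mustUnderstand"));
                if (mandatory && !understoodNamespaces.contains(entry.getNamespaceURI())) {
                    unhandled.add(entry.getLocalName());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SOAP message", e);
        }
        return unhandled;
    }
}
```

If the returned list is not empty, the service should reject the whole request rather than process the body; SOAP 1.1 defines the MustUnderstand fault code for exactly this case.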

The actor Attribute

Although this book mostly focuses on SOAP messages that go straight from the sender to the receiver that will process them, not all systems are this simple. A SOAP message can be forwarded from one SOAP processor to the next until it reaches its ultimate destination. By default, header entries are read only by the last processor. However, you can indicate that a header entry is intended for an intermediate processor by placing a SOAP-ENV:actor attribute on the header entry element. The value of this attribute is a URI identifying the processor for which the header entry is intended.

When a processor receives a SOAP message, it searches the header for header entries addressed to it. It acts on these header entries and deletes them. It may also add new header entries intended for processors later in the chain. Then it forwards the message to the next processor in the chain.

All processors except the final one act only on header entries that are specifically addressed to them. The special actor URI http://schemas.xmlsoap.org/soap/actor/next indicates that the header entry should be processed and deleted by the first processor that sees it. The last processor in the chain processes any header entries not addressed to any processor in particular, as well as any header entries that are addressed specifically to it.
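These targeting rules reduce to a small decision per header entry. The sketch below, again using plain DOM, assumes the attribute is written as SOAP-ENV:actor in the envelope namespace, as in SOAP 1.1; the class name and the processor URIs used to call it are hypothetical. It removes the entries it selects, mirroring the delete-before-forwarding step, but it does not add new entries or do the actual forwarding.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: selects and strips the header entries meant for one processor.
class ActorFilter {

    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";
    static final String NEXT = "http://schemas.xmlsoap.org/soap/actor/next";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SOAP message", e);
        }
    }

    // Is this header entry addressed to the processor identified by myActor?
    static boolean isForMe(Element entry, String myActor, boolean isFinal) {
        String actor = entry.getAttributeNS(SOAP_ENV, "actor");
        if (actor.isEmpty()) {
            return isFinal; // no actor: only the last processor handles it
        }
        return actor.equals(NEXT) || actor.equals(myActor);
    }

    // Removes the header entries addressed to this processor and returns their names.
    static List<String> takeEntries(Document doc, String myActor, boolean isFinal) {
        List<String> taken = new ArrayList<String>();
        NodeList headers = doc.getElementsByTagNameNS(SOAP_ENV, "Header");
        if (headers.getLength() == 0) {
            return taken;
        }
        Node header = headers.item(0);
        // Collect first, then remove, so removal does not disturb the traversal.
        List<Element> entries = new ArrayList<Element>();
        for (Node n = header.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                entries.add((Element) n);
            }
        }
        for (Element entry : entries) {
            if (isForMe(entry, myActor, isFinal)) {
                taken.add(entry.getLocalName());
                header.removeChild(entry);
            }
        }
        return taken;
    }
}
```

Called first by a gateway with its own URI and isFinal set to false, and then by the ultimate receiver with isFinal set to true, the same document loses the gateway's entries and any actor/next entries at the first hop, leaving only unaddressed entries for the end of the chain.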

The exact scheme for forwarding SOAP messages from one processor to the next is system dependent. For example, you might set up a gateway server outside the firewall to verify certain characteristics of a SOAP message before forwarding it to a processor inside the firewall. Such a gateway would either block or forward each message. A switching processor might inspect the body of the message and forward the request to different SOAP processors depending on what it found there. Some systems might even use routing information included in the messages themselves.
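A switching processor of the kind just described might choose its destination from the first element inside SOAP-ENV:Body. In this sketch the class name, the routing table, its {namespace}localName keys, and the endpoint URLs stored in it are all hypothetical; a real deployment would use whatever routing rule suits it.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical content-based router for SOAP 1.1 messages.
class BodyRouter {

    static final String SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the endpoint registered for the first element in the Body,
    // or null if there is no Body or no route matches.
    static String route(String soapMessage, Map<String, String> routes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapMessage)));
            NodeList bodies = doc.getElementsByTagNameNS(SOAP_ENV, "Body");
            if (bodies.getLength() == 0) {
                return null;
            }
            for (Node n = bodies.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    // Key on namespace URI and local name, never on the prefix.
                    return routes.get("{" + n.getNamespaceURI() + "}" + n.getLocalName());
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SOAP message", e);
        }
    }
}
```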

SOAP Limitations

Regrettably, in my opinion, SOAP does not allow developers to take full advantage of XML's expressiveness and extensibility. First of all, according to the SOAP 1.1 specification, "A SOAP message MUST NOT contain a Document Type Declaration." This allows non-validating parsers, and parsers that cannot resolve external entities, to process SOAP messages without the risk of misinterpreting them because they failed to apply default attribute values (including defaulted namespace declarations) or to resolve external entities. But it also means the document can't be validated against a DTD.

Also according to the SOAP 1.1 specification, "A SOAP message MUST NOT contain Processing Instructions." Honestly, this makes no sense to me; I see little reason for forbidding them. The rule does mean that all information in a SOAP request must be passed through the defined SOAP structure, but it also makes it difficult to include other useful features beyond that structure. The most obvious consequence is that you can't easily attach a stylesheet to a SOAP document with an xml-stylesheet processing instruction, although that's not a huge loss because SOAP documents aren't meant for humans to read in the first place. However, it also means that it's difficult to serve SOAP documents out of the Cocoon application server. There are probably many other environment-specific situations where this becomes inconvenient.
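Since a SOAP processor must reject messages that break these two rules, it can be useful to check for them explicitly. One way, sketched here with the SAX2 extension class DefaultHandler2 (which implements both ContentHandler and LexicalHandler), is to listen for the startDTD and processingInstruction callbacks. The class name is hypothetical. Note that the XML declaration looks like a processing instruction but is not one, so it is never reported.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical checker for the SOAP 1.1 "no DTD, no PIs" restrictions.
class SoapRestrictionChecker {

    // Returns a description of each restriction the message violates.
    static List<String> violations(String message) {
        final List<String> problems = new ArrayList<String>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                problems.add("document type declaration for " + name);
            }

            @Override
            public void processingInstruction(String target, String data) {
                problems.add("processing instruction " + target);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // startDTD is only reported to a registered lexical handler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(message)), handler);
        } catch (Exception e) {
            problems.add("not well-formed: " + e.getMessage());
        }
        return problems;
    }
}
```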

Validating SOAP

SOAP is actively hostile to DTDs. The SOAP specification specifically forbids a SOAP message from containing a document type declaration. Thus you really have to use a schema to validate your documents, if you validate them at all.

Unlike XML-RPC, SOAP does have an official schema. In fact it has two, which you can download from the SOAP namespace URLs. The envelope schema at http://schemas.xmlsoap.org/soap/envelope/ describes the SOAP complex types: SOAP-ENV:Envelope, SOAP-ENV:Header, SOAP-ENV:Body, and so on. The encoding schema at http://schemas.xmlsoap.org/soap/encoding/ defines the SOAP data types listed in Table 2.2: SOAP-ENC:int, SOAP-ENC:NMTOKENS, SOAP-ENC:gYear, and so on. You can find these schemas in Appendix B.

XML-RPC is a monolithic XML application not designed to be integrated with other XML applications. SOAP, by contrast, is incomplete without some other XML application to form the body of the SOAP request. Thus the SOAP schema cannot be monolithic. Because it must rely on some other XML application in its own namespace (or perhaps no namespace at all, although this is not recommended), the SOAP schema cannot on its own validate any SOAP documents. It also requires that the developer provide a separate schema for the document bodies, and then merge the two together using xsd:import elements.

Example 2.25 is a master schema for quote request documents such as Example 2.15. This schema declares no elements of its own. Instead it includes the SOAP envelope schema, imports the SOAP encoding schema, and imports trading.xsd, the schema for the getQuote elements seen earlier in Example 2.21. This schema can be used to validate a complete SOAP request that has a getQuote body element. If you wanted to validate the other SOAP documents in this chapter that use other elements in the header and body, you would just need to write declarations for those elements too. They could be placed in the master schema, in trading.xsd, or in their own schema documents, whichever seems most convenient.

Example 2.25 A Master Schema for SOAP Trading Documents

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://schemas.xmlsoap.org/soap/envelope/">

  <!-- Standard SOAP schemas -->
  <xsd:include
    schemaLocation="http://schemas.xmlsoap.org/soap/envelope/"
  />
  <xsd:import
    schemaLocation="http://schemas.xmlsoap.org/soap/encoding/"
    namespace="http://schemas.xmlsoap.org/soap/encoding/"
  />

  <!-- Local schema -->
  <xsd:import schemaLocation="trading.xsd"
    namespace="http://namespaces.cafeconleche.org/xmljava/ch2/"
  />

</xsd:schema>
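To put a master schema like Example 2.25 to work from Java, one option is the JAXP validation API in javax.xml.validation, which compiles a W3C XML Schema and validates documents against it. The following is a minimal sketch; the class name is hypothetical, and applying it to Example 2.25 assumes your JAXP implementation is allowed to fetch the remote SOAP schemas that the master schema names.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates one document against a W3C XML Schema.
class SoapValidator {

    // Returns every validation error; an empty list means the document is valid.
    static List<String> validate(Source schemaSource, Source document) {
        final List<String> errors = new ArrayList<String>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemaSource);
            Validator validator = schema.newValidator();
            // Collect ordinary errors instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(document);
        } catch (Exception e) {
            // Malformed input, an unreadable schema, or an I/O failure.
            errors.add(e.getMessage());
        }
        return errors;
    }
}
```

For instance, validate(new StreamSource("master.xsd"), new StreamSource("request.xml")) would check a request, assuming Example 2.25 were saved as master.xsd (a hypothetical file name) next to trading.xsd. Collecting errors in a list rather than throwing on the first lets you report every problem in a document at once.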