SOAP Theory | The Official XMLSPY Handbook

Simple Object Access Protocol (SOAP) was developed several years ago; specifically, in December 1999 the W3C organized the XML protocol discussion mailing list. SOAP, in its first incarnation, was a simple specification that described how two computers can share information using an XML message. Before SOAP, developers had applications communicate with each other by using technologies such as DCOM (Distributed Component Object Model), CORBA (Common Object Request Broker Architecture), IIOP (Internet Inter-ORB Protocol), and Java RMI (Remote Method Invocation). The problem with these technologies is that each is a specific technology and none of them is a universal protocol. Therefore, to permit Internet and extranet (and not just intranet) communications, the individual parties had to agree to use a specific technology. With SOAP, the parties still need to agree on the messages they are exchanging, but they do not need to agree on the technologies used to produce and consume them.

The purpose of SOAP is to allow any computer to exchange information with another computer using any protocol. What SOAP brings to the table is a way of structuring a message so that the receiving computer can use it. For example, you decide to pass along a stack of paper to a coworker in your office. The coworker looks at you and asks, “Okay, so what do I do with this stack of paper?” If you were using the SOAP approach, you would have handed off the stack of papers wrapped in associated folders marked with sticky notes describing the purpose of the enclosed documents. SOAP, in combination with XML Schemas and WSDL, doesn’t attempt to decipher the message. Instead, it provides information on how the message should be handled.

The SOAP specification in a nutshell

In its simplest form, a SOAP message looks like the following:

<Envelope>
    <Header>...</Header>
    <Body>...</Body>
</Envelope>

The SOAP message has three basic XML tags:

Envelope: A wrapper for the SOAP message that provides the root node element
Header: A tag used to contain information about how to route, process, or manipulate the SOAP contents; content within the Header tag is used solely by the SOAP infrastructure
Body: A tag used to contain the actual message that will be used in an application. The contained message is constructed using XML.

The previous SOAP message is the simplest example, but the SOAP message may look more like this:

<?xml version="1.0" ?>
<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">
    <env:Header> ... </env:Header>
    <env:Body> ... </env:Body>
</env:Envelope>

You are more likely to see SOAP messages that resemble this second example because most SOAP messages contain XML namespaces. SOAP has undergone some version changes, and the way that specific XML tags are interpreted may have changed. Hence, by checking for a specific namespace URI identifier, a processor can accept or reject SOAP messages. The acceptance or rejection could be performed by a SOAP gateway or by the SOAP processor itself. The namespace identifier bound to the env prefix specifies the version of the SOAP message.
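The version check described above can be sketched in a few lines of Python. This is only an illustration using the standard xml.etree.ElementTree module; the function name accepts_envelope is hypothetical, and the second URI is the SOAP 1.1 envelope namespace, used here as an example of a version the receiver does not accept.

```python
import xml.etree.ElementTree as ET

# Envelope namespace used in this chapter's examples.
SOAP_ENV_NS = "http://www.w3.org/2001/12/soap-envelope"


def accepts_envelope(xml_text, supported_ns=SOAP_ENV_NS):
    """Return True if the root element is an Envelope in the supported namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree expands a qualified name to "{namespace-uri}local-name",
    # so the prefix chosen by the sender (env, soap, ...) does not matter.
    return root.tag == "{%s}Envelope" % supported_ns


current = (
    '<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">'
    "<env:Header/><env:Body/></env:Envelope>"
)
older = (
    '<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">'
    "<env:Body/></env:Envelope>"
)

print(accepts_envelope(current))  # True
print(accepts_envelope(older))    # False
```

A gateway or a SOAP processor could run a check of this kind before looking at anything else in the message.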

To generate or process a message, the application works with the content inside the Body element. The content within the Body tag must have its own root. The best way to understand this is to consider the Body content as either a single XML document or multiple documents embedded within the SOAP XML document.

Consider the following SOAP message:

<Envelope>
    <Header>...</Header>
    <Body>
        <item>something</item>
        <item>another thing</item>
    </Body>
</Envelope>

In terms of an XML document, in which there is only one root node, it appears that the Body tag is the root document node for the item tag nodes. In fact, what is happening here is that there is no single child XML document, but multiple child documents with each child document being the item tag.

Must there be exactly one child node, or must there be several? Neither: SOAP is a flexible architecture that requires only that the XML tags adhere to the SOAP specification. The only requirement proposed in the SOAP 1.2 specification is that each child element of the Body must be namespace qualified. This makes it possible to separate the SOAP message from the XML document contained within it. In the bigger picture, this makes it possible to manipulate either the XML document or the SOAP message without requiring one to know about the other.
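As a rough sketch of that requirement, the following Python fragment separates the Body children that carry a namespace from those that do not. The inventory namespace http://example.org/inventory and the function name split_body are assumptions made for the illustration; they are not part of SOAP.

```python
import xml.etree.ElementTree as ET

ENV = "{http://www.w3.org/2001/12/soap-envelope}"

message = """
<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">
  <env:Body>
    <inv:item xmlns:inv="http://example.org/inventory">something</inv:item>
    <inv:item xmlns:inv="http://example.org/inventory">another thing</inv:item>
    <item>unqualified</item>
  </env:Body>
</env:Envelope>
"""


def split_body(xml_text):
    """Return (qualified, unqualified) lists of Body child elements."""
    root = ET.fromstring(xml_text)
    body = root.find(ENV + "Body")
    qualified, unqualified = [], []
    for child in body:
        # A namespace-qualified tag is stored as "{uri}local-name".
        if child.tag.startswith("{"):
            qualified.append(child)
        else:
            unqualified.append(child)
    return qualified, unqualified


qualified, unqualified = split_body(message)
print([child.text for child in qualified])   # ['something', 'another thing']
print([child.tag for child in unqualified])  # ['item']
```

Each qualified child can be handed to the application as a document in its own right, while a processor that enforces the requirement would reject the message because of the unqualified one.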

The details of the SOAP specification

Now that you’ve had a look at SOAP at its simplest level, this section walks you through the details. SOAP, by its nature, is an abstract communication mechanism that is not tied to a particular transport. In the past, and still today, the most common scenario is to send the SOAP message using HTTP. But you can also send SOAP messages by using SMTP (Simple Mail Transfer Protocol) or any other protocol. SOAP is, by its nature, stateless, meaning that a SOAP message sent at one moment has no recollection of an associated SOAP message sent at another time.

The SOAP specification does not specify how to communicate. To compare this to a language, you could say that SOAP specifies that there are words, but not how the words are put together to form a language. It would seem that SOAP is useless without this additional functionality. On the contrary, SOAP focuses on one thing and does that one thing well. The focus of SOAP is making it possible to exchange a message with another entity. In SOAP terms, there is the sender that sends the message and the receiver that receives the message. The sender may only send a single message and not receive a response, in which case the SOAP message was a one-way message. If the sender sends a message and the receiver sends a response to the SOAP message, a send response SOAP exchange occurred. Both scenarios are part of the SOAP specification.

Sending a SOAP Message

In both a one-way and a send response message exchange, the sender must send a SOAP message. A typical SOAP message is as follows:

<?xml version="1.0" ?>
<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">
    <env:Header>
        <th:transaction
            xmlns:th="http://www.transaction.org/2001/12/transaction">
            123
        </th:transaction>
    </env:Header>
    <env:Body>
        <math:add xmlns:math="http://www.devspace.com/2002/1/math">
            <math:num>1</math:num>
            <math:num>1</math:num>
        </math:add>
    </env:Body>
</env:Envelope>

One item of interest in this message example is that XML namespaces are used everywhere. This is an important feature of SOAP that has been highlighted in the SOAP 1.2 specification. Namespaces make it possible to break a document into separate processing segments without actually destroying the format of the document.

SOAP and Namespaces

The typical SOAP message just shown has three different namespaces: env, th, and math. The env namespace is part of the SOAP specification. The th namespace is a fictitious namespace that references a transaction specification. The math namespace references a namespace defined by the author to perform mathematical additions. In each of the examples, the namespace references a URI. You may think that if you typed the URI into your browser, something would be returned. Namespace identifiers don’t work that way, however. The URI, in this case, is simply an identifier.

Standard XML parsers use namespace identifiers as a mechanism to group XML tags. A standard parser will not access the Internet to see what the namespace identifier represents. Applications use the namespace identifier to make sure that they are parsing the correct data. In all SOAP processor implementations, the namespace identifiers are used to identify the version of SOAP and to process the message accordingly. For example, the EasySOAP++ client is pre-SOAP 1.2 and does not accept SOAP 1.2 responses. EasySOAP++ can handle only SOAP 1.1 requests. Therefore, to process SOAP 1.2 messages, you would have to use Apache Axis, Microsoft .NET, or something similar.
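To see how a parser treats namespace identifiers as plain strings, here is a small Python sketch that lists the namespaces used in the typical message. It relies only on the standard library; the helper name namespaces_used is hypothetical. No network access takes place when it runs.

```python
import xml.etree.ElementTree as ET

message = (
    '<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">'
    "<env:Header>"
    '<th:transaction xmlns:th="http://www.transaction.org/2001/12/transaction">'
    "123</th:transaction>"
    "</env:Header>"
    "<env:Body>"
    '<math:add xmlns:math="http://www.devspace.com/2002/1/math">'
    "<math:num>1</math:num><math:num>1</math:num>"
    "</math:add>"
    "</env:Body>"
    "</env:Envelope>"
)


def namespaces_used(xml_text):
    """Return the namespace URIs of all elements, in document order, without repeats."""
    root = ET.fromstring(xml_text)
    uris = []
    for element in root.iter():
        if element.tag.startswith("{"):
            # Tags look like "{uri}local-name"; the URI is only compared as text.
            uri = element.tag[1:].split("}", 1)[0]
            if uri not in uris:
                uris.append(uri)
    return uris


for uri in namespaces_used(message):
    print(uri)
```

The three URIs printed correspond to the env, th, and math prefixes; the prefixes themselves are discarded by the parser, which is why two senders can use different prefixes for the same namespace.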

SOAP Headers

In the typical SOAP message example shown earlier, there was a SOAP Header XML tag and child transaction XML tag. The SOAP Header tag is optional and does not need to be present. The purpose of the SOAP Header tag is to instruct the infrastructure what to do with the message. In the case of the typical SOAP message example, the transaction XML tag is referencing a currently executing transaction. The transaction may or may not mean anything to the receiver that has to process the SOAP message.

In the SOAP message example, if the transaction means nothing to the receiver, the transaction is ignored and processing continues. However, if the XML-tag transaction were constructed as follows, the transaction could not be ignored:

<th:transaction
    xmlns:th="http://www.transaction.org/2001/12/transaction"
    env:mustUnderstand="1">

In the modified version of the transaction XML tag, a mustUnderstand attribute is added. The mustUnderstand attribute belongs to the SOAP namespace (hence the env prefix) and is defined by the SOAP specification. The specification states that the receiver must understand any header element that carries mustUnderstand="1" in order to process the SOAP message. If the receiver cannot understand the header element, the receiver must issue a fault stating that it is not able to process the header element.
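The rule can be sketched as follows. This is a simplified illustration, not a complete SOAP processor: the function misunderstood_headers and the idea of passing in a set of understood tags are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

ENV = "{http://www.w3.org/2001/12/soap-envelope}"
TRANSACTION = "{http://www.transaction.org/2001/12/transaction}transaction"

message = (
    '<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">'
    "<env:Header>"
    '<th:transaction xmlns:th="http://www.transaction.org/2001/12/transaction"'
    ' env:mustUnderstand="1">123</th:transaction>'
    "</env:Header>"
    "<env:Body/>"
    "</env:Envelope>"
)


def misunderstood_headers(xml_text, understood_tags):
    """Return the tags of mandatory header elements this receiver does not know."""
    root = ET.fromstring(xml_text)
    header = root.find(ENV + "Header")
    if header is None:  # the Header element is optional
        return []
    problems = []
    for block in header:
        # Namespaced attributes are keyed as "{uri}name", like element tags.
        mandatory = block.get(ENV + "mustUnderstand") == "1"
        if mandatory and block.tag not in understood_tags:
            problems.append(block.tag)
    return problems


# A receiver that knows nothing about transactions must raise a fault.
print(misunderstood_headers(message, understood_tags=set()))
# A receiver that understands the transaction header can continue.
print(misunderstood_headers(message, understood_tags={TRANSACTION}))
```

A non-empty result is the point at which a real receiver would stop processing and return a fault instead of touching the Body.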

In a previous paragraph, I mentioned that a SOAP header element is used to indicate how to process the SOAP message. This is not the whole story, because a header element can be targeted at a specific actor. For example, a SOAP message may need routing information that is specific to the domain where the SOAP message is being sent. In that case a SOAP header has to be added, but the final SOAP receiver will not process it. Instead, a so-called SOAP intermediary processes it. A SOAP intermediary is a server that has the capability to receive a SOAP message and process the SOAP headers, but not the contents of the SOAP message itself. If a SOAP header has to be processed by a particular SOAP intermediary, it is possible to assign an actor attribute as shown by the following example:

<th:some_action xmlns:th="http://www.devspace.com/2001/12/something"  env:actor="http://www.devspace.com/some_logical_machine" ..>

In this example, the some_action XML tag has an actor attribute, which indicates that the some_action tag should be processed at a specific SOAP intermediary. In this case the SOAP intermediary is identified by a URI, which (in theory) could be the IP address of some computer. However, specifying an IP address is shortsighted; the SOAP intermediary should instead be identified by a logical name. The logical name should reflect either the business process or the purpose of the machine. Examples could include gateway, router, firewall, and so on.
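An intermediary's selection of its own header elements might look like the following Python sketch. The function name headers_for is hypothetical, and the second header element (a transaction without an actor) is added only to show an element the intermediary leaves alone.

```python
import xml.etree.ElementTree as ET

ENV = "{http://www.w3.org/2001/12/soap-envelope}"
MY_ACTOR = "http://www.devspace.com/some_logical_machine"

message = (
    '<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">'
    "<env:Header>"
    '<th:some_action xmlns:th="http://www.devspace.com/2001/12/something"'
    ' env:actor="http://www.devspace.com/some_logical_machine"/>'
    '<tx:transaction xmlns:tx="http://www.transaction.org/2001/12/transaction">'
    "123</tx:transaction>"
    "</env:Header>"
    "<env:Body/>"
    "</env:Envelope>"
)


def headers_for(xml_text, actor_uri):
    """Return the header elements addressed to the given actor URI."""
    root = ET.fromstring(xml_text)
    header = root.find(ENV + "Header")
    if header is None:
        return []
    # The actor value is compared as an opaque string; it is a logical name,
    # not an address that gets resolved.
    return [block for block in header if block.get(ENV + "actor") == actor_uri]


mine = headers_for(message, MY_ACTOR)
print([block.tag for block in mine])
```

Only some_action is selected; the transaction element has no actor attribute and is passed along untouched for the final receiver.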

One final word on SOAP headers: if a specific header element has child elements, and the specific element was not processed, then the child elements are not processed either. When processing header elements, there is a strict parent-child relationship that cannot be broken.

SOAP Body

Every SOAP message must have a SOAP Body XML tag. Within the SOAP Body is the data to be processed by the receiver. Any valid XML data that has an associated namespace identifier can be included. Not having a namespace identifier causes an error in the SOAP processor. Otherwise, the data does not matter to the SOAP processor and is passed on as is. Within the SOAP Body, there can be either one child element or multiple child elements. The significance of a single or multiple child elements is only of interest to the SOAP message processor.

SOAP Encoding

In the SOAP 1.2 specification, the other important factor is encoding of the data. Although XML allows any type of structuring and formatting, there are encoding schemes defined. The encoding schemes are purely optional and need not necessarily apply. However, if a sender sends a message with encoding defined, the receiver should, generally, not ignore the encoding. The purpose of encoding is to make it simpler to translate a data type from one application platform to another application platform.

However, there are some things to consider regarding encoding. In previous versions of SOAP, encoding was part of the specification, but now encoding is an adjunct of the SOAP 1.2 specification. What does this mean to the average programmer? It means that the W3C standards body is starting to split SOAP into different sections and areas. The SOAP-encoding specification specifically deals with using SOAP in an RPC (Remote Procedure Call) context. If SOAP is used in another context, the encoding specification does not necessarily apply. But there is a generally accepted encoding that applies in all circumstances. That is XML Schema Part 2, which defines how numbers, complex structures, and other things are encoded. The SOAP encoding narrows down the specification for use in arrays and other programmer-specific items. A more detailed discussion of encoding is beyond the scope of this book.

Getting a Reply

When the SOAP message causes the receiver to generate a reply, the reply must take a specific form. The reply has the exact same format as the request except that the generated content is a response. With SOAP 1.2, the notion is that a reply is the same as a message sent.

Getting an Error Message

The only special type of SOAP message is a SOAP error message. The generic form of the SOAP error message is as follows:

<?xml version="1.0" ?>
<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">
    <env:Body>
        <env:Fault>
            <faultcode>env:Receiver</faultcode>
            <faultstring>Something happened</faultstring>
            <detail>
                <err:message
                    xmlns:err="http://www.devspace.com/errors">
                    Here is some error
                </err:message>
            </detail>
        </env:Fault>
    </env:Body>
</env:Envelope>

The SOAP error is a document within the SOAP Body tag as specified by the Fault XML tag. The Fault tag is part of the env namespace, which is part of the SOAP specification. What is odd about the error specification is that, within the Fault tag, things become a bit illogical. The SOAP specification hammers home the point of the namespace identifying all elements of the SOAP message, except within the standard Fault child tags. Another oddity is that the value contained within the faultcode tag is namespace identified. The final oddity is that the four child XML tags are all lowercase although, thus far, everything else has been defined with the first letter uppercase. Although none of these oddities is illegal XML, each is just a bit out of place with the rest of the well-designed SOAP specification.

Here are the four possible child XML tags:

faultcode: Specifies the error that has occurred. The faultcode error needs to be programmatically recognizable because the SOAP processor uses this error code to display the error. There are predefined error codes as follows:
- VersionMismatch: The SOAP Envelope message being sent does not have the same namespace identifier as was expected by the receiver.
- MustUnderstand: A SOAP Header element was passed that the SOAP receiver or intermediary does not understand. An example of this error is given later in this section.
- DTDNotSupported: If a SOAP message contains a DTD description anywhere, the SOAP infrastructure will generate an error because DTDs are not supported.
- DataEncodingUnknown: An encoding is used within the SOAP message that is not supported by the SOAP infrastructure.
- Sender: An error has occurred where the SOAP sender has sent data that makes it impossible for the SOAP intermediaries or SOAP receiver to process the SOAP message.
- Receiver: An error has occurred in which the problem is not the SOAP message itself, but the processing of the SOAP message. Although this may be the same as an error in the content, it really means the error lies with a dependency caused by the SOAP message. For example, if the SOAP message causes another SOAP message call and that call has an error, this faultcode is generated.
faultstring: Specifies a human-readable explanation of the error. The value for faultstring is a string as defined by the XML Schema string definition.
faultactor: Specifies which node on the SOAP calling chain caused the error. Using this value, it is possible to pinpoint if the SOAP receiver was at fault or if a SOAP intermediary was at fault.
detail: Specifies a more detailed error message. Unlike the faultstring, the detail part of the error message can have sub-XML elements. However, if child XML elements are defined then they must be namespace defined.
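The fault structure described by these child tags can be assembled programmatically. The following Python sketch is one possible way to do it with the standard library; build_fault and FAULT_CODES are names invented for the illustration, and the detail element is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

ENV_NS = "http://www.w3.org/2001/12/soap-envelope"
# Make the serializer emit the env prefix, so the "env:" used inside the
# faultcode text refers to a prefix that is actually declared.
ET.register_namespace("env", ENV_NS)

# The predefined fault codes listed above.
FAULT_CODES = {
    "VersionMismatch", "MustUnderstand", "DTDNotSupported",
    "DataEncodingUnknown", "Sender", "Receiver",
}


def build_fault(code, reason, actor=None):
    """Return a SOAP fault message as a string."""
    if code not in FAULT_CODES:
        raise ValueError("unknown fault code: %s" % code)
    envelope = ET.Element("{%s}Envelope" % ENV_NS)
    body = ET.SubElement(envelope, "{%s}Body" % ENV_NS)
    fault = ET.SubElement(body, "{%s}Fault" % ENV_NS)
    # The standard Fault children are not namespace qualified.
    ET.SubElement(fault, "faultcode").text = "env:" + code
    ET.SubElement(fault, "faultstring").text = reason
    if actor is not None:
        ET.SubElement(fault, "faultactor").text = actor
    return ET.tostring(envelope, encoding="unicode")


fault_xml = build_fault("Receiver", "Something happened")
print(fault_xml)
```

Restricting the code argument to the predefined set keeps the faultcode programmatically recognizable, while the faultstring remains free text for a human reader.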

The mustUnderstand attribute specifies that the specified SOAP intermediary or SOAP receiver must know what to do with the child SOAP Header element. If, however, the child SOAP Header element is unknown, an error is generated. But the SOAP error message has an additional element within the SOAP Header collection indicating which child SOAP Header element is the problem. An example error message is as follows:

<?xml version="1.0" ?>
<env:Envelope
    xmlns:env="http://www.w3.org/2001/12/soap-envelope"
    xmlns:err="http://www.w3.org/2001/12/soap-faults">
    <env:Header>
        <err:Misunderstood qname="th:transaction"
            xmlns:th="http://www.transaction.org/2001/12/transaction" />
    </env:Header>
    <env:Body>
        <env:Fault>
            <faultcode>env:MustUnderstand</faultcode>
            <faultstring>
              Could not understand SOAP Header sub-tag
            </faultstring>
        </env:Fault>
    </env:Body>
</env:Envelope>

This SOAP error message refers to the SOAP message that was sent with a transaction header element. In this example, the unknown header element is replaced by the Misunderstood element, which is defined in the SOAP err namespace. The qname attribute specifies which header element is the problem. Notice that the namespace prefix and namespace identifier of the offending header element are also declared on the Misunderstood XML tag. Exact details of why the header element is incorrect are given within the Fault tag.
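A sender that receives such a fault has to turn the qname value back into a namespace and a local name. The Python sketch below shows one way to do that with the standard library; the function name misunderstood_names is hypothetical, the message is trimmed to its header, and prefix scoping is simplified to a single flat table, which is enough for this example.

```python
import io
import xml.etree.ElementTree as ET

ERR_NS = "http://www.w3.org/2001/12/soap-faults"

fault_message = (
    "<env:Envelope"
    ' xmlns:env="http://www.w3.org/2001/12/soap-envelope"'
    ' xmlns:err="http://www.w3.org/2001/12/soap-faults">'
    "<env:Header>"
    '<err:Misunderstood qname="th:transaction"'
    ' xmlns:th="http://www.transaction.org/2001/12/transaction"/>'
    "</env:Header>"
    "<env:Body/>"
    "</env:Envelope>"
)


def misunderstood_names(xml_text):
    """Return (namespace URI, local name) pairs named by Misunderstood elements."""
    prefixes = {}
    names = []
    events = ET.iterparse(io.StringIO(xml_text), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            # Reported before the start of the element that declares the prefix.
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "{%s}Misunderstood" % ERR_NS:
            prefix, _, local = payload.get("qname").strip().partition(":")
            names.append((prefixes.get(prefix), local))
    return names


print(misunderstood_names(fault_message))
```

The prefix table is needed because the qname attribute is ordinary text; the parser does not expand it the way it expands element names.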

SOAP bindings and protocols

In previous versions of the SOAP specification, the HTTP protocol was always referenced within the specification. Starting with the SOAP 1.2 specification, this reference has changed to transport bindings. SOAP, from day one, has not been protocol-specific, but because HTTP was used most frequently, HTTP was referenced. Transport bindings make it possible to send a SOAP message on a variety of different protocols. SOAP 1.2 still only specifically outlines the HTTP protocol, but other protocols are in the process of being defined. For example, SOAP messages can also be sent via e-mail protocols such as SMTP (Simple Mail Transfer Protocol) or POP (Post Office Protocol).

What do transport bindings mean to the SOAP specification itself? Actually, not much. SOAP can be sent on any protocol because SOAP is a self-enclosed specification. Transport bindings have significance only for those programmers who implement SOAP processors. For example, defining a SOAP transport binding for HTTP specifies that the MIME type of a SOAP message is not text/xml but application/soap. This detail would never be of relevance to the application programmer.
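For readers who do implement the transport side, the following Python sketch shows where that detail ends up in an HTTP request. It only constructs the request and does not send it. The endpoint http://example.org/soap and the function name build_request are placeholders, and the media type is taken from the text above and kept as a parameter because a given SOAP toolkit may expect a different value.

```python
import urllib.request

envelope = (
    '<env:Envelope xmlns:env="http://www.w3.org/2001/12/soap-envelope">'
    "<env:Body>"
    '<math:add xmlns:math="http://www.devspace.com/2002/1/math">'
    "<math:num>1</math:num><math:num>1</math:num>"
    "</math:add>"
    "</env:Body>"
    "</env:Envelope>"
)


def build_request(url, envelope_xml, media_type="application/soap"):
    """Wrap a SOAP envelope in an HTTP POST request (nothing is sent here)."""
    return urllib.request.Request(
        url,
        data=envelope_xml.encode("utf-8"),
        headers={"Content-Type": media_type + "; charset=utf-8"},
        method="POST",
    )


# Hypothetical endpoint; a real one would come from the service description.
request = build_request("http://example.org/soap", envelope)
print(request.get_method())
print(request.get_header("Content-type"))
```

The envelope itself is unchanged by the binding; only the surrounding HTTP headers differ, which is why the application programmer rarely sees this layer.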