An Introduction to the Theory of SOAP | Applied Software Engineering Using Apache Jakarta Commons (Charles River Media Computer Engineering)

Around December 1999, the Simple Object Access Protocol (SOAP) became official after the W3C organized an XML protocol discussion mailing list. The initial version of SOAP was meant to allow computers to exchange data using XML. Before SOAP, this kind of communication was carried out using technologies such as the Distributed Component Object Model (DCOM), the Common Object Request Broker Architecture (CORBA), or Java Remote Method Invocation (RMI). The problem with these technologies is that each is tied to a particular language, vendor, or platform. Before SOAP, there was no universal protocol for this purpose: the protocols that existed at the time were intended for intranet use, not for the Internet. SOAP changed all that. Using SOAP, people could exchange XML data packets without caring which technology, vendor, or platform the client or server used.

By nature, SOAP is independent of any particular transport protocol, although it was not originally defined that way. SOAP can be carried over any other protocol, or even over raw network connections. When SOAP was first developed, the specification used HTTP as its transport. In the SOAP 1.2 specification, SOAP has been decoupled from the underlying protocol, and bindings to specific protocols are defined separately. Only certain bindings are explicitly defined, but nothing prevents a SOAP message from being sent using, for example, the Simple Mail Transfer Protocol (SMTP). In an abstract sense, SOAP defines a sender and a receiver. The terms "client" and "server" are not used because in SOAP a server could send a message to the client unprompted. In that scenario, the server would be the sender and the client the receiver.

The SOAP Specification

In the simplest case, a SOAP message looks like Listing 10.1.

Listing 10.1

 <Envelope>
   <Header>...</Header>
   <Body>...</Body>
 </Envelope>

The SOAP message has the following two parts:

Header: An optional XML tag that describes how the packet of data should be handled. This is similar to the address on an envelope you are mailing. Within the XML tag Header you can have multiple child XML tags, which depend on the SOAP infrastructure used.
Body: An XML tag that carries the payload of the packet of data. Every SOAP message must have a SOAP Body XML tag. This is similar to the letter inside the envelope you are mailing. Within the XML tag Body you can have a single child XML tag or several. Only the SOAP message processor cares how many child elements there are. Which tags appear depends on the data exchanged between the client and the server, in either direction.

The content of the SOAP packet is stored within the XML tag Body. The SOAP Body tag can contain any valid XML tags, which should be namespace-qualified. Abstractly, this is like embedding one XML document within another. A Web Service client or server generates the data within the body, while the Web Service infrastructure generates the data around the body, which includes the XML tags Envelope and Header.
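To make this division of labor concrete, here is a minimal sketch in Java that plays the infrastructure role: it takes an application payload and wraps it in Envelope, Header, and Body. It uses only the JDK's built-in DOM APIs (javax.xml.parsers, org.w3c.dom, javax.xml.transform), not any particular Web Service toolkit, and the class and method names are hypothetical. It assumes the payload arrives as a string, uses the SOAP 1.2 envelope namespace http://www.w3.org/2003/05/soap-envelope, and rejects an unqualified payload only to illustrate the namespace recommendation above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical "infrastructure" helper: wraps an application payload in a SOAP envelope.
class EnvelopeBuilder {
    static final String SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope";

    static Element parsePayload(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getNamespaceURI() is populated
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("payload is not well-formed XML", e);
        }
    }

    static Document wrap(String payloadXml) {
        Element payload = parsePayload(payloadXml);
        if (payload.getNamespaceURI() == null) {
            throw new IllegalArgumentException("Body content should be namespace-qualified");
        }
        // The infrastructure builds everything around the payload.
        Document doc = payload.getOwnerDocument().getImplementation()
            .createDocument(SOAP_ENV, "env:Envelope", null);
        Element envelope = doc.getDocumentElement();
        envelope.appendChild(doc.createElementNS(SOAP_ENV, "env:Header")); // no header blocks yet
        Element body = doc.createElementNS(SOAP_ENV, "env:Body");
        envelope.appendChild(body);
        // importNode makes a deep copy of the payload owned by the new document.
        body.appendChild(doc.importNode(payload, true));
        return doc;
    }

    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize envelope", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<math:add xmlns:math=\"http://www.devspace.com/2003/6/math\">"
            + "<math:num>1</math:num><math:num>1</math:num></math:add>";
        System.out.println(toXml(wrap(payload)));
    }
}
```

Running main prints the wrapped message; the exact serialization (XML declaration, attribute order) depends on the JDK's transformer.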

Listing 10.1 is a simple SOAP message. However, most SOAP messages would be structured like Listing 10.2.

Listing 10.2

 <?xml version="1.0" ?>
 <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
   <env:Header>
     <th:transaction xmlns:th="http://www.transaction.org/2001/12/transaction">
       123
     </th:transaction>
   </env:Header>
   <env:Body>
     <math:add xmlns:math="http://www.devspace.com/2003/6/math">
       <math:num>1</math:num>
       <math:num>1</math:num>
     </math:add>
   </env:Body>
 </env:Envelope>

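Listing 10.2 is a typical SOAP message because it uses XML namespaces. The SOAP 1.2 specification explicitly calls for XML namespaces: the Envelope, Header, and Body tags must belong to the SOAP envelope namespace, and the content inside them is expected to be namespace-qualified as well. XML namespaces prevent the various chunks of the XML document from being confused with each other.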

In Listing 10.2, there are three namespace prefixes, env, th, and math, each bound to a different namespace. The env namespace is part of the SOAP specification. The th namespace is a fictitious namespace that references a transaction specification. The math namespace is one the author defined for a service that performs mathematical additions. In each case, the namespace identifier looks like a URL. You might be tempted to believe that typing the URL into a browser would return something. You would be sadly mistaken, because a namespace identifier is just that, an identifier, even if it resembles a URL.
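A namespace-aware parser makes the point about identifiers concrete. The following sketch, using only the JDK's DOM APIs, parses Listing 10.2 and finds the math:num elements by namespace URI and local name rather than by prefix. Nothing in the code contacts www.devspace.com; the URI is only compared as a string. The class and method names are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: elements are matched by namespace URI + local name, never by prefix,
// and the URI is only compared as a string (nothing is downloaded).
class NamespaceDemo {
    static final String MATH_NS = "http://www.devspace.com/2003/6/math";

    // Listing 10.2 as a Java string.
    static final String LISTING_10_2 =
        "<?xml version=\"1.0\" ?>"
        + "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\">"
        + "<env:Header>"
        + "<th:transaction xmlns:th=\"http://www.transaction.org/2001/12/transaction\">"
        + "123</th:transaction>"
        + "</env:Header>"
        + "<env:Body>"
        + "<math:add xmlns:math=\"http://www.devspace.com/2003/6/math\">"
        + "<math:num>1</math:num><math:num>1</math:num>"
        + "</math:add>"
        + "</env:Body>"
        + "</env:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default; without it prefixes are not resolved
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Adds every num element in the math namespace, whatever prefix the sender chose.
    static int sumNumbers(String xml) {
        NodeList nums = parse(xml).getElementsByTagNameNS(MATH_NS, "num");
        int sum = 0;
        for (int i = 0; i < nums.getLength(); i++) {
            sum += Integer.parseInt(nums.item(i).getTextContent().trim());
        }
        return sum;
    }

    public static void main(String[] args) {
        Element root = parse(LISTING_10_2).getDocumentElement();
        // The prefix is "env", but the element is identified by URI plus local name.
        System.out.println(root.getPrefix() + " -> {" + root.getNamespaceURI() + "}"
            + root.getLocalName());
        System.out.println("sum = " + sumNumbers(LISTING_10_2));
    }
}
```

Because matching is by URI, renaming the math prefix to anything else leaves the sum unchanged, while changing the URI itself makes the parser find no num elements at all.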

SOAP Headers

In Listing 10.2, the SOAP message had a SOAP Header tag containing a child transaction XML tag. The SOAP Header tag is optional. When a SOAP Header tag is present, its contents are interpreted as infrastructure data, such as a transaction identifier or routing information. If a particular embedded XML tag means nothing to the SOAP infrastructure, then the embedded XML tag is ignored.

Listing 10.3 is an example where the embedded XML tag cannot be ignored.

Listing 10.3

 <th:transaction xmlns:th="http://www.transaction.org/2001/12/transaction" env:mustUnderstand="1">

In Listing 10.3, the XML tag transaction has an attribute called mustUnderstand. Notice that the XML tag transaction and the attribute mustUnderstand belong to different namespaces. This is because the attribute mustUnderstand is part of the SOAP specification. A mustUnderstand value of 1 (SOAP 1.2 also accepts true) indicates that the SOAP infrastructure must understand the tag, which in the case of Listing 10.3 is the XML tag transaction. If the tag cannot be understood, it is not silently ignored; instead, an error (a MustUnderstand fault) is generated and returned to the sender of the message. This is akin to addressing a letter using a zip code that does not exist.
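The rule can be sketched as a check that a SOAP node might run before processing a message. This is a simplified, hypothetical illustration using the JDK's DOM APIs: it assumes every header block is targeted at this node (it ignores the role attribute), identifies header blocks by namespace URI plus local name, and takes the set of understood blocks as a parameter. A real SOAP processor would turn a non-empty result into an env:MustUnderstand fault.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: find mandatory header blocks this node cannot handle.
// Assumption: every header block is targeted at this node (role is ignored).
class MustUnderstandCheck {
    static final String SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope";

    // The transaction header from Listing 10.2, marked mandatory as in Listing 10.3.
    static final String MESSAGE =
        "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\">"
        + "<env:Header>"
        + "<th:transaction xmlns:th=\"http://www.transaction.org/2001/12/transaction\""
        + " env:mustUnderstand=\"1\">123</th:transaction>"
        + "</env:Header>"
        + "<env:Body/>"
        + "</env:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Returns "{namespace}localName" for every mandatory header block not in 'understood'.
    static List<String> notUnderstood(String xml, Set<String> understood) {
        List<String> missing = new ArrayList<>();
        NodeList headers = parse(xml).getElementsByTagNameNS(SOAP_ENV, "Header");
        if (headers.getLength() == 0) {
            return missing; // the Header is optional
        }
        for (Node n = headers.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text between header blocks
            }
            Element block = (Element) n;
            String key = "{" + block.getNamespaceURI() + "}" + block.getLocalName();
            // mustUnderstand is a boolean: "1" and "true" both mean mandatory.
            // getAttributeNS returns "" when the attribute is absent.
            String mu = block.getAttributeNS(SOAP_ENV, "mustUnderstand").trim();
            boolean mandatory = mu.equals("1") || mu.equals("true");
            if (mandatory && !understood.contains(key)) {
                missing.add(key);
            }
            // Unrecognized blocks without mustUnderstand fall through: they are ignored.
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = notUnderstood(MESSAGE, Collections.<String>emptySet());
        if (!missing.isEmpty()) {
            System.out.println("env:MustUnderstand fault for " + missing);
        }
    }
}
```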

A SOAP header is not necessarily processed by the original sender or the ultimate receiver. This is because a SOAP message can be routed or proxied through intermediary processes, and an intermediary can add, remove, or manipulate SOAP headers. However, intermediaries do not usually manipulate or process the contents of the SOAP Body.

If a SOAP header has to be processed by a specific SOAP intermediary, then the attribute role can be assigned, as shown in Listing 10.4.

Listing 10.4

 <th:some_action xmlns:th="http://www.devspace.com/2001/12/something" env:role="http://www.devspace.com/some_logical_machine" ..>

In Listing 10.4, the attribute role is a URI naming a logical machine, or more precisely the role that a SOAP node plays; the header is targeted at whichever SOAP intermediary acts in that role. Such an intermediary may process the header information, or it may even generate the header and role information itself. In this way, the attribute role assigns particular pieces of SOAP infrastructure to particular tasks, such as message logging or transaction management.

The SOAP 1.2 specification introduced the notion that SOAP headers are removed when they are processed. For example, if a SOAP intermediary processed the header in Listing 10.4, it would have to remove the header. This keeps the SOAP header infrastructure simple. Imagine sending a SOAP message through five different computers connected as a chain of SOAP processes. If every computer added headers and none removed them, the message would accumulate excess baggage. Because the SOAP infrastructure explicitly removes a header when it is processed, the SOAP message stays lean. If, however, a header that an intermediary does not process should be passed on to further computers rather than discarded, then you can use the attribute relay, as shown in Listing 10.5.

Listing 10.5

 <th:some_action xmlns:th="http://www.devspace.com/2001/12/something" env:role="http://www.devspace.com/some_logical_machine" env:relay="true">

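In Listing 10.5, the attribute relay is assigned a value of true, which means that if the intermediary acting in the targeted role does not process the header, it forwards the header to the next node instead of removing it. A header that the intermediary does process is still removed as usual.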
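The removal and relay rules can be sketched as the header step of a forwarding intermediary, again with the JDK's DOM APIs. The sketch is hypothetical in several ways: another_action is an invented header name, processed headers are identified by local name only, and the caller supplies the set of role URIs the node acts in (a real SOAP 1.2 intermediary would also act in the predefined role http://www.w3.org/2003/05/soap-envelope/role/next). Blocks without a role attribute are left alone because they are targeted at the ultimate receiver.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch of a forwarding intermediary's header handling (hypothetical names).
class HeaderForwarder {
    static final String SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope";
    static final String ROLE = "http://www.devspace.com/some_logical_machine";

    // some_action follows Listing 10.5 (relay="true"); another_action is invented
    // (no relay); the transaction block has no role, so it targets the ultimate receiver.
    static final String MESSAGE =
        "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\""
        + " xmlns:th=\"http://www.devspace.com/2001/12/something\">"
        + "<env:Header>"
        + "<th:some_action env:role=\"" + ROLE + "\" env:relay=\"true\"/>"
        + "<th:another_action env:role=\"" + ROLE + "\"/>"
        + "<tx:transaction xmlns:tx=\"http://www.transaction.org/2001/12/transaction\">"
        + "123</tx:transaction>"
        + "</env:Header><env:Body/></env:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Snapshot of the header blocks, so removing one does not disturb iteration.
    static List<Element> headerBlocks(Document doc) {
        List<Element> blocks = new ArrayList<>();
        NodeList headers = doc.getElementsByTagNameNS(SOAP_ENV, "Header");
        if (headers.getLength() == 0) return blocks;
        for (Node n = headers.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) blocks.add((Element) n);
        }
        return blocks;
    }

    // Removes headers targeted at this node that were processed, or that were
    // not processed and lack relay="true". Everything else is passed on.
    static void forward(Document doc, Set<String> myRoles, Set<String> processed) {
        for (Element block : headerBlocks(doc)) {
            String role = block.getAttributeNS(SOAP_ENV, "role"); // "" when absent
            boolean targeted = !role.isEmpty() && myRoles.contains(role);
            String relay = block.getAttributeNS(SOAP_ENV, "relay").trim();
            boolean relayable = relay.equals("true") || relay.equals("1");
            boolean wasProcessed = processed.contains(block.getLocalName());
            if (targeted && (wasProcessed || !relayable)) {
                block.getParentNode().removeChild(block);
            }
        }
    }

    static List<String> headerNames(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element block : headerBlocks(doc)) names.add(block.getLocalName());
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse(MESSAGE);
        forward(doc, Collections.singleton(ROLE), Collections.<String>emptySet());
        System.out.println("forwarded headers: " + headerNames(doc));
    }
}
```

With nothing processed, some_action survives because of relay="true" while another_action is dropped; once some_action is processed, it is removed as well.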

SOAP Encodings

In the SOAP 1.2 specification, the way data is encoded is very important. In general, XML and SOAP allow data to be encoded using plain XML rules. However, there are other encoding schemes, such as the SOAP encoding format. If you use such a scheme, you must structure the content within the SOAP Body tag according to its specific format. Encoding schemes are purely optional and are identified by namespace-style URIs. The purpose of an encoding is to make it simpler to translate a data type from one application platform to another. We will discuss the specifics of an encoding format later in this chapter.

Variations in the SOAP Response

When a SOAP message causes the receiver to generate a reply, the reply is itself an ordinary SOAP message. It has exactly the same format as the request, except that the Body carries the generated response. With SOAP 1.2, sending a reply is no different from sending any other message.
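Because a reply is just another SOAP message, a service can build it with the same envelope structure it receives. In this minimal sketch, a hypothetical handler for the addition request in Listing 10.2 sums the math:num values and wraps the result in a new Envelope and Body using the JDK's DOM APIs. The response element names addResponse and result are assumptions; the chapter does not define them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical handler for the math:add request; the reply is an ordinary envelope.
class AddService {
    static final String SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope";
    static final String MATH_NS = "http://www.devspace.com/2003/6/math";

    static final String REQUEST =
        "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\">"
        + "<env:Body><math:add xmlns:math=\"http://www.devspace.com/2003/6/math\">"
        + "<math:num>1</math:num><math:num>1</math:num>"
        + "</math:add></env:Body></env:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    static Document reply(Document request) {
        int sum = 0;
        NodeList nums = request.getElementsByTagNameNS(MATH_NS, "num");
        for (int i = 0; i < nums.getLength(); i++) {
            sum += Integer.parseInt(nums.item(i).getTextContent().trim());
        }
        // Same shape as any other SOAP message: Envelope > Body > payload.
        Document response = request.getImplementation()
            .createDocument(SOAP_ENV, "env:Envelope", null);
        Element body = response.createElementNS(SOAP_ENV, "env:Body");
        response.getDocumentElement().appendChild(body);
        // Assumed response vocabulary (not defined in the text).
        Element addResponse = response.createElementNS(MATH_NS, "math:addResponse");
        Element result = response.createElementNS(MATH_NS, "math:result");
        result.setTextContent(Integer.toString(sum));
        addResponse.appendChild(result);
        body.appendChild(addResponse);
        return response;
    }

    public static void main(String[] args) {
        Document response = reply(parse(REQUEST));
        System.out.println("result = "
            + response.getElementsByTagNameNS(MATH_NS, "result").item(0).getTextContent());
    }
}
```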

The only special type of SOAP message is a SOAP error message, known as a fault, as shown in Listing 10.6.

Listing 10.6

 <?xml version="1.0" ?>
 <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
   <env:Body>
     <env:Fault>
       <env:Code>
         <env:Value>env:Receiver</env:Value>
       </env:Code>
       <env:Reason>
         <env:Text xml:lang="en">Something happened</env:Text>
       </env:Reason>
       <env:Detail>
         <err:message xmlns:err="http://www.devspace.com/errors">
           Here is some error
         </err:message>
       </env:Detail>
     </env:Fault>
   </env:Body>
 </env:Envelope>

In Listing 10.6, the SOAP error is carried within the SOAP Body tag as a Fault XML tag. The Fault tag belongs to the env namespace, which is part of the SOAP specification. Listing 10.6 contains three child XML elements: Code, Reason, and Detail. Code and Reason are required, while Detail is optional. The XML element Code holds the fault code of the SOAP message, which is one of a set of predefined values (VersionMismatch, MustUnderstand, DataEncodingUnknown, Sender, and Receiver); Listing 10.6 uses env:Receiver. The XML element Reason gives a short, human-readable explanation of the error. Reason can contain multiple Text elements, one per language, distinguished by the xml:lang attribute. Lastly, the XML element Detail carries a detailed, application-specific error message that can contain other XML elements. The detailed error message should indicate what went wrong and how to correct it.
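A client receiving a fault like Listing 10.6 needs to pull out the fault code and a readable reason. The sketch below does this with the JDK's DOM APIs; it assumes the SOAP 1.2 envelope namespace, and the class and method names are illustrative. Because the Value content is a qualified name such as env:Receiver, the code resolves the prefix to its namespace URI instead of comparing the raw text, and it picks the Reason Text whose xml:lang matches the requested language, falling back to the first one.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: read the code and reason from a SOAP 1.2 fault.
class FaultReader {
    static final String SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope";

    // A copy of Listing 10.6 using the SOAP 1.2 envelope namespace.
    static final String LISTING_10_6 =
        "<env:Envelope xmlns:env=\"http://www.w3.org/2003/05/soap-envelope\">"
        + "<env:Body><env:Fault>"
        + "<env:Code><env:Value>env:Receiver</env:Value></env:Code>"
        + "<env:Reason><env:Text xml:lang=\"en\">Something happened</env:Text></env:Reason>"
        + "<env:Detail><err:message xmlns:err=\"http://www.devspace.com/errors\">"
        + "Here is some error</err:message></env:Detail>"
        + "</env:Fault></env:Body></env:Envelope>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // First child element in the SOAP namespace with this local name, or null.
    static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && SOAP_ENV.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static Element fault(Document doc) {
        Element body = child(doc.getDocumentElement(), "Body");
        return body == null ? null : child(body, "Fault");
    }

    // Fault code as "{namespace}local", or null when the Body holds no Fault.
    static String faultCode(Document doc) {
        Element fault = fault(doc);
        if (fault == null) return null;
        Element value = child(child(fault, "Code"), "Value"); // Code is required
        String qname = value.getTextContent().trim();           // e.g. "env:Receiver"
        int colon = qname.indexOf(':');
        String prefix = colon < 0 ? null : qname.substring(0, colon);
        return "{" + value.lookupNamespaceURI(prefix) + "}" + qname.substring(colon + 1);
    }

    // Reason Text in the requested language, else the first Text element.
    static String reason(Document doc, String lang) {
        Element fault = fault(doc);
        if (fault == null) return null;
        Element first = null;
        for (Node n = child(fault, "Reason").getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !"Text".equals(n.getLocalName())) continue;
            Element text = (Element) n;
            if (first == null) first = text;
            if (lang.equals(text.getAttributeNS(XMLConstants.XML_NS_URI, "lang"))) {
                return text.getTextContent().trim();
            }
        }
        return first == null ? null : first.getTextContent().trim();
    }

    public static void main(String[] args) {
        Document doc = parse(LISTING_10_6);
        System.out.println(faultCode(doc) + ": " + reason(doc, "en"));
    }
}
```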