Simple Object Access Protocol (SOAP) | ebXML: The New Global Standard for Doing Business on the Internet

Simple Object Access Protocol (SOAP) offers an XML-based language for the exchange of messages over decentralized and distributed environments such as the web. Its authors, from Microsoft Corp.; IBM and its Lotus Development subsidiary; DevelopMentor; and Userland Software, submitted version 1.1 of SOAP to the W3C as a Note in May 2000.[2] Since then, the W3C has put in place the XML Protocol Working Group, whose purpose is to formalize the earlier SOAP work as a robust W3C specification for exchange of XML transactions between application programs via the Internet. The whole focus is to develop a simple and easily implemented application-to-application layer that can be used with scripting languages. (The W3C is scheduled to deliver the specifications as a formal recommendation in September of 2001.)[3]

In March 2001, ebXML adopted a variation of SOAP called SOAP Messages with Attachments, a specification that, as the name implies, allows for adding binary attachments to the basic SOAP message.[4] The extension to add support for binary content is required to allow secure exchanges using digital certificates, as well as business messages with binary content, such as pictures and illustrations. This chapter discusses only the basic SOAP specification. Chapter 8, "ebXML Technical Architecture," goes into some detail in explaining how ebXML applies the SOAP Messages with Attachments specification to its messaging functions.

As the name implies, SOAP 1.1 aims to provide a simple and lightweight method for exchanging structured data in peer-to-peer relationships. It defines the message package, offers encoding guidelines for data used in applications connected by these messages, and provides rules for representing remote procedure calls (RPCs), a type of online interaction in a distributed environment. The authors defined SOAP as a series of building blocks to maintain its simplicity for most potential users.[5]

SOAP's importance extends beyond its offer of an XML-based message protocol. The other specifications described in this chapter all use SOAP for their messaging functions and, as a result, SOAP has helped generate several new e-business initiatives. Now that ebXML has also adopted SOAP as the foundation for its messaging transfer (and by extension the W3C's emerging XML Protocol standard, in which SOAP plays a major role), using SOAP provides alignment and interoperable message handling with a broad range of emerging related standards, systems, and tools.

SOAP Messages

SOAP messages are XML documents (textual documents) defined inside an outer SOAP envelope. SOAP messages must have this envelope to meet the specifications. Within the envelope is a SOAP header and body. SOAP messages must have a body, but the header is not required in all instances. The XML grammar rules for envelopes are found in an XML namespace (http://schemas.xmlsoap.org/soap/envelope/). The use of namespaces is a syntax device in XML to avoid name clashes, hence the term "namespace." Particularly when you exchange XML, you need a simple way to distinguish the markup tag names you're using from those in other fragments of XML elsewhere in the exchanged document. For instance, <address> may occur in the SOAP header as the Internet delivery address, and also in the payload as the postal delivery address, but these two things clearly need to be handled separately. See Chapter 4, "The Promise of XML," for more discussion of XML namespaces.

The SOAP envelope serves as the first element in the document and thus identifies it as a SOAP message. The SOAP body contains the information transmitted to the receiver. Each message must have a body, so there cannot be an empty SOAP message. If the message has a SOAP header, it appears as the first child element in the envelope, and before the body.[6]
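The envelope layout described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not part of the SOAP specification: the function name build_envelope, the urn:example:orders namespace, and the GetOrderStatus payload are all made up for the example. Only the envelope namespace URI comes from the text.

```python
import xml.etree.ElementTree as ET

# Envelope namespace quoted in the text; everything else below is hypothetical.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "urn:example:orders"  # made-up application namespace

ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_envelope(payload, header_entries=()):
    """Wrap a payload element in an Envelope, with the optional Header first."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    if header_entries:
        # When present, the header is the first child, before the body.
        header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
        header.extend(header_entries)
    # The body is mandatory and carries the information for the receiver.
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(payload)
    return envelope


order = ET.Element(f"{{{APP_NS}}}GetOrderStatus")
ET.SubElement(order, "orderId").text = "12345"

envelope = build_envelope(order)
print(ET.tostring(envelope, encoding="unicode"))
```

Qualifying the payload with its own namespace keeps an application element such as an address distinct from any same-named element defined elsewhere in the message.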

The SOAP header allows the sender to add management or control information in the message, important for routing, security, or proper handling by the recipient. This element has very few rules of its own, but relies on XML namespaces identified by the sender for its semantics. However, the specification identifies two attributes that can appear in a SOAP header:

actor: Senders may want to route SOAP messages through intermediaries, or designate parts of the message for certain recipients. SOAP headers are designated only for the recipients of the messages and cannot be forwarded to other recipients, but recipients can insert a new header for the next recipient. The actor attribute uses Uniform Resource Identifiers (URIs), which are Internet resources or locations such as web addresses, as values, and indicates the recipient of the header. Without the actor attribute, the recipient must assume that it is the only and ultimate destination for the message.
mustUnderstand [7]: This attribute tells the recipient whether the header entries made by the sender are required to be processed or can be ignored, and has values of 0 (No) and 1 (Yes). Absence of this attribute is the same as a value of 0. For example, a header may carry a security key and encryption information that the receiver needs to process in order to decrypt the message body correctly.

If a SOAP message has only a SOAP body element and no SOAP header, it has the same meaning as a message identifying the default actor and a mustUnderstand attribute with a value of 0 (No).[8]
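As a rough sketch of how a recipient might apply the two header attributes, the following Python reads a sample message and picks out the header entries addressed to it. The message content, the urn:example:... namespaces, and the helper header_entries_for are assumptions made for illustration; a real SOAP processor does considerably more.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_ENV}}}actor"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"

# Hypothetical message: one entry for the final recipient, one for a gateway.
MESSAGE = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Header>
    <sec:Key xmlns:sec="urn:example:security"
             SOAP-ENV:mustUnderstand="1">abc123</sec:Key>
    <log:Trace xmlns:log="urn:example:logging"
               SOAP-ENV:actor="urn:example:gateway">t-42</log:Trace>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:Ping xmlns:m="urn:example:app"/>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def header_entries_for(envelope, my_uri=None):
    """Return (entry, required) pairs for header entries aimed at this recipient.

    my_uri=None stands for the ultimate destination, which receives the
    entries that carry no actor attribute.
    """
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []  # no header: nothing to process, nothing required
    matches = []
    for entry in header:
        if entry.get(ACTOR) == my_uri:
            # A missing mustUnderstand attribute is treated as "0" (No).
            required = entry.get(MUST_UNDERSTAND, "0") == "1"
            matches.append((entry, required))
    return matches


envelope = ET.fromstring(MESSAGE)
for entry, required in header_entries_for(envelope):
    print(entry.tag, "required" if required else "optional")
```

Calling header_entries_for(envelope, "urn:example:gateway") would instead return the Trace entry, which the gateway may ignore because it carries no mustUnderstand attribute.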

The SOAP body is a mandatory part of the SOAP message. The first level of sub-elements under the body are called body entries, and consist of the default XML namespace reference for the body. Further levels down in the SOAP body may use the combination of more namespaces and local names, but this isn't mandatory.[9]

The only content for a SOAP body specifically defined in the specification is the Fault element used to provide status or error information. The Fault element consists of four sub-elements:

faultcode: required in each Fault element; it must contain one of the fault codes defined in the specification, supplied by the implementer.
faultstring: an explanation of the faultcode, required in a Fault element.[10]
faultactor: provides information on the party that caused the fault, using the actor attribute discussed earlier.
detail: for application-specific information related to the SOAP body content, required if the receiver could not process the SOAP body contents.

First-level sub-elements of the detail element are called detail entries, and are identified by a combination of namespace URI and local name. However, the fault report cannot contain information about errors in SOAP header entries, which must be transferred in the SOAP header. The absence of a detail element indicates that the fault lies somewhere other than in the processing of the SOAP body.[11]
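To make the four Fault sub-elements concrete, here is a small Python sketch that reads a fault out of a response. The sample message, the urn:example:orders namespace, and the read_fault helper are invented for the example; the sketch assumes the sub-elements appear as unqualified children of the Fault element.

```python
import xml.etree.ElementTree as ET

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

# Hypothetical fault response; the detail entry belongs to a made-up namespace.
FAULT_MESSAGE = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Server</faultcode>
      <faultstring>Order could not be processed</faultstring>
      <detail>
        <e:OrderFault xmlns:e="urn:example:orders">
          <message>Unknown order number</message>
        </e:OrderFault>
      </detail>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def read_fault(envelope):
    """Return the Fault sub-elements as a dict, or None if there is no fault."""
    fault = envelope.find("soap:Body/soap:Fault", NS)
    if fault is None:
        return None
    detail = fault.find("detail")
    return {
        "faultcode": fault.findtext("faultcode"),
        "faultstring": fault.findtext("faultstring"),
        "faultactor": fault.findtext("faultactor"),  # None when absent
        # Detail entries are identified by namespace URI plus local name.
        "detail_entries": [e.tag for e in detail] if detail is not None else [],
    }


fault = read_fault(ET.fromstring(FAULT_MESSAGE))
print(fault)
```

An empty detail_entries list alongside a missing detail element would suggest, per the text above, that the problem lies outside the processing of the body.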

SOAP uses the existing long-established HTTP and remote procedure calls (RPC) standards of the IETF as the underlying middleware plumbing to physically move the XML content across the web between web servers.[12] This means that any existing web server or HTTP-compatible middleware component can also handle SOAP messaging.

SOAP Coding

SOAP allows senders to identify the kinds of data exchanged in SOAP messages. This ability is exploited in ebXML, with the header explicitly allowing the receiver to quickly analyze the header information to determine the business action required. For instance, what type of transaction is being received, and from whom? Security crosschecks can deduce whether this is permitted, and then the correct business processing can be started automatically for that body content.

While XML offers many ways to express structured data in documents, the demands of application-to-application exchanges require tighter synchronization, and that meant tightening the rules for business exchanges. With SOAP, these various types of exchanged data can range from simple discrete values to complex compound entities, such as entire control verbs, interchange command parameters, and value sets. Both the sending and receiving systems must exactly match on these critical control items to ensure that the correct actions result. SOAP borrows liberally from the XML Schema specification, and makes a distinction between simple and compound types. Simple types are values (names, measurements, enumerations) with no further subdivisions or parts. Compound types are collections of values that have some relationship to each other. For example, a typical North American street address consists of three simple types: a street number, street name, and apartment or suite number, related as parts of the same street-address entity.

Each compound value has a function called an accessor, which can be the name of its role in the message or an ordinal number that serves as an identifier or descriptor of the data. For example, in a purchase order or invoice, the party to whom the invoice is sent is called the Bill-To party. In a SOAP message, a compound data type would include the accessor Bill-To (it identifies the role of the data in the invoice), as well as the value for the element, such as a DUNS number.

If the application requires it, the accessor can make the compound type unique. The uniqueness can refer to the type of data within the application, by using a unique name (such as date/time stamp) as the accessor. By using a URI, by definition a unique data string, you can also create a universally unique type as the accessor.

For simple types, SOAP adopts the types defined in part 2 of the XML Schema specification.[13] This list is rather lengthy, but the SOAP specification discusses in more detail those more likely to be used in SOAP messages, such as strings and enumerations (lists of specific selections). For compound types, SOAP defines structs and arrays, two concepts borrowed from software design. Structs are compound values whose members are distinguished by accessor name, as in the invoice Bill-To example just described. An array is a compound value in which each member's position in the collection (for example, cell, row, or column numbers in a spreadsheet) distinguishes it from all the others in the collection.[14]
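The difference between the two compound types can be sketched in Python as follows. The XML fragments, the DUNS value, and the two decode helpers are made up for illustration; the arrayType attribute follows the style of the SOAP 1.1 encoding examples, and the values are left as strings rather than converted to their schema types.

```python
import xml.etree.ElementTree as ET

# A struct: each member is reached through its accessor name.
STRUCT_XML = """\
<BillTo>
  <name>Acme Corp.</name>
  <duns>123456789</duns>
</BillTo>
"""

# An array: members share a name and are distinguished by position only.
ARRAY_XML = """\
<quantities xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            SOAP-ENC:arrayType="xsd:int[3]">
  <item>10</item>
  <item>25</item>
  <item>7</item>
</quantities>
"""


def decode_struct(element):
    """Map accessor names to values (a simplification: values stay strings)."""
    return {child.tag: child.text for child in element}


def decode_array(element):
    """Keep members in document order so they can be looked up by index."""
    return [child.text for child in element]


bill_to = decode_struct(ET.fromstring(STRUCT_XML))
quantities = decode_array(ET.fromstring(ARRAY_XML))

print(bill_to["duns"])   # looked up by accessor name
print(quantities[1])     # looked up by ordinal position
```

A dictionary and a list are a natural fit here because they mirror the two lookup styles: by name for structs, by position for arrays.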

Because of its development from remote procedure calls (RPCs), the SOAP specifications show the use of SOAP for RPC exchanges. The SOAP body carries an RPC's method call and response. Both the RPC call and the response are modeled as structs, as defined by the compound type rules noted earlier. Any faults returned use the SOAP Fault element. The SOAP header can contain any supporting information needed by the remote system to process the request, such as identifiers or authorization data. The RPC call and response parallel the HTTP or web transport protocol request and response architecture. The specifications show how SOAP can bind to HTTP.[15]
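A hedged sketch of an RPC call prepared for HTTP transport might look like the following. The function build_rpc_request, the urn:example:quotes namespace, and the method and parameter names are hypothetical; the sketch only builds the HTTP headers and body and does not send anything, so the result could be handed to whatever HTTP client is in use.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def build_rpc_request(method_ns, method, params, soap_action):
    """Model an RPC call as a struct in the body; return (headers, payload)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The call is a struct named after the method, one accessor per parameter.
    call = ET.SubElement(body, f"{{{method_ns}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    payload = ET.tostring(envelope, encoding="utf-8")  # bytes for an HTTP POST
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": f'"{soap_action}"',  # quoted URI naming the intent
    }
    return headers, payload


headers, payload = build_rpc_request(
    "urn:example:quotes",
    "GetLastTradePrice",
    {"symbol": "DIS"},
    "urn:example:quotes#GetLastTradePrice",
)
print(headers)
print(payload.decode("utf-8"))
```

The response would travel back in the body of the HTTP response in the same way, either as a struct holding the return value or as a Fault element.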

SOAP's simple design, ability to support data types, and close compatibility with RPC have made it a popular set of tools for providing messaging functions. The other specifications in this chapter also use SOAP, which acts as an endorsement of these features and abilities.