SOAP: XML Messaging and Remote Application Access


SOAP is an XML-based packaging scheme to transport XML messages from one application to another. SOAP gives you a standard container that can send its XML payload over any transport protocol (such as HTTP). A great deal of directive information, especially security-related information, is placed in a SOAP header. We will focus on SOAP headers in more detail when we delve into WS-Security later.

SOAP envelopes can be routed through many intermediaries. SOAP intermediaries can process, save, or modify the SOAP messages they receive. Some SOAP intermediaries do security processing on the messages they receive.

SOAP supports RPC/encoded and Document/literal modes. RPC/encoded acts much like traditional RPC middleware. RPC communications tend to be synchronous and finer-grained, making them suitable for intra-organizational integrations where network speeds are high and latency is low. Document/literal supports a more loosely coupled approach. Document communications tend to be asynchronous and coarse-grained, making them suitable for inter-organization (for example, B2B) integrations.

The SOAP structure used for a particular Web service is described in the associated WSDL file.

Where SOAP Came From and Why It's Important

SOAP used to stand for Simple Object Access Protocol. With SOAP 1.2, the working group decided the term is no longer an acronym, because SOAP is neither simple nor does it deal with objects.

SOAP is what makes application integration (that is, distributed computing) possible. After XML defines the content of a message, it is SOAP that defines how the data moves from one place to another over the network. SOAP allows the sender and receiver of XML documents to support a common data transfer protocol. If the concept of sending and receiving messages over HTTP sounds familiar, it should: Web-based middleware is really not new; it has been around in primitive form as long as users have been appending extra commands to HTTP messages. SOAP formalizes this process and makes it work not only for XML documents, but also for executing remote procedures.

SOAP was developed in 1998 at Microsoft, with collaboration from UserLand and DevelopMentor. The goal, first espoused by Dave Winer of UserLand, was to establish a simple way for applications to exchange structured data over Web protocols. SOAP built on grassroots efforts in the community working on XML-RPC. XML-RPC continues to see some use, but SOAP gained such broad acceptance that it quickly supplanted XML-RPC. A lot of RPC learning had gone on prior to SOAP, most recently with Java's Remote Method Invocation (RMI). SOAP is much simpler than RMI. Being based on XML makes SOAP easy to work with, very late-bound, and highly interoperable. It is not as simple as XML-RPC because it addresses some of XML-RPC's shortcomings. When SOAP was published as a specification on the Web in late 1999, it already had the backing of IBM in addition to Microsoft. But like XML, SOAP was an open specification from the beginning and still got the backing of the majors. It appears that with SOAP, distributed computing, for the first time, is free from industry giants and consortia going their own ways and creating competing standards.

SOAP is an application of XML, as was XML-RPC. SOAP is what really defines Web services because it lets an XML message invoke a procedure remotely, either directly via a remote procedure call or by using its Document/literal mode to transfer an XML document and trigger remote processing of that document. SOAP is in the same class of technology as Internet Inter-ORB Protocol (IIOP) for CORBA as well as RMI for Java. A big difference, however, is that SOAP is text-based. This has huge implications. It makes working with SOAP much easier, and it makes things developed with SOAP easier to debug. SOAP messages are firewall friendly. Processing text-based messages at every step in the communications path is extremely easy. This ease of use is in huge contrast to DCOM and CORBA, which had binary encodings and no metadata describing what was being passed. That SOAP is text-based also opens up interesting and powerful runtime monitoring, analysis, and compliance capabilities that tools are just now starting to fully exploit.

The most common way SOAP works is as an extension of HTTP that supports XML messaging. Just as HTTP GETs retrieve a Web page and HTTP POSTs submit form data to a Web server, SOAP uses those same mechanisms to send and receive XML messages. The Web server needs to understand SOAP, and it does so by having a SOAP processor that looks for these special types of POSTs. Technically, the SOAP specification supports expansion to other protocols (such as UDP, SMTP, JMS, and more), but the HTTP mapping is the only one that has formally been defined so far (up to and including SOAP v1.2).

As we have mentioned, there are two very different ways to view SOAP: Document/literal style and RPC/encoded style. With Document/literal style, an XML document is sent as the payload of the SOAP message. The client sends a message to the service, the service processes the message, and it sends back a response message. In essence, the client has no knowledge of how the service is implemented or how it processes the message (the XML document). This mode is particularly suitable for passing business documents asynchronously between loosely coupled services. When you are dealing with B2B integrations, this approach is best because it maximizes the effectiveness of lower-bandwidth, less-reliable connections and more accurately models real business-to-business communications and processes. It is correct to think of Document/literal-style Web services as batch processing. This mode is efficient, scalable, and because of its coarse granularity, it is reliable.

In contrast, RPC/encoded style provides RPCs transparently to the client, resulting in the translation of method calls into XML, sending them to the server, and returning the response message as the return value. This mode is most suitable for synchronous communication between tightly coupled services. Think of it as the online mode. In RPC mode, an important function of SOAP is a set of encoding rules that define a serialization mechanism. This creates a standard way of capturing programming language data elements such as integers, strings, and complex structures, in a language-neutral, interoperable format. Extending this concept, you can see how a remote procedure call can be expressed in XML and serialized by SOAP over HTTP. The RPC parameters are encoded as child elements of a common parent. The name of the procedure is attached to the parent indicating the operation to be performed remotely.
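The mapping from a method call to XML can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full implementation of the SOAP encoding rules: the function name `encode_rpc_call` is hypothetical, the procedure name and namespace are borrowed from the order-status listing later in this chapter, and the sketch writes every parameter as plain text without the type annotations a real SOAP toolkit would add.

```python
import xml.etree.ElementTree as ET


def encode_rpc_call(procedure, namespace, params):
    """Serialize a call as one parent element with a child per parameter."""
    # The parent element carries the procedure name, qualified by the
    # namespace of the target service.
    call = ET.Element(f"{{{namespace}}}{procedure}")
    for name, value in params.items():
        # Each parameter becomes a child element named after the parameter.
        child = ET.SubElement(call, name)
        child.text = str(value)
    return call


call = encode_rpc_call(
    "GetOrderStatus", "www.myservice.com/OrderEntry", {"orderno": 43564}
)
print(ET.tostring(call, encoding="unicode"))
```

The resulting element is what would be placed inside the SOAP body; the serializer picks its own namespace prefix, which does not change the meaning of the message.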

A SOAP request is most commonly sent as an HTTP POST with the Content-Type header set to text/xml and an HTTP header field called SOAPAction set to either an empty string or the name of the SOAP method. This way, the receiving Web server knows this is a SOAP message and acts accordingly. A SOAP message consists of an address and an envelope, which, in turn, is made up of some headers and a body; the body consists of one or more elements. The structure of a SOAP message is shown in Figure 2.4.

Figure 2.4. The structure of a SOAP message.
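As a rough sketch of the HTTP side, the following Python builds (but does not send) such a POST using only the standard library. The URL `http://example.com/orders` and the helper name `build_soap_post` are assumptions made for illustration; the envelope reuses the order-status request shown later in this chapter.

```python
import urllib.request

ENVELOPE = """<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2001/12/soap-envelope">
  <SOAP-ENV:Body>
    <m:GetOrderStatus xmlns:m="www.myservice.com/OrderEntry">
      <orderno>43564</orderno>
    </m:GetOrderStatus>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def build_soap_post(url, envelope, soap_action=""):
    """Build (but do not send) an HTTP POST carrying a SOAP envelope."""
    return urllib.request.Request(
        url,
        data=envelope.encode("utf-8"),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # SOAPAction is commonly sent as a quoted string; empty is allowed.
            "SOAPAction": f'"{soap_action}"',
        },
        method="POST",
    )


# Hypothetical endpoint; nothing is sent over the network here.
request = build_soap_post("http://example.com/orders", ENVELOPE, "GetOrderStatus")
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    # urllib normalizes header-name capitalization; HTTP header names are
    # case-insensitive, so "Soapaction" and "SOAPAction" are equivalent.
    print(f"{name}: {value}")
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch stays independent of any real service.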



SOAP Doesn't Really Access Objects

Although SOAP was originally named for the Simple Object Access Protocol, it isn't really accessing objects, but it does encapsulate data and operations on that data. So, we won't quibble over whether these are components or services rather than objects. The point is that when you define a service, you are creating a collection of components and making the methods in those components available (remotely) through an interface that SOAP can invoke. This object system is not sophisticated. It does not have garbage collection, it cannot access objects by reference, and it cannot batch together requests, all of which are capabilities that DCOM and CORBA provided. However, neither of them was truly scalable, they were exceedingly complex, and for all intents and purposes, they are dead technologies.

SOAP is lightweight, highly interoperable, and easy to implement. It binds well to HTTP, and it supports RPC calls that map directly to HTTP requests and responses. SOAP even binds well to other protocols, so it becomes a powerful tool for enterprise application integration.

SOAP has supplanted XML-RPC, DCOM, and IIOP in one fell swoop. For all practical purposes, SOAP is Web services. There is even good news from SOAP on the battle of the behemoths. SOAP has, in fact, become the bridging protocol between J2EE and Microsoft .NET.


SOAP Envelope

The envelope is the top-level XML element in a SOAP message. You specify it using the SOAP-ENV namespace prefix and the Envelope element:

 

 <SOAP-ENV:Envelope
     xmlns:SOAP-ENV="http://www.w3.org/2001/12/soap-envelope">
     . . .
 </SOAP-ENV:Envelope>

A major function of the envelope is to define namespaces used by the SOAP message. Typical namespaces that are included are xmlns:SOAP-ENV (SOAP Envelope namespace), xmlns:xsi (XML Schema for Instances), and xmlns:xsd (XML Schema for DataTypes).

The envelope indicates the start and end of a message. It defines a complete package, thus making it unambiguous when the receiver is done receiving the message.

The header is optional, and it can hold more than one header entry. If a header is present, it must be the first child of the envelope.
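The layout rules for the envelope can be checked mechanically. The sketch below is one possible way to do so in Python; the function name `check_envelope` is hypothetical, and the namespace URI is the one used in this chapter's envelope example (other SOAP versions use different URIs).

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the chapter's envelope example.
SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"


def check_envelope(xml_text):
    """Return (header, body) after checking the envelope's top-level layout."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SOAP_ENV}}}Envelope":
        raise ValueError("top-level element is not a SOAP Envelope")
    children = list(root)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None:
        raise ValueError("envelope has no Body")
    # Compare against None explicitly: an element without children is falsy.
    if header is not None and children[0] is not header:
        raise ValueError("Header must be the first child of the Envelope")
    # The Body follows the Header, or comes first when there is no Header.
    body_index = 1 if header is not None else 0
    if children[body_index] is not body:
        raise ValueError("Body is not in the expected position")
    return header, body


message = (
    f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">'
    "<SOAP-ENV:Header/><SOAP-ENV:Body/>"
    "</SOAP-ENV:Envelope>"
)
header, body = check_envelope(message)
print(header.tag, body.tag)
```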

SOAP Header

SOAP headers are used for directive information. This is the place where SOAP security lives. System-level information used to manage and secure the message is placed here as well. SOAP runtimes and intermediaries process SOAP header directives.

Headers are intended to add new features and functionality. The mechanism is deliberately open-ended: the base protocol does not say what individual headers mean, leaving that to the specific applications and specifications that build on top of it. These extensions need to be standardized. As an example, the WS-Security header, covered in detail in Chapter 7, will be located here. This is also the place where information about transactions, routing, payments, guaranteed delivery, and so forth will be placed. Any element in a SOAP processing chain can add or delete items from the header or choose to ignore items if they are unknown.

A sender can require that the receiver understand a header by marking it with the mustUnderstand attribute. This allows for graceful failure when a receiver receives a message too new for it to understand. Headers speak directly to the SOAP processors and can require that a processor reject the entire SOAP message if it does not understand the header. This requirement is important for security. If a header contains critical security information that a SOAP processor does not understand, you may not want it to process this SOAP message at all, because to do so would completely bypass the ignored header's security information.
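The reject-if-not-understood rule can be sketched as follows. This is an illustration only: `check_headers` and `MustUnderstandError` are hypothetical names, the `urn:example:...` header blocks are invented for the example, and the namespace URI is the one from this chapter's envelope example. SOAP marks a mandatory header block with the namespace-qualified mustUnderstand attribute.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


class MustUnderstandError(Exception):
    """Raised when a mandatory header block is not recognized."""


def check_headers(xml_text, understood):
    """Return the optional header blocks that were skipped; reject the whole
    message if a mandatory block is not in the `understood` set."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []
    skipped = []
    for block in header:
        if block.tag in understood:
            continue
        if block.get(MUST_UNDERSTAND) in ("1", "true"):
            # Processing the body anyway would bypass whatever this
            # header was meant to enforce.
            raise MustUnderstandError(block.tag)
        skipped.append(block.tag)
    return skipped


# Hypothetical header blocks, invented for this sketch.
MESSAGE = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Header>
    <sec:Token xmlns:sec="urn:example:security"
               SOAP-ENV:mustUnderstand="1">abc</sec:Token>
    <t:Trace xmlns:t="urn:example:trace">hop-1</t:Trace>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body/>
</SOAP-ENV:Envelope>"""

print(check_headers(MESSAGE, understood={"{urn:example:security}Token"}))
try:
    check_headers(MESSAGE, understood=set())
except MustUnderstandError as err:
    print("rejected:", err)
```

A processor that knows the security token skips only the optional trace block; one that knows neither rejects the message instead of silently ignoring the token.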

SOAP Body

The SOAP body is the main payload of the message. The body contains the information that must be sent to the ultimate recipient. This is the place where the XML document for the application initiating the SOAP message resides. For an RPC, the body contains a single element that contains the method name, arguments, and a Web service target address. If a header is present, the body is its immediate sibling; otherwise, it is the first child of the envelope. The structure of a SOAP request is shown in Listing 2.4 and that of a SOAP response in Listing 2.5.

Listing 2.4. The Structure of a SOAP Request
 <SOAP-ENV:Envelope ...
     <SOAP-ENV:Body>
         <m:GetOrderStatus
         xmlns:m="www.myservice.com/OrderEntry">
             <orderno>43564</orderno>
         </m:GetOrderStatus>
     </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

Listing 2.5. The Structure of a SOAP Response
 <SOAP-ENV:Envelope ...
     <SOAP-ENV:Body>
         <m:GetOrderStatusReply
         xmlns:m="www.myservice.com/OrderEntry">
             <orderstatus>Shipped June 18</orderstatus>
         </m:GetOrderStatusReply>
     </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

The preceding are examples of RPC/encoded SOAP, the more tightly coupled Web services scheme. In RPC mode, there is always a request/response pair that models a function call/call return value pair.

This capability to invoke remote procedures when used in the RPC/encoded style is a major SOAP feature that plain XML over HTTP does not possess. For Web services that are building distributed applications over the Web, this capability of SOAP will be important. Included in an RPC mapping is a URI for the target SOAP node, a procedure name, an optional procedure signature, and the parameters to the procedure. (Throughout, you can substitute method for procedure.) RPC invocations are modeled as structs with an accessor for each parameter. The struct name is identical to the procedure name. The response struct uses a naming convention that easily identifies this as a response. The response is always a struct or a fault, but not both.
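On the client side, the "struct or fault, but not both" rule might be handled as in the sketch below. The names `read_rpc_response` and `SoapFault` are hypothetical, the namespace URI is the one from this chapter's envelope example, and the fault message is invented for illustration. Because the inner layout of a fault differs between SOAP versions, the sketch simply collects the fault's text rather than assuming particular child elements.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"


class SoapFault(Exception):
    """Raised when the response body carries a fault instead of a result."""


def read_rpc_response(xml_text):
    """Return the response struct as a dict, or raise SoapFault."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("response has no body content")
    result = body[0]
    if result.tag == f"{{{SOAP_ENV}}}Fault":
        # Gather whatever text the fault contains, whatever its layout.
        raise SoapFault(" ".join("".join(result.itertext()).split()))
    # One accessor (child element) per returned value.
    return {child.tag: child.text for child in result}


REPLY = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <m:GetOrderStatusReply xmlns:m="www.myservice.com/OrderEntry">
      <orderstatus>Shipped June 18</orderstatus>
    </m:GetOrderStatusReply>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Illustrative fault; the child element shown here is an assumption.
FAULT = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault><faultstring>Unknown order</faultstring></SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

print(read_rpc_response(REPLY))
try:
    read_rpc_response(FAULT)
except SoapFault as err:
    print("fault:", err)
```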

In Document/literal mode, the loosely coupled Web services scheme, the sender sends a message containing a document such as a purchase order. The receiver independently and transparently to the client determines how to process it.

When SOAP is used in the Document/literal style, what matters is its capability to transmit XML documents that the recipient then processes using either the Document Object Model (DOM) or the Simple API for XML (SAX), both of which are XML parsing APIs. For businesses, this mode will be a very powerful asynchronous paradigm; it can be used to turn critical business documents such as purchase orders into actionable objects with a defined set of operations that are performed on these documents as part of a standard business-process workflow.
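To make the DOM/SAX distinction concrete, here is a small Python sketch that processes the same document both ways. The purchase-order layout and the function names are invented for this example. The DOM version loads the whole document into a tree before querying it, while the SAX version reacts to parsing events as they stream past.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical purchase order, invented for this sketch.
PURCHASE_ORDER = (
    "<PurchaseOrder>"
    '<item sku="A1">2</item>'
    '<item sku="B7">5</item>'
    "</PurchaseOrder>"
)


def total_quantity_dom(xml_text):
    """DOM: build the whole tree in memory, then walk it."""
    doc = xml.dom.minidom.parseString(xml_text)
    items = doc.getElementsByTagName("item")
    return sum(int(item.firstChild.data) for item in items)


class QuantityHandler(xml.sax.ContentHandler):
    """SAX: accumulate the total from events, without building a tree."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self._in_item = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_item:
            self._text.append(content)

    def endElement(self, name):
        if name == "item":
            self.total += int("".join(self._text))
            self._in_item = False


def total_quantity_sax(xml_text):
    handler = QuantityHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.total


print(total_quantity_dom(PURCHASE_ORDER), total_quantity_sax(PURCHASE_ORDER))
```

For large business documents the streaming approach keeps memory use flat, at the cost of more bookkeeping in the handler.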

SOAP Processing

Today's application servers all have SOAP processors built in. These SOAP processors are evolving to have support for Web services security standards built in as those standards stabilize. This enables sites to set up their Web services environments to automatically manage security. The SOAP header processor is called from the SOAP runtime system.

Alternatively, the Web services environment can route messages through an intermediary. SOAP intermediaries can process headers, but not the body. Intermediaries can add or remove headers. If the intermediary does not recognize a header, it must ignore that header and forward it on.
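An intermediary of this kind could be sketched as below. The function name `relay` and the `urn:example:routing` header block are assumptions made for illustration; the namespace URI is the one from this chapter's envelope example. The sketch only ever touches children of the header and passes the body through unchanged.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"
# Keep the familiar prefix when the message is serialized again.
ET.register_namespace("SOAP-ENV", SOAP_ENV)


def relay(xml_text, add_block=None, remove_tags=()):
    """Forward a message, changing header blocks only, never the Body."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{SOAP_ENV}}}Header")
    if header is None and add_block is not None:
        # A new Header has to become the first child of the Envelope.
        header = ET.Element(f"{{{SOAP_ENV}}}Header")
        root.insert(0, header)
    if header is not None:
        for block in list(header):
            if block.tag in remove_tags:
                header.remove(block)
            # Unrecognized blocks are left in place and forwarded as is.
        if add_block is not None:
            header.append(add_block)
    return ET.tostring(root, encoding="unicode")


incoming = (
    f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">'
    "<SOAP-ENV:Body><orderno>43564</orderno></SOAP-ENV:Body>"
    "</SOAP-ENV:Envelope>"
)
# Hypothetical routing header added by this intermediary.
via = ET.Element("{urn:example:routing}Via")
via.text = "gateway-1"
print(relay(incoming, add_block=via))
```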

The security stage of SOAP processors is expected to authenticate identities, implement role-based security for authorizations, encrypt or decrypt the contents of a message, validate digital signatures, implement extended security conversations, and call out to external third-party authorities as needed.

SOAP headers can carry an attribute that indicates which SOAP node is responsible for processing them: the actor attribute in SOAP v1.1, renamed role in SOAP v1.2.

SOAP Attachments

Attachments are not part of the SOAP standard, but they are part of an accepted note by the W3C. Attachments are recognized as important and are accepted as part of e-business XML (ebXML) and RosettaNet.

Attachments are the way to send binary data or entire XML documents. Examples of binary data attachments include image files or large drawing documents. An example of an entire separately defined XML document placed as an attachment is an ACORD (an insurance industry XML dialect) document.

If the SOAP attachment approach were not available, all binary data would have to be Base-64 encoded and decoded because the XML body must be completely digestible plain text. This Base-64 encoding/decoding of binary data into ASCII form adds significant overhead: the encoded data is about one-third larger than the original, in addition to the processing work. (Base-64 encoding is explained in detail in Appendix A.)
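The size cost of Base-64 is easy to measure. The short Python sketch below encodes an arbitrary 3,072-byte payload (chosen only for illustration) and compares sizes; every 3 bytes of input become 4 ASCII characters of output.

```python
import base64

# Arbitrary binary payload of 3,072 bytes, standing in for an image file.
payload = bytes(range(256)) * 12

encoded = base64.b64encode(payload)
ratio = len(encoded) / len(payload)

print(len(payload), "bytes raw")
print(len(encoded), "bytes as Base-64 text")
print(f"size ratio: {ratio:.3f}")

# Decoding restores the original bytes exactly.
assert base64.b64decode(encoded) == payload
```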

SOAP borrows from Multipurpose Internet Mail Extensions (MIME) to create SOAP attachments that mimic the way MIME allows email attachments. SOAP with attachments creates a MIME envelope with attachments that are linked from within the SOAP envelope. Processing of these attachments in the payload is directed to separate MIME part handlers in a fashion analogous to email attachment processing.
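The shape of such a package can be sketched with Python's standard email library. Everything specific here is an assumption for illustration: the `photo` element, the Content-ID `photo1@example.com`, the placeholder bytes, and the helper name. Note also that Python's email classes apply a Base-64 transfer encoding to binary parts by default, which suits mail transport; a package sent over HTTP could carry the part as raw binary instead.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope"

# The envelope points at the attachment through its Content-ID ("cid:" URI).
ENVELOPE = f"""<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_ENV}">
  <SOAP-ENV:Body>
    <photo href="cid:photo1@example.com"/>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

PHOTO = b"\x89PNG placeholder bytes"  # stand-in for real image data


def build_soap_with_attachment(envelope_xml, attachment, content_id):
    """Wrap a SOAP envelope and one binary attachment in a MIME package."""
    package = MIMEMultipart("related", type="text/xml")
    # The first part is the SOAP envelope itself.
    package.attach(MIMEText(envelope_xml, "xml"))
    part = MIMEApplication(attachment, "octet-stream")
    # The Content-ID is what the href inside the envelope refers to.
    part.add_header("Content-ID", f"<{content_id}>")
    package.attach(part)
    return package


package = build_soap_with_attachment(ENVELOPE, PHOTO, "photo1@example.com")
for part in package.get_payload():
    print(part.get_content_type(), part["Content-ID"])
```

A receiver would hand the first part to its SOAP processor and resolve each `cid:` reference against the Content-ID headers of the remaining parts.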

SOAP and Web Services Security

The defined SOAP transport currently binds just to HTTP. This makes SOAP transparent to traditional firewalls, which has some disconcerting security implications. The effort to make sure that Web services "just work" with existing Web servers, firewalls, and infrastructure also means that, if you are not careful, Web services can bring new identities into an organization and take critical business information out. WSDL makes this situation worse because, again if you are not careful, it advertises the Web services API to the outside world.

The good security news is that SOAP binding to HTTP also means you can directly secure SOAP using SSL because all Web and application servers already know how to apply SSL to HTTP. This approach will suffice only for simple point-to-point Web services, but many do fit this description. Other SOAP transports that will quickly take hold include SMTP and JMS, as well as others over time.

For multi-hop Web services, for those that include identity, for those that require integrity of messages, and for many other reasons, SSL will not suffice for SOAP security, which is why SOAP headers were made extensible for security directives, as already mentioned. WS-Security secures SOAP through its security tokens in the SOAP header; these tokens attach identity to a SOAP message and provide for persistent security for the SOAP payload when used with XML Encryption and XML Signature. WS-Security and SOAP security are covered in detail in Chapter 7.

Securing Web Services with WS-Security: Demystifying WS-Security, WS-Policy, SAML, XML Signature, and XML Encryption
ISBN: 0672326515
Year: 2004
Pages: 119
