Defining SOAP Messaging | Mining Google Web Services: Building Applications with the Google API

Because Google Web Services relies so heavily on SOAP, you need to know how SOAP messaging works. The following sections describe SOAP basics, tell you about some important SOAP issues, and provide a few pointers on where you can learn more about SOAP. You'll also see a very simple SOAP example designed for use with a Web page with JavaScript.

Note

Many of the SOAP examples in this book rely on the Microsoft SOAP Toolkit. You can download this toolkit at http://msdn.microsoft.com/nhp/default.asp?contentid=28000523. The examples all rely on the 3.0 version of the toolkit, the latest version available at the time of writing.

Determining Which SOAP Standard to Use

SOAP has gone through three major revisions, each of which makes SOAP a better choice for communication purposes. Almost no one uses the SOAP 1.0 standard anymore. Few vendors used this standard because it had some significant problems that the SOAP 1.1 standard quickly solved. The SOAP 1.1 standard is popular because it works well for most communication that doesn't require security. For example, you can safely use SOAP 1.1 products for most Google Web Services tasks because you aren't passing along anything that's secret.

Tip

You can find many SOAP 1.1 resources now. The ZVON Web site at http://www.zvon.org/xxl/soapReference/Output/index.html provides a great reference you can use to learn more about SOAP. You'll find a SOAP tutorial at http://www.w3schools.com/soap/. The SOAP 1.1 specification appears at http://static.userland.com/xmlRpcCom/soap/SOAPv11.htm. Microsoft provides a SOAP testing tool that you can download at http://msdn.microsoft.com/library/en-us/dnsoap/html/soapvalidator.asp. Learn more about SOAP messages with attachments at http://www.w3.org/TR/SOAP-attachments. Finally, if you want to learn all the ins and outs of SOAP 1.1 with both Microsoft and third-party products, get my book Special Edition Using SOAP (Que, 2001).

The SOAP 1.2 standard first appeared as a working draft on July 9, 2001 (http://www.w3.org/TR/2001/WD-soap12-20010709/). However, the World Wide Web Consortium (W3C) didn't make it a recommendation until June 24, 2003 (http://www.w3.org/TR/soap12-part1/). Consequently, many of the products you see on the Internet today still rely on SOAP 1.1 and will probably continue to rely on this standard for some time. Even Microsoft's latest release of Visual Studio .NET still relies on SOAP 1.1.

SOAP 1.2 adds some very important features. The most important feature is added security. However, according to an eWeek article (http://www.eweek.com/article2/0,4149,1137432,00.asp), the new standard includes over 400 fixes for previous problems. You can also find some interesting InfoWorld articles on the topic at http://www.infoworld.com/article/03/05/07/HNsoap_1.html, http://www.infoworld.com/article/02/12/19/021219hnsoapadvance_1.html?1220fram, and http://www.infoworld.com/article/02/11/01/021101hnsoap12_1.html?1104mnam. The last article is especially important because it points out another reason that vendors haven't embraced SOAP 1.2: potential patent issues were involved that the standards committee had to clear up.

Tip

Although SOAP 1.2 resources are still a little rare, you should look at the primer at http://www.w3.org/TR/soap12-part0/, the framework specification at http://www.w3.org/TR/soap12-part1/, and the adjuncts at http://www.w3.org/TR/soap12-part2/. You may also want to read about test collection methods at http://www.w3.org/TR/soap12-testcollection/. The W3C is also working on a number of additional specifications that aren't at the recommendation stage. These specifications include SOAP 1.2 attachments (http://www.w3.org/TR/soap12-af/), SOAP 1.2 email bindings (http://www.w3.org/TR/soap12-email), and SOAP 1.2 normalization (http://www.w3.org/TR/soap12-n11n/).

Given the current state of SOAP, you need to consider three things when you decide which standard to use. First, you need to know whether your product of choice even supports SOAP 1.2 (most don't). Second, you need to consider whether the features SOAP 1.2 offers are essential to your organization. In many cases, using SOAP 1.1 still works fine. Third, you need to consider whether the remote sites you want to work with use SOAP 1.2. Using SOAP 1.1 until your partners catch up probably makes sense. You do want to use SOAP 1.2 sometime in the future, so planning for it today is a good idea.
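One practical way to check which standard a partner site uses is to look at the namespace declared on the envelope of a message it sends. The following is only a minimal sketch: detectSoapVersion() is a hypothetical helper, not part of any toolkit, and it performs a plain text search rather than real XML parsing. The two namespace URIs are the ones defined by the SOAP 1.1 and SOAP 1.2 specifications.

```javascript
// Envelope namespace URIs defined by the SOAP 1.1 and SOAP 1.2 specifications.
const SOAP_ENVELOPE_NAMESPACES = {
  "http://schemas.xmlsoap.org/soap/envelope/": "1.1",
  "http://www.w3.org/2003/05/soap-envelope": "1.2"
};

// Hypothetical helper: report which SOAP version a message appears to use.
// Returns "1.1", "1.2", or null when neither namespace URI appears as a
// quoted attribute value. This is a text search, not an XML parse.
function detectSoapVersion(messageXml) {
  for (const uri of Object.keys(SOAP_ENVELOPE_NAMESPACES)) {
    const doubleQuoted = '"' + uri + '"';
    const singleQuoted = "'" + uri + "'";
    if (messageXml.includes(doubleQuoted) || messageXml.includes(singleQuoted)) {
      return SOAP_ENVELOPE_NAMESPACES[uri];
    }
  }
  return null;
}
```

A check like this could run once against a sample response from each partner, letting you record which sites are ready for SOAP 1.2 before you plan a migration.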

Understanding the Parts of a SOAP Message

To understand SOAP, you need to consider the features that make up a SOAP message. A SOAP message includes the SOAP package, the XML envelope, and the HyperText Transfer Protocol (HTTP) or Simple Mail Transfer Protocol (SMTP) transport. Think about this system in the same way that you do a letter, with SOAP acting as the letter, XML as the envelope to hold the letter, and HTTP or SMTP as the mail carrier to deliver the letter. The most common transport protocol in use today is HTTP, so that's what we'll look at in this section. Keep in mind, however, that SOAP can theoretically use any of a number of transport protocols and probably will in the future. Figure 3.10 shows a common SOAP message configuration. Notice the SOAP message formatting. This isn't the only way to wrap a SOAP message in other protocols, but it's the most common method in use today.

Figure 3.10: An illustration of a typical SOAP message.

The HTTP portion of a SOAP message looks much the same as any other HTTP header you may have seen in the past. In fact, if you don't look carefully, you might pass it by without paying any attention. As with any HTTP transmission, there are two types of headers: one for requests and another for responses. Figure 3.10 shows examples of both types.

Tip

Working with the new capabilities provided by technologies like XML and SOAP means dealing with dynamically created Web pages. While it's nice that you can modify the content of a Web page as needed for an individual user, it can also be a problem if you need to troubleshoot the Web page. That's where a handy little script comes into play. Type javascript:'<xmp>'+document.all(0).outerHTML+'</xmp>' in the Address field of Internet Explorer for any dynamically created Web page and you'll see the actual HTML for that page. This includes the results of using scripts and other page construction techniques.

As with any request header, the HTTP portion of a SOAP message contains a method (POST, in most cases), the HTTP version, a Host name, and some Content-Length information. The POST line of the header contains the path for the SOAP listener. The request header also includes a Content-Type entry of text/xml with a charset value of utf-8. The utf-8 value is important right now because many SOAP toolkits don't support utf-16 or other character sets.

You'll also find the unique SOAPAction entry in the HTTP request header. It contains the Uniform Resource Identifier (URI) of the component used to parse the SOAP request. If the SOAPAction entry is "", then the server will use the HTTP Request-URI entry to locate a listener instead. This is the only SOAP-specific entry in the HTTP header; everything else we've discussed could appear in any HTTP formatted message.
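The request header entries described above can be assembled as in the following sketch. The buildSoapHttpRequest() function and its parameter names are hypothetical, chosen only for illustration. The sketch assumes SOAP 1.1 over HTTP/1.1 and a runtime that provides the standard TextEncoder class.

```javascript
// Hypothetical helper: assemble the text of an HTTP request that carries a
// SOAP 1.1 message. All parameter names are illustrative assumptions.
function buildSoapHttpRequest(host, listenerPath, soapAction, envelopeXml) {
  // Content-Length counts bytes, not characters, so measure the UTF-8 form.
  const bodyLength = new TextEncoder().encode(envelopeXml).length;
  return [
    "POST " + listenerPath + " HTTP/1.1",     // method, listener path, HTTP version
    "Host: " + host,
    "Content-Type: text/xml; charset=utf-8",
    "Content-Length: " + bodyLength,
    'SOAPAction: "' + soapAction + '"',       // the only SOAP-specific entry
    "",                                       // blank line ends the header
    envelopeXml
  ].join("\r\n");
}
```

Passing an empty string for soapAction produces the SOAPAction: "" form, which leaves the server to locate the listener from the Request-URI on the POST line.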

The response header portion of the HTTP wrapper for a SOAP message contains all of the essentials as well. You'll find the HTTP version, status, and content length as usual. There are two common status indicators for a response header: 200 OK or 500 Internal Server Error. The SOAP specification allows use of any value in the 200 series for a positive response, but a server must return a status value of 500 whenever a SOAP error occurs.

Whenever a SOAP response header contains an error status, the SOAP message must include a SOAP fault section. We'll talk about SOAP faults in the "Defining Fault Tolerance in a SOAP Message" section of the chapter. All you need to know now is that the HTTP header provides the first indication of a SOAP fault that will require additional processing.
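A client can use the status value as a first check before it looks at the message body. The sketch below shows one way to express the rules just described. The function name and the three labels it returns are illustrative assumptions, not part of the SOAP specification.

```javascript
// Hypothetical helper: classify an HTTP status code returned for a SOAP call.
// The label strings are assumptions made for this sketch.
function classifySoapResponse(statusCode) {
  if (statusCode >= 200 && statusCode < 300) {
    return "success";          // any 200-series value is a positive response
  }
  if (statusCode === 500) {
    return "fault";            // the body should contain a SOAP fault section
  }
  return "transport-error";    // anything else is an HTTP-level problem
}
```

Separating the "fault" and "transport-error" cases lets the calling code decide whether it makes sense to parse the body for fault information at all.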

All SOAP messages use XML encoding. SOAP follows the XML specification and builds on it; in other words, it adds to the functionality already in place within XML. Anyone familiar with XML will feel comfortable with SOAP at the outset: all you really need to know is the SOAP nuances. Although the examples in the SOAP specification don't show an XML connection (other than the formatting of the SOAP message), SOAP messages always contain an XML header similar to the one shown in Figure 3.10.

A simple SOAP message consists of an envelope that contains both a header and a body (sort of the same arrangement used by an HTML page). The header can contain information that isn't associated with the data itself. For example, the header commonly contains a transaction ID when the application needs one to identify a particular SOAP message. The body contains the data in XML format. If an error occurs, the body will contain fault information, rather than data.
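The envelope, header, and body arrangement can be sketched as a small function that wraps already formatted XML. The buildSoapEnvelope() name is hypothetical, the sketch assumes the SOAP 1.1 envelope namespace, and it uses the SOAP-ENV prefix only because that prefix appears elsewhere in this chapter.

```javascript
// Envelope namespace from the SOAP 1.1 specification.
const SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/";

// Hypothetical helper: wrap header and body XML in a SOAP 1.1 envelope.
// Pass an empty string for headerXml to omit the optional header element.
function buildSoapEnvelope(headerXml, bodyXml) {
  const header = headerXml
    ? "  <SOAP-ENV:Header>" + headerXml + "</SOAP-ENV:Header>\n"
    : "";
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="' + SOAP_11_NS + '">\n' +
    header +
    "  <SOAP-ENV:Body>" + bodyXml + "</SOAP-ENV:Body>\n" +
    "</SOAP-ENV:Envelope>";
}
```

For example, a transaction ID would travel in the header as an element in the application's own namespace (say, a made-up urn:example:transactions), while the data itself goes in the body.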

SOAP is essentially a one-way data transfer protocol. While SOAP messages often follow a request/response pattern, the messages themselves are individual entities. This means that a SOAP message is stand-alone: it doesn't rely on the immediate presence of a server, nor is a response expected when a request message contains all of the required information. For example, some types of data entry may not require a response since the user is inputting information and may not care about a response.

The envelope in which a SOAP message travels, however, may provide more than just a one-way transfer path. For example, when a developer encases a SOAP message within an HTTP envelope, the request and response both use the same connection. HTTP creates and maintains the connection, not SOAP. Consequently, the connection follows the HTTP way of performing data transfer, using the same techniques as a browser uses to request Web pages for display.

Defining Fault Tolerance in a SOAP Message

Sometimes a SOAP request will generate a fault message instead of the anticipated reply. The server may not have the means to answer your request, the request you generated may be incomplete, or bad communication may prevent your message from arriving in the same state as you sent it. You may receive a SOAP fault message for many reasons, including client messages that the server can't process, errors on the server such as a missing application, and SOAP version mismatches.

When a server returns a fault message, it doesn't return any data. Look at Figure 3.10 and you'll see a typical client fault message. Notice the message contains only fault information. With this in mind, the client-side applications you create must be prepared to parse SOAP fault messages and return the information in such a way that the user will understand the meaning of the fault.

Figure 3.10 shows the standard presentation of a SOAP fault message. Notice that the fault envelope resides within the body of the SOAP message. A fault envelope will generally contain faultcode and faultstring elements that tell you which error occurred. All of the other SOAP fault message elements are optional. The following list tells you how they're used.

faultcode The faultcode contains the name of the error that occurred. It can use a dot syntax to define a more precise error code. The faultcode will always begin with a classification. For example, the faultcode in Figure 3.10 consists of a SOAP-ENV error code followed by a MustUnderstand subcode. This error tells you that the server couldn't understand a part of the client request that the client marked as mandatory. Since the SOAP specification defines a list of standard faultcodes, you can use them directly for processing purposes.

faultstring This is a human-readable form of the error specified by the faultcode entry. This string should follow the same format as HTTP error strings. You can learn more about HTTP error strings by reading the HTTP specification at http://www.faqs.org/rfcs/rfc2616.html. A good general rule to follow is to make the faultstring entry short and easy to understand.

faultactor This element points to the source of a fault in a SOAP transaction. It contains a Uniform Resource Identifier (URI) similar to the one used for determining the destination of the header entry. According to the specification, you must include this element if the application that generates the fault message isn't the ultimate destination for the SOAP message.

detail You'll use this element to hold detailed information about a fault when available. For example, this is the element used to hold server-side component return values. This element is SOAP message body specific, which means you can't use it to detail errors that occur in other areas like the SOAP message header. A detail entry acts as an envelope for storing detail subelements. Each subelement includes a tag containing namespace information and a string containing error message information.
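The four elements in this list can be pulled out of a fault message as shown in the following sketch. Both functions are hypothetical helpers. To stay self-contained, the code uses regular expressions instead of a real XML parser, so treat it as an illustration of the fault layout rather than production parsing code. It assumes SOAP 1.1 faults, where the child elements carry no namespace prefix.

```javascript
// Hypothetical helper: return the text of the first <name>...</name> element
// found in the XML string, or null when the element is absent.
function firstElementText(xml, name) {
  const pattern = new RegExp(
    "<" + name + "(?:\\s[^>]*)?>([\\s\\S]*?)</" + name + ">");
  const match = pattern.exec(xml);
  return match ? match[1].trim() : null;
}

// Hypothetical helper: extract the fault elements from a SOAP 1.1 message.
// Returns null when the message contains no Fault element.
function parseSoapFault(messageXml) {
  if (!/<(?:[\w.-]+:)?Fault[\s>]/.test(messageXml)) {
    return null;
  }
  const faultcode = firstElementText(messageXml, "faultcode");
  // Drop the namespace prefix, then split the dot syntax into its parts.
  const parts = faultcode ? faultcode.replace(/^[^:]*:/, "").split(".") : [];
  return {
    faultcode: faultcode,
    classification: parts.length > 0 ? parts[0] : null,
    subcodes: parts.slice(1),
    faultstring: firstElementText(messageXml, "faultstring"),
    faultactor: firstElementText(messageXml, "faultactor"),   // optional
    detail: firstElementText(messageXml, "detail")            // optional
  };
}
```

A client application could show the faultstring value to the user and use the classification value to decide whether correcting and resending the request is worthwhile.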

Understanding How WSDL Fits In

The documentation that comes with the Google Web Services Kit contains examples of how to format a message using SOAP. These examples include the XML header and all of the features discussed in the "Understanding the Parts of a SOAP Message" section of the chapter. Figure 3.11 shows a typical example from the kit. However, you won't need to use these examples in most cases.

Figure 3.11: The kit contains a number of SOAP message examples.

WSDL provides a means for describing a Web service so that the Integrated Development Environment (IDE) you use can create the definitions needed. Some developers originally found WSDL less than helpful, and it doesn't work with every SOAP toolkit you can download. The SOAP samples help developers who must create messages manually get the format correct. However, if you use a product such as Visual Studio .NET, the IDE downloads the WSDL from the Google Web site and you'll find that you don't actually have to worry about the construction of the SOAP message.

Tip

You can find a wealth of resources about WSDL on the Internet. One of the more interesting offerings is the ZVON reference at http://www.zvon.org/xxl/WSDL1.1/Output/index.html. W3Schools has a tutorial at http://www.w3schools.com/wsdl/default.asp. Originally, Microsoft and IBM promoted WSDL on their Web sites, but you can now find the specification on the W3C site at http://www.w3.org/TR/wsdl. You can find the IBM view of Web services at http://www-106.ibm.com/developerworks/webservices/ and http://www.alphaworks.ibm.com/tech/webservicestoolkit. A WSDL search engine (where you can find services that rely on both SOAP and WSDL) appears at http://www.salcentral.com/salnet/webserviceswsdl.asp.

Like many other topics discussed, WSDL relies on XML as a basis for communication. Figure 3.12 shows a typical example of the WSDL file for Google Web Services. Note that the Visual Studio .NET IDE automatically downloads this file as part of the process of creating a reference to the Web site. The "Creating a Web Reference" section of Chapter 6 describes how to perform this task.

Figure 3.12: Using WSDL makes SOAP messaging extremely easy for the developer.

Notice that the WSDL file contains a list of complex types. It also contains a list of function names. You call a function such as doGoogleSearch() in your code. The IDE automatically creates code to send a doGoogleSearchRequest SOAP message and code that interprets the doGoogleSearchResponse SOAP message the application receives from the server. The WSDL file is instrumental in performing all this work automatically.

Performing a Simple SOAP Call

It's time to try the first SOAP call using a technique that many developers will employ for learning Google Web Services: a simple Web page. Listing 3.4 shows how to make a simple SOAP call using JavaScript. You'll find the complete source for this example in the \Chapter 03\Viewing XSLT folder of the source code located on the Sybex Web site.

Listing 3.4: Simple JavaScript SOAP Call

function CallGoogle()
{
   // Create the SOAP client.
   var SoapClient = new ActiveXObject("MSSOAP.SoapClient30");

   // Initialize the SOAP client so it can access Google
   // Web Services.
   SoapClient.MSSoapInit("http://api.google.com/GoogleSearch.wsdl",
                         "GoogleSearchService",
                         "GoogleSearchPort");

   // Make a search request.
   var ThisResult =
      SoapClient.doGoogleSearch("Your-License-Key",
                                SubmissionForm.SearchStr.value,
                                1,
                                10,
                                false,
                                "",
                                false,
                                "",
                                "",
                                "");

   // Return the results.
   return ThisResult;
}

The code begins by creating a SOAP client. The client communicates with the server: it ensures that the message traffic flows as anticipated and that the request is formed correctly. The SoapClient.MSSoapInit() function creates a connection to the server. This step isn't the same as sending data to the server. All it does is create the connection.

At this point, the code can make a request of Google Web Services. It performs this task by sending all of the required arguments as part of the SoapClient.doGoogleSearch() function call. Figure 3.11 lists the arguments. The Google Web Services Kit provides an overview of the information and you'll find a detailed description in the "Understanding that Google Only Directly Supports SOAP" section of Chapter 4.
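To see roughly what the toolkit does with those ten arguments, consider the following sketch, which turns them into the body of a search request. Everything here is an assumption made for illustration: the buildDoGoogleSearchBody() and escapeXml() helpers are hypothetical, and the parameter names, types, and urn:GoogleSearch namespace reflect the Google search WSDL as best recalled rather than anything shown in this section. Verify them against the WSDL file before relying on them.

```javascript
// Hypothetical helper: escape characters that have special meaning in XML text.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Assumed parameter names and XML Schema types, in call order.
const DO_GOOGLE_SEARCH_PARAMS = [
  ["key", "xsd:string"],
  ["q", "xsd:string"],
  ["start", "xsd:int"],
  ["maxResults", "xsd:int"],
  ["filter", "xsd:boolean"],
  ["restrict", "xsd:string"],
  ["safeSearch", "xsd:boolean"],
  ["lr", "xsd:string"],
  ["ie", "xsd:string"],
  ["oe", "xsd:string"]
];

// Hypothetical helper: build the XML that would go inside the SOAP body.
// The xsi and xsd prefixes must be declared on the enclosing envelope.
function buildDoGoogleSearchBody(args) {
  if (args.length !== DO_GOOGLE_SEARCH_PARAMS.length) {
    throw new Error("doGoogleSearch expects " +
      DO_GOOGLE_SEARCH_PARAMS.length + " arguments");
  }
  const parts = DO_GOOGLE_SEARCH_PARAMS.map(function (param, index) {
    const name = param[0];
    const type = param[1];
    return "<" + name + ' xsi:type="' + type + '">' +
      escapeXml(args[index]) + "</" + name + ">";
  });
  return '<ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch">' +
    parts.join("") + "</ns1:doGoogleSearch>";
}
```

Passing the same values used in Listing 3.4, in the same order, yields one child element per argument. The escaping step matters because a search string can easily contain characters such as & or <.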

On return from the call, ThisResult contains the return data from Google in the form of a SOAP message. This object isn't an XML document. You must retrieve the XML document using the technique shown in Listing 3.3.

Understanding Privacy Issues

No one sends private information across the Internet using Google unless they're looking for an individual. Google won't ask for anyone's name, address, telephone number, or other contact information. In fact, such a request would be very suspect since Google doesn't even have a use for such information (they're not selling anything, but their advertisers are). The only time you need to consider privacy issues when working with Google Web Services is if you somehow associate a request with a particular user or company. When your company is the only user of the application, you need to consider how much information you want to give away to people peeking at your communications. In some cases, you might decide to look for specific information using analysis of a general search that doesn't compromise privacy in any way.

Even though Google doesn't require the use of any personal information, you might require such information for your application. Make sure you actually need the information before you request it. You should also include a privacy policy on your Web site to ensure that everyone can use the application you created. The "Designing for Privacy Issues" section of Chapter 11 describes how to implement a privacy policy on your site.