Sample Transaction


All transactions examined in this book use HTTP, which should be somewhat familiar because it is the protocol on which the World Wide Web is built. Understanding what a SOAP transaction looks like in its most basic form (the raw HTTP exchange) will prove useful when you need to diagnose problems later. The following sample transaction performs a search for "paul reinheimer" and requests only the first two results.

Request Headers

The request headers here declare where the request is being sent and indicate what is being asked for. Stating the destination may seem redundant, because the TCP/IP packets are already addressed to the server's IP address, but remember that multiple sites can be hosted on the same machine. This information in the HTTP request allows the server to determine which site should receive the request.

 POST /search/beta2 HTTP/1.0
 Host: api.google.com
 Connection: close
 Accept: */*
 Content-Length: 992
 SOAPAction: "urn:GoogleSearchAction"
 Content-Type: text/xml; charset="utf-8"

As you can see, the request header is rather brief, partly because it was generated manually. The headers created by a web browser are much longer, because they include lots of optional information. The rest of this section goes through the header line by line.

 POST /search/beta2 HTTP/1.0 

A POST request is being made against the script on the server located at /search/beta2, and this request will follow the HTTP 1.0 protocol. POST (and its companion GET) should be familiar from form processing; the encoding is similar, only the content changes.

Note 

This is one of those cases where my habit of leaving a trailing slash on all URLs came back to bite me. Performing a POST against /search/beta2/ will fail; Google responds that the script at that location does not accept POST requests.

 Host: api.google.com 

This line defines the host that the request is directed to. Including the Host line in the HTTP headers allows a single server to handle requests for multiple domains on the same IP address; it also allows certain network devices to route requests to different servers for load balancing. Note that the Host and POST lines together give the full URL of the resource, http://api.google.com/search/beta2. You can split a URL into its component parts with PHP's parse_url() function.
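PHP's parse_url() has a close counterpart in Python's urllib.parse.urlsplit(). The following Python sketch shows how a full URL splits back into the Host header value and the path used on the request line; the helper name request_target() is hypothetical and not part of any API.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the (Host header value, request path) pair for a URL."""
    parts = urlsplit(url)
    # The Host header carries the network location; the request line
    # carries only the path (plus any query string).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path


host, path = request_target("http://api.google.com/search/beta2")
print("POST %s HTTP/1.0" % path)
print("Host: %s" % host)
```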

 Connection: close 

This line states that the connection will close as soon as this request is completed, as opposed to a persistent connection, over which several transactions could occur in succession. Persistent connections are useful when requesting an item such as an HTML document, where embedded images will likely need to be downloaded from the same server once the HTML has been received and parsed. That isn't the case here, so closing the connection makes far more sense. Note that this line does not itself close the connection; it merely declares how the connection should be handled.

 Accept: */* 

The Accept header indicates which content types the client is willing to accept. In this case you are willing to accept any content that Google cares to return. You do, however, know what Google will return, because the API specification states that you will always be sent XML; the response shown later in this section is labeled text/xml.

 Content-Length: 992 

The Content-Length header tells the server the length of the message body in bytes. PHP's strlen() function, which counts bytes, will prove invaluable when populating this header.
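Because Content-Length counts bytes rather than characters, the distinction matters as soon as a query contains non-ASCII text. In Python, len() on a str counts characters, so the body has to be encoded to UTF-8 first. This is a minimal sketch; content_length() is a hypothetical helper, and the sample bodies are made up for illustration.

```python
def content_length(body):
    """Length of a str body in bytes once encoded as UTF-8."""
    # Content-Length counts bytes, not characters, so encode first.
    return len(body.encode("utf-8"))


ascii_body = "<q>paul reinheimer</q>"
accented_body = "<q>café</q>"
print(len(ascii_body), content_length(ascii_body))        # equal for pure ASCII
print(len(accented_body), content_length(accented_body))  # "é" needs two bytes
```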

 SOAPAction: "urn:GoogleSearchAction" 

SOAPAction is an HTTP header defined by SOAP 1.1 to indicate the intent of a SOAP request. The Google API defines the value it expects here.

 Content-Type: text/xml; charset="utf-8" 

This Content-Type header states that the body of the message is text formatted as XML, using the UTF-8 character set. Google requires all transactions to be performed in UTF-8; this is clearly defined in its specification and noted in the WSDL file.
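Putting the headers together, the following Python sketch assembles a raw request like the one above around an arbitrary body. build_request() is a hypothetical helper, the placeholder body is not a real SOAP call, and actually sending the bytes over a socket is left out.

```python
def build_request(host, path, body, soap_action):
    """Assemble a raw HTTP/1.0 SOAP request like the one shown above."""
    payload = body.encode("utf-8")  # Google expects UTF-8
    headers = [
        "POST %s HTTP/1.0" % path,  # no trailing slash on the path
        "Host: %s" % host,
        "Connection: close",
        "Accept: */*",
        "Content-Length: %d" % len(payload),  # bytes, not characters
        'SOAPAction: "%s"' % soap_action,
        'Content-Type: text/xml; charset="utf-8"',
    ]
    # HTTP lines end in CRLF, and a blank line separates headers from body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + payload


raw = build_request("api.google.com", "/search/beta2",
                    "<SOAP-ENV:Envelope/>", "urn:GoogleSearchAction")
print(raw.decode("utf-8"))
```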

Request Body

The body of the request contains the actual SOAP request for the server, and the information required here is defined by Google. SOAP requests are written in XML (Chapter 3 presented a quick crash course in XML), which makes requests quick for you to generate and easy for Google to parse.

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:typens="urn:GoogleSearch"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <SOAP-ENV:Body>
     <mns:doGoogleSearch xmlns:mns="urn:GoogleSearch"
       SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
     <key xsi:type="xsd:string">u6U/r39QFHK18Qcjz/XdWSbptVaj9k1t</key>
     <q xsi:type="xsd:string">paul reinheimer</q>
     <start xsi:type="xsd:int">0</start>
     <maxResults xsi:type="xsd:int">2</maxResults>
     <filter xsi:type="xsd:boolean">1</filter>
     <restrict xsi:type="xsd:string"></restrict>
     <safeSearch xsi:type="xsd:boolean">1</safeSearch>
     <lr xsi:type="xsd:string"></lr>
     <ie xsi:type="xsd:string"></ie>
     <oe xsi:type="xsd:string"></oe>
     </mns:doGoogleSearch>
   </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

As mentioned previously, each SOAP request includes both the Envelope and the Body. The envelope defines the namespaces used in the request and encapsulates the body, which contains the API call itself. Each section of the request body is examined in detail here.

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:typens="urn:GoogleSearch"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >

These lines declare the document to be XML version 1.0 with UTF-8 encoding. The SOAP-ENV:Envelope root element (the SOAP envelope) then begins and declares the appropriate namespaces.

 <SOAP-ENV:Body>
     <mns:doGoogleSearch xmlns:mns="urn:GoogleSearch"
       SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">

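Here the SOAP Body element begins, ready to encapsulate the request itself. The next element, mns:doGoogleSearch, names the method being called; it declares the mns namespace (urn:GoogleSearch) and sets the SOAP encoding style used for its contents.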

 <key xsi:type="xsd:string">u6U/r39QFHK18Qcjz/XdWSbptVaj9k1t</key> 

The key element must be included with every request you send to Google. It holds the API key issued to you when you register for the Google API (covered earlier in this chapter). The xsi:type="xsd:string" attribute defines the type of data being sent, in this case a string. More information on the types can be found at the URLs given when the namespaces were declared in the Envelope element.
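On the receiving side, the xsi:type attribute tells you how to interpret an element's text. The following Python sketch converts text according to its declared type. It covers only the four types that appear in this transaction and, as a simplifying assumption, looks only at the local part of the type name rather than resolving the prefix against the declared namespaces; from_xsd() is a hypothetical helper.

```python
def from_xsd(text, xsi_type):
    """Convert element text to a Python value based on its xsi:type."""
    local = xsi_type.split(":")[-1]  # "xsd:int" -> "int"; prefix ignored
    text = text or ""  # empty elements such as <lr></lr> have no text
    if local == "int":
        return int(text)
    if local == "double":
        return float(text)
    if local == "boolean":
        # XML Schema allows "true"/"false" as well as "1"/"0".
        if text in ("true", "1"):
            return True
        if text in ("false", "0"):
            return False
        raise ValueError("not an xsd:boolean: %r" % text)
    return text  # xsd:string and any unrecognized type stay as text


print(from_xsd("2", "xsd:int"), from_xsd("1", "xsd:boolean"))
```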

 <q xsi:type="xsd:string">paul reinheimer</q> 

The q element contains the query that will be sent to Google; again, a string is being sent.

Note 

Google only cares about the first 10 terms (or words) sent in a query, something to keep in mind when using the service.

 <start xsi:type="xsd:int">0</start>
 <maxResults xsi:type="xsd:int">2</maxResults>

The start element gives the index of the first result to be returned; like array indexes, Google's count starts at 0. maxResults defines how many results you wish to receive, in this case 2. Google will not return more than 10 results at a time; if you require more results, use multiple requests, increasing the start element with each successive request. Note that the type of both elements is int (integer).
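Paging through more than 10 results means issuing several requests with increasing start values. The following Python sketch computes the (start, maxResults) pair for each request; result_pages() is a hypothetical helper, and the cap of 10 comes from the limit described above.

```python
def result_pages(wanted, per_request=10):
    """Split a request for `wanted` results into (start, maxResults) pairs."""
    pages = []
    start = 0  # start is 0-based, like an array index
    while start < wanted:
        count = min(per_request, wanted - start)  # Google caps this at 10
        pages.append((start, count))
        start += count  # the next request picks up where this one ended
    return pages


for start, max_results in result_pages(25):
    print("start=%d maxResults=%d" % (start, max_results))
```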

 <filter xsi:type="xsd:boolean">1</filter>
 <restrict xsi:type="xsd:string"></restrict>

When the filter element is enabled (set to 1), it filters results in two ways: first, near-duplicate content is removed from the result set, and second, if many results are located on the same domain, only the first two are returned. These filters are on by default in the standard Google web front end, but you can turn them off here by sending 0 rather than 1. The restrict element can be used to restrict results to pages from a particular country or on a particular topic. For example, use unclesam in the restrict element to limit results to U.S. Government topics. For a full listing of available topic and country restrictions, see the APIs_Reference.html document that came in your GoogleAPI download.

 <safeSearch xsi:type="xsd:boolean">1</safeSearch> 

Turning safeSearch on (by setting it to 1) instructs Google to remove adult sites from the search results. Keep in mind that no filter is perfect, so you may still see some adult sites in the result set.

 <lr xsi:type="xsd:string"></lr>
 <ie xsi:type="xsd:string"></ie>
 <oe xsi:type="xsd:string"></oe>

The lr element restricts results by the language in which they are written; lang_en, for example, restricts results to pages in English. For a full listing of available language restrictions, see the APIs_Reference.html file that came with the API documentation download. ie and oe are both required elements, representing input encoding and output encoding respectively. However, Google ignores both values: all input is expected to be UTF-8, and all output will be UTF-8 as well.
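With every parameter covered, the whole request body can be generated rather than written by hand. The following Python sketch uses the standard library's xml.etree.ElementTree to build an equivalent doGoogleSearch envelope. It is an illustration under assumptions, not Google's client code: search_envelope() and its keyword arguments are hypothetical, "YOUR-KEY" is a placeholder, and ElementTree places all namespace declarations on the root element, so the output is equivalent to, but laid out differently from, the sample above.

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
ENC = "http://schemas.xmlsoap.org/soap/encoding/"
GOOGLE = "urn:GoogleSearch"

# Use the same prefixes as the sample request instead of ns0, ns1, ...
ET.register_namespace("SOAP-ENV", ENV)
ET.register_namespace("xsi", XSI)
ET.register_namespace("mns", GOOGLE)


def search_envelope(key, query, start=0, max_results=10,
                    filter_on=True, safe_search=True, restrict="", lr=""):
    """Build a doGoogleSearch SOAP envelope as UTF-8 bytes."""
    envelope = ET.Element("{%s}Envelope" % ENV)
    # "xsd:" appears only inside attribute values, which ElementTree does
    # not inspect, so the xsd prefix has to be declared by hand.
    envelope.set("xmlns:xsd", XSD)
    body = ET.SubElement(envelope, "{%s}Body" % ENV)
    call = ET.SubElement(body, "{%s}doGoogleSearch" % GOOGLE,
                         {"{%s}encodingStyle" % ENV: ENC})
    params = [
        ("key", "string", key),
        ("q", "string", query),
        ("start", "int", str(start)),
        ("maxResults", "int", str(max_results)),
        ("filter", "boolean", "1" if filter_on else "0"),
        ("restrict", "string", restrict),
        ("safeSearch", "boolean", "1" if safe_search else "0"),
        ("lr", "string", lr),
        ("ie", "string", ""),  # required, but Google ignores it
        ("oe", "string", ""),  # likewise
    ]
    for name, xsd_type, value in params:
        # Parameters are unqualified elements carrying an xsi:type.
        element = ET.SubElement(call, name,
                                {"{%s}type" % XSI: "xsd:" + xsd_type})
        element.text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


xml = search_envelope("YOUR-KEY", "paul reinheimer", max_results=2)
print(xml.decode("utf-8"))
```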

     </mns:doGoogleSearch>
   </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

The doGoogleSearch element, the Body, and the Envelope are closed in turn.

Response Header

The response header is quite similar to the request header generated earlier and serves many of the same purposes. The information here helps the client machine understand what type of information it is about to receive, and what happened on the server to produce it.

 HTTP/1.0 200 OK
 Content-Type: text/xml; charset=utf-8
 Cache-control: private
 Date: Sun, 16 Jan 2005 21:06:04 GMT
 Server: GFE/1.3
 Connection: Close

The HTTP response header returned after performing the request looks quite similar to the original request header, so only new elements are examined here.

 HTTP/1.0 200 OK 

As in the request header, this line identifies the protocol as HTTP version 1.0. It also gives the HTTP response code, 200, and a short description of that status code, OK. The best-known HTTP response code is probably 404 Not Found; if you receive that response while attempting to access an API, you most likely have the wrong URL.
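Checking the status line before touching the body catches problems such as a wrong URL early. The following Python sketch splits a status line into its three parts; parse_status_line() is a hypothetical helper, and raising RuntimeError on anything other than 200 is just one possible policy.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, code, reason)."""
    # maxsplit=2 keeps multi-word reasons such as "Not Found" intact.
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason may be empty
    return version, code, reason


version, code, reason = parse_status_line("HTTP/1.0 200 OK")
if code != 200:
    raise RuntimeError("SOAP request failed: %d %s" % (code, reason))
```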

 Cache-control: private
 Date: Sun, 16 Jan 2005 21:06:04 GMT
 Server: GFE/1.3

The Cache-control: private declaration states that the content may be cached, but not in any sort of shared cache. The applications here will ignore this header; however, servers between your applications and the Google server will note it and (hopefully) not attempt to cache any of the transactions. The Date header is relatively self-explanatory; note that it always contains a time in GMT. The Server line is an optional header declaring the name and version of the server software used.
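Header names are case-insensitive (note Cache-control here rather than Cache-Control), so it helps to normalize them when parsing. The following Python sketch parses the header lines above into a dictionary and converts the Date header with the standard library's email.utils.parsedate_to_datetime(), which understands this date format. parse_headers() is a hypothetical helper, and it assumes no header is repeated or folded across lines.

```python
from email.utils import parsedate_to_datetime

RAW_HEADERS = (
    "Content-Type: text/xml; charset=utf-8\r\n"
    "Cache-control: private\r\n"
    "Date: Sun, 16 Jan 2005 21:06:04 GMT\r\n"
    "Server: GFE/1.3\r\n"
    "Connection: Close\r\n"
)


def parse_headers(raw):
    """Parse header lines into a dict keyed by lower-cased field name."""
    headers = {}
    for line in raw.split("\r\n"):
        if not line:
            continue  # skip the empty string after the final CRLF
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_headers(RAW_HEADERS)
sent_at = parsedate_to_datetime(headers["date"])  # timezone-aware (GMT)
print(headers["cache-control"], sent_at.isoformat())
```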

Response Body

The response body contains the server's response to your SOAP request. The response is quite long, even though you requested only the first two search results, so it has been broken into more manageable chunks here. Note also that white space has been modified for easier reading. After the segmented form has been covered, the entire response will be reexamined in greater detail.

 <?xml version='1.0' encoding='UTF-8'?>
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>
   <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch"
     SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <return xsi:type="ns1:GoogleSearchResult">
     <directoryCategories xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
       xsi:type="ns2:Array" ns2:arrayType="ns1:DirectoryCategory[0]">
     </directoryCategories>
     <documentFiltering xsi:type="xsd:boolean">false</documentFiltering>
     <endIndex xsi:type="xsd:int">2</endIndex>
     <estimateIsExact xsi:type="xsd:boolean">false</estimateIsExact>
     <estimatedTotalResultsCount xsi:type="xsd:int">2070</estimatedTotalResultsCount>

This top section first declares the namespaces used throughout the document, then opens the Body and the doGoogleSearchResponse element, which declares the response namespace. The response then begins in earnest with information about what Google found for your query.

     <resultElements xmlns:ns3="http://schemas.xmlsoap.org/soap/encoding/"
       xsi:type="ns3:Array" ns3:arrayType="ns1:ResultElement[2]">
      <item xsi:type="ns1:ResultElement">
       <URL xsi:type="xsd:string">http://www.preinheimer.com/</URL>
       <cachedSize xsi:type="xsd:string">60k</cachedSize>
       <directoryCategory xsi:type="ns1:DirectoryCategory">
        <fullViewableName xsi:type="xsd:string"></fullViewableName>
        <specialEncoding xsi:type="xsd:string"></specialEncoding>
       </directoryCategory>
       <directoryTitle xsi:type="xsd:string"></directoryTitle>
       <hostName xsi:type="xsd:string"></hostName>
       <relatedInformationPresent
         xsi:type="xsd:boolean">true</relatedInformationPresent>
       <snippet xsi:type="xsd:string">&lt;b&gt;...&lt;/b&gt; Posted by
         &lt;b&gt;Paul&lt;/b&gt; &lt;b&gt;Reinheimer&lt;/b&gt; in Computing at 20:57 |
         Comments (0) | Trackbacks (0). &lt;b&gt;...&lt;/b&gt;&lt;br&gt; thanks
         &lt;b&gt;paul&lt;/b&gt;. Posted by &lt;b&gt;Paul&lt;/b&gt;
         &lt;b&gt;Reinheimer&lt;/b&gt; at 22:45 | Comments (2) | Trackbacks (0).
         &lt;b&gt;...&lt;/b&gt; </snippet>
       <summary xsi:type="xsd:string"></summary>
       <title xsi:type="xsd:string">preinheimer.com</title>
      </item>
      <item xsi:type="ns1:ResultElement">
       <URL xsi:type="xsd:string">http://p2p.wrox.com/blogs_bio.asp?AUTHOR_ID=22424</URL>
       <cachedSize xsi:type="xsd:string">17k</cachedSize>
       <directoryCategory xsi:type="ns1:DirectoryCategory">
        <fullViewableName xsi:type="xsd:string"></fullViewableName>
        <specialEncoding xsi:type="xsd:string"></specialEncoding>
       </directoryCategory>
       <directoryTitle xsi:type="xsd:string"></directoryTitle>
       <hostName xsi:type="xsd:string"></hostName>
       <relatedInformationPresent
         xsi:type="xsd:boolean">true</relatedInformationPresent>
       <snippet xsi:type="xsd:string">&lt;b&gt;...&lt;/b&gt; p2p.wrox.com Forums, Blogs
         &amp;gt; &lt;b&gt;Paul&lt;/b&gt; &lt;b&gt;Reinheimer&amp;#39;s&lt;/b&gt; Bio.
         Wrox Blogs. Archive RSS&lt;br&gt; Feed, &lt;b&gt;Paul&lt;/b&gt;
         &lt;b&gt;Reinheimer&lt;/b&gt;. Homepage: http://www.preinheimer.com.
         &lt;b&gt;...&lt;/b&gt;  </snippet>
       <summary xsi:type="xsd:string"></summary>
       <title xsi:type="xsd:string">p2p.wrox.com Forums</title>
      </item>
     </resultElements>

Here are the two search results you requested. Note that the results are encapsulated within the resultElements element, and each result is an item element of its own. Also note that appropriate text has been HTML-encoded: &lt; instead of <, and so on.
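Because the result elements carry no namespace prefix, extracting them is straightforward once the envelope is parsed. The following Python sketch pulls the URL and title out of every item using xml.etree.ElementTree. The RESPONSE below is a heavily trimmed copy of the response above (one item, xsi:type attributes and most fields omitted), and search_results() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Heavily trimmed from the sample response: one item, most fields omitted.
RESPONSE = b"""<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
 <SOAP-ENV:Body>
  <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch">
   <return>
    <resultElements>
     <item>
      <URL>http://www.preinheimer.com/</URL>
      <title>preinheimer.com</title>
     </item>
    </resultElements>
   </return>
  </ns1:doGoogleSearchResponse>
 </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def search_results(xml_bytes):
    """Return (URL, title) pairs for every item in resultElements."""
    root = ET.fromstring(xml_bytes)
    body = root.find("{%s}Body" % ENV)
    # The result elements themselves are unqualified (no namespace).
    return [(item.findtext("URL"), item.findtext("title"))
            for item in body.iter("item")]


for url, title in search_results(RESPONSE):
    print(title, url)
```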

     <searchComments xsi:type="xsd:string"></searchComments>
     <searchQuery xsi:type="xsd:string">paul reinheimer</searchQuery>
     <searchTime xsi:type="xsd:double">0.05788</searchTime>
     <searchTips xsi:type="xsd:string"></searchTips>
     <startIndex xsi:type="xsd:int">1</startIndex>
    </return>
   </ns1:doGoogleSearchResponse>
  </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

Finally come a few last pieces of information about your request, followed by the closing tags for the response element, the Body, and the Envelope.

In Depth

 <?xml version='1.0' encoding='UTF-8'?>
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/1999/XMLSchema">
  <SOAP-ENV:Body>

These initial lines are very similar to the ones used in the request: The document is declared as XML, the SOAP Envelope begins, some namespaces are declared, and the SOAP Body begins.

 <ns1:doGoogleSearchResponse xmlns:ns1="urn:GoogleSearch"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <return xsi:type="ns1:GoogleSearchResult">

Here the response element begins and, as in the request, the encodingStyle attribute is added. The return element encapsulates all the information related to your search, not just the search results themselves.

 <directoryCategories xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
   xsi:type="ns2:Array" ns2:arrayType="ns1:DirectoryCategory[0]">
 </directoryCategories>

The directoryCategories element will contain an array listing all of the categories the search matched in the Open Directory Project (dmoz.org) when available. This search didn't match any, so none were returned.

 <documentFiltering xsi:type="xsd:boolean">false</documentFiltering>
 <endIndex xsi:type="xsd:int">2</endIndex>

The documentFiltering element indicates whether any filtering was performed on the search results. It would only be true if the request indicated that filtering was desired AND filtering was accomplished. In this case filtering was requested, but none was required. endIndex indicates the index of the last search result returned in this set; note that endIndex is a 1-based count (start counting at 1 rather than 0).

 <estimateIsExact xsi:type="xsd:boolean">false</estimateIsExact>
 <estimatedTotalResultsCount xsi:type="xsd:int">2070</estimatedTotalResultsCount>

The estimateIsExact element indicates whether the following estimatedTotalResultsCount is actually an exact value. You will likely only see true when performing a search that has very few results.
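These counting fields are enough to produce the familiar summary line shown above a list of results. The following Python sketch formats one; results_summary() is a hypothetical helper, and the wording of the output is an arbitrary choice.

```python
def results_summary(start_index, end_index, estimate, exact):
    """Format a 'Results x-y of about z' line from the response counts."""
    # startIndex and endIndex are already 1-based, so show them as-is.
    qualifier = "" if exact else "about "
    return "Results %d-%d of %s%d" % (start_index, end_index, qualifier, estimate)


# Values taken from the sample response above.
print(results_summary(1, 2, 2070, False))
```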

 <resultElements xmlns:ns3="http://schemas.xmlsoap.org/soap/encoding/"
   xsi:type="ns3:Array" ns3:arrayType="ns1:ResultElement[2]">

The resultElements element is of significant interest. It contains all available results, each within its own item element.

 <item xsi:type="ns1:ResultElement">
   <URL xsi:type="xsd:string">http://www.preinheimer.com/</URL>
   <cachedSize xsi:type="xsd:string">60k</cachedSize>

The item element declares itself to be a ResultElement, and as such you can expect it to contain all of the information regarding this particular search result. The URL element declares the URL for this search result, and cachedSize indicates the size (in kilobytes) of Google's cache of this page.

 <directoryCategory xsi:type="ns1:DirectoryCategory">
  <fullViewableName xsi:type="xsd:string"></fullViewableName>
  <specialEncoding xsi:type="xsd:string"></specialEncoding>
 </directoryCategory>
 <directoryTitle xsi:type="xsd:string"></directoryTitle>

These directory-based items indicate in which categories the Open Directory Project (ODP) classifies this particular result. My website is apparently unclassifiable (or, alternatively, not worth classifying). The directoryTitle element would indicate the title of the page according to the ODP. Generally, I have found little use for these elements; they are populated too rarely to make it worth the effort to code around them.

 <hostName xsi:type="xsd:string"></hostName> 

The hostName element is used only when filtering was turned on in the request and many results were found in the same domain. In that case, the second result from that domain (remember that when filtering is on, only the first two results from any domain are returned) will have the hostName value set. You may then want to run a second search, adding site:<hostname> to the query to obtain further results from that specific domain.
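The following Python sketch builds those follow-up queries from results that have already been parsed. The list-of-dicts shape for items and the helper name site_followups() are hypothetical, and the p2p.wrox.com hostName value is made up for illustration; in the sample response above, both hostName elements are empty.

```python
def site_followups(query, items):
    """Build one site: follow-up query per distinct populated hostName.

    `items` is assumed to be a list of dicts holding already-parsed
    ResultElement fields (a hypothetical shape).
    """
    queries = []
    seen = set()
    for item in items:
        host = item.get("hostName", "")
        if host and host not in seen:
            seen.add(host)
            # The operator takes the host directly, with no space after "site:".
            queries.append("%s site:%s" % (query, host))
    return queries


items = [
    {"URL": "http://www.preinheimer.com/", "hostName": ""},
    # hostName below is invented for illustration only.
    {"URL": "http://p2p.wrox.com/blogs_bio.asp?AUTHOR_ID=22424",
     "hostName": "p2p.wrox.com"},
]
print(site_followups("paul reinheimer", items))
```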

 <relatedInformationPresent   xsi:type="xsd:boolean">true</relatedInformationPresent> 

When relatedInformationPresent is set to true, Google knows of other pages similar to this result. To obtain them, perform another query in the form related:<URL>. Note that the related query cannot be combined with any other query.

 <snippet xsi:type="xsd:string">&lt;b&gt;...&lt;/b&gt; Posted by   &lt;b&gt;Paul&lt;/b&gt; &lt;b&gt;Reinheimer&lt;/b&gt; in Computing at 20:57 |   Comments (0) | Trackbacks (0). &lt;b&gt;...&lt;/b&gt;&lt;br&gt; thanks   &lt;b&gt;paul&lt;/b&gt;. Posted by &lt;b&gt;Paul&lt;/b&gt;   &lt;b&gt;Reinheimer&lt;/b&gt; at 22:45 | Comments (2) | Trackbacks (0).   &lt;b&gt;...&lt;/b&gt; </snippet> 

This is a snippet from the result; note that all appropriate characters are HTML-encoded. You may want to run html_entity_decode() on the snippet if you are displaying it to users (be careful what you allow through because this is foreign data).
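An XML parser already undoes one level of escaping, so the snippet you receive is HTML markup such as <b>Paul</b>. The following Python sketch reduces it to plain text and then escapes it again before display. Here Python's html.unescape() plays the role of PHP's html_entity_decode(); the tag-stripping regular expression is a simple assumption, not a full HTML sanitizer, and the sample snippet is a shortened, simplified version of the second result's snippet. snippet_to_text() is a hypothetical helper.

```python
import html
import re
import xml.etree.ElementTree as ET


def snippet_to_text(snippet_html):
    """Reduce a result snippet to plain text."""
    # Drop tags such as <b> and <br>; a simple regex, not a full sanitizer.
    text = re.sub(r"<[^>]*>", "", snippet_html)
    # Decode remaining entities (e.g. &#39;), as html_entity_decode() would.
    return html.unescape(text)


# The XML parser removes one level of escaping: &lt;b&gt; becomes <b>.
element = ET.fromstring(
    b"<snippet>&lt;b&gt;Paul&lt;/b&gt; Reinheimer&amp;#39;s Bio</snippet>")
snippet_html = element.text  # "<b>Paul</b> Reinheimer&#39;s Bio"
plain = snippet_to_text(snippet_html)
safe = html.escape(plain)  # re-escape before inserting into your own page
print(safe)
```

Escaping again at the end matters because decoding entities can turn harmless text such as &lt;script&gt; back into live markup.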

   <summary xsi:type="xsd:string"></summary>
   <title xsi:type="xsd:string">preinheimer.com</title>
 </item>

Finally, the summary element is only populated if the page has an entry in the ODP. The title comes from the page in question (in HTML this would be the <title></title> value; in other page types such as PDF it comes from their equivalent), and the item is closed.

Note 

For the sake of brevity I have skipped the second item element.

 <searchComments xsi:type="xsd:string"></searchComments>
 <searchQuery xsi:type="xsd:string">paul reinheimer</searchQuery>
 <searchTime xsi:type="xsd:double">0.05788</searchTime>
 <searchTips xsi:type="xsd:string"></searchTips>
 <startIndex xsi:type="xsd:int">1</startIndex>

The searchComments element may be populated with a textual message for the end user. It may, for example, report that certain common words were removed from the search query ("The following words are very common and were not included in your search: as a"). searchQuery will contain the search that was completed (the q from the request), and searchTime will contain the amount of time it took Google to complete the request. searchTips will contain textual tips for the end user on how to better use Google, and startIndex will contain the index of the first result (1-based).

  </return>
 </ns1:doGoogleSearchResponse>
 </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>

Finally, all of the open tags are closed, and so ends the response body.




Professional Web APIs with PHP. eBay, Google, PayPal, Amazon, FedEx, Plus Web Feeds
ISBN: 764589547
EAN: N/A
Year: 2006
Pages: 130
