Using HTTP Agents


HTTP agents let developers use other Web servers as an information resource. ColdFusion uses <cfhttp> to connect to and retrieve content from remote servers. The agent (your ColdFusion code) behaves just like a Web browser, retrieving the entire contents of the designated URL's page. Combined with other ColdFusion technologies, HTTP agents form the basic transport for content and application syndication services. HTTP agents fall into two camps: unilateral and cooperative.

Unilateral agents retrieve the contents of a Web page without any cooperation from the remote information resource. The developer must therefore parse the returned data flexibly enough to accommodate changes in its structure. For example, when syndicating headlines from the home page of a news portal, the portal owner might change the layout of the site from time to time. Consequently, unilateral agents are generally more complex to code, can break easily, and are prone to copyright infringement.

Cooperative agents are those that work in concert with a back-end page or robot on the remote server. A back-end page is not designed to be seen by a user with a Web browser; rather, it is coded specifically to respond to an HTTP agent. As a result, cooperative agents are generally very easy to put together and do not break often. On the other hand, a back-end page can be fairly complex depending on the level of functionality being delivered to the incoming agent: for example, a news portal delivering an XML news feed to syndicate partners.

Creating HTTP Agents

The <cfhttp> tag spawns an external agent that goes to a specified URL and retrieves the page's contents via HTTP. The agent is sophisticated yet remarkably simple to implement:

 <cfhttp url="http://www.forta.com/"
         method="GET"
         resolveurl="true">
 </cfhttp>
 <cfoutput>
 #CFHTTP.FileContent#
 </cfoutput>

In this example, <cfhttp> is sending an agent to retrieve and then display the forta.com home page. The entire contents of the page are returned as the variable CFHTTP.FileContent. The method="get" attribute is used to retrieve any text or binary file from the URL.

When a file is brought back from a remote server, any relative links to images, other Web pages, and so on will no longer be valid. resolveurl determines whether to resolve URLs found in CFHTTP.FileContent to absolute addresses. Absolute links will then point to the correct remote address.

When a Web browser accesses a Web server, it indicates the browser's type and version. The useragent attribute can be used to nominate the agent or browser of your choice. The default is "ColdFusion".
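As a minimal sketch of the useragent attribute described above, the request below identifies itself with a browser-style string. The user-agent value shown is a made-up placeholder, not a real browser's string; substitute the value your target browser actually reports.

```cfml
<!--- The user-agent string is a hypothetical placeholder; use a real browser's value --->
<cfhttp url="http://www.forta.com/"
        method="GET"
        useragent="Mozilla/5.0 (compatible; ExampleBrowser/1.0)"
        resolveurl="true">
</cfhttp>
<cfoutput>#CFHTTP.FileContent#</cfoutput>
```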

TIP

Web sites often use routines to detect the type of browser the user is operating. This information is then used to deliver a different client-side experience depending on the browser detected. To ensure that your HTTP agent returns the Web page you expect, make sure that the USERAGENT attribute is set to an actual Web browser type.

You can determine a Web browser's user agent value by browsing a ColdFusion server while debugging is turned on. Check the CGI variables for HTTP_USER_AGENT.


ColdFusion debugging is covered in Chapter 25, "Debugging."


The <cfhttp> agent has the same issues as any other Web browser does. If your server is located in a network that must use a proxy server to reach the Web, you will need to specify a proxy using the appropriate attributes for the <cfhttp> agent as well.

TIP

<cfhttp> is not clever enough to tell the proxy server to bypass the cache. Therefore, depending on how your proxy server has been configured, a <cfhttp> agent might continue to retrieve the cached version of a Web page even though the actual page has long since been modified.

You can often trick the proxy server's cache by adding a different URL parameter onto the end of the remote address each time you request a page. A URL-encoded time stamp works well.

 <cfset timestamp=URLEncodedFormat(Now())>
 <cfhttp url="http://www.forta.com/index.cfm?#timestamp#"
         method="GET">


<cfhttp> returns a number of useful variables in addition to CFHTTP.FileContent. These are listed in Table 43.1.

Table 43.1. <cfhttp> Return Variables

| VARIABLE NAME | DESCRIPTION |
| --- | --- |
| CFHTTP.CharSet | Returns the character set of a retrieved URL |
| CFHTTP.FileContent | Returns the entire contents of the remote file for text and MIME files |
| CFHTTP.Header | Returns the entire response header in its raw-text format as a simple variable |
| CFHTTP.MimeType | Returns the MIME type of the file; for example, "text/html" |
| CFHTTP.ResponseHeader | Returns the entire response header in a structure. If there are multiple instances of a header key (multiple cookies, for example), the values are placed in an array within the ResponseHeader structure. |
| CFHTTP.StatusCode | Returns the HTTP error code and associated error string; for example, "200 Success" |
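The return variables in Table 43.1 can be inspected after any request. The following is only an illustrative sketch: it prints the simple values and uses <cfdump> to show the ResponseHeader structure.

```cfml
<cfhttp url="http://www.forta.com/" method="GET"></cfhttp>

<cfoutput>
Status: #CFHTTP.StatusCode#<br>
MIME type: #CFHTTP.MimeType#<br>
Character set: #CFHTTP.CharSet#<br>
</cfoutput>

<!--- ResponseHeader is a structure, so cfdump is a convenient way to inspect it --->
<cfdump var="#CFHTTP.ResponseHeader#">
```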


<cfhttp> can send variables to a URL ahead of its retrieval of the Web page. This is what you would expect given that a standard Web browser can submit form variables, cookies, and URL parameters to the Web server. By submitting variables to the Web server directly, you can activate a form's action pages and other dynamic content, effectively bypassing the user interface that a normal Web user would navigate.

To submit variables, you must change to method="POST" and specify the variables to send using a series of <cfhttpparam> tags nested between <cfhttp> and </cfhttp>.

For example, when submitting text-searching variables to a search page, you can use the following code:

 <cfhttp url="http://www.forta.com/cf/tips/browse.cfm"
         method="POST"
         resolveurl="yes">
   <cfhttpparam type="FORMFIELD"
                name="search"
                value="XML">
 </cfhttp>

<cfhttpparam> can be used to specify any combination of the following variable types:

  • BODY

  • CGI

  • COOKIE

  • FILE

  • FORMFIELD

  • HEADER

  • URL

  • XML
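As a rough illustration of mixing several of these types in one request, the sketch below sends a URL parameter, a form field, a cookie, and a header together. The URL and all parameter names and values are hypothetical.

```cfml
<!--- The URL, parameter names, and values below are hypothetical --->
<cfhttp url="http://www.example.com/search.cfm" method="POST" resolveurl="yes">
  <!--- Appended to the query string --->
  <cfhttpparam type="URL" name="page" value="2">
  <!--- Sent as a form field in the request body --->
  <cfhttpparam type="FORMFIELD" name="search" value="XML">
  <!--- Sent as a cookie --->
  <cfhttpparam type="COOKIE" name="sessionToken" value="abc123">
  <!--- Sent as a custom request header --->
  <cfhttpparam type="HEADER" name="Accept-Language" value="en">
</cfhttp>
<cfoutput>#CFHTTP.FileContent#</cfoutput>
```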

TIP

If using type="file", the mimetype should be specified too.
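A file upload along these lines might look like the following sketch. The target page upload.cfm, the field names, and the local file path are assumptions made for illustration only.

```cfml
<!--- upload.cfm, the field names, and the local path are hypothetical --->
<cfhttp url="http://www.example.com/upload.cfm" method="POST">
  <cfhttpparam type="FORMFIELD" name="description" value="Cover image">
  <cfhttpparam type="FILE"
               name="coverImage"
               file="C:\images\0321125169_m.gif"
               mimetype="image/gif">
</cfhttp>
<cfoutput>#CFHTTP.StatusCode#</cfoutput>
```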


<cfhttp> can also be used to retrieve a file and save it directly to disk. This is particularly useful if you are grabbing binary files such as images, but it works equally well with Web pages and other text files. You need to specify the PATH where the file is to be saved. If you don't specify a FILE attribute, the original filename will be used.

 <cfhttp url="http://www.forta.com/images/0321125169_m.gif"
         method="GET"
         path="C:\images"
         file="0321125169_m.gif"
         resolveurl="false"
         throwonerror="yes">

Remote Data File Queries

<cfhttp> can be used to retrieve a remote data file and dynamically generate a query object. <cfhttp> sends out an agent to retrieve the file and then parses the data according to the delimiter attribute. textqualifier indicates the character at the start and finish of a column. The columns attribute can dictate column headings; otherwise, the very first record is used for column names.

 <cfhttp url="http://www.forta.com/sales/jan.txt"
         method="GET"
         name="sales"
         columns="isbn,quantity"
         textqualifier=""""""
         delimiter=","
         resolveurl="false"></cfhttp>
 <cfoutput query="sales">
 #isbn#: #quantity#<br>
 </cfoutput>

NOTE

To specify quotation marks as a textqualifier, you need to escape each quotation mark with yet another quotation mark!

 textqualifier="""""" 

There are six quotation marks in this example: two to define the value of the attribute and two for each quotation mark in the value.


Secure HTTP

<cfhttp> supports secure connections between ColdFusion (the client) and remote HTTP servers. To use secure connections, simply use https instead of http as the protocol in the URL. <cfhttp> will then automatically use port 443; you may override this if needed.
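For instance, a secure request to a server listening on a nonstandard port could be sketched as follows. The host name and port 8443 are placeholders rather than values from the text.

```cfml
<!--- secure.example.com and port 8443 are hypothetical placeholders --->
<cfhttp url="https://secure.example.com/account.cfm"
        method="GET"
        port="8443">
</cfhttp>
<cfoutput>#CFHTTP.StatusCode#</cfoutput>
```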

Handling Proxy Servers

If your ColdFusion server accesses the outside world via a proxy server, <cfhttp> calls will fail unless ColdFusion itself routes HTTP requests via the proxy server. <cfhttp> supports the use of proxy servers but does not autodetect them. To use a proxy server, you must pass its host name or IP address to the PROXYSERVER attribute.
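A minimal sketch of a proxied request is shown here. The proxy host name and port are placeholders for your own network's values; proxyport is included on the assumption that the proxy does not listen on the default port.

```cfml
<!--- proxy.example.com and port 8080 are hypothetical placeholders --->
<cfhttp url="http://www.forta.com/"
        method="GET"
        proxyserver="proxy.example.com"
        proxyport="8080"
        resolveurl="true">
</cfhttp>
<cfoutput>#CFHTTP.FileContent#</cfoutput>
```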

Troubleshooting HTTP Agents

<cfhttp> spawns an additional process that sends out an HTTP agent. The calling application page waits until the agent has traveled onto the network and returned before it continues processing. The agent's activity on the Web is largely beyond the developer's control. Any number of issues, from network congestion to the remote host's simply being unavailable, can cause the agent to fail.

If the agent fails or takes an inordinate amount of time to complete its mission, it might jeopardize the processing of the calling page. It is a good idea to set a timeout attribute for the agent so that you can code appropriate error handling to deal with long-running agents. Set throwonerror="Yes" to raise a standard exception should <cfhttp> time out or, alternatively, check the CFHTTP.StatusCode variable.
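One way to combine a timeout with exception handling is sketched below. The 10-second limit and the fallback message are arbitrary choices for illustration.

```cfml
<cftry>
  <!--- timeout is in seconds; 10 is an arbitrary example value --->
  <cfhttp url="http://www.forta.com/"
          method="GET"
          timeout="10"
          throwonerror="yes">
  </cfhttp>
  <cfoutput>#CFHTTP.FileContent#</cfoutput>

  <cfcatch type="any">
    <!--- Fallback shown when the agent times out or the request fails --->
    <p>The remote content is temporarily unavailable.</p>
    <cfoutput>#HTMLEditFormat(cfcatch.message)#</cfoutput>
  </cfcatch>
</cftry>
```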

NOTE

TIMEOUT is not supported when running on a JVM prior to version 1.4.


Trapping errors is covered in Chapter 26, "Error Handling."


On occasion, the remote host might respond with an error. In other words, the remote host is available on the network, but the Web service has failed for some reason. You can test the CFHTTP.StatusCode variable to detect whether things have gone wrong and respond accordingly.

NOTE

A successful HTTP agent will return a STATUSCODE of "200 Success" or "200 OK".
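A status check of this kind might be sketched as follows. Comparing only the first three characters avoids depending on the exact wording of the status text.

```cfml
<cfhttp url="http://www.forta.com/" method="GET" timeout="10"></cfhttp>

<!--- Compare only the numeric part of the status, not the trailing text --->
<cfif Left(CFHTTP.StatusCode, 3) eq "200">
  <cfoutput>#CFHTTP.FileContent#</cfoutput>
<cfelse>
  <cfoutput>Request failed: #HTMLEditFormat(CFHTTP.StatusCode)#</cfoutput>
</cfif>
```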


Page Scraping

A page scrape involves capturing an HTML page from another Web server and then processing the page for information. For instance, you might be interested in harvesting a list of contacts from an affiliate's Web site and displaying them on your own. However, you might want to get rid of the header and footer displayed on the affiliate's site and substitute your own.

Excising a particular piece of content from a Web site is done using string parsing. Typically this is achieved by locating a region on the page above and below the desired content. A region is identified as a constant string (such as a heading or a comment), or a pattern that can be reliably matched using a regular expression. Using string functions, you then remove all the text above and below the content you want in the CFHTTP.FileContent variable.
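The string-parsing approach described above can be sketched roughly as follows. The URL and the two HTML comment markers are hypothetical; in practice you would choose whatever constant strings reliably bracket the content on the partner's page.

```cfml
<!--- The URL and both marker strings are hypothetical --->
<cfhttp url="http://www.example.com/contacts.cfm"
        method="GET"
        resolveurl="true">
</cfhttp>

<cfset page = CFHTTP.FileContent>
<cfset startMarker = "<!-- begin contacts -->">
<cfset endMarker = "<!-- end contacts -->">
<cfset contentStart = 0>
<cfset endPos = 0>

<!--- Locate the region above the desired content --->
<cfset startPos = FindNoCase(startMarker, page)>
<cfif startPos gt 0>
  <!--- Content begins immediately after the start marker --->
  <cfset contentStart = startPos + Len(startMarker)>
  <!--- Locate the region below the desired content --->
  <cfset endPos = FindNoCase(endMarker, page, contentStart)>
</cfif>

<cfif startPos gt 0 and endPos gt contentStart>
  <!--- Keep only the text between the two markers --->
  <cfset contacts = Mid(page, contentStart, endPos - contentStart)>
  <cfoutput>#contacts#</cfoutput>
<cfelse>
  <p>The expected markers were not found; the page layout may have changed.</p>
</cfif>
```

Checking that both markers were found before calling Mid() lets the page fail gracefully when the remote layout changes.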

This form of syndication can be useful when you are dealing with a partner who has a basic Internet site. As long as the structure of the page (that is, the regions you are matching) does not change, you can syndicate content from the Web page with little or no input from the partner. However, because pages do change, relying on specific content in a specific format presents a high risk of error. Web Services are a far safer way to implement syndication and data sharing.

Web Services were reviewed in Chapter 32, "Web Services."




Macromedia ColdFusion MX 7 Certified Developer Study Guide
ISBN: 0321330110
Year: 2004
Pages: 389
Authors: Ben Forta
