Putting `<cfhttp>` to Use

The <cfhttp> tag has many uses: for example, it can serve as a simple request for a page or as the cornerstone of a back-end agent that directs content to a user through email. Now that you have looked at the various attributes and syntax descriptions for the <cfhttp> tag, let's write some examples to demonstrate its capabilities.

Using the `GET` Method

The first example demonstrates a simple GET operation. Listing 21.1 shows the CFML code necessary to use the <cfhttp> tag in a GET operation. This example fetches the index page from www.wired.com (a popular news site) and then displays the results.

Listing 21.1. `getwired.cfm`: Retrieving the Index Page from www.wired.com via the `<cfhttp>` Tag

    <!---
      Filename: getwired.cfm
      Purpose: Get the index page from Wired.com
    --->
    <cfhttp method="get" url="http://www.wired.com" resolveURL="yes">

    <cfoutput>
    #cfhttp.filecontent#
    </cfoutput>

Figure 21.1 shows the output of the example, with the index page from www.wired.com fully displayed, including all its graphics and links.

Figure 21.1. Results from the main index page being pulled from the Wired site.

Looking through the code, you can see that the results of the request are shown because the cfhttp.filecontent variable is output. In addition, the resolveURL attribute is set to YES, which tells ColdFusion to go through the results of the request and change all relative references into absolute references. For example, the images on the page are referenced by relative paths rather than hard-coded to a specific location. Without resolveURL, if we output the result of the request to the browser that requested the ColdFusion page containing the <cfhttp> tag, the images would not appear, because the browser would request them relative to our server rather than the original site.

Because resolving these locations is an extra step for ColdFusion, it is important to understand when it is appropriate to use this setting. Use it whenever you will be displaying the results of your internal HTTP request. For requests made purely for server-to-server communication or data retrieval, set it to NO.

There are several cases where the results of a <cfhttp> request are not to be shown but instead stored locally. The next example demonstrates using <cfhttp> with the GET method to save the results to a file. To accomplish this, the path and file attributes are specified with the directory and filename to which the results are to be saved. If the file attribute is left blank, it defaults to the name of the file being requested.

In this next example, the cfhttp.FileContent variable doesn't contain the results of the request; instead, it contains a message that the results are stored in the specified file. To display the outcome of the request, the <cffile> tag would be needed to read the contents of the download file into a variable and then display the results. The modified template is shown in Listing 21.2.

Listing 21.2. `getwired2.cfm`: Using `<cfhttp>` with the `GET` Method to Download a File

    <!---
      Filename: getwired2.cfm
      Purpose: Get the index page from Wired.com and save it
    --->
    <cfhttp method="get" url="http://www.wired.com"
            file="wiredindex.html" path="#expandPath('.')#" resolveurl="YES">

    <cffile action="READ" variable="httpfile"
            file="#expandPath('./wiredindex.html')#">

    <cfoutput>
    #httpfile#
    </cfoutput>

This technique is more commonly used to download documents and images from the Internet when other protocols, such as FTP, are not available.

TIP

Coupling the GET method with the upload capability of <cffile> and forms is a quick way to create your own FTP-style client. Use <cfhttp> to pass files between servers and <cffile>/<cfcontent> to upload and download files.

The output from this listing resembles that of Listing 21.1, but when running the example you will notice that many of the links are broken. This is because the resolveURL attribute is ignored when the path and file attributes are specified. A quick workaround for saving a result HTML file with resolved links is to request the file as shown in Listing 21.1 and save the contents of cfhttp.filecontent with <cffile>. The limitation has no effect when the technique is used to save a document or an image locally, as shown in the next example.
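The workaround described above can be sketched as follows. This is a minimal illustration rather than one of the book's listings: the output filename is reused from Listing 21.2, and the template is assumed to have write access to its own directory.

```cfml
<!--- Sketch only: fetch the page into memory so resolveURL is applied --->
<cfhttp method="get" url="http://www.wired.com" resolveURL="yes">

<!--- Save the resolved HTML ourselves instead of using path/file on <cfhttp> --->
<cffile action="write"
        file="#expandPath('./wiredindex.html')#"
        output="#cfhttp.fileContent#">

<p>Saved the resolved page to wiredindex.html.</p>
```

Because the file is written from cfhttp.fileContent, the saved copy should contain the absolute references that resolveURL produced.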

The preceding example (Listing 21.2) used the GET method to display and save the output of a standard Web page. The next example demonstrates the use of the <cfhttp> tag to download a binary file, such as an image or Word document, from a remote Web server. For most binary files, the only method you can use to access them is GET. Using an unsupported method such as POST creates a "405 Method Not Allowed" HTTP error. Listing 21.3 shows this example and demonstrates the use of the <cfdump> tag to display the resulting cfhttp structure.

Listing 21.3. `getbinary.cfm`: Using `<cfhttp>` with the `GET` Method to Download a Binary File

    <!---
      Filename: getbinary.cfm
      Author: Raymond Camden (ray@camdenfamily.com)
      Purpose: Get an image and save it
    --->

    <!--- get the base URL using our UDF --->
    <cfset theURL = getBaseURL()>

    <!--- add in our gif --->
    <cfset theURL = theURL & "/excite_logo.gif">

    <cfoutput>
    <p>
    Getting #theURL#
    </p>
    </cfoutput>

    <cfhttp method="get" url="#theURL#" resolveURL="YES"
            path="#getDirectoryFromPath(getCurrentTemplatePath())#"
            file="excite_logo_copy.gif">

    <cfdump var="#cfhttp#">

The image file used in this example (excite_logo.gif) and all the code listings are included on the CD-ROM that accompanies this book. The UDF used in this example, getBaseURL(), is defined in the Application.cfc file, also included on the CD-ROM. All this UDF does is translate the current request URL to a base URL (essentially, the current URL minus the filename).

Building on this functionality, you could create a tool that enables you to download binary documents through HTTP by dynamically specifying the url, file, and path attributes. In this dynamic situation, the MIME type of the requested binary file might need to be examined in order to accept only suitable file types. Looking at the resulting cfhttp structure, we would find this in the cfhttp.MimeType variable. The results of Listing 21.3 are shown in Figure 21.2.
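As a rough sketch of such a MIME-type check, the template below downloads a file and then deletes it again if the reported type is not on an allowed list. The allowed list and the saved filename are hypothetical, and getBaseURL() is assumed to be the UDF from this chapter's Application.cfc.

```cfml
<!--- Hypothetical list of MIME types this tool accepts --->
<cfset allowedTypes = "image/gif,image/jpeg,image/png">
<cfset saveDir = getDirectoryFromPath(getCurrentTemplatePath())>
<cfset saveName = "download_copy.gif">
<cfset theURL = getBaseURL() & "/excite_logo.gif">

<cfhttp method="get" url="#theURL#" path="#saveDir#" file="#saveName#">

<cfif listFindNoCase(allowedTypes, cfhttp.mimeType)>
  <cfoutput><p>Saved #saveName# (#cfhttp.mimeType#).</p></cfoutput>
<cfelse>
  <!--- Not an accepted type: remove the file that was just written --->
  <cfif fileExists(saveDir & saveName)>
    <cffile action="delete" file="#saveDir##saveName#">
  </cfif>
  <cfoutput><p>Rejected MIME type: #htmlEditFormat(cfhttp.mimeType)#</p></cfoutput>
</cfif>
```

The check runs after the download because the file is written as part of the <cfhttp> call; a stricter tool might fetch into memory first and write the file only once the type is accepted.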

Figure 21.2. Output from the `<cfhttp>` tag after downloading a binary file using the `GET` method.

Building a Query from a Text File

HTML is a poor way of passing and storing data for use by other systems. By having the data stored in an agreed-upon format, sharing information between servers is much easier. One of the formats that can be used is a delimited text file. With the <cfhttp> tag, using the GET method, you can read a delimited text file and create a query object from it. Listing 21.4 contains the sample code necessary to perform this action.

Listing 21.4. `getauthors1.cfm`: Using `<cfhttp>` to Build a Query Using a Text File

    <!---
      Filename: getauthors1.cfm
      Purpose: Get the authors data
    --->
    <html>
    <head>
      <title>CFHTTP QUERY TEST</title>
    </head>
    <body>

    <cfhttp method="GET"
            url="#getBaseURL()#/authors.txt"
            name="authors" delimiter="," textQualifier="""">

    <table border>
    <tr>
      <th align="left">Last Name</th>
      <th align="left">First Name</th>
    </tr>
    <cfoutput query="authors">
    <tr>
      <td align="left">#authors.lastname#</td>
      <td align="left">#authors.firstname#</td>
    </tr>
    </cfoutput>
    </table>

    </body>
    </html>

Several attributes must be used to have the <cfhttp> tag read the text file and create a query object. Setting the name attribute to the desired variable name indicates that you want the file pointed to by the URL attribute to be converted into a query object. In the example in Listing 21.4, the query object is called authors.

The only requirements of the text file are that the values are delimited and that the text values are qualified. The delimiter attribute specifies the value that separates the text values. The default is a comma (,), which also happens to be the most common. The typical filename extension for a comma-separated file is .csv. Because the text values can contain the delimiter character, they need to be surrounded by some type of text qualifier. The textQualifier attribute is used to specify the value or values that surround all the text values. The default is a double quotation mark (").

By default, the first row of the text file is treated as the column headers, even if it does not contain any. The firstRowAsHeaders attribute signals whether the first row should be used to determine the headers for the query. If it is set to false and no columns attribute is supplied, the query object is created with column names that follow a column_x pattern. To set your own column headers, use the columns attribute to specify the names of the columns in the text file. The columns attribute must contain a comma-separated list of column headers in the same sequence as the columns in the text file, with one header for each column of data. In Listing 21.4, we do not need to specify the columns because our text file has the column names in the first line. We do not need to specify firstRowAsHeaders either, since it defaults to true.
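For a text file that has no header row, the attributes described above might be combined as in the sketch below. The file authors_noheader.txt is hypothetical, as is the assumption that its columns appear in last name, first name order; getBaseURL() is the UDF used in this chapter's listings.

```cfml
<!--- Sketch only: authors_noheader.txt is a hypothetical file
      whose first line is data rather than column names --->
<cfhttp method="get"
        url="#getBaseURL()#/authors_noheader.txt"
        name="authors"
        delimiter=","
        textQualifier=""""
        columns="lastname,firstname"
        firstRowAsHeaders="no">

<!--- The supplied column names are now available on the query --->
<cfoutput query="authors">
  #authors.lastname#, #authors.firstname#<br>
</cfoutput>
```

Setting firstRowAsHeaders to no is what keeps the first line of the file as a data row; the columns attribute only supplies the names.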

Immediately after the <cfhttp> tag executes, a query object is available for manipulation. Figure 21.3 shows the output from this example.

Figure 21.3. This output is the result of a query created using the `<cfhttp>` tag.

To summarize, the <cfhttp> tag uses the following guidelines when processing text files:

The name attribute specifies the name of the query object that is created by ColdFusion.
A delimiter is specified with the delimiter attribute. If the delimiter is contained within a field in the file, it must be quoted using the character specified in the textQualifier attribute.
The first row of the text file is interpreted as the column headers by default. You can override the header names by using the columns attribute; however, the first row is still ignored. The only exception is when the firstRowAsHeaders attribute is set to false, in which case the first row is treated as data.
When ColdFusion encounters duplicate column names, it adds an underscore (_) character to the duplicate column name to make it unique.

Using the `POST` Method

The POST method provides a way of interacting with other servers by letting you pass a wide variety of information for processing. Although the GET method does allow you to pass information as part of the URL's query string, it limits the type and quantity of information that can be passed to the server. The POST method, in contrast, enables you to create much richer interactive portals that feed both behind-the-scene agents as well as end users.

NOTE

Information passed through the POST method is embedded into the body of the HTTP request, whereas information passed through the GET method is embedded into the URL. Both forms will pass information, but the POST method is more structured and robust.

Eight types of variables can be passed through a POST method: URL, CGI, COOKIE, FORM, FILE, XML, HEADER, and BODY. The code in Listing 21.5 demonstrates the passing of most of these types of data. Note that when passing a file through <cfhttpparam>, instead of specifying the value attribute, you specify the file attribute, which contains the name of the file to be uploaded.

There is no restriction on the type of page the <cfhttp> tag can request. It can be another ColdFusion page, an ASP (Active Server Page), a PHP page, or any other valid Web page. The variables passed are exposed exactly as if a browser were passing them. Because both CGI and URL variables can be passed in this manner, take care that you don't create a duplicate variable. Creating a duplicate variable overwrites the original value or appends the value into a string, depending on how the server handles the HTTP packet that is generated. As a general rule, never pass URL parameters through the URL attribute of the <cfhttp> tag; pass them only through <cfhttpparam>.
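The general rule about URL parameters can be illustrated with a short sketch. The target page search.cfm and the parameter names are hypothetical; the point is only that the query string is built from <cfhttpparam> tags instead of being typed into the url attribute.

```cfml
<!--- Sketch only: search.cfm and its parameters are hypothetical --->
<cfhttp method="get" url="#getBaseURL()#/search.cfm">
  <!--- Each URL variable is declared once, so duplicates are easy to spot --->
  <cfhttpparam type="url" name="keyword" value="ColdFusion">
  <cfhttpparam type="url" name="maxrows" value="10">
</cfhttp>

<cfoutput>
#cfhttp.fileContent#
</cfoutput>
```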

Listing 21.5. `dopost.cfm`: Using `<cfhttp>` with the `POST` Method

    <!---
      Filename: dopost.cfm
      Purpose: Do a Post
    --->

    <!--- get the base URL using our UDF --->
    <cfset theURL = getBaseURL()>

    <!--- add in the template that handles the post --->
    <cfset theURL = theURL & "/dopostrequest.cfm">

    <cfhttp method="POST" url="#theURL#">
      <cfhttpparam name="form_test" type="FormField" value="This is a form variable.">
      <cfhttpparam name="url_test" type="URL" value="This is a URL variable.">
      <cfhttpparam name="cgi_test" type="CGI" value="This is a CGI variable.">
      <cfhttpparam name="cookie_test" type="Cookie" value="This is a cookie.">
      <cfhttpparam name="filename" type="FILE"
                   file="#getDirectoryFromPath(getCurrentTemplatePath())#excite_logo.gif">
      <cfhttpparam name="user-agent" type="header" value="FakeIE">
    </cfhttp>

    <cfoutput>
    #cfhttp.filecontent#
    </cfoutput>

As you can see, the code is pretty simple. The information is passed to the dopostrequest.cfm template. The code for this template is in Listing 21.6, and the results of the page are shown in Figure 21.4.

Figure 21.4. The `<cfhttp>` tag using the `POST` method produces this output.

In Listing 21.6, the getHTTPRequestData() function is used to view the contents of the HTTP request data. This function returns a structure that describes and exposes the entire HTTP request packet. The content variable contains all the information passed in the body of the request packet in its native form. Because this example passes a file, this variable is transmitted in a binary format. To work with this binary value, use the toString() function to convert it to a string. Because the function provides access to the full packet that makes up the HTTP request, custom header information can be pulled out and used for items such as authentication or message routing.
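As a small sketch of pulling a custom header out of the request, a receiving template could inspect the headers key of the structure returned by getHTTPRequestData(). The header name X-Agent-Key is hypothetical; a calling template would send it with a <cfhttpparam> of type header.

```cfml
<!--- Sketch only: X-Agent-Key is a hypothetical custom header --->
<cfset req = getHTTPRequestData()>

<cfif structKeyExists(req.headers, "X-Agent-Key")>
  <cfset agentKey = req.headers["X-Agent-Key"]>
<cfelse>
  <!--- The caller did not send the header --->
  <cfset agentKey = "">
</cfif>

<cfoutput>
<p>Method: #req.method#</p>
<p>Agent key: #htmlEditFormat(agentKey)#</p>
</cfoutput>
```

A real agent would compare the key against a value agreed on with the caller instead of displaying it.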

Listing 21.6. `dopostrequest.cfm`: A Template That Processes the `<cfhttp> POST` Method Variables

    <!---
      Filename: dopostrequest.cfm
      Purpose: Handle a post request
    --->
    <html>
    <head>
      <title>CFHTTP Post Test</title>
    </head>
    <body>
      <cfoutput>
      The following variables were POSTED here.
      <p>
      Form_Test: #Form.form_test#<br>
      URL_Test: #URL.url_test#<br>
      CGI_Test: #CGI.cgi_test#<br>
      Cookie_Test: #COOKIE.cookie_test#<br>
      FileName: #form.filename#<br>
      </cfoutput>
      <p/>
      The HTTP Request Data is the following:
      <cfdump var="#GetHttpRequestData()#">
    </body>
    </html>

Creating Intelligent Agents with `<cfhttp>`

Now that you have experience with the basic features of the <cfhttp> tag, it's time to build your first intelligent agents. In this chapter we will discuss three types of agents. The first agent goes to barnesandnoble.com and requests a list of books written by this book's lead author, Ben Forta. This agent demonstrates how to interact with another site's functionality without modifying the results. The second agent searches for authors from your local site. In this example we'll interact with an actual agent that is expecting requests. The last example creates an agent that modifies external information for its own needs.

Regardless of the premise of the agent you build, you must address two issues. One is how it will interact with the other servers and applications. The second is how you work with the result of the server communication, realizing that each of the back-end communications can result in different formats.

Server Interaction with Intelligent Agents

An agent can make a request to any other Web server and for any page. In some cases, the agent will make a request to a site that is expecting agents to make requests, and other times the requests are made to a page that was created for its corresponding form page only. The challenge when requesting a page that is not set up for your agents is that your code must change constantly to ensure that you are passing the correct values and requesting the correct page. (The Web Services concept was set up to provide a way to minimize this impact.)

The first agent you create passes information from your local server to the book search engine at www.barnesandnoble.com and then displays the results. The search engine is not aware of our agent.

Because the site isn't expecting any agents, its owners will undoubtedly change the search page whenever it suits the needs of barnesandnoble.com. This situation illustrates the main problem with such agents: the target page can change its required variables at any time and therefore break the application.

However, we are not daunted. The first step in creating this agent is to go out to the book search engine's form page, http://search.barnesandnoble.com/booksearch/search.asp, and view the source. The goal is to understand what page does the actual querying and what values it expects. With the source of the page exposed, we note the form variables it expects (ATH, for the author) and the search-processing file (http://search.barnesandnoble.com/booksearch/results.asp).

With these values we can create a ColdFusion template that interacts with the search-processing page. The results from the search query are loaded into the CFHTTP.FileContent variable. Listing 21.7 shows the code used for this agent. For this example, the author name has been hard-coded, but this can quickly be adapted to take an author from a form value or database field.

Listing 21.7. `searchbn.cfm`: Passing Information to the www.barnesandnoble.com Book Search Engine

    <!---
      Filename: searchbn.cfm
      Purpose: Search barnesandnoble.com
    --->
    <cfhttp method="get"
            url="http://search.barnesandnoble.com/bookSearch/results.asp"
            resolveURL="YES" redirect="yes"
            userAgent="Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"
            timeout="10">
      <cfhttpparam name="ath" type="formfield" value="BEN FORTA">
    </cfhttp>

    <cfoutput>
    #CFHTTP.fileContent#
    </cfoutput>

When using the <cfhttp> tag to extract HTML from remote Web servers that don't expect agents, you should exercise caution, from an intellectual property perspective. The code in Listing 21.7 is simple, yet it demonstrates the power of the <cfhttp> tag. By researching the form fields necessary to drive search engines, you can add a powerful function to your ColdFusion templates.

The redirect, useragent, and timeout attributes of the <cfhttp> tag can be of key importance when you need to communicate with the outside world. As servers face an increasingly intense bombardment of requests from search engines, many sites are starting to filter requests based on the setting of the User-Agent value of the header.

When the requests are made through the <cfhttp> tag, the default user agent is either the name of the Java JVM you are using or ColdFusion. By providing the userAgent attribute, we can mask our request to look like it came from a different source (in our example, a Mozilla-compliant browser). Also, in response to constant changes in site structures, another popular approach has been redirecting requests from an old page location to the new location of the page. If redirect is set to TRUE, the request will flow through various redirects until it finds the necessary page. Otherwise, it will cause an error.

The redirects that are accomplished under the single <cfhttp> tag can be found in the CFHTTP structure on the responseHeader[LOCATION] key. This will give you access to where the request was routed. The <cfhttp> tag allows a maximum of only four redirects for a given request.
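To see where a request is being routed, one option is to turn off automatic redirection and inspect the response yourself. The sketch below assumes a hypothetical URL that answers with a redirect, and it assumes the server returns a single Location header.

```cfml
<!--- Sketch only: the URL is hypothetical and assumed to redirect --->
<cfhttp method="get" url="http://www.example.com/old-page.html"
        redirect="no" timeout="10">

<cfoutput>
<p>Status: #htmlEditFormat(cfhttp.statusCode)#</p>
<cfif structKeyExists(cfhttp.responseHeader, "Location")>
  <!--- The Location header names the page we were sent to --->
  <p>Redirected to: #htmlEditFormat(cfhttp.responseHeader["Location"])#</p>
<cfelse>
  <p>No Location header was returned.</p>
</cfif>
</cfoutput>
```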

The last attribute to focus on is timeout. Because the requests are going external to our application, we lose control of the performance of each request. So if a given request is slow, the performance of our application is affected. The timeout attribute determines how long, in seconds, ColdFusion should wait before it terminates the request.
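A minimal sketch of guarding an external request with a timeout follows. It assumes a hypothetical URL and uses throwOnError so that a failed request raises an exception that <cftry> can catch; the messages shown are illustrative only.

```cfml
<!--- Sketch only: the URL is hypothetical --->
<cftry>
  <!--- Give up after 10 seconds rather than stalling the whole page --->
  <cfhttp method="get" url="http://www.example.com/"
          timeout="10" throwOnError="yes">

  <cfoutput><p>Request completed: #htmlEditFormat(cfhttp.statusCode)#</p></cfoutput>

  <cfcatch type="any">
    <!--- Fall back gracefully when the remote server is slow or unavailable --->
    <cfoutput><p>Request failed: #htmlEditFormat(cfcatch.message)#</p></cfoutput>
  </cfcatch>
</cftry>
```

Catching the failure lets the page show a fallback message instead of leaving the user waiting on a remote server we do not control.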

NOTE

You can set the timeout for a given request in the URL, the <cfhttp> tag, or the ColdFusion Administrator. These are the rules for how a request timeout is determined: The URL variable requesttimeout can be used to set the maximum time, in seconds, that the request can take. If a timeout attribute is also specified in the <cfhttp> tag, ColdFusion uses the lesser of the two values. If no URL variable is specified, ColdFusion uses the lesser of the setting in the ColdFusion Administrator and the timeout attribute of the <cfhttp> tag.

If no timeout is set in the URL, the <cfhttp> tag, or the ColdFusion Administrator, ColdFusion processes requests synchronously, meaning that ColdFusion waits indefinitely for cfhttp requests to process.

This raises one more point of interest. When any timeout value is set, ColdFusion creates a separate thread to process the new HTTP request. Thus your single request to the ColdFusion page turns into two requests in ColdFusion. No timeout means that the same request does all the work.

Note that you must enable the timeout set in the ColdFusion Administrator in order for the ColdFusion Administrator timeout and the URL timeout to take effect.

Another use of this technique would be to find local Social Security offices, weather, stocks, and so on. Figure 21.5 shows the output from this example.

Figure 21.5. The Barnes & Noble book search engine, with data provided by the `<cfhttp>` tag, produces this output.

The results passed back from this example are in HTML. This poses no problem if the results are going to be embedded directly into a page, but it makes separating the data elements from the visual elements virtually impossible. Since the data returned to our agents will more than likely be used outside of the visual representation that the other site needs, this is a real limitation. Bringing these back-end HTTP agents to another level requires either a heavy amount of parsing or having the data returned in a standard format, such as XML or WDDX. We will look at both these options over the next two examples.

CAUTION

It is important to emphasize that using any sort of parsing to extract data or reformat an HTML document is not recommended. This technique is commonly referred to as screen scraping. It is risky because of the danger of change in the returned HTML. Any parsing mechanism relies on some type of pattern or fixed HTML being in place. If this changes, the whole parsing routine may have to change.

Another problem with screen scraping has to do with the need to avoid changing other parts of the HTML document (such as JavaScript) that are necessary for the HTML to work correctly. Removing either the JavaScript or the element that it needs can cause the resulting HTML page to throw errors. XML is the solution for worry-free passing of data back to an agent.

Interacting with Planned Agents

The second agent we're studying in this section acts against an author search engine created just for agents. These types of interactions are the most stable because the parameters used don't frequently change. Because these pages are not visible by browsing the Web site, you must have an arrangement with the particular Web site so that you know the locations and the parameters necessary for processing files.

The author search agent is pretty straightforward: It displays a self-posting form that collects the desired first and last names to search for. The specified first and last names are then passed to the author search processing page. Unlike the preceding example, in which the content was passed back in HTML form, the information returned by a planned agent tends to be more structured.

In this example, the resulting feedback is an XML document. Just as when you are trying to tap into an external site, the prerequisite for starting to construct the integration is to understand exactly where the agent is located and what is expected. Here the agent is looking for an author's first and last name. Listing 21.8 shows the code used in this second agent. It should be noted that in our examples we use several CGI variables that may or may not be available depending upon which Web server we are using.

Listing 21.8. `AuthorSearch.cfm`: Passing Information to the Author Search Processing Page

    <!---
      Filename: AuthorSearch.cfm
      Purpose: Search against an XML file.
    --->
    <cfif isDefined("form.search")>
      <!--- Get authors by first and last name --->
      <cfhttp url="#getBaseURL()#/authorSearchPort.cfm" method="POST">
        <cfhttpparam name="firstname" type="formfield" value="#FORM.firstname#">
        <cfhttpparam name="lastname" type="formfield" value="#FORM.lastname#">
      </cfhttp>
      <cfset results = xmlParse(trim(cfhttp.FileContent))>
      <html>
      <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <title>Author Results</title>
      </head>
      <body>
      <h2>Author Results</h2>
      <hr>
      <cfif structKeyExists(results.Authors, "name")>
        <cfloop from="1" to="#arrayLen(Results.Authors.name)#" index="i">
        <table>
        <tr>
        <th align="left">Author Name:</th>
        <cfoutput><td>#Results.Authors.name[i].lname.XmlText#,
        #Results.Authors.name[i].fname.XmlText#</td></cfoutput>
        </tr>
        </table>
        <hr>
        </cfloop>
      </cfif>
      </body>
      </html>
    <cfelse>
      <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
      <html>
      <head>
      <title>Author Search</title>
      </head>
      <body>
      <h1>Author Search Form</h1>
      <cfoutput>
      <form action="#cgi.path_info#" method="post">
      </cfoutput>
      <table>
      <tr>
      <td>First Name:</td>
      <td><input type="text" name="firstName" size="40"></td>
      </tr>
      <tr>
      <td>Last Name:</td>
      <td><input type="text" name="lastName" size="40"></td>
      </tr>
      <tr>
      <td></td>
      <td><input type="submit" name="search" value="Search"></td>
      </tr>
      </table>
      </form>
      </body>
      </html>
    </cfif>

Figure 21.6 shows the results of the search for "Raymond Camden." The file with which Listing 21.8 interacts, authorSearchPort.cfm, is included on this book's CD. This document receives the form data and searches against a static XML document, storedauthors.xml, also included on the CD. What's nice is that the remote client, Listing 21.8, doesn't even need to care. All it needs to know is what to send and what to expect back.
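Since the client depends entirely on what it expects back, it can be worth confirming that the response really is XML before parsing it. The sketch below is one way to do that with isXML(); the hard-coded search values are placeholders, and getBaseURL() and authorSearchPort.cfm are the helper and page used by Listing 21.8.

```cfml
<!--- Sketch only: the search values are placeholders --->
<cfhttp url="#getBaseURL()#/authorSearchPort.cfm" method="POST" timeout="10">
  <cfhttpparam name="firstname" type="formfield" value="Raymond">
  <cfhttpparam name="lastname" type="formfield" value="Camden">
</cfhttp>

<cfset raw = trim(cfhttp.fileContent)>

<cfif isXML(raw)>
  <!--- Safe to parse: the agent returned well-formed XML --->
  <cfset results = xmlParse(raw)>
  <cfdump var="#results#">
<cfelse>
  <!--- An error page or empty response came back instead of XML --->
  <p>The author search agent did not return valid XML.</p>
</cfif>
```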

Figure 21.6. A search for "Raymond Camden" based on data provided by the `<cfhttp>` tag produced this output.

Summarizing the `<cfhttp>` Tag

The preceding examples showed how to use the <cfhttp> tag to interact with remote Web servers. The capability to create queries using text files demonstrates the power of data sharing as well as exposes a different method of receiving data and processing it using ColdFusion. To create intelligent agents, you must build upon the server interaction capabilities of the <cfhttp> tag to pull information and use it for internal processing. With the ability to upload and download files, and interaction with CGI applications such as search engines or other ColdFusion templates, <cfhttp> provides yet more tools to use during your application design.