7.5 Communicating with Server-Side Programs Through GET | Java Network Programming, Third Edition

The URL class makes it easy for Java applets and applications to communicate with server-side programs such as CGIs, servlets, PHP pages, and others that use the GET method. (Server-side programs that use the POST method require the URLConnection class and are discussed in Chapter 15.) All you need to know is what combination of names and values the program expects to receive; then cook up a URL with a query string that provides the requisite names and values. All names and values must be x-www-form-url-encoded, as by the URLEncoder.encode() method discussed earlier in this chapter.
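To make the encoding requirement concrete, here is a minimal sketch that runs a few sample strings through URLEncoder. It assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload, which avoids the checked exception thrown by the older encode(String, String) form. The class name EncodeDemo and the sample strings are purely illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeDemo {

  // Encodes one name or one value in x-www-form-urlencoded form, using UTF-8.
  public static String encode(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Spaces become "+"; "&", "=", and non-ASCII characters become %XX escapes.
    System.out.println(encode("Java Network Programming")); // Java+Network+Programming
    System.out.println(encode("a=b&c"));                    // a%3Db%26c
    System.out.println(encode("Caf\u00e9"));                // Caf%C3%A9
  }
}
```

Note that "&" and "=" must be escaped inside a name or value; otherwise the server would mistake them for the separators between pairs.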

There are a number of ways to determine the exact syntax for a query string that talks to a particular program. If you've written the server-side program yourself, you already know the name-value pairs it expects. If you've installed a third-party program on your own server, the documentation for that program should tell you what it expects.

On the other hand, if you're talking to a program on a third-party server, matters are a little trickier. You can always ask people at the remote server to provide you with the specifications for talking to their site. However, even if they don't mind doing this, there's probably no single person whose job description includes "telling third-party hackers with whom we have no business relationship exactly how to access our servers." Thus, unless you happen upon a particularly friendly or bored individual who has nothing better to do with their time except write long emails detailing exactly how to access their server, you're going to have to do a little reverse engineering.

This is beginning to change. A number of web sites have realized the value of opening up their systems to third-party developers and have begun publishing developers' kits that provide detailed information on how to construct URLs to access their services. Sites like Safari and Amazon that offer RESTful, URL-based interfaces are easily accessed through the URL class. SOAP-based services like eBay's and Google's are much more difficult to work with.

Many programs are designed to process form input. If this is the case, it's straightforward to figure out what input the program expects. The method the form uses should be the value of the METHOD attribute of the FORM element. This value should be either GET, in which case you use the process described here, or POST, in which case you use the process described in Chapter 15. The part of the URL that precedes the query string is given by the value of the ACTION attribute of the FORM element. Note that this may be a relative URL, in which case you'll need to determine the corresponding absolute URL. Finally, the names in the name-value pairs are simply the NAME attributes of the INPUT elements, except for any INPUT elements whose TYPE attribute has the value submit.
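Resolving a relative ACTION attribute is exactly what the URL(URL context, String spec) constructor does. The following is a minimal sketch; the class name ResolveAction and the example.com page and ACTION values are hypothetical, chosen only to show the three common cases.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ResolveAction {

  // Resolves a FORM's ACTION attribute against the URL of the page that contains the form.
  // An ACTION that is already absolute comes back unchanged.
  public static URL resolve(String pageUrl, String action) throws MalformedURLException {
    URL page = new URL(pageUrl);
    return new URL(page, action);
  }

  public static void main(String[] args) throws MalformedURLException {
    // Hypothetical page URL, for illustration only.
    String page = "http://www.example.com/forms/search.html";
    System.out.println(resolve(page, "/cgi-bin/search"));              // http://www.example.com/cgi-bin/search
    System.out.println(resolve(page, "lookup.php"));                   // http://www.example.com/forms/lookup.php
    System.out.println(resolve(page, "http://www.google.com/search")); // http://www.google.com/search
  }
}
```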

For example, consider this HTML form for the local search engine on my Cafe con Leche site. You can see that it uses the GET method. The program that processes the form is accessed via the URL http://www.google.com/search. The form has four separate name-value pairs, three of which have default values:

 <form name="search" action="http://www.google.com/search" method="get">
   <input name="q" />
   <input type="hidden" value="cafeconleche.org" name="domains" />
   <input type="hidden" name="sitesearch" value="cafeconleche.org" />
   <input type="hidden" name="sitesearch2" value="cafeconleche.org" />
   <br />
   <input type="image" height="22" width="55"
          src="images/search_blue.gif" alt="search" border="0"
          name="search-image" />
 </form>
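Put together, a URL equivalent to submitting this form can be assembled by hand: the ACTION value, a question mark, the encoded q field, and the three hidden fields copied verbatim. This is a sketch only; the class name CafeSearchUrl is hypothetical, it assumes Java 10 or later for URLEncoder.encode(String, Charset), and it leaves out the image button used to submit the form.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class CafeSearchUrl {

  // Builds the GET URL corresponding to the search form for a given search string.
  // Only the user-supplied q value needs encoding; the hidden values contain
  // no characters that URLEncoder would change.
  public static String searchUrl(String q) {
    String action = "http://www.google.com/search";
    return action
        + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
        + "&domains=cafeconleche.org"
        + "&sitesearch=cafeconleche.org"
        + "&sitesearch2=cafeconleche.org";
  }

  public static void main(String[] args) {
    System.out.println(searchUrl("XOM tutorial"));
  }
}
```

Running main prints http://www.google.com/search?q=XOM+tutorial&domains=cafeconleche.org&sitesearch=cafeconleche.org&sitesearch2=cafeconleche.org.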

The type of the INPUT field doesn't matter. For instance, it doesn't matter whether it's a set of checkboxes, a pop-up list, or a text field; only the name of each INPUT field and the value you give it are significant. The single exception is a submit input, which tells the web browser when to send the data but does not give the server any extra information. In some cases, you may find hidden INPUT fields that must have particular required default values. This form has three hidden INPUT fields.
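Because only names and values reach the server, text fields, hidden fields, and checkboxes all reduce to the same kind of pair. One wrinkle is that a group of checkboxes sharing a NAME contributes one pair per checked box, so the same name can appear more than once. The sketch below therefore keeps pairs in a List rather than a Map. The class name FormFields and the field names are hypothetical, and it assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class FormFields {

  // Joins name-value pairs into a query string, encoding each name and value.
  // A List (not a Map) allows repeated names, as with several checked checkboxes.
  public static String toQuery(List<String[]> pairs) {
    StringJoiner query = new StringJoiner("&");
    for (String[] pair : pairs) {
      query.add(URLEncoder.encode(pair[0], StandardCharsets.UTF_8)
          + "=" + URLEncoder.encode(pair[1], StandardCharsets.UTF_8));
    }
    return query.toString();
  }

  public static void main(String[] args) {
    // Hypothetical fields: a text field, a hidden field, and two checked checkboxes.
    List<String[]> pairs = List.of(
        new String[] {"name", "Ada Lovelace"},
        new String[] {"lang", "en"},
        new String[] {"topic", "sockets"},
        new String[] {"topic", "urls"});
    System.out.println(toQuery(pairs)); // name=Ada+Lovelace&lang=en&topic=sockets&topic=urls
  }
}
```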

In some cases, the program you're talking to may not be able to handle arbitrary text strings for values of particular inputs. However, since the form is meant to be read and filled in by human beings, it should provide sufficient clues to figure out what input is expected; for instance, that a particular field is supposed to be a two-letter state abbreviation or a phone number.

A program that doesn't respond to a form is much harder to reverse engineer. For example, at http://www.ibiblio.org/nywc/bios.phtml, you'll find a lot of links to PHP pages that talk to a database to retrieve a list of musical works by a particular composer. However, there's no form anywhere that corresponds to this program. It's all done by hardcoded URLs. In this case, the best you can do is look at as many of those URLs as possible and see whether you can guess what the server expects. If the designer hasn't tried to be too devious, this information isn't hard to figure out. For example, these URLs are all found on that page:

 http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Anderson&first=Beth&middle=
 http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Austin&first=Dorothea&middle=
 http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Bliss&first=Marilyn&middle=
 http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Hart&first=Jane&middle=Smith

Looking at these, you can guess that this particular program expects three inputs named first, middle, and last, with values that consist of the first, middle, and last names of a composer, respectively. Sometimes the inputs may not have such obvious names. In this case, you have to do some experimenting, first copying some existing values and then tweaking them to see what values are and aren't accepted. You don't need to do this in a Java program. You can simply edit the URL in the Address or Location bar of your web browser window.
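Editing the URL by hand is the quickest way to experiment, but when you have collected many sample URLs, a short program can list the parameter names they share before you start tweaking values. This sketch, with the hypothetical class name QueryNames, uses URL.getQuery() and a plain split on "&" and "="; it does not URL-decode the names, which is adequate for inspection but not for general query parsing.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.Set;

public class QueryNames {

  // Collects the parameter names used in the query strings of sample URLs,
  // in the order they are first seen. Names are left undecoded.
  public static Set<String> names(String... urls) throws MalformedURLException {
    Set<String> names = new LinkedHashSet<>();
    for (String s : urls) {
      String query = new URL(s).getQuery();
      if (query == null) continue;          // URL has no query string
      for (String pair : query.split("&")) {
        if (pair.isEmpty()) continue;       // skip stray separators
        int eq = pair.indexOf('=');
        names.add(eq >= 0 ? pair.substring(0, eq) : pair);
      }
    }
    return names;
  }

  public static void main(String[] args) throws MalformedURLException {
    System.out.println(names(
        "http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Anderson&first=Beth&middle=",
        "http://www.ibiblio.org/nywc/compositionsbycomposer.phtml?last=Hart&first=Jane&middle=Smith"));
    // prints [last, first, middle]
  }
}
```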

The likelihood that other hackers may experiment with your own server-side programs in such a fashion is a good reason to make them extremely robust against unexpected input.

Regardless of how you determine the set of name-value pairs the server expects, communicating with it once you know them is simple. All you have to do is create a query string that includes the necessary name-value pairs, then form a URL that includes that query string. Open that URL and read the server's response using the same methods you use to connect to a server and retrieve a static HTML page. There's no special protocol to follow once the URL is constructed. (There is a special protocol to follow for the POST method, however, which is why discussion of that method will have to wait until Chapter 15.)
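The whole procedure fits in two small methods: one builds the URL from name-value pairs, the other opens it and reads the response like any static page. This is a sketch under stated assumptions: the class name GetClient is hypothetical, it requires Java 10 or later, and fetch() assumes the response is UTF-8 rather than checking the Content-Type header. The main method reuses the composer parameters from above purely for illustration and needs network access to run.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class GetClient {

  // Appends an encoded query string built from the map to the base URL.
  public static String buildUrl(String base, Map<String, String> params) {
    StringJoiner query = new StringJoiner("&");
    for (Map.Entry<String, String> e : params.entrySet()) {
      query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
          + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return base + "?" + query;
  }

  // Opens the URL and reads the response as text, just as for a static page.
  // Assumes UTF-8; a more careful client would inspect the Content-Type header.
  public static String fetch(String url) throws IOException {
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = in.readLine()) != null) {
        sb.append(line).append('\n');
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) throws IOException {
    // LinkedHashMap keeps the pairs in insertion order.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("last", "Hart");
    params.put("first", "Jane");
    params.put("middle", "Smith");
    String url = buildUrl("http://www.ibiblio.org/nywc/compositionsbycomposer.phtml", params);
    System.out.println(url);
    System.out.print(fetch(url));
  }
}
```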

To demonstrate this procedure, let's write a very simple command-line program to look up topics in the Netscape Open Directory (http://dmoz.org/). This site, shown in Figure 7-3, has the advantage of being really simple.

Figure 7-3. The basic user interface for the Open Directory

The basic Open Directory interface is a simple form with one input field named search; input typed in this field is sent to a CGI program at http://search.dmoz.org/cgi-bin/search, which does the actual search. The HTML for the form looks like this:

 <form accept-charset="UTF-8"
       action="http://search.dmoz.org/cgi-bin/search" method="GET">
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
 <input size=30 name=search>
 <input type=submit value="Search">
 <a href="http://search.dmoz.org/cgi-bin/search?a.x=0">
 <small><i>advanced</i></small></a>
 </form>

There are only two input fields in this form: the Submit button and a text field named search. Thus, to submit a search request to the Open Directory, you just need to collect the search string, encode it in a query string, and send it to http://search.dmoz.org/cgi-bin/search. For example, to search for "java", you would open a connection to the URL http://search.dmoz.org/cgi-bin/search?search=java and read the resulting input stream. Example 7-12 does exactly this.

Example 7-12. Do an Open Directory search

 import com.macfaq.net.*;
 import java.net.*;
 import java.io.*;

 public class DMoz {

   public static void main(String[] args) {

     // Join the command-line arguments into a single search string.
     String target = "";
     for (int i = 0; i < args.length; i++) {
       target += args[i] + " ";
     }
     target = target.trim();

     // QueryString (package com.macfaq.net) x-www-form-url-encodes the name-value pair.
     QueryString query = new QueryString("search", target);

     try {
       URL u = new URL("http://search.dmoz.org/cgi-bin/search?" + query);
       // Read and echo the response exactly as for a static page;
       // try-with-resources closes the stream when reading finishes.
       try (Reader theHTML = new InputStreamReader(
               new BufferedInputStream(u.openStream()))) {
         int c;
         while ((c = theHTML.read()) != -1) {
           System.out.print((char) c);
         }
       }
     }
     catch (MalformedURLException ex) {
       System.err.println(ex);
     }
     catch (IOException ex) {
       System.err.println(ex);
     }
   }
 }

Of course, a lot more effort could be expended on parsing and displaying the results. But notice how little code it takes to talk to this server. Aside from the funky-looking URL and the slightly greater likelihood that some pieces of it need to be x-www-form-url-encoded, talking to a server-side program that uses GET is no harder than retrieving any other HTML page.