Making URL Connections


In the last section, you saw how to use socket-level programming to connect to an SMTP server and send an e-mail message. It is nice to know that this can be done and to get a glimpse of what goes on "under the hood" of an Internet service such as e-mail. However, if you are planning an application that incorporates e-mail, you will probably want to work at a higher level and use a library that encapsulates the protocol details. For example, Sun Microsystems has developed the JavaMail API as a standard extension of the Java platform. In the JavaMail API, you simply issue a call such as

 Transport.send(message); 

to send a message. The library takes care of message protocols, multiple recipients, handling attachments, and so on.

For the remainder of this chapter, we concentrate on higher-level services that the standard edition of the Java platform provides. Of course, the runtime library uses sockets to implement these services, but you don't have to worry about the protocol details when you use the higher-level services.

URLs and URIs

The URL and URLConnection classes encapsulate much of the complexity of retrieving information from a remote site. You can construct a URL object from a string:


URL url = new URL(urlString);

If you simply want to fetch the contents of the resource, then you can use the openStream method of the URL class. This method yields an InputStream object. Use it in the usual way, for example, to construct a Scanner:

 InputStream inStream = url.openStream();
 Scanner in = new Scanner(inStream);
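As a minimal sketch of this idiom, the following program reads every line of a resource through openStream. So that it can run without network access, it writes a temporary file and reads it back through a file: URL. The class name OpenStreamDemo, the helper readLines, and the sample text are illustrative assumptions, not part of the library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class OpenStreamDemo
{
   /** Reads all lines of the resource that the URL locates (hypothetical helper). */
   public static List<String> readLines(URL url) throws IOException
   {
      List<String> lines = new ArrayList<String>();
      // openStream yields the resource data as an InputStream
      try (InputStream inStream = url.openStream();
           Scanner in = new Scanner(inStream, "UTF-8"))
      {
         while (in.hasNextLine()) lines.add(in.nextLine());
      }
      return lines;
   }

   public static void main(String[] args) throws IOException
   {
      // A temporary file stands in for a remote resource (assumption for offline use)
      Path temp = Files.createTempFile("openstream", ".txt");
      Files.write(temp, "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
      URL url = temp.toUri().toURL();
      for (String line : readLines(url)) System.out.println(line);
      Files.delete(temp);
   }
}
```

The same readLines call should work with an http: URL, provided the host is reachable.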

As of JDK 1.4, the java.net package makes a useful distinction between URLs (uniform resource locators) and URIs (uniform resource identifiers).

A URI is a purely syntactical construct that specifies the various parts of the string specifying a web resource. A URL is a special kind of URI, namely, one with sufficient information to locate a resource. Other URIs, such as

 mailto:cay@horstmann.com 

are not locators; there is no data to locate from this identifier. Such a URI is called a URN (uniform resource name).

In the Java library, the URI class has no methods for accessing the resource that the identifier specifies; its sole purpose is parsing. In contrast, the URL class can open a stream to the resource. For that reason, the URL class only works with schemes that the Java library knows how to handle, such as http:, https:, ftp:, the local file system (file:), and JAR files (jar:).

To see why parsing is not trivial, consider how complex URIs can be. For example,

 http://maps.yahoo.com/py/maps.py?csz=Cupertino+CA
 ftp://username:password@ftp.yourserver.com/pub/file.txt

The URI specification gives rules for the makeup of these identifiers. A URI has the syntax


[scheme:]schemeSpecificPart[#fragment]

Here, the [ . . . ] denotes an optional part, and the : and # are included literally in the identifier.

If the scheme: part is present, the URI is called absolute. Otherwise, it is called relative.

An absolute URI is opaque if the schemeSpecificPart does not begin with a /, such as

 mailto:cay@horstmann.com 

All absolute nonopaque URIs and all relative URIs are hierarchical. Examples are

 http://java.sun.com/index.html
 ../../java/net/Socket.html#Socket()

The schemeSpecificPart of a hierarchical URI has the structure


[//authority][path][?query]

where again [ . . . ] denotes optional parts.

For server-based URIs, the authority part has the form


[user-info@]host[:port]

The port must be an integer.

RFC 2396, which standardizes URIs, also supports a registry-based mechanism by which the authority has a different format, but it is not in common use.

One of the purposes of the URI class is to parse an identifier and break it up into its various components. You can retrieve them with the methods

 getScheme
 getSchemeSpecificPart
 getAuthority
 getUserInfo
 getHost
 getPort
 getPath
 getQuery
 getFragment
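To make the parsing concrete, here is a small sketch that breaks up the sample identifiers from this section with these getter methods. The class name URIPartsDemo and the helper describe are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class URIPartsDemo
{
   /** Lists the components of a URI, one per line (hypothetical helper). */
   public static String describe(String s) throws URISyntaxException
   {
      URI uri = new URI(s);
      StringBuilder b = new StringBuilder();
      b.append("scheme: ").append(uri.getScheme()).append('\n');
      b.append("schemeSpecificPart: ").append(uri.getSchemeSpecificPart()).append('\n');
      b.append("authority: ").append(uri.getAuthority()).append('\n');
      b.append("userInfo: ").append(uri.getUserInfo()).append('\n');
      b.append("host: ").append(uri.getHost()).append('\n');
      // getPort returns -1 when the identifier does not specify a port
      b.append("port: ").append(uri.getPort()).append('\n');
      b.append("path: ").append(uri.getPath()).append('\n');
      b.append("query: ").append(uri.getQuery()).append('\n');
      b.append("fragment: ").append(uri.getFragment()).append('\n');
      return b.toString();
   }

   public static void main(String[] args) throws URISyntaxException
   {
      System.out.println(describe("http://maps.yahoo.com/py/maps.py?csz=Cupertino+CA"));
      System.out.println(describe("ftp://username:password@ftp.yourserver.com/pub/file.txt"));
      // An opaque URI has no authority, host, or path; those getters return null
      System.out.println(describe("mailto:cay@horstmann.com"));
   }
}
```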

The other purpose of the URI class is the handling of absolute and relative identifiers. If you have an absolute URI such as

http://docs.mycompany.com/api/java/net/ServerSocket.html

and a relative URI such as

 ../../java/net/Socket.html#Socket() 

then you can combine the two into an absolute URI.

http://docs.mycompany.com/api/java/net/Socket.html#Socket()

This process is called resolving a relative URL.

The opposite process is called relativization. For example, suppose you have a base URI

http://docs.mycompany.com/api

and a URI

http://docs.mycompany.com/api/java/lang/String.html

Then the relativized URI is

 java/lang/String.html 

The URI class supports both of these operations:

 relative = base.relativize(combined);
 combined = base.resolve(relative);
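The two operations can be tried out with the sample identifiers from the text. This is only a sketch; the class name URIResolveDemo and its two helper methods are hypothetical.

```java
import java.net.URI;

public class URIResolveDemo
{
   /** Resolves a relative identifier against a base (hypothetical helper). */
   public static String resolve(String base, String relative)
   {
      return URI.create(base).resolve(URI.create(relative)).toString();
   }

   /** Relativizes an identifier against a base (hypothetical helper). */
   public static String relativize(String base, String combined)
   {
      return URI.create(base).relativize(URI.create(combined)).toString();
   }

   public static void main(String[] args)
   {
      // Resolution: the ../.. steps are applied to the directory of the base
      System.out.println(resolve(
         "http://docs.mycompany.com/api/java/net/ServerSocket.html",
         "../../java/net/Socket.html#Socket()"));

      // Relativization: the base path is removed from the front of the target
      System.out.println(relativize(
         "http://docs.mycompany.com/api",
         "http://docs.mycompany.com/api/java/lang/String.html"));
   }
}
```

Note that resolve discards the last path segment of the base when that segment does not end in a slash, so a base that names a directory should usually end with / if you intend to resolve relative identifiers against it.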

Using a URLConnection to Retrieve Information

If you want additional information about a web resource, then you should use the URLConnection class, which gives you much more control than the basic URL class.

When working with a URLConnection object, you must carefully schedule your steps, as follows:

1. Call the openConnection method of the URL class to obtain the URLConnection object:

 URLConnection connection = url.openConnection(); 

2. Set any request properties, using the methods

 setDoInput
 setDoOutput
 setIfModifiedSince
 setUseCaches
 setAllowUserInteraction
 setRequestProperty
 setConnectTimeout
 setReadTimeout

We discuss these methods later in this section and in the API notes.

3. Connect to the remote resource by calling the connect method.

 connection.connect(); 

Besides making a socket connection to the server, this method also queries the server for header information.

4. After connecting to the server, you can query the header information. Two methods, getHeaderFieldKey and getHeaderField, enumerate all fields of the header. As of JDK 1.4, the method getHeaderFields gets a standard Map object containing the header fields. For your convenience, the following methods query standard fields.

 getContentType
 getContentLength
 getContentEncoding
 getDate
 getExpiration
 getLastModified

5. Finally, you can access the resource data. Use the getInputStream method to obtain an input stream for reading the information. (This is the same input stream that the openStream method of the URL class returns.) The other method, getContent, isn't very useful in practice. The objects that are returned by standard content types such as text/plain and image/gif require classes in the com.sun hierarchy for processing. You could register your own content handlers, but we do not discuss that technique in this book.
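The five steps can be sketched in a single method as follows. The class name ConnectionStepsDemo, the helper fetchFirstLine, and the timeout values are assumptions made for illustration. The main method uses a temporary file and a file: URL so that the sketch runs without a network; with an http: URL the same steps apply.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class ConnectionStepsDemo
{
   /** Follows the five steps and returns the first line of the resource. */
   public static String fetchFirstLine(URL url) throws IOException
   {
      // Step 1: obtain the URLConnection object
      URLConnection connection = url.openConnection();

      // Step 2: set request properties before connecting (values are arbitrary)
      connection.setConnectTimeout(5000);
      connection.setReadTimeout(5000);
      connection.setUseCaches(false);

      // Step 3: connect to the resource
      connection.connect();

      // Step 4: query the header information
      Map<String, List<String>> headers = connection.getHeaderFields();
      System.out.println("header fields: " + headers.keySet());
      System.out.println("content type: " + connection.getContentType());

      // Step 5: read the resource data
      try (InputStream inStream = connection.getInputStream();
           Scanner in = new Scanner(inStream, "UTF-8"))
      {
         return in.hasNextLine() ? in.nextLine() : "";
      }
   }

   public static void main(String[] args) throws IOException
   {
      // A temporary file stands in for a remote resource (assumption for offline use)
      Path temp = Files.createTempFile("steps", ".txt");
      Files.write(temp, "hello from a URL\n".getBytes(StandardCharsets.UTF_8));
      System.out.println(fetchFirstLine(temp.toUri().toURL()));
      Files.delete(temp);
   }
}
```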

CAUTION

Some programmers form the wrong mental image when using the URLConnection class, thinking that the getInputStream and getOutputStream methods are similar to those of the Socket class. But that isn't quite true. The URLConnection class does quite a bit of magic behind the scenes, in particular the handling of request and response headers. For that reason, it is important that you follow the setup steps for the connection.


Let us now look at some of the URLConnection methods in detail. Several methods set properties of the connection before connecting to the server. The most important ones are setDoInput and setDoOutput. By default, the connection yields an input stream for reading from the server but no output stream for writing. If you want an output stream (for example, for posting data to a web server), then you need to call

 connection.setDoOutput(true); 

Next, you may want to set some of the request headers. The request headers are sent together with the request command to the server. Here is an example:

 GET www.server.com/index.html HTTP/1.0
 Referer: http://www.somewhere.com/links.html
 Proxy-Connection: Keep-Alive
 User-Agent: Mozilla/4.76 (Windows ME; U) [en]
 Host: www.server.com
 Accept: text/html, image/gif, image/jpeg, image/png, */*
 Accept-Language: en
 Accept-Charset: iso-8859-1,*,utf-8
 Cookie: orangemilano=192218887821987

The setIfModifiedSince method tells the connection that you are only interested in data that have been modified since a certain date. The setUseCaches and setAllowUserInteraction methods are used only inside applets. The setUseCaches method directs the browser to first check the browser cache. The setAllowUserInteraction method allows an applet to pop up a dialog box for querying the user name and password for password-protected resources (see Figure 3-7). These settings have no effect outside of applets.

Figure 3-7. A network password dialog box


Finally, you can use the catch-all setRequestProperty method to set any name/value pair that is meaningful for the particular protocol. For the format of the HTTP request headers, see RFC 2616. Some of these parameters are not well documented and are passed around by word of mouth from one programmer to the next. For example, if you want to access a password-protected web page, you must do the following:

  1. Concatenate the user name, a colon, and the password.

     String input = username + ":" + password; 

  2. Compute the base64 encoding of the resulting string. (The base64 encoding encodes a sequence of bytes into a sequence of printable ASCII characters.)

     String encoding = base64Encode(input); 

  3. Call the setRequestProperty method with a name of "Authorization" and value "Basic " + encoding:

     connection.setRequestProperty("Authorization", "Basic " + encoding); 
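On Java 8 and later, which postdates the JDK release discussed here, the standard java.util.Base64 class can carry out step 2. The following sketch assumes such a runtime; the class name BasicAuthDemo, the helper basicAuthValue, and the sample credentials are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthDemo
{
   /** Builds the value of the Authorization header for HTTP basic authentication. */
   public static String basicAuthValue(String username, String password)
   {
      // Step 1: user name, colon, password
      String input = username + ":" + password;
      // Step 2: base64 encoding (java.util.Base64 requires Java 8 or later)
      String encoding = Base64.getEncoder()
         .encodeToString(input.getBytes(StandardCharsets.UTF_8));
      // Step 3: the value to pass to setRequestProperty("Authorization", ...)
      return "Basic " + encoding;
   }

   public static void main(String[] args)
   {
      System.out.println(basicAuthValue("user", "password"));
   }
}
```

The result would then be supplied as connection.setRequestProperty("Authorization", basicAuthValue(username, password)) before connecting.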

TIP

You just saw how to access a password-protected web page. To access a password-protected file by FTP, you use an entirely different method. You simply construct a URL of the form

 ftp://username:password@ftp.yourserver.com/pub/file.txt 


Once you call the connect method, you can query the response header information. First, let us see how to enumerate all response header fields. The implementors of this class felt a need to express their individuality by introducing yet another iteration protocol. The call

 String key = connection.getHeaderFieldKey(n); 

gets the nth key from the response header, where n starts from 1! It returns null if n is zero or larger than the total number of header fields. There is no method to return the number of fields; you simply keep calling getHeaderFieldKey until you get null. Similarly, the call

 String value = connection.getHeaderField(n); 

returns the nth value.
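This iteration protocol can be sketched as a loop that starts at 1 and stops at the first null key. The class name HeaderLoopDemo and the helper listHeaders are hypothetical; the default host name is a placeholder that you need to replace with a reachable server.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.List;

public class HeaderLoopDemo
{
   /** Collects all response header fields as "key: value" strings. */
   public static List<String> listHeaders(URLConnection connection)
   {
      List<String> result = new ArrayList<String>();
      // n starts from 1; there is no method that returns the number of fields
      for (int n = 1; ; n++)
      {
         String key = connection.getHeaderFieldKey(n);
         if (key == null) break; // past the last header field
         result.add(key + ": " + connection.getHeaderField(n));
      }
      return result;
   }

   public static void main(String[] args) throws IOException
   {
      // Placeholder host; pass a real URL on the command line
      String urlName = args.length > 0 ? args[0] : "http://www.yourserver.com";
      URLConnection connection = new URL(urlName).openConnection();
      connection.connect();
      for (String line : listHeaders(connection)) System.out.println(line);
   }
}
```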

Mercifully, as of JDK 1.4, the method getHeaderFields returns a Map of response header fields that you can access as explained in Chapter 2.

 Map<String,List<String>> headerFields = connection.getHeaderFields(); 

Here is a set of response header fields from a typical HTTP request.

 Date: Wed, 29 Aug 2004 00:15:48 GMT
 Server: Apache/1.3.31 (Unix)
 Last-Modified: Sun, 24 Jun 2004 20:53:38 GMT
 Accept-Ranges: bytes
 Content-Length: 4813
 Connection: close
 Content-Type: text/html

As a convenience, six methods query the values of the most common header types and convert them to numeric types when appropriate. Table 3-1 shows these convenience methods. The methods with return type long return the number of milliseconds since January 1, 1970 GMT.

Table 3-1. Convenience Methods for Response Header Values

Key Name            Method Name           Return Type
Date                getDate               long
Expires             getExpiration         long
Last-Modified       getLastModified       long
Content-Length      getContentLength      int
Content-Type        getContentType        String
Content-Encoding    getContentEncoding    String
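The long values are rarely useful in raw form. According to the API documentation, getDate, getExpiration, and getLastModified return milliseconds since the epoch, or 0 if the header is not known. The following sketch converts such a value into a readable timestamp; the class name HeaderDateDemo and the helper describe are hypothetical, and java.time requires Java 8 or later.

```java
import java.time.Instant;

public class HeaderDateDemo
{
   /** Turns a value such as connection.getLastModified() into readable text. */
   public static String describe(long millis)
   {
      // The convenience methods return 0 when the header field is absent
      if (millis == 0) return "unknown";
      return Instant.ofEpochMilli(millis).toString();
   }

   public static void main(String[] args)
   {
      System.out.println(describe(0L));
      System.out.println(describe(System.currentTimeMillis()));
   }
}
```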


The program in Example 3-5 lets you experiment with URL connections. Supply a URL and an optional user name and password on the command line when running the program, for example:

 java URLConnectionTest http://www.yourserver.com user password 

The program prints

  • All keys and values of the header;

  • The return values of the six convenience methods in Table 3-1;

  • The first 10 lines of the requested resource.

The program is straightforward, except for the computation of the base64 encoding. There is an undocumented class, sun.misc.BASE64Encoder, that you can use instead of the one that we provide in the example program. Simply replace the call to base64Encode with

 String encoding = new sun.misc.BASE64Encoder().encode(input.getBytes()); 

However, we supplied our own class because we do not like to rely on the classes in the sun or com.sun packages.

NOTE

The javax.mail.internet.MimeUtility class in the JavaMail standard extension package also has a method for Base64 encoding. JDK 1.4 has a class java.util.prefs.Base64 for the same purpose, but it is not public, so you cannot use it in your code.


Example 3-5. URLConnectionTest.java
import java.io.*;
import java.net.*;
import java.util.*;

/**
   This program connects to a URL and displays the
   response header data and the first 10 lines of the
   requested data.

   Supply the URL and an optional username and password (for
   HTTP basic authentication) on the command line.
*/
public class URLConnectionTest
{
   public static void main(String[] args)
   {
      try
      {
         String urlName;
         if (args.length > 0)
            urlName = args[0];
         else
            urlName = "http://java.sun.com";

         URL url = new URL(urlName);
         URLConnection connection = url.openConnection();

         // set username, password if specified on command line

         if (args.length > 2)
         {
            String username = args[1];
            String password = args[2];
            String input = username + ":" + password;
            String encoding = base64Encode(input);
            connection.setRequestProperty("Authorization", "Basic " + encoding);
         }

         connection.connect();

         // print header fields

         Map<String, List<String>> headers = connection.getHeaderFields();
         for (Map.Entry<String, List<String>> entry : headers.entrySet())
         {
            String key = entry.getKey();
            for (String value : entry.getValue())
               System.out.println(key + ": " + value);
         }

         // print convenience functions

         System.out.println("----------");
         System.out.println("getContentType: " + connection.getContentType());
         System.out.println("getContentLength: " + connection.getContentLength());
         System.out.println("getContentEncoding: " + connection.getContentEncoding());
         System.out.println("getDate: " + connection.getDate());
         System.out.println("getExpiration: " + connection.getExpiration());
         System.out.println("getLastModified: " + connection.getLastModified());
         System.out.println("----------");

         Scanner in = new Scanner(connection.getInputStream());

         // print first ten lines of contents

         for (int n = 1; in.hasNextLine() && n <= 10; n++)
            System.out.println(in.nextLine());
         if (in.hasNextLine()) System.out.println(". . .");
      }
      catch (IOException e)
      {
         e.printStackTrace();
      }
   }

   /**
      Computes the Base64 encoding of a string
      @param s a string
      @return the Base 64 encoding of s
   */
   public static String base64Encode(String s)
   {
      ByteArrayOutputStream bOut = new ByteArrayOutputStream();
      Base64OutputStream out = new Base64OutputStream(bOut);
      try
      {
         out.write(s.getBytes());
         out.flush();
      }
      catch (IOException e)
      {
         // not expected when writing to a ByteArrayOutputStream
      }
      return bOut.toString();
   }
}

/**
   This stream filter converts a stream of bytes to their
   Base64 encoding.

   Base64 encoding encodes 3 bytes into 4 characters.
   |11111122|22223333|33444444|
   Each set of 6 bits is encoded according to the
   toBase64 map. If the number of input bytes is not
   a multiple of 3, then the last group of 4 characters
   is padded with one or two = signs. Each output line
   is at most 76 characters.
*/
class Base64OutputStream extends FilterOutputStream
{
   /**
      Constructs the stream filter
      @param out the stream to filter
   */
   public Base64OutputStream(OutputStream out)
   {
      super(out);
   }

   public void write(int c) throws IOException
   {
      inbuf[i] = c;
      i++;
      if (i == 3)
      {
         // a full group of 3 bytes yields 4 output characters
         super.write(toBase64[(inbuf[0] & 0xFC) >> 2]);
         super.write(toBase64[((inbuf[0] & 0x03) << 4) | ((inbuf[1] & 0xF0) >> 4)]);
         super.write(toBase64[((inbuf[1] & 0x0F) << 2) | ((inbuf[2] & 0xC0) >> 6)]);
         super.write(toBase64[inbuf[2] & 0x3F]);
         col += 4;
         i = 0;
         if (col >= 76)
         {
            super.write('\n');
            col = 0;
         }
      }
   }

   public void flush() throws IOException
   {
      // encode the 1 or 2 leftover bytes and pad with = signs
      if (i == 1)
      {
         super.write(toBase64[(inbuf[0] & 0xFC) >> 2]);
         super.write(toBase64[(inbuf[0] & 0x03) << 4]);
         super.write('=');
         super.write('=');
      }
      else if (i == 2)
      {
         super.write(toBase64[(inbuf[0] & 0xFC) >> 2]);
         super.write(toBase64[((inbuf[0] & 0x03) << 4) | ((inbuf[1] & 0xF0) >> 4)]);
         super.write(toBase64[(inbuf[1] & 0x0F) << 2]);
         super.write('=');
      }
      i = 0; // so that a second flush does not emit the padding again
      super.flush();
   }

   private static char[] toBase64 =
   {
      'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
      'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
      'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
      'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
   };

   private int col = 0;
   private int i = 0;
   private int[] inbuf = new int[3];
}

NOTE

A commonly asked question is whether the Java platform supports access of secure web pages (https: URLs). As of JDK 1.4, SSL support is a part of the standard library. Before JDK 1.4, you were only able to make SSL connections from applets by taking advantage of the SSL implementation of the browser.



 java.net.URL 1.0 

  • InputStream openStream()

    opens an input stream for reading the resource data.

  • URLConnection openConnection()

    returns a URLConnection object that manages the connection to the resource.


 java.net.URLConnection 1.0 

  • void setDoInput(boolean doInput)

  • boolean getDoInput()

    If doInput is true, then the user can receive input from this URLConnection.

  • void setDoOutput(boolean doOutput)

  • boolean getDoOutput()

    If doOutput is true, then the user can send output to this URLConnection.

  • void setIfModifiedSince(long time)

  • long getIfModifiedSince()

    The ifModifiedSince property configures this URLConnection to fetch only data that have been modified since a given time. The time is given in milliseconds from midnight, GMT, January 1, 1970.

  • void setUseCaches(boolean useCaches)

  • boolean getUseCaches()

    If useCaches is true, then data can be retrieved from a local cache. Note that the URLConnection itself does not maintain such a cache. The cache must be supplied by an external program such as a browser.

  • void setAllowUserInteraction(boolean allowUserInteraction)

  • boolean getAllowUserInteraction()

    If allowUserInteraction is true, then the user can be queried for passwords. Note that the URLConnection itself has no facilities for executing such a query. The query must be carried out by an external program such as a browser or browser plug-in.

  • void setConnectTimeout(int timeout) 5.0

  • int getConnectTimeout() 5.0

    set or get the timeout for the connection (in milliseconds). If the timeout has elapsed before a connection was established, the connect method throws a SocketTimeoutException.

  • void setReadTimeout(int timeout) 5.0

  • int getReadTimeout() 5.0

    set or get the timeout for reading data (in milliseconds). If the timeout has elapsed before a read operation was successful, the read method of the associated input stream throws a SocketTimeoutException.

  • void setRequestProperty(String key, String value)

    sets a request header field.

  • Map<String,List<String>> getRequestProperties() 1.4

    returns a map of request properties. All values for the same key are placed in a list.

  • void connect()

    connects to the remote resource and retrieves response header information.

  • Map<String,List<String>> getHeaderFields() 1.4

    returns a map of response headers. All values for the same key are placed in a list.

  • String getHeaderFieldKey(int n)

    gets the key for the nth response header field, or null if n is 0 or larger than the number of response header fields.

  • String getHeaderField(int n)

    gets the value of the nth response header field, or null if n is 0 or larger than the number of response header fields.

  • int getContentLength()

    gets the content length if available, or -1 if unknown.

  • String getContentType()

    gets the content type, such as text/plain or image/gif.

  • String getContentEncoding()

    gets the content encoding, such as gzip. This value is not commonly used, because the default identity encoding is not supposed to be specified with a Content-Encoding header.

  • long getDate()

  • long getExpiration()

  • long getLastModified()

    get the date of creation, expiration, and last modification of the resource. The dates are specified as milliseconds from midnight, GMT, January 1, 1970.

  • InputStream getInputStream()

  • OutputStream getOutputStream()

    return a stream for reading from the resource or writing to the resource.

  • Object getContent()

    selects the appropriate content handler to read the resource data and convert it into an object. This method is not useful for reading standard types such as text/plain or image/gif unless you install your own content handler.

Posting Form Data

In the preceding section, you saw how to read data from a web server. Now we will show you how your programs can send data back to a web server and to programs that the web server invokes.

To send information from a web browser to the web server, a user fills out a form, like the one in Figure 3-8.

Figure 3-8. An HTML form


When the user clicks the Submit button, the text in the text fields and the settings of the checkboxes and radio buttons are sent back to the web server. The web server invokes a program that processes the user input.

Many technologies enable web servers to invoke programs. Among the best known ones are Java servlets, JavaServer Faces, Microsoft ASP (Active Server Pages), and CGI (Common Gateway Interface) scripts. For simplicity, we use the generic term script for a server-side program, no matter what technology is used.

The server-side script processes the form data and produces another HTML page that the web server sends back to the browser. This sequence is illustrated in Figure 3-9. The response page can contain new information (for example, in an information-search program) or just an acknowledgment. The web browser then displays the response page.

Figure 3-9. Data flow during execution of a server-side script


We do not discuss the implementation of server-side scripts in this book. Our interest is merely in writing client programs that interact with existing server-side scripts.

When form data are sent to a web server, it does not matter whether the data are interpreted by a servlet, a CGI script, or some other server-side technology. The client sends the data to the web server in a standard format, and the web server takes care of passing them on to the program that generates the response.

Two commands, called GET and POST, are commonly used to send information to a web server.

In the GET command, you simply attach parameters to the end of the URL. The URL has the form


http://host/script?parameters

For example, at the time of this writing, the Yahoo! web site has a script, py/maps.py, at the host maps.yahoo.com. The script requires two parameters, addr and csz. You separate the parameters by an & and encode the parameters, using the following scheme.

Replace all spaces with a +. Replace all nonalphanumeric characters by a %, followed by a two-digit hexadecimal number. For example, to transmit the street name S. Main, you use S%2e+Main, since the hexadecimal number 2e (or decimal 46) is the ASCII code of the "." character. This encoding keeps any intermediate programs from messing with spaces and interpreting other special characters. This encoding scheme is called URL encoding.

For example, to get a map of 1 Infinite Loop, Cupertino, CA, simply request the following URL:

http://maps.yahoo.com/py/maps.py?addr=1+Infinite+Loop&csz=Cupertino+CA
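The java.net.URLEncoder class carries out this encoding. The following sketch assembles the map URL from unencoded parameter values; the class name GetUrlDemo and the helper mapUrl are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class GetUrlDemo
{
   /** Builds the GET URL for the map script from unencoded parameter values. */
   public static String mapUrl(String addr, String csz)
      throws UnsupportedEncodingException
   {
      // Each value is encoded separately; the & separator is added afterwards
      return "http://maps.yahoo.com/py/maps.py"
         + "?addr=" + URLEncoder.encode(addr, "UTF-8")
         + "&csz=" + URLEncoder.encode(csz, "UTF-8");
   }

   public static void main(String[] args) throws UnsupportedEncodingException
   {
      System.out.println(mapUrl("1 Infinite Loop", "Cupertino CA"));
      // Characters such as & and = inside a value must be escaped
      System.out.println(URLEncoder.encode("a&b=c", "UTF-8"));
   }
}
```

Note that URLEncoder leaves letters, digits, and the characters . - * _ unchanged, so it encodes S. Main as S.+Main rather than S%2e+Main. Both forms decode to the same string.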

The GET command is simple, but it has a major limitation that makes it relatively unpopular: Most browsers have a limit on the number of characters that you can include in a GET request.

In the POST command, you do not attach parameters to a URL. Instead, you get an output stream from the URLConnection and write name/value pairs to the output stream. You still have to URL-encode the values and separate them with & characters.

Let us look at this process in more detail. To post data to a script, you first establish a URLConnection.


URL url = new URL("http://host/script");
URLConnection connection = url.openConnection();

Then, you call the setDoOutput method to set up the connection for output.

 connection.setDoOutput(true); 

Next, you call getOutputStream to get a stream through which you can send data to the server. If you are sending text to the server, it is convenient to wrap that stream into a PrintWriter.

 PrintWriter out = new PrintWriter(connection.getOutputStream()); 

Now you are ready to send data to the server:

 out.print(name1 + "=" + URLEncoder.encode(value1, "UTF-8") + "&");
 out.print(name2 + "=" + URLEncoder.encode(value2, "UTF-8"));

Close the output stream.

 out.close(); 

Finally, call getInputStream and read the server response.
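Put together, the steps might look like the following sketch. The class name PostSketch, the helpers formBody and post, and the field names name1 and name2 are placeholders. The main method only contacts a server if you supply a script URL on the command line.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class PostSketch
{
   /** Joins name/value pairs into name1=value1&name2=value2 form, URL-encoding the values. */
   public static String formBody(Map<String, String> pairs)
      throws UnsupportedEncodingException
   {
      StringBuilder body = new StringBuilder();
      for (Map.Entry<String, String> pair : pairs.entrySet())
      {
         if (body.length() > 0) body.append('&');
         body.append(pair.getKey());
         body.append('=');
         body.append(URLEncoder.encode(pair.getValue(), "UTF-8"));
      }
      return body.toString();
   }

   /** Posts the pairs to the script and returns the server response. */
   public static String post(String urlString, Map<String, String> pairs)
      throws IOException
   {
      URLConnection connection = new URL(urlString).openConnection();
      connection.setDoOutput(true); // must be set before getOutputStream
      try (PrintWriter out = new PrintWriter(connection.getOutputStream()))
      {
         out.print(formBody(pairs));
      } // closing the writer closes the output stream
      StringBuilder response = new StringBuilder();
      try (Scanner in = new Scanner(connection.getInputStream()))
      {
         while (in.hasNextLine()) response.append(in.nextLine()).append('\n');
      }
      return response.toString();
   }

   public static void main(String[] args) throws IOException
   {
      // LinkedHashMap keeps the pairs in insertion order
      Map<String, String> pairs = new LinkedHashMap<String, String>();
      pairs.put("name1", "value 1");
      pairs.put("name2", "a&b");
      System.out.println(formBody(pairs));
      if (args.length > 0) System.out.println(post(args[0], pairs));
   }
}
```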

Let us run through a practical example. The web site at http://www.census.gov/ipc/www/idbprint.html contains a form to request population data (see Figure 3-8). If you look at the HTML source, you will see the following HTML tag:

 <form method=post action="/cgi-bin/ipc/idbsprd"> 

This tag means that the name of the script executed when the user clicks the Submit button is /cgi-bin/ipc/idbsprd and that you need to use the POST command to send data to the script.

Next, you need to find out the field names that the script expects. Look at the user interface components. Each of them has a name attribute, for example,


<select name="tbl" size=8>
<option value="001">001 Total Midyear Population</option>
more options . . .
</select>

This tells you that the name of the field is tbl. This field specifies the population table type. If you specify the table type 001, you will get a table of the total midyear population. If you look further, you will also find a country field name cty with values such as US for the United States and CH (!) for China. (Sadly, the Census Bureau seems to be unaware of the ISO-3166 standard for country codes.)

Finally, a field named optyr allows selection of the year range. For this example, we will just set it to latest checked. For example, to get the latest data for the total midyear population of China, you construct this string:

 tbl=001&cty=CH&optyr=latest+checked

Send the string to the URL

http://www.census.gov/cgi-bin/ipc/idbsprd

The script sends back the following reply:

 <PRE>
 U.S. Bureau of the Census, International Data Base
 Table 001. Total Midyear Population
 ---------------- --------------------
 Country or area/
 Year                       Population
 ---------------- --------------------
 China
 2004                    1,298,847,624
 ---------------- --------------------
 Source: U.S. Bureau of the Census, International
         Data Base.
 </PRE>

As you can see, this particular script doesn't bother with constructing a pretty table. That is the reason we picked it as an example: it is easy to see what happens with this script, whereas it can be confusing to decipher a complex set of HTML tags that other scripts produce.

The program in Example 3-6 sends POST data to the census bureau. We provide a simple GUI to pick a country and view the report (see Figure 3-10).

Figure 3-10. Harvesting information from a server


In the doPost method, we first open the connection, call setDoOutput(true), and open the output stream. We then enumerate the names and values in a Map object. For each of them, we send the name, = character, value, and & separator character:


out.print(name);
out.print('=');
out.print(URLEncoder.encode(value, "UTF-8"));
if (more pairs) out.print('&');

Finally, we read the response from the server.

There is one twist with reading the response. If a script error occurs, then the call to connection.getInputStream() throws a FileNotFoundException. However, the server still sends an error page back to the browser (such as the ubiquitous "Error 404 - page not found"). To capture this error page, you cast the URLConnection object to the HttpURLConnection class and call its getErrorStream method:

 InputStream err = ((HttpURLConnection) connection).getErrorStream(); 

The technique that this program displays is useful whenever you need to query information from an existing web site. Simply find out the parameters that you need to send (usually by inspecting the HTML source of a web page that carries out the same query), and then strip out the HTML tags and other unnecessary information from the reply.

Example 3-6. PostTest.java


import java.io.*;
import java.net.*;
import java.util.*;
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;

/**
   This program demonstrates how to use the URLConnection class for a POST request.
*/
public class PostTest
{
   public static void main(String[] args)
   {
      JFrame frame = new PostTestFrame();
      frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      frame.setVisible(true);
   }
}

class PostTestFrame extends JFrame
{
   /**
      Makes a POST request and returns the server response.
      @param urlString the URL to post to
      @param nameValuePairs a map of name/value pairs to supply in the request.
      @return the server reply (either from the input stream or the error stream)
   */
   public static String doPost(String urlString, Map<String, String> nameValuePairs)
      throws IOException
   {
      URL url = new URL(urlString);
      URLConnection connection = url.openConnection();
      connection.setDoOutput(true);

      PrintWriter out = new PrintWriter(connection.getOutputStream());

      // write the pairs as name=value, separated by &
      boolean first = true;
      for (Map.Entry<String, String> pair : nameValuePairs.entrySet())
      {
         if (first) first = false;
         else out.print('&');
         String name = pair.getKey();
         String value = pair.getValue();
         out.print(name);
         out.print('=');
         out.print(URLEncoder.encode(value, "UTF-8"));
      }

      out.close();

      Scanner in;
      StringBuilder response = new StringBuilder();
      try
      {
         in = new Scanner(connection.getInputStream());
      }
      catch (IOException e)
      {
         // on a script error, read the error page from the error stream instead
         if (!(connection instanceof HttpURLConnection)) throw e;
         InputStream err
            = ((HttpURLConnection) connection).getErrorStream();
         if (err == null) throw e;
         in = new Scanner(err);
      }
      while (in.hasNextLine())
      {
         response.append(in.nextLine());
         response.append("\n");
      }

      in.close();
      return response.toString();
   }

   public PostTestFrame()
   {
      setSize(DEFAULT_WIDTH, DEFAULT_HEIGHT);
      setTitle("PostTest");

      JPanel northPanel = new JPanel();
      add(northPanel, BorderLayout.NORTH);

      // the countries array alternates country names and country codes
      final JComboBox combo = new JComboBox();
      for (int i = 0; i < countries.length; i += 2)
         combo.addItem(countries[i]);
      northPanel.add(combo);

      final JTextArea result = new JTextArea();
      add(new JScrollPane(result));

      JButton getButton = new JButton("Get");
      northPanel.add(getButton);
      getButton.addActionListener(new
         ActionListener()
         {
            public void actionPerformed(ActionEvent event)
            {
               new Thread(new
                  Runnable()
                  {
                     public void run()
                     {
                        final String SERVER_URL = "http://www.census.gov/cgi-bin/ipc/idbsprd";
                        result.setText("");
                        Map<String, String> post = new HashMap<String, String>();
                        post.put("tbl", "001");
                        post.put("cty", countries[2 * combo.getSelectedIndex() + 1]);
                        post.put("optyr", "latest checked");
                        try
                        {
                           result.setText(doPost(SERVER_URL, post));
                        }
                        catch (IOException e)
                        {
                           result.setText("" + e);
                        }
                     }
                  }).start();
            }
         });
   }

   private static String[] countries = {
      "Afghanistan", "AF", "Albania", "AL", "Algeria", "AG", "American Samoa", "AQ",
      "Andorra", "AN", "Angola", "AO", "Anguilla", "AV", "Antigua and Barbuda", "AC",
      "Argentina", "AR", "Armenia", "AM", "Aruba", "AA", "Australia", "AS", "Austria", "AU",
      "Azerbaijan", "AJ", "Bahamas, The", "BF", "Bahrain", "BA", "Bangladesh", "BG",
      "Barbados", "BB", "Belarus", "BO", "Belgium", "BE", "Belize", "BH", "Benin", "BN",
      "Bermuda", "BD", "Bhutan", "BT", "Bolivia", "BL", "Bosnia and Herzegovina", "BK",
      "Botswana", "BC", "Brazil", "BR", "Brunei", "BX", "Bulgaria", "BU", "Burkina Faso", "UV",
      "Burma", "BM", "Burundi", "BY", "Cambodia", "CB", "Cameroon", "CM", "Canada", "CA",
      "Cape Verde", "CV", "Cayman Islands", "CJ", "Central African Republic", "CT", "Chad", "CD",
      "Chile", "CI", "China", "CH", "Colombia", "CO", "Comoros", "CN", "Congo (Brazzaville)", "CF",
      "Congo (Kinshasa)", "CG", "Cook Islands", "CW", "Costa Rica", "CS", "Cote d'Ivoire", "IV",
      "Croatia", "HR", "Cuba", "CU", "Cyprus", "CY", "Czech Republic", "EZ", "Denmark", "DA",
      "Djibouti", "DJ", "Dominica", "DO", "Dominican Republic", "DR", "East Timor", "TT",
      "Ecuador", "EC", "Egypt", "EG", "El Salvador", "ES", "Equatorial Guinea", "EK",
      "Eritrea", "ER", "Estonia", "EN", "Ethiopia", "ET", "Faroe Islands", "FO", "Fiji", "FJ",
      "Finland", "FI", "France", "FR", "French Guiana", "FG", "French Polynesia", "FP",
      "Gabon", "GB", "Gambia, The", "GA", "Gaza Strip", "GZ", "Georgia", "GG", "Germany", "GM",
"Ghana", "GH", "Gibraltar", "GI", "Greece", "GR", "Greenland", "GL", "Grenada",  "GJ", 143.       "Guadeloupe", "GP", "Guam", "GQ", "Guatemala", "GT", "Guernsey", "GK", "Guinea" , "GV", 144.       "Guinea-Bissau", "PU", "Guyana", "GY", "Haiti", "HA", "Honduras", "HO", 145.       "Hong Kong S.A.R", "HK", "Hungary", "HU", "Iceland", "IC", "India", "IN",  "Indonesia", "ID", 146.       "Iran", "IR", "Iraq", "IZ", "Ireland", "EI", "Israel", "IS", "Italy", "IT",  "Jamaica", "JM", 147.       "Japan", "JA", "Jersey", "JE", "Jordan", "JO", "Kazakhstan", "KZ", "Kenya", "KE", 148.       "Kiribati", "KR", "Korea, North", "KN", "Korea, South", "KS", "Kuwait", "KU", 149.       "Kyrgyzstan", "KG", "Laos", "LA", "Latvia", "LG", "Lebanon", "LE", "Lesotho", "LT", 150.       "Liberia", "LI", "Libya", "LY", "Liechtenstein", "LS", "Lithuania", "LH",  "Luxembourg", "LU", 151.       "Macau S.A.R", "MC", "Macedonia, The Former Yugo. Rep. of", "MK", "Madagascar",  "MA", 152.       "Malawi", "MI", "Malaysia", "MY", "Maldives", "MV", "Mali", "ML", "Malta", "MT", 153.       "Man, Isle of", "IM", "Marshall Islands", "RM", "Martinique", "MB",  "Mauritania", "MR", 154.       "Mauritius", "MP", "Mayotte", "MF", "Mexico", "MX", "Micronesia, Federated  States of", "FM", 155.       "Moldova", "MD", "Monaco", "MN", "Mongolia", "MG", "Montserrat", "MH",  "Morocco", "MO", 156.       "Mozambique", "MZ", "Namibia", "WA", "Nauru", "NR", "Nepal", "NP",  "Netherlands", "NL", 157.       "Netherlands Antilles", "NT", "New Caledonia", "NC", "New Zealand", "NZ",  "Nicaragua", "NU", 158.       "Niger", "NG", "Nigeria", "NI", "Northern Mariana Islands", "CQ", "Norway", "NO", 159.       "Oman", "MU", "Pakistan", "PK", "Palau", "PS", "Panama", "PM", "Papua New  Guinea", "PP", 160.       "Paraguay", "PA", "Peru", "PE", "Philippines", "RP", "Poland", "PL", "Portugal" , "PO", 161.       "Puerto Rico", "RQ", "Qatar", "QA", "Reunion", "RE", "Romania", "RO", "Russia",  "RS", 162.       
"Rwanda", "RW", "Saint Helena", "SH", "Saint Kitts and Nevis", "SC", "Saint  Lucia", "ST", 163.       "Saint Pierre and Miquelon", "SB", "Saint Vincent and the Grenadines", "VC",  "Samoa", "WS", 164.       "San Marino", "SM", "Sao Tome and Principe", "TP", "Saudi Arabia", "SA",  "Senegal", "SG", 165.       "Serbia and Montenegro", "YI", "Seychelles", "SE", "Sierra Leone", "SL",  "Singapore", "SN", 166.       "Slovakia", "LO", "Slovenia", "SI", "Solomon Islands", "BP", "Somalia", "SO", 167.       "South Africa", "SF", "Spain", "SP", "Sri Lanka", "CE", "Sudan", "SU",  "Suriname", "NS", 168.       "Swaziland", "WZ", "Sweden", "SW", "Switzerland", "SZ", "Syria", "SY", "Taiwan" , "TW", 169.       "Tajikistan", "TI", "Tanzania", "TZ", "Thailand", "TH", "Togo", "TO", "Tonga",  "TN", 170.       "Trinidad and Tobago", "TD", "Tunisia", "TS", "Turkey", "TU", "Turkmenistan", "TX", 171.       "Turks and Caicos Islands", "TK", "Tuvalu", "TV", "Uganda", "UG", "Ukraine", "UP", 172.       "United Arab Emirates", "TC", "United Kingdom", "UK", "United States", "US",  "Uruguay", "UY", 173.       "Uzbekistan", "UZ", "Vanuatu", "NH", "Venezuela", "VE", "Vietnam", "VM", 174.       "Virgin Islands", "VQ", "Virgin Islands, British", "VI", "Wallis and Futuna", "WF", 175.       "West Bank", "WE", "Western Sahara", "WI", "Yemen", "YM", "Zambia", "ZA",  "Zimbabwe", "ZI" 176.    }; 177. 178.    public static final int DEFAULT_WIDTH = 400; 179.    public static final int DEFAULT_HEIGHT = 300; 180. } 

Our example program uses the URLConnection class to post data to a web site. More for curiosity's sake than for practical use, you may like to know exactly what information the URLConnection sends to the server in addition to the data that you supply.

The URLConnection object first sends a request header to the server. When you post form data, the header must include

 Content-Type: application/x-www-form-urlencoded 

You must also specify the content length, for example,

 Content-Length: 124 

The end of the header is indicated by a blank line. Then, the data portion follows. The web server strips off the header and routes the data portion to the server-side script.

Note that the URLConnection object buffers all data that you send to the output stream since it must first determine the total content length.
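
To make the header and data portions concrete, here is a minimal sketch that builds the form data for three name/value pairs and prints the two header lines discussed above, followed by the blank line and the data. The class name PostRequestSketch and its helper methods are made up for this illustration. A real request carries further lines, such as the request line and a Host header, which the URLConnection generates for you and which the sketch leaves out. Unlike the doPost method of the example program, the sketch URL-encodes the names as well as the values.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration; it is not part of the java.net API.
class PostRequestSketch
{
   /** URL-encodes a string as UTF-8 without a checked exception. */
   static String encode(String s)
   {
      try
      {
         return URLEncoder.encode(s, "UTF-8");
      }
      catch (UnsupportedEncodingException e)
      {
         // Every Java platform is required to support UTF-8.
         throw new IllegalStateException(e);
      }
   }

   /** Builds the data portion: name1=value1&name2=value2... */
   static String formBody(Map<String, String> pairs)
   {
      StringBuilder body = new StringBuilder();
      for (Map.Entry<String, String> pair : pairs.entrySet())
      {
         if (body.length() > 0) body.append('&');
         body.append(encode(pair.getKey()));
         body.append('=');
         body.append(encode(pair.getValue()));
      }
      return body.toString();
   }

   /** Builds the two header lines from the text, ended by a blank line. */
   static String headerFor(String body)
   {
      // The content length counts bytes, not characters.
      int length = body.getBytes(StandardCharsets.UTF_8).length;
      return "Content-Type: application/x-www-form-urlencoded\r\n"
         + "Content-Length: " + length + "\r\n"
         + "\r\n";
   }

   public static void main(String[] args)
   {
      // LinkedHashMap keeps the pairs in insertion order.
      Map<String, String> post = new LinkedHashMap<String, String>();
      post.put("tbl", "001");
      post.put("cty", "AF");
      post.put("optyr", "latest checked");

      String body = formBody(post);
      System.out.print(headerFor(body));
      System.out.println(body);
   }
}
```

The content length can only be computed once the entire data portion is known, which is why the data is buffered before anything is sent.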


 java.net.HttpURLConnection 1.1 

  • InputStream getErrorStream()

    returns a stream from which you can read web server error messages.


 java.net.URLEncoder 1.0 

  • static String encode(String s, String encoding) 1.4

    returns the URL-encoded form of the string s, using the given character encoding scheme. (The recommended scheme is "UTF-8".) In URL encoding, the characters 'A' through 'Z', 'a' through 'z', '0' through '9', '-', '_', '.', and '*' are left unchanged. A space is encoded as '+', and all other characters are encoded as sequences of encoded bytes of the form "%XY", where 0xXY is the hexadecimal value of the byte.
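
As a quick illustration of these rules, the following sketch encodes a few sample strings. The class name EncodeSketch and the sample inputs are chosen for this example only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; only URLEncoder.encode comes from the library.
class EncodeSketch
{
   /** URL-encodes a string as UTF-8 without a checked exception. */
   static String encode(String s)
   {
      try
      {
         return URLEncoder.encode(s, "UTF-8");
      }
      catch (UnsupportedEncodingException e)
      {
         // Every Java platform is required to support UTF-8.
         throw new IllegalStateException(e);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(encode("latest checked")); // space becomes '+'
      System.out.println(encode("a-b_c.d*e"));      // left unchanged
      System.out.println(encode("Cote d'Ivoire"));  // apostrophe becomes %27
      System.out.println(encode("a&b=c"));          // '&' and '=' become %26 and %3D
      System.out.println(encode("\u00e9"));         // e with acute accent: two UTF-8 bytes, %C3%A9
   }
}
```

Note that the characters '&' and '=' must be encoded when they occur inside a name or value, because the unencoded characters separate the pairs in the posted data.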


 java.net.URLDecoder 1.2 

  • static String decode(String s, String encoding) 1.4

    returns the decoded form of the URL-encoded string s, using the given character encoding scheme.
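
Decoding is what the receiving side does with posted form data. The following sketch splits a data portion into name/value pairs and decodes each part. The class DecodeSketch and its parseForm method are hypothetical helpers written for this illustration; an actual server-side framework would normally do this work for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; only URLDecoder.decode comes from the library.
class DecodeSketch
{
   /** URL-decodes a string as UTF-8 without a checked exception. */
   static String decode(String s)
   {
      try
      {
         return URLDecoder.decode(s, "UTF-8");
      }
      catch (UnsupportedEncodingException e)
      {
         // Every Java platform is required to support UTF-8.
         throw new IllegalStateException(e);
      }
   }

   /** Splits name1=value1&name2=value2... and decodes names and values. */
   static Map<String, String> parseForm(String body)
   {
      Map<String, String> result = new LinkedHashMap<String, String>();
      if (body.isEmpty()) return result;
      for (String pair : body.split("&"))
      {
         int eq = pair.indexOf('=');
         // A pair without '=' is treated as a name with an empty value.
         String name = eq < 0 ? pair : pair.substring(0, eq);
         String value = eq < 0 ? "" : pair.substring(eq + 1);
         result.put(decode(name), decode(value));
      }
      return result;
   }

   public static void main(String[] args)
   {
      Map<String, String> form = parseForm("tbl=001&cty=AF&optyr=latest+checked");
      for (Map.Entry<String, String> entry : form.entrySet())
         System.out.println(entry.getKey() + " -> " + entry.getValue());
   }
}
```

The splitting must happen before the decoding. Otherwise, an encoded '&' or '=' inside a value would be mistaken for a separator.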



    Core Java™ 2 Volume II - Advanced Features