Using the URL Classes

   

Before jumping into the classes that the Java core API provides for dealing with URLs and the Web, you need to understand what a URL is.

What Are URLs?

The primary classification of URLs is the scheme, which usually corresponds to an application protocol. Schemes include HTTP, FTP, Telnet, and Gopher. The rest of the URL syntax is in a format that depends on the scheme. A colon separates these two portions of information:

 scheme-name:scheme-info 

Thus, while mailto:chuckcavaness@yahoo.com indicates "send mail to user 'chuckcavaness' at the machine yahoo.com," ftp://chuckcavaness@foobar.org means "open an FTP connection to foobar.org and log in as user chuckcavaness."

Although IP addresses uniquely identify systems on the Internet, and ports identify TCP or UDP services on a system, URLs provide a universal identification scheme at the application level. Anyone who has used a Web browser is familiar with URLs; however, their complete syntax might not be self-evident. URLs were developed to create a common format for identifying resources on the Web, but they were designed to be general enough to encompass applications that predated the Web by decades. Similarly, the URL syntax is flexible enough to accommodate future protocols.

Most URLs conform to a general format that follows this pattern:

 scheme-name://host:port/file-info#internal-reference 

scheme-name is a URL scheme such as HTTP, FTP, or Gopher. host is the domain name or IP address of the remote system. port is the port number on which the service is listening; because most application protocols define a standard port, the port and the colon that delimits it from the host are usually omitted unless a nonstandard port is in use. file-info is the resource requested on the remote system. It is often a file, usually given with a path to that file on the system, but it might instead identify a server program to execute. internal-reference is usually the identifier of a named anchor within an HTML page; a named anchor enables a link to target a particular location within the page. The internal reference is frequently absent, in which case it is omitted along with the # character that delimits it.

Creating a URL Object

The URL class enables you to easily create a data structure containing all the necessary information to obtain the remote resource. After a URL object has been created, you can obtain the various portions of the URL according to the general format. The URL object also enables you to obtain the remote data.
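As a quick illustration of mapping a URL object back onto the general format, here is a minimal sketch that reads the portions with the accessors getProtocol, getHost, getPort, getFile, and getRef. The class name UrlParts and the www.example.com URLs are illustrative assumptions, not part of the original text. Creating a URL object only parses the string and does not contact the host, so the sketch runs without a network connection.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlParts {
    // Splits a URL string into the pieces of the general format
    // scheme-name://host:port/file-info#internal-reference
    static String describe(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        return "protocol=" + url.getProtocol()
             + " host=" + url.getHost()
             + " port=" + url.getPort()   // -1 when no explicit port is given
             + " file=" + url.getFile()   // path plus any query string
             + " ref=" + url.getRef();    // null when there is no #reference
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical URLs: one with every optional part, one with the common omissions
        System.out.println(describe("http://www.example.com:8080/docs/index.html#intro"));
        System.out.println(describe("http://www.example.com/"));
    }
}
```

The first line printed should be protocol=http host=www.example.com port=8080 file=/docs/index.html ref=intro. For the second URL, getPort returns -1 and getRef returns null, reflecting the omitted port and internal reference.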

The URL class has six constructors that can be used:

 URL(String spec) throws MalformedURLException;
 URL(String protocol, String host, int port, String file) throws MalformedURLException;
 URL(String protocol, String host, int port, String file, URLStreamHandler handler) throws MalformedURLException;
 URL(String protocol, String host, String file) throws MalformedURLException;
 URL(URL context, String spec) throws MalformedURLException;
 URL(URL context, String spec, URLStreamHandler handler) throws MalformedURLException;

The first constructor is the most commonly used and enables you to create a URL object with a simple declaration like

 URL myURL = new URL("http://www.yahoo.com/"); 

The other constructors let you specify the various portions of the URL explicitly. The last two constructors enable you to use relative URLs. A relative URL contains only part of the URL syntax; the rest is completed from the URL to which the resource is relative. Relative URLs are common in HTML pages, where a reference to other.html means "get other.html from the same machine and directory where the current document resides."

Here are examples of a few of the constructors mentioned previously:

 URL firstURLObject = new URL("http://www.yahoo.com/");
 URL secondURLObject = new URL("http", "www.yahoo.com", "/");
 URL thirdURLObject = new URL("http", "www.yahoo.com", 80, "/");
 URL fourthURLObject = new URL(firstURLObject, "text/suggest.html");

The first three statements create URL objects that all refer to the Yahoo! main page, whereas the fourth creates a reference to "text/suggest.html" relative to Yahoo!'s home page (that is, http://www.yahoo.com/text/suggest.html). All these constructors can throw a MalformedURLException, which you will generally want to catch; the example shown later in this section illustrates this. Note that after you create a URL object, you can't change the resource that it points to; to refer to a different resource, you must create a new URL object.
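The relative-URL constructor and the advice to catch MalformedURLException can be tried out with a small sketch like the following. It is illustrative only: the class RelativeUrlDemo, its resolve helper, and the www.example.com addresses are assumptions rather than code from the text. Resolution happens entirely in memory, so no network access is needed.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RelativeUrlDemo {
    // Resolves spec against base with the URL(URL context, String spec) constructor.
    // Returns null instead of propagating the exception if base is malformed.
    static String resolve(String base, String spec) {
        try {
            URL context = new URL(base);
            // context itself is left unchanged; resolution produces a new URL object
            URL resolved = new URL(context, spec);
            return resolved.toExternalForm();
        } catch (MalformedURLException ex) {
            // Typical cause: a missing or unknown protocol
            System.err.println("Bad URL: " + ex.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.example.com/", "text/suggest.html"));
        System.out.println(resolve("http://www.example.com/docs/index.html", "other.html"));
        System.out.println(resolve("www.example.com", "other.html"));
    }
}
```

The first two calls should print http://www.example.com/text/suggest.html and http://www.example.com/docs/other.html; note that other.html replaces only the last path segment of the base, just as a browser would. The third call prints null after reporting the error, because www.example.com has no protocol.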

Several new access methods have been added to the URL class to make it more consistent with the generic URI syntax defined in the IETF specification RFC 2396. The new methods are

  • public String getAuthority()

  • public String getPath()

  • public String getQuery()

  • public String getUserInfo()
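The four accessors above can be seen side by side in a minimal sketch. The class name Rfc2396Parts and the guest@www.example.com URL are hypothetical, chosen so that every component is present; the FTP URL is the one used earlier in this section.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class Rfc2396Parts {
    // Collects the components exposed by the RFC 2396-style accessors
    static String[] parts(String spec) throws MalformedURLException {
        URL url = new URL(spec);
        return new String[] {
            url.getAuthority(), // [userinfo@]host[:port]
            url.getUserInfo(),  // text before '@' in the authority, or null
            url.getPath(),      // file-info without any query string
            url.getQuery()      // text after '?', or null if there is none
        };
    }

    public static void main(String[] args) throws MalformedURLException {
        String[] labels = {"authority", "userInfo", "path", "query"};
        // Hypothetical URL chosen so that every component is present
        String[] p = parts("http://guest@www.example.com:8080/search/results?q=java&page=2#top");
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i] + " = " + p[i]);
        }
        // The FTP example from earlier: the login name is exposed as the user info
        System.out.println(parts("ftp://chuckcavaness@foobar.org/")[1]);
    }
}
```

This should print authority = guest@www.example.com:8080, userInfo = guest, path = /search/results, and query = q=java&page=2, followed by chuckcavaness. Unlike getFile, getPath excludes the query string.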

Creating a URL Connection

Java provides a powerful and elegant mechanism for creating network client applications, allowing you to obtain resources from the Internet with relatively few statements. The java.net package contains the necessary classes, of which the two most important are URL and URLConnection.

Now that you've created a URL object, you will want to actually obtain some useful data. You can either read directly from the URL object or obtain a URLConnection instance from it.

Reading directly from the URL object requires less code, but it is much less flexible and allows only a read-only connection. This is limiting, because many Web services let you send information to be handled by a server application. The URL class has an openStream method that returns an InputStream through which the remote resource can be read byte by byte.
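Reading byte by byte with openStream looks roughly like the following sketch. To keep it runnable offline, it assumes a local file: URL built from a temporary file as a stand-in for a remote resource; the class name OpenStreamBytes and the file contents are hypothetical. The try-with-resources statement requires Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class OpenStreamBytes {
    // Reads the resource behind url one byte at a time via openStream()
    static int countBytes(URL url) throws IOException {
        int count = 0;
        try (InputStream in = url.openStream()) {
            while (in.read() != -1) { // read() returns -1 at end of stream
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A local file: URL stands in for a remote resource so the sketch runs offline
        File tmp = File.createTempFile("openstream", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "Hello, URL".getBytes(StandardCharsets.US_ASCII));
        URL url = tmp.toURI().toURL();
        System.out.println(url.getProtocol() + ": " + countBytes(url) + " bytes");
    }
}
```

Running main should print file: 10 bytes. The same countBytes method works unchanged for an http: URL, since openStream hides the protocol behind an InputStream.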

Handling data as individual bytes is cumbersome, so you will often want to wrap the returned InputStream in an InputStreamReader and then in a BufferedReader, which lets you read the input line by line. This coding strategy is often referred to as using a decorator, because each wrapper decorates the underlying InputStream by providing a more specialized interface. Listing 23.15 shows how to do this while using a URL object to obtain an HTML page from a Web site and print it to the console.

Listing 23.15 Source Code for PrintURLPage.java
 import java.net.*;
 import java.io.*;

 public class PrintURLPage
 {
   // Default Constructor
   public PrintURLPage()
   {
     super();
   }

   // Read the HTML page
   public void printHTMLPage( String urlStr )
   {
     URL url = null;
     BufferedReader reader = null;
     String data = null;

     try
     {
       // Create the URL object
       url = new URL( urlStr );
       // Decorate the input stream with something easier to use
       reader = new BufferedReader( new InputStreamReader(url.openStream()) );
       // Keep reading lines until there are no more to read
       while( (data = reader.readLine()) != null )
       {
         // Just write out the text to the console
         System.out.println( data );
       }
     }
     catch( MalformedURLException ex )
     {
       ex.printStackTrace();
     }
     catch( IOException ex )
     {
       ex.printStackTrace();
     }
     finally
     {
       // Release the connection even if reading failed
       if ( reader != null )
       {
         try { reader.close(); } catch( IOException ex ) { ex.printStackTrace(); }
       }
     }
   }

   // Start the example
   public static void main( String args[] )
   {
     if ( args.length == 0 )
     {
       System.out.println( "Usage: java PrintURLPage http://<url>" );
       System.exit( 1 );
     }
     // Get the url passed in on the command line
     String url = args[0];
     PrintURLPage f = new PrintURLPage();
     // Get and print the URL passed in
     f.printHTMLPage( url );
   }
 }

Listing 23.15 prints the HTML page at whatever URL is passed in on the command line. The URL must begin with its protocol (for example, http://); otherwise, a MalformedURLException is thrown. Test the example by running it with URLs such as these:

 java PrintURLPage http://www.yahoo.com
 java PrintURLPage http://www.cnn.com
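The protocol requirement is easy to confirm without touching the network, since the URL constructor only parses its argument. In this sketch the class ProtocolCheck and its isWellFormed helper are hypothetical names; the check simply attempts the same new URL(...) call that Listing 23.15 makes.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ProtocolCheck {
    // Returns true if spec can be turned into a URL object
    static boolean isWellFormed(String spec) {
        try {
            new URL(spec); // parses only; no connection is made
            return true;
        } catch (MalformedURLException ex) {
            System.out.println(spec + " -> " + ex.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        isWellFormed("www.yahoo.com");       // no protocol at all
        isWellFormed("htp://www.yahoo.com"); // protocol with no registered handler
        System.out.println(isWellFormed("http://www.yahoo.com"));
    }
}
```

The first two calls report the exception message and return false; only the string that starts with http:// produces a URL object, so the last line prints true.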

Reading and Writing to a URL Connection

A more flexible way of connecting to the remote resource is to use the openConnection method of the URL class. This method returns a URLConnection object that provides a number of methods you can use to customize your connection to the remote resource.

For example, unlike the URL class, a URLConnection enables you to obtain both an InputStream and an OutputStream. This matters for HTTP, whose access methods include both GET and POST. With the GET method, an application merely requests a resource and then reads the response. The POST method is often used to provide input to server applications: the client requests a resource, writes data to the server in the HTTP request body, and then reads the response. To use the POST method, call setDoOutput(true) on the URLConnection, then write to the OutputStream obtained from it before reading from the InputStream. If you read first, the GET method is used, and a subsequent attempt to write fails.
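The write-then-read sequence for POST can be sketched as follows. This is a minimal example, not code from the text: the class PostDemo, the /echo path, and the throwaway local server built with the JDK's com.sun.net.httpserver package are assumptions made so the sketch runs without an external Web site (readAllBytes also requires Java 9 or later). The essential lines are setDoOutput(true), the write to getOutputStream(), and only then the read from getInputStream().

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PostDemo {
    // Writes body to the connection (making the HTTP request a POST), then reads the reply
    static String post(URL url, String body) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setDoOutput(true); // must be set before getOutputStream()
        try (Writer out = new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(body);    // 1. write the request body first
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine(); // 2. then read the response
        }
    }

    // Starts a hypothetical local echo server, posts body to it, and returns the reply
    static String roundTrip(String body) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            // Reply with the request method followed by the request body
            byte[] received = exchange.getRequestBody().readAllBytes();
            byte[] reply = (exchange.getRequestMethod() + " "
                    + new String(received, StandardCharsets.UTF_8)).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start();
        try {
            // Uses the URL(protocol, host, port, file) constructor with the server's actual port
            URL url = new URL("http", "127.0.0.1", server.getAddress().getPort(), "/echo");
            return post(url, body);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("name=duke"));
    }
}
```

Running main should print POST name=duke, showing that writing to the output stream turned the request into a POST. Swapping the two try blocks in post would make the write fail, as described above.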

If you are using the HTTP protocol, the openConnection() method actually returns an instance of a subclass of URLConnection. The subclass, naturally, is HttpURLConnection, which adds HTTP-specific methods.
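Because openConnection() only creates the connection object and does not yet contact the server, the subclass relationship can be checked offline. In the following sketch, the class ConnectionType, its isHttp helper, and the URLs are hypothetical, and setRequestMethod("HEAD") is shown only as an example of a method that exists on HttpURLConnection but not on URLConnection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class ConnectionType {
    // Opens (but does not connect) a URLConnection and checks its type
    static boolean isHttp(String spec) throws IOException {
        URLConnection conn = new URL(spec).openConnection(); // no network traffic yet
        return conn instanceof HttpURLConnection;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = new URL("http://www.example.com/").openConnection();
        if (conn instanceof HttpURLConnection) {
            HttpURLConnection http = (HttpURLConnection) conn;
            http.setRequestMethod("HEAD"); // HTTP-specific; allowed before connecting
            System.out.println(http.getClass().getName() + " " + http.getRequestMethod());
        }
        System.out.println(isHttp("http://www.example.com/")); // HTTP handler
        System.out.println(isHttp("file:/tmp/example.txt"));   // file handler
    }
}
```

The last two lines should print true and then false. The concrete class name printed first is JDK-specific (an internal implementation class), which is why code should test against java.net.HttpURLConnection rather than a particular class name.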

J2SE SDK 1.3 also includes other enhancements, such as client-side support for HTTP keep-alive connections. The SDK is also reported to do a better job of URL parsing by following RFC 2396 more closely.

   


Special Edition Using Java 2 Standard Edition
ISBN: 0789724685
Year: 1999
Pages: 353
