Using the URL Class


The java.net.URL class enables us to create objects that encapsulate URLs. In Java, most methods that expect to receive a URL expect to receive it as a URL object rather than as a String. In addition, a URL object gives us access to the individual building blocks of the URL. The URL class is easy to use: its simplest constructor takes a string argument that includes the protocol (such as "http://www.google.com") and parses it into its various parts. For example, the following URL

http://www.newsbat.com:80/space/news.html#top

has these parts:

  • Protocol: http

  • Host: www.newsbat.com

  • Port: 80

  • Path/File: /space/news.html

  • Reference: top
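Each part in the list above has a matching getter on URL. The minimal sketch below prints them for the example address. UrlPartsSketch and its parts() helper are illustrative names, not part of the JDK. It also shows a detail that is easy to miss: getPort() returns -1 when the URL string omits the port, while getDefaultPort() still reports the protocol's default.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: reads each building block of a URL through its getter.
class UrlPartsSketch {
    // Returns "protocol|host|port|defaultPort|path|ref",
    // or null if the string is not a well-formed URL.
    static String parts(String spec) {
        try {
            URL u = new URL(spec);
            return u.getProtocol() + "|" + u.getHost() + "|" + u.getPort() + "|"
                    + u.getDefaultPort() + "|" + u.getPath() + "|" + u.getRef();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Port given explicitly.
        System.out.println(parts("http://www.newsbat.com:80/space/news.html#top"));
        // Port omitted: getPort() is -1, but getDefaultPort() is still 80.
        System.out.println(parts("http://www.newsbat.com/space/news.html#top"));
    }
}
```

Running main() prints http|www.newsbat.com|80|80|/space/news.html|top and then http|www.newsbat.com|-1|80|/space/news.html|top.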

The URL class has six different constructors. In Listing 11.4, we use the first constructor, which takes a string with the URL, to create a URL object. If the URL is improperly formatted, a java.net.MalformedURLException is thrown at this point and printed to standard out. Otherwise, a wide variety of information about the URL is printed.
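To see how several of these constructors relate, the following sketch builds the same address four ways. The two constructors that also take a URLStreamHandler are left out, and UrlConstructorsSketch and build() are illustrative names. The results are compared as strings because URL.equals() may resolve host names over the network.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: one address, built with four URL constructors.
class UrlConstructorsSketch {
    static String[] build() {
        try {
            // URL(String spec): the whole address as one string.
            URL a = new URL("http://www.newsbat.com:80/space/news.html#top");
            // URL(String protocol, String host, int port, String file).
            URL b = new URL("http", "www.newsbat.com", 80, "/space/news.html#top");
            // URL(String protocol, String host, String file): no port, so getPort() is -1.
            URL c = new URL("http", "www.newsbat.com", "/space/news.html#top");
            // URL(URL context, String spec): spec resolved against a base URL.
            URL d = new URL(new URL("http://www.newsbat.com:80/"), "space/news.html#top");
            // Compare string forms instead of calling URL.equals(),
            // which may perform host name lookups.
            return new String[] { a.toString(), b.toString(), c.toString(), d.toString() };
        } catch (MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : build()) {
            System.out.println(s);
        }
    }
}
```

Only the third form has a different string form, because it carries no explicit port.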

When you initialize a URL object with a string, the URL class parses the string internally and performs some basic validation. For example, it throws a MalformedURLException if the protocol is missing or unknown, or if the port isn't an integer. If no exception is thrown, the string is a well-formed URL, although that says nothing about whether the host exists or the resource can be retrieved. Similar to Listing 11.3, which demonstrated interrogating the attributes or properties of an InetAddress object, Listing 11.4 shows you how to interrogate the attributes or properties of a URL object.
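The following sketch shows what the URL(String) constructor does and does not reject. UrlValidationSketch and isWellFormed() are illustrative names, and the test strings are made up for the example. No network connection is made, which is why a host under the reserved .invalid domain still counts as well formed.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: probing the URL constructor's validation.
class UrlValidationSketch {
    // True if new URL(spec) succeeds. Parsing only; no connection is opened.
    static boolean isWellFormed(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] specs = {
            "http://www.newsbat.com:80/space/news.html#top", // well formed
            "www.google.com",                                // no protocol
            "http://www.newsbat.com:eighty/",                // port is not an integer
            "gopherx://www.newsbat.com/",                    // unknown protocol
            "http://host.invalid/page.html"                  // well formed, host can't exist
        };
        for (String spec : specs) {
            System.out.println(isWellFormed(spec) + "  " + spec);
        }
    }
}
```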

Listing 11.4 Using Features of the URL Class
    import java.net.*;

    /* This class interrogates the attributes or properties of a URL object. */
    public class URLObjectTest
    {
       public static void main(String[] args) throws Exception
       {
          String url = "http://www.newsbat.com:80/space/news.html#top";
          if (args.length == 1)
          {  //get URL from command line; not lowercased,
             //because paths can be case sensitive
             url = args[0];
          }
          URLObject uRLObject = new URLObject(url);
          uRLObject.print();
       }
    }

    class URLObject
    {
       private String url;
       private URL urlObject;

       URLObject(String url)
       {
          setURL(url);
       }
       //stores the string and parses it, so the two always match
       public void setURL(String url)
       {
          this.url = url;
          this.urlObject = null;
          try
          {
             this.urlObject = new URL(url);
          } catch(MalformedURLException e)
          {
             System.out.println(e);
          }
       }
       public String getURLAddress()
       {
          return this.url;
       }
       public URL getURL()
       {
          return this.urlObject;
       }
       //print out the properties of the URL object
       public void print()
       {
          int i;
          String s;
          System.out.println("For : " + this.url);
          if (this.urlObject == null)
          {  //the string could not be parsed as a URL
             return;
          }
          try
          {
              s = this.urlObject.getAuthority();
              System.out.println("getAuthority() = " + s);
              //Object obj = this.urlObject.getContent();
              //System.out.println("getContent() = " + obj);
              i = this.urlObject.getDefaultPort();
              System.out.println("getDefaultPort() = " + i);
              s = this.urlObject.getFile();
              System.out.println("getFile() = " + s);
              s = this.urlObject.getHost();
              System.out.println("getHost() = " + s);
              s = this.urlObject.getPath();
              System.out.println("getPath() = " + s);
              i = this.urlObject.getPort();
              System.out.println("getPort() = " + i);
              s = this.urlObject.getProtocol();
              System.out.println("getProtocol() = " + s);
              s = this.urlObject.getQuery();
              System.out.println("getQuery() = " + s);
              s = this.urlObject.getRef();
              System.out.println("getRef() = " + s);
              s = this.urlObject.getUserInfo();
              System.out.println("getUserInfo() = " + s);
              i = this.urlObject.hashCode();
              System.out.println("hashCode() = " + i);
              s = this.urlObject.toString();
              System.out.println("toString() = " + s);
          } catch(Exception e)
          {
             System.out.println(e);
          }
       }
    }

    //java  URLObjectTest
    //returns (the hashCode() value may differ on your system):
    /*
    For : http://www.newsbat.com:80/space/news.html#top
    getAuthority() = www.newsbat.com:80
    getDefaultPort() = 80
    getFile() = /space/news.html
    getHost() = www.newsbat.com
    getPath() = /space/news.html
    getPort() = 80
    getProtocol() = http
    getQuery() = null
    getRef() = top
    getUserInfo() = null
    hashCode() = 2048822873
    toString() = http://www.newsbat.com:80/space/news.html#top
    */

Listing 11.5 wraps the URL class in a class of its own, URLEntity, which the URLWrapperDemonstration class exercises. As you can see, URLEntity's setURL() method even uses the URL class internally to validate the URL. The primary downside of this approach is that commonly used Java methods that expect an instance of URL will not accept an instance of URLEntity.
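If the goal is a modified address that other APIs will still accept as a URL, another option is to skip the wrapper and build a fresh URL from an existing one's parts. The sketch below is one way to do that. UrlCopySketch, parse(), and withPort() are hypothetical helpers, and this version does not carry over any user-info part (such as user@) of the original URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: "changing" a URL by building a modified copy.
class UrlCopySketch {
    // Parses a URL string, turning the checked exception into an unchecked one.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Returns a new URL like u but with a different port; u itself is unchanged.
    // User info (user@host) is not copied in this sketch.
    static URL withPort(URL u, int port) {
        String file = u.getFile();              // path plus any ?query
        if (u.getRef() != null) {
            file = file + "#" + u.getRef();     // re-attach the fragment
        }
        try {
            return new URL(u.getProtocol(), u.getHost(), port, file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL original = parse("http://www.newsbat.com:80/space/news.html#top");
        URL moved = withPort(original, 8080);
        System.out.println(original);  // still uses port 80
        System.out.println(moved);     // same address on port 8080
    }
}
```

Because the result is a genuine java.net.URL, it can be passed to any method that expects one.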

Listing 11.5 An Example of Manipulating a URL Address
    import java.net.*;

    /* This class manipulates the properties associated with a Web address
       by using the URL object. Particular properties or attributes, such as
       protocol and port, can be changed directly, as shown here. */
    public class URLWrapperDemonstration
    {
       public static void main(String[] args) throws Exception
       {
          String protocol = "http";
          String host = "java.sun.com";
          int port = 80;
          //the leading slash separates the file part from the host
          String file = "/j2se/1.4/docs/api/java/net/URL.html";
          String ref = "#getFile()";
          URLEntity urlEntity = new URLEntity();
          urlEntity.setProtocol(protocol);
          urlEntity.setHost(host);
          urlEntity.setPort(port);
          urlEntity.setFile(file);
          urlEntity.setRef(ref);
          urlEntity.setURL();
          String s = urlEntity.toString();
          System.out.println(s);
          // http://java.sun.com:80/j2se/1.4/docs/api/
          // java/net/URL.html#getFile()
       }
    }

    //entity class that allows you to change properties,
    //such as host, port, and protocol.
    class URLEntity
    {
       private String protocol;
       private String host;
       private int port;
       private String file;
       private String ref;
       private URL urlEntity;
       public void setProtocol(String s)
       {
         this.protocol = s;
       }
       public void setHost(String s)
       {
         this.host = s;
       }
       public void setPort(int i)
       {
         this.port = i;
       }
       public void setFile(String s)
       {
         this.file = s;
       }
       public void setRef(String s)
       {
         this.ref = s;
       }
       public void setURL()
       {  //the URL constructor rejects, for example, an unknown protocol
          try
          {
             this.urlEntity = new URL(this.protocol, this.host,
                                    this.port, this.file + this.ref);
          } catch(MalformedURLException e)
          {
             System.out.println(e);
          }
       }
       public String toString()
       {  //avoids a NullPointerException if setURL() failed or was never called
          return (this.urlEntity == null) ? "null" : this.urlEntity.toString();
       }
    }

    //returns:
    /*
    http://java.sun.com:80/j2se/1.4/docs/api/java/net/URL.html#getFile()
    */

Listing 11.5 uses URLEntity to demonstrate how you can build a wrapper around another class. For some reason, the Java designers didn't include public set methods in the URL class to go along with the get methods, perhaps fearing that developers might not set all the URL parts before trying to connect with it (using the openConnection() method). However, you can wrap your own class around the URL object and do what you please. You can also create a new URL object by adding a file path to an existing URL object that was instantiated with a host string, like so:

    URL google = new URL("http://www.google.com/");
    URL googleAbout = new URL(google, "about/");

The first line of this snippet instantiates a URL object with a host address, and the second line creates a new URL object that adds the file path. Be careful: if you try to append too many parts like this, it becomes easy to make a mistake. Some mistakes cause a MalformedURLException to be thrown, but others quietly produce a URL you didn't intend. The following snippet works:

    URL google = new URL("http://www.google.com/");
    URL googleAbout = new URL(google, "about");
    String s = googleAbout.toString();
    System.out.println(s);
    //returns:
    //http://www.google.com/about

As you can see, the about portion of the path is appended to the URL by using the two-parameter constructor of the URL class. Inside the new URL object, the second parameter (the file path) is resolved against the first URL, and the result, /about, is stored as the object's file and path.
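Because the second parameter is resolved relative to the base URL rather than simply tacked on, the base URL's trailing slash matters. The sketch below compares a few cases; RelativeUrlSketch and resolve() are illustrative names, and the google.com paths other than the one from the text are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: how URL(URL context, String spec) resolves paths.
class RelativeUrlSketch {
    // Resolves spec against base; returns null if either is malformed.
    static String resolve(String base, String spec) {
        try {
            return new URL(new URL(base), spec).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The example from the text.
        System.out.println(resolve("http://www.google.com/", "about"));
        // A base ending in "/" keeps its last segment...
        System.out.println(resolve("http://www.google.com/intl/", "about"));
        // ...but without the trailing "/", the segment "intl" is replaced.
        System.out.println(resolve("http://www.google.com/intl", "about"));
        // ".." climbs one directory.
        System.out.println(resolve("http://www.google.com/intl/en/", "../about"));
        // A spec naming an unknown protocol throws MalformedURLException (null here).
        System.out.println(resolve("http://www.google.com/", "foo:about"));
    }
}
```

The third case is the quiet kind of mistake: no exception, just a different address.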

The simple program in Listing 11.6 shows you how to get an HTML page (for example, a javadoc page from the Java application programming interface [API] documentation) from a Web server, with a new twist: the program parses the page and prints out the named class's methods with their return types.

Listing 11.6 Building a Web Page Parser
    import java.io.*;
    import java.net.*;

    /* This class gets a javadoc Web page and parses it,
       printing out the methods. */
    public class ListClassMethods
    {
       public static void main(String[] args) throws Exception
       {
          String className = "java.net.URL";
          String lineHtml = "";
          String html = "";
          int start = 1;
          int end = 1;
          if (args.length == 1)
          {
              className = args[0];
          }
          //converts the dot notation into a file path
          className = className.replace('.', '/');
          //this historical documentation location may no longer serve these pages
          String url = "http://java.sun.com/j2se/1.4/docs/api/";
          url += className + ".html";
          URL webPage = new URL(url);
          BufferedReader page = new BufferedReader(
                 new InputStreamReader( webPage.openStream() ) );
          StringBuffer pageBuffer = new StringBuffer();
          while ((lineHtml = page.readLine()) != null)
          {  //gets all the HTML
             pageBuffer.append(lineHtml);
          }
          html = pageBuffer.toString();
          page.close();
          if (html.length() != 0)
          {
             //this finds the table of method names
             String codeSnippet="";
             start = html.indexOf("<A NAME=\"method_summary\">") + 1;
             end = html.indexOf("</TABLE>", start);
             html = html.substring(start, end);
             end = 0;
             int count = 0;
             Scrub scrub = new Scrub();
             while(count++<50)  //limit
             {
                //get return type
                start = html.indexOf("<CODE>", end) ;
                if (start == -1) { break; }
                end = html.indexOf("</CODE>", start + 6);
                if (end== -1) { break; }
                codeSnippet = html.substring(start + 6, end);
                scrub.setSource(codeSnippet);
                scrub.remove("&nbsp;", ";");
                scrub.remove("<A HREF", ">");
                scrub.remove("</A>", ">");
                scrub.remove("<B>", ">");
                scrub.remove("</B>", ">");
                codeSnippet = scrub.getSource();
                System.out.println(codeSnippet);
                //get method name
                start = html.indexOf("<CODE><B>", end);
                if (start == -1) { break; }
                end = html.indexOf("</CODE>", start + 9);
                if (end == -1) { break; }
                codeSnippet = html.substring(start + 9, end);
                scrub.setSource(codeSnippet);
                scrub.remove("&nbsp;", ";");
                scrub.remove("<A HREF", ">");
                scrub.remove("</A>", ">");
                scrub.remove("<B>", ">");
                scrub.remove("</B>", ">");
                codeSnippet = scrub.getSource();
                System.out.println(codeSnippet);
                System.out.println();
             }//end while
             System.out.println(--count + " methods");
          }//end if
       }//end main
    }//end class

    //this helper class removes HTML notation
    //from the text to isolate the text of interest
    class Scrub
    {
       private int start = 0;
       private int end = 0;
       private String source = "";
       void setSource(String source)
       {
          this.source = source;
       }
       //deletes every span that starts with open and ends with close;
       //stops as soon as either delimiter is no longer found
       void remove(String open, String close)
       {
          start = 0;
          end = 0;
          while( !(start==-1 || end==-1) )
          {
             start = this.source.indexOf(open);
             if(start > -1)
             {
                end = this.source.indexOf(close, start);
                if(end > -1)
                {
                   String s = this.source.substring(0,start);
                   end = end + close.length();
                   this.source = s + this.source.substring(end);
                }
             }
          }
       }
       String getSource()
       {
          return this.source;
       }
    }

    // java  ListClassMethods java.net.URL
    //returns:
    // boolean  equals(Objectobj)
    // String  getAuthority()
    // Object  getContent()
    // Object  getContent(Class[] classes)
    // int  getDefaultPort()
    // --code removed for space--
    // String  toString()
    // 22 methods
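Listing 11.6 closes its reader only after every read succeeds. On Java 7 or later, a try-with-resources statement closes the stream on every path, including when an exception is thrown. The minimal sketch below assumes the page is UTF-8 text (the listing uses the platform default charset instead) and, unlike the listing, keeps the line breaks. ReadUrlSketch and readAll() are illustrative names. Passing a local file: URL, such as one obtained from Path.toUri().toURL(), is a convenient way to try it without a network connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Illustrative class: the page-reading loop of Listing 11.6 with try-with-resources.
class ReadUrlSketch {
    // Reads everything the URL returns, keeping one '\n' per line.
    // Assumes the content is UTF-8 text.
    static String readAll(URL url) throws IOException {
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        } // the reader and its stream are closed here, even if readLine() throws
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadUrlSketch <url>");
            return;
        }
        System.out.print(readAll(new URL(args[0])));
    }
}
```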

Of course, you can get fancy with pattern matching (import java.util.regex.*) and other tweaks, such as passing a string array of items to remove rather than calling the remove() method repeatedly. Because the focus here is the URL object, however, the simple Scrub class handles the HTML tag removal chores.
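For comparison, a regular expression can do most of Scrub's work in one call. The sketch below is naive: it assumes tags never contain a > inside an attribute value and ignores HTML comments. RegexScrubSketch and scrub() are illustrative names, and the sample table cell is made up rather than copied from a real javadoc page. Unlike Scrub, which deletes &nbsp; outright (hence Objectobj in the sample output), this version turns it into a space.

```java
import java.util.regex.Pattern;

// Illustrative class: a regex-based alternative to the Scrub class.
class RegexScrubSketch {
    // Matches one HTML tag such as <B>, </A>, or <A HREF="...">.
    // Naive: it does not handle '>' inside attribute values or comments.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String scrub(String html) {
        String text = TAG.matcher(html).replaceAll("");
        // Turn the non-breaking-space entity into an ordinary space.
        return text.replace("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String cell = "<CODE><B><A HREF=\"#equals(java.lang.Object)\">equals</A></B>"
                + "(<A HREF=\"../lang/Object.html\">Object</A>&nbsp;obj)</CODE>";
        System.out.println(scrub(cell));   // equals(Object obj)
    }
}
```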

Caution

Be careful when opening connections to a URL inside a loop. If the loop doesn't terminate, you could end up making a massive volume of requests to the resource at that URL, which could slow or stop the server hosting it. Saturating a server this way is called a denial-of-service (DoS) attack and can, if perpetrated intentionally, subject the perpetrator to legal action.
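One way to follow that advice in code is to give any request loop a hard cap on attempts and a pause between them. The sketch below is a minimal illustration, not a complete retry policy. PolitePollSketch and fetchWithLimit() are hypothetical names, and the attempt limit and pause length are whatever the caller passes in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative class: a request loop with a hard cap and a pause.
class PolitePollSketch {
    // Hypothetical helper: tries to open spec at most maxAttempts times,
    // sleeping pauseMillis between failed attempts. Returns the attempt
    // number that succeeded, or -1 if none did.
    static int fetchWithLimit(String spec, int maxAttempts, long pauseMillis) {
        URL url;
        try {
            url = new URL(spec);
        } catch (MalformedURLException e) {
            return -1;                       // never loop on a bad address
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (InputStream in = url.openStream()) {
                in.read();                   // read one byte to confirm a response
                return attempt;
            } catch (IOException e) {
                if (attempt == maxAttempts) {
                    break;                   // cap reached: give up
                }
            }
            try {
                Thread.sleep(pauseMillis);   // pause before the next request
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java PolitePollSketch <url>");
            return;
        }
        // At most 3 requests, 2 seconds apart.
        System.out.println(fetchWithLimit(args[0], 3, 2000));
    }
}
```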




Java™ 2 Developer Exam Cram™ 2 (Exam CX-310-252A and CX-310-027)
Year: 2003
Pages: 187
