Now that you've learned the basics of Java networking, it would be nice to do something actually useful. As you know, there are millions of Web sites on the Internet. Each site probably links to many other sites, and you could, if you really wanted to, construct a graph of all the connected Web sites on the Internet. Doing all of them would take some time, but it is possible. The example in Listing 23.16 is a program that takes a URL as an argument and searches the page at that URL for links to other pages. The search is deliberately simple: any whitespace-delimited token that contains "http:" is treated as a candidate link. This application does not recursively follow the links it finds, but it would not take much to add that functionality and build a large, realistic graph of Web sites.

Listing 23.16 Source Code for URLLinkExample.java

import java.net.*;
import java.io.*;
import java.util.*;

public class URLLinkExample
{
  // Default Constructor
  public URLLinkExample()
  {
    super();
  }

  // Does the token have a "http:" substring within it
  private boolean hasMatch( String token )
  {
    return token.indexOf( "http:" ) != -1;
  }

  // Trim the token down to just the URL so that it prints cleanly
  private String trimURL( String url )
  {
    String tempStr = null;
    int beginIndex = url.indexOf( "http" );
    int endIndex = url.length();
    tempStr = url.substring( beginIndex, endIndex );

    // The URL ends at the closing quote of the attribute, if there is one
    endIndex = tempStr.indexOf( '"' );
    if ( endIndex == -1 )
      endIndex = tempStr.length();
    return tempStr.substring( 0, endIndex );
  }

  // Go through all the text returned from a Web site and search for links
  public Collection searchURL( String urlString )
  {
    URL url = null;
    URLConnection conn = null;
    String nextLine = null;
    StringTokenizer tokenizer = null;
    BufferedReader reader = null;
    Collection urlCollection = new ArrayList();

    try
    {
      // Get a new URL object on the url string passed in
      url = new URL( urlString );

      // Open the connection. The page is only read, so output is not enabled.
      conn = url.openConnection();

      // Complete the connection
      conn.connect();
      reader = new BufferedReader(
                 new InputStreamReader( conn.getInputStream() ));

      // Go through all the text and check it for being a link to another page
      while( (nextLine = reader.readLine()) != null )
      {
        // Create a tokenizer on each text line
        tokenizer = new StringTokenizer( nextLine );
        while( tokenizer.hasMoreTokens() )
        {
          String urlToken = tokenizer.nextToken();
          // If the token is a link, add it to a collection
          if ( hasMatch( urlToken ) )
            urlCollection.add( trimURL( urlToken ) );
        }
      }
    }
    catch( MalformedURLException ex )
    {
      ex.printStackTrace();
    }
    catch( IOException ex )
    {
      ex.printStackTrace();
    }
    finally
    {
      // Close the stream whether or not the read succeeded
      if ( reader != null )
      {
        try { reader.close(); } catch( IOException ex ) { /* nothing left to do */ }
      }
    }
    return urlCollection;
  }

  public static void main(String[] args)
  {
    if( args.length != 1 )
    {
      System.out.println( "Usage: java URLLinkExample <url>" );
      System.exit( -1 );
    }

    // Get the url from the command line arguments
    String url = args[0];
    System.out.println( "Searching web site: " + url );
    URLLinkExample example = new URLLinkExample();
    Collection urlCollection = example.searchURL( url );

    // Print out the candidate links
    Iterator iter = urlCollection.iterator();
    while( iter.hasNext() )
    {
      System.out.println( iter.next() );
    }
  }
}

Listings 23.17 and 23.18 show the output when you run the application against two sample Web sites.

Listing 23.17 Output for the URLLinkExample Application

C:\jdk1.3se_book\classes>java URLLinkExample http://www.netvendor.com
Searching web site: http://www.netvendor.com
http://www.crn.com/Components/Search/Article.asp?ArticleID=19596
http://www.crn.com/Components/Search/Article.asp?ArticleID=19596
http://www.iwvaluechain.com/Features/advatorialjuly.asp
http://www.iwvaluechain.com/Features/advatorialjuly.asp
http://www.ibm.com/e-business/casestudies/
http://atlanta.bbb.org/

C:\jdk1.3se_book\classes>

Listing 23.18 Output for the URLLinkExample Application

C:\jdk1.3se_book\classes>java URLLinkExample http://www.javasoft.com
Searching web site: http://www.javasoft.com
http://search.java.sun.com/query.html
http://www1.ecorpstore.com/consumer/javawear/
http://developer.java.sun.com/servlet/SessionServlet?url=http://developer.java.sun.com/developer/bugParade/index.jshtml
http://reseller.sun.com:8003/
http://industry.java.sun.com/javanews/more/by_industry/0,2162,
http://192.18.97.137/testdev/javanews/
http://industry.java.sun.com/jug/by_country/0,2236,
http://industry.java.sun.com/jug/by_state/0,2248,
http://industry.java.sun.com/jug
http://developer.java.sun.com/developer/earlyAccess/j2sdk13/index.html
http://sun.com/software/embeddedjava/
http://www.sun.com/developers/techdays
http://forum.java.sun.com
http://forum1.java.sun.com
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^0@.ee75dd0!skip=574
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^0@.ee76e9a!skip=729
http://forum.java.sun.com/forum?folderBy@14.MagBa2p1acQ^0@.ee777e1!skip=57
http://java.sun.com/products/ejb/2.0.html
http://developer.java.sun.com/developer/community/
http://204.160.241.24/javanews/classes
http://www.sun.com/presents/discussions/j2ee/index.html
http://www.sun.com/MySun/
http://www.sun.com/products-n-solutions/
http://www.sun.com/
http://www.iplanet.com/
http://www.javaworld.com/index.html
http://www.artima.com/jini/
http://www.hotdispatch.com/java.html
http://www.flashline.com/
http://www.componentsource.com/java/
http://theserverside.com/
http://www.jguru.com/portal/
http://www.javareport.com/
http://www.jars.com/
http://www.gamelan.com
http://www.sys-con.com/java/index2.html
http://www.dynamicdiagrams.net/mapa/cgi-bin/help.tcl?db=javasoft&dest=http://java.sun.com/
http://www.att.com/tollfree/international/dialguide/
http://www.sun.com
http://www.sun.com
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy

C:\jdk1.3se_book\classes>
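
As noted earlier, the application stops at the first page and does not follow the links it finds. The following is a minimal sketch of how that recursion might be added. It is not part of Listing 23.16, and it uses generics and lambdas from later Java releases, so it will not compile on JDK 1.3. The class name LinkGraphSketch, the method buildGraph, and the linkFinder parameter are all hypothetical names invented for this illustration. The sketch walks the links breadth-first up to a depth limit, remembers which pages it has already queued so that a cycle of links cannot loop forever, and drops repeated links such as the duplicate http://www.sun.com entries in Listing 23.18. The page lookup is passed in as a function, so the sample main method can run against a small in-memory map rather than the network.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Listing 23.16.
public class LinkGraphSketch {

    // Walks breadth-first from startUrl, following links at most maxDepth
    // hops away. linkFinder is assumed to return the links found on a page.
    // Returns a map from each visited page to the distinct links on it.
    public static Map<String, List<String>> buildGraph(
            String startUrl, int maxDepth,
            Function<String, List<String>> linkFinder) {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depthOf = new HashMap<>();   // also the "seen" set
        Queue<String> pending = new ArrayDeque<>();

        depthOf.put(startUrl, 0);
        pending.add(startUrl);

        while (!pending.isEmpty()) {
            String page = pending.remove();
            int depth = depthOf.get(page);

            // LinkedHashSet removes duplicate links but keeps page order
            List<String> links =
                new ArrayList<>(new LinkedHashSet<>(linkFinder.apply(page)));
            graph.put(page, links);

            // Pages at the depth limit are recorded but not expanded further
            if (depth == maxDepth) {
                continue;
            }
            for (String link : links) {
                if (!depthOf.containsKey(link)) {
                    depthOf.put(link, depth + 1);
                    pending.add(link);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // A made-up, in-memory stand-in for real Web pages
        Map<String, List<String>> fakeWeb = new HashMap<>();
        fakeWeb.put("http://a.example/", Arrays.asList(
            "http://b.example/", "http://b.example/", "http://c.example/"));
        fakeWeb.put("http://b.example/", Arrays.asList(
            "http://a.example/", "http://d.example/"));

        Map<String, List<String>> graph = buildGraph(
            "http://a.example/", 1,
            url -> fakeWeb.getOrDefault(url, Collections.emptyList()));

        for (Map.Entry<String, List<String>> entry : graph.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

With a depth limit of 1, the sample run records the start page and the two pages it links to, but does not fetch http://d.example/. To crawl real sites, the linkFinder argument could presumably wrap the searchURL method from Listing 23.16 and copy its result into a List of strings. A real crawler would also need to pause between requests and cap the total number of pages fetched.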