An Example: Building a World Wide Web Graph

   

Now that you've learned the basics of Java networking, it would be nice to do something actually useful. As you know, there are millions of Web sites available on the Internet. Each site probably refers to many other sites, and you could, if you really wanted to, construct a graph of all the connected Web sites on the Internet. Covering all of them would take a long time, but it is possible in principle.

The example in Listing 23.16 is a program that takes a URL as an argument and searches the page at that URL for links to other URLs. It treats any whitespace-delimited token containing "http:" as a candidate link. This application does not recursively follow the links it finds, but it would not take much to add that functionality and build a large, realistic graph of Web sites.

Listing 23.16 Source Code for URLLinkExample.java
import java.net.*;
import java.io.*;
import java.util.*;

public class URLLinkExample
{
  // Default Constructor
  public URLLinkExample()
  {
    super();
  }

  // Does the token have a "http:" substring within it
  private boolean hasMatch( String token )
  {
    return token.indexOf( "http:" ) != -1;
  }

  // Trim the string to something respectful to print out
  private String trimURL( String url )
  {
    String tempStr = null;
    int beginIndex = url.indexOf( "http" );
    int endIndex = url.length();
    tempStr = url.substring( beginIndex, endIndex );
    endIndex = tempStr.indexOf( '"');
    if ( endIndex == -1 )
      endIndex = tempStr.length();
    return tempStr.substring( 0, endIndex );
  }

  // Go through all the text returned from a Web site and search for links
  public Collection searchURL( String urlString )
  {
    URL url = null;
    URLConnection conn = null;
    String nextLine = null;
    StringTokenizer tokenizer = null;
    Collection urlCollection = new ArrayList();
    try
    {
      // Get a new URL object on the url string passed in
      url = new URL( urlString );
      // open the connection
      conn = url.openConnection();
      conn.setDoOutput( true );
      // Complete the connection
      conn.connect();
      BufferedReader reader =
            new BufferedReader(
                    new InputStreamReader( conn.getInputStream() ));
      // Go through all the text and check it for being a link to another page
      while( (nextLine = reader.readLine()) != null )
      {
        // Create a tokenizer on each text line
        tokenizer = new StringTokenizer( nextLine );
        while( tokenizer.hasMoreTokens() )
        {
          String urlToken = tokenizer.nextToken();
          // If the token is a link, add it to a collection
          if ( hasMatch( urlToken) )
            urlCollection.add( trimURL( urlToken ) );
        }
      }
    }
    catch( MalformedURLException ex )
    {
      ex.printStackTrace();
    }
    catch( IOException ex )
    {
      ex.printStackTrace();
    }
    return urlCollection;
  }

  public static void main(String[] args)
  {
    if( args.length != 1 )
    {
      System.out.println( "Usage: java URLLinkExample <url>" );
      System.exit( -1 );
    }
    // Get the url from the command line arguments
    String url = args[0];
    System.out.println( "Searching web site: " + url );
    URLLinkExample example = new URLLinkExample();
    Collection urlCollection = example.searchURL( url );
    // Print out the candidate links
    Iterator iter = urlCollection.iterator();
    while( iter.hasNext() )
    {
      System.out.println( iter.next() );
    }
  }
}
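The listing finds links by splitting each line on whitespace and looking for "http:" inside each token. As an alternative, the following minimal sketch pulls absolute links out of href attributes with a regular expression. It is an illustration and not part of the book's listing: the class name LinkExtractorSketch and the method extractLinks are hypothetical, the code assumes a JDK with generics and java.util.regex (both newer than the JDK 1.3 used for the listing), and a regular expression is only a rough stand-in for a real HTML parser. Unlike the listing, it also accepts https links, skips relative links, and drops duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Listing 23.16.
class LinkExtractorSketch {

    // Matches href="http://..." or href="https://...", ignoring case.
    // Group 1 captures the URL between the double quotes.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*\"(https?://[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns the absolute links found in the given HTML text,
    // in order of first appearance and without duplicates.
    static List<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            links.add(matcher.group(1));
        }
        return new ArrayList<>(links);
    }

    public static void main(String[] args) {
        // A small in-memory page, so the example runs without a network.
        String html = "<a href=\"http://www.sun.com/\">Sun</a> "
                + "<A HREF=\"https://java.sun.com/docs\">Docs</A> "
                + "<a href=\"/local/page.html\">Local</a>";
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Run as written, the sketch prints the two absolute links and ignores the relative one. Whether dropping duplicates is desirable depends on the use: for building a graph of sites, one entry per distinct link is usually enough.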

Listings 23.17 and 23.18 show the output when you run the application against two sample Web sites.

Listing 23.17 Output for the URLLinkExample Application
C:\jdk1.3se_book\classes>java URLLinkExample http://www.netvendor.com
Searching web site: http://www.netvendor.com
http://www.crn.com/Components/Search/Article.asp?ArticleID=19596
http://www.crn.com/Components/Search/Article.asp?ArticleID=19596
http://www.iwvaluechain.com/Features/advatorialjuly.asp
http://www.iwvaluechain.com/Features/advatorialjuly.asp
http://www.ibm.com/e-business/casestudies/
http://atlanta.bbb.org/
C:\jdk1.3se_book\classes>

Listing 23.18 Output for the URLLinkExample Application
C:\jdk1.3se_book\classes>java URLLinkExample http://www.javasoft.com
Searching web site: http://www.javasoft.com
http://search.java.sun.com/query.html
http://www1.ecorpstore.com/consumer/javawear/
http://developer.java.sun.com/servlet/SessionServlet?url=http://developer.java.s
un.com/developer/bugParade/index.jshtml
http://reseller.sun.com:8003/
http://industry.java.sun.com/javanews/more/by_industry/0,2162,
http://192.18.97.137/testdev/javanews/
http://industry.java.sun.com/jug/by_country/0,2236,
http://industry.java.sun.com/jug/by_state/0,2248,
http://industry.java.sun.com/jug
http://developer.java.sun.com/developer/earlyAccess/j2sdk13/index.html
http://sun.com/software/embeddedjava/
http://www.sun.com/developers/techdays
http://forum.java.sun.com
http://forum1.java.sun.com
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^0@.ee75dd0!skip=574
http://forum.java.sun.com/forum?folderBy@1.8aUZa1ZRa4G^0@.ee76e9a!skip=729
http://forum.java.sun.com/forum?folderBy@14.MagBa2p1acQ^0@.ee777e1!skip=57
http://java.sun.com/products/ejb/2.0.html
http://developer.java.sun.com/developer/community/
http://204.160.241.24/javanews/classes
http://www.sun.com/presents/discussions/j2ee/index.html
http://www.sun.com/MySun/
http://www.sun.com/products-n-solutions/
http://www.sun.com/
http://www.iplanet.com/
http://www.javaworld.com/index.html
http://www.artima.com/jini/
http://www.hotdispatch.com/java.html
http://www.flashline.com/
http://www.componentsource.com/java/
http://theserverside.com/
http://www.jguru.com/portal/
http://www.javareport.com/
http://www.jars.com/
http://www.gamelan.com
http://www.sys-con.com/java/index2.html
http://www.dynamicdiagrams.net/mapa/cgi-bin/help.tcl?db=javasoft&dest=http://jav
a.sun.com/
http://www.att.com/tollfree/international/dialguide/
http://www.sun.com
http://www.sun.com
http://www.sun.com/share/text/termsofuse.html
http://www.sun.com/privacy
C:\jdk1.3se_book\classes>
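To turn a single-page link search into the graph of Web sites described at the start of this section, the search has to follow the links it finds. The following is a minimal sketch of one way to do that, using a breadth-first traversal with a depth limit. It is an illustration and not part of the book's listings: the class WebGraphSketch, the method build, and the linkFinder parameter are all hypothetical names, and the code assumes Java 8 or later (generics and lambdas are not available in the JDK 1.3 used above). The link search itself is passed in as a function, so the example can run against a small in-memory "web" instead of the network.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of the book's listings.
class WebGraphSketch {

    // Breadth-first crawl from start. Returns a map from each fetched page
    // to the set of links found on it. Pages deeper than maxDepth are never
    // fetched, although they can still appear as link targets.
    static Map<String, Set<String>> build(String start, int maxDepth,
            Function<String, Collection<String>> linkFinder) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depthOf = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depthOf.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int depth = depthOf.get(page);
            Set<String> links = new LinkedHashSet<>(linkFinder.apply(page));
            graph.put(page, links);
            if (depth == maxDepth) {
                continue; // at the depth limit: record links, do not follow them
            }
            for (String link : links) {
                // Schedule each page at most once, so cycles terminate.
                if (!depthOf.containsKey(link)) {
                    depthOf.put(link, depth + 1);
                    queue.add(link);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        // A made-up, in-memory stand-in for the Web (assumption for the demo).
        Map<String, List<String>> fakeWeb = new LinkedHashMap<>();
        fakeWeb.put("http://a.example/",
                Arrays.asList("http://b.example/", "http://c.example/"));
        fakeWeb.put("http://b.example/",
                Arrays.asList("http://a.example/", "http://d.example/"));

        Map<String, Set<String>> graph = build("http://a.example/", 1,
                page -> fakeWeb.getOrDefault(page, Collections.<String>emptyList()));
        graph.forEach((page, links) -> System.out.println(page + " -> " + links));
    }
}
```

In a real crawl, the function passed as linkFinder could wrap a page search such as searchURL from Listing 23.16. The depth limit and the record of already-scheduled pages are what keep the traversal finite: without them, two pages that link to each other would be fetched forever.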
   


Special Edition Using Java 2 Standard Edition
ISBN: 0789724685
Year: 1999
Pages: 353
