Section 12.5. Networks: Getting Our Text from the Web




A network is formed whenever distinct computers communicate. Rarely does the communication take place with voltages over wires, the way that a computer encodes zeros and ones internally. It's too hard to maintain those voltages over distances. Instead, zeros and ones are encoded in some other way. For example, a modem (literally modulator-demodulator) maps zeros and ones to different audio frequencies. When we hear these different tones, it sounds like a bunch of buzzing bees to us, but to modems, it's pure binary.

Like onions and ogres, networks have layers. At the bottom level is the physical substrate. How are the signals being passed? Higher levels define how data is encoded. What makes up a zero? What makes up a one? Do we send a bit at a time? A packet of bytes at a time? A packet of bytes is like a letter in an envelope in that it contains data and a header that gives the information needed to get the data from the source to the destination.

Higher-level layers define the protocol for communication. A protocol is a set of rules that guide how an activity is performed. How does my computer tell your computer that it wants to talk, and what it wants to talk about? How do we address your computer at all? By treating these as distinct layers, we can easily swap out one part without changing the others. For example, most people with a direct connection to a network use a wired connection to an Ethernet network, but Ethernet is actually a mid-level protocol that works over wireless networks, too.

Humans have protocols, too. If Mark walks up to you, holds out his hand, and says, "Hi, my name is Mark," you will almost certainly hold out your hand and say something like "Hi, my name is Gene" (assuming that your name is Gene; if it weren't, that would be pretty funny). There's an unwritten protocol for humans about how to greet one another. Computer protocols are about the same things, but they're written down to communicate the process exactly. What gets said isn't too different. One computer may send the message 'HELO' to another to start a conversation (we don't know why the protocol writers couldn't spare the extra 'L' to spell it right), and a computer may send 'BYE' to end the conversation. (We even sometimes call the start of a computer protocol the "handshake.") It's all about establishing a connection and making sure that both sides understand what's going on.
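To make the idea of written-down rules concrete, here is a minimal sketch of a made-up greeting protocol. The class name ToyGreetingProtocol, its reply method, and the exact replies are all our own inventions for illustration; this is not SMTP or any real standard. The only rule is that a conversation must start with 'HELO' and ends with 'BYE'.

```java
// A toy protocol invented for illustration only; it is not a real standard.
class ToyGreetingProtocol {
    // True once the other side has completed the "handshake" by sending HELO
    private boolean connected = false;

    /** Returns the reply this side sends back for one incoming message. */
    public String reply(String message) {
        if (message.equals("HELO")) {
            connected = true;
            return "HELO";
        }
        if (!connected) {
            // The rules say nothing else is allowed before the handshake
            return "ERROR: say HELO first";
        }
        if (message.equals("BYE")) {
            connected = false;
            return "BYE";
        }
        return "OK: " + message;
    }

    public static void main(String[] args) {
        ToyGreetingProtocol server = new ToyGreetingProtocol();
        String[] conversation = {"What's the weather?", "HELO", "What's the weather?", "BYE"};
        for (String message : conversation) {
            System.out.println("client: " + message);
            System.out.println("server: " + server.reply(message));
        }
    }
}
```

Notice that the first question is refused because the handshake hasn't happened yet. Both sides can only understand each other because they follow the same rules.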



The Internet is a network of networks. If you have a device in your home so that your computers can talk to one another (e.g., a router), then you have a network. With just that, you can probably copy files between computers and print. When you connect your network to the wider Internet through an Internet Service Provider (ISP), your network becomes part of the Internet.

The Internet is based on a set of agreements about a whole bunch of things:

  • How computers will be addressed: Currently, each computer on the Internet has a 32-bit number associated with it: four byte values, usually written separated by periods, like this: "101.132.64.15". These are called IP addresses (for Internet Protocol addresses).

    There is a system of domain names by which people can refer to specific computers without knowing their IP addresses. For example, when you access http://www.cnn.com, you are actually accessing http://64.236.24.20. (Go ahead and try it! It works.) There is a network of domain name servers that keep track of names like "www.cnn.com" and map them to addresses like "64.236.24.20". You can be connected to the Internet and still not be able to get to your favorite Web sites if your domain name server is broken, but you might be able to get to them if you type in the IP address directly!

  • How computers will communicate: Data will be placed in packets which have a well-defined structure, including the sender's IP address, the receiver's IP address, and a number of bytes per packet.

  • How packets are routed around the Internet: The Internet was designed in the time of the Cold War. It was designed to withstand a nuclear attack. If a section of the Internet is destroyed (or damaged, or blocked as a form of censorship), the packet routing mechanism of the Internet will simply find a route around the damage.
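The claim that an IP address is really one 32-bit number written as four byte values can be checked with a little arithmetic. The following is a minimal sketch; the class name IpAddressDemo and its two methods are our own, chosen just for this illustration. Each byte value is shifted into place, eight bits at a time.

```java
// Illustrative helper (our own names): converts between the dotted form
// of an IP address and the single 32-bit number behind it.
class IpAddressDemo {
    /** Packs "a.b.c.d" into one number; a long is used so the value prints unsigned. */
    public static long toNumber(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected four byte values: " + dotted);
        }
        long result = 0;
        for (String part : parts) {
            int value = Integer.parseInt(part);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("Not a byte value: " + part);
            }
            // Make room for the next eight bits, then add them in
            result = (result << 8) | value;
        }
        return result;
    }

    /** Unpacks a 32-bit number back into the dotted form. */
    public static String toDotted(long number) {
        return ((number >> 24) & 0xFF) + "." + ((number >> 16) & 0xFF) + "."
             + ((number >> 8) & 0xFF) + "." + (number & 0xFF);
    }

    public static void main(String[] args) {
        long number = toNumber("101.132.64.15");
        System.out.println(number);            // prints 1703165967
        System.out.println(toDotted(number));  // prints 101.132.64.15
    }
}
```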

But the topmost layers of the network define what the data being passed around means. One of the first applications placed on top of the Internet was electronic mail. Over the years, the mail protocols have evolved to standards today like POP (Post Office Protocol) and SMTP (Simple Mail Transfer Protocol). Another old and important protocol is FTP (File Transfer Protocol).

These protocols aren't super-complicated. When the communication ends, one computer will probably say 'BYE' or 'QUIT' to another. When one computer tells another computer to accept a file via FTP, it literally says "STOR filename" (again, early computer developers didn't want to spare the one extra byte to say "STORE").



The World Wide Web is yet another set of agreements, developed mostly by Tim Berners-Lee. The Web is based on top of the Internet, simply adding more protocols on top of the existing ones.

  • How to refer to things on the Web: Resources on the Web are referenced using URLs, Uniform Resource Locators. A URL specifies the protocol to use to address the resource, the domain name of the server that can provide the resource, and the path to the resource on that server. For example, a URL like http://www.cc.gatech.edu/index.html says "Use the HTTP protocol to talk to the computer at www.cc.gatech.edu and ask it for the resource index.html."

    Not every file on every computer attached to the Internet is accessible via a URL! There are some preconditions before a file is accessible via a URL. First, an Internet-accessible computer has to be running a piece of software that understands a protocol that Web browsers understand, typically HTTP or FTP. We call a computer that is running such a piece of software a server. A browser that accesses a server is called a client. Second, a server typically has a server directory which is accessible via that server. Only files in that directory, or subdirectories within that directory, are available.

  • How to serve documents: The most common protocol on the Web is HTTP, HyperText Transfer Protocol. It defines how resources are served on the Web. HTTP is really simple: your browser literally says to a server things like "GET index.html" (just those letters!).

  • How those documents will be formatted: Documents on the Web are formatted using HTML, HyperText Markup Language.
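The pieces of a URL described above can be pulled apart in Java. This is a minimal sketch, assuming a class name (UrlPartsDemo) and helper methods of our own. It builds the URL through java.net.URI, and then forms the first line of a simple HTTP/1.0 request for the path. A real browser sends more than this one line, so treat the request line as a simplified illustration. Nothing here touches the network.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Illustrative helper (our own names): splits a URL into its parts.
class UrlPartsDemo {
    /** Turns a string into a URL object, or complains if it isn't one. */
    public static URL parse(String urlStr) {
        try {
            return URI.create(urlStr).toURL();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Not a usable URL: " + urlStr, ex);
        }
    }

    /** Builds the first line of a simple HTTP/1.0 request for the URL's path. */
    public static String requestLine(String urlStr) {
        String path = parse(urlStr).getPath();
        if (path.isEmpty()) {
            path = "/";   // no path given means "the top of the server directory"
        }
        return "GET " + path + " HTTP/1.0";
    }

    public static void main(String[] args) {
        String urlStr = "http://www.cc.gatech.edu/index.html";
        URL url = parse(urlStr);
        System.out.println("protocol: " + url.getProtocol()); // http
        System.out.println("server:   " + url.getHost());     // www.cc.gatech.edu
        System.out.println("path:     " + url.getPath());     // /index.html
        System.out.println(requestLine(urlStr));               // GET /index.html HTTP/1.0
    }
}
```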

You'll notice the term HyperText showing up frequently in reference to the Web. HyperText is literally non-linear text. It's a term invented by Ted Nelson to describe the kind of reading that we all do commonly on the Web but that didn't exist before computers: Read a little on one page, then click a link and read a little over there, then click Back and continue reading where you left off. The basic idea of HyperText dates back to Vannevar Bush, who was one of President Franklin Roosevelt's science advisors. He wanted to create a device for capturing flows of thought, which he called a Memex. But not until computers came along would this be possible. Tim Berners-Lee invented the Web and its protocols as a way of supporting rapid publication of research findings with connections between documents. The Web is certainly not the ultimate HyperText system. Systems like the ones that Ted Nelson worked on wouldn't allow "dead links" (links that are no longer accessible). But for all its warts, the Web works.

A browser (like Internet Explorer, Netscape Navigator, Mozilla, Opera, and so on) understands a lot about the Internet. It usually knows several protocols, such as HTTP, FTP, gopher (an early HyperText protocol), and mailto (SMTP). It knows HTML, how to format it, and how to grab resources referenced within the HTML, like JPEG pictures. For all of that, though, it's possible to access the Internet without nearly that much overhead. Mail clients (e.g., Outlook and Eudora) know some of these protocols without knowing all of them.



Java, like other modern languages, provides classes to support access to the Internet without all the overhead of a browser. Basically, you can write little programs that are clients. Java's class java.net.URL allows you to open URLs and read them as if they were files. It has a method openStream which returns an object of the class java.io.InputStream which can be used to read from the URL.
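Here is a minimal sketch of reading a URL as if it were a file. The class name UrlReadDemo and its readAll method are our own. To keep the example runnable without a network connection, the main method writes a small made-up page to a temporary file and reads it back through a file: URL. An http: URL would be opened with the same openStream call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper (our own names): reads everything a URL has to offer.
class UrlReadDemo {
    /** Opens the URL, reads all of its bytes, and returns them as text. */
    public static String readAll(URL url) {
        try (InputStream in = url.openStream()) {
            // We assume the page is encoded as UTF-8
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        // A made-up sample page, stored in a temporary file
        Path page = Files.createTempFile("sample", ".html");
        Files.writeString(page, "<html>\n<b>82&ordm;</b>\n</html>\n");

        URL url = page.toUri().toURL();   // a file: URL instead of an http: URL
        System.out.print(readAll(url));

        Files.delete(page);
    }
}
```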

We have been using FileReader to read from files. The class FileReader is a child of the more general class InputStreamReader, which is a child of the class Reader (Figure 12.5). We can create an object of the class InputStreamReader using the InputStream object. Finally, we can create a BufferedReader object by passing it an InputStreamReader object. The class BufferedReader has a constructor that takes a Reader object. Since FileReader and InputStreamReader both inherit, at some point, from the Reader class, either can be used to create a BufferedReader object. This works because when a variable declares a class name as its type, an object of any class that inherits from that type can be used instead. An object of a child class, or even a grandchild class, is also an object of the inherited class, so the substitution is allowed.

Figure 12.5. A depiction of the inheritance tree for some of the Reader Classes.
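The substitution described above can be seen in a small sketch. The class name ReaderSubstitutionDemo and the method firstLine are our own. The method asks only for a Reader, so it accepts a FileReader (a grandchild of Reader) just as happily as an InputStreamReader (a child of Reader). The sample text and the temporary file are made up for the demonstration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper (our own names): one method, many kinds of Reader.
class ReaderSubstitutionDemo {
    /** Wraps any Reader in a BufferedReader and returns its first line (or null). */
    public static String firstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        // A FileReader is a Reader, so it can be passed in
        Path file = Files.createTempFile("sample", ".txt");
        Files.writeString(file, "from a file\n");
        System.out.println(firstLine(new FileReader(file.toFile())));
        Files.delete(file);

        // An InputStreamReader is a Reader too
        InputStream bytes =
            new ByteArrayInputStream("from a stream\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(new InputStreamReader(bytes, StandardCharsets.UTF_8)));
    }
}
```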


Using these classes, we can create another method in the class TempFinder to read the temperature directly from the Internet. The Web site has changed since we originally saved a page from it in ajc-weather.html. Now we will look for '&ordm' and we will read from http://www.ajc.com/.

Program 112. Get the Temperature from a Live Weather Page

import java.io.*;
import java.net.URL;

/**
 * Class to find the temperature in a web page.
 * @author Barb Ericson
 */
public class TempFinder
{
  /**
   * Method to find the temperature in the passed
   * file
   * @param fileName the name of the file to look in
   */
  public String getTemp(String fileName)
  {
    String seq = "<b>&deg";
    String temp = null;
    String line = null;

    // try the following
    try {

      // read from the file
      BufferedReader reader =
        new BufferedReader(new FileReader(fileName));

      // loop till end of file or find sequence
      while ((line = reader.readLine()) != null &&
             line.indexOf(seq) < 0)
      {}

      // if there is a current line
      if (line != null)
      {
        // find the temperature
        int degreeIndex = line.indexOf(seq);
        int startIndex = line.lastIndexOf('>',degreeIndex);
        temp = line.substring(startIndex + 1, degreeIndex);
      }
    } catch (FileNotFoundException ex) {
      SimpleOutput.showError("Couldn't find file " + fileName);
      fileName = FileChooser.pickAFile();
      temp = getTemp(fileName);
    } catch (Exception ex) {
      SimpleOutput.showError("Error during read or write");
      ex.printStackTrace();
    }
    return temp;
  }

  /**
   * Method to get the temperature from a network
   * @param urlStr the url as a string
   * @return the temperature as a string
   */
  public String getTempFromNetwork(String urlStr)
  {
    String temp = null;
    String line = null;
    String seq = "&ordm";

    try {

      // create a url
      URL url = new URL(urlStr);

      // open a buffered reader on the url
      InputStream inStr = url.openStream();
      BufferedReader reader =
        new BufferedReader(new InputStreamReader(inStr));

      // loop till end of file or find sequence
      while ((line = reader.readLine()) != null &&
             line.indexOf(seq) < 0)
      {}

      // if there is a current line
      if (line != null)
      {
        // find the temperature
        int degreeIndex = line.indexOf(seq);
        int startIndex = line.lastIndexOf('>',degreeIndex);
        temp = line.substring(startIndex + 1, degreeIndex);
      }
    } catch (FileNotFoundException ex) {
      SimpleOutput.showError("Couldn't connect to " + urlStr);
    } catch (Exception ex) {
      SimpleOutput.showError("Error during read or write");
      ex.printStackTrace();
    }
    return temp;
  }

  public static void main(String[] args)
  {
    TempFinder finder = new TempFinder();
    String file = FileChooser.getMediaPath("ajc-weather.html");
    String temp = finder.getTemp(file);
    if (temp == null)
      System.out.println("Sorry, no temp was found in " + file);
    else
      System.out.println("The current temperature is " + temp);
    String urlString = "http://www.ajc.com/";
    temp = finder.getTempFromNetwork(urlString);
    if (temp == null)
      System.out.println("Sorry, no temp was found at " +
                         urlString);
    else
      System.out.println("The current temp " +
                         "from the network is " + temp);
  }
}


To run this main method, simply click on Tools, then RUN DOCUMENT'S MAIN METHOD. The output will look something like this:

> java TempFinder
The current temperature is 54
The current temp from the network is 82


How it Works

The method getTempFromNetwork is nearly identical to getTemp, except that we're reading the weather string live from the AJC Web site. We use the class URL to gain the ability to read the Web page as an input stream (a stream of bytes). We use InputStreamReader to convert the bytes into characters. And we use BufferedReader to buffer the characters as we read them for more efficient reading. Notice that each object has a specific role to play, and we create several objects to work together to accomplish the task.
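It can help to trace the three string calls on a single line of HTML. The following sketch pulls that step out into its own method. The class name TempExtractDemo, the method extract, and the sample line are all made up for illustration, and the real AJC page may look different.

```java
// Illustrative helper (our own names): the string-searching step by itself.
class TempExtractDemo {
    /**
     * Returns the text between the nearest '>' before seq and seq itself,
     * or null if seq doesn't appear in the line.
     */
    public static String extract(String line, String seq) {
        int degreeIndex = line.indexOf(seq);
        if (degreeIndex < 0) {
            return null;
        }
        // Search backward from the degree marker for the end of the previous tag
        int startIndex = line.lastIndexOf('>', degreeIndex);
        return line.substring(startIndex + 1, degreeIndex);
    }

    public static void main(String[] args) {
        String line = "<td><b>82&ordm;</b></td>";   // a made-up sample line
        // indexOf finds "&ordm" at index 9, lastIndexOf finds '>' at index 6,
        // so substring(7, 9) is the temperature
        System.out.println(extract(line, "&ordm"));  // prints 82
    }
}
```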

One way to make Web pages interactive is to write programs that actually generate HTML. For example, when you type a phrase into a text area then click the SEARCH button, you are actually causing a program to execute on the server which executes your search and then generates the HTML (Web page) that you see in response. Java has increasingly been used to generate Web pages.
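As a rough illustration of a program that generates HTML, here is a minimal sketch. The class name SearchPageDemo, its methods, and the sample results are our own inventions; a real server program would also have to receive the request and run the search. The escape method replaces characters that have a special meaning in HTML, so that whatever the user typed shows up as plain text.

```java
import java.util.List;

// Illustrative helper (our own names): builds a results page as a string.
class SearchPageDemo {
    /** Replaces the characters that HTML would otherwise treat as markup. */
    public static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Generates the HTML for a page listing the hits for a query. */
    public static String resultsPage(String query, List<String> hits) {
        StringBuilder html = new StringBuilder();
        html.append("<html><body>\n");
        html.append("<h1>Results for ").append(escape(query)).append("</h1>\n");
        html.append("<ul>\n");
        for (String hit : hits) {
            html.append("<li>").append(escape(hit)).append("</li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.print(resultsPage("weather", List.of("Atlanta forecast", "Radar map")));
    }
}
```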



Introduction to Computing & Programming Algebra in Java(c) A Multimedia Approach
Year: 2007
Pages: 191
