12.5. Networks: Getting Our Text from the Web
A network is formed whenever distinct computers communicate. Rarely does the communication take place with voltages over wires, the way that a computer encodes zeros and ones internally. It's too hard to maintain those voltages over distances. Instead, zeros and ones are encoded in some other way. For example, a modem (literally modulator-demodulator) maps zeros and ones to different audio frequencies. When we hear these different tones, it sounds like a bunch of buzzing bees to us, but to modems, it's pure binary.
Like onions and ogres, networks have layers. At the bottom level is the physical substrate. How are the signals being passed? Higher levels define how data is encoded. What makes up a zero? What makes up a one? Do we send a bit at a time? A packet of bytes at a time? A packet of bytes is like a letter in an envelope in that it contains data and a header that gives the information needed to get the data from the source to the destination.
Higher-level layers define the protocol for communication. A protocol is a set of rules that guide how an activity is performed. How does my computer tell your computer that it wants to talk, and what it wants to talk about? How do we address your computer at all? By treating these as distinct layers, we can easily swap out one part without changing the others. For example, most people with a direct connection to a network use a wired connection to an Ethernet network, but Ethernet is actually a mid-level protocol that works over wireless networks, too.
Humans have protocols, too. If Mark walks up to you, holds out his hand, and says, "Hi, my name is Mark," you will most certainly hold out your hand and say something like "Hi, my name is Gene" (assuming that your name is Gene; if it wasn't, that would be pretty funny). There's an unwritten protocol for humans about how to greet one another. Computer protocols are about the same things, but they're written down to communicate the process exactly. What gets said isn't too different. One computer may send the message 'HELO' to another to start a conversation (we don't know why the protocol writers couldn't spare the extra 'L' to spell it right), and a computer may send 'BYE' to end the conversation. (We even sometimes call the start of a computer protocol the "handshake.") It's all about establishing a connection and making sure that both sides understand what's going on.
The Internet is a network of networks. If you have a device in your home so that your computers can talk to one another (e.g., a router), then you have a network. With just that, you can probably copy files between computers and print. When you connect your network to the wider Internet (through an Internet Service Provider (ISP)), your network becomes part of the Internet.
The Internet is based on a set of agreements about a whole bunch of things:
But the topmost layers of the network define what the data being passed around means. One of the first applications placed on top of the Internet was electronic mail. Over the years, the mail protocols have evolved to standards today like POP (Post Office Protocol) and SMTP (Simple Mail Transfer Protocol). Another old and important protocol is FTP (File Transfer Protocol).
These protocols aren't super-complicated. When the communication ends, one computer will probably say 'BYE' or 'QUIT' to another. When one computer tells another computer to accept a file via FTP, it literally says "STOR filename" (again, early computer developers didn't want to spare one more byte to say "STORE").
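To see how little machinery a text-based protocol needs, here is a minimal sketch of a toy command handler. The verbs HELO, STOR (FTP's store command), and QUIT are borrowed from real protocols, but the class ToyProtocol, the method reply, and the reply strings are invented for illustration and do not match any real server's responses.

```java
import java.util.Locale;

public class ToyProtocol {

    /** Maps one command line from a client to this toy server's reply line. */
    public static String reply(String commandLine) {
        // Split "VERB argument" into at most two pieces
        String[] parts = commandLine.trim().split("\\s+", 2);
        String verb = parts[0].toUpperCase(Locale.ROOT);
        String argument = parts.length > 1 ? parts[1] : "";

        switch (verb) {
            case "HELO":
                return "OK hello " + argument;
            case "STOR":
                return "OK ready to receive " + argument;
            case "QUIT":
                return "OK goodbye";
            default:
                // Made-up error reply for anything we don't recognize
                return "ERR unknown command " + verb;
        }
    }

    public static void main(String[] args) {
        System.out.println(reply("HELO my-laptop"));
        System.out.println(reply("STOR notes.txt"));
        System.out.println(reply("QUIT"));
    }
}
```

A real protocol specifies far more (reply codes, error handling, how the file's bytes actually travel), but the conversation has the same shape: one side sends a short command, the other sends a short answer.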
The World Wide Web is yet another set of agreements, developed mostly by Tim Berners-Lee. The Web is based on top of the Internet, simply adding more protocols on top of the existing ones.
You'll notice the term HyperText showing up frequently in reference to the Web. HyperText is literally non-linear text. It's a term invented by Ted Nelson to describe the kind of reading that we all do commonly on the Web but that didn't exist before computers: Read a little on one page, then click a link and read a little over there, then click Back and continue reading where you left off. The basic idea of HyperText dates back to Vannevar Bush, who was one of President Franklin Roosevelt's science advisors. He wanted to create a device for capturing flows of thought, which he called a Memex. But not until computers came along would this be possible. Tim Berners-Lee invented the Web and its protocols as a way of supporting rapid publication of research findings with connections between documents. The Web is certainly not the ultimate HyperText system. Systems like the ones that Ted Nelson worked on wouldn't allow "dead links" (links that are no longer accessible). But for all its warts, the Web works.
A browser (like Internet Explorer, Netscape Navigator, Mozilla, Opera, and so on) understands a lot about the Internet. It usually knows several protocols, such as HTTP, FTP, gopher (an early HyperText protocol), and mailto (SMTP). It knows HTML, how to format it, and how to grab resources referenced within the HTML, like JPEG pictures. For all of that, though, it's possible to access the Internet without nearly that much overhead. Mail clients (e.g., Outlook and Eudora) know some of these protocols without knowing all of them.
Java, like other modern languages, provides classes to support access to the Internet without all the overhead of a browser. Basically, you can write little programs that are clients. Java's class java.net.URL allows you to open URLs and read them as if they were files. It has a method openStream which returns an object of the class java.io.InputStream which can be used to read from the URL.
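As a minimal sketch of that idea, the class below opens a URL with openStream and reads the resulting InputStream to the end, counting the bytes it sees. The class name UrlBytes, the method countBytes, and the address in main are placeholders chosen for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class UrlBytes {

    /** Reads everything the URL offers and returns how many bytes arrived. */
    public static long countBytes(URL url) throws IOException {
        long count = 0;
        byte[] buffer = new byte[4096];
        // openStream() hands back a java.io.InputStream for the URL's content
        try (InputStream in = url.openStream()) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                count += n;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address; any reachable URL would do
        URL url = new URL("http://www.example.com/");
        System.out.println("Read " + countBytes(url) + " bytes");
    }
}
```

Because a URL can also point at a local file (a file: URL), the same method works without a network connection, which is handy for trying it out.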
We have been using FileReader to read from files. The class FileReader is a child of the more general class InputStreamReader, which is a child of the class Reader (Figure 12.5). We can create an object of the class InputStreamReader using the InputStream object. Finally, we can create a BufferedReader object by passing it an InputStreamReader object. The class BufferedReader has a constructor which takes a Reader object. Since FileReader and InputStreamReader both inherit at some point from the Reader class, they can both be used to create a BufferedReader object. This works because when a variable declares a class as its type, an object of any class that inherits from that type can be used instead. An object of a child class, or even a grandchild class, is also an object of the ancestor class, so this substitution is allowed.
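The substitution can be sketched in a few lines. The method firstLine below declares its parameter as a Reader, so it accepts a FileReader (a grandchild of Reader) and an InputStreamReader (a child of Reader) alike. The class name ReaderDemo, the method name, and the sample text are made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReaderDemo {

    /** Accepts any Reader (or descendant of Reader) and returns its first line. */
    public static String firstLine(Reader source) throws IOException {
        // BufferedReader's constructor takes a Reader, so any subclass will do
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // An InputStreamReader wrapped around an in-memory stream of bytes
        InputStream bytes = new ByteArrayInputStream(
            "from a stream\nsecond line".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(new InputStreamReader(bytes, StandardCharsets.UTF_8)));

        // A FileReader on a temporary file, passed to the very same method
        File temp = File.createTempFile("reader-demo", ".txt");
        try (FileWriter writer = new FileWriter(temp)) {
            writer.write("from a file\n");
        }
        System.out.println(firstLine(new FileReader(temp)));
        temp.delete();
    }
}
```

Neither call needs a cast: the compiler only checks that each argument is some kind of Reader.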
Figure 12.5. A depiction of the inheritance tree for some of the Reader Classes.
Using these classes, we can create another method in the class TempFinder to read the temperature directly from the Internet. The Web site has changed since we originally saved a page from it in ajc-weather.html. Now we will look for 'º' and we will read from http://www.ajc.com/.
Program 112. Get the Temperature from a Live Weather Page
import java.io.*;
import java.net.URL;

/**
 * Class to find the temperature in a web page.
 * @author Barb Ericson
 */
To run this main method, simply click on Tools, then RUN DOCUMENT'S MAIN METHOD. The output will look something like this:
> java TempFinder
The current temperature is 54
The current temp from the network is 82
This method getTempFromNetwork is nearly identical to the last one, except that we're reading the string weather from the AJC website live. We use the class URL to gain the ability to read the Web page as an input stream (a stream of bytes). We use InputStreamReader to convert the bytes into characters. And we use BufferedReader to buffer the characters as we read them for more efficient reading. Notice that each object has a specific role to play, and we create several objects to work together to accomplish the task.
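The chain of cooperating objects can be sketched as follows. This is not the TempFinder listing itself but a simplified stand-in: the class TempSketch, the methods findTemp and tempFromUrl, and the rule "take the digits just before a marker string" are assumptions made for illustration, and the marker a real page uses would have to be found by inspecting that page.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class TempSketch {

    /**
     * Returns the run of digits that ends right before the marker, taken from
     * the first line that has such a run, or null if no line does.
     */
    public static String findTemp(BufferedReader reader, String marker) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            int end = line.indexOf(marker);
            if (end > 0) {
                // Walk backward over the digits that precede the marker
                int start = end;
                while (start > 0 && Character.isDigit(line.charAt(start - 1))) {
                    start--;
                }
                if (start < end) {
                    return line.substring(start, end);
                }
            }
        }
        return null;
    }

    /** URL gives a byte stream, InputStreamReader gives characters, BufferedReader gives lines. */
    public static String tempFromUrl(URL url, String marker) throws IOException {
        InputStream inStream = url.openStream();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(inStream))) {
            return findTemp(reader, marker);
        }
    }
}
```

Keeping the search in findTemp separate from the network access means the parsing can be tried on a saved page or a plain string before pointing it at a live site.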
One way to make Web pages interactive is to write programs that actually generate HTML. For example, when you type a phrase into a text area then click the SEARCH button, you are actually causing a program to execute on the server which executes your search and then generates the HTML (Web page) that you see in response. Java has increasingly been used to generate Web pages.
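A minimal sketch of a program that generates HTML appears below. It builds a results page as a string from a query and a list of hits. The class SearchPage, its method names, and the page layout are invented for this example, and a real server-side program would also need some way to receive the request and send the page back.

```java
import java.util.Arrays;
import java.util.List;

public class SearchPage {

    /** Escapes the characters that have special meaning in HTML text. */
    public static String escapeHtml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Builds a complete HTML page listing the hits for a query. */
    public static String resultsPage(String query, List<String> hits) {
        StringBuilder html = new StringBuilder();
        html.append("<html><head><title>Results</title></head><body>\n");
        html.append("<h1>Results for ").append(escapeHtml(query)).append("</h1>\n");
        html.append("<ul>\n");
        for (String hit : hits) {
            html.append("<li>").append(escapeHtml(hit)).append("</li>\n");
        }
        html.append("</ul>\n</body></html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("Reading from a URL", "Sockets <intro>");
        System.out.print(resultsPage("java & networks", hits));
    }
}
```

Escaping the user's text before placing it in the page matters: otherwise a query containing < or & could break the generated HTML.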