The modern, interactive, full-color Web site is really just a milepost in a long list of computer interconnection technologies. It was not the first technology used to connect computers and it won't be the last.
In the earliest days of computing, circa 1955, computers were huge, single-program devices that were used to crunch large numbers or manipulate large amounts of text data. The only network available in those days was the proverbial "SneakerNet," where a programmer or graduate student would output data to a tape drive, remove the reel, walk to the site of the second computer, and mount the tape on a drive connected to that computer. It didn't take long for programmers and researchers to become dissatisfied with this approach.
In the early 1960s, the Advanced Research Projects Agency (ARPA) started a project with the purpose of finding a way to connect these mainframe computers so that data could be shared between them.
ARPA's first step was to create the hardware necessary to transfer electrical signals from one computer to another. The second step was to create the software that sits on each computer and controls the flow and interpretation of the data from the other machine. The result of this research effort was the TCP/IP protocol. In the early days, TCP/IP was only one of several popular protocols. Because it was developed by a research project instead of by a company, there was no license fee for using it. That factor, combined with the fact that the fee-based technologies were not that much better, led to its near-universal adoption in the decades that followed.
The TCP/IP Level of the Web Site
All the technologies that we will describe in this chapter run on top of TCP/IP. That is to say that the other programs contain calls to the TCP/IP application-programming interface (API) somewhere in the lower layers of their systems. The reason for this is that every computer that runs TCP/IP software can communicate with every other computer that runs this software.
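To make the idea of "calls to the TCP/IP API" concrete, here is a minimal sketch that uses the standard java.net.Socket and java.net.ServerSocket classes, which are Java's interface to TCP. The class name LoopbackEcho, the method echoOnce, and the "echo: " reply format are our own inventions for illustration. Both ends of the conversation run on the same machine over the loopback address.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a background thread plays the server, the caller plays the client.
class LoopbackEcho {

    // Sends one line to a throwaway echo server on the loopback address and returns the reply.
    static String echoOnce(String message) {
        // Port 0 asks the operating system to pick any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOneClient(listener));
            server.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), listener.getLocalPort());
                 PrintWriter out = writerFor(client);
                 BufferedReader in = readerFor(client)) {
                out.println(message);          // TCP carries the bytes to the other end
                String reply = in.readLine();  // blocks until the server answers
                server.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Accepts a single connection, reads one line, and writes it back with a prefix.
    private static void serveOneClient(ServerSocket listener) {
        try (Socket peer = listener.accept();
             BufferedReader in = readerFor(peer);
             PrintWriter out = writerFor(peer)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            // accept() fails if the listener closes before a client connects; nothing to echo then.
        }
    }

    private static PrintWriter writerFor(Socket socket) throws IOException {
        // The second argument turns on auto-flush so println() sends the line immediately.
        return new PrintWriter(new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII), true);
    }

    private static BufferedReader readerFor(Socket socket) throws IOException {
        return new BufferedReader(new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

Notice that the code never mentions datagrams or network hardware. It only reads and writes streams, and the layers below take care of the rest.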
TCP/IP is not really a protocol, but rather a suite consisting of dozens of protocols. It is named for two of the most important protocols, Transmission Control Protocol (TCP) and the Internet Protocol (IP). TCP/IP is composed of four layers:
Application layer This layer can be a simple chat program or a complete online store such as Amazon.com. It conducts its business by making calls to the transport layer's API.
Transport layer This layer sends chunks of data from one computer to another with guaranteed delivery by making calls to the internetwork layer's API.
Internetwork layer This layer breaks the chunks of data into small pieces called datagrams and sends them to the other computer by calling the network access layer's API.
Network access layer This layer controls the hardware that sends bits and bytes to other computers.
The layers were created to separate low-level functionality from higher-level functionality. This organization of functions enables you to remove a product that provides one layer and replace it with one that performs better, is easier to use, and so on. Figure 26.1 shows an illustration of this interaction between layers.
Figure 26.1. The various layers in the TCP/IP protocol suite work together to provide reliable intercomputer communications.
Each layer calls the API of the layer beneath it to make requests for service. The end result is reliable communication between two computers.
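The way each layer hands its data to the layer beneath it can be pictured with a toy model. The sketch below is not how a real TCP/IP stack is written. The class LayerStackSketch, its method names, and the bracketed "headers" are all made up for illustration. It only shows the calling pattern: every layer adds its own wrapper and calls the layer below, and the receiving side peels the wrappers off in the opposite order.

```java
// Toy model of the four layers; the bracketed tags stand in for real protocol headers.
class LayerStackSketch {

    // Application layer: hands its message to the transport layer.
    static String applicationSend(String message) {
        return transportSend(message);
    }

    // Transport layer: wraps the chunk and calls the internetwork layer.
    static String transportSend(String chunk) {
        return internetworkSend("[TCP]" + chunk);
    }

    // Internetwork layer: wraps the segment and calls the network access layer.
    static String internetworkSend(String segment) {
        return networkAccessSend("[IP]" + segment);
    }

    // Network access layer: produces what would go out on the wire.
    static String networkAccessSend(String datagram) {
        return "[FRAME]" + datagram;
    }

    // Receiving side: each layer removes its own wrapper, outermost first.
    static String receive(String frame) {
        String datagram = strip(frame, "[FRAME]");
        String segment = strip(datagram, "[IP]");
        return strip(segment, "[TCP]");
    }

    private static String strip(String data, String header) {
        if (!data.startsWith(header)) {
            throw new IllegalArgumentException("missing " + header);
        }
        return data.substring(header.length());
    }

    public static void main(String[] args) {
        String onTheWire = applicationSend("hello");
        System.out.println(onTheWire);          // prints [FRAME][IP][TCP]hello
        System.out.println(receive(onTheWire)); // prints hello
    }
}
```

Because each method talks only to the one directly beneath it, you could swap out any single layer without touching the others, which is the point of the layered design.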
The TCP and IP protocols take care of the movement of the data from one computer to the next. When the data gets there, however, the application layer must take over and process the data. A key part of that processing is the HTTP protocol, which is covered in the next section.
Hypertext Transfer Protocol (HTTP)
A protocol is simply a published agreement between clients and servers that specifies what data will be passed from one party to the other and what syntax it will be in.
HTTP is an application layer protocol, which means that it depends on lower-layer protocols to do much of the work. In the case of HTTP, this lower-layer protocol is TCP/IP. The information in HTTP is transferred as plain ASCII characters. This is convenient for programmers because the instructions are human-readable, making debugging easier.
HTTP is a request-and-response protocol: every exchange consists of a request from the client and a response from the server. Each request-and-response pair is independent of every other pair. This kind of communication is called stateless. If you want to string together requests to form a larger transaction, you must do this yourself in the programs that you write; HTTP will not do it for you. Figure 26.2 shows where HTTP fits into the layer diagram that we drew in Figure 26.1.
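One common way to string independent requests together is to have the client send an identifier with every request and to keep the accumulated state on the server under that identifier. The following sketch assumes such an identifier exists. The class CartSessions, the method addItem, and the shopping-cart scenario are hypothetical, and how the identifier travels (a cookie, a URL parameter, and so on) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical server-side helper that remembers state between otherwise unrelated requests.
class CartSessions {

    // One cart per session identifier; HTTP itself knows nothing about this map.
    private final Map<String, List<String>> cartsBySession = new HashMap<>();

    // Handles one request. The session identifier is the only link to earlier requests.
    List<String> addItem(String sessionId, String item) {
        List<String> cart = cartsBySession.computeIfAbsent(sessionId, id -> new ArrayList<>());
        cart.add(item);
        return new ArrayList<>(cart); // return a copy so callers cannot alter the stored cart
    }

    public static void main(String[] args) {
        CartSessions sessions = new CartSessions();
        sessions.addItem("abc", "book");
        System.out.println(sessions.addItem("abc", "pen")); // prints [book, pen]
        System.out.println(sessions.addItem("xyz", "mug")); // prints [mug]
    }
}
```

Two requests carrying the same identifier see the same cart, and a request with a different identifier starts from scratch.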
Figure 26.2. The HTTP protocol runs as a sublayer in the application layer.
The format of the HTTP request is a plain-text message. The server reads the message and performs the task that it has been instructed to perform in the message. In some cases, this task is to retrieve a static document and send it back to the client's browser for display; but in other cases, this request is for a servlet to be run. The results of this transaction, if any, are sent to the client.
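To see what such a plain-text message looks like, the sketch below builds a simple GET request and then picks apart its first line, much as a server would. The class HttpRequestText and its methods are names we made up for this illustration, and the host www.example.com is a placeholder. The message layout (a request line, header lines, and a blank line, each ended by a carriage return and line feed) follows HTTP/1.1.

```java
// Hypothetical helper that shows the text layout of an HTTP/1.1 request.
class HttpRequestText {

    // Builds a minimal GET request. Each line ends with CRLF, and a blank line ends the headers.
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Splits the first line of a request into method, target, and protocol version.
    static String[] parseRequestLine(String request) {
        int end = request.indexOf("\r\n");
        String line = end >= 0 ? request.substring(0, end) : request;
        String[] parts = line.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed request line: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String request = buildGet("www.example.com", "/index.html");
        System.out.print(request);
        String[] parts = parseRequestLine(request);
        System.out.println(parts[0] + " is asking for " + parts[1]);
    }
}
```

The second element of the parsed line tells the server what is being asked for, whether that turns out to be a static document or a servlet.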
On the left side of Figure 26.2, you see a sublayer that is labeled "Your Client Application." In most cases, this client application is a browser. A browser is a piece of software that performs an amazing range of tasks. First, it is an HTTP application, meaning that it creates messages in that protocol for transmission to a Web server. In addition, it contains an HTTP parser that takes the messages that are returned by the Web server and translates them into a display that is, hopefully, pleasant to the human eye. This translation might require that a .jpg image be rendered, HTML be translated into text, XML be parsed, Java applets be run, and so on.
The fact that the browser is normally a free download, or available as part of the operating system installation pack, obscures the fact that it is really a sophisticated piece of software.
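The parsing half of the browser's job can be sketched in a few lines. The example below takes the raw text of a response and separates it into a status code, headers, and a body. It is a simplified illustration, not how any real browser is implemented, and the class name HttpResponseText and its fields are our own. It assumes that the whole response is already in memory as a string and ignores complications such as chunked transfer and binary content.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical holder for the three parts of an HTTP response message.
class HttpResponseText {
    final int status;
    final Map<String, String> headers;
    final String body;

    HttpResponseText(int status, Map<String, String> headers, String body) {
        this.status = status;
        this.headers = headers;
        this.body = body;
    }

    // Parses "status line, header lines, blank line, body" from a raw response string.
    static HttpResponseText parse(String raw) {
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("no blank line after the headers");
        }
        String[] head = raw.substring(0, split).split("\r\n");

        // The status line looks like: HTTP/1.1 200 OK
        String[] statusLine = head[0].split(" ", 3);
        if (statusLine.length < 2) {
            throw new IllegalArgumentException("malformed status line: " + head[0]);
        }
        int status = Integer.parseInt(statusLine[1]);

        // Header names are case-insensitive, so store them in lowercase.
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < head.length; i++) {
            int colon = head[i].indexOf(':');
            if (colon > 0) {
                String name = head[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(name, head[i].substring(colon + 1).trim());
            }
        }
        return new HttpResponseText(status, headers, raw.substring(split + 4));
    }

    public static void main(String[] args) {
        HttpResponseText response =
            parse("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hi</h1>");
        System.out.println(response.status + " " + response.headers + " " + response.body);
    }
}
```

A browser would go on to look at the Content-Type header to decide whether the body should be handed to its HTML renderer, its image decoder, or something else.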
The Web Server
On the right side of Figure 26.2, you see a box labeled "Your Server Application." In the majority of installations, that server application is a piece of software known as the Web server. Web servers are also HTTP applications in that they both parse requests formatted in HTTP and respond by creating messages in HTTP format.
Originally, the job of the Web server was to locate documents and send them back to the requesting client computer in HTTP format. Web servers were later enhanced to run CGI programs, Perl scripts, and programs in a host of other niche languages that made up the first generation of Web programming technologies.
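The original document-serving job can be sketched as a single lookup. In the example below, an in-memory map stands in for the server's document directory, and the class StaticDocumentServer and its method respond are hypothetical names. A real Web server reads files from disk, handles many more request types, and sends bytes over a socket instead of returning a string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical miniature of a first-generation Web server: find the document, send it back.
class StaticDocumentServer {

    // Maps a request path such as "/index.html" to the document's contents.
    private final Map<String, String> documents;

    StaticDocumentServer(Map<String, String> documents) {
        this.documents = documents;
    }

    // Takes a request line such as "GET /index.html HTTP/1.1" and returns the full response text.
    String respond(String requestLine) {
        String[] parts = requestLine.trim().split(" ");
        if (parts.length != 3) {
            return response("400 Bad Request", "<h1>Bad request</h1>");
        }
        if (!parts[0].equals("GET")) {
            return response("501 Not Implemented", "<h1>Only GET is supported</h1>");
        }
        String document = documents.get(parts[1]);
        if (document == null) {
            return response("404 Not Found", "<h1>Not found</h1>");
        }
        return response("200 OK", document);
    }

    // Formats a status line, two headers, a blank line, and the body.
    private static String response(String status, String body) {
        int length = body.getBytes(StandardCharsets.UTF_8).length;
        return "HTTP/1.1 " + status + "\r\n"
             + "Content-Type: text/html; charset=utf-8\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"
             + body;
    }

    public static void main(String[] args) {
        StaticDocumentServer server = new StaticDocumentServer(
            Collections.singletonMap("/index.html", "<h1>Home</h1>"));
        System.out.println(server.respond("GET /index.html HTTP/1.1"));
        System.out.println(server.respond("GET /missing.html HTTP/1.1"));
    }
}
```

Running a script or a servlet fits the same shape: the lookup step is replaced by a call to a program, and that program's output becomes the body of the response.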
The modern Web server has a much more complicated task to perform. In addition to handing static documents to the client and running scripts, the Web server has to be able to call servlets, translate JSPs into servlets, and call Enterprise JavaBeans (EJB). Figure 26.3 is an updated version of Figure 26.2 that shows the browser and the Web server in the application layer.
Figure 26.3. The browser and the Web server are both HTTP applications.
The client and the server applications are heavily dependent on the browser and the Web server's support mechanisms to run.
In reality, the amount of applet processing that a browser performs is dwarfed by the amount of HTML, both static and dynamic, that it processes. The source of that HTML varies widely, because every server-side, application-building programming language communicates its results to the requesting browser using HTML and HTTP. Thus, from the browser's side, almost every Web site seems to send it HTML.
The top sublayer in the server side is far more complex than its counterpart on the client side. There is a seemingly endless variety of programming languages that can be called by a Web server. In this chapter, we will limit our discussion to the Java varieties: JavaServer Pages (JSP), Servlets, Enterprise JavaBeans, and Web services.