1.1 How It Works | Open Source Development with LAMP: Using Linux, Apache, MySQL, Perl, and PHP

To give you a clear picture of how the parts discussed in this book interrelate, Figure 1.1 shows a pictorial overview of the Web. The Web is usually accessed through a browser, the best known of which are Netscape/Mozilla and Internet Explorer. There are many alternative browsers, including Galeon, Konqueror, OmniWeb, and Opera, and also text-based options such as Lynx, links, and w3m.

Figure 1.1. How the Web works


When you click a link or type a URL into the location box (for instance, www.example.com), the browser makes a socket connection (also known as a network connection) to the server www.example.com. The name www.example.com maps to an Internet address, which is a number in the form 1.2.3.4 (an IP address, or dotted quad). The browser connects to www.example.com using port 80, the port on which the server listens for HTTP requests. This port is standardized. Other ports are used for other Internet connections: 22 for SSH, 23 for Telnet, and so on. (SSH and Telnet are addressed in Chapter 2.) Note that these network port numbers are not related to the physical ports of the machine (COM1 and COM2, USB, Firewire, parallel printer port, etc.).
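As a rough illustration of the steps just described, here is a minimal Python sketch. Python is used only for illustration, and the names `default_port`, `build_request`, and `fetch` are invented for this example. The sketch looks up the standard port for a service, builds a bare-bones HTTP request, and (only if you call `fetch`) opens a socket connection to the server.

```python
import socket

# Standard port numbers mentioned in the text.
WELL_KNOWN_PORTS = {"http": 80, "ssh": 22, "telnet": 23}


def default_port(service):
    """Return the standard port for a service name such as 'http'."""
    return WELL_KNOWN_PORTS[service.lower()]


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", timeout=5.0):
    """Connect to the host on port 80 and return the raw response bytes."""
    # create_connection resolves the name to an IP address and opens the socket.
    address = (host, default_port("http"))
    with socket.create_connection(address, timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.example.com")` needs network access; `build_request` and `default_port` can be tried without it.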

Based on what the client requests, the server serves up, or delivers, information to the client. The types of data the server serves up include plain text (including HTML), images, Java applets, various types of documents, PDFs, etc. The server can generate this content in one of several ways: static, dynamic, or embedded (each of these is discussed later).

The client's job is to receive from the server a stream of text, images, Java applets, documents, and so on, and to render, or appropriately display, them. The client also executes any JavaScript code and Java applets that are served up. ^[1] The client can send form data to the server, where it is commonly handed to a program through the Common Gateway Interface (CGI) (see Chapter 7). The server can then process this data in whatever way it chooses. There are two sides to the processing that goes on: the client side and the server side. These two terms mean pretty much exactly what they seem to mean. The cool thing about this, as with many other things on the Web, is that once protocols are established, what happens on the client side and what happens on the server side are independent. If the client wants to block pop-up ads, there isn't much the server can do about it. If someone develops a new server-side application or improves an old one, the client doesn't care, as long as protocols are adhered to.

^[1] JavaScript should not be confused with Java; they are two different things. JavaScript, originally called LiveScript, is a language created by Netscape (www.netscape.com) that executes within the browser to do clever things like popping up new windows, image rollovers, and other nifty client-side things. Java is a platform-independent programming language created by Sun Microsystems (www.sun.com) that is often used to create applets that are downloaded and executed within the browser.
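To make the form-data exchange described above more concrete, here is a small Python sketch. Python is used only for illustration, and the field names and values are invented. It shows how a browser might encode form fields into a single string and how a server-side program might decode them again, using the standard library functions `urlencode` and `parse_qs`.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields, as a browser might collect them.
form = {"name": "Jane Doe", "comment": "LAMP & Perl"}

# Client side: encode the fields into one string for transmission.
encoded = urlencode(form)

# Server side: decode the string back into field names and values.
# parse_qs returns a list per field, because a field may appear more than once.
decoded = parse_qs(encoded)

print(encoded)
print(decoded)
```

Neither side needs to know how the other is implemented; they only have to agree on the encoding.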

1.1.1 Serving Up Static Data

The simplest thing for the server to do is to serve up static data, or data that is the same for every client and changes only when an HTML programmer changes the source file. The server accomplishes this by locating an HTML (or image or PDF or ...) file on the local hard drive and sending that content back to the client unchanged. This requires no server programming; the Apache web server does all the work.

This is illustrated in Figure 1.2. Let's say the user enters the URL www.example.com/. The server www.example.com is contacted, and a request is made that causes the server to locate a file (/var/www/html/index.html) on its local drive. The file is located and sent back, as is, to the client. ^[2]

^[2] Actually, the server prepends some information, called the header, to the content and sends the header followed by the contents of index.html. More on this in Chapter 3.
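The static case, including the header mentioned in the footnote, can be sketched in a few lines of Python. This is only an illustration of the idea, not how Apache is implemented: the function name `serve_static` is invented, the header fields are a minimal assumption, and a temporary directory stands in for /var/www/html.

```python
import tempfile
from pathlib import Path


def serve_static(document_root, url_path):
    """Return a header followed by the unchanged contents of the requested file."""
    # A request for "/" is mapped to index.html, as in the example above.
    relative = "index.html" if url_path == "/" else url_path.lstrip("/")
    # Simplification: a real server also rejects paths that escape the document root.
    body = (Path(document_root) / relative).read_bytes()
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"  # assumed; a real server picks this per file type
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The file contents follow the header without modification.
    return header + body


# Usage: a temporary directory stands in for the server's document root.
with tempfile.TemporaryDirectory() as root:
    Path(root, "index.html").write_bytes(b"<html><body>Hello</body></html>")
    response = serve_static(root, "/")
```

The server never looks inside the file; it only reads the bytes and sends them on.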

Figure 1.2. Serving up static data


We discuss static content when we talk about how to set up and use the Apache web server (see Chapter 3). We also discuss how to easily create and manage a large (or small, for that matter) static Web site using the Website META Language (WML); see Chapter 6.

1.1.2 Serving Up Dynamic Data

A more complex way of generating HTML is to execute a server-side program that dynamically generates the HTML that is sent to the browser. There are many flavors of server-side programming, including tried-and-true CGI (Chapter 7) and the more flexible and powerful mod_perl (Chapter 8). The program generally does some sort of server-side processing, such as reading from a database or executing some other server-side program.

Dynamic web pages should not be confused with dynamic HTML, a term usually used when discussing web pages that exhibit dynamic behavior such as pop-up windows, image rollovers, dynamic clickable menus, and similar super-duper fancy eye candy. Dynamic HTML is often implemented with JavaScript and the Document Object Model (DOM).

Dynamic content is illustrated in Figure 1.3. If the user enters the URL www.example.com/cgi-bin/a.cgi, the server (www.example.com) receives a request to execute a program named a.cgi (the server knows it is an executable program because of the cgi-bin in the URL). The server locates the file on its local drive, perhaps at /var/www/cgi-bin/a.cgi, and executes the program. This program's job is to produce HTML (along with doing some useful task for which it was created, such as reading from a database, sending e-mail, or writing to a log file) that will be sent back to the client.
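To give a feel for what a program such as a.cgi might do, here is a minimal Python sketch. The book's own examples use Perl; Python is used here only for illustration, and the function name `render_page` is invented. The program prints a header, a blank line, and then HTML that is generated afresh on every request (here, simply the current time).

```python
import datetime


def render_page(now):
    """Return a CGI-style response: header, blank line, then generated HTML."""
    header = "Content-Type: text/html\r\n\r\n"
    body = (
        "<html><body>"
        f"<p>Generated at {now.isoformat()}</p>"
        "</body></html>"
    )
    return header + body


if __name__ == "__main__":
    # The server runs the program and relays whatever it prints to the client.
    print(render_page(datetime.datetime.now()), end="")
```

Because the page is built at request time, two clients asking a second apart receive different content, which is exactly what distinguishes this from the static case.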

Figure 1.3. Serving up dynamic data


1.1.3 Serving Up Content with Embedded HTML

Another, more flexible way to create dynamic web pages is to use embedded HTML, or executable code embedded within an HTML file. This is a good approach when those working on the web site are from disparate backgrounds. For instance, if a person who knows HTML but is not a programmer builds the template for the web page, a programmer can come in afterward and add executable code directly into the HTML file to make the page come alive. Not static, not quite dynamic, embedded web pages provide a measure of flexibility.

This enables a Web designer, perhaps an artist or a graphic designer rather than the stereotypical artistically challenged nerd (like us), to create a web site that is usable and eye-pleasing instead of average and plain. ^[3] Then, after the artist has created the look and feel, the programmers can come in and add executable code right into the HTML to change the static content into dynamic content, livening a good-looking but otherwise static site. Of course, one of the target audiences of this book is the graphic designers and artists who design web page look and feel, potentially cutting out the necessity for nerds entirely. Not vice versa, though, because it's far easier to learn programming than to do graphical design.

^[3] An example of the former is this book's web site, www.opensourcewebbook.com, designed by the excellent artistic folks at BDGI (www.bdgi.com). An example of the latter can be found at www.ifokr.org.

Figure 1.4 shows how this type of processing works. Let's say the user enters the URL www.example.com/a.html. The web server grabs the HTML file, perhaps /var/www/html/a.html, and preprocesses it in some way, generating HTML by executing the code within the original HTML file; the result is then sent to the client.
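The preprocessing step can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the `{{ ... }}` marker syntax and the `preprocess` function are invented for this sketch and are not meant to match any of the tools covered later. The idea is simply that the server scans the HTML file, executes each embedded piece of code, and substitutes the result into the page.

```python
import re

# Hypothetical marker syntax for this sketch: {{ expression }}
EMBEDDED = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def preprocess(template, variables):
    """Replace each embedded expression with the text it evaluates to."""
    def run(match):
        # eval is used for brevity only; never evaluate untrusted input this way.
        value = eval(match.group(1), {"__builtins__": {}}, dict(variables))
        return str(value)
    return EMBEDDED.sub(run, template)


# The designer writes the HTML; the programmer adds the embedded expressions.
template = "<html><body><p>Hello, {{ name.upper() }}! 2 + 3 = {{ 2 + 3 }}</p></body></html>"
page = preprocess(template, {"name": "world"})
print(page)
```

The surrounding HTML passes through untouched, which is why a designer and a programmer can work on the same file.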

Figure 1.4. Serving up embedded data


We examine four approaches to embedded programming:

SSI (Server Side Includes; see Chapter 9): a simple solution that is built into Apache, using a syntax that is unique to SSI
Embperl (see Chapter 10): a Perl module that enables an HTML file to have Perl code embedded within it
Mason (see Chapter 11): another Perl module that, like Embperl, enables an HTML file to have Perl code embedded within it
PHP (see Chapter 12): a language unto itself, Perl-like in its syntax and providing a rich collection of built-in functions to perform various tasks

SSI, being simple, is limited in what it can do. We discuss it only because you may come across SSI pages that need maintenance. Embperl, Mason, and PHP are rich in features. With these tools, the HTML page has access to posted form data, can connect to databases, can read and write files, and can perform any task that you can do in an arbitrary program. This enables you to turn HTML files into programs that can be bent to your will, creating web sites that not only are things of beauty, serving up live, dynamic data, but also can become applications, performing many tasks.