Hour 21. Introduction to CGI

What You'll Learn in This Hour:

  • The basics of how the Web works

  • What you need to know before writing CGI

  • How to write your first CGI program

There's no question in anyone's mind that the explosion in popularity of the Internet is mostly due to the World Wide Web. Since the introduction of the first widely used graphical web browser in 1993, the Internet has expanded at a phenomenal rate, going from the number of hosts doubling every 20 months around 1993 to doubling every 12 months currently. Private networks (intranets) have grown even more rapidly.

The content of the Web has become more sophisticated since 1993, and users now expect a web site to do more than simply show static (unchanging) pages. A successful web site requires dynamic web pages: pages that provide up-to-date information. Complex pages with rapidly changing content would be almost impossible to keep current if not for the Common Gateway Interface (CGI).

By the Way

The next four hours require you to have some knowledge of Hypertext Markup Language (HTML). If you're unfamiliar with HTML, don't despair. It's really not hard to learn, and you won't need much to complete this book.

HTML is a markup language commonly used for constructing web pages. HTML consists of plain text with formatting codes embedded in the text to indicate how a web browser should display the text. For example, the text "HTML is <I>not</I> hard to learn." is normal text except for the <I> and </I> markers. They are called tags and describe the formatting used to display the text. In this case, the word "not" should be displayed in italic by a web browser, if it can. (Remember, not all browsers are graphical.)

A full lesson in HTML is well beyond the scope of this book. It's not difficult; there's just a lot of material to cover. The specification for HTML is maintained by the World Wide Web Consortium (W3C) at http://www.w3.org, and you can find some nice tutorials there as well. One good book on HTML is Sams Teach Yourself HTML in 24 Hours.


Browsing the Web

The Web, as you know it, is simply the interaction between two different systems trying to exchange data. The system that is trying to fetch a web page is known as the client. The client system usually runs a program called a web browser, such as Safari, Firefox, Internet Explorer, or Opera. This is the part of the Web that you're used to using every day. The web browser provides navigation buttons and bookmarks and is responsible for drawing web pages on your screen.

On the other end of the Web is a system known as the web server. This system takes the client's request for a page, retrieves the page from a local disk, and sends it to the client (your web browser). This interaction is shown in Figure 21.1.

Figure 21.1. Web browser fetching a page.


Fetching a Static Web Page

A client requests a web page by examining a Uniform Resource Locator (URL) to determine the protocol, server, and request to make on that server. A typical URL might look like the following:

http://www.google.com:80/more.html

The parts of the URL can be broken down like this:

  • http - This part is the protocol. HTTP, or Hypertext Transfer Protocol, is the protocol used for transferring web pages. You may also have seen File Transfer Protocol (ftp) or secure HTTP (https).

  • www.google.com - This part is the name of the server (also called a host name) that contains the document you want. Sometimes, instead of a host name, you might see an IP address, usually written as four numbers separated by dots: 209.185.108.147. These addresses tend to be less reliable than the names, though, because a site's IP address can change while its name stays the same.

  • :80 - This part is the port number on the server to which your client connects. This portion is usually optional; the protocol used determines what port will be used. http usually means "use port 80."

  • more.html - This is the request being made on the server. Usually it's a document you want to retrieve. Sometimes it's written as a pathname, such as /archives/foo.html, or it has other characters trailing at the end, such as ? and &, but essentially it is what the client needs the server to retrieve.
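The breakdown above can be reproduced with a few lines of code. The following is a minimal sketch in Python (used here only for illustration; the rest of this book works in Perl). The names split_url and DEFAULT_PORTS are hypothetical, and the default ports for https and ftp are not taken from the text above; they are the conventional values.

```python
from urllib.parse import urlsplit

# Ports implied by each protocol when the URL leaves out ":port".
# (80 comes from the text; 443 and 21 are the conventional defaults.)
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def split_url(url):
    """Break a URL into protocol, host, port, and request."""
    parts = urlsplit(url)
    request = parts.path or "/"
    if parts.query:
        # Keep any trailing "?name=value&..." as part of the request.
        request += "?" + parts.query
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "request": request,
    }

print(split_url("http://www.google.com:80/more.html"))
```

Leaving out :80 from the URL should give the same result, because the sketch falls back to the protocol's default port.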

The client then follows these steps for http (see Figure 21.2):

  1. The hostname (www.google.com) is converted to an IP address.

  2. A connection is established with the server at www.google.com using the IP address and the port number.

  3. The server is asked for the page more.html. The client waits for a response.

  4. The server sends the response (in this case, the contents of more.html) and drops the connection to the client.

  5. The client renders the response on the screen.

Figure 21.2. Requesting a page.
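The five steps can also be sketched in code. This is a rough illustration in Python (again, only a stand-in for the Perl used elsewhere in this book), not a complete browser. The function names build_request, fetch, and body_of are hypothetical, and the request text assumes a simple HTTP/1.0-style exchange in which the server closes the connection when it has finished sending.

```python
import socket

def build_request(host, path):
    """Step 3: the text a client sends to ask the server for a page."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def fetch(host, port, path):
    # Step 1: convert the hostname to an IP address.
    ip_address = socket.gethostbyname(host)
    # Step 2: connect to the server using the IP address and port number.
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        # Step 3: ask for the page, then wait for the response.
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # Step 4: the server has dropped the connection.
                break
            chunks.append(data)
    return b"".join(chunks)

def body_of(response):
    """Step 5 (simplified): strip the headers, keeping the content to render."""
    _headers, _, body = response.partition(b"\r\n\r\n")
    return body
```

Calling fetch("www.google.com", 80, "/more.html") would carry out the exchange over a live network, so what comes back depends entirely on the server at the other end.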


The nitty-gritty of the "conversation" between the client and the server is covered in depth in Hour 24, "Manipulating HTTP and Cookies."

Dynamic Web Content: The CGI

During a normal web page fetch, the server simply locates the requested document, retrieves it from its disk storage, and sends it to the client, as illustrated in Figure 21.3.

Figure 21.3. Static web page fetch.


The server in Figure 21.3 doesn't process the data at all; it simply examines the request and passes the requested data back to the client.
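To make the static case concrete, here is a minimal sketch of what a server does with such a request, written in Python purely for illustration. The function name serve_static and the idea of a single "document root" directory are assumptions for this sketch; real web servers do considerably more (headers, error pages, caching).

```python
import os
import tempfile

def serve_static(document_root, request_path):
    """Return the bytes of the requested document, or None if unavailable."""
    root = os.path.realpath(document_root)
    # Locate the document: map the request onto a file under the root.
    target = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # Refuse requests that would escape the document root ("/../secret").
    if os.path.commonpath([root, target]) != root:
        return None
    if not os.path.isfile(target):
        return None
    # Retrieve it from disk and hand it back unchanged: no processing.
    with open(target, "rb") as handle:
        return handle.read()

# Demonstration using a throwaway document root.
with tempfile.TemporaryDirectory() as docroot:
    with open(os.path.join(docroot, "more.html"), "w") as page:
        page.write("<html><body>More</body></html>")
    print(serve_static(docroot, "/more.html"))
    print(serve_static(docroot, "/missing.html"))
```

The bytes returned are exactly the bytes on disk, which is why a static page stays the same until someone edits the file.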

One method to create dynamic content on the Web is through the use of CGI programs. CGI is an agreed-upon method that web servers use to run programs on the server to generate web content. When a URL indicates to a server that a CGI program should be run to generate the content, the server starts the program, the program generates the content, and the server passes the content back to the client, as illustrated in Figure 21.4.

Figure 21.4. CGI script-generated web page.


Each time the client requests a page that's really a CGI program, the following occurs:

  1. The server starts a new instance of the CGI program.

  2. The CGI program generates a page, or another response, using whatever information it needs.

  3. The page is sent back to the client.

  4. The CGI program exits.
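Seen from the server's side, this cycle might look roughly like the following Python sketch. It is an illustration under simplifying assumptions, not how any particular web server is implemented: the names run_cgi and CGI_PROGRAM are hypothetical, the "CGI program" is a one-line stand-in, and the sketch omits the request details that a real server passes along to the program.

```python
import subprocess
import sys

# A stand-in CGI program, handed to a fresh interpreter with -c.
CGI_PROGRAM = 'print("Content-Type: text/plain"); print(); print("Hello from CGI")'

def run_cgi(program_source):
    # 1. Start a new instance of the program for this request.
    finished = subprocess.run(
        [sys.executable, "-c", program_source],
        capture_output=True,
        check=True,
    )
    # 2-4. The program writes its page to standard output and exits;
    #      the server collects that output to send back to the client.
    return finished.stdout

print(run_cgi(CGI_PROGRAM).decode())
```

Because a new process is started for every request, nothing the program holds in memory survives from one request to the next.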

The CGI program can be any kind of program. It can be a Perl script, which is what you'll learn about here. It can also be programmed in C, the Unix shell, Pascal, Lisp, Tcl, or nearly any other programming language. The fact that many CGI programs are written in Perl is a happy coincidence. Perl happens to be very well-suited to writing programs that deal with text, and the output of CGI programs is often text.

The output of CGI programs can be almost anything, however. It can be images, HTML-formatted text, Zip files, streaming video, or any other kind of content you might find on the Web. For the most part, the CGI programs you'll be writing will generate HTML-formatted text.
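As a preview of what HTML-generating output looks like, here is a small sketch in Python (an illustration only; the programs in the coming hours are written in Perl). The function name make_page is hypothetical. The sketch assumes the usual CGI convention that the program first prints a Content-Type header saying what kind of content follows, then a blank line, then the content itself.

```python
import sys
from datetime import datetime

def make_page(now):
    """Build the complete output: a header, a blank line, then the HTML."""
    body = (
        "<html><body>"
        f"<p>This page was generated at {now:%H:%M:%S}.</p>"
        "</body></html>"
    )
    # The header names the kind of content; the blank line ends the headers.
    return "Content-Type: text/html\n\n" + body + "\n"

# Each run produces a page with the current time, so the content is dynamic.
sys.stdout.write(make_page(datetime.now()))
```

A program that produced an image instead would announce a different content type and write the image data after the blank line.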

By the Way

CGI is not a language; it has nothing to do specifically with Perl, it has nothing to do with HTML, and it has very little to do with HTTP. It's simply an agreed-upon interface between web servers and programs run on their behalf. This informal interface wasn't codified until October 2004, in RFC 3875. You can read about this at http://www.ietf.org. You'll pick up bits and pieces of these details over the next four hours.

