Static content makes up a huge part of the Web, but a lot of the action going on today is in the world of web applications. By taking advantage of server-side software, many companies and people are providing full-fledged applications over the Web. You can purchase goods from any number of online sites, read your email over the Web, or participate in discussion forums. Sites such as Yahoo! provide full personal information manager functionality over the Web, with email, calendaring, a to-do list, and an address book.
Let's talk about how web applications are created. As you know, when a browser wants a document that resides on a web server, it sends a request to that server using a URL. When a web application is in place, URLs point to programs instead of to static files. The program is passed any parameters (generally supplied by the user), and the output of the program is sent back to the browser.
When it comes to web applications, you can be as ambitious as you want to be. You can write a complex system that provides a web interface for an existing mainframe-based application, build a full-featured online store, or write a bit of code to include the same footer in all of your web pages. The scope of your effort is entirely up to you.
A huge number of languages are available for creating web applications. A few make up the lion's share of the market, but there are plenty of obscure platforms as well if you're interested in something esoteric. Let's look at some of the more popular ones that you may encounter.
If there's a platform for creating web applications that could be referred to as the old standby, it's CGI. CGI, which is short for Common Gateway Interface, is the original web application platform. When the NCSA web server was introduced, it included an interface to external programs. When a URL that points to a CGI script is requested, a script completely external to the web server is called and the output is sent back to the browser. The job of a CGI program is to process the data submitted to it (usually through an HTML form), and to generate a response (usually an HTML page).
Back in Lesson 10, "Designing Forms," I mentioned that forms have a couple of different submit methods, GET and POST, and explained how to choose which method to use. These methods are important when it comes to CGI programs (and other web applications), because they dictate how the data submitted by the user is provided to the program. When the GET method is used, the parameters are passed to the CGI program via an environment variable. When the POST method is used, they're passed to the program via standard input. If you don't know what either of those things is, don't worry about it. Just understand that they're handled in different ways, and that these days most web application platforms don't care; they give you access to the values passed in either way.
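If you're curious what that difference looks like in practice, here's a minimal sketch, written in Python rather than the Perl used later in this lesson. REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are the standard CGI environment variables; the function name read_params is hypothetical, and the sketch assumes a simple URL-encoded form containing only ASCII text.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_params(environ=os.environ, stdin=None):
    """Return form parameters as a dict of name -> list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the encoded form data arrives on standard input.
        stream = stdin if stdin is not None else sys.stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stream.read(length)
    else:
        # GET: the encoded form data arrives in an environment variable.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# The same form data, delivered both ways, yields the same parameters.
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Ada"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}
print(read_params(get_env))
print(read_params(post_env, io.StringIO("name=Ada")))
```

Either way the program ends up with the same set of names and values, which is why most platforms can hide the difference from you.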
The nice thing about CGI is that it's simple and universal. Nearly all web servers support CGI, and it's easy to get started writing CGI programs because the interface is so simple. CGI has almost no infrastructure. You just put your program (which is written so that it gets input to the right place and prints out web content) in a directory set up to recognize requests for CGI programs, and the work is basically done. Other than generating web content, there's only one other hard and fast requirement for CGI programs: they have to generate the HTTP header that indicates what type of content the program generates.
Let's look at a simple example of a CGI program:
#!/usr/bin/perl

use CGI qw(:standard);

print header;
print "<html><head><title>Example</title></head><body>\n";
print "<p>Welcome, ", param('name'), ".</p>\n";
print "</body></html>\n";
This program, helpfully called example.pl, is written in Perl. This book isn't even about CGI programming, much less Perl, so I'll just provide a very cursory overview of how it works. When the CGI program is requested, the server loads the Perl interpreter, which in turn compiles and runs the script. The first line of the file indicates that this program is a script, and the location of the interpreter used to run the script. This is standard UNIX notation for script files.
The third line in the script imports the CGI library that is distributed with Perl. At one time it was an add-on for Perl, but because Perl became so popular for writing CGI programs and the CGI library took so much grunt work out of it, it became a standard part of the Perl distribution. Most people use it to extract the parameters submitted by the user and make them easily accessible within the program. (It's thanks to the CGI library that the differences between GET and POST become invisible to the programmer. It checks for both GET and POST data and makes the data available to the programmer either way.) As you'll see on line 5, it's also used to generate the Content-type header that indicates what type of content is generated by the script. The default is text/html. To print the header, I just call the header subroutine, which is imported from the CGI library. If I weren't using the CGI library, I could generate the header by simply printing it out, like this:
print "Content-type: text/html\n\n";
The two line feeds (represented by \n\n) indicate that the script is done printing headers and that the remainder of the output is the body of the response.
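Here's a small sketch, in Python for illustration, of that header-then-blank-line layout. The function names build_response and split_response are hypothetical; the second one shows how a reader of the output could tell the headers from the body by looking for the first blank line.

```python
def build_response(body, content_type="text/html"):
    """Return the full text a CGI program would print."""
    # One header line, then a blank line, then the body.
    return f"Content-type: {content_type}\n\n{body}"


def split_response(raw):
    """Split CGI output into (headers, body) at the first blank line."""
    head, _, body = raw.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


raw = build_response("<p>Welcome.</p>\n")
headers, body = split_response(raw)
print(headers["content-type"])
print(body, end="")
```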
Once the header has been generated, the rest of the script is devoted to printing out the body of the response: the web page that is displayed in the browser. This part of the script consists of three lines of code that print out a very simple HTML page. The only thing out of the ordinary here is the code that inserts the value of the name parameter. By calling param('name'), I retrieve the value of the name parameter that was submitted with the request. If the name parameter was not included in the request, the call returns nothing, so no name is printed. (This script would be accessed by a form that includes an input field named name.)
Obviously this script is a lot simpler than most you'll see, but the idea is fundamentally the same regardless of what you ultimately use your CGI program to accomplish. For a web designer, one of the drawbacks of CGI is that you often wind up writing a program that has a bunch of HTML in it. That makes it difficult to prototype your pages or to create them using your favorite HTML editor. Web application platforms like Active Server Pages and PHP solve this problem by enabling you to embed script code within your HTML documents.
Active Server Pages
In the Microsoft world, Active Server Pages (or ASP) is the web application platform of choice. The approach with Active Server Pages is very different from the one taken with CGI programs. With ASP, you have pages that look a lot like HTML documents, except that there's some code mixed in as well. Unlike CGI, which is really an interface from a web server to external programs, support for Active Server Pages is built into Microsoft's Internet Information Server (IIS) web server. Code is embedded in a page inside blocks called scriptlets, like this one:
<% response.write("This is printed using code.") %>
The scriptlet begins with <% and is closed with %>. Anything not enclosed in scriptlets is treated as standard HTML. In this code example, I use the write method of the response object to print out some text as part of the page. A shorter way to print out a string is to use the expression evaluator:
<%= "This is printed using code." %>
This scriptlet begins with <%=, which indicates that the only thing inside it will be an expression, and the results of the expression will be included as part of the content of the page. That example doesn't make it clear exactly how this works because it's just a static string. Take a look at this one:
<%= 2 + 2 %>
That expression would print out the number four. It could be used in a case like this:
<p>The sum of two and two is <%= 2 + 2 %>.</p>
This is a really, really simple example. ASP is used to create web applications of every level of complexity. At some point, it makes sense to stop throwing more and more code into your ASPs and start using external libraries instead. These libraries are generally created as COM objects; if you're familiar with Windows, they're .dll files. You can write code in Visual Basic or C++, bundle that code up into COM objects, and then use those COM objects from within your pages. For example, you can write a COM object to send email, or one that calculates sales tax on an order.
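To picture what the expression evaluator described earlier is doing, here's a toy sketch in Python. It is not how IIS works internally; it simply finds each <%= ... %> block, evaluates the expression inside it (as a Python expression, which is an assumption made for the sake of the example), and drops the result into the page. The function name render is hypothetical.

```python
import re

# Matches <%= expression %> and captures the expression.
EXPR = re.compile(r"<%=\s*(.*?)\s*%>")


def render(template):
    """Replace each <%= expression %> with the value of the expression."""
    def evaluate(match):
        # Toy evaluation: treat the expression as Python, with no builtins.
        value = eval(match.group(1), {"__builtins__": {}}, {})
        return str(value)
    return EXPR.sub(evaluate, template)


print(render("<p>The sum of two and two is <%= 2 + 2 %>.</p>"))
```

Everything outside the <%= %> markers passes through untouched, just as plain HTML does in a real ASP page.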
Let's look at how includes are used in the world of ASP. This is actually a sneak preview of the section later in the lesson on server-side includes, because ASP reuses the SSI include syntax. The basic format of an include is
<!--#include file="footer.html" -->
As you can see, the include directive looks like an HTML comment. This is convenient in cases where your includes aren't recognized by the web server. Rather than printing out the code as part of the page, they're hidden by the browser because they're treated as comments.
The earlier include directive attempts to find a file called footer.html, read it, and include its contents in the page in place of the directive. If you use the file attribute in the directive, the file being included must be in the current directory or a directory below it. The path cannot include .. or begin with /. For example, ../includes/footer.html would not work, but includes/footer.html would.
If you need to use .. or an absolute path, use the virtual attribute instead of file. Using virtual, you can access anything in the document root. When you use virtual, either of those two paths would be valid. You still can't include files that are outside the document root, but you shouldn't be doing that anyway.
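The path rules for file and virtual can be summed up in a short sketch. This is Python written for illustration only, and it models the rules as I've described them here, not the actual logic inside any web server. The function name resolve_include and its parameters are hypothetical; current_dir is the directory of the including page, expressed relative to the document root.

```python
import posixpath


def resolve_include(attr, path, current_dir):
    """Return the document-root-relative path an include refers to.

    Raises ValueError if the path breaks the rules for the attribute.
    """
    base = current_dir.strip("/")
    if attr == "file":
        # file: no absolute paths and no ".." segments.
        if path.startswith("/") or ".." in path.split("/"):
            raise ValueError(f"not allowed with file: {path}")
        joined = posixpath.join(base, path)
    elif attr == "virtual":
        # virtual: absolute paths and ".." are both acceptable.
        if path.startswith("/"):
            joined = path.lstrip("/")
        else:
            joined = posixpath.join(base, path)
    else:
        raise ValueError(f"unknown attribute: {attr}")
    resolved = posixpath.normpath(joined)
    # Either way, the result has to stay inside the document root.
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"outside the document root: {path}")
    return "/" + resolved


print(resolve_include("file", "includes/footer.html", "/site"))
print(resolve_include("virtual", "../includes/footer.html", "/site"))
print(resolve_include("virtual", "/includes/footer.html", "/site"))
```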
When you use includes with ASP, if you include a file with the .asp extension, IIS will execute any scriptlets found in the file when it is included. This provides you with a capability above and beyond just including common content in your pages. If you have ASP code that you want to run on more than one page, you can put it in an included file. This enables you to create libraries of common code. This can be a huge timesaver.
In the Java world, JSP is the Web equivalent to ASP. It's short for Java Server Pages. J2EE is short for Java 2 Enterprise Edition, and it includes web technologies such as Java Server Pages and servlets, along with other server-side technologies that are beyond the scope of this book. You might be familiar with Java applets. Applets are Java programs that are downloaded along with a web page and generally displayed inline on a page. There are Java applet games, Java applet news tickers, and Java applet banner ads. In fact, when Java was originally created, it was widely thought of purely in terms of applets.
Server-side web programming in Java began with Java servlets, which were in some ways the server-side analogue of Java applets. Servlets are also similar to CGI programs. The main difference between a servlet and a CGI program is that servlets have to run in the context of an application that's known as a servlet container. In this sense, they're more like ASP programs. Just as IIS understands how to interpret and execute ASPs, the servlet container understands how to pass requests on to servlets and send the output back to the user. In some cases, the web server serves as the servlet container; in others, the servlet container is a separate application that is connected to the web server.
In any case, you can write servlets and deploy them on your servlet container. Once you've written a servlet and mapped it to a particular URL, it can respond to requests the same way a CGI program can. What does this have to do with Java Server Pages? Java Server Pages (or JSPs) are just a simpler way to create servlets. A JSP looks a lot like an ASP: it's an HTML page that optionally contains scriptlets and directives. In the JSP world, the scriptlets are written using Java. The trick here is that when a servlet container serves up a JSP, it converts it into a servlet, compiles that servlet into a Java class file, and then maps it to the path where the JSP is located. So, a JSP at /index.jsp is turned into a servlet that is called whenever that path is requested.
The syntax for JSPs is virtually identical to that of ASP files. Scriptlets are defined in exactly the same way. In Java, it looks like this:
<% String aString = "This is a string."; out.print(aString); %>
The expression evaluation feature in JSP is also exactly the same as it is in ASP. To print out the value of aString without bothering with out.print(), use the following:
<%= aString %>
There are some other constructs associated with JSP as well. These directives follow this pattern:
<%@page language="Java" %>
The fact that it starts with <%@ indicates that it's a directive. page is the name of the directive and language is an attribute. This directive indicates that the language used in the scriptlets on the page is Java. In truth, this is the only valid option; JSPs must use Java as their programming language. Perhaps the most common attribute of the page directive that you'll see is the import attribute, which is used to indicate that a particular class is used on your page. If you're not a Java programmer, imports might be a bit confusing for you. Just remember that you'll see a lot of them if you work on complex JSPs.
There's also an alternative form of directives for JSPs that use an XML-based notation. To see how they differ from normal directives, let's look at how you include files in the JSP world. The first method uses a normal-looking include:
<%@include file="footer.jsp" %>
This directive includes a file called footer.jsp from the current directory in place of the directive. When you use the include directive, the included file is inserted at compile time. What this means is that the file is included before the JSP is converted to a servlet. For programmers, it means that code in the included file can interact with code in the files that include it. For example, you could set the copyright date in a variable in the including file, and reference that variable to print out the variable in the included file.
You can also include files using JSP's XML-style directives. To include footer.jsp using the XML directive, the following code is used:
<jsp:include page="footer.jsp" />
When you use this type of include, it's treated as a runtime include. This differs from the include directive in that runtime includes are only processed after the page has been converted into a servlet and run. The included file is read in and processed separately at that point, which means that variables can't be shared between the included file and the including file.
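The difference in variable sharing is easier to see in a toy model. The following Python sketch is only an analogy for the two kinds of JSP include, not a description of what a servlet container really does: a compile-time include pastes the included source into the page before anything runs, and a runtime include runs the included file on its own. The names run_page, PAGE, and FOOTER are hypothetical, and the year is just a sample value.

```python
def run_page(source):
    """Run page code in a fresh namespace and return that namespace."""
    namespace = {"out": []}
    exec(source, namespace)
    return namespace


PAGE = "year = 2024\n"
FOOTER = 'out.append("Copyright " + str(year))\n'

# Compile-time include: the footer source is merged into the page
# before it runs, so the footer can see the page's variables.
merged = run_page(PAGE + FOOTER)
print(merged["out"])

# Runtime include: the page and the footer run separately, so the
# footer has no access to the variable set by the page.
run_page(PAGE)
try:
    run_page(FOOTER)
except NameError as err:
    print("footer could not see the variable:", err)
```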
The last common constructs you'll hear about in the JSP world are taglibs. To make things easier for people who aren't Java programmers, the developers of the J2EE specification created a way to provide custom tags (called taglibs, short for tag libraries) that you can use as part of your pages. Not only can programmers create their own custom tags, but there are a number of projects working to create standard custom tags that encapsulate common functionality needed by many web applications. The taglib directive is used to make a tag library available for a JSP:
<%@ taglib uri="/WEB-INF/app.tld" prefix="app" %>
The uri attribute provides the URL for the descriptor file for the tag library. The prefix attribute indicates how tags associated with the tag library are identified. For example, if there's a tag library tag called blockquote, it is differentiated from the standard <blockquote> tag by using the prefix, like this: <app:blockquote>.
Many programmers write their own tag libraries that provide functionality specific to their applications. However, Sun has also defined a standard set of tag libraries to provide functionality common to many applications. This group of libraries is called JSTL (the JavaServer Pages Standard Tag Library). You can read more about JSTL at http://java.sun.com/products/jsp/jstl/.
You can find an actual implementation of JSTL at http://jakarta.apache.org/taglibs/doc/standard-doc/intro.html.
The JSTL tags provide functionality for things such as loops, conditional operations, and processing XML.
There's a lot more to building web applications using Java and J2EE than I've discussed here. I've just provided an overview for you in case you need to apply your HTML skills to a web application written in Java. Hopefully, when you run into one of these applications, you'll have seen enough here not to be confused by the JSP syntax.
PHP is yet another language that enables you to embed scripts in your web pages. ASP is really part of Microsoft's overall software development platform, and similarly, J2EE is part of the Java universe. PHP, on the other hand, is completely independent. Rather than building on a general-purpose language, PHP is a programming language unto itself. The language uses a C-like syntax that also has some things in common with Perl. Like ASP and JSP, it can be interspersed with your HTML. Usually, you'll find that PHP files have the extension .php, but the web server can be configured to treat any files as PHP files. You can even set things up so that files with the extension .html are treated as PHP files.
There are two ways to include script code in your pages:
<?php echo("Hello."); ?>
There's also a more concise notation for adding scripts to your page:
<? echo("Hello."); ?>
This was the traditional notation for adding PHP code to web pages, but it conflicts with XML, so <?php ?> was added to differentiate between the two. If you're starting out, you should stick with the <?php ?> notation; it costs only a few extra characters and could save you trouble later.
One of the nicest things about PHP is that it's completely free, and can be easily installed to work with Apache, the most popular web server. For this reason, many, many web hosting providers include support for PHP with their hosting packages. It's fairly simple to install, and is neither large nor unwieldy, so you can run it yourself with little trouble. PHP is also easier to learn than some of the other systems because it's not just an extension of a larger programming environment. You can find out more about PHP at http://www.php.net/.
As I've done with the other technologies, let me explain how to include external files in your pages. In the PHP world, there are four functions that can be used to include external files in PHP documents. All of them behave like compile-time includes in the sense described earlier: code in the included file can share variables with the page that includes it. The functions are include(), require(), include_once(), and require_once().
Both include() and require() accept the path to a file as an argument. The difference is that if you use include() and the file cannot be read for some reason, a warning is printed but the page continues to be processed. If you use require(), a fatal error occurs if the included file cannot be read. include_once() and require_once() work the same way as include() and require(), except that if the file to be included has already been included earlier on the page, the include will be ignored. This may seem strange if you're thinking about including content, but it's helpful if you're including code. Let's say you have a file that sets up a bunch of variables used later on your page. It probably makes sense to use require_once() to make sure that those variables aren't set more than once.
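If it helps to see the four behaviors side by side, here's a toy model of them in Python. It is not PHP and it doesn't read real files; the Includer class and its in-memory dictionary of files are hypothetical stand-ins, and only the method names mirror the PHP functions.

```python
import warnings


class Includer:
    """Toy model of the four PHP include functions described above."""

    def __init__(self, files):
        self.files = files   # hypothetical in-memory stand-in for files on disk
        self.seen = set()    # paths already included on this page
        self.output = []     # contents included so far, in order

    def _load(self, path, required, once):
        if once and path in self.seen:
            return False     # *_once: a repeat include is ignored
        if path not in self.files:
            if required:
                # require(): a missing file stops the page.
                raise FileNotFoundError(path)
            # include(): a missing file only produces a warning.
            warnings.warn(f"could not include {path}")
            return False
        self.seen.add(path)
        self.output.append(self.files[path])
        return True

    def include(self, path):
        return self._load(path, required=False, once=False)

    def require(self, path):
        return self._load(path, required=True, once=False)

    def include_once(self, path):
        return self._load(path, required=False, once=True)

    def require_once(self, path):
        return self._load(path, required=True, once=True)


page = Includer({"setup.php": "variables set"})
page.require_once("setup.php")
page.require_once("setup.php")   # ignored: already included
print(page.output)
```

The second require_once() call does nothing, which is exactly the behavior you want for a file that sets up variables.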