15.3. WAR Files and Deployment
As we described in the introduction to this chapter, a WAR file is an archive that contains all the parts of a web application: Java class files for servlets, JSPs, HTML pages, images, and other resources. The WAR file is simply a JAR file with specified directories for the Java code and one very important file: the web.xml file, which tells the application server what to run and how to run it. WAR files always have the extension .war, but they can be created and read with the standard jar tool.
The contents of a typical WAR might look like this, as revealed by the jar tool:
    $ jar tvf shoppingcart.war
        index.html
        purchase.html
        receipt.html
        images/happybunny.gif
        WEB-INF/web.xml
        WEB-INF/classes/com/mycompany/PurchaseServlet.class
        WEB-INF/classes/com/mycompany/ReturnServlet.class
        WEB-INF/lib/thirdparty.jar
When deployed, the name of the WAR becomes, by default, the root path of the web application, in this case, shoppingcart. Thus, the base URL for this web app, if deployed on http://www.oreilly.com, is http://www.oreilly.com/shoppingcart/, and all references to its documents, images, and servlets start with that path. The top level of the WAR file becomes the document root (base directory) for serving files. Our index.html file appears at the base URL we just mentioned, and our happybunny.gif image is referenced as http://www.oreilly.com/shoppingcart/images/happybunny.gif.
The WEB-INF directory (all caps, hyphenated) is a special directory that contains all deployment information and application code. This directory is protected by the web server, and its contents are not visible to outside users of the application, even if you add WEB-INF to the base URL. Your application classes can load additional files from this area using getResource( ) on the servlet context, however, so it is a safe place to store application resources. The WEB-INF directory contains the all-important web.xml file, which we'll talk more about in the next section.
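To make the path rules concrete, here is a minimal sketch (plain Java, no servlet container involved) of how a WAR name and an entry path combine into a public URL, and of how entries under WEB-INF are withheld. The WarPaths class and its method names are our own illustration, not part of any API, and the exact-case check on WEB-INF is a simplifying assumption.

```java
/** Sketch: how a WAR file name and an entry path combine into a public URL. */
final class WarPaths {

    /** Default context path: the WAR file name minus ".war", prefixed with "/". */
    static String contextPath(String warFileName) {
        String name = warFileName.endsWith(".war")
                ? warFileName.substring(0, warFileName.length() - 4)
                : warFileName;
        return "/" + name;
    }

    /** True for WEB-INF and anything beneath it (assumption: exact-case comparison). */
    static boolean isProtected(String entryPath) {
        return entryPath.equals("WEB-INF") || entryPath.startsWith("WEB-INF/");
    }

    /** Returns the public URL for an entry, or null if the entry is never served. */
    static String publicUrl(String serverBase, String warFileName, String entryPath) {
        if (isProtected(entryPath)) {
            return null;
        }
        return serverBase + contextPath(warFileName) + "/" + entryPath;
    }

    public static void main(String[] args) {
        String base = "http://www.oreilly.com";
        System.out.println(publicUrl(base, "shoppingcart.war", "images/happybunny.gif"));
        System.out.println(publicUrl(base, "shoppingcart.war", "WEB-INF/web.xml"));
    }
}
```

A real container applies these rules internally; the sketch only mirrors the behavior described above.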
The WEB-INF/classes and WEB-INF/lib directories contain Java class files and JAR libraries, respectively. The WEB-INF/classes directory is automatically added to the classpath of the web application, so any class files placed here (using the normal Java package conventions) are available to the application. After that, any JAR files located in WEB-INF/lib are appended to the web app's classpath (the order in which they are appended is, unfortunately, not specified). You can place your classes in either location. During development, it is often easier to work with the "loose" classes directory and use the lib directory for supporting classes and third-party tools. It's also possible to install JAR files directly in the servlet container to make them available to all web apps running on that server. This is often done for common libraries that will be used by many web apps (such as the JWSDP libraries we'll use later when building web services). The location for placing the libraries, however, is not standard, and any classes that are deployed in this way cannot be automatically reloaded if changed (a feature of WAR files that we'll discuss later). Servlet API 2.4 requires that each server provide a directory for these extension JARs and that the classes there will be loaded by a single classloader and made visible to the web application.
15.3.1. The web.xml File
The web.xml file is an XML file that lists the servlets to be run, the relative names (URL paths) under which to run them, their initialization parameters, and their deployment details, including security and authorization. We will assume that you have at least a passing familiarity with XML or that you can imitate the examples in a cut-and-paste fashion. (For details about working with Java and XML, see Chapter 24.) Let's start with a simple web.xml file for our HelloClient servlet example. It looks like this:
    <web-app>
        <servlet>
            <servlet-name>helloclient1</servlet-name>
            <servlet-class>HelloClient</servlet-class>
        </servlet>
        <servlet-mapping>
            <servlet-name>helloclient1</servlet-name>
            <url-pattern>/hello</url-pattern>
        </servlet-mapping>
    </web-app>
The top-level element of the document is called <web-app>. Many types of entries may appear inside the <web-app>, but the most basic are <servlet> declarations and <servlet-mapping> deployment mappings. The <servlet> declaration tag is used to declare an instance of a servlet and, optionally, to give it initialization and other parameters. One instance of the servlet class is instantiated for each <servlet> tag appearing in the web.xml file.
At minimum, the <servlet> declaration requires two pieces of information: a <servlet-name>, which serves as a handle to reference the servlet elsewhere in the web.xml file, and the <servlet-class> tag, which specifies the Java class name of the servlet. Here, we named the servlet helloclient1. We named it like this to emphasize that we could declare other instances of the same servlet if we wanted to, possibly giving them different initialization parameters, etc. The class name for our servlet is, of course, HelloClient. In a real application, the servlet class would likely have a full package name, such as com.oreilly.servlets.HelloClient.
A servlet declaration may also include one or more initialization parameters, which are made available to the servlet through the ServletConfig object's getInitParameter( ) method:
    <servlet>
        <servlet-name>helloclient1</servlet-name>
        <servlet-class>HelloClient</servlet-class>
        <init-param>
            <param-name>foo</param-name>
            <param-value>bar</param-value>
        </init-param>
    </servlet>
Next, we have our <servlet-mapping>, which associates the servlet instance with a path on the web server:
    <servlet-mapping>
        <servlet-name>helloclient1</servlet-name>
        <url-pattern>/hello</url-pattern>
    </servlet-mapping>
In older versions of the spec, servlet mapping entries had to appear in the web.xml file after all the servlet declaration entries. In Servlet API 2.4, the order of elements under <web-app> is not restricted.
Here we mapped our servlet to the path /hello. If we later name our WAR learningjava.war and deploy it on www.oreilly.com, the full path to this servlet would be http://www.oreilly.com/learningjava/hello. Just as we could declare more than one servlet instance with the <servlet> tag, we could declare more than one <servlet-mapping> for a given servlet instance. We could, for example, redundantly map the same helloclient1 instance to the paths /hello and /hola. The <url-pattern> tag provides some very flexible ways to specify the URLs that should match a servlet. We'll talk about this in detail in the next section.
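To see how the <servlet-name> handle ties declarations to mappings, the following sketch reads a web.xml document with the JDK's built-in XML parser and joins each <url-pattern> to its servlet class. This is only an illustration of the file's structure, not how any particular container reads it; the WebXmlReader class is hypothetical, and the sample document adds the redundant /hola mapping mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: join <servlet> declarations to <servlet-mapping> entries by name. */
final class WebXmlReader {

    // Sample document: one servlet instance mapped to two paths.
    static final String SAMPLE =
            "<web-app>"
          + "<servlet><servlet-name>helloclient1</servlet-name>"
          + "<servlet-class>HelloClient</servlet-class></servlet>"
          + "<servlet-mapping><servlet-name>helloclient1</servlet-name>"
          + "<url-pattern>/hello</url-pattern></servlet-mapping>"
          + "<servlet-mapping><servlet-name>helloclient1</servlet-name>"
          + "<url-pattern>/hola</url-pattern></servlet-mapping>"
          + "</web-app>";

    /** Returns url-pattern -> servlet class, in document order. */
    static Map<String, String> patternToClass(String webXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(webXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse web.xml", e);
        }

        // First pass: servlet-name -> servlet-class from each <servlet> declaration.
        Map<String, String> nameToClass = new LinkedHashMap<>();
        NodeList servlets = doc.getElementsByTagName("servlet");
        for (int i = 0; i < servlets.getLength(); i++) {
            Element servlet = (Element) servlets.item(i);
            nameToClass.put(text(servlet, "servlet-name"), text(servlet, "servlet-class"));
        }

        // Second pass: url-pattern -> class, joined through the servlet-name handle.
        Map<String, String> result = new LinkedHashMap<>();
        NodeList mappings = doc.getElementsByTagName("servlet-mapping");
        for (int i = 0; i < mappings.getLength(); i++) {
            Element mapping = (Element) mappings.item(i);
            result.put(text(mapping, "url-pattern"),
                       nameToClass.get(text(mapping, "servlet-name")));
        }
        return result;
    }

    /** Trimmed text of the first child element with the given tag name. */
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println(patternToClass(SAMPLE));
    }
}
```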
Finally, we should mention that although the web.xml example listed earlier will work on some application servers, it is technically incomplete because it is missing formal information that specifies the version of XML it is using and the version of the web.xml file standard with which it complies. To make it fully compliant with the standards, add a line such as:
<?xml version="1.0" encoding="ISO-8859-1"?>
Following that is the matter of identifying the version of the web.xml content itself. In Servlet API 2.3 and earlier, the web.xml version information would take the form of a header like the following, which specifies an XML Document Type Definition (DTD):
    <!DOCTYPE web-app
        PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
        "http://java.sun.com/dtd/web-app_2_3.dtd">
As of Servlet API 2.4, the web.xml version information has changed to take advantage of XML Schemas. (We'll talk about XML DTDs and XML Schemas in Chapter 24.) As of Version 2.4, the additional information is inserted into the <web-app> element itself:
    <web-app xmlns="http://java.sun.com/xml/ns/j2ee"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee
            http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
        version="2.4">
You can choose a format and copy and paste the appropriate lines in each web.xml file we use in this book. If you leave them out, the application may still run, but it will be harder for the servlet container to detect errors in your configuration and give you clear error messages.
15.3.2. URL Pattern Mappings
The <url-pattern> specified in the previous example was a simple string, /hello. For this pattern, only an exact match of the base URL followed by /hello would invoke our servlet. The <url-pattern> tag is capable of more powerful patterns, however, including wildcards. For example, specifying a <url-pattern> of /hello/* allows our servlet to be invoked by URLs such as www.oreilly.com/learningjava/hello/world or .../hello/baby. (The wildcard must follow a slash; the servlet specification treats a pattern such as /hello* as a literal string to be matched exactly.) You can also specify wildcards with extensions, e.g., *.html or *.foo, meaning that the servlet is invoked for any path that ends with those characters.
Using wildcards can result in more than one match. Consider the patterns /scooby/* and /scooby/doo/*. Which should be matched for a URL ending in .../scooby/doo/biedoo? What if we have a third possible match because of a wildcard suffix extension mapping? The rules for resolving these are as follows.
First, any exact match is taken. For example, /hello matches the /hello URL pattern in our example regardless of any additional /hello/*. Failing that, the container looks for the longest prefix match. So /scooby/doo/biedoo matches the second pattern, /scooby/doo/*, because it is longer and presumably more specific. Failing any matches there, the container looks at wildcard suffix mappings. A request ending in .foo matches a *.foo mapping at this point in the process. Finally, failing any matches there, the container looks for a default mapping, written as a single slash (/). A servlet mapped to / picks up anything unmatched by this point. (A mapping of /* also matches every request, but because it is a prefix match, it takes precedence over suffix mappings.) If there is no default servlet mapping, the request fails with a "404 not found" message.
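The resolution order can be sketched in a few lines of plain Java. This is our own simplified model, not container code: it assumes the pattern forms defined by the servlet specification (an exact string, a path prefix written as /prefix/*, an extension written as *.ext, and a lone / for the default servlet), and the UrlPatternResolver name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: pick the winning url-pattern for a request path inside a web app. */
final class UrlPatternResolver {

    /** Returns the matching pattern, or null if nothing matches (a 404). */
    static String resolve(List<String> patterns, String path) {
        // 1. An exact match wins outright.
        if (patterns.contains(path)) {
            return path;
        }

        // 2. Otherwise the longest path-prefix pattern ("/prefix/*") wins.
        String best = null;
        for (String pattern : patterns) {
            if (pattern.endsWith("/*")) {
                String prefix = pattern.substring(0, pattern.length() - 2); // drop "/*"
                boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
                if (matches && (best == null || pattern.length() > best.length())) {
                    best = pattern;
                }
            }
        }
        if (best != null) {
            return best;
        }

        // 3. Next, an extension mapping ("*.ext") on the last path segment.
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot > slash) {
            String extensionPattern = "*" + path.substring(dot);
            if (patterns.contains(extensionPattern)) {
                return extensionPattern;
            }
        }

        // 4. Finally, the default servlet mapping "/", if one is declared.
        return patterns.contains("/") ? "/" : null;
    }

    public static void main(String[] args) {
        List<String> patterns =
                Arrays.asList("/hello", "/scooby/*", "/scooby/doo/*", "*.foo", "/");
        for (String path : Arrays.asList("/hello", "/scooby/doo/biedoo", "/a/b.foo", "/other")) {
            System.out.println(path + " -> " + resolve(patterns, path));
        }
    }
}
```

Comparing pattern lengths is enough to find the longest prefix here because every candidate at that step already matches the same path.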
15.3.3. Deploying HelloClient
Once you've deployed the HelloClient servlet, it should be easy to add examples to the WAR as you work with them in this chapter. In this section, we'll show you how to build a WAR by hand. In "Building WAR Files with Ant" later in this chapter, we'll show a more realistic way to manage your applications using the wonderful build tool, Ant. You can also grab the full set of examples, along with their source code, in the learningjava.war file on the CD-ROM that comes with this book (or from this book's web site at http://www.oreilly.com/catalog/learnjava3/).
To create the WAR by hand, we first create the WEB-INF and WEB-INF/classes directories. Place web.xml into WEB-INF and HelloClient.class into WEB-INF/classes. Use the jar command to create learningjava.war:
$ jar cvf learningjava.war WEB-INF
This command produces the file learningjava.war. To include documents at the top level of the WAR, add their names after the WEB-INF directory on the command line. You can verify the contents using the jar command:
    $ jar tvf learningjava.war
        document1.html
        WEB-INF/web.xml
        WEB-INF/classes/HelloClient.class
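Because a WAR is just a JAR with a particular layout, the same archive can be produced from Java with the standard java.util.jar classes. The sketch below writes a tiny WAR to a temporary file and reads the entry names back, much as jar tvf does. The WarBuilder class is our own, the file contents are placeholders (the class entry is empty, so this archive is not deployable), and a reasonably recent JDK is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

/** Sketch: write a WAR-shaped archive and list its entries. */
final class WarBuilder {

    /** Adds one named entry with the given bytes to the archive. */
    static void addEntry(JarOutputStream out, String name, byte[] content) throws IOException {
        out.putNextEntry(new JarEntry(name));
        out.write(content);
        out.closeEntry();
    }

    /** Builds a small WAR in a temp file and returns the entry names read back from it. */
    static List<String> buildAndList() {
        try {
            Path war = Files.createTempFile("learningjava", ".war");
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(war))) {
                addEntry(out, "document1.html",
                        "<html></html>".getBytes(StandardCharsets.UTF_8));
                addEntry(out, "WEB-INF/web.xml",
                        "<web-app/>".getBytes(StandardCharsets.UTF_8));
                // Placeholder only: a real WAR needs the compiled class bytes here.
                addEntry(out, "WEB-INF/classes/HelloClient.class", new byte[0]);
            }

            List<String> names = new ArrayList<>();
            try (JarFile jar = new JarFile(war.toFile())) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
            Files.delete(war);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : buildAndList()) {
            System.out.println(name);
        }
    }
}
```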
Now all that is necessary is to drop the WAR into the correct location for your server. We assume you have downloaded and installed Tomcat. The location for WAR files is the path for your Tomcat installation, followed by /webapps. Place your WAR here, and start the server. If Tomcat is configured with the default port number, you should be able to point to the HelloClient servlet with one of the following URLs: http://localhost:8080/learningjava/hello or http://<yourserver>:8080/learningjava/hello, where <yourserver> is the name or IP address of your server.
15.3.3.1. Reloading web apps
All servlet containers are supposed to provide a facility for reloading WAR files; many support reloading of individual servlet classes after they have been modified. Reloading WARs is part of the servlet specification and is especially useful during development. Support for reloading web apps varies from server to server. Normally, all that you have to do is drop a new WAR in place of the old one in the proper location (e.g., the webapps directory for Tomcat) and the container shuts down the old application and deploys the new version. This works in Tomcat 5.x when the "autoDeploy" attribute is set (it is on by default) and also in BEA's WebLogic application server, when it is configured in development mode.
Some servers, including Tomcat, "explode" WARs by unpacking them into a directory under the webapps directory, or they allow you explicitly to configure a root directory (or "context") for your unpacked web app through their own configuration files. In this mode, they may allow you to replace individual files, which can be especially useful for tweaking HTML or JSPs. Tomcat 5.x automatically reloads WAR files when they change (unless configured not to), so all you have to do is drop an updated WAR over the old one and it will redeploy it as necessary. Tomcat 4.x does not automatically reload WARs but can be made to reload classes and documents from an exploded web app via the Tomcat "manager." For example, you can prompt the server to reload the web app using a URL with the format http://<yourserver>:8080/manager/reload?path=/learningjava. However, even with that mechanism, it will not reload the web.xml file (you'll have to restart the server).
Tomcat 5.x also provides a client-side "deployer" package that integrates with Ant to automate building, deploying and redeploying applications. We'll discuss Ant later in this chapter.
15.3.4. Error and Index Pages
One of the finer points of writing a professional-looking web application is taking care to handle errors well. Nothing annoys a user more than getting a funny-looking page with some technical mumbo-jumbo error information on it when they expected the receipt for their Christmas present. Through the web.xml file, it is possible to specify documents or servlets to handle error pages shown for various conditions, as well as the special case of welcome files (index files) that are invoked for paths corresponding to directories. Let's start with error handling.
You can designate a page or servlet that can handle various HTTP error status codes, such as "404 Not Found" and "403 Forbidden," using one or more <error-page> declarations:
    <web-app>
        ...
        <error-page>
            <error-code>404</error-code>
            <location>/notfound.html</location>
        </error-page>
        <error-page>
            <error-code>403</error-code>
            <location>/secret.html</location>
        </error-page>
Additionally, you can designate error pages based on Java exception types that may be thrown from the servlet. For example:
    <error-page>
        <exception-type>java.lang.IOException</exception-type>
        <location>/ioexception.html</location>
    </error-page>
This declaration catches any IOExceptions generated from servlets in the web app and displays the ioexception.html page. If no matching exceptions are found in the <error-page> declarations, and the exception is of type ServletException (or a subclass), the container makes a second try to find the correct handler. It looks for a wrapped exception (the "cause" exception) contained in the ServletException and attempts to match it to an error page declaration.
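The two-step lookup can be modeled without a container. In this sketch, WrapperException is a hypothetical stand-in for ServletException (so that the example needs only the JDK), and we assume that a declared exception type also covers its subclasses. The ErrorPageResolver class is our own illustration rather than a description of any server's implementation.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: match a thrown exception to an <error-page> location. */
final class ErrorPageResolver {

    /** Hypothetical stand-in for ServletException: wraps a "cause" exception. */
    static final class WrapperException extends Exception {
        WrapperException(Throwable cause) {
            super(cause);
        }
    }

    private final Map<Class<? extends Throwable>, String> pages = new LinkedHashMap<>();

    /** Records one <exception-type> / <location> pair. */
    void declare(Class<? extends Throwable> type, String location) {
        pages.put(type, location);
    }

    /** Walks up the class hierarchy of t until a declared type is found. */
    private String lookup(Throwable t) {
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            String location = pages.get(c);
            if (location != null) {
                return location;
            }
        }
        return null;
    }

    /** First try the exception itself; for a wrapper, retry with its cause. */
    String resolve(Throwable thrown) {
        String location = lookup(thrown);
        if (location == null && thrown instanceof WrapperException
                && thrown.getCause() != null) {
            location = lookup(thrown.getCause());
        }
        return location;
    }

    public static void main(String[] args) {
        ErrorPageResolver resolver = new ErrorPageResolver();
        resolver.declare(IOException.class, "/ioexception.html");
        System.out.println(resolver.resolve(new FileNotFoundException("missing")));
        System.out.println(resolver.resolve(new WrapperException(new IOException("wrapped"))));
        System.out.println(resolver.resolve(new IllegalStateException("unmatched")));
    }
}
```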
As we've mentioned, you can use a servlet to handle your error pages, just as you can use a static document. In fact, the container supplies several helpful pieces of information to an error-handling servlet, which the servlet can use in generating a response. The information is made available in the form of servlet request attributes through the method getAttribute( ):
Object requestAttribute = servletRequest.getAttribute("name");
Attributes are like servlet parameters, except that they can be arbitrary objects. We have seen attributes of the ServletContext in "The ServletContext API" section. In this case, we are talking about attributes of the request. When a servlet (or JSP or filter) is invoked to handle an error condition, the following string attributes are set in the request:
    javax.servlet.error.servlet_name
    javax.servlet.error.request_uri
    javax.servlet.error.message
Depending on whether the <error-page> declaration was based on an <error-code> or <exception-type> condition, the request also contains one of the following two attributes:
    // status code Integer or Exception object
    javax.servlet.error.status_code
    javax.servlet.error.exception
In the case of a status code, the attribute is an Integer representing the code. In the case of the exception type, the object is the actual instigating exception.
Indexes for directory paths can be designated in a similar way. Normally, when a user specifies a directory URL path, the web server searches for a default file in that directory to be displayed. The most common example of this is the ubiquitous index.html file. You can designate your own ordered list of files to look for by adding a <welcome-file-list> entry to your web.xml file. For example:
    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
        <welcome-file>index.htm</welcome-file>
    </welcome-file-list>
<welcome-file-list> specifies that when a partial request (directory path) is received, the server should search first for a file named index.html and, if that is not found, a file called index.htm. If none of the specified welcome files is found, it is left up to the server to decide what kind of page to display. Servers are generally configured to display a directory-like listing or to produce an error message.
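The search order amounts to "first listed file that exists wins." A minimal sketch of that rule follows, with the directory contents modeled as a simple set of names. The WelcomeFiles class is hypothetical, and a null result stands for the case where the server must decide what to display.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: choose a welcome file for a directory request. */
final class WelcomeFiles {

    /** Returns the first welcome file present in the directory, or null if none is. */
    static String pick(List<String> welcomeFiles, Set<String> filesInDirectory) {
        for (String candidate : welcomeFiles) {
            if (filesInDirectory.contains(candidate)) {
                return candidate;
            }
        }
        return null; // left to the server: a directory listing or an error page
    }

    public static void main(String[] args) {
        List<String> welcome = Arrays.asList("index.html", "index.htm");
        Set<String> directory = new HashSet<>(Arrays.asList("index.htm", "logo.gif"));
        System.out.println(pick(welcome, directory));
    }
}
```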
15.3.5. Security and Authentication
One of the most powerful features of web app deployment with the Servlet API is the ability to define declarative security constraints, meaning that you can spell out in the web.xml file exactly which areas of your web app (URL paths to documents, directories, servlets, etc.) are login-protected, the types of users allowed access to them, and the class of security protocol required for communications. It is not necessary to write code in your servlets to implement these basic security procedures.
There are two types of entries in the web.xml file that control security and authentication. First are the <security-constraint> entries, which provide authorization based on user roles and secure transport of data, if desired. Second is the <login-config> entry, which determines the kind of authentication used for the web application.
15.3.6. Assigning Roles to Users
Let's take a look at a simple example. The following web.xml excerpt defines an area called "Secret documents" with a URL pattern of /secret/* and designates that only users with the role "secretagent" may access them. It specifies the simplest form of login process: the BASIC authentication model, which causes the browser to prompt the user with a simple pop-up username and password dialog box:
    <web-app>
        ...
        <security-constraint>
            <web-resource-collection>
                <web-resource-name>Secret documents</web-resource-name>
                <url-pattern>/secret/*</url-pattern>
            </web-resource-collection>
            <auth-constraint>
                <role-name>secretagent</role-name>
            </auth-constraint>
        </security-constraint>
        <login-config>
            <auth-method>BASIC</auth-method>
        </login-config>
In Servlet API 2.3 and prior, the security constraint entry comes after all servlet and filter-related entries in the web.xml file. Each <security-constraint> block has one <web-resource-collection> section that designates a named list of URL patterns for areas of the web app, followed by an <auth-constraint> section listing user roles that are allowed to access those areas. You can add the example setup to the web.xml file for the learningjava.war file and prepare to try it out. However, there is one additional step you'll have to take to get this working: create the user role "secretagent" and an actual user with this role in your application server environment.
Access to protected areas is granted to user roles, not individual users. A user role is effectively just a group of users; instead of granting access to individual users by name, access is granted to roles, and users are assigned one or more roles. A user role is an abstraction from users. Actual user information (name and password, etc.) is handled outside the scope of the web app, in the application server environment (possibly integrated with the host platform operating system). Generally, application servers have their own tools for creating users and assigning individuals (or actual groups of users) their roles. A given username may have many roles associated with it.
When attempting to access a login-protected area, the user's valid login will be assessed to see if she has the correct role for access. For the Tomcat server, adding users and assigning them roles is easy; simply edit the file conf/tomcat-users.xml. To add a user named "bond" with the "secretagent" role, you'd add an entry such as:
<user name="bond" password="007" roles="secretagent"/>
For other servers, you'll have to refer to the documentation to determine how to add users and assign security roles.
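The relationship between users, roles, and protected areas can be modeled in a few lines. The sketch below is a simplified stand-in for what the container does: users map to sets of roles, each protected /prefix/* area maps to the roles allowed in, and access is granted when the two sets overlap. The RoleCheck class, the user "moneypenny," and the role "staff" are our own inventions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: role-based access to URL areas, as a container might evaluate it. */
final class RoleCheck {

    private final Map<String, Set<String>> rolesByUser = new HashMap<>();
    private final Map<String, Set<String>> rolesByPattern = new LinkedHashMap<>();

    /** Assigns one or more roles to a user. */
    void addUser(String user, String... roles) {
        rolesByUser.put(user, new HashSet<>(Arrays.asList(roles)));
    }

    /** Protects an area; this sketch accepts only "/prefix/*" patterns. */
    void protect(String urlPattern, String... roles) {
        if (!urlPattern.endsWith("/*")) {
            throw new IllegalArgumentException("expected a /prefix/* pattern: " + urlPattern);
        }
        rolesByPattern.put(urlPattern, new HashSet<>(Arrays.asList(roles)));
    }

    /** True if the path is unprotected or the user holds an allowed role. */
    boolean mayAccess(String user, String path) {
        for (Map.Entry<String, Set<String>> constraint : rolesByPattern.entrySet()) {
            String pattern = constraint.getKey();
            String prefix = pattern.substring(0, pattern.length() - 2); // drop "/*"
            if (path.equals(prefix) || path.startsWith(prefix + "/")) {
                Set<String> userRoles = rolesByUser.get(user);
                if (userRoles == null) {
                    userRoles = Collections.<String>emptySet();
                }
                return !Collections.disjoint(userRoles, constraint.getValue());
            }
        }
        return true; // no constraint covers this path
    }

    public static void main(String[] args) {
        RoleCheck check = new RoleCheck();
        check.addUser("bond", "secretagent");
        check.addUser("moneypenny", "staff"); // hypothetical user and role
        check.protect("/secret/*", "secretagent");
        System.out.println(check.mayAccess("bond", "/secret/plans.html"));
        System.out.println(check.mayAccess("moneypenny", "/secret/plans.html"));
    }
}
```

Note that the check never mentions "bond" by name in the constraint; only the role appears there, which is the point of the abstraction.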
15.3.7. Secure Data Transport
Before we move on, there is one more piece of the security constraint to discuss: the transport guarantee. Each <security-constraint> block may end with a <user-data-constraint> entry, which designates one of three levels of transport security for the protocol used to transfer data to and from the protected area over the Internet. For example:
    <security-constraint>
        ...
        <user-data-constraint>
            <transport-guarantee>CONFIDENTIAL</transport-guarantee>
        </user-data-constraint>
    </security-constraint>
The three levels are NONE, INTEGRAL, and CONFIDENTIAL. NONE is equivalent to leaving out the section, indicating no special transport is required. This is the standard for normal web traffic, which is generally sent in plain text over the network. The INTEGRAL level of security specifies that any transport protocol used must guarantee the data sent is not modified in transit. This implies the use of digital signatures or some other method of validating the data at the receiving end but it does not require that the data be encrypted and hidden while it is transported. Finally, CONFIDENTIAL implies both INTEGRAL and encrypted. In practice, the only widely used secure transport used in web browsers is SSL. Requiring a transport guarantee other than NONE typically forces the use of SSL by the client browser.
15.3.8. Authenticating Users
The <login-config> section determines exactly how a user authenticates himself or herself (logs in) to the protected area. The <auth-method> tag allows four types of login authentication to be specified: BASIC, DIGEST, FORM, and CLIENT-CERT. In our example, we showed the BASIC method, which uses the standard web browser login and password dialog. BASIC authentication sends the user's name and password in plain text over the Internet unless a transport guarantee has been used separately to start SSL and encrypt the data stream. DIGEST is a variation on BASIC that obscures the text of the password but adds little real security; it is not widely used. FORM is equivalent to BASIC, but instead of using the browser's dialog, we can use our own HTML form to post the username and password data to the container. The form data can come from a static HTML page or one generated by a servlet. Again, form data is sent in plain text unless otherwise protected by a transport guarantee (SSL). CLIENT-CERT is an interesting option. It specifies that the client must be identified using a client-side public key certificate. This implies the use of a protocol like SSL, which allows for secure exchange and mutual authentication using digital certificates. The exact method of setting up a client-side certificate is browser-dependent.
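To see why BASIC authentication offers no secrecy on its own, consider what the browser actually sends: an Authorization header containing the name and password joined by a colon and Base64-encoded, which anyone who sees the header can reverse. The sketch below builds and decodes such a header using java.util.Base64, which assumes a newer JDK than the one this book targets; the BasicAuthDemo class is our own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch: the Authorization header used by HTTP BASIC authentication. */
final class BasicAuthDemo {

    /** Builds the header value the browser sends: "Basic " + base64(user:password). */
    static String header(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Recovers "user:password" from a header value; no key or secret is needed. */
    static String decode(String headerValue) {
        String encoded = headerValue.substring("Basic ".length());
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = header("bond", "007");
        System.out.println("Authorization: " + value);
        System.out.println("Decoded by an eavesdropper: " + decode(value));
    }
}
```

Base64 is an encoding, not encryption, which is why the text recommends pairing BASIC or FORM login with a transport guarantee.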
The FORM method is most useful because it allows us to customize the look of the login page (we recommend using SSL to secure the data stream). We can also specify an error page to use if the authentication fails. Here is a sample <login-config> using the form method:
    <login-config>
        <auth-method>FORM</auth-method>
        <form-login-config>
            <form-login-page>/login.html</form-login-page>
            <form-error-page>/login_error.html</form-error-page>
        </form-login-config>
    </login-config>
The login page must contain an HTML form with a specially named pair of fields for the name and password. Here is a simple login.html file:
    <html>
    <head><title>Login</title></head>
    <body>
        <form method="POST" action="j_security_check">
            Username: <input type="text" name="j_username"><br>
            Password: <input type="password" name="j_password"><br>
            <input type="submit" value="submit">
        </form>
    </body>
    </html>
The username field is called j_username, the password field is called j_password, and the URL used for the form action attribute is j_security_check. There are no special requirements for the error page, but normally you will want to provide a "try again" message and repeat the login form.
As strange as it may seem, there is currently no API for explicitly logging out a user. However, it is specified that a user's login is no longer valid after the user session times out or is invalidated. Therefore, you can effectively log out the user by calling invalidate( ) on the session:
request.getSession( ).invalidate( ); // log out
This is not quite the same thing as being logged out. What it means is that the user must log in again before performing another operation. Some servlet containers provide their own APIs for explicitly logging out a user. A portable logout API is expected to be included in Servlet API 2.5.
15.3.9. Procedural Security
We should mention that in addition to the declarative security offered by the web.xml file, servlets may perform their own active procedural (or programmatic) security using all the authentication information available to the container. We won't cover this in detail, but here are the basics.
The name of the authenticated user is available through the HttpServletRequest getRemoteUser( ) method, and the type of authentication provided can be determined with the getAuthType( ) method. Servlets can work with security roles using the isUserInRole( ) method. (To do this requires adding some additional mappings in the web.xml file allowing the servlet to refer to the security roles by reference names.)
For advanced applications, a java.security.Principal object for the user can be retrieved with the getUserPrincipal( ) method of the request. In the case where a secure transport like SSL was used, the method isSecure( ) returns true, and detailed information about how the principal was authenticated (the cipher type, key size, and certificate chain) is made available through request attributes. It is useful to note that the notion of being "logged in" to a web application, from the servlet container's point of view, is defined as there being a valid (non-null) value returned by the getUserPrincipal( ) method. This is what we meant when we said that there is no true API for logging out the user at this time, other than invalidating the session.