The form Tag | HTML & XHTML: The Complete Reference (Osborne Complete Reference Series)


The <form> Tag

A form in HTML is contained within a form element. The form itself contains regular text, other HTML/XHTML elements such as tables, and form elements such as check boxes, pull-down menus, and text fields. The W3C specification calls these form elements controls. This is somewhat confusing because Microsoft also refers to ActiveX objects as controls. To avoid confusion and to follow the common industry jargon, form elements are referred to here as "form fields."

In a form, a variety of fields can be inserted. Each field is named by setting its name attribute in traditional HTML or its id attribute in HTML 4 and XHTML. For backwards compatibility, typically both attributes are used. Once a user has finished filling out the form and pressed the Submit button, the contents of each field are paired with the field's name as a name-value pair (for example, username=Thomas) and are typically sent to a server-based program such as a CGI script for processing. However, the contents of a form might even be mailed to a user for further inspection.

Given this basic overview, in order to make the form work, you must specify two things in the <form> tag: the address of the program that will handle the form contents using action and the method by which the form data will be passed using the method attribute. The name and id attributes might also be useful to set a name for the form so it can later be manipulated by a scripting language such as JavaScript. Finally, in some cases you might have to specify how the form will be encoded using the enctype attribute.

The action Attribute

How a form is to be handled is set using the action attribute for the form element. The action attribute usually is set to a URL of the program that will handle the form data. This URL will usually point to a CGI script to decode the form results. For example, the code

  <form action="http://www.democompany.com/cgi-bin/post-query.pl" method="post">

would be for a script called post-query.pl in the cgi-bin directory on the server www.democompany.com. It also is possible to use a relative URL for the action attribute if the form is delivered by the same server that houses the form-handling program:

  <form action="../cgi-bin/post-query.pl" method="post">
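How a relative action value resolves can be checked with a short Python sketch. Python is used here only for illustration, and the page address below is a hypothetical example, not one given in the text.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the form (an assumption).
page_url = "http://www.democompany.com/forms/comment.html"

# The relative action value from the example above.
action = "../cgi-bin/post-query.pl"

# urljoin applies the standard relative-reference rules that browsers follow.
resolved = urljoin(page_url, action)
print(resolved)
```

Under that assumption, the relative form resolves to the same cgi-bin address used in the absolute example.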

Setting the action immediately raises this question: What program should the data be passed to? Generally, you or someone at your company will have a program set up to handle the form data. If a hosting vendor is used, there may even be canned programs to handle the contents of the form. However, what happens if there is no way to use a remotely hosted program? It is possible to create a "poor man's" form using the mailto URL. Remember that the action attribute is set to a URL. Thus, in some cases a form element such as

  <form action="mailto:formtest@democompany.com" method="post" enctype="text/plain">

will work. It is even possible to use an extended form of mailto URL, which is supported by some browsers such as most versions of Netscape and newer versions of Internet Explorer. For example,

  <form action="mailto:formtest@democompany.com?Subject=Comment%20Form%20Result" method="post" enctype="text/plain">

Note	The %20 is simply the encoding of the space character.
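As a rough illustration, the subject portion of an extended mailto URL can be encoded programmatically. The helper name mailto_action is made up for this sketch; only the address and subject text come from the example above.

```python
from urllib.parse import quote

def mailto_action(address, subject):
    """Build a mailto URL with an encoded Subject (hypothetical helper)."""
    # quote() writes a space as %20; quote_plus() would write "+" instead.
    return "mailto:" + address + "?Subject=" + quote(subject)

action = mailto_action("formtest@democompany.com", "Comment Form Result")
print(action)
```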

Although a mailto-style form may seem like the simplest way to do things, not all browsers support it properly, and many users may not have their e-mail environment set up to allow it. Even if the browser supports the mailto style, the data should always be passed using the post method. It also might be useful to encode the data differently by setting it to use text/plain encoding rather than the default style, which is a cryptic encoding style similar to how URLs look. The next section will discuss the methods and the encoding type.

The method Attribute

It also is necessary to specify how the form will be submitted to the address specified by the action attribute. How data will be submitted is handled by the method attribute. There are two acceptable values for the method attribute: get and post. These are the HTTP methods that a browser uses to "talk" to a server. You'll find out more about that in a moment, as well as in Chapter 13. Note that if the method attribute remains unspecified, most browsers should default to the get method. Although much of the following discussion is more applicable to the people writing the programs that handle form data, it is important to understand the basic idea of each method.

Note

When discussing the HTTP methods, we ought to refer to them in uppercase as GET and POST, as defined by the HTTP specification. While traditional HTML tended to refer to the method attribute allowed values in the same manner, XHTML requires that get and post always be lowercase despite this not matching the HTTP specification. I will use the lowercase syntax throughout this discussion given the focus on markup rather than HTTP.

The get Method

The HTTP get method generally is the default method for browsers to submit information. In fact, HTML documents generally are retrieved by requesting a single URL from a Web server using the get method, which is part of the HTTP protocol. When you type a URL such as http://www.democompany.com/staff/thomas.html into your Web browser, it is translated into a valid HTTP get request like this:

 GET /staff/thomas.html HTTP/1.1

This request is then sent to the server www.democompany.com. What this request says, essentially, is "Get me the file thomas.html in the staff directory. I am speaking the 1.1 dialect of HTTP." How does this relate to forms? You really aren't getting a file per se when you submit a form, are you? In reality, you are running a program to handle the form data. For example, the action value might specify a URL such as http://www.democompany.com/cgi-bin/comment.exe, which is the address of a program that can parse your comment form. So wouldn't the HTTP request be something like the one shown here?

 GET /cgi-bin/comment.exe HTTP/1.1

Almost, but you also need to pass the form data along with the name of the program to run. To do this, all the information from the form is appended onto the end of the URL being requested. This produces a very long URL with the actual data in it, as shown here:

  http://www.democompany.com/cgi-bin/comment.exe?Name=Matthew+Foley&Age=32&Sex=male
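The way a browser assembles such a URL can be approximated in a few lines of Python. This is only a sketch: the field list mirrors the example, and the program address is the comment.exe action value mentioned earlier.

```python
from urllib.parse import urlencode, urlsplit

# The action value and form fields from the example in the text.
action = "http://www.democompany.com/cgi-bin/comment.exe"
fields = [("Name", "Matthew Foley"), ("Age", "32"), ("Sex", "male")]

# urlencode joins name=value pairs with "&" and writes spaces as "+".
query = urlencode(fields)
url = action + "?" + query

# The request line carries the path and the query string together.
parts = urlsplit(url)
request_line = f"GET {parts.path}?{parts.query} HTTP/1.1"

print(url)
print(request_line)
```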

The get method isn't very secure because the data input appears in the URL. Furthermore, there is a limitation to just how much data can be passed with the get method. It would be impossible to append a 10,000-word essay to the end of a URL, as some browsers and servers limit a URL to around two thousand characters. Interestingly, under the HTML specification, the get method has been deprecated. Despite the fact that get is not recommended, it still is the default method when the method attribute is not specified, and it continues to be used heavily on the Web.

With these potential problems, why use get? First, get is easy to deal with. An example URL like the following should make it obvious that the Name field is set to Matthew Foley, the Age is 32, and the Sex is male:

  http://www.democompany.com/cgi-bin/comment.exe?Name=Matthew+Foley&Age=32&Sex=male

Each form field name is paired with its value, and spaces in the values are generally encoded as plus signs. Non-alphanumeric characters are replaced by "%nn", where nn is the hexadecimal ASCII code for the character, which turns out to be exactly the same as the URL encoding described in Chapter 4. The individual name-value pairs are separated by ampersands. It would be trivial to write a parsing program to recover data from this format, but it probably is better to use one of the many existing libraries to decode submitted data.
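As one example of such a library, Python's standard urllib.parse module can decode the query string and show how non-alphanumeric characters are encoded. The sample string passed to quote_plus is made up for illustration.

```python
from urllib.parse import parse_qs, quote_plus, urlsplit

url = ("http://www.democompany.com/cgi-bin/comment.exe"
       "?Name=Matthew+Foley&Age=32&Sex=male")

# parse_qs maps each field name to a list, because a name may repeat.
fields = parse_qs(urlsplit(url).query)
print(fields["Name"][0])

# Going the other way: spaces become "+", other reserved characters "%nn".
encoded = quote_plus("Q&A: 100% sure?")
print(encoded)
```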

The other HTTP method, post, is just as easy, so simplicity of parsing should not be a motivating reason to use get. Perhaps the best reason to use get is that it comes in the form of a URL, so it can be bookmarked or set as a link. Search engines are a good example of get being used properly. When a user submits a query to a search engine, the engine runs the query and then returns page upon page of results. It is possible to bookmark the query results and rerun the query later. It also is possible to create anchors that fire off canned server-side programs. This is particularly useful in certain varieties of dynamic Web sites. For example, the link shown next fires off a server-side program written in the ColdFusion Markup Language (CFML) and passes it a value, setting the Id parameter to 1.

  <a href="displaybio.cfm?Id=1">Joe Somolovich</a>

The query is built into the link; when the link is clicked, the server-side program accesses the appropriate database of executives and brings up information about Joe Somolovich.
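The server-side half of this exchange might look roughly like the following Python sketch. It is not the ColdFusion program the text refers to; the EXECUTIVES table and the display_bio function are hypothetical stand-ins for the database and its lookup.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-in for the database of executives.
EXECUTIVES = {"1": "Joe Somolovich"}

def display_bio(href):
    """Look up the executive named by the Id parameter of a link."""
    query = parse_qs(urlsplit(href).query)
    ids = query.get("Id")
    if not ids:
        return "No Id supplied"
    return EXECUTIVES.get(ids[0], "Unknown executive")

print(display_bio("displaybio.cfm?Id=1"))
```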

Although the get method is far from perfect, there are numerous situations in which it makes a great deal of sense. It is unlikely that get will be truly deprecated for quite some time, if ever.

The post Method

In situations where a large amount of information must be passed back, the post method is more appropriate than get. The post method transmits all form input information as a data stream in the body of the HTTP request, after the request line and headers that name the requested URL. In other words, once the server has received a request from a form using post, it knows to continue "listening" for the rest of the information. The get method, by contrast, carries the data right in the URL of the request. By default, the form data is encoded in the same general way as with the get method; spaces become plus signs and other characters are encoded in the URL fashion. A sample form might send data that would look like the following:

 Name=Jane+Smith&Age=30&Sex=female

Like data transmitted using the get method, the data will still have to be broken up to be used by the handling program. The benefit of using the post method is that a large amount of data can be submitted this way because the form contents are not in the URL. It is even possible to send the contents of files using this method. In the post example, the encoding of the form data is the same as get, although it is possible to change the encoding method using the enctype attribute.
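To make the difference from get concrete, here is a minimal sketch of what a post request might look like on the wire. The host and path reuse the earlier post-query.pl example, and the post_request function is a hypothetical helper; real browsers send additional headers.

```python
from urllib.parse import urlencode

def post_request(host, path, fields):
    """Assemble a bare-bones HTTP post request as text (illustrative only)."""
    body = urlencode(fields)
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        # The length tells the server how much body data to keep reading.
        f"Content-Length: {len(body.encode('ascii'))}",
    ]
    # A blank line separates the headers from the form data.
    return "\r\n".join(headers) + "\r\n\r\n" + body

request = post_request(
    "www.democompany.com",
    "/cgi-bin/post-query.pl",
    [("Name", "Jane Smith"), ("Age", "30"), ("Sex", "female")],
)
print(request)
```

Note that the URL in the request line stays short; the form data travels after the blank line.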

Note	One potential downside of the post method is that pages generated by data submitted via post cannot be bookmarked. You may have noticed that the preservation of post data is such a challenge that browsers will even try to assist users with automatic reposting of form data.

The enctype Attribute

When data is passed from a form to a Web server, it typically is encoded just like a URL, as discussed in Chapter 4. In this encoding, spaces are replaced by the "+" symbol and non-alphanumeric characters are replaced by "%nn", where nn is the hexadecimal ASCII code for the character. This format is identified by the MIME type application/x-www-form-urlencoded. By default, all form data is submitted in this form. It is possible, however, to set the encoding method for form data by setting the enctype attribute. When using a mailto URL in the action attribute, the encoding type of text/plain might be more desirable. The result would look like the example shown here:

  First Name=Joe
  Last Name=Smith
  Sex=Male
  Submit=Send it

Each form field is on a line of its own. Even with this encoding form, non-alphanumeric characters can be encoded in the hexadecimal form.
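A rough Python approximation of this layout follows. The function name encode_text_plain is hypothetical, and the sketch assumes one name=value pair per line with no further escaping; actual browsers may differ in the details.

```python
def encode_text_plain(fields):
    """Write one name=value pair per line (a simplified sketch)."""
    # Assumption: values are emitted as-is, with a CRLF ending each line.
    return "".join(f"{name}={value}\r\n" for name, value in fields)

body = encode_text_plain([
    ("First Name", "Joe"),
    ("Last Name", "Smith"),
    ("Sex", "Male"),
    ("Submit", "Send it"),
])
print(body)
```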

Another form of encoding is also important: multipart/form-data. When passing files back using a form, it is important to designate where each file begins and ends. A value of multipart/form-data for the enctype is used to indicate this style. In this encoding, spaces and non-alphanumeric characters are preserved; data elements are separated by special delimiter lines. The following file fragment shows the submission of a form with multipart/form-data encoding, including the contents of the attached files:

  Content-type: multipart/form-data; boundary=---------------------------2988412654262
  Content-Length: 5289

  -----------------------------2988412654262
  Content-Disposition: form-data; name="firstname"

  Joe
  -----------------------------2988412654262
  Content-Disposition: form-data; name="lastname"

  Smith
  -----------------------------2988412654262
  Content-Disposition: form-data; name="myfile"; filename="C:\WINNT\PROFILES\ADMINISTRATOR\DESKTOP\TEST.HTML"
  Content-Type: text/html

  <html><head><title>Test File</title></head>
  <body><h1>Test File</h1></body></html>
  ----------------------------------------
  8/12/97 4:47:45 PM--SF_NOTIFY_PREPROC_HEADERS URL=/programs/postit.cfm?
  ----------------------------------------
  8/12/97 4:47:45 PM--SF_NOTIFY_URL_MAP URL=/programs/postit.cfm Physical Path=C:\InetPub\wwwroot\programs\postit.cfm
  ----------------------------------------
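The delimiter scheme can be sketched in Python as follows. The encode_multipart function is a hypothetical helper that handles only simple text fields, not file uploads; it reuses the boundary string and the first two fields from the fragment above.

```python
def encode_multipart(fields, boundary):
    """Build a multipart/form-data body for plain text fields only."""
    lines = []
    for name, value in fields:
        # Each part starts with "--" followed by the boundary string.
        lines.append("--" + boundary)
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")  # blank line between part headers and the value
        lines.append(value)
    # The final delimiter carries an extra trailing "--".
    lines.append("--" + boundary + "--")
    return "\r\n".join(lines) + "\r\n"

boundary = "---------------------------2988412654262"
body = encode_multipart([("firstname", "Joe"), ("lastname", "Smith")], boundary)
content_type = "multipart/form-data; boundary=" + boundary
print(content_type)
print(body)
```

This is why the delimiter lines in the fragment show two more dashes than the boundary value declared in the Content-type header.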

Simple Form Example

Given that we have some place to send form data, as specified by the action attribute and method (either get or post), we can write a simple stub example for a form, as shown here:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" lang="en">
  <head>
  <title>Form Template</title>
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
  </head>
  <body>
  <form action="insert URL to server-side program here" method="post"
        name="form1" id="form1">
  <!-- Form fields and other standard XHTML markup and text -->
  </form>
  </body>
  </html>
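For readers who want something to point the action attribute at while testing, the following is a minimal sketch of a form-handling program written as a Python WSGI application. It is not a CGI script of the kind described in the text; the names form_app and simulate_post are hypothetical, and the handler assumes the default URL-style encoding.

```python
from io import BytesIO
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

def form_app(environ, start_response):
    """Echo posted form fields back as plain text (hypothetical handler)."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length).decode("ascii")
    fields = parse_qs(body)  # decodes "+" and "%nn" sequences
    lines = [f"{name}={values[0]}" for name, values in sorted(fields.items())]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return ["\n".join(lines).encode("utf-8")]

def simulate_post(body):
    """Call form_app directly with a fabricated post request."""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": BytesIO(body),
    })
    statuses = []
    chunks = form_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks).decode("utf-8")

status, text = simulate_post(b"Name=Jane+Smith&Age=30")
print(status)
print(text)
```

To try it with a real browser, the same form_app could be served locally with wsgiref.simple_server.make_server from the Python standard library.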

Although this syntax is adequate to build the form framework in most cases, there are other attributes for the form element that might be useful for frame targeting, scripting, and style sheets; these are described in Appendix A.
