8.3 Where and How to Submit | XForms Essentials

The questions of where and how are closely related, because the target of submission is a URI. The first part of a URI, called the scheme, indicates the general approach for the submit transaction, as in "http," "file," or "mailto." The remainder of the URI gives more specific information about the destination for the data.

There also need to be rules for how the in-memory instance data gets written down as a pattern of bytes on the wire. XML is one such format; XForms also includes several backward-compatible formats, all of which are described in the following sections.

8.3.1 URI Scheme and Method

URI schemes, included as part of the action attribute on submission, are the broadest selector of where and how form data gets submitted. A more fine-grained distinction is the request method (often simply called the "method"), which defines details about the relationship between a URI and the representation of whatever resides at that URI.

The most common request method is GET, which is used for requesting most web pages, images, sound, and video through a web browser. GET is commonly used with forms, too, especially shorter ones. The second most common method is POST, which is described in the definition of HTTP/1.1 at RFC 2616 as the preferred way to provide:

Annotation of existing resources
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
Providing a block of data, such as the result of submitting a form, to a data-handling process
Extending a database through an append operation

In any case, the actual function performed by the POST method is determined by the server and is usually dependent on the URI that is part of the operation.

A third request method is PUT, which is little used on the Web today, but hopefully something that XForms can help change. A PUT is also a write operation but, unlike POST, it implies that an existing resource indicated by the URI is getting replaced, rather than annotated or appended to. If there is no preexisting resource, then the PUT method has the effect of creating a new resource.

In XForms terms, the attribute method on submission indicates the author's selection of request method. The combination of URI scheme and method defines the overall processing that will happen during submit. Note, however, that nonsensical combinations are possible, such as "mailto:" with PUT or "file:" with POST.

When to Use GET

The advent of web services has caused a considerable amount of discussion on the question of when to use GET rather than POST in HTTP. SOAP 1.1 defined only a POST binding which, when combined with a proliferation of point-and-click tools, led to an initial surge in the popularity of POST (and a corresponding decline in the popularity of GET).

A back-to-basics movement called REST (Representational State Transfer, formalized in the PhD thesis of Apache developer Roy T. Fielding), however, has shifted emphasis back to the virtues of GET. The W3C SOAP 1.2 effort, for example, has added machinery to allow SOAP calls to take place through a simple HTTP GET.

The W3C Technical Architecture Group (TAG) has also weighed in, issuing "URIs, Addressability, and the use of HTTP GET" at http://www.w3.org/2001/tag/doc/get7. As far as this relates to form authoring, a good rule of thumb is to always consider GET first, since it produces a bookmarkable result; in other words, it creates an "addressable" resource that anyone on the Web can access just by entering a URL in their browser. The exception, when POST is preferable, occurs when one or more of the following is true:

The form has a large amount of data, which will usually be the case when an upload form control is present.
Submitting the form causes some significant obligation, such as placing an online order.
Form data is sensitive enough that it shouldn't be included in the URI, as it appears in the browser location bar or server logs, which will usually be the case when a secret form control is present.
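
The rule of thumb above can be written down as a small decision function. This is only a sketch: the function name and its three boolean parameters are hypothetical, and each parameter stands for one of the exceptions just listed.

```python
def choose_method(has_upload: bool, causes_obligation: bool, has_secret: bool) -> str:
    """Return a value for the method attribute, following the rule of thumb.

    The parameter names are illustrative assumptions, not part of XForms.
    """
    # Any one of the three exceptions is enough to prefer POST.
    if has_upload or causes_obligation or has_secret:
        return "post"
    # Otherwise GET is the first choice, since the result is bookmarkable.
    return "get"


search_form = choose_method(False, False, False)
order_form = choose_method(False, True, False)
```

A simple search form would come out as "get", while an order form, which creates an obligation, would come out as "post".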

One possible option to avoid the entire GET versus POST controversy is to use PUT instead. This method is best suited for situations where a form is tied to a single resource, so that the form can be considered a specialized editing or document creation tool for XML.

8.3.1.1 http or https

The http scheme is the staple of the Web. The https scheme is functionally equivalent, except that the contents are encrypted in transit so that prying eyes can't read them on the wire.

GET, POST, and PUT all make sense for form data sent via the http scheme, under the right conditions. All conforming XForms processors are guaranteed to support http. The https scheme, however, might not be present on certain small devices that don't support the necessary encryption routines.

8.3.1.2 file

The file scheme represents access to the local filesystem. On the Windows platform, networked file shares, which are treated in a similar fashion to the local file system, can also be accessed through the file URI scheme.

Only PUT makes sense for form data sent via the file scheme. Since not every XForms processor is guaranteed to have a filesystem, this scheme isn't guaranteed to be supported.

The file scheme can be useful when used indirectly with relative paths. For example, the following declaration

<submission method="put" action="myfile.xml"/>

specifies that the action URI is relative. When the containing document is loaded from a file scheme, then the submission also goes to the file myfile.xml in the current directory. When the containing document is loaded from an http scheme, however, the submission gets PUT to a URI in the same directory as the current document, except ending in myfile.xml.
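
The resolution of a relative action URI can be illustrated with Python's standard urljoin function. This is a sketch of ordinary relative URI resolution, not of any particular XForms processor; both base URIs below are made-up examples.

```python
from urllib.parse import urljoin


def resolve_action(document_uri: str, action: str) -> str:
    """Resolve a relative action attribute against the containing document's URI."""
    return urljoin(document_uri, action)


# Hypothetical locations of the containing document.
from_file = resolve_action("file:///home/user/forms/order.xhtml", "myfile.xml")
from_http = resolve_action("http://example.com/forms/order.xhtml", "myfile.xml")
```

In both cases the last path segment of the containing document is replaced by myfile.xml, and the scheme of the containing document carries over to the submission target.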

On DOS and Windows file systems, absolute file paths contain a drive letter. The official Internet standards are silent on how to include drive letters, especially since the colon character is reserved for a different purpose. As a result, there is no universally agreed-upon way to describe a file URI with a drive letter. Some variations in use include:

file:/C:/dir/file.xml
file://C:/dir/file.xml
file:///C:/dir/file.xml
file:///C|/dir/file.xml
file:///C%3A/dir/file.xml

UNC paths, starting with a double backslash, have a similar set of ambiguities when mapped to the file URI scheme.
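
As one data point, Python's standard pathlib module settles on a single variant for each case. The sketch below shows what that library produces; it is not a statement of what XForms processors accept, and the server and share names are hypothetical.

```python
from pathlib import PureWindowsPath


def windows_path_to_file_uri(path: str) -> str:
    """Convert an absolute Windows path to a file URI using pathlib's convention."""
    return PureWindowsPath(path).as_uri()


# A path with a drive letter.
drive_uri = windows_path_to_file_uri("C:/dir/file.xml")
# A UNC path; "server" and "share" are placeholder names.
unc_uri = windows_path_to_file_uri("//server/share/file.xml")
```

For the drive-letter path, pathlib produces the three-slash form with a literal colon, which matches the third variation in the list above.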

8.3.1.3 mailto

The mailto scheme represents an electronic mailbox, to which messages may be posted.

Only POST makes sense for form data via the mailto scheme. Since not every XForms processor includes mail functionality, this scheme isn't guaranteed to be supported.

8.3.1.4 Others...

Any other URI schemes not listed here should be used only if you are in a controlled environment (such as an intranet) where you can ensure that browsers will support them, or if the form is non-critical and of little consequence to users who find that submit doesn't work.

Support for additional URI schemes is considered an extension to XForms, which is covered further in Chapter 11.
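
The scheme and method pairings described in the preceding subsections can be summarized as a lookup table. The following is a minimal sketch; the names SENSIBLE_METHODS and is_sensible are hypothetical, and the base_scheme parameter is an assumption standing in for the scheme of the containing document when the action URI is relative.

```python
from urllib.parse import urlsplit

# Pairings taken from the text: http and https allow all three methods,
# file makes sense only with PUT, and mailto only with POST.
SENSIBLE_METHODS = {
    "http": {"get", "post", "put"},
    "https": {"get", "post", "put"},
    "file": {"put"},
    "mailto": {"post"},
}


def is_sensible(action: str, method: str, base_scheme: str = "http") -> bool:
    """Report whether the scheme of the action URI makes sense with the method."""
    # A relative action URI has no scheme of its own, so fall back to the
    # scheme the containing document was loaded from.
    scheme = urlsplit(action).scheme or base_scheme
    return method.lower() in SENSIBLE_METHODS.get(scheme, set())
```

For example, is_sensible("mailto:orders@example.com", "put") is false, while is_sensible("myfile.xml", "put", base_scheme="file") is true. Schemes outside the table are reported as not sensible, which mirrors the advice to avoid them outside a controlled environment.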

8.3.2 Serialization Formats for Data Submission

At some point the in-memory instance data gets converted, or serialized, into a stream of bytes suitable for sending over the wire. The following sections describe the serialization formats defined in XForms 1.0.

8.3.2.1 application/xml

XML for form data submission is one of the main motivations behind XForms. This is the most straightforward serialization format; after all, instance data is based on the XPath data model, which was specifically designed to model XML.

XForms borrows from XSLT several attributes that fine-tune the serialization process: indent, encoding, omit-xml-declaration, standalone, and cdata-section-elements. Note that these attributes maintain the original spelling, including dashes, as in XSLT. The following section describing the submission element contains all the details on what each attribute does. The important thing to note is that all of these attributes taken from XSLT are advisory only, and that an XForms processor is free to ignore any that are inconvenient for the implementer.

The media type of the submitted XML will be application/xml by default, though this can be overridden with the mediatype attribute.

Another attribute, includenamespaceprefixes, addresses a detail of how namespaces are generally handled in XML-based specifications. The XPath data model contains, for each element node, one namespace node per in-scope namespace. As a result, inline instance data will have additional, generally unwanted, namespace nodes that get serialized. Example 8-2 shows code that will give this result.

Example 8-2. Serialization of namespace nodes

<xforms:instance>
  <my:data/>
</xforms:instance>

In Example 8-2, the XForms namespace is in scope, bound to the prefix xforms. Correspondingly, the XPath data model will contain a namespace node for the XForms namespace and, without taking any special action, the serialized XML will look like this:

<my:data xmlns:xforms="http://www.w3.org/2002/xforms"/>

In other words, the end result includes an unnecessary namespace declaration. Because of the widespread use of namespace prefixes in attribute values and text, it's not always safe to throw away unused prefixes. The solution is to specify the includenamespaceprefixes attribute, which will cause any prefixes that are not visibly used (for element or attribute names) to be suppressed, unless they are included in a space-separated list. A special value, #default, applies to the default namespace. So, to prevent the unwanted xforms namespace declaration seen earlier, a simple:

includenamespaceprefixes=""

on the submission element would do the trick.

8.3.2.2 application/x-www-form-urlencoded

The algorithm for urlencoding is quite simple, but nevertheless has caused many problems in the past. The reason for this is that the algorithm specification, as defined in HTML, didn't say what to do with characters outside the range of ASCII. As a result, numerous variations sprang into existence, with no way to tell which was which.

XForms fixes this by mandating UTF-8 as the one true basis for urlencoding. In UTF-8, a single character is represented by a single byte for characters in the ASCII range, and by between two and four bytes for other Unicode characters. Overall, the urlencoding algorithm for a given string boils down to:

Replace all space characters with +, and all reserved characters with %NN, where NN represents the uppercase hexadecimal notation for the character. (Reserved characters are semicolon, slash, question mark, colon, at sign, ampersand, equals, plus, dollar sign, and comma.)
Replace all characters outside of ASCII range with the (multiple byte) representation of that character in UTF-8, with each byte in turn represented as %NN, as above.

For example, the string "Ünited Stätes" would, after urlencoding, be "%C3%9Cnited+St%C3%A4tes".^[1]

^[1] According to The Onion, 29 April 1997, the U.S. Congress planned to toughen the image of the country by adding umlauts to the name.
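
The two steps above can be sketched directly in Python. The function name xforms_urlencode is hypothetical, and the sketch follows only the rules as stated here; a production encoder would also escape the percent sign and other unsafe characters, which is what the standard library's urllib.parse.quote_plus does.

```python
# The reserved characters listed in the text.
RESERVED = set(";/?:@&=+$,")


def xforms_urlencode(text: str) -> str:
    """Urlencode a string using UTF-8, following the two steps described above."""
    out = []
    for ch in text:
        if ch == " ":
            # Spaces become a plus sign.
            out.append("+")
        elif ch in RESERVED or ord(ch) > 0x7F:
            # Reserved and non-ASCII characters: one %NN per UTF-8 byte,
            # in uppercase hexadecimal.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


encoded = xforms_urlencode("Ünited Stätes")
```

Running this on the example string reproduces the result given in the text.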

A bigger hurdle is representing structured XML as a flat list of name/value pairs. Here, XForms doesn't attempt to model an entire tree as a flat structure. Instead, only the leaf element nodes (those that contain one and only one text child node) are included in the serialization. That's right: no attributes, no namespace information, and no elements that aren't leaf nodes. When such XML features are needed, application/xml is the appropriate serialization format.

The overall serialization follows the document order of the instance data, and is formatted as:

{element local name}={value of text node}{separator}

where the element local name and the value of the text node are urlencoded and separated by a literal equals character. Between each grouping is a separator character, a semicolon by default. For compatibility with older systems, this character can be changed to an ampersand through the separator attribute on submission (the ampersand is no longer favored, because it needs to be specially escaped as &amp; when represented in XML).
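
The flattening described above can be sketched with Python's standard XML parser. This is an illustration under assumptions, not a conforming implementation: the function name serialize_urlencoded and the sample instance are made up, leaf elements are approximated as elements with no child elements and non-empty text, and urllib.parse.quote_plus stands in for the urlencoding step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus


def serialize_urlencoded(instance_xml: str, separator: str = ";") -> str:
    """Flatten instance data into name=value pairs in document order."""
    root = ET.fromstring(instance_xml)
    pairs = []
    for elem in root.iter():  # iter() visits elements in document order
        # Keep only leaf elements that carry text; attributes are ignored.
        if len(elem) == 0 and elem.text:
            # ElementTree writes namespaced tags as "{uri}local"; keep the local name.
            local_name = elem.tag.rsplit("}", 1)[-1]
            pairs.append(f"{quote_plus(local_name)}={quote_plus(elem.text)}")
    return separator.join(pairs)


# Hypothetical instance data with a namespace, an attribute, and nesting.
SAMPLE = (
    '<order xmlns="http://example.com/ns" id="7">'
    "<customer><name>Ada Lovelace</name><city>London</city></customer>"
    "<total>42</total>"
    "</order>"
)

with_semicolons = serialize_urlencoded(SAMPLE)
with_ampersands = serialize_urlencoded(SAMPLE, separator="&")
```

Note how the id attribute, the namespace, and the customer wrapper element all disappear from the output; only the three leaf elements survive.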

8.3.2.3 multipart/related

One drawback of application/xml, and especially of urlencoded data, is that binary content can't be represented efficiently. The answer to this dilemma is a media type that allows binary content to be packaged separately from XML. A number of MIME types that start with multipart/, though originating as part of the global email system, have come into use as a way to package binary data along with XML.

All of the multipart formats break a message into smaller pieces, simply called parts. In multipart/related, the first part contains XML serialized just as in the application/xml serialization method. Subsequent parts contain binary resources that the user selected through <upload> form controls, which must be bound to instance data nodes of the XML Schema datatype anyURI.

For example, a simple form might capture an employee name as a string and a photo as an anyURI, like this:

<xforms:input ref="name">
  <xforms:label>Name</xforms:label>
</xforms:input>
<xforms:upload ref="picture" mediatype="image/*">
  <xforms:label>Photo</xforms:label>
</xforms:upload>

Serialized as multipart/related, the result would be:

Content-Type: multipart/related; boundary=a42113842b; type=application/xml; start="<000000@dubinko.info>"
Content-Length: 65232

--a42113842b
Content-Type: application/xml; charset=UTF-8
Content-ID: <000000@dubinko.info>

<?xml version="1.0"?>
<root_element>
  <name>Cordova Cassanova</name>
  <picture>cid:000001@dubinko.info</picture>
</root_element>
--a42113842b
Content-Type: image/jpg
Content-Transfer-Encoding: binary
Content-ID: <000001@dubinko.info>

...binary image data...
--a42113842b--

Notice that the URI of the picture has been dereferenced, and the actual data now appears in the submitted data stream.
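
The way the parts fit together can be sketched by assembling such a body by hand. The function name build_multipart_related is hypothetical, the image bytes are a three-byte placeholder, and the sketch builds only the message body; the outer Content-Type header with its boundary, type, and start parameters would be sent separately.

```python
CRLF = b"\r\n"


def build_multipart_related(boundary, root_cid, xml_bytes, attachments):
    """Assemble a multipart/related body.

    attachments is a list of (content_id, media_type, data) tuples, one per
    binary resource referenced from the XML by a cid: URI.
    """
    delimiter = b"--" + boundary.encode("ascii")
    # The first part is the XML, serialized as for application/xml.
    chunks = [
        delimiter,
        b"Content-Type: application/xml; charset=UTF-8",
        b"Content-ID: <" + root_cid.encode("ascii") + b">",
        b"",
        xml_bytes,
    ]
    # Each subsequent part carries one binary resource, unencoded.
    for content_id, media_type, data in attachments:
        chunks += [
            delimiter,
            b"Content-Type: " + media_type.encode("ascii"),
            b"Content-Transfer-Encoding: binary",
            b"Content-ID: <" + content_id.encode("ascii") + b">",
            b"",
            data,
        ]
    # The closing delimiter has two extra trailing dashes.
    chunks.append(delimiter + b"--")
    return CRLF.join(chunks) + CRLF


xml_part = (
    b'<?xml version="1.0"?><root_element><name>Cordova Cassanova</name>'
    b"<picture>cid:000001@dubinko.info</picture></root_element>"
)
body = build_multipart_related(
    "a42113842b",
    "000000@dubinko.info",
    xml_part,
    [("000001@dubinko.info", "image/jpg", b"\xff\xd8\xff")],  # placeholder bytes
)
```

The Content-ID of the binary part matches the cid: URI inside the XML, which is how a receiver ties the picture element back to its data.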

If the instance data contains encoded binary data, through the XML Schema base64Binary or hexBinary datatypes, for example, multipart/related is a poor choice of a serialization format, since the encoded binary data will appear inline in the first XML part, not as separate parts.

8.3.2.4 multipart/form-data

This serialization format, which is already widely deployed across the Web due to the <input type="file"> control in HTML forms, doesn't take any special advantage of XML. Following the same rules as application/x-www-form-urlencoded, every leaf node is treated as a separate part, each of which can exist in a separate encoding. Since individual sections can include binary characters without the overhead of escape characters, the overall data size can be much smaller than other serialization formats.

The technique this serialization format uses can lead to a proliferation of parts in a multipart stream, which can lead to overhead for larger bodies of instance data. For content that would otherwise be encoded as base64Binary or hexBinary, however, it can provide a substantial savings.
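
The size of that savings can be estimated with a quick calculation. The sketch below uses a hypothetical 3,000-byte payload of zero bytes and compares its raw size with its base64 and hexadecimal encodings, ignoring the per-part header overhead that multipart adds.

```python
import base64
import binascii

# A stand-in for binary content such as an image; the size is arbitrary.
raw = bytes(3000)

raw_size = len(raw)
base64_size = len(base64.b64encode(raw))  # 4 output characters per 3 input bytes
hex_size = len(binascii.hexlify(raw))     # 2 output characters per input byte
```

Base64 inflates the payload by one third (4,000 characters here), and hexadecimal doubles it (6,000 characters), whereas a binary part carries the original 3,000 bytes as they are.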