The questions of where and how are closely related, because the target of submission is a URI. The first part of a URI, called the scheme, indicates the general approach for the submit transaction, as in "http," "file," or "mailto." The remainder of the URI gives more specific information about the destination for the data. Additionally, there need to be rules for how the in-memory instance data gets written down as a pattern of bytes on the wire. In addition to XML, several backward-compatible formats included in XForms are described in the following sections.

8.3.1 URI Scheme and Method

URI schemes, included as part of the action attribute on submission, are the broadest selector of where and how form data gets submitted. A more fine-grained distinction is the request method (often simply called the "method"), which defines details about the relationship between a URI and the representation of whatever resides at that URI. The most common request method is GET, which is used for requesting most web pages, images, sound, and video through a web browser. GET is commonly used with forms, too, especially shorter ones. The second most common method is POST, which is described in the definition of HTTP/1.1 in RFC 2616 as the preferred way to provide:
In any case, the actual function performed by the POST method is determined by the server and is usually dependent on the URI that is part of the operation. A third request method is PUT, which is little used on the Web today, though that is hopefully something XForms can help change. A PUT is also a write operation but, unlike POST, it implies that an existing resource indicated by the URI is being replaced, rather than annotated or appended to. If there is no preexisting resource, then the PUT method has the effect of creating a new resource. In XForms terms, the method attribute on submission indicates the author's selection of request method. The combination of URI scheme and method defines the overall processing that will happen during submit. Note, however, that nonsensical combinations are possible, such as "mailto:" with PUT or "file:" with POST.
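The pairings that the following subsections treat as sensible can be collected into a small lookup table. This is only an illustrative sketch in Python: the names SENSIBLE_METHODS and is_sensible are hypothetical, and an actual XForms processor is free to support more or fewer combinations.

```python
from urllib.parse import urlsplit

# Scheme/method pairings described as sensible in this section.
# The table and function names are illustrative, not part of XForms.
SENSIBLE_METHODS = {
    "http": {"get", "post", "put"},
    "https": {"get", "post", "put"},
    "file": {"put"},
    "mailto": {"post"},
}


def is_sensible(action, method):
    """Return True if the absolute action URI's scheme pairs sensibly with method."""
    scheme = urlsplit(action).scheme.lower()
    # Unknown schemes (and relative URIs, which have no scheme of their own)
    # fall through to an empty set and are reported as not sensible here.
    return method.lower() in SENSIBLE_METHODS.get(scheme, set())
```

A relative action has no scheme of its own, so it would need to be resolved against the containing document's URI before a check like this is meaningful.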
8.3.1.1 http or https

The http scheme is the staple of the Web. The https scheme is functionally equivalent, except that contents in transit are encrypted so that prying eyes can't read them on the wire. GET, POST, and PUT all make sense for form data sent via the http scheme, under the right conditions. All conforming XForms processors are guaranteed to support http. The https scheme, however, might not be present on certain small devices that don't support the necessary encryption routines.

8.3.1.2 file

The file scheme represents access to the local filesystem. On the Windows platform, networked file shares, which are treated in a similar fashion to the local filesystem, can also be accessed through the file URI scheme. Only PUT makes sense for form data sent via the file scheme. Since not every XForms processor is guaranteed to have a filesystem, this scheme isn't guaranteed to be supported. The file scheme can be useful when used indirectly with relative paths. For example, the following declaration specifies a relative action URI:

    <submission method="put" action="myfile.xml"/>

When the containing document is loaded from a file scheme, the submission also goes to the file myfile.xml in the same directory as that document. When the containing document is loaded from an http scheme, however, the submission gets PUT to a URI in the same directory as the current document, except ending in myfile.xml.
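The effect of a relative action URI can be sketched with Python's standard urljoin function, which applies the usual relative-reference resolution rules. The document URIs below are hypothetical, and resolve_action is an illustrative helper rather than anything defined by XForms.

```python
from urllib.parse import urljoin


def resolve_action(document_uri, action):
    """Resolve a submission's action against the containing document's URI."""
    return urljoin(document_uri, action)


# Same relative action, two different (hypothetical) containing documents.
from_file = resolve_action("file:///C:/dir/form.xhtml", "myfile.xml")
from_http = resolve_action("http://example.com/forms/form.xhtml", "myfile.xml")
# from_file -> "file:///C:/dir/myfile.xml"
# from_http -> "http://example.com/forms/myfile.xml"
```

The same markup therefore writes to a local file during offline use and PUTs to a server when the form is served over http, with no change to the form itself.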
file:/C:/dir/file.xml file://C:/dir/file.xml file:///C:/dir/file.xml file:///C|/dir/file.xml file:///C%3A/dir/file.xml
8.3.1.3 mailto

The mailto scheme represents an electronic mailbox, to which messages may be posted. Only POST makes sense for form data sent via the mailto scheme. Since not every XForms processor includes mail functionality, this scheme isn't guaranteed to be supported.

8.3.1.4 Others...

Any other URI schemes not listed here should be used only if you are in a controlled environment (such as an intranet) where you can ensure that browsers will support them, or if the form is non-critical and of little consequence to users who find that submit doesn't work. Support for additional URI schemes is considered an extension to XForms, which is covered further in Chapter 11.

8.3.2 Serialization Formats for Data Submission

At some point the in-memory instance data gets converted, or serialized, into a stream of bytes suitable for sending over the wire. The following sections describe the serialization formats defined in XForms 1.0.

8.3.2.1 application/xml

XML for form data submission is one of the main motivations behind XForms. This is the most straightforward serialization format; after all, instance data is based on the XPath data model, which was specifically designed to model XML. XForms borrows from XSLT several attributes that fine-tune the serialization process: indent, encoding, omit-xml-declaration, standalone, and cdata-section-elements. Note that these attributes maintain the original spelling, including dashes, as in XSLT. The following section describing the submission element contains all the details on what each attribute does. The important thing to note is that all of these attributes taken from XSLT are advisory only, and that an XForms processor is free to ignore any that are inconvenient for the implementer. The media type of the submitted XML will be application/xml by default, though this can be overridden with the mediatype attribute.
Another attribute, includenamespaceprefixes, deals with a detail of how namespaces are generally handled in XML-based specifications. The XPath data model contains, for each element node, one namespace node per in-scope namespace. As a result, inline instance data will have additional, generally unwanted, namespace nodes that get serialized. Example 8-2 shows code that will give this result.

Example 8-2. Serialization of namespace nodes

    <xforms:instance>
      <my:data/>
    </xforms:instance>

In Example 8-2, the XForms namespace is in scope, bound to the prefix xforms. Correspondingly, the XPath data model will contain a namespace node for the XForms namespace and, without taking any special action, the serialized XML will look like this:

    <my:data xmlns:xforms="http://www.w3.org/2002/xforms"/>

In other words, the end result includes an unnecessary namespace declaration. Because of the widespread use of namespace prefixes in attribute values and text, it's not always safe to throw away unused prefixes. The solution is to specify the includenamespaceprefixes attribute, which will cause any prefixes that are not visibly used (for element or attribute names) to be suppressed, unless they are included in a space-separated list. A special value, #default, applies to the default namespace. So, to prevent the unwanted xforms namespace declaration seen earlier, a simple:

    includenamespaceprefixes=""

on the submission element would do the trick.

8.3.2.2 application/x-www-form-urlencoded

The algorithm for urlencoding is quite simple, but nevertheless has caused many problems in the past. The reason for this is that the algorithm specification, as defined in HTML, didn't say what to do with characters outside the range of ASCII. As a result, numerous variations sprang into existence, with no way to tell which was which. XForms fixes this by mandating UTF-8 as the one true basis for urlencoding.
In UTF-8, a single character is represented by a single byte for characters in the ASCII range, and by between two and four bytes for other Unicode characters. Overall, the urlencoding algorithm for a given string boils down to:
For example, the string "Ünited Stätes" would be "%C3%9Cnited+St%C3%A4tes" after urlencoding.[1]
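The example above can be reproduced with a short Python sketch. The function name urlencode_utf8 is hypothetical, and the set of characters left unescaped is an assumption made for illustration; the authoritative list of reserved characters comes from the URI specifications that XForms references.

```python
import string
from urllib.parse import quote_plus

# Assumption for illustration: ASCII letters, digits, and a few marks pass
# through untouched; every other byte is percent-escaped.
UNESCAPED = set(string.ascii_letters + string.digits + "-_.~")


def urlencode_utf8(text):
    """Urlencode text by way of its UTF-8 byte sequence."""
    out = []
    for byte in text.encode("utf-8"):      # 1. convert to UTF-8 bytes
        char = chr(byte)
        if char == " ":
            out.append("+")                # 2. space becomes a plus sign
        elif byte < 0x80 and char in UNESCAPED:
            out.append(char)               # 3. safe ASCII is kept as-is
        else:
            out.append("%{:02X}".format(byte))  # 4. other bytes become %HH
    return "".join(out)


encoded = urlencode_utf8("Ünited Stätes")
# encoded -> "%C3%9Cnited+St%C3%A4tes"
# The standard library's quote_plus, which also defaults to UTF-8, agrees:
same = quote_plus("Ünited Stätes") == encoded
```

Each of the two accented characters occupies two bytes in UTF-8, which is why each one expands to two %HH escapes.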
A bigger hurdle is representing structured XML as a flat list of name/value pairs. Here, XForms doesn't attempt to model an entire tree as a flat structure. Instead, only the leaf element nodes (those that contain one and only one text child node) are included in the serialization. That's right: no attributes, no namespace information, and no elements that aren't leaf nodes. When such XML features are needed, application/xml is the appropriate serialization format. The overall serialization follows the document order of the instance data, and is formatted as:

    {element local name}={value of text node}{separator}

where the element local name and the value of the text node are urlencoded, separated by a literal equals character. Between each grouping is a separator character, a semicolon by default. For compatibility with older systems, this character can be changed to an ampersand through the separator attribute on submission (the ampersand is no longer favored, because it needs to be specially escaped as &amp; when represented in XML).

8.3.2.3 multipart/related

One drawback of application/xml, and especially of urlencoded data, is that binary content can't be represented efficiently. The answer to this dilemma is a media type that allows binary content to be packaged separately from XML. A number of MIME types that start with multipart/, though originating as part of the global email system, have come into use as a way to package binary data along with XML. All of the multipart formats break a message into smaller pieces, simply called parts. In multipart/related, the first part contains XML serialized just as in the application/xml serialization method. Subsequent parts contain binary resources that the user selected through <upload> form controls, which must be bound to instance data nodes of the XML Schema datatype anyURI.
For example, a simple form might capture an employee name as a string and a photo as an anyURI, like this:

    <xforms:input ref="name">
      <xforms:label>Name</xforms:label>
    </xforms:input>
    <xforms:upload ref="picture" mediatype="image/*">
      <xforms:label>Photo</xforms:label>
    </xforms:upload>

Serialized as multipart/related, the result would be:

    Content-Type: multipart/related; boundary=a42113842b; type=application/xml; start="<000000@dubinko.info>"
    Content-Length: 65232

    --a42113842b
    Content-Type: application/xml; charset=UTF-8
    Content-ID: <000000@dubinko.info>

    <?xml version="1.0"?>
    <root_element>
      <name>Cordova Cassanova</name>
      <picture>cid:000001@dubinko.info</picture>
    </root_element>
    --a42113842b
    Content-Type: image/jpg
    Content-Transfer-Encoding: binary
    Content-ID: <000001@dubinko.info>

    ...binary image data...
    --a42113842b--

Notice that the URI of the picture has been dereferenced, and the actual data now appears in the submitted data stream.
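How such a message is assembled can be sketched in a few lines of Python. This is a simplified illustration, not how any particular XForms processor works: the function names are hypothetical, the image bytes are a placeholder, and the sketch does not verify that the boundary string is absent from the part bodies, which a real implementation would have to ensure.

```python
def _part(boundary, headers, body):
    """Return one MIME part: delimiter line, headers, blank line, body."""
    head = "".join(f"{name}: {value}\r\n" for name, value in headers)
    return f"--{boundary}\r\n{head}\r\n".encode("ascii") + body + b"\r\n"


def multipart_related(xml_text, root_cid, attachments, boundary):
    """Build (Content-Type header value, body bytes) for multipart/related.

    attachments is a list of (content_id, media_type, raw_bytes) tuples.
    """
    # The first part is the XML, serialized as for application/xml.
    body = _part(boundary, [
        ("Content-Type", "application/xml; charset=UTF-8"),
        ("Content-ID", f"<{root_cid}>"),
    ], xml_text.encode("utf-8"))
    # Each uploaded resource follows as its own part, unescaped.
    for cid, media_type, data in attachments:
        body += _part(boundary, [
            ("Content-Type", media_type),
            ("Content-Transfer-Encoding", "binary"),
            ("Content-ID", f"<{cid}>"),
        ], data)
    body += f"--{boundary}--\r\n".encode("ascii")  # closing delimiter
    # The type value is quoted here because it contains a "/" character.
    content_type = (f"multipart/related; boundary={boundary}; "
                    f'type="application/xml"; start="<{root_cid}>"')
    return content_type, body


xml = ("<root_element><name>Cordova Cassanova</name>"
       "<picture>cid:000001@dubinko.info</picture></root_element>")
content_type, body = multipart_related(
    xml, "000000@dubinko.info",
    [("000001@dubinko.info", "image/jpg", b"\xff\xd8\xff\xe0")],  # placeholder bytes
    "a42113842b")
```

The cid: URI inside the XML part and the Content-ID header of the image part carry the same identifier, which is what lets the receiver match the two up.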
8.3.2.4 multipart/form-data

This serialization format, which is already widely deployed across the Web due to the <input type="file"> control in HTML forms, doesn't take any special advantage of XML. Following the same rules as application/x-www-form-urlencoded, every leaf node is treated as a separate part, each of which can have its own encoding. Since individual parts can include binary data without the overhead of escape characters, the overall data size can be much smaller than with other serialization formats. The technique this serialization format uses can lead to a proliferation of parts in a multipart stream, which can add overhead for larger bodies of instance data. For content that would otherwise be encoded as base64Binary or hexBinary, however, it can provide substantial savings.
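The leaf-node rule that this format shares with application/x-www-form-urlencoded can be sketched as follows, using the urlencoded output because it is the easier of the two to show. The helper names and the sample instance data are hypothetical, and Python's ElementTree stands in for the XPath data model, so treat this as an approximation rather than a conforming serializer.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus


def local_name(tag):
    """Strip ElementTree's '{namespace}' prefix, leaving the local name."""
    return tag.rsplit("}", 1)[-1]


def leaf_pairs(root):
    """Collect (local name, text) for leaf elements, in document order."""
    pairs = []
    for elem in root.iter():
        # A leaf here is an element with no child elements and some text.
        # Attributes and namespace information are deliberately dropped.
        if len(elem) == 0 and elem.text:
            pairs.append((local_name(elem.tag), elem.text))
    return pairs


def serialize_urlencoded(root, separator=";"):
    """Join urlencoded name=value pairs with the chosen separator."""
    return separator.join(
        f"{quote_plus(name)}={quote_plus(value)}"
        for name, value in leaf_pairs(root))


instance = ET.fromstring(
    '<order id="7"><customer><name>Ünited Stätes</name></customer>'
    "<qty>2</qty><note/></order>")
flat = serialize_urlencoded(instance)
# flat -> "name=%C3%9Cnited+St%C3%A4tes;qty=2"
```

The id attribute, the customer wrapper, and the empty note element all vanish from the output. For multipart/form-data, each pair returned by leaf_pairs would become a separate part instead of a name=value token.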