Packaging is about support for documents composed of possibly several entities that may include both structured and binary data. It may seem strange that, after all these years of hard work on XML-related specifications, there is still no final answer about the packaging mechanism, and new specifications keep appearing to address this subject specifically.

Probably the easiest way to package information isn't to package it at all. Even though an HTML page doesn't include its images, an application can still traverse the links to retrieve and present them, along with any additional information. Similarly, resources can be accessed over another communication channel and referenced from within the XML document using the XML Linking Language (XLink, http://www.w3.org/TR/xlink/) or a similar specification. However, this isn't always feasible for security, performance, or infrastructure reasons.

XML looks like an obvious choice for a packaging mechanism: it's simple, and it's extensible. Not surprisingly, however, XML has several problems, all of which stem from its textual nature. First, it has to be extended to allow true binary data. XML is defined as a character stream, so there is no way to embed raw binary data directly in an XML document; all binary data must be encoded as characters. Several encodings are possible; for example, the XML Schema specification defines Base64 and hexadecimal encodings (the base64Binary and hexBinary types) for including arbitrary data, but neither is very efficient.

Second, XML as a format is ill-suited for random access. Whitespace flexibility, entity expansion, and the expectation that a document is scanned serially from beginning to end all work against random access. It's suboptimal to have to receive and parse the entire document when only a fragment of it is needed for further processing. The XML Fragment Interchange specification (http://www.w3.org/TR/xml-fragment) was designed specifically to address this problem.
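To put a rough number on how inefficient the character encodings are, the following sketch uses the core MIME::Base64 module and Perl's unpack to encode the same 300 bytes both ways. Base64 turns every 3 bytes into 4 characters (about 33% overhead, before the line breaks encode_base64 adds by default), while hexadecimal doubles the size. The sample data is arbitrary and chosen only for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# 300 bytes of arbitrary binary data.
my $binary = join '', map { chr($_ % 256) } 0 .. 299;

# Base64: every 3 input bytes become 4 characters. The empty second argument
# turns off the newline that encode_base64 otherwise inserts every 76 characters.
my $b64 = encode_base64($binary, '');

# Hexadecimal: every input byte becomes 2 characters.
my $hex = unpack('H*', $binary);

printf "raw: %d bytes, Base64: %d chars (%.0f%%), hex: %d chars (%.0f%%)\n",
    length($binary),
    length($b64), 100 * length($b64) / length($binary),
    length($hex), 100 * length($hex) / length($binary);
```

This prints raw: 300 bytes, Base64: 400 chars (133%), hex: 600 chars (200%). Either way, the encoded data must still be scanned and decoded character by character along with the rest of the document.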
Third, the XML format isn't recursive. It's often difficult to embed an XML document within another XML document, even though "there is more than one way (to try) to do it." It's possible to:
Because XML isn't the most concise representation, a compressed format is the next option to consider. Compression can be applied at the transport level, and it may well be introduced in a way similar to XML Encryption (discussed later in this chapter) to allow selective compression of elements in an XML document, but it's highly unlikely to be used as a packaging format.

The Multipurpose Internet Mail Extensions (MIME, http://www.ietf.org/rfc/rfc2045.txt) format is well known and widely used. Even though it can combine groups of files or documents into one package for transmission, it isn't without flaws:
One new option is the Direct Internet Message Encapsulation (DIME, http://www.ietf.org/internet-drafts/draft-nielsen-dime-02.txt) specification, proposed by Microsoft jointly with IBM. DIME draws its strength from a short, focused list of requirements and offers a simple structure, performance benefits, and decentralized media-type definition (although it lacks the extensibility MIME provides).

12.2.1 MIME and SOAP with Attachments

The MIME specification defines a message container, called an entity, that can contain one or more other entities (an entity packed inside another entity is called a body part); each entity consists of an entity header and an entity body. Messages that have more than one part use multipart MIME types (such as Multipart/Related or Multipart/Alternative), which define the boundary that separates the contained body parts. The boundary is simply a unique text string that doesn't appear in any body part.

The SOAP Messages with Attachments (SwA, http://www.w3.org/TR/SOAP-attachments) specification describes a simple way of encapsulating entities related to SOAP messages using the MIME multipart mechanism. The rules are very simple:
The message structure produced according to these rules is illustrated in Example 12-1.

Example 12-1. A message created according to the SwA specification

    MIME-Version: 1.0
    Content-Type: Multipart/Related; boundary=MIME_boundary; type="text/xml"

    --MIME_boundary
    Content-Type: text/xml; charset=UTF-8
    Content-Transfer-Encoding: 8bit

    <?xml version='1.0' ?>
    <SOAP-ENV:Envelope
        xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
    <SOAP-ENV:Body>
        <image xmlns="http://whatever.com/" href="cid:image"/>
    </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>

    --MIME_boundary
    Content-Type: image/tiff
    Content-Transfer-Encoding: binary
    Content-ID: <image>

    ...binary TIFF image...
    --MIME_boundary--

There's only one thing to notice: the attached part includes a Content-ID header (<image> in this example), and an element in the SOAP envelope refers to that part with an href attribute (cid:image in this example; the cid scheme refers to a specific part of a message).

The SOAP::Lite module provides transparent access to data packaged this way: nothing special needs to be done on the client or server side to accept MIME-encoded messages, and when accessed through SOAP::Lite, the value of the image element in this example is the picture itself. However, the current version (0.55) doesn't help you create messages with attachments. The SOAP::MIME package (created by Byrne Reese and available from CPAN) extends the support for parsing MIME messages, providing direct access to the attachments, and also allows composing messages with attachments. Its interface is simple:

    use SOAP::Lite;
    use SOAP::MIME;
    use MIME::Entity;

    # Content-ID that links the envelope element to the attachment
    my $cid = "bar";

    # Build the attachment as a MIME entity
    my $ent = MIME::Entity->build(
        Type         => "image/gif",
        Encoding     => "base64",
        Path         => "image.gif",
        'Content-Id' => "<$cid>",
        Disposition  => "attachment",
    );

    # parts() attaches the entity to the outgoing request; send_image is the
    # remote method, and its argument points at the attachment via cid:
    my $som = SOAP::Lite
        ->uri("...")
        ->proxy("...")
        ->parts($ent)
        ->send_image(SOAP::Data->name("foo")->attr({href => "cid:$cid"}));

It's quite possible that this functionality will be included in a future version of the SOAP::Lite module.
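SOAP::MIME and MIME::Entity hide the wire format. To make the structure of Example 12-1 concrete, here is a core-Perl sketch (no CPAN modules) that assembles a Multipart/Related message by hand, choosing a boundary string that doesn't occur in any part, and then resolves the envelope's cid: reference back to the attachment through its Content-ID header. The helpers build_swa_message and parts_by_cid are hypothetical names used only for illustration; they are not part of SOAP::Lite or SOAP::MIME, and they skip much of what a real MIME implementation handles (header folding, transfer encodings, Content-Location references).

```perl
use strict;
use warnings;

# Hypothetical helper (not a SOAP::Lite/SOAP::MIME API): assemble a SwA-style
# Multipart/Related message. Each attachment is [content_id, type, bytes].
sub build_swa_message {
    my ($envelope, @attachments) = @_;

    # Choose a boundary string that occurs in none of the body parts.
    my $n = 0;
    my $boundary;
    do { $boundary = 'MIME_boundary_' . $n++ }
        while grep { index($_, $boundary) >= 0 } $envelope, map { $_->[2] } @attachments;

    my $crlf = "\r\n";
    my $msg  = "MIME-Version: 1.0$crlf"
             . qq{Content-Type: Multipart/Related; boundary=$boundary; type="text/xml"$crlf}
             . $crlf;

    # Root part: the SOAP envelope.
    $msg .= "--$boundary$crlf"
          . "Content-Type: text/xml; charset=UTF-8$crlf"
          . "Content-Transfer-Encoding: 8bit$crlf$crlf"
          . $envelope . $crlf;

    # One body part per attachment, labeled with its Content-ID.
    for my $att (@attachments) {
        my ($cid, $type, $bytes) = @$att;
        $msg .= "--$boundary$crlf"
              . "Content-Type: $type$crlf"
              . "Content-Transfer-Encoding: binary$crlf"
              . "Content-ID: <$cid>$crlf$crlf"
              . $bytes . $crlf;
    }
    return $msg . "--$boundary--$crlf";
}

# Hypothetical helper: split a message on its boundary and index parts by Content-ID.
sub parts_by_cid {
    my ($msg) = @_;
    my ($boundary) = $msg =~ /boundary=([^;\s]+)/ or die "no boundary parameter\n";
    my %parts;
    for my $part (split /\r\n--\Q$boundary\E(?:--)?\r\n/, $msg) {
        my ($head, $body) = split /\r\n\r\n/, $part, 2;
        next unless defined $body;                                  # top-level headers
        my ($cid) = $head =~ /^Content-ID:\s*<([^>]+)>/mi or next;  # e.g. the envelope
        $parts{$cid} = $body;
    }
    return \%parts;
}

my $envelope = <<'XML';
<?xml version='1.0' ?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
<SOAP-ENV:Body>
<image xmlns="http://whatever.com/" href="cid:image"/>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
XML

my $fake_image = join '', map { chr($_) } 0 .. 255;    # stand-in bytes, not a real TIFF
my $msg   = build_swa_message($envelope, [ 'image', 'image/tiff', $fake_image ]);
my $parts = parts_by_cid($msg);

# Follow the href="cid:..." reference from the envelope to the attachment.
my ($ref) = $envelope =~ /href="cid:([^"]+)"/;
my $data  = $parts->{$ref};
printf "cid:%s resolves to %d bytes\n", $ref, length($data);
```

Note that finding a part means scanning the whole message for the boundary string; nothing in the MIME headers says how long a part is.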
12.2.2 DIME and WS-Attachments

The structure of a DIME message is somewhat similar to that of a MIME message, although DIME uses different terminology. Parts are called records, and each consists of a header and a payload, which can be either a complete object or a chunk of an object. As in MIME, the headers carry metadata about the object encoded in the payload.

There are several important differences between the DIME and MIME formats. First, MIME identifies the structure of the content stored in the body by the type specified in a Content-Type header, whereas DIME can indicate the type of the payload in two ways: with a MIME media type, exactly as MIME does, or (optionally) with a URI.

Second, although DIME records are variable-length and aligned on 32-bit word boundaries, each record starts with a fixed-length binary header containing flags and the lengths of the four fields that follow: OPTIONS, TYPE, ID, and DATA. As a result, the length of every record is known up front, so a reader can skip over a record without parsing its payload, which makes random access easy to implement.

The DIME specification also defines a mechanism for creating chunked records, which lets the sender split an entity into multiple records to minimize memory requirements. A complete DIME or MIME message can be nested within a DIME (or MIME) message.

The confusingly named WS-Attachments specification can be found at:
Also from Microsoft and IBM, it defines a mechanism for encapsulating a SOAP message and zero or more attachments in a DIME message. The rules are pretty straightforward:
Because DIME is a binary format, it isn't easy to show an example message in print. Some tracing tools (e.g., YATT, http://www.pocketsoap.com/yatt/, written by Simon Fell) offer a hexadecimal view mode, which simplifies debugging. The DIME specification is still fairly new, and it remains to be seen what role DIME will play in the web-services infrastructure and how well other vendors will support it.
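One way to get a feel for the format without a tracing tool is to build a couple of records and look at the header bytes in hex. The following core-Perl sketch assumes the record layout of draft-nielsen-dime-02: a 12-byte header made of a 16-bit word holding the VERSION, MB (message begin), ME (message end), CF (chunk flag), and TYPE_T bits, then 16-bit OPTIONS, ID, and TYPE lengths and a 32-bit DATA length, followed by the OPTIONS, ID, TYPE, and DATA fields, each zero-padded to a 4-byte boundary. The field order, bit positions, the TYPE_T value, and the helper names dime_record, dime_index, pad4, and p4 are assumptions made for illustration, so check them against the draft before relying on them. The sketch also shows why random access is cheap: dime_index hops from record to record using only the lengths in each header.

```perl
use strict;
use warnings;

# ASSUMED layout (after draft-nielsen-dime-02; verify against the draft):
# 12-byte header = 16-bit word (VERSION:5 MB:1 ME:1 CF:1 TYPE_T:4 reserved:4),
# OPTIONS_LENGTH:16, ID_LENGTH:16, TYPE_LENGTH:16, DATA_LENGTH:32 (big-endian),
# then OPTIONS, ID, TYPE, DATA, each zero-padded to a multiple of 4 bytes.

sub p4   { my ($n) = @_; return $n + (4 - $n % 4) % 4 }    # length after padding
sub pad4 { my ($s) = @_; return $s . ("\0" x (p4(length($s)) - length($s))) }

# Hypothetical helper: one unchunked record with no OPTIONS; TYPE_T 0x01 is
# assumed to mean "TYPE holds a media type".
sub dime_record {
    my (%arg) = @_;                          # id, type, data, mb, me
    my $word = (1 << 11)                     # VERSION = 1
             | (($arg{mb} ? 1 : 0) << 10)    # MB: first record in the message
             | (($arg{me} ? 1 : 0) << 9)     # ME: last record in the message
             | (0x01 << 4);                  # TYPE_T; CF (bit 8) stays 0
    return pack('nnnnN', $word, 0, length($arg{id}), length($arg{type}), length($arg{data}))
         . pad4($arg{id}) . pad4($arg{type}) . pad4($arg{data});
}

# Hypothetical helper: walk the records using only header lengths; payloads
# are skipped rather than parsed.
sub dime_index {
    my ($msg) = @_;
    my ($pos, @records) = (0);
    while ($pos + 12 <= length($msg)) {
        my ($word, $olen, $ilen, $tlen, $dlen) = unpack('nnnnN', substr($msg, $pos, 12));
        my $off  = $pos + 12 + p4($olen);    # skip OPTIONS
        my $id   = substr($msg, $off, $ilen);
        $off    += p4($ilen);
        my $type = substr($msg, $off, $tlen);
        $off    += p4($tlen);
        push @records, { id => $id, type => $type, offset => $off, length => $dlen };
        $pos = $off + p4($dlen);             # jump straight to the next record
        last if ($word >> 9) & 1;            # ME flag: no more records
    }
    return @records;
}

my $envelope = '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"/>';
my $image    = join '', map { chr($_) } 0 .. 99;    # stand-in binary payload
my $msg = dime_record(id => 'soap',  type => 'text/xml',  data => $envelope, mb => 1)
        . dime_record(id => 'image', type => 'image/gif', data => $image,    me => 1);

# Hex view of the first 12-byte header, one 32-bit word per group.
print join(' ', unpack('(H8)*', substr($msg, 0, 12))), "\n";

for my $r (dime_index($msg)) {
    printf "%-6s %-10s %4d bytes at offset %d\n", @$r{qw(id type length offset)};
}

# Random access: pull one payload out by offset without touching the others.
my ($img)  = grep { $_->{id} eq 'image' } dime_index($msg);
my $bytes = substr($msg, $img->{offset}, $img->{length});
```

Unlike the MIME sketch, nothing here searches the payload bytes: the reader only ever looks at headers, which is the property the fixed-length header is designed to provide.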