Message Encodings


Over time, many of us have been conditioned to think of XML (and therefore SOAP) as structured text. After all, text is human readable, and every computing system can process text. The universal nature of text-based XML resonates with our desire to interoperate with connected systems. Text-encoded XML, while being easy to interpret, is inherently bulky. It is reasonable to expect some performance penalty when using XML. Just as it takes some effort to place a thank-you letter in an envelope, it takes some processing time to interact with XML. In some cases, however, the sheer size of text-encoded XML restricts its use, especially when we want to send an XML message over the wire.

Furthermore, if we restrict ourselves to text-encoded XML, how can we send binary data (like music or video) in an XML document? If you’ve read up on your standard XML Schema data types, you will know that two binary data types exist: xs:base64Binary and xs:hexBinary. Essentially, both of these data types represent data as an ordered set of octets. Using these XML data types might have solved the problem of embedding binary data in a document, but they have actually made the performance problem worse. Base64 encoding inflates data by roughly 33%, since every 3 octets are written as 4 characters. The story is worse for xs:hexBinary, which writes every octet as 2 characters and so doubles the size of the data. Both of these factors assume an underlying text encoding of UTF-8. The encoded sizes double again if UTF-16 is the underlying text encoding.
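The arithmetic behind these factors can be checked with a short Python sketch. The function name and the 3,000-byte sample payload are illustrative assumptions, not part of any WCF API.

```python
import base64
import binascii


def encoded_sizes(payload: bytes) -> dict:
    """Return the byte counts for several ways of carrying payload as text."""
    b64_text = base64.b64encode(payload).decode("ascii")
    hex_text = binascii.hexlify(payload).decode("ascii")
    return {
        "raw": len(payload),
        "base64/utf-8": len(b64_text.encode("utf-8")),
        "hex/utf-8": len(hex_text.encode("utf-8")),
        # "utf-16-le" is used so that no byte-order mark is counted.
        "base64/utf-16": len(b64_text.encode("utf-16-le")),
        "hex/utf-16": len(hex_text.encode("utf-16-le")),
    }


# Hypothetical 3,000-byte payload (all zero bytes; the content does not matter).
sizes = encoded_sizes(bytes(3000))
for name, size in sizes.items():
    print(f"{name:>14}: {size} bytes")
# raw 3000, base64/utf-8 4000, hex/utf-8 6000,
# base64/utf-16 8000, hex/utf-16 12000
```

For payloads whose length is not a multiple of 3, base64 padding adds up to 2 further characters, so the 4/3 ratio is a lower bound.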

The XML Infoset

To find the answer to our performance dilemma, let’s take a closer look at exactly what makes up an XML document. If we look at the specifications, XML is a precise syntax for writing structured data (as defined at http://www.w3.org/TR/REC-xml/). It demands that well-formed XML documents have matching start and end tags, a single root element, and so on. Oddly enough, after the XML specification was released, a need arose to abstractly define XML documents. The XML Infoset (as defined at http://www.w3.org/TR/xml-infoset/) provides this abstract definition.

In practice, the XML Infoset defines the relationship between items, without defining any specific syntax. This lack of a specific syntax in the XML Infoset leaves the door open for new, more efficient encodings. If our parser adheres to the XML Infoset, as opposed to the XML syntax, we can interpret a variety of different message encodings, including ones more efficient than text, without materially altering our application.
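As a rough illustration of the difference between syntax and Infoset, the following Python sketch parses two documents that differ only in syntax (attribute order, quote style, and empty-element form) and shows that they reduce to the same items. The helper name infoset_view and the sample documents are assumptions made for this example, and the tuple it builds is a simplification of the Infoset, not a complete model of it.

```python
import xml.etree.ElementTree as ET


def infoset_view(element):
    """Reduce an element to its name, attributes, character data, and children."""
    return (
        element.tag,
        # Attributes form an unordered set, so sort them before comparing.
        tuple(sorted(element.attrib.items())),
        element.text or "",
        tuple((infoset_view(child), child.tail or "") for child in element),
    )


# Same information, different syntax.
doc_a = "<order id=\"7\" status='open'><note/></order>"
doc_b = '<order status="open" id="7"><note></note></order>'
# Different information: the status attribute has another value.
doc_c = '<order status="closed" id="7"><note></note></order>'

same = infoset_view(ET.fromstring(doc_a)) == infoset_view(ET.fromstring(doc_b))
different = infoset_view(ET.fromstring(doc_a)) != infoset_view(ET.fromstring(doc_c))
print(same, different)
```

An application written against a view like this does not care which syntax, or which encoding, produced the items it receives.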

SOAP and the XML Infoset

Remember that SOAP is built on XML. This raises a question: Are SOAP messages built on the earlier XML syntax or on the XML Infoset? The answer is both. Two SOAP specifications exist: SOAP 1.1 and SOAP 1.2. SOAP 1.1 is built on the older XML syntax, while SOAP 1.2 is built on the XML Infoset. Given this fact, it is reasonable to assume that a SOAP 1.2 message might not be readable by a SOAP 1.1 parser. WCF is built on the XML Infoset, but it has the capability to process both SOAP 1.1 and SOAP 1.2 messages.
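One practical way to tell the two versions apart is the namespace of the Envelope element, which differs between SOAP 1.1 and SOAP 1.2. The sketch below is a minimal Python illustration of that check; the function name is an assumption, and this is not how WCF itself selects a message version.

```python
import xml.etree.ElementTree as ET

# Envelope namespaces defined by the two SOAP specifications.
SOAP_ENVELOPE_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "SOAP 1.1",
    "http://www.w3.org/2003/05/soap-envelope": "SOAP 1.2",
}


def soap_version(message_xml: str) -> str:
    """Classify a message by the namespace of its root Envelope element."""
    root = ET.fromstring(message_xml)
    # ElementTree writes qualified names as "{namespace}local".
    if not root.tag.startswith("{"):
        return "not SOAP"
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "Envelope":
        return "not SOAP"
    return SOAP_ENVELOPE_NAMESPACES.get(namespace, "not SOAP")


soap12 = '<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"><s:Body/></s:Envelope>'
soap11 = '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><s:Body/></s:Envelope>'
print(soap_version(soap12), "/", soap_version(soap11))
```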

WCF can be adapted and customized to work with virtually any message encoding, as long as the message is SOAP 1.1 or 1.2 compliant (it can also work with messages that are not SOAP messages). As you will see in subsequent chapters, WCF has a very pluggable and composable architecture, so custom encoders can be easily added to the WCF message pipeline. As new encodings are developed and implemented, either Microsoft or third parties can create these new encoders and plug them into the appropriate messaging stack. I will describe message encoders in greater detail in Chapter 6, “Channels.” For now, let’s take a look at the encoders included in WCF. At the time of this writing, WCF ships with three encoders: text, binary, and Message Transmission Optimization Mechanism (MTOM).

The Text Encoder

As you can guess from its name, the output of the text encoder is text-encoded messages. Every system that understands Unicode text will be able to read and process messages that have been passed through this encoder, making it a great choice when interoperating with non-WCF systems. Binary data can be included in text-encoded messages via the xs:base64Binary XML Schema Definition (XSD) data type. Here is a message that has been encoded by using the WCF text encoder (with some elements removed for clarity):

   <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
     <s:Header>...</s:Header>
     <s:Body>
       <SubmitOrder xmlns="http://wintellect.com/OrderProcess">
         <Order xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
           <OrderByte xmlns="http://wintellect.com/Order">
             mktjxwyxKr/9oW/jO48IhUwrZvNOdyuuquZEAIcy08aa+HXkT3dNmvE/
             +zI96Q91a9Zb17HtrCIgtBwmbSk4ys2pSEMaIzXV3cwCD3z4ccDWzpWx1/
             wUrEtSxJtaJi3HBzBlk6DMW0eghvnl652lKEJcUJ6Uh/LRlZz3x1+aereeOgdLkt4gCnNOEFECL8CtrJtY/taPM4A+k/
             4E1JPnBgtCRrGWWpVkO0UqRXahz2XbShrDQnzgDwaHDf/
             fHDXfZgpFwOgPF1IG88KQZO0JncSYKIp5I8OPYTeqD0yVhB8QSt9sWw59yzLHvU65UKoYfXA7RvOqZkJGtV6wZAgGcA==
           </OrderByte>
           <OrderNumber xmlns="http://wintellect.com/Order">
             12345
           </OrderNumber>
         </Order>
       </SubmitOrder>
     </s:Body>
   </s:Envelope>
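To make the role of xs:base64Binary concrete, here is a small Python sketch that places arbitrary bytes in an OrderByte element as base64 text and recovers them on the receiving side. The element names and namespace are taken from the message above; the helper functions and the sample payload are assumptions for illustration and have nothing to do with the WCF encoder's implementation.

```python
import base64
import xml.etree.ElementTree as ET

ORDER_NS = "http://wintellect.com/Order"


def build_order(payload: bytes, order_number: int) -> str:
    """Serialize an order whose binary payload travels as base64 text."""
    order = ET.Element("Order")
    order_byte = ET.SubElement(order, f"{{{ORDER_NS}}}OrderByte")
    order_byte.text = base64.b64encode(payload).decode("ascii")
    number = ET.SubElement(order, f"{{{ORDER_NS}}}OrderNumber")
    number.text = str(order_number)
    return ET.tostring(order, encoding="unicode")


def read_payload(order_xml: str) -> bytes:
    """Recover the original bytes from the OrderByte element."""
    order = ET.fromstring(order_xml)
    return base64.b64decode(order.find(f"{{{ORDER_NS}}}OrderByte").text)


# Hypothetical payload containing bytes that are not legal XML characters.
original = bytes([0x00, 0xFF, 0x10, 0x80])
order_xml = build_order(original, 12345)
round_tripped = read_payload(order_xml)
print(order_xml)
```

Because the payload is plain text by the time it reaches the wire, any receiver that can read the document can also read the binary data.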

The Binary Encoder

The binary encoder is the highest-performing message encoder and is intended for WCF-to-WCF communication only. Of all the encoders in WCF, the binary encoder produces the smallest messages. Keep in mind that this encoder produces a serialized Infoset, even though it is in a binary format. It is likely that in the future, a standard binary encoding will be universally adopted, as these types of encodings can dramatically improve the efficiency of a messaging application.

The MTOM Encoder

The MTOM encoder creates messages that are encoded according to the rules stated in the MTOM specification. (The MTOM specification is available at http://www.w3.org/TR/soap12-mtom/.) Because the MTOM encoding is governed by a specification, other vendors are free to create infrastructures that send and receive MTOM messages. As a result, WCF messages that pass through the MTOM encoder can be sent to non-WCF applications (as long as those applications understand MTOM). In general, MTOM is intended to allow efficient transmission of messages that contain binary data, while also providing a mechanism for applying digital signatures. The MTOM message encoding enables these features through the use of Multipurpose Internet Mail Extensions (MIME) message parts and inline base64 encoding. The content of the MTOM message is defined by the XML-binary Optimized Packaging (XOP) recommendation. For more information, see http://www.w3.org/TR/xop10/.

At run time, the MTOM encoder creates an inline base64-encoded representation of the binary data for digital signature computation and makes the raw binary data available for packaging alongside the SOAP message. An MTOM encoded message looks as follows:

 // start of a boundary in the multipart message
 --uuid:+id=1
 Content-ID: <http://wintellect.com/0>
 Content-Transfer-Encoding: 8bit
 // set the content type to xop+xml
 Content-Type: application/xop+xml;charset=utf8; type="application/soap+xml"
 <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
   <s:Header>...</s:Header>
   <s:Body>
     <SubmitOrder xmlns="http://wintellect.com/OrderProcess">
       <order xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
         <OrderByte xmlns="http://wintellect.com/Order">
           // add a reference to another message part
           <xop:Include href="cid:http://wintellect.com/1/12345"
            xmlns:xop="http://www.w3.org/2004/08/xop/include"/>
         </OrderByte>
         <OrderNumber xmlns="http://wintellect.com/Order">
           12345
         </OrderNumber>
       </order>
     </SubmitOrder>
   </s:Body>
 </s:Envelope>
 // end of the boundary in the first message part
 --uuid:+id=1
 // add the binary data as an octet stream
 Content-ID: <http://wintellect.com/1/12345>
 Content-Transfer-Encoding: binary
 Content-Type: application/octet-stream
 // raw binary data here

Notice that the binary data is kept in its raw format in another part of the message and referenced from the SOAP body. Since the binary data is packaged in a message part that is external to the SOAP message, how can one apply a digital signature to the SOAP message? If we use an XML-based security mechanism, like those stated in XML Encryption and XML Digital Signature, we cannot reference external binary streams. These encryption and signing mechanisms demand that the protected data be wrapped in a SOAP message. At first glance, it appears that there is no way around this problem with multipart messages. In fact, this was the Achilles’ heel of Direct Internet Message Encapsulation (DIME) and SOAP with Attachments. MTOM provides an interesting way around this problem.

The MTOM encoding specification states that an MTOM message can contain inline binary data in the form of base64-encoded strings or as binary streams in additional message parts. It also states that a base64-encoded representation of any binary data must be available during processing. In other words, additional binary message parts can be created for message transmission, but inline base64 data must be temporarily available for operations like applying digital signatures. While the message is in this temporary inline base64-encoded state, an XML-based security mechanism can be applied to the SOAP message. After the security mechanism has been applied, the message can then be serialized as a multipart message. When the receiver receives the message, the message can be validated according to the rules set forth by the specific XML security mechanism.
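The temporary inline state described above can be sketched in Python as a function that replaces each xop:Include reference with the base64 text of the part it points to. This is only an illustration of the idea: the function name and the parts dictionary are assumptions, real content IDs may be URL-escaped, and the WCF MTOM encoder does not expose anything like this.

```python
import base64
import xml.etree.ElementTree as ET

XOP_INCLUDE = "{http://www.w3.org/2004/08/xop/include}Include"


def inline_binary_parts(envelope_xml: str, parts: dict) -> ET.Element:
    """Return the envelope with every xop:Include replaced by inline base64.

    parts maps a content ID (without the "cid:" prefix) to the raw bytes
    carried in the corresponding MIME part.
    """
    root = ET.fromstring(envelope_xml)
    # Snapshot the elements first, because the tree is modified in the loop.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != XOP_INCLUDE:
                continue
            href = child.get("href", "")
            if not href.startswith("cid:"):
                raise ValueError(f"unsupported xop:Include href: {href!r}")
            raw = parts[href[len("cid:"):]]
            parent.remove(child)
            parent.text = base64.b64encode(raw).decode("ascii")
    return root


# Hypothetical, heavily trimmed version of the message shown earlier.
envelope = (
    '<Body><OrderByte xmlns="http://wintellect.com/Order">'
    '<xop:Include href="cid:http://wintellect.com/1/12345" '
    'xmlns:xop="http://www.w3.org/2004/08/xop/include"/>'
    "</OrderByte></Body>"
)
inlined = inline_binary_parts(
    envelope, {"http://wintellect.com/1/12345": b"\x00\x01\x02"}
)
order_byte = inlined.find("{http://wintellect.com/Order}OrderByte")
print(order_byte.text)  # AAEC
```

A signature computed over the inlined form covers the binary data, even though the data later travels in a separate part.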

It is also interesting to note that the WCF MTOM encoder reserves the right to serialize the binary chunks of a message as either inline base64-encoded strings or as binary streams in additional message parts. The WCF encoder uses the size of the binary data as a key determining factor. In our previous message, the OrderByte element was about 800 KB. If we reduce the size of the OrderByte element to 128 bytes and check the message format, we see the following:

 // start of a boundary in the multipart message
 --uuid:+id=1
 Content-ID: <http://wintellect.com/0>
 Content-Transfer-Encoding: 8bit
 // set the content type to xop+xml
 Content-Type: application/xop+xml;charset=utf8; type="application/soap+xml"
 <s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">
   <s:Header>...</s:Header>
   <s:Body>
     <SubmitOrder xmlns="http://wintellect.com/OrderProcess">
       <order xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
         <OrderByte xmlns="http://wintellect.com/Order">
           kF+k2CQd/lCitSYvXnLhuOtaMCk/tZaFZIWeW7keC3YvgstAWoht/wiOiR5+HZPo+TzYoH+qE9vJHnSefqKXg6mw/
           9ymoV1i7TEhsCt3BkfytmF9Rmv3hW7wdjsUzoBl9gZ1zR62QVjedbJNiWKvUhgtq8hAGjw+uXlttSohTh6xu7kkAjgoO
           3QJntG4qfwMQCQj5iO4JdzJNhSkSYwtvCaTnM2oi0/fBHBUN3trhRB9YXQG/mj7+ZbdWsskg/
           Lo2+GrJAwuY7XUROKyY+5hXrAEJ+cXJr6+mKM3yzCDu4B9bFuZv2ADTv6/MbmFSJWnfPwbH1wK0LQi7Ixo95iF
         </OrderByte>
         <OrderNumber xmlns="http://wintellect.com/Order">
           12345
         </OrderNumber>
       </order>
     </SubmitOrder>
   </s:Body>
 </s:Envelope>
 --uuid:+id=1-

In this case, the WCF encoder opted to serialize the binary element as an inline base64-encoded string. This optimization is perfectly legal according to the MTOM specification.
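Why size matters can be seen with a rough cost comparison in Python. A separate MIME part avoids the base64 inflation but pays a fixed price for its boundary line, headers, and the xop:Include element. The 300-byte overhead used here is an assumed figure chosen for illustration, and the rule shown is not the threshold WCF actually applies.

```python
def inline_base64_cost(size: int) -> int:
    """Bytes on the wire when size raw bytes are written as base64 text."""
    return 4 * ((size + 2) // 3)


def separate_part_cost(size: int, part_overhead: int) -> int:
    """Bytes on the wire when the raw bytes travel in their own MIME part."""
    return size + part_overhead


def choose_packaging(size: int, part_overhead: int = 300) -> str:
    """Pick the cheaper packaging; part_overhead is an assumed constant."""
    if inline_base64_cost(size) <= separate_part_cost(size, part_overhead):
        return "inline base64"
    return "separate MIME part"


small = choose_packaging(128)          # the 128-byte case from the text
large = choose_packaging(800 * 1024)   # the roughly 800 KB case from the text
print(small, "/", large)
```

With these assumptions the 128-byte payload costs 172 bytes inline against 428 bytes as a separate part, while the 800 KB payload is far cheaper as a separate part.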

Choosing the Right Encoding

Choosing the correct message encoding forces one to consider current and future uses of the message. For the most part, application interoperability and the type of data in the message will dictate your choice of message encodings. Performance, however, can also play a role in determining which encoding is best suited to your system. Table 2-1 ranks encodings based on what type of message is being sent and what sorts of systems can receive the message.

Table 2-1: Message Encodings by Rank and Scenario

Type of Message                                               Binary   Text   MTOM
Text payload, Interop with other WCF systems only             1        2      3
Text payload, Interop with modern non-WCF systems             N/A      1      2
Text payload, Interop with older non-WCF systems              N/A      1      N/A
Large binary payload, Interop with other WCF systems only     1        3      2
Large binary payload, Interop with modern non-WCF systems     N/A      2      1
Large binary payload, Interop with older non-WCF systems      N/A      1      N/A
Small binary payload, Interop with other WCF systems only     1        2      3
Small binary payload, Interop with modern non-WCF systems     N/A      1      2
Small binary payload, Interop with older non-WCF systems      N/A      1      N/A
It shouldn’t be surprising that the binary encoding is the most efficient means to send messages to other WCF systems. What may come as a surprise, however, is the fact that MTOM messages can be less efficient, in an end-to-end sense, than text messages. Interoperability and the size of the binary data being sent are the two factors that should help you decide between MTOM and text encodings in your application. For the most part, one can send MTOM messages only to systems that implement an MTOM encoder. At the time of this writing, MTOM is a fairly new specification, so only modern systems can effectively process MTOM messages. From a performance perspective, the MTOM encoder makes sense only when the binary data being wrapped in a message is fairly large. MTOM should never be used with messages that do not contain binary data because MTOM’s performance will always be worse than the regular text encoding. It is important, however, to run independent tests using messages that accurately represent those in production.

Luckily, as we’ll see in Chapter 4, “WCF 101,” WCF is designed in such a way that these encoding choices do not require a major change in the application. In fact, it is possible to have one service that can interact with different message encodings. For example, one service can interact with both binary-encoded and text-encoded messages. The benefit in this scenario is that the service can perform very well when communicating with other WCF participants and still interoperate with other platforms, like Java.




Inside Windows Communication Foundation
Inside Windows Communication Foundation (Pro Developer)
ISBN: 0735623066
Year: 2007
Pages: 106
Authors: Justin Smith
