E-Mail and MIME

Extending e-mail beyond simple text-based messaging.

In recent tutorials I've covered e-mail and how you can send and receive messages over the Internet. In this tutorial, I'd like to talk about MIME (Multipurpose Internet Mail Extensions) and how it allows you to use the Internet e-mail system to send more than just a simple, text-based message.

The ASCII's The Limit

Initially, the Internet e-mail system was limited to simple text messages because SMTP, the protocol used to transport mail across the Internet, could carry only 7-bit ASCII text. In the United States, the main ASCII standard used for e-mail is US-ASCII. This version of ASCII offers only a basic set of characters (128 in all), each represented as a 7-bit binary number. The set was designed to cover the English alphabet, including both uppercase and lowercase letters, and the numbers 0 through 9, as well as punctuation marks and control characters.

However, our use of data, both at the workplace and at home, has moved beyond simple text. For example, with basic word processing programs, users can augment text with italics, boldface, bullets, and other types of enriched formatting. You can even embed graphics in documents. But because SMTP was developed to handle only basic text messages in a 7-bit format (as laid out by RFC 822, which defines the standard for Internet text messaging), these more sophisticated data formats can't be sent via e-mail.

The problem with US-ASCII lies with messages that contain information that doesn't fit into the 128-character set. For example, US-ASCII can't accommodate "rich text" characters such as an italicized or boldfaced "a". Moreover, although the Internet is clearly an international forum, foreign language character sets aren't represented in US-ASCII.

The 7-bit format required by SMTP also prevents users from sending other types of data via e-mail, such as the 8-bit binary data found in many executable files and in files created by applications such as Microsoft Word.

This limitation looms large when you realize that the Internet e-mail system is perfectly situated to act as a data delivery vehicle. Business and personal e-mail accounts are now widespread, meaning users have a pipeline to each other from desktop to desktop. If e-mail could handle diverse data (for example, word processing documents and image files), users could share all sorts of data without having to ship disks or make any actual real-time network connections to copy or download files.
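To make the 7-bit restriction concrete, here is a minimal Python sketch. The helper name is_seven_bit and the sample data are assumptions made for illustration; they are not part of SMTP or any mail library. The check simply asks whether every byte has a value below 128.

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (value below 128)."""
    return all(byte < 0x80 for byte in data)


# Illustrative samples (hypothetical data, chosen only for this sketch).
plain = "Hello, world".encode("ascii")        # ordinary US-ASCII text
accented = "caf\u00e9".encode("latin-1")      # the e-acute becomes the byte 0xE9
binary = bytes([0x4D, 0x5A, 0x90, 0x00])      # arbitrary bytes, as in a program file

print(is_seven_bit(plain))     # True
print(is_seven_bit(accented))  # False
print(is_seven_bit(binary))    # False
```

Only the first sample could travel over a strictly 7-bit transport unchanged; the other two contain bytes with the high bit set.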

Enter MIME

The developers of MIME found a clever way to work around the limitation: MIME packages different data types into a 7-bit ASCII format. That way, every e-mail message, regardless of the data it contains, appears as a standard e-mail message to the Internet's SMTP servers. The beauty of the solution lies in the fact that SMTP didn't have to change to handle such data. In other words, the solution didn't require that all Internet mail servers be upgraded to a new version of SMTP (which would have been an extraordinary task considering all the Internet mail relays in existence). In effect, the transport system remained untouched.

Although the solution sounds simple, it took developers tremendous foresight to define the different data types MIME can handle. It also took a good amount of technical sleight of hand to standardize the ways in which these data types can be packaged as ASCII and still be readable in their original format once the packages have been unpacked. The solution is a work in progress; new data types, which the Internet Assigned Numbers Authority (IANA) must judge for inclusion in the MIME standard, are constantly emerging.

As defined in RFC 2045, MIME provides three main enhancements to standard e-mail. First, with MIME, e-mail can contain text that goes beyond basic US-ASCII, including various keystrokes such as different line and page breaks, foreign language characters, and enriched text. Second, users can attach different types of data to their e-mail, including such files as executables, spreadsheets, audio, and images. And third, users can create a single e-mail message that contains multiple parts, and each part can be in a different data format. For example, you could compose a single e-mail message that consists of a plain text message, an image file, and a binary-based document, such as a Word file.

A Package Deal

RFC 2046 defines seven types of e-mail content that MIME can package and pass across the Internet: text, image, audio, video, application, message, and multipart. Each of these data types can come in a few different formats, or subtypes. (MIME also augments types and subtypes with certain parameters, which are specified in RFC 2045. However, these parameters contain too much detail to cover in this overview of MIME.)

The text type supports messages carrying text. Within the text type, MIME supports the plain subtype, which is usually standard 7-bit ASCII, and the rich text subtype, which allows for some simple formatting features, such as page breaks.

The image type supports image files, and its subtypes include GIF (Graphics Interchange Format) and JPEG (a compressed image format developed by the Joint Photographic Experts Group). In the words of RFC 2046, the video type supports "time-varying picture images," and for now, MPEG (a compressed video format developed by the Moving Picture Experts Group) is its only subtype. The audio type supports audio data, and its only subtype is basic.

According to the RFC, there is no one ideal audio format in use today. So, the developers of MIME tried to define a subtype that would be the lowest common denominator. The basic subtype for audio signifies "single channel audio encoded using 8-bit ISDN mu-law" at a sampling rate of 8 kHz.

The application type supports two types of data: data that's meant to be processed by an application, and data that doesn't fall into any of the other categories. For now, it supports the octet-stream subtype, which means the message can carry arbitrary binary data. Also, it supports the postscript subtype, meaning the message can be sent to print as a PostScript file.

A couple of additional notes on the application type: If a mail agent receives a message whose content subtype it doesn't recognize, by default it will attempt to pass the message on as an application type message with a subtype of octet-stream (or, application/octet-stream). Also, in the future you can expect the list of application subtypes to grow, as specific programs are accepted by the IANA for inclusion in MIME. For example, you could see an application/access or application/quark designation as these types of files are acknowledged by MIME.
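The idea of falling back to application/octet-stream can be sketched in Python with the standard mimetypes module. The function name content_type_for and the file names are hypothetical, and this sketch applies the fallback when labeling a file for sending, which is an assumption about how a mail agent might use the same convention rather than a description of any particular agent.

```python
import mimetypes


def content_type_for(filename: str) -> str:
    """Guess a type/subtype label from a file name.

    Falls back to application/octet-stream (arbitrary binary data)
    when the extension is not recognized.
    """
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


# Hypothetical file names, used only for illustration.
for name in ("photo.gif", "scan.jpeg", "notes.zzunknown"):
    print(name, "->", content_type_for(name))
```

The first two names map to image/gif and image/jpeg; the last one has no known extension, so it is labeled application/octet-stream.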

The remaining two content types allow for special handling of an e-mail message. For instance, the message type allows an e-mail to contain an encapsulated message (the rfc822 subtype). The external-body subtype allows an e-mail to indicate an external location where the intended body of the message resides. That way, the user can choose whether or not to retrieve the message body. The message type also allows MIME to send a large e-mail message as several small ones (the subtype for this is partial). The receiving MIME-enabled mail agent can then open the smaller e-mail messages and reassemble them into the original long version.

Finally, the multipart type allows an e-mail message to contain more than one body of data. The mixed subtype allows users to mix different data formats into one e-mail message. The alternative subtype allows a message to contain different versions of the same data, each version in a different format. MIME mail agents can then select the version that works best with the local computing environment. The digest subtype allows users to send a collection of messages in one e-mail, such as the kind used with Internet mailing lists sent in digest form. The parallel subtype also allows mixed body parts, but the ordering of the body parts is not important. (For a quick summary of MIME content types and subtypes, refer to Table 1.)
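As a rough illustration of the alternative subtype, the following Python sketch uses the standard library's email package to carry the same sentence in two formats. The message text is invented for the example, and the HTML version is an assumption chosen because the library supports it directly; it is not one of the subtypes listed in this tutorial.

```python
from email.message import EmailMessage

msg = EmailMessage()

# First version of the content: plain text.
msg.set_content("I guarantee that you'll love it!")

# Second version of the same content, in a richer format.
# Adding it converts the message to multipart/alternative.
msg.add_alternative(
    "<p>I <b>guarantee</b> that you'll <b>love</b> it!</p>",
    subtype="html",
)

print(msg.get_content_type())
for part in msg.iter_parts():
    print(" ", part.get_content_type())
```

A receiving agent that cannot display the richer version can fall back to the plain text part, which is the point of the alternative subtype.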

Table 1: MIME Content Types and Their Subtypes

| Content type | Subtypes |
| --- | --- |
| Text | Plain text, rich text |
| Image | GIF, JPEG |
| Audio | Basic |
| Video | MPEG |
| Application | Octet-stream, PostScript |
| Message | RFC822, partial, external-body |
| Multipart | Mixed, alternative, digest, parallel |

In addition, some companies and organizations are experimenting with their own content types. MIME accommodates this, allowing e-mail to carry these types of data, as long as the content type name in the message's headers begins with x-. The x- prefix tells the receiving MIME mail agent that the content type is experimental (more on MIME headers coming up). If the prefix isn't present, the receiving agent may try to convert the data according to some other convention, such as the application/octet-stream designation, an act that could turn the message into an unreadable mess. This is another fine example of the foresight that went into the development of MIME.

Round Peg, Square Hole

To package the different data formats into the 7-bit ASCII format, MIME uses five different encoding schemes: 7bit, 8bit, binary, quoted-printable, and base64. But, as you'll soon see, only the quoted-printable and base64 schemes actually encode data. The 7bit, 8bit, and binary schemes merely indicate what format the data is in; they leave it to individual mail systems to select an encoding process based on the data format.

Because MIME-enabled mail agents perform the encoding automatically, you don't need a comprehensive understanding of how the schemes work. So, I'll give just a brief review of what they can do and in which situations they are used.

The 7bit scheme tells mail agents that the message contents are in plain ASCII. For this reason, no encoding is necessary as all mail systems should support ASCII. The 8bit scheme indicates that the contents contain 8-bit characters. It's up to the mail agents to encode this information using their preferred means, if they have any. Because not all mail agents use the same encoding method for 8-bit characters, there's a good chance the 8-bit characters won't appear correctly when the e-mail is opened. For this reason, the 8bit scheme currently isn't a reliable encoding scheme. The binary scheme, because it is similar to the 8bit scheme, shares the same problem.

The quoted-printable scheme is used for text that contains a mixture of 7-bit and 8-bit characters. Essentially, it allows 7-bit characters to go unencoded and converts each 8-bit character into a set of three 7-bit characters. As a result, mail servers and mail agents see an e-mail containing only 7-bit characters.
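The effect of quoted-printable can be seen with Python's standard quopri module. This is only a sketch: the sample phrase is invented, and Latin-1 is assumed as the 8-bit character set so that the accented letter occupies a single byte.

```python
import quopri

# Hypothetical sample: one 8-bit character (e-acute, byte 0xE9) among 7-bit ones.
raw = "caf\u00e9 au lait".encode("latin-1")

encoded = quopri.encodestring(raw)
print(encoded)

# Every byte of the encoded form is 7-bit, and decoding restores the original.
print(all(byte < 0x80 for byte in encoded))
print(quopri.decodestring(encoded) == raw)
```

In the encoded output, the 7-bit letters pass through untouched, while the single 8-bit byte is replaced by the three characters =E9.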

The base64 scheme is used for data that isn't text, such as data that constitutes an executable file. It works by breaking the data down into sets of three octets, each set containing 24 bits. Then, it converts each set of 24 bits into a four-character sequence. (In other words, every six bits of data is represented by a character.) The characters used in the sequencing come from a set of 65 characters (64 that carry data, plus = for padding), all of which can be found in any version of ASCII. Because it's in ASCII, data encoded by base64 should be readable by any mail server or mail agent.
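The three-octets-to-four-characters step can be worked through by hand and checked against Python's standard base64 module. The helper encode_triplet and the sample bytes are assumptions made for this sketch; a real encoder also handles input whose length isn't a multiple of three, which is where the = padding character comes in.

```python
import base64

# The 64 data-carrying characters, indexed 0 through 63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_triplet(three: bytes) -> str:
    """Turn exactly three octets (24 bits) into four base64 characters."""
    if len(three) != 3:
        raise ValueError("expected exactly three octets")
    bits = int.from_bytes(three, "big")  # one 24-bit number
    # Slice the 24 bits into four 6-bit groups, most significant first.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))


chunk = bytes([0x4D, 0x5A, 0x90])  # arbitrary sample octets

print(encode_triplet(chunk))                    # TVqQ
print(base64.b64encode(chunk).decode("ascii"))  # TVqQ
print(base64.b64encode(b"M").decode("ascii"))   # TQ== (one octet, padded)
```

The hand-rolled helper and the library agree on the sample, and the last line shows the padding that appears when fewer than three octets remain.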

Heads Up Thinking

Using MIME is simple. If your mail system supports MIME, it chooses the data type and encoding scheme (the packaging, as it were) for the user, depending on the contents of the user's e-mail and what file or files are attached to it. Once it does so, it adds MIME headers to the traditional SMTP headers found at the top of e-mail messages. (In the case of multipart MIME messages, headers will also appear in the body of the message.) These headers tell receiving mail agents that they've received a MIME message and indicate how the mail agents should handle the message.

The main headers are MIME-Version, Content-Type, and Content-Transfer-Encoding. The last two headers refer to the data type found in the message and the scheme used to encode the data, respectively. Some other headers are Content-Description, which lets you type in a description of the message (much like SMTP's Subject header), and Content-ID, which is akin to SMTP's Message-ID.
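To see these headers being generated, here is a hedged Python sketch using the standard library's email package. The addresses and subject are borrowed from this tutorial's sample message; the body text, attachment bytes, and file name are placeholders invented for illustration. The library, not the user, picks the content types and transfer encodings.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "joe_luthier@plucknplay.com"
msg["To"] = "lchae@mfi.com"
msg["Subject"] = "Info on Gibson guitar"

# A short plain text body; the library labels it and picks an encoding.
msg.set_content("I've enclosed a spec sheet on the guitar.")

# Placeholder bytes standing in for a real document (hypothetical data).
msg.add_attachment(
    b"placeholder bytes standing in for a Word file",
    maintype="application",
    subtype="octet-stream",
    filename="spec-sheet.doc",
)

print("MIME-Version:", msg["MIME-Version"])
print("Content-Type:", msg.get_content_type())
for part in msg.iter_parts():
    print(" ", part.get_content_type(), "|", part["Content-Transfer-Encoding"])
```

Adding the attachment turns the message into multipart/mixed; the text part is marked 7bit because it is short, plain ASCII, and the binary part is marked base64.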

Listing 1 shows a sample MIME message. The e-mail message shown is a multipart message in encoded form. As you can see, the top-level headers explain that the message data is of the type multipart and its subtype is mixed (multipart/mixed). It also contains a boundary parameter. This tells the receiving mail agent where each part of the message begins and ends. MIME adds two hyphens (--) in front of the boundary value when it appears in the message. MIME identifies the end of the message with two hyphens, the boundary value, and two more hyphens.

Listing 1: A Multipart, Encoded MIME Message
From: joe_luthier@plucknplay.com
To: lchae@mfi.com
Subject: Info on Gibson guitar
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=17

--17
Content-Type: text/enriched; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Content-Description: Greetings

As promised, I'm getting back to you about the Gibson Southern Jumbo guitar you were interested in. I've enclosed a spec sheet on the guitar, which is in Microsoft Word.

I <bold>guarantee</bold> that you'll <bold>love</bold> it!
--17
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
Content-Description: Spec sheet saved as MS Word file

<Encoded data for Word file would appear here.>
--17--
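A receiving agent's view of such a message can be sketched with Python's standard email parser. The raw text below follows the shape of Listing 1, with each boundary line written as two hyphens plus the boundary value and a shortened body. The payload TVqQ is an arbitrary stand-in, not real Word data; it is an assumption added so that the base64 part has something to decode.

```python
import email

# Message text in the shape of Listing 1 (payload is a stand-in).
raw = """\
From: joe_luthier@plucknplay.com
To: lchae@mfi.com
Subject: Info on Gibson guitar
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=17

--17
Content-Type: text/enriched; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Content-Description: Greetings

I <bold>guarantee</bold> that you'll <bold>love</bold> it!
--17
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
Content-Description: Spec sheet saved as MS Word file

TVqQ
--17--
"""

msg = email.message_from_string(raw)
print(msg.get_content_type(), "| boundary:", msg.get_boundary())

parts = msg.get_payload()  # a list of sub-messages for multipart content
for part in parts:
    print(" ", part.get_content_type(), "|", part["Content-Description"])

# decode=True reverses the Content-Transfer-Encoding (base64 here).
print(parts[1].get_payload(decode=True))
```

The parser uses the boundary parameter to split the body into two parts, then reads each part's own headers to decide how to unpack it.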
 

Resources

To learn more about MIME, you can read the MIME-related RFCs (see "MIME RFCs"). Although these documents are aimed primarily at MIME developers and implementers, they are written in relatively plain language and should serve as a good education for those network professionals unfamiliar with the topic, and even for those experienced with MIME who have further interest in its details.

A Listing of MIME-Related RFCs

RFC 822: Standard for the Format of ARPA Internet Text Messages

RFC 2045: Part One: Format of Internet Message Bodies

RFC 2046: Part Two: Media Types

RFC 2047: Part Three: Message Header Extensions for Non-ASCII Text

RFC 2048: Part Four: Registration Procedures

RFC 2049: Part Five: Conformance Criteria and Examples

This tutorial, number 110, by Lee Chae, was originally published in the October 1997 issue of Network Magazine.

 