14.6. email: Parsing and Composing Mails

The second edition of this book used a handful of standard library modules (rfc822, StringIO, and more) to parse the contents of messages, and simple text processing to compose them. Additionally, that edition included a section on extracting and decoding attached parts of a message using modules such as mhlib, mimetools, and base64.

Those tools are still available, but were, frankly, a bit clumsy and error-prone. Parsing attachments from messages, for example, was tricky, and composing even basic messages was tedious (in fact, an early printing of the prior edition contained a potential bug, because I forgot one \n character in a complex string formatting operation). Adding attachments to sent messages wasn't even attempted, due to the complexity of the formatting involved.

Luckily, things are much simpler today. Since the second edition, Python has sprouted a new email package, a powerful collection of tools that automates most of the work behind parsing and composing email messages. This package gives us an object-based message interface and handles all the textual message structure details, both analyzing and creating them. Not only does this eliminate a whole class of potential bugs, it also promotes more advanced mail processing.

Things like attachments, for instance, become accessible to mere mortals (and authors with limited book real estate). In fact, the entire section on manual attachment parsing and decoding has been deleted in this edition; it's essentially automatic with email. The new package parses and constructs headers and attachments; generates correct email text; decodes and encodes base64, quoted-printable, and uuencoded data; and much more.

We won't cover the email package in its entirety in this book; it is well documented in Python's library manual. Our goal here is to give some example usage code, which you can study in conjunction with the manuals. But to help get you started, let's begin with a quick overview. In a nutshell, the email package is based around the Message object it provides:


Parsing mail

A mail's full text, fetched from poplib or imaplib, is parsed into a new Message object, with an API for accessing its components. In the object, mail headers become dictionary-like keys, and components become a payload that can be walked with a generator interface (more on payloads in a moment).


Creating mail

New mails are composed by creating a Message object, using an API to attach headers and parts, and asking the object for its print representation: a correctly formatted mail message text, ready to be passed to the smtplib module for delivery. Headers are added by key assignment and attachments by method calls.

In other words, the Message object is used both for accessing existing messages and for creating new ones from scratch. In both cases, email can automatically handle details like encodings (e.g., attached binary images can be treated as text with base64 encoding and decoding), content types, and more.

14.6.1. Message Objects

Since the email module's Message object is at the heart of its API, you need a cursory understanding of its form to get started. In short, it is designed to reflect the structure of a formatted email message. Each Message consists of three main pieces of information:


Type

A content type (plain text, HTML text, JPEG image, and so on), encoded as a MIME main type and a subtype. For instance, "text/html" means the main type is text and the subtype is HTML (a web page); "image/jpeg" means a JPEG photo. A "multipart/mixed" type means there are nested parts within the message.


Headers

A dictionary-like mapping interface, with one key per mail header ("From", "To", and so on). This interface supports almost all of the usual dictionary operations, and headers may be fetched or set by normal key indexing.


Content

A payload, which represents the mail's content. This can be either a string for simple messages, or a list of additional Message objects for multipart container messages with attached or alternative parts. For some oddball types, the payload may be a Python None object.

For example, mails with attached images may have a main top-level Message (type multipart/mixed), with three more Message objects in its payload: one for its main text (type text/plain), followed by two for the photos (type image/jpeg). The photo parts may be encoded for transmission as text with base64 or another scheme; the encoding type and the original image filename are specified in the part's headers.
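The structure just described can be sketched in code. This is a minimal illustration that assumes Python 3 module names (email.mime.multipart and friends, rather than the older capitalized names used in this section's sessions); the photo bytes and filenames are invented stand-ins, not real image data.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Stand-in bytes, not a real JPEG (hypothetical data). The subtype is passed
# explicitly so no image-type detection is attempted on this fake content.
fake_photo = b"\xff\xd8\xff\xe0 pretend image bytes"

top = MIMEMultipart()                       # multipart/mixed by default
top.attach(MIMEText("Photos attached.\n"))  # main text part
for name in ("photo1.jpg", "photo2.jpg"):   # hypothetical filenames
    part = MIMEImage(fake_photo, _subtype="jpeg")
    part.add_header("Content-Disposition", "attachment", filename=name)
    top.attach(part)

# Each nested part records its type, transfer encoding, and filename in headers
for part in top.get_payload():
    print(part.get_content_type(),
          part["Content-Transfer-Encoding"],
          part.get_filename())

# decode=True reverses the transfer encoding and returns the original bytes
photo = top.get_payload()[1]
restored = photo.get_payload(decode=True)
```

Here the image parts are base64-encoded by the MIMEImage class's default encoder, and get_payload(decode=True) undoes that encoding when the part is read back.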

Similarly, mails that include both simple text and an HTML alternative will have two nested Messages in their payload, of type plain text (text/plain) and HTML text (text/html), along with a main root Message of type multipart/alternative. Your mail client decides which part to display, often based on your preferences.
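As a rough sketch of the text-plus-HTML case, the following builds such a message with a multipart/alternative root. It assumes the Python 3 module names; the addresses and body text are simply reused from this section's examples.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Root container: the subtype argument selects multipart/alternative
root = MIMEMultipart("alternative")
root["From"] = "Sue Jones <sue@jones.com>"
root["To"] = "pp3e@earthlink.net"

# Two renderings of the same content, one nested Message each
root.attach(MIMEText("The owls are not what they seem...\n", "plain"))
root.attach(MIMEText("<p>The owls are <b>not</b> what they seem...</p>\n", "html"))

print(root.get_content_type())
print([part.get_content_type() for part in root.get_payload()])
```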

Simpler messages may have just a root Message of type text/plain or text/html, representing the entire message body. The payload for such mails is a simple string. They may also have no explicitly given type at all, which generally defaults to text/plain. Some single-part messages are text/html, with no text/plain alternative; they require a web browser or other HTML viewer (or a very keen-eyed user).
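To make the simple case concrete, here is a small sketch that parses a one-part mail with no Content-Type header and inspects its type, headers, and payload. It assumes Python 3 and the email.message_from_string convenience function; the raw text is made up for illustration.

```python
from email import message_from_string

# Hypothetical raw mail text: two headers, a blank line, then the body
raw = ("From: Sue Jones <sue@jones.com>\n"
       "To: pp3e@earthlink.net\n"
       "\n"
       "The owls are not what they seem...\n")

msg = message_from_string(raw)

# Type: no Content-Type header was given, so the default applies
print(msg["Content-Type"])        # the header itself is absent
print(msg.get_content_type())     # the default type is reported instead

# Headers: dictionary-like access, with case-insensitive keys
print(msg["from"])
print(msg.keys())

# Content: a simple string for a single-part message
print(msg.is_multipart())
print(repr(msg.get_payload()))
```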

Other combinations are possible, including some types that are not commonly seen in practice, such as message/delivery-status. Most messages have a main text part, though it is not required, and it may be nested in a multipart or other construct.

In all cases, these message structures are automatically generated when mail text is parsed, and are created by your method calls when new messages are composed. For instance, when creating messages, the message attach method adds parts for multipart mails, and set_payload sets the entire payload to a string for simple mails.

Message objects also have assorted properties (e.g., the filename of an attachment), and they provide a convenient walk generator method, which returns the next Message in the payload each time through in a for loop. Because the walker yields the root Message object first (i.e., self), a nonmultipart message doesn't become a special case; it is effectively a Message with a single item in its payload: itself.
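The following sketch shows why walk removes the special case: the same loop handles a single-part message and a multipart one. It assumes Python 3 module names, and the text_parts helper is a hypothetical name introduced only for this illustration.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def text_parts(msg):
    """Collect payloads of all text/plain parts, the root included."""
    return [part.get_payload() for part in msg.walk()
            if part.get_content_type() == "text/plain"]

# Single-part message: walk yields only the message itself
single = Message()
single.set_payload("just one part\n")
walked = list(single.walk())
print(len(walked), walked[0] is single)

# Multipart message: walk yields the root first, then each nested part
multi = MIMEMultipart()
multi.attach(MIMEText("first\n"))
multi.attach(MIMEText("second\n"))
print([part.get_content_type() for part in multi.walk()])

# The same helper works on both, with no type test on the root
print(text_parts(single))
print(text_parts(multi))
```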

Ultimately, the Message object structure closely mirrors the way mails are formatted as text. Special header lines in the mail's text give its type (e.g., plain text or multipart), as well as the separator used between the content of nested parts. Since the underlying textual details are automated by the email package, both when parsing and when composing, we won't go into further formatting details here.
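For a brief look at how the separator appears in both the object and the text, consider this sketch. It assumes Python 3 module names, and the boundary string "XXXX" is an arbitrary value chosen for readability; normally the package generates a unique one.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical fixed boundary; omit the argument to let email pick one
top = MIMEMultipart(boundary="XXXX")
top.attach(MIMEText("part one\n"))
top.attach(MIMEText("part two\n"))

text = top.as_string()

# The Content-Type header line carries both the type and the separator
print(text.splitlines()[0])
print(top.get_boundary())

# The separator precedes each part and, with a trailing "--", ends the mail
separator_lines = [line for line in text.splitlines() if line.startswith("--XXXX")]
print(separator_lines)
```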

If you are interested in seeing how this translates to real emails, a great way to learn mail structure is by inspecting the full raw text of messages displayed by the email clients we'll meet in this book. For more on the Message object, and email in general, consult the email package's entry in Python's library manual. We're skipping details such as its available encoders and MIME object classes here in the interest of space.

Beyond the email package, the Python library includes other tools for mail-related processing. For instance, mimetypes maps a filename to and from a MIME type:


mimetypes.guess_type(filename)

Maps a filename to a MIME type. Name spam.txt maps to text/plain.


mimetypes.guess_extension(contype)

Maps a MIME type to a filename extension. Type text/html maps to .html.

We also used the mimetypes module earlier in this chapter to guess FTP transfer modes from filenames (see Example 14-10), as well as in Chapter 6, where we used it to guess a media player for a filename (see the examples there, including playfile.py, Example 6-16). For email, these can come in handy when attaching files to a new message (guess_type) and saving parsed attachments that do not provide a filename (guess_extension). In fact, this module's source code is a fairly complete reference to MIME types. See the library manual for more on these tools.
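As a sketch of the two email-related uses just mentioned, the code below calls both mimetypes functions and wraps them in two small helpers. The helper names (split_type and name_for_part) and the fallback choices are assumptions made for this illustration, not part of the library.

```python
import mimetypes

# guess_type returns a (type, encoding) pair; encoding is for compressed files
ctype, encoding = mimetypes.guess_type("spam.txt")
print(ctype, encoding)

# guess_extension goes the other way; the result can vary by platform tables
ext = mimetypes.guess_extension("text/html")
print(ext)

def split_type(filename, default="application/octet-stream"):
    """Guess (maintype, subtype) for a file about to be attached."""
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        ctype = default            # unknown or compressed: treat as raw bytes
    return tuple(ctype.split("/", 1))

def name_for_part(contype, index):
    """Invent a filename for a parsed part that did not supply one."""
    ext = mimetypes.guess_extension(contype) or ".bin"
    return "part%d%s" % (index, ext)

print(split_type("spam.txt"))
print(name_for_part("text/plain", 1))
```

The main type returned by split_type can then be used to pick a MIME class (text, image, and so on) when composing.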

14.6.2. Basic email Interfaces in Action

Although we can't provide an exhaustive reference here, let's step through a simple interactive session to illustrate the fundamentals of email processing. To compose the full text of a message (to be delivered with smtplib, for instance), make a Message, assign headers to its keys, and set its payload to the message body. Converting to a string yields the mail text. This process is substantially simpler and less error-prone than the text operations we used earlier in Example 14-19:

    >>> from email.Message import Message
    >>> m = Message( )
    >>> m['from'] = 'Sue Jones <sue@jones.com>'
    >>> m['to']   = 'pp3e@earthlink.net'
    >>> m.set_payload('The owls are not what they seem...')
    >>> s = str(m)
    >>> print s
    From nobody Sun Jan 22 21:26:53 2006
    from: Sue Jones <sue@jones.com>
    to: pp3e@earthlink.net

    The owls are not what they seem...

Parsing a message's text (like the kind you obtain with poplib) is similarly simple, and essentially the inverse: we get back a Message object from the text, with keys for headers and a payload for the body:

    >>> from email.Parser import Parser
    >>> x = Parser( ).parsestr(s)
    >>> x
    <email.Message.Message instance at 0x00A7DA30>
    >>> x['From']
    'Sue Jones <sue@jones.com>'
    >>> x.get_payload( )
    'The owls are not what they seem...'
    >>> x.items( )
    [('from', 'Sue Jones <sue@jones.com>'), ('to', 'pp3e@earthlink.net')]
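For readers working in Python 3, the same compose-and-parse round trip might look like the sketch below. It assumes the lowercase module names (email.message and email.parser) that later Python versions use in place of the capitalized ones shown in this section, and it calls as_string explicitly rather than relying on str.

```python
from email.message import Message
from email.parser import Parser

# Compose: headers by key assignment, body by set_payload
m = Message()
m["from"] = "Sue Jones <sue@jones.com>"
m["to"] = "pp3e@earthlink.net"
m.set_payload("The owls are not what they seem...")
s = m.as_string()          # full mail text: headers, blank line, body

# Parse: the inverse step, from text back to a Message object
x = Parser().parsestr(s)
print(x["From"])           # header lookup is case-insensitive
print(x.get_payload())
print(x.items())
```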

This isn't much different from the older rfc822 module, but as we'll see in a moment, things get more interesting when there is more than one part. For simple messages like this one, the message walk generator treats it as a single-part mail, of type plain text:

    >>> for part in x.walk( ):
    ...     print part.get_content_type( )
    ...     print part.get_payload( )
    ...
    text/plain
    The owls are not what they seem...

Making a mail with attachments is a little more work, but not much: we just make a root Message and attach nested Message objects created from the MIME type object that corresponds to the type of data we're attaching. The root message is where we store the main headers of the mail, and we attach parts here, instead of setting the entire payload (the payload is a list now, not a string).

    >>> from email.MIMEMultipart import MIMEMultipart
    >>> from email.MIMEText import MIMEText
    >>>
    >>> top = MIMEMultipart( )
    >>> top['from'] = 'Art <arthur@camelot.org>'
    >>> top['to']   = 'pp3e@earthlink.net'
    >>>
    >>> sub1 = MIMEText('nice red uniforms...\n')
    >>> sub2 = MIMEText(open('data.txt').read( ))
    >>> sub2.add_header('Content-Disposition', 'attachment', filename='data.txt')
    >>> top.attach(sub1)
    >>> top.attach(sub2)

When we ask for the text, a correctly formatted full mail text is returned, separators and all, ready to be sent with smtplib. This is quite a trick, if you've ever tried it by hand:

    >>> text = top.as_string( )    # same as str( ) or print
    >>> print text
    Content-Type: multipart/mixed; boundary="===============0257358049=="
    MIME-Version: 1.0
    from: Art <arthur@camelot.org>
    to: pp3e@earthlink.net

    --===============0257358049==
    Content-Type: text/plain; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit

    nice red uniforms...

    --===============0257358049==
    Content-Type: text/plain; charset="us-ascii"
    MIME-Version: 1.0
    Content-Transfer-Encoding: 7bit
    Content-Disposition: attachment; filename="data.txt"

    line1
    line2
    line3

    --===============0257358049==--

If we are sent this message and retrieve it via poplib, parsing its full text yields a Message object just like the one we built to send this. The message walk generator allows us to step through each part, fetching their types and payloads:

    >>> from email.Parser import Parser
    >>> msg = Parser( ).parsestr(text)
    >>> msg['from']
    'Art <arthur@camelot.org>'
    >>> for part in msg.walk( ):
    ...     print part.get_content_type( )
    ...     print part.get_payload( )
    ...     print
    ...
    multipart/mixed
    [<email.Message.Message instance at 0x00A82058>,      # line-break added
    <email.Message.Message instance at 0x00A82260>]

    text/plain
    nice red uniforms...


    text/plain
    line1
    line2
    line3

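Building on the walk loop above, one common next step is to pull out only the parts that name a file. The sketch below is one way to do that under Python 3 module names; the attachments generator is a hypothetical helper, and the contents of data.txt are inlined as a string so the example stands alone.

```python
from email.parser import Parser
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Rebuild a two-part mail like the one in this section (file text inlined)
top = MIMEMultipart()
top["from"] = "Art <arthur@camelot.org>"
top["to"] = "pp3e@earthlink.net"
top.attach(MIMEText("nice red uniforms...\n"))
sub2 = MIMEText("line1\nline2\nline3\n")
sub2.add_header("Content-Disposition", "attachment", filename="data.txt")
top.attach(sub2)

# Parse the full text back, as if it had been fetched from a server
msg = Parser().parsestr(top.as_string())

def attachments(msg):
    """Yield (filename, data) for each non-container part that names a file."""
    for part in msg.walk():
        if part.is_multipart():
            continue                       # containers hold parts, not data
        name = part.get_filename()
        if name:
            # decode=True undoes any transfer encoding and returns bytes
            yield name, part.get_payload(decode=True)

found = list(attachments(msg))
for name, data in found:
    print(name, len(data))
```

Parts without a filename (such as the main text here) are skipped by this helper; a fuller program might instead invent a name for them from the part's content type.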

Although this captures the basic flavor of the interface, we need to step up to a larger example to see more of the email package's power. The next section takes us on the first of those steps.




Programming Python
ISBN: 0596009259
EAN: 2147483647
Year: 2004
Pages: 270
Authors: Mark Lutz
