Recipe 10.8. Selectively Copying a Mailbox File

Credit: Noah Spurrier, Dave Benjamin

Problem

You need to selectively copy a large mailbox file (in mbox style), passing each message through a filtering function that may alter or skip the message.

Solution

The Python Standard Library package email is the modern Python approach for this kind of task. However, standard library modules mailbox and rfc822 can also supply the base functionality to implement this task:

import mailbox

def process_mailbox(mailboxname_in, mailboxname_out, filter_function):
    # Iterating over the mailbox yields one rfc822 message object per message
    mbin = mailbox.PortableUnixMailbox(file(mailboxname_in, 'r'))
    fout = file(mailboxname_out, 'w')
    for msg in mbin:
        if msg is None: break
        # The headers are already parsed, so msg.fp.read() returns the body
        document = filter_function(msg, msg.fp.read())
        if document:
            assert document.endswith('\n\n')
            fout.write(msg.unixfrom)
            fout.writelines(msg.headers)
            fout.write('\n')
            fout.write(document)
    fout.close()

Discussion

I often write lots of little scripts to filter my mailbox, so I wrote this recipe's small module. I can import the module from each script and call the module's function process_mailbox as needed. Python's future direction is to perform email processing with the standard library package email, but lower-level modules, such as mailbox and rfc822, are still available in the Python Standard Library. They are sometimes easier to use than the rich, powerful, and very general functionality offered by package email.
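Modules rfc822 and mailbox.PortableUnixMailbox were removed in Python 3, so readers on current versions may find a rough equivalent useful. The following is only a sketch under stated assumptions, not a drop-in replacement for the recipe: it uses mailbox.mbox and the email-based message classes, it appends to the output file if that file already exists, and it takes the body to be everything after the first blank line of the re-serialized message. Because mailbox.mbox writes the separating blank line itself, the body here is not required to end with \n\n.

```python
import mailbox


def process_mailbox(mailboxname_in, mailboxname_out, filter_function):
    """Copy messages between mbox files, passing each through filter_function.

    filter_function(msg, document) gets the parsed message and its body
    text; it returns a false value to skip the message, or the body text
    (possibly altered) to write.
    """
    mbin = mailbox.mbox(mailboxname_in, create=False)
    # Assumption: an existing output file is appended to, not truncated.
    # No locking is done; call mbout.lock()/mbout.unlock() around the loop
    # if other processes may touch the file.
    mbout = mailbox.mbox(mailboxname_out)
    try:
        for key in mbin.iterkeys():
            msg = mbin.get_message(key)      # parsed headers for the filter
            raw = mbin.get_string(key)       # headers + body, no "From " line
            head, _, document = raw.partition('\n\n')
            result = filter_function(msg, document)
            if result:
                out = mailbox.mboxMessage(head + '\n\n' + result)
                out.set_from(msg.get_from())  # keep the original "From " line
                mbout.add(out)
        mbout.flush()
    finally:
        mbout.close()
        mbin.close()
```

In this sketch the filter receives a mailbox.mboxMessage, so header lookups use msg['Message-ID'] or msg.get('Message-ID') rather than msg.getheader.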

The function you pass to process_mailbox as the third argument, filter_function, must take two arguments: msg, an rfc822 message object, and document, a string that is the message's entire body, ending with two line-end characters (\n\n). filter_function can return False, meaning that this message must be skipped (i.e., not copied at all to the output), or else it must return a string terminated with \n\n that is written to the output as the message body. Normally, filter_function returns either False or the same document argument it was called with, but in some cases you may find it useful to write to the output file an altered version of the message's body rather than the original message body.
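To make the contract concrete, here is a small sketch of a filter that uses both outcomes: it skips some messages and returns an altered body for the rest. The name tag_or_skip, the '[SPAM]' subject prefix, and the footer text are all made up for illustration. The function relies only on msg.get, which rfc822 message objects provide as an alias of getheader.

```python
FOOTER = '-- archived copy --\n'   # hypothetical footer text


def tag_or_skip(msg, document):
    """Skip messages whose subject is tagged as spam; append a footer to the rest."""
    subject = msg.get('Subject') or ''
    if subject.startswith('[SPAM]'):   # hypothetical tagging convention
        return False                   # message is not copied to the output
    # Rebuild the body so that it still ends with exactly two line ends.
    return document.rstrip('\n') + '\n' + FOOTER + '\n'
```

Whatever a filter does to the body, the string it returns has to end with \n\n, or the assert in process_mailbox fails.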

Here is an example of a filter function that removes duplicate messages:

import sets
found_ids = sets.Set()

def no_duplicates(msg, document):
    msg_id = msg.getheader('Message-ID')
    if msg_id in found_ids:
        return False
    found_ids.add(msg_id)
    return document

In Python 2.4, you could use the built-in set rather than sets.Set, but for a case as simple as this, it makes no real difference in performance (and the usage is exactly the same, anyway).
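One detail of no_duplicates is worth noting: a message without a Message-ID header yields None, so after the first such message every later one is treated as a duplicate and dropped. The variant below is a sketch, not part of the original recipe. It uses the built-in set, keeps its state in a closure instead of a module-level global, and assumes that messages lacking a Message-ID should always be kept. The name make_no_duplicates is hypothetical.

```python
def make_no_duplicates():
    """Return a fresh duplicate-removing filter with its own set of seen IDs."""
    seen_ids = set()

    def no_duplicates(msg, document):
        msg_id = msg.get('Message-ID')
        if msg_id is None:
            # Assumption: with no ID to compare, keep the message.
            return document
        msg_id = msg_id.strip()
        if msg_id in seen_ids:
            return False
        seen_ids.add(msg_id)
        return document

    return no_duplicates
```

Each call to make_no_duplicates() starts with an empty set, so one script can filter several mailboxes independently by passing a fresh filter to each process_mailbox call.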

See Also

Documentation about modules mailbox and rfc822, and package email, in the Library Reference and Python in a Nutshell.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420