Recipe 10.9. Building a Whitelist of Email Addresses From a Mailbox

Credit: Noah Spurrier

Problem

To help you configure an antispam system, you want a list of email addresses, commonly known as a whitelist, that you can trust won't send you spam. The addresses to which you send email are undoubtedly good candidates for this whitelist.

Solution

Here is a script to output "To" addresses given a mailbox path:

#!/usr/bin/env python
""" Extract and print all 'To:' addresses from a mailbox """
import mailbox

def main(mailbox_path):
    addresses = {}
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses[toaddr] = 1
    addresses = addresses.keys()
    addresses.sort()
    for address in addresses:
        print address

if __name__ == '__main__':
    import sys
    main(sys.argv[1])

Discussion

In addition to bypassing spam filters, identifying addresses of people you've sent mail to may also help in other ways, such as flagging emails from them as higher priority, depending on your mail-reading habits and your mail reader's capabilities. As long as your mail reader keeps mail you have sent in some kind of "Sent Items" mailbox in standard mailbox format, you can call this script with the path to the mailbox as its only argument, and the addresses to which you've sent mail will be emitted to standard output.
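As one illustration of putting the emitted list to use, the sketch below checks whether the sender of an incoming message appears in the whitelist. It is written for Python 3, and the helper names load_whitelist and is_whitelisted are hypothetical. Lowercasing whole addresses before comparing them is a simplifying assumption, not something the recipe prescribes.

```python
from email import message_from_string
from email.utils import parseaddr


def load_whitelist(lines):
    """Build a set of lowercased addresses from an iterable of text lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_whitelisted(raw_message, whitelist):
    """Return True if the message's 'From' address is in the whitelist."""
    msg = message_from_string(raw_message)
    # parseaddr splits 'Name <addr>' into (name, addr); a missing header
    # yields an empty address, which is never in the whitelist.
    _name, addr = parseaddr(msg.get('From', ''))
    return addr.lower() in whitelist
```

The lines passed to load_whitelist could be the script's saved output, one address per line.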

The script is simple because the Python Standard Library module mailbox does all the hard work. All the script needs to do is collect the set of email addresses as it loops through all messages, then emit them. While collecting, we keep the addresses as keys of a dictionary: a dictionary lookup takes roughly constant time, while checking each toaddr against a list before appending it requires scanning the whole list. When we're done collecting, we extract the addresses from the dictionary as a list, because we want to emit them in sorted order. In Python 2.4, function main can be made slightly more elegant, thanks to the new built-ins set and sorted:

def main(mailbox_path):
    addresses = set()
    mb = mailbox.PortableUnixMailbox(file(mailbox_path))
    for msg in mb:
        toaddr = msg.getaddr('To')[1]
        addresses.add(toaddr)
    for address in sorted(addresses):
        print address
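Both versions above target Python 2. On Python 3, mailbox.PortableUnixMailbox, the file built-in, and the print statement are gone, so a rough adaptation might look like the sketch below. It assumes mailbox.mbox as the replacement reader. Unlike the recipe, it collects every recipient in each message's 'To' headers rather than only the first. The function name collect_to_addresses is made up for illustration.

```python
import mailbox
from email.utils import getaddresses


def collect_to_addresses(mailbox_path):
    """Return a sorted list of the unique 'To' addresses in an mbox file."""
    addresses = set()
    # create=False makes a missing file an error instead of creating it
    mb = mailbox.mbox(mailbox_path, create=False)
    try:
        for msg in mb:
            # get_all returns every 'To' header value; getaddresses splits
            # them into (display name, address) pairs
            for _name, addr in getaddresses(msg.get_all('To', [])):
                if addr:
                    addresses.add(addr)
    finally:
        mb.close()
    return sorted(addresses)


if __name__ == '__main__':
    import sys
    for address in collect_to_addresses(sys.argv[1]):
        print(address)
```

As with the original script, the only command-line argument is the path to the mailbox, and the addresses go to standard output.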

If your mailbox is not in the Unix mailbox style supported by mailbox.PortableUnixMailbox, you may want to use other classes supplied by the Python Standard Library module mailbox. For example, if your mailbox is in Qmail maildir format, you can use the mailbox.Maildir class to read it.
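To illustrate switching mailbox classes, here is a minimal Python 3 sketch that keeps the address collection independent of the mailbox format and then applies it to a maildir. The helper names to_addresses and maildir_to_addresses are hypothetical, and the sketch gathers all 'To' recipients of each message, which goes slightly beyond the recipe's first-address-only behavior.

```python
import mailbox
from email.utils import getaddresses


def to_addresses(mb):
    """Return sorted unique 'To' addresses from any mailbox.Mailbox object."""
    found = set()
    for msg in mb:
        for _name, addr in getaddresses(msg.get_all('To', [])):
            if addr:
                found.add(addr)
    return sorted(found)


def maildir_to_addresses(maildir_path):
    """Read a maildir directory and return its sorted 'To' addresses."""
    # create=False raises an error rather than creating an empty maildir
    mb = mailbox.Maildir(maildir_path, create=False)
    try:
        return to_addresses(mb)
    finally:
        mb.close()
```

Because to_addresses only iterates over messages, the same function should work unchanged with the other mailbox classes that module mailbox supplies.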

See Also

Documentation of the standard library module mailbox in the Library Reference and Python in a Nutshell.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
