Hack 43 Archiving Yahoo! Groups Messages with yahoo2mbox

Looking to keep a local archive of your favorite mailing list? With yahoo2mbox, you can download a Yahoo! Groups archive and import the result into your favorite mailer.

With the popularity of Yahoo! Groups (http://groups.yahoo.com/) comes a problem. Sometimes you want to save a Yahoo! Group's archives so you can access them outside the Yahoo! Groups site. Or you want to move your list somewhere else and take your existing archive with you.

Vadim Zeitlin had these same concerns, which is why he wrote yahoo2mbox (http://www.lpthe.jussieu.fr/~zeitlin/yahoo2mbox.html). The program retrieves all the messages from a mailing list's archive at Yahoo! Groups and saves them to a local file in mbox format. Its many options make it handy when you're moving information out of Yahoo! Groups.

As of this writing, the program is still fairly new, so be sure to visit its home page (cited in the previous paragraph) to download the latest version. Note that you'll need Perl and several additional modules to run it, including Getopt::Long, HTML::Entities, HTML::HeadParser, HTML::TokeParser, and LWP::UserAgent.

Running the Hack

Running the code looks like this:

 perl yahoo2mbox.pl [options] [-o <mbox>] <groupname> 

The options for running the program are as follows:

 --help          give the usage message showing the program options
 --version       show the program version and exit
 --verbose       give verbose informational messages (default)
 --quiet         be silent, only error messages are given
 -o mbox         save the message to mbox instead of file named groupname
 --start=n       start retrieving messages at index n instead of 1
 --end=n         stop retrieving messages at index n instead of the last one
 --noresume      don't resume, **overwrites** the existing output file if any
 --user=name     login to eGroups using this username (default: guest login)
 --pass=pass     the password to use for login (default: none)
 --cookies=xxx   file to use to store cookies (default: none,
                 'netscape' uses netscape cookies file).
 --proxy=url     use the given proxy; if 'no', don't use proxy
                 at all (not even the environment variable http_proxy,
                 which is used by default), may use
                 http://username:password@full.host.name/ notation
 --country=xx    use the given country code to access localized yahoo
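If you archive a group regularly, you may prefer to drive yahoo2mbox from a script rather than typing the options by hand. The following is a minimal Python sketch, not part of yahoo2mbox: the helper name build_yahoo2mbox_cmd is hypothetical, it covers only a subset of the options listed above (no --cookies, --proxy, or --country), and it assumes perl is on your PATH and yahoo2mbox.pl is in the current directory. It only assembles the argument list; it does not run anything.

```python
import shlex


def build_yahoo2mbox_cmd(group, output=None, start=None, end=None,
                         user=None, password=None, quiet=False,
                         noresume=False, script="yahoo2mbox.pl"):
    """Build an argv list for yahoo2mbox.pl from a subset of its documented options.

    Hypothetical helper: it only assembles the command; it does not run it.
    """
    cmd = ["perl", script]
    if quiet:
        cmd.append("--quiet")
    if start is not None:
        cmd.append(f"--start={start}")
    if end is not None:
        cmd.append(f"--end={end}")
    if noresume:
        # Careful: --noresume overwrites an existing output file.
        cmd.append("--noresume")
    if user is not None:
        cmd.append(f"--user={user}")
    if password is not None:
        cmd.append(f"--pass={password}")
    if output is not None:
        cmd += ["-o", output]
    cmd.append(group)  # the usage line puts the group name last
    return cmd


if __name__ == "__main__":
    cmd = build_yahoo2mbox_cmd("weirdalclub2", start=3258)
    print(shlex.join(cmd))  # perl yahoo2mbox.pl --start=3258 weirdalclub2
    # To run it for real (assumes perl and yahoo2mbox.pl are available):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run, rather than one command string, sidesteps shell-quoting problems with unusual group names or passwords.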

For example, this command downloads messages from the Weird Al Club group (weirdalclub2), starting at message 3258:

 % perl yahoo2mbox.pl --start=3258 weirdalclub2
 Logging in anonymously... ok.
 Getting number of messages in group weirdalclub2...
 Retrieving messages 3258..3287: .............................. done!
 Saved 30 message(s) in weirdalclub2.

Here, the messages are saved to a file called weirdalclub2. If you rename the file to weirdalclub2.mbx, you can immediately open the messages in Eudora, as shown in Figure 4-1. Of course, you can also open the resulting file in any mail program that can import (or natively read) the mbox format.
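Because the output is plain mbox, you can also inspect it from a script instead of a mail client. As a minimal sketch (not part of yahoo2mbox; the function name summarize_mbox is hypothetical, and it assumes the saved file follows the usual mbox conventions that Python's reader expects), the standard-library mailbox module can list the sender and subject of each saved message:

```python
import mailbox
import sys


def summarize_mbox(path):
    """Return (From header, Subject header) pairs for every message in an mbox file."""
    box = mailbox.mbox(path, create=False)  # raises NoSuchMailboxError if missing
    try:
        return [(msg.get("From", ""), msg.get("Subject", "")) for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    # Usage: python summarize_mbox.py weirdalclub2 [more mbox files...]
    for path in sys.argv[1:]:
        for sender, subject in summarize_mbox(path):
            print(f"{sender[:40]:<40}  {subject}")
```

A quick count of the returned list is also an easy way to confirm that a run saved as many messages as it reported.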

Figure 4-1. A Yahoo! Groups archive in Eudora

Hacking the Hack

Because this is someone else's program, there's not much hacking to be done. On the other hand, you might not want to stop at the mbox file; you might want to convert it to other formats for use in other projects or archives. In that case, check out these programs, which take the mbox format a little further:


hypermail (http://sourceforge.net/projects/hypermail/)

Converts mbox format to cross-referenced HTML documents.


mb2md (http://www.gerg.ca/hacks/mb2md/)

Converts mbox format to Maildir. Requires Python and Procmail.


Mb2md.pl (http://batleth.sapienti-sat.org/projects/mb2md/)

Converts mbox format to Maildir. Uses Perl.
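If none of these tools is at hand, Python's standard-library mailbox module can do a basic mbox-to-Maildir conversion on its own. The sketch below is a swapped-in technique, not how mb2md or Mb2md.pl work internally; mbox_to_maildir is a hypothetical name, and the sketch assumes the mbox file parses cleanly with Python's mbox reader.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return how many were copied."""
    src = mailbox.mbox(mbox_path, create=False)
    # create=True makes maildir_path with its tmp/, new/ and cur/ subdirectories.
    dst = mailbox.Maildir(maildir_path, factory=None, create=True)
    count = 0
    try:
        for msg in src:
            # Converting to MaildirMessage maps mbox status flags (e.g. R = read)
            # onto their Maildir equivalents where a mapping exists.
            dst.add(mailbox.MaildirMessage(msg))
            count += 1
    finally:
        src.close()
        dst.close()
    return count
```

For example, mbox_to_maildir("weirdalclub2.mbx", "weirdalclub2-maildir") would copy the archive from the earlier run into a new Maildir, leaving the original mbox file untouched.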



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157