

Recipe 14.11. Generating OPML Files

Credit: Moshe Zadka, Premshree Pillai, Anna Martelli Ravenscroft

Problem

OPML (Outline Processor Markup Language) is a standard file format for sharing subscription lists used by RSS (Really Simple Syndication) feed readers and aggregators. You want to share your subscription list, but your blogging site provides only a FOAF (Friend-Of-A-Friend) page, not one in the standard OPML format.

Solution

Use urllib2 to open and read the FOAF page and xml.dom to parse the data received; then, output the data in the proper OPML format to a file. For example, LiveJournal is a popular blogging site that provides FOAF pages; here's a module with the functions you need to turn those pages into OPML files:

    #!/usr/bin/python
    import sys
    import urllib2
    import HTMLParser
    from xml.dom import minidom, Node

    def getElements(node, uri, name):
        ''' recursively yield all elements w/given namespace URI and name '''
        if (node.nodeType==Node.ELEMENT_NODE and
            node.namespaceURI==uri and
            node.localName==name):
            yield node
        for child in node.childNodes:
            for descendant in getElements(child, uri, name):
                yield descendant

    class LinkGetter(HTMLParser.HTMLParser):
        ''' HTML parser subclass which collects attributes of link tags '''
        def __init__(self):
            HTMLParser.HTMLParser.__init__(self)
            self.links = [  ]
        def handle_starttag(self, tag, attrs):
            if tag == 'link':
                self.links.append(attrs)

    def getRSS(page):
        ''' given a `page' URL, returns the HREF to the RSS link '''
        contents = urllib2.urlopen(page)
        lg = LinkGetter( )
        try:
            # only the first 1000 bytes are parsed: <link> tags live in the head
            lg.feed(contents.read(1000))
        except HTMLParser.HTMLParseError:
            # tolerate malformed markup; keep the links collected so far
            pass
        links = map(dict, lg.links)
        for link in links:
            if (link.get('rel')=='alternate' and
                link.get('type')=='application/rss+xml'):
                return link.get('href')

    def getNicks(doc):
        ''' given an XML document's DOM, `doc', yields a triple of info for
            each contact: nickname, blog URL, RSS URL '''
        for element in getElements(doc, 'http://xmlns.com/foaf/0.1/', 'knows'):
            person, = getElements(element, 'http://xmlns.com/foaf/0.1/', 'Person')
            nick, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'nick')
            text, = nick.childNodes
            nickText = text.toxml( )
            blog, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'weblog')
            blogLocation = blog.getAttributeNS(
                'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'resource')
            rss = getRSS(blogLocation)
            if rss:
                yield nickText, blogLocation, rss

    def nickToOPMLFragment((nick, blogLocation, rss)):
        ''' given a triple (nickname, blog URL, RSS URL), returns a string
            that's the proper OPML outline tag representing that info '''
        return '''
        <outline text="%(nick)s"
                 htmlUrl="%(blogLocation)s"
                 type="rss"
                 xmlUrl="%(rss)s"/>
        ''' % dict(nick=nick, blogLocation=blogLocation, rss=rss)

    def nicksToOPML(fout, nicks):
        ''' writes to file `fout' the OPML document representing the
            iterable of contact information `nicks' '''
        fout.write('''<?xml version="1.0" encoding="utf-8"?>
        <opml version="1.0">
        <head><title>Subscriptions</title></head>
        <body><outline title="Subscriptions">
        ''')
        for nick in nicks:
            print nick    # progress report on standard output
            fout.write(nickToOPMLFragment(nick))
        fout.write("</outline></body></opml>\n")

    def docToOPML(fout, doc):
        ''' writes to file `fout' the OPML for XML DOM `doc' '''
        nicksToOPML(fout, getNicks(doc))

    def convertFOAFToOPML(foaf, opml):
        ''' given URL `foaf' to a FOAF page, writes its OPML equivalent to
            a file named by string `opml' '''
        f = urllib2.urlopen(foaf)
        doc = minidom.parse(f)
        fout = open(opml, 'w')
        try:
            docToOPML(fout, doc)
        finally:
            fout.close()

    def getLJUser(user):
        ''' writes an OPML file `user'.opml for livejournal's FOAF page '''
        convertFOAFToOPML('http://www.livejournal.com/users/%s/data/foaf' % user,
                          user+".opml")

    if __name__ == '__main__':
        # example, when this module is run as a main script
        getLJUser('moshez')

Discussion

RSS feeds have become extremely popular for reading news, blogs, wikis, and so on. OPML is one of the standard file formats used to share subscription lists among RSS fans. This recipe generates an OPML file that most RSS readers can import. With an OPML file, you can share your favorite subscriptions with anyone you like, publish them on the Web, and so on.

getElements is a convenience function that gets written in almost every XML DOM-processing application. It recursively scans the document, yielding every element that has a given namespace URI and local name. This version of getElements is somewhat quick and dirty, but it is good enough for our purposes. getNicks holds the heart of the parsing logic. It calls getElements to find each "foaf:knows" element; inside each one, it finds the "foaf:Person" element and, within that, the "foaf:nick" element (which contains the contact's LiveJournal nickname) and the "foaf:weblog" element (whose rdf:resource attribute gives the blog's URL). It then calls getRSS to look up the blog's RSS feed and, being a generator, yields a (nickname, blog URL, RSS URL) triple for each contact whose blog has a feed.
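To watch getElements and the getNicks traversal at work without touching the network, here is a minimal sketch, assuming Python 3 (the recipe itself targets Python 2) and a small made-up FOAF document. The names get_elements, get_contacts, and SAMPLE_FOAF are hypothetical, chosen for this illustration only. Unlike getNicks, get_contacts skips the RSS lookup, and it reads the nickname from the text node's data attribute rather than calling toxml().

```python
from xml.dom import minidom, Node

FOAF_NS = 'http://xmlns.com/foaf/0.1/'
RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

# A tiny, made-up FOAF document used only for illustration.
SAMPLE_FOAF = '''<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>me</foaf:nick>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>alice</foaf:nick>
        <foaf:weblog rdf:resource="http://alice.example.com/"/>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:nick>bob</foaf:nick>
        <foaf:weblog rdf:resource="http://bob.example.com/"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
'''

def get_elements(node, uri, name):
    """Recursively yield element nodes whose namespace URI and local name match."""
    if (node.nodeType == Node.ELEMENT_NODE and
            node.namespaceURI == uri and node.localName == name):
        yield node
    for child in node.childNodes:
        yield from get_elements(child, uri, name)

def get_contacts(doc):
    """Yield (nick, weblog URL) pairs for each foaf:knows entry, with no network access."""
    for knows in get_elements(doc, FOAF_NS, 'knows'):
        person, = get_elements(knows, FOAF_NS, 'Person')
        nick, = get_elements(person, FOAF_NS, 'nick')
        text, = nick.childNodes
        blog, = get_elements(person, FOAF_NS, 'weblog')
        yield text.data, blog.getAttributeNS(RDF_NS, 'resource')

# minidom.parseString is namespace-aware by default, so namespaceURI/localName are set.
doc = minidom.parseString(SAMPLE_FOAF)
print(list(get_contacts(doc)))
```

Running it prints [('alice', 'http://alice.example.com/'), ('bob', 'http://bob.example.com/')]. The outer person's nickname, "me", does not appear because the search for nicknames starts from each foaf:knows element rather than from the document root.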

Note an important idiom used four times in the body of getNicks:

    name, = some iterable

The key is the comma after name, which turns the left-hand side of this assignment into a one-item tuple, making the assignment into what's technically known as an unpacking assignment. Unpacking assignments are of course very popular in Python (see Recipe 19.4 for a technique to make them even more widely applicable) but normally with at least two names on the left of the assignment, such as:

    aname, another = iterable yielding 2 items

The idiom used in getNicks has exactly the same function, but it demands that the iterable yield exactly one item (otherwise, Python raises a ValueError exception). Therefore, the idiom has the same semantics as:

    _templist = list(some iterable)
    if len(_templist) != 1:
        raise ValueError('wrong number of values to unpack')
    name = _templist[0]
    del _templist

Obviously, the name, = ... idiom is much cleaner and more compact than this equivalent snippet, which is worth keeping in mind for the next time you need to express the same semantics.
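A quick way to convince yourself of these semantics is to try the idiom directly. This sketch assumes Python 3, and only_item is a hypothetical helper that simply wraps name, = ... in a function and feeds it iterables of various lengths. The exact wording of the ValueError message differs between Python versions, so the code just prints whatever message it gets.

```python
def only_item(iterable):
    """Return the single item of `iterable`, using the one-item unpacking idiom."""
    item, = iterable  # raises ValueError unless there is exactly one item
    return item

print(only_item(['solo']))         # solo
print(only_item(x for x in 'z'))   # z  (any iterable works, even a generator)

# Zero items and two items both fail with ValueError.
for bad in ([], [1, 2]):
    try:
        only_item(bad)
    except ValueError as exc:
        print('ValueError:', exc)
```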

nicksToOPML, together with its helper function nickToOPMLFragment, generates the OPML, while docToOPML ties together getNicks and nicksToOPML into a FOAF-to-OPML converter. convertFOAFToOPML is the main function: it is the one that actually interacts with the outside world, accessing the network to get the FOAF page and using a file to save the OPML (getRSS also goes to the network, once per contact, to fetch each blog's page).
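One caveat about nickToOPMLFragment: it interpolates the URLs into attribute values without escaping them, so a feed URL containing & (as many query strings do) would make the OPML file malformed. The following is a minimal sketch of an escaping variant, assuming Python 3; nick_to_opml_fragment and nicks_to_opml are hypothetical counterparts of the recipe's functions, and they rely on xml.sax.saxutils.quoteattr to escape each value and wrap it in quotes. The sketch expects the nickname as plain text, not the already-escaped toxml() output that getNicks produces; otherwise the nickname would be escaped twice. Writing to an io.StringIO and parsing the result back checks that the output is well-formed.

```python
import io
from xml.dom import minidom
from xml.sax.saxutils import quoteattr

def nick_to_opml_fragment(nick, blog_location, rss):
    """Return one OPML <outline> element; quoteattr escapes &, <, > and adds quotes."""
    return '<outline text=%s htmlUrl=%s type="rss" xmlUrl=%s/>\n' % (
        quoteattr(nick), quoteattr(blog_location), quoteattr(rss))

def nicks_to_opml(fout, nicks):
    """Write an OPML 1.0 document for an iterable of (nick, blog URL, RSS URL) triples."""
    fout.write('<?xml version="1.0" encoding="utf-8"?>\n'
               '<opml version="1.0">\n'
               '<head><title>Subscriptions</title></head>\n'
               '<body><outline title="Subscriptions">\n')
    for nick, blog_location, rss in nicks:
        fout.write(nick_to_opml_fragment(nick, blog_location, rss))
    fout.write('</outline></body></opml>\n')

# Round trip: generate into memory, then parse it back to prove it is well-formed.
buf = io.StringIO()
nicks_to_opml(buf, [('alice', 'http://alice.example.com/',
                     'http://alice.example.com/rss?a=1&b=2')])
doc = minidom.parseString(buf.getvalue())
feeds = [o.getAttribute('xmlUrl') for o in doc.getElementsByTagName('outline')
         if o.getAttribute('type') == 'rss']
print(feeds)  # ['http://alice.example.com/rss?a=1&b=2']
```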

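The recipe has a specific function, getLJUser(user), to work with LiveJournal (http://www.livejournal.com) friends lists. However, the main convertFOAFToOPML function is general enough to use with other sites that publish FOAF pages. The various helper functions can also come in handy in your own different but related tasks. For example, the getRSS function (with some aid from its helper class LinkGetter) finds and returns the link to a web site's RSS feed, or None if the page advertises none. Note that getRSS examines only the first 1,000 bytes of the page, so it finds the feed link only if the corresponding <link> tag appears that early, which is usually, though not always, the case since such tags belong in the page's <head>.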
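The LinkGetter approach carries over to Python 3's html.parser module almost unchanged. This sketch is a port under stated assumptions rather than the recipe's own code: it works on an HTML string instead of a URL so that it runs offline, and find_rss_link and SAMPLE_PAGE are hypothetical names. Current Python 3 versions of html.parser are tolerant of malformed markup and no longer define HTMLParseError, so the sketch needs no try/except.

```python
from html.parser import HTMLParser

class LinkGetter(HTMLParser):
    """Collect the attribute dicts of every <link> start tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased
        if tag == 'link':
            self.links.append(dict(attrs))

def find_rss_link(html_text):
    """Return the href of the first RSS alternate link in html_text, or None."""
    parser = LinkGetter()
    parser.feed(html_text)
    for link in parser.links:
        if (link.get('rel') == 'alternate' and
                link.get('type') == 'application/rss+xml'):
            return link.get('href')
    return None

# A made-up page head with one stylesheet link and one RSS link.
SAMPLE_PAGE = '''<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" href="http://blog.example.com/rss">
</head><body>...</body></html>'''

print(find_rss_link(SAMPLE_PAGE))     # http://blog.example.com/rss
print(find_rss_link('<html></html>'))  # None
```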

See Also

About OPML, see http://feeds.scripting.com/whatIsOpml; for more on RSS readers, http://blogspace.com/rss/readers; for the FOAF Vocabulary Specification, http://xmlns.com/foaf/0.1/.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
