Recipe 14.11. Generating OPML Files

Credit: Moshe Zadka, Premshree Pillai, Anna Martelli Ravenscroft

Problem

OPML (Outline Processor Markup Language) is a standard file format for sharing subscription lists used by RSS (Really Simple Syndication) feed readers and aggregators. You want to share your subscription list, but your blogging site provides only a FOAF (Friend-Of-A-Friend) page, not one in the standard OPML format.

Solution

Use urllib2 to open and read the FOAF page and xml.dom to parse the data received; then, output the data in the proper OPML format to a file. For example, LiveJournal is a popular blogging site that provides FOAF pages; here's a module with the functions you need to turn those pages into OPML files:

    #!/usr/bin/python
    import urllib2
    import HTMLParser
    from xml.dom import minidom, Node
    from xml.sax.saxutils import escape

    def getElements(node, uri, name):
        ''' recursively yield all elements w/given namespace URI and name '''
        if (node.nodeType==Node.ELEMENT_NODE and
            node.namespaceURI==uri and
            node.localName==name):
            yield node
        for child in node.childNodes:
            for found in getElements(child, uri, name):
                yield found

    class LinkGetter(HTMLParser.HTMLParser):
        ''' HTML parser subclass which collects attributes of link tags '''
        def __init__(self):
            HTMLParser.HTMLParser.__init__(self)
            self.links = [ ]
        def handle_starttag(self, tag, attrs):
            if tag == 'link':
                self.links.append(attrs)

    def getRSS(page):
        ''' given a `page' URL, returns the HREF to the RSS link '''
        contents = urllib2.urlopen(page)
        lg = LinkGetter( )
        try:
            # <link> tags live in the page's <head>, so the first
            # 1000 bytes are normally enough to find them
            lg.feed(contents.read(1000))
        except HTMLParser.HTMLParseError:
            pass
        links = map(dict, lg.links)
        for link in links:
            if (link.get('rel')=='alternate' and
                link.get('type')=='application/rss+xml'):
                return link.get('href')

    def getNicks(doc):
        ''' given an XML document's DOM, `doc', yields a triple of info for
            each contact: nickname, blog URL, RSS URL '''
        for element in getElements(doc, 'http://xmlns.com/foaf/0.1/', 'knows'):
            person, = getElements(element,
                                  'http://xmlns.com/foaf/0.1/', 'Person')
            nick, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'nick')
            text, = nick.childNodes
            nickText = text.data    # raw text; escaped when the OPML is written
            blog, = getElements(person, 'http://xmlns.com/foaf/0.1/', 'weblog')
            blogLocation = blog.getAttributeNS(
                'http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'resource')
            rss = getRSS(blogLocation)
            if rss:
                yield nickText, blogLocation, rss

    def _attr(value):
        ''' escape &, <, > and " so `value' is safe inside a "..." attribute '''
        return escape(value, {'"': '&quot;'})

    def nickToOPMLFragment((nick, blogLocation, rss)):
        ''' given a triple (nickname, blog URL, RSS URL), returns a string
            that's the proper OPML outline tag representing that info '''
        return '''
        <outline text="%(nick)s"
                 htmlUrl="%(blogLocation)s"
                 type="rss"
                 xmlUrl="%(rss)s"/>
        ''' % dict(nick=_attr(nick), blogLocation=_attr(blogLocation),
                   rss=_attr(rss))

    def nicksToOPML(fout, nicks):
        ''' writes to file `fout' the OPML document representing the
            iterable of contact information `nicks' '''
        fout.write('''<?xml version="1.0" encoding="utf-8"?>
    <opml version="1.0">
    <head><title>Subscriptions</title></head>
    <body><outline title="Subscriptions">
    ''')
        for nick in nicks:
            print nick    # progress report on standard output
            # the XML declaration promises UTF-8, so encode accordingly
            fout.write(nickToOPMLFragment(nick).encode('utf-8'))
        fout.write("</outline></body></opml>\n")

    def docToOPML(fout, doc):
        ''' writes to file `fout' the OPML for XML DOM `doc' '''
        nicksToOPML(fout, getNicks(doc))

    def convertFOAFToOPML(foaf, opml):
        ''' given URL `foaf' to a FOAF page, writes its OPML equivalent to a
            file named by string `opml' '''
        f = urllib2.urlopen(foaf)
        doc = minidom.parse(f)
        fout = open(opml, 'w')
        try:
            docToOPML(fout, doc)
        finally:
            fout.close()

    def getLJUser(user):
        ''' writes an OPML file `user'.opml for livejournal's FOAF page '''
        convertFOAFToOPML('http://www.livejournal.com/users/%s/data/foaf' % user,
                          user+".opml")

    if __name__ == '__main__':
        # example, when this module is run as a main script
        getLJUser('moshez')

Discussion

RSS feeds have become extremely popular for reading news, blogs, wikis, and so on. OPML is one of the standard file formats used to share subscription lists among RSS fans. This recipe generates an OPML file that can be opened with any RSS reader.
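To see what a feed reader gets out of such a file, here is a minimal Python 3 sketch (not part of the original recipe) that parses an OPML document laid out the way nicksToOPML writes it and lists each feed's text and xmlUrl attributes. It uses only the standard xml.dom.minidom module; the listFeeds name and the sample nicknames and URLs are illustrative assumptions.

```python
from xml.dom import minidom

# Hypothetical sample in the layout nicksToOPML produces; all data is made up.
SAMPLE_OPML = '''<?xml version="1.0" encoding="utf-8"?>
<opml version="1.0">
<head><title>Subscriptions</title></head>
<body><outline title="Subscriptions">
<outline text="alice" htmlUrl="http://example.com/alice/"
         type="rss" xmlUrl="http://example.com/alice/rss.xml"/>
<outline text="bob" htmlUrl="http://example.com/bob/"
         type="rss" xmlUrl="http://example.com/bob/index.rss?a=1&amp;b=2"/>
</outline></body></opml>
'''

def listFeeds(opmlText):
    """Return (text, xmlUrl) pairs for every feed outline in an OPML document."""
    doc = minidom.parseString(opmlText)
    feeds = []
    for outline in doc.getElementsByTagName('outline'):
        # Folder outlines, such as the "Subscriptions" wrapper, carry no xmlUrl.
        if outline.hasAttribute('xmlUrl'):
            feeds.append((outline.getAttribute('text'),
                          outline.getAttribute('xmlUrl')))
    return feeds

for text, url in listFeeds(SAMPLE_OPML):
    print(text, url)
```

Note that the parser hands back attribute values already unescaped, so bob's feed URL comes out with a plain & in it.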
With an OPML file, you can share your favorite subscriptions with anyone you like, publish it to the Web, and so on.

getElements is a convenience function that gets written in almost every XML DOM-processing application. It recursively scans the document, yielding every element that has the given namespace URI and local name. This version of getElements is somewhat quick and dirty, but it is good enough for our purposes.

getNicks is where the heart of the parsing lies. It calls getElements to look for "foaf:knows" nodes; inside each of those, it finds the "foaf:Person" element, then that person's "foaf:nick" element (which contains the contact's LiveJournal nickname) and "foaf:weblog" element (whose rdf:resource attribute holds the blog URL). It then calls getRSS to locate the blog's RSS feed and, being a generator, yields a (nickname, blog URL, RSS URL) triple for each contact whose blog advertises a feed.

Note an important idiom used four times in the body of getNicks:

    name, = some iterable

The key is the comma after name, which turns the left-hand side of this assignment into a one-item tuple, making the assignment into what's technically known as an unpacking assignment. Unpacking assignments are of course very popular in Python (see Recipe 19.4 for a technique to make them even more widely applicable), but normally with at least two names on the left of the assignment, such as:

    aname, another = iterable yielding 2 items

The idiom used in getNicks has exactly the same function, but it demands that the iterable yield exactly one item (otherwise, Python raises a ValueError exception). Therefore, the idiom has the same semantics as:

    _templist = list(some iterable)
    if len(_templist) != 1:
        raise ValueError('wrong number of values to unpack')
    name = _templist[0]
    del _templist

Obviously, the name, = ... idiom is much cleaner and more compact than this equivalent snippet, which is worth keeping in mind for the next time you need to express the same semantics.

nicksToOPML, together with its helper function nickToOPMLFragment, generates the OPML, while docToOPML ties together getNicks and nicksToOPML into a FOAF-to-OPML converter.
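nicksToOPML builds the OPML text by string formatting. As an alternative (not the recipe's approach), the standard xml.etree.ElementTree module can build the same structure as a tree and takes care of escaping characters such as & in attribute values. The following Python 3 sketch assumes the same (nickname, blog URL, RSS URL) triples that getNicks yields; nicksToOPMLTree, writeOPML, and the sample contact are hypothetical names and data.

```python
import io
import xml.etree.ElementTree as ET

def nicksToOPMLTree(nicks):
    """Build an OPML 1.0 tree from (nickname, blog URL, RSS URL) triples.

    Hypothetical alternative to nicksToOPML: ElementTree escapes
    attribute values itself when the tree is serialized.
    """
    opml = ET.Element('opml', version='1.0')
    head = ET.SubElement(opml, 'head')
    ET.SubElement(head, 'title').text = 'Subscriptions'
    body = ET.SubElement(opml, 'body')
    folder = ET.SubElement(body, 'outline', title='Subscriptions')
    for nick, blogLocation, rss in nicks:
        # Keyword arguments become attributes of the new <outline> element.
        ET.SubElement(folder, 'outline', text=nick, htmlUrl=blogLocation,
                      type='rss', xmlUrl=rss)
    return ET.ElementTree(opml)

def writeOPML(tree, fout):
    # fout must be a binary file; an XML declaration naming UTF-8 is emitted.
    tree.write(fout, encoding='utf-8', xml_declaration=True)

contacts = [('alice', 'http://example.com/alice/',
             'http://example.com/alice/rss.xml?a=1&b=2')]
buf = io.BytesIO()
writeOPML(nicksToOPMLTree(contacts), buf)
print(buf.getvalue().decode('utf-8'))
```

To write a real file, open it in binary mode, for example with open(user + '.opml', 'wb') as fout: writeOPML(tree, fout).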
convertFOAFToOPML is the function that actually interacts with the operating system: it accesses the network to get the FOAF page and writes the OPML to a file. The recipe has a specific function, getLJUser(user), to work with LiveJournal (http://www.livejournal.com) friends lists. However, the point is that convertFOAFToOPML itself is general enough to use with FOAF pages from other sites as well. The various helper functions can also come in handy in your own different but related tasks. For example, the getRSS function (with some aid from its helper class LinkGetter) finds and returns the URL of the RSS feed (if one exists) for a given web site, by looking for a <link rel="alternate" type="application/rss+xml"> tag near the top of the site's page.

See Also

About OPML, http://feeds.scripting.com/whatIsOpml; for more on RSS readers, http://blogspace.com/rss/readers; for the FOAF Vocabulary Specification, http://xmlns.com/foaf/0.1/.