Grab and display XML-based RSS news feeds.

Really Simple Syndication (RSS) is an XML-based format for publishing news, blog entries, and other fast-changing information. Thousands of web sites now provide RSS news feeds as an alternative to visiting the actual sites in a browser. An RSS feed reader lets you subscribe to various feeds. The reader periodically (usually no more than once every half hour) grabs the latest RSS file from each subscribed site, then lets you view those feeds. Some RSS feed readers are built into browsers (Firefox), others are integrated into mail clients (Thunderbird), and others are entirely web-based.

Because RSS feeds are simply XML files, they're easy for an Ajax application to digest. This hack shows you how to read an RSS feed from your server, parse the XML data, and format it for the browser.

Handling RSS feeds is not limited to standalone feed readers. You may want to incorporate RSS data into other applications, such as web portals. RSS feeds are now used for a variety of data beyond just news. For example, the U.S. National Weather Service has weather forecasts and warnings available as RSS feeds (go to http://www.weather.gov/data/current_obs/ for a listing of available weather feeds).

The following abridged RSS file illustrates the basic structure of an RSS feed:

    <?xml version='1.0' encoding='utf-8'?>
    <rss version='2.0'
         xmlns:dc='http://purl.org/dc/elements/1.1/'
         xmlns:itunes='http://www.itunes.com/dtds/podcast-1.0.dtd'>
      <channel>
        <title>O'Reilly Media, Inc. New Books</title>
        <link>http://www.oreilly.com/</link>
        <description>O'Reilly's New Books</description>
        <copyright>Copyright 2005, O'Reilly Media, Inc.</copyright>
        <itunes:author>O'Reilly Media, Inc.</itunes:author>
        <itunes:category text='Technology' />
        <itunes:explicit>no</itunes:explicit>
        <language>en-US</language>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs>
        <item>
          <title>C in a Nutshell</title>
          <link>http://www.oreilly.com/catalog/cinanut</link>
          <description><![CDATA[Covering the C programming language and C runtime library, this book. . .]]>
          </description>
          <author>webmaster@oreillynet.com (Tony Crawford, Peter Prinz)</author>
          <dc:date>2005-12-16T22:51:09-08:00</dc:date>
        </item>
        <item>
          <title>Run Your Own Web Server Using Linux &amp; Apache</title>
          <link>http://www.oreilly.com/catalog/0975240226</link>
          <description><![CDATA[Learn to install Linux and Apache 2.0 on a home or office computer for testing and development, and . . .]]>
          </description>
          <author>webmaster@oreillynet.com (Tony Steidler-Dennison)</author>
          <dc:date>2005-12-15T22:52:17-08:00</dc:date>
        </item>
      </channel>
    </rss>

Most RSS feeds contain a single channel element. In RSS files for news and blogs, the channel usually contains multiple items (one for each article).
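To make the channel/item structure concrete, here is a minimal sketch that reads a channel's title and counts its items. It assumes it is handed a parsed RSS Document (for example, the responseXML of an XMLHttpRequest). The helper names element_text and summarize_feed are hypothetical and are not part of this hack's files.

```javascript
// Hypothetical helper: concatenate the text held directly inside an element.
// Handles plain text and CDATA sections, and returns "" for a missing element.
function element_text(el) {
    if (!el) return "";
    var text = "";
    for (var node = el.firstChild; node; node = node.nextSibling) {
        // nodeType 3 is a text node, 4 is a CDATA section
        if (node.nodeType === 3 || node.nodeType === 4) text += node.data;
    }
    return text;
}

// Hypothetical helper: report the title and item count of the first channel.
// Returns null if the document has no channel element.
function summarize_feed(doc) {
    var channel = doc.getElementsByTagName("channel")[0];
    if (!channel) return null;

    // Scan only the channel's direct children for its title, because
    // every item nested inside the channel has a title element too.
    var title = "";
    for (var node = channel.firstChild; node; node = node.nextSibling) {
        if (node.nodeName === "title") {
            title = element_text(node);
            break;
        }
    }

    var items = channel.getElementsByTagName("item");
    return { title: title, itemCount: items.length };
}
```

For the abridged feed above, summarize_feed( ) should report two items under the "O'Reilly Media, Inc. New Books" channel. Scanning direct children is a design choice: it avoids depending on the channel's title appearing before the first item in the file.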
A Simple RSS Reader

For our RSS reader, let's assume you've set up some mechanism to grab fresh RSS files periodically and store them on your server. This can be as simple as setting up a crontab entry on your Linux server (entered as a single line):

    */30 * * * * wget -q -O /var/www/html/feeds/oreilly_new_titles.rss.xml 'http://www.oreillynet.com/pub/feed/29?format=rss2'

Figure 4-24 shows the simple user interface of our RSS reader: a pull-down list to select the RSS feed, and a checkbox to let users select more details for each article displayed.

Figure 4-24. A simple RSS feed reader

Select a news feed, and the matching RSS file is grabbed from the server. The RSS reader extracts information from the file and builds the HTML for the web page, as shown in Figure 4-25.

Figure 4-25. Displaying RSS feed content

Our RSS feed reader is contained in the files rss.html and rss_parse.js (and the ubiquitous JavaScript file xhr.js, which provides a browser-neutral XMLHttpRequest object). The first file, shown here, defines the web page itself:

    <HTML>
    <HEAD>
    <TITLE>O'Reilly RSS Reader</TITLE>
    <script language="javascript" src="xhr.js"></script>
    <script language="javascript" src="rss_parse.js"></script>
    </HEAD>
    <BODY>
    <b>O'Reilly RSS Reader</b><p>
    <form>
      <select id="lbFeeds" onChange="get_rss_feed( );">
        <option value="">SELECT A FEED</option>
        <option value="oreilly_news_articles.rss.xml">
          O'Reilly News and Articles
        </option>
        <option value="oreilly_new_titles.rss.xml">
          O'Reilly New Titles
        </option>
        <option value="oreillynet_articles_blogs.rss.xml">
          O'Reilly Network Articles and Weblogs
        </option>
      </select>
      <input type=checkbox id="cbDetails"
             onClick='format_rss_data ("content", last_xml_response);' > show details
    </form>
    <div id="content">
    </div>
    </BODY>
    </HTML>

The web page references rss_parse.js, which defines the three JavaScript functions needed to implement the RSS reader.

How It Works

A handler is attached to the listbox's onChange event.
When the user selects an item from the list, the get_rss_feed( ) JavaScript function is called:

    <select id="lbFeeds" onChange="get_rss_feed( );">

This function grabs the URL of the selected RSS file from the listbox and passes it to the get_xml_file( ) function, which does the work of retrieving the XML file from the server. Here are both functions:

    //Most recent RSS Document, kept so the "show details" checkbox
    //can reformat it without another request to the server
    var last_xml_response = null;

    function get_xml_file (url) {
        //Precondition: must have a URL
        if (url == "") return;

        var httpreq = getHTTPObject( );
        httpreq.open("GET", url, true);
        httpreq.onreadystatechange = function ( ) {
            //readyState 4 means the request has completed
            if (httpreq.readyState == 4) {
                var content = document.getElementById("content");
                content.innerHTML = "Parsing XML...<br>";
                last_xml_response = httpreq.responseXML;
                format_rss_data ("content", last_xml_response);
            }
        };
        var content = document.getElementById("content");
        content.innerHTML = "Retrieving XML...<br>";
        httpreq.send (null);
    }

    function get_rss_feed ( ) {
        //Get selected RSS feed
        var lbFeeds = document.getElementById("lbFeeds");
        if (lbFeeds.value != "") {
            get_xml_file (lbFeeds.value);
        }
    }
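The onreadystatechange handler above hands the response to format_rss_data( ) as soon as the request completes, whatever the outcome. A stricter check might look like the following sketch. The helper name xml_from_response is hypothetical, and treating only 2xx status codes as success is an assumption that may not suit every setup (pages loaded from the local filesystem, for instance, can report a status of 0).

```javascript
// Hypothetical helper: return the parsed XML Document from a finished
// XMLHttpRequest-like object, or null if the response is not usable.
function xml_from_response(httpreq) {
    // readyState 4 means the request has completed
    if (httpreq.readyState != 4) return null;

    // Assumption: only 2xx HTTP status codes count as success
    if (httpreq.status < 200 || httpreq.status >= 300) return null;

    // responseXML may be null when the body could not be parsed as XML
    var doc = httpreq.responseXML;
    if (!doc || !doc.documentElement) return null;

    return doc;
}

// Possible use inside the onreadystatechange handler:
//   var doc = xml_from_response(httpreq);
//   if (doc) {
//       last_xml_response = doc;
//       format_rss_data("content", doc);
//   }
```

Keeping the checks in one small function means the handler itself only has to decide between formatting the feed and showing an error message.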
The retrieved XML file is stored as a Document object. We pass this object to our third and final function, format_rss_data( ), which examines the Document object and pulls out the items we need. Each news snippet is enclosed in an item element. For our RSS reader, we want to extract three pieces of information from each item: the title, the link to the full article, and a brief description of the article. Here's how it works:

    function format_rss_data (divname, response) {
        //Nothing to format until a feed has been retrieved
        if (!response || !response.documentElement) return;

        var html = "";
        var doc = response.documentElement;
        var items = doc.getElementsByTagName('item');
        //The checkbox state is the same for every item, so look it up once
        var cbDetails = document.getElementById("cbDetails");
        for (var i=0; i < items.length; i++) {
            var title = items[i].getElementsByTagName('title')[0];
            var link = items[i].getElementsByTagName('link')[0];
            html += "<b><a href='"
                 + link.firstChild.data
                 + "'>"
                 + title.firstChild.data
                 + "</a></b><br>";
            if (cbDetails.checked) {
                var desc = items[i].getElementsByTagName('description')[0];
                html += "<font size='-1'>"
                     + desc.firstChild.data
                     + "</font><p>";
            }
        }
        var target_div = document.getElementById(divname);
        target_div.innerHTML = html;
    }

The format_rss_data( ) function uses a for loop to iterate over each item element in the RSS Document object. Using the getElementsByTagName( ) method, it extracts the title, link, and description information and builds the HTML displayed on the web page.

Because get_xml_file( ) saves the most recent Document object in the last_xml_response variable, you can reformat the current RSS data with another call to format_rss_data( ) when the user checks (or unchecks) the "show details" checkbox, without another request to the server. Figure 4-26 shows the page with "show details" unchecked. In this view, the descriptions are hidden, and the user is presented with a simple list of article links.

Figure 4-26.
The RSS reader with descriptions hidden

Hacking the Hack

This hack doesn't display all the information for each article: author and date information is omitted, and no general channel information is displayed. If you want to use this hack as a generic way to include feed information in web pages, you need to expand format_rss_data( ) to (at least) display the channel title.

Having the RSS feeds hardcoded into the listbox isn't very flexible, either. You can maintain a list of RSS feeds on your server (as an XML file, perhaps), but even this may be unwieldy if you monitor hundreds of feeds. You might consider using a "categories" listbox that populates the "feeds" listbox instead.

Mark Pruett
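As a rough illustration of the "categories" listbox idea from Hacking the Hack, the sketch below builds the option markup for one category's feeds. Everything here is an assumption made for illustration: the feed_catalog object, its category names ("Books" and "Articles"), and the helper names escape_html and feed_options_html are hypothetical. Only the three feed filenames and their labels come from the hack's listbox.

```javascript
// Hypothetical catalog: category name -> feeds in that category.
// The category names are invented for this example.
var feed_catalog = {
    "Books": [
        { label: "O'Reilly New Titles", file: "oreilly_new_titles.rss.xml" }
    ],
    "Articles": [
        { label: "O'Reilly News and Articles", file: "oreilly_news_articles.rss.xml" },
        { label: "O'Reilly Network Articles and Weblogs", file: "oreillynet_articles_blogs.rss.xml" }
    ]
};

// Escape characters that are special in HTML text and quoted attributes
function escape_html(s) {
    return String(s)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Build the <option> markup for the feeds in one category.
// An unknown category yields only the placeholder option.
function feed_options_html(catalog, category) {
    var known = Object.prototype.hasOwnProperty.call(catalog, category);
    var feeds = known ? catalog[category] : [];
    var html = '<option value="">SELECT A FEED</option>';
    for (var i = 0; i < feeds.length; i++) {
        html += '<option value="' + escape_html(feeds[i].file) + '">'
             + escape_html(feeds[i].label)
             + '</option>';
    }
    return html;
}
```

One way to use it would be to call feed_options_html( ) from an onChange handler on the categories listbox and place the result into the feeds listbox. The catalog could equally be loaded from a server-side XML file rather than hardcoded as it is here.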