Why Do You Need to Produce Feeds? | Professional Web APIs with PHP. eBay, Google, PayPal, Amazon, FedEx, Plus Web Feeds

Feeds have several advantages, primarily related to consumption, over traditional HTML formats. Many desktop applications are devoted to reading feeds at regular intervals, and many of the newer web browsers include features for reading feeds. These free the user from manually checking various sources (websites) for new information. Instead, the automated tool checks the subscribed feeds every few minutes and presents new items to the user (usually organized in a user-configurable manner). The standard and predictable format makes this a much easier task than traditional page-scraping methods that parse HTML. Feed aggregators are also gaining popularity on the Web, with sites such as Planet PHP (www.planet-php.org/) and Feedster (www.feedster.com/). Finally, popular news sites (Google News, for example) compile the feeds of various news outlets to provide a single source of current news.

By simply updating a corporate news/public relations page to provide a web feed, a company suddenly finds new outlets for its information, and by adding feeds to your own pages, you can make it easy and convenient for your audience to keep up to date with your content.

Additional Considerations When Producing a Feed

Once you discover how easy it is to produce a web feed and how easy the plethora of feed readers out there makes it for your users to consume your feed, you may have some inner drive to produce feeds for everything. Don't.

First, as with any project, consider how useful the feed will be to outside users, what new information it will provide, and how it will be used. If this is a business site, consider how the feed will help achieve your corporate goals. Sure, it would be cool to have a feed that reported the current weather in your office's city, but A) chances are you have windows (the look-outside kind, not the operating system kind), B) a variety of sites already provide exactly this service, and C) it is doubtful that this is worth devoting resources toward.

Second, consider the load requirements of generating the data. Remember that feeds are usually consumed by software automatically, at preset intervals. Many aggregators default to short intervals such as half an hour. Users who may have visited a given page once per day are now downloading the feed up to 48 times per day! Multiply that by the number of potential users, and you have a lot of additional traffic, and, unless you play your caching cards right, a lot of load on the server. Some of this, however, is offset by the smaller document size overall. For example, visiting http://slashdot.org/ involves a total of 23 HTTP requests, and a total size (images and all) of 21,819 bytes, whereas visiting http://slashdot.org/index.rss involves only one HTTP request, and only 4,515 bytes, a significant savings per request.
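The arithmetic above can be sketched in a few lines of PHP. The function names and the audience of 1,000 subscribers are illustrative assumptions; the byte counts are the Slashdot measurements quoted above.

```php
<?php
// Rough load estimate for a polled feed; all names here are illustrative.

// Number of times one aggregator fetches the feed per day.
function pollsPerDay(int $intervalMinutes): int
{
    return intdiv(24 * 60, $intervalMinutes);
}

// Total bytes served per day for a given audience size.
function dailyBytes(int $users, int $requestsPerUser, int $bytesPerRequest): int
{
    return $users * $requestsPerUser * $bytesPerRequest;
}

$users = 1000;                                  // hypothetical audience size
$polls = pollsPerDay(30);                       // half-hour polling interval

$htmlBytes = dailyBytes($users, 1, 21819);      // one full page view per day
$feedBytes = dailyBytes($users, $polls, 4515);  // feed polled all day

printf("Polls per user per day: %d\n", $polls);
printf("HTML page: %d bytes/day, feed: %d bytes/day\n", $htmlBytes, $feedBytes);
```

With these numbers the feed is about a fifth of the size per request, yet it ends up moving roughly ten times as many bytes per day, which is why caching matters.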

Finally, do consider the usefulness of this feed to your competitors. Placing all of your current pricing information or weekly production schedules in a feed may seem like a great boon internally (especially to upper management), but consider the repercussions if that URL becomes known to your competitors! Because feeds are generally consumed by software automatically, it can be more difficult to secure access. I strongly recommend against providing confidential data in a feed without first undergoing a strenuous security audit and seriously considering the alternatives.

Publicizing Your Feed

As with any web project, your work is worthless unless you publicize it to your target audience. This is becoming easier and easier with some recent developments in the browser world. Mozilla in particular has made this quite easy in recent releases. When the browser sees the appropriate alternate link code in a document header, it presents an RSS icon in the lower-right corner of the browser window to inform the user, who may then create a "Live Bookmark" to monitor the feed. For example:

    <LINK REL="alternate" TITLE="Slashdot RSS" HREF="//slashdot.org/index.rss" TYPE="application/rss+xml">
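If the page header is produced by PHP, the link element can be generated rather than typed by hand. The following is a minimal sketch; feedLinkTag() is a hypothetical helper, and the second URL is a placeholder.

```php
<?php
// Build an autodiscovery <link> element for a page header.
// feedLinkTag() is a hypothetical helper, not part of any library.
function feedLinkTag(string $title, string $href, string $type = 'application/rss+xml'): string
{
    return sprintf(
        '<link rel="alternate" type="%s" title="%s" href="%s">',
        htmlspecialchars($type, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($title, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($href, ENT_QUOTES, 'UTF-8')
    );
}

echo feedLinkTag('Slashdot RSS', 'http://slashdot.org/index.rss'), "\n";
echo feedLinkTag('Example Atom feed', 'http://www.example.com/atom.xml', 'application/atom+xml'), "\n";
```

Escaping the attribute values keeps a title containing quotes or ampersands from breaking the page markup.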

However, publicizing your feed involves at least one more step. Create at least one page for your site that lists the URLs for the various feeds you will be providing, as well as what format the feeds will follow (RSS, Atom, and so on). You should also consider offering more detailed information about what type of content should be expected within the feed itself — what HTML tags should be expected, maximum length, and so on. This is important, because responsible, security-conscious users of your feed will need to filter the information they receive, and can make a much more intelligent decision about how to do that if you let them know what to expect. The topic of filtering feeds is covered in greater detail in the previous chapter.

Standard Feed Formats

You can slap some XML together, update the document regularly, and then call it a feed, but it would be of limited use. Luckily (or not), you can choose from a wide range of standard feed formats, each supported by a wide range of aggregators. I have included the specifications for the main feed formats in Appendix B as well as the URLs where current versions of the specifications can be found. You can also find some code that generates dates in the required format in Appendix A.

RSS

One of the most popular formats for syndicating information is RSS. There are currently three versions: 0.91, 1.0, and 2.0. RSS 2.0 is backward compatible with 0.91, so a feed created with the 0.91 specification is also readable by an aggregator written with 2.0 in mind. Most aggregators, whether stand-alone client or web-based, can support all three versions, so it really comes down to what you and your users need. As with any project, examine the specific capabilities of your users before making any decisions. For example, if this is for a corporate intranet, examine the software and versions common throughout the organization. Frequently, Information Technology departments run a version or two ahead of the rest of the organization, so the target market for your feed must be closely examined.

Note

Current work on refining and expanding the RSS specification is outlined at www.rssboard.org/. This website provides the specifications for past and present versions of the RSS specification as well as useful discussion about various features.

The following sections introduce the three versions of the standard, explaining what tags are available in each, then go through some examples using different versions of the standard. For the basis of explanation, producing a feed for a news site is discussed. This is merely for simplicity because the paradigms match closely.

Note

For all three revisions of the RSS schema, the height and width elements for an image are considered optional. However, note that the assumed values are a width of 88 px and a height of 31 px (this is approximately the size of those "web buttons" you see on various sites promoting something).

RSS 0.91

RSS 0.91 is commonly used by many blogs and news sites. It allows basic textual information syndication, without too many required tags. For a basic news site, this allows a base channel tag to describe the site (name, description, language, webmaster, copyright, logo, and so on), then an item tag for each individual story containing further information (title, link, description). The link in each item should be to the page containing the story itself, not the root page for the site.

    <?xml version="1.0"?>
    <!DOCTYPE rss SYSTEM "http://my.netscape.com/publish/formats/rss-0.91.dtd">
    <rss version="0.91">
      <channel>
        <language>en</language>
        <title>Example News.com</title>
        <description>News and commentary from the cross-platform scripting community.</description>
        <link>http://www.wroxnews.com/</link>
        <image>
          <title>Example News.com</title>
          <link>http://www.wroxnews.com/</link>
          <url>http://www.wroxnews.com/pics/button.png</url>
        </image>
        <item>
          <title>Feed usage up 1000%!</title>
          <link>http://www.wroxnews.com/story2/index.html</link>
          <description>Feed usage has increased a dramatic 1000% in the past 48hrs alone, this may...</description>
        </item>
        <item>
          <title>RSS Feed Goes Live!</title>
          <link>http://wroxnews.org/story1/index.html</link>
          <description>Wroxnews.com is proud to announce the launch of our premier RSS service...</description>
        </item>
      </channel>
    </rss>

RSS 1.0

The 1.0 version of the specification carries much the same information, but reworks the document structure to comply with RDF: the root element is rdf:RDF rather than rss, and the elements live in XML namespaces. The restrictions it imposes aren't a big deal for a simple feed, but a 1.0 document is not interchangeable with a 0.91 or 2.0 one.

RSS 2.0

This is where the real fun begins. The biggest change you will notice is the introduction of many much-needed tags in the item element, including enclosure (to attach a file to the specific item), author, category, comments, pubDate, and so on. These additions will probably make RSS 2.0 the schema of choice if it is available to your audience.

This example restates the feed presented for 0.91 as an RSS 2.0 document. The elements added in 2.0 are optional, so the same channel and items carry over with little change. Because the information permitted in RSS 0.91 is a subset of 2.0, it is a good idea to provide both 0.91 and 2.0 feeds if you decide to go with 2.0.

    <rss version="2.0">
      <channel>
        <language>en</language>
        <title>Example News.com</title>
        <description>News and commentary from the cross-platform scripting community.</description>
        <link>http://www.wroxnews.com/</link>
        <image>
          <title>Example News.com</title>
          <link>http://www.wroxnews.com/</link>
          <url>http://www.wroxnews.com/pics/button.png</url>
        </image>
        <item>
          <title>Feed usage up 1000%!</title>
          <link>http://www.wroxnews.com/story2/index.html</link>
          <description>Feed usage has increased a dramatic 1000% in the past 48hrs alone, this may...</description>
        </item>
        <item>
          <title>RSS Feed Goes Live!</title>
          <link>http://wroxnews.org/story1/index.html</link>
          <description>Wroxnews.com is proud to announce the launch of our premier RSS service...</description>
        </item>
      </channel>
    </rss>
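To give a feel for the item-level elements that 2.0 introduces (author, category, comments, pubDate, and enclosure), here is a minimal PHP sketch that renders one fleshed-out item. The rssItem() function, the example.com URLs, the author address, and the enclosure details are all made up for illustration.

```php
<?php
// Render one RSS 2.0 <item> using the elements introduced in 2.0.
// rssItem() and the sample values are hypothetical.
function rssItem(array $item): string
{
    // Escape text so characters such as & and < stay well-formed XML.
    $esc = function (string $value): string {
        return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    $xml  = "<item>\n";
    $xml .= '  <title>' . $esc($item['title']) . "</title>\n";
    $xml .= '  <link>' . $esc($item['link']) . "</link>\n";
    $xml .= '  <description>' . $esc($item['description']) . "</description>\n";
    $xml .= '  <author>' . $esc($item['author']) . "</author>\n";
    $xml .= '  <category>' . $esc($item['category']) . "</category>\n";
    $xml .= '  <comments>' . $esc($item['comments']) . "</comments>\n";
    // pubDate uses the RFC 822 style date format; always GMT here.
    $xml .= '  <pubDate>' . gmdate('D, d M Y H:i:s', $item['timestamp']) . " GMT</pubDate>\n";
    // enclosure carries its data in attributes: url, length (bytes), type.
    $xml .= sprintf(
        "  <enclosure url=\"%s\" length=\"%d\" type=\"%s\"/>\n",
        $esc($item['enclosure']['url']),
        $item['enclosure']['length'],
        $esc($item['enclosure']['type'])
    );
    $xml .= "</item>\n";
    return $xml;
}

$sample = [
    'title'       => 'Feed usage up 1000%!',
    'link'        => 'http://www.wroxnews.com/story2/index.html',
    'description' => 'Feed usage has increased a dramatic 1000% in the past 48hrs alone, this may...',
    'author'      => 'editor@example.com (Example Editor)',
    'category'    => 'Announcements',
    'comments'    => 'http://www.example.com/story2/comments.html',
    'timestamp'   => 1099427295,
    'enclosure'   => [
        'url'    => 'http://www.example.com/audio/story2.mp3',
        'length' => 1234567,
        'type'   => 'audio/mpeg',
    ],
];

$xml = rssItem($sample);
echo $xml;
```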

Expanding RSS

RSS provides a mechanism for adding tags to an RSS document. This can be done by adding additional XML namespaces to the document in the rss tag. The syntax is relatively simple. For example:

 <rss version="2.0" xmlns:cc="http://backend.userland.com/creativeCommonsRssModule">

This defines the document as being RSS 2.0, adds an additional namespace (cc) to the document, and indicates where to find more information about the namespace. In this case, the cc namespace is for the Creative Commons license. Adding this declaration allows you to include several cc tags that define the copyright restrictions (or lack thereof) placed on the document. To use a tag from the cc namespace, just prefix the tag name with cc: in both the opening and the closing tag.

For example:

    <item>
      <title>RSS Feed Goes Live!</title>
      <link>http://wroxnews.org/story1/index.html</link>
      <description>Wroxnews.com is proud to announce the launch of our premier RSS service...</description>
      <cc:license>http://www.creativecommons.org/licenses/by-nc/1.0</cc:license>
    </item>

This extensibility allows feed creators to include any desired information with their feed in a recognizable way. It may be tempting to define namespaces as required to fit your data, but don't. Chances are good (especially by the time this book goes to print) that there is already one defined way, if not several, to get your information across. It may not be quite the format you were hoping for (possibly a different date format, or different nesting rules), but it is worth the effort. The Internet really doesn't need more incompatible standards with similar data.

Note

You will often see namespaces referencing a URL beginning with http://purl.org/. This is not a central repository of XML namespaces; instead, it provides a Permanent URL for a specific resource located elsewhere.

Atom

Atom was created to resolve some ambiguities within the RSS specification (RSS 2.0 was declared to be the final version, so these issues can't be resolved by releasing new versions of RSS). This section covers Atom 0.3, which was the latest version at the time of writing.

    <?xml version="1.0" encoding="utf-8"?>
    <feed version="0.3" xmlns="http://purl.org/atom/ns#">
      <title>dive into mark</title>
      <link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
      <modified>2003-12-13T18:30:02Z</modified>
      <author>
        <name>Mark Pilgrim</name>
      </author>
      <entry>
        <title>Atom 0.3 snapshot</title>
        <link rel="alternate" type="text/html"
         href="http://diveintomark.org/2003/12/13/atom03"/>
        <id>tag:diveintomark.org,2003:3.2397</id>
        <issued>2003-12-13T08:29:29-04:00</issued>
        <modified>2003-12-13T18:30:02Z</modified>
      </entry>
    </feed>

As you can see, while the Atom specification may use different terms for several of the elements, there really isn't too large a difference between the specifications. Personally, I appreciate the addition of the modified tag, to remove some of the ambiguity associated with posts that were later updated. That being said, I prefer the date format used in the RSS feeds. Beyond that, which feed formats you decide to produce is entirely dependent on what your user base desires/is able to interpret. When unsure, produce both. The full Atom spec for versions 0.1–0.3 is available in Appendix B.

Note

The Atom 0.3 specification calls for the Content-Type header to be set to application/atom+xml. However, while I am debugging feeds with a browser, I find it convenient to set the Content-Type header to text/xml instead. Most browsers display text/xml inline, whereas for application/atom+xml they will attempt to locate an external registered handler.
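One way to switch between the two headers without editing the script each time is sketched below. The feedContentType() helper and the debug query parameter are assumptions made for this example, not part of any specification.

```php
<?php
// Pick the Content-Type for a feed response. feedContentType() and the
// debug flag are hypothetical; text/xml is only meant for browser debugging.
function feedContentType(string $format, bool $debug = false): string
{
    if ($debug) {
        return 'text/xml';
    }
    $types = [
        'atom' => 'application/atom+xml',
        'rss'  => 'application/rss+xml',
    ];
    return $types[$format] ?? 'text/xml';
}

// In a feed script this runs before any output is sent; the ?debug=1
// query parameter is an assumed convention.
$debug = isset($_GET['debug']);
if (!headers_sent()) {
    header('Content-Type: ' . feedContentType('atom', $debug));
}
```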

Testing Your Feed

It is probably safe to assume that most web programmers have interacted with some semblance of an HTML validator (because you always write valid HTML, right?). There are similar sites devoted to testing feeds in the common formats. It is a good idea to run your feed through one of them for testing. You can access one at http://feedvalidator.org/. It is capable of checking both RSS and Atom feeds.

Producing a Feed with Basic Content

The first stage in producing a feed is choosing a format to provide the feed in. For the examples in this book, RSS 2.0 is used. Figure out what you need from your feed (do you need to attach files, what can your users support, and so on), look at the options, and pick one. From a production standpoint, it doesn't really matter that much; you are still just producing XML in a basic repeating format.

    <?php
    // Must be sent before any other output.
    header("Content-Type: text/xml");
    ?><rss version="2.0">
      <channel>
        <language>en</language>
        <title>Easy Recipes</title>
        <description>Easy recipes for the computer hacker/culinary slacker.</description>
        <link>http://example.preinheimer.com/feed1.php</link>
        <image>
          <title>Easy Recipes</title>
          <link>http://example.preinheimer.com/feed1.php</link>
          <url>http://example.preinheimer.com/feed1.png</url>
        </image><?php
        include("./common_db.php");
        // Let MySQL format the date in the RFC 822 style that RSS expects.
        $query = "SELECT title, link, description, author, category,
          DATE_FORMAT(pubdate,'%a, %d %b %Y %T EST') as pubdate
          FROM 11_basic_feed";
        $recipes = getAssoc($query);
        foreach($recipes AS $item)
        {
          // Escape each value so characters such as & and < cannot break the XML.
          $item = array_map('htmlspecialchars', $item);
          echo "<item>\n";
          echo "<title>{$item['title']}</title>\n";
          echo "<link>{$item['link']}</link>\n";
          echo "<description>{$item['description']}</description>\n";
          echo "<author>{$item['author']}</author>\n";
          echo "<category>{$item['category']}</category>\n";
          // Element names are case sensitive: pubDate, not pubdate.
          echo "<pubDate>{$item['pubdate']}</pubDate>\n";
          echo "</item>\n";
        }
    ?></channel>
    </rss>

As you can see, producing the feed is pretty simple. There are several places where the code could be trimmed down, but it is left as is for readability. Note that header("Content-Type: text/xml"); is critical; you need to call it before you send any other output. You will also notice that there is no white space between the opening and closing PHP tags (<?php ?>) and the XML tags next to them. This is because XML does not ignore white space (unlike HTML).

One final element I would like to bring to your attention is the SQL query, namely DATE_FORMAT(pubdate,'%a, %d %b %Y %T EST') as pubdate. This instructs MySQL to give the date in the RFC 822 format, rather than the MySQL internal timestamp format. Whenever you have the option of either formatting information in PHP or having your database engine do it for you, I would recommend the latter. It is likely more efficient (it is, after all, working with one of its internal formats), and avoids further processing with either a loop or array_walk.

Also note that as per the format, the <author> tag must contain an email address. Many individuals I have spoken to consider this to be one of the weak points in the spec, because the format is very easily scraped by automated spiders looking to spam the world. Because the spec requires a syntactically valid email address, normal tricks like paul (at) example (dot) org and the like won't work. Since the author tag is optional, you may prefer to leave it out and carry author information in a namespaced element instead.
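Values pulled from a database can contain characters such as & and <, which must be escaped before they are placed inside XML elements. If you do end up formatting the date in PHP instead of in the database, that is also only a line or two. The following is a minimal sketch; xmlText() and rssDate() are hypothetical helper names, and the date is emitted in GMT rather than EST.

```php
<?php
// Escape a value for use as XML element content (hypothetical helper).
function xmlText(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Format a Unix timestamp as an RSS pubDate in GMT (hypothetical helper).
function rssDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

echo '<title>' . xmlText('Mac & Cheese <quick>') . "</title>\n";
echo '<pubDate>' . rssDate(1099427295) . "</pubDate>\n";
```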

The output of the script is as follows:

    <rss version="2.0">
      <channel>
        <language>en</language>
        <title>Easy Recipes</title>
        <description>Easy recipes for the computer hacker/culinary slacker.</description>
        <link>http://example.preinheimer.com/feed1.php</link>
        <image>
          <title>Easy Recipes</title>
          <link>http://example.preinheimer.com/feed1.php</link>
          <url>http://example.preinheimer.com/feed1.png</url>
        </image><item>
    <title>Waffles</title>
    <link>http://example.preinheimer.com/feed1.php?item=2</link>
    <description>1. Take box out of freezer 2. Remove two waffles from box 3. Place waffles in toaster 4. Depress button 5. Wait for toaster to pop 6. Remove waffles from toaster, place on plate 7. Pour Canadian Maple Syrup on waffles 8. Enjoy</description>
    <author>Paul</author>
    <category>Breakfast</category>
    <pubDate>Tue, 02 Nov 2004 20:28:15 EST</pubDate>
    </item>
    <item>
    <title>Chocolate Chip Cookies</title>
    <link>http://example.preinheimer.com/feed1.php?item=1</link>
    <description>1. Take the tube out of the fridge 2. Place the cookie sheet on the counter 3. Cut the tube open 4. Slice the cookie batter into 12 equally sized pieces 5. Place each slice on the cookie sheet 6. Preheat the oven to 350F 7. Place cookie sheet on center rack 8. Wait 20 minutes 9. Remove cookies from oven 10. Burn fingers and enjoy</description>
    <author>Paul</author>
    <category>Baking</category>
    <pubDate>Tue, 02 Nov 2004 20:28:15 EST</pubDate>
    </item>
    </channel>
    </rss>

Note that you do not see the results of the Content-Type header tag; this is an HTTP header that is interpreted by the browser, and (unless you have something like Live Headers in Mozilla Firefox enabled) not shown to the end user.

The database used to provide this example is trivial, but for completeness' sake, here it is (just a little reminder: all of this code is online at www.wrox.com for download, so don't waste your time typing this in manually):

    CREATE TABLE `11_basic_feed` (
      `id` int(11) NOT NULL auto_increment,
      `title` varchar(100) NOT NULL default '',
      `link` varchar(100) NOT NULL default '',
      `description` text NOT NULL,
      `author` varchar(50) NOT NULL default '',
      `category` varchar(25) NOT NULL default '',
      `pubdate` timestamp NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=MyISAM;
    INSERT INTO `11_basic_feed` VALUES (1, 'Chocolate Chip Cookies', 'http://example.preinheimer.com/feed1.php?item=1', '1. Take the tube out of the fridge\r\n2. Place the cookie sheet on the counter\r\n3. Cut the tube open\r\n4. Slice the cookie batter into 12 equally sized pieces\r\n5. Place each slice on the cookie sheet\r\n6. Preheat the oven to 350F\r\n7. Place cookie sheet on center rack\r\n8. Wait 20 minutes\r\n9. Remove cookies from oven\r\n10. Burn fingers and enjoy', 'Paul', 'Baking', '20041102202815');
    INSERT INTO `11_basic_feed` VALUES (2, 'Waffles', 'http://example.preinheimer.com/feed1.php?item=2', '1. Take box out of freezer\r\n2. Remove two waffles from box\r\n3. Place waffles in toaster\r\n4. Depress button\r\n5. Wait for toaster to pop\r\n6. Remove waffles from toaster, place on plate\r\n7. Pour Canadian Maple Syrup on waffles\r\n8. Enjoy', 'Paul', 'Breakfast', '20041102202815');

Caching Your Feed

As mentioned earlier, feeds are loaded frequently, so caching is a really good idea. Just as an example, I update my blog about five times a week; several people appear to have their feed aggregators (mis)configured to grab the feed every 5 minutes. There are two main ways to go about caching. You can either take care of caching on the side of the feed script itself, or you can move the feed generation logic to the script that allows users/administrators to add data to the feed. The latter is more efficient, but also more restrictive. These methods are called feed-side caching and generation-side caching, respectively.

Feed-Side Caching

The essence of feed-side caching is for the script being called to generate the feed to handle caching on its own. This has several advantages. First, it fits with the paradigm most of us are comfortable with: The user visits a page, and the page generates and returns the appropriate result. Second, it allows for greater flexibility in terms of how the feed itself is updated. The output of these cached pages is identical. During development and thereafter you may want to consider adding an XML comment to your feed to mark whether it was a cached copy (and if so, the time at which it was cached) or freshly generated.
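The XML comment suggested above could be produced along these lines. This is a minimal sketch; feedMarker() is a hypothetical helper, and the comment is meant to be appended after the closing rss tag.

```php
<?php
// Build an XML comment saying whether the feed came from the cache.
// feedMarker() is a hypothetical helper; a comment after the root element
// is legal XML and is not part of the feed content.
function feedMarker(?int $cachedAt): string
{
    if ($cachedAt === null) {
        return "<!-- freshly generated -->\n";
    }
    return '<!-- cached copy from ' . gmdate('Y-m-d H:i:s', $cachedAt) . " GMT -->\n";
}

echo feedMarker(null);
echo feedMarker(1099427295);
```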

To update the earlier simple feed script to allow for feed-side caching, you just need to add a few function calls:

    <?php
    header("Content-Type: text/xml");
    include("./feedcache.php");
    if (cacheIsRecent())
    {
      // Serve the stored copy without touching the main database.
      echo getCache();
    }else
    {
      // Capture the freshly generated feed so it can be stored.
      ob_start();
    ?><rss version="2.0">
      <channel>
        <language>en</language>
        <title>Easy Recipes</title>
        <description>Easy recipes for the computer hacker/culinary slacker.</description>
        <link>http://example.preinheimer.com/feed1.php</link>
        <image>
          <title>Easy Recipes</title>
          <link>http://example.preinheimer.com/feed1.php</link>
          <url>http://example.preinheimer.com/feed1.png</url>
        </image><?php
        include("./common_db.php");
        $query = "SELECT title, link, description, author, category,
          DATE_FORMAT(pubdate,'%a, %d %b %Y %T EST') as pubdate
          FROM 11_basic_feed";
        $recipes = getAssoc($query);
        foreach($recipes AS $item)
        {
          // Escape each value so characters such as & and < cannot break the XML.
          $item = array_map('htmlspecialchars', $item);
          echo "<item>\n";
          echo "<title>{$item['title']}</title>\n";
          echo "<link>{$item['link']}</link>\n";
          echo "<description>{$item['description']}</description>\n";
          echo "<author>{$item['author']}</author>\n";
          echo "<category>{$item['category']}</category>\n";
          echo "<pubDate>{$item['pubdate']}</pubDate>\n";
          echo "</item>\n";
        }
        echo "</channel>\n</rss>";
      // Store the buffered feed, then send it to the client.
      updateCache(ob_get_contents());
      ob_end_flush();
    }
    ?>

The use of a few function calls to manage the cache allows you to use the same basic file for all the feed-side cache examples.

Timed Cache Release

This method checks for the existence of and the timestamp on the cached data every run. If it exists, and was written less than a specified number of minutes ago, it is sent to the users. If the cache doesn't exist, or is too old, the script runs as earlier and updates the cache. Using a flat file can be tempting because of its simplicity; however, you will likely quickly encounter race conditions where more than one invocation of your script begins trying to update the cache simultaneously. You could solve this problem with file locks and timeouts; however, it is probably easier to use a tool explicitly designed for the purpose, a database. SQLite is a perfect choice, lightweight and fast.

Creating the SQLite database, and table:

    <?php
    // Create the SQLite database file and the single-table cache.
    $db = new SQLiteDatabase("/tmp/11.timedcache.sqlite");
    $db->query("BEGIN;
      CREATE TABLE timedCache(id INTEGER PRIMARY KEY, cache BLOB, tstamp TEXT);
      COMMIT;");
    ?>

As you can see, you don't need the table to hold much, just an ID to serve as a primary key, the cache itself, and the timestamp for when it was created.

Checking to see if the cache is recent:

    function cacheIsRecent()
    {
      $db = sqlite_open("/tmp/11.timedcache.sqlite");
      $query = "SELECT tstamp FROM timedCache WHERE id = 1";
      $result = sqlite_query($db, $query);
      $row = sqlite_fetch_array($result);
      // Stale if the cache is more than 10 minutes old (or missing).
      if (time() - $row['tstamp'] > (60 * 10))
      {
        return false;
      }else
      {
        return true;
      }
    }

Loading data into the cache:

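    function updateCache($body)
    {
      $db = new SQLiteDatabase("/tmp/11.timedcache.sqlite");
      $time = time();
      // Escape the feed body; a single quote in it would otherwise break the query.
      $body = sqlite_escape_string($body);
      $query = "REPLACE INTO timedCache (id, cache, tstamp) VALUES (1, '$body', '$time')";
      $db->query("BEGIN; $query; COMMIT;");
    }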

Retrieving data from the cache:

    function getCache()
    {
      $db = sqlite_open("/tmp/11.timedcache.sqlite");
      $query = "SELECT cache FROM timedCache WHERE id = 1";
      $result = sqlite_query($db, $query);
      $row = sqlite_fetch_array($result);
      return $row['cache'];
    }

This method can definitely be improved. It uses two queries against the same record when the cache is old. Moving the checking and retrieval logic together (return the row if the cache is current, or null if it is not, for example) and working with that would see some savings.
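The combined check-and-retrieve idea could look something like the sketch below. It is written against PDO's SQLite driver rather than the sqlite_* functions used above, stores the timestamp as an INTEGER, and uses an in-memory database for the demonstration; those substitutions and the getFreshCache() name are assumptions of this example.

```php
<?php
// Return the cached feed if it is younger than $maxAge seconds, else null.
// One query does both the freshness check and the retrieval.
// getFreshCache() is a hypothetical name; PDO stands in for sqlite_*.
function getFreshCache(PDO $db, int $maxAge, ?int $now = null): ?string
{
    $now = $now ?? time();
    $stmt = $db->prepare('SELECT cache FROM timedCache WHERE id = 1 AND tstamp > :cutoff');
    $stmt->bindValue(':cutoff', $now - $maxAge, PDO::PARAM_INT);
    $stmt->execute();
    $cache = $stmt->fetchColumn();
    return $cache === false ? null : $cache;
}

// Demonstration with an in-memory database and a fixed clock.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE timedCache (id INTEGER PRIMARY KEY, cache TEXT, tstamp INTEGER)');
$db->exec("INSERT INTO timedCache (id, cache, tstamp) VALUES (1, '<rss/>', 1000)");

var_dump(getFreshCache($db, 600, 1300));  // cache is 300 seconds old
var_dump(getFreshCache($db, 600, 2000));  // cache is 1000 seconds old
```

A feed script would call getFreshCache() once, echo the result if it is not null, and otherwise fall through to regenerating the feed.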

It should also be noted that during benchmarking, this method of caching actually performed worse than no caching at all! You will likely see similar results if you benchmark against such a contrived dataset (two records in one table designed only to hold feed information). In real-world cases, the query probably spans at least two tables and returns many rows. This leads into a route to performance enhancement not covered here: changing your database structure. Keeping all of the applicable fields in one table would certainly show a performance improvement over cross-table queries. If you are using a database package that supports triggers, these can be used to automate the process (also beyond the scope of this book).

Update Cache at Specified Intervals

If your feed receives sufficient traffic, you can realize significant savings over the previous example by working with the current time, and ignoring any cached data until you need it. Note that under this specific example all requests occurring over the one-second target will refresh the cache:

    function cacheIsRecentTimeOfDay()
    {
      // Refresh only during the one second at each 10-minute boundary.
      if ((time() % (10 * 60)) == 0)
      {
        return false;
      }else
      {
        return true;
      }
    }
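To see which requests would trigger a refresh, the same test can be written with the clock passed in as a parameter. This is only a sketch; isRefreshSecond() is a hypothetical name, and the timestamps are arbitrary.

```php
<?php
// Same test as cacheIsRecentTimeOfDay(), with the clock passed in so the
// behaviour can be checked. isRefreshSecond() is a hypothetical name.
function isRefreshSecond(int $now, int $intervalSeconds = 600): bool
{
    return $now % $intervalSeconds === 0;
}

$start = 1099427400;  // divisible by 600, so a refresh second
foreach ([$start - 1, $start, $start + 1, $start + 600] as $t) {
    printf("%d => %s\n", $t, isRefreshSecond($t) ? 'refresh' : 'serve cache');
}
```

Only requests that land exactly on the boundary second refresh the cache. If no request arrives during that second, the cache stays as it is until the next boundary, which is why this approach only suits feeds with sufficient traffic.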

Update Cache Randomly

The entire point of a cache is to be fast. Any solution involving database queries to determine if the cache is old will lose a few cycles to perform that check. Doing some quick math based on the number of hits your feed receives per day, you can use random numbers to update the cache, on average, throughout the day. Say the feed receives 10,000 hits per day. With 86,400 seconds in a day, that means that your feed receives a hit every 8.64 seconds (assuming queries are evenly distributed throughout the day, which they aren't). If you want the cache to be updated every 10 minutes or so, that means that you should update the cache approximately once for every 69 hits (600 seconds divided by 8.64).

    function cacheIsRecentRandom()
    {
      // One value out of 69 triggers a refresh.
      if (rand(1, 69) == 42)
      {
        return false;
      }else
      {
        return true;
      }
    }
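Working the numbers through for your own traffic is straightforward: 86,400 seconds divided by 10,000 hits is 8.64 seconds per hit, and 600 seconds divided by 8.64 is roughly 69 hits per refresh. The sketch below does the same calculation; hitsPerRefresh() and shouldRefresh() are hypothetical helpers, and evenly spread traffic is assumed.

```php
<?php
// Work out how many hits should pass, on average, between cache refreshes.
// hitsPerRefresh() is a hypothetical helper; it assumes evenly spread traffic.
function hitsPerRefresh(int $hitsPerDay, int $refreshSeconds): int
{
    $secondsPerHit = 86400 / $hitsPerDay;
    return max(1, (int) round($refreshSeconds / $secondsPerHit));
}

// True roughly once in every $odds calls.
function shouldRefresh(int $odds): bool
{
    return mt_rand(1, $odds) === 1;
}

$odds = hitsPerRefresh(10000, 600);
echo "Refresh about once every $odds hits\n";
```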

Once the feed has been public for a while, take a look at the number of hits generated per hour throughout the day. It may be wise to refine your algorithm to update less often (say, 1 in 200 hits) during peak hours, and more often (say, 1 in 40 hits) during the slow hours. This can help reduce load during peak times, and still ensure that the feed is as up to date as possible during the slow times.
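A refinement along those lines might look like the following sketch. The peak window of 9:00 to 16:59 server time and both function names are hypothetical; the 1-in-200 and 1-in-40 odds are the figures suggested above.

```php
<?php
// Choose refresh odds by hour of day: refresh less often when busy.
// The peak window (9:00 to 16:59) and function names are hypothetical.
function refreshOddsForHour(int $hour): int
{
    $isPeak = $hour >= 9 && $hour < 17;
    return $isPeak ? 200 : 40;
}

function cacheIsRecentByHour(?int $hour = null): bool
{
    $hour = $hour ?? (int) date('G');
    // "Recent" unless this hit drew the 1-in-N refresh ticket.
    return mt_rand(1, refreshOddsForHour($hour)) !== 1;
}

printf("Noon: 1 in %d, 3 a.m.: 1 in %d\n", refreshOddsForHour(12), refreshOddsForHour(3));
```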

Update Cache on Update

This method compares the timestamp in the cache and the timestamp on the most recent post. If the post is newer, the page is generated as shown earlier and saved to the cache. This method doesn't present much in the way of savings for this simple example. However, in real-world feeds, where the data needs to come from several tables and be massaged into the appropriate format before display, it does have some use.

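    function cacheIsRecent()
    {
      // Timestamp of the cached copy.
      $db = sqlite_open("/tmp/11.timedcache.sqlite");
      $query = "SELECT tstamp FROM timedCache WHERE id = 1";
      $result = sqlite_query($db, $query);
      $row = sqlite_fetch_array($result);
      // Timestamp of the newest post. Assumes common_db.php (which provides
      // getAssoc()) has been included before this function is called.
      $posts = getAssoc("SELECT UNIX_TIMESTAMP(MAX(pubdate)) AS newest FROM 11_basic_feed");
      $latest = reset($posts);
      // Stale if there is no cache yet, or a post is newer than the cache.
      if (!$row || $latest['newest'] > $row['tstamp'])
      {
        return false;
      }else
      {
        return true;
      }
    }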

Generation-Side Caching

Generation-side caching utilizes scripts to update the cache whenever new content is added to the feed. Though this almost always guarantees the cache is up to date, it can limit the ways that new content can be added.

Cache Deletion on Update

This is probably the most efficient method that still relies on the feed page itself to generate the feed. This method assumes you have a script that is used to update the feed, and when it does so, it deletes the cache. The feed script detects this and re-creates the cache. This method will require its own cacheIsRecent function similar to the previous examples, as well as some code in the update page to be executed after the database containing the feeds is updated.

    function cacheIsRecentExists()
    {
      $db = sqlite_open("/tmp/11.timedcache.sqlite");
      $query = "SELECT tstamp FROM timedCache WHERE id = 1";
      $result = sqlite_query($db, $query);
      // The cache counts as recent for as long as its row exists.
      if (sqlite_num_rows($result) == 1)
      {
        return true;
      }else
      {
        return false;
      }
    }

As with previous examples, this one can be greatly improved by combining the check for data, and when it exists, returning it directly.
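The other half of this method, the code the update script runs after saving a new post, is not shown above. A minimal sketch follows; it uses PDO's SQLite driver with an in-memory database in place of the sqlite_* functions, and clearFeedCache() and getCacheIfExists() are hypothetical names.

```php
<?php
// Delete the cached feed so the next request regenerates it.
// clearFeedCache() is a hypothetical name; PDO's SQLite driver stands in
// for the sqlite_* functions used in the examples above.
function clearFeedCache(PDO $db): void
{
    $db->exec('DELETE FROM timedCache WHERE id = 1');
}

// Return the cached feed, or null when it has been deleted.
function getCacheIfExists(PDO $db): ?string
{
    $cache = $db->query('SELECT cache FROM timedCache WHERE id = 1')->fetchColumn();
    return $cache === false ? null : $cache;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE timedCache (id INTEGER PRIMARY KEY, cache TEXT, tstamp INTEGER)');
$db->exec("INSERT INTO timedCache (id, cache, tstamp) VALUES (1, '<rss/>', 1000)");

var_dump(getCacheIfExists($db) !== null);  // cache present
clearFeedCache($db);                       // called after a new post is saved
var_dump(getCacheIfExists($db) !== null);  // cache gone; feed script rebuilds it
```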

Feed Pre-Generation

If you are going to go as far as to involve the script that creates the feed information and deletes the record, why not just create a flat file, and keep all the logic off the heavily trafficked page? This is the most efficient method in terms of server load for each hit against the feed. However, with this and the previous method, you need to ensure that all updates to the feed data are done through capable scripts — no cheating and posting directly to the database. Accomplishing this should be pretty easy, especially with previous code examples. At the end of the script being used to update the feed, tack on code that would generate the feed itself, encapsulated in output buffering. Save that output buffer either to a database as shown earlier, or to a flat file. You can either pass that file through with readfile(), or (for much improved performance) just configure your web server to serve the flat file directly with the appropriate Content-Type header.
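The output-buffering step described above can be sketched as follows. The writeFeedFile() function and the file location are hypothetical. Writing to a temporary file and then renaming it is a design choice of this sketch, made so that a request arriving mid-update does not see a half-written feed (rename replaces the target in one step on POSIX filesystems).

```php
<?php
// Render the feed inside an output buffer and publish it as a flat file.
// writeFeedFile() and the file locations are hypothetical.
function writeFeedFile(string $path, callable $render): void
{
    ob_start();
    $render();                       // echoes the complete feed
    $xml = ob_get_clean();

    // Write beside the target first, then swap it into place.
    $tmp = $path . '.' . getmypid() . '.tmp';
    if (file_put_contents($tmp, $xml) === false) {
        throw new RuntimeException("Could not write $tmp");
    }
    if (!rename($tmp, $path)) {
        throw new RuntimeException("Could not move $tmp to $path");
    }
}

$target = sys_get_temp_dir() . '/feed-example.xml';
writeFeedFile($target, function () {
    echo '<rss version="2.0"><channel><title>Easy Recipes</title></channel></rss>';
});
echo file_get_contents($target), "\n";
```

In the update script, the callable would contain the same feed-generation code used in the earlier examples.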

Under Apache, there are three ways to do this. First, you can save the file with a specified extension, configure that extension in the Apache configuration file mime.types, and then restart Apache:

 text/xml     xml

Second, you can save all of your feeds in a specific directory, and use ForceType for that location to ensure the Content-Type header is set correctly:

    <Location /www/feeds/>
      ForceType text/xml
    </Location>

Finally, you can use the AddType directive to set the type for a specified file type. This has the advantage of being configurable on a VirtualHost, directory, or .htaccess level:

 AddType text/xml .xml

Caching Summary

As you can tell, your options for caching are extensive. Because many of the options presented here can be combined (for example, check for an updated document randomly, or check time since feed was updated randomly), your full set of options is almost endless. Important metrics to consider when trying to choose a caching method include the following: How resource intensive is generating the feed, how often is the feed updated, how is the feed updated, and how many hits does the feed receive? Take a look at what you have and the pros and cons of the various caching options to choose one that is best for you.

Blogs

If you own this book, you know what a blog is. Writing your own blogging software is a common trait among PHP programmers; however, as with all things, the devil is in the details, and there are quite a few things to consider. Feed production is only one of them.

Some of the examples here differ little from the previous examples; however, they go the additional step to provide an example of a standard HTML output for the purposes of exploring trackbacks later in the chapter. The examples continue to use RSS 2.0; converting these to other formats should be trivial.

Note

Often when reading a blog post, you are inspired to write your own, possibly on a related topic. Trackbacks are the method the second blogger uses to inform the first of the link. For example: Tim writes a blog post about baking cookies. Bob reads this post, and decides to write his own post about baking cookies. The use of trackbacks allows readers of Tim's blog to know that Bob also wrote about a similar topic, and depending on the software used at both ends, readers of both blogs may be informed of the relationship between the posts.

Simple Blog Example

This example uses three files: admin.php, index.php, and feed.php. Each is introduced initially with basic capabilities, and will be expanded to include more advanced features as the example progresses.

First, here is the code for the Database:

 CREATE TABLE `03_simple_blog` (
   `id` int(11) NOT NULL auto_increment,
   `email` varchar(70) NOT NULL default '',
   `subject` varchar(50) NOT NULL default '',
   `category` varchar(50) NOT NULL default '',
   `post` text NOT NULL,
   `date` datetime NOT NULL default '0000-00-00 00:00:00',
   `name` varchar(50) NOT NULL default '',
   PRIMARY KEY (`id`)
 )

This is the admin.php:

 <?php
 include("./common_db.php");
 $messages = "";
 if (isset($_POST['name']) && $_POST['name'] != "") {
   $name = $_POST['name'];
   $email = $_POST['email'];
   $subject = $_POST['subject'];
   $category = $_POST['category'];
   $post = $_POST['post'];
   $date = date('Y-m-d G:i:s');
   // Values are inserted unescaped; see the warning that follows the listing
   $query = "INSERT INTO 03_simple_blog
    (`id`, `name`, `email`, `subject`, `category`, `post`, `date`)
    VALUES (null, '$name', '$email', '$subject', '$category', '$post', '$date')";
   $id = insertQueryReturnID($query);
   $messages = "Post added, Post id <a href=\"./index.php?entry=$id\">$id</a>";
 }
 ?><html>
  <head>
   <title>Simple Blog</title>
  </head>
  <body>
   <?php echo $messages; ?>
   <form action="#" method="post">
    Name: <input type="text" name="name"><br>
    Email: <input type="text" name="email"><br>
    Subject: <input type="text" name="subject"><br>
    Category: <input type="text" name="category"><br>
    Post:<br><textarea name="post"></textarea><br>
    <input type="submit" value="Submit">
   </form>
  </body>
 </html>

This is your basic, zero-security admin page. It will suffice to let you add entries to your basic blog, as long as they don't contain any quotes or other characters that could confuse the simple script. The same common_db.php file used in previous examples is used here; you can view the full code of this file in Appendix A.

Figure 4-1 shows how the sample administration page should look.

image from book
Figure 4-1

Here is the code for the index.php file:

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
    "http://www.w3.org/TR/html4/loose.dtd">
 <html>
 <head>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
  <title>My Sample Blog</title>
  <link rel="alternate" type="application/rss+xml" title="RSS"
   href="http://example.preinheimer.com/blog/feed.php">
 </head>
 <body>
 <h1>Sample Blog</h1>
 <?php
   include("./common_db.php");
   // Show a single post when ?entry=N is given, otherwise list every post
   if (isset($_GET['entry']) && is_numeric($_GET['entry']))
   {
     $query = "SELECT * FROM 03_simple_blog WHERE id = '{$_GET['entry']}'";
   }else
   {
     $query = "SELECT * FROM 03_simple_blog ORDER by `id` DESC";
   }
   $blogEntries = getAssoc($query,2);
   foreach($blogEntries AS $entry)
   {
     echo "<h2><a href=\"./index.php?entry={$entry['id']}\">
       {$entry['subject']}</a></h2>\n";
     echo "<b>{$entry['category']}</b>\n";
     echo "<p>{$entry['post']}</p>\n";
     echo "<a href=\"mailto:{$entry['email']}\">{$entry['name']}</a>\n";
     echo "({$entry['date']})\n";
   }
 ?>
 </body>
 </html>

Again, note that this example as it stands contains major security flaws, primarily the use of unfiltered data from a GET request in a SQL query. Using this script on a live web server is just asking to be the target of a SQL Injection attack. This script includes the ability to load a specific post, rather than all of them, so this should assist in ensuring that links created from the feed can point to a relevant page.
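
One way to tighten the handling of the entry parameter is sketched below. It is an illustration under stated assumptions rather than the book's code: buildEntryQuery() is a hypothetical helper, and it takes the request parameters as an array so it can be exercised without a web request. It accepts only strings made up of digits and casts the value to an integer before it reaches the query.

```php
<?php
// Build the query for index.php. $get stands in for $_GET (an assumption
// made so the function can be tried out on its own).
function buildEntryQuery(array $get): string
{
    $entry = isset($get['entry']) ? $get['entry'] : '';
    // ctype_digit() accepts only strings of the digits 0-9, which is
    // stricter than is_numeric() (that also allows "1e5" or "-2.5").
    if (is_string($entry) && ctype_digit($entry)) {
        $id = (int) $entry;
        return "SELECT * FROM 03_simple_blog WHERE id = $id";
    }
    return "SELECT * FROM 03_simple_blog ORDER BY `id` DESC";
}
```

Anything that is not a plain positive integer falls through to the full listing instead of being placed in the SQL.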

Figure 4-2 shows a sample index page.

image from book
Figure 4-2

Finally, here is the code for the feed.php file:

 <?php
 header("Content-Type: text/xml");
 ?><rss version="2.0">
   <channel>
     <language>en</language>
     <title>Sample Blog</title>
     <description>My Sample blog example.</description>
     <link>http://example.preinheimer.com/blog/</link>
     <image>
       <title>Sample Blog Logo</title>
       <link>http://example.preinheimer.com/feed1.php</link>
       <url>http://example.preinheimer.com/feed1.png</url>
     </image><?php
     include("./common_db.php");
     $query = "SELECT subject, id, post, email, name, category,
       DATE_FORMAT(date,'%a, %d %b %Y %T EST') as date
       FROM 03_simple_blog ORDER BY id DESC";
     $recipes = getAssoc($query);
     // index.php reads the post id from the "entry" parameter
     $url = "http://example.preinheimer.com/blog/index.php?entry=";
     foreach($recipes AS $item)
     {
       echo "<item>\n";
       echo "<title>{$item['subject']}</title>\n";
       echo "<link>$url{$item['id']}</link>\n";
       echo "<description>{$item['post']}</description>\n";
       echo "<author>{$item['email']}</author>\n";
       echo "<category>{$item['category']}</category>\n";
       echo "<pubDate>{$item['date']}</pubDate>\n";
       echo "</item>\n";
     }
     echo "</channel>\n</rss>";
 ?>

Here is the sample output for the feed.php file:

 <rss version="2.0">
   <channel>
     <language>en</language>
     <title>Sample Blog</title>
     <description>My Sample blog example.</description>
     <link>http://example.preinheimer.com/blog/</link>
     <image>
       <title>Sample Blog Logo</title>
       <link>http://example.preinheimer.com/feed1.php</link>
       <url>http://example.preinheimer.com/feed1.png</url>
     </image><item>
 <title>Raking Leaves</title>
 <link>http://example.preinheimer.com/blog/index.php?entry=2</link>
 <description>I hate raking leaves. Over the past two weeks we have raked the leaves from the property into a pile that is 90 feet long, three feet high, and four feet wide. </description>
 <author>paul@preinheimer.com</author>
 <category>Chores</category>
 <pubDate>Wed, 17 Nov 2004 12:30:16 EST</pubDate>
 </item>
 <item>
 <title>Snowing</title>
 <link>http://example.preinheimer.com/blog/index.php?entry=1</link>
 <description>It is snowing outside! This is the first snowfall of the season, so it is definitely very exciting.</description>
 <author>paul@preinheimer.com</author>
 <category>Weather</category>
 <pubDate>Mon, 08 Nov 2004 12:43:52 EST</pubDate>
 </item>
 </channel>
 </rss>

This code example should look remarkably similar to the one earlier. Different database, some different fields, but not much of a real change.
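
Because post text is written straight into the feed, a stray & or < in a subject or post would produce XML that readers cannot parse. The following is a minimal sketch of one way to guard against that; rssItem() is a hypothetical helper, not part of the book's code, and it assumes each row has the subject, id, and post fields used above.

```php
<?php
// Render one <item>; $item mirrors a row from 03_simple_blog.
function rssItem(array $item, string $baseUrl): string
{
    // Escape &, <, > and quotes so post text cannot break the XML.
    $e = function ($s) {
        return htmlspecialchars((string) $s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };
    return "<item>\n"
        . "<title>" . $e($item['subject']) . "</title>\n"
        . "<link>" . $e($baseUrl . $item['id']) . "</link>\n"
        . "<description>" . $e($item['post']) . "</description>\n"
        . "</item>\n";
}
```

Feed readers unescape the description, so any HTML in the post still reaches the reader intact.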

Trackbacks

To enable trackbacks, you need to add capability to both the admin and index scripts. This will allow the author to send trackbacks to other blogs while posting, and to receive trackbacks via the index page for their own posts. Trackbacks were invented by the authors of Movable Type to allow bloggers to credit the sources of their ideas.

Trackbacks to Other Blogs

The following code includes a sendTrackBack function within the admin.php script. This will allow the administrator to send trackbacks for his/her posts.

 function sendTrackBack($url, $title, $excerpt, $postURL)
 {
   $blogname = urlencode("Sample Blog");
   $title = urlencode($title);
   if (strlen($excerpt) > 252)
   {
     $excerpt = substr($excerpt, 0, 252) . "...";
   }
   $excerpt = urlencode($excerpt);
   $postURL = urlencode($postURL);
   $url_info = parse_url($url);
   $host = $url_info['host'];
   $path = $url_info['path'];
   // Only append the query string if the trackback URL has one
   if (isset($url_info['query']))
   {
     $path .= "?" . $url_info['query'];
   }
   $url = urlencode($url);
   $data = "tb_url=$url&url=$postURL&blog_name=$blogname&title=$title&excerpt=$excerpt";

First, the function is declared and the blog name is initialized (this is probably going to remain the same for all trackbacks; if not, it can be added to the function's parameter list). The excerpt should be no longer than 255 characters, so it is trimmed if necessary, and all appropriate variables are URL encoded. The HTTP request uses different parts of the trackback URL in different places, so the host must be specified on its own in the request, as must the path and query. The parse_url() function makes this a snap. The $data variable contains all the information that will be passed with the request; it should be constructed in advance, so that the appropriate Content-Length header can be passed.

   $fp = fsockopen($host, 80);
   if (!$fp)
   {
     return "Unable to connect to $host";
   }
   fputs($fp, "POST ". $path . " HTTP/1.1\r\n");
   fputs($fp, "Host: ". $host ."\r\n");
   fputs($fp, "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain\r\n");
   fputs($fp, "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n");
   fputs($fp, "Connection: close\r\n");
   fputs($fp, "Content-Type: application/x-www-form-urlencoded\r\n");
   fputs($fp, "Content-Length: ". strlen($data) . "\r\n\r\n");
   fputs($fp, "$data");

A socket is opened to the specified URI, headers are passed, as is the data from the request. Note that the Connection: close header does not close the connection; it is an HTTP header that indicates that the connection should be closed after this request (as opposed to Connection: keep-alive). Also note the Content-Type header, letting the server/script know that the data is going to be URL encoded.

   $response = "";
   while(!feof($fp))
   {
     $response .= fgets($fp, 128);
   }
   fclose($fp);

Retrieve all of the response information from the host, and then close the connection.

   // Limit the split to two parts in case the body itself contains a blank line
   list($http_headers, $http_content) = explode("\r\n\r\n", $response, 2);
   if (substr_count($http_content, "<error>0</error>") > 0)
   {
     return "Trackback Successful";
   }else if (substr_count($http_content, "<error>1</error>") > 0)
   {
     return "Trackback Failed";
   }else
   {
     return "Unrecognized response, Bad URL?";
   }
 }

Split the response into its header and content portions, then determine if the request was successful. A more robust (but more difficult to read) method of checking the response would be with regex, to allow for new lines before and after the response code. If the response doesn't contain either of the expected responses, it is likely that a bad URL was entered.
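
The regex-based check mentioned above might look something like this. It is a sketch, and parseTrackbackError() is a hypothetical name; it returns the numeric error code, or null when the body contains no error element at all.

```php
<?php
// Returns 0 or 1 as reported by the remote server, or null if no
// <error> element could be found in the response body.
function parseTrackbackError(string $body): ?int
{
    // \s* tolerates spaces and newlines around the code.
    if (preg_match('/<error>\s*(\d+)\s*<\/error>/i', $body, $m)) {
        return (int) $m[1];
    }
    return null;
}
```

A null result corresponds to the "Unrecognized response" branch of sendTrackBack().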

Changing the existing admin.php file to make use of the trackback function is trivial. Add the appropriate function call:

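   // The trackback needs the absolute URL of the new post, not an HTML link
   $postURL = "http://example.preinheimer.com/blog/index.php?entry=$id";
   $messages = "Post added, Post id <a href=\"$postURL\">$id</a>";
   if (isset($_POST['trackback']) && $_POST['trackback'] != "")
   {
     echo sendTrackBack($_POST['trackback'], $_POST['subject'], $_POST['post'], $postURL);
   }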

And the appropriate form element:

 Category: <input type="text" name="category"><br>
 Trackback: <input type="text" name="trackback"><br>
 Post:<br><textarea name="post"></textarea><br>

That's it. Trackbacks are a useful tool for bloggers to communicate with each other and reference sources. There is also the added benefit of increased visibility for each link. Note that you didn't save the trackback URL to be displayed with the post. When sending a trackback, it is generally desirable to either list the trackbacks you sent at the end of the post, or (more commonly) just to reference the original source for your ideas inline with the rest of your post. If you choose to specifically list which trackbacks were sent at the end of a post, just record the URL into the database when the trackback succeeds, and add the element to your index.php script.
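
As an aside, the request body that sendTrackBack() assembles by hand can also be produced with PHP's http_build_query(), which URL-encodes every value for you. The sketch below is an alternative under stated assumptions, not the listing's method: buildTrackbackBody() is a hypothetical helper, and it sends only the url, blog_name, title, and excerpt fields.

```php
<?php
function buildTrackbackBody(string $postURL, string $title, string $excerpt, string $blogName): string
{
    // Trim the excerpt to 255 characters, as in the original listing.
    if (strlen($excerpt) > 252) {
        $excerpt = substr($excerpt, 0, 252) . '...';
    }
    // http_build_query() URL-encodes every value and joins pairs with "&".
    return http_build_query(array(
        'url'       => $postURL,
        'blog_name' => $blogName,
        'title'     => $title,
        'excerpt'   => $excerpt,
    ), '', '&');
}
```

The returned string can be passed to strlen() for the Content-Length header exactly as $data is in the listing.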

Receiving Trackbacks

You can send trackbacks, so why not return the favor by receiving them? Receiving trackbacks is simpler than sending them, because you do not need to generate a raw HTTP request. A few changes to index.php are all that is required. You need to store these trackbacks in a separate table because there will likely be a one-to-many relationship between a post and its trackbacks.

 CREATE TABLE `03_simple_blog_trackback` (
   `blog_id` INT NOT NULL ,
   `blogName` VARCHAR(80) NOT NULL ,
   `title` VARCHAR(80) NOT NULL ,
   `url` VARCHAR(150) NOT NULL ,
   `excerpt` VARCHAR(255) NOT NULL
 );

The field names should be pretty self-explanatory; the excerpt field is 255 characters long (as the spec requires), and the rest of the lengths were pretty arbitrary. While I am generally a fan of studlyCaps over underscores, I feel using underscores in field names is an effective way to indicate relationships.

Updating index.php will require three changes: an addition to provide for receipt and receipt acknowledgment of trackbacks from remote scripts, a minor addition to display trackback links for each story, and finally to display trackbacks to users viewing the page normally.

 if (isset($_GET['action']) && $_GET['action'] == "trackback")
   {
     // Nothing may be printed before the XML declaration
     if (!isset($_POST['url']) || $_POST['url'] == "")
     {
       echo '<?xml version="1.0" encoding="iso-8859-1"?>
       <response>
         <error>1</error>
         <message>URL required</message>
       </response>
       ';
       exit;

Under the trackback specification, a URL is required; if one is not received, return the appropriate error code.

     }else if (!isset($_GET['id']) || !is_numeric($_GET['id']))
     {
       echo '<?xml version="1.0" encoding="iso-8859-1"?>
       <response>
         <error>1</error>
         <message>Invalid Trackback ID</message>
       </response>
       ';
       exit;

If the ID that the remote user is attempting to trackback is non-numeric, it cannot be valid (in this instance), so an error should be returned. This check could be extended to confirm not only that the ID is numeric, but also that it refers to an existing post that accepts trackbacks.

     }else
     {
       $id = $_GET['id'];
       $blogName = mysql_escape_string($_POST['blog_name']);
       $title = mysql_escape_string($_POST['title']);
       $url = mysql_escape_string($_POST['url']);
       $excerpt = $_POST['excerpt'];
       if (strlen($excerpt) > 252)
       {
         $excerpt = substr($excerpt, 0, 252) . "...";
       }
       $excerpt = mysql_escape_string($excerpt);
       // Use the escaped and trimmed values, never the raw request data
       $query = "INSERT INTO 03_simple_blog_trackback
       (`blog_id`, `blogName`, `title`, `url`, `excerpt`) VALUES
       ('$id', '$blogName', '$title', '$url', '$excerpt')";
       insertQuery($query);
       echo '<?xml version="1.0" encoding="iso-8859-1"?>
       <response>
         <error>0</error>
       </response>
       ';
       exit;
     }
   }else if (isset($_GET['entry']) && is_numeric($_GET['entry']))
   {
     $query = "SELECT * FROM 03_simple_blog WHERE id = '{$_GET['entry']}'";
   }else

Finally, at this point assume that the trackback is valid, and insert the appropriate data (after escaping any quotes, and trimming the excerpt if necessary) into the database.
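
The three branches above each print nearly the same XML document. One way to reduce the repetition is sketched here; trackbackResponse() is a hypothetical helper, not part of the listing, and it simply builds the response string for a given error code and optional message.

```php
<?php
// Build the XML reply to a trackback ping.
// $error is 0 for success and 1 for failure; $message is optional.
function trackbackResponse(int $error, string $message = ''): string
{
    $xml = '<?xml version="1.0" encoding="iso-8859-1"?>' . "\n"
         . "<response>\n"
         . "<error>$error</error>\n";
    if ($message !== '') {
        $xml .= '<message>' . htmlspecialchars($message, ENT_QUOTES) . "</message>\n";
    }
    return $xml . "</response>\n";
}
```

Each branch would then reduce to something like echo trackbackResponse(1, 'URL required'); followed by exit.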

Publicizing Trackback URL

In this section, you declare the existence and location of the trackback URL in two different manners: first in regular HTML for a human-readable form, and secondly using XML/RDF to support auto discovery of the trackback URLs.

The human-readable link is trivial, continuing with the earlier index.php example:

 echo "<a href=\"mailto:{$entry['email']}\">{$entry['name']}</a>\n";
 echo "({$entry['date']})\n<br>";
 $trackbackURL = "http://example.preinheimer.com/blog/index.php";
 echo "<a href=\"$trackbackURL?action=trackback&amp;id={$entry['id']}\">Trackback</a>";

The auto discovery declaration is a little more complicated. You will recognize the rdf declarations from the discussion on extending RSS feeds. The completed foreach loop to display each of the entries looks like this:

 foreach($blogEntries AS $entry)
 {
   $pageURL = "http://example.preinheimer.com/blog/index.php";
   echo "<h2><a href=\"$pageURL?entry={$entry['id']}\">
     {$entry['subject']}</a></h2>\n";
   echo "<b>{$entry['category']}</b>\n";
   echo "<p>{$entry['post']}</p>\n";
   echo "<a href=\"mailto:{$entry['email']}\">{$entry['name']}</a>\n";
   echo "({$entry['date']})\n<br>";
   echo "<a href=\"$pageURL?action=trackback&amp;id={$entry['id']}\">Trackback</a>";
   // Double quotes are required here so that \n and $pageURL are expanded
   echo "<!--\n<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"
       xmlns:dc=\"http://purl.org/dc/elements/1.1/\"
       xmlns:trackback=\"http://madskills.com/public/xml/rss/module/trackback/\">
     <rdf:Description
       rdf:about=\"$pageURL?entry={$entry['id']}\"
       dc:identifier=\"$pageURL?entry={$entry['id']}\"
       dc:title=\"{$entry['subject']}\"
       trackback:ping=\"$pageURL?action=trackback&amp;id={$entry['id']}\"/>
     </rdf:RDF>\n-->";
 }

The addition of the $pageURL variable was necessary to ensure the lines of code fit on this page. It also helps separate the URLs that are local and the RDF declarations that point to external sites.

The XML code shown here is exactly what the spec specifies — nothing too drastic.
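
To see what auto discovery buys you, consider the other end of the exchange: software that has fetched a page and wants to find its trackback URL. The sketch below is an illustration only, and discoverTrackbackUrl() is a hypothetical helper. It uses a simple pattern match instead of a real RDF parser, which is enough for the embedded block shown above.

```php
<?php
// Look for a trackback:ping attribute in a page's embedded RDF and return
// the ping URL, or null when none is advertised.
function discoverTrackbackUrl(string $html): ?string
{
    if (preg_match('/trackback:ping\s*=\s*"([^"]+)"/i', $html, $m)) {
        // The attribute value is XML-escaped, so turn &amp; back into &.
        return html_entity_decode($m[1], ENT_QUOTES);
    }
    return null;
}
```

A page with several posts carries several RDF blocks; this sketch returns only the first, so matching on dc:identifier would be the next refinement.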

You can also publicize your trackback links in the RSS feed for your document. Here is some example output. Changes required to output this document should be obvious based on previous examples.

 <rss version="2.0"
   xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" >
  <channel>
   <language>en</language>
   <title>Example News.com</title>
   <description>News and commentary from the cross-platform scripting community.</description>
   <link>http://www.wroxnews.com/</link>
   <image>
    <title>Example News.com</title>
    <link>http://www.wroxnews.com/</link>
    <url>http://www.wroxnews.com/pics/button.png</url>
   </image>
   <item>
     <title>Feed usage up 1000%!</title>
     <link>http://www.wroxnews.com/story2/index.html</link>
     <description>Feed usage has increased a dramatic 1000% in the past 48hrs alone, this may...</description>
     <trackback:ping>http://www.wroxnews.com/story2/tb.cgi?tb_id=2</trackback:ping>
   </item>
   <item>
     <title>RSS Feed Goes Live!</title>
     <link>http://wroxnews.org/story1/index.html</link>
     <description>Wroxnews.com is proud to announce the launch of our premier RSS service...</description>
     <trackback:ping>http://www.wroxnews.com/story2/tb.cgi?tb_id=2</trackback:ping>
   </item>
  </channel>
 </rss>

Trackback Spam

Unfortunately, trackback spam is slowly becoming more widespread. The point of trackback spam is not usually to generate clicks, but instead to assist with search engine indexing by creating thousands of links from hundreds of different domains to specific pages using certain keywords. This widespread linking can improve the search engine ranking of a particular site considerably. The attack is quite similar to those used in comment spam: some automated crawler searches the Web for URLs matching a certain pattern (often seeking specific blogging software, but occasionally not) and uses a pregenerated GET or POST request against the URL to advertise its wares. Unfortunately, my favorite defense against comment spam, a Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA), doesn't work with trackbacks, because both desired and undesired trackbacks are generated by automated scripts. You do, however, have a few options.

Check for Links

First, recognize what trackbacks are for, and hence a common attribute that most valid trackbacks should have: their site should link to yours. If Sally read your blog and decided to comment on it, it is probably a reasonable assumption that she posted a link to your original story somewhere within her blog, either in the post itself or underneath where her software lists outgoing trackbacks. This function is designed to be used after determining that the trackback is valid (contains a URL and such), but before the data is placed in the database. It simply downloads the remote page and checks for a link back to your page. If the link is found, it returns true; if it is missing, it returns false.

 function checkLinkBack($remoteURL, $localURL)
 {
   // file() returns false if the remote page cannot be read
   $lines = file($remoteURL);
   if ($lines === false)
   {
     return false;
   }
   $page = implode('', $lines);
   // stristr() returns false when $localURL does not appear in the page
   if (stristr($page, $localURL) !== FALSE)
   {
     return true;
   }else
   {
     return false;
   }
 }

This method does have a few failings:

The remote software package might send trackback pings before saving the post, and as such the URL containing the post might not be available or might not contain the URL to your blog yet, for entirely valid reasons.
It could provide an easy method to launch a denial of service attack against your blog, making repeated trackbacks indicating a URL that either resolves to a really long page or a particularly slow connection, wasting resources on repeated requests.
The author of the post might only refer to your site in a textual manner; for example, "I was reading Paul's blog today," rather than actually containing a link.

That being said, the method is still worth consideration. The first failing could be mitigated by having your blog software check trackbacks in the database after the fact (say, 24 hours later), or when instructed to do so by an administrator.
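
The second failing, the risk of being tied up by a huge or slow page, can be softened by bounding the download. The following is a sketch under stated assumptions: checkLinkBackBounded() is a hypothetical variant of the function above, and the 64KB cap and five-second timeout are arbitrary defaults rather than recommended values.

```php
<?php
function checkLinkBackBounded(string $remoteURL, string $localURL,
                              int $maxBytes = 65536, int $timeout = 5): bool
{
    // The timeout applies to HTTP requests; other stream wrappers ignore it.
    $context = stream_context_create(array('http' => array('timeout' => $timeout)));
    // The fifth argument caps how many bytes are read from the remote page.
    $page = @file_get_contents($remoteURL, false, $context, 0, $maxBytes);
    if ($page === false) {
        return false;
    }
    // Case-insensitive search for a reference to the local post.
    return stripos($page, $localURL) !== false;
}
```

A link that appears after the first $maxBytes bytes of the page will be missed, so the cap trades a few false rejections for predictable resource use.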

Blacklisted Words

The comment and trackback spam I receive on a regular basis generally contains one of a few keywords over and over again. By checking for the presence of commonly spammed words, you can likely detect spam trackbacks without having to download a remote page. The function returns true when the excerpt is clean and false when it contains a blacklisted word.

 function checkBadWords($excerpt)
 {
   $wordList = array('debt', 'poker', 'weight-loss', 'phentermine', 'diet');
   foreach($wordList as $word)
   {
     // stristr() returns false when the word is not present
     if (stristr($excerpt, $word) !== FALSE)
     {
       return false;
     }
   }
   return true;
 }

You should of course change the wordlist to match your own experiences with spammers. They appear to work in phases, sending a lot of spam of a certain type for a while and then moving on. As with the previous method, this one contains a few weaknesses:

If this method of blacklisting becomes common, trackback spammers will follow the lead of those who send email spam, and begin misspelling words, encoding characters, and so on.
Valid posts could contain one of those words, and thus be rejected.
Trackback spam changes over time, so you will need to update the word list to stay current.
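
The false-positive problem can be reduced a little by matching whole words only, so that "diet" does not reject a post about "dietary" habits. This is a sketch, not a replacement for the function above: containsBlacklistedWord() is a hypothetical helper, and note that it returns true when a bad word is found, the reverse of checkBadWords().

```php
<?php
function containsBlacklistedWord(string $excerpt, array $wordList): bool
{
    foreach ($wordList as $word) {
        // \b anchors the match to word boundaries; preg_quote() escapes
        // any regex metacharacters in the word itself.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $excerpt)) {
            return true;
        }
    }
    return false;
}
```

Passing the word list as an argument also makes it easier to load the list from a file or table that you update as the spam changes.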

URL Dissection

When sending a trackback, the trend is to link to a specific post rather than to the root domain of a site. The reason is obvious: the referenced post will not stay on the front page for long, so a trackback link to the front page would soon be pointless. Comment and trackback spam, however, often links to the root domain, and as such, should be identifiable.

 function checkURL($remoteURL)
 {
   $urlInfo = parse_url($remoteURL);
   // A URL with no path, or just "/", points at the root of the site
   if (isset($urlInfo['path']) && strlen($urlInfo['path']) > 1)
   {
     return true;
   }else
   {
     return false;
   }
 }

The parse_url() function makes this process trivial, splitting the URL into its separate segments for you (those being scheme, host, path, query). You want the path portion of the URL to be longer than a single character (www.wrox.com/ would have a path of "/") to indicate that a more complex path was used. As always, this method does have its weaknesses:

The spec for trackbacks is pretty loose; someone could send you a trackback to their root page.
Trackback spammers could start linking to pages deeper within their domains.

Ultimately, the methods you decide to run with will depend on how big a problem trackback spam becomes for you. None of these methods are terribly complex, and should any particular spammer decide to investigate your blog, they can be easily broken. That, however, usually isn't the problem. Spammers use automated software to deluge tens or hundreds of blogs. The fact that your blog in particular is resistant likely won't be noticed. A longer-term solution (and more advanced one) could use some combination of the previous methods together with admin intervention.
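
One possible shape for that combination is sketched below. It is only an illustration: triageTrackback() is a hypothetical helper, and the rule it applies (accept when every check passes, discard when every check fails, hold everything else for the administrator) is an assumption you would tune to your own spam.

```php
<?php
// $checks maps a label to true (passed) or false (failed), for example the
// results of checkLinkBack(), checkBadWords() and checkURL().
function triageTrackback(array $checks): string
{
    $failed = count(array_filter($checks, function ($passed) {
        return !$passed;
    }));
    if ($failed === 0) {
        return 'accept';      // every check passed: publish immediately
    }
    if ($failed === count($checks)) {
        return 'reject';      // every check failed: discard
    }
    return 'moderate';        // mixed result: hold for the administrator
}
```

Trackbacks marked for moderation could be stored with a status flag and shown on the admin page for approval.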

Retail Store Example

This example takes an imaginary store, Tom's Garden Shed, and creates a sample RSS feed that the store could use to inform its eager public. The store will actually offer several feeds: a product feed (including pricing information), a weekly garden tip feed, and a weekly sale item feed. These three feeds will make use of both the RSS and Atom specifications, and will extend where appropriate via XML namespacing. I'm going to ignore the data entry side of these feeds to concentrate on the ways the feeds can be produced. As such, I have included some sample data with the table specs. Please remember that all this code is available online, at www.wrox.com, so don't type this in if you can avoid it.

Here's the product table:

 CREATE TABLE `03_store_products` (
   `id` int(11) NOT NULL auto_increment,
   `name` varchar(50) NOT NULL default '',
   `description` text NOT NULL,
   `category` varchar(25) NOT NULL default '',
   `price` decimal(5,2) NOT NULL default '0.00',
   `unit` varchar(25) NOT NULL default '',
   PRIMARY KEY (`id`)
 ) ENGINE=MyISAM;
 INSERT INTO `03_store_products` VALUES (1, 'Marigolds', 'Beautiful perennial flower', 'perennial', 0.99, 'pack of 100 seeds');
 INSERT INTO `03_store_products` VALUES (2, 'Tulip', 'Beautiful flower with two lips', 'Annual', 3.69, 'bulb');

Here's the sale table:

 CREATE TABLE `03_store_sales` (
   `week` int(11) NOT NULL default '0',
   `product_id` int(11) NOT NULL default '0',
   `sale_price` decimal(5,2) NOT NULL default '0.00',
   `sale_unit` varchar(25) NOT NULL default '',
   `blurb` text NOT NULL
 ) ENGINE=MyISAM;
 INSERT INTO `03_store_sales` VALUES (1, 1, 0.69, 'pack of 100 seeds', 'Marigolds are on sale this week only!');
 INSERT INTO `03_store_sales` VALUES (1, 2, 4.59, 'Twin pack of 2 bulbs', 'Our supplier sent us the wrong shipment, so we can pass on the savings of these great twin packs on to you!');

Finally, here's the tips table:

 CREATE TABLE `03_store_tips` (
   `id` int(11) NOT NULL auto_increment,
   `week` int(11) NOT NULL default '0',
   `name` varchar(25) NOT NULL default '',
   `email` varchar(50) NOT NULL default '',
   `title` varchar(50) NOT NULL default '',
   `tip` text NOT NULL,
   `pubDate` datetime NOT NULL default '0000-00-00 00:00:00',
   PRIMARY KEY (`id`)
 ) ENGINE=MyISAM;
 INSERT INTO `03_store_tips` VALUES (1, 1, 'Tom', 'tom@tomsgardenshed.com', 'Landscaping a shady hill', 'One question I often get asked is, Tom, how can I get the grass to grow on the top of, and the slope of, a small shaded hill on my property? No matter what I try it dies off mid season!.<br><br>My Answer: You don''t. Yes, there are hardier strains of grass out there, but with the combination of shade, poor irrigation and the packed clay soil prevalent in the area you are going to waste a lot of time on a small portion of your yard. You do however have several options. Ground covering ivy, or other low to ground shrubs will thrive in this environment, and should have shallow enough roots to avoid damaging your trees. ', '2004-11-07 23:59:59');

Feeds

You will start off with one of the easier feeds, the tip feed. This feed is designed to provide customers with a weekly garden tip. With rapid adoption of feed support (both RSS and to a lesser extent Atom) into mainstream email clients, feeds are quickly becoming an attractive alternative to mailing lists.

Advantages of providing a feed versus running a mailing list are as follows:

Feeds require no administration; users handle registration and removal on the client side.
Users may forget they subscribed to a particular list and mark the message as spam, placing delivery of messages from your entire domain (not just your mailing list) at risk. Once you are listed as a sender of unsolicited mail, it can be very difficult to lose that label.
No stress on your mail server; sending a large batch of messages out on a weekly basis can delay delivery of regular email.

Disadvantages of providing a feed versus running a mailing list are as follows:

Users may be unfamiliar with the feed concept, even though they likely own software that supports reading it. Down the road it will likely become possible to create a link with a registered protocol that will automatically create a subscription in the appropriate software, but it isn't here yet.
You can't forward a feed. Often mailing list recipients will forward a particular message to a friend. This may not work quite like they expect in the average client.
Repeated pointless loads. Even if you only update your feed once a month, most clients will probably hit your feed at least once an hour, even if your feed indicates the appropriate update interval.
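
The cost of those repeated loads can be cut down if the feed script answers unchanged requests with a 304 Not Modified response instead of regenerating the whole feed. The sketch below shows only the comparison; feedNotModified() is a hypothetical helper, and it assumes you can obtain a Unix timestamp for the newest item in the feed.

```php
<?php
// $ifModifiedSince is the raw If-Modified-Since request header (or null);
// $lastModified is a Unix timestamp for the newest item in the feed.
function feedNotModified(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null) {
        return false;
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $since >= $lastModified;
}

// Usage inside a feed script (commented out so the sketch has no side effects):
// header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
// if (feedNotModified($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $lastModified)) {
//     http_response_code(304);
//     exit;
// }
```

This only helps with clients that send the If-Modified-Since header, but a well-behaved aggregator should.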

With these points in mind, Tom still wants a feed for his weekly tips. He will use some of the lessons learned in Chapter 3 to present the most recent tip on his homepage.

Because Tom wants to provide for both Atom and RSS users, one script will provide for both types of output:

 <?php
 include("../common_db.php");
 switch ($_GET['format'])
 {
   case "atom":
     displayATOM();
     break;
   case "rss":
   default:
     displayRSS();
     break;
 }

The feed gets off to an easy start — it expects to be called with either ?format=atom or ?format=rss, and will use that information to decide with which format to answer the request.

This is the Atom format:

 function displayATOM()
 {
   header("Content-Type: application/atom+xml");
   $url = "http://example.preinheimer.com/store/index.php";
   // %m gives a two-digit month, as the Atom date format requires
   $query = "SELECT id, name, email, title, tip,
     DATE_FORMAT(pubDate,'%Y-%m-%dT%H:%i:%S-04:00') as pubDate
     FROM 03_store_tips ORDER BY pubDate";

The date format shown here differs greatly from the one seen in RSS feeds because the requirements are quite different. Other than that, the code required so far differs little from an RSS feed.

   $tips = getAssoc($query, 2);
   ?><?xml version="1.0" encoding="utf-8"?>
   <feed version="0.3" xmlns="http://purl.org/atom/ns#">
    <title>Tom's Garden Shed</title>
    <link rel="alternate" type="text/html"
     href="http://example.preinheimer.com/store/index.php" />
    <modified><?php echo $tips[0]['pubDate'] ?></modified>
    <author>
     <name>Tom's Garden Shed</name>
    </author>

Notice the XML namespace declaration. Unlike previous examples, it does not declare a prefix (those usually read xmlns:prefix), so it becomes the default namespace for the document. As such, elements within the namespace can forgo the usual prefix.

    <?php
    foreach($tips AS $item)
    {
     if (strlen($item['tip']) > 252)
     {
      $item['tipTrim'] = substr($item['tip'], 0, 252) . "...";
     }else
     {
      $item['tipTrim'] = $item['tip'];
     }

As usual, the tip is trimmed to provide a summary field. A more professional feed would likely have a true summary, rather than the first X characters of the tip. Further enhancements might include stripping any HTML tags from the summary to ensure it displays properly in non-HTML environments.

     echo "<entry>\n";
     // The query selects the column as "title", not "subject"
     echo "<title>{$item['title']}</title>";
     echo "<link rel=\"alternate\" type=\"text/html\" href=\"$url?entry={$item['id']}\" />";
     echo "<id>$url?entry={$item['id']}</id>";
     echo "<summary>{$item['tipTrim']}</summary>\n";
     echo "<content>{$item['tip']}</content>";
     echo "<issued>{$item['pubDate']}</issued>";
     echo "<modified>{$item['pubDate']}</modified>";
     echo "</entry>\n";
    }
    echo "</feed>";
 }

The issued and modified dates are both populated from the same database field. Again, in a more professional feed these would likely be independent elements to allow for updates to be appropriately declared.
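
If you would rather format the dates in PHP than in SQL, the language ships with constants for both feed formats. This short sketch assumes the timestamp is held in UTC; the sample date is taken from the output below.

```php
<?php
// Format one timestamp for both feed types using PHP's built-in constants.
$published = new DateTimeImmutable('2004-10-31 23:59:59', new DateTimeZone('UTC'));

$atomDate = $published->format(DATE_ATOM); // 2004-10-31T23:59:59+00:00
$rssDate  = $published->format(DATE_RSS);  // Sun, 31 Oct 2004 23:59:59 +0000
```

Using the constants avoids hand-typed format strings and hard-coded time zone offsets.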

Sample output of code:

 <?xml version="1.0" encoding="utf-8"?>
 <feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Tom's Garden Shed, Weekly Tips</title>
  <link rel="alternate" type="text/html" href="http://example.preinheimer.com/store/index.php" />
  <modified>2004-10-31T23:59:59-04:00</modified>
  <author>
   <name>Tom</name>
  </author>
  <entry>
   <title>Planting Tips</title>
   <link rel="alternate" type="text/html" href="http://example.preinheimer.com/store/index.php?entry=2" />
   <id>http://example.preinheimer.com/store/index.php?entry=2</id>
   <summary>I often get called in to examine a garden that the owner feels is performing poorly. The most common problem is usually over crowding. Don't forget to space your plants adequately, the seed package, or the information packet that comes with a bulb for a...</summary>
   <content>I often get called in to examine a garden that the owner feels is performing poorly. The most common problem is usually over crowding. Don't forget to space your plants adequately, the seed package, or the information packet that comes with a bulb for appropriate spacing information. As a general rule of thumb you should plant two items no closer than half of its fully grown height. </content>
   <issued>2004-10-31T23:59:59-04:00</issued>
   <modified>2004-10-31T23:59:59-04:00</modified>
  </entry>
  <entry>
   <title>Landscaping a shady hill</title>
   <link rel="alternate" type="text/html" href="http://example.preinheimer.com/store/index.php?entry=1" />
   <id>http://example.preinheimer.com/store/index.php?entry=1</id>
   <summary>One question I often get asked is, 'Tom, how can I get the grass to grow on the top of, and the slope of, a small shaded hill on my property? No matter what I try it dies off mid season'. My Answer: You don't. Yes, there are hardier strains of grass ...</summary>
   <content>One question I often get asked is, 'Tom, how can I get the grass to grow on the top of, and the slope of, a small shaded hill on my property? No matter what I try it dies off mid season'. My Answer: You don't. Yes, there are hardier strains of grass out there, but with the combination of shade, poor irrigation and the packed clay soil prevalent in the area you are going to waste a lot of time on a small portion of your yard. You do however have several options. Ground covering ivy, or other low to ground shrubs will thrive in this environment, and should have shallow enough roots to avoid damaging your trees. </content>
   <issued>2004-11-07T23:59:59-04:00</issued>
   <modified>2004-11-07T23:59:59-04:00</modified>
  </entry>
 </feed>

The RSS feed isn't too different from the Atom example, nor from previous examples.

 function displayRSS() {
   header("Content-Type: text/xml");

   $query = "SELECT id, name, email, title, tip,
     DATE_FORMAT(pubDate,'%a, %d %b %Y %T EST') as pubDate
     FROM 03_store_tips ORDER BY pubDate";
   $tips = getAssoc($query, 2);
   ?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
   <channel>
     <language>en</language>
     <title>Tom's Garden Shed, Weekly Tips</title>
     <description>Weekly gardening tips from Tom</description>
     <generator>Tom's Feed Generator, v0.01b</generator>
     <ttl>1440</ttl>

Notice the content XML namespace. It allows you to include the entire post within the feed (in the content:encoded element), not just a summary. I've also included the ttl element in this script. Its value is given in minutes, so 1440 asks aggregators to refresh the feed no more than once a day; a weekly refresh would be 10080. You could also look into the skipDays and skipHours elements to define the refresh schedule more precisely.
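The caching hints mentioned above could be generated with a small helper along the following lines. This is a minimal sketch rather than part of Tom's script: the function name buildCacheHints and its parameters are hypothetical, and it assumes PHP 7 or later. In RSS 2.0, ttl is a number of minutes, skipHours contains hour elements numbered 0 to 23 (GMT), and skipDays contains day elements holding English weekday names.

```php
<?php
// Hypothetical helper (not from the original script): builds the optional
// RSS 2.0 caching hints for a channel.
//   $ttlMinutes - how many minutes the feed may be cached
//   $skipHours  - hours (0-23, GMT) in which aggregators may skip polling
//   $skipDays   - English weekday names, for example "Saturday"
function buildCacheHints(int $ttlMinutes, array $skipHours = [], array $skipDays = []): string
{
    $xml = "<ttl>{$ttlMinutes}</ttl>\n";

    if ($skipHours) {
        $xml .= "<skipHours>\n";
        foreach ($skipHours as $hour) {
            $xml .= "  <hour>" . (int) $hour . "</hour>\n";
        }
        $xml .= "</skipHours>\n";
    }

    if ($skipDays) {
        $xml .= "<skipDays>\n";
        foreach ($skipDays as $day) {
            $xml .= "  <day>" . htmlspecialchars($day) . "</day>\n";
        }
        $xml .= "</skipDays>\n";
    }

    return $xml;
}

// One day is 24 * 60 = 1440 minutes.
echo buildCacheHints(24 * 60, [0, 1, 2, 3], ['Saturday', 'Sunday']);
```

The returned string could be echoed inside the channel element in place of the hard-coded ttl line.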

     <pubDate><?php echo $tips[0]['pubDate'] ?></pubDate>
     <link>http://example.preinheimer.com/store/</link>
     <image>
       <title>Tom's Garden Shed</title>
       <link>http://example.preinheimer.com/store/</link>
       <url>http://example.preinheimer.com/store/tom.png</url>
     </image><?php
     foreach($tips AS $item)
     {
       $url = "http://example.preinheimer.com/store/index.php";
       if (strlen($item['tip']) > 252)
       {
         $item['tipTrim'] = substr($item['tip'], 0, 252) . "...";
       }else
       {
         $item['tipTrim'] = $item['tip'];
       }
       echo "<item>\n";
       echo "<title>{$item['title']}</title>\n";
       echo "<link>$url?item={$item['id']}</link>\n";
       echo "<description>{$item['tipTrim']}</description>\n";
       echo "<content:encoded>{$item['tip']}</content:encoded>";
       echo "<author>{$item['email']}</author>\n";
       echo "<pubDate>{$item['pubDate']}</pubDate>\n";
       echo "</item>\n";
     }
     echo "</channel>\n</rss>";
 }

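Overall, the script is straightforward: it writes the channel header, then loops over the tips, trimming each one to 252 characters for the description and placing the full text in content:encoded.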
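The script echoes database values straight into the XML, so a tip containing a character such as & or < would produce a feed that is not well-formed. One possible way to guard against that is sketched below. The helper names trimTip and buildItem are hypothetical and not part of the original script, and the sketch assumes PHP 7 or later and UTF-8 text.

```php
<?php
// Hypothetical helper: shorten a tip to at most $max bytes for the summary.
// Note that substr() counts bytes, so multibyte text may be cut mid-character.
function trimTip(string $tip, int $max = 252): string
{
    return strlen($tip) > $max ? substr($tip, 0, $max) . "..." : $tip;
}

// Hypothetical helper: render one <item>, escaping text for use inside XML.
function buildItem(array $item, string $url): string
{
    // Escape &, < and > for element content; quotes can stay as they are.
    $esc = function (string $text): string {
        return htmlspecialchars($text, ENT_XML1 | ENT_NOQUOTES | ENT_SUBSTITUTE, 'UTF-8');
    };

    // A CDATA section cannot contain "]]>", so split any occurrence in two.
    $body = str_replace(']]>', ']]]]><![CDATA[>', $item['tip']);

    return "<item>\n"
        . "<title>" . $esc($item['title']) . "</title>\n"
        . "<link>" . $esc("$url?item={$item['id']}") . "</link>\n"
        . "<description>" . $esc(trimTip($item['tip'])) . "</description>\n"
        . "<content:encoded><![CDATA[" . $body . "]]></content:encoded>\n"
        . "<author>" . $esc($item['email']) . "</author>\n"
        . "<pubDate>" . $esc($item['pubDate']) . "</pubDate>\n"
        . "</item>\n";
}
```

Inside the foreach loop, the block of echo statements could then be replaced with a single echo buildItem($item, $url); call. Wrapping the full tip in a CDATA section is a design choice that lets the post carry markup without escaping every character.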

Sample output:

 <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
   <channel>
     <language>en</language>
     <title>Tom's Garden Shed, Weekly Tips</title>
     <description>Weekly gardening tips from Tom</description>
     <generator>Tom's Feed Generator, v0.01b</generator>
     <ttl>1440</ttl>
     <pubDate>Sun, 07 Nov 2004 23:59:59 EST</pubDate>
     <link>http://example.preinheimer.com/store/</link>
     <image>
       <title>Tom's Garden Shed</title>
       <link>http://example.preinheimer.com/store/</link>
       <url>http://example.preinheimer.com/store/tom.png</url>
     </image><item>
 <title>Landscaping a shady hill</title>
 <link>http://example.preinheimer.com/store/index.php?item=1</link>
 <description>One question I often get asked is, 'Tom, how can I get the grass to grow on the top of, and the slope of, a small shaded hill on my property? No matter what I try it dies off mid season'. My Answer: You don't. Yes, there are hardier strains of grass ...</description>
 <content:encoded>One question I often get asked is, 'Tom, how can I get the grass to grow on the top of, and the slope of, a small shaded hill on my property? No matter what I try it dies off mid season'. My Answer: You don't. Yes, there are hardier strains of grass out there, but with the combination of shade, poor irrigation and the packed clay soil prevalent in the area you are going to waste a lot of time on a small portion of your yard. You do however have several options. Ground covering ivy, or other low to ground shrubs will thrive in this environment, and should have shallow enough roots to avoid damaging your trees. </content:encoded><author>tom@tomsgardenshed.com</author>
 <pubDate>Sun, 07 Nov 2004 23:59:59 EST</pubDate>
 </item>
 </channel>
 </rss>

One of the items from this output has been cut to save space.

Product Feed

Just to keep things interesting, Tom's product feed will use the Smarty Templating System.

Note

If you are interested in learning more about Smarty, take a look at http://smarty.php.net/. It serves to facilitate separation of business and presentation logic. A detailed exploration of Smarty's feature set is beyond the scope of this book.

Here is the product template:

 <rss version="2.0"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:plantInfo="http://example.preinheimer.com/store/plantInfo">
   <channel>
     <language>en</language>
     <title>Tom's Garden Shed, Weekly Tips</title>
     <description>Weekly gardening tips from Tom</description>
     <generator>Tom's Feed Generator, v0.01b</generator>
     <ttl>1440</ttl>
     <pubDate>Sun, 07 Nov 2004 23:59:59 EST</pubDate>
     <link>http://example.preinheimer.com/store/</link>
     <image>
       <title>Tom's Garden Shed</title>
       <link>http://example.preinheimer.com/store/</link>
       <url>http://example.preinheimer.com/store/tom.png</url>
     </image>
     {section name=tip loop=$tips}
       <item>
         <title>{$tips[tip].title}</title>
         <link>http://example.preinheimer.com/pinfo.php?item={$tips[tip].id}</link>
         <description>{$tips[tip].name}</description>
         <content:encoded>{$tips[tip].description}</content:encoded>
         <pubDate>{$tips[tip].pubDate}</pubDate>
         <plantInfo:type>{$tips[tip].category}</plantInfo:type>
         <plantInfo:price>{$tips[tip].price}</plantInfo:price>
         <plantInfo:unit>{$tips[tip].unit}</plantInfo:unit>
       </item>
     {/section}
   </channel>
 </rss>

Here is the product.php file:

 <?php
   header("Content-Type: text/xml");
   include("../common_db.php");
   require('Smarty.class.php');

   // Point Smarty at the directories used for this site
   $smarty = new Smarty;
   $smarty->template_dir = '/www/smarty/example.preinheimer.com/templates/';
   $smarty->compile_dir = '/www/smarty/example.preinheimer.com/templates_c/';
   $smarty->config_dir = '/www/smarty/example.preinheimer.com/configs/';
   $smarty->cache_dir = '/www/smarty/example.preinheimer.com/cache/';

   // Fetch the products that will become the feed items
   $query = "SELECT id, name, description, category, unit, price
     FROM 03_store_products ORDER BY id";
   $tips = getAssoc($query, 2);

   // Hand the data to the template and render it
   $url = "http://example.preinheimer.com/store/index.php";
   $smarty->assign('url', $url);
   $smarty->assign('tips', $tips);
   $smarty->display('product.tpl');
 ?>

And here is the sample output:

 <rss version="2.0"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:plantInfo="http://example.preinheimer.com/store/plantInfo">
   <channel>
     <language>en</language>
     <title>Tom's Garden Shed, Weekly Tips</title>
     <description>Weekly gardening tips from Tom</description>
     <generator>Tom's Feed Generator, v0.01b</generator>
     <ttl>1440</ttl>
     <pubDate>Sun, 07 Nov 2004 23:59:59 EST</pubDate>
     <link>http://example.preinheimer.com/store/</link>
     <image>
       <title>Tom's Garden Shed</title>
       <link>http://example.preinheimer.com/store/</link>
       <url>http://example.preinheimer.com/store/tom.png</url>
     </image>
     <item>
       <title></title>
       <link>http://example.preinheimer.com/pinfo.php?item=1</link>
       <description>Marigolds</description>
       <content:encoded>Beautiful perennial flower</content:encoded>
       <pubDate></pubDate>
       <plantInfo:type>perennial</plantInfo:type>
       <plantInfo:price>0.99</plantInfo:price>
       <plantInfo:unit>pack of 100 seeds</plantInfo:unit>
     </item>
     <item>
       <title></title>
       <link>http://example.preinheimer.com/pinfo.php?item=2</link>
       <description>Tulip</description>
       <content:encoded>Beautiful flower with two lips</content:encoded>
       <pubDate></pubDate>
       <plantInfo:type>Annual</plantInfo:type>
       <plantInfo:price>3.69</plantInfo:price>
       <plantInfo:unit>bulb</plantInfo:unit>
     </item>
   </channel>
 </rss>
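
To check that the custom plantInfo elements are usable by a consumer, the feed can be read back with PHP's SimpleXML extension. The following is only a sketch under assumptions: the $feed string is a shortened, hand-written copy of the output above, and a real consumer would load the feed from its URL instead.

```php
<?php
// Shortened, hand-written copy of the product feed, for illustration only.
$feed = <<<XML
<rss version="2.0"
  xmlns:plantInfo="http://example.preinheimer.com/store/plantInfo">
  <channel>
    <item>
      <description>Marigolds</description>
      <plantInfo:type>perennial</plantInfo:type>
      <plantInfo:price>0.99</plantInfo:price>
      <plantInfo:unit>pack of 100 seeds</plantInfo:unit>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($feed);
$products = [];

foreach ($rss->channel->item as $item) {
    // Namespaced children are addressed by the namespace URI, not the prefix.
    $info = $item->children('http://example.preinheimer.com/store/plantInfo');
    $products[] = [
        'name'  => (string) $item->description,
        'type'  => (string) $info->type,
        'price' => (float) $info->price,
        'unit'  => (string) $info->unit,
    ];
}

print_r($products);
```

Because the lookup uses the namespace URI, a consumer keeps working even if the feed later declares the same namespace under a different prefix.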