Section 10.9. The W3C Validator to RSS



Of all the tasks of Hercules, the one where he had to keep his web site's XHTML validated was the hardest. Without wanting to approach the whole Valid XHTML Controversy, we can still safely say that keeping a site validated is a pain. You have to validate your code, most commonly using the W3C validator service at http://validator.w3.org, and you have to keep going back there to make sure nothing has broken.

You have to do that unless, of course, you're subscribed to a feed of validation results. This script does just that, providing an RSS interface to the W3C validator.

You pass the URL you want to test as a query in the feed URL, like so: http://www.example.org/validator.cgi?url=http://www.example.org/index.html.

10.9.1. Walking Through the Code

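We start with the usual Perl preamble (warnings and strict), then load XML::RSS to build the feed, CGI to read the query string, LWP::Simple to fetch the results, and XML::Simple to parse what comes back from the validator. Note the classic gotcha: LWP::Simple and CGI's :standard set both export a function named head, which produces a "Prototype mismatch" warning. Importing only get from LWP::Simple avoids the clash.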

use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';
use XML::Simple;

Now, grab the URL from the query string, and use LWP::Simple to retrieve the results. The W3C provides an XML output mode for the validator, and this is what we're using here. It is, however, classed as beta and flaky, and might not always work.

my $cgi = CGI->new;
my $url = $cgi->param('url');
my $validator_results_in_xml =
  get("http://validator.w3.org/check?uri=$url;output=xml");
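Two things can go wrong at this step: the url parameter may contain characters that need percent-encoding before being embedded in another query string, and get returns undef when the request fails. Here is a minimal sketch of guarding against both. The helper names validator_request_url and fetch_results are hypothetical (neither is part of the original script), and the fetcher is passed in as a code reference so the logic can be exercised without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: build the validator request URL for a target page,
# percent-encoding everything outside the RFC 3986 unreserved set.
# Assumes $target is a plain byte string such as an ASCII URL.
sub validator_request_url {
    my ($target) = @_;
    ( my $escaped = $target ) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return "http://validator.w3.org/check?uri=$escaped;output=xml";
}

# Hypothetical helper: call a fetcher (for example \&LWP::Simple::get)
# and die with a readable message if the parameter is missing or the
# fetcher returns undef.
sub fetch_results {
    my ( $fetcher, $target ) = @_;
    die "No url parameter supplied\n" unless defined $target && length $target;
    my $xml = $fetcher->( validator_request_url($target) );
    die "No response from the validator for $target\n" unless defined $xml;
    return $xml;
}
```

In the script itself, this could be called as fetch_results(\&get, $url) once get has been imported from LWP::Simple.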

Curiously enough, the top of the XML that is returned causes XML::Simple to throw an error. Use split to cut the response at ]>, discarding this broken section and keeping what follows:

my ( $broken_xml_to_ignore, $trimmed_validator_results_in_xml ) =
  split( /]>/, $validator_results_in_xml );
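One caveat with splitting on ]>: a CDATA section ends in ]]>, so an unlimited split could cut the payload a second time further down. The following minimal sketch passes a limit of 2 so that only the first ]> is used as the cut point. The sample response is made up for illustration, and the real validator output may look different.

```perl
use strict;
use warnings;

# Made-up sample standing in for the validator's response; the real
# output format is an assumption here.
my $sample = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE result [
  <!ELEMENT result (messages)>
]>
<result><messages><msg line="3"><![CDATA[bad tag]]></msg></messages></result>
XML

# A limit of 2 splits at the first "]>" only, so the "]]>" that closes
# the CDATA section stays inside the second piece.
my ( $prologue, $body ) = split /\]>/, $sample, 2;

print $body;
```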

Now, pass the remaining, valid XML to XML::Simple's XMLin function, which parses it into a nested Perl data structure:

my $parsed_validator_results = XMLin($trimmed_validator_results_in_xml);
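XMLin returns nested hash and array references. By default a repeated element becomes an array reference, while an element that appears only once becomes a plain hash reference. Code that dereferences msg as an array would therefore fail when the validator reports exactly one message. Here is a minimal sketch of one way around this, using XML::Simple's ForceArray and KeyAttr options. The sample input is made up: its element and attribute names mirror the ones the script reads, but the real response format is an assumption.

```perl
use strict;
use warnings;
use XML::Simple;

# Made-up sample with a single message.
my $sample = '<result><messages>'
           . '<msg line="3">end tag for "p" omitted</msg>'
           . '</messages></result>';

my $parsed = XMLin(
    $sample,
    ForceArray => ['msg'],   # always give msg as an array reference
    KeyAttr    => [],        # never fold the array into a hash by attribute
);

# The element text is stored under the default 'content' key.
for my $msg ( @{ $parsed->{messages}{msg} } ) {
    print "Line $msg->{line}: $msg->{content}\n";
}
```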

Now is as good a time as any to set up the top of the feed:

my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "XHTML Validation results for $url",
    link        => "http://validator.w3.org/check?uri=$url",
    description => "w3c validation results for $url"
);

Then it's a simple matter of running through each error message the validator gives and turning it into a feed item:

foreach my $error ( @{ $parsed_validator_results->{'messages'}->{'msg'} } ) {
    $rss->add_item(
        title       => "Line $error->{'line'} $error->{'content'}",
        link        => "http://validator.w3.org/check?uri=$url",
        description => "Line $error->{'line'} $error->{'content'}",
    );
}
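To see the channel and item steps working together without calling the validator, here is a minimal sketch that feeds XML::RSS from a hand-written list. The sample messages and the $url value are made up. Only the XML::RSS calls (new, channel, add_item, as_string) come from the script.

```perl
use strict;
use warnings;
use XML::RSS;

# Made-up stand-ins for the query parameter and the parsed messages.
my $url      = 'http://www.example.org/index.html';
my @messages = (
    { line => 3,  content => 'end tag for "p" omitted' },
    { line => 12, content => 'required attribute "alt" not specified' },
);

my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "XHTML Validation results for $url",
    link        => "http://validator.w3.org/check?uri=$url",
    description => "w3c validation results for $url",
);

# One feed item per validator message.
for my $error (@messages) {
    my $summary = "Line $error->{line} $error->{content}";
    $rss->add_item(
        title       => $summary,
        link        => "http://validator.w3.org/check?uri=$url",
        description => $summary,
    );
}

my $feed = $rss->as_string;
print $feed;
```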

Finally, serve it up:

print header('application/rss+xml');
print $rss->as_string;

10.9.2. The Entire Listing

#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';    # import only get, to avoid clashing with CGI's head
use XML::Simple;

# Read the page to validate from the query string and fetch the results
my $cgi = CGI->new;
my $url = $cgi->param('url');
my $validator_results_in_xml =
  get("http://validator.w3.org/check?uri=$url;output=xml");

# Drop the top section that XML::Simple cannot parse
my ( $broken_xml_to_ignore, $trimmed_validator_results_in_xml ) =
  split( /]>/, $validator_results_in_xml );

my $parsed_validator_results = XMLin($trimmed_validator_results_in_xml);

# Set up the channel
my $rss = XML::RSS->new( version => '2.0' );
$rss->channel(
    title       => "XHTML Validation results for $url",
    link        => "http://validator.w3.org/check?uri=$url",
    description => "w3c validation results for $url"
);

# One item per validator message
foreach my $error ( @{ $parsed_validator_results->{'messages'}->{'msg'} } ) {
    $rss->add_item(
        title       => "Line $error->{'line'} $error->{'content'}",
        link        => "http://validator.w3.org/check?uri=$url",
        description => "Line $error->{'line'} $error->{'content'}",
    );
}

print header('application/rss+xml');
print $rss->as_string;



    Developing Feeds with RSS and Atom
    ISBN: 0596008813
    Year: 2003
    Pages: 118
