10.5. FedEx Parcel Tracker

Christmas is coming, and Santa has outsourced his deliveries to Federal Express. Lucky us, as that means we can use FedEx's online shipment tracker to watch our parcels wend their merry way here. Except the tiresome chore of refreshing the FedEx site is just too much to handle. Let's let a nice script elf take care of it and create a feed for every parcel.

Although FedEx and its rivals do provide APIs, we won't be using them here. FedEx's page is easy enough to scrape by brute force, and it's fun to do so. Of course, when it next changes its page layout, this script will need rejigging. It's easy to see how to do that when it happens.

10.5.1. Walking Through the Code

So, starting with the usual Perl standards of warnings, strict, XML::RSS, and CGI, let's use LWP::Simple to retrieve the page and the marvellous HTML::TokeParser to do the dirty work. More on that anon.

use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';
use HTML::TokeParser;

Now let's set up some variables to use later, then fire up the CGI module and grab the tracking number from the query string. To use this script, therefore, you need to request:

http://www.example.org/fedextracker.cgi?track=123456789

where 123456789 is the tracking number of the parcel:

my $tag;               # current <tr> token in the parsing loop
my $last_good_date;    # most recent non-blank date seen in the table
my $cgi             = CGI->new;
my $tracking_number = $cgi->param('track');
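The script interpolates $tracking_number straight into the FedEx URL, so whatever arrives in the query string ends up in the request. Below is a minimal guard. It assumes a FedEx tracking number is made up of digits only, which is not a rule taken from FedEx. The helper name valid_tracking_number is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: a FedEx tracking number is a string of
# digits only, so anything else (spaces, "&", letters) is rejected before it
# can be interpolated into the FedEx URL.
sub valid_tracking_number {
    my ($candidate) = @_;
    return defined $candidate && $candidate =~ /\A[0-9]+\z/;
}

for my $input ( '123456789', '123&cntry_code=xx', undef ) {
    my $label = defined $input ? $input : '(missing)';
    print "$label: ", ( valid_tracking_number($input) ? 'ok' : 'rejected' ), "\n";
}
```

In the CGI script, you could put a check right after reading the parameter, such as `valid_tracking_number($tracking_number) or do { print header( -status => '400 Bad Request' ); exit };`. That would turn away a malformed request before anything is fetched.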

Now we're ready to jingle. Using LWP::Simple's get function, pull down the page from the FedEx site. FedEx, bless them, employ readable, predictable URLs, so this is easy to set up. Once the page is downloaded, hand it to a new HTML::TokeParser object, and we're ready for scraping:

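my $tracking_page = get(
    "http://fedex.com/Tracking?action=track&tracknumber_list=$tracking_number&cntry_code=us"
);
# get returns undef if the request fails
defined $tracking_page
    or die "Could not fetch the FedEx tracking page\n";
my $stream = HTML::TokeParser->new( \$tracking_page );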
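The same tracking URL is written out twice, once in the get call and once in the channel link. A small builder keeps the two in step and percent-encodes each query value. This is a sketch only. The helper names tracking_url and uri_escape_value are hypothetical, and the encoder assumes plain byte strings.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode one query-string value, leaving the
# RFC 3986 unreserved characters alone. Assumes a byte string.
sub uri_escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Hypothetical helper: build the tracking URL with the parameters in the
# same order the script uses.
sub tracking_url {
    my ($tracking_number) = @_;
    my @pairs = (
        [ action           => 'track' ],
        [ tracknumber_list => $tracking_number ],
        [ cntry_code       => 'us' ],
    );
    my $query = join '&',
        map { uri_escape_value( $_->[0] ) . '=' . uri_escape_value( $_->[1] ) } @pairs;
    return "http://fedex.com/Tracking?$query";
}

print tracking_url('123456789'), "\n";
```

Both the get call and the channel link could then use tracking_url($tracking_number).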

Now is as good a time as any to start off XML::RSS and fill in some channel details:

my $rss = XML::RSS->new;
$rss->channel(
    title => "FedEx Tracking: $tracking_number",
    link  => "http://fedex.com/Tracking?action=track&tracknumber_list=$tracking_number&cntry_code=us",
);

From now on, we're using the HTML::TokeParser module, skipping from tag to tag until we get to the section of the HTML to scrape. The inline comments say what we're up to.

# Go to the right part of the page, skipping 13 tables (!!!)
$stream->get_tag("table") for 1 .. 13;

# Now go inside the tracking details table, skipping its first two rows
$stream->get_tag("table");
$stream->get_tag("tr");
$stream->get_tag("/tr");
$stream->get_tag("tr");
$stream->get_tag("/tr");
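Counting tables by hand is exactly what breaks when FedEx rearranges its page. One way to make that failure visible is a helper that reports whether the expected number of tags was actually found. HTML::TokeParser's get_tag returns undef once it runs out of tags. The helper skip_tags is hypothetical. So that the sketch runs without fetching anything, it uses a tiny stand-in object (FakeStream, also hypothetical) that has a get_tag method.

```perl
use strict;
use warnings;

# Hypothetical helper: advance $stream past $count tags named $name.
# Returns 0 if the stream ran out first, which usually means the page
# layout has changed.
sub skip_tags {
    my ( $stream, $name, $count ) = @_;
    for ( 1 .. $count ) {
        return 0 unless defined( $stream->get_tag($name) );
    }
    return 1;
}

# Minimal stand-in for HTML::TokeParser: get_tag returns a token array
# ref for the next tag with the given name, or undef when none are left.
package FakeStream;

sub new {
    my ( $class, @tags ) = @_;
    return bless( { tags => [@tags] }, $class );
}

sub get_tag {
    my ( $self, $name ) = @_;
    while ( defined( my $tag = shift @{ $self->{tags} } ) ) {
        return [$tag] if $tag eq $name;
    }
    return;
}

package main;

my $stream = FakeStream->new( ('table') x 14, 'tr' );
print skip_tags( $stream, 'table', 13 ) ? "found\n" : "layout changed\n";
```

With a real HTML::TokeParser object, the script could call `skip_tags( $stream, 'table', 13 ) or die "FedEx page layout has changed\n";` and then take the fourteenth table as before.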

By this point, you're at the table to parse, so loop through it, getting the dates and locations. You need to stop at the bottom of the table, and a named loop with a last ... if ... statement does the job. The loop bails out once it has handled the pickup entry, which is the last row of the table.

You'll notice that in this section, we use those mysterious variables from earlier. Because the table is displayed with the date mentioned only once per day, no matter how many stops the parcel makes on that day, you need to keep track of it.
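Here is the date-filling logic on its own, run over a few made-up rows (the data is invented for illustration). A date cell holding only a non-breaking space ("\xa0") borrows the most recent real date, just as $last_good_date does in the loop.

```perl
use strict;
use warnings;

# Made-up rows for illustration: [date, time, status]. A date cell that
# contains only a non-breaking space ("\xa0") means "same day as above".
my @rows = (
    [ 'Dec 22', '9:15 AM',  'Delivered' ],
    [ "\xa0",   '7:02 AM',  'On FedEx vehicle for delivery' ],
    [ 'Dec 21', '11:40 PM', 'Arrived at FedEx location' ],
);

my $last_good_date;
my @filled;
for my $row (@rows) {
    my ( $date, $time, $status ) = @$row;
    if ( $date eq "\xa0" ) {
        # Blank cell: reuse the most recent real date (empty if none yet).
        $date = defined $last_good_date ? $last_good_date : '';
    }
    else {
        # Real date: remember it for the rows that follow.
        $last_good_date = $date;
    }
    push @filled, "$date $time $status";
}

print "$_\n" for @filled;
```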

PARSE: while ( $tag = $stream->get_tag('tr') ) {
    $stream->get_tag("td");
    $stream->get_tag("/td");

    # Test here for the closing /tr. If it exists, we're done.

    # Now get date text
    $stream->get_tag("td");
    $stream->get_tag("b");
    my $date_text = $stream->get_trimmed_text("/b");

    # The page only mentions the date once, so we need to fill in any blanks
    # that might occur.
    if ( $date_text eq "\xa0" ) {
        $date_text = $last_good_date;
    }
    else {
        $last_good_date = $date_text;
    }

    # Now get the time text
    $stream->get_tag("/b");
    $stream->get_tag("/td");
    $stream->get_tag("td");
    my $time_text = $stream->get_trimmed_text("/td");
    $time_text =~ s/\xa0//g;

    # Now get the status
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $status = $stream->get_trimmed_text("/td");
    $status =~ s/\xa0//g;

    # Now get the location
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $location = $stream->get_trimmed_text("/td");
    $location =~ s/\xa0//g;

    # Now get the comment
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $comment = $stream->get_trimmed_text("/td");
    $comment =~ s/\xa0//g;

    # Now go to the end of the block
    $stream->get_tag("/td") for 1 .. 2;
    $stream->get_tag("/tr");

    # OK, now we have the details, we need to put them into a feed
    # Do what you want with the info:

Still inside the loop, create an item for the RSS feed:

    if ($status) {
        $rss->add_item(
            title       => "$status $location $date_text $time_text",
            link        => "http://fedex.com/us/tracking/?action=track&tracknumber_list=$tracking_number",
            description => "Package number $tracking_number was last seen in $location at $time_text on $date_text, with the status, $status. $comment Godspeed, little parcel! Onward, tiny package!"
        );
    }

    # Stop parsing after the pickup line (tolerating any trailing space).
    last PARSE if $status =~ /^Picked up\s*$/;
}
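The last PARSE line relies on Perl's loop labels. A label matters most when the exit happens inside a nested loop. A bare last would only leave the innermost loop, but last LABEL leaves the labelled one. The sketch below runs over made-up rows and stops after the pickup entry, as the tracker does.

```perl
use strict;
use warnings;

# Made-up rows, newest first, as [status, location]. The last row stands in
# for anything after the pickup entry that should not become a feed item.
my @rows = (
    [ 'Delivered',   'Springfield' ],
    [ 'In transit',  'Memphis' ],
    [ 'Picked up',   'Oakland' ],
    [ 'Page footer', 'not tracking data' ],
);

my @seen;
ROW: for my $row (@rows) {
    push @seen, $row->[0];
    for my $cell (@$row) {
        # A bare "last" would leave only this inner loop;
        # "last ROW" leaves the labelled outer loop as well.
        last ROW if $cell =~ /\APicked up\s*\z/;
    }
}

print join( ', ', @seen ), "\n";
```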

All that done, you can serve it up nice and festive:

print header('application/rss+xml');
print $rss->as_string;
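The loop repeats the same s/\xa0//g clean-up for every cell and builds the add_item arguments inline. One possible refactoring pulls both into helpers, sketched here with made-up values. The names clean_cell and item_fields are hypothetical. The returned list is meant to be passed to XML::RSS's add_item, and the title, link, and description follow the script's layout without the festive sign-off.

```perl
use strict;
use warnings;

# Hypothetical helper: tidy one table cell. It does what the script does
# with s/\xa0//g and also trims ordinary whitespace from both ends.
sub clean_cell {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/\xa0//g;
    $text =~ s/\A\s+|\s+\z//g;
    return $text;
}

# Hypothetical helper: build the key/value list for XML::RSS's add_item.
sub item_fields {
    my (%row) = @_;
    my ( $number, $status, $location, $date, $time, $comment ) =
        map { clean_cell( $row{$_} ) } qw(number status location date time comment);
    my $description = "Package number $number was last seen in $location"
        . " at $time on $date, with the status, $status.";
    $description .= " $comment" if length $comment;
    return (
        title       => "$status $location $date $time",
        link        => "http://fedex.com/us/tracking/?action=track&tracknumber_list=$number",
        description => $description,
    );
}

# Made-up values; the "\xa0" mimics a trailing &nbsp; left in a cell.
my %item = item_fields(
    number   => '123456789',
    status   => "Picked up\xa0",
    location => 'Oakland, CA',
    date     => 'Dec 20',
    time     => '4:30 PM',
    comment  => '',
);

print "$item{title}\n";
```

Inside the loop, `$rss->add_item( item_fields( number => $tracking_number, status => $status, location => $location, date => $date_text, time => $time_text, comment => $comment ) )` would then stand in for the inline call.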

10.5.2. The Entire Listing

#!/usr/bin/perl
use warnings;
use strict;
use XML::RSS;
use CGI qw(:standard);
use LWP::Simple 'get';
use HTML::TokeParser;

my $tag;               # current <tr> token in the parsing loop
my $last_good_date;    # most recent non-blank date seen in the table
my $cgi             = CGI->new;
my $tracking_number = $cgi->param('track');

my $tracking_page = get(
    "http://fedex.com/Tracking?action=track&tracknumber_list=$tracking_number&cntry_code=us"
);
# get returns undef if the request fails
defined $tracking_page
    or die "Could not fetch the FedEx tracking page\n";
my $stream = HTML::TokeParser->new( \$tracking_page );

my $rss = XML::RSS->new;
$rss->channel(
    title => "FedEx Tracking: $tracking_number",
    link  => "http://fedex.com/Tracking?action=track&tracknumber_list=$tracking_number&cntry_code=us",
);

# Go to the right part of the page, skipping 13 tables (!!!)
$stream->get_tag("table") for 1 .. 13;

# Now go inside the tracking details table, skipping its first two rows
$stream->get_tag("table");
$stream->get_tag("tr");
$stream->get_tag("/tr");
$stream->get_tag("tr");
$stream->get_tag("/tr");

PARSE: while ( $tag = $stream->get_tag('tr') ) {
    $stream->get_tag("td");
    $stream->get_tag("/td");

    # Test here for the closing /tr. If it exists, we're done.

    # Now get date text
    $stream->get_tag("td");
    $stream->get_tag("b");
    my $date_text = $stream->get_trimmed_text("/b");

    # The page only mentions the date once, so we need to fill in any blanks
    # that might occur.
    if ( $date_text eq "\xa0" ) {
        $date_text = $last_good_date;
    }
    else {
        $last_good_date = $date_text;
    }

    # Now get the time text
    $stream->get_tag("/b");
    $stream->get_tag("/td");
    $stream->get_tag("td");
    my $time_text = $stream->get_trimmed_text("/td");
    $time_text =~ s/\xa0//g;

    # Now get the status
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $status = $stream->get_trimmed_text("/td");
    $status =~ s/\xa0//g;

    # Now get the location
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $location = $stream->get_trimmed_text("/td");
    $location =~ s/\xa0//g;

    # Now get the comment
    $stream->get_tag("/td") for 1 .. 4;
    $stream->get_tag("td");
    my $comment = $stream->get_trimmed_text("/td");
    $comment =~ s/\xa0//g;

    # Now go to the end of the block
    $stream->get_tag("/td") for 1 .. 2;
    $stream->get_tag("/tr");

    # OK, now we have the details, we need to put them into a feed
    # Do what you want with the info:
    if ($status) {
        $rss->add_item(
            title       => "$status $location $date_text $time_text",
            link        => "http://fedex.com/us/tracking/?action=track&tracknumber_list=$tracking_number",
            description => "Package number $tracking_number was last seen in $location at $time_text on $date_text, with the status, $status. $comment Godspeed, little parcel! Onward, tiny package!"
        );
    }

    # Stop parsing after the pickup line (tolerating any trailing space).
    last PARSE if $status =~ /^Picked up\s*$/;
}

print header('application/rss+xml');
print $rss->as_string;



    Developing Feeds with RSS and Atom
    ISBN: 0596008813
    Year: 2003
    Pages: 118
