Hack 76 Getting the Best Travel Route by Train

figs/moderate.gif figs/hack76.gif

A web scraper can help you find faster train connections in Europe .

If you ever visit Europe and want to travel by train, you will find the PKP (Polskie Koleje Panstwowe, or Polish State Railways) server (http://www.rozklad.pkp.pl) a handy place to find information about European train connections.

This hack queries the timetables of the PKP site and scrapes a variety of information from the results, including the time of departure and arrival, as well as the number of changes you'll have to make along the way.

The Code

Save the following code as broute.pl :

 #!/usr/bin/perl -w # # broute.pl #  # A European train timetable hack that displays available train connections # between two cities, with dates, times, and the number of changes. You # can limit the number of acceptable changes with -c. If there are no # connections, try earlier/later times/dates or search again for connections # with intermediate stops, e.g., instead of Manchester -> Roma, choose  # Manchester -> London, London -> Paris, and Paris -> Roma. #  # This code is free software; you can redistribute it and/or # modify it under the same terms as Perl itself. # use strict; use LWP::UserAgent; use Net::HTTP; use Getopt::Std; my $help = <<"EOH"; --------------------------------------------------------------------------- Best train routes in Europe Options: -a   depart from          -z   arrive in          -d   date (of departure, if -s d; arrival, if -s a)               in dd.mm.yy format (e.g. June 1, 2004 is 01.06.04)          -t   time (of departure, if -s d; arrival, if -s a)               in hh:mm format (e.g. 12:45)          -s   select time point for -d and -t options, default -s d          -c   maximum number of changes, default 0          -h   print this help EOH # set out command-line options, # requirements, and defaults. my %args; getopt('ha:z:d:t:s:c:', \%args); die $help if exists $args{h}; die $help unless $args{a}; die $help unless $args{z}; die $help unless $args{t}; $args{'s'} = 'depart' unless $args{'s'}; $args{'s'} = 'depart' if $args{'s'} eq 'd'; $args{'s'} = 'arrive' if $args{'s'} eq 'a'; # our requesting agent. define our URL and POST. my $url  = 'http://www.rozklad.pkp.pl/cgi-bin/new/query.exe/en'; my $post = "protocol=http:&from=$args{a}&to=$args{z}&datesel=custom".            "&date=$args{d}&timesel=$args{s}&time=$args{t}"; # the headers we'll send off... my $hdrs = HTTP::Headers->new(Accept => 'text/plain',                  'User-Agent' => 'PKPTrainTimetableLookup/1.0'); # and the final requested documents. my $uable = HTTP::Request->new(POST, $url, $hdrs, $post); my $ua    = LWP::UserAgent->new; my $req = $ua->request($uable); # if a success, # let's parse it! die $req->message   unless $req->is_success; my $doc = $req->content; $doc =~ s/[\f\t\n\r]//isg; # remove linefeeds. while ($doc =~ m/ NAME=sel[0-9]{1,2}>/isg) {     my $begin = pos($doc);     $doc =~ m/<TR>/isg;     my $end = pos($doc);     next unless $begin;     next unless $end;     # munch our content into columns.     my $content = substr($doc, $begin, ($end -= 5) - $begin);     $doc = substr($doc, $end);     my @columns = split(/<TD/, $content); shift @columns;     foreach my $column (@columns) {         $column = '<TD' . $column;         $column =~ s/<[^>]*>//g;         $column =~ s/<[^>]*//g;     }     # skip schedules that have more hops than we want.     if ($args{c} and int $args{c} < int $columns[2]) { next; }     # and print out our data.     print "-" x 80 . "\n";     print "             From: $columns[0]\n";     print "               To: $columns[1]\n";     print "          Changes: $columns[2]\n";     print "Date of Departure: $columns[3]\n" if $args{'s'} eq 'depart';     print "  Date of Arrival: $columns[3]\n" if $args{'s'} eq 'arrive';     print "   Departure Time: $columns[4]\n";     print "     Arrival Time: $columns[5]\n"; } 

Running the Hack

The script has several command-line options that are viewable in the code or by requesting its display with perl broute.pl -h .

Here are a couple of example runs. Let's find all connections from Berlin to Szczecin with an arrival time of 8:00 A.M. on December 15, 2004 with no changes:

 %  perl broute.pl -a Berlin -z Szczecin -s a -d 12.15.04 -t 8:00 -c 0  

How about all connections from Manchester to Rome with departure time of 8:00 A.M. on December 15, 2004 with a maximum of four changes:

 %  perl   broute.pl -a Manchester -z Roma -s d -d 12.15.04 -t 8:00 -c 4  

A typical run looks something like this:

 trying http://www.rozklad.pkp.pl/cgi-bin/new/query.exe/en ... -------------------------------------------------------------------------              From: Berlin Ostbf                To: Szczecin G_wny           Changes: 0   Date of Arrival: 05.07.03 Departure Time: 5:55   Arrival Time: 7:41 

Hacking the Hack

There a few things you can do to expand this hack. For example, you could add subroutines that find connections within 24 hours (12 hours before and 12 hours ahead) of the given time of departure or arrival. Another addition could be a module that displays names of the transfer stations .

Jacek Artymiak



Spidering Hacks
Spidering Hacks
ISBN: 0596005776
EAN: 2147483647
Year: 2005
Pages: 157

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net