Hack 80 Reformatting Bugtraq Reports

Since Bugtraq is such an important part of a security administrator's watch list, it's only a matter of time before you'll want to integrate it more closely with your daily habits.

In this hack, we will write some code to extract the latest Bugtraq reports from http://www.security-focus.com and then output the simplified results for your viewing pleasure. Bugtraq, if you're not familiar with it, is a moderated discussion list devoted to security issues. Discussions are detailed accounts of new security issues and vulnerabilities, covering both how they're exploited and how they can be fixed. Let's start by examining the web page where the Bugtraq report is located: http://www.security-focus.com/archive/1.

One nice thing to notice about this page is that the data is formatted in a table, complete with column headers. We can use those headers to simplify the data-scraping process by using a handy Perl module called HTML::TableExtract (http://search.cpan.org/author/MSISK/HTML-TableExtract/). TableExtract allows us to scrape the data from the web page without tying our code to a particular layout (at least, not too much). It accomplishes this feat by using those nice column headers. As long as those column headers stay the same, then the script should continue to work, even if SecurityFocus gives the page a facelift. In addition to that nice feature, TableExtract takes all the hard work out of parsing the HTML for the data we're after. Let's get started.
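To make the header-matching idea concrete, here is a minimal sketch that runs HTML::TableExtract against a made-up table rather than the live page. The sample markup, names, and the extra "Size" column are all hypothetical; the sketch also assumes a reasonably recent HTML::TableExtract that provides the tables() method.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup (NOT the real SecurityFocus page). The
# columns are in a different order than we ask for, and there is an
# extra "Size" column that we don't care about.
my $sample_html = <<'HTML';
<table>
  <tr><th>Author</th><th>Size</th><th>Date</th><th>Subject</th></tr>
  <tr><td>alice</td><td>4k</td><td>07/11/2003</td><td>Sample advisory one</td></tr>
  <tr><td>bob</td><td>2k</td><td>07/12/2003</td><td>Sample advisory two</td></tr>
</table>
HTML

# Ask only for the columns we want, identified by their header text.
my $te = HTML::TableExtract->new( headers => [qw(Date Subject Author)] );
$te->parse($sample_html);

# Collect the matched rows; empty cells come back undefined, so
# normalize them to empty strings before printing.
my @rows;
foreach my $table ($te->tables) {
    foreach my $row ($table->rows) {
        push @rows, [ map { defined $_ ? $_ : '' } @$row ];
    }
}

print join(' | ', @$_), "\n" for @rows;
```

Because the columns are selected by header text rather than by position, the unwanted column is simply never returned, which is the property the hack relies on.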

In the end, this script will use runtime options to allow the user to choose from a number of output formats and locations. I'm not a big fan of those one-letter flags sent to scripts to choose options, so we'll be using short words instead.
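As a rough illustration of word-style options, the sketch below maps the first command-line argument to an output mode and rejects anything it doesn't recognize. The pick_mode() helper and the 'stdout' label for the no-argument case are assumptions made for this example; the hack itself simply tests $RUN_STATE directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The output words used in this hack; 'stdout' below is a hypothetical
# label for "no argument given".
my %KNOWN_MODES = map { $_ => 1 } qw(file html email rss aim);

# Hypothetical helper: turn the first argument into a validated mode.
sub pick_mode {
    my ($arg) = @_;
    return 'stdout' unless defined $arg && length $arg;

    my $mode = lc $arg;
    die "Unknown output mode '$arg'; expected one of: "
      . join(', ', sort keys %KNOWN_MODES) . "\n"
      unless $KNOWN_MODES{$mode};
    return $mode;
}

my $mode = pick_mode($ARGV[0]);
print "Output mode: $mode\n";
```

Failing loudly on a mistyped word is a design choice; the original script would instead fall through all of its conditionals and print nothing.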

The Code

You'll need the HTML::TableExtract and LWP::Simple modules to grab the Bugtraq page. As we add more features, you'll also need XML::RSS, Net::AIM, and Net::SMTP. You could use other modules like URI::URL or HTML::Element to simplify this hack even further.

There are a couple of things to note about this code. We start by retrieving the arguments passed to the script that will be used to determine the output formats; we'll discuss those later. Next, the data scraped from the Bugtraq page is stuck into a custom data structure to make accessing it easier for later additions to this hack. Also, a subroutine is added to format the data contained in the data structure to ensure minimal code duplication once we have to format for multiple types of output.
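The shape of that data structure, an array of hashes with one hash per report, can be sketched in isolation. The rows, link paths, and the example.com base URL below are invented stand-ins for what TableExtract returns with keep_html enabled, and format_rows() is a hypothetical name used only here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base_url = 'http://www.example.com';   # placeholder, not the real site

# Hypothetical cells: a date column and a subject column whose HTML
# has been kept so the link survives.
my @raw_rows = (
    [ '07/11/2003', '<a href="/archive/1/1001">Sample advisory one</a>' ],
    [ '07/12/2003', '<a href="/archive/1/1002">Sample advisory two</a>' ],
);

# Build one hash per row and collect them in an array.
my @parsed_rows;
foreach my $cols (@raw_rows) {
    # skip rows whose subject cell has no link.
    next unless $cols->[1] =~ m{<a href="([^"]*)">(.*?)</a>};
    push @parsed_rows, {
        date        => $cols->[0],
        subject     => $2,
        subject_url => $base_url . $1,   # expand the relative link
    };
}

# A single formatting routine keeps output code in one place.
sub format_rows {
    my ($rows) = @_;
    return join '', map { "$_->{date} $_->{subject}\n" } @$rows;
}

print format_rows(\@parsed_rows);
```

Once every row is a hash with named keys, each output format only has to pick the fields it needs.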

Save the following code to a file called bugtraq_hack.pl:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::TableExtract;
    use Net::SMTP;
    use Net::AIM;
    use XML::RSS;

    # get params for later use.
    my $RUN_STATE = shift(@ARGV);

    # the base URL of the site we are scraping and
    # the URL of the page where the bugtraq list is located.
    my $base_url = "http://www.security-focus.com";
    my $url      = "http://www.security-focus.com/archive/1";

    # get our data. LWP::Simple's get() returns undef on failure.
    my $html_file = get($url) or die "Couldn't fetch $url\n";

    # create an iso date. localtime's month is
    # zero-based and its year counts from 1900.
    my ($day, $month, $year) = (localtime)[3..5];
    my $date = sprintf("%04d-%02d-%02d", $year + 1900, $month + 1, $day);

    # since the data we are interested in is contained in a table,
    # and the table has headers, then we can specify the headers and
    # use TableExtract to grab all the data below the headers in one
    # fell swoop. We want to keep the HTML code intact so that we
    # can use the links in our output formats. start the parse:
    my $table_extract =
       HTML::TableExtract->new(
         headers   => [qw(Date Subject Author)],
         keep_html => 1 );
    $table_extract->parse($html_file);

    # parse out the desired info and
    # stuff into a data structure.
    my @parsed_rows;
    foreach my $table ($table_extract->table_states) {
       foreach my $cols ($table->rows) {

          # skip any row that doesn't carry a date.
          next unless defined $cols->[0] && $cols->[0] =~ m|(\d+/\d+/\d+)|;
          my %parsed_cols = ( "date" => $1 );

          # since the subject links are in the 2nd column, parse unwanted HTML
          # and grab the anchor tags. Also, the subject links are relative, so
          # we have to expand them. I could have used URI::URL, HTML::Element,
          # HTML::Parse, etc. to do most of this as well.
          next unless defined $cols->[1];
          $cols->[1] =~ s/ class="[\w\s]*"//;
          next unless $cols->[1] =~ m|<a href="([^"]*)"[^>]*>(.*?)</a>|s;
          $parsed_cols{"subject_html"} = "<a href=\"$base_url$1\">$2</a>";
          $parsed_cols{"subject_url"}  = "$base_url$1";
          $parsed_cols{"subject"}      = $2;

          # the author links are in the 3rd col, so do the same
          # thing. default to empty strings if there's no mailto link.
          @parsed_cols{qw(author_html author_email author)} = ("", "", "");
          if (defined $cols->[2]) {
             $cols->[2] =~ s/ class="[\w\s]*"//;
             if ($cols->[2] =~ m|(<a href="mailto:([^"]*@[^"]*)">(.*?)</a>)|s) {
                $parsed_cols{"author_html"}  = $1;
                $parsed_cols{"author_email"} = $2;
                $parsed_cols{"author"}       = $3;
             }
          }

          # put all the information into an
          # array of hashes for easy access.
          push @parsed_rows, \%parsed_cols;
       }
    }

    # if no params were passed, then
    # simply output to stdout.
    unless ($RUN_STATE) { print format_my_data(); }

    # formats the actual
    # common data, per format.
    sub format_my_data {
       my $data = "";
       foreach my $cols (@parsed_rows) {
          unless ($RUN_STATE) { $data .= "$cols->{'date'} $cols->{'subject'}\n"; }
       }
       return $data;
    }

Running The Hack

Invoke the script on the command line to view the latest Bugtraq listings:

    % perl bugtraq_hack.pl
    07/11/2003 Invision Power Board v1.1.2
    07/11/2003 LeapFTP remote buffer overflow exploit
    07/11/2003 TSLSA-2003-0025 - apache
    07/11/2003 W-Agora 4.1.5
    ...etc...

Okay, that was easy, but what if you want it in HTML, RSS, email, or sent to your AIM account? No problem.

Hacking the Hack

Before we get to the code that handles the different outputs, let's start with the format_my_data( ) subroutine. This will be used to decide what format we want our data to be presented in, tweak the display based on that decision, and then return the results. We'll use the $RUN_STATE variable to decide what action format_my_data( ) will take. Normally, I would try to keep the code and variables used inside a subroutine as black-boxed as possible, but in this case, to keep things simple and compact, we'll be accessing the dreaded global variables directly. Here's the new code:

    sub format_my_data {
       my $data = "";
       foreach my $cols (@parsed_rows) {

          # plain text, for both stdout and file output.
          if (!$RUN_STATE || $RUN_STATE eq 'file') {
             $data .= "$cols->{date} $cols->{subject}\n";
          }
          elsif ($RUN_STATE eq 'html') {
             $data .= "<tr>\n<td>$cols->{date}</td>\n".
                      "<td>$cols->{subject_html}</td>\n".
                      "<td>$cols->{author_html}</td>\n</tr>\n";
          }
          elsif ($RUN_STATE eq 'email') {
             $data .= "$cols->{date} $cols->{subject}\n".
                      "link: $cols->{subject_url}\n";
          }
          elsif ($RUN_STATE eq 'aim') {
             $data .= "$cols->{date} $cols->{subject} $cols->{subject_url}\n";
          }
       }
       return $data;
    }
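For readers who would rather keep the routine black-boxed, one possible alternative is a table of per-format code references that takes the mode and the rows as arguments instead of reading globals. This is only a sketch of that design, not the hack's own code: format_rows(), %FORMATTERS, the 'plain' label, and the sample row are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One small formatter per output word; each receives a single row hash.
my %FORMATTERS = (
    plain => sub { my ($r) = @_; "$r->{date} $r->{subject}\n" },
    html  => sub {
        my ($r) = @_;
        "<tr>\n<td>$r->{date}</td>\n"
          . "<td>$r->{subject_html}</td>\n"
          . "<td>$r->{author_html}</td>\n</tr>\n";
    },
    email => sub {
        my ($r) = @_;
        "$r->{date} $r->{subject}\nlink: $r->{subject_url}\n";
    },
    aim => sub {
        my ($r) = @_;
        "$r->{date} $r->{subject} $r->{subject_url}\n";
    },
);

# Hypothetical replacement for the global-reading routine.
sub format_rows {
    my ($mode, $rows) = @_;
    $mode = 'plain' if !$mode || $mode eq 'file';
    my $formatter = $FORMATTERS{$mode}
      or die "No formatter for mode '$mode'\n";
    return join '', map { $formatter->($_) } @$rows;
}

# Invented sample row, shaped like the hashes built by the scraper.
my @sample = ({
    date         => '07/11/2003',
    subject      => 'Sample advisory',
    subject_url  => 'http://www.example.com/archive/1/1001',
    subject_html => '<a href="http://www.example.com/archive/1/1001">Sample advisory</a>',
    author_html  => '<a href="mailto:alice@example.com">alice</a>',
});

print format_rows('aim', \@sample);
```

Adding a new output format then means adding one entry to the table rather than another elsif branch.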

Now, let's implement the different runtime options. In the main body of the script, we'll set up a conditional similar to the one in format_my_data( ) so that the script can handle each of the output tasks. Here's the code for outputting to HTML, email, and RSS, with a placeholder for AIM. The AIM networking code is similar to [Hack #99], so, in the interest of brevity, I've declined to show it here:

    if (!$RUN_STATE) { print format_my_data(); }
    elsif ($RUN_STATE eq 'html') {
       my $html = "<html><head><title>Bugtraq $date</title></head><body>\n";
       $html   .= "<h1>Bugtraq listings for: $date</h1><table border=0>\n";
       $html   .= "<tr><th>Date</th><th>Subject</th><th>Author</th></tr>\n";
       $html   .= format_my_data() . "</table></body></html>\n";
       print $html;
    }
    elsif ($RUN_STATE eq 'email') {
       my $mailer = Net::SMTP->new('your mail server here')
          or die "Couldn't connect to the mail server\n";
       $mailer->mail('your sending email address');
       $mailer->to('your receiving email address');
       $mailer->data();
       $mailer->datasend("Subject: Bugtraq Report for $date\n\n");
       $mailer->datasend( format_my_data() );
       $mailer->dataend();
       $mailer->quit;
    }
    elsif ($RUN_STATE eq 'rss') {
       my $rss = XML::RSS->new(version => '0.91');
       $rss->channel(title       => 'SecurityFocus Bugtraq',
                     link        => $url,
                     language    => 'en',
                     description => 'Latest Bugtraq listings' );

       # add items to the RSS object.
       foreach my $cols (@parsed_rows) {
          $rss->add_item(title       => $cols->{date},
                         link        => $cols->{subject_url},
                         description => $cols->{subject} );
       }
       print $rss->as_string;
    }
    elsif ($RUN_STATE eq 'aim') {
       # AIM-related code goes here.
    }
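The listing above has no branch for the file option that format_my_data( ) already formats for. One way it might look is sketched below. The write_report() helper and the bugtraq-DATE.txt naming scheme are assumptions, and the date and report text are stand-ins for $date and the output of format_my_data( ).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write the formatted report to a file and
# return the path that was written.
sub write_report {
    my ($path, $text) = @_;
    open(my $fh, '>', $path) or die "Can't write $path: $!\n";
    print {$fh} $text;
    close($fh) or die "Can't close $path: $!\n";
    return $path;
}

# Stand-in values; in the hack these would come from $date and
# format_my_data().
my $date   = '2003-07-11';
my $report = "07/11/2003 Sample advisory one\n";

my $path = write_report("bugtraq-$date.txt", $report);
print "Wrote $path\n";
```

In the main script, this would slot in as one more elsif branch that tests for the word file and hands the formatted listing to the helper.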

So what else could you do to enhance this hack? How about adding support for other instant messengers or allowing multiple command-line options at once? Alternatively, what about having the AIM bot email the Bugtraq report upon request, or making it a CGI script and outputting the RSS to an RSS aggregator like AmphetaDesk (http://www.disobey.com/amphetadesk/) or NetNewsWire (http://ranchero.com/netnewswire)?

William Eastler



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
