

Hack 53. Build Your Own Web Measurement Application: Marketing Data

At this point, we're sure you're itching to generate some real, useful data with your "build your own" application. In this hack, we attack common marketing measurements, including number of visits, page views per visit, referrers, search terms, and entry pages.

Here, we continue writing our miniature web analytics program. In [Hack #12], we parsed a logfile and collated the individual lines into visitor sessions; now we report some actual results.

3.18.1. The Code

Previously, we used a class called Data to hold the statistics, but we didn't define that class. It's time to do that now. Save this code into a file called Data.pm.

    package Data;
    use strict;

    # The number of items to list in each report
    my $top_n = 100;

At this stage, we will report the total number of sessions, the total number of requests, the list of referrers [Hack #1], and the list of search terms [Hack #43]. We shall also report the list of entry pages; assuming you have set up your ad campaigns to have different entry pages [Hack #58], this also tells you the number of visits from each campaign.

This constructor initializes all the variables we will need at this stage. We will add more variables in subsequent hacks.

    sub new {
        return bless {
            total_sessions => 0,
            total_requests => 0,
            referrers      => {},
            search_terms   => {},
            entry_pages    => {},
        };
    }

    # Just before deleting an old session, add its data to the totals.
    sub AddSession {
        my ($self, $sess) = @_;
        ++$self->{total_sessions};
        my $reqs = $sess->NumRequests();
        $self->{total_requests} += $reqs;
        my $referrer = $sess->Referrer();
        ++$self->{referrers}->{$referrer} if ($referrer);
        my $search_term = $sess->SearchTerm();
        ++$self->{search_terms}->{$search_term} if ($search_term);
        ++$self->{entry_pages}->{$sess->EntryPage()};
    }
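AddSession leans on two Perl behaviors: a hash entry that does not exist yet is treated as 0 when it is incremented, and the `if ($referrer)` guard keeps direct visits (sessions with no referrer) out of the referrer table. The standalone sketch below applies the same idiom to a few made-up session values so you can see what the tables end up holding; the `@sessions` data and its `referrer`/`entry` keys are hypothetical stand-ins for what the Session methods return.

```perl
use strict;
use warnings;

# Hypothetical per-session values, standing in for what
# Session::Referrer() and Session::EntryPage() might return.
my @sessions = (
    { referrer => 'http://www.google.com/search?q=web+analytics', entry => '/index.html' },
    { referrer => undef,                                           entry => '/index.html' },
    { referrer => 'http://www.example.com/links.html',             entry => '/hacks.html' },
    { referrer => 'http://www.google.com/search?q=web+analytics', entry => '/index.html' },
);

my (%referrers, %entry_pages);
my $total_sessions = 0;

for my $s (@sessions) {
    ++$total_sessions;
    # Same idiom as AddSession: a missing hash entry starts at 0 and is
    # incremented; sessions without a referrer are skipped entirely.
    ++$referrers{ $s->{referrer} } if $s->{referrer};
    ++$entry_pages{ $s->{entry} };
}

print "$total_sessions sessions\n";
print "$_: $referrers{$_}\n"   for sort keys %referrers;
print "$_: $entry_pages{$_}\n" for sort keys %entry_pages;
```

Here the Google referrer is counted twice, the direct visit does not appear in the referrer table at all, and every session still counts toward its entry page.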

The rest of the functions just output the data in a very simple (and not very beautiful) format.

    sub WriteReport {
        my $self = shift;
        $self->WriteSummary();
        $self->WriteHash('Referrers', 'referrers');
        $self->WriteHash('Search Terms', 'search_terms');
        $self->WriteHash('Entry Pages', 'entry_pages');
    }

    # Write a report title, underlined.
    sub ReportTitle {
        my ($self, $title) = @_;
        print "\n$title\n";
        print "-" for 1..(length $title);
        print "\n";
    }

    # Write the summary statistics.
    sub WriteSummary {
        my $self = shift;
        $self->ReportTitle('Summary Statistics');
        printf "Total sessions: %d\n", $self->{total_sessions};
        printf "Total pages: %d\n", $self->{total_requests};
        printf "Pages per session: %.1f\n",
            $self->{total_sessions} == 0 ? 0 :
            $self->{total_requests} / $self->{total_sessions};
    }

    # Sort and write one of the hash tables. This function will output a hash
    # table in this format:
    # 13: web analytics demystified
    # 5: web analytics
    # 2: web analytics reviews
    # 2: analytics demystified
    sub WriteHash {
        my ($self, $report_name, $hashname) = @_;
        $self->ReportTitle($report_name);
        # Sort the items in order of frequency, and print in columns.
        my $hashref = $self->{$hashname};
        my $n = scalar keys %$hashref;
        if ($top_n < $n) { $n = $top_n; }
        for ((sort { $hashref->{$b} <=> $hashref->{$a} } keys %$hashref)[0..$n-1]) {
            printf "%9s: %s\n", $hashref->{$_}, $_;
        }
    }

    # A Perl module file must return a true value when it is loaded.
    1;
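WriteHash breaks ties arbitrarily: items with equal counts come out in hash order, and because Perl 5.18 and later randomize hash ordering, the two entries with a count of 2 in the example above may swap places between runs. If you want a repeatable report, one option is a secondary alphabetical sort. The `top_n` function below is a hypothetical variation written for illustration, not part of Data.pm; it clamps the requested count the same way WriteHash does with `$top_n`.

```perl
use strict;
use warnings;

# Return up to $n [key, count] pairs from %$counts, most frequent first.
# Ties are broken alphabetically so the order is the same on every run.
sub top_n {
    my ($counts, $n) = @_;
    my @sorted = sort {
        $counts->{$b} <=> $counts->{$a}   # higher count first
            or $a cmp $b                  # then alphabetical
    } keys %$counts;
    $n = @sorted if $n > @sorted;         # never slice past the end
    return map { [ $_, $counts->{$_} ] } @sorted[0 .. $n - 1];
}

my %search_terms = (
    'web analytics demystified' => 13,
    'web analytics'             => 5,
    'web analytics reviews'     => 2,
    'analytics demystified'     => 2,
);

for my $pair (top_n(\%search_terms, 3)) {
    printf "%9d: %s\n", $pair->[1], $pair->[0];
}
```

With a limit of 3, the tie between the two 2s is always resolved in favor of "analytics demystified".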

Next, we need to enhance the Session (Session.pm) class we previously defined to report some statistics about the session.

    package Session;
    …
    # The number of requests the session contains is the length of the array
    # of requests.
    sub NumRequests {
        my $self = shift;
        return scalar @$self;
    }

    # The entry page and the referrer for the session are the URL and referrer
    # of the first request.
    sub EntryPage {
        my $self = shift;
        return $self->[0]->{file};
    }

    sub Referrer {
        my $self = shift;
        return $self->[0]->{referrer};
    }
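These accessors assume that a Session object is a blessed reference to an array of requests in the order they were read, where each request is a hash with at least `file` and `referrer` keys. The sketch below builds such a session by hand to make that layout concrete; the URLs are invented, and any other fields the [Hack #12] parser stores in each request are left out.

```perl
use strict;
use warnings;

package Session;

# Compact copies of the three accessors above, so this sketch runs alone.
sub NumRequests { my $self = shift; return scalar @$self; }
sub EntryPage   { my $self = shift; return $self->[0]->{file}; }
sub Referrer    { my $self = shift; return $self->[0]->{referrer}; }

package main;

# A hypothetical two-request visit: the first request arrived from a search
# engine, the second was a click from one page of our own site to another.
my $sess = bless [
    { file => '/index.html', referrer => 'http://www.google.com/search?q=web+analytics' },
    { file => '/hacks.html', referrer => 'http://www.example.com/index.html' },
], 'Session';

printf "%d requests, entered at %s, referred by %s\n",
    $sess->NumRequests(), $sess->EntryPage(), $sess->Referrer();
```

Only the first request's referrer matters for the session; later referrers usually point back at your own pages.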

The search term is more complicated. We need to extract the relevant part of the referrer. For this, we need a list of all the search engines and which parameters they use for the search term.

    my %search_engines =
        (a9 => 'q', altavista => 'q', aol => 'query', ask => 'q',
         dmoz => 'search', google => 'q', kanoodle => 'query',
         msn => 'q', teoma => 'q', yahoo => 'p');

    sub SearchTerm {
        my $self = shift;
        my $referrer = $self->[0]->{referrer};
        if (!$referrer) { return undef; }
        # Check the search engines one by one.
        # Is the referrer in the correct format?
        # If so, return the found search term.
        # If we fail to find a search term, return undef.
        keys %search_engines;   # resets the iterator for the following "each"
        while (my ($engine, $param) = each %search_engines) {
            if ($referrer =~ m!^http://       # starts with http://
                               (?:[\w\.]+\.)? # e.g. "www." or "search." or ""
                               $engine\.      # e.g. "google."
                               .*\?           # the URL stem followed by "?"
                               (?:.*&)?       # possibly some arguments ending in ampersand
                               $param=([^&]*) # the parameter=value we are looking for
                              !x) {
                return $1;
            }
        }
        return undef;
    }

    # Finally, the main program calls Sessions::WriteReport().
    # We need to make that function delegate to Data::WriteReport().
    package Sessions;
    …
    sub WriteReport {
        my $self = shift;
        $self->{DATA}->WriteReport();
    }
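To see what SearchTerm extracts, it helps to run the same table and pattern against a few sample referrers. The sketch below lifts them into a plain function, `search_term_for`, with the pattern written on one line (without /x) for brevity. The value comes back exactly as it appears in the URL, which is normally URL-encoded (`web+analytics`); `url_decode` is a hypothetical helper showing one way to make it readable if your logs are not already decoded. Both function names are ours, not part of the book's code.

```perl
use strict;
use warnings;

# Same engine => parameter table as Session.pm.
my %search_engines =
    (a9 => 'q', altavista => 'q', aol => 'query', ask => 'q',
     dmoz => 'search', google => 'q', kanoodle => 'query',
     msn => 'q', teoma => 'q', yahoo => 'p');

# Same pattern as Session::SearchTerm, applied to a plain string.
sub search_term_for {
    my $referrer = shift;
    return undef unless $referrer;
    keys %search_engines;    # reset the "each" iterator
    while (my ($engine, $param) = each %search_engines) {
        return $1 if $referrer =~
            m!^http://(?:[\w.]+\.)?$engine\..*\?(?:.*&)?$param=([^&]*)!;
    }
    return undef;
}

# Hypothetical helper: undo URL encoding ("+" for space, %XX escapes).
sub url_decode {
    my $s = shift;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex $1)/ge;
    return $s;
}

for my $ref ('http://www.google.com/search?hl=en&q=web+analytics',
             'http://search.yahoo.com/search?p=analytics%20demystified&ei=UTF-8',
             'http://www.example.com/links.html') {
    my $term = search_term_for($ref);
    my $out  = defined $term ? url_decode($term) : '(no search term)';
    print "$out\n";
}
```

Because the `(?:.*&)?` group must end at an ampersand, a parameter such as `hl=en` ahead of `q=` is skipped rather than mistaken for the search term.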

3.18.2. Running the Code

To run the program, you will need Perl installed on your computer. If you are using Unix or Linux, you almost certainly have Perl already, but if you are using Windows, you may not. You can download ActiveState's Perl for Windows from http://www.activestate.com/Products/ActivePerl.

All that remains now is to tell readlog.pl where the page.log file (generated by the readtag.pl program and your JavaScript page tag [Hack #12]) is located, and the rest is automatic!

From the command line, assuming that page.log is in the same directory as readlog.pl, all you need to do is type:

    perl readlog.pl page.log 

Figure 3-19 has sample output showing summary statistics and the number of visits coming to your site from each measured referring URL [Hack #1].

Figure 3-19. Output from readlog.pl


The program is now self-contained: it can run and produce data. In subsequent hacks, we will add more data collection and more reports to further increase the functionality of the basic system.

Dr. Stephen Turner and Eric T. Peterson



    Web Site Measurement Hacks
    Web Site Measurement Hacks: Tips & Tools to Help Optimize Your Online Business
    ISBN: 0596009887
    Year: 2005
    Pages: 157
