Graphing search results over time can lead to interesting discoveries. If you're doing regular research over time, the quantity of results might become just as interesting as their quality. In other words, you might find it useful to track how popular certain words are getting on the Internet as events occur and time passes.

Many search engines, including Google, offer varying levels of date-search capability. With other engines, however, we'd have to use some scraping techniques to count results by date. With Google, we just need the Google API and some code. To use this code, you'll need the SOAP::Lite and Time::JulianDay modules and a Google API key (which can be obtained for free by registering at http://api.google.com/). Before we continue, there are two things of note:
The Code

Save the following code as goocount.pl:

```perl
#!/usr/bin/perl -w
# goocount.pl
# Runs the specified query for every day between the specified
# start and end dates, returning date and count as CSV. From
# Tara Calishain, Rael Dornfest, and Google Hacks.
#
# usage: goocount.pl query="{query}" start={date} end={date}
# where dates are of the format: yyyy-mm-dd, e.g. 2002-12-31

use strict;
use SOAP::Lite;
use Time::JulianDay;
use CGI qw/:standard/;

# Your Google API developer's key.
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file.
my $google_wsdl = "./GoogleSearch.wsdl";

# For checking date validity.
my $date_regex = '(\d{4})-(\d{1,2})-(\d{1,2})';

# Start and end dates are optional; default to empty strings so
# the pattern matches below don't warn about undefined values.
my $query = param('query');
my $start = param('start') || '';
my $end   = param('end')   || '';

# Make sure all arguments are passed correctly.
(
    $query
    and $start =~ /^(?:$date_regex)?$/
    and $end   =~ /^(?:$date_regex)?$/
) or die qq{usage: goocount.pl query="{query}" start={date} end={date}\n};

# Julian date manipulation: a missing date defaults to yesterday.
my $yesterday_julian = int local_julian_day(time) - 1;
my $start_julian = ($start =~ /$date_regex/)
    ? julian_day($1, $2, $3) : $yesterday_julian;
my $end_julian   = ($end =~ /$date_regex/)
    ? julian_day($1, $2, $3) : $yesterday_julian;

# Create a new Google SOAP request.
my $google_search = SOAP::Lite->service("file:$google_wsdl");

# Start our CSV file.
print qq{"date","count"\n};

# Iterate over each of the Julian dates for your query.
foreach my $julian ($start_julian .. $end_julian) {
    my $full_query = "$query daterange:$julian-$julian";
    my $results = $google_search->doGoogleSearch(
        $google_key, $full_query, 0, 10, "false", "", "false",
        "", "latin1", "latin1"
    );

    # Output our CSV record.
    print '"', sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
        qq{","$results->{estimatedTotalResultsCount}"\n};
}
```

Running the Hack

Run the code from the command line, like so:

```
% perl goocount.pl query="PalmOS" start=2002-01-01 end=2002-12-31
```

This query searches for the keyword "PalmOS" over the entire year of 2002.
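Under the hood, the script's loop simply walks consecutive integer Julian day numbers, because Google's daterange: syntax takes Julian days rather than calendar dates. The sketch below shows that arithmetic in plain Perl, without Time::JulianDay, so you can see which daterange: queries a given date range generates and how many there are. The to_jdn and from_jdn helpers are hypothetical names of our own that use the standard integer conversion formulas; we're assuming they agree with Time::JulianDay's julian_day and inverse_julian_day for Gregorian dates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, not part of goocount.pl: standard integer
# conversions between Gregorian dates and Julian day numbers. We assume
# they match Time::JulianDay's julian_day() and inverse_julian_day().

# Gregorian (year, month, day) -> Julian day number.
sub to_jdn {
    my ($year, $month, $day) = @_;
    my $shift = int((14 - $month) / 12);   # 1 for Jan/Feb, else 0
    my $y = $year + 4800 - $shift;         # year counted from March, offset to stay positive
    my $m = $month + 12 * $shift - 3;      # months since March (0..11)
    return $day + int((153 * $m + 2) / 5) + 365 * $y
         + int($y / 4) - int($y / 100) + int($y / 400) - 32045;
}

# Julian day number -> Gregorian (year, month, day).
sub from_jdn {
    my ($jdn) = @_;
    my $f = $jdn + 32044;
    my $g = int((4 * $f + 3) / 146097);    # whole centuries
    my $c = $f - int(146097 * $g / 4);     # days left within the century
    my $d = int((4 * $c + 3) / 1461);      # years within the century
    my $e = $c - int(1461 * $d / 4);       # day within the year (from March)
    my $m = int((5 * $e + 2) / 153);       # month counted from March
    my $day   = $e - int((153 * $m + 2) / 5) + 1;
    my $month = $m + 3 - 12 * int($m / 10);
    my $year  = 100 * $g + $d - 4800 + int($m / 10);
    return ($year, $month, $day);
}

my $query = 'PalmOS';
my $start = to_jdn(2002, 1, 1);
my $end   = to_jdn(2002, 12, 31);

# One daterange: query per day, so the query count equals the day count.
printf "%d days, %d queries\n", $end - $start + 1, $end - $start + 1;

# Show the first few queries goocount.pl would send for this range.
for my $jdn ($start .. $start + 2) {
    printf "%04d-%02d-%02d  %s daterange:%d-%d\n",
        from_jdn($jdn), $query, $jdn, $jdn;
}
```

Run as is, it prints the day count for 2002 followed by the first three daterange: queries, for example `2002-01-01  PalmOS daterange:2452276-2452276`.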
Since each day costs one query, running the script with these parameters would use 365 queries from your key's daily allowance of 1,000. As output, you'll get a list of dates and counts on the screen in this format:

```
"date","count"
"2002-01-01","200"
"2002-01-02","210"
```

And so on. If you want to save the results in comma-delimited format (for easy import into Excel), redirect the output to a file, like this:

```
% perl goocount.pl query="PalmOS" start=2002-01-01 end=2002-12-31 > data.csv
```

Perhaps you want to run this script under cron to gather information every day. Just run it without start and end dates (both default to yesterday's date) and use >> to append each day's record to the comma-delimited file:

```
% perl goocount.pl query="PalmOS" >> data.csv
```

Each run prints its own "date","count" header line, so the file will collect one header per day; filter out the extras before importing. And because the script looks for GoogleSearch.wsdl in the current directory, make sure the cron job runs it from the directory that holds that file.

Hacking the Hack

As written, the Google count script is a client-side application, but you can turn it into a web-based application with a little tweaking. Replace everything from the "# Start our CSV file." comment to the end of the program with the following code, and change the CGI import line to use CGI qw/:standard *table/; so that start_table() and end_table() are defined. And remember, this application can use up a lot of your key's queries. Don't make it publicly available unless you give users the option of using their own keys. Otherwise, you'll probably burn out your key!

```perl
...
print header(),
    start_html("GooCount: $query"),
    start_table({-border => undef}),
    caption("GooCount: $query"),
    Tr([ th(['Date', 'Count']) ]);

foreach my $julian ($start_julian .. $end_julian) {
    my $full_query = "$query daterange:$julian-$julian";
    my $results = $google_search->doGoogleSearch(
        $google_key, $full_query, 0, 10, "false", "", "false",
        "", "latin1", "latin1"
    );

    # One table row per day: the date and Google's estimated count.
    print Tr([ td([
        sprintf("%04d-%02d-%02d", inverse_julian_day($julian)),
        $results->{estimatedTotalResultsCount}
    ]) ]);
}

print end_table(), end_html;
```
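Excel isn't the only way to graph the counts you collect. As a rough sketch, the following script (goochart.pl is a hypothetical name, not part of the original hack) reads the CSV that goocount.pl writes and draws a text bar chart, scaling the largest count to 50 characters. It assumes the "yyyy-mm-dd","count" record format shown above and skips header rows, including the repeated ones that daily cron runs leave behind, as well as any record with an empty count.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Parse goocount.pl output into [date, count] pairs. Assumes records look
# like "yyyy-mm-dd","count"; header rows and empty counts are skipped.
sub parse_goocount_csv {
    my @rows;
    for my $line (@_) {
        next unless $line =~ /^"(\d{4}-\d{2}-\d{2})",\s*"(\d+)"/;
        push @rows, [ $1, $2 ];
    }
    return @rows;
}

# Render each [date, count] pair as a line ending in a row of '#'
# characters, scaled so the largest count spans $width characters.
sub bar_chart {
    my ($width, @rows) = @_;
    return () unless @rows;
    my $peak = max(map { $_->[1] } @rows) || 1;   # avoid dividing by zero
    return map {
        sprintf "%s %8d %s", $_->[0], $_->[1],
            '#' x int($_->[1] / $peak * $width + 0.5)
    } @rows;
}

# Chart the file(s) named on the command line, e.g. data.csv.
if (@ARGV) {
    my @lines = <>;
    print "$_\n" for bar_chart(50, parse_goocount_csv(@lines));
}
```

Usage: `perl goochart.pl data.csv`. Each output line shows the date, the raw count, and a bar proportional to that count, which is usually enough to spot a spike before bothering with a spreadsheet.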
Tara Calishain and Rael Dornfest