Hack 79 Word Associations with Lexical Freenet


There will come a time when you want a little more than simple word definitions, synonyms, or etymologies. Lexical Freenet takes you beyond these simple results, providing associative data, or "paths," from your word to others.

Lexical Freenet (http://www.lexfn.com) allows you to search for word relationships like puns, rhymes, concepts, relevant people, antonyms, and much more. For example, a simple search for the word disease returns a long listing of word paths, each associated with other words by different types of connecting arrows: disease triggers both aids and cancer; comprises triggers symptoms; and bio triggers such relevant persons as janet elaine adkins, james parkinson, alois alzheimer, and so on. This is but a small sampling of the verbose output available.

In combination with "Super Word Lookup" [Hack #78], a command-line interface to the Lexical Freenet functionality would bring immense lookup capabilities to writers, librarians, and researchers. This hack shows you how to create such an interface, with the ability to choose which relationships you'd like to see, as well as turn the visual connections into text.

The Code

Save the following code as lexfn.pl:

    #!/usr/bin/perl -w
    #
    # Hack to query and report from www.lexfn.com
    #
    # This code is free software; you can redistribute it and/or
    # modify it under the same terms as Perl itself.
    #
    # by rik - ora@rikrose.net
    #

    ######################
    # support stage      #
    ######################

    use strict;
    use Getopt::Std qw(getopts);
    use LWP::Simple qw(get);
    use URI::Escape qw(uri_escape uri_unescape);
    use HTML::TokeParser;

    sub usage (  ) { print "
    usage: lexfn [options] word1 [word2]
    options available:
     -s Synonymous     -a Antonym        -b Birth Year
     -t Triggers       -r Rhymes         -d Death Year
     -g Generalizes    -l Sounds like    -T Bio Triggers
     -S Specialises    -A Anagram of     -k Also Known As
     -c Comprises      -o Occupation of
     -p Part of        -n Nationality

     or -x for all

    word1 is mandatory, but some searches require word2\n\n"
    }

    ######################
    # parse stage        #
    ######################

    # grab arguments, and put them into %args hash, leaving nonarguments
    # in @ARGV for us to process later (where word1 and word2 would be)
    # if we don't have at least one argument, we die with our usage.
    my %args; getopts('stgScparlAonbdTkx', \%args);
    if (@ARGV > 2 || @ARGV == 0) { usage(  ); exit 0; }

    # turn both our words into queries. word2 is optional,
    # so default it to an empty string when it is missing.
    $ARGV[0] =~ s/ /\+/g;
    $ARGV[1] ||= "";
    if ($ARGV[1]) { $ARGV[1] =~ s/ /\+/g; }

    # begin our URL construction with the keywords.
    my $URL = "http://www.lexfn.com/l/lexfn-cuff.cgi?sWord=$ARGV[0]".
              "&tWord=$ARGV[1]&query=show&maxReach=2";

    # now, let's figure out our command-line arguments. each
    # argument is associated with a relevant search at LexFN,
    # so we'll first create a mapping to and fro.
    my %keynames = (
     s => 'ASYN', t => 'ATRG', g => 'AGEN', S => 'ASPC', c => 'ACOM',
     p => 'APAR', a => 'AANT', r => 'ARHY', l => 'ASIM', A => 'AANA',
     o => 'ABOX', n => 'ABNX', b => 'ABBX', d => 'ABDX', T => 'ABTR',
     k => 'ABAK'
    );

    # if we want all matches (-x), add every flag to our
    # arguments hash, in preparation for our URL.
    if (defined($args{'x'}) && $args{'x'} == 1) {
       foreach my $arg (qw/s t g S c p a r l A o n b d T k/){
           $args{$arg} = 1; # in preparation for URL.
       } delete $args{'x'}; # x means nothing to LexFN.
    }

    # build the URL from the flags we want.
    foreach my $arg (keys %args) { $URL .= '&' . $keynames{$arg} . '=on'; }

    ######################
    # request stage      #
    ######################

    # and download it all for parsing.
    my $content = get($URL) or die "Could not retrieve $URL\n";

    ######################
    # extract stage      #
    ######################

    # with the data sucked down, pass it off to the parser.
    # a scalar reference makes the parser read the string
    # itself; a plain scalar would be taken as a filename.
    my $stream = HTML::TokeParser->new( \$content ) or die $!;

    # skip the form on the page, then it's the first <b>
    # after the form that we start extracting data from
    my $tag = $stream->get_tag("/form");
    while ($tag = $stream->get_tag("b")) {
        print $stream->get_trimmed_text("/b") . " ";
        $tag = $stream->get_tag("img");
        print $tag->[1]{alt} . " ";
        $tag = $stream->get_tag("a");
        print $stream->get_trimmed_text("/a") . "\n";
    }

    exit 0;

The code is split into four basic stages:


Support code

Such as includes and any subroutines you will need


The parsing stage

Where we work out what the user actually wants and build a URL to perform the request


The request stage itself

Where we retrieve the results


The extract stage

Where we recover the data
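The parsing stage can be tried out on its own. The following is a minimal sketch, not part of lexfn.pl: parse_args is a hypothetical helper that wraps the same getopts call and the same -x expansion so they can be run against a list of arguments instead of the real command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Hypothetical helper (not in lexfn.pl): parse a list of arguments
# and return a hash reference of switches followed by the words.
sub parse_args {
    local @ARGV = @_;              # getopts() always works on @ARGV
    my %flags;
    getopts('stgScparlAonbdTkx', \%flags) or return;

    # -x is shorthand for every relationship switch.
    if (delete $flags{x}) {
        $flags{$_} = 1 for qw(s t g S c p a r l A o n b d T k);
    }

    # whatever getopts() left in @ARGV is word1 (and maybe word2).
    return (\%flags, @ARGV);
}

my ($flags, @words) = parse_args('-bd', 'lee harvey oswald');
print join(',', sort keys %$flags), " | @words\n";
```

Localizing @ARGV keeps the sketch from disturbing the real command line, which makes the switch handling easy to experiment with before touching the network.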

In this case, the Lexical Freenet site is basic enough that the request is a single URL. A typical Lexical Freenet URL looks something like this:

    http://www.lexfn.com/l/lexfn-cuff.cgi?fromresub=on&ASYN=on&ATRG=on&AGEN=on&ASPC=on&ACOM=on&APAR=on&AANT=on&ARHY=on&ASIM=on&AANA=on&ABOX=on&ABNX=on&ABBX=on&ABDX=on&ABTR=on&ABAK=on&sWord=lee+harvey+oswald&tWord=disobey&query=SHOW
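To see how such a URL is put together, here is a minimal sketch of the construction step in isolation. The switch-to-parameter mapping and the query=show and maxReach=2 parameters are copied from lexfn.pl; build_url and encode_word are hypothetical helper names, and the encoder assumes plain ASCII input (the hack itself only replaces spaces with plus signs).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# switch-to-parameter mapping, copied from lexfn.pl.
my %keynames = (
    s => 'ASYN', t => 'ATRG', g => 'AGEN', S => 'ASPC', c => 'ACOM',
    p => 'APAR', a => 'AANT', r => 'ARHY', l => 'ASIM', A => 'AANA',
    o => 'ABOX', n => 'ABNX', b => 'ABBX', d => 'ABDX', T => 'ABTR',
    k => 'ABAK',
);

# Hypothetical helper: percent-encode unsafe ASCII characters,
# then turn spaces into "+" as the hack does.
sub encode_word {
    my ($word) = @_;
    $word =~ s/([^A-Za-z0-9 _.~-])/sprintf('%%%02X', ord $1)/ge;
    $word =~ tr/ /+/;
    return $word;
}

# Hypothetical helper: build the query URL from one or two words
# and a list of single-letter switches.
sub build_url {
    my ($word1, $word2, @switches) = @_;
    $word2 = '' unless defined $word2;

    my $url = 'http://www.lexfn.com/l/lexfn-cuff.cgi'
            . '?sWord=' . encode_word($word1)
            . '&tWord=' . encode_word($word2)
            . '&query=show&maxReach=2';

    # unknown switches are ignored; sorting keeps the URL stable.
    my @known = grep { exists $keynames{$_} } @switches;
    for my $switch (sort @known) {
        $url .= '&' . $keynames{$switch} . '=on';
    }
    return $url;
}

print build_url('lee harvey oswald', undef, qw(b d)), "\n";
```

Sorting the switches is a choice made for this sketch only: it gives the same URL for the same set of switches, which is convenient when comparing or caching requests.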

The data we want sits in a standard chunk of HTML that repeats for every entry in the search results. This lets us use the simple HTML::TokeParser module [Hack #20] to walk through the HTML tags, query their attributes, and retrieve the surrounding text. As you can tell from the previous code, this is not too difficult.
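The extraction loop can be exercised without a network connection. The sketch below runs the same tag-by-tag walk over a made-up HTML snippet; the markup is an assumption that merely imitates the bold word, image alt text, and link pattern the hack relies on, and extract_paths is a hypothetical helper name. It requires the HTML::TokeParser module to be installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up markup imitating the <b>, <img alt>, <a> pattern the
# hack expects; the real results page may differ.
my $html = <<'HTML';
<form action="lexfn-cuff.cgi"><b>ignored</b></form>
<b>disease</b> <img src="trg.gif" alt="triggers"> <a href="#">cancer</a><br>
<b>disease</b> <img src="rhy.gif" alt="rhymes with"> <a href="#">cheese</a><br>
HTML

# Hypothetical helper: return one "word relation word" string
# per result found after the closing </form> tag.
sub extract_paths {
    my ($content) = @_;

    # a scalar reference makes the parser read the string itself.
    my $stream = HTML::TokeParser->new(\$content)
        or die "could not create parser\n";

    my @paths;
    $stream->get_tag('/form');                 # skip the search form
    while ($stream->get_tag('b')) {
        my $from = $stream->get_trimmed_text('/b');
        my $img  = $stream->get_tag('img') or last;
        my $rel  = $img->[1]{alt};             # relation lives in alt
        $rel = '' unless defined $rel;
        $stream->get_tag('a') or last;
        my $to   = $stream->get_trimmed_text('/a');
        push @paths, "$from $rel $to";
    }
    return @paths;
}

print "$_\n" for extract_paths($html);
```

Returning a list instead of printing inside the loop is a small design change for the sketch: it keeps the parsing separate from the output, so the results can be filtered or sorted before they are shown.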

Running the Hack

As you can see from the code, the hack has several switches that let you decide which kinds of word results you want. In this case, we'll run a search for everything related to disease:

    % perl lexfn.pl -x disease
    disease triggers aids
    disease triggers cancer
    disease triggers patients
    disease triggers virus
    disease triggers doctor
    ...
    disease is more general than blood disorder
    disease is more general than boutonneuse fever
    disease is more general than cat scratch disease
    ...
    disease rhymes with breeze
    disease rhymes with briese
    disease rhymes with cheese
    disease rhymes with crees
    ...

Or perhaps a person's name is more to your liking:

    % perl lexfn.pl -bdonT "lee harvey oswald"
    lee harvey oswald was born in 1939
    lee harvey oswald died in 1963
    lee harvey oswald has the nationality american
    lee harvey oswald has the occupation assassin
    lee harvey oswald triggers 1956-1959
    lee harvey oswald triggers 1959
    lee harvey oswald triggers 1962
    lee harvey oswald triggers attempted
    lee harvey oswald triggers become
    lee harvey oswald triggers book
    lee harvey oswald triggers citizen
    lee harvey oswald triggers communist
    ...

Richard Rose



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
