

#62 Defining Words Online

In addition to grabbing information off web pages, a shell script can also feed certain information to a website and scrape the data that the web page spits back. An excellent example of this technique is to implement a command that looks up the specified word in an online dictionary and returns its definition. There are a number of dictionaries online, but we'll use the WordNet lexical database that's made available through the Cognitive Science Department of Princeton University.

Learn more  

You can read up on the WordNet project (it's quite interesting) by visiting its website directly at http://www.cogsci.princeton.edu/~wn/

The Code

 #!/bin/sh

 # define - Given a word, returns its definition.

 url="http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=1&word="

 if [ $# -ne 1 ] ; then
   echo "Usage: $0 word" >&2
   exit 1
 fi

 # Fetch the page, keep only the sense lines and the "has N" header lines,
 # strip the HTML tags, and then format what is left.
 # Note: ${line:0:3} is a substring expansion, so /bin/sh must be bash-compatible.
 lynx -source "$url$1" | \
   grep -E '(^[[:digit:]]+\.| has [[:digit:]]+$)' | \
   sed 's/<[^>]*>//g' |
 ( while read line
   do
     if [ "${line:0:3}" = "The" ] ; then
       part="$(echo $line | awk '{print $2}')"
       echo ""
       echo "The $part $1:"
     else
       echo "$line" | fmt | sed 's/^/  /g'
     fi
   done
 )

 exit 0
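The filtering and formatting stages can be tried without contacting the server. The following is a minimal sketch, not the book's script: `format_senses` is a hypothetical function name, the sample text is invented, and the assumed page layout (a header line ending in "has N" followed by numbered sense lines) is a guess based on the grep pattern. It takes text whose tags are already stripped, and it uses a portable `case` test in place of the substring expansion.

```shell
#!/bin/sh
# format_senses: hypothetical offline stand-in for the script's filter stages.
# $1 is the word being defined; tag-stripped page text arrives on stdin.
format_senses()
{
  word="$1"
  # Keep numbered sense lines and header lines that end in " has N".
  grep -E '(^[[:digit:]]+\.| has [[:digit:]]+$)' |
  while IFS= read -r line
  do
    case "$line" in
      The*) # Header line: the second field is the part of speech.
            part="$(echo "$line" | awk '{print $2}')"
            echo ""
            echo "The $part $word:" ;;
      *)    # Sense line: wrap it, then indent by two spaces.
            echo "$line" | fmt | sed 's/^/  /' ;;
    esac
  done
}

# Invented sample text; the real page layout is an assumption.
printf '%s\n' \
  'WordNet search results' \
  'The verb limn has 2' \
  '1. delineate, limn, outline -- (trace the shape of)' \
  '2. portray, depict, limn -- (make a portrait of)' |
format_senses limn
```

The first sample line matches neither alternative of the grep pattern, so it is dropped before the loop sees it.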

How It Works

Because you can't simply pass fmt an input stream as structurally complex as a word definition without completely ruining the structure of the definition, the while loop attempts to make the output as attractive and readable as possible. Another solution would be a version of fmt that wraps long lines but never merges lines, treating each line of input distinctly, as shown in script #33, toolong.
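The idea of wrapping long lines without ever merging short ones can be sketched as follows. This is not the toolong script itself: `wrap_lines` is a hypothetical helper, and it assumes an fmt that accepts `-w width` (both the GNU and BSD versions do).

```shell
#!/bin/sh
# wrap_lines: hypothetical helper, not the book's toolong script.
# Each input line is passed to fmt on its own, so long lines are wrapped
# but separate lines are never joined together.
wrap_lines()
{
  width="${1:-72}"   # optional first argument: maximum line width
  while IFS= read -r line
  do
    printf '%s\n' "$line" | fmt -w "$width"
  done
}

# Two short lines stay separate; piped straight into fmt, they would
# be merged into a single line.
printf '%s\n' "1. short line" "2. another short line" | wrap_lines 40
```

Running fmt once per line is slower than a single pass, but for a handful of definition lines the cost is negligible.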

Worthy of note is the sed command that strips out all the HTML tags from the web page source code:

 sed 's/<[^>]*>//g' 

This command removes all patterns that consist of an open angle bracket (<) followed by any combination of characters other than a close angle bracket (>), finally followed by the close angle bracket. It's a good example of how learning more about regular expressions can pay off handsomely when working with shell scripts.
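To see the substitution in isolation, here is a small sketch that applies it to one line of markup. The `strip_tags` wrapper is a hypothetical name, and the sample markup is invented for illustration rather than taken from a real WordNet page.

```shell
#!/bin/sh
# strip_tags: hypothetical wrapper around the sed expression discussed above.
strip_tags()
{
  sed 's/<[^>]*>//g'
}

# Invented sample markup; not real WordNet output.
echo '<b>1.</b> <a href="#x">delineate</a>, limn, outline' | strip_tags
# prints: 1. delineate, limn, outline
```

Because sed processes its input one line at a time, a tag that is split across two lines is not removed by this expression.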

Running the Script

This script takes one and only one argument: a word to be defined.

The Results

 $ define limn

 The verb limn:
   1.  delineate, limn, outline -- (trace the shape of)
   2.  portray, depict, limn -- (make a portrait of; "Goya wanted to
   portray his mistress, the Duchess of Alba")
 $ define visionary

 The noun visionary:
   1.  visionary, illusionist, seer -- (a person with unusual powers
   of foresight)

 The adjective visionary:
   1.  airy, impractical, visionary -- (not practical or realizable;
   speculative; "airy theories about socioeconomic improvement";
   "visionary schemes for getting rich")

Hacking the Script

WordNet is just one of the many places online where you can look up words in an automated fashion. If you're more of a logophile, you might appreciate tweaking this script to work with the online Oxford English Dictionary, or even the venerable Webster's. A good starting point for learning about online dictionaries (and encyclopedias, for that matter) is the wonderful Open Directory Project. Try http://dmoz.org/Reference/Dictionaries/ to get started.




Wicked Cool Shell Scripts. 101 Scripts for Linux, Mac OS X, and Unix Systems
ISBN: 1593270127
Year: 2004
Pages: 150
Authors: Dave Taylor
